Paper  Title  Page 

MOSDI1  Analyzing Multipacting Problems in Accelerators using ACE3P on High Performance Computers  54 


Funding: This material is based upon work supported by the U.S. Department of Energy Office of Science under Cooperative Agreement DE-SC0000661.
Track3P is the particle tracking module of ACE3P, a 3D parallel finite-element electromagnetic code suite developed at SLAC, which has been implemented on the US DOE supercomputers at NERSC to simulate large-scale complex accelerator designs. Using the higher-order cavity fields generated by the ACE3P codes, Track3P has been used to analyze multipacting (MP) in accelerator cavities. The prediction of the MP barriers in the ICHIRO cavity at KEK was the first Track3P benchmark against measurements. Using a large number of processors, Track3P can scan through the field gradient and cavity surface efficiently, and its comprehensive postprocessing tool allows the identification of both hard and soft MP barriers and of the locations of MP activity. Results from applications of this high-performance simulation capability to accelerators such as the Quarter Wave Resonator for FRIB, the 704 MHz SRF gun cavity for the BNL ERL, and the muon cooling cavity for the Muon Collider will be presented. 
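The MP scan logic described above — launch secondary electrons across a grid of field levels, track them through the RF fields, and flag resonant impacts — can be illustrated with a toy parallel-plate model. This is a minimal sketch under strong assumptions (uniform sinusoidal gap field, an illustrative SEY > 1 energy window, made-up frequency and gap dimensions); it is not Track3P's actual algorithm:

```python
import numpy as np

E_CHARGE, E_MASS = 1.602e-19, 9.109e-31  # C, kg

def transit(v_gap, freq, gap, phase0, dt=2e-12, t_max=5e-9):
    """Track one electron across a parallel-plate gap driven at v_gap*sin(wt).
    Returns (transit_time, impact_energy_eV), or None if it never crosses."""
    omega = 2.0 * np.pi * freq
    x = v = t = 0.0
    while t < t_max:
        # Sign convention: positive field accelerates the electron toward
        # the far plate (toy model, not a field-solver result).
        accel = (E_CHARGE / E_MASS) * (v_gap / gap) * np.sin(omega * t + phase0)
        v += accel * dt
        x += v * dt
        t += dt
        if x >= gap:
            return t, 0.5 * E_MASS * v * v / E_CHARGE
        if x < 0.0:                      # pulled back to the emitting plate
            return None
    return None

def mp_suspect(v_gap, freq=1.3e9, gap=2e-3, sey_window=(50.0, 1500.0)):
    """Flag a gap voltage if some launch phase yields a transit time near an
    odd number of RF half-periods AND an impact energy inside the (assumed)
    SEY > 1 window -- the classic two-point multipacting resonance condition."""
    half_period = 0.5 / freq
    for phase0 in np.linspace(0.0, np.pi, 16):
        hit = transit(v_gap, freq, gap, phase0)
        if hit is None:
            continue
        t_cross, energy_ev = hit
        n_half = t_cross / half_period
        n = round(n_half)
        if n % 2 == 1 and abs(n_half - n) < 0.15 \
                and sey_window[0] <= energy_ev <= sey_window[1]:
            return True
    return False

# Scan the gradient axis -- the outer loop a production code distributes
# over many processors, since every trajectory is independent.
suspects = [v for v in np.linspace(1e3, 1e5, 20) if mp_suspect(v)]
```

Each (voltage, phase) trajectory is independent of every other, which is why this kind of scan parallelizes so efficiently across thousands of processors.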

MOSDC2  GPGPU Implementation of Matrix Formalism for Beam Dynamics Simulation  59 


Matrix formalism is a map-integration method for solving ODEs. It allows the solution of the system to be represented as sums and products of two-index numeric matrices. This approach can be easily implemented in parallel codes; the GPU architecture was chosen as the most natural fit for matrix operations. A set of methods for beam dynamics has been implemented, supporting both particle and envelope dynamics. The computing facilities are located at St. Petersburg State University and consist of NVIDIA Tesla clusters.  
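As a concrete illustration of the "sums and multiplications of two-index matrices": a second-order truncated map X_out = R·X + T·X2, where X2 stacks the quadratic monomials of the state, reduces tracking to batched matrix products — exactly the workload GPUs handle well. In this sketch NumPy stands in for the CUDA kernels, and all map coefficients are arbitrary illustrative numbers, not a real lattice:

```python
import numpy as np

def rot(mu):
    """2x2 phase-space rotation (one plane of the linear map)."""
    c, s = np.cos(mu), np.sin(mu)
    return np.array([[c, s], [-s, c]])

# Linear (first-order) part R of the map: uncoupled rotations in x and y.
# Phase advances and second-order coefficients are illustrative only.
R = np.zeros((4, 4))
R[:2, :2] = rot(0.31)
R[2:, 2:] = rot(0.22)

# Second-order part T: one coefficient per distinct monomial x_i*x_j, i <= j.
pairs = [(i, j) for i in range(4) for j in range(i, 4)]
rng = np.random.default_rng(0)
T = 1e-2 * rng.standard_normal((4, len(pairs)))

def track_turn(X):
    """Truncated second-order map X -> R @ X + T @ X2, applied to ALL
    particles at once: a batched matrix product, the operation GPUs excel at."""
    X2 = np.stack([X[i] * X[j] for i, j in pairs])     # (10, n_particles)
    return R @ X + T @ X2

n_particles = 10_000
X = 1e-3 * rng.standard_normal((4, n_particles))       # rows: x, x', y, y'
for _ in range(100):                                    # track 100 turns
    X = track_turn(X)
```

Because every particle column is transformed by the same two matrices, the work maps directly onto GPU matrix-multiply kernels with no per-particle branching.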
Slides MOSDC2 [0.770 MB]  
MOSDC3 
Fast Determination of Spurious Oscillations in an Entire Klystron Tube with ACE3P  


Funding: US DOE
Spurious oscillations remain one of the challenges in the development of high-power klystrons, preventing a tube from reaching its design performance. ACE3P is a parallel electromagnetic code suite comprising Omega3P, which computes the eigenmodes of open cavities, and Track3P, which calculates particle trajectories in the cavity fields. The oscillation condition is determined by the total Q of the mode, which combines the external Q from Omega3P and the beam-loaded Q, due to energy gain or loss, computed with Track3P. With massively parallel computing it is possible to perform an exhaustive search for unstable modes in a given klystron, from the gun to the collector, on a time scale much shorter than with existing tools. Applications to the XC8 and LBSK klystrons at SLAC will be presented. 

Slides MOSDC3 [1.018 MB]  
THP09  Global Scan of All Stable Settings (GLASS) for the ANKA Storage Ring  239 


Funding: This work has been supported by the Initiative and Networking Fund of the Helmholtz Association under contract number VH-NG-320.
Designing an optimal magnetic optics for a storage ring is not a simple optimization problem, since numerous objectives have to be considered; figures of merit include the tune values, optical functions, momentum compaction factor, emittance, etc. The “GLobal scan of All Stable Settings” (GLASS) technique provides a systematic analysis of the magnetic optics and a global overview of the capabilities of the storage ring. We developed a parallel version of GLASS that runs on multicore processors, significantly decreasing the computation time. In this paper we present our GLASS implementation and show results for the ANKA lattice. 
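The core of a GLASS-style scan can be sketched with a toy thin-lens FODO cell: sweep the quadrupole strengths over a grid and keep every setting whose one-cell transfer matrix is stable, |Tr M| < 2, in both planes. This is a minimal illustration under assumed geometry, not the ANKA lattice or the authors' implementation:

```python
import itertools
import numpy as np

def thin_quad(k):
    """Thin-lens quadrupole of integrated strength k [1/m]."""
    return np.array([[1.0, 0.0], [-k, 1.0]])

def drift(length):
    return np.array([[1.0, length], [0.0, 1.0]])

def cell(k1, k2, length=2.0):
    """One-cell transfer matrix of a thin-lens FODO cell in one plane."""
    return thin_quad(k1) @ drift(length) @ thin_quad(k2) @ drift(length)

def stable(k1, k2):
    """Keep a setting only if |Tr M| < 2 in BOTH planes (the vertical
    plane sees the quadrupole signs flipped)."""
    return (abs(np.trace(cell(k1, k2))) < 2.0
            and abs(np.trace(cell(-k1, -k2))) < 2.0)

# The global scan: every grid point is independent, so the loop splits
# trivially across cores, e.g. one slice of the k1 axis per process.
k_grid = np.linspace(-1.0, 1.0, 101)
stable_settings = [(k1, k2)
                   for k1, k2 in itertools.product(k_grid, k_grid)
                   if stable(k1, k2)]
```

A real GLASS run evaluates further figures of merit (tunes, optical functions, momentum compaction) at each stable point, but the embarrassingly parallel grid structure is the same.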

THP10 
GPU-Accelerated Beam Dynamics Simulations with ELEGANT  


Funding: Work supported by the DOE Office of Science, Office of Basic Energy Sciences grant No. DE-SC0004585, and in part by Tech-X Corporation.
Efficient implementation of general-purpose particle tracking on GPUs can bring significant performance benefits to large-scale particle tracking and tracking-based lattice optimization simulations. We present the latest results of our work on accelerating Argonne National Lab's accelerator simulation code ELEGANT* using CUDA-enabled GPUs**. We provide a list of ELEGANT's beamline elements ported to GPUs, identify performance-limiting factors, and briefly discuss optimization techniques for efficient utilization of the device memory space, with an emphasis on register usage. We also present a novel hardware-assisted technique for efficiently calculating a histogram from a large distribution of particle coordinates, and compare it with data-parallel implementations. Finally, we discuss results of simulations performed with realistic test lattices and give a brief outline of future work on the GPU-enabled version of ELEGANT.
* M. Borland, "elegant: A Flexible SDDS-compliant Code for Accel. Simulation", APS LS-287 (2000); Y. Wang, M. Borland, Proc. of PAC07, THPAN095 (2007)
** CUDA home page: http://www.nvidia.com/cuda 
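For context on the histogram comparison: a naive one-atomic-per-particle GPU histogram serializes under bin contention, so the standard data-parallel baseline privatizes — each thread block accumulates its own partial histogram in fast memory and the partials are reduced afterwards. A CPU-side NumPy sketch of that privatize-then-reduce pattern (chunk size, bin count, and distribution are illustrative, not ELEGANT's implementation):

```python
import numpy as np

def histogram_private_reduce(coords, n_bins, lo, hi, block_size=4096):
    """Privatize-then-reduce histogram: each block of particles fills its own
    private histogram (on a GPU, one per thread block in shared memory), and
    the partial histograms are summed at the end. Out-of-range coordinates
    are clamped into the edge bins."""
    scale = n_bins / (hi - lo)
    partials = []
    for start in range(0, coords.size, block_size):
        chunk = coords[start:start + block_size]
        bins = np.clip(((chunk - lo) * scale).astype(np.int64), 0, n_bins - 1)
        partials.append(np.bincount(bins, minlength=n_bins))
    return np.sum(partials, axis=0)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1e-3, size=100_000)   # toy beam-coordinate distribution
h = histogram_private_reduce(x, n_bins=64, lo=-5e-3, hi=5e-3)
```

The reduction step is a simple element-wise sum over the private copies, so the contention cost of a shared histogram is paid only once per block rather than once per particle.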

THSDI1 
Coherent Electron Cooling Simulations for Parameters of the BNL Proof-of-Principle Experiment  


Funding: Work funded by the US Department of Energy, Office of Science, Office of Nuclear Physics.
Increasing the luminosity of relativistic hadron beams is critical for the advancement of nuclear physics. Coherent electron cooling promises to cool such beams significantly faster than alternative methods. We present simulations of 40 GeV/n Au^{79+} ions for a single pass, which consists of a modulator, an FEL amplifier and a kicker. In the modulator, the electron beam co-propagates with the ion beam, which perturbs the electron beam density and velocity via anisotropic Debye shielding. Self-amplified spontaneous emission lasing in the FEL both amplifies and imparts wavelength-scale modulation on the electron beam perturbations. The modulated electric fields appropriately accelerate or decelerate the co-propagating ions in the kicker. In analogy with stochastic cooling, these field strengths are crucial for estimating the effective drag force on the hadrons and, hence, the cooling time. The inherently 3D particle and field dynamics is modeled with the parallel VORPAL framework (modulator and kicker) and with GENESIS (amplifier), with careful coupling between codes. Physical parameters are taken from the CeC proof-of-principle experiment under development at Brookhaven National Lab. 

Slides THSDI1 [14.817 MB]  
FRSAC1  Hybrid Programming and Performance for Beam Propagation Modeling  284 


Funding: DOE ASCR (Advanced Scientific Computing Research) Program
We examined hybrid parallel infrastructures in order to ensure performance and scalability for beam propagation modeling as we move toward extreme-scale systems. Using an MPI programming interface for parallel algorithms, we expanded the capability of our existing electromagnetic solver to a hybrid (MPI/shared-memory) model that can potentially use the computing resources of future-generation architectures more efficiently. As a preliminary step, we discuss a hybrid MPI/OpenMP model and demonstrate performance and analysis on leadership-class computing systems such as the IBM BG/P, BG/Q, and Cray XK6. Our hybrid MPI/OpenMP model achieves speedup when the computation is large enough to compensate for the OpenMP threading overhead. 

Slides FRSAC1 [4.252 MB]  
FRSAC2  Comparison of Eigenvalue Solvers for Large Sparse Matrix Pencils  287 


Funding: Work supported by the DFG through SFB 634
Efficient and accurate computation of eigenvalues and eigenvectors is of fundamental importance in the accelerator physics community. Moreover, eigensystem analysis is generally used for the identification of many physical phenomena connected to vibrations. Therefore, various algorithms such as Arnoldi, Lanczos, Krylov-Schur, and Jacobi-Davidson have been implemented to solve the eigenvalue problem efficiently. In this direction, we investigate the performance of selected commercial and freely available software tools for the solution of a generalized eigenvalue problem. We choose two setups, spherical and billiard resonators, to test the robustness, accuracy, computational speed, and memory consumption of recent versions of CST, Matlab, Pysparse, SLEPc, and CEM3D. Simulations were performed on a standard personal computer as well as on a cluster computer to enable the handling of large sparse matrices with hundreds of thousands up to several million degrees of freedom. We obtain comparison results that are useful for choosing the appropriate solver for a given practical application. 
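The benchmark problem — a generalized eigenvalue problem K x = λ M x with large sparse matrices — can be reproduced in miniature with SciPy's ARPACK wrapper in shift-and-invert mode, the same Krylov approach offered by libraries like SLEPc. The pencil below is a toy 1-D Laplacian, not a resonator discretization:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000
# Toy pencil: K is a 1-D Laplacian standing in for the stiffness matrix,
# M a scaled identity standing in for the mass matrix.
K = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
M = sp.identity(n, format="csc") / n

# Shift-and-invert Lanczos: the 6 eigenvalues of K x = lam M x nearest sigma.
sigma = 10.0
vals, vecs = eigsh(K, k=6, M=M, sigma=sigma, which="LM")

# Residual check ||K x - lam M x|| for each computed eigenpair.
residuals = [np.linalg.norm(K @ x - lam * (M @ x))
             for lam, x in zip(vals, vecs.T)]
```

The shift sigma selects the spectral region of interest — for a resonator, the neighborhood of the operating frequency — which is what makes shift-and-invert Krylov methods practical at millions of degrees of freedom.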

Slides FRSAC2 [10.095 MB]  