Paper  Title  Page 

MOSDI1  Analyzing Multipacting Problems in Accelerators using ACE3P on High Performance Computers  54 


Funding: This material is based upon work supported by the U.S. Department of Energy Office of Science under Cooperative Agreement DE-SC0000661.
Track3P is the particle tracking module of ACE3P, a 3D parallel finite-element electromagnetic code suite developed at SLAC, which has been implemented on the US DOE supercomputers at NERSC to simulate large-scale complex accelerator designs. Using the higher-order cavity fields generated by the ACE3P codes, Track3P has been used to analyze multipacting (MP) in accelerator cavities. The prediction of the MP barriers in the ICHIRO cavity at KEK was the first Track3P benchmark against measurements. Using a large number of processors, Track3P can scan through the field gradient and cavity surface efficiently, and its comprehensive postprocessing tool allows the identification of both hard and soft MP barriers and of the locations of MP activity. Results from applications of this high-performance simulation capability to accelerators such as the Quarter Wave Resonator for FRIB, the 704 MHz SRF gun cavity for the BNL ERL, and the muon cooling cavity for the Muon Collider will be presented. 
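The MP scan logic described above — launch secondary electrons across a grid of field levels, track them through the RF fields, and flag resonant impacts — can be illustrated with a toy parallel-plate model. This is a minimal sketch under strong assumptions (uniform sinusoidal gap field, an illustrative SEY > 1 energy window, made-up frequency and gap dimensions); it is not Track3P's actual algorithm:

```python
import numpy as np

E_CHARGE, E_MASS = 1.602e-19, 9.109e-31  # C, kg

def transit(v_gap, freq, gap, phase0, dt=2e-12, t_max=5e-9):
    """Track one electron across a parallel-plate gap driven at v_gap*sin(wt).
    Returns (transit_time, impact_energy_eV), or None if it never crosses."""
    omega = 2.0 * np.pi * freq
    x = v = t = 0.0
    while t < t_max:
        # Sign convention: positive field accelerates the electron toward
        # the far plate (toy model, not a field-solver result).
        accel = (E_CHARGE / E_MASS) * (v_gap / gap) * np.sin(omega * t + phase0)
        v += accel * dt
        x += v * dt
        t += dt
        if x >= gap:
            return t, 0.5 * E_MASS * v * v / E_CHARGE
        if x < 0.0:                      # pulled back to the emitting plate
            return None
    return None

def mp_suspect(v_gap, freq=1.3e9, gap=2e-3, sey_window=(50.0, 1500.0)):
    """Flag a gap voltage if some launch phase yields a transit time near an
    odd number of RF half-periods AND an impact energy inside the (assumed)
    SEY > 1 window -- the classic two-point multipacting resonance condition."""
    half_period = 0.5 / freq
    for phase0 in np.linspace(0.0, np.pi, 16):
        hit = transit(v_gap, freq, gap, phase0)
        if hit is None:
            continue
        t_cross, energy_ev = hit
        n_half = t_cross / half_period
        n = round(n_half)
        if n % 2 == 1 and abs(n_half - n) < 0.15 \
                and sey_window[0] <= energy_ev <= sey_window[1]:
            return True
    return False

# Scan the gradient axis -- the outer loop a production code distributes
# over many processors, since every trajectory is independent.
suspects = [v for v in np.linspace(1e3, 1e5, 20) if mp_suspect(v)]
```

Each (voltage, phase) trajectory is independent of every other, which is why this kind of scan parallelizes so efficiently across thousands of processors.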

MOSDC2  GPGPU Implementation of Matrix Formalism for Beam Dynamics Simulation  59 


Matrix formalism is a map-integration method for solving ODEs. It allows the solution of the system to be represented as sums and products of two-index numeric matrices. This approach can be easily implemented in parallel codes; the GPU architecture was chosen as the most natural fit for matrix operations. A set of methods for beam dynamics has been implemented, supporting both particle and envelope dynamics. The computing facilities are located at St. Petersburg State University and consist of NVIDIA Tesla clusters.  
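As a concrete illustration of the "sums and multiplications of two-index matrices": a second-order truncated map X_out = R·X + T·X2, where X2 stacks the quadratic monomials of the state, reduces tracking to batched matrix products — exactly the workload GPUs handle well. In this sketch NumPy stands in for the CUDA kernels, and all map coefficients are arbitrary illustrative numbers, not a real lattice:

```python
import numpy as np

def rot(mu):
    """2x2 phase-space rotation (one plane of the linear map)."""
    c, s = np.cos(mu), np.sin(mu)
    return np.array([[c, s], [-s, c]])

# Linear (first-order) part R of the map: uncoupled rotations in x and y.
# Phase advances and second-order coefficients are illustrative only.
R = np.zeros((4, 4))
R[:2, :2] = rot(0.31)
R[2:, 2:] = rot(0.22)

# Second-order part T: one coefficient per distinct monomial x_i*x_j, i <= j.
pairs = [(i, j) for i in range(4) for j in range(i, 4)]
rng = np.random.default_rng(0)
T = 1e-2 * rng.standard_normal((4, len(pairs)))

def track_turn(X):
    """Truncated second-order map X -> R @ X + T @ X2, applied to ALL
    particles at once: a batched matrix product, the operation GPUs excel at."""
    X2 = np.stack([X[i] * X[j] for i, j in pairs])     # (10, n_particles)
    return R @ X + T @ X2

n_particles = 10_000
X = 1e-3 * rng.standard_normal((4, n_particles))       # rows: x, x', y, y'
for _ in range(100):                                    # track 100 turns
    X = track_turn(X)
```

Because every particle column is transformed by the same two matrices, the work maps directly onto GPU matrix-multiply kernels with no per-particle branching.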
Slides MOSDC2 [0.770 MB]  
MOSDC3 
Fast Determination of Spurious Oscillations in an Entire Klystron Tube with ACE3P  


Funding: US DOE
Spurious oscillations remain one of the challenges in the development of high-power klystrons, preventing a tube from reaching its design performance. ACE3P is a parallel electromagnetic code suite comprising Omega3P, which computes the eigenmodes of open cavities, and Track3P, which calculates particle trajectories in the cavity fields. The oscillation condition is determined by the total Q of the mode, which combines the external Q from Omega3P and the beam-loaded Q, due to energy gain or loss, computed with Track3P. With massively parallel computing it is possible to perform an exhaustive search for unstable modes in a given klystron, from the gun to the collector, on a time scale much shorter than with existing tools. Applications to the XC8 and LBSK klystrons at SLAC will be presented. 

Slides MOSDC3 [1.018 MB]  
THP09  Global Scan of All Stable Settings (GLASS) for the ANKA Storage Ring  239 


Funding: This work has been supported by the Initiative and Networking Fund of the Helmholtz Association under contract number VH-NG-320.
Designing an optimal magnetic optics for a storage ring is not a simple optimization problem, since numerous objectives have to be considered; figures of merit include the tune values, optical functions, momentum compaction factor, emittance, etc. The “GLobal scan of All Stable Settings” (GLASS) technique provides a systematic analysis of the magnetic optics and a global overview of the capabilities of the storage ring. We developed a parallel version of GLASS that runs on multicore processors, significantly decreasing the computation time. In this paper we present our GLASS implementation and show results for the ANKA lattice. 
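The core of a GLASS-style scan can be sketched with a toy thin-lens FODO cell: sweep the quadrupole strengths over a grid and keep every setting whose one-cell transfer matrix is stable, |Tr M| < 2, in both planes. This is a minimal illustration under assumed geometry, not the ANKA lattice or the authors' implementation:

```python
import itertools
import numpy as np

def thin_quad(k):
    """Thin-lens quadrupole of integrated strength k [1/m]."""
    return np.array([[1.0, 0.0], [-k, 1.0]])

def drift(length):
    return np.array([[1.0, length], [0.0, 1.0]])

def cell(k1, k2, length=2.0):
    """One-cell transfer matrix of a thin-lens FODO cell in one plane."""
    return thin_quad(k1) @ drift(length) @ thin_quad(k2) @ drift(length)

def stable(k1, k2):
    """Keep a setting only if |Tr M| < 2 in BOTH planes (the vertical
    plane sees the quadrupole signs flipped)."""
    return (abs(np.trace(cell(k1, k2))) < 2.0
            and abs(np.trace(cell(-k1, -k2))) < 2.0)

# The global scan: every grid point is independent, so the loop splits
# trivially across cores, e.g. one slice of the k1 axis per process.
k_grid = np.linspace(-1.0, 1.0, 101)
stable_settings = [(k1, k2)
                   for k1, k2 in itertools.product(k_grid, k_grid)
                   if stable(k1, k2)]
```

A real GLASS run evaluates further figures of merit (tunes, optical functions, momentum compaction) at each stable point, but the embarrassingly parallel grid structure is the same.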

THP10 
GPU-Accelerated Beam Dynamics Simulations with ELEGANT  


Funding: Work supported by the DOE Office of Science, Office of Basic Energy Sciences grant No. DE-SC0004585, and in part by Tech-X Corporation.
Efficient implementation of general-purpose particle tracking on GPUs can bring significant performance benefits to large-scale particle tracking and tracking-based lattice optimization simulations. We present the latest results of our work on accelerating Argonne National Lab's accelerator simulation code ELEGANT* using CUDA-enabled GPUs**. We provide a list of ELEGANT's beamline elements ported to GPUs, identify performance-limiting factors, and briefly discuss optimization techniques for efficient utilization of the device memory space, with an emphasis on register usage. We also present a novel hardware-assisted technique for efficiently calculating a histogram from a large distribution of particle coordinates, and compare it with data-parallel implementations. Finally, we discuss results of simulations performed with realistic test lattices and give a brief outline of future work on the GPU-enabled version of ELEGANT.
* M. Borland, "elegant: A Flexible SDDS-compliant Code for Accel. Simulation", APS LS-287 (2000); Y. Wang, M. Borland, Proc. of PAC07, THPAN095 (2007)
** CUDA home page: http://www.nvidia.com/cuda 
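For context on the histogram comparison: a naive one-atomic-per-particle GPU histogram serializes under bin contention, so the standard data-parallel baseline privatizes — each thread block accumulates its own partial histogram in fast memory and the partials are reduced afterwards. A CPU-side NumPy sketch of that privatize-then-reduce pattern (chunk size, bin count, and distribution are illustrative, not ELEGANT's implementation):

```python
import numpy as np

def histogram_private_reduce(coords, n_bins, lo, hi, block_size=4096):
    """Privatize-then-reduce histogram: each block of particles fills its own
    private histogram (on a GPU, one per thread block in shared memory), and
    the partial histograms are summed at the end. Out-of-range coordinates
    are clamped into the edge bins."""
    scale = n_bins / (hi - lo)
    partials = []
    for start in range(0, coords.size, block_size):
        chunk = coords[start:start + block_size]
        bins = np.clip(((chunk - lo) * scale).astype(np.int64), 0, n_bins - 1)
        partials.append(np.bincount(bins, minlength=n_bins))
    return np.sum(partials, axis=0)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1e-3, size=100_000)   # toy beam-coordinate distribution
h = histogram_private_reduce(x, n_bins=64, lo=-5e-3, hi=5e-3)
```

The reduction step is a simple element-wise sum over the private copies, so the contention cost of a shared histogram is paid only once per block rather than once per particle.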

THSDI1 
Coherent Electron Cooling Simulations for Parameters of the BNL Proof-of-Principle Experiment  


Funding: Work funded by the US Department of Energy, Office of Science, Office of Nuclear Physics.
Increasing the luminosity of relativistic hadron beams is critical for the advancement of nuclear physics. Coherent electron cooling promises to cool such beams significantly faster than alternative methods. We present simulations of 40 GeV/n Au^{79+} ions for a single pass, which consists of a modulator, an FEL amplifier and a kicker. In the modulator, the electron beam co-propagates with the ion beam, which perturbs the electron beam density and velocity via anisotropic Debye shielding. Self-amplified spontaneous emission lasing in the FEL both amplifies and imparts wavelength-scale modulation on the electron beam perturbations. The modulated electric fields appropriately accelerate or decelerate the co-propagating ions in the kicker. In analogy with stochastic cooling, these field strengths are crucial for estimating the effective drag force on the hadrons and, hence, the cooling time. The inherently 3D particle and field dynamics is modeled with the parallel VORPAL framework (modulator and kicker) and with GENESIS (amplifier), with careful coupling between codes. Physical parameters are taken from the CeC proof-of-principle experiment under development at Brookhaven National Lab. 

Slides THSDI1 [14.817 MB]  
FRSAC1  Hybrid Programming and Performance for Beam Propagation Modeling  284 


Funding: DOE ASCR (Advanced Scientific Computing Research) Program
We examined hybrid parallel infrastructures in order to ensure performance and scalability for beam propagation modeling as we move toward extreme-scale systems. Using an MPI programming interface for parallel algorithms, we expanded the capability of our existing electromagnetic solver to a hybrid (MPI/shared-memory) model that can potentially use the computing resources of future-generation architectures more efficiently. As a preliminary step, we discuss a hybrid MPI/OpenMP model and demonstrate performance and analysis on leadership-class computing systems such as the IBM BG/P, BG/Q, and Cray XK6. Our hybrid MPI/OpenMP model achieves speedup when the computation is large enough to compensate for the OpenMP threading overhead. 

Slides FRSAC1 [4.252 MB]  
FRSAC2  Comparison of Eigenvalue Solvers for Large Sparse Matrix Pencils  287 


Funding: Work supported by the DFG through SFB 634
Efficient and accurate computation of eigenvalues and eigenvectors is of fundamental importance in the accelerator physics community. Moreover, eigensystem analysis is generally used for the identification of many physical phenomena connected to vibrations. Therefore, various algorithms such as Arnoldi, Lanczos, Krylov-Schur, and Jacobi-Davidson have been implemented to solve the eigenvalue problem efficiently. In this direction, we investigate the performance of selected commercial and freely available software tools for the solution of a generalized eigenvalue problem. We choose two setups, spherical and billiard resonators, to test the robustness, accuracy, computational speed, and memory consumption of recent versions of CST, Matlab, Pysparse, SLEPc, and CEM3D. Simulations were performed on a standard personal computer as well as on a cluster computer to enable the handling of large sparse matrices with hundreds of thousands up to several million degrees of freedom. We obtain comparison results that are useful for choosing the appropriate solver for a given practical application. 
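The benchmark problem — a generalized eigenvalue problem K x = λ M x with large sparse matrices — can be reproduced in miniature with SciPy's ARPACK wrapper in shift-and-invert mode, the same Krylov approach offered by libraries like SLEPc. The pencil below is a toy 1-D Laplacian, not a resonator discretization:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000
# Toy pencil: K is a 1-D Laplacian standing in for the stiffness matrix,
# M a scaled identity standing in for the mass matrix.
K = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
M = sp.identity(n, format="csc") / n

# Shift-and-invert Lanczos: the 6 eigenvalues of K x = lam M x nearest sigma.
sigma = 10.0
vals, vecs = eigsh(K, k=6, M=M, sigma=sigma, which="LM")

# Residual check ||K x - lam M x|| for each computed eigenpair.
residuals = [np.linalg.norm(K @ x - lam * (M @ x))
             for lam, x in zip(vals, vecs.T)]
```

The shift sigma selects the spectral region of interest — for a resonator, the neighborhood of the operating frequency — which is what makes shift-and-invert Krylov methods practical at millions of degrees of freedom.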

Slides FRSAC2 [10.095 MB]  