# Beam Monitoring and Control with FPGA Based Electronics

Nathan Eddy Fermilab

## What Can You Do With FPGA Hardware


## Outline

- FPGA Overview
- Examples
  - Fermilab Digital Instrumentation
  - Next Generation WiFi
- Commercial Hardware
- Future Perspectives

#### Field Programmable Gate Arrays or... System on a Programmable Chip

- Provide reconfigurable hardware implementations in a single chip
  - Combine the speed of hardware with flexibility usually associated with software
- Features vary from basic logic and I/O to complete systems with dedicated RAM, DSP, Clock Management, and advanced I/O
- Used in a wide variety of applications
  - Cell Phones, Wireless, Radar, Image Processing, etc



## Glossary

- DSP: Digital Signal Processing or Digital Signal Processor
- HDL: Hardware Description Language, used to code and simulate logic
- RTL: Register Transfer Level, the logic-level code for an FPGA; can be used for full timing and logic simulations
- EDA: Electronic Design Automation, a general term for all programs which assist with system design

#### The Bells and Whistles or...

#### What's in there?

- Up to 180k Logic Elements
- Up to 10MBits of RAM
  - Able to implement true dual port RAM & FIFOs
- Dedicated DSP blocks running at up to 500MHz
  - Able to implement multiplies, multiply-accumulate, and FIR Filters
- Sophisticated Clock Management
  - Multiple PLLs for multiply, divide, and phase shifting
  - Dedicated Clock networks throughout the chip



- Single-ended and Differential I/O for all common standards
  - Up to 1100 user defined pins
  - Very Flexible & Configurable
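The multiply-accumulate operation that the dedicated DSP blocks provide is the core of an FIR filter. The following is a minimal software sketch of a direct-form FIR filter, written in Python purely for illustration; it is not FPGA code, and the function name `fir_filter` and the tap values are assumptions.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: each output is a sum of tap * delayed-sample products."""
    delay_line = [0] * len(taps)  # models the register chain in hardware
    out = []
    for s in samples:
        # Shift the new sample in; the oldest one drops off the end.
        delay_line = [s] + delay_line[:-1]
        acc = 0
        for tap, d in zip(taps, delay_line):
            acc += tap * d  # one multiply-accumulate per tap
        out.append(acc)
    return out

# Illustrative 4-tap moving sum applied to a unit step.
step_response = fir_filter([1, 1, 1, 1, 1, 1], [1, 1, 1, 1])
```

In an FPGA the inner loop can be unrolled so that each tap has its own multiplier, which is where the parallel speed advantage comes from.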

#### Embedded Systems

- Hard Core Embedded processor is a dedicated physical component of the chip, separate from the programmable logic
  - 2-4 times faster than Soft Core
- Soft Core Embedded processor is built out of the programmable logic on the chip
  - A 32 bit RISC processor uses only a few percent of total resources
- Implementation of existing code directly in the FPGA without having to develop HDL code

#### Hard Core Resource Allocation



#### Soft Core Resource Allocation




#### Benefits of FPGAs and Digital Design

- Extreme flexibility inherent in FPGA
  - Algorithms and functionality can be changed and updated as needed
  - Code base which can be used for multiple projects
  - Intellectual Property (IP) cores provide off the shelf solutions for many interfaces and DSP applications
- The speed of parallel processing
  - Can perform up to 512 multiplies using dedicated blocks
- The Pipeline nature of FPGA logic is able to satisfy rigorous and well defined timing requirements
- Digital Design provides a straightforward simulation path for design development and verification
  - A variety of commercial tools are available: MATLAB, Simulink, AccelDSP, etc.

#### From Design to Implementation

- FPGA tool is used to generate RTL for configuring FPGA
  - Provides simulation but can be tedious
- Write HDL code and feed it to the FPGA tools
  - Can use IP like subroutines
- MATLAB/Simulink to FPGA tools with a third-party synthesis tool
- C to FPGA tools with third-party tools
- Working towards plug and play oriented design
  - Easier to learn
  - Faster development



## FPGA Examples

- Fermilab Digital Damper for Beam Instrumentation Applications
  - Transverse instability damper
  - Adaptive feedforward RF correction
  - Processing signals for HOM BPM from Tesla Cavities
- Next Generation WiFi Prototyping
  - Single FPGA MIMO Testbed
  - Full FPGA implemented MIMO system

#### Fermilab Digital Damper Hardware

- Board Overview
  - Large FPGA with fast ADCs and DACs
  - Can synchronize with external clock
  - All interfaces in FPGA
  - Designed to solve a wide range of problems
- Transverse Damper in the Recycler
- Adaptive RF Correction for Recycler LLRF
- High Order Mode BPM testing at DESY



#### Recycler Transverse Instability Damper



Similar for Horizontal

- Provide negative feedback to damp out transverse instabilities
  - Uses Pipeline ability of the FPGA to control delays
- After commissioning, had one occurrence where the damper caused beam loss when the input RF clock went away during a Low Level RF system reset
  - Implemented RF clock verification algorithm in the FPGA which disables the damper and reports an error back to the control system
- Work underway to implement more sophisticated diagnostics and monitoring capabilities
  - Currently only using 10% of the FPGA resources
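One way a clock check of this kind could work is to count RF clock edges inside a fixed window of an independent reference clock, and to disable the damper when the count leaves an expected range. The Python sketch below models only that idea. The slides do not describe the actual algorithm, so the function name, the counts, the tolerance, and the latching behavior are all hypothetical.

```python
def damper_enabled(rf_edge_counts, expected, tolerance):
    """Return (enabled, error) per window; latch off after the first bad window.

    rf_edge_counts: RF clock edges counted in each reference-clock window
    (hypothetical numbers; a real design would count in hardware).
    """
    results = []
    faulted = False
    for count in rf_edge_counts:
        if abs(count - expected) > tolerance:
            faulted = True  # RF clock missing or off-frequency
        # Once faulted, stay disabled until an explicit reset (not modeled).
        results.append((not faulted, faulted))
    return results

# RF clock disappears in the third window.
status = damper_enabled([530, 531, 0, 530], expected=530, tolerance=2)
```

Latching the fault is a design assumption here: it keeps the damper off until the control system has seen the error, even if the clock returns.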






#### Recycler Low Level RF



- Stored beam contained within barrier bucket RF – typical buckets are 2kV
- Needed to correct distortions on Recycler Low Level RF barrier buckets
  - Amplifier response, cables, cavity response
  - Need a better than 1% correction on 2kV barriers
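The general idea of an adaptive correction can be sketched as an iterative update of the drive waveform: measure the output, compare it with the desired barrier shape, and feed a fraction of the error back into the drive table. The Python model below is a toy illustration under stated assumptions, not the Recycler algorithm. The distortion model (a static gain error per sample), the step size `mu`, and all names are hypothetical.

```python
def adapt_drive(target, distort, mu=0.5, iterations=20):
    """Iteratively adjust the drive table so the distorted output matches target.

    distort: function modeling the amplifier/cable/cavity chain (an assumption;
    here it stands in for a measurement of the real system).
    """
    drive = list(target)  # start by driving with the ideal waveform
    for _ in range(iterations):
        measured = distort(drive)
        # Feed a fraction of each sample's error back into the drive table.
        drive = [d + mu * (t - m) for d, t, m in zip(drive, target, measured)]
    return drive

# Toy distortion: a different static gain error on each sample.
gains = [0.90, 1.10, 0.95]

def toy_chain(drive):
    return [g * d for g, d in zip(gains, drive)]

target = [2000.0, 2000.0, 2000.0]  # 2 kV barrier, per the text
drive = adapt_drive(target, toy_chain)
residual = [abs(t - m) / t for t, m in zip(target, toy_chain(drive))]
```

With these toy gains each sample's error shrinks by roughly half per iteration, so the residual falls well below the 1% requirement within a few passes.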






#### High Order Mode BPM in an FPGA Digitizer

- Beam Position in 4D (X, X', Y, Y') can be determined from High Order Dipole Modes taken out of Superconducting RF Cavities
  - Resolution to a few μm
- The amplitudes are found by taking a dot product with eigenvectors in MatLab
  - Eigenvectors are found from SVD of calibration data
  - Could also use sin & cos
- This can be done in the digitizer hardware very efficiently by an FPGA
  - This can improve the processing time by 5-6 orders of magnitude for online measurements
  - Prototype design was tested in August at DESY using VME Digital Damper
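The amplitude extraction described above comes down to a set of dot products, which is why it maps well onto FPGA multipliers. The Python sketch below uses the sin and cos basis mentioned as an alternative to SVD eigenvectors. The sample count, the number of cycles, and the synthetic signal are illustrative assumptions, not values from the DESY measurements.

```python
import math

def mode_amplitudes(waveform, basis):
    """Project a digitized waveform onto each basis vector with a dot product."""
    return [sum(w * b for w, b in zip(waveform, vec)) for vec in basis]

n = 64          # samples per waveform (illustrative)
cycles = 4      # mode periods inside the window (illustrative)
# Normalized sin and cos basis; the sum of sin^2 over whole periods is n/2.
norm = 2.0 / n
sin_vec = [norm * math.sin(2 * math.pi * cycles * k / n) for k in range(n)]
cos_vec = [norm * math.cos(2 * math.pi * cycles * k / n) for k in range(n)]

# Synthetic signal with known in-phase and quadrature amplitudes.
signal = [0.3 * math.sin(2 * math.pi * cycles * k / n)
          + 0.7 * math.cos(2 * math.pi * cycles * k / n) for k in range(n)]
a_sin, a_cos = mode_amplitudes(signal, [sin_vec, cos_vec])
```

The projection recovers the 0.3 and 0.7 amplitudes put into the synthetic signal. In hardware, each basis vector would sit in block RAM and each dot product would run as a multiply-accumulate while the samples arrive.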




#### Next Generation Wireless

- Will use Multiple Input Multiple Output (MIMO) Technology
  - A key element is multipath which refers to reflections of RF waves in a physical environment
  - Spatially multiplexed orthogonal frequency-division multiplexing (OFDM) MIMO
  - Uses reflections to tune system performance and minimize errors
  - Increase transfer rate based upon multiple antennas
    - More data and better spectral density
- Requires very sophisticated algorithms to run in real time
  - Scattering, diffraction, absorption are all considerations
  - Need a channel model to account for all of these effects
    - Software based mathematical models
  - Need a real world environment the MIMO system will operate in
    - Requires ability to rapidly tune transmitter and receiver

#### MIMO System



- Use 2 to 4 antennas
  - Data Rate = BW \* Ess \* Num Antennas
  - BW is 20MHz/40MHz, Ess is spectral efficiency 2.7 to 3.6bps/Hz
- Use Linear Algebra to decouple the channel matrix in spatial domain and recover the transmitted data
  - Matrix inversion, maximum likelihood detection (MLD), and SVD used to resolve signal
- Has been demonstrated on FPGA based hardware at speeds up to 1Gbps
  - Flexibility of FPGA hardware allows fast testing of new algorithms
  - Algorithm development and FPGA implementation and testing at same time
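The data rate formula above can be evaluated directly for the quoted ranges. This short Python sketch does only that arithmetic; the function name is an assumption.

```python
def mimo_data_rate_bps(bandwidth_hz, spectral_eff_bps_per_hz, num_antennas):
    """Data Rate = BW * Ess * number of antennas, as on the slide."""
    return bandwidth_hz * spectral_eff_bps_per_hz * num_antennas

low = mimo_data_rate_bps(20e6, 2.7, 2)   # smallest configuration quoted
high = mimo_data_rate_bps(40e6, 3.6, 4)  # largest configuration quoted
print(f"{low / 1e6:.0f} Mbps to {high / 1e6:.0f} Mbps")
```

With the values quoted on the slide this spans roughly 108 Mbps to 576 Mbps.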

#### Transmitted Data



**Received Signal** 

 $\mathbf{r} = \mathbf{H}\mathbf{x} + \mathbf{v}$ 
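The simplest way to recover the transmitted data from this model is direct matrix inversion, often called zero forcing. The Python sketch below does this for a 2×2 complex channel with the noise term set to zero. The channel values, the symbols, and the function name are hypothetical.

```python
def zero_forcing_2x2(H, r):
    """Recover x from r = H x by inverting a 2x2 complex channel matrix."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("channel matrix is singular")
    # Closed-form 2x2 inverse applied to the received vector.
    x0 = (d * r[0] - b * r[1]) / det
    x1 = (-c * r[0] + a * r[1]) / det
    return [x0, x1]

# Hypothetical channel and transmitted QPSK-like symbols.
H = [[1 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.8 - 0.4j]]
x = [1 + 1j, -1 + 1j]
v = [0, 0]  # noise set to zero so recovery is exact up to rounding
r = [H[0][0] * x[0] + H[0][1] * x[1] + v[0],
     H[1][0] * x[0] + H[1][1] * x[1] + v[1]]
x_hat = zero_forcing_2x2(H, r)
```

With real noise the inversion amplifies `v` whenever `H` is poorly conditioned, which is one reason more elaborate detectors are of interest.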

#### FPGA Single Chip Testbed For MIMO Algorithms



- Entirely digital system developed by research team at Kyoto University
- Received Signal Generator is used to simulate incoming signals from 4 antennas
- Automatic Gain Controller returns the upper 8 bits of each signal
- The Maximum Likelihood Detector maximizes a correlation metric

$$\mu_{\mathrm{C}} \triangleq \Re \Big[ \boldsymbol{g}^{\dagger} \boldsymbol{\hat{x}} - \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} a_{i,j} b_{i,j} \Big], \qquad g_{m} \triangleq \boldsymbol{\hat{h}}_{m}^{\dagger} \boldsymbol{y}, \qquad a_{i,j} \triangleq \boldsymbol{\hat{h}}_{i}^{\dagger} \boldsymbol{\hat{h}}_{j},$$

- The Bit Error Rate counter keeps track of the decoding errors
- Data Rates over 900Mbps have been achieved
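For reference, the textbook form of maximum likelihood detection under Gaussian noise is an exhaustive search for the candidate symbol vector that minimizes the squared distance between the received signal and its prediction. The Python sketch below implements that textbook search, not the correlation metric used in the Kyoto testbed. The channel, the noise values, and the names are hypothetical.

```python
from itertools import product

def ml_detect(H, r, alphabet):
    """Try every candidate symbol vector; keep the one closest to r."""
    best, best_cost = None, float("inf")
    for cand in product(alphabet, repeat=len(H[0])):
        cost = 0.0
        for row, r_m in zip(H, r):
            predicted = sum(h * s for h, s in zip(row, cand))
            cost += abs(r_m - predicted) ** 2  # squared Euclidean distance
        if cost < best_cost:
            best, best_cost = cand, cost
    return list(best)

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H = [[1 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.8 - 0.4j]]
sent = [1 - 1j, -1 + 1j]
noise = [0.05 - 0.02j, -0.03 + 0.04j]  # small hypothetical noise
r = [sum(h * s for h, s in zip(row, sent)) + n for row, n in zip(H, noise)]
detected = ml_detect(H, r, qpsk)
```

The number of candidates grows exponentially with the number of antennas (16 here, 256 for four QPSK streams), which is why evaluating them in parallel hardware is attractive.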

#### Full FPGA MIMO WiFi Implementation



- Developed by research team at Laval University using Lyrtech commercial hardware
- The transmitter block performs some complicated manipulations of the output data to improve signal quality, maximize power for a given channel map, and double the BW
- The decoder block constructs the covariance matrix

$$\mathbf{R}_{\mathbf{x}\mathbf{x}} = \sum_{n=1}^{N} \hat{\mathbf{c}}_{n} \hat{\mathbf{c}}_{n}^{H} + \mathbf{I}\sigma_{n}^{2}$$

- I is the identity matrix and  $\sigma_n$  is the thermal noise on each antenna
- It then exploits the covariance matrix to perform matrix inversions simultaneously with a custom matrix inversion algorithm – Layered Space Time Decoder
- Able to achieve data rates up to 1Gbps
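The covariance matrix above is a sum of outer products plus a diagonal noise term. The Python sketch below builds it for two hypothetical channel estimate vectors. The values and the function name are assumptions, and the layered decoder itself is not modeled.

```python
def covariance_matrix(channel_vectors, sigma_sq):
    """R = sum_n c_n c_n^H + sigma^2 I for a list of complex vectors c_n."""
    m = len(channel_vectors[0])
    R = [[0j] * m for _ in range(m)]
    for c in channel_vectors:
        for i in range(m):
            for j in range(m):
                # Outer product c c^H: element (i, j) is c_i * conj(c_j).
                R[i][j] += c[i] * c[j].conjugate()
    for i in range(m):
        R[i][i] += sigma_sq  # thermal-noise term on the diagonal
    return R

c_hat = [[1 + 1j, 0.5j], [0.2, 1 - 0.5j]]  # hypothetical channel estimates
R = covariance_matrix(c_hat, sigma_sq=0.1)
```

The result is Hermitian by construction, and the added noise term keeps it invertible.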

#### Commercial Off the Shelf Hardware Available

- General purpose FPGA based hardware is becoming more and more common
  - Variety of platforms: VME, PCI
  - Lots of I/O options: fast serial links, Gigabit Ethernet
- Many products include software for nearly plug and play system development
- The application and use of FPGA based solutions is expanding rapidly and should prompt further commercial products

#### Wildstar 4 PCI Processor Blades



- Developed by Annapolis Microsystems, Inc
- Up to four Xilinx Virtex 4 FPGAs with up to 3.5GBytes of DDR2 RAM
  - 500MHz processing, 450MHz embedded PowerPCs
- PCI or PCI-X with high speed DMA multichannel PCI controller
- API and device drivers for Windows and Linux
- Accepts 1 or 2 commercial I/O Boards
  - ADC boards from 2.5GHz, 8 bits to 130MHz, 16 bits
  - DAC boards from 2.3GSps, 12 bits to 600MSps, 16 bits
  - Universal 3Gbit I/O Rocket I/O, 10G Ethernet
  - Tri XFP 10 Gigabit Fiber Optic I/O

#### Wildstar CoreFire Design Suite



#### Future Perspectives for FPGA Systems

- Sure things...
  - FPGAs are now the acknowledged leader in cutting edge fast DSP applications where speed and flexibility are needed
  - Accelerator Control and Instrumentation is already using FPGAs to implement fast online applications, especially feedback & control
  - The size, speed, and feature sets are growing by leaps and bounds
  - Design tools are getting closer to traditional programming and becoming easier to use
- Looks promising…
  - Use of FPGAs to implement online orbit measurements and optics calculations which could be used for realtime feedback
  - The next step is cluster and mesh architectures using FPGAs to further increase the processing power
- It could happen...
  - FPGA based co-processors for dedicated calculations
  - FPGA based super computers which configure their hardware to optimize the performance for the algorithms being used

#### Backup Slides

#### Parallel and Pipelined



# Implementing Matrix Operations on an FPGA



- Develop and test linear algebra algorithms required for MIMO in Matlab
- Then need to explore FPGA implementations to optimize performance to meet requirements
- This process can be done using a dedicated design tool like AccelDSP to explore implementation options

#### FPGA Implementation Performance

- Givens Rotation is a common algorithm used to solve symmetric eigenvalue problems
- Can be implemented with CORDIC approximation or by using multipliers
  - Substantial performance increase possible with parallel multipliers

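```matlab
function [v, w] = givens_rotation(x, y)
% Rotate vectors x and y so that the first element of w becomes zero.
r_sqr = x(1)*x(1) + y(1)*y(1);
r_inv = 1/sqrt(r_sqr);
sin_phi = y(1)*r_inv;
cos_phi = x(1)*r_inv;
vt = x*cos_phi + y*sin_phi;
wt = y*cos_phi - x*sin_phi;
% If both leading elements are zero no rotation is defined; pass through.
if (x(1) == 0) && (y(1) == 0)
    v = x;
    w = y;
else
    w = wt;
    v = vt;
end
```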

| Architecture               | DSP48s | Slices | Throughput |
|----------------------------|--------|--------|------------|
| Resource Shared Multiplier | 26     | 943    | 2.8 MSPS   |
| Parallel Multipliers       | 46     | 1774   | 54 MSPS    |
| Resource Shared CORDIC     | 0      | 870    | 1.3 MSPS   |
| Parallel CORDIC            | 0      | 2237   | 5.7 MSPS   |
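The CORDIC rows in the table avoid multipliers by finding the rotation angle through a sequence of shift-and-add steps. The Python sketch below shows the vectoring mode of CORDIC in floating point, as an illustration of the principle only. A hardware version would use fixed-point shifts and a small arctangent table, and the function name and iteration count here are assumptions.

```python
import math

def cordic_vectoring(x, y, iterations=16):
    """Rotate (x, y) onto the x-axis using only shift-and-add style steps.

    Returns (magnitude, angle). Assumes x > 0 so the angle is within the
    CORDIC convergence range.
    """
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i              # a right shift by i bits in hardware
        d = -1.0 if y > 0 else 1.0    # rotate toward y = 0
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)  # table lookup in hardware
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, angle

mag, phi = cordic_vectoring(3.0, 4.0)
```

Each iteration adds about one bit of angular precision, so throughput is traded against accuracy. That is consistent with the lower MSPS figures for the CORDIC architectures in the table.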

#### Recycler Transverse Damper Response






#### Dipole Mode Response



Beam position offset produces mode amplitude proportional to position × charge



Beam at angle produces signal at start of structure, cancels at end of structure: Result is "derivative" like signal, 90 degrees out of phase with position signal

Amplitude is proportional to angle × charge × cavity length


Bunch tilt signal produces a signal with the same phase as the beam angle signal

Amplitude is proportional to tilt × charge × bunch length

Not significant for the DESY TTF (bunches are very short)

## Analog IO Needed for Accelerator Applications

- Need to work with analog input signals
  - Beam pickups, Schottky detectors, Toroids, etc
  - Requires Analog to Digital Converters (ADCs)
  - Current ADC performance
    - Up to 2-3 GSPS with 8 bit precision
    - Up to 500 MSPS with 12 bit precision
    - Up to 100 MSPS with 14 bit precision
- Need to produce analog output signals
  - To act on the beam: RF, kick signals, etc
  - Require Digital to Analog Converters (DACs)
  - Current DAC performance
    - Up to 1 GSPS with 14 bit precision
- The effectiveness of FPGA solutions is largely dominated by the performance of the converters
  - Mixers can be used to downconvert or upconvert the signals

#### PBar Downconverter Digitizer



#### Downconverter Digitizer as BPM

- Designed and used as a BPM in the transfer line downstream of the Antiproton production target
- The original system was unable to see the antiproton signals due to small amplitude signals and kicker noise
- New system allows position measurements on antiproton beam with ~100µm of resolution



#### PBar Downconverter as a Programmable Trigger Module

- The Sampled Bunch Display (SBD) system is a scope based system used to record bunch data for analysis
- Implemented a programmable delay trigger state machine to provide scope triggers
  - The triggers are synchronized with the turn marker from the RF system
  - As an extra, we also implemented readback of machine parameters like momentum on each trigger
- The system was developed and commissioned in about a week
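A programmable delay trigger of this kind can be modeled as a two-state machine that waits for the turn marker, counts down a programmed number of clock ticks, and then fires. The Python sketch below is a behavioral model only. The state names, the input format, and the choice to ignore markers while counting are assumptions, not details of the SBD implementation.

```python
def trigger_outputs(turn_markers, delay_ticks):
    """Emit a trigger delay_ticks clock ticks after each turn marker.

    turn_markers: list of 0/1 values, one per clock tick (hypothetical input).
    Assumes delay_ticks >= 1; markers arriving while counting are ignored.
    """
    IDLE, COUNTING = 0, 1
    state, counter = IDLE, 0
    out = []
    for marker in turn_markers:
        fire = 0
        if state == IDLE:
            if marker:
                state, counter = COUNTING, delay_ticks
        elif state == COUNTING:
            counter -= 1
            if counter == 0:
                fire = 1
                state = IDLE
        out.append(fire)
    return out

# Marker at tick 1, delay of 3 ticks: trigger appears at tick 4.
triggers = trigger_outputs([0, 1, 0, 0, 0, 0, 0], delay_ticks=3)
```

In HDL the same logic is a small counter and a state register clocked by the RF-derived clock, which keeps the trigger delay deterministic.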



#### NimBin FPGA Development Board

- Developed from initial Downconverter Digitizer as cost effective general purpose FPGA instrumentation
- Similar to commercial development boards available but targeted for use at Fermilab
- NimBin provides low overhead
- Basic hardware for many applications
  - Monitoring and Feedback
  - Sophisticated Test Signal Source
  - Interface Testing

