Keyword: distributed
Paper Title Other Keywords Page
MOBPP03 Fault Tolerant, Scalable Middleware Services Based on Spring Boot, REST, H2 and Infinispan database, controls, operation, network 33
 
  • W. Sliwinski, K. Kaczkowski, W. Zadlo
    CERN, Geneva, Switzerland
 
  Control systems require several core services for work coordination and everyday operation. One example is the Directory Service, a central registry of all access points and their physical locations in the network. Another is the Authentication Service, which verifies a caller's identity and issues a signed token representing the caller in distributed communication. Both are real-life examples of middleware services that have to be always available and scalable. The paper discusses the design decisions and technical background behind these two central services used at CERN. Both services were designed using current technology standards, namely Spring Boot and REST. Moreover, they had to comply with demanding requirements for fault tolerance and scalability, so additional extensions were necessary, such as a distributed in-memory cache (using Apache Infinispan) or local mirroring of the Oracle database using the H2 database. Additionally, the paper explains the trade-offs of different approaches to providing high availability and the lessons learnt from operational usage.  
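As a rough sketch of how a client might interact with two such REST services, the example below first obtains a signed token from a hypothetical authentication endpoint and then resolves a device access point through a hypothetical directory endpoint. The URLs, request payloads and response fields are illustrative assumptions only, not the actual CERN API.

```python
import requests

# Hypothetical endpoints; the real service URLs and payloads are not part of the abstract.
AUTH_URL = "https://auth.example.cern/api/token"
DIRECTORY_URL = "https://directory.example.cern/api/access-points"

# 1. Obtain a signed token that represents the caller in distributed communication.
resp = requests.post(AUTH_URL, json={"user": "op-user", "password": "secret"}, timeout=5)
resp.raise_for_status()
token = resp.json()["token"]

# 2. Look up an access point's physical location in the network, authenticating with the token.
resp = requests.get(
    DIRECTORY_URL,
    params={"name": "PS.POWER.CONVERTER.01"},
    headers={"Authorization": f"Bearer {token}"},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"name": "...", "host": "...", "port": ...}
```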
Slides MOBPP03 [6.846 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-MOBPP03  
About • paper received ※ 27 September 2019       paper accepted ※ 08 October 2019       issue date ※ 30 August 2020  
 
MOPHA112 Improving Performance of the MTCA System by Use of PCI Express Non-Transparent Bridging and Point-To-Point PCI Express Transactions controls, embedded, ISOL 480
 
  • L.P. Petrosyan
    DESY, Hamburg, Germany
 
  The PCI Express standard enables some of the highest data transfer rates available today. However, with a large number of modules in an MTCA system, the increasing complexity of individual MTCA components and a growing demand for high data transfer rates to client programs, the performance of the overall system becomes a key parameter. Multiprocessor systems are known not only to provide higher processing bandwidth, but also to allow greater system reliability through host failover mechanisms. The use of non-transparent bridges in PCI systems, supporting intelligent adapters in enterprise systems and multiple processors in embedded systems, is a well-established technology, in which the non-transparent bridge acts as a gateway between the local subsystem and the system backplane. This can be ported to the PCI Express standard by replacing one of the transparent bridges of a PCI Express switch with a non-transparent bridge. Our experience of establishing non-transparent bridging in MTCA systems will be presented.  
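To make the gateway idea concrete, the sketch below shows the kind of userspace access pattern a non-transparent bridge enables: the local host maps the NTB endpoint's BAR and communicates with the peer through scratchpad and doorbell registers. The sysfs path and register offsets are invented for illustration; real register maps are vendor-specific and the paper's MTCA setup may differ entirely.

```python
import mmap
import os
import struct

# Hypothetical PCI address of the NTB endpoint and hypothetical register offsets.
BAR_PATH = "/sys/bus/pci/devices/0000:05:00.0/resource0"
SCRATCHPAD0 = 0x00  # assumed scratchpad register offset
DOORBELL = 0x04     # assumed doorbell register offset

fd = os.open(BAR_PATH, os.O_RDWR | os.O_SYNC)
bar = mmap.mmap(fd, 4096, mmap.MAP_SHARED, mmap.PROT_READ | mmap.PROT_WRITE)

# Leave a word for the host on the other side of the bridge ...
struct.pack_into("<I", bar, SCRATCHPAD0, 0xCAFE0001)
# ... and ring its doorbell so it knows to pick the message up.
struct.pack_into("<I", bar, DOORBELL, 0x1)

bar.close()
os.close(fd)
```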
Poster MOPHA112 [0.452 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-MOPHA112  
About • paper received ※ 10 September 2019       paper accepted ※ 03 November 2019       issue date ※ 30 August 2020  
 
MOPHA160 Enabling Data Analytics as a Service for Large Scale Facilities simulation, data-analysis, software, experiment 614
 
  • K. Woods, R.J. Clegg, N.S. Cook, R. Millward
    Tessella, Abingdon, United Kingdom
  • F. Barnsley, C. Jones
    STFC/RAL, Chilton, Didcot, Oxon, United Kingdom
 
  Funding: UK Research and Innovation - Science & Technology Facilities Council (UK SBS IT18160)
The Ada Lovelace Centre (ALC) at STFC is an integrated, cross-disciplinary, data-intensive science centre for better exploitation of research carried out at large-scale UK facilities, including the Diamond Light Source, the ISIS Neutron and Muon Facility, the Central Laser Facility and the Culham Centre for Fusion Energy. ALC will provide on-demand data analysis, interpretation and analytics services to worldwide users of these research facilities. Using open-source components, ALC and Tessella have together created a software infrastructure to support the delivery of that vision. The infrastructure comprises a Virtual Machine Manager, for managing pools of VMs across distributed compute clusters; components for automated provisioning of data analytics environments across heterogeneous clouds; a Data Movement System, to efficiently transfer large datasets; and a Kubernetes cluster to manage on-demand submission of Spark jobs. In this paper, we discuss the challenges of creating an infrastructure to meet the differing analytics needs of multiple facilities and report the architecture and design of the infrastructure that enables Data Analytics as a Service.
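The abstract mentions a Kubernetes cluster that manages on-demand submission of Spark jobs. One possible shape of such a submission, sketched with the Kubernetes Python client, is a batch Job wrapping spark-submit; the image name, namespace, script path and arguments below are assumptions for illustration, not details from the paper.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
batch = client.BatchV1Api()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="reduce-run-1234"),
    spec=client.V1JobSpec(
        backoff_limit=2,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="spark-driver",
                        image="registry.example.org/analytics/spark:3.1",  # assumed image
                        command=[
                            "spark-submit",
                            "--master", "k8s://https://kubernetes.default",
                            "/opt/jobs/reduce.py",   # assumed analysis script
                            "--dataset", "run-1234",
                        ],
                    )
                ],
            )
        ),
    ),
)

batch.create_namespaced_job(namespace="analytics", body=job)  # assumed namespace
```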
 
Poster MOPHA160 [1.665 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-MOPHA160  
About • paper received ※ 30 September 2019       paper accepted ※ 10 October 2019       issue date ※ 30 August 2020  
 
TUBPL05 RecSyncETCD: A Fault-tolerant Service for EPICS PV Configuration Data operation, network, EPICS, controls 714
 
  • T. Ashwarya, E.T. Berryman, M.G. Konrad
    FRIB, East Lansing, Michigan, USA
 
  Funding: Work supported by the U.S. Department of Energy Office of Science under Cooperative Agreement DE-SC0000661
RecCaster is an EPICS module responsible for uploading Process Variable (PV) metadata from the IOC database to a central server called RecCeiver. The RecCeiver service is a custom-built application that passes this data on to ChannelFinder, a REST-based search service. Together, RecCaster and RecCeiver form the building blocks of RecSync. RecCeiver is not a distributed service, which makes it challenging to ensure high availability and fault tolerance for its clients. We have implemented a new version of RecCaster which uploads the PV metadata to ETCD, a commercial off-the-shelf distributed key-value store intended for highly available data storage and retrieval. It provides fault tolerance, as the service can be replicated on multiple servers that keep the data consistently replicated. ETCD is a drop-in replacement for the existing RecCeiver for storing and retrieving PV metadata. ETCD also offers a well-documented interface for client operations, including the ability for clients to live-watch the PV metadata. This paper discusses the design and implementation of RecSyncETCD as a fault-tolerant service for storing and retrieving EPICS PV metadata.
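As an illustration of the etcd interaction pattern described above (a put on upload, a watch for live updates), here is a minimal sketch using the python-etcd3 client. The key layout and JSON payload are assumptions; the actual RecSyncETCD schema and client implementation are not specified in the abstract.

```python
import etcd3

client = etcd3.client(host="etcd.example.org", port=2379)  # assumed etcd endpoint

# An IOC (RecCaster side) publishes metadata for one of its records under an assumed key scheme.
client.put(
    "/recsync/iocs/ioc-01/pvs/LINAC:BPM1:X",
    '{"type": "ai", "desc": "BPM horizontal position"}',
)

# A consumer (e.g. a ChannelFinder indexer) live-watches all PV metadata under the prefix.
events, cancel = client.watch_prefix("/recsync/iocs/")
for event in events:
    print(event.key, event.value)  # react to added or updated PV metadata
    break  # stop after the first event in this sketch
cancel()
```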
 
Slides TUBPL05 [1.099 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-TUBPL05  
About • paper received ※ 26 September 2019       paper accepted ※ 02 October 2019       issue date ※ 30 August 2020  
 
TUBPR01 The Distributed Oscilloscope: A Large-Scale Fully Synchronised Data Acquisition System Over White Rabbit network, HOM, status, controls 725
 
  • D. Lampridis, T. Gingold, M. Malczak, F. Vaga, T. Włostowski, A. Wujek
    CERN, Geneva, Switzerland
  • M. Malczak
    Warsaw University of Technology, Institute of Electronic Systems, Warsaw, Poland
 
  A common need in large scientific experiments is the ability to monitor the whole installation by means of simultaneous data acquisition. Data are acquired as a result of triggers, which may either come from external sources or from internal triggering of one of the acquisition nodes. However, a problem arises from the fact that once the trigger is generated, it will not arrive at the receiving nodes simultaneously, due to varying distances and environmental conditions. The Distributed Oscilloscope (DO) concept attempts to address this problem by leveraging the sub-nanosecond synchronization and deterministic data delivery provided by White Rabbit (WR) and augmenting it with automatic discovery of acquisition nodes and complex trigger-event scheduling, in order to provide the illusion of a virtual oscilloscope. This paper presents the current state of the DO, including work done at the FPGA and software levels to enhance existing acquisition hardware, as well as a new protocol based on existing industrial standards. It also includes test results obtained from a demonstrator based on two digitizers separated by a 10 km optical fiber, used as a showcase of the DO concept.  
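A short sketch of the scheduling idea: because all nodes share the White Rabbit time base, a trigger can be timestamped at its source and executed everywhere at the same absolute time in the future, instead of 'on message arrival'. The margin below is an illustrative assumption, not a value from the DO protocol.

```python
# Assumed worst-case delivery latency of the trigger message over the network (2 ms).
MAX_PROPAGATION_NS = 2_000_000

def schedule_acquisition(trigger_timestamp_ns: int) -> int:
    """Return the common absolute WR time at which every node should freeze its buffer."""
    return trigger_timestamp_ns + MAX_PROPAGATION_NS

# On the triggering node: timestamp the trigger with the WR clock and broadcast the execution time.
trigger_ts_ns = 1_569_600_000_000_000_000  # example WR timestamp in nanoseconds
execute_at_ns = schedule_acquisition(trigger_ts_ns)
# ... send execute_at_ns to all acquisition nodes ...

# On each receiving node: since local WR time agrees to well below a nanosecond,
# waiting until the local clock reaches execute_at_ns aligns the captured windows
# even though the trigger message arrived at different times on each node.
```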
Slides TUBPR01 [10.026 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-TUBPR01  
About • paper received ※ 27 September 2019       paper accepted ※ 10 October 2019       issue date ※ 30 August 2020  
 
WEDPL02 AliECS: A New Experiment Control System for the ALICE Experiment controls, detector, experiment, operation 956
 
  • T. Mrnjavac, K. Alexopoulos, V. Chibante Barroso, G.C. Raduta
    CERN, Geneva, Switzerland
 
  The ALICE experiment at the CERN LHC (Large Hadron Collider) is undertaking a major upgrade during Long Shutdown 2 in 2019-2020, which includes a new computing system called O² (Online-Offline). To ensure the efficient operation of the upgraded experiment along with its newly designed computing system, a reliable, high-performance and automated experiment control system is being developed with the goal of managing all O² synchronous processing software and of handling the data-taking activity by interacting with the detectors, the trigger system and the LHC. The ALICE Experiment Control System (AliECS) is a distributed system based on state-of-the-art cluster management and microservices technologies which have recently emerged in the distributed computing ecosystem. Such technologies will allow the ALICE collaboration to benefit from a vibrant and innovative open-source community. This communication illustrates the AliECS architecture. It provides an in-depth overview of the system’s components, features and design elements, as well as its performance. It also reports on the experience with AliECS as part of ALICE Run 3 detector commissioning setups.  
Slides WEDPL02 [2.858 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-WEDPL02  
About • paper received ※ 30 September 2019       paper accepted ※ 09 October 2019       issue date ※ 30 August 2020  
 
WEMPR009 Development of Event Receiver on Zynq-7000 Evaluation Board timing, controls, FPGA, linac 1063
 
  • H. Sugimura
    KEK, Ibaraki, Japan
 
  The timing system of the SuperKEKB accelerator uses the Event Timing System developed by Micro-Research Finland. In this work, we tested an event receiver implemented on a Zynq-7000 evaluation board. The serialized event data are transferred from the Event Generator to the Event Receiver using a GTX transceiver, so we selected the Zynq-7000 (7z030) as the receiver because its FPGA provides GTX transceivers. In addition, since the Zynq integrates an ARM processor, the received event data stream can easily be controlled by an EPICS IOC. Finally, we aim to combine the event system and an RF or BPM system in one FPGA board.  
Poster WEMPR009 [0.572 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-WEMPR009  
About • paper received ※ 17 September 2019       paper accepted ※ 09 October 2019       issue date ※ 30 August 2020  
 
WEPHA020 Pushing the Limits of Tango Archiving System using PostgreSQL and Time Series Databases TANGO, database, controls, SRF 1116
 
  • R. Bourtembourg, S. James, J.L. Pons, P.V. Verdier
    ESRF, Grenoble, France
  • G. Cuní, S. Rubio-Manrique
    ALBA-CELLS Synchrotron, Cerdanyola del Vallès, Spain
  • M. Di Carlo
    INAF - OAAB, Teramo, Italy
  • G.A. Fatkin, A.I. Senchenko, V. Sitnov
    NSU, Novosibirsk, Russia
  • G.A. Fatkin, A.I. Senchenko, V. Sitnov
    BINP SB RAS, Novosibirsk, Russia
  • L. Pivetta, C. Scafuri, G. Scalamera, G. Strangolino, L. Zambon
    Elettra-Sincrotrone Trieste S.C.p.A., Basovizza, Italy
 
  The Tango HDB++ project is a high-performance, event-driven archiving system which stores data with microsecond-resolution timestamps, using archivers written in C++. HDB++ supports MySQL/MariaDB and Apache Cassandra backends and has recently been extended to support PostgreSQL and TimescaleDB*, a time-series PostgreSQL extension. The PostgreSQL backend has enabled efficient multi-dimensional data storage in a relational database. Time-series databases are ideal for archiving and can take advantage of the fact that data, once inserted, do not change. TimescaleDB has pushed the performance of HDB++ to new limits. The paper will present the benchmarking tools that have been developed to compare the performance of the different backends and the extension of HDB++ to support TimescaleDB for insertion and extraction. A comparison of the different supported backends will be presented.
* https://timescale.com
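To make the TimescaleDB extension concrete, the sketch below declares a hypertable and runs a typical time-bucketed extraction query from Python. The table and column names (att_scalar_devdouble, data_time, value_r, att_conf_id) follow the general HDB++ naming style but are used here as assumptions rather than the project's exact schema; connection details are invented.

```python
import psycopg2

conn = psycopg2.connect("dbname=hdb user=hdb_reader host=archive.example.org")  # assumed DSN
cur = conn.cursor()

# One-time setup: turn the scalar-double attribute table into a TimescaleDB hypertable,
# partitioned on the timestamp column.
cur.execute(
    "SELECT create_hypertable('att_scalar_devdouble', 'data_time', if_not_exists => TRUE);"
)

# Typical extraction: one-minute averages of a single attribute over the last day.
cur.execute(
    """
    SELECT time_bucket('1 minute', data_time) AS bucket, avg(value_r)
    FROM att_scalar_devdouble
    WHERE att_conf_id = %s
      AND data_time > now() - interval '1 day'
    GROUP BY bucket
    ORDER BY bucket;
    """,
    (42,),
)
for bucket, mean in cur.fetchall():
    print(bucket, mean)

conn.commit()
cur.close()
conn.close()
```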
 
Poster WEPHA020 [1.609 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-WEPHA020  
About • paper received ※ 30 September 2019       paper accepted ※ 02 November 2019       issue date ※ 30 August 2020  
 
WEPHA103 Backward Compatible Update of the Timing System of WEST FPGA, network, timing, controls 1338
 
  • Y. Moudden, A. Barbuti, G. Caulier, T. Poirier, B. Santraine, B. Vincent
    CEA/DRF/IRFM, St Paul Lez Durance, France
 
  Between 2013 and 2016, the tokamak Tore Supra, in operation at Cadarache (CEA, France) since 1988, underwent a major upgrade following which it was renamed WEST (Tungsten [W] Environment in Steady-state Tokamak). The synchronization system, however, had not been upgraded since 1999*. At the time, a robust design was achieved based on AMD's TAXI chip**: clock and events are distributed from a central emitter over a star-shaped network of simplex optical links to electronic crates around the tokamak. Unfortunately, spare boards were not produced in sufficient quantities and the TAXI is obsolete. In fact, multigigabit serial communication standards call into question the future availability of any such low-rate SerDes devices. Designing replacement boards provides an opportunity for a new clock and data recovery (CDR) solution and extended functionalities (loss-of-lock detection, latency monitoring). Backward compatibility is a major constraint given the lack of resources for a full upgrade. We will first describe the current state of the timing network of WEST, then the implementation of a custom CDR fully in firmware, using the IOSerDes primitives of Xilinx FPGAs, and will finally provide preliminary results on development boards.
*"Upgrade of the timing system for Tore Supra long pulses", D. Moulin et al. IEEE RealTime Conference 1999
**http://hep.uchicago.edu/~thliu/projects/Pulsar/otherdoc/TAXIchip.pdf
 
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-WEPHA103  
About • paper received ※ 30 September 2019       paper accepted ※ 03 October 2019       issue date ※ 30 August 2020  
 
WESH2003 Toward Continuous Delivery of a Nontrivial Distributed Software System software, controls, operation, monitoring 1511
 
  • S. Wai
    SARAO, Cape Town, South Africa
 
  Funding: SKA South Africa, National Research Foundation of South Africa, Department of Science and Technology
The MeerKAT Control and Monitoring (CAM) solution is a mature software system that has undergone multiple phases of construction and expansion. It is a distributed system with a run-time environment of 15 logical nodes, featuring dozens of interdependent, short-lived processes that interact with a number of long-running services. This presents a challenge for the development team to balance operational goals with continued discovery and development of useful enhancements for its users (astronomers, telescope operators). Continuous Delivery is a set of practices designed to always keep software in a releasable state. It employs the discipline of release engineering to optimise the process of taking changes from source control to production. In this paper, we review the current path to production (build, test and release) of CAM, identify shortcomings and introduce approaches to support further incremental development of the system. By implementing patterns such as deployment pipelines and immutable release candidates, we hope to simplify the release process and demonstrate increased throughput of changes, quality and stability in the future.
 
Slides WESH2003 [2.933 MB]  
Poster WESH2003 [1.448 MB]  
DOI • reference for this paper ※ https://doi.org/10.18429/JACoW-ICALEPCS2019-WESH2003  
About • paper received ※ 30 September 2019       paper accepted ※ 09 October 2019       issue date ※ 30 August 2020  