ICALEPCS2017 - List of Keywords (distributed)

Paper	Title	Other Keywords	Page
TUPHA011	A New Distributed Control System for the Consolidation of the CERN Tertiary Infrastructures	ion, controls, interface, monitoring	390
	L. Scibile, C. Martel, P. Villeton Pachot (CERN, Geneva, Switzerland)
	The operation of the CERN tertiary infrastructures is carried out via a series of control systems distributed over the CERN sites. The scope comprises: 260 buildings, 2 large heating plants with a 27 km heating network and 200 radiator circuits, 500 air handling units, 52 chillers, 300 split systems, 3000 electric boards and 100k light points. As part of the infrastructure consolidation, CERN is migrating and extending the old control systems, which date back to the 1970s, 1980s and 1990s, to a new simplified, yet innovative, distributed control system aimed at minimizing the programming and implementation effort, standardizing equipment and methods, and reducing lifecycle costs. This new methodology allows for rapid development and simplified integration of the newly controlled infrastructure processes. It is based on open-standard PLC technology that interfaces easily with a large range of proprietary systems. Local and remote operation and monitoring are carried out seamlessly with Web HMIs that can be accessed via PC, touchpads or mobile devices. This paper reports on the progress and future challenges of this new control system.
	Poster TUPHA011 [1.662 MB]
	DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-TUPHA011

TUPHA013	Accelerator Fault Tracking at CERN	ion, operation, controls, target	397
	C. Roderick, L. Burdzanowski, D. Martin Anido, S. Pade, P. Wilk (CERN, Geneva, Switzerland)
	CERN's Accelerator Fault Tracking (AFT) system aims to facilitate answering questions like: "Why are we not doing Physics when we should be?" and "What can we do to increase machine availability?" People have tracked faults for many years, using numerous, diverse, distributed and unrelated systems. As a result, and despite a lot of effort, it has been difficult to get a clear and consistent overview of what is going on, where the problems are, how long they last, and what their impact is. This is particularly true for the LHC, where faults may induce long recovery times after being fixed. The AFT project was launched in February 2014 as a collaboration between the Controls and Operations groups with stakeholders from the LHC Availability Working Group (AWG). The AFT system has been used successfully in operation for the LHC since 2015, attracting a lot of attention and generating a growing user community. In 2017 the scope was extended to cover the entire Injector Complex. This paper will describe the AFT system and the way it is used in terms of architecture, features, user communities, workflows and added value for the organisation.
	Poster TUPHA013 [3.835 MB]
	DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-TUPHA013

THPHA036	Multi-Criteria Partitioning on Distributed File Systems for Efficient Accelerator Data Analysis and Performance Optimization	ion, operation, data-analysis, framework	1436
	S. Boychenko, M.A. Galilée, J.C. Garnier, M. Zerlauth (CERN, Geneva, Switzerland); M. Zenha-Rela (University of Coimbra, Coimbra, Portugal)
	Since the introduction of the map-reduce paradigm, relational databases are being increasingly replaced by more efficient and scalable architectures, in particular in environments where a query will process TBytes or even PBytes of data in a single execution. The same tendency is observed at CERN, where data archiving systems for operational accelerator data are already working well beyond their initially provisioned capacity. Most modern data analysis frameworks are not optimized for heterogeneous workloads such as those that arise in the dynamic environment of one of the world's largest accelerator complexes. This contribution presents Mixed Partitioning Scheme Replication (MPSR) as a solution that will outperform conventional distributed processing environment configurations for almost the entire phase-space of data analysis use cases and performance optimization challenges as they arise during the commissioning and operational phases of an accelerator. We will present results of a statistical analysis as well as the benchmarking of the implemented prototype, which allow the characteristics of the proposed approach to be defined and the expected performance gains to be confirmed.
	Poster THPHA036 [0.280 MB]
	DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-THPHA036

THPHA043	Lightflow - a Lightweight, Distributed Workflow System	ion, synchrotron, EPICS, experiment	1457
	A. Moll, R. Clarken, P. Martin, S.T. Mudie (SLSA-ANSTO, Clayton, Australia)
	The Australian Synchrotron, located in Clayton, Melbourne, is one of Australia's most important pieces of research infrastructure. After more than 10 years of operation, the beamlines at the Australian Synchrotron are well established and the demand for automation of research tasks is growing. Such tasks routinely involve the reduction of TB-scale data, online (realtime) analysis of the recorded data to guide experiments, and fully automated data management workflows. In order to meet these demands, a generic, distributed workflow system was developed. It is based on well-established Python libraries and tools. The individual tasks of a workflow are arranged in a directed acyclic graph, and one or more directed acyclic graphs form a workflow. Workers consume the tasks, allowing the processing of a workflow to scale horizontally. Data can flow between tasks, and a variety of specialised tasks is available. Lightflow has been released as open source on the Australian Synchrotron GitHub page.
	Poster THPHA043 [0.582 MB]
	DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-THPHA043
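	The THPHA043 abstract describes tasks arranged in a directed acyclic graph, with data flowing between them. The following is a minimal sketch of that general idea only, not Lightflow's API: it assumes Python 3.9+ and uses the standard-library graphlib module, runs in a single process with no workers or queue, and the names run_workflow, load, reduce_total and report are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task once, only after all of its predecessors have run.

    tasks:        dict mapping task name -> callable taking the shared data dict
    dependencies: dict mapping task name -> set of predecessor task names
    Returns the execution order and the shared data dict.
    """
    data = {}   # shared store handed from task to task (illustrative only)
    order = []
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    while sorter.is_active():
        # get_ready() yields every task whose predecessors are all done;
        # a distributed system could hand these to separate workers.
        for name in sorter.get_ready():
            tasks[name](data)
            order.append(name)
            sorter.done(name)
    return order, data


# Hypothetical three-step workflow: load -> reduce -> report
def load(data):
    data["raw"] = [1, 2, 3, 4]


def reduce_total(data):
    data["total"] = sum(data["raw"])


def report(data):
    data["report"] = f"total={data['total']}"


tasks = {"load": load, "reduce": reduce_total, "report": report}
dependencies = {"load": set(), "reduce": {"load"}, "report": {"reduce"}}
order, data = run_workflow(tasks, dependencies)
```

	In this sketch the tasks that get_ready() returns together have no dependencies on one another, which is the property that would let a pool of workers process them in parallel.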

THPHA045	Packaging and High Availability for Distributed Control Systems	ion, software, controls, framework	1465
	M.A. Araya, L. Pizarro, H.H. von Brand (UTFSM, Valparaíso, Chile)
	Funding: Centro Científico Tecnológico de Valparaíso (CONICYT FB-0821) and Advanced Center for Electrical and Electronic Engineering (CONICYT FB-0008). The ALMA Common Software (ACS) is a distributed framework used for control of astronomical observatories, which is built and deployed using roughly the same tools available at its design stage. Due to shallow and rigid dependency management, the strong modularity principle of the framework cannot be exploited for packaging, installation and deployment. Moreover, life-cycle control of its components does not comply with standardized system-based mechanisms. These problems are shared by other instrument-based distributed systems. Because of these problems, the new high-availability requirements of modern projects, such as the Cherenkov Telescope Array, tend to be implemented as new software features rather than with off-the-shelf, well-tested platform-based technologies. We present a general solution for high availability strongly based on system services and proper packaging. We use RPM Packaging, oVirt and Docker as the infrastructure managers, Pacemaker as the software resource orchestrator, and Systemd for life-cycle process control. A prototype for ACS was developed to handle its services and containers.
	DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-THPHA045

THPHA109	Improving the Safety and Protective Automatic Actions of the CMS Electromagnetic Calorimeter Detector Control System	ion, controls, detector, software	1639
	R.J. Jiménez Estupinan, D.R.S. Di Calafiori, G. Dissertori, L. Djambazov, W. Lustermann, S. Zelepoukine (ETH, Zurich, Switzerland); P. Adzic, P. Cirkovic, D. Jovanovic, P. Milenovic (University of Belgrade, Belgrade, Serbia); S. Zelepoukine (UW-Madison/PD, Madison, Wisconsin, USA)
	The CMS ECAL Detector Control System (DCS) features several monitoring mechanisms able to react and perform automatic actions based on pre-defined action matrices. The DCS is capable of early detection of anomalies inside the ECAL and in its off-detector support systems, triggering automatic actions to mitigate the impact of these events and prevent them from escalating to the safety system. The treatment of such events by the DCS allows for a faster recovery process, a better understanding of how issues develop, and, in most cases, actions with higher granularity than those of the safety system. This paper presents the details of the DCS automatic action mechanisms, as well as their evolution based on several years of CMS ECAL operations.
	Poster THPHA109 [1.333 MB]
	DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-THPHA109
