Keyword: data-analysis
Paper | Title | Other Keywords | Page
TUCPA01 | Data Analysis Support in Karabo at European XFEL | ion, experiment, FEL, controls | 245
 
  • H. Fangohr, M. Beg, V. Bondar, D. Boukhelef, S. Brockhauser, C. Danilevski, W. Ehsan, S.G. Esenov, G. Flucke, G. Giovanetti, D. Goeries, S. Hauf, B.C. Heisen, D.G. Hickin, D. Khakhulin, A. Klimovskaia, M. Kuster, P.M. Lang, L.G. Maia, L. Mekinda, T. Michelat, A. Parenti, G. Previtali, H. Santos, A. Silenzi, J. Sztuk-Dambietz, J. Szuba, M. Teichmann, K. Weger, J. Wiggins, K. Wrona, C. Xu
    XFEL.EU, Schenefeld, Germany
  • S. Aplin, A. Barty, M. Kuhn, V. Mariani
    CFEL, Hamburg, Germany
  • T. Kluyver
    University of Southampton, Southampton, United Kingdom
 
  We describe the data analysis structure that is integrated into the Karabo framework [1] to support scientific experiments and data analysis at European XFEL GmbH. The photon science experiments have a range of data analysis requirements, including online (i.e. near real-time during the actual measurement) and offline data analysis. The Karabo data analysis framework supports the automatic execution of routine analysis tasks, complex experiment protocols in which analysis results are fed back into instrument control, and the integration of external applications. Online data analysis is carried out on distributed hardware and accelerators (such as GPUs) where required to balance the load and achieve near real-time throughput. Analysis routines provided by Karabo are implemented in C++ and Python and make use of established scientific libraries. The XFEL control and analysis software team collaborates with users to integrate experiment-specific analysis codes, protocols and requirements into this framework, and to make it available for the experiments and for subsequent offline data analysis.
[1] B.C. Heisen et al., "Karabo: An Integrated Software Framework Combining Control, Data Management, and Scientific Computing Tasks", in Proc. 14th ICALEPCS, Melbourne, Australia, 2013, paper FRCOAAB02.
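
A purely illustrative sketch of the kind of online-analysis step the abstract describes: frames are pulled from a stream, a figure of merit is computed on a pool of worker processes to balance the load, and the result is published for a control feedback loop. This is plain Python with numpy and concurrent.futures, not the Karabo API; all names (frame_queue, feedback_queue, figure_of_merit) are assumptions made for the example.

from concurrent.futures import ProcessPoolExecutor
from queue import Queue
import numpy as np

def figure_of_merit(frame: np.ndarray) -> float:
    """Placeholder analysis routine: mean intensity of pixels above a threshold."""
    hot = frame > 100
    return float(frame[hot].mean()) if hot.any() else 0.0

def run_online_analysis(frame_queue: Queue, feedback_queue: Queue,
                        workers: int = 4, batch: int = 8) -> None:
    """Consume frames until a None sentinel arrives; publish one result per frame."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        frames = []
        while True:
            frame = frame_queue.get()
            done = frame is None                    # sentinel marks the end of the run
            if not done:
                frames.append(frame)
            if frames and (done or len(frames) == batch):
                # Spread the batch over the worker processes (a GPU offload would sit here).
                for result in pool.map(figure_of_merit, frames):
                    feedback_queue.put(result)      # read by a control loop to steer the instrument
                frames = []
            if done:
                break

In a real deployment the analysis routine, the frame transport and the feedback channel would all be provided by the framework; the sketch only shows the shape of the data flow.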
 
Slides: TUCPA01 [10.507 MB]
DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-TUCPA01
 
THPHA036 | Multi-Criteria Partitioning on Distributed File Systems for Efficient Accelerator Data Analysis and Performance Optimization | ion, operation, distributed, framework | 1436
 
  • S. Boychenko, M.A. Galilée, J.C. Garnier, M. Zerlauth
    CERN, Geneva, Switzerland
  • M. Zenha-Rela
    University of Coimbra, Coimbra, Portugal
 
  Since the introduction of the map-reduce paradigm, relational databases have increasingly been replaced by more efficient and scalable architectures, particularly in environments where a single query processes terabytes or even petabytes of data. The same tendency is observed at CERN, where the archiving systems for operational accelerator data are already working well beyond their initially provisioned capacity. Most modern data analysis frameworks are not optimized for heterogeneous workloads such as those arising in the dynamic environment of one of the world's largest accelerator complexes. This contribution presents Mixed Partitioning Scheme Replication (MPSR), a solution that will outperform conventional distributed processing configurations for almost the entire phase space of data analysis use cases and performance optimization challenges that arise during the commissioning and operational phases of an accelerator. We present the results of a statistical analysis as well as benchmarks of the implemented prototype, which allow us to define the characteristics of the proposed approach and to confirm the expected performance gains.
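
To make the partitioning idea concrete, here is a minimal, hypothetical sketch of a mixed partitioning scheme with replication: each record is stored twice, once partitioned by time window and once by originating device, so that a query filtering on either attribute can be routed to the replica that prunes the most data. The class, attribute and criteria names are illustrative assumptions, not the implementation described in the paper.

from collections import defaultdict

class MixedPartitionStore:
    """Toy in-memory model of keeping two differently partitioned replicas."""

    def __init__(self, window_s: int = 3600):
        self.window_s = window_s
        self.by_time = defaultdict(list)      # replica 1: partitioned by time window
        self.by_device = defaultdict(list)    # replica 2: partitioned by device

    def insert(self, record: dict) -> None:
        # Write the record into both partitioning layouts (the replication part).
        self.by_time[record["timestamp"] // self.window_s].append(record)
        self.by_device[record["device"]].append(record)

    def query_time_range(self, t0: int, t1: int):
        # Route to the time-partitioned replica: only the relevant windows are touched.
        for window in range(t0 // self.window_s, t1 // self.window_s + 1):
            yield from (r for r in self.by_time[window] if t0 <= r["timestamp"] < t1)

    def query_device(self, device: str):
        # Route to the device-partitioned replica: one partition holds all matching records.
        yield from self.by_device[device]

The price of keeping both layouts is roughly doubled storage and write cost, traded for the ability to serve both access patterns from a favourable partitioning.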
Poster: THPHA036 [0.280 MB]
DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-THPHA036
 
THPHA186 | Parallel Execution of Sequential Data Analysis | ion, GUI, GPU, controls | 1877
 
  • J.F.J. Murari, K. Klementiev
    MAX IV Laboratory, Lund University, Lund, Sweden
 
  The Parallel Execution of Sequential Data Analysis (ParSeq) software has been developed to work on large data sets of thousands of spectra with a thousand points each. The main goal of this tool is to perform spectroscopy analysis without delays on the large amounts of data that will be generated at the Balder beamline at MAX IV *. ParSeq was developed using Python and PyQt and can be operated via scripts or a graphical user interface (GUI). The pipeline consists of nodes and transforms. Each node generally has a common group of components: a data manager (which also serves as a legend), a data combiner, a metadata viewer, a transform dialog, a help panel and, as the main element, a plot window (from the silx library **). The transforms connect nodes, applying their respective parameters to the active data. It is also possible to create cross-data linear combinations (e.g. averaging, RMS or PCA) and propagate them downstream. Calculations will be executed in parallel on the GPU. The GUI is flexible and user-friendly, with splitters, dock widgets, colormaps and undo/redo options. These features are missing in other analysis platforms, which justifies the creation of ParSeq.
* K. Klementiev et al., "The BALDER Beamline at the MAX IV Laboratory", Journal of Physics: Conference Series, IOP Publishing, 2016.
** Scientific Library for eXperimentalists - http://www.silx.org/
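
As an illustration of the node/transform pipeline described above (hypothetical names, not ParSeq's actual API): a node can be modelled as a container of spectra and a transform as a parameterised function that connects two nodes and propagates changes downstream.

import numpy as np

class Node:
    """Holds one array per loaded spectrum and knows its downstream transforms."""
    def __init__(self, name: str):
        self.name = name
        self.data = {}           # spectrum name -> ndarray
        self.downstream = []     # transforms that consume this node

class Transform:
    """Connects a source node to a target node and applies its parameters."""
    def __init__(self, source: Node, target: Node, func, **params):
        self.source, self.target, self.func, self.params = source, target, func, params
        source.downstream.append(self)

    def run(self) -> None:
        for key, arr in self.source.data.items():
            self.target.data[key] = self.func(arr, **self.params)
        for transform in self.target.downstream:   # propagate to downstream transforms
            transform.run()

# Example pipeline: raw spectra -> background-subtracted -> normalised
raw, flat, norm = Node("raw"), Node("flat"), Node("normalised")
Transform(raw, flat, lambda y, bg: y - bg, bg=10.0)
Transform(flat, norm, lambda y: y / np.max(np.abs(y)))
raw.data["spectrum_0001"] = np.random.rand(1000) * 100.0
raw.downstream[0].run()   # fills 'flat' and, via propagation, 'normalised'

Changing a transform's parameters and calling run() again recomputes everything downstream of it, which is broadly the behaviour the abstract describes for transforms acting on the active data.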
 
Poster: THPHA186 [0.407 MB]
DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-THPHA186