Author: Neufeld, N.
Paper Title Page
TUPPC063 Control and Monitoring of the Online Computer Farm for Offline Processing in LHCb 721
 
  • L.G. Cardoso, P. Charpentier, J. Closier, M. Frank, C. Gaspar, B. Jost, G. Liu, N. Neufeld
    CERN, Geneva, Switzerland
  • O. Callot
    LAL, Orsay, France
 
  LHCb, one of the four experiments at the LHC accelerator at CERN, uses approximately 1500 PCs (averaging 12 cores each) for High Level Trigger (HLT) processing during physics data taking. During periods when data acquisition is not required, most of these PCs are idle. In these periods the unused processing capacity can be used to run offline jobs, such as Monte Carlo simulation. The LHCb offline computing environment is based on LHCbDIRAC (Distributed Infrastructure with Remote Agent Control). In LHCbDIRAC, job agents are started on Worker Nodes, pull waiting tasks from the central WMS (Workload Management System) and process them on the available resources. A Control System was developed which is able to launch, control and monitor the job agents for the offline data processing on the HLT Farm. This control system is based on the existing Online System Control infrastructure, the PVSS SCADA and the FSM toolkit. It has been used extensively, launching and monitoring more than 22,000 agents simultaneously, and more than 850,000 jobs have already been processed in the HLT Farm. This paper describes the deployment of the Control System in the LHCb experiment and the experience gained with it.
Poster TUPPC063 [2.430 MB]
 
THCOBA01 Evolution of the Monitoring in the LHCb Online System 1408
 
  • C. Haen, E. Bonaccorsi, N. Neufeld
    CERN, Geneva, Switzerland
 
  The LHCb online system relies on a large and heterogeneous IT infrastructure: it comprises more than 2000 servers and embedded systems and more than 200 network devices. The low-level monitoring of the equipment was originally done with Nagios. In 2011, we replaced the single Nagios instance with a distributed Icinga setup, presented at ICALEPCS 2011. With the benefit of hindsight, this paper presents the improvements we observed as well as the problems encountered. Finally, we describe some of our prospects for the period after the Long Shutdown, namely Shinken and Ganglia.
Slides THCOBA01 [1.426 MB]
 
THCOBA05 Control System Virtualization for the LHCb Online System 1419
 
  • E. Bonaccorsi, L. Granado Cardoso, N. Neufeld
    CERN, Geneva, Switzerland
  • F. Sborzacchi
    INFN/LNF, Frascati (Roma), Italy
 
  Virtualization provides many benefits, such as more efficient resource utilization, lower power consumption, better management through centralized control and higher availability. It can also save time for IT projects by eliminating dedicated hardware procurement and providing standard software configurations. In view of this, virtualization is very attractive for mission-critical projects like the experiment control system (ECS) of the large LHCb experiment at CERN. This paper describes our implementation of the control system infrastructure on general-purpose server hardware based on Linux and the RHEV enterprise clustering platform. The paper describes the methods used, our experiences and the knowledge acquired in evaluating the performance of the setup using test systems, and the constraints and limitations we encountered. We compare these with parameters measured under typical load conditions in a real production system. We also present the specific measures taken to guarantee optimal performance for the SCADA system (WinCC OA), which is the backbone of our control system.
Slides THCOBA05 [1.065 MB]