Neufeld, N.

Paper Title Page
TUP001 Monitoring the LHCb Experiment Computing Infrastructure with NAGIOS 96
 
  • E. Bonaccorsi, N. Neufeld
    CERN, Geneva
 
  LHCb has a large and complex infrastructure consisting of thousands of servers and embedded computers, hundreds of network devices and many shared infrastructure services such as shared storage, login and time services, databases and more. All operationally critical aspects are integrated into the standard Experiment Control System based on PVSS II. This enables non-expert operators to perform first-line reactions. At the lower level, and in particular for monitoring the infrastructure on which the Control System itself depends, a secondary infrastructure based on the industry-standard NAGIOS has been put in place. We present the design and implementation of the fabric management based on NAGIOS. Care has been taken to complement rather than duplicate functionality available in the Experiment Control System.  
TUP002 Software Management in the LHCb Online System 99
 
  • N. Neufeld, E. Bonaccorsi, L. Brarda, J. Closier, G. Moine
    CERN, Geneva
  • H. Degaudenzi
    EPFL, Lausanne
 
  LHCb has a large online IT infrastructure with thousands of servers and embedded systems, network routers and switches, databases and storage appliances. These systems run a large number of different applications on various operating systems. The dominant operating systems are Linux and MS-Windows. This large heterogeneous environment, operated by a small number of administrators, requires that new software and updates can be pushed quickly, reliably and with as much automation as possible. We present the general design of LHCb's software management along with the main tools, LinuxFC / Quattor and Microsoft SMS, describe how they have been adapted and integrated, and discuss experiences and problems.  
WEA001 Bringing the Power of Dynamic Languages to Hardware Control Systems 358
 
  • J. M. Caicedo Carvajal, N. Neufeld, R. Stoica
    CERN, Geneva
 
  Funding: Marie Curie Early Stage Research Training Fellowship of the European Community's Sixth Framework Programme under contract number MEST-CT-2005-020216-ELACCO

Hardware control systems are normally programmed using high-performance languages like C or C++ and, increasingly, Java. All these languages are strongly typed and compiled, which usually brings good performance but at the cost of a longer development and testing cycle and the need for more programming expertise. Dynamic languages, which were long thought to be too slow or not powerful enough for control purposes, are now fast enough for many of these tasks thanks to modern powerful computers and advanced implementation techniques. We present examples from the LHCb Experiment Control System (ECS), which is based on a commercial SCADA system (PVSS II). We have successfully used Python to integrate hardware devices into the ECS. We present the lightweight middleware we have developed for this purpose, including examples for controlling hardware and software devices. We also discuss the development cycle and the tools used, and compare the effort with that of traditional solutions.

 
WED004 Management of the LHCb Online Network Based on SCADA System 621
 
  • G. Liu, N. Neufeld
    CERN, Geneva
 
  LHCb employs two large networks based on Ethernet. One is a data network dedicated to data acquisition; the other is a control network which connects all devices in LHCb. Sophisticated monitoring of both networks at all levels is essential for the successful operation of the experiment. LHCb uses a commercial SCADA system (PVSS II) for its Experiment Control System (ECS). For reasons of consistency and efficiency, the network management system is implemented in the same framework. We show how a large-scale network can be monitored and managed using tools originally made for industrial supervisory control, and discuss several tools developed to facilitate the integration of network management and monitoring into LHCb's control system. In the network management system, the status of the network is monitored at different levels, including the application level, the devices, the ports and the connectivity. Alarms can be issued to inform the experiment operators and online network experts about errors such as dropped packets or broadcast storms. Reports and long-term monitoring are possible using powerful trending tools.  
WED005 Implementing High Availability with COTS Components and Open-source Software 624
 
  • R. Schwemmer, N. Neufeld
    CERN, Geneva
 
  High availability of IT services is essential for the successful operation of large experimental facilities such as the LHC experiments. In the past, high availability was often taken for granted and/or ensured by using very expensive high-end hardware based on proprietary, single-vendor solutions. Today's IT infrastructure in HEP is usually a heterogeneous environment of cheap, off-the-shelf components which usually have no intrinsic failure tolerance and thus cannot be considered reliable at all. Many services, in particular networked services like the Domain Name Service, shared storage and databases, need to run on this unreliable hardware, yet they are indispensable for the operation of today's control systems. We present our approach to this problem, which is based on a combination of open-source tools, such as the Linux High Availability Project, and home-made tools to ensure high availability for the LHCb Experiment Control System, which consists of over 200 servers and several hundred switches and controls thousands of devices, ranging from custom-made devices connected to the LAN to the servers of the event-filter farm.  