Big science needs big computing: the WLCG



Accelerating particles around a 27 km ring to close to the speed of light, then colliding them 800 million times a second, is no mean feat. Only one machine in the world is capable of this job – the Large Hadron Collider (LHC), a particle accelerator built under the French/Swiss border. The unique nature of the LHC attracts global interest, and the 8,000 high-energy physicists who work on its experiments come from almost every nation on the planet.

Alongside the accelerator itself there are four large detectors (ALICE, ATLAS, CMS and LHCb) and two smaller ones (LHCf and TOTEM). Combined, they will produce over 15 petabytes of data each year. Big science and even bigger stacks of data mean big computing needs.

The Worldwide LHC Computing Grid (WLCG) was set up as a global collaboration linking grid infrastructures and computer centres around the world to handle the data deluge pouring out of the LHC. Its approach was to build a grid infrastructure that allows the LHC data to be stored, analysed and shared by multiple, geographically distributed research groups. EGI is a major partner in the WLCG, contributing over a billion hours of CPU time to the project in the last 12 months. But the partnership started well before the LHC was switched on in 2008. EGI and its precursor EGEE (the Enabling Grids for E-sciencE projects) have been working closely with the physicists from the very beginning to help integrate grid services with the unique demands of the LHC.
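To put the two headline figures above into everyday terms, the following back-of-envelope sketch converts a yearly data volume into an average sustained data rate, and a yearly CPU-hour total into the number of cores that would need to run non-stop. The function names are illustrative, and the calculation assumes decimal petabytes (1 PB = 10^15 bytes) and a 365-day year, neither of which is stated in the article.

```python
# Back-of-envelope averages for the figures quoted in the text.
# Assumptions (not from the article): 1 PB = 10**15 bytes, 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds
HOURS_PER_YEAR = 365 * 24           # 8,760 hours


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average sustained data rate in MB/s for a given yearly data volume."""
    total_bytes = petabytes_per_year * 10**15
    return total_bytes / SECONDS_PER_YEAR / 10**6


def equivalent_busy_cores(cpu_hours_per_year: float) -> float:
    """Number of cores that would have to run non-stop to deliver the hours."""
    return cpu_hours_per_year / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"15 PB/year is about {average_rate_mb_per_s(15):.0f} MB/s on average")
    print(f"1e9 CPU hours/year is about {equivalent_busy_cores(1e9):.0f} busy cores")
```

Under these assumptions, 15 petabytes a year averages out to roughly 476 MB/s around the clock, and a billion CPU hours in 12 months corresponds to about 114,000 cores kept permanently busy. These are averages only; real data-taking and processing loads are not spread evenly over the year.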


WLCG tools


The LHC community is not a homogeneous group, and it is not working on a single problem. Every experiment running on the LHC uses different detectors and has different requirements. Because their unprecedented need for computing power made the LHC scientists the pioneers of grid computing in Europe, the experiment groups have created many bespoke grid applications from the ground up. Some of these tools are now central to the infrastructure that EGI provides and are available for other scientific communities to deploy for their own use.

The following is a list of tools developed or ported by the WLCG for the European Grid Infrastructure:

AliEn (ALICE Environment) is a complete grid framework, including a front end and data management. The ALICE experiment built AliEn from open source components to provide access to the grid. It provides a command line interface to the grid as well as file and metadata catalogues, a task queue and a package manager. It uses Perl as a scripting language and supports cryptography, SOAP and easy web integration. The project uses object-oriented programming in C++. For ALICE it uses the ROOT framework and Geant4. AliEn has also been used by other projects, including MammoGrid, a grid-enabled European database of mammograms that supports co-working between healthcare professionals across the EU.
DIRAC (Distributed Infrastructure with Remote Agent Control) is a generic solution for workload and data management tasks on the grid, developed by the LHCb experiment. Its LHC-specific plugins are optional and do not need to be installed. It does not have a built-in front end but works with Ganga (see below).
Ganga is an easy-to-use front end for job definition and management, developed jointly by the ATLAS and LHCb experiments. It is implemented in Python and includes built-in support for configuring and running applications on a local batch system or a grid infrastructure. As a front end, it does not itself manage workload or data once a job is on the grid; for that it can use the standard grid Workload Management System, PANDA (see below) or DIRAC (see above). Ganga is compatible with EGI, as well as with the Open Science Grid in the USA and NorduGrid.
Geant4 is a toolkit for simulating the passage of particles through matter. It is used extensively by all the LHC experiments and also has applications in nuclear physics, medical research and space science. Geant4 is compatible with AliEn, Ganga, DIRAC, PANDA and CRAB.
PANDA, developed by ATLAS, is a workload and data management framework that supports pilot jobs, which secure suitable job slots for future analysis. It works with both the EGI and Open Science Grid infrastructures.
ROOT is a framework for managing data on various computing infrastructures, used extensively by the LHC experiments. It is object-oriented and written in C++. It is used by AliEn, DIRAC and PANDA.
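The pilot-job idea mentioned in the PANDA entry can be illustrated with a toy sketch. The article says only that pilot jobs secure suitable job slots for later work; the sketch below assumes the commonly described pattern in which a pilot first occupies a worker slot and only then asks a central queue for a task that fits it. Every name in it (Task, CentralQueue, run_pilot, the site labels and memory figures) is hypothetical and is not part of the PANDA or DIRAC APIs.

```python
# Toy illustration of the pilot-job pattern.
# All names are hypothetical; this is NOT the PANDA or DIRAC API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    min_memory_gb: int  # resource the task needs from a worker slot


@dataclass
class CentralQueue:
    """Central task queue that a pilot contacts once it holds a slot."""
    tasks: List[Task] = field(default_factory=list)

    def add(self, task: Task) -> None:
        self.tasks.append(task)

    def match(self, slot_memory_gb: int) -> Optional[Task]:
        """Remove and return the first queued task that fits the slot."""
        for index, task in enumerate(self.tasks):
            if task.min_memory_gb <= slot_memory_gb:
                return self.tasks.pop(index)
        return None


def run_pilot(site: str, slot_memory_gb: int, queue: CentralQueue) -> str:
    """The pilot already holds a slot; only now is a real task bound to it."""
    task = queue.match(slot_memory_gb)
    if task is None:
        return f"{site}: no suitable task, pilot exits"
    return f"{site}: running {task.name}"


if __name__ == "__main__":
    queue = CentralQueue()
    queue.add(Task("reconstruction", min_memory_gb=8))
    queue.add(Task("analysis", min_memory_gb=2))
    print(run_pilot("site-A", 4, queue))   # small slot: only "analysis" fits
    print(run_pilot("site-B", 16, queue))  # large slot: takes "reconstruction"
    print(run_pilot("site-C", 16, queue))  # queue is empty, pilot exits
```

The point of the pattern in this sketch is that the task is chosen after the slot is known to exist and to be suitable, rather than being sent to a site in advance and left waiting there.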