Atlas Computing
Alessandro De Salvo <[email protected]>
Terzo workshop sul calcolo dell'INFN, 27-5-2004

Outline
• Computing model
• Activities in 2004
• Conclusions

Atlas Data Rates per year
                                        Rate (Hz)   s/year     Events/year   Size (MB)   Total (TB)
  Raw Data                                200       1.00E+07   2.00E+09        1.6         3200
  ESD (Event Summary Data)                200       1.00E+07   2.00E+09        0.5         1000
  General ESD                             180       1.00E+07   1.80E+09        0.5          900
  General AOD (Analysis Object Data)      180       1.00E+07   1.80E+09        0.1          180
  General TAG                             180       1.00E+07   1.80E+09        0.001          2
  Calibration                                                                                40
  MC Raw                                                        1.00E+08        2           200
  ESD Sim                                                       1.00E+08        0.5          50
  AOD Sim                                                       1.00E+08        0.1          10
  TAG Sim                                                       1.00E+08        0.001         0
  Tuple                                                                         0.01
• Nominal year: 10^7 s
• Accelerator efficiency: 50%

Processing times
Reconstruction
• Time/event for reconstruction now: 60 kSI2k sec
• A factor 4 could be recovered: a factor 2 from running only one default algorithm, and a factor 2 from optimization
• Foreseen reference: 15 kSI2k sec/event
Simulation
• Time/event for simulation now: 400 kSI2k sec
• A factor 4 could be recovered: a factor 2 from optimization (work already in progress), and a factor 2 on average from the mixture of different physics processes (and rapidity ranges)
• Foreseen reference: 100 kSI2k sec/event
Number of simulated events needed: 10^8 events/year
• Generated samples are about 3-6 times the size of their streamed AOD samples
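The yearly volumes in the data-rates table above follow directly from rate × live time × event size. The minimal Python cross-check below only restates the slide's numbers; the decimal MB-to-TB conversion is an assumption of the sketch:

```python
# Back-of-envelope check of the yearly data volumes quoted in the table above
# (a sketch; the input numbers are copied from the slide, not derived here).

SECONDS_PER_YEAR = 1.0e7   # nominal data-taking year (already folds in the ~50% accelerator efficiency)

# (rate in Hz, event size in MB) per data type, as on the slide
data_types = {
    "Raw Data":    (200, 1.6),
    "ESD":         (200, 0.5),
    "General ESD": (180, 0.5),
    "General AOD": (180, 0.1),
    "General TAG": (180, 0.001),
}

for name, (rate_hz, size_mb) in data_types.items():
    events = rate_hz * SECONDS_PER_YEAR     # events per year
    total_tb = events * size_mb / 1.0e6     # MB -> TB (decimal)
    print(f"{name:12s}: {events:.1e} events/year, {total_tb:7.1f} TB/year")

# Raw Data    : 2.0e+09 events/year,  3200.0 TB/year   (matches the table)
```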
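Likewise, the per-event processing targets above can be turned into an average real-time CPU power by dividing the yearly CPU-seconds by the nominal live time of 10^7 s. The MSI2k figures printed below are illustrative arithmetic from the slide's numbers, not official ATLAS estimates:

```python
# Illustrative steady-state CPU estimate from the per-event targets above
# (a sketch only; the derived MSI2k figures are arithmetic, not official numbers).

LIVE_SECONDS = 1.0e7          # nominal year

def average_power_ksi2k(events_per_year, ksi2k_sec_per_event):
    """CPU power (kSI2k) needed to keep up with the yearly load in real time."""
    return events_per_year * ksi2k_sec_per_event / LIVE_SECONDS

reco = average_power_ksi2k(2.0e9, 15.0)    # reconstruction target: 15 kSI2k sec/event
sim  = average_power_ksi2k(1.0e8, 100.0)   # simulation target: 100 kSI2k sec/event

print(f"Reconstruction: {reco/1000:.1f} MSI2k, Simulation: {sim/1000:.1f} MSI2k")
# -> Reconstruction: 3.0 MSI2k, Simulation: 1.0 MSI2k (before efficiency factors)
```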
Production/analysis model
Central analysis
• Central production of tuples and TAG collections from the ESD
• Estimated data reduction to 10% of the full AOD
• 0.5 kSI2k per event (estimate), quasi real time: 9 MSI2k
User analysis
• Tuple/stream analysis, new selections
• Each user will perform 1/N of the non-central MC simulation load
• About 720 GB/group/year for the analysis of working-group samples, AOD and private simulations
• Total requirement: 4.7 kSI2k and 1.5/1.5 TB disk/tape
• Assume this is all done at the Tier-2s
DC2 will provide very useful information in this domain.

Computing centers in Atlas
Tiers are defined by capacity and level of service.
Tier-0 (CERN)
• Holds a copy of all raw data on tape
• Copies in real time all raw data to the Tier-1s (the second copy is also useful for later reprocessing)
• Keeps calibration data on disk
• Runs first-pass calibration/alignment and reconstruction
• Distributes the ESDs to the external Tier-1s (1/3 to each one of the 6 Tier-1s)
Tier-1s (at least 6)
• Regional centres
• Keep on disk 1/3 of the ESDs and a full copy of the AODs and TAGs
• Keep on tape 1/6 of the raw data
• Keep on disk 1/3 of the currently simulated ESDs and on tape 1/6 of the previous versions
• Provide facilities for physics-group controlled ESD analysis
• Calibration and/or reprocessing of real data (once per year)
Tier-2s (about 4 per Tier-1)
• Keep on disk a full copy of the TAGs and roughly one full AOD copy per four Tier-2s
• Keep on disk a small selected sample of ESDs
• Provide facilities (CPU and disk space) for user analysis and user simulation (~25 users/Tier-2)
• Run central simulation

Tier-1 Requirements (R. Jones, Atlas Software Workshop, May 2004)
External T1: storage requirement
                          Disk (TB)   Tape (TB)   Fraction
  General ESD (curr.)        429         150        1/3
  General ESD (prev.)        214         150        1/6
  AOD                        257         180        1/1
  TAG                          3           2        1/1
  RAW Data (sample)            6         533        1/6
  RAW Sim                      0.0        33.3      1/6
  ESD Sim (curr.)             23.8         8.3      1/3
  ESD Sim (prev.)             11.9         8.3      1/6
  AOD Sim                     14          10        1/1
  TAG Sim                      0           0        1/1
  User Data (20 groups)      171         120        1/3
  Total                     1130        1195
External T1: CPU requirement
• Processing for physics groups: 1760 kSI2k
• Reconstruction: 588 kSI2k

Tier-2 Requirements (R. Jones, Atlas Software Workshop, May 2004)
External T2: storage requirement
                                   Disk (TB)   Tape (TB)   Fraction
  General ESD (curr.)                 26           0         1/50
  General ESD (prev.)                  0          18         1/50
  AOD                                 64           0         1/4
  TAG                                  3           0         1/1
  ESD Sim (curr.)                      1.4         0         1/50
  ESD Sim (prev.)                      0           1         1/50
  AOD Sim                             14          10         1/1
  User Data (600/6/4 = 25 users)      37          26
  Total                              146          57
External T2: CPU requirement
• Simulation: 21 kSI2k
• Reconstruction: 2 kSI2k
• Users: 176 kSI2k
• Total: 199 kSI2k

Tier 0/1/2 sizes
Efficiencies (LCG numbers; R. Jones, Atlas Software Workshop, May 2004):
• Scheduled CPU activity: 85% efficient
• Chaotic CPU activity: 60% efficient
• Disk usage: 70% efficient
• Tape: assumed 100% efficient
                       CERN T0+T1/2   All T1 (6)   All T2 (24)   Total
  Auto tape (PB)            4.4           7.2          1.4        12.9
  Shelf tape (PB)           3.2           0.0          0.0         3.2
  Disk (PB)                 1.9           6.8          3.5        12.2
  CPU (MSI2k)               4.8          14.2          4.8        23.8
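The disk and tape figures in the Tier-1 and Tier-2 requirement tables above can be reproduced from the yearly sample sizes, the fraction kept at each tier, and the 70% disk efficiency quoted just above (tape taken as 100% efficient). The sketch below is a reconstruction of that bookkeeping: the helper function is mine, and the tape fractions used for the ESD and raw samples (1/6) are inferred so as to reproduce the table rather than stated explicitly on the slide:

```python
# Reproduce a few Tier-1 disk/tape figures from the total yearly volumes,
# the fractions kept at one Tier-1 and the quoted efficiencies.
# A sketch: function and variable names are illustrative, not ATLAS planning tools.

DISK_EFFICIENCY = 0.70   # "Disk usage: 70% efficient"
TAPE_EFFICIENCY = 1.00   # "Tape: assumed 100% efficient"

def tier_storage(total_tb, disk_fraction, tape_fraction):
    """Raw disk/tape capacity (TB) needed to host the given fractions of a sample."""
    disk = total_tb * disk_fraction / DISK_EFFICIENCY
    tape = total_tb * tape_fraction / TAPE_EFFICIENCY
    return disk, tape

# (total TB/year, disk fraction, tape fraction) for a few Tier-1 samples
samples = {
    "General ESD (curr.)": (900.0, 1/3, 1/6),
    "AOD":                 (180.0, 1.0, 1.0),
    "RAW Data (sample)":   (3200.0, 0.0, 1/6),   # only a small disk sample is kept
    "ESD Sim (curr.)":     (50.0,  1/3, 1/6),
}

for name, (total, fd, ft) in samples.items():
    disk, tape = tier_storage(total, fd, ft)
    print(f"{name:20s}: disk {disk:6.1f} TB, tape {tape:6.1f} TB")
# General ESD (curr.) : disk  428.6 TB, tape  150.0 TB   (table: 429 / 150)
# AOD                 : disk  257.1 TB, tape  180.0 TB   (table: 257 / 180)
```

Multiplying the per-Tier-1 totals by the 6 Tier-1s, and the Tier-2 totals by the 24 Tier-2s, reproduces to rounding the aggregated disk, tape and CPU figures in the Tier 0/1/2 sizes table above.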
Atlas Computing System (R. Jones, Atlas Software Workshop, May 2004)
PC (2004) = ~1 kSpecInt2k
The slide shows the multi-tier ATLAS computing system, with the data flows and capacities labelled on the diagram:
• Detector to Event Builder: ~PB/s; Event Builder to Event Filter: 10 GB/s; Event Filter: ~159 kSI2k
• Some data for calibration and monitoring are sent to the institutes; calibrations flow back
• Event Filter to Tier-0: 450 Mb/s; Tier-0 to Tier-1s: ~300 MB/s/T1/experiment, ~9 PB/year/T1
• Tier-0: ~5 MSI2k, no simulation at T0
• Tier-1 regional centres (US, Italian, French, UK (RAL), ...): ~7.7 MSI2k/T1, ~2 PB/year/T1, 622 Mb/s links towards the Tier-2s
• Tier-2 centres (for Italy: LNF, NA, RM1, MI): ~200 kSI2k and ~200 TB/year each, acting as physics data caches, with 100-1000 MB/s towards workstations and desktops
• Each Tier-2 has ~25 physicists working on one or more channels
• Each Tier-2 should have the full AOD, TAG and relevant physics-group summary data
• The Tier-2s do the bulk of the simulation

Atlas computing in 2004
"Collaboration" activities
• Data Challenge 2 (DC2): May-August 2004; a real test of the computing model for the Computing TDR (end 2004); simulation, reconstruction, analysis & calibration
• Combined test-beam activities: combined test-beam operation concurrent with DC2 and using the same tools
"Local" activities
• Single muon simulation (Rome1, Naples)
• Tau studies (Milan)
• Higgs production (LNF)
• Other ad-hoc productions

Goals in 2004
DC2/test-beam
• Computing model studies
• Pile-up digitization in Athena
• Deployment of the complete Event Data Model and the Detector Description
• Simulation of the full Atlas detector and of the 2004 Combined Test Beam
• Test of the calibration and alignment procedures
• Full use of Geant4, POOL and other LCG applications
• Wide use of the GRID middleware and tools
• Large-scale physics analysis
• Run as much as possible of the production on the GRID; test the integration of multiple GRIDs
"Local" activities
• Run local, ad-hoc productions using the LCG tools

DC2 timescale (slide from Gilbert Poulard)
Milestones
• September 03: Release 7
• Mid-November 03: pre-production release
• March 17th 04: Release 8 (production)
• May 17th 04: start of the final validation
• June 23rd 04: reconstruction ready
• July 15th 04: Tier-0 exercise
• August 1st: physics and computing model studies
Put in place, understand & validate
• Geant4; POOL; LCG applications
• Event Data Model
• Digitization; pile-up; byte-stream
• Conversion of DC1 data to POOL; large-scale persistency tests and reconstruction
Testing and validation
• Run test-production; continuous testing of s/w components
• Improvements to the Distribution/Validation Kit
• Event generation and simulation ready
• Intensive test of the "Production System"
• Data preparation and data transfer
Physics and computing model studies
• Analysis (distributed); reprocessing; alignment & calibration

DC2 resources
Process                         Events   Duration   CPU power   Volume   At CERN   Off site
                                         (months)   (kSI2k)     (TB)     (TB)      (TB)
Phase I (May-June-July)
  Simulation                    10^7       2          1000        20        4        16
  RDO                           10^7       2           100        20        4        16
  Pile-up & digitization        10^7       2           100        30       30        24
  Event mixing & byte-stream    10^7       2         (small)      20       20         0
  Total Phase I                 10^7       2          1200        90       58        56
Phase II (> July)
  Reconstruction Tier-0         10^7       0.5         600         5        5        10
  Reconstruction Tier-1s        10^7       2           600         5        0         5
Total                           10^7                              100       63        71

Tiers in DC2
More than 23 countries involved.
Country          "Tier-1"   Sites   Grid        kSI2k (ATLAS DC)
Australia                           NG             12
Austria                             LCG             7
Canada           TRIUMF       7     LCG           331
CERN             CERN         1     LCG           700
China                               LCG            30
Czech Republic                      LCG            25
France           CCIN2P3      1     LCG          ~140
Germany          GridKa       3     LCG            90
Greece                              LCG            10
Israel                              LCG            23
Italy            CNAF         5     LCG           200
Japan            Tokyo        1     LCG           127
Netherlands      NIKHEF       1     LCG            75
NorduGrid                    30     NG            380
Poland                              LCG            80
Russia                              LCG           ~70
Slovakia                            LCG
Slovenia                            NG
Spain            PIC          4     LCG            50
Switzerland                         LCG            18
Taiwan           ASTW         1     LCG            78
UK               RAL          8     LCG         ~1000
US               BNL         28     Grid3/LCG   ~1000

DC2 tools
Installation tools
• Atlas software distribution kit
• Validation suite
Production system
• Atlas production system interfaced to LCG, US Grid, NorduGrid and legacy systems (batch systems)
• Tools: production management, data management, cataloguing, bookkeeping, job submission
GRID distributed analysis
• ARDA domain: test services and implementations

Software installation
• Software installation and configuration via PACMAN
• Relocatable, multi-release distribution
• No root privileges needed to install
• GRID-enabled installation: installation via submission of a job to the destination sites; a site is marked as validated after the installed software has been checked with the validation tools
• Software validation tools, integrated with the GRID installation procedure
• Full use of the Atlas Code Management Tool (CMT)
Kit requirements
• RedHat 7.3
• >= 512 MB of RAM
• About 4 GB of disk space, plus 2 GB during the installation phase, for a full installation of a single release
Kit creation
• Building scripts (Deployment package)
• The kit is built in about 3 hours after the release is built
Distribution format
• Pacman packages (tarballs)
Kit installation
pacman -get http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease
Documentation (building, installing and using)
http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/sit/Distribution
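Since the kit requirements above are explicit (RedHat 7.3, at least 512 MB of RAM, about 4 GB of disk plus 2 GB of scratch space), a Grid installation job can verify them before invoking the documented pacman command. The following is only a minimal sketch under those assumptions; the pre-flight helpers are hypothetical and not part of the ATLAS deployment package:

```python
# Minimal pre-flight check before installing the ATLAS distribution kit with pacman.
# A sketch only: it assumes 'pacman' is in the PATH of the target (Grid worker) node
# and simply restates the requirements listed on the slide.

import shutil
import subprocess

REQUIRED_DISK_GB = 4 + 2          # full release + scratch space used during installation
REQUIRED_RAM_MB = 512
KIT_URL = ("http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/"
           "cache:7.5.0/AtlasRelease")

def enough_disk(path=".", required_gb=REQUIRED_DISK_GB):
    """True if the filesystem holding 'path' has the required free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

def enough_ram(required_mb=REQUIRED_RAM_MB):
    """True if /proc/meminfo reports at least the required memory (Linux only)."""
    with open("/proc/meminfo") as f:
        mem_kb = int(f.readline().split()[1])   # first line: MemTotal
    return mem_kb / 1024 >= required_mb

if enough_disk() and enough_ram():
    # Same command as documented on the slide; no root privileges are needed.
    subprocess.run(["pacman", "-get", KIT_URL], check=True)
else:
    raise SystemExit("Node does not meet the ATLAS kit installation requirements")
```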
Atlas Production System components
Production database
• Oracle based
• Holds the definitions of the job transformations
• Holds the relevant data on the job life cycle
Supervisor (Windmill)
• Consumes jobs from the production database
• Dispatches the work to the executors
• Collects information on the job life cycle
• Interacts with the DMS for data registration and movement among the systems
Executors
• One for each Grid flavour and legacy system: LCG (Lexor), NorduGrid (Dulcinea), US Grid (Capone), LSF
• Communicate with the supervisor
• Execute the jobs on the specific subsystems
• Flavour-neutral job definitions are specialized for the specific needs
• Submit to the GRID/legacy system
• Provide access to GRID-flavour-specific tools
Data Management System (Don Quijote)
• Global cataloguing system
• Allows global data management
• Common interface on top of the system-specific facilities

Atlas Production System architecture
The architecture slide shows how the components fit together:
• Task = [job]*, Dataset = [partition]*: a task corresponds to a dataset, and each job produces one partition of it
• Job descriptions are derived from the task definition (transformation definition + physics signature + release version), with human intervention, and carry the job run information
• The Data Management System provides location hints for tasks (datasets) and for jobs (partitions)
• Several supervisors (Windmill instances) communicate via Jabber, one with each executor: the US Grid executor, the LCG executor, the NG executor and the LSF executor
• Each executor submits to its own back end: Chimera/RB for the US Grid, the Resource Broker for LCG, NorduGrid, or the local batch system
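The supervisor/executor split described above can be summarised as follows: a supervisor consumes flavour-neutral job definitions from the production database and hands each one to the executor for a given Grid flavour, which specialises and submits it. The sketch below only illustrates this pattern; all class and method names are invented here, and the real components (Windmill, Lexor, Dulcinea, Capone and the LSF executor) communicate over Jabber and an Oracle production database rather than through in-process calls:

```python
# Illustrative sketch of the supervisor/executor pattern used by the ATLAS
# production system. Names are invented for illustration only.

from dataclasses import dataclass

@dataclass
class JobDefinition:
    """Flavour-neutral job: a transformation plus its parameters (one partition)."""
    transformation: str
    release: str
    parameters: dict

class Executor:
    """One executor per Grid flavour / legacy system (LCG, NorduGrid, US Grid, LSF)."""
    def __init__(self, flavour):
        self.flavour = flavour

    def submit(self, job: JobDefinition) -> str:
        # Specialise the neutral definition for this flavour and submit it
        # (e.g. build a JDL for the LCG Resource Broker). Here we only pretend.
        return f"{self.flavour}-job-{abs(hash((job.transformation, job.release))) % 10000}"

class Supervisor:
    """Consumes jobs from the production DB and dispatches them to the executors."""
    def __init__(self, production_db, executors):
        self.db = production_db          # list of pending JobDefinitions (stand-in for Oracle)
        self.executors = executors       # flavour -> Executor

    def run_once(self):
        for job in self.db:
            flavour = min(self.executors)            # trivial "scheduling" for the sketch
            handle = self.executors[flavour].submit(job)
            print(f"dispatched {job.transformation} ({job.release}) -> {handle}")

if __name__ == "__main__":
    pending = [JobDefinition("G4 simulation", "Release 8", {"events": 50})]
    sup = Supervisor(pending, {"LCG": Executor("LCG"), "NG": Executor("NG")})
    sup.run_once()
```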
DC2 status
• The first phase of DC2 started on May 3rd
• Test of the production system; start of the event generation/simulation tests
• Full production should start next week
• Full use of the 3 GRIDs and of the legacy systems
• DC2 jobs will be monitored via GridICE and an ad-hoc monitoring system interfaced to the production DB and to the production systems

Atlas Computing & INFN (1)
Coordinators and managers
• D. Barberis (Genova): initially a member of the Computing Steering Group as Inner Detector software coordinator, now ATLAS Computing Coordinator
• G. Cataldi (Lecce): new coordinator of Moore, the OO muon reconstruction program
• S. Falciano (Roma1): responsible for TDAQ/LVL2
• A. Farilla (Roma3): initially responsible for Moore and scientific secretary of SCASI, now Muon Reconstruction Coordinator and software coordinator for the Combined Test Beam
• L. Luminari (Roma1): INFN representative in the ICB and contact person for the activities related to the computing model in Italy
• A. Nisati (Roma1): representative for the LVL1 simulation and Chair of the TDAQ Institute Board
• L. Perini (Milano): chair, ATLAS Grid Co-convener, ATLAS representative in various LCG and EGEE bodies
• G. Polesello (Pavia): ATLAS Physics Coordinator
• A. Rimoldi (Pavia): ATLAS Simulation Coordinator and member of the Software Project Management Board
• V. Vercesi (Pavia): PESA Coordinator and member of the Computing Management Board

Atlas Computing & INFN (2)
Atlas INFN sites LCG-compliant for DC2
Tier-1
• CNAF (G. Negri)
Tier-2
• Frascati (M. Ferrer)
• Milan (L. Perini, D. Rebatto, S. Resconi, L. Vaccarossa)
• Naples (G. Carlino, A. Doria, L. Merola)
• Rome1 (A. De Salvo, A. Di Mattia, L. Luminari)
Activities
• Development of the LCG interface to the Atlas Production Tool (F. Conventi, A. De Salvo, A. Doria, D. Rebatto, G. Negri, L. Vaccarossa)
• Participation in DC2 using the GRID middleware (May - July 2004)
• Local productions with GRID tools
• Atlas VO management (A. De Salvo)
• Atlas code distribution (A. De Salvo): the PACMAN-based Atlas code distribution model is fully deployed; the current installation system and procedure make it easy for the Atlas software to coexist with other experiments' environments
• Atlas distribution kit validation (A. De Salvo)
• Transformations for DC2 (A. De Salvo)

Conclusions
• The first real test of the Atlas computing model is starting
• DC2 tests started at the beginning of May; "real" production starts in June
• It will provide important information for the Computing TDR
• Very intensive use of the GRIDs
• The Atlas Production System is interfaced to LCG, NG and the US Grid (Grid3)
• Global data management system
• We are getting closer to the real experiment computing model