Atlas Computing
Alessandro De Salvo <[email protected]>
Third INFN Computing Workshop, May 2004
Outline
• Computing model
• Activities in 2004
• Conclusions
Atlas Data Rates per year

                                     Rate (Hz)   sec/year   Events/year   Size (MB)   Total (TB)
Raw Data                               200       1.0E+07     2.0E+09        1.6         3200
ESD (Event Summary Data)               200       1.0E+07     2.0E+09        0.5         1000
General ESD                            180       1.0E+07     1.8E+09        0.5          900
General AOD (Analysis Object Data)     180       1.0E+07     1.8E+09        0.1          180
General TAG                            180       1.0E+07     1.8E+09        0.001          2
Calibration                                                                               40
MC Raw                                                       1.0E+08        2            200
ESD Sim                                                      1.0E+08        0.5           50
AOD Sim                                                      1.0E+08        0.1           10
TAG Sim                                                      1.0E+08        0.001          0
Tuple                                                                       0.01

Nominal year: 10^7 s
Accelerator efficiency: 50%
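The volumes in the table follow from rate × live time × event size. A minimal sketch (my own illustration, not an ATLAS tool) reproducing a few of the numbers:

```python
# Minimal sketch: reproduce the yearly data volumes in the table above
# from trigger rate, live time and event size.

NOMINAL_YEAR_S = 1.0e7  # nominal year of data taking (~50% accelerator efficiency)

def yearly_volume_tb(rate_hz: float, size_mb: float, seconds: float = NOMINAL_YEAR_S) -> float:
    """Total volume in TB for one data type over a nominal year."""
    events = rate_hz * seconds
    return events * size_mb / 1.0e6  # MB -> TB

if __name__ == "__main__":
    for name, rate, size in [("Raw Data", 200, 1.6),
                             ("ESD", 200, 0.5),
                             ("General AOD", 180, 0.1)]:
        print(f"{name:12s}: {yearly_volume_tb(rate, size):7.0f} TB/year")
    # Raw Data: 3200 TB/year, ESD: 1000 TB/year, General AOD: 180 TB/year
```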
Processing times

• Reconstruction
  - Time/event for reconstruction now: 60 kSI2k·sec
  - We could recover a factor 4:
    - factor 2 from running only one default algorithm
    - factor 2 from optimization
  - Foreseen reference: 15 kSI2k·sec/event
• Simulation
  - Time/event for simulation now: 400 kSI2k·sec
  - We could recover a factor 4:
    - factor 2 from optimization (work already in progress)
    - factor 2 on average from the mixture of different physics processes (and rapidity ranges)
  - Foreseen reference: 100 kSI2k·sec/event
• Number of simulated events needed: 10^8 events/year
  - Generated samples are about 3-6 times the size of their streamed AOD samples
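To see what these per-event times imply for steady-state CPU power, here is a hedged back-of-envelope sketch (my own arithmetic, using the reference figures above and the nominal 10^7 s year, before any efficiency factors or reprocessing passes):

```python
# Back-of-envelope sketch: translate per-event processing times into the
# continuous CPU power needed to keep up over a nominal year.
# Formula: (events/year * kSI2k.sec/event) / (seconds/year).

NOMINAL_YEAR_S = 1.0e7

def required_msi2k(events_per_year: float, ksi2k_sec_per_event: float) -> float:
    """Continuous CPU power (MSI2k) needed to process all events within a year."""
    total_ksi2k_sec = events_per_year * ksi2k_sec_per_event
    return total_ksi2k_sec / NOMINAL_YEAR_S / 1.0e3  # kSI2k -> MSI2k

print(required_msi2k(2.0e9, 15))   # reconstruction at 15 kSI2k.sec/event -> ~3 MSI2k
print(required_msi2k(1.0e8, 100))  # simulation at 100 kSI2k.sec/event    -> ~1 MSI2k
```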
Production/analysis model

• Central analysis
  - Central production of tuples and TAG collections from ESD
    - 0.5 kSI2k per event (estimate), quasi real time -> 9 MSI2k
  - Estimated data reduction to 10% of full AOD
    - About 720 GB/group/annum
• User analysis
  - Tuples/streams analysis
  - New selections
    - Analysis of WG samples and AOD
    - Private simulations
  - Each user will perform 1/N of the MC non-central simulation load
  - Total requirement: 4.7 kSI2k and 1.5/1.5 TB disk/tape
  - Assume this is all done on T2s
• DC2 will provide very useful information in this domain
Computing centers in Atlas

• Tiers defined by capacity and level of service
• Tier-0 (CERN)
  - Hold a copy of all raw data on tape
  - Copy in real time all raw data to Tier-1's (second copy useful also for later reprocessing)
  - Keep calibration data on disk
  - Run first-pass calibration/alignment and reconstruction
  - Distribute ESD's to external Tier-1's (1/3 to each one of 6 Tier-1's)
• Tier-1's (at least 6): regional centers
  - Keep on disk 1/3 of the ESD's and full AOD's and TAG's
  - Keep on tape 1/6 of Raw Data
  - Keep on disk 1/3 of currently simulated ESD's and on tape 1/6 of previous versions
  - Provide facilities for physics-group controlled ESD analysis
  - Calibration and/or reprocessing of real data (once per year)
• Tier-2's (about 4 per Tier-1)
  - Keep on disk a full copy of TAG and roughly one full AOD copy per four T2s
  - Keep on disk a small selected sample of ESD's
  - Provide facilities (CPU and disk space) for user analysis and user simulation (~25 users/Tier-2)
  - Run central simulation
Tier-1 Requirements
(R. Jones – Atlas Software Workshop, May 2004)

External T1: Storage requirement

                        Fraction   Disk (TB)   Tape (TB)
General ESD (curr.)       1/3        429         150
General ESD (prev.)       1/6        214         150
AOD                       1/1        257         180
TAG                       1/1          3           2
RAW Data (sample)         1/6          6         533
RAW Sim                   1/6          0.0        33.3
ESD Sim (curr.)           1/3         23.8         8.3
ESD Sim (prev.)           1/6         11.9         8.3
AOD Sim                   1/1         14          10
TAG Sim                   1/1          0           0
User Data (20 groups)     1/3        171         120
Total                               1130        1195

CPU: Processing for Physics Groups 1760 kSI2k; Reconstruction 588 kSI2k
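The disk column appears consistent with fraction kept × yearly volume from the data-rates table, divided by the 70% disk efficiency quoted on the "Tier 0/1/2 sizes" slide. A minimal sketch of that reading (my interpretation, not an official formula):

```python
# Hedged sketch: Tier-1 disk numbers seem to follow
#   fraction_kept * yearly_volume / disk_efficiency
# with the 70% disk efficiency quoted later in this talk.

DISK_EFFICIENCY = 0.70

def t1_disk_tb(yearly_volume_tb: float, fraction: float) -> float:
    return fraction * yearly_volume_tb / DISK_EFFICIENCY

print(round(t1_disk_tb(900, 1/3)))    # General ESD (curr.) -> ~429 TB
print(round(t1_disk_tb(180, 1.0)))    # AOD                 -> ~257 TB
print(round(t1_disk_tb(50, 1/3), 1))  # ESD Sim (curr.)     -> ~23.8 TB
```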
Tier-2 Requirements
(R. Jones – Atlas Software Workshop, May 2004)

External T2: Storage requirement

                          Fraction   Disk (TB)   Tape (TB)
General ESD (curr.)         1/50        26          0
General ESD (prev.)         1/50         0         18
AOD                         1/4         64          0
TAG                         1/1          3          0
ESD Sim (curr.)             1/50         1.4        0
ESD Sim (prev.)             1/50         0          1
AOD Sim                     1/1         14         10
User Data (600/6/4=25)                  37         26
Total                                  146         57

CPU: Simulation 21 kSI2k; Reconstruction 2 kSI2k; Users 176 kSI2k; Total: 199 kSI2k
Tier 0/1/2 sizes

• Efficiencies (LCG numbers, Atlas sw workshop May 2004 – R. Jones)
  - Scheduled CPU activity: 85% efficient
  - Chaotic CPU activity: 60% efficient
  - Disk usage: 70% efficient
  - Tape assumed 100% efficient

                  CERN T0+T1/2   All T1 (6)   All T2 (24)   Total
Auto tape (PB)        4.4           7.2          1.4         12.9
Shelf tape (PB)       3.2           0.0          0.0          3.2
Disk (PB)             1.9           6.8          3.5         12.2
CPU (MSI2k)           4.8          14.2          4.8         23.8
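A small sketch of how these efficiency factors turn required capacity into installed capacity (my reading of the LCG convention, not an ATLAS calculation; the example numbers are illustrative):

```python
# Hedged sketch: efficiency factors inflate raw requirements into the
# capacity that actually has to be provisioned.

EFFICIENCY = {
    "scheduled_cpu": 0.85,  # organized production
    "chaotic_cpu": 0.60,    # user analysis
    "disk": 0.70,
    "tape": 1.00,
}

def installed(required: float, resource: str) -> float:
    """Capacity to provision so that `required` is actually usable."""
    return required / EFFICIENCY[resource]

# e.g. 3 MSI2k of scheduled reconstruction and 1 MSI2k of chaotic analysis:
print(installed(3.0, "scheduled_cpu"))  # ~3.5 MSI2k to install
print(installed(1.0, "chaotic_cpu"))    # ~1.7 MSI2k to install
```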
Atlas Computing System
(R. Jones – Atlas Software Workshop, May 2004)

[Diagram: the tiered Atlas computing system; PC (2004) = ~1 kSpecInt2k]

• Detector -> Event Builder at ~PB/s; Event Builder -> Event Filter (~159 kSI2k) at 10 GB/s
• Event Filter -> Tier 0 at 450 Mb/s; some data for calibration and monitoring goes to the institutes, and calibrations flow back
• Tier 0 (~5 MSI2k, no simulation) -> Tier 1's at ~300 MB/s/T1 per experiment, ~9 PB/year/T1
• Tier 1 regional centres (US, Italian, French, UK (RAL), ...): ~7.7 MSI2k and ~2 PB/year per T1, linked to the Tier 2's at 622 Mb/s
• Tier 2 centres (~200 kSI2k and ~200 TB/year each; in Italy LNF, NA, RM1, MI) feed physics data caches, workstations and desktops at 100-1000 MB/s
• Each Tier 2 has ~25 physicists working on one or more channels, should have the full AOD, TAG & relevant Physics Group summary data, and does the bulk of simulation
Atlas computing in 2004

• "Collaboration" activities
  - Data Challenge 2
    - May-August 2004
    - Real test of the computing model for the Computing TDR (end 2004)
    - Simulation, reconstruction, analysis & calibration
  - Combined test-beam activities
    - Combined test-beam operation concurrent with DC2 and using the same tools
• "Local" activities
  - Single muon simulation (Rome1, Naples)
  - Tau studies (Milan)
  - Higgs production (LNF)
  - Other ad-hoc productions
Goals in 2004

• DC2/test-beam
  - Computing model studies
  - Pile-up digitization in Athena
  - Deployment of the complete Event Data Model and the Detector Description
  - Simulation of full Atlas and the 2004 Combined Test-beam
  - Test of the calibration and alignment procedures
  - Full use of Geant4, POOL and other LCG applications
  - Wide use of the GRID middleware and tools
  - Large scale physics analysis
  - Run as much of the production as possible on the GRID
    - Test the integration of multiple GRIDs
• "Local" activities
  - Run local, ad-hoc productions using the LCG tools
DC2 timescale
(Slide from Gilbert Poulard)

• September 03: Release 7
  - Put in place, understand & validate:
    - Geant4; POOL; LCG applications
    - Event Data Model
• Mid-November 03: pre-production release
  - Digitization; pile-up; byte-stream
  - Conversion of DC1 data to POOL; large scale persistency tests and reconstruction
  - Testing and validation
• March 17th 04: Release 8 (production)
  - Testing and validation
  - Run test-production
  - Continuous testing of s/w components
  - Improvements on Distribution/Validation Kit
• May 17th 04: start final validation
  - Event generation ready
  - Simulation ready
  - Intensive test of the "Production System"
  - Data preparation
  - Data transfer
• June 23rd 04: reconstruction ready
• July 15th 04: Tier 0 exercise
• August 1st: physics and computing model studies
  - Analysis (distributed)
  - Reprocessing
  - Alignment & calibration
DC2 resources

Process                        No. of   Time       CPU power   Volume of   At CERN   Off site
                               events   (months)   (kSI2k)     data (TB)   (TB)      (TB)
Phase I (May-June-July)
  Simulation                   10^7     2          1000        20          4         16
  RDO                          10^7     2           100        20          4         16
  Pile-up / Digitization       10^7     2           100        30          30        24
  Event mixing & byte-stream   10^7     2          (small)     20          20        0
  Total Phase I                10^7     2          1200        90          58        56
Phase II (>July)
  Reconstruction Tier-0        10^7     0.5         600        5           5         10
  Reconstruction Tier-1        10^7     2           600        5           0         5
Total                          10^7                            100         63        71
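As a rough cross-check (my own arithmetic, not from the slide), the Phase I simulation CPU figure is of the order implied by the current simulation cost quoted earlier (~400 kSI2k·sec/event) spread over two months:

```python
# Rough cross-check: CPU power needed to simulate 10^7 events in ~2 months
# at the current cost of ~400 kSI2k.sec per event, before scheduling
# inefficiencies. Illustrative only.

events = 1.0e7
ksi2k_sec_per_event = 400          # "time/event for simulation now" (Processing times slide)
wall_seconds = 2 * 30 * 24 * 3600  # ~2 months

print(events * ksi2k_sec_per_event / wall_seconds)
# ~770 kSI2k: same order of magnitude as the ~1000 kSI2k in the table
```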
Tiers in DC2

More than 23 countries involved

Country           "Tier-1"   Sites   Grid        kSI2k (ATLAS DC)
Australia                            NG             12
Austria                              LCG             7
Canada            TRIUMF       7     LCG           331
CERN              CERN         1     LCG           700
China                                LCG            30
Czech Republic                       LCG            25
France            CCIN2P3      1     LCG          ~140
Germany           GridKa       3     LCG            90
Greece                               LCG            10
Israel                               LCG            23
Italy             CNAF         5     LCG           200
Japan             Tokyo        1     LCG           127
Netherlands       NIKHEF       1     LCG            75
NorduGrid                     30     NG            380
Poland                               LCG            80
Russia                               LCG           ~70
Slovakia                             LCG
Slovenia                             NG
Spain             PIC          4     LCG            50
Switzerland                          LCG            18
Taiwan            ASTW         1     LCG            78
UK                RAL          8     LCG         ~1000
US                BNL         28     Grid3/LCG   ~1000
DC2 tools

• Installation tools
  - Atlas software distribution kit
  - Validation suite
• Production system
  - Atlas production system interfaced to LCG, US-Grid, NorduGrid and legacy systems (batch systems)
  - Tools
    - Production management
    - Data management
    - Cataloguing
    - Bookkeeping
    - Job submission
• GRID distributed analysis
  - ARDA domain: test services and implementations
Software installation

• Software installation and configuration via PACMAN
  - Relocatable, multi-release distribution
  - No root privileges needed to install
  - GRID-enabled installation
  - Full use of the Atlas Code Management Tool (CMT)
• Kit requirements
  - RedHat 7.3
  - >= 512 MB of RAM
  - Approx 4 GB of disk space + 2 GB in the installation phase for a full installation of a single release
• Kit creation
  - Building scripts (Deployment package)
  - Built in about 3 hours, after the release is built
• Distribution format
  - Pacman packages (tarballs)
• Kit installation
  - pacman -get http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease
  - Grid installation via submission of a job to the destination sites
  - Software validation tools, integrated with the GRID installation procedure
  - A site is marked as validated after the installed software is checked with the validation tools
• Documentation (building, installing and using)
  - http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/sit/Distribution
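For illustration, a hedged sketch of how the PACMAN-based installation and validation could be wrapped into a single script to be shipped as a grid-job payload. The cache URL and release number come from the slide; the install path and the `KitValidation` entry point are hypothetical placeholders, not the official procedure:

```python
# Illustrative sketch only (not an official ATLAS tool): wrap the PACMAN-based
# kit installation shown above so it can be run as a grid job payload.
import os
import subprocess

CACHE = "http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease"

def install_release(target_dir: str) -> None:
    """Install the ATLAS distribution kit into target_dir (no root needed)."""
    os.makedirs(target_dir, exist_ok=True)
    # pacman installs relative to the current working directory
    subprocess.run(["pacman", "-get", CACHE], cwd=target_dir, check=True)

def validate(target_dir: str) -> bool:
    """Run the kit validation suite; a site is marked 'validated' only on success.
    'KitValidation' is a placeholder name for the validation entry point."""
    result = subprocess.run(["./KitValidation"], cwd=target_dir)
    return result.returncode == 0

if __name__ == "__main__":
    install_release("/opt/atlas/7.5.0")   # hypothetical install path
    print("validated:", validate("/opt/atlas/7.5.0"))
```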
Atlas Production System components

• Production database
  - Oracle based
  - Holds the definitions of the job transformations
  - Holds the data on the jobs' life cycle
• Supervisor (Windmill)
  - Consumes jobs from the production database
  - Dispatches the work to the executors
  - Collects info on the job life-cycle
  - Interacts with the DMS for data registration and movements among the systems
• Executor
  - One for each grid flavour and legacy system
    - LCG (Lexor)
    - NorduGrid (Dulcinea)
    - US Grid (Capone)
    - LSF
  - Communicates with the supervisor
  - Executes the jobs on the specific subsystems
    - Flavour-neutral job definitions are specialized for the specific needs
    - Submits to the GRID/legacy system
    - Provides access to GRID-flavour specific tools
• Data Management System (Don Quijote)
  - Global cataloguing system
  - Allows global data management
  - Common interface on top of the system-specific facilities
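A conceptual sketch of the supervisor/executor division of work described above. This is not the real Windmill/Lexor/Dulcinea/Capone code; class and method names are invented purely to illustrate the pattern:

```python
# Conceptual sketch of the supervisor/executor pattern (names are invented).

class ProductionDB:
    """Stands in for the Oracle production database."""
    def __init__(self, jobs):
        self.pending = list(jobs)       # flavour-neutral job definitions
        self.status = {}

    def next_job(self):
        return self.pending.pop(0) if self.pending else None

    def record(self, job_id, state):
        self.status[job_id] = state     # job life-cycle info

class Executor:
    """One per grid flavour / legacy system (e.g. LCG, NorduGrid, US Grid, LSF)."""
    def __init__(self, flavour):
        self.flavour = flavour

    def submit(self, job):
        # specialize the flavour-neutral definition and hand it to the grid
        print(f"[{self.flavour}] submitting job {job['id']}: {job['transformation']}")
        return "done"                   # pretend the job succeeded

class Supervisor:
    """Consumes jobs from the DB and dispatches them to the executors."""
    def __init__(self, db, executors):
        self.db, self.executors = db, executors

    def run(self):
        i = 0
        while (job := self.db.next_job()) is not None:
            executor = self.executors[i % len(self.executors)]
            self.db.record(job["id"], executor.submit(job))
            i += 1

db = ProductionDB([{"id": n, "transformation": "G4 simulation"} for n in range(4)])
Supervisor(db, [Executor("LCG"), Executor("NorduGrid"), Executor("US Grid")]).run()
```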
Atlas Production System architecture

[Diagram: production system architecture; Task = [job]*, Dataset = [partition]*]

• Job description: a Task (Dataset) carries a transformation definition plus a physics signature and a location hint; a Job (Partition) carries the partition transformation info (transformation definition, signature, release version) and its own location hint; human intervention and run info enter at this level
• Four supervisors, each talking to its executor via Jabber: US Grid executor (through Chimera and a resource broker to the US Grid), LCG executor (through a resource broker to LCG), NG executor (to NorduGrid), LSF executor (to the local batch system)
• The Data Management System is shared by all supervisors
DC2 status

• DC2 first phase started May 3rd
  - Test the production system
  - Start the event generation/simulation tests
• Full production should start next week
  - Full use of the 3 GRIDs and legacy systems
• DC2 jobs will be monitored via GridICE and an ad-hoc monitoring system, interfaced to the production DB and the production systems
Atlas Computing & INFN (1)

• Coordinators & managers
  - D. Barberis
    - Genoa; initially member of the Computing Steering Group as Inner Detector software coordinator, now ATLAS Computing Coordinator
  - G. Cataldi
    - Lecce; new coordinator of the OO muon reconstruction program, Moore
  - S. Falciano
    - Roma1; responsible for TDAQ/LVL2
  - A. Farilla
    - Roma3; initially responsible for Moore and scientific secretary of SCASI, now Muon Reconstruction Coordinator and software coordinator for the Combined Test Beam
  - L. Luminari
    - Roma1; INFN representative in the ICB and contact person for activities related to the computing model in Italy
  - A. Nisati
    - Roma1; representing LVL1 simulation and Chair of the TDAQ Institute Board
  - L. Perini
    - Milan; chair, ATLAS Grid Co-convener, ATLAS representative in various LCG and EGEE bodies
  - G. Polesello
    - Pavia; ATLAS Physics Coordinator
  - A. Rimoldi
    - Pavia; ATLAS Simulation Coordinator and member of the Software Project Management Board
  - V. Vercesi
    - Pavia; PESA Coordinator and member of the Computing Management Board
Atlas Computing & INFN (2)

• Atlas INFN sites LCG-compliant for DC2
  - Tier-1
    - CNAF (G. Negri)
  - Tier-2
    - Frascati (M. Ferrer)
    - Milan (L. Perini, D. Rebatto, S. Resconi, L. Vaccarossa)
    - Naples (G. Carlino, A. Doria, L. Merola)
    - Rome1 (A. De Salvo, A. Di Mattia, L. Luminari)
• Activities
  - Development of the LCG interface to the Atlas Production Tool
    - F. Conventi, A. De Salvo, A. Doria, D. Rebatto, G. Negri, L. Vaccarossa
  - Participation in DC2 using the GRID middleware (May - July 2004)
  - Local productions with GRID tools
  - Atlas VO management (A. De Salvo)
  - Atlas code distribution (A. De Salvo)
    - Atlas code distribution model (PACMAN based) fully deployed
    - The current installation system/procedure easily allows the coexistence of the Atlas software with other experiments' environments
  - Atlas distribution kit validation (A. De Salvo)
  - Transformations for DC2 (A. De Salvo)
Conclusions

• The first real test of the Atlas computing model is starting
  - DC2 tests started at the beginning of May
  - "Real" production starting in June
  - It will give important information for the Computing TDR
• Very intensive use of the GRIDs
  - Atlas Production System interfaced to LCG, NG and US Grid (Grid3)
  - Global data management system
• Getting closer to the real experiment computing model