ATLAS computing and software:
latest significant developments
Distributed DC1 Phase 1 carried out successfully
Firm commitment to the Grid for Phase 2
EDG now focused on ATLAS, first good results
18-9-2002
CSN1-Catania
L.Perini
Disclaimer
• I am not giving a full overview of ATLAS computing here
• Recent developments have been very positive; I will cover those, and
the INFN role in them
• Recent ATLAS EB (6 Sept) with a discussion of computing; recent EDG
conference in Budapest (1-5 Sept); recent ATLAS-EDG task force meeting
– The slides I present are largely recycled from these occasions, with
suitable “stitching” and INFN additions
DC1 preparation: datasets
• Datasets
– A few sets of data samples: full list available
• “validation” : Run Atrecon and look at standard plots
– “old” known data sets; new ones; single particles
• High statistics
• Medium statistics
– Priorities defined
– Data sets allocated to the participating centres
• Documentation
– Web pages
DC1-1 and the INFN contribution
• DC1-1 essentially done:
– CPUs: Roma1 46, CNAF 40, Milano 20, Napoli 16, LNF 10
• 2×10^6 of the 1.5×10^7 dijets > 17 GeV shared among all sites (filter
factor = 9): Done
– 8×10^5 Roma1, 2×10^5 Napoli, 10×10^5 split between CNAF and Milano (~7 and ~3)
• 2.7×10^7 single muons at Roma1: Done
• 5×10^5 dijets > 130 GeV (filter = 42%): ~Done
– Approx. 40% Milano and 60% CNAF
• In addition LNF 25K muon-physics events and Napoli 5K
• Bulk of the work done between the last week of July and 10 September!
• INFN produced about 1.5 TB of output out of 22.9 TB total, i.e. 6.5%, in line
with its CPU share (5700/88000 SI95): but INFN = 10% of ATLAS.
– ATLAS total ~2M hours (on 500 MHz CPUs), INFN 130K hours with 130 CPUs
– This must be rebalanced for 2003!
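A quick cross-check of the share quoted above, using only the numbers on this slide:
\[
\frac{5700\ \mathrm{SI95}}{88000\ \mathrm{SI95}} \simeq
\frac{1.5\ \mathrm{TB}}{22.9\ \mathrm{TB}} \simeq
\frac{130\ \mathrm{k\,CPU\,h}}{2\ \mathrm{M\,CPU\,h}} \simeq 6.5\% ,
\]
well below the nominal 10% INFN share of ATLAS.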
Data Samples I

Validation samples (740k events)
– single particles (e, γ, μ, π), jet scans, Higgs events
– see Jean-Francois Laporte’s talk

Single-particle production (30 million events)
– single π (low pT; pT = 1000 GeV with 2.8 < η < 3.2)
– single μ (pT = 3, …, 100 GeV)
– single e and γ
– different energies (E = 5, 10, …, 200, 1000 GeV)
– fixed η points; η scans (|η| < 2.5); η crack scans (1.3 < η < 1.8)
– standard beam spread (σz = 5.6 cm)
– fixed vertex z-components (z = 0, 4, 10 cm)

Minimum-bias production (1 million events)
– different η regions (|η| < 3, 5, 5.5, 7)

(Armin Nairz, ATLAS Software Workshop, RHUL, September 16-20, 2002)
Data Samples II

QCD di-jet production (5.2 million events)
– different cuts on ET (hard scattering) during generation
– large production of ET > 11, 17, 25, 55 GeV samples, applying particle-level filters
– large production of ET > 17, 35 GeV samples, without filtering, full simulation within |η| < 5
– smaller production of ET > 70, 140, 280, 560 GeV samples

Physics events requested by various HLT groups
(e/γ, Level-1, jet/ETmiss, B-physics, b-jet, μ; 4.4 million events)
– large samples for the b-jet trigger simulated with default (3 pixel layers) and staged (2 pixel layers) layouts
– B-physics (PL) events taken from old TDR tapes

(Armin Nairz, ATLAS Software Workshop, RHUL, September 16-20, 2002)
Data produced (as of ~30 August; 1 NCU ≈ 1 PIII 500 MHz)

Sample              Type      Events    CPU (NCU days)   Storage
Validation          Single    0.7 M        758            53 GB
                    Physics   44 K         163            57 GB
High Statistics     Single    27 M         530           540 GB
                    Physics   6.2 M      40635          13.2 TB
Medium Statistics   Single    2.7 M       9502           119 GB
                    Physics   4.4 M      25095           9.0 TB
Total               Single    30.4 M     10790           0.7 TB
                    Physics   10.5 M     65893          22.2 TB
Grand Total                              76683          ~22.9 TB
Participation in DC1/Phase 1

Country          Site(s)                                     Max. machines    SI95
Australia        Melbourne                                        24           1008
Austria          Innsbruck                                         2            185
Canada           Alberta, CERN                                   185           8825
CERN                                                             500          18900
Czech Republic   Prague                                           40           1470
Denmark          Copenhagen
France           Lyon
Germany          Karlsruhe                                       140           6972
Israel           Weizmann                                         74           3231
Italy            Bologna                                          40           2201
                 Milano, Napoli, Roma                             80           3058
Japan            Tokyo                                            78           4586
Norway           Bergen
Russia           Dubna, Moscow, Protvino                         115           4329
Spain            Valencia                                        100           5376
Taiwan           Taipei                                           48           1984
UK               Birmingham, Cambridge, Glasgow,
                 Lancaster, Liverpool; RAL                       300      4410 / 5880
USA              Arlington, Oklahoma; BNL; LBNL            37 / 100 / 100   800 / 991 / 3780 / 11172
Use of the ATLAS farm at Roma1 (A. De Salvo);
CNAF and Milano follow, as percentages of their 40 and 20 CPUs respectively (G. Negri)

Number of hosts and of processors online in the farm as a function of
time. In total 52 CPUs have been online in the farm since the beginning
of July. The two main disk servers and the monitoring server, 6 CPUs in
total, are not used by the batch queue system.

Number of available processors (slots, in green) and of processors in
use (in blue) through the farm’s batch queue system, as a function of
time.
ATLAS DC1 Phase 1 : July-August 2002
Goals: Produce the data needed for the HLT TDR
       Get as many ATLAS institutes involved as possible
• Samples done (< 3% job failures)
– 50×10^6 events generated
– 10×10^6 events, after filtering, simulated with GEANT3
– 31×10^6 single particles simulated
Now, in mid-September, DC1-1 is concluded, having produced all the high-priority
samples and most of the medium-priority ones. Some medium- and part of the
low-priority samples continue asynchronously at specific sites.
Most of the work was done in the last week of July, in August and in the first
days of September: about 40 days for a total of 110 kSI95·months.
NorduGrid was used across Bergen, Copenhagen and Stockholm; US Grid tools across
LBNL, Oklahoma and Arlington, for 200k events and 30k CPU hours over 3 weeks,
with storage both local and in the BNL HPSS.
ATLAS DC1 Phase 1 : July-August 2002
• CPU Resources used (a rough consistency check follows below):
– Up to 3200 processors (5000 PIII/500 equivalent)
– 110 kSI95 (~ 50% of one Regional Centre at LHC startup)
– 71000 CPU*days
– To simulate one di-jet event: 13 000 SI95·sec
• Data Volume :
– 30 Tbytes
– 35 000 files
– Output size for one di-jet event (2.4 Mbytes)
– Data kept at production site for further processing
• Pile-up
• Reconstruction
• Analysis
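A rough consistency check of the figures above (the ~40 days of running and the
~10^7 filtered di-jet events are taken from the earlier slides; the average
concurrency is my own estimate):
\[
\frac{71\,000\ \text{CPU days}}{\sim 40\ \text{days}} \approx 1800\ \text{CPUs busy on average (vs. 3200 at peak)},
\qquad
10^{7}\ \text{events} \times 2.4\ \text{MB} \approx 24\ \text{TB of the 30 TB total}.
\]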
ATLAS DC1 Phase I
• Phase I
– Primary concern is delivery of events to HLT community
• Goal ~10^7 events (several samples!)
– Put in place the MC event generation & detector simulation chain
• Switch to AthenaRoot I/O (for Event generation)
• Updated geometry (“P” version for muon spectrometer)
– Late modifications in digitization (few detectors)
• Filtering
– To be done in Atlsim
– To be checked with Atlfast
• Validate the chain:
Athena/Event Generator -> (Root I/O)->Atlsim/Dice/Geant3->(Zebra)
– Put in place the distributed MonteCarlo production
• “ATLAS kit” (rpm)
• Scripts and tools (monitoring, bookkeeping)
• Quality Control and Validation of the full chain
DC1 preparation: software
• One major issue was to get the software ready
– New geometry (compared to December-DC0 geometry)
• Inner Detector
– Pixels: More information in hits; better digitization
– TRT: bug fix in digitization
– Better services
• Calorimeter
– ACBB readout
– ENDE readout updated (last minute update to be avoided if possible)
– End-caps shifted by 4 cm.
• Muon
– AMDB p.03 (more detailed chambers cutouts)
– New persistency mechanism
• AthenaROOT/IO
– Used for generated events
– Readable by Atlfast and Atlsim
• And validated
DC1 preparation: kit; scripts & tools
• Kit
– “ATLAS kit” (rpm) to distribute the s/w (Alessandro De Salvo)
• It installs release 3.2.1 (all binaries) without any need of AFS
– Last update July 2002
• It requires :
– Linux OS (Redhat 6.2 or Redhat 7.2)
– CERNLIB 2001 (from DataGrid repository) cern-0.0-2.i386.rpm (~289
MB)
• It can be downloaded :
– from a multi-release page (22 rpm's; global size ~ 250 MB )
– “tar” file also available
– Installation notes are available:
» http://pcatl0a.mi.infn.it/~resconi/kit/RPM-INSTALL
• Scripts and tools (monitoring, bookkeeping)
– Standard scripts to run the production
– AMI bookkeeping database (developed by Grenoble group)
DC1 preparation: validation & quality control
• We processed the same data in the various centres and made the
comparison (a minimal sketch of such a cross-check follows at the end of this slide)
– To ensure that the same software was running in all production centres
– We also checked the random number sequences
• We defined “validation” datasets
– “old” generated data which were already simulated with previous version
of the simulation
• Differences with the previous simulation understood
– “new” data
• Physics samples
• Single particles
– Part of the simulated data was reconstructed (Atrecon) and checked
• This was a very “intensive” activity
– We should increase the number of people involved
• It is a “key issue” for the success!
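Purely as an illustration of this kind of cross-site comparison (this is not the
actual DC1 validation tooling; file format, field names, reference site and
tolerance are all assumptions): each centre could write a small summary of its
validation run, to be compared against a reference site.

# Hypothetical sketch: compare per-site validation summaries against a reference.
# A summary is assumed to be a small JSON file such as
#   {"events": 5000, "mean_e_ecal": 41.7, "mean_ntracks": 112.3}
import json
import sys

TOLERANCE = 1e-3          # relative tolerance on floating-point summaries (assumption)
REFERENCE_SITE = "cern"   # site taken as reference (assumption)

def load_summary(path):
    """Load one site's validation summary from a JSON file."""
    with open(path) as f:
        return json.load(f)

def compare(ref, other, site):
    """Compare a site's summary to the reference; report any mismatch."""
    ok = True
    for key, ref_val in ref.items():
        val = other.get(key)
        if isinstance(ref_val, int):
            match = (val == ref_val)                       # counts must agree exactly
        else:
            match = val is not None and abs(val - ref_val) <= TOLERANCE * abs(ref_val)
        if not match:
            print(f"{site}: mismatch in {key}: {val} vs reference {ref_val}")
            ok = False
    return ok

if __name__ == "__main__":
    # usage: python validate.py cern.json cnaf.json milano.json ...
    summaries = {path.split(".")[0]: load_summary(path) for path in sys.argv[1:]}
    reference = summaries.pop(REFERENCE_SITE)
    failed = [s for s, data in summaries.items() if not compare(reference, data, s)]
    sys.exit(1 if failed else 0)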
Expression of Interest (some…)

• CERN
• INFN
– CNAF, Milan, Roma1, Naples
• CCIN2P3 Lyon
• IFIC Valencia
• FZK Karlsruhe
• UK
– RAL, Birmingham, Cambridge, Glasgow, Lancaster, Liverpool
• “Nordic” cluster
– Copenhagen, Oslo, Bergen, Stockholm, Uppsala, Lund
• FCUL Lisboa
• Prague
• Manno
• Thessaloniki
• USA
– BNL; LBNL
– Arlington; Oklahoma
• Russia
– JINR Dubna
– ITEP Moscow
– SINP MSU Moscow
– IHEP Protvino
• Canada
– Alberta, TRIUMF
• Tokyo
• Taiwan
• Melbourne
• Weizmann
• Innsbruck
• ……
DC1 Phase II and pileup
• For the HLT studies we need to produce events
with “pile-up” :
– It will be produced in Atlsim
– Today “output” is still “Zebra”
• We don’t know when another solution will be available
• Will we need to “convert” the data?
– Fraction of the data to be “piled-up” is under discussion
• Both CPU and storage resources are an issue
– If “full” pile-up is required, the CPU needed would be higher than
what has been used for Phase I
• Priorities have to be defined
Pile-up production in DC1 (1 data sample), still under discussion

L           Number of events   Event size (MB)   Total size (TB)   Eta cut
2 × 10^33   1.2 × 10^6          3.6                4.3              < 3
10^34       1.2 × 10^6          5.6                6.7              < 3
2 × 10^33   1.2 × 10^6          4.9                6.0              < 5
10^34       1.2 × 10^6         14.3               17                < 5
Pile-up production in DC1 (1 data sample), still under discussion

L           Number of events   Time per event (NCU sec)   Total time (NCU days)   Eta cut
2 × 10^33   1.2 × 10^6           3700                       5 × 10^4               < 3
10^34       1.2 × 10^6           3700                       5 × 10^4               < 3
2 × 10^33   1.2 × 10^6           5700                       8 × 10^4               < 5
10^34       1.2 × 10^6          13400                      19 × 10^4               < 5
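The two tables above are internally consistent: the total size is the event size
times the number of events, and the total time is the time per event times the
number of events (1 NCU day = 86 400 NCU s). For the first row, for example:
\[
1.2\times10^{6} \times 3.6\ \text{MB} \simeq 4.3\ \text{TB},
\qquad
\frac{1.2\times10^{6} \times 3700\ \text{NCU s}}{86\,400\ \text{s/day}} \simeq 5\times10^{4}\ \text{NCU days}.
\]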
DC1-2 Goals: Norman at EB
• Athena-based reconstruction of data simulated in Phase 1:
– for HLT;
– for other physics studies;
– to discover shortcomings in the reconstruction codes;
– to use the new Event Data Model and Det. Descr., as required for reconstruction.
• To carry out studies of ‘data-flow’ and algorithms in HLT
(‘bytestream’ etc.);
• To make a significant ‘next step’ in our use of Geant4;
(specification of this is an ongoing discussion with Simulation
Co-ordinator.)
DC1-2 Goals: Norman at EB
• To make limited (but we hope finite) use of Grid tools for
the ‘batch-type’ reconstruction. (Document sent to
potential Grid ‘suppliers’: special session during next
s/w week at RHUL, Sep.19)
• To build on and develop world-wide momentum gained
from Phase 1;
• To meet LHCC DC1 milestone for end of this year.
• NOTE: as stated previously, it has always been clear that
we would not complete ALL HLT-related studies by end of
this year.
An important step for ATLAS and the Grid:
the ATLAS-EDG task force
• The goal is to prove that DC1-2 can be done with DataGrid tools, at
least for a significant part
• The means is to reproduce a few percent of what was already done in DC1-1
• Work started at the end of July, and we are succeeding! Programme:
– 100 jobs of 24 hours each reproduced by experts
– In addition: 250 “new” jobs will be submitted by Luc (a DC1 expert,
but one who has never seen a Grid…)
Task Force (EDG Conference, Budapest), with my modifications
• Task force with ATLAS & EDG people (led by Oxana Smirnova)
• ATLAS is eager to use Grid tools for the Data Challenges
– ATLAS Data Challenges are already on the Grid (NorduGrid, iVDGL)
– DC1/Phase 2 (to start in late November) is expected to be done mostly using
Grid tools
• By September 19 (ATLAS SW week Grid meeting) evaluate the usability of
EDG for the DC tasks
– The task: to process 5 input partitions of Dataset 2000 at the EDG Testbed + one
non-EDG site (Karlsruhe)
• Intensive activity has meant they could process some partitions, but problems
with long-running jobs still need a final Globus fix
• The Data Management chain is proving difficult to use and sometimes unreliable
• Need to clarify the policy for distribution/installation of application s/w
• On-going activity with a very short timescale: highest-priority task
Members and sympathizers (ATLAS and EDG)
Jean-Jacques Blaising
Laura Perini
Ingo Augustin
Frederic Brochu
Gilbert Poulard
Stephen Burke
Alessandro De Salvo
Alois Putzer
Frank Harris
Michael Gardner
Di Qing
Bob Jones
Luc Goossens
David Rebatto
Peter Kunszt
Marcus Hardt
Zhongliang Ren
Emanuele Leonardi
Roger Jones
Silvia Resconi
Charles Loomis
Christos Kanellopoulos
Oxana Smirnova
Mario Reale
Guido Negri
Stan Thompson
Markus Schulz
Fairouz Ohlsson-Malek
Luca Vaccarossa
Jeffrey Templon
Steve O'Neale
and counting…
Task description: Oxana
• Input: set of generated events as ROOT files (each input partition
ca 1.8 GB, 100,000 events); master copies are stored in CERN CASTOR
• Processing: ATLAS detector simulation using a pre-installed software
release 3.2.1
– Each input partition is processed by 20 jobs (5000 events each)
– Full simulation is applied only to filtered events, ca 450 per job
– A full event simulation takes ca 150 seconds per event on a 1 GHz processor
• Output: simulated events are stored in ZEBRA files (ca 1 GB per output
partition); an HBOOK histogram file and a logfile (stdout+stderr) are
also produced. 20% of the output is to be stored in CASTOR.
• Total: 9 GB of input, 2000 CPU-hours of processing, 100 GB of output.
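These totals follow directly from the numbers above (5 input partitions, 20 jobs each):
\[
5 \times 1.8\ \text{GB} = 9\ \text{GB},
\qquad
\frac{5 \times 20 \times 450\ \text{ev} \times 150\ \text{s/ev}}{3600\ \text{s/h}} \approx 1900\ \text{CPU h},
\qquad
5 \times 20 \times 1\ \text{GB} = 100\ \text{GB},
\]
i.e. roughly the 2000 CPU-hours quoted.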
Execution of jobs: Oxana
• It is expected that we make full use of the Resource Broker functionality
– Data-driven job steering
– Best available resources otherwise
• A job consists of the standard DC1 shell-script, very much the way it is
done in a non-Grid world
• A Job Definition Language is used to wrap up the job (a rough sketch
follows below), specifying:
– The executable file (script)
– Input data
– Files to be retrieved manually by the user
– Optionally, other attributes (maxCPU, Rank etc)
• Storage and registration of output files is a part of the job
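As an illustration only, a minimal Python sketch of how such a JDL wrapper around
the standard DC1 script could be generated; the script name, the input logical
file names and the attribute values are hypothetical, while the attribute
keywords (Executable, InputSandbox, OutputSandbox, InputData, Requirements, Rank)
follow the EDG JDL conventions, not the task force's actual configuration.

# Illustrative only: build JDL files wrapping the standard DC1 production
# script, one per input partition, roughly as described above.
def make_jdl(partition: int) -> str:
    """Return JDL text for one DC1 simulation job (5000 events)."""
    script = "dc1.simulation.sh"                      # hypothetical name for the standard DC1 shell script
    lfn = f"dc1.002000.evgen.{partition:04d}.root"    # hypothetical input logical file name
    return "\n".join([
        f'Executable     = "{script}";',
        f'Arguments      = "{partition}";',
        'StdOutput      = "dc1.log";',
        'StdError       = "dc1.err";',
        f'InputSandbox   = {{"{script}"}};',
        'OutputSandbox  = {"dc1.log", "dc1.err", "histo.hbook"};',
        f'InputData      = {{"lfn:{lfn}"}};',           # data-driven job steering
        'DataAccessProtocol = {"file", "rfio"};',
        'Requirements   = other.MaxCPUTime > 1440;',    # optional attributes
        'Rank           = other.FreeCPUs;',
    ])

if __name__ == "__main__":
    for p in range(1, 6):
        with open(f"dc1_{p:04d}.jdl", "w") as f:
            f.write(make_jdl(p))
    # The JDL files would then be submitted with the EDG job-submission CLI
    # (dg-job-submit / edg-job-submit, depending on the middleware release).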
Preliminary work done: Oxana
• ATLAS 3.2.1 RPMs are distributed with the EDG tools to provide the
ATLAS runtime environment
• Validation of the ATLAS runtime environment by submitting a short
(100 input events) DC1 job was done at several sites:
– CERN
– NIKHEF
– RAL
– CNAF
– Lyon
– Karlsruhe (in progress)
• A fruitful cooperation between ATLAS users and EDG experts
• The task force attributes:
– Mailing list [email protected]
– Web-page http://cern.ch/smirnova/atlas-edg
Status Sept. 11 (Guido) + my modifications
What has been tested with ATLAS production jobs
• Only one site (CERN Production Testbed, 22 dual-processor WNs) had been
used; the other 5 sites are now in use for the last 50 jobs
• 5 users have submitted 10 jobs each (5000 evts/job):
– output partitions 00001 to 00010: Guido Negri
– output partitions 00011 to 00020: Fairouz Ohlsson-Malek
– output partitions 00021 to 00030: Silvia Resconi
– output partitions 00031 to 00040: Oxana Smirnova
– output partitions 00041 to 00050: Frederic Brochu
• Matchmaking successfully done (see next slide)
• Registration of output ZEBRA files successfully done
• 3 jobs failed:
– “Failure while executing job wrapper” (Frederic Brochu and Fairouz Ohlsson-Malek)
– “Globus Failure: cannot access cache files… check permission, quota and
disk space” (Frederic Brochu)
WP8 Summary in Budapest
• Current WP8 top-priority activity is the ATLAS/EDG Task Force work
– This has been very positive. It focuses attention on the real user problems, and
as a result we review our requirements, design etc. Remember the eternal
cycle! We should not be surprised if we change our ideas. We must maintain
flexibility with continuing dialogue between users and developers.
• Will continue Task Force flavoured activities with the other experiments
• Current use of the Testbed is focused on the main sites (CERN, Lyon, NIKHEF, CNAF, RAL);
this is mainly for reasons of support
• Once stability is achieved (see ATLAS/EDG work) we will expand to other sites. But we
should be careful in the selection of these sites in the first instance. Local support
would seem essential.
• WP8 will maintain a role in architecture discussions, and may be involved in
some common application layer developments
• THANKS to members of IT and the middleware WPs for heroic efforts in past
months, and to Federico for laying the WP8 foundations
The future
• DC1-2: reconstruction of the data, < 1/10 of the DC1-1 CPU
– Start at the end of November, data at the proto-Tier1 sites, use of the Grid; how
and when to be decided at the September 19 meeting and immediately after
– Hypothesis: use of compatible but different GRID tools, with interoperability
through the EDT-IvDGL work (EU-US demo with ATLAS applications foreseen in
November); this would also be a first test of the GLUE/LCG line
• How much pile-up and Geant4 will be included in DC1-2 (to be concluded in
January 2003) is still to be decided; the rest is for DC2 after mid-2003
• INFN-ATLAS resources, Tier1 + Tier2 (Mi + Roma1), from 120 CPUs to 300 to
ensure a 10% share in DC2:
– 140 at the Tier1, 80 Milano, 50 Roma1, 30 at other sites (Napoli and LNF took part
in DC1-1 with 16 and 10 CPUs respectively) would be reached with the 2003
requests (Tier1+LCG): but Roma1 will have to at least catch up with Milano; the
number of Tier2s could grow (already in 2003?). We count on the CSN1 LCG budget line.
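The planned site breakdown indeed adds up to the 300-CPU target:
\[
140 + 80 + 50 + 30 = 300\ \text{CPUs}.
\]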
CPU and Disk for ATLAS DC2

(For comparison, DC1-1 used ~110 kSI95·months.)

[Table: quarterly estimates for 03Q3, 03Q4, 04Q1 and 04Q2 of the computing
power (kSI95·months) required for simulation, pile-up, reconstruction and
analysis; of the processing power required at the CERN T0, the CERN T1
(DC related only) and offsite T1+T2 (DC only); and of the storage (TeraBytes)
for data simulated at CERN and offsite, data transferred to and stored at
CERN, active data at CERN, the assumed number of active offsite sites, and
the sum of data stored offsite.]
For ATLAS Data Challenge 3 (end of 2004 / beginning of 2005?) we intend to
generate and simulate 5 times more data than for DC2.
For ATLAS Data Challenge 4 (end of 2005 / beginning of 2006?) we intend to
generate and simulate 2 times more data than for DC3.