CDF
Computing @ CDF
A. Sidoti
University of Pisa - INFN Pisa
Workshop sulle Problematiche di Calcolo e Reti nell'INFN
24-28 Maggio 2004
Sant'Elmo Beach Hotel, Castiadas (CA)
Outline
•The CDF experiment
• CDF computing model
•Interactions INFN – CDF
CDF Experiment @ Tevatron
CDF is a high energy physics experiment at the Tevatron (Fermilab, Batavia, IL): a multipurpose detector.
The Tevatron is a proton-antiproton collider with √s = 1.96 TeV.
CDF has been taking data since March 2001.
Broad physics spectrum:
•Top
•Electroweak
•SUSY searches
•B sector (and charm)
•QCD
~700 physicists are involved.
Italy is the country giving the largest non-US contribution (~10%).
Other countries:
•Canada
•Japan, Korea, Taiwan
•Spain, Germany, Switzerland, UK, Russia, Finland
So far (May 04) ~340 pb-1 on tape, and it will hopefully increase!
•Event size ~250 kB (50 kB if compressed)
•Dataset sizes:

Dataset      Size        Events
Bhad (SVT)   28 (5) TB   140M
HighPt Ele   2 TB        15M

Just two datasets are shown; there are many others (control data samples)! Also duplicated for different versions.
Run 2 reconstruction: 20 MEv/day (~200 Hz), 2 GHz-sec per event; yield 700 events/fb-1 (see the estimate below).
[Figure: Tevatron Run II luminosity projections (Tevatron Luminosity Plan).]
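As a rough cross-check of the numbers above (a back-of-the-envelope estimate, not a figure from the slides), the sustained CPU needed to keep up with reconstruction at this rate is

\[
\frac{2\times 10^{7}\ \mathrm{events/day}\ \times\ 2\ \mathrm{GHz\,s/event}}{86\,400\ \mathrm{s/day}} \;\simeq\; 460\ \mathrm{GHz},
\]

i.e. roughly 230 CPUs of the 2 GHz class running continuously.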
Computing@CDF : History
In Run I it was difficult and inefficient to work far from Fermilab
-> we wanted to improve this for Run II.
Computing needs à la LHC, before the GRID!
We needed to develop a computing model on our own.
Not enough manpower to start from scratch.
Strategy:
Integrate the best solutions available into the CDF framework to build the CDF computing model.
•Batch system: started with LSF -> FBSNG -> Condor
•Data handling (cf. A. Fella's talk): SAM and dCache
•….
CDF Central Analysis Farm
•Compile/link/debug everywhere (on "my desktop" / "my favorite computer")
•Submit from everywhere
•Execute @ FNAL
– Submission of N parallel jobs with a single command
– Access data from CAF disks
– Access tape data via a transparent cache
•Get job output everywhere
•Store small output on a local scratch area for later analysis
•Access to the scratch area from everywhere
•Installing GW and WN with ups/upd
•Remote cloning works!
IT WORKS NOW
[Diagram: the user's desktop submits N jobs to the FNAL gateway; worker nodes ("a pile of PC's") read data from the local data servers via rootd and NFS, and from tape (Enstore) through dCache; logs and job output come back to the user via ftp/rootd or are stored on the scratch server.]
Batch System
The batch system we are using now implements:
•Jobs can run with different time limits (CPU and real); priority is assigned according to the time limit (a toy illustration of such an ordering is sketched after this list).
•Fair sharing among users.
•Groups of users may have higher privileges (with fair sharing among them).
•This is implemented for the "Italian" portion of the CAF we have at FNAL.
•FBSNG is our batch queue system (http://www-isd.fnal.gov/fbsng/).
•If we need to switch to another batch system (e.g. PBS) we need to have the same features.
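The following is a minimal, hypothetical sketch of the ordering policy described above (shorter time limit first, then group privilege, then fair share among users); it is an illustration only, not FBSNG code or configuration, and all names in it are invented.

// Toy model of the queue ordering described on this slide (not FBSNG code).
#include <algorithm>
#include <string>
#include <vector>

struct Job {
    std::string user;
    double      timeLimitHours;   // requested CPU/real time limit
    bool        privilegedGroup;  // e.g. a group with higher privileges
    double      recentUsageHours; // CPU recently consumed by this user
};

// Returns true when job 'a' should run before job 'b'.
bool runsBefore(const Job& a, const Job& b) {
    if (a.timeLimitHours != b.timeLimitHours)       // shorter jobs first
        return a.timeLimitHours < b.timeLimitHours;
    if (a.privilegedGroup != b.privilegedGroup)     // privileged groups next
        return a.privilegedGroup;
    return a.recentUsageHours < b.recentUsageHours; // fair share among users
}

int main() {
    std::vector<Job> queue = {
        {"alice", 24.0, false, 10.0},
        {"bob",    2.0, false, 50.0},
        {"carol", 24.0, true,   5.0},
    };
    std::sort(queue.begin(), queue.end(), runsBefore); // bob, carol, alice
    return 0;
}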
CDF computing scheme (offline)
Reconstruction Farm:
•Located at FNAL
•FBSNG batch queue system
•Users do not run jobs there
Central Analysis Farm at Fermilab (CAF@FNAL):
•User analysis jobs run there (producing ntuples)
•FBSNG batch queue system
•Authentication through Kerberos
[Diagram: raw data from Enstore tapes feed the Reconstruction Farm; reprocessed data (raw + physics objects) are analyzed on the CAFs; ntuples/rootuples end up on the users' desktops. dCAF sites shown: CAF@FNAL, CondorCAF@FNAL, CAF@CNAF, UCSDCAF (San Diego), ASCAF (Taiwan), KorCAF (Korea).]
Hardware resources in CDF-GRID
Per-site CPU (GHz) and disk (TB) resources, now and projected for Summer 2004 (09/04), at: INFN, Taiwan, Japan, Korea, Germany (GridKa), Cantabria, UCSD, Rutgers, MIT, UK, Canada.
[Table of per-site figures; INFN currently has 250 GHz (54 duals) and 8.5 TB, and the largest sites are projected to reach several hundred GHz by Summer 2004.]
Job status on a web page; command-line mode to monitor job execution (top, ls, tail, debug).
It is possible to connect a local gdb to a process running on a remote farm!
The user decides where to run (FNAL, CNAF, San Diego, ...).
It would have been hard to have physics results for conferences and publications without building the CDF dCAFs.
MC production off-site is a reality and a necessity (at least a factor of 10 more MC than data events).
We are running production on UCSDCAF.
Towards CDF-Grid...
CDF proposal: do 50% of the analysis work offsite.
The plan and the core are ready (dCAFs); working hard on the missing pieces.
Our proposal: do 15% of the analysis work in Italy within one year, if there are enough resources.
CNAF performance:
Data -> CPU: OK
Data import: 1 TB/day (~120 Mbit/sec) - OK
Data export (output back to FNAL): 200 Mbit/sec achieved
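For orientation (a unit conversion, not a number from the slides), 1 TB/day averaged over 24 hours corresponds to

\[
\frac{8\times 10^{12}\ \mathrm{bit}}{86\,400\ \mathrm{s}} \;\simeq\; 93\ \mathrm{Mbit/s},
\]

the same order as the ~120 Mbit/sec quoted above, which presumably refers to the rate observed while transfers are actually running.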
Data analysis:
– Problem:
● More than 100 processes reading from the same disk: performance drops to zero.
– Solution (home-made):
● Files are copied to the worker-node scratch disk and opened there.
● A queuing tool limits the number of simultaneous copies to 20, so the file server feeds at 110 MByte/sec (950 Mbit/sec). (A sketch of such a throttling scheme follows below.)
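The copy-throttling tool itself is not shown on the slides; the following is a minimal sketch of one way such a limit could be implemented, using exclusive lock files on a shared directory to bound the number of concurrent stage-in copies. The lock-file approach, all paths, and the plain "cp" are assumptions for illustration, not the actual CDF tool.

// Sketch: throttle concurrent copies from the file server by acquiring
// one of MAX_SLOTS lock files on a shared area before staging a file to
// the worker-node scratch disk (illustrative only).
#include <fcntl.h>    // open, O_CREAT, O_EXCL
#include <unistd.h>   // close, unlink, sleep
#include <cstdio>
#include <cstdlib>
#include <string>

const int MAX_SLOTS = 20;   // at most 20 simultaneous copies

// Grab a free slot by creating a lock file exclusively on a directory
// visible to all jobs (e.g. over NFS); block until one becomes available.
std::string acquireSlot(const std::string& lockDir) {
    for (;;) {
        for (int i = 0; i < MAX_SLOTS; ++i) {
            std::string lock = lockDir + "/slot." + std::to_string(i);
            int fd = open(lock.c_str(), O_CREAT | O_EXCL | O_WRONLY, 0644);
            if (fd >= 0) { close(fd); return lock; }  // slot is ours
        }
        sleep(5);  // all slots busy: wait and retry
    }
}

int main(int argc, char** argv) {
    if (argc != 4) {
        std::fprintf(stderr, "usage: %s <lockdir> <source> <dest>\n", argv[0]);
        return 1;
    }
    std::string lock = acquireSlot(argv[1]);
    // Stage the file onto the local scratch disk, then free the slot.
    std::string cmd = std::string("cp ") + argv[2] + " " + argv[3];
    int rc = std::system(cmd.c_str());
    unlink(lock.c_str());
    return rc == 0 ? 0 : 1;
}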
Security: CDF-Grid
Accessing CDF-Grid from Italy, and vice versa, can become a nightmare because of the different security policies adopted by the various INFN sections:
Security at FNAL is based on Kerberos (5).
•Data come from FNAL via kerberized transfers (rcp, bbftp, ftp, ...).
•Job output has to end up on the desktops in the sections (ftp, kerberized rcp, ...).
The sections must agree to run a server (everybody is willing to run a client, but nobody the server!) of a trusted application for data transfer.
•At the moment we survive because the system managers are friends.
•But this is clearly not a scalable solution.
•We would like uniform choices across all INFN sections (at least those hosting a CDF group).
•The same problem applies to interactive work.
Interactive computing in the sections
Some solutions adopted in the sections:
Pisa (D. Fabiani, E. Mazzoni):
•A farm of CPUs for interactive use.
•A Storage Area Network shared with the other experiments of the section (Computing Storage Facility).
•cf. E. Mazzoni's talk.
Padova (M. Menguzzato):
•A Mosix cluster built out of the group's desktops.
•Completely transparent to the users.
•Very good performance.
•For the moment AFS is not mounted (the CDF code is distributed on AFS) -> a serious problem!
Batch
The Italian contribution to CDF-Grid has been fundamental, and it would have been difficult without the help of the computing groups of the INFN sections.
•Bologna (F. Semeria, O. Pinazza):
•Testing and installation at Bologna of the FBSNG queue system used on the CAF.
•Web monitoring system for the dCAFs.
•CNAF (F. Rosso et al.): the help of the CNAF system administrators has been irreplaceable for the hardware/software installation of CAF@CNAF. Data-handling installation activity (dCache and SAM, cf. the talk by A. Fella).
Batch II
Frascati (I. Sfiligoi):
•Implementation of icaf: the scratch area of the dCAFs and graphical tools.
•Installation of Condor to replace FBSNG on the dCAFs (at the moment two dCAFs use CondorCAF: UCSDCAF and CondorCAF@FNAL).
•Implementation of PEAC (Proof Enabled Analysis Cluster); demo presented at SC2003 (Phoenix, AZ).
PEAC
A significant part of analysis involves interactive visualization of histograms.
Large ntuples (1-100 GB) will be inevitable -> processing time with Root can be long.
Physicists tend to lose "inspiration" with time.
PEAC extends the concept of a batch job to interactive analysis:
•Borrow CPUs from batch processes for brief periods.
•Use PROOF to parallelize the rootuple access (a usage sketch follows after the demo results).
Demo at SC2003, results (http://hepweb.ucsd.edu/fkw/sc2003/):
•Analysis: B+ -> D0 pi (6 GB ntuples).
•The plot takes 10 minutes on a P4 2.6 GHz.
•On the INFN farm with 12 PROOF slaves: 1st pass 39 s, 2nd pass 22 s (ntuples cached).
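To illustrate how PROOF parallelizes access to a set of ntuples, here is a minimal ROOT macro sketch; the master name, the file URLs, the tree name and the selector are invented for illustration and are not the PEAC demo code.

// Minimal PROOF usage sketch (ROOT macro); names below are assumptions.
void run_proof() {
    // Connect to the PROOF master that coordinates the slave nodes.
    TProof *proof = TProof::Open("proofmaster.example.org");

    // Describe the dataset: one entry per ntuple file.
    TDSet *data = new TDSet("TTree", "ntuple");
    data->Add("root://dataserver.example.org//data/ntuple_1.root");
    data->Add("root://dataserver.example.org//data/ntuple_2.root");

    // Run a TSelector over the dataset on the slaves in parallel;
    // histograms booked in the selector come back merged to the client.
    proof->Process(data, "MySelector.C+");
}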
Conclusions
●CDF is building a computational grid for MC production and analysis.
●Three years before LHC!
●It is fundamental in order to carry out physics analyses (and to keep up with the data we will, hopefully, collect).
●It would have been difficult to do analysis and build the dCAFs without the support of the local INFN computing groups.
Acknowledgements
S. Belforte, M. Neubauer, M. Menguzzato, A. Fella, I. Sfiligoi and F. Wurthwein for the material provided.
BackUP
CDF rates, or why the data to analyze do not scale with L
1. The luminosity changes by a factor of 3 during a 16-hour run (from 4.5E31 down to 1.5E31).
2. Triggers at Level 1 are automatically prescaled.
3. The rate to tape stays in [50,70] Hz at all times.
[Figure: Level 1, Level 2 and Level 3 trigger rates during a store.]
Mosix at Padova
1 quad-processor 4x Xeon PIII 700 MHz with 2.5 TB of disk exported via NFS (master, accepts logins).
2 dual-processor 2x Xeon PIV 2.4 GHz (400 GB each) (slaves, do not accept logins).
Jobs migrate from the master to the slaves: a factor of 3 faster.
Convenient for the user thanks to complete transparency.
Problems:
•gmake works only in non-OpenMosix mode.
•OpenAFS does not compile (an important and documented problem). It was fixed in old OpenMosix releases, but the problem has come back.
Possible solutions:
•Upgrade!
•Recompile OpenAFS with NFS enabled and mount AFS on a machine that is not part of the cluster.
•Daily mirror of the AFS tree.
The landscape
●DAQ data-logging upgrade
– More data = more physics
– Approved by FNAL's Physics Advisory Committee and Director
●Computing needs grow, but the DOE/Fnal-CD budget is flat
●CDF proposal: do 50% of the analysis work offsite
– CDF-GRID makes it possible!
●We have a plan on how to do it
●We have most tools in use already
●We are working on the missing ones (ready by the end of the year)
●Our proposal: do 15% of the analysis work in Italy
Monitoring the CAFs
http://cdfcaf.fnal.gov
Also non-FNAL-developed monitoring tools:
CNAF monitoring
Ganglia
Tevatron: Luminosity
Integrated luminosity is a key ingredient for the success of Tevatron Run II.
The analyses presented here are based on a different integrated-luminosity period (72 pb-1).
Record peak luminosity (05/02/2004): 6.1E31 cm-2 s-1.
CDF takes data with an efficiency above 85%.
The silicon detector is integrated in most of the runs.
CDF and DØ are collecting 1 pb-1/day.
CDF
Proton-antiproton collider means:
•Larger number of physics objects
•Events are bigger (storage, I/O)
•Reconstruction and analysis need
more CPU power
Tevatron Run II
[Figure: Tevatron Run II luminosity projections (Tevatron Luminosity Plan).]
Run 2 reconstruction: 20 MEv/day (~200 Hz), 2 GHz-sec per event for typical bbar events; yield 700 events/fb-1.
So far (May 04) ~240 pb-1 on tape.
•Event size ~250 kB (50 kB if compressed)
•Dataset sizes:

Dataset      Size        Events
Bhad (SVT)   28 (5) TB   140M
HighPt Ele   2 TB        15M

Just two datasets are shown; there are many others (control data samples)! Also duplicated for different versions.
Production Farm
[Plots: events processed per day and total number of events processed.]
Analysis Farms: USA vs. Italy
• FNAL (2004 total, including test systems, spares, etc.)
– 500 duals
– 184 TB of disk
– 30% non-FNAL-owned
• FNAL (INFN 2003)
– 162 duals
– 34 TB
• CNAF-CDF (current)
– 54 duals
– 8.5 TB
[Backup plot: transverse mass distribution from the first CDF measurement of the W -> e nu production cross section.]