AMS Data Handling and INFN
P.G. Rancoita, Perugia, 11/12/2002

AMS Ground Segment: Data Flow in AMS-02
• High rate (scientific + calibration): 3-4 Mbit/s
• Slow rate (housekeeping): 16 kbit/s
• NASA ancillary data: 1 kbit/s
• Total volume: 30-41 GB/day, i.e. 11-15 TB/year

AMS Ground Segment: Data Volume in AMS-02
• Archived data
  1. Event Summary Data: 44 TB/year
  2. Event Tag: 0.6 TB/year
  3. Total (+ raw and ancillary data): 56-60 TB/year
• Data on direct access
  1. Event Summary Data: 8.3 TB/year
  2. Event Tag: 0.6 TB/year
• Total data volume (3 years): 180 TB, namely about 180 GB/day

Events and Event Rate
• The expected average rate of accepted events is about 200 Hz; over 3 years this means about (1.5-2)x10^10 events (200 Hz x ~9.5x10^7 s).
• The typical reconstructed event length is less than about 6.5-7 kB.
• Total storage for the ESD: about 130 TB.
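The storage figures follow directly from the rate and the event size; a minimal back-of-envelope check in Python, using only the constants quoted above:

    # Back-of-envelope check of the event-rate and ESD-storage figures above.
    SECONDS_PER_YEAR = 3.15e7
    rate_hz = 200        # average accepted event rate
    event_kb = 7.0       # upper estimate of the reconstructed event length

    events_3y = rate_hz * 3 * SECONDS_PER_YEAR        # ~1.9e10 events
    esd_tb_3y = events_3y * event_kb * 1e3 / 1e12     # kB -> bytes -> TB

    print(f"events in 3 years : {events_3y:.1e}")       # ~1.9e10
    print(f"ESD in 3 years    : {esd_tb_3y:.0f} TB")    # ~130 TB
    print(f"ESD per year      : {esd_tb_3y/3:.0f} TB")  # ~44 TB/year, cf. the budget table below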
AMS Ground Segment: Data Budget in AMS-02 (TB)

  Data/Year    1998  2001  2002  2003  2004  2005  2006  2007  2008  2009  Total
  Raw          0.20   --    --    --    --   0.5    15    15    15   0.5   46.2
  ESD          0.30   --    --    --    --   1.5    44    44    44   1.5  135.3
  Tags         0.05   --    --    --    --   0.1   0.6   0.6   0.6   0.1    2.0
  Total        0.55   --    --    --    --   2.1  59.6  59.6  59.6   2.1  183.5
  MC           0.11  1.7   8.0   8.0   8.0   8.0    44    44    44    44  210.4
  Grand Total  0.66  1.7   8.0   8.0   8.0  10.1   104   104   104  46.1   ~400

AMS Ground Segment: External Communications
[Diagram: data flows from the ISS to the POIC at MSFC (AL), with the HOSC web server and xterm commands; the POCC (TReK workstations, "voice" loop, video distribution, XTerm) receives monitoring, health & status and flight ancillary data and originates commands; GSE buffers the selected AMS science data and retransmits them to the Science Operations Center (GSE PC farm, production farm, NRT data processing, primary storage, archiving, distribution, data server, analysis facilities); AMS remote centers and stations handle MC production, data mirroring and archiving, RT data, commanding, monitoring and NRT analysis.]

AMS Ground Segment: AMS-02 Ground Facilities
• POIC @ Marshall (MSFC)
• POCC @ JSC / MSFC / MIT / CERN
• (A)SOC @ CERN
• Remote center: Italian Ground Segment
• Laboratories

AMS Ground Segment: Payload Operation and Integration Center (POIC)
• POIC @ Marshall SFC (Huntsville, AL)
• Receives data from the ISS
• Buffers data until retransmission to the (A)SOC
• Forwards monitoring and meta-data to the POCC
• Transmits commands from the POCC to AMS
• Runs unattended 24 h/day, 7 days/week
• Must buffer ~2 weeks of data: 600 GB

AMS Ground Segment: Payload Operation Control Center (POCC)
• POCC @ JSC, MSFC, MIT, CERN
• Receives data from the POIC @ MSFC
• Monitors the data and runs quality-control programs
• Processes ~10% of the data in near real time
• Originates and transmits commands to AMS through the POIC
• Requires scientists on shift

AMS Ground Segment: (AMS) Science Operation Center [(A)SOC]
• Complete data repository (raw + reconstructed)
• Production of reconstructed data
• Re-processing / re-calibration of the data
• Meta-data repository and command archive
• Production and management of MC events
• Monte Carlo repository
• Scientific data analysis facility

AMS Science Operation Center: Computing Facilities
[Diagram: production farm organized in cells (#1..#8), each with dual 2 GHz+ Linux PCs and a dual-CPU Linux server with SCSI RAID, interconnected by Gigabit (1 Gbit/s) switches; disk and tape servers; data servers for simulated data, AMS data, NASA data and meta-data; archiving and staging on 2x SMP machines (Q, SUN); analysis facilities.]

AMS Ground Segment: AMS Italian Ground Segment (IGS)
• Gets the data (raw + reconstructed + meta-data) from the (A)SOC
• Complete mirror and meta-data repository: the Master Copy of the full data set
• Monte Carlo production (20%)
• Supports the local user community for data analysis

AMS Ground Segment: AMS Italian Ground Segment
[Diagram]

AMS Ground Segment: Italian Ground Segment Facilities
• Italian Ground Segment Data Storage (IGSDS): complete mirror data and meta-data repository, namely the MASTER COPY of the full AMS data set
• Data Transfer Facility (DTF)
• Data Transfer Management and Survey (DTMS)
• Monte Carlo contribution: 20%

AMS Ground Segment: Data Transfer to the IGS
• Involved: DTF, IGSDS, DTMS
• DTF (CERN): accesses the data at the (A)SOC and transfers them to the IGSDS
• IGSDS (location TBD): receives and stores the data
• DTMS (Milano): watches over the data transfer
• Network required: 32 Mbit/s

Data Transfer Development

Data Transfer: Status
• The new release of the data transfer system has been running for 20 weeks; the only stops were due to power outages at CERN.
• "Production rate" = 2.2 Mbit/s
• Sustainable production rate = 8 Mbit/s (80% of the available bandwidth)
• This is achieved thanks to a forking mechanism and bbftp's efficient use of the bandwidth (a sketch follows the next list)
• Consistency between the Milano and CERN data transfer DBs = 100%
• Data that has to be retransmitted = 0.2%

Data Transfer: Present Work
• Test bbftp's variable TCP parameters (done)
• Release a new version of "our" bbftp, with minor changes to authorization and error reporting (done)
• Test the system in a more reliable environment (no power outages...)
• Implement automatic recovery
• Set up a GUI (graphical user interface) to start/stop the system
• Complete the Web monitoring tools
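A minimal sketch of what such a forking transfer loop can look like. The host name and transfer account are hypothetical, the bbftp command line is schematic (the exact options depend on the installed version), and an in-memory retransmit list stands in for the real MySQL bookkeeping kept at CERN and Milano:

    #!/usr/bin/env python3
    # Sketch of a forking transfer loop driving parallel bbftp sessions.
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    REMOTE_HOST = "igsds.example.infn.it"  # hypothetical IGS storage host
    REMOTE_USER = "amsdt"                  # hypothetical transfer account
    N_WORKERS = 4                          # parallel ("forked") bbftp sessions

    def transfer(path):
        """Push one file with bbftp; report success for the retransmit queue."""
        # Schematic invocation; real bbftp options depend on the installed version.
        cmd = ["bbftp", "-u", REMOTE_USER, "-e", f"put {path} {path}", REMOTE_HOST]
        ok = subprocess.run(cmd, capture_output=True).returncode == 0
        return path, ok

    def run(files):
        """Transfer files in parallel; return the ~0.2% needing retransmission."""
        failed = []
        with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
            for path, ok in pool.map(transfer, files):
                # The real system records each transfer in MySQL DBs at both
                # ends; those records are what the 100% consistency check compares.
                if not ok:
                    failed.append(path)
        return failed

Running several bbftp sessions in parallel keeps the link busy while individual transfers are being set up, which is plausibly how the forking mechanism helps sustain a rate near the 80%-of-bandwidth figure quoted above.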
AMS Italian Ground Segment: Data Storage at the IGSDS
• Place: TBD
• Archived data: 180 TB (3 years)
• On-line data: ~2 TB (1-2 weeks)

Cost Description
• Costs for the central AMS Ground Segment (POIC + POCC + (A)SOC)

Central Production Facility
• The Central Production Facility (CPF) will be dedicated to the reconstruction of the data.
• The CPF will be physically hosted at CERN and is part of the (A)SOC.
• The CPF requirements are split into storage and CPU (and DB servers).

AMS Data Handling Hardware and Costs: Central Production Facility
As far as computing power is concerned, the equivalent of the following will be needed:
• 50 dual 1.5 GHz boxes, 1 GB RAM
• Processing storage: 10 TB

Central Production Facility: Costs
At current prices and with the present understanding of how costs will evolve, the expected cost of the facility over the period 2004-2006 is:
• CPF: 350 kUS$
• DB servers: 50 kUS$
• Event storage: 200 kUS$

POCC, Marshall (POIC), Analysis: Costs
At current prices and with the present understanding of how costs will evolve, the expected costs are:
• Marshall: 55 kUS$
• POCC (x2): 150 kUS$
• Analysis: 55 kUS$

Additional Expenses
• 2000-2001 expenses for prototypes and initial set-up: 150 kUS$
• Running costs and upgrades 2007-2008: 150 kUS$
• Total (personnel excluded): 1160 kUS$
• About 20% of this, plus VAT, is expected to come from INFN: 277 k€

Personnel Estimates for AMS Data Handling
• The personnel costs (beyond the physicists) to be dedicated to data handling for the period 2003-2008 are being formalized.
• The personnel consists of system administrators and SW and HW engineers.
• The estimates in man-years are:
  - POCC: about 8.5
  - (A)SOC: about 15.3
  - User support group: about 15.6 (including personnel dedicated to specific items such as storage)
  - Total: about 39.4 man-years
• Assuming a cost of 50 k€ per man-year, this amounts to about 1970 k€, of which 20% (about 390 k€) should be an INFN contribution.

Cost Description
• Costs for the Italian Ground Segment: DTF, DTMS, IGSDS

DTF
The DATA TRANSFER system will have its own INFN front-end at CERN, with a system dedicated to fetching the data and transferring them to the MASTER COPY repository in Italy. The system is based on:
• A client/server architecture (SSL)
• bbftp
• MySQL

DTF (cont.)
This system will require:
• 1 AMD 1.5 GHz server
• 1.5 TB on RAID disks (SCSI)
• A 32 Mb/s CERN-to-IGS link
• Cost, including maintenance and replacement of the servers: about 50 k€ + VAT over the period 2004-2008
Bandwidth requests: (4 R + 8 NT) + (2 R + 4 NT) rt + 2 (SR + CAS) = 20 Mb/s

DTMS
• A high-performance server with a fast CPU and high I/O throughput
• I/O buffer: capacity equivalent to 7 days of data taking, to recover from any connectivity failure: 1.5 TB
• Network: high-speed network connection to the CPF, consistent with a flux of 3 days' worth of data: 32 Mb/s (checked in the sketch at the end)
• Each facility (DTF and DTMS) costs about 27 k€ + VAT up to 2008

DATA STORAGE: Italian MASTER COPY
• 2 high-performance servers with fast CPUs and high I/O throughput
• I/O buffer: capacity equivalent to about 3 days of data taking, to recover from any connectivity failure (0.5 TB)
• On-line storage: RAID system (1 TB)
• Off-line storage: tapes or similar, e.g. LTO: 180 TB
• Off-line robotics staging area: depending on the robot solution adopted, between a few percent and 10% of the stored data (10 TB)
• Network: high-speed network connection to the CPF, consistent with a flux of 3 days' worth of data (32 Mb/s)
• Cost (2002 prices, based on LTO): 355 k€ + VAT

Cost Summary
INFN contribution to the central Ground Segment (CERN) and to the IGS (data transfer and Master Copy) for the period 2003-2008:
• HW for the central AMS ground segment: 277 k€
• Personnel for (A)SOC, POCC, etc.: 394 k€
• Total: 671 k€ (VAT included)
• HW (IGSDS) for 200 TB of storage: 428 k€
• HW for DTF and DTMS: 63 k€
• Total: 491 k€
• Grand total (2003-2008): 1162 k€
• No cost for the IGSDS facility (infrastructure and personnel) is included.
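As a closing cross-check, the buffer sizes and the 32 Mb/s link quoted for the DTMS and the Italian MASTER COPY are mutually consistent with the total data flux; a minimal sketch, assuming the ~180 GB/day figure from the data-volume slide (everything else is arithmetic):

    # Consistency check of the transfer-chain figures quoted above.
    daily_gb = 180                   # ~180 GB/day total data flux
    link_mbps = 32                   # required CERN -> IGS bandwidth

    link_gb_day = link_mbps / 8 * 86400 / 1000       # Mbit/s -> GB/day
    print(f"link capacity : {link_gb_day:.0f} GB/day")       # ~346 GB/day
    print(f"7-day buffer  : {7 * daily_gb / 1000:.2f} TB")   # ~1.3 TB (DTMS: 1.5 TB)
    print(f"3-day buffer  : {3 * daily_gb / 1000:.2f} TB")   # ~0.5 TB (MASTER COPY: 0.5 TB)

    # The surplus over the average flux lets a 3-day backlog drain in ~3 days
    # while new data keep arriving, matching the "3 days' worth" requirement.
    days_to_drain = 3 * daily_gb / (link_gb_day - daily_gb)
    print(f"3-day backlog drains in ~{days_to_drain:.1f} days")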