The Read-Out Crate in the ATLAS DAQ/EF prototype -1

G.Ambrosini, E.Arik, H.P.Beck, S.Cetin, T.Conka, A.Fernandes, D.Francis, Y.Hasegawa, M.Joos, G.Lehmann, J.Lopez, A.Mailov, L.Mapelli, G.Mornacchi, Y.Nagasaka, M.Niculescu, K.Nurdan, J.Petersen, D.Prigent, J.Rochez, L.Tremblet, G.Unel, S.Veneziano, Y.Yasu

• The Read-Out Crate model
• The Read-Out Buffer
• The ROBin
• ROB performance
• ROC performance
• I/O module configurations
• Conclusions

CHEP 2000 Padova, Stefano Veneziano

ATLAS DAQ/EF
[Diagram: ATLAS DAQ/EF dataflow. Detector electronics feed the Front End DAQ, consisting of read-out crates (TRG, readout buffers, EBIF, LDAQ). The read-out crates receive L2A, L2R and ROI from the trigger systems and send data to level 2. An Event Builder (switching network, DFM) connects them to the Farm DAQ, consisting of sub-farm crates (SFI, EF sub-farm, SFO, LDAQ, switch, supervision), which write to mass storage. Labels: input rate 75-100 kHz; ROC bandwidth; 1-2 kHz; 4-5 GB/s; ~100 MB/s to mass storage.]

Read-Out Crate logical model
The ROC performs:
• data collection from many readout links
• event buffering during LVL2 latency
• event fragment distribution to Event Builder and LVL2
[Diagram labels: external I/O channels, internal I/O channels, external I/O channels.]

The I/O Module
– Each I/O channel, external or internal, has an associated Task
– A Task is activated on the occurrence of stimuli (messages or event data)
– An I/O Module (IOM) is a collection of Tasks associated with an external I/O channel
– All Tasks belonging to an IOM are scheduled within a single process and activated by polling conditions (scheduler)
– Event manager API
– All components within a ROC communicate via a well-defined message passing protocol
• The baseline implementation of the ROC, VMEbus based with one Single Board Computer (SBC) per IOM, has evolved into a collapsed solution in which one SBC can handle many external I/O channels.
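The polling scheduler described for the I/O Module can be illustrated with a minimal sketch. This is not the DAQ/EF -1 code: the names `Task`, `inbox` and `run_scheduler` are hypothetical, a Python deque stands in for an I/O channel, and the loop stops when nothing is pending, whereas a real IOM would presumably poll forever.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Task:
    """One Task per I/O channel; it runs when its channel has a pending stimulus."""
    name: str
    handler: Callable[[object], None]
    # Hypothetical stand-in for an I/O channel carrying messages or event data.
    inbox: Deque[object] = field(default_factory=deque)

    def ready(self) -> bool:
        # The polled condition: is a message or event fragment waiting?
        return bool(self.inbox)


def run_scheduler(tasks: List[Task], max_idle_passes: int = 1) -> int:
    """Poll each task's condition in turn, all within a single process.

    Returns the number of task activations. The loop ends after
    `max_idle_passes` consecutive passes in which no task was ready.
    """
    activations = 0
    idle = 0
    while idle < max_idle_passes:
        fired = False
        for task in tasks:
            if task.ready():
                task.handler(task.inbox.popleft())
                activations += 1
                fired = True
        idle = 0 if fired else idle + 1
    return activations


log = []
rol_input = Task("rol_input", lambda frag: log.append(("buffered", frag)))
trigger = Task("trigger", lambda msg: log.append(("handled", msg)))
rol_input.inbox.extend(["fragment-1", "fragment-2"])
trigger.inbox.append("L2R:1")
activations = run_scheduler([rol_input, trigger])
```

Because every Task is polled from one loop in one process, no context switch or operating system call is needed to activate a Task; the cost is that an idle loop still consumes CPU.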

Message passing
• Based on circular buffers
• Supported buses: VMEbus, PVIC, PCI, CPU bus, EBIO (TCP/IP)
• Duplication of R/W pointers ==> no polling on the bus
• DMA and broadcast used where possible
[Diagram label: PVIC.]

The MFCC based ROBin
Today's ROB implementation minimizes movement of event fragments over the system bus (need to receive and buffer events of 1 kB at 100 kHz), by using an add-on PMC card (one per Read-Out Link).
The CES MFCC 8441 is a commercially available intelligent PMC
– I/O: via user programmable 10k50ev front-end FPGA
– Same S/W environment as on SBC (LynxOS 3.0.1)

The ROBin software
• No device drivers, minimal operating system calls.
• Single process, scheduler, three tasks to manage one ROL, one internal I/O to ROB-host + firmware.

ROBin firmware
[Diagram labels: PPC 603, ROL (S-link), SDRAM.]
• Firmware on FPGA from VHDL synthesis (40/66 MHz clk):
• PPC master and slave
• interface to ROL protocol (S-link)
• Buffer manager and DMA manager state machines
• 2 kB data fragment buffering
• communication to ROBin task via two FIFOs (EM pages stats)

ROBin interaction
ROBin application interacts with ROB-host and event data source (FE-FPGA)
The following ROB performance results rely on test programs running on the ROB-host on:
• Event Location
• Event Deletion
• Event Retrieval
[Diagram labels: ROB-host, PCI, ROBin, scheduler loop.]

Event location
• Messages over the PPC-PCI-PPC buses
• No S-Link I/O
• No broadcast mechanism
• Best arrangement of events into memory (one event per class)
[Figure: EM_GetById, sequential IDs. Rate (kHz), 0 to 60, versus number of MFCCs, 0 to 4.]
[Diagram labels: RIO2 8062, MFCC 8441 PMC, PPC, MEM, PCI bus.]

Event retrieval
[Figure: EM_GetByIdCopy, rate (kHz) for 1 to 4 MFCCs.]
[Figure axes, continued: event size (bytes), 0 to 1024.]
• Messages: single cycles
• Event data transfer: DMA
• Transfer bandwidth: ~50 MB/s

Event deletion
EM_DeleteById
No event regeneration, # pages/event = 1, # events/class = 1
[Figure: rate (kHz), 0 to 600, versus L2R group size, 0 to 60; curves for 1, 2, 3 and 4 MFCCs.]
• No input of new events, only messages over the system bus.
• Messages sent in DMA mode
• Several delete requests packed into one message
• Delete requests get acknowledged

S-Link measurements
EM_DeleteById (with SLIDAS, single cycle message passing, preliminary)
[Figure: rate (kHz), 0 to 150, versus L2R group size, 0 to 50. One ROB with one ROBin, pages/event = 1; curves for 1016, 504 and 248 bytes/event.]
• Rate dominated by event fragment input traffic
• Max. input rate in best conditions = 145 MB/s (with no messages from ROB-host) (SLIDAS max bandwidth is 160 MB/s)

ROC measurements
[Figure: event rate (kHz), 0 to 140, versus number of MFCCs, 0 to 4; curves for 1% L2A and 5% L2A.]
• No S-Link, input emulated
• All ROB<->ROBin messages sent in single cycle mode
• rate dominated by PCI traffic and ultimately by MFCC CPU to ~120 kHz
[Diagram labels: two RIO2 8062 boards, each with PPC, MEM, PCI bus, DMA and PVIC PMC (one PVIC marked "Not used"), connected by VMEbus. Traffic labels: Input fragments + Messages + Event data; Messages + Event data; (Messages); Event data.]

I/O module configurations
To further minimize data movement and message passing, more than one external input can be handled by an IOM
[Figure: ROC Event Rates. Event rate (kHz), 0 to 200, for several arrangements of TRG, T, E and R modules on the ROLs. One ROB, no ROBins.]

ROC Event Rates with ROBins
ROC Event Rate vs.
# ROBins, 1 KB event fragments
[Figure: event rate (kHz), 80 to 120, versus number of ROBins per ROB, 0 to 5; curves for TRG + EBIF + ROBs, TRG/EBIF + ROBs, TRG + EBIF/ROBs and TRG/EBIF/ROBs.]
Max event rate increases when data collection (ROBins to EBIF) is done on the same SBC.
Expected ATLAS max ROL bandwidth: 1 KB x 75 (100) kHz

Conclusions
• A Read-Out Crate based on the DAQ/EF -1 design and deployed on COTS components has the functionality required by the ATLAS Trigger/DAQ community
• Software layering and minimal dependence of the software packages on the operating system add flexibility without a degradation of performance. They also facilitate porting (move to Motorola/Linux or Intel/Linux)
• The requirements of many Read-Out Links per Read-Out Buffer have led to the design of the ROBin, capable of receiving event data fragments at the expected Level-1 rates.
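
The expected ROL bandwidth quoted under "ROC Event Rates with ROBins" (1 KB fragments at 75 or 100 kHz) can be worked out and compared with the 145 MB/s best-case S-Link input rate reported under "S-Link measurements". The sketch below is only an arithmetic check; it assumes 1 KB = 1024 bytes and 1 MB = 10^6 bytes, neither of which the slides state, and the function name is hypothetical.

```python
FRAGMENT_BYTES = 1024           # assumption: "1 KB" means 1024 bytes
MEASURED_MAX_INPUT_MB_S = 145   # best-case S-Link input rate quoted in the slides


def rol_bandwidth_mb_per_s(rate_khz: float, fragment_bytes: int = FRAGMENT_BYTES) -> float:
    """Bandwidth of one Read-Out Link in MB/s (assuming 1 MB = 1e6 bytes)."""
    return rate_khz * 1e3 * fragment_bytes / 1e6


low = rol_bandwidth_mb_per_s(75)     # Level-1 rate of 75 kHz
high = rol_bandwidth_mb_per_s(100)   # Level-1 rate of 100 kHz
fits = high <= MEASURED_MAX_INPUT_MB_S

print(f"75 kHz: {low:.1f} MB/s, 100 kHz: {high:.1f} MB/s, within measured maximum: {fits}")
```

Under these assumptions one ROL needs roughly 77 to 102 MB/s, which is below the best-case measured input rate; that measurement was taken with no messages from the ROB-host, so it is an upper bound rather than an operating point.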