The New Frontiers of HPC
Sergio Re, Sales & Marketing Manager, Silicon Graphics Italia

SGI Today
A company focused on innovation
• More than $500M in revenue
• More than 1,700 employees
• 800+ field staff working directly with customers
• 300+ R&D engineers
• 6,000+ active customers in more than 50 countries

Our Technical Strengths
• The world's most advanced Linux HPC systems
• A unique scalable architecture
• A global shared-memory system
• File systems and storage sharing
• Services and consulting

Revenue by Market Segment
• Revenue mix: Services 48%, Core 36%, Legacy 16%
• Market segments: Sciences 36%, Defense & Intelligence 35%, Engineering Analysis 20%, Enterprise Business Management/Media 9%
• Geographic contribution: Americas 59%, Europe 25%, Rest of World 16%

SGI Systems: Highly Integrated and Massively Scalable
• High-performance computing
• Advanced graphics
• Storage

Joint Development with Partners
• SGI and Intel collaborate on system design requirements for large-scale computing with Itanium and Xeon CPUs
• SGI maintains a close working relationship with Novell on Linux scalability, performance, and support for SGI servers and storage systems
• SGI works closely with Red Hat on Linux support, with special emphasis on security and standards compliance
• SGI has contributed thousands of lines of code to the Linux community, including code that supports large-scale computing, reliability, and stability
• SGI and Oracle are jointly developing, selling, and marketing enterprise solutions for data-intensive problems

Computing Requirements
(Diagram: workloads placed according to their demand on CPU, memory, and I/O: weather simulation, general-purpose computing applications, databases, signal processing, web servers, media streaming)

SGI Advanced HPC Platform
• Large SMP workflow: SGI® Altix® 4700 (Intel® Itanium® 2 based)
• Midrange SMP/cluster workflow: SGI® Altix® 450 (Intel® Itanium® 2 based)
• Application appliance: SGI® f1200
• Cluster workflow: SGI® Altix® XE and ICE 8200 (Intel® Xeon® based)
• SGI Workflow Ready Solution
Shared across the whole workflow continuum:
• Common Linux OS and development tools
• SGI scalable shared file servers and storage solutions
• Common workload management tools
• Systems management and monitoring

SGI: Complete HPC Solution on Linux®
SGI ProPack™ for Linux and third-party tools:
• Compilers:
– Intel C++ and Fortran Compilers for Linux
– GNU compilers for C and Fortran 77
• Libraries:
– SGI Message Passing Toolkit
– SGI Scientific Computing Software Library
– SGI Flexible File I/O
– Intel Math Kernel Library
– Intel Integrated Performance Primitives
– NAG C, Parallel, Fortran, Fortran SMP, F90
• Automated parallelization tools:
– Parallel Software Products ParaWise
• Open-source development tools:
– Linuxlinks.org, Freshmeat.net, SourceForge®.net
• SGI data management software:
– CXFS™ cluster file system
– DMF hierarchical storage management
• Debuggers:
– Intel Debugger
– Etnus® TotalView®
– GNU GDB
– Allinea Software Distributed Debugging Tool
• Performance and analysis tools:
– Intel VTune™ Performance Analyzer
– Intel Trace Analyzer and Trace Collector
– SGI Performance Co-Pilot™
– SGI pfmon and profile.pl
– SGI Histx
• Other SGI ProPack tools:
– REACT 4.2 (real-time support)
– XVM
– NUMA tools (cpuset, dlook, dplace)
– Embedded Support Partner (ESP)
– Graphics support
Standard Linux distributions:
• Novell SUSE™ Linux Enterprise Server 9 and 10
• Red Hat® Enterprise Linux 4

ICE 8200: Breakthrough Reliability
• Carlsbad IRU: 128 cores and no cables
• Redundant, hot-swap power and cooling
• Fully buffered DIMMs to reduce transient errors
• Blade design for rapid serviceability
• InfiniBand backplane for high signal reliability
IRU front view (10U, 24-inch EIA form factor, 17.50 in H × 22.5 in W × 32 in D):
– (16) 2-socket nodes
– (2) 4x DDR InfiniBand switch blades, each with (1) 24-port IB switch ASIC
– (7+1) 1625 W front-end power supplies, 12 VDC output
– (1) Chassis Management Controller (CMC)

ICE 8200: Breakthrough Performance Density
Up to 512 cores and 6 TFlops per rack
• Each 42U rack (30 in W × 40 in D) has:
– (4) IRUs with (16) 2-socket Carlsbad nodes each
– (128) DP Intel® Xeon® sockets
– (48) 4x DDR InfiniBand ports on (4) backplanes for a torus topology
• 19-inch standard rack also supported
• SGI offers optional chilled-water cooling units for large system configurations
• Power: 39.5 kW (high-bin SKUs, (4) FB-DIMMs per socket); 31.6 kW assuming an 80% system-level derate
• Rack weight ~2,050 lb (246 lb/ft² footprint)

SGI Scalable ccNUMA Architecture
(Diagram: basic nodes, each with cached CPUs connected through an interface chip to physical memory, joined by the NUMAlink™ interconnect fabric; RASC™ FPGA modules, scalable GPUs, and general-purpose I/O interfaces attach to the fabric through TIO chips. Open systems, scalable infrastructure.)

SGI® Altix® 450
• “Plug and Solve” blade form factor
• Half-rack or full-rack configurations
– Third-party rack option
• 5U IRU (Individual Rack Unit) chassis
– Chassis-only option
• 608 GB single-system-image (SSI) memory
– Increasing to 912 GB in 1H CY07
• 2 to 38 sockets
– 4 to 76 cores
(Diagram: 5U IRU chassis with single- and double-width blade slots, an I/O slot, power supplies, and NUMAlink connections)

Bringing It Together: Solution Components
ISC Star-P™ with SGI® Altix® and Altix® XE
SGI Altix and Altix XE:
• Scalable servers, clusters, and supercomputers
• Cost-efficient, reliable Altix XE clusters with leading density and power efficiency
• Advanced scalability: up to 512 processors per Altix server and 128 TB of globally addressable memory per system
• Complete Linux solution for HPC
ISC Star-P:
• Interactive parallel computing platform
• Bridges MATLAB and Altix servers
• Works with familiar desktop tools while leveraging an HPC system for computationally intensive tasks
• Automatic and transparent: no new programming
Images courtesy of Silicon Graphics, Inc.; Interactive Supercomputing Inc.

How Star-P Works
• Star-P consists of desktop and server software
• Desktop software: Star-P Client
– Overloads or intercepts desktop tool functions
– Connects and communicates securely with the server software
• Server software: Star-P Server on SGI® Altix® or SGI® Altix® XE
– Manages and directs resources: memory, CPUs, and I/O
– Contains world-class libraries for parallel execution
– User and session management

What's the Value?
• Desktop users:
– No change of tools or working habits
– Interactivity on parallel machines and for large data
– No reprogramming: no C, Fortran, or MPI
– Reduced run times: not hours or weeks
– Continued model optimization
• Organization:
– Collapse development cycles
– Reduce costs
– Broaden usage
– Shorten solution time
– Accelerate research

Parallel Development Takes Too Long
• Months or years are spent porting from desktops to parallel systems
• No interactivity on parallel machines from the desktop
• Little ability to iterate
• Long compute times for batch runs: hours to days
• Analysts' ability to optimize the model is limited

Using Star-P: Serial Operations
• Use MATLAB for:
– File Editor
– Profiler
– Debugger
– Array Editor
– Desktop visualization
– Small calculations
• Running Star-P does not affect the normal MATLAB environment
• Problems that can be solved on the desktop stay on the desktop

SGI® RASC™ Solution: Simplifies Development and Improves Programmer Efficiency
• GNU Debugger (GDB), FPGA-aware: simultaneous debugging of both the CPU-based application and the FPGA-accelerated application
• RASC Abstraction Layer (RASCAL), SGI provided: enables serial or parallel FPGA scaling
• RASC API and Core Services Library: provides tools to develop reconfigurable computing elements in a multi-user, multiprocessing environment
• Third-party HLL development tools: Mitrionics Mitrion-C, Impulse C, and ROCCC, supported within the RASC environment
• Synplicity Synplify Pro and Xilinx Synthesis Technology: for advanced incremental and modular design methodologies

How Do FPGAs Differ from Traditional CPUs?
(Chart: application run-time breakdown into algorithm, memory calls, and branch instructions, as % of runtime, for App 1 to App 5)
• Identify the RASC-appropriate algorithm
• Export the algorithm to RASC
• Directly map computationally intensive algorithms to hardware with RC100 technology
(Chart: application run-time comparison between the traditional CPU-only method and the RC100 method, with the key algorithm running on the FPGA; the shorter algorithm execution time yields the job run-time savings)

SGI® RASC™ RC100 Blade
(Diagram: two V4LX200 FPGAs with SRAM banks, each linked via SSP to a TIO chip with NL4 ports; a PCI-attached loader connects to the FPGAs via SelectMAP)

SGI Workflow Ready Solution: Segment Example, Fluid Structure Interaction (FSI)
Any combination of Altix servers and Altix XE systems sharing storage resources
SGI Solution: FSI Workload
• Capability: SGI Altix 450/4700 SMP and super head node, to minimize time to solution for the largest and most demanding problems
• Capacity: SGI Altix XE 1200/1300 (x86-64) clusters, a cost-effective solution and performance leader for most analyses
• Fabrics: InfiniBand or GigE compute fabric; Fibre Channel, InfiniBand, or GigE storage fabric
• SGI® Scalable NAS (and other shared file servers)
Sizing:
1. Altix XE: modest memory addressability (~24 GB+/core)
2. Altix 450/4700: large memory addressability (~48 GB/core); option for bandwidth (B/W) blades
3. Storage: high-speed SAS (~250 GB/core); 4 SAS disks per XE node
Optimally meets the diverse needs of all workloads and procurement drivers

Advanced Graphics

Storage

SGI® InfiniteStorage Hardware
• Multi-purpose RAID systems: RAID 4500, 6700
– Max performance; 4 Gb FC or IB; enterprise software; FC RAID / SATA
• Low-cost SATA: RAID 350
• Streaming / real-time: multiple high-resolution streams; isochronous; 4 Gb connectivity; 500 GB SATA drives
• Ultra-dense RAID: RAID 10000, 4000
– 4 Gb Fibre Channel; ultimate price/performance; FC RAID / SATA; ultra-high density; tape complement or replacement; one rack holds 240 TB
• SAN / NAS:
– Completely integrated
– Easy to deploy
– Grows with the customer's business
• JBOD 120: easy-to-deploy modular scalability

Data Management Software Stack
• Software: DMF, CXFS, NFS (RDMA-accelerated NAS), XFS, InfiniteStorage Appliance Manager®
• Storage product integration: SGI systems (Altix servers), third-party disks, Fibre Channel switches

SGI® InfiniteStorage Solutions
SGI® InfiniteStorage Data Migration Facility (DMF) transparently migrates files from online storage to near-line storage according to the assigned time-based criteria.
• This lowers TCO
• Increases ROI and productivity
• Is easier to manage
• Reduces the risk of data loss
• Protects the initial investment
• Combines data availability with data security

SGI® InfiniteStorage Shared File System CXFS™
• All files are shared
• Files are not copied
• No wasted space
• Saves time
• Saves money
(Diagram: files A to I shared by all systems)

Designed for: Decision Support Centres, Surveillance/Homeland Security/Crisis management, C4I battlefield command and control
Interested? It's called the Pixelfusion Environment.
(Diagram: media fusion process: input (native render, network IP streams) → fusion → output (render to pipes for local display, stream to IP, record/retrieve to local streams or the network))

HPC Technology Investment Strategy
• Packaging
– Consistency
– Density and reliability
– Energy efficiency
• Interconnect
– Reduce cost
– Increase value
• Data management

Oracle® TimesTen: In-Memory DB Customer Benchmark Results
Government customer's data and tests: incumbent 96 GB system vs. Altix with 960 GB
Improvement:
• Ingest order records: 5x
• Ingest person records: 12x
• Query: 1 per sec (est.) vs. 91K/sec
• Join data: 1 every min vs. 13K/sec
• Sub-query: 1 every 5 min vs. 2.5K/sec
May '06

SGI Altix Servers Support More Memory
Maximum memory and memory per core:
• SGI Altix 4700: 128 TB, 128 GB/core
• IBM p595: 2 TB, 32 GB/core
• HP Superdome: 2 TB, 16 GB/core
• Sun Enterprise 25K: 1 TB, 8 GB/core
Source: Ideas International, Inc., February 2007
• SGI Altix 4700 supports more memory
• Fewer cores are required to support the same amount of memory
• Lower TCO:
– Spend less on processors
– Spend less on software licenses

SGI Altix 4700 Requires Less Floor Space
Dense system packaging is one of SGI's core competencies
System footprint:
• SGI Altix 4700: 45 in × 26 in = 1,170 in²
• HP SuperDome: 60 in × 48 in = 2,880 in²
• Sun E25K: 65 in × 33 in = 2,145 in²
• IBM p595: 52 in × 31 in = 1,612 in²

SGI Altix Innovative Power Architecture
SGI Servers Are Twice as Efficient
• Typical power architecture: AC to 48 VDC, then board-level conversions to 12 V, 3.3 V, 1.85 V, and 1.2 V on additional boards; stage efficiencies ~80%, ~80%, ~70%: 80% × 80% × 70% = 45% efficiency
• SGI power architecture: no 48 V conversion; AC to 12 V at ~90%, then high-efficiency power converters (85%) to 3.3 V, 1.85 V, and 1.2 V: 90% × 85% ≈ 76% efficiency
• Proprietary power design

Interconnect Strategy: Reduce Cost and Increase Capability
• NUMAlink4 (today):
– Custom copper cable ($450 for 5 m)
– Custom signaling
– Custom protocol
• NUMAlink5: hardware extension of InfiniBand ('09):
– COTS InfiniBand 12x copper cable ($150 for 5 m)
– COTS SerDes
– Custom protocol (higher capability)
• Copper: cable weight is becoming significant (picture credit: LRZ)

SGI NUMAlink System Architecture
Very large shared memory:
• Globally addressable
• Low latency
• High bandwidth
• Many ports
(Diagram: CPUs and MPUs attached to the very large shared memory)

Sergio Re – Tel. 02.36547100 – E-mail [email protected] – WWW.SGI.COM
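The ICE 8200 "Breakthrough Performance Density" slide gives several figures that can be cross-checked with simple arithmetic: sockets and cores per rack, derated power, and floor loading. The sketch below is a back-of-envelope check, not an SGI sizing tool. It assumes quad-core Xeon sockets, because the slides give 512 cores on 128 sockets but do not name the CPU model. The constant names are hypothetical.

```python
"""Back-of-envelope check of the ICE 8200 per-rack figures (a sketch, not an SGI tool)."""

IRUS_PER_RACK = 4        # (4) IRUs per 42U rack
BLADES_PER_IRU = 16      # (16) Carlsbad 2-socket nodes per IRU
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 4     # assumption: quad-core Xeon (512 cores / 128 sockets)

RACK_POWER_KW = 39.5     # high-bin SKUs, (4) FB-DIMMs per socket
DERATE = 0.80            # system-level derate quoted on the slide
RACK_WEIGHT_LB = 2050
RACK_WIDTH_IN = 30
RACK_DEPTH_IN = 40
SQ_IN_PER_SQ_FT = 144


def rack_summary() -> dict:
    """Return sockets, cores, derated power (kW) and floor loading (lb/ft^2) for one rack."""
    sockets = IRUS_PER_RACK * BLADES_PER_IRU * SOCKETS_PER_BLADE
    cores = sockets * CORES_PER_SOCKET
    derated_kw = RACK_POWER_KW * DERATE
    footprint_sq_ft = RACK_WIDTH_IN * RACK_DEPTH_IN / SQ_IN_PER_SQ_FT
    floor_load = RACK_WEIGHT_LB / footprint_sq_ft
    return {
        "sockets": sockets,
        "cores": cores,
        "derated_kw": round(derated_kw, 1),
        "floor_load_lb_per_sq_ft": round(floor_load),
    }


if __name__ == "__main__":
    for key, value in rack_summary().items():
        print(f"{key}: {value}")
```

The 31.6 kW and 246 lb/ft² figures both follow from plain ratios of the quoted rack data. That makes them a quick consistency check when planning cooling and floor space.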
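The "How Star-P Works" slide describes a desktop client that overloads or intercepts desktop tool functions and forwards the work to a server that holds the data and runs it in parallel. The sketch below illustrates that general client-proxy pattern in Python. It is not Star-P's API. `FakeServer`, `RemoteArray`, and their methods are hypothetical, and the "server" runs in the same process so the example is self-contained.

```python
"""Hypothetical sketch of the client-proxy pattern described for Star-P (not its real API)."""
from typing import Dict, List


class FakeServer:
    """Stand-in for the server side: stores arrays by handle and executes operations."""

    def __init__(self) -> None:
        self._arrays: Dict[int, List[float]] = {}
        self._next_handle = 0
        self.calls: List[str] = []  # log of operations forwarded by the client

    def _store(self, data: List[float]) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._arrays[handle] = data
        return handle

    def upload(self, data: List[float]) -> int:
        self.calls.append("upload")
        return self._store(list(data))

    def add(self, a: int, b: int) -> int:
        self.calls.append("add")
        return self._store([p + q for p, q in zip(self._arrays[a], self._arrays[b])])

    def scale(self, a: int, k: float) -> int:
        self.calls.append("scale")
        return self._store([p * k for p in self._arrays[a]])

    def total(self, a: int) -> float:
        self.calls.append("total")
        return sum(self._arrays[a])

    def download(self, a: int) -> List[float]:
        self.calls.append("download")
        return list(self._arrays[a])


class RemoteArray:
    """Client-side proxy: holds only a handle; overloaded operators call the server."""

    def __init__(self, server: FakeServer, handle: int) -> None:
        self._server = server
        self._handle = handle

    @classmethod
    def from_list(cls, server: FakeServer, data: List[float]) -> "RemoteArray":
        return cls(server, server.upload(data))

    def __add__(self, other: "RemoteArray") -> "RemoteArray":
        return RemoteArray(self._server, self._server.add(self._handle, other._handle))

    def __mul__(self, k: float) -> "RemoteArray":
        return RemoteArray(self._server, self._server.scale(self._handle, k))

    def sum(self) -> float:
        return self._server.total(self._handle)

    def fetch(self) -> List[float]:
        return self._server.download(self._handle)


if __name__ == "__main__":
    server = FakeServer()
    a = RemoteArray.from_list(server, [1.0, 2.0, 3.0])
    b = RemoteArray.from_list(server, [10.0, 20.0, 30.0])
    c = (a + b) * 2   # reads like ordinary arithmetic on the desktop side
    print(c.sum())    # 132.0
    print(c.fetch())  # [22.0, 44.0, 66.0]
    print(server.calls)
```

Only handles cross the client/server boundary until `fetch()` is called. This mirrors the slides' point that large data stays on the server while the desktop code still reads like ordinary arithmetic.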
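The "How Do FPGAs Differ from Traditional CPUs?" slide argues that moving the dominant algorithm to RASC shortens the whole job. How much it helps depends on the share of runtime that algorithm represents, and Amdahl's law makes this explicit. The sketch below applies Amdahl's law, a standard model that the slides do not state themselves. The 90% fraction and 10x kernel speedup are hypothetical inputs, not RC100 measurements.

```python
"""Amdahl's-law estimate for offloading one algorithm to an accelerator (hypothetical inputs)."""


def offload_speedup(accel_fraction: float, kernel_speedup: float) -> float:
    """Overall job speedup when `accel_fraction` of the CPU-only runtime
    is accelerated by `kernel_speedup` and the rest is unchanged."""
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = (1.0 - accel_fraction) + accel_fraction / kernel_speedup
    return 1.0 / remaining


def speedup_ceiling(accel_fraction: float) -> float:
    """Upper bound on overall speedup as the kernel speedup grows without limit."""
    if accel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - accel_fraction)


if __name__ == "__main__":
    # Hypothetical: the algorithm is 90% of runtime and runs 10x faster on the FPGA.
    print(f"{offload_speedup(0.90, 10.0):.2f}x")  # 5.26x
    print(f"{speedup_ceiling(0.90):.1f}x")        # 10.0x
```

Even an arbitrarily fast FPGA cannot exceed 1 / (1 − f), where f is the accelerated fraction. That is why the first step on the slide is to identify which algorithm dominates the runtime.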
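The DMF slide says files are migrated transparently from online to near-line storage according to time-based criteria. The sketch below shows one way an age-based policy could choose candidates: files not accessed for at least a minimum age are taken oldest first until a target amount of online space is freed. This is a hypothetical illustration, not DMF's actual policy engine or configuration. `FileRecord`, `select_for_migration`, and the thresholds are invented for the example.

```python
"""Hypothetical age-based migration policy, illustrating the idea behind DMF (not DMF itself)."""
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: datetime


def select_for_migration(
    files: List[FileRecord],
    now: datetime,
    min_age: timedelta,
    bytes_to_free: int,
) -> List[FileRecord]:
    """Pick files idle for at least `min_age`, oldest first, until `bytes_to_free` is reached."""
    candidates = sorted(
        (f for f in files if now - f.last_access >= min_age),
        key=lambda f: f.last_access,  # oldest access time first
    )
    selected: List[FileRecord] = []
    freed = 0
    for record in candidates:
        if freed >= bytes_to_free:
            break
        selected.append(record)
        freed += record.size_bytes
    return selected


if __name__ == "__main__":
    now = datetime(2007, 6, 1)
    files = [
        FileRecord("/data/a.dat", 500, now - timedelta(days=90)),
        FileRecord("/data/b.dat", 1000, now - timedelta(days=10)),
        FileRecord("/data/c.dat", 300, now - timedelta(days=200)),
        FileRecord("/data/d.dat", 700, now - timedelta(days=60)),
    ]
    chosen = select_for_migration(files, now, timedelta(days=30), bytes_to_free=600)
    print([f.path for f in chosen])  # ['/data/c.dat', '/data/a.dat']
```

Recently used files (here `/data/b.dat`) always stay online. Older ones move only as far as needed to reach the space target, which keeps recalls from near-line storage rare.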
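Two of the closing slides rest on simple arithmetic. First, fewer cores are needed to hold a given amount of memory when memory per core is higher. Second, end-to-end power efficiency is the product of the conversion-stage efficiencies. The sketch below reproduces both using the memory-per-core figures quoted from Ideas International and the approximate stage percentages shown for each power architecture (~80%, ~80%, ~70% typical; ~90% and 85% for the SGI design). The 1 TB target is a hypothetical example.

```python
"""Arithmetic behind the memory-per-core and power-efficiency slides (a sketch)."""
from math import ceil, prod
from typing import Sequence

# Memory per core as quoted on the slide (source: Ideas International, February 2007).
MEM_PER_CORE_GB = {
    "SGI Altix 4700": 128,
    "IBM p595": 32,
    "HP Superdome": 16,
    "Sun Enterprise 25K": 8,
}


def cores_for_memory(target_gb: float, gb_per_core: float) -> int:
    """Minimum number of cores needed to address `target_gb` of memory."""
    return ceil(target_gb / gb_per_core)


def chain_efficiency(stage_efficiencies: Sequence[float]) -> float:
    """End-to-end efficiency of power-conversion stages connected in series."""
    return prod(stage_efficiencies)


if __name__ == "__main__":
    target_gb = 1024  # hypothetical 1 TB working set
    for system, gb_per_core in MEM_PER_CORE_GB.items():
        print(f"{system}: {cores_for_memory(target_gb, gb_per_core)} cores")

    typical = chain_efficiency([0.80, 0.80, 0.70])  # three conversion stages
    sgi = chain_efficiency([0.90, 0.85])            # no 48 V stage
    print(f"typical: {typical:.1%}, SGI: {sgi:.1%}")
```

For a 1 TB working set this gives 8 cores on the Altix 4700 versus 32, 64, and 128 on the other systems. The same multiplication of stage efficiencies shows why removing a conversion stage matters more than tuning any single one.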