Milan Tier2: Components and Monitoring
Luca Vaccarossa
Milan, 14 December 2007

User Interface (UI)
• The machine that provides the commands for submitting jobs to the Grid
• voms-proxy-init / grid-proxy-init
• edg-job-submit <file.jdl>
• edg-job-status <jobid>
• edg-job-get-output

UI hosts:
• atlfarm008.mi.infn.it
• atlfarm010.mi.infn.it
• grid008.mi.infn.it

Computing Element
• t2-ce-01.mi.infn.it
• Grid gateway
• PBS server (TORQUE)
• MAUI scheduler

The farm's batch system is Torque + Maui. The queues enabled for local users are:
• local (max CPU time 48h, max walltime 72h)
• short (short queue with reserved CPUs, max CPU time 40m, max walltime 2h)

Worker Nodes (WN)
• grid009.mi.infn.it
• grid012.mi.infn.it
• grid016.mi.infn.it
• grid017.mi.infn.it
• grid018.mi.infn.it
• grid019.mi.infn.it
• grid021.mi.infn.it
• grid022.mi.infn.it
• grid023.mi.infn.it
• grid024.mi.infn.it
• grid025.mi.infn.it
• grid026.mi.infn.it
• t2-wn-02.mi.infn.it
• t2-wn-03.mi.infn.it
• t2-wn-04.mi.infn.it
• t2-wn-05.mi.infn.it
• t2-wn-06.mi.infn.it
• t2-wn-07.mi.infn.it
• t2-wn-08.mi.infn.it
• t2-wn-09.mi.infn.it
• t2-wn-13.mi.infn.it
• t2-wn-14.mi.infn.it
• t2-wn-15.mi.infn.it
• t2-wn-16.mi.infn.it
• t2-wn-17.mi.infn.it
• t2-wn-18.mi.infn.it
• t2-wn-19.mi.infn.it
• t2-wn-21.mi.infn.it
• t2-wn-22.mi.infn.it
• t2-wn-23.mi.infn.it
• t2-wn-24.mi.infn.it

PBS commands
• showq: show job status and some job info
• showbf [-v]: check for immediately available CPUs and nodes
• checkjob [-v] <job_id> or qstat -f <job_id>: check job status
• canceljob <job_id>: cancel a job (essentially sends a qdel to the pbs_server)
• showstart [-h] <job_id>: show when the job is scheduled to start
• pbsnodes -a | less: lists the WNs, showing which ones have no jobs
• Report them to [email protected]

Priority and FairShare
• Priority: diagnose -p
• http://tier2.mi.infn.it/priorita.txt
• FS: diagnose -f
• http://tier2.mi.infn.it/fairshare.txt

Who am I?
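The entries that follow pair a certificate subject (DN), written in double quotes, with the local account it maps to. As a minimal sketch, assuming one such entry per line (the layout of a grid-mapfile), the Python below turns that text into a lookup table. The function name `parse_mapping` and the two-line sample are illustrative only and are not part of the Tier2 configuration.

```python
import shlex


def parse_mapping(text):
    """Return {DN: local_account} from lines of the form: "DN" account."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex honours the double quotes, so spaces inside the DN survive
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError(f"unexpected entry: {line!r}")
        dn, account = parts
        table[dn] = account
    return table


# Two entries copied from the slides, one per line (assumed layout).
SAMPLE = '''
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Tommaso Lari" lari
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Umberto De Sanctis" atlas012
'''

if __name__ == "__main__":
    for dn, account in parse_mapping(SAMPLE).items():
        print(account, "<-", dn)
```

Looking up your own DN in the resulting dictionary gives the local account your Grid jobs run under, which is the question the slide title asks.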
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Silvia Resconi/[email protected]" resconi "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Tommaso Lari" lari Chi sono io ? "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Attilio Andreazza" andreazz "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Clara Troncon" troncon "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Leonardo Carminati" lcarmina Chi sono io ? "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Donatella Cavalli" cavalli "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Caterina Pizio" pizio Chi sono io ? "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Umberto De Sanctis" atlas012 "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Simone Montesano" atlas020 Chi sono io ? "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Chiara Tamarindi" atlas033 • "/C=IT/O=INFN/OU=Personal Certificate/L=Genova/CN=Fabrizio Parodi" parodi • "/C=IT/O=INFN/OU=Personal Certificate/L=Genova/CN=Bianca Osculati" osculati GridView • • • • • http://gridview.cern.ch/GRIDVIEW/ Monitoring and Visualization Tool for LCG Data Transfer Job Status Service Availability SAM Tests • • • • • https://lcg-sam.cern.ch:8443/sam/sam.py Certificato nel browser Test automatici SAM on demand? https://cic.gridops.org/index.php?section=r c&page=samadmin Ganglia • http://ganglia.sourceforge.net/ • “Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.” Ganglia • It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. • It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.