Milan Tier2
Components and Monitoring
Luca Vaccarossa
Milan, 14 December 2007
User Interface (UI)
• The machine that provides the commands for submitting jobs to the Grid
• voms-proxy-init / grid-proxy-init
• edg-job-submit <file.jdl>
• edg-job-status <jobid>
• edg-job-get-output
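As a rough illustration of what the `<file.jdl>` argument might contain, the sketch below renders a minimal job description from Python. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `OutputSandbox`) are standard JDL attributes, but the helper `render_jdl` and the sample values are hypothetical and are not taken from the Milan setup.

```python
def render_jdl(executable, arguments="", stdout="std.out", stderr="std.err"):
    """Return the text of a minimal JDL file (hypothetical helper)."""
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        f'StdOutput = "{stdout}";',
        f'StdError = "{stderr}";',
        # Files to bring back with edg-job-get-output.
        f'OutputSandbox = {{"{stdout}", "{stderr}"}};',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed example job: print the name of the worker node it ran on.
    print(render_jdl("/bin/hostname"))
```

The resulting text can be saved to a file and passed to the submit command; a real job would usually also need an input sandbox and VO-specific requirements.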
User Interface (UI)
• atlfarm008.mi.infn.it
• atlfarm010.mi.infn.it
• grid008.mi.infn.it
Computing Element
• t2-ce-01.mi.infn.it
• Grid gateway
• PBS server (TORQUE)
• MAUI scheduler
Computing Element
The farm's batch system is Torque + Maui.
The queues enabled for local users are:
• local (max CPU time 48 h, max walltime 72 h)
• short (short queue with reserved CPUs, max CPU time 40 min, max walltime 2 h)
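The two sets of limits can be turned into a small selection rule. The sketch below is only an illustration: the function `pick_queue` and the table `QUEUE_LIMITS` are hypothetical names, and the rule "try the tighter queue first" is an assumption, not a site policy.

```python
# Limits in minutes, copied from the queue list above.
QUEUE_LIMITS = {
    "short": {"cpu": 40, "wall": 2 * 60},
    "local": {"cpu": 48 * 60, "wall": 72 * 60},
}


def pick_queue(cpu_minutes, wall_minutes):
    """Return the first queue whose limits cover the job, or None."""
    for name in ("short", "local"):  # try the tighter queue first
        limits = QUEUE_LIMITS[name]
        if cpu_minutes <= limits["cpu"] and wall_minutes <= limits["wall"]:
            return name
    return None  # the job exceeds both queues' limits


if __name__ == "__main__":
    print(pick_queue(30, 90))
    print(pick_queue(600, 900))
```

A job that fits neither queue yields `None`, which signals that it should be split or shortened before submission.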
Worker Nodes (WN)
• grid009.mi.infn.it
• grid012.mi.infn.it
• grid016.mi.infn.it
• grid017.mi.infn.it
• grid018.mi.infn.it
• grid019.mi.infn.it
• grid021.mi.infn.it
• grid022.mi.infn.it
• grid023.mi.infn.it
• grid024.mi.infn.it
• grid025.mi.infn.it
• grid026.mi.infn.it
• t2-wn-02.mi.infn.it
• t2-wn-03.mi.infn.it
• t2-wn-04.mi.infn.it
• t2-wn-05.mi.infn.it
Worker Nodes (WN)
• t2-wn-06.mi.infn.it
• t2-wn-07.mi.infn.it
• t2-wn-08.mi.infn.it
• t2-wn-09.mi.infn.it
• t2-wn-13.mi.infn.it
• t2-wn-14.mi.infn.it
• t2-wn-15.mi.infn.it
• t2-wn-16.mi.infn.it
• t2-wn-17.mi.infn.it
• t2-wn-18.mi.infn.it
• t2-wn-19.mi.infn.it
• t2-wn-21.mi.infn.it
• t2-wn-22.mi.infn.it
• t2-wn-23.mi.infn.it
• t2-wn-24.mi.infn.it
PBS Commands
showq
Show job status and some job info
showbf [-v]
Check for immediately available CPUs and nodes
checkjob [-v] <job_id> | qstat -f <job_id>
Check job status
canceljob <job_id>
Cancel a job, sending essentially a qdel to the pbs_server
showstart [-h] <job_id>
Show when job is scheduled to start
PBS Commands
• pbsnodes -a | less
• Shows which WNs have no jobs
• Report them to [email protected]
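Scanning the node listing by eye gets tedious with some thirty worker nodes, so the check could be scripted. The sketch below assumes the usual layout of the listing (an unindented node name followed by indented `attribute = value` lines, with a `jobs = ...` line present only when the node is running something). The function `idle_nodes` is a hypothetical helper and the job IDs in `SAMPLE` are made up.

```python
def idle_nodes(pbsnodes_text):
    """Return names of nodes whose record has no 'jobs = ...' attribute."""
    idle, current, has_jobs = [], None, False
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            continue  # blank line between node records
        if not line[0].isspace():
            # An unindented line starts a new node record.
            if current is not None and not has_jobs:
                idle.append(current)
            current, has_jobs = line.strip(), False
        elif line.strip().startswith("jobs ="):
            has_jobs = True
    if current is not None and not has_jobs:
        idle.append(current)  # flush the last record
    return idle


# Illustrative listing only; real output carries more attributes.
SAMPLE = """\
t2-wn-02.mi.infn.it
     state = job-exclusive
     np = 2
     jobs = 0/101.t2-ce-01.mi.infn.it, 1/102.t2-ce-01.mi.infn.it

t2-wn-03.mi.infn.it
     state = free
     np = 2
"""

if __name__ == "__main__":
    print(idle_nodes(SAMPLE))
```

A node without jobs is not necessarily broken, so the `state` line is worth checking before reporting it.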
Priority and FairShare
• Priority: diagnose -p
• http://tier2.mi.infn.it/priorita.txt
• FS: diagnose -f
• http://tier2.mi.infn.it/fairshare.txt
Who am I?
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Silvia Resconi/[email protected]" resconi
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Tommaso Lari" lari
Who am I?
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Attilio Andreazza" andreazz
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Clara Troncon" troncon
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Leonardo Carminati" lcarmina
Who am I?
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Donatella Cavalli" cavalli
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Caterina Pizio" pizio
Who am I?
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Umberto De Sanctis" atlas012
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Simone Montesano" atlas020
Who am I?
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Chiara Tamarindi" atlas033
"/C=IT/O=INFN/OU=Personal Certificate/L=Genova/CN=Fabrizio Parodi" parodi
"/C=IT/O=INFN/OU=Personal Certificate/L=Genova/CN=Bianca Osculati" osculati
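Each entry in these slides pairs a quoted certificate subject (DN) with a local account name. Assuming one such pair per line, a lookup table can be built with a few lines of Python. The helpers `parse_mapping` and `common_name` are hypothetical names; the two sample entries are taken from the slides.

```python
import shlex


def parse_mapping(text):
    """Map each quoted certificate DN to the local account that follows it."""
    mapping = {}
    for line in text.splitlines():
        parts = shlex.split(line)  # honours the double quotes around the DN
        if len(parts) == 2:
            dn, account = parts
            mapping[dn] = account
    return mapping


def common_name(dn):
    """Extract the CN field from a slash-separated DN, or None if absent."""
    for field in dn.split("/"):
        if field.startswith("CN="):
            return field[3:]
    return None


SAMPLE = '''\
"/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Tommaso Lari" lari
"/C=IT/O=INFN/OU=Personal Certificate/L=Genova/CN=Bianca Osculati" osculati
'''

if __name__ == "__main__":
    for dn, account in parse_mapping(SAMPLE).items():
        print(common_name(dn), "->", account)
```

Lines that do not split into exactly two fields are skipped, so blank lines and malformed entries do not raise errors.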
GridView
• http://gridview.cern.ch/GRIDVIEW/
• Monitoring and Visualization Tool for LCG
• Data Transfer
• Job Status
• Service Availability
• https://lcg-sam.cern.ch:8443/sam/sam.py
• Certificate required in the browser
• Automatic tests
• SAM on demand?
• https://cic.gridops.org/index.php?section=rc&page=samadmin
Ganglia
• http://ganglia.sourceforge.net/
• “Ganglia is a scalable distributed
monitoring system for high-performance
computing systems such as clusters and
Grids.”
Ganglia
• It relies on a multicast-based listen/announce
protocol to monitor state within clusters and uses
a tree of point-to-point connections amongst
representative cluster nodes to federate clusters
and aggregate their state.
• It leverages widely used technologies such as
XML for data representation, XDR for compact,
portable data transport, and RRDtool for data
storage and visualization.
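Because the cluster state is represented as XML, it can be read with any standard XML parser. The sketch below assumes the element and attribute names commonly seen in Ganglia's XML (`HOST`, `METRIC`, `NAME`, `VAL`). The document in `SAMPLE` is a made-up, heavily simplified example, and `host_metrics` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def host_metrics(xml_text, metric_name):
    """Return {host name: metric value as string} for one metric."""
    root = ET.fromstring(xml_text)
    values = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = metric.get("VAL")
    return values


# Made-up sample; real output carries many more attributes and metrics.
SAMPLE = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node-a" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
    </HOST>
    <HOST NAME="node-b" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(host_metrics(SAMPLE, "load_one"))
```

Values are kept as strings, as they appear in the XML; converting them to numbers is left to the caller because metric types differ.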