Publication patterns in HEP computing
M. G. Pia (1), T. Basaglia (2), Z. W. Bell (3), P. V. Dressendorfer (4)
(1) INFN Genova, Genova, Italy
(2) CERN, Geneva, Switzerland
(3) ORNL, Oak Ridge, TN, USA
(4) IEEE, Piscataway, NJ, USA
NSS 2012, Anaheim, CA
Maria Grazia Pia, INFN Genova

Analysis topics
What they publish
How much
Where
Citations
Technology vs physics
Software vs hardware
Software/DAQ-trigger
General tools
− Geant4
− ROOT
HEP experiments
− LEP: ALEPH, DELPHI, L3, OPAL
− CDF
− BaBar
− LHC: ALICE, ATLAS, CMS, LHCb, TOTEM
Grid computing
− LCG
No time to report all the results

Data sources
Thomson-Reuters: ISI Web of Knowledge
− CERN subscription: since 1970, conference database not included
− Search by keywords, collaboration name
Journal web sites
− IEEE TNS
− NIM, Comp. Phys. Comm. (Elsevier)
− JINST (IOP/SISSA)
➤ Full-text searches
CERN databases
− CERN Document System
− Greybook
Years: 1982-2011 (LEP), 1992-2011 (BaBar, LHC)
− Reproducible sample for citation analysis
Publication analysis extended to 30 September 2012

Data sample
Contamination
− Non-pertinent entries in the data sample
Omission
− Pertinent papers are not included in the data sample
➩ Cross-checks
− WoS/CDS, WoS/publishers’ web sites
WoS inconsistencies and errors
− Total number of citations includes Conference database
− Proceedings papers: false classifications and omissions
➩ Manually corrected whenever possible
Automated analysis (whenever possible)
Manual evaluation: abstracts and full-text papers
− Some degree of subjectivity

S. Agostinelli et al.
Geant4: a simulation toolkit
NIM A, vol. 506, no. 3, pp. 250-303, 2003
3301 citations (20 October 2012)
Most cited CERN publication in WoS (excluding Rev. Part. Properties)

J. Allison et al.
Geant4 Developments and Applications
IEEE Trans. Nucl. Sci., vol. 53, no. 1, pp. 270-278, 2006
665 citations (20 October 2012)

Many papers cite the NIM paper, but they omit citing the TNS one, even though both are indicated in http://cern.ch/geant4
Many papers that use Geant4 do not cite either reference
Citation analysis: until end 2011

Born from LHC experimental requirements
Multidisciplinary sources of citations
[Figure: citations per year, 2003-2011; series: Geant4 NIM, Geant4 TNS]
[Figure: Geant4 NIM: Citing Journals; x-axis: Citations; journals: NIM A, Phys. Rev. D, TNS, Phys. Rev. Lett., Med. Phys., Phys. Med. Biol., Phys. Rev. C, Phys. Lett. B, NIM B, JINST, EPJC, Astrop. Phys., JHEP, J. Phys. G, Appl. Radiat. Isot., Radiat. Meas., J. Korean Phys. Soc., Radiat. Prot. Dosim.]
[Figure: G4 NIM: Citing Collaborations; x-axis: Citations; collaborations: BaBar, ATLAS, CMS, LHCb, HARP, CDF, LUNA, MiniBooNE, N TOF, BES III, JET EFDA, ALICE, ISOLDE; labels: Physics, LHC, HEP, Other]
Plot annotations: 30%; 75% citations (plot); 16% citations (plot); 19% citations from collaborations

R. Brun and F. Rademakers
ROOT - An object oriented data analysis framework
NIM A, vol. 389, no. 1-2, pp. 81-86, 1997
584 citations (20 October 2012)
AIHENP Workshop proceedings paper

I. Antcheva et al.
ROOT - A C++ framework for petabyte data storage, statistical analysis and visualization
Comp. Phys. Comm., vol. 180, no. 12, pp. 2499-2512, 2009
32 citations (20 October 2012)
Citation analysis: until end 2011

[Figure: citations per year, 1997-2011; series: ROOT Proc., ROOT CPC]
[Figure: ROOT Proc.: Citing Journals; x-axis: Citations; journals: Astropart. Phys., Lect. Notes Comp., NIM B, JHEP, Med. Phys., EPJC, Phys. Med. Biol., JINST, Phys. Rev. D, Phys. Rev. C, Comp. Phys. Comm., TNS, NIM A; annotation: 75% citations]
[Figure: citing collaborations; x-axis: Citations; collaborations: CMS, CDF, CLAS, D0, N TOF, T2K, ATLAS, ALICE, BABAR, RISING, R3B, PHOBOS, KIMS, JET-EFDA, HADES, H1, GLAST, D0, BELLE, AUGER]
8% of all citations from collaborations

Field of citing journals    Geant4 %    ROOT %
Technology                  30.3        49.6
Physics                     29.9        18.2
BioMedical                  13.9        6.0

HEP experiments
LEP
• ALEPH
• DELPHI
• L3
• OPAL
CDF
BaBar
LHC
• ALICE
• ATLAS
• CMS
• LHCb
• TOTEM
Start of run: CDF: 1985/1988, LEP: 1989, BaBar: 1999, LHC: 2008
[Figure: Collaboration members; x-axis: Experiment (ALEPH, DELPHI, L3, OPAL, BaBar, ALICE, ATLAS, CMS, LHCb, TOTEM, CDF); y-axis: Members]

Time distribution
Start of run: CDF: 1985/1988, LEP: 1989, BaBar: 1999, LHC: 2008
[Figure: Number of publications vs. publication year, 1985-2010; series: All, LEP, BaBar, LHC, CDF]
[Figure: Publications vs operation year; x-axis: Year, rescaled w.r.t. year of start of run (−20 to 20); y-axis: Number of publications; series: LEP, BaBar, LHC, CDF]

Time distribution
Start of run: CDF: 1985/1988, LEP: 1989, BaBar: 1999, LHC: 2008
[Figure: Publications/member vs. year, 1985-2010; series: LEP, BaBar, LHC, CDF]
[Figure: Publications/member vs. year; x-axis: Year (−20 to 20); series: LEP, BaBar, LHC, CDF]
Same as previous slide, rescaled by the number of experiment members

Share of hardware, software and DAQ-trigger publications
[Figure: two panels, Publications and Technological publications; x-axis: Experiment; y-axis: Number of publications; legend: general, physics, hardware, DAQ-trigger, software]

Physics publications
[Figure: Physics publications; x-axis: Experiment; y-axis: Number of publications]
[Figure: Physics publications/member; x-axis: Experiment; y-axis: Number of publications/members]
LEP experiments completed their life-cycle
LHC experiments: at an early stage of their physics production

Technological publications
[Figure: Technological publications; x-axis: Experiment; y-axis: Number of publications; legend: Software, DAQ-trigger, Hardware]
[Figure: Technological publications/member; x-axis: Experiment; y-axis: Number of publications/members; legend: Software, DAQ-trigger, Hardware]
Roughly constant trends, once the number of publications is normalized to the number of collaborators

Software vs. hardware
[Figure: Hardware/software publications; x-axis: Experiment; y-axis: Ratio]
[Figure: DAQ-trigger/software publications; x-axis: Experiment; y-axis: Ratio]
Hardware publications: approximately 4 times more than software
DAQ-trigger publications: approximately 1.3 times more than software

Journals: LEP and LHC
LEP: Dominated by physics publications
LHC: Still dominated by technological publications

Citations
[Figure: four histograms titled Physics, Hardware, DAQ-Trigger, Software; x-axis: Citations (0-100); y-axis: Number of publications]
Panel annotations: 0 citations: 6%; 0 citations: 20%; 0 citations: 28%; 0 citations: 27%
The most cited papers are often the general reference papers about the detector published by each experiment
Citations of the most cited paper
ALEPH: 340
DELPHI: 309
L3: 509
OPAL: 473
BaBar: 859
ALICE: 116
CMS: 129
LHCb: 101
TOTEM: 35
ATLAS: ATLAS pixel detector electronics and sensors: 185

References
[Figure: four histograms titled Physics, Hardware, DAQ-Trigger, Software; x-axis: References (0-100); y-axis: Number of publications]
Physics papers cite more references than technological papers
Bibliographical entries in software papers are often web sites
More references, more citations

Sources of citations of physics papers
Samples in plots account for >90% of citations
[Figure: four panels, DELPHI and ALEPH (LEP), CMS and ATLAS (LHC); x-axis: Citations (%); journals appearing in the panels: Phys. Rev. D, Phys. Lett. B, JHEP, EPJC, Nucl. Phys. B, Nucl. Phys. B Proc. Suppl., Phys. Rev. Lett., Acta Phys. Pol. B, Phys. Rev. C, Z. Phys. C, Nucl. Phys. A, Int. J. Mod. Phys. A, Mod. Phys. Lett. A, J. Phys. G, NIM A, Progr. Theor. Phys. Suppl., Phys. Rep., New J. Phys., JINST, Phys. Atom. Nucl., J. Cosm. Astrop. Phys., Ann. Rev. Nucl. Part. Sci.]
Citations to HEP physics papers mostly come from journals specialized in HEP and a few related fields (astroparticle and nuclear physics)

Sources of citations of technological papers
[Figure: four panels, DELPHI and ALEPH (LEP), CMS and ATLAS (LHC); x-axis: Citations (%); journals appearing in the panels: NIM A, TNS, JINST, EPJC, Phys. Lett. B, Nucl. Phys. B, Nucl. Phys. B Proc. Suppl., Z. Phys. C, Comp. Phys. Comm., Phys. Rev. D, JHEP, Acta Phys. Pol. B, Int. J. Mod. Phys. A, Phys. Rep., Ann. Rev. Nucl. Part. Sci., Rev. Mod. Phys., Rep. Prog. Phys.]
Citations from HEP physics and technology journals

2008-2011
More refined analysis of technological papers published since the start of LHC run
[Figure: TNS 2008-2011; x-axis: ATLAS, CMS, LHCb, ALICE, TOTEM, LHC; y-axis: Number of papers; legend: Hardware, Software, DAQ-trigger]
[Figure: NIM 2008-2011; x-axis: ATLAS, CMS, LHCb, ALICE, TOTEM, LHC; y-axis: Number of papers; legend: Hardware, Software]

2008-2011
[Figure: Self-citations, panels TNS and NIM A; x-axis: ATLAS, CMS, LHCb, ALICE, TOTEM, LHC; y-axis: Number of self-citations; legend: Hardware, Software, DAQ-trigger]
[Figure: Outside citations, panels TNS and NIM A; x-axis: ATLAS, CMS, LHCb, ALICE, TOTEM, LHC; y-axis: Number of outside citations; legend: Hardware, Software, DAQ-trigger]

Conclusions
Software is largely underrepresented in HEP scholarly literature w.r.t. hardware
Publication patterns appear similar in the LEP and LHC era
Citation patterns are different for publications by HEP experiments and about general software tools
Publish! …and don’t forget to cite
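
The rescaling described on the "Time distribution" slides (publication year shifted relative to the start of run, counts divided by the number of collaboration members) and the ratios on the "Software vs. hardware" slide can be sketched as below. This is an illustrative sketch, not the analysis code behind the slides. The start-of-run years come from the slides, with 1988 assumed for CDF (the slides quote 1985/1988). The function names, the member count and the yearly publication counts are hypothetical placeholders.

```python
# Illustrative sketch only; not the analysis code behind the slides.

# Start-of-run years as quoted on the slides.
# Assumption: the slides give "1985/1988" for CDF; 1988 is picked here.
START_OF_RUN = {"CDF": 1988, "LEP": 1989, "BaBar": 1999, "LHC": 2008}


def rescale_years(counts_by_year, start_year):
    """Shift publication years so that year 0 is the start of run."""
    return {year - start_year: n for year, n in counts_by_year.items()}


def per_member(counts_by_year, n_members):
    """Normalize yearly publication counts to the collaboration size."""
    if n_members <= 0:
        raise ValueError("n_members must be positive")
    return {year: n / n_members for year, n in counts_by_year.items()}


def ratio(n_a, n_b):
    """Ratio of two publication counts (e.g. hardware/software).

    Returns None when the denominator is zero.
    """
    return n_a / n_b if n_b else None


if __name__ == "__main__":
    # Hypothetical numbers, NOT taken from the slides.
    example_counts = {2006: 10, 2008: 40, 2010: 25}
    example_members = 500

    shifted = rescale_years(example_counts, START_OF_RUN["LHC"])
    print(shifted)
    print(per_member(shifted, example_members))
    print(ratio(40, 10))
```

Keeping the shift and the normalization as separate steps mirrors the slides, where the per-member plots are described as the previous plots rescaled by the number of experiment members.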