The impact of Monte Carlo simulation: a scientometric analysis of scholarly literature
Maria Grazia Pia, INFN Genova, Italy
Maria Grazia Pia (1), Tullio Basaglia (2), Zane W. Bell (3), Paul V. Dressendorfer (4)
(1) INFN Sezione di Genova, Italy; (2) CERN, Switzerland; (3) ORNL, USA; (4) IEEE, USA
SNA + MC 2010: Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2010
18 October 2010

To write or not to write?
T. Basaglia, Z. W. Bell, P. V. Dressendorfer, A. Larkin, M.G. Pia, Writing Software or Writing Scientific Articles?, IEEE Trans. Nucl. Sci., vol. 55, no. 2, pp. 671-678, 2008 (IEEE NSS 2007)
Software publications are largely underrepresented in particle/nuclear physics literature w.r.t. hardware ones
However, some software publications collect a large number of citations
S. Agostinelli et al., Geant4 - a simulation toolkit, NIM A, vol. 506, pp. 250-303, 2003: 2079 citations

Features of this study
Thomson-Reuters, ISI Web of Knowledge: Web of Science, Journal Citation Reports
    INFN subscription: since 1990
    CERN subscription: since 1970
Journal web sites
    Not all publishers provide full-text search capabilities
Automated analysis
    Whenever possible
Manual scan of publication records, abstracts and full-text papers
    Some degree of subjectivity
Sample of representative journals in research areas exploiting Monte Carlo simulation
    Does not cover the whole scope of physics/engineering literature

Not just one champion paper…
In the last 5 years (2004-2009): 58% of TNS papers and 45% of NIM A+B papers mention “simulation” or “Monte Carlo”
The number of papers mentioning Monte Carlo or simulation has been increasing in the past 50 years
[Figure: occurrence of “Monte Carlo” or “simulation” and number of published papers versus years (five-year bins, 1960-2009); journals: TNS, NIM A+B, Phys Rev D, Phys Lett B, Phys Rev Lett, Nucl Phys B, APJ, Med Phys, PMB]
Also the total number of published papers has increased

Patterns in TNS papers
~60% of recent publications mention model*
~60% of recent publications mention Monte Carlo OR simulation
The fraction of papers mentioning these words has been growing over the last 50 years
The increase appears to be associated with Monte Carlo and/or simulation
Roughly constant fraction of papers mentioning model*, but not mentioning Monte Carlo or simulation
[Figure: fraction of published papers (0.0-0.6) versus year (1960-2008); series: MC/simulation, model*, model* AND MC/simulation, model* NOT MC/simulation]

Technological journals
Fraction of published papers mentioning Monte Carlo OR simulation, 1960-2009
Similar fraction of papers mentioning Monte Carlo or simulation in NIM and TNS until ~1990; then larger fraction in TNS, but similar trend of increasing fraction
[Figure: fraction of published papers (0.0-0.6) versus year (1960-2008); series: NIM A+B, TNS, NIM A, NIM B]

Medical physics journals / Physics journals
[Figure, two panels: occurrence (%) of Monte Carlo or simulation versus years (five-year bins, 1960-2009). Medical physics journals: Med Phys, PMB. Physics journals: Phys Rev D, Phys Lett B, Phys Rev Lett, APJ, Nucl Phys B]

Relative presence in literature
[Figure: occurrence (0%-45%) of Monte Carlo or simulation versus years (five-year bins, 1960-2009); journals: TNS, PMB, Nucl Phys B, Phys Rev D, APJ, Phys Lett B, Phys Rev Lett, NIM A, Med Phys, NIM B]

What contributes to the increasing popularity of Monte Carlo simulation?
Analysis in progress
Socio-economic factors, computing facilities, availability of general purpose codes
[Figure: fraction of TNS MC/simulation papers and US income ($) versus year, 1960-2005]
[Figure: fraction of TNS MC/simulation papers and Nikkei 225 versus year, 1970-2008]
The butterfly effect, IEEE NSS 2010

Monte Carlo codes in technological research
Papers published in 2004-2009
Fraction of papers mentioning well-known Monte Carlo codes: NIM ~9%, TNS ~15%
NIM: 13407 papers; TNS: 2630 papers
[Figure: fraction of papers (0%-8%) mentioning each code in NIM and TNS; codes: Penelope, MCNP, FLUKA, EGS, Geant4, GEANT 3]

Monte Carlo enables physics
APS journals, 1990-2008
Papers mentioning well-known Monte Carlo codes
Full-text search in http://prola.aps.org/
[Figure: number of citations (0-600) by journal (Phys. Rev. A, Phys. Rev. B, Phys. Rev. E, Phys. Rev. ST Accel., Phys. Rev. C, Phys. Rev. Lett., Phys. Rev. D) and code (Penelope, FLUKA, MCNP, EGS, Geant4, GEANT 3)]

Cutting edge of obsolescence
GEANT 3 still widely used in physics production (CDF, D0, CLEO, BES, Belle etc.)
Mostly GEANT 3.21, but also older versions
2004-2008
[Figure: number of citations (0-250) by code (Penelope, MCNP, FLUKA, EGS, Geant4, GEANT 3) in Phys. Rev. C, Phys. Rev. Lett., Phys. Rev. D]

Not only Geant4!
Geant4 and MCNP more often mentioned in technological journals, GEANT 3 appears less popular in technology research
2004-2008
[Figure: number of citations (0-500) by code (Penelope, MCNP, FLUKA, EGS, Geant4, GEANT 3) in NIM and TNS]

The most cited paper
The most cited paper of the whole Nuclear Science & Technology category is about a Monte Carlo code
Source: Thomson-Reuters ISI Web of Knowledge
Period: 1970-2010
3rd most cited CERN paper
2nd most cited INFN paper
2nd most cited physics paper in Japan (excluding Review Part. Phys.)
stay tuned… still growing!

Geographical distribution
Citations to Geant4 NIM 2003 paper, 2003-2010 citations (update: 18 October 2010)
BaBar: 231 papers
[Figure: share of citations (0%-80%) by region: Africa, Oceania, S America, Asia, Russia+, Europe, N America]
[Figure, two panels (All; Excluding BaBar): top 10 citing countries; countries appearing: SCOTLAND, CANADA, NETHERLANDS, SPAIN, RUSSIA, JAPAN, ENGLAND, SWITZERLAND, FRANCE, ITALY, GERMANY, USA]

Citing institutes
Geant4 NIM 2003 paper, 2003-2010 citations (update: 18 October 2010)
[Figure, three panels (All; Excluding BaBar; Excluding BaBar and CERN): top 10 citing institutes; institutes appearing: HARVARD UNIV, UC BERKELEY, UNIV MILAN, OHIO STATE UNIV, UNIV LIVERPOOL, UNIV PADUA, RAL, UNIV ROMA 1, UNIV VALENCIA, FERMILAB, UCL, RIKEN, RUSSIAN ACAD SCI, UNIV OXFORD, NASA, KYOTO UNIV, CHINESE ACAD SCI, JINR, UNIV TOKYO, CERN, INFN]

Citing journals
Geant4 NIM 2003 paper (update: 18 October 2010)
Main source of citations: HEP and technology journals
[Figure: top 10 citing journals, share of citations (0%-20%): EPJC, Astropart Phys, NIM B, Phys Rev C, Phys Med Biol, Med Phys, Phys Rev Lett, IEEE TNS, Phys Rev D, NIM A]
• Technology w/o conf. proc.
• HEP w/o conf. proc.
• Nuclear Physics w/o proc.
• Medical Physics w/o proc.
• Astroparticle Physics w/o proc.
Wide scope, including: Phys. Rev. A/B/C/D/E, Anal. Chem., Geophys. Res. Lett., Plasma Sci. Technol., Appl. Surf. Sci., Appl. Eng. Agriculture etc.

HEP citations
Geant4 NIM 2003 paper
Semi-automated classification: authors (ISI Web), experiment identification (ISI Web), manual inspection
59% BaBar physics papers (2004-2008)
[Figure: share of citations (0%-50%) by category: Other, Linear Collider, Astroparticle, BES, CDF, CERN non-LHC, LHC, BaBar]
Plot → using Geant4 + producing archival results + publishing + citing

Fluctuation or trend?
[Figure: BaBar experiment citations to Geant4 NIM 2003 (0-45) versus year, 2002-2011; annotation: experimental life-cycle]
[Figure: citations to Geant4 NIM 2003 (0-450) versus year, 2002-2010; scaled to end 2010]
How will publications by LHC experiments affect the picture in the next years? And other disciplines?

LHC experiments
2009-2010 citations to Geant4 NIM 2003
ATLAS  6   (NIM A 5, EPJC 1)
CMS    4   (NIM A 2, JHEP 1, J Phys G 1)
LHCB   2   (NIM A 2)
ALICE  1   (NIM A 1)
Mostly by groups

Missing citations
2004-2008 publications
Full-text search in publishers’ webs
54% of NIM-TNS articles mentioning Geant4 cite the NIM A 2003 reference
27% of TNS papers published in 2007-2008 mentioning Geant4, and 10% of NIM A+B ones, cite the TNS 2006 reference
[Figure: number of papers (0-400) that cite versus mention the NIM 2003 reference, for NIM A+B (51%), TNS (59%), Phys. Rev. C (64%), Phys. Rev.… and Phys. Rev. D; further values shown: 93%, 82%, 81% APS, 40% Elsevier]

Conclusions
Majority of technological literature reports use of simulation and Monte Carlo
Monte Carlo plays a major role in producing physics results
Use of Monte Carlo codes is increasing
Physics community still heavy user of older code GEANT 3
HEP and Medical Physics researchers are the dominant users of Geant4
Significant presence of astroparticle community
LHC: now running, how will the citation statistics evolve?
Many Monte Carlo users do not cite reference for code used in their papers
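
The occurrence fractions and cite-versus-mention percentages in these slides come from ISI Web of Knowledge queries and publishers' full-text search. The following is only a minimal sketch of the underlying arithmetic, not the tooling used in the study. It assumes a hypothetical in-memory list of paper records: the `Paper` class, its fields, the reference key `"NIMA506"` and the sample data are all illustrative.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record layout, chosen for illustration only.
    journal: str
    year: int
    text: str
    references: list = field(default_factory=list)


# Case-insensitive match for "Monte Carlo" OR "simulation".
MC_PATTERN = re.compile(r"monte carlo|simulation", re.IGNORECASE)


def mention_fraction(papers, pattern=MC_PATTERN):
    """Fraction of papers whose text matches the pattern."""
    if not papers:
        return 0.0
    hits = sum(1 for p in papers if pattern.search(p.text))
    return hits / len(papers)


def cite_fraction(papers, code_name, reference_key):
    """Among papers mentioning code_name, the fraction citing reference_key."""
    mentioning = [p for p in papers if code_name.lower() in p.text.lower()]
    if not mentioning:
        return 0.0
    citing = sum(1 for p in mentioning if reference_key in p.references)
    return citing / len(mentioning)


# Made-up sample records; "NIMA506" is a placeholder reference key.
SAMPLE = [
    Paper("TNS", 2007, "A Geant4 simulation of the detector", ["NIMA506"]),
    Paper("TNS", 2007, "Monte Carlo study using Geant4", []),
    Paper("TNS", 2008, "Radiation hardness measurements", []),
    Paper("NIM A", 2008, "Beam test results", []),
]

if __name__ == "__main__":
    print(mention_fraction(SAMPLE))
    print(cite_fraction(SAMPLE, "Geant4", "NIMA506"))
```

Applied to one journal and one publication period at a time, `mention_fraction` corresponds to the "occurrence" plotted per journal, and `cite_fraction` to the "cite versus mention" comparison on the Missing citations slide.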