SEARCHING BIOTECHNOLOGY INFORMATION IN THE 2010s: Section II (Databases & Search Strategies)
Luca Falciola, IP Manager, Promethera Biosciences
Sardegna Ricerche-Univ. Cagliari (Sep. 15th 2014)

Databases & Biotechnology: A foreword
Covering even a limited number of databases is practically impossible, so a selection is required according to a few criteria:
- Free access (at most, requiring registration at a website with a user name, password, and e-mail address to get full access to services; for this purpose it is better to use a separate free e-mail account, used only for these registrations and for receiving Tables of Contents, updates, etc.)
- Overall positive reputation, importance, and a good « search experience » even for occasional users
This selection can easily be expanded for specific objectives by:
- Searching the structured repertory on the Nucl. Acids Res. website and its yearly update
- Combining search topics in Google and/or Pubmed
- Exploring the NCBI and EBI websites
Sardegna Ricerche L. Falciola 15/09/2014

DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
- Pubmed
- HighWire
- Publishers’ website
2 Patent Literature
3 Chemical Structure & Biological Sequences
4 Metabases

Scientific Literature: Introduction
There are several databases of scientific literature, mostly thematic, and Pubmed has a major role in life sciences:
- More than 1.2 million entries for year 2013, for a total of more than 25 million
- Considered by many as the most complete database
This leadership should not make us forget other resources that, at different levels, may be competitive for identifying relevant literature:
- Commercial ones (EMBASE, SCISEARCH, SciVerse, BIOSIS, etc.)
- Databases covering a large panel of publishers, meant to promote the purchase of articles, that provide full-text search features or other advanced search / push services
- The full-text vs. indexing/completeness comparison is a central issue

Pubmed: Introduction
Pubmed offers almost everything you need, with the exception of full-text search:
- A well-organized help page including links to Youtube and other tutorials
- Access to the large panel of NCBI services, as summarized in this guide and this NAR paper
- Sign-in page for accessing an even larger panel of features
- Guides to other literature databases, the NCBI digital library, and the MeSH system

Pubmed: Advanced Search and Search History
Both features are on the same page, which can be maintained or even saved

Pubmed: Field Search
A large number of fields are available for text or numeric searches

Pubmed: MeSH Examples (antibodies)

Pubmed: MeSH tutorial
- Some Pubmed tutorials are too complex, and some universities provide simplified versions (as for MeSH) that insist on a sequential, structured approach to identify the most relevant MeSH terms
- Keep in mind that MeSH terms are not always present; they are useful for extracting a more relevant subset of references to explore with a series of related criteria

Pubmed: Search Operators
- A large number of operators/symbols expand the possibilities well beyond AND, OR, NOT (truncation and double quotes are essential for precise but not too extensive searches)
- The search can also be refined with the large selection of filters in the left sidebar

Pubmed: Heterogeneity
An important issue is that Pubmed is intended to provide publications to users as soon as possible, which explains some heterogeneity in indexing and access to articles
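
A minimal sketch of how the field tags, Boolean operators, truncation, and double quotes mentioned in the Field Search and Search Operators slides might be combined into a single Pubmed query string. The helper names (field, combine) and the example topic are illustrative assumptions, not part of the presentation; [MeSH Terms], [Title/Abstract], and [dp] (date of publication) are standard Pubmed field tags.

```python
def field(term, tag=None, phrase=False):
    """Format one search term, optionally double-quoted and tagged with a field."""
    text = f'"{term}"' if phrase else term
    return f"{text}[{tag}]" if tag else text


def combine(operator, *parts):
    """Join the parts with AND / OR / NOT and wrap the group in parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(parts) + ")"


# Hypothetical topic: a MeSH term OR a truncated free-text variant ...
topic = combine(
    "OR",
    field("antibodies", "MeSH Terms"),
    field("antibod*", "Title/Abstract"),  # * is the truncation symbol
)

# ... restricted to an exact phrase and to one publication year.
query = combine(
    "AND",
    topic,
    field("phage display", "Title/Abstract", phrase=True),
    field("2013", "dp"),
)

print(query)
# ((antibodies[MeSH Terms] OR antibod*[Title/Abstract]) AND "phage display"[Title/Abstract] AND 2013[dp])
```

The printed string can be pasted into the Pubmed search box; nesting the OR group in parentheses keeps it from being mixed with the AND conditions.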

Pubmed: Some tricks
- Replace the double quotes around a two-word phrase with a hyphen between the two words
- The use of truncation shows how many spelling errors are present in the database that may make you miss some relevant hits
- Send Pubmed references by e-mail just by indicating the PMIDs after http://www.ncbi.nlm.nih.gov/pubmed/, e.g. http://www.ncbi.nlm.nih.gov/pubmed/25031662,25000062
- The « Related » references can be saved in the search history and combined with keywords to search within them
- Search History is limited in time and length (better not to exceed 50-80 entries)

Highwire Press: Overview
Large life science literature database hosted by Stanford Univ., aggregating journals from many major publishers but also books and conference abstracts, also as full text and with some useful filters

Highwire: Help Page

Highwire: Search Results & History

Highwire: Services
Preview of keywords in context, alerts for new articles including a given citation or keywords, alternative viewing features, links to supplementary/free documents, and management of ToC are well implemented

Publishers’ Website: Introduction
All main publishers with a large panel of journals have nice features to keep track of new articles or to search their publications:
- Nature, Science, Wiley, Springer
- ScienceDirect (Elsevier) is particularly rich in functions and has broad coverage (even of journals not indexed in Pubmed)

Publishers’ Website: Other Examples
Wiley

DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
2 Patent Literature
- Lens
- Espacenet
- Patentscope
3 Chemical Structure & Biological Sequences
4 Metabases
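
Looking back at the PMID trick from the Pubmed tricks slide, the link format can be sketched as follows. The function name pubmed_link is a hypothetical helper, and the base URL is simply the one quoted in the slide; the only logic assumed here is that PMIDs are digit strings joined by commas.

```python
PUBMED_BASE = "http://www.ncbi.nlm.nih.gov/pubmed/"  # base URL as quoted in the slide


def pubmed_link(pmids):
    """Return a single link that lists all the given PMIDs, comma-separated."""
    cleaned = []
    for pmid in pmids:
        text = str(pmid).strip()
        if not text.isdigit():
            raise ValueError(f"not a PMID: {pmid!r}")
        if text not in cleaned:  # skip duplicates, keep the original order
            cleaned.append(text)
    if not cleaned:
        raise ValueError("at least one PMID is required")
    return PUBMED_BASE + ",".join(cleaned)


# The two PMIDs used as the example in the slide.
link = pubmed_link([25031662, "25000062"])
print(link)
# http://www.ncbi.nlm.nih.gov/pubmed/25031662,25000062
```

The resulting link can be pasted into an e-mail so that the recipient opens the same set of references.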

Patent Literature: Introduction
Patent information that may be relevant for a biotech search is available in a variety of formats:
- Text-based
- Biological sequences
- Chemical structures
Regular review of patent publications can be performed by appropriately using three types of tools:
- Multi-patent-office websites (Patentscope, Espacenet, Lens)
- Patent office-specific tools (at USPTO, EPO, Australian, Indian, etc.), in general poorly implemented beyond basic number or proceedings searches
- Access for sequence- or structure-based searches (Lens, EBI)
Each approach and tool has its own strengths/weaknesses:
- Need to compare/double-check
- Access to PDF and identification of keyword context

Patent Literature: Overview
Main strengths:
- Patentscope and Lens: full-text/stemmed/nested searches, large number of criteria, login for saving search strategies, graphical/automated grouping of results
- Patentscope and Espacenet: machine translation
- Lens: somewhat easier to use for both searching and getting/sending links to PDF files, nice support section, possible to search only granted patents, nice sorting/filtering functions, claims and abstract on the same page
- Espacenet: Cooperative Patent Classification & citing/cited documents features for (non-)EP appl., link to EPO register, links to (often) reliable patent family & Inpadoc/status information
Main weaknesses:
- Patentscope: instability during long search sessions, IPC only, no clear patent family information
- Lens: format inconsistency for code/number fields, coverage and patent family definition, with functions appearing and disappearing (now providing IPC and USPC)
- Espacenet: somewhat old-style for both searching documents and getting PDF files
- In general: no visibility on actual coverage for all collections; limited means to identify keyword context

Lens: Search Window

Lens: Filtering Features

Lens: Help Page

Espacenet: Search Window and Criteria

Espacenet: Patent Kind Codes & Help

Espacenet: CPC Classification

Espacenet: Results & Record View

Patentscope: Search Window & Results

Patentscope: Record & Records Analysis

DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
2 Patent Literature
3 Chemical Structure & Biological Sequences
- Uniprot
- EBI-Fasta
- ChEMBL/Pubchem
4 Metabases

Uniprot: Overview & Search Criteria

Uniprot: HBB in Genecards Vs Uniprot

EBI-Fasta: Search Window

EBI-Fasta: Overview of Results

EBI-Fasta: Patent Sequence Record

ChEMBL: Introduction
Medicinal chemistry data/products are now more accessible to non-specialists as well, through portals such as EBI/ChEMBL, PubChem, or Drugbank that aggregate them and make them searchable through different criteria, across biological/medical/patent information (together with chemical information from proprietary repositories), for creating Molecular Clouds (Ertl and Rohde, J Cheminf 2012)

ChEMBL: Features

ChEMBL: Search & Browse Features

ChEMBL: Targets, Ligands & Drug Approvals

DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
2 Patent Literature
3 Chemical Structure & Biological Sequences
4 Metabases
- Google
- Google Scholar
- Drugbank

Google: Advanced Search & GoogleGuide

Google Scholar: Introduction
This site claims to have broad coverage of both scientific and patent literature, but the actual coverage is unclear: whether it goes beyond US patent documents, from which date, and for which publishers (they index papers, not journals)
The system has some additional useful features compared to “pure” Google:
- Separate advanced search features
- Management of alerts through one's own Gmail account
- Import features for reference management systems (but not always precise)
- Selection by publication date instead of appearance on the web (but again not always precise)
- Clear link to PDF on the right side of the window
- Citation list (that can be searched separately) and “related articles” features
- Metrics / search by journal
- Focused help page with advice on how to get your paper indexed

Google Scholar: Advanced Search Features

Google Scholar: Settings and Metrics Features

Google Scholar: Final Comments
Google Scholar provides means for overcoming only some limitations of “pure” Google:
- Lack of visibility about publication/journal coverage
- Unstructured search features within documents
- Lack of indexing
It is an interesting tool for exploratory searches or for completing searches made in “traditional” databases:
- Exploiting full-text and advanced search features in a more structured environment
- Linking articles to combinations of specific technical details, cross-references, authors
- Obtaining additional search criteria to be used elsewhere

DrugBank: Introduction

DrugBank: Results

DrugBank: Records

Thank you !! [email protected]
The views and the opinions expressed in this presentation are the author’s personal thoughts on these subjects. They are not intended to be considered opinions and positions of Promethera, nor imply any commitment by Promethera to any particular action.