WP4: Analysis of non-EBRCN databases and network services of interest to BRCs
Current status
Paolo Romano
EBRCN General Meeting, Paris, 28-29/11/2002

WP4: databases of interest
Short delay: about 1 month.
• Definition of a list of databases and services that could be of interest to BRCs. Status: done.
• Selection of a subset of those databases and services. Status: done.

WP4: identifiers and methods
• Selection of information of interest to BRCs within the selected databases. Status: ongoing; done for Medline and EMBL.
• Analysis of identifiers and information, and of methods for linking. Status: ongoing; done for Medline.

WP4: PubMed IDs
• Update of the CABRI catalogue production guidelines. Status: ongoing; done for Literature in animal and human cells.
• Retrieval of the PubMed IDs needed for linking. Status: ongoing; done for ICLC, BCCM/LMBP, NCCB plasmids, with support from DSMZ (Kracht) and BCCM (Guissart).

WP4: structure and syntax
• Update of catalogue structures. Status: ongoing; done for Literature in animal and human cells.
• SRS structure and syntax files. Status: ongoing, depending on the deadlines for submission of catalogues; done for ICLC.

WP4: catalogues updates
Catalogue updates:
• ICLC: November 2002 (done)
• Plasmids and cell lines: January 2003
• "Other catalogues": February 2003
• Bacteria: March 2003
• Fungi and Yeasts: May 2003

WP4: EMBL links
• The EMBL Data Library is the European database of DNA sequences.
• It is updated daily, and coordination with NCBI and DDBJ ensures its completeness.
• It is available at EBI through SRS.
• Tests have been conducted to identify how to link to the EMBL Data Library through SRS without IDs.
• Tests were performed on:
    • Bacteria and Archaea
    • Animal and Human Cell Lines
    • Fungi and Yeasts
    • Plasmids
    • Viruses

WP4: EMBL links variability
• Links differ from one type of material to another.
• Links can use various EMBL fields:
    • All-text (not very useful)
    • Organism (for micro-organisms)
    • Division (useful for viruses and plasmids)
    • Feature Table data (allows a correct definition of a source through Key, Qualifier, Description)
• Example search: CBS 100.20 in CBS_FIL
• Fields and values:
    • Organism: fungi
    • Ft-Key: source
    • Ft-Qualifier: strain
    • Ft-Description: "cbs 100.20"
• Annotation problems:
    • CBS 100.20 can be annotated as CBS 100.20 or CBS100.20
    • CBS 112345 can be annotated as CBS12345
• Indexing problems:
    • CBS 100.20 is indexed as CBS, 100 and 20
    • The dot is not indexed; it is treated as a space
Examples of searches:
• Query: Bacteria & source & cip*
    ( ([emblrelease-FtKey:source] & [emblrelease-FtQualifier:strain] & [emblrelease-FtDescription:cip*]) < [emblrelease-Organism:bacteria*] )
• Query: Cell line & source & dsm*
    ( ([emblrelease-FtKey:source] & [emblrelease-FtQualifier:cell_line] & [emblrelease-FtDescription:dsm*]) < [emblrelease-Organism:mammalia*] )
• Query: Fungi & source & cbs 100.20
    ( ( ([emblrelease-FtKey:source] & [emblrelease-FtQualifier:strain] & ( ( [emblrelease-FtDescription:cbs] & [emblrelease-FtDescription:100] ) | [emblrelease-FtDescription:cbs100] ) & [emblrelease-FtDescription:20]) ) < [emblrelease-Organism:fungi*] )

WP4: extracted databases
• Selection of a meaningful subset of information (strain identification) for each material, including links to external databases and services. Status: ongoing; proposal to be sent to collections next month.
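
The CBS 100.20 query under "EMBL links variability" has to cope with both the annotation variants (CBS 100.20 or CBS100.20) and the way the dot is indexed as a space. As an illustration only, the Python sketch below assembles a query string of that shape from a strain designation. The function names (field, split_designation, build_query) are hypothetical and not part of SRS or CABRI; the databank name "emblrelease" and the field names are copied from the example queries. The sketch only builds the text of the query and does not contact any server.

```python
import re

DATABANK = "emblrelease"  # databank name as it appears in the example queries


def field(name, value):
    """Format one search term, e.g. [emblrelease-FtKey:source]."""
    return f"[{DATABANK}-{name}:{value}]"


def split_designation(designation):
    """Split a designation into lowercase runs of letters or digits.

    Both "CBS 100.20" and "CBS100.20" give ["cbs", "100", "20"], since
    the dot and the space are treated alike and are not kept.
    """
    return re.findall(r"[a-z]+|[0-9]+", designation.lower())


def build_query(designation, organism, qualifier="strain"):
    """Assemble a query string with the same shape as the CBS 100.20 example.

    Assumption: the designation is an acronym followed by at least one
    number, and only the first number may be fused with the acronym in
    the annotation (CBS100 versus CBS 100).
    """
    tokens = split_designation(designation)
    if len(tokens) < 2:
        raise ValueError("expected an acronym followed by a number")
    acronym, first, rest = tokens[0], tokens[1], tokens[2:]

    # Either "cbs" and "100" occur as separate words, or "cbs100" as one word.
    spaced = f"({field('FtDescription', acronym)} & {field('FtDescription', first)})"
    fused = field("FtDescription", acronym + first)

    parts = [
        field("FtKey", "source"),
        field("FtQualifier", qualifier),
        f"({spaced} | {fused})",
    ]
    # Remaining pieces of the number (e.g. "20") must also be present.
    parts += [field("FtDescription", token) for token in rest]

    return f"(({' & '.join(parts)}) < {field('Organism', organism + '*')})"


if __name__ == "__main__":
    print(build_query("CBS 100.20", "fungi"))
```

Calling build_query("DSM 1234", "mammalia", qualifier="cell_line") would give the corresponding cell line form. Designations where digits can be dropped or reordered in the annotation are outside what this sketch handles.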