Specialized databases

Specialized databases collect data sets that are homogeneous from a taxonomic and/or functional point of view. The data come from the primary databases and/or the literature, and are reviewed and annotated with value-added information.

Specialized databases of protein patterns
• Given an uncharacterized sequence: which family does it belong to? What is its function?

"The protein signature approach"
• Compare sequences belonging to the same family, looking for common patterns
• Build a database of conserved profiles (sequence elements conserved at specific positions)
• Use these profiles (patterns) to classify an unknown sequence

What are protein signatures?
Diagram labels: Protein family/domain; Multiple sequence alignment; Build model; Search UniProt; Protein analysis; Significant match; Mature model
Example sequences shown:
ITWKGPVCGLDGKTYRNECALL
AVPRSPVCGSDDVTYANECELK

Diagnostic approaches (sequence-based)
• Single motif methods: regex patterns (PROSITE)
• Full domain alignment methods: profiles (Profile Library), HMMs (Pfam)
• Multiple motif methods: identity matrices (PRINTS)

Patterns
Diagram labels: Sequence alignment; Motif; Define pattern; Extract pattern sequences; Build regular expression
Resulting pattern: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
Pattern signature: PS00000

Protein families
Pfam (short for Protein Families) is a database of protein domains described with hidden Markov models (HMMs). It is divided into two sections: Pfam-A contains alignments curated by experts; Pfam-B contains sequences that are clustered automatically.
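The PROSITE-style pattern shown above, C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C, maps almost directly onto a regular expression: x is any residue, [..] lists the allowed residues, {..} lists the forbidden ones, and (n) is a repeat count. The sketch below is a minimal illustration of that translation, not PROSITE's own scanner. The function name prosite_to_regex and the toy sequence are hypothetical, and only the elements used in the slide pattern (plus x(n,m) ranges) are supported.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count such as (2) or (2,4).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; anchors (< and >) are not handled.
    """
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        body, low, high = match.groups()
        if body == "x":
            core = "."                       # any residue
        elif body.startswith("{"):
            core = "[^" + body[1:-1] + "]"   # any residue except those listed
        else:
            core = body                      # single residue or [allowed set]
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


# Pattern copied from the slide.
SLIDE_PATTERN = "C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C"
SLIDE_REGEX = prosite_to_regex(SLIDE_PATTERN)

# Hypothetical sequence, invented only to exercise the pattern.
TOY_SEQUENCE = "MKCCAGGCSLLLLQQQCTT"

if __name__ == "__main__":
    print(SLIDE_REGEX)
    hit = re.search(SLIDE_REGEX, TOY_SEQUENCE)
    print(hit.span() if hit else "no match")
```

A single-motif method of this kind is all-or-nothing: one forbidden residue (for example a P in the third position) is enough to lose the match.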
InterPro Entry
• Groups similar signatures together
• Adds extensive annotation
• Links to other databases
• Structural information and viewers

Hierarchical classification

InterPro hierarchies: Families
Families can have parent/child relationships with other families. Parent/child relationships are based on:
• Comparison of protein hits: a child should be a subset of its parent, and siblings should not have matches in common
• Existing hierarchies in member databases
• Biological knowledge of curators

InterPro hierarchies: Domains
Domains can have parent/child relationships with other domains.
Domains and families may be linked through the Domain Organisation Hierarchy.

Adds extensive annotation
The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics.

Links to other databases
• UniProt
• KEGG ...
• Reactome ...
• IntAct ...
• UniProt taxonomy
• PANDIT ...
• MEROPS ...
• Pfam clans ...
• PubMed

Structural information and viewers
• PDB: 3-D structures
• SCOP: structural domains
• CATH: structural domain classification

Searching InterPro
• Protein family membership
• Domain organisation
• Domains, repeats & sites
• GO terms

Specialized databases associated with nucleotide patterns
• Eukaryotic Promoter Database (http://www.epd.isb-sib.ch/)
• Transcription factors: TRANSFAC
• Translation terminations: TransTERM
• Vector database: VectorDB
• Repeats database: Repbase

Structural profiles
• CATH (http://www.cathdb.info/)
• SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)

Specialized databases of
• Genes
• Genomes
• Transcripts and expression profiles
• Metabolic pathways
• Mutations

Specialized gene databases
• COGs
• Entrez Gene
• RefSeq

Genomic sites
• NCBI Genomes
• EBI Genomes
• TIGR (Craig Venter)

The human genome
• The human genome at NCBI
• The human genome at Celera
• Ensembl
• UCSC Genome Bioinformatics

Transcriptome databases
• dbEST
• UniGene
• UTRdb/UTRsite

Expression databases
• GEO
• ArrayExpress
• EPDex

Metabolic pathway databases
• Kyoto Encyclopedia of Genes and Genomes (KEGG): http://www.genome.jp/kegg/
• REACT_945.4

Mutation databases
• dbSNP
• HGVBase
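Worked example for the InterPro family hierarchies described earlier: the "comparison of protein hits" criterion says that a child's hits should be a subset of its parent's, and that siblings should not have matches in common. The sketch below checks just those two set conditions. It is only an illustration under stated assumptions: the function name, the entry names and the protein identifiers are all hypothetical, and the other two criteria (existing member-database hierarchies, curator knowledge) are not modelled.

```python
from itertools import combinations


def check_family_hierarchy(parent_hits, children_hits):
    """Return a list of violations of the two protein-hit rules.

    parent_hits:   set of protein identifiers matched by the parent entry
    children_hits: dict mapping each child entry name to its set of hits
    (Hypothetical data model, chosen only for this illustration.)
    """
    problems = []
    # Rule 1: every child should be a subset of the parent.
    for name in sorted(children_hits):
        extra = children_hits[name] - parent_hits
        if extra:
            problems.append(f"{name} has hits outside the parent: {sorted(extra)}")
    # Rule 2: siblings should not have matches in common.
    for left, right in combinations(sorted(children_hits), 2):
        shared = children_hits[left] & children_hits[right]
        if shared:
            problems.append(f"{left} and {right} share hits: {sorted(shared)}")
    return problems


# Hypothetical entries and protein identifiers.
PARENT = {"P1", "P2", "P3", "P4"}
CHILDREN_OK = {"child_a": {"P1", "P2"}, "child_b": {"P3"}}
CHILDREN_BAD = {"child_a": {"P1", "P2"}, "child_b": {"P2", "P9"}}

if __name__ == "__main__":
    print(check_family_hierarchy(PARENT, CHILDREN_OK))
    for problem in check_family_hierarchy(PARENT, CHILDREN_BAD):
        print(problem)
```

An empty list means both set conditions hold for the proposed parent and children.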