Specialized Databases
Specialized databases collect sets of data that are homogeneous
from a taxonomic and/or functional point of view,
drawn from the Primary Databases and/or the Literature,
then reviewed and annotated with value-added information
Specialized Databases of Protein Patterns
• Given an uncharacterized sequence:
Which family does it belong to?
What is its function?
“The protein signature approach”
• We compare sequences belonging to the same family, looking for common patterns
• We build a database of conserved profiles (sequence elements
conserved at specific positions)
• We use these profiles (patterns) to classify an unknown sequence
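The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the method used by any real signature database: the family names and sequences are made up, the alignments are assumed to be gap-free and of equal length, and the score is a plain average of column frequencies rather than a log-odds profile or an HMM. The function names `build_profile`, `score` and `classify` are hypothetical.

```python
from collections import Counter


def build_profile(aligned_seqs):
    """Return, for each alignment column, the observed residue frequencies."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile


def score(profile, query):
    """Average frequency of the query residues in the profile columns (0.0 to 1.0)."""
    if len(query) != len(profile):
        raise ValueError("query must have the same length as the profile")
    return sum(col.get(res, 0.0) for col, res in zip(profile, query)) / len(profile)


def classify(profiles, query):
    """Return the name of the best-scoring family and its score."""
    best = max(profiles, key=lambda name: score(profiles[name], query))
    return best, score(profiles[best], query)


# Toy aligned families: hypothetical data, not real proteins.
families = {
    "family_A": ["CGHC", "CGHC", "CAHC"],
    "family_B": ["WDLK", "WELK", "WDLR"],
}

# The "database of conserved profiles": one profile per family.
profiles = {name: build_profile(seqs) for name, seqs in families.items()}

# Classify an unknown sequence against every profile.
print(classify(profiles, "CGHC"))
```

A real resource would also apply a significance threshold, so that a query resembling no family is left unclassified instead of being assigned to the least bad match.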
What are protein signatures?
[Figure: Protein family/domain → Multiple sequence alignment (e.g. ITWKGPVCGLDGKTYRNECALL, AVPRSPVCGSDDVTYANECELK) → Build model → Search UniProt → Significant match → Mature model → Protein analysis]
Diagnostic approaches (sequence-based)
• Single motif methods: Regex patterns (PROSITE)
• Multiple motif methods: Identity matrices (PRINTS)
• Full domain alignment methods: Profiles (Profile Library), HMMs (Pfam)
Patterns
[Figure: Sequence alignment → Motif → Define pattern → Extract pattern sequences → Build regular expression]
Pattern signature (PS00000):
C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
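The pattern C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C shown above uses PROSITE notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden residues, and `(n)` is a repeat count. The sketch below translates this notation into a Python regular expression. The function name `prosite_to_regex` is hypothetical, only the common subset of the syntax is handled, and the test sequences are made up to show a hit and a miss.

```python
import re

# One pattern element: optional '<' anchor, a residue / x / [allowed] / {forbidden},
# an optional repeat such as (2) or (2,4), and an optional '>' anchor.
ELEMENT = re.compile(
    r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, repeat, end = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            regex = core                       # literal residue or [allowed set]
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


PATTERN = "C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C"
REGEX = prosite_to_regex(PATTERN)
print(REGEX)

# Made-up sequences: the first contains the motif, the second has P right after C-C.
hit = re.search(REGEX, "MKCCAGGCSAAALAAACGT")
miss = re.search(REGEX, "MKCCPGGCSAAALAAACGT")
print(hit.span() if hit else None, miss)
```

A single-motif signature of this kind gives a yes/no answer: one forbidden residue is enough to lose the match, which is one reason profile and HMM methods are used alongside it.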
Protein families
Pfam (short for Protein Families) is a database of protein domains
described by hidden Markov models (HMMs). It is divided into two sections:
Pfam-A contains expert-curated alignments; Pfam-B contains
sequences that are clustered automatically.
Pfam
InterPro Entry
• Groups similar signatures together
• Adds extensive annotation
• Links to other databases
• Structural information and viewers
Hierarchical classification
InterPro hierarchies: Families
FAMILIES can have parent/child relationships with other families
Parent/child relationships are based on:
• Comparison of protein hits
  - a child should be a subset of its parent
  - siblings should not have matches in common
• Existing hierarchies in member databases
• Biological knowledge of curators
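The "comparison of protein hits" criterion amounts to two set checks, sketched below. The entry names and protein identifiers are made up, and the helper names `is_subset_child` and `overlapping_siblings` are hypothetical; in practice curators weigh these checks together with the other two criteria rather than applying them mechanically.

```python
from itertools import combinations


def is_subset_child(parent_hits, child_hits):
    """True if every protein matched by the child is also matched by the parent."""
    return set(child_hits) <= set(parent_hits)


def overlapping_siblings(children):
    """Return name pairs of siblings that share at least one protein hit."""
    return [
        (a, b)
        for a, b in combinations(sorted(children), 2)
        if set(children[a]) & set(children[b])
    ]


# Hypothetical hit lists: entry names and protein identifiers are made up.
parent = {"prot1", "prot2", "prot3", "prot4", "prot5"}
children = {
    "child_A": {"prot1", "prot2"},
    "child_B": {"prot3", "prot4"},
    "child_C": {"prot4", "prot6"},
}

for name, hits in sorted(children.items()):
    print(name, "subset of parent:", is_subset_child(parent, hits))
print("overlapping siblings:", overlapping_siblings(children))
```

In this toy data, child_C fails both checks: it matches a protein the parent does not, and it shares a hit with child_B.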
InterPro hierarchies: Domains
DOMAINS can have parent/child relationships with other domains
Domains and Families may be linked through the Domain Organisation Hierarchy
InterPro Entry: adds extensive annotation
The Gene Ontology project provides a
controlled vocabulary of terms for
describing gene product characteristics
InterPro Entry: links to other databases
UniProt
KEGG ... Reactome ... IntAct ...
UniProt taxonomy
PANDIT ... MEROPS ... Pfam clans ...
PubMed
InterPro Entry: structural information and viewers
• PDB: 3-D structures
• SCOP: structural domains
• CATH: structural domain classification
Searching InterPro
• Protein family membership
• Domain organisation
• Domains, repeats & sites
• GO terms
Specialized Databases Associated with Nucleotide Patterns
• Eukaryotic promoters: Eukaryotic Promoter Database (http://www.epd.isb-sib.ch/)
• Transcription factors: TRANSFAC
• Translation termination: TransTERM
• Vectors: VectorDB
• Repeats: Repbase
Structural Profiles
CATH (http://www.cathdb.info/)
SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
CATH
SCOP
Specialized Databases of
• Genes
• Genomes
• Transcripts and Expression Profiles
• Metabolic Pathways
• Mutations
Specialized Databases of Genes
• COGs
• Entrez Gene
• RefSeq
ENTREZ Gene
Genome Sites
NCBI Genomes
EBI Genomes
TIGR (Craig Venter)
The Human Genome
The Human Genome at NCBI
The Human Genome at Celera
Ensembl
UCSC Genome Bioinformatics
Transcriptome Databases
• dbEST
• UniGene
• UTRdb/UTRsite
Expression Databases
• GEO
• ArrayExpress
• EPDex
Metabolic Pathway Databases
Kyoto Encyclopedia of Genes and Genomes (KEGG)
http://www.genome.jp/kegg/
Metabolic Pathway Databases
REACT_945.4
Mutation Databases
• dbSNP
• HGVBase