Different protein folds require different
amino acid composition of their cores
Davide Alocci, Andrea Bernini, Pasquale Lista, Andrea Santarelli
Ottavia Spiga, Edoardo Morandi and Neri Niccolai
Department of Biotechnology, Chemistry and Pharmacy
University of Siena
Bioinformatiha 2, Firenze, 18 October
Coring objects to discover their origin
Earth (radius = 6,371 km)
proteins
Atom depth analysis
Atom depth* = distance of an atom from the molecular surface
* a rather new descriptor for complex molecular structures such as proteins
[Chakravarty, S. and Varadarajan, R. (1999) Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold. Des., 7, 723–732]
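The definition above can be made concrete with a minimal sketch. It assumes the molecular surface is already available as a cloud of sampled surface points, for example vertices of a triangulated surface from an external tool; how those points are produced is not covered here. The function name `atom_depth` and the toy surface are hypothetical illustrations, not part of SADIC.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def atom_depth(atom: Point, surface_points: Sequence[Point]) -> float:
    """Distance (Å) from an atom centre to the nearest sampled surface point.

    Assumes the molecular surface is given as a point cloud; the brute-force
    minimum over all points approximates the distance to the surface.
    """
    if not surface_points:
        raise ValueError("surface_points must not be empty")
    return min(math.dist(atom, p) for p in surface_points)


# Toy example: six points where a sphere of radius 10 Å crosses the axes.
toy_surface = [(10.0, 0.0, 0.0), (-10.0, 0.0, 0.0), (0.0, 10.0, 0.0),
               (0.0, -10.0, 0.0), (0.0, 0.0, 10.0), (0.0, 0.0, -10.0)]
print(atom_depth((0.0, 0.0, 0.0), toy_surface))  # atom at the centre
print(atom_depth((7.0, 0.0, 0.0), toy_surface))  # atom close to the surface
```

For real surfaces with many points, a spatial index such as `scipy.spatial.cKDTree` would replace the brute-force minimum.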
3D atom depth analysis
SADIC: Simple Atom Depth Index Calculator, applied to proteins
Depth index defined as:
$$D_i = \frac{\text{exposed volume}}{\text{sphere volume}}$$
[Figure: atom A0 and sphere radius r]
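The ratio above can be estimated numerically. This is a minimal sketch under stated assumptions. "Exposed volume" is taken to mean the part of a sphere of radius r, centred on the atom, that lies outside every atom's van der Waals sphere. The volume is estimated by Monte Carlo sampling, which is not necessarily how SADIC computes it. The names `depth_index` and `random_point_in_sphere`, and the default radius of 10 Å, are hypothetical.

Note that this plain ratio cannot exceed 1, whereas Di values in this presentation reach about 1.3. Whatever normalisation SADIC applies on top of the ratio is therefore not reproduced here.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[float, float, float, float]  # x, y, z (Å) and van der Waals radius (Å)


def random_point_in_sphere(rng: random.Random, centre: Point, radius: float) -> Point:
    """Uniform point inside a sphere, by rejection sampling from its bounding cube."""
    while True:
        dx, dy, dz = (rng.uniform(-radius, radius) for _ in range(3))
        if dx * dx + dy * dy + dz * dz <= radius * radius:
            return (centre[0] + dx, centre[1] + dy, centre[2] + dz)


def depth_index(centre: Point, atoms: Sequence[Atom], radius: float = 10.0,
                samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of (exposed volume) / (sphere volume).

    A sample point counts as exposed when it lies outside the van der Waals
    sphere of every atom in `atoms`.
    """
    rng = random.Random(seed)
    exposed = 0
    for _ in range(samples):
        p = random_point_in_sphere(rng, centre, radius)
        if not any(math.dist(p, (x, y, z)) <= r for x, y, z, r in atoms):
            exposed += 1
    return exposed / samples


# No atoms at all: the whole sphere is exposed.
print(depth_index((0.0, 0.0, 0.0), []))
# A huge atom whose surface passes through the centre fills roughly half the sphere.
print(depth_index((0.0, 0.0, 0.0), [(1000.0, 0.0, 0.0, 1000.0)]))
```

A buried atom leaves little of its sphere exposed and gets a small value. This matches the later use of small Di to define the protein core.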
[Figure: Di values for the structure from PDB ID 1UBQ]
http://www.sbl.unisi.it/prococoa/
Per-atom Di of three residues from PDB ID 1UBQ; Dimax is the largest Di among a residue's atoms
L43: N 0.10, CA 0.05, C 0.11, O 0.18, CB 0.02, CG 0.02, CD1 0.02, CD2 0.00 (Dimax 0.18)
E24: N 0.38, CA 0.52, C 0.50, O 0.52, CB 0.76, CG 0.95, CD 1.17, OE1 1.24, OE2 1.24 (Dimax 1.24)
K63: N 0.19, CA 0.30, C 0.25, O 0.23, CB 0.50, CG 0.68, CD 0.91, CE 1.11, NZ 1.29 (Dimax 1.29)
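Per-residue Dimax follows directly from per-atom Di. This minimal sketch assumes Dimax is the largest Di among a residue's atoms, which is how we read the slides. It reuses the 1UBQ values for L43, E24 and K63 given above, read atom by atom in PDB order, and applies the core criterion stated later (Dimax < 0.2). The names `UBQ_DI`, `dimax` and `is_core` are hypothetical.

```python
from typing import Dict

# Per-atom Di values of three ubiquitin residues (PDB ID 1UBQ), read from the
# figure above in PDB atom order.
UBQ_DI: Dict[str, Dict[str, float]] = {
    "L43": {"N": 0.10, "CA": 0.05, "C": 0.11, "O": 0.18,
            "CB": 0.02, "CG": 0.02, "CD1": 0.02, "CD2": 0.00},
    "E24": {"N": 0.38, "CA": 0.52, "C": 0.50, "O": 0.52, "CB": 0.76,
            "CG": 0.95, "CD": 1.17, "OE1": 1.24, "OE2": 1.24},
    "K63": {"N": 0.19, "CA": 0.30, "C": 0.25, "O": 0.23, "CB": 0.50,
            "CG": 0.68, "CD": 0.91, "CE": 1.11, "NZ": 1.29},
}

CORE_THRESHOLD = 0.2  # core residue if Dimax < 0.2


def dimax(atom_di: Dict[str, float]) -> float:
    """Largest Di among the atoms of one residue."""
    return max(atom_di.values())


def is_core(atom_di: Dict[str, float]) -> bool:
    """A residue is core only if even its most exposed atom has Di below the threshold."""
    return dimax(atom_di) < CORE_THRESHOLD


for residue, atom_di in UBQ_DI.items():
    print(residue, dimax(atom_di), "core" if is_core(atom_di) else "not core")
```

Using the maximum rather than the mean means a single exposed side-chain tip, such as the NZ of K63, is enough to exclude a residue from the core.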
Dimax analysis of protein singles
defining structural layers in protein 3D structures
each structural layer includes atoms with similar Di values
fast and accurate analysis of the amino acid content of structural layers
Dimax analysis of protein singles
quite a few proteins like to stay single
(at least in the crystalline state)
a database of protein singles
PDB entries remaining after each successive filter:
Experimental Method: X-RAY (79,770)
Chain Type: Protein (74,456)
Only 1 chain in asym. unit (28,803)
Oligomeric state: 1 (21,193)
Number of Entities: 1 (3,517)
Homologue Removal @ 95% identity (2,410)
DOOPS: 2,410 proteins in the dataset; 4,657,574 atoms; 589,383 residues
[Figure: histogram of the DOOPS dataset; x axis ticks 1, 1001, 2001; y axis ticks 0 to 18]
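The selection above is a funnel: each filter acts on the entries that survived the previous one. This is a minimal sketch of such a funnel. The record fields (`method`, `chain_type`, `chains_in_asym_unit`, `oligomeric_state`, `n_entities`, `sequence`) are hypothetical stand-ins for PDB metadata. The 95% homologue-removal step takes a caller-supplied `identity` function. The greedy representative selection shown is one common way to remove redundancy, not necessarily the procedure used to build DOOPS.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

IdentityFn = Callable[[str, str], float]  # returns sequence identity in [0, 1]


@dataclass
class Entry:
    """Hypothetical, simplified PDB entry record."""
    pdb_id: str
    method: str
    chain_type: str
    chains_in_asym_unit: int
    oligomeric_state: int
    n_entities: int
    sequence: str


# Attribute filters in the order listed above.
FILTERS: List[Tuple[str, Callable[[Entry], bool]]] = [
    ("Experimental Method: X-RAY", lambda e: e.method == "X-RAY"),
    ("Chain Type: Protein", lambda e: e.chain_type == "Protein"),
    ("Only 1 chain in asym. unit", lambda e: e.chains_in_asym_unit == 1),
    ("Oligomeric state: 1", lambda e: e.oligomeric_state == 1),
    ("Number of Entities: 1", lambda e: e.n_entities == 1),
]


def remove_homologues(entries: Sequence[Entry], identity: IdentityFn,
                      cutoff: float = 0.95) -> List[Entry]:
    """Greedy redundancy removal: keep an entry only if its identity to every
    entry kept so far is below `cutoff`."""
    kept: List[Entry] = []
    for e in entries:
        if all(identity(e.sequence, k.sequence) < cutoff for k in kept):
            kept.append(e)
    return kept


def funnel(entries: Sequence[Entry], identity: IdentityFn) -> List[Tuple[str, int]]:
    """Apply the filters in turn and record how many entries survive each step."""
    counts: List[Tuple[str, int]] = []
    current = list(entries)
    for label, keep in FILTERS:
        current = [e for e in current if keep(e)]
        counts.append((label, len(current)))
    current = remove_homologues(current, identity)
    counts.append(("Homologue Removal @ 95% identity", len(current)))
    return counts


def exact_match(a: str, b: str) -> float:
    """Crude stand-in for alignment-based identity: 1.0 for identical sequences."""
    return 1.0 if a == b else 0.0


toy = [
    Entry("1AAA", "X-RAY", "Protein", 1, 1, 1, "MKVLA"),
    Entry("1AAB", "X-RAY", "Protein", 1, 1, 1, "MKVLA"),  # same sequence as 1AAA
    Entry("2BBB", "SOLUTION NMR", "Protein", 1, 1, 1, "GSHMT"),
    Entry("3CCC", "X-RAY", "Protein", 2, 2, 1, "AAAAA"),
]
for label, n in funnel(toy, exact_match):
    print(f"{label} ({n})")
```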
Swiss-Prot: 540,958 proteins in the dataset (192 million aa)
[Figure: Swiss-Prot and DOOPS datasets compared; x axis ticks 1 to 2001; y axis ticks 0 to 18]
Di analysis of protein singles
Layer  Di range    color
L6     > 1.2       red
L5     1.0 – 1.2   orange
L4     0.8 – 1.0   yellow
L3     0.6 – 0.8   green
L2     0.4 – 0.6   blue
L1     0.2 – 0.4   indigo
L0     < 0.2       violet
[Figure: 3VTR (chitinolytic enzyme, 572 aa) colored by layer]
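Assigning atoms to the layers in the table above is a binning step. This minimal sketch assumes half-open bins [lower, upper), so a Di of exactly 0.2 falls in L1 and a Di of exactly 1.2 in L6. The slide does not say how boundary values are handled. The names `layer` and `layer_color` are hypothetical. The example uses the K63 side-chain values of 1UBQ as read from the figure earlier.

```python
import bisect

# Upper bounds (Di) of layers L0..L5; Di values at or above 1.2 fall in L6.
LAYER_BOUNDS = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
LAYER_COLORS = ["violet", "indigo", "blue", "green", "yellow", "orange", "red"]


def layer(di: float) -> int:
    """Return n such that an atom with this Di belongs to layer Ln.

    Bins are half-open, [lower, upper): e.g. Di = 0.2 goes to L1.
    """
    return bisect.bisect_right(LAYER_BOUNDS, di)


def layer_color(di: float) -> str:
    return LAYER_COLORS[layer(di)]


# Lys63 side-chain atoms of 1UBQ (CB to NZ) climb through the layers.
for atom, di in [("CB", 0.50), ("CG", 0.68), ("CD", 0.91), ("CE", 1.11), ("NZ", 1.29)]:
    print(atom, f"L{layer(di)}", layer_color(di))
```

`bisect_right` keeps the lookup logarithmic in the number of layers and makes the boundary convention explicit in one place.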
Di analysis of protein cores
DOOPS: 2,410 proteins; 4,657,574 atoms; 589,383 residues
calculation of % amino acid content in L0
the first quantitative analysis of a large array of protein cores!
core aa if Dimax < 0.2 (~20 % of total molecular volume)
ΣDOOPS aa(L0) = 106,088
aa           % in L0
Alanine        11.51
Cysteine        2.63
Aspartate       1.77
Glutamate       1.20
Phenylalanine   6.36
Glycine        10.81
Histidine       1.32
Isoleucine     11.74
Lysine          0.58
Leucine        16.27
Methionine      2.49
Asparagine      1.70
Proline         2.45
Glutamine       1.21
Arginine        0.83
Serine          4.85
Threonine       4.65
Valine         13.70
Tryptophan      1.43
Tyrosine        2.50
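The percentages above come from counting residue types among core residues. This minimal sketch assumes residues are pooled over all proteins, so that the total corresponds to ΣDOOPS aa(L0), rather than averaged protein by protein; the slides do not say which was done. The function name `core_composition` and the toy input are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CORE_THRESHOLD = 0.2  # core residue if Dimax < 0.2


def core_composition(residues: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Percentage of each residue type among core residues.

    `residues` yields (residue name, Dimax) pairs pooled over all proteins of a
    dataset; percentages sum to 100 over the core residues only.
    """
    counts = Counter(name for name, dmax in residues if dmax < CORE_THRESHOLD)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in sorted(counts.items())}


toy = [("LEU", 0.18), ("GLU", 1.24), ("LYS", 1.29), ("LEU", 0.05), ("ALA", 0.10), ("VAL", 0.00)]
print(core_composition(toy))
```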
Di analysis of protein cores: folding clues from aa core composition?
Class               Architectures  Topology  Homologous superfamily  Domains
1 (mainly α)        5              386       875                     37,038
2 (mainly β)        20             229       520                     43,881
3 (α & β)           14             594       1113                    90,029
4 (few sec. str.)   1              104       118                     2,588
Total               40             1313      2626                    173,536
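Each CATH domain code has four dot-separated numbers, one per level of the hierarchy counted in the table above: class, architecture, topology and homologous superfamily. Grouping domains by architecture (e.g. 1.10, 2.40) therefore means keeping the first two numbers. This is a minimal helper with hypothetical names (`CathLevels`, `cath_levels`), exercised on codes that appear in this presentation.

```python
from typing import NamedTuple


class CathLevels(NamedTuple):
    class_: str
    architecture: str
    topology: str
    homologous_superfamily: str


def cath_levels(code: str) -> CathLevels:
    """Split a four-level CATH code (C.A.T.H) into the node ID at each level."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH code: {code!r}")
    return CathLevels(
        class_=parts[0],
        architecture=".".join(parts[:2]),
        topology=".".join(parts[:3]),
        homologous_superfamily=code,
    )


print(cath_levels("2.40.10.10"))
print(cath_levels("1.10.1840.10").architecture)
```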
DOOPS + CATH: selected Architectures with ≥ 10 PDB files
Architecture  # Proteins (mono-domain)
1.10          213 (84)
1.20          84 (40)
1.25          19 (17)
1.50          10 (3)
2.10          17 (13)
2.30          57 (37)
2.40          94 (73)
2.60          134 (110)
2.80          12 (12)
3.10          84 (73)
3.20          52 (44)
3.30          139 (106)
3.40          218 (203)
3.60          10 (8)
3.90          49 (49)
total         1,190 (872)
Towards protein folding barcodes
% of each amino acid in L0, per selected CATH architecture; overall = whole DOOPS dataset
aa    1.10  1.20  1.25  1.50  2.10  2.30  2.40  2.60  2.80  3.10  3.20  3.30  3.40  3.60  3.90 overall
ALA  13.28 10.32 21.46 12.74  9.26 10.05  8.43  9.32  5.50 10.69 10.08 12.58 11.88 14.95 12.01   11.51
ARG   0.60  1.28  0.24  1.39  0.00  0.64  1.72  0.75  0.00  0.55  1.11  1.75  0.30  0.47  0.95    0.83
ASN   0.67  2.62  0.73  2.77  1.85  2.04  1.77  1.36  0.00  2.10  2.90  0.96  1.52  2.80  2.10    1.70
ASP   1.61  2.62  0.24  2.91  1.23  1.27  2.03  1.79  0.00  2.10  2.90  3.02  1.77  2.34  0.95    1.77
CYS   3.35  2.99  5.37  0.83 22.84  2.04  1.46  4.42  0.92  2.83  2.10  1.49  1.86  1.40  3.05    2.63
GLN   0.60  1.50  0.24  1.11  1.23  1.15  1.81  1.69  0.00  0.46  1.56  2.15  0.99  1.40  1.33    1.21
GLU   1.48  1.44  0.73  1.52  0.00  1.15  1.19  1.04  0.00  0.91  2.59  2.41  1.08  0.93  0.67    1.20
GLY   8.05  8.72  9.76 13.85 16.05  9.92 16.20 10.82  9.17  8.78 11.81 11.35 12.64 13.08  9.91   10.81
PHE   6.44  6.79  2.93  4.57  4.32  7.12  7.06  6.73 15.60  7.22  4.95  6.18  6.07  4.21  6.01    6.36
PRO   1.34  2.46  3.41  2.63  3.09  3.31  3.00  2.78  0.00  3.29  2.90  1.84  2.25  1.40  1.81    2.45
SER   3.49  4.55  3.66  5.96  3.09  5.34  5.56  5.13  2.75  2.83  5.35  4.43  4.23  6.07  5.34    4.85
THR   2.28  4.81  4.15  7.20  5.56  3.31  5.12  4.47  0.92  3.20  5.22  4.25  4.94  5.14  5.91    4.65
TRP   1.01  1.55  0.00  2.77  3.70  0.38  1.63  2.78  2.75  2.19  1.52  0.66  1.26  0.47  2.10    1.43
TYR   2.62  3.69  0.24  4.57  2.47  1.27  2.69  4.38  0.92  3.29  3.12  1.58  2.32  0.00  2.29    2.50
VAL  12.34  9.68  9.51  7.62  9.88 16.28 12.75 13.51 11.93 14.53 12.88 11.70 16.29 19.16 15.54   13.70
#PDB   213    84    19    10    17    57    94   134    12    84    52   139   218    10    49   2,410
mono    84    40    17     3    13    37    73   110    12    73    44   106   203     8    49
(mono = mono-domain proteins)
Per-architecture values for HIS, ILE, LEU, LYS and MET could not be recovered from the extracted slide; their overall values are 1.32, 11.74, 16.27, 0.58 and 2.49.
Cells were colour-coded relative to the aa % average value (av): av + 2σ, av + σ, av - σ, av - 2σ (colours not recoverable).
CATH-ADAPT (CATH - atom depth assisted protein tomography): examples labelled alpha ribbon, horseshoe, trefoil and four layer sandwich; PDB IDs 3CKC(A02), 1RG8(A00), 2IMH(A01), 1UZK(A01); highlighted residues Ala, Cys, Leu, Phe, Val.
Di of 173,536 CATH domains computed in 28 h 5' (average computation time 1.72 s/domain); calculations performed on a 6-core 990X CPU based computer.
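The av ± σ and av ± 2σ colour coding described above can be sketched as follows. The slide does not say how av and σ were obtained. Here we assume they are the mean and population standard deviation of each amino acid's row across the 15 architectures, excluding the overall column. The names `bands` and `ARCHITECTURES` are hypothetical. The Cys and Ala rows are taken from the table above.

```python
import statistics
from typing import List, Sequence

ARCHITECTURES = ["1.10", "1.20", "1.25", "1.50", "2.10", "2.30", "2.40", "2.60",
                 "2.80", "3.10", "3.20", "3.30", "3.40", "3.60", "3.90"]


def bands(values: Sequence[float]) -> List[str]:
    """Label each value of one amino acid's row relative to its row mean (av)
    and population standard deviation (σ): '+2σ', '+σ', '-σ', '-2σ' or ''."""
    av = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [""] * len(values)  # a flat row has no outliers
    labels: List[str] = []
    for v in values:
        if v >= av + 2 * sd:
            labels.append("+2σ")
        elif v >= av + sd:
            labels.append("+σ")
        elif v <= av - 2 * sd:
            labels.append("-2σ")
        elif v <= av - sd:
            labels.append("-σ")
        else:
            labels.append("")
    return labels


# % Cys and % Ala in L0 per architecture, read from the table above (overall column excluded).
CYS = [3.35, 2.99, 5.37, 0.83, 22.84, 2.04, 1.46, 4.42, 0.92,
       2.83, 2.10, 1.49, 1.86, 1.40, 3.05]
ALA = [13.28, 10.32, 21.46, 12.74, 9.26, 10.05, 8.43, 9.32, 5.50,
       10.69, 10.08, 12.58, 11.88, 14.95, 12.01]
flagged = {a: b for a, b in zip(ARCHITECTURES, bands(CYS)) if b}
print(flagged)
```

Under these assumptions, only architecture 2.10 stands out for cysteine. The string of labels per architecture is one way to read a "barcode".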
Dual cores for faster folding?
PDB ID 2ZU4: CATH domains 2.40.10.10 (residues 1-197) and 1.10.1840.10 (residues 198-306)
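Asking whether each domain of 2ZU4 has its own core means splitting residues by domain boundary before applying the Dimax < 0.2 criterion. This minimal sketch assumes residues are identified by sequence number and that each domain is a single contiguous segment, as for the two 2ZU4 domains above; real CATH domains can be discontinuous. The function name `core_counts_by_domain` and the per-residue Dimax input are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (CATH domain code, first residue, last residue) for PDB ID 2ZU4, as on the slide.
DOMAINS_2ZU4: List[Tuple[str, int, int]] = [
    ("2.40.10.10", 1, 197),
    ("1.10.1840.10", 198, 306),
]


def core_counts_by_domain(residues: Iterable[Tuple[int, float]],
                          domains: List[Tuple[str, int, int]],
                          threshold: float = 0.2) -> Dict[str, int]:
    """Count core residues (Dimax < threshold) inside each contiguous domain.

    `residues` yields (residue number, Dimax); residues outside every domain
    are ignored.
    """
    counts = {code: 0 for code, _, _ in domains}
    for resnum, dmax in residues:
        if dmax >= threshold:
            continue
        for code, first, last in domains:
            if first <= resnum <= last:
                counts[code] += 1
                break
    return counts


# Hypothetical per-residue Dimax values, for illustration only.
toy = [(10, 0.10), (150, 0.50), (200, 0.05), (305, 0.19), (400, 0.00)]
print(core_counts_by_domain(toy, DOMAINS_2ZU4))
```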
CONCLUSIONS
Databanks + new tools (SADIC: Simple Atom Depth Index Calculator) = new insights?
protein fold barcoding
CATH-ADAPT…
Acknowledgements
Davide Alocci, Andrea Bernini, Edoardo Morandi, Andrea Santarelli, Ottavia Spiga
Teaching on Bioinformatics topics at UNISI
Bachelor's level (Laurea Triennale)
Biological Sciences
Bioinformatics, lecturer Claudia Landi
Biotechnology
Introduction to Bioinformatics within the Biochemistry, Molecular Biology and Genetics courses
Master's level (Laurea Magistrale)
Chemical Sciences
Protein Chemistry – Structural Genomics, lecturer Neri Niccolai
Molecular and Cellular Biology
3D modelling of cellular components
Doctoral level
INGEGNERIA INFORMATICA E DELL'AUTOMAZIONE
Bioinformatics, lecturer Monica Bianchini
Computer and Automation Engineering
Models and languages for Bioinformatics, lecturer Moreno Falaschi
Scuola di Dottorato in INGEGNERIA E SCIENZA DELL'INFORMAZIONE (Doctoral School in Information Engineering and Science)
In preparation: a Bioinformatics curriculum in the Master's degree in Computer and Automation Engineering, run jointly with the Universities of Leiden and Delft (the Netherlands)
Curriculum in Bioinformatics