Different protein folds require different amino acid composition of their cores
Davide Alocci, Andrea Bernini, Pasquale Lista, Andrea Santarelli, Ottavia Spiga, Edoardo Morandi and Neri Niccolai
Department of Biotechnology, Chemistry and Pharmacy, University of Siena
Bioinformatiha 2, Firenze, 18 October

Coring objects to discover their origin: the Earth (mean radius 6,371 km) and proteins.

Atom depth analysis
Atom depth* = distance of an atom from the molecular surface.
* A rather new descriptor for complex molecular structures such as proteins [Chakravarty, S. and Varadarajan, R. (1999) Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold. Des., 7, 723–732].

3D atom depth analysis of proteins: SADIC (Simple Atom Depth Index Calculator)
The depth index of atom i is computed in a sphere of radius r centred on the atom:
$D_{i,r} = 2V_{i,r}/V_r$
where $V_{i,r}$ is the exposed volume of the sphere and $V_r$ is the sphere volume, so Di ranges from 0 (fully buried atom) to 2 (fully exposed atom).
http://www.sbl.unisi.it/prococoa/

Di values of three residues of ubiquitin (PDB ID 1UBQ):
Lys63: N 0.19, CA 0.30, C 0.25, O 0.23, CB 0.50, CG 0.68, CD 0.91, CE 1.11, NZ 1.29 (Dimax 1.29)
Glu24: N 0.38, CA 0.52, C 0.50, O 0.52, CB 0.76, CG 0.95, CD 1.17, OE1 1.24, OE2 1.24 (Dimax 1.24)
Leu43: N 0.10, CA 0.05, C 0.11, O 0.18, CB 0.02, CG 0.02, CD1 0.02, CD2 0.00 (Dimax 0.18)

Dimax analysis of protein singles
Dimax, the largest Di among the atoms of a residue, is used to define structural layers in protein 3D structures. Each structural layer includes atoms with similar Di values, which allows a fast and accurate analysis of the amino acid content of each layer.
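A minimal sketch of how a depth index of this kind can be estimated numerically, assuming Di = 2 × (exposed volume) / (sphere volume), which is consistent with the Di values above 1 reported here. It is not the SADIC code: the molecule is modelled as a plain union of atomic spheres, the exposed volume is estimated by counting points of a cell-centred cubic grid that lie inside the probe sphere but outside every atom, and the function name `depth_index`, the default probe radius of 3.0 Å and the 0.1 Å grid step are hypothetical choices.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[float, float, float, float]  # (x, y, z, radius), in ångström


def depth_index(center: Point, atoms: Iterable[Atom],
                probe_radius: float = 3.0, step: float = 0.1) -> float:
    """Estimate D = 2 * V_exposed / V_sphere for a sphere centred on `center`.

    Assumption (not SADIC's method): the molecular volume is the union of
    the atomic spheres in `atoms`. A grid point is "exposed" if it lies
    inside the probe sphere but outside every atomic sphere; volumes are
    approximated by counting points of a cell-centred grid of spacing `step`.
    """
    cx, cy, cz = center
    r2 = probe_radius * probe_radius
    # Keep only atoms whose sphere can intersect the probe sphere.
    near = [(ax, ay, az, ar * ar) for ax, ay, az, ar in atoms
            if math.dist((ax, ay, az), center) < probe_radius + ar]
    n = math.ceil(probe_radius / step)
    offsets = [(k + 0.5) * step for k in range(-n, n)]  # symmetric about 0
    total = exposed = 0
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                if dx * dx + dy * dy + dz * dz > r2:
                    continue  # grid point outside the probe sphere
                total += 1
                px, py, pz = cx + dx, cy + dy, cz + dz
                if not any((px - ax) ** 2 + (py - ay) ** 2 + (pz - az) ** 2 <= ar2
                           for ax, ay, az, ar2 in near):
                    exposed += 1
    if total == 0:
        raise ValueError("step is too coarse for this probe radius")
    return 2.0 * exposed / total


if __name__ == "__main__":
    # Isolated atom of radius 1.5 probed with r = 3.0: close to 2 * 7/8 = 1.75.
    print(depth_index((0.0, 0.0, 0.0), [(0.0, 0.0, 0.0, 1.5)]))
    # Point on the (nearly flat) surface of a very large sphere: half exposed.
    print(depth_index((0.0, 0.0, 0.0), [(0.0, 0.0, -1000.0, 1000.0)]))
```

For an isolated atom of radius 1.5 Å and r = 3.0 Å, one eighth of the probe sphere is occupied, so the estimate comes out close to 1.75; a point on a nearly flat surface gives 1.0, and a point deep inside a large sphere gives 0, matching the 0 to 2 range of the index.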
Quite a few proteins like to stay single (at least in the crystalline state).

A database of protein singles
PDB entries retained at each selection step:
Experimental method: X-ray (79,770)
Chain type: protein (74,456)
Only 1 chain in the asymmetric unit (28,803)
Oligomeric state: 1 (21,193)
Number of entities: 1 (3,517)
Homologue removal at 95% identity (2,410)

DOOPS: 2,410 proteins, 4,657,574 atoms, 589,383 residues.
[Figure: histograms comparing DOOPS with Swiss-Prot (540,958 proteins, 192 Maa).]

Di analysis of protein singles
Layers and colour code (example: 3VTR, a chitinolytic enzyme of 572 aa):
Layer  Di range   Colour
L6     > 1.2      red
L5     1.0–1.2    orange
L4     0.8–1.0    yellow
L3     0.6–0.8    green
L2     0.4–0.6    blue
L1     0.2–0.4    indigo
L0     < 0.2      violet

Di analysis of protein cores
DOOPS: 2,410 proteins; 4,657,574 atoms; 589,383 residues.
The percentage amino acid content of L0 was calculated: the first quantitative analysis of a large array of protein cores.
A residue is counted as a core residue if its Dimax < 0.2. The core accounts for ~20% of the total molecular volume, and ΣDOOPS aa(L0) = 106,088 residues.

aa % in L0:
Alanine        11.51
Cysteine        2.63
Aspartate       1.77
Glutamate       1.20
Phenylalanine   6.36
Glycine        10.81
Histidine       1.32
Isoleucine     11.74
Lysine          0.58
Leucine        16.27
Methionine      2.49
Asparagine      1.70
Proline         2.45
Glutamine       1.21
Arginine        0.83
Serine          4.85
Threonine       4.65
Valine         13.70
Tryptophan      1.43
Tyrosine        2.50
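The layer scheme and the core criterion can be sketched as two small helpers. This is an illustrative sketch, not the authors' code: the helper names `layer_of` and `core_composition` are hypothetical, and placing a value that falls exactly on a boundary (e.g. 0.2) in the upper layer is an assumption, since the layer table does not say which side a boundary belongs to. The example uses the atom Di values reported for Lys63 and Leu43 of ubiquitin (1UBQ).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Upper (exclusive) Di bounds of layers L0..L5; values >= 1.2 fall in L6.
# Treating each lower bound as inclusive is an assumption.
LAYER_UPPER_BOUNDS = (0.2, 0.4, 0.6, 0.8, 1.0, 1.2)


def layer_of(di: float) -> int:
    """Return n such that a Di (or a residue's Dimax) belongs to layer Ln."""
    for n, upper in enumerate(LAYER_UPPER_BOUNDS):
        if di < upper:
            return n
    return len(LAYER_UPPER_BOUNDS)  # L6


def core_composition(residues: Iterable[Tuple[str, List[float]]]) -> Dict[str, float]:
    """Percentage of each residue type among core residues.

    A residue counts as core when its Dimax (largest atom Di) is in L0,
    i.e. Dimax < 0.2. Returns an empty dict if no residue is in the core.
    """
    core = Counter(name for name, atom_dis in residues
                   if layer_of(max(atom_dis)) == 0)
    total = sum(core.values())
    if total == 0:
        return {}
    return {name: 100.0 * count / total for name, count in core.items()}


# Atom Di values reported for two ubiquitin (1UBQ) residues.
lys63 = [0.19, 0.30, 0.25, 0.23, 0.50, 0.68, 0.91, 1.11, 1.29]
leu43 = [0.10, 0.05, 0.11, 0.18, 0.02, 0.02, 0.02, 0.00]

print(layer_of(max(lys63)), layer_of(max(leu43)))  # 6 0
print(core_composition([("LYS", lys63), ("LEU", leu43)]))  # {'LEU': 100.0}
```

Lys63 (Dimax 1.29) lands in the outermost layer L6, while Leu43 (Dimax 0.18) is a core residue. Applied to all residues of the DOOPS set, a function like `core_composition` would produce a table of the same kind as the aa % in L0 list; grouping residues by CATH architecture first would give per-fold compositions.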
Di analysis of protein cores: folding clues from core amino acid composition?
CATH classification (A = architectures, T = topologies, H = homologous superfamilies):
Class                 A      T      H   Domains
1 (mainly α)          5    386    875    37,038
2 (mainly β)         20    229    520    43,881
3 (α & β)            14    594  1,113    90,029
4 (few sec. str.)     1    104    118     2,588
Total                40  1,313  2,626   173,536

DOOPS + CATH: architectures represented by at least 10 PDB files
Architecture   Proteins (single-domain)
1.10           213 (84)
1.20            84 (40)
1.25            19 (17)
1.50            10 (3)
2.10            17 (13)
2.30            57 (37)
2.40            94 (73)
2.60           134 (110)
2.80            12 (12)
3.10            84 (73)
3.20            52 (44)
3.30           139 (106)
3.40           218 (203)
3.60            10 (8)
3.90            49 (49)
Total        1,190 (872)

Towards protein folding barcodes
% of each amino acid in L0, by CATH architecture (overall = all 2,410 DOOPS proteins).
Colour key: deviation from the average value (av) of each amino acid, at av + 2σ, av + σ, av - σ and av - 2σ.
aa   overall  1.10  1.20  1.25  1.50  2.10  2.30  2.40  2.60  2.80  3.10  3.20  3.30  3.40  3.60  3.90
ALA    11.51 13.28 10.32 21.46 12.74  9.26 10.05  8.43  9.32  5.50 10.69 10.08 12.58 11.88 14.95 12.01
ARG     0.83  0.60  1.28  0.24  1.39  0.00  0.64  1.72  0.75  0.00  0.55  1.11  1.75  0.30  0.47  0.95
ASN     1.70  0.67  2.62  0.73  2.77  1.85  2.04  1.77  1.36  0.00  2.10  2.90  0.96  1.52  2.80  2.10
ASP     1.77  1.61  2.62  0.24  2.91  1.23  1.27  2.03  1.79  0.00  2.10  2.90  3.02  1.77  2.34  0.95
CYS     2.63  3.35  2.99  5.37  0.83 22.84  2.04  1.46  4.42  0.92  2.83  2.10  1.49  1.86  1.40  3.05
GLN     1.21  0.60  1.50  0.24  1.11  1.23  1.15  1.81  1.69  0.00  0.46  1.56  2.15  0.99  1.40  1.33
GLU     1.20  1.48  1.44  0.73  1.52  0.00  1.15  1.19  1.04  0.00  0.91  2.59  2.41  1.08  0.93  0.67
GLY    10.81  8.05  8.72  9.76 13.85 16.05  9.92 16.20 10.82  9.17  8.78 11.81 11.35 12.64 13.08  9.91
HIS     1.32
ILE    11.74
LEU    16.27
LYS     0.58
MET     2.49
PHE     6.36  6.44  6.79  2.93  4.57  4.32  7.12  7.06  6.73 15.60  7.22  4.95  6.18  6.07  4.21  6.01
PRO     2.45  1.34  2.46  3.41  2.63  3.09  3.31  3.00  2.78  0.00  3.29  2.90  1.84  2.25  1.40  1.81
SER     4.85  3.49  4.55  3.66  5.96  3.09  5.34  5.56  5.13  2.75  2.83  5.35  4.43  4.23  6.07  5.34
THR     4.65  2.28  4.81  4.15  7.20  5.56  3.31  5.12  4.47  0.92  3.20  5.22  4.25  4.94  5.14  5.91
TRP     1.43  1.01  1.55  0.00  2.77  3.70  0.38  1.63  2.78  2.75  2.19  1.52  0.66  1.26  0.47  2.10
TYR     2.50  2.62  3.69  0.24  4.57  2.47  1.27  2.69  4.38  0.92  3.29  3.12  1.58  2.32  0.00  2.29
VAL    13.70 12.34  9.68  9.51  7.62  9.88 16.28 12.75 13.51 11.93 14.53 12.88 11.70 16.29 19.16 15.54

CATH-ADAPT (CATH - Atom Depth Assisted Protein Tomography)
Example structures of the alpha horseshoe, ribbon, trefoil and four-layer sandwich architectures (PDB IDs 3CKC(A02), 1RG8(A00), 2IMH(A01), 1UZK(A01)), with Ala, Cys, Leu, Phe and Val highlighted.
Di values of all 173,536 CATH domains were computed in 28 h 5 min (about 0.58 s per domain, i.e. ~1.72 domains per second) on a computer with a 6-core 990X CPU.

Dual cores for faster folding?
CATH, PDB ID 2ZU4: domain 2.40.10.10 (residues 1–197) and domain 1.10.1840.10 (residues 198–306).

CONCLUSIONS
Databanks + new tools (SADIC) = new insights?
Protein fold barcoding: CATH-ADAPT…

Acknowledgements
Davide Alocci, Andrea Bernini, Edoardo Morandi, Andrea Santarelli, Ottavia Spiga

Bioinformatics teaching at UNISI
Bachelor's level (Laurea Triennale)
Biological Sciences: Bioinformatics (lecturer: Claudia Landi)
Biotechnology: introductory bioinformatics within the Biochemistry, Molecular Biology and Genetics courses
Master's level (Laurea Magistrale)
Chemical Sciences: Protein Chemistry – Structural Genomics (lecturer: Neri Niccolai)
Molecular and Cellular Biology: 3D modelling of cellular components
Doctoral level
Computer and Automation Engineering (Ingegneria Informatica e dell'Automazione): Bioinformatics (lecturer: Monica Bianchini)
Computer and Automation Engineering: Models and Languages for Bioinformatics (lecturer: Moreno Falaschi)
Doctoral School in Information Engineering and Science (Scuola di Dottorato in Ingegneria e Scienza dell'Informazione)
In preparation: a Bioinformatics curriculum within the Master's degree in Computer and Automation Engineering, run jointly with the Universities of Leiden and Delft (Netherlands).