Protein–Ligand Interactions: "Docking", Affinity Calculation and Development of New Drugs

Bioinformatics methods for studying protein–ligand interactions
• Determining the structure of the protein–ligand complex ("docking")
• Determining the affinity of a ligand
• Searching for possible inhibitors
• Designing new drugs

The molecular docking problem
Given two molecules with 3D conformations in atomic detail:
• Do the molecules bind to each other?
• What does the molecule–molecule complex look like?
• How strong is the binding affinity?

What do we dock?
The two molecules might be:
• a protein (enzyme, receptor) and a small molecule (substrate, ligand);
• a protein and a DNA molecule;
• two proteins.

Why do we dock?
• Drug discovery costs are too high: ~$800 million, 8–14 years, ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004).
• Drugs interact with their receptors in a highly specific and complementary manner.
• Docking is the core of target-based, structure-based drug design (SBDD) for lead generation and optimization.
A lead is a compound that shows biological activity, is novel, and has the potential to be structurally modified for improved bioactivity and selectivity.

Rational approach to drug discovery (flowchart)
• Identify target
• Clone gene encoding target
• Express target in recombinant form
• Screen recombinant target with available inhibitors
• Identify lead compounds
• Crystal structures of target and target/inhibitor complexes (active site, lead)
• Synthesize modifications of lead compounds (repeated as the lead-optimization cycle)
• Toxicity & pharmacokinetic studies
• Preclinical trials

Motivation for protein–protein docking
• Biological activity depends on the specific recognition of proteins.
• Understand protein interaction networks in a cell.
• Gain insight into the thermodynamics of molecular recognition.
• The experimental determination of protein–protein complex structures remains difficult.
Three components of docking
• Before and/or during docking: representation of the receptor binding site and of the ligand
• During docking: sampling of the configuration space of the ligand–receptor complex
• During docking and scoring: evaluation of the ligand–receptor interactions

Bringing a new drug to market (timeline figure, 0 to 16 years)
• Discovery and preclinical testing: compounds are identified and evaluated in laboratory and animal studies for safety, biological activity and formulation (5,000 compounds evaluated).
• 5 compounds enter clinical trials.
• Phase I: evaluates safety and dosage in 20 to 100 healthy human volunteers.
• Phase II: assesses effectiveness and looks for side effects in 100 to 500 patient volunteers.
• Phase III: confirms effectiveness and monitors adverse reactions from long-term use in 1,000 to 5,000 patient volunteers.
• Review and approval by the Food & Drug Administration: 1 compound approved.

Difference between an inhibitor and a drug
Extra requirements of a drug compared to an inhibitor:
• Selectivity
• Lower toxicity
• Bioavailability
• Slow clearance
• Reaching the target
• Ease of synthesis
• Low price
• Slow or no development of resistance
• Stability upon storage as tablet or solution
• Pharmacokinetic parameters
• No allergies

Lipinski's rule of five
Poor absorption or permeation is more likely when:
• there are more than five H-bond donors;
• the molecular weight is over 500 Da;
• MLogP is over 4.15 (or CLogP > 5);
• the sum of N's and O's is over 10.

Thermodynamics of enzyme–inhibitor complexes
$$E + I \rightleftharpoons EI, \qquad K_d = \frac{[E][I]}{[EI]}$$
$$\Delta G_{bind} = RT \ln K_d = \Delta H_{bind} - T\Delta S_{bind} < 0$$
• ΔH < 0: strength of the E–I interactions
• −TΔS > 0: loss of conformational disorder

Free energy calculations
$$\langle \Delta G_{bind,sol} \rangle = \langle \Delta H_{bind,vac} \rangle - T\Delta S_{bind,vac} - \langle \Delta G_{solv}^{E} \rangle - \langle \Delta G_{solv}^{I} \rangle + \langle \Delta G_{solv}^{EI} \rangle$$

Basic principles
The association of molecules is based on interactions:
• H-bonds
• salt bridges
• hydrophobic contacts
• electrostatic interactions
• very strong repulsive (van der Waals) interactions at short distances
Ligands are flexible; receptors are mostly rigid.
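The rule of five above can be applied as a simple filter on candidate compounds. The sketch below is minimal. It assumes the four descriptors (H-bond donors, molecular weight, MLogP, number of N and O atoms) have already been computed elsewhere. The `Descriptors` class, the `rule_of_five_violations` function and the example values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    h_bond_donors: int
    mol_weight: float   # Da
    mlogp: float        # Moriguchi logP
    n_plus_o: int       # number of N and O atoms


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five criteria that the compound violates."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.mol_weight > 500:
        violations.append("molecular weight over 500 Da")
    if d.mlogp > 4.15:
        violations.append("MLogP over 4.15")
    if d.n_plus_o > 10:
        violations.append("more than 10 N + O atoms")
    return violations


if __name__ == "__main__":
    # Made-up descriptor values, not measurements of a real compound.
    candidate = Descriptors(h_bond_donors=2, mol_weight=420.0, mlogp=3.2, n_plus_o=8)
    print(rule_of_five_violations(candidate))
```

In Lipinski's original formulation, poor absorption is flagged when two or more criteria are out of range. For that reason the function returns the list of violations rather than a single pass/fail flag.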
Sampling of configuration space of the ligand–receptor complex
• Descriptor matching: pattern-recognizing geometric methods match ligand and receptor-site descriptors. These are geometric, chemical and pharmacophore properties such as distance pairs, triplets, volumes, vectors, and hydrogen-bonding, hydrophobic and charged groups.
• Molecular simulation: MD (molecular dynamics), MC (Monte Carlo).
• Others: GA (genetic algorithms), similarity-based and fragment-based methods.
There is no "best" method.

Molecular simulation: MD & MC
Two major components:
• the description of the degrees of freedom;
• the energy evaluation.
The local movement of the atoms is performed:
• according to the forces present at each step in MD (molecular dynamics);
• randomly in MC (Monte Carlo).
Both are usually time-consuming:
• the search proceeds from a starting orientation to a low-energy configuration;
• several simulations with different starting orientations must be performed to get a statistically significant result.

Genetic algorithms
Consider a problem whose solution depends on n parameters, each of which can take k values: a complete exploration requires $k^n$ operations. Suppose, however, that we know how the system can evolve toward the result, because we have a training set or know the rules. Then we know that some steps are impossible or have never been observed, and that some paths are preferred over others. The algorithm can be modelled to respect the observed patterns, with a FITNESS (a measure of reliability) computed at each step. Within a certain number of cycles, this yields a result whose fitness is optimal for my expectations:
• I can simulate crossing over between two sequences, since I know how crossing over occurs;
• I can simulate mutagenesis, since I know the mutation frequencies and the mutagenic events that occur.

(Figure: genetic algorithm flowchart. Coarse search step: generation, then fitness evaluation, then selection of the chromosome with the highest fitness, then mutation and crossing over; the new chromosomes replace the previous ones. Fine search step, then stop.)

Genetic algorithm docking
• Requires the generation of an initial population, whereas conventional MC and MD require a single starting structure in their standard implementation.
• The collection of genes (chromosome) is assigned a fitness based on a scoring function.
• There are three genetic operators:
  – mutation randomly changes the value of a gene;
  – crossover exchanges a set of genes from one parent chromosome to another;
  – migration moves individual genes from one subpopulation to another.

How to combine conformational search and affinity
• Several search algorithms exist: MC, FFT, GA, etc.
• Docking can be rigid or flexible.
• Affinity is estimated with a "scoring" function that estimates ΔG.
• The best-scoring result is not always the correct one.

Fast Fourier Transform (FFT)
(Figure: ligand L and receptor R grids, with surface (Y) and interior (X) regions, are transformed by FFT, correlated, and transformed back by IFFT to locate the binding site. Speed increase: 10^7.)

Using grids: FFT (figure: ligand, protein)

Programs:
• AutoDock: http://autodock.scripps.edu/
• UCSF-Dock: http://dock.compbio.ucsf.edu/

Example 1: design and evaluation of a new inhibitor of the C-Kit kinase
What is known about C-Kit:
• The C-Kit kinase is a therapeutic target in the treatment of gastrointestinal tumors.
• Imatinib (Gleevec) is a potent inhibitor of C-Kit.
• The C-Kit mutation D816V confers resistance to imatinib but keeps the kinase active.
How does imatinib work?
• It induces the activation loop to adopt an inactive conformation.
(Figure: inactive C-Kit; complex with imatinib; resistant mutant!)
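The C-Kit example compares inhibitors through their dissociation constants. The relation ΔG_bind = RT ln Kd from the thermodynamics section converts a Kd into a binding free energy and back. The sketch below is minimal. It assumes T = 298.15 K and a Kd expressed in mol/L relative to a 1 M standard state. The helper names `kd_to_dg` and `dg_to_kd` are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar: float, temperature: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from Kd in mol/L: dG_bind = RT ln(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


def dg_to_kd(dg_kcal: float, temperature: float = 298.15) -> float:
    """Inverse of kd_to_dg: Kd (mol/L) from dG_bind (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


if __name__ == "__main__":
    for kd in (1e-9, 1e-6):  # 1 nM and 1 uM
        print(f"Kd = {kd:.0e} M -> dG = {kd_to_dg(kd):.2f} kcal/mol")
```

At 298 K, each tenfold decrease in Kd makes ΔG more negative by RT ln 10, about 1.36 kcal/mol.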
(Figure: MD simulations of C-Kit and C-Kit D816V with and without imatinib; D816V mutant vs. wild-type.)

Affinity calculation for imatinib with wild-type C-Kit and the D816V mutant
Complex               Kd      ΔH       −TΔS    ΔG
wild-type+imatinib    21 nM   -57.69   40.70   -17.0
D816V+imatinib        12 μM   -59.15   54.69    -4.5
(ΔH, −TΔS and ΔG in kcal/mol; ΔG = ΔH + (−TΔS))
The problem lies in the entropic contribution: the −TΔS penalty is much larger for D816V+imatinib (54.69 vs. 40.70 kcal/mol).

Redesigning the inhibitor
A chlorine atom is introduced to destabilize F811 and increase the entropy of the complex, thereby increasing the affinity. (Figure: D816V mutant vs. wild-type.)

Results
Complex                Kd      ΔH       −TΔS    ΔG
D816V+prototype        22 nM   -60.17   45.29   -14.9
wild-type+prototype    47 nM   -57.25   39.02   -18.2
wild-type+imatinib     21 nM   -57.69   40.70   -17.0
D816V+imatinib         12 μM   -59.15   54.69    -4.5
(ΔH, −TΔS and ΔG in kcal/mol)
The prototype, Imatinib-Cl, is able to inhibit both wild-type C-Kit and the D816V mutant while retaining specificity for C-Kit.

Types of scoring functions
• Force-field based: nonbonded interaction terms as the score, sometimes in combination with solvation terms.
• Empirical: multivariate regression fits the coefficients of physically motivated structural functions, using a training set of ligand–receptor complexes with measured binding affinity.
• Knowledge-based: statistical atom-pair potentials derived from structural databases as the score.
• Other: scores and/or filters based on chemical properties, pharmacophore, contacts, shape complementarity.
• Consensus scoring approach.

Force-field-based scoring functions
$$E = \sum_{i=1}^{lig} \sum_{j=1}^{rec} \left( \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332\,\frac{q_i q_j}{D\, r_{ij}} \right)$$
The terms are defined as follows:
• r_ij is the distance between ligand atom i and receptor atom j.
• A_ij and B_ij are van der Waals parameters with exponents a and b.
• q_i and q_j are partial charges.
• D is the dielectric constant.
• 332 converts the electrostatic term to kcal/mol for charges in e and distances in Å.
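The force-field score above is a double sum over ligand–receptor atom pairs. The sketch below is minimal and rests on three assumptions:
• a 12-6 Lennard-Jones form (a = 12, b = 6);
• a constant dielectric D, whose default of 4.0 is only illustrative;
• per-pair A_ij and B_ij coefficients supplied by the caller.
`ff_score` is a hypothetical name, and this is not the CHARMm/CDOCKER implementation.

```python
import math

COULOMB_KCAL = 332.0  # converts q_i*q_j/r (e^2/Angstrom) to kcal/mol


def ff_score(lig_xyz, lig_q, rec_xyz, rec_q, A, B, a=12, b=6, D=4.0):
    """Force-field interaction energy (kcal/mol) between ligand and receptor.

    lig_xyz, rec_xyz: sequences of (x, y, z) coordinates in Angstrom
    lig_q, rec_q:     partial charges in units of e
    A, B:             A[i][j], B[i][j] repulsive/attractive coefficients
    D:                dielectric constant, assumed constant here
    Coincident atoms (r = 0) would raise ZeroDivisionError.
    """
    energy = 0.0
    for i, (xi, qi) in enumerate(zip(lig_xyz, lig_q)):
        for j, (xj, qj) in enumerate(zip(rec_xyz, rec_q)):
            r = math.dist(xi, xj)
            energy += (A[i][j] / r**a              # short-range repulsion
                       - B[i][j] / r**b            # dispersion attraction
                       + COULOMB_KCAL * qi * qj / (D * r))  # electrostatics
    return energy


if __name__ == "__main__":
    # One ligand atom and one receptor atom 4 Angstrom apart, opposite charges.
    e = ff_score([(0, 0, 0)], [0.5], [(4, 0, 0)], [-0.5],
                 A=[[1.0e5]], B=[[100.0]])
    print(f"{e:.3f} kcal/mol")
```

This direct double loop costs O(N_lig × N_rec) per pose. Docking programs typically precompute the receptor contribution on a grid to make repeated evaluations fast.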
An example of this approach is the CHARMm force field as used in CDOCKER.
Advantages:
• force-field terms are well studied and have some physical basis;
• transferable, and fast when used on a precomputed grid.
Disadvantages:
• only part of the relevant energies is included, i.e. potential energies, sometimes enhanced by solvation or entropy terms;
• electrostatics is often overestimated, leading to systematic problems in ranking complexes.

Empirical scoring functions
$$\Delta G = \Delta G_0 + \Delta G_{rot} N_{rot} + \Delta G_{HB} \sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{\text{aro. int.}} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{\text{lipo. cont.}} f(\Delta R, \Delta\alpha)$$
Here f(ΔR, Δα) penalizes deviations of each interaction from its ideal distance and angle.
Examples: LUDI, PLP, LigScore, Jain.
These functions count the interactions and assign a score based on the number of occurrences of:
• H-bonds and ionic interactions (easy to quantify);
• hydrophobic interactions (more difficult to assess and quantify);
• rotatable bonds frozen on binding (linked to the entropic cost of binding, quite difficult to estimate).
Advantages: fast and direct estimation of binding affinity.

Knowledge-based potentials of mean force (PMF) scoring functions
Assumption: an observed crystallographic complex represents the optimal placement of the ligand atoms relative to the receptor atoms.
Advantages: similar to empirical functions but more general (much more distance data is available than binding-energy data).
Disadvantages: PMFs are typically pairwise, whereas the probability of finding atoms A and B at a distance r is not pairwise and also depends on the surrounding atoms.

Consensus scoring
• Combination of several scoring functions.
• Poses ranked highly by several functions get a better consensus rank than single outliers.
• False positives can be detected more easily than with a single scoring function.
• It is advisable to use 2–4 well-suited scoring functions for the consensus score.

Take home message
• There is no best method!
• Try different methods, force fields and scoring functions.
• Treat your results as suggestions.
• Use the experimental data.

Some docking programs and dedicated web servers
• ClusPro: http://nrc.bu.edu/cluster/
• Z-Dock: http://zlab.bu.edu/zdock/ (can be downloaded and run on your own computer)
• Z-Dock server: http://zdock.bu.edu/
• PatchDock: http://bioinfo3d.cs.tau.ac.il/PatchDock/
• GRAMM-X: http://vakser.bioinformatics.ku.edu/resources/gramm/grammx
• Hex server: http://www.csd.abdn.ac.uk/hex_server/
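The consensus-scoring idea and the advice to try several scoring functions can be combined in a simple rank-averaging scheme. The sketch below is minimal. It assumes every scoring function reports "lower = better" and that averaging ranks is an acceptable way to combine them. Other schemes, such as counting how often a pose lands in each function's top fraction, are also used. `ranks`, `consensus_rank` and the toy scores are hypothetical.

```python
from statistics import mean


def ranks(scores):
    """Rank positions (1 = best), assuming lower score is better.
    Tied scores get consecutive ranks in their original order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    r = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        r[idx] = position
    return r


def consensus_rank(score_table):
    """score_table maps scoring-function name -> list of scores (one per pose).
    Returns the mean rank of each pose across all scoring functions."""
    per_function = [ranks(s) for s in score_table.values()]
    n_poses = len(per_function[0])
    return [mean(r[i] for r in per_function) for i in range(n_poses)]


if __name__ == "__main__":
    # Toy scores for 4 poses from 3 hypothetical scoring functions.
    table = {
        "ff":        [-42.0, -38.5, -45.1, -30.2],
        "empirical": [-8.1, -9.0, -7.2, -6.5],
        "pmf":       [-120.0, -115.0, -130.0, -90.0],
    }
    cons = consensus_rank(table)
    best_first = sorted(range(len(cons)), key=lambda i: cons[i])
    print("consensus ranks:", cons)
    print("poses, best first:", best_first)
```

In the toy data, pose 2 is ranked first by two functions but only third by the empirical one, and averaging still places it first. Pose 3, last everywhere, stays last.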