The algorithmic warp: some algorithmic problems that have driven scientific progress

Alberto Policriti
Dipartimento di Matematica e Informatica, Università di Udine
www.dimi.uniud.it/~policrit

What we will talk about
• Classes of problems (specific problems require a technical treatment)
• "Significant" problems (those that tie algorithmics to other disciplines)
• Complexity (because, in the end, it is the real problem of algorithmics)

Which problems
• The decision problem (Entscheidungsproblem): the past
• Algorithmic problems in computational biology: the present
• A reflection on the notion of complexity: the future

Main sources
• M. Davis, "The Universal Computer: The Road from Leibniz to Turing"
• S. Feferman, "In the Light of Logic"
• E. D. Green, "Strategies for the systematic sequencing of complex genomes"
• D. Knuth, "Papers on the foundation of Computer Science"

The decision problem
Find an algorithm that decides whether a formula of first-order logic is satisfiable.

"In a sense it [the decision problem] is the most general problem of mathematics."
J. Herbrand

First-order logic
Example:
$\forall x\,\forall y\,\forall u\,\big((P(x) \land P(y) \land R(x,u)) \rightarrow \exists v\,(R(y,v) \land S(u,v))\big)$
Reading P as "is a woman", R as "is happy with" and S as "are friends": if x and y are women and x is happy with u, then there exists v such that y is happy with v, and u and v are friends.
Reading P as "is a point", R as "lies on the line" and S as "are parallel": if x and y are points and x lies on the line u, then there exists v such that y lies on the line v, and u and v are parallel.

Example: an algorithm solving the decision problem could tell us whether the Riemann hypothesis (part of Hilbert's eighth problem) is true or false!

David Hilbert
Born: 23 Jan 1862 in Königsberg, Prussia (now Kaliningrad, Russia)
Died: 14 Feb 1943 in Göttingen, Germany
D. Hilbert in 1937

The Entscheidungsproblem is solved when we know a procedure that allows for any given logical expression to decide by finitely many operations its validity or satisfiability. [...] The Entscheidungsproblem must be considered the main problem of mathematical logic.
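To make "satisfiable" concrete, here is a minimal Python sketch (an illustration, not part of the original talk). It evaluates formulas over finite interpretations of the predicate symbols P, R, S of the example and searches all interpretations on domains of size 1, 2, ... up to a bound. Such a search finds every finite model, but it is not a decision procedure: the second formula (R irreflexive, transitive, and with no maximal element) is satisfied by < on the natural numbers, yet it has no finite model, so an unbounded search would run forever on it. All function names are hypothetical.

```python
from itertools import product


def relations(domain, arity):
    """Yield every subset of domain**arity as a frozenset of tuples."""
    tuples = list(product(domain, repeat=arity))
    for mask in range(2 ** len(tuples)):
        yield frozenset(t for i, t in enumerate(tuples) if mask >> i & 1)


def example_formula(d, P, R, S):
    """forall x,y,u ((P(x) and P(y) and R(x,u)) -> exists v (R(y,v) and S(u,v)))"""
    return all(
        not ((x,) in P and (y,) in P and (x, u) in R)
        or any((y, v) in R and (u, v) in S for v in d)
        for x in d for y in d for u in d
    )


def no_finite_model(d, P, R, S):
    """R is irreflexive, transitive, and every element has an R-successor."""
    irreflexive = all((x, x) not in R for x in d)
    transitive = all((x, z) in R
                     for x in d for y in d for z in d
                     if (x, y) in R and (y, z) in R)
    serial = all(any((x, y) in R for y in d) for x in d)
    return irreflexive and transitive and serial


def find_finite_model(formula, max_size):
    """Brute-force search for P (unary), R and S (binary) making formula true."""
    for n in range(1, max_size + 1):
        d = range(n)
        for P in relations(d, 1):
            for R in relations(d, 2):
                for S in relations(d, 2):
                    if formula(d, P, R, S):
                        return n, P, R, S
    return None  # no model with at most max_size elements


print(find_finite_model(example_formula, 2))  # a (trivial) one-element model
print(find_finite_model(no_finite_model, 2))  # None, and so for every bound
```

Raising the bound only makes the search slower (there are $2^{n + 2n^2}$ interpretations on a domain of size n); it never turns it into an algorithm for the decision problem.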
Principles of Mathematical Logic, D. Hilbert and W. Ackermann, 1928

Hilbert knew how to pose problems!

The mathematicians present at an international conference in Paris in August 1900 inevitably wondered what the new century would bring to their subject. [...] he presented, as a challenge to the mathematicians of the twentieth century, 23 problems that seemed utterly inaccessible by the methods available at the time.
The Universal Computer, M. Davis

In his work, Hilbert demonstrated an unusual combination of direct intuition and concern for absolute rigor. With exceptional technical power at his command, he would tackle outstanding problems, usually with a great originality of approach. The title of Hilbert's lecture in Paris was simply, "Mathematical problems".
Deciding the undecidable: Wrestling with Hilbert's problems, S. Feferman

The great importance of definite problems for the progress of mathematical science in general ... is undeniable. ... [for] as long as a branch of knowledge supplies a surplus of such problems, it maintains its vitality. ... every mathematician certainly shares ... the conviction that every mathematical problem is necessarily capable of strict resolution ... we hear within ourselves the constant cry: There is the problem, seek the solution. You can find it through pure thought...
D. Hilbert

The solutions of three of Hilbert's problems were to involve mathematical logic and the foundation of mathematics in an essential way; they are the ones numbered 1, 2, and 10 in his list.
Deciding the undecidable: Wrestling with Hilbert's problems, S. Feferman

1. The continuum hypothesis
2. The consistency of arithmetic
10. The existence of an algorithm for solving Diophantine equations
We will not discuss 1.; 2. is the link with the decision problem.

Hilbert's tenth problem
Diophantine equations: $P(x_1, \ldots, x_k) = 0$, with $P$ a polynomial with integer coefficients.
Example: it is possible to write a Diophantine equation that has integer solutions if and only if the Riemann hypothesis is false.

Contrary to Hilbert's expectations, Problem 10 was eventually solved in the negative. This was accomplished in 1970 by a young Russian mathematician, Yuri Matiyasevich, who built on earlier work in the 1950's and 1960's by the American logicians Martin Davis, Hilary Putnam, and Julia Robinson. [...]
Deciding the undecidable: Wrestling with Hilbert's problems, S. Feferman

As early as 1920 it was suspected that problems like the previous one were undecidable. But how can one prove that an algorithm does not exist?

The solution of the second problem: the Königsberg symposium of 1930
During the days immediately preceding Hilbert's address, a symposium on the foundations of mathematics took place in Königsberg. [...] At the round table discussion that concluded the event, a shy young man named Kurt Gödel [...] made a quiet announcement that, to those who grasped its import, signalled a new era in foundational studies. Von Neumann got the point at once, and concluded that the jig was up, that Hilbert's program could not succeed.
The Universal Computer, M. Davis

Hilbert's program
1. The consistency of arithmetic (Hilbert's second problem)
2. The completeness of logic and of arithmetic (Gödel 1928)
3. The decision problem (Entscheidungsproblem)

Kurt Gödel
Born: 28 April 1906 in Brünn, Austria-Hungary (now Brno, Czech Republic)
Died: 14 Jan 1978 in Princeton, New Jersey, USA

The crucial step in Gödel's proof was his demonstration that the property of a natural number of being the code of a proposition provable in PM is itself expressible in PM. [...]
- U says that some particular proposition is not provable in PM.
- That particular proposition is none other than U itself.
- Therefore, U says: "U is not provable in PM."
The Universal Computer, M. Davis

Gödel had written the first compiler and ... decreed the end of Hilbert's program!

What remains of Hilbert's program?
Hilbert had also sought explicit calculational procedures by means of which it would always be possible to determine, given some premises and a proposed conclusion, written in the notation of what has come to be called "first-order logic", whether Frege's rules would enable that conclusion to be derived from those premises. The task of finding such procedures came to be known as Hilbert's Entscheidungsproblem (literally: decision problem).
The Universal Computer, M. Davis

There were partial results, and the great young mathematicians were all at work on it: F. P. Ramsey, W. Ackermann, P. Bernays, M. Schönfinkel and Gödel himself.

Apparently intrigued by these developments, Newman gave a lecture course in the spring term of 1935 on the foundations of mathematics featuring Gödel's incompleteness theorem as its climax. Attending this course, Turing learned about Hilbert's Entscheidungsproblem. Quite apart from the incredulity of such as Hardy, after Gödel's work it was hard to believe that there could be an algorithm such as Hilbert had wanted. Alan Turing began to think about how it could be possible to prove that no such algorithm exists.
The Universal Computer, M. Davis

Now, if someone comes along with a proposed algorithm to settle a given decision problem in a positive way, one can check to see that it does the required work (or at least try to do so), without inquiring into the general nature of what constitutes an algorithm. But if it is to be shown that the problem is undecidable, one has to have a precise explanation of what algorithms can compute in general.
Deciding the undecidable: Wrestling with Hilbert's problems, S. Feferman

Alan Turing
http://www.turing.org.uk/turing/
His high pitched voice already stood out above the general murmur of well-behaved junior executives grooming themselves for promotion within the Bell corporation.
Then he was suddenly heard to say: "No, I'm not interested in developing a powerful brain. All I'm after is just a mediocre brain, something like the President of the American Telephone and Telegraph Company."
Quoted in A. Hodges, Alan Turing: The Enigma of Intelligence (London, 1983), p. 251.

[...] on the basis of Turing's analysis of the notion of computation, it is possible to conclude that anything computable by any algorithmic process can be computed by a Turing machine. So if we can prove that some particular task can not be accomplished by a Turing machine, we can conclude that no algorithmic process can accomplish that task. That is how Turing proved that there is no algorithm for the Entscheidungsproblem. In addition, Turing showed how to produce one individual Turing machine that, all by itself, can do anything that could be done by any Turing machine whatever: a mathematical model of an all-purpose computer.
The Universal Computer, M. Davis

The diagonal method in Turing's work
Now, if we think of the halting set of a Turing machine as constituting a "package" and of the code number of that machine as labeling that package, then we have exactly the typical setup for applying the diagonal method: labeled packages in which the labels are exactly the kind of thing in the packages, in this case, natural numbers.
The Universal Computer, M. Davis

Turing's universal machine
The universal machine also provides a model of a "stored program" computer [...] in which the machine makes no fundamental distinction between "program" and "data." Finally, the universal machine shows how "hardware" [...] thought of as a description of the functioning of a mechanism, can be replaced by equivalent "software" [...] "stored" on the tape of a universal machine.
The Universal Computer, M. Davis

On computable numbers, with an application to the Entscheidungsproblem
A. Turing, Proceedings of the London Mathematical Society, 1937

Turing's universal computer was a marvelous conceptual device that all by itself could execute any algorithmic task. But could one actually build such a thing? And aside from what such a machine could accomplish "in principle," could it be designed and constructed so as to be able to solve real world problems in an acceptable time frame, and using reasonable available resources? By the end of 1945, Turing had produced his remarkable ACE (Automatic Computing Engine) Report. One detailed comparison of the ACE Report with von Neumann's EDVAC Report notes that whereas the latter "is a draft and is unfinished … more important … is incomplete …" the ACE Report "is a complete description of a computer, right down to the logical circuit diagrams" and even including "a cost estimate of £11,200."
The Universal Computer, M. Davis

ACE: Turing's (English) answer to EDVAC
[It] is … very contrary to the line of development here, and much more in the American tradition of solving one's difficulties by means of much equipment rather than by thought. … Furthermore certain operations which we regard as more fundamental than addition and multiplication have been omitted.
Alan Turing

Algorithmic problems in computational biology
Astronomy began when the Babylonians mapped the heavens. Our descendants will certainly not say that biology began with today's genome projects, but they may well recognize that a great acceleration in the accumulation of biological knowledge began in our era. To make sense of this knowledge is a challenge, and will require increased understanding of the biology of cells and organisms. But part of the challenge is simply to organise, classify and parse the immense richness of sequence data.
Biological sequence analysis, R. Durbin, S. Eddy, A. Krogh and G. Mitchison

A bit of history
• 1953: F. Crick and J. Watson discover the double-helix structure of DNA
• 1970s: techniques for sequencing DNA fragments are developed (F. Sanger)
• 1980s: the genome project is launched and the first pilot experiments start (together with the first companies for the commercial exploitation of this research)
• 1990s: the first organisms are sequenced (a few million base pairs)
• 1990: BLAST is published
• 1998: C. Venter announces the founding of the private company Celera and challenges the public consortium for the sequencing of the human genome: Celera will reach the goal in 3 years (and 300 million dollars)
http://www.accessexcellence.org/AB/

Human Genome Working Draft Sequence published February 15 & 16, 2001, Science and Nature

Behind the challenge: two main shotgun-sequencing strategies
• Clone-by-clone shotgun sequencing
• Whole-genome shotgun sequencing

Programs and algorithms in bioinformatics
[...] Yet other programs provide user-friendly viewers for inspection and editing of the resulting sequence assemblies. A particularly popular suite of programs for these various steps is Phred, Phrap and Consed, which are designed for base calling, sequence assembly and the viewing of sequence assemblies, respectively. [...]
Strategies for the systematic sequencing of complex genomes, Eric D. Green
(21 occurrences of the word "programs", 2 of the word "algorithms")

Programs and algorithms in the challenge
Finally, perhaps the most essential element of any whole-genome shotgun-sequencing strategy is the availability of a robust assembly program that can accommodate the inevitably large collection of sequence reads. [...] include algorithms that account for the anticipated spatial relationship of read pairs emanating from individual subclones, which help to avoid misassemblies due to repetitive sequences.
Strategies for the systematic sequencing of complex genomes, Eric D. Green

How did the challenge end?
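To give a feel for what an assembly program does, here is a toy greedy assembler in Python: it repeatedly merges the two fragments with the longest suffix/prefix overlap. This is a textbook simplification chosen for illustration, not the algorithm of Phrap or of any production assembler; it ignores sequencing errors and read pairs, and repeats longer than the overlaps can fool it, which is exactly why the read-pair information mentioned by Green matters. The reads are hypothetical, error-free windows of a short sequence.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Greedily merge the pair of fragments with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads entirely contained in another read
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlap left: the result is several contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


genome = "GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT"
# hypothetical error-free reads: windows of 12 bases every 5 positions,
# plus the last 12 bases so that the end of the sequence is covered
reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 5)] + [genome[-12:]]
print(greedy_assemble(reads))
```

Here no 7-base substring occurs twice in the sequence, so every merge the greedy rule picks is a true one and a single contig equal to the sequence comes back; with longer repeats the same rule can glue the wrong fragments together.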
Sequence alignment
Among the most useful computer-based tools in modern biology are those that involve sequence alignments of proteins, since these alignments often provide insights into gene and protein function. There are several types of alignments: global alignments of pairs of proteins, multiple alignments of members of protein families, and alignments made during data base searches to detect homologies.
S. Henikoff and J. G. Henikoff, PNAS 1992

What is an alignment?
Input:
GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT
GTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT
Output:
GTTGAT_TAGCTTATCCCAAAGCAAGGCACTGAAAATG_CTAGAT
GT_GATGTAGCTTAACCCAA_GCAAGGCACTAAAAATGCCTAGAT

Algorithms
• Needleman-Wunsch 1970
• Smith-Waterman 1981
• Landau-Vishkin 1986
• Wu-Manber 1992
• Myers 1994
• Chang-Lawler 1994
• ...

[Figure: the edit-distance dynamic-programming table for the prefixes GTTGATTAGCTTA and GTGATGTA of the two input sequences, with the alignment above shown again beneath it]

Other related algorithmic problems
• exact matching (an "older" and perhaps less "applied" problem, whose solution algorithms turned out to be fundamental)
• data structures (it pays to represent sequences in memory not as strings but as systems of indexes over all the suffixes of the sequence)
• protein folding (a nice NP-complete problem that biologists gave us)
• ...

Closing reflections
• The decision problem may have been difficult, but it was stated clearly and precisely. Mathematically "clean".
• Algorithmic problems in computational biology are not always as "clean" (perhaps the more interesting they are, the "dirtier" they are).
• What does the complexity of an algorithmic problem really consist of?
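Returning to the alignment example: the dynamic-programming table in the figure can be reproduced with a minimal Python sketch of global alignment in the Needleman-Wunsch style. It assumes unit costs (1 per mismatch or gap, i.e. edit distance) instead of the biologically motivated scores used in practice, such as the BLOSUM substitution matrices introduced by Henikoff and Henikoff; the function name and the gap symbol "_" are choices made for this illustration.

```python
def align(a, b, gap="_"):
    """Global alignment with unit costs (edit distance) plus one optimal traceback."""
    n, m = len(a), len(b)
    # D[i][j] = minimum number of edits turning a[:i] into b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # match/mismatch
                          D[i - 1][j] + 1,                           # gap in b
                          D[i][j - 1] + 1)                           # gap in a
    # walk back from D[n][m] to D[0][0], rebuilding the two aligned rows
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            top.append(a[i - 1])
            bottom.append(gap)
            i -= 1
        else:
            top.append(gap)
            bottom.append(b[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


s1 = "GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT"
s2 = "GTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT"
dist, row1, row2 = align(s1, s2)
print(dist)
print(row1)
print(row2)
```

The printed alignment may differ from the one on the slide, since several alignments can share the optimal cost. Under unit costs the slide's alignment costs 6 (four gaps and two mismatches), so the printed distance is at most 6. Time and memory are proportional to the product of the two lengths, which is what the later algorithms in the list set out to improve.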
Complexity: the resources we have are finite
Mathematics and Computer Science: Coping with Finiteness
Advances in our ability to compute are bringing us substantially closer to ultimate limitations.
D. Knuth

What (computational) resources do we have?
[Figure: the universe, about 40 billion light-years across, compared with a proton, about $10^{-13}$ cm across]
$10^{125}$ is greater than or equal to the number of protons in the universe.
If we take as the unit of time the time light needs to travel $10^{-13}$ cm, and assume that the universe was born 10 billion years ago, the number of time units elapsed is at most $10^{42}$.

What "hopes" do we have?
• snail: 0.0006 miles/h
• man: 4 miles/h
• US auto: 55 miles/h
• jet: 600 miles/h
• supersonic jet: 1200 miles/h

• man (pencil): 0.2/sec
• man (abacus): 1/sec
• calculator: 4/sec
• computer: 200,000/sec
• fast computer: 2M/sec

Grid problem: compute the number of paths from start to finish that never pass through the same point twice
[Figure: a grid with "start" and "finish" at opposite corners]

The problem is hard
• there are no known methods to compute the number of paths (in a reasonable amount of time)
• we can nevertheless generate random paths and use a statistical theorem, which tells us that the average of the reciprocals of the observed path probabilities is an (unbiased) estimate of the number of paths
• we obtain a huge estimate: $(1.6 \pm 0.3) \cdot 10^{24}$

A problem that is simple (to state) and "clean", but ... we cannot even rely on an exhaustive procedure to enumerate the paths! Is the problem of establishing (any) property of the paths on the grid algorithmically tractable?
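The random-path estimate can be sketched in a few lines of Python. This is a minimal sketch of the idea (the same one Knuth used to estimate the size of backtrack trees), not Knuth's own program: a walk from start chooses uniformly among the unvisited neighbouring points; the reciprocal of the probability of the walk is the product of the numbers of choices made; a walk that gets stuck contributes 0; the average over many walks is an unbiased estimate of the number of paths. Here n counts points per side (so a 10 × 10 grid of squares is n = 11), and the function names are hypothetical. On tiny grids the estimate can be checked against exhaustive enumeration, which is exactly what becomes hopeless for n = 11.

```python
import random


def neighbors(p, n):
    """Grid points adjacent to p on an n x n grid of points."""
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < n and 0 <= y + dy < n:
            yield (x + dx, y + dy)


def exact_count(n):
    """Exhaustively count self-avoiding paths from (0,0) to (n-1,n-1)."""
    target, visited = (n - 1, n - 1), {(0, 0)}

    def dfs(p):
        if p == target:
            return 1
        total = 0
        for q in neighbors(p, n):
            if q not in visited:
                visited.add(q)
                total += dfs(q)
                visited.remove(q)
        return total

    return dfs((0, 0))


def estimate(n, trials, rng):
    """Average of 1/P(walk) over random self-avoiding walks (0 if stuck)."""
    target, total = (n - 1, n - 1), 0
    for _ in range(trials):
        p, visited, weight = (0, 0), {(0, 0)}, 1
        while p != target:
            choices = [q for q in neighbors(p, n) if q not in visited]
            if not choices:
                weight = 0  # dead end: this walk contributes nothing
                break
            weight *= len(choices)  # P(walk) is the product of 1/len(choices)
            p = rng.choice(choices)
            visited.add(p)
        total += weight
    return total / trials


rng = random.Random(2024)
print(exact_count(4), round(estimate(4, 20000, rng)))  # exact value vs. estimate
print(f"{estimate(11, 2000, rng):.2e}")  # a noisy estimate for a 10 x 10 grid
```

The estimate for n = 11 fluctuates considerably from run to run, because a few long, unlikely walks carry huge weights; this is why the result on the slide comes with an error bar.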
Perhaps we need a theory of algorithmic complexity that lets us classify this as a hard problem.

Conclusions
• Algorithmic problems form the backbone of computer science, and solving them requires a genuine and distinctive (mathematical) effort.
• Algorithmic problems have turned out to be "behind the scenes" at crucial moments of scientific progress.
• Complexity, together with an adequate theory for studying it, is probably the most interesting of today's algorithmic challenges.

My favorite way to describe computer science is to say that it is the study of algorithms.
D. Knuth