Theory of Combinatorics Codes on words Formal Languages Molecular Computing Formal Theory of Languages Codes THESIS On the power of classes of splicing systems Combinatorics Molecular on words Computing Dottoranda: Rosalba Zizza (XIII ciclo) Supervisori: Prof. Giancarlo Mauri Prof.ssa Clelia De Felice (Univ. di Salerno) Tom Head 1987 (Bull. of Math. Biology) “ Formal Language Theory and DNA: an analysis of the generative capacity of specific recombinant behaviors” SPLICING LINEARE CIRCOLARE Modelli non convenzionali di calcolo Una motivazione generale per lo splicing Gearchia di Chomsky RE mT CS LBA CF PDA REG DFA Splicing systems H(F1 , F2) ; C(F1 , F2 ) F1 , F2 {FIN , RE, CS, CF,REG} 1) Processo generativo del linguaggio 2) Prove di consistenza del sistema splicing LINEAR SPLICING restriction enzyme 2 restriction enzyme 1 ligase enzyme Linear splicing systems (A= finite alphabet, I A* initial language) Paun’s definition x u1u2 y, x u1 , u2 y Definitions SPA = (A, I, R) wu3u4 z R A* | A* $ A* | A* rules A* wu3 , u4 z r = u1 | u2 $ u3 | u4 R x u1 u4 z , wu3 u2 y Splicing language L(SPA) , H(F1, F2) Some known results [Head; Paun; Pixton; 1996-] • Fin H(Fin, Fin) Reg • Fin H(Fin, Reg) Re • H(Reg, Fin) = Reg Comparing the three definitions of (finite) splicing L(SH ) L(SPA ) L(SPI ) [P. Bonizzoni, C. Ferretti, G. Mauri, R.Z., Grammar Systems 2000, IPL ‘01] Problem (HEAD) Can we decide whether a regular language is generated by a finite splicing system? [P. Bonizzoni, R.Z., Tech Rep. 254-00 DSI, submitted] Theorem L regular language 0-generated L generated by finite splicing Monoide sintattico: Rappresentazione di L attraverso classi di congruenza Proprieta’ delle classi di congruenza... regole splicing CIRCULAR SPLICING restriction enzyme 2 restriction enzyme 1 ligase enzyme Circular languages: definitions and examples w, w A*, w • Conjugacy relation on A* Example abaa, baaa, aaab,aaba • A~ = A* ~ = set of all circular words • Circular language C A~ L L (A linearization of C, i.e. ~L =C) {w A*| ~w C}= Lin(C) (Full linearization of C) w=xy, w = yx are conjugate ~w = [w] , w A* ~ set of equivalence classes A* ~ w A* ~ ~L = {~w | w L} (circularization of L) C C Il nostro “approccio”... Linguaggi Circolari Regolari Linguaggi Formali chiusi sotto coniugazione Regolari Circular splicing systems (A= finite alphabet, I A~ initial language) Paun’s definition ~hu u , 1 2 u2 hu1 SCPA = (A, I, R) A~ ~ku u 3 4 R A* | A* $ A* | A* rules r = u1 | u2 $ u3 | u4 R u4ku3 Definition ~ u hu u ku 2 1 4 3 Splicing language C(SCPA) In the literature... Other models, additional hypothesis (on R) Other definitions of circular splicing (Head, Pixton) Problem 1 Structure of circular regular languages (regular languages closed under conjugacy relation) Problem 2 Characterize circular regular languages generated by finite circular splicing Circular finite splicing languages and Chomsky hierarchy CS~ CF~ Reg~ ~((aa)*b) ~(aa)* I= ~aa ~1, R={aa | 1 $ 1 | aa} ~(an bn) I= ~ab ~1, R={a | b $ b | a} Contributions [P. Bonizzoni, C. De Felice, G. Mauri, R.Z., Words99, DNA6 (2000), submitted] -Reg~ C(Fin, Fin) Fingerprint closed star languages X*, X regular group code Reg~ cyclic languages Cir (X*) X finite weak cyclic, altri esempi ~ (a*ba*)* -Comparing the three def. of circular splicing systems C(SCH ) C(SCPA ) C(SCPI ) L A* star language = L regular, closed under conjugacy relation, L=X*, with X regular Why studying star languages? Proposition Given SCPA=(A,I,R), if I ~ X*, ~ X* star language then the circular language generated by SCPA is “Consistence” easily follows!!! The unique problem is the generation of all words of the language ~ X* Proposition X* star language, X finite set ~ X* generated (by splicing) Theorem X* star language AND fingerprint closed ~X* generated (by splicing) X regular group code. For any automaton A and for any cycle c in A, c X*. (X* is fingerprint closed) The case of one-letter alphabet Each language on a* is closed under conjugacy relation Theorem L a* is CPA generated L = L 1 (aG ) + • L 1 is a finite set • n : G is a set of representatives of the elements in a subgroup G’ of Zn • max{ m | am L 1 } < n = min{ ag | ag G } = min aG Uniform languages characterization J N, L = AJ = j J Aj = {w A * | |w|=j} Complexity description / minimal splicing system Theorem L a* generated by a finite circular (Paun) system, then L is generated by ({a}, I, R) with I = L1 aG R= { an | 1 $ 1 | an } Examples • L = { a 3 , a 4 } { a 6 , a 14, a 16 }+ I={a 3 , a 4 , a 6 , a 14, a 16 } R={a6 | 1 $ 1 | a6 } • L = { a 3 , a 4 , a5 , a7 } {a8 , a9 , a10 , a12 , a13 , a14 , a15 }+ I={a3 , a4, a 5, a7, a8, a9, a10, a12 , a13 , a 14, a15 } R={a8 | 1 $ 1 | a8 } Given L a* , we CAN NOT DECIDE whether L is generated by a circular (Paun) splicing system (Rice’s theorem) Problem: Given L a* , regular , can we decide whether L is generated by a circular (Paun) splicing system? Probably YES !!! Sketch: L = { a 3 , a 4 } { a 6 , a 14, a 16 }+ G ={6, 14, 16 } G’ = {0, 2, 4} subgroup of Z6 • |Fl |=1 , Fl ={qn } a 12 a 11 • p|n: { a 3, a 4, a 6} n p Computational power of Pixton’s systems Pixton’s definition ~h , SCPI = (A, I, R) ~ h h Remind A~ R A* A* A* rules (, ; ), (, ; ) R h ~ h h C(SCH ) C(SCPA ) C(SCPI ) ~Reg Pixton recombinant process ~ ((A2)* (A3)*) ~Reg \ C(SCPI ) F Class of circular regular languages generated by Pixton • X* generated by regular group codes F • All known examples of regular splicing languages F • ~ A* \ a+ = ~(a*ba*)* (star free language) • ~(aa)*b • ~{w A* | h,k N : |w|a =2h+1, |w|b =2k+1} • ~(aa)*a, ~(ab)*a ~(ab)*b (Linear splicing) Inclusion results Characterization of regular (finite) splicing languages (Circular splicing) Fingerprint closed star languages, cyclic languages, weak cyclic languages, unary languages (CODES) Pixton systems (subclasses or regular languages) (Formal languages) Characterization of circular regular languages (Linear splicing) Problems on descriptional complexity (Circular splicing) Pixton systems Unary Languages: linear splicing vs. circular splicing