Theoretical Computer Science 340 (2005) 179 – 185 www.elsevier.com/locate/tcs
Preface

A good friend, a good companion, a brilliant research partner, an enemy of any redundancy and useless diversion, restless in tackling scientific problems, an enthusiastic follower of his colleagues' research, a person that people around him can rely on. These are only some of the qualities of Antonio Restivo that we (the editors, with very different backgrounds) have come up with. These qualities have supported him throughout a research career that began when he was a young theoretical physicist at the "Istituto di Cibernetica" (IC), Arco Felice, Naples, where he carried out his own Copernican revolution by moving to Theoretical Computer Science. Fundamental for this decision was meeting the research team of the Istituto di Cibernetica, directed by E.R. Caianiello, and especially meeting Professor M.P. Schützenberger: Antonio was very impressed by the way they approached research problems and by the respect they had for competence and hard work.

The following lines from the Divina Commedia by Dante, an author often quoted by Professor Schützenberger, have become a sort of common inheritance of the research team of IC, since they can be seen as a link between scientific discovery and the truth.

Vie più che 'ndarno da riva si parte,
perché non torna tal qual è si move,
chi pesca per lo vero e non ha l'arte. 1

(Dante Alighieri, Divina Commedia, Paradiso, Canto XIII, Lines 121–123)

The art of research is the ability to go straight to the heart of a problem whenever you believe that this is the right direction, even if going this way implies a high cost and even if it means going against the general opinion. But it is also the "complementary" ability to detect, and hence avoid, false avenues of research.
These abilities, together with the ability to ask the right questions and to generate new ones, have led him to many relationships of genuine scientific cooperation with researchers all over the world, in a sort of successful globalization of scientific research. This special issue is a testimony to the positive influence that Antonio Restivo has had, and still has, on his fellow researchers. Many papers in this issue contain clear traces of his influence.
1 It is even worse than vain to go off shore for one who wants to find the truth but hasn’t got the art, since he’ll come back not as he was before.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.019
His influence extends beyond individual researchers to whole research institutes. Some of these are particularly dear to Antonio's heart: places where he worked or currently works, and where some of his dearest friends and students still work, such as Palermo, Napoli and Paris. These are the places where he spent his formative years, and with which he still cooperates, sharing his vast knowledge and his enthusiasm for research.

We conclude by quoting Leonardo da Vinci's words, a statement that Antonio wanted included on the students' website for a computer science course at Palermo University; he thinks that this quotation encapsulates the fundamental meaning of research.

Quelli che s'innamoran di pratica senza scienza, son come 'l nocchiere, ch'entra in naviglio senza timone o bussola, che mai ha certezza dove si vada. 2

(Leonardo da Vinci)

To Antonio Restivo, a master of Theoretical Informatics, on the occasion of his 60th birthday.

A. de Luca, F. Mignosi, D. Perrin, G. Rozenberg
Leiden University, Leiden Institute of Advanced Computer Science, Niels Bohrweg 1, Leiden 2333 CA, Netherlands
E-mail address:
[email protected]
Publications by Antonio Restivo

[1] R. Ascoli, G. Epifanio, A. Restivo, On the mathematical description of quantized fields, Comm. Math. Phys. 18 (1970) 291–300. [2] A. Restivo, Codes and aperiodic languages, in: K.-H. Böhling, K. Indermark (Eds.), Automatentheorie und Formale Sprachen, Lecture Notes in Computer Science, Vol. 2, Springer, Berlin, 1973, pp. 175–181. [3] R. Ascoli, G. Epifanio, A. Restivo, *-Algebrae of unbounded operators in scalar product spaces, Riv. Mat. Univ. Parma 3 (1974) 1–12. [4] A. Restivo, S. Termini, An algorithm for deciding whether a strictly locally testable submonoid is free, Cahiers Math. Université de Montpellier, Vol. 3, 1974, pp. 299–303. [5] A. Restivo, On a question of McNaughton and Papert, Inform. Control 25(1) (1974) 93–101. [6] A. Restivo, A combinatorial property of codes having finite synchronization delay, Theoret. Comput. Sci. 1(2) (1975) 95–101.
2 Those who fall in love with Practice without Theory are like the seaman on a boat without a steering wheel or a compass, who is never sure where he’ll land.
[7] A. Restivo, A characterization of bounded regular sets, in: H. Barkhage (Ed.), Automata Theory and Formal Languages, Lecture Notes in Computer Science, Vol. 33, Springer, Berlin, 1975, pp. 239–244. [8] A. Restivo, S. Termini, On a family of rational languages, in: E. Caianiello (Ed.), New Concepts and Technologies in Parallel Information Processing, Nato Advanced Study Institutes Series, Series E, Noordhoff, Leyden, 1975, pp. 349–357. [9] A. Restivo, On a family of codes related to factorization of cyclotomic polynomials, in: S. Michaelson, R. Milner (Eds.), ICALP, Edinburgh University Press, 1976, pp. 38–44. [10] L. Boasson, A. Restivo, Une caractérisation des langages algébriques bornés, ITA 11(3) (1977) 203–205. [11] A. Restivo, Mots sans répétitions et langages rationnels bornés, ITA 11(3) (1977) 197–202. [12] A. Restivo, On codes having no finite completions, Discrete Math. 17(3) (1977) 309–316. [13] A. Restivo, Some decision results for recognizable sets in arbitrary monoids, in: G. Ausiello, C. Böhm (Eds.), ICALP, Lecture Notes in Computer Science, Vol. 62, Springer, Berlin, 1978, pp. 363–371. [14] A. de Luca, A. Restivo, Synchronization and maximality for very pure subsemigroups of a free semigroup, in: J. Becvár (Ed.), MFCS, Lecture Notes in Computer Science, Vol. 74, Springer, Berlin, 1979, pp. 363–371. [15] J. Berstel, D. Perrin, J.F. Perrot, A. Restivo, Sur le théorème du defaut, J. Algebra 60 (1979) 169–180. [16] A. de Luca, D. Perrin, A. Restivo, S. Termini, Synchronization and simplification, Discrete Math. 27 (1979) 287–308. [17] J.-M. Boë, A. de Luca, A. Restivo, Minimal complete sets of words, Theoret. Comput. Sci. 12 (1980) 325–332. [18] A. de Luca, A. Restivo, On some properties of very pure codes, Theoret. Comput. Sci. 10 (1980) 157–170. [19] A. de Luca, A. Restivo, A characterization of strictly locally testable languages and its applications to subsemigroups of a free semigroup, Inform. Control 44(3) (1980) 300–319. [20] A. de Luca, A. 
Restivo, On some properties of local testability, in: J.W. de Bakker, J. van Leeuwen (Eds.), ICALP, Lecture Notes in Computer Science, Vol. 85, Springer, Berlin, 1980, pp. 385–393. [21] S. Mauceri, A. Restivo, A family of codes commutatively equivalent to prefix codes, Inform. Process. Lett. 12(1) (1981) 1–4. [22] A. Restivo, Some remarks on complete subsets of a free monoid, 1981, pp. 19–25. [23] A. de Luca, A. Restivo, A synchronization property of pure subsemigroups of a free semigroup, 1981, pp. 233–240. [24] A. de Luca, A. Restivo, S. Salemi, On the centers of a language, Theoret. Comput. Sci. 24 (1983) 21–34. [25] A. Restivo, C. Reutenauer, Some applications of a theorem of Shirshov to language theory, Inform. Control 57(2/3) (1983) 205–213. [26] A. Restivo, S. Salemi, On weakly square free words, Bull. EATCS 21 (1983) 49–57.
[27] A. Restivo, C. Reutenauer, On cancellation properties of languages which are supports of rational power series, J. Comput. System Sci. 29(2) (1984) 153–159. [28] A. de Luca, A. Restivo, A finiteness condition for finitely generated semigroups, Semigroup Forum 28(1–3) (1984) 123–134. [29] A. Restivo, C. Reutenauer, On the Burnside problem for semigroups, J. Algebra 89(1) (1984) 102–104. [30] A. de Luca, A. Restivo, Representations of integers and language theory, in: M. Chytil, V. Koubek (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 176, Springer, Berlin, 1984, pp. 407–415. [31] A. Restivo, C. Reutenauer, Cancellation, pumping and permutation in formal languages, in: J. Paredaens (Ed.), ICALP, Lecture Notes in Computer Science, Vol. 172, Springer, Berlin, 1984, pp. 414–422. [32] A. Restivo, S. Salemi, Overlap-free words on two symbols, in: M. Nivat, D. Perrin (Eds.), Automata on Infinite Words, Lecture Notes in Computer Science, Vol. 192, Springer, Berlin, 1984, pp. 198–206. [33] A. Restivo, Rational languages and the Burnside problem, Theoret. Comput. Sci. 40 (1985) 13–30. [34] C. de Felice, A. Restivo, Some results on finite maximal codes, ITA 19(4) (1985) 383–403. [35] A. Restivo, S. Salemi, Some decision results on nonrepetitive words, Series F, Vol. 12, NATO Adv. Sci. Inst., Springer, Berlin, 1985, pp. 289–295. [36] A. de Luca, A. Restivo, Star-free sets of integers, Theoret. Comput. Sci. 43 (1986) 265–275. [37] A. de Luca, A. Restivo, On a generalization of a conjecture of Ehrenfeucht, Bull. EATCS 30 (1986) 84–90. [38] A. Restivo, Codes and automata, in: J.-E. Pin (Ed.), Formal Properties of Finite Automata and Applications, Lecture Notes in Computer Science, Vol. 386, Springer, Berlin, 1988, pp. 186–198. [39] A. Restivo, Permutation properties and the Fibonacci semigroup, Semigroup Forum 38(3) (1989) 337–345. [40] A. Restivo, Finitely generated sofic systems, Theoret. Comput. Sci. 65(2) (1989) 265–270. [41] A. Restivo, S. Salemi, T.
Sportelli, Completing codes, ITA 23(2) (1989) 135–147. [42] A. Restivo, A note on multiset decipherable codes, IEEE Trans. Inform. Theory 35(3) (1989) 662. [43] A. Restivo, Coding sequences with constraints, in: R. Capocelli (Ed.), Sequences, Springer, New York, 1990, pp. 530–540. [44] A. Restivo, Codes and local constraints, Theoret. Comput. Sci. 72(1) (1990) 55–64. [45] A. Restivo, Codes with constraint, in: M.P. Schützenberger, M. Lothaire (Eds.), Mots, Langue, raisonnement, calcul, Hermes, 1990, pp. 358–366. [46] G. Guaiana, A. Restivo, S. Salemi, Complete subgraphs of bipartite graphs and applications to trace languages, ITA 24 (1990) 409–418. [47] G. Guaiana, A. Restivo, S. Salemi, On aperiodic trace languages, in: C. Choffrut, M. Jantzen (Eds.), STACS, Lecture Notes in Computer Science, Vol. 480, Springer, Berlin, 1991, pp. 76–88.
[48] G. Guaiana, A. Restivo, S. Salemi, Star-free trace languages, Theoret. Comput. Sci. 97(2) (1992) 301–311. [49] A. Restivo, A note on renewal systems, Theoret. Comput. Sci. 94(2) (1992) 367–371. [50] D. Giammarresi, A. Restivo, Recognizable picture languages, IJPRAI 6(2–3) (1992) 241–256. [51] R. Montalbano, A. Restivo, The star height one problem for irreducible automata, in: R. Capocelli et al. (Eds.), Sequences II, Springer, New York, 1993, pp. 457–469. [52] D. Giammarresi, A. Restivo, S. Seibert, W. Thomas, Monadic second-order logic over pictures and recognizability by tiling systems, in: P. Enjalbert, E.W. Mayr, K.W. Wagner (Eds.), STACS, Lecture Notes in Computer Science, Vol. 775, Springer, Berlin, 1994, pp. 365–375. [53] R. Montalbano, A. Restivo, On the star height of rational languages, Internat. J. Algebra Comput. 4(3) (1994) 427–441. [54] D. Giammarresi, S. Mantaci, F. Mignosi, A. Restivo, A periodicity theorem for trees, in: B. Pehrson, I. Simon (Eds.), Technology and Foundations—Information Processing ’94, Vol. 1, Proceedings of the IFIP 13th World Computer Congress, Hamburg, Germany, 28 August–2 September 1994, North-Holland, Amsterdam, 1994, pp. 473–478. [55] M. Anselmo, A. Restivo, Factorizing languages, in: B. Pehrson, I. Simon (Eds.), Technology and Foundations—Information Processing ’94, Vol. 1, Proceedings of the IFIP 13th World Computer Congress, Hamburg, Germany, 28 August–2 September 1994, North-Holland, Amsterdam, 1994, pp. 445–450. [56] F. Mignosi, A. Restivo, S. Salemi, A periodicity theorem on words and applications, in: J. Wiedermann, P. Hájek (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 969, Springer, Berlin, 1995, pp. 337–348. [57] D. Giammarresi, A. Restivo, S. Seibert, W. Thomas, Monadic second-order logic over rectangular pictures and recognizability by tiling systems, Inform. Comput. 125(1) (1996) 32–45. [58] D. Giammarresi, A. Restivo, Two-dimensional finite state recognizability, Fund. Inform.
25(3) (1996) 399–422. [59] M. Anselmo, A. Restivo, On languages factorizing the free monoid, Internat. J. Algebra Comput. 6(4) (1996) 413–427. [60] M.-P. Béal, F. Mignosi, A. Restivo, Minimal forbidden words and symbolic dynamics, in: C. Puech, R. Reischuk (Eds.), STACS, Lecture Notes in Computer Science, Vol. 1046, Springer, Berlin, 1996, pp. 555–566. [61] D. Giammarresi, S. Mantaci, F. Mignosi, A. Restivo, Congruences, automata and periodicities, in: J. Almeida, P.V. Silva, G.M.S. Gomes (Eds.), Semigroups, Automata and Languages, World Scientific, River Edge, NJ, 1996, pp. 125–135. [62] S. Mantaci, A. Restivo, Equations on trees, in: W. Penczek, A. Szalas (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 1113, Springer, Berlin, 1996, pp. 443–456. [63] M. Anselmo, C. De Felice, A. Restivo, On some factorization problems, Bull. Belg. Math. Soc. Simon Stevin 4(1) (1997) 25–43. [64] D. Giammarresi, A. Restivo, Two-dimensional languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 3, Springer, Berlin, 1997, pp. 215–267.
[65] D. Giammarresi, S. Mantaci, F. Mignosi, A. Restivo, Periodicities on trees, Theoret. Comput. Sci. 205(1–2) (1998) 145–181. [66] F. Mignosi, A. Restivo, S. Salemi, Periodicity and the golden ratio, Theoret. Comput. Sci. 204(1–2) (1998) 153–167. [67] M. Crochemore, F. Mignosi, A. Restivo, Automata and forbidden words, Inform. Process. Lett. 67(3) (1998) 111–117. [68] M. Crochemore, F. Mignosi, A. Restivo, Minimal forbidden words and factor automata, in: L. Brim, J. Gruska, J. Zlatuska (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 1450, Springer, Berlin, 1998, pp. 665–673. [69] M.G. Castelli, F. Mignosi, A. Restivo, Fine and Wilf’s theorem for three periods and a generalization of Sturmian words, Theoret. Comput. Sci. 218(1) (1999) 83–94. [70] F. Mignosi, A. Restivo, On negative informations in language theory, Aust. Comput. Sci. Commun. 21(3) (1999) 60–72. [71] S. Mantaci, A. Restivo, On the defect theorem for trees, Publ. Math. Debrecen 54 (1999) 923–932. [72] D. Giammarresi, A. Restivo, Extending formal language hierarchies to higher dimensions, ACM Comput. Surv. 31(3es) (1999) 12. [73] F. Mignosi, A. Restivo, M. Sciortino, Forbidden factors in finite and infinite words, in: J. Karhumäki, H.A. Maurer, G. Paun, G. Rozenberg (Eds.), Jewels are Forever, Springer, Berlin, 1999, pp. 339–350. [74] M. Crochemore, F. Mignosi, A. Restivo, S. Salemi, Text compression using antidictionaries, in: J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), ICALP, Lecture Notes in Computer Science, Vol. 1644, Springer, Berlin, 1999, pp. 261–270. [75] M.-P. Béal, F. Mignosi, A. Restivo, M. Sciortino, Forbidden words in symbolic dynamics, Adv. Appl. Math. 25(2) (2000) 163–193. [76] J.-P. Duval, F. Mignosi, A. Restivo, Recurrence and periodicity in infinite words from local periods, Theoret. Comput. Sci. 262(1) (2001) 269–284. [77] S. Mantaci, A. Restivo, Codes and equations on trees, Theoret. Comput. Sci. 255(1–2) (2001) 483–509. [78] F. Mignosi, A. Restivo, M.
Sciortino, Forbidden factors and fragment assembly, ITA 35(6) (2001) 565–577. [79] F. Mignosi, A. Restivo, M. Sciortino, Forbidden factors and fragment assembly, in: W. Kuich, G. Rozenberg, A. Salomaa (Eds.), Developments in Language Theory 2001, Lecture Notes in Computer Science, Vol. 2295, Springer, Berlin, 2002, pp. 349–358. [80] A. Restivo, S. Salemi, Words and patterns, in: Developments in Language Theory 2001, Lecture Notes in Computer Science, Vol. 2295, Springer, Berlin, 2002, pp. 117–129. [81] A. Restivo, S.R. Della Rocca, L. Roversi (Eds.), Theoretical Computer Science, Proceedings of the Seventh Italian Conference, ICTCS 2001, Torino, Italy, 4–6 October 2001, Lecture Notes in Computer Science, Vol. 2202, Springer, Berlin, 2001. [82] A. Restivo, P.V. Silva, On the lattice of prefix codes, Theoret. Comput. Sci. 289(1) (2002) 755–782. [83] F. Mignosi, A. Restivo, M. Sciortino, Words and forbidden factors, Theoret. Comput. Sci. 273(1–2) (2002) 99–117. [84] F. Mignosi, A. Restivo, Periodicity, Chapter in: M. Lothaire, Algebraic Combinatorics on Words, Encyclopedia of Mathematics and its Applications, Vol. 90, Cambridge University Press, Cambridge, 2002.
[85] A. Restivo, S. Salemi, Binary patterns in infinite binary words, in: W. Brauer, H. Ehrig, J. Karhumäki, A. Salomaa (Eds.), Formal and Natural Computing, Lecture Notes in Computer Science, Vol. 2300, Springer, Berlin, 2002, pp. 107–118. [86] F. Mignosi, A. Restivo, P.V. Silva, On Fine and Wilf’s theorem for bidimensional words, Theoret. Comput. Sci. 292(1) (2003) 245–262. [87] M.-P. Béal, M. Crochemore, F. Mignosi, A. Restivo, M. Sciortino, Computing forbidden words of regular languages, Fund. Inform. 56(1–2) (2003) 121–135. [88] S. Mantaci, A. Restivo, M. Sciortino, Burrows–Wheeler transform and Sturmian words, Inform. Process. Lett. 86(5) (2003) 241–246. [89] A. Restivo, P.V. Silva, Periodicity vectors for labelled trees, Discrete Appl. Math. 126(2–3) (2003) 241–260. [90] A. Gabriele, F. Mignosi, A. Restivo, M. Sciortino, Indexing structures for approximate string matching, in: R. Petreschi, G. Persiano, R. Silvestri (Eds.), CIAC, Lecture Notes in Computer Science, Vol. 2653, Springer, Berlin, 2003, pp. 140–151. [91] G. Castiglione, A. Restivo, Reconstruction of L-convex polyominoes, Electron. Notes in Discrete Math. 12 (2003). [92] G. Guaiana, A. Restivo, S. Salemi, On the trace product and some families of languages closed under partial commutations, J. Automat. Lang. Comb. 9(1) (2004) 61–79. [93] G. Castiglione, A. Restivo, S. Salemi, Patterns in words and languages, Discrete Appl. Math. 144(3) (2004) 237–246. [94] G. Castiglione, A. Restivo, Ordering and convex polyominoes, in: M. Margenstern (Ed.), MCU 2004, Lecture Notes in Computer Science, Vol. 3354, Springer, Berlin, 2004. [95] F. Burderi, A. Restivo, Varieties of codes and Kraft inequality, in: V. Diekert, B. Durand (Eds.), STACS 2005, Lecture Notes in Computer Science, Vol. 3404, Springer, Berlin, 2005.
Theoretical Computer Science 340 (2005) 186 – 187 www.elsevier.com/locate/tcs
Editorial
From Antonio’s former students

It was December 2004: Antonio's 60th birthday was approaching, and we, his former students, very much wished to dedicate a page of this volume to him. Hence, during the Christmas holidays, we planned a top-secret reunion at the Department of Mathematics in Palermo. We all agreed on one thing: we wished to contribute a special preface that would focus on some aspects of Antonio that probably only we, his students, are fortunate enough to know. We wished to tell of his kind attitude, which cannot be separated from his outstanding scientific profile. Perhaps this is what makes Antonio a "milestone" for us (this is incredible... everyone used the same term to "define" him!): Antonio looks at everybody in front of him with extreme attention to the person and to all his aspects and qualities. This is the way Antonio himself tells us about his "maestro" and friend M.P. Schützenberger, or about his best times at the Istituto di Cibernetica in Arco Felice, recounting many funny stories with his peculiar sense of humor and acting talent. We all feel that being his students is a great privilege.

But how can we convey all the times that Antonio has surprised us and provided us with renewed enthusiasm for our research? We all remember the many times when some of us got stuck on a research problem and Antonio would say: "There should be something on this subject in some LITP technical report back in 1987." He would then look over the several stacks of papers on his desk and shelves, declare "It should be in this stack", start going through it, and magically pull out one paper: needless to say, it contained exactly the solution to the problem! Or the times when, in the biology laboratories, looking at shelves displaying jars with strange insects, seeds or leaves preserved in formalin, we felt a bit uncomfortable with our theorems and counterexamples.
Then Antonio would say: "We should put our examples in jars, with labels like Example of a language with property X or Thue–Morse word, disproving conjecture Y, so as to find them when needed!"

Finally, when the top-secret meeting on that December afternoon in Palermo was over, we realized that we had spent most of the time just telling each other our respective fond memories and stories about working with Antonio. We all really had a great time, but we were still left with the problem of writing down all those memories in a way that would truly suit Antonio... Indeed, he does not like "commemorations" at all.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.023
After a while, we all agreed that the best thing would be to write, in the simplest way, what we all felt: many thanks, Antonio, for everything, and our hearty wishes for your birthday!!!

Marcella Anselmo, Dora Giammarresi, Giovanna Guaiana, Marina Madonia, Sabrina Mantaci, Filippo Mignosi, Giuseppina Rindone, Marinella Sciortino
Theoretical Computer Science 340 (2005) 188 – 203 www.elsevier.com/locate/tcs
Connections between subwords and certain matrix mappings

Arto Salomaa
Turku Centre for Computer Science, Academy of Finland, Lemminkäisenkatu 14, 20520 Turku, Finland
Abstract

Parikh matrices, recently introduced, have turned out to be a powerful tool in arithmetizing the theory of words. In particular, many inequalities between (scattered) subword occurrences have been obtained as consequences of the properties of the matrices. This paper continues the investigation of Parikh matrices and subword occurrences. In particular, we study certain inequalities, as well as information about subword occurrences sufficient to determine the whole word uniquely. Some algebraic considerations, facts about forbidden subwords, as well as some open problems are also included.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Parikh matrix; Subword; Scattered subword; Number of subwords; Inference from subsequences; Forbidden subword
1. Introduction

The purpose of this paper is to investigate the number of occurrences of a word u as a subword in a word w, in symbols, $|w|_u$. For us the term subword means that w, as a sequence of letters, contains u as a subsequence. More formally, we begin with the following fundamental definition.

✩ The paper is dedicated to Antonio Restivo on the occasion of his 60th birthday. I have been fortunate to meet Antonio every now and then through many decades. I have always found him a young colleague and friend with very bright ideas. Antonio has also been involved in successful cooperation with the Turku research group. I wish him all the best in the years to come, both in science and life.
E-mail address: asalomaa@utu.fi.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.024
Definition 1. A word u is a subword of a word w if there exist words $x_1, \ldots, x_n$ and $y_0, \ldots, y_n$, some of them possibly empty, such that

$$u = x_1 \cdots x_n \quad \text{and} \quad w = y_0 x_1 y_1 \cdots x_n y_n.$$

The word u is a factor of w if there are words x and y such that w = xuy. If the word x (resp. y) is empty, then u is also called a prefix (resp. suffix) of w.

Throughout this paper, we understand subwords and factors in this way. In classical language theory, [13], our subwords are usually called "scattered subwords", whereas our factors are called "subwords". The notation used throughout the article is $|w|_u$, the number of occurrences of the word u as a subword of the word w. Two occurrences are considered different if they differ by at least one position of some letter. (Formally, an occurrence can be viewed as a vector of length |u| whose components indicate the positions of the different letters of u in w.) Clearly, $|w|_u = 0$ if |w| < |u|. We also make the convention that, for any w and the empty word $\lambda$, $|w|_\lambda = 1$. In [14] the number $|w|_u$ is denoted as a "binomial coefficient":

$$|w|_u = \binom{w}{u}.$$

If w and u are words over a one-letter alphabet, $w = a^i$, $u = a^j$, then $|w|_u$ equals the ordinary binomial coefficient: $|w|_u = \binom{i}{j}$. Our convention concerning the empty word reduces to the fact that $\binom{i}{0} = 1$. (The convention is made also in [3,14].)

Assume that $\Sigma$ is an alphabet containing the letters a and b. A little reflection shows that, for any word w,

$$|w|_a \cdot |w|_b = |w|_{ab} + |w|_{ba}.$$

This simple equation can be viewed as a general fact about occurrences of subwords. It is also an instance of the linearization of subword histories investigated in [10]. A slight variation of the equation immediately leads to difficulties: no explicit characterization is known for the relation between $(|w|_u, |w|_v)$ and $(|w|_{uv}, |w|_{vu})$, where u, v, w are arbitrary words. (In general, we use small letters from the beginning of the English alphabet to denote letters of the formal alphabet.)

A general problem along these lines is the following. What numbers $|w|_u$ suffice to determine the word w uniquely? For instance, a word $w \in \{a, b\}^*$ is uniquely determined by the values $|w|_a = |w|_b = 4$, $|w|_{ab} = 15$.
Indeed, $w = a^3 b a b^3$. On the other hand, a word $w \in \{a, b\}^*$ of length 4 is not uniquely determined by the values $|w|_u$, $|u| \le 2$. Either one of the words abba and baab can be chosen as w, and still the equations

$$|w|_a = |w|_b = |w|_{ab} = |w|_{ba} = 2, \quad |w|_{aa} = |w|_{bb} = 1$$

are satisfied. A powerful tool for such problems is the notion of a Parikh matrix; the rest of this paper deals with this notion. The Parikh matrix associated to a word w tells us the numbers $|w|_u$ for certain specific words u. The original notion of a Parikh matrix was introduced in [9]. When dealing with the extended notion, [17], one has more leeway in the choice of the words u.
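The counting conventions above can be made concrete in a few lines of code. The sketch below is our own illustration, not part of the paper (in particular, the name `count_subword` is ours): it computes $|w|_u$ by standard dynamic programming and checks the identities discussed in this section.

```python
from math import comb

def count_subword(w: str, u: str) -> int:
    """Number |w|_u of occurrences of u as a (scattered) subword of w."""
    # dp[j] = number of occurrences of the prefix u[:j] in the part of w read so far
    dp = [1] + [0] * len(u)  # the empty word occurs exactly once
    for c in w:
        # scan u right to left so each position of w extends an occurrence at most once
        for j in range(len(u) - 1, -1, -1):
            if u[j] == c:
                dp[j + 1] += dp[j]
    return dp[len(u)]

# |w|_a * |w|_b = |w|_ab + |w|_ba, for any word w
w = "abbaaab"
assert count_subword(w, "a") * count_subword(w, "b") == \
       count_subword(w, "ab") + count_subword(w, "ba")

# w = a^3 b a b^3 satisfies |w|_a = |w|_b = 4 and |w|_ab = 15
assert count_subword("aaababbb", "ab") == 15

# abba and baab agree on every |w|_u with |u| <= 2
for u in ("a", "b", "aa", "ab", "ba", "bb"):
    assert count_subword("abba", u) == count_subword("baab", u)

# over a one-letter alphabet, |a^i|_{a^j} is the binomial coefficient C(i, j)
assert count_subword("a" * 7, "a" * 3) == comb(7, 3)
```

The right-to-left inner loop is what guarantees that a single letter of w extends each partial occurrence at most once.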
2. Parikh matrices

Parikh mappings (vectors), introduced in [12], express properties of words as numerical properties of vectors, yielding some fundamental language-theoretic consequences, [13,5]. Much information is lost in the transition from a word to a vector. A sharpening of the Parikh mapping, where more information is preserved than in the original Parikh mapping, was introduced in [9]. The new mapping uses upper triangular square matrices with nonnegative integer entries, 1's on the main diagonal and 0's below it. Two words with the same Parikh matrix always have the same Parikh vector, but two words with the same Parikh vector have in general different Parikh matrices. Thus, the Parikh matrix gives more information about a word than the Parikh vector. The set of all triangular matrices described above is denoted by $\mathcal{M}$, and the subset of all matrices of dimension $k \ge 1$ is denoted by $\mathcal{M}_k$. We are now ready to introduce the original notion of a Parikh matrix mapping.

Definition 2. Let $\Sigma = \{a_1, \ldots, a_k\}$ be an alphabet. The Parikh matrix mapping, denoted $\Psi_k$, is the morphism

$$\Psi_k : \Sigma^* \to \mathcal{M}_{k+1},$$

defined by the condition: if $\Psi_k(a_q) = (m_{i,j})_{1 \le i,j \le k+1}$, then for each $1 \le i \le k+1$, $m_{i,i} = 1$, $m_{q,q+1} = 1$, all other elements of the matrix $\Psi_k(a_q)$ being 0.

Observe that when defining the Parikh matrix mapping we have in mind, similarly as when defining the Parikh vector, a specific ordering of the alphabet. Knowledge of the Parikh matrices for different orderings of the alphabet will increase our knowledge of the word in question. If we consider letters without numerical indices, we assume the alphabetic ordering in the definition of Parikh matrices.

The Parikh matrix mapping is not injective, even for the alphabet {a, b}. For instance, consider the matrices

$$\begin{pmatrix} 1 & 4 & 6 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 5 & 8 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix}.$$
Then the five words baabaab, baaabba, abbaaab, abababa, aabbbaa are exactly the ones having the first matrix as their Parikh matrix. Similarly, the six words aababbaa, aabbaaba, abaababa, baaaabba, ababaaab, baaabaab are exactly the ones having the second matrix as their Parikh matrix. This example becomes clearer in view of the following theorem, [9], where the entries of the Parikh matrix are characterized. For the alphabet $\Sigma = \{a_1, \ldots, a_k\}$, we denote by $a_{i,j}$ the word $a_i a_{i+1} \cdots a_j$, where $1 \le i \le j \le k$.

Theorem 1. Consider $\Sigma = \{a_1, \ldots, a_k\}$ and $w \in \Sigma^*$. The matrix $\Psi_k(w) = (m_{i,j})_{1 \le i,j \le k+1}$ has the following properties:
• $m_{i,j} = 0$, for all $1 \le j < i \le k+1$,
• $m_{i,i} = 1$, for all $1 \le i \le k+1$,
• $m_{i,j+1} = |w|_{a_{i,j}}$, for all $1 \le i \le j \le k$.

By the second diagonal (and similarly the third diagonal, etc.) of a matrix in $\mathcal{M}_{k+1}$, we mean the diagonal of length k immediately above the main diagonal. (The diagonals from the third on are shorter.) Theorem 1 tells us that the second diagonal of the Parikh matrix of w gives the Parikh vector of w. The next diagonals give information about the order of letters in w by indicating the numbers $|w|_u$ for certain specific words u. Properties of Parikh matrices, notably the unambiguity of Parikh matrix mappings, have been investigated in [4,7–10,15,16].

For any word w over the alphabet {a, b, c, d}, Theorem 1 implies that

$$\Psi_4(w) = \begin{pmatrix}
1 & |w|_a & |w|_{ab} & |w|_{abc} & |w|_{abcd} \\
0 & 1 & |w|_b & |w|_{bc} & |w|_{bcd} \\
0 & 0 & 1 & |w|_c & |w|_{cd} \\
0 & 0 & 0 & 1 & |w|_d \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$

The problem of deciding whether or not a given matrix is a Parikh matrix is discussed in [8]. No nice general criterion is known. However, the following theorem, [8], characterizes exhaustively the entries in the second and third diagonals of a Parikh matrix.

Theorem 2. Arbitrary nonnegative integers may appear on the second diagonal of a Parikh matrix. Arbitrary integers $m_{i,i+2}$, $1 \le i \le k-1$, satisfying the condition $0 \le m_{i,i+2} \le m_{i,i+1} \, m_{i+1,i+2}$ (but no others) may appear on the third diagonal of a (k+1)-dimensional Parikh matrix.
Theorem 2 gives a complete characterization of Parikh matrices over binary alphabets, since in this case no further diagonals are present. In the general case, starting with arbitrary second and third diagonals satisfying the conditions of Theorem 2, the matrix can be completed to a Parikh matrix in at least one way.
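Definition 2 translates directly into code. The following sketch is our own illustration (the name `parikh_matrix` is not from the paper): it builds $\Psi_k(w)$ as the product of the letter matrices, recovers the five preimages of the first matrix of the non-injectivity example by brute force, and checks the third-diagonal bound of Theorem 2 on all short binary words.

```python
from itertools import product

def parikh_matrix(w: str, alphabet: str) -> list[list[int]]:
    """Psi_k(w) over the ordered alphabet a_1 ... a_k: the product of the
    (k+1) x (k+1) letter matrices, taken in the order the letters occur in w."""
    n = len(alphabet) + 1
    M = [[int(i == j) for j in range(n)] for i in range(n)]  # identity = Psi_k(empty word)
    for c in w:
        q = alphabet.index(c)
        L = [[int(i == j) for j in range(n)] for i in range(n)]
        L[q][q + 1] = 1  # the single off-diagonal 1 of Psi_k(a_q)
        M = [[sum(M[i][t] * L[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
    return M

# the first matrix of the non-injectivity example, with its five preimages
first = [[1, 4, 6], [0, 1, 3], [0, 0, 1]]
words = ["".join(t) for t in product("ab", repeat=7)
         if parikh_matrix("".join(t), "ab") == first]
assert sorted(words) == sorted(["baabaab", "baaabba", "abbaaab", "abababa", "aabbbaa"])

# Theorem 2, binary case: 0 <= m13 <= m12 * m23 for every Parikh matrix
for n in range(7):
    for t in product("ab", repeat=n):
        M = parikh_matrix("".join(t), "ab")
        assert 0 <= M[0][2] <= M[0][1] * M[1][2]
```

Since $\Psi_k$ is a morphism, multiplying the letter matrices left to right is all the construction requires.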
We will now introduce the generalized notion of a Parikh matrix due to [17]. We first recall the definition of the Kronecker delta: for letters a and b,

$$\delta_{a,b} = \begin{cases} 1 & \text{if } a = b, \\ 0 & \text{if } a \ne b. \end{cases}$$
Definition 3. Let u = b1 . . . bk be a word, where each bi , 1 i k, is a letter of the alphabet . The Parikh matrix mapping with respect to u, denoted u , is the morphism:
u : ∗ → Mk+1 , defined, for a ∈ , by the condition: if u (a) = Mu (a) = (mi,j )1 i,j (k+1) , then for each 1 i (k + 1), mi,i = 1, and for each 1 i k, mi,i+1 = a,bi , all other elements of the matrix Mu (a) being 0. Matrices of the form u (w), w ∈ ∗ , are referred to as generalized Parikh matrices. Thus, the Parikh matrix Mu (w) associated to a word w is obtained by multiplying the matrices Mu (a) associated to the letters a of w, in the order in which the letters appear in w. The above definition implies that if a letter a does not occur in u, then the matrix Mu (a) is the identity matrix. For instance, if u = baac, then
           1 0 0 0 0
           0 1 1 0 0
M_u(a) =   0 0 1 1 0
           0 0 0 1 0
           0 0 0 0 1

Similarly,

           1 1 0 0 0                 1 0 0 0 0
           0 1 0 0 0                 0 1 0 0 0
M_u(b) =   0 0 1 0 0   and M_u(c) =  0 0 1 0 0
           0 0 0 1 0                 0 0 0 1 1
           0 0 0 0 1                 0 0 0 0 1
In the original definition of a Parikh matrix, [9], the word u was chosen to be u = a_1 ... a_k, for the alphabet Σ = {a_1, ..., a_k}. In the general setup, the essential contents of Theorem 1 can be formulated as follows. For 1 ≤ i ≤ j ≤ k, denote U_{i,j} = b_i ... b_j, and denote the entries of the matrix M_u(w) by m_{i,j}.

Theorem 3. For all i and j, 1 ≤ i ≤ j ≤ k, we have m_{i,1+j} = |w|_{U_{i,j}}.
Going back to our example u = baac, we infer from Theorem 3 that, for any word w,

           1  |w|_b  |w|_ba  |w|_baa  |w|_baac
           0    1    |w|_a   |w|_aa   |w|_aac
M_u(w) =   0    0      1     |w|_a    |w|_ac
           0    0      0       1      |w|_c
           0    0      0       0        1

For w = a^3 c^3 b a c^2 a c we get

           1 1 2  1  1
           0 1 5 10 31
M_u(w) =   0 0 1  5 22
           0 0 0  1  6
           0 0 0  0  1
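Since Ψ_u is a morphism (Definition 3), M_u(w) can be computed as a product of per-letter matrices. The sketch below is our own illustration (the function names are not from the paper); it reproduces the numerical example just given.

```python
def letter_matrix(a, u):
    # M_u(a): identity except m_{i,i+1} = 1 whenever the i-th letter of u is a
    n = len(u) + 1
    M = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, b in enumerate(u):
        if b == a:
            M[i][i + 1] = 1
    return M

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def generalized_parikh(u, w):
    # multiply the letter matrices in the order the letters appear in w
    n = len(u) + 1
    M = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for a in w:
        M = mat_mul(M, letter_matrix(a, u))
    return M

# the example u = baac, w = a^3 c^3 b a c^2 a c from the text
M = generalized_parikh("baac", "aaacccbaccac")
assert M == [[1, 1, 2, 1, 1],
             [0, 1, 5, 10, 31],
             [0, 0, 1, 5, 22],
             [0, 0, 0, 1, 6],
             [0, 0, 0, 0, 1]]
```

The entries agree with Theorem 3: for instance, the corner entry 1 is |w|_baac, and the entry 31 is |w|_aac.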
3. Matrix-deducible inequalities

We begin with the following theorem. It concerns the occurrences of subwords of a certain general type: we consider decompositions xyz, and the occurrences of xyz, y, xy, and yz as subwords in an arbitrary w.

Theorem 4. The inequality |w|_xyz · |w|_y ≤ |w|_xy · |w|_yz holds for arbitrary words w, x, y, z.

Theorem 4 is due to [10]. A direct combinatorial proof is also given in [15]. The result can also be obtained using the following lemma, [10,17]. As in the preceding section, we denote by M_u(w) an arbitrary generalized Parikh matrix.

Lemma 1. The value of any minor of the matrix M_u(w) is a nonnegative integer.

The inequality presented in Theorem 4 is referred to as the Cauchy inequality for words. It can be claimed to be a fundamental property of words, because of its generality and because it reduces to an equality in a great variety of cases. The choice of the name is motivated by the resemblance to the well-known algebraic Cauchy inequality for real numbers and also by the methods used in the proof. The reader is referred to [10] for further details. No general theory exists concerning the cases when the Cauchy inequality actually reduces to an equality. We now present some considerations in this direction. We begin with a simple example. Consider the words

w = a^{i_1} b^{j_1} c^{k_1},  x = a^{i_2},  y = b^{j_2},  z = c^{k_2}.

(As usual, a, b, c stand for letters.) Clearly, |w|_y equals the binomial coefficient C(j_1, j_2). Straightforward calculations show that

|w|_y · |w|_xyz = C(i_1, i_2) C(j_1, j_2)^2 C(k_1, k_2) = |w|_xy · |w|_yz.
For instance, the setup w = a^4 b^4 c^4, x = a, y = b, z = c^2 yields the value 384 for both sides of the equation. In general, if w = x_1 y_1 z_1 and

|w|_x = |x_1|_x = m,  |w|_y = |y_1|_y = n,  |w|_z = |z_1|_z = p,

then both sides of the Cauchy inequality equal m n^2 p and, thus, the inequality is not proper. Consider words over a one-letter alphabet. If the words w, x, y, z are of lengths n, i, j, k, respectively, then the inequality assumes the form

C(n, j) C(n, i+j+k) ≤ C(n, i+j) C(n, j+k),

which is easily verified to be true. Here we have an equality exactly in case i = 0 or k = 0. Assume that

y = a^i b^j a^k,  x = a^{i_1},  z = a^{k_1}

and w = a^{i+i_1+i′} b^{j+j′} a^{k+k_1+k′}. Then again it is easy to verify that the inequality is not proper. More general results can be obtained using the linearization of subword histories presented in [10]. Consider the equation |w|_a · |w|_b = |w|_ab + |w|_ba mentioned in Section 1, valid for any word w and letters a and b. According to the terminology introduced in [10], we speak of the subword history a × b − ab − ba in the word w, defined by the equation

SH(w, a × b − ab − ba) = |w|_a · |w|_b − |w|_ab − |w|_ba.

Thus, our simple equation tells us that, for any word w, SH(w, a × b − ab − ba) = 0. Secondly, our equation can be written in the form SH(w, a × b) = SH(w, ab + ba). In other words, independently of w, the subword history a × b assumes the same value as the subword history ab + ba in w. In such a case we say that the two subword histories are equivalent. Our equation also shows how a particular subword history involving the operation × possesses an equivalent linear subword history, that is, an equivalent subword history not involving the operation ×. It was established in [10] that this holds true in general: the operation × can be eliminated from all subword histories. The proof uses the
shuffle u ⧢ v of two words u and v. By definition, u ⧢ v consists of all words

u_0 v_0 u_1 v_1 ... u_k v_k,

where k ≥ 0, u_i, v_i ∈ Σ* for 0 ≤ i ≤ k, and u = u_0 ... u_k, v = v_0 ... v_k. It is fairly straightforward to prove
that if u and v are words over disjoint alphabets, then the subword histories u × v and Σ_{x ∈ u ⧢ v} x are equivalent. This result forms the basis of the general linearization technique: for arbitrary u and v, one first provides the letters of v with primes, forcing the two words to be over disjoint alphabets. One then forms the shuffle, arguing at the same time with "reduced" words and multiplicities. For instance, 2abba + abab + baba + 2baab + aba + bab is a linear subword history equivalent to ab × ba. The following example is more sophisticated. Consider the special case

A = (ab) × (aabb) ≤ (aab) × (abb) = B

of the Cauchy inequality. The equivalent linear subword histories are in this case

A = aba^2b^2 + 4a^2bab^2 + 9a^3b^3 + a^2b^2ab + abab^2 + a^2bab + 6a^2b^3 + 6a^3b^2 + 4a^2b^2,

the linear subword history equivalent to B being obtained by adding

a^2bab^2 + a^2b^2ab + aba^2b^2 + ababab + ab^2a^2b + 2abab^2 + a^2bab + ab^2ab + abab

to A. This gives the following conclusion. (The result can also be inferred without reference to Theorem 4.) For any word w, we have |w|_ab · |w|_aabb ≤ |w|_aab · |w|_abb. The equality holds exactly in case w does not contain the subword abab (and the right side is nonzero). The same argument is applicable to more general words. Consider the inequality |w|_xyz · |w|_y ≤ |w|_xy · |w|_yz, where x = a^m, y = ab, z = b^n, m, n ≥ 1. By analyzing the linear subword histories arising from the two sides of the inequality, we see that every term on the left side gives rise to a unique term on the right side and, moreover, the eventual additional terms on the right side all possess the subword abab. Thus, we obtain the following result.

Theorem 5. The inequality

|w|_ab · |w|_{a^m b^n} ≤ |w|_{a^m b} · |w|_{ab^n},  m, n ≥ 2,

holds for all words w and is strict exactly in case w contains the subword abab (and the right side is nonzero).

Numerous inequalities can be deduced from Lemma 1, by Theorem 1 or Theorem 3. The following general result is along these lines.
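Both the basic linearization identity and Theorem 5 in the case m = n = 2 can be confirmed by brute force over all short binary words. The following check is our own illustration (the counting helper is a standard dynamic program, not code from the paper).

```python
from itertools import product

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

for n in range(8):
    for tup in product("ab", repeat=n):
        w = "".join(tup)
        # the identity behind subword histories: SH(w, a x b) = SH(w, ab + ba)
        assert count(w, "a") * count(w, "b") == count(w, "ab") + count(w, "ba")
        # Theorem 5 with m = n = 2: strict exactly when w contains abab
        left = count(w, "ab") * count(w, "aabb")
        right = count(w, "aab") * count(w, "abb")
        assert left <= right
        assert (left < right) == (count(w, "abab") > 0)
```

Note that if w contains abab as a subword, then it also contains aab and abb, so the right side is automatically nonzero; this is why the parenthetical condition of Theorem 5 disappears from the last assertion.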
Theorem 6. Let k ≥ 1 and let w, x_1, ..., x_k be arbitrary words. Further, let M_det be an arbitrary minor of the matrix

        1  |w|_{x_1}  |w|_{x_1 x_2}  ...  |w|_{x_1 ... x_k}
        0      1        |w|_{x_2}    ...  |w|_{x_2 ... x_k}
M =     .      .            .         .           .
        .      .            .         1      |w|_{x_k}
        0     ...          ...        0           1

Then M_det ≥ 0.

For instance, the subsequent inequalities are obtained by Theorem 6. The letters u, w, x, y, z, y_1, ..., y_n stand for arbitrary words. A suitable combination of the inequalities gives (partial) results about the cases when an inequality is strict. However, a general theory is missing.

|w|_xy ≤ |w|_x · |w|_y,
|w|_y · |w|_xyz ≤ |w|_xy · |w|_yz  (Cauchy inequality),
|w|_{y_1} ··· |w|_{y_n} · |w|_{x y_1 ... y_n z} ≤ |w|_{x y_1} · |w|_{y_1 y_2} ··· |w|_{y_n z},
|w|_x · |w|_yz + |w|_xy · |w|_z ≤ |w|_xyz + |w|_x · |w|_y · |w|_z,
|w|_yz · |w|_xyzu + |w|_xy · |w|_z · |w|_yzu + |w|_y · |w|_xyz · |w|_zu ≤ |w|_xy · |w|_yz · |w|_zu + |w|_y · |w|_z · |w|_xyzu + |w|_xyz · |w|_yzu,
|w|_x · |w|_y · |w|_zu + |w|_x · |w|_yz · |w|_u + |w|_xy · |w|_z · |w|_u + |w|_xyzu ≤ |w|_x · |w|_yzu + |w|_xy · |w|_zu + |w|_xyz · |w|_u + |w|_x · |w|_y · |w|_z · |w|_u.
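The minor condition of Theorem 6 can also be tested numerically. The sketch below (our own illustration; names and the sample words are ours) builds the matrix M for one choice of w and x_1, ..., x_k and checks that every minor is nonnegative, using a naive cofactor determinant that suffices for such small matrices.

```python
from itertools import combinations

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def subword_matrix(w, xs):
    # the matrix M of Theorem 6: entry (i, j+1) counts x_{i+1} ... x_{j+1} concatenated
    k = len(xs)
    M = [[1 if i == j else 0 for j in range(k + 1)] for i in range(k + 1)]
    for i in range(k):
        for j in range(i, k):
            M[i][j + 1] = count(w, "".join(xs[i:j + 1]))
    return M

def det(M):
    # cofactor expansion along the first row (fine for these tiny matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

M = subword_matrix("abbaababba", ["a", "b", "ab"])
n = len(M)
for size in range(1, n + 1):
    for rows in combinations(range(n), size):
        for cols in combinations(range(n), size):
            assert det([[M[i][j] for j in cols] for i in rows]) >= 0
```

Since M is a submatrix of the generalized Parikh matrix M_u(w) with u = x_1 x_2 ... x_k (on the row and column indices given by the cut positions of u), its minors are minors of M_u(w), so Lemma 1 guarantees that every assertion passes.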
4. Sufficient conditions for complete inference

A very central problem concerning words, also important in numerous applications, is to find some elements (factors, subwords, etc.) of words that characterize the word, so that, instead of the word itself, it suffices to investigate the elements. For instance, one might be able to characterize a word in terms of some specific factors or subwords. Here the characterization can be total or partial: the elements considered may determine the word uniquely or only to a certain extent. A characterization in terms of factors, optimal in a specific sense, was given in [2]. Here we consider characterizations in terms of subwords. A general problem is the following: which numbers |w|_u suffice to determine the word w uniquely? In addressing the general problem, one should specify a class of subwords u such that the values |w|_u, where u ranges over this class, determine w uniquely. Such a class could consist of all words of at most a given length. Indeed, a notion often mentioned but not much investigated in the literature, [1,6,13,15], is that of a t-spectrum. For a fixed t ≥ 1, the t-spectrum of a word w tells all the values |w|_u, where |u| ≤ t. Following the notation of
formal power series, [5], the t-spectrum of a word w over Σ can be viewed as a polynomial in N_0⟨Σ*⟩ of degree at most t. For instance, the polynomial aa + bb + 2ab + 2ba + 2a + 2b is the 2-spectrum of the word abba, as well as of the word baab. In general, one can consider, for each t, the maximal length such that any word of this length is uniquely determined by its t-spectrum. See [15] for further details. This function of t is discussed in detail in [3], where the original formulation of the problems is credited to L.O. Kalashnik. For instance, the two different words abbaaab, baaabba (resp. ab^2 a^3 b a^2 b^2 a, b a^3 b a b^2 a^3 b) have the same 3-spectrum (resp. 4-spectrum), and are both of length 7 (resp. 12), [6]. This shows that the maximal length is at most 6 for t = 3, and at most 11 for t = 4.

Perhaps one should not always consider subwords of the same length and take all of them. Sometimes very few words (of different lengths) determine the word uniquely. Consider words w over the alphabet {a, b}. We will now show how w can be uniquely inferred from certain values |w|_u. A good choice for the words u is the sequence ab^i, i = 0, 1, 2, ... . Indeed, as shown in the following lemma, the word w can be uniquely inferred from its Parikh vector (r, s) and the numbers |w|_{ab^i}, 1 ≤ i ≤ min(r, s).

Lemma 2. Assume that w and w′ are words over the alphabet {a, b} with the same Parikh vector (r, s) and that

|w|_{ab^i} = |w′|_{ab^i},  1 ≤ i ≤ min(r, s).

Then w = w′.

Proof. Recall that the Parikh vector of a word w is the vector (|w|_a, |w|_b). Notice that under our hypotheses one has |w|_{ab^i} = |w′|_{ab^i}, 1 ≤ i ≤ r. Indeed, this is trivial if r ≤ s, while if s < r, then |w|_{ab^i} = |w′|_{ab^i} = 0 for s + 1 ≤ i ≤ r. Thus, in order to prove the statement, it is sufficient to show that the numbers r, s and

|w|_{ab^i},  1 ≤ i ≤ r,

determine the word w uniquely. Consider the r occurrences of the letter a in w, and denote by x_i, 1 ≤ i ≤ r, the number of occurrences of b to the right of the ith occurrence of a, when the occurrences of a are counted from left to right. Thus,

s ≥ x_1 ≥ x_2 ≥ ... ≥ x_r ≥ 0.    (1)
Denote |w|_{ab^i} = σ_i, 1 ≤ i ≤ r. We obtain the following system of equations:

Σ_{i=1}^{r} C(x_i, j) = σ_j,  j = 1, ..., r.

(This follows because, for instance, each subword occurrence of ab^2 in w is obtained by taking the ith occurrence of a, for some i where 1 ≤ i ≤ r, and an arbitrary pair of the x_i occurrences of b to the right of this a.) When the binomial coefficients are written out as polynomials, the system of equations takes the form

Σ_{i=1}^{r} x_i^j = P_j(σ_1, ..., σ_j),  j = 1, ..., r,

where each P_j is a linear polynomial with positive integer coefficients. (The latter can be given explicitly, but this is irrelevant for our purposes.) For instance, we obtain for r = 4:

x_1 + x_2 + x_3 + x_4 = σ_1,
x_1^2 + x_2^2 + x_3^2 + x_4^2 = 2σ_2 + σ_1,
x_1^3 + x_2^3 + x_3^3 + x_4^3 = 6σ_3 + 6σ_2 + σ_1,
x_1^4 + x_2^4 + x_3^4 + x_4^4 = 24σ_4 + 36σ_3 + 14σ_2 + σ_1.

It is well known that this system has a unique unordered solution (over the complex field), given by the roots of a suitable polynomial of degree r. This is, indeed, a straightforward consequence of the Newton-Girard formulas relating the coefficients of a polynomial and the sums of the powers of its roots. We derive that there is at most one ordered solution (x_1, ..., x_r) where the x_i, 1 ≤ i ≤ r, are integers satisfying (1). Finally, the word w is uniquely inferred from the numbers x_i and s. For instance, the values

|w|_a = 4,  |w|_b = 11,  |w|_ab = 18,  |w|_{ab^2} = 48,  |w|_{ab^3} = 92,  |w|_{ab^4} = 128

yield the (unique) word w = b^2 a b^5 a^2 b^3 a b. This concludes the proof.
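The uniqueness in Lemma 2 can be confirmed directly. The brute-force sketch below is our own illustration (the names are ours, and exhaustive search replaces the Newton-Girard argument): it enumerates all words with the given Parikh vector and keeps those matching the values |w|_{ab^i}.

```python
from itertools import combinations

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def infer(r, s, sigma):
    # sigma[i-1] = |w|_{ab^i}; brute force over all words with Parikh vector (r, s)
    n, sols = r + s, []
    for apos in combinations(range(n), r):
        apos = set(apos)
        w = "".join("a" if i in apos else "b" for i in range(n))
        if all(count(w, "a" + "b" * i) == sigma[i - 1]
               for i in range(1, min(r, s) + 1)):
            sols.append(w)
    return sols

# the example from the proof: the data pin down a single word
assert infer(4, 11, [18, 48, 92, 128]) == ["bbabbbbbaabbbab"]
```

The single solution is exactly w = b^2 a b^5 a^2 b^3 a b, as stated in the proof.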
Lemma 3. The statement of Lemma 2 holds true if the sequence ab^i, 1 ≤ i ≤ min(r, s), is replaced by any of the three sequences

a^i b,  b a^i,  b^i a,  1 ≤ i ≤ min(r, s).

Proof. The claim concerning the sequence b a^i follows from Lemma 2 by interchanging the letters a and b. Consider the sequence a^i b. Clearly, |w|_{a^i b} = |mi(w)|_{b a^i}, where mi(w) is the mirror image of w. Thus, mi(w) and, therefore, also w is uniquely determined by the given numerical values. Finally, the claim concerning b^i a follows again by interchanging a and b.

Theorem 7. For any integer l, a word w of length l over the alphabet {a, b} can be uniquely inferred from at most ⌊l/2⌋ + 2 specific values |w|_u.
Proof. The result follows directly from Lemma 2 (or from Lemma 3), because min(r, s) ≤ ⌊l/2⌋.

For instance, the values |w|_a, |w|_b, |w|_ab, |w|_{ab^2} uniquely determine a word w of length 5. The result is optimal in the sense that no three among these values suffice for the same purpose. The 5002 values

|w|_a,  |w|_b,  |w|_{ab^i},  1 ≤ i ≤ 5000,
uniquely determine a word w of length 10^4. The 12-spectrum of a word consists of somewhat more values but, according to [3], only words of length less than 600 are uniquely determined by it. To infer uniquely words of length 10^4, the 18-spectrum is needed, [3]. In the consideration of spectra, attention may be restricted to binary alphabets, [3]. The situation is different if one just wants to have a "good" set of values |w|_u for the inference of w. If the alphabet is bigger than binary, one may consider the letters pairwise or try some direct approach. In any case, one has to extend results such as Lemmas 2 and 3. Some results about the injectivity of Parikh matrix mappings have been presented in [4,8,16]. The above considerations can be used to establish an injectivity result for generalized Parikh matrix mappings. We base our discussion on Lemma 2; Lemma 3 yields analogous results. Consider the generalized Parikh matrix mapping (over the alphabet {a, b})

Ψ = Ψ_u,  where u = ab^t, t ≥ 1.

Thus, the matrices Ψ(w) are (t + 2)-dimensional. In the matrix Ψ(a) the only nonzero entry above the main diagonal is the entry (1, 2), whereas in the matrix Ψ(b) all entries (j, j+1), 2 ≤ j ≤ t + 1, equal 1. By Theorem 3, we have for an arbitrary word w:

         1  |w|_a  |w|_ab  ...  |w|_{ab^t}
         0    1    |w|_b   ...  |w|_{b^t}
Ψ(w) =   .    .      .      .       .
         .    .      .      1    |w|_b
         0   ...    ...     0       1

Observe also that, for any word w, the value |w|_b uniquely determines all values |w|_{b^i}, i ≥ 1. Hence, the following result is a consequence of Lemma 2 and Theorem 3.

Theorem 8. If the equation Ψ(w) = Ψ(w′) holds for different words w and w′, then |w| = |w′| > 2t.

Theorem 8 gives a numerical characterization of binary words in terms of matrices. It can be extended to arbitrary words by considering the letters pairwise. However, this method is not very efficient. It is likely that there are better direct ways for the characterization.
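Theorem 8 implies in particular that all binary words of length at most 2t receive distinct matrices under Ψ_u with u = ab^t. A quick exhaustive check of this consequence for t = 2 (our own illustration; the names are ours):

```python
from itertools import product

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def gp_matrix(u, w):
    # generalized Parikh matrix via Theorem 3: m_{i,j+1} = |w|_{b_i ... b_j}
    k = len(u)
    M = [[1 if i == j else 0 for j in range(k + 1)] for i in range(k + 1)]
    for i in range(k):
        for j in range(i, k):
            M[i][j + 1] = count(w, u[i:j + 1])
    return M

t = 2
u = "a" + "b" * t                     # u = ab^t
words = ["".join(p) for n in range(2 * t + 1) for p in product("ab", repeat=n)]
mats = {tuple(map(tuple, gp_matrix(u, w))) for w in words}
# no two distinct words of length <= 2t share a matrix
assert len(mats) == len(words)
```

The 31 words of length at most 4 indeed yield 31 distinct matrices; the matrix records |w|_a, |w|_b, |w|_ab and |w|_{ab^2}, which by Lemma 2 suffice here since min(r, s) ≤ 2.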
5. Forbidden subwords

Forbidden factors of words and infinite words have been widely investigated. A forbidden factor of a word w is simply a word that does not occur as a factor of w. Forbidden factors are sometimes of fundamental importance in determining the structure of the word itself. A word u is a minimal forbidden factor of a word w if u is a forbidden factor of w but all proper factors of u are factors of w. This notion has relevant connections with automata theory, text compression and symbolic dynamics. The reader is referred to [11] and the references given therein. Analogous notions can be defined for subwords as well.

Definition 4. A word u is a forbidden subword of w if |w|_u = 0. A forbidden subword u of w is minimal if all proper factors v of u satisfy |w|_v > 0.

The purpose of this section is only to point out a direct connection between (minimal) forbidden subwords and generalized Parikh matrices. We hope to return to forbidden subwords in another contribution.

Theorem 9. A word u is a forbidden subword of a word w if and only if the entry in the upper right corner of the generalized Parikh matrix M_u(w) equals 0. A forbidden subword u of w is minimal exactly in case all other entries above the main diagonal in M_u(w) are positive.

Theorem 9 follows from the definitions and Theorem 3. For instance, consider w = baababb and u = abba. Then

           1 3 8 7 0
           0 1 4 6 1
M_u(w) =   0 0 1 4 4
           0 0 0 1 3
           0 0 0 0 1

showing that u is a minimal forbidden subword of w. Observe that u is a minimal forbidden subword of w also under the following modified definition: a forbidden subword u of w is minimal if all proper subwords of u are also subwords of w. Minimality in this modified sense cannot be directly characterized by generalized Parikh matrices. A transposition in w, resulting in w′ = baabbab, gives the matrix

            1 3 7 6 2
            0 1 4 6 3
M_u(w′) =   0 0 1 4 5
            0 0 0 1 3
            0 0 0 0 1

and, thus, u is not forbidden.
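Definition 4 and the example above can be checked directly from subword counts, without building the matrices at all. A minimal sketch (ours, not from the paper):

```python
def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

w, u = "baababb", "abba"
assert count(w, u) == 0               # abba is a forbidden subword of w ...
proper_factors = {u[i:j] for i in range(len(u))
                  for j in range(i + 1, len(u) + 1)} - {u}
assert all(count(w, f) > 0 for f in proper_factors)   # ... and a minimal one
assert count("baabbab", u) == 2       # after the transposition, not forbidden
```

The three assertions correspond exactly to the corner entry 0, the positive off-diagonal entries of M_u(w), and the corner entry 2 of M_u(w′) in the matrices above.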
The defining condition for a forbidden subword, |w|_u = 0, concerns estimations of the number |w|_u. The inequalities discussed in Section 3 readily yield such estimations. For instance, the following result is a consequence of Theorem 4.

Lemma 4. For any words w, u, x, y, z, where u = xyz, we have

|w|_u ≤ (|w|_xy · |w|_yz) / |w|_y.
6. Conclusion. Open problems

Various algebraic considerations concerning Parikh matrices have been presented in the literature, see for instance [7]. Parikh matrices are not closed under the ordinary addition of matrices. A special operation ⊕ was introduced for matrices in M_k in [7]. The entries above the main diagonal in the matrix M_1 ⊕ M_2 are obtained from the corresponding entries in M_1 and M_2 by addition. (Thus, the main diagonal of the matrix M_1 ⊕ M_2 consists of 1's.) If we are dealing with binary alphabets, then Theorem 2 implies that the "sum" M_1 ⊕ M_2 of two Parikh matrices M_1 and M_2 is again a Parikh matrix. The same conclusion holds for the "product" M_1 ⊗ M_2 of two Parikh matrices M_1 and M_2, defined by entry-wise multiplication. Indeed, if in both M_1 and M_2 the only element of the third diagonal is within the bounds allowed in Theorem 2, the same holds true for the corresponding element in M_1 ⊗ M_2. Thus, we obtain the following result.

Theorem 10. Parikh matrices over the alphabet {a, b} constitute a commutative semiring with identity, with respect to the operators ⊕ and ⊗.

If the alphabet consists of three or more letters, then M_1 ⊕ M_2 is not necessarily a Parikh matrix for Parikh matrices M_1 and M_2. As pointed out in [7], the matrix M_1 ⊕ M_2 is not a Parikh matrix if M_1 and M_2 are the Parikh matrices resulting from the words abc and b, respectively. As regards the operation ⊗, it is not easy to find similar examples. Indeed, it is an open problem whether or not the set of Parikh matrices is closed under ⊗. The matrices satisfying the property of Lemma 1 are closed under ⊗. (In other words, if every minor of the matrices M_1 and M_2 is a nonnegative integer, the same holds true for the matrix M_1 ⊗ M_2.) However, not every matrix (in M) having this property is a Parikh matrix, [8,10]. Problems concerning the operation ⊗ belong to the more general problem area concerning suitable algebraic operations for Parikh matrices.
For instance, would the Kronecker product of matrices be suitable for some characterizations? Properly chosen algebraic operations might contribute significantly to the general characterization and injectivity problems of Parikh matrices, [4,8,10,16]. We conclude by mentioning some other open problems. Finding numerical values, such as those in Lemmas 2 and 3, from which a word can be uniquely inferred is a problem area of considerable practical significance, [3,6,15]. What is a minimal or otherwise optimal set of such values? Our considerations above deal with binary alphabets. In the general case one can of course consider the letters pairwise, but a more direct approach is called for.
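In the binary case, the closure claims behind Theorem 10 amount to checking Theorem 2's bound on the single third-diagonal entry. A small sketch illustrating this (our own; the operation names oplus/otimes and the sample words are ours):

```python
def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def psi(w):
    # Parikh matrix of w over {a, b}
    return [[1, count(w, "a"), count(w, "ab")],
            [0, 1, count(w, "b")],
            [0, 0, 1]]

def oplus(A, B):
    # add the entries above the main diagonal, keep the 1's on the diagonal
    return [[A[i][j] + B[i][j] if j > i else A[i][j]
             for j in range(3)] for i in range(3)]

def otimes(A, B):
    # entry-wise product (the diagonal 1's are preserved automatically)
    return [[A[i][j] * B[i][j] for j in range(3)] for i in range(3)]

M1, M2 = psi("aab"), psi("abab")
for M in (oplus(M1, M2), otimes(M1, M2)):
    # Theorem 2's bound: the result is again a binary Parikh matrix
    assert 0 <= M[0][2] <= M[0][1] * M[1][2]
```

By Theorem 2, the bound 0 ≤ m_{1,3} ≤ m_{1,2} m_{2,3} is exactly what it takes for a 3-dimensional matrix of this shape to be a Parikh matrix, so both results are realized by some binary word.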
A number of open problems relate Parikh matrices with languages. Given a language L ⊆ Σ*, we denote by M(L) the set of Parikh matrices associated to the words in L. Is the equation M(L_1) = M(L_2) decidable when L_1 and L_2 come from a specific language family? This problem is open even for regular languages. Related problems are mentioned in [8]. One can also specify an alphabet and some values |w|_u, and study the set of words w where each of these values is met. For instance, the regular language b*(a^3 b + a b^3 + abab)a* results from the value |w|_ab = 3, whereas the language (b a^3 b + abab)a* results from the combination of the values |w|_ab = 3 and |w|_b = 2. The combination of the values |w|_aba = 1 and |w|_{babab^6} = 5 yields the unique word b^5 abab^6. The conditions

|w|_a = |w|_b = |w|_ab = i,  for some i ≥ 1,

lead to a rather involved non-context-free language. From finitely many conditions a regular language always results. Thus, if we fix values for arbitrary entries in a (generalized) Parikh matrix, then the set of all words whose Parikh matrix has the fixed values in the corresponding entries is regular. Infinite languages are obtained by leaving open some entries in the second diagonal.

Subword histories were considered above in Section 3. The equality problem, that is, the problem of deciding whether two subword histories assume the same value for all words, was settled in [10]. The corresponding inequality problem is open: given two subword histories SH_1 and SH_2, is the value of SH_1 for an arbitrary word w less than or equal to that of SH_2? For instance, baab ≤ bab + baaab because |w|_baab ≤ |w|_bab + |w|_baaab holds for all w. In the general case it is not even known whether the problem is decidable. The case of one-letter alphabets is easy to settle. By [10], attention may be restricted to linear subword histories. One can also show that the inequality u ≤ v holds between two "monomial" subword histories u and v only in case u = v.

Acknowledgements

The author is grateful to the referee for useful suggestions.
References

[1] J. Berstel, J. Karhumäki, Combinatorics on words—a tutorial, EATCS Bull. 79 (2003) 178–228.
[2] A. Carpi, A. De Luca, Words and special factors, Theoret. Comput. Sci. 259 (2001) 145–182.
[3] M. Dudik, L.J. Schulman, Reconstruction from subsequences, J. Combin. Theory A 103 (2002) 337–348.
[4] S. Fossé, G. Richomme, Some characterizations of Parikh matrix equivalent binary words, Inform. Proc. Lett. 92 (2004) 77–82.
[5] W. Kuich, A. Salomaa, Semirings, Automata, Languages, Springer, Berlin, 1986.
[6] J. Maňuch, Characterization of a word by its subwords, in: G. Rozenberg, W. Thomas (Eds.), Developments in Language Theory, World Scientific, Singapore, 2000, pp. 210–219.
[7] A. Mateescu, Algebraic aspects of Parikh matrices, in: J. Karhumäki, H. Maurer, G. Păun, G. Rozenberg (Eds.), Theory is Forever, Springer, Berlin, 2004, pp. 170–180.
[8] A. Mateescu, A. Salomaa, Matrix indicators for subword occurrences and ambiguity, Int. J. Found. Comput. Sci. 15 (2004) 277–292.
[9] A. Mateescu, A. Salomaa, K. Salomaa, S. Yu, A sharpening of the Parikh mapping, Theoret. Inform. Appl. 35 (2001) 551–564.
[10] A. Mateescu, A. Salomaa, S. Yu, Subword histories and Parikh matrices, J. Comput. System Sci. 68 (2004) 1–21.
[11] F. Mignosi, A. Restivo, M. Sciortino, Forbidden factors in finite and infinite words, in: J. Karhumäki, H. Maurer, G. Păun, G. Rozenberg (Eds.), Jewels are Forever, Springer, Berlin, 1999, pp. 339–350.
[12] R.J. Parikh, On context-free languages, J. Assoc. Comput. Mach. 13 (1966) 570–581.
[13] G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages 1–3, Springer, Berlin, 1997.
[14] J. Sakarovitch, I. Simon, Subwords, in: M. Lothaire (Ed.), Combinatorics on Words, Addison-Wesley, Reading, MA, 1983, pp. 105–142.
[15] A. Salomaa, Counting (scattered) subwords, EATCS Bull. 81 (2003) 165–179.
[16] A. Salomaa, On the injectivity of Parikh matrix mappings, Fund. Inform. 64 (2005) 391–404.
[17] T.-F. Şerbănuţă, Extending Parikh matrices, Theoret. Comput. Sci. 310 (2004) 233–246.
Theoretical Computer Science 340 (2005) 204 – 219 www.elsevier.com/locate/tcs
Words derivated from Sturmian words

Isabel M. Araújo^a,1, Véronique Bruyère^b,*

a Departamento de Matemática, Universidade de Évora, Rua Romão Ramalho, 59, 7000-671 Évora, Portugal
b Institut d'Informatique, Université de Mons-Hainaut, Le Pentagone, av. du Champ de Mars, 6, 7000 Mons, Belgium

Abstract

A return word of a factor of a Sturmian word starts at an occurrence of that factor and ends exactly before its next occurrence. Derivated words encode the unique decomposition of a word in terms of return words. Vuillon has proved that each factor of a Sturmian word has exactly two return words. We determine these two return words, as well as their first occurrence, for the prefixes of characteristic Sturmian words. We then characterize words derivated from a characteristic Sturmian word and give their precise form. Finally, we apply our results to obtain a new proof of the characterization of characteristic Sturmian words which are fixed points of morphisms.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Combinatorics on words; Sturmian words; Return words

1. Introduction

The concepts of return words and derivated words were introduced by Durand in [9]. Given a Sturmian word x, a return word of a factor w of x is a word that starts at an occurrence of w in x and ends exactly before the next occurrence of w. Derivated words encode the unique decomposition of a word in terms of its return words. In [14], Vuillon characterized Sturmian words in terms of their return words by showing that an infinite word is Sturmian if and only if each non-empty factor w of x has exactly

* Corresponding author. E-mail addresses: [email protected] (I.M. Araújo), [email protected] (V. Bruyère).
1 Also at Centro de Álgebra da Universidade de Lisboa, Avenida Professor Gama Pinto, 2, 1649-003 Lisboa, Portugal.
doi:10.1016/j.tcs.2005.03.020
two distinct return words. In [2], the authors considered the shortest of those return words and its first occurrence. That permitted to answer negatively a question posed by Michaux and Villemaire in [13]. In this paper, we are interested in both return words as well as in the associated derivated word. Thus, in Section 2, we introduce the classes of Sturmian words and characteristic Sturmian words. In Section 3, we give the exact form of the return words of prefixes of characteristic Sturmian words, together with their first occurrence. That allows us to fully characterize derivated words of characteristic Sturmian words, which we do in Section 4. Finally, in Section 5, we apply our results to obtain an alternative proof of the characterization of characteristic Sturmian words which are fixed points of morphisms given in [8].

2. Sturmian words

An infinite word x is Sturmian if the number of its distinct factors of length n is exactly n + 1. The function p_x : N → N such that p_x(n) is the number of distinct factors of x of length n is called the complexity of the infinite word x. It is well known that any non-ultimately periodic word satisfies p_x(n) ≥ n + 1 for all n ∈ N; in this sense Sturmian words are words of minimal complexity among infinite non-ultimately periodic words. It is clear from the definition that any Sturmian word is necessarily binary. Moreover, all words considered in this paper are binary. There is a vast amount of literature on Sturmian words and their study is an active area of research. Both Chapter 2 of [12] and the survey [4] are comprehensive introductions to Sturmian words and contain many references to recent works. Allouche and Shallit's recent book [1] also contains two chapters on the subject.

We now define the subclass of characteristic Sturmian words. For an irrational α ∈ ]0, 1[ we define a sequence (t_n)_n of finite words by

t_0 = 0,  t_1 = 0^{a_1} 1,  t_n = t_{n-1}^{a_n} t_{n-2}  (n ≥ 2),

where [0, a_1 + 1, a_2, ...] is the continued fraction expansion of α (a_1 ≥ 0 and a_i ≥ 1 for i ≥ 2). It is also usual to consider t_{-1} = 1, which permits to write t_1 = t_0^{a_1} t_{-1}. We then define the infinite word

f_α = lim_{n→∞} t_n,

which is called the characteristic Sturmian word of slope α. The sequence (t_n)_n is called the associated standard sequence of f_α. To each characteristic Sturmian word we may associate the sequence (q_n)_n of the lengths of the words t_n of the above sequence. Clearly, (q_n)_n is given by

q_0 = 1,  q_1 = a_1 + 1,  q_n = a_n q_{n-1} + q_{n-2}  (n ≥ 2).

Any characteristic Sturmian word is indeed a Sturmian word. This fact is a consequence of the study of Sturmian words as mechanical words (see [12, Chapter 2]). It can also be proved in this context (see [12, Proposition 2.1.18]) that every Sturmian word has the same set of factors as a well chosen characteristic Sturmian word. Notice that any t_n is a
prefix of both t_m, for m ≥ n ≥ 1, and of f_α. On the other hand, if a_1 = 0, then t_0 = 0 is a prefix of neither t_n, for n ≥ 1, nor f_α.

A pair of finite words (u, v) is standard if there is a finite sequence of pairs of words (0, 1) = (u_0, v_0), (u_1, v_1), ..., (u_k, v_k) = (u, v) such that for each i ∈ {1, ..., k}, either u_i = v_{i-1} u_{i-1} and v_i = v_{i-1}, or u_i = u_{i-1} and v_i = u_{i-1} v_{i-1}. An unordered standard pair is a set {u, v} such that either (u, v) or (v, u) is a standard pair. A word of a standard pair is called a standard word. Any standard word is primitive (see [12, Proposition 2.2.3]), any word in a standard sequence (t_n)_n is a standard word, and every standard word occurs in some standard sequence (see [12, Section 2.2.2]).

A factor u of a word x is left (respectively right) special if 0u and 1u (respectively u0 and u1) are factors of x; it is bispecial if it is both left and right special. It is easy to see that a word x is Sturmian if and only if it has exactly one left (respectively right) special factor of each length. For a characteristic Sturmian word f_α, the set of left special factors is its set of prefixes, and its set of right special factors is the set of reversals of prefixes (see [12, Section 2.1.3]). Moreover, the bispecial factors of a characteristic Sturmian word f_α are the prefixes of f_α which are palindromes.

The next lemma lists some useful facts about characteristic Sturmian words. For a finite word w of length greater than or equal to 2, we denote by c(w) the word obtained from w by swapping its last two letters. For a non-empty word w, we also use the word obtained from w by deleting its last letter. We say that a factor u of w is a strict factor of w if u is neither a prefix nor a suffix of w.

Lemma 1. With the above notation, for any n ∈ N,
(a) t_n t_{n-1} = c(t_{n-1} t_n); in particular, t_n t_{n-1} and t_{n-1} t_n coincide once their last two letters are deleted,
(b) t_n t_{n-1} is not a strict factor, nor a suffix, of t_n t_{n-1} t_n,
(c) the prefixes of f_α which are palindromes are the prefixes of length a q_n + q_{n-1} − 2, with 1 ≤ a ≤ a_{n+1}.

Proof. A proof of (a) appears in [1], statement (b) is an easy consequence of [2, Lemma 3.8(iv)], and (c) can be found in [6].

3. Return words

Given a non-empty factor w of a Sturmian word x = x_0 x_1 ... (where each x_i is a letter of x), an integer i is said to be an occurrence of w in x if x_i x_{i+1} ... x_{i+|w|-1} = w. For adjacent occurrences i, j, i < j, of w in x, the word x_i x_{i+1} ... x_{j-1} is said to be a return word of w in x (or, more precisely, the return word of the occurrence i of w in x). That is, a return word of w in x is a word that starts at an occurrence of w in x
I.M. Araújo, V. Bruyère / Theoretical Computer Science 340 (2005) 204 – 219
and ends exactly before the next occurrence of w. Note that a return word of w always has w as a prefix, or is a prefix of w; the latter happens when two occurrences of w overlap. Return words were first defined by Durand in [9].

Example 2. Consider the characteristic Sturmian word f, where α has the periodic continued fraction expansion [0, 3, 2, 3, 2, . . .]. According to the definition of the sequence (t_n)_n, in this case we have t_0 = 0, t_1 = 001 and t_2 = 0010010. Moreover, the word

00100100010010001001000100100100010010

is a prefix of f. Let us look for the return words of the factor w = 001. Reading off the segments between consecutive occurrences of that factor in this prefix, we obtain the decomposition

u v u v u v u u v u,    (1)

where u = 001 and v = 0010. These are the two return words of 001 that we find in that prefix of f.
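The notions just introduced are easy to experiment with. The sketch below (plain Python; all names are ours) builds a long prefix of f via the recursion t_n = t_{n−1}^{a_n} t_{n−2} for the slope of Example 2, read as the periodic expansion [0, 3, 2, 3, 2, . . .] (so a_1 = 2 and a_n alternates between 2 and 3 afterwards), and extracts the return words of w = 001 by scanning occurrences.

```python
def standard_prefix(a_seq):
    """t_0 = 0, t_1 = 0^{a_1} 1, t_n = t_{n-1}^{a_n} t_{n-2}; a_seq = [a_1, a_2, ...]."""
    t_prev, t = "0", "0" * a_seq[0] + "1"   # t_0, t_1
    for a_n in a_seq[1:]:
        t_prev, t = t, t * a_n + t_prev     # t_n = t_{n-1}^{a_n} t_{n-2}
    return t

def return_words(w, x):
    """Segments of x between consecutive occurrences of w."""
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return {x[i:j] for i, j in zip(occ, occ[1:])}

# slope [0, 3, 2, 3, 2, ...]  =>  a_1 = 2, then 2, 3, 2, 3, ...
f = standard_prefix([2, 2, 3, 2, 3, 2])
print(f[:38])                           # the prefix displayed above
print(sorted(return_words("001", f)))   # ['001', '0010']
```

Every gap between consecutive occurrences of 001 has length 3 or 4, which is exactly the u/v pattern of decomposition (1).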
In [14], Vuillon shows that an infinite binary word x is Sturmian if and only if each non-empty factor of x has exactly two return words. In [2] we studied the shortest of those return words and its first occurrence. That study allowed us, in particular, to answer negatively a question posed by Michaux and Villemaire in [13]. In the next proposition we recall this result from [2] (see Fig. 1).

Proposition 3 (Araújo and Bruyère [2, Proposition 3.2]). Let n ≥ 2. With the above notation, the shortest return word of a prefix w of f of length in the interval I_n = ]q_n + q_{n−1} − 2, q_{n+1} + q_n − 2] is t_n, and its first occurrence as a return word of w is 0 if |w| ≤ q_{n+1} − 2, and a_{n+2} q_{n+1} otherwise.

We are now also interested in the other return word of a prefix of a characteristic Sturmian word. The next proposition gives its form and its first occurrence (see Fig. 2).

Proposition 4. Let n ≥ 2 and i ∈ {1, . . . , a_{n+1}}. With the above notation, the longest return word of a prefix w of f of length in the interval I_{n,i} = ]i q_n + q_{n−1} − 2, (i + 1) q_n + q_{n−1} − 2] is t_n^i t_{n−1}, and its first occurrence as a return word of w is (a_{n+1} − i) q_n.

Remark 5. The interval I_n considered in Proposition 3 is the union of the intervals I_{n,i}, with i ∈ {1, . . . , a_{n+1}}, considered in Proposition 4. In particular, it is clear that q_{n+1} − 2 = a_{n+1} q_n + q_{n−1} − 2. Also, notice that in Proposition 4, when i = a_{n+1}, the longest return word of w is t_{n+1} and its first occurrence is 0.

In order to prove Proposition 4 we start by proving two lemmas. The first one gives us a special decomposition of a prefix of f. The second lemma points out a strategy to prove Proposition 4.

Lemma 6. For n ≥ 0 and i ∈ {1, . . . , a_{n+1}}, t_{n+1} t_n^i t_{n−1} is a prefix of f.
Fig. 1. The shortest return word of a prefix of f and its first occurrence.
Fig. 2. The longest return word of a prefix of f and its first occurrence.
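Before the formal proofs, Propositions 3 and 4 can be checked numerically. The sketch below (our code; it assumes only the recursion t_n = t_{n−1}^{a_n} t_{n−2} and q_n = |t_n|) takes the slope of Example 2 with n = 2, i = 1, and verifies that the two return words of the prefix of length i q_n + q_{n−1} are t_n and t_n^i t_{n−1}, with the stated first occurrences.

```python
def build_t(a):
    """Standard sequence for slope [0, a_1+1, a_2, ...]; a[n] holds a_n (a[0] unused)."""
    t = {-1: "1", 0: "0", 1: "0" * a[1] + "1"}
    for n in range(2, len(a)):
        t[n] = t[n - 1] * a[n] + t[n - 2]
    return t

def first_return_position(w, x, r):
    """First occurrence of w in x whose return word is r (None if absent)."""
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return min((i for i, j in zip(occ, occ[1:]) if x[i:j] == r), default=None)

a = [None, 2, 2, 3, 2, 3, 2]        # a_1, ..., a_6 for the slope [0, 3, 2, 3, 2, ...]
t = build_t(a)
f = t[6]                            # a prefix of f of length 433
q = {k: len(t[k]) for k in t}
n, i = 2, 1
w = f[: i * q[n] + q[n - 1]]        # prefix of length i*q_n + q_{n-1} = 10

longest = t[n] * i + t[n - 1]       # t_n^i t_{n-1}
print(first_return_position(w, f, t[n]))      # 0              (Proposition 3)
print(first_return_position(w, f, longest))   # 14 = (a_3 - 1) q_2  (Proposition 4)
```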
Proof. Let n ≥ 0 and i ∈ {1, . . . , a_{n+1}}. We have that

t_{n+3} = t_{n+2}^{a_{n+3}} t_{n+1} = t_{n+1}^{a_{n+2}} t_n t_{n+2}^{a_{n+3}−1} t_{n+1},

so t_{n+1}^{a_{n+2}} t_n is a prefix of f. If a_{n+2} > 1, it follows that t_{n+1} t_{n+1} = t_{n+1} t_n^{a_{n+1}} t_{n−1} is a prefix of f. Thus t_{n+1} t_n^i t_{n−1} is a prefix of f (recall that t_{n−1} is a prefix of t_n and i ≤ a_{n+1}). If, on the other hand, a_{n+2} = 1, we see that t_{n+1} t_n t_{n+1} = t_{n+1} t_n t_n^{a_{n+1}} t_{n−1} is a prefix of f, and therefore so is t_{n+1} t_n^i t_{n−1}.

Lemma 7. Let n ≥ 0 and i ∈ {1, . . . , a_{n+1}}. The occurrences in f of the prefixes of f with lengths in the interval I_{n,i} = ]i q_n + q_{n−1} − 2, (i + 1) q_n + q_{n−1} − 2] coincide. Moreover, if w and w′ are two such prefixes, and j is an occurrence of w and w′ in f, then a word u is a return word for the occurrence j of w if and only if it is a return word for the occurrence j of w′.

Proof. For the first part of the proof, it is enough to show that, given w = x_0 . . . x_{k−1} and w′ = x_0 . . . x_k, with k, k + 1 ∈ I_{n,i}, any occurrence of w is an occurrence of w′ in f. Notice that k < (i + 1) q_n + q_{n−1} − 2. Hence, by Lemma 1(c), w is a prefix of f which is not a palindrome. Therefore w is not bispecial and, in particular, it is not right special (recall that w, being a prefix of f, is a left special factor). Thus, the only factor of f of length k + 1 which begins with w is w′. Therefore, any occurrence of w in f is an occurrence of w′ in f.
Fig. 3. Illustration of occurrences of t_n^i t_{n−1} in ](a_{n+1} − i) q_n, q_{n+1}[.
The second statement follows immediately from the first one and the definition of return word.

Proof of Proposition 4. Let n ≥ 2 and i ∈ {1, . . . , a_{n+1}}. By Lemma 7, it is enough to prove the result for the prefix of f of length i q_n + q_{n−1}, namely t_n^i t_{n−1}. Notice that this length belongs to the interval I_{n,i}, since n ≥ 2. By Lemma 6, t_{n+1} t_n^i t_{n−1} = t_n^{a_{n+1}−i} t_n^i t_{n−1} t_n^i t_{n−1} is a prefix of f. Thus both (a_{n+1} − i) q_n and q_{n+1} are occurrences of t_n^i t_{n−1} in f. Moreover, there is no occurrence of t_n^i t_{n−1} between (a_{n+1} − i) q_n and q_{n+1}. Indeed, if we suppose otherwise, there are two cases to be considered:
(a) t_n^i t_{n−1} occurs at a position in the interval ](a_{n+1} − i) q_n, a_{n+1} q_n],
(b) t_n^i t_{n−1} occurs at a position in the interval ]a_{n+1} q_n, q_{n+1}[.
The two cases are illustrated in Fig. 3, in which the first line represents the prefix t_{n+1} t_n = t_n^{a_{n+1}−i} t_n^i t_{n−1} t_n of f, and the other lines represent the beginning of occurrences of t_n^i t_{n−1} as described in cases (a) and (b) (keeping in mind that t_{n−1} is a prefix of t_n). Case (a) implies that t_n t_{n−1} is a strict factor, or a suffix, of t_n t_{n−1} t_n, contradicting Lemma 1(b). In case (b) we obtain t_n as a strict factor, or a suffix, of t_n t_{n−1}, which contradicts the primitivity of t_n. We therefore conclude that t_n^i t_{n−1} is a return word of t_n^i t_{n−1} in f.

We shall now determine the first occurrence of t_n^i t_{n−1} as a return word of t_n^i t_{n−1} in f. By the first part of the proof we already know that this first occurrence is bounded above by (a_{n+1} − i) q_n. Now, if i = a_{n+1}, then (a_{n+1} − i) q_n = 0 and therefore, in this case, 0 is the first occurrence of t_n^i t_{n−1} as a return word of t_n^i t_{n−1} in f. Suppose now that i < a_{n+1}. Observing the prefix t_{n+1} = t_n^{a_{n+1}} t_{n−1} of f, we see that 0, q_n, . . . , (a_{n+1} − i) q_n are occurrences of t_n^i t_{n−1}. Therefore, the only return word that appears before position (a_{n+1} − i) q_n is t_n, the shortest return word.
We conclude that the first occurrence of t_n^i t_{n−1} as a return word of t_n^i t_{n−1} in f is (a_{n+1} − i) q_n.

Propositions 3 and 4 are actually valid for n ≥ 0, though we have proved them only for n ≥ 2. The proofs for smaller values of n have to be made separately and are rather technical; they appear in an appendix at the end of this paper.

Example 8. Let f and w = 001 be as in Example 2. Thus |w| = 3 and hence |w| ∈ ]q_1 + q_0 − 2, 2 q_1 + q_0 − 2] = [3, 5]. Therefore we are in the case n = 1, i = 1, and the return words of w in f are indeed the words t_1 = 001 and t_1 t_0 = 0010 found in Example 2. Moreover, applying Propositions 3 and 4, we have that the first occurrence of t_1 as a return word is at position 0, while the first occurrence of t_1 t_0 as a return word is at position (a_2 − i) q_1 = 3, as observed in Example 2.
Now, for n ≥ 0, consider the interval I_n = ∪_{i=1}^{a_{n+1}} I_{n,i} as in Propositions 3 and 4. We may define I_{n+1,0} = I_{n,a_{n+1}}, and

J_n = ∪_{i=0}^{a_{n+1}−1} I_{n,i}  if n > 0,    J_n = ∪_{i=1}^{a_{n+1}−1} I_{n,i}  if n = 0.
Notice that J_n corresponds to shifting I_n to the left. From Propositions 3 and 4, we have that the set of return words of a prefix w with length in I_{n,a_{n+1}} = I_{n+1,0} is {t_n, t_n^{a_{n+1}} t_{n−1}}, which can also be written as {t_{n+1}, t_{n+1}^0 t_n}. Thus, combining Propositions 3 and 4, we obtain

Proposition 9. Let n ≥ 1 and i ∈ {0, . . . , a_{n+1} − 1}, or n = 0 and i ∈ {1, . . . , a_{n+1} − 1}. Let w be a prefix of f of length in I_{n,i}. Then the return words of w in f are t_n and t_n^i t_{n−1}. Moreover, the first occurrence of t_n as a return word of w is 0, and the first occurrence of t_n^i t_{n−1} as a return word of w is (a_{n+1} − i) q_n.

The change of indices from Propositions 3 and 4 to Proposition 9 will be very useful in the proofs of the results in the remainder of the paper. Therefore, we will refer to Proposition 9 whenever we make use of the return words of a prefix of a characteristic Sturmian word.

Remark 10. Notice that working with characteristic Sturmian words is not a restriction, since every Sturmian word has the same set of factors as a well-chosen characteristic Sturmian word. In Proposition 9, we study the return words of the prefixes of f. Since the prefixes of a characteristic Sturmian word coincide with the left special factors of any Sturmian word with the same set of factors, Proposition 9 actually gives us the form of the return words of the left special factors of a Sturmian word.

Remark 11. In [14], Vuillon uses factor graphs of a Sturmian word x to study the return words of x. Factor graphs are efficient tools to study the factors of Sturmian words (for definitions and applications see, for instance, [3,6,10]); they are formed by two cycles intersecting each other in either a single vertex or a simple path. Vuillon, while proving that an infinite word is Sturmian if and only if each factor has exactly two return words, shows that the form of the return words of x depends on the labels of the above-mentioned cycles.
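Vuillon's characterization can be probed experimentally: in a sufficiently long prefix of a Sturmian word, every short factor should exhibit exactly two return words. A sketch (our helper names, on the word f of Example 2; the prefix is long enough that boundary effects do not hide a return word for the factor lengths tested):

```python
def return_words(w, x):
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return {x[i:j] for i, j in zip(occ, occ[1:])}

# prefix of f for the slope [0, 3, 2, 3, 2, ...]: t_n = t_{n-1}^{a_n} t_{n-2}
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev     # |t| grows to 3409

# every factor of length at most 7 has exactly two return words
for length in range(1, 8):
    assert all(len(return_words(t[k:k + length], t)) == 2 for k in range(30))
print("each tested factor has exactly two return words")
```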
4. Derivated words

Let us now introduce the concept of derivated word, proposed by Durand in [9]. Let x be a Sturmian word, let w be a prefix of x and let u, v be the two return words of w. Then x can be written in a unique way as a concatenation of the words u and v. Suppose, without loss of generality, that u appears before v in that concatenation. Denote by ℓ(x) the first letter of x. We define a bijection φ : {u, v} → {0, 1} by putting φ(u) = ℓ(x) and φ(v) = 1 − ℓ(x). In this way, if x = z_1 z_2 . . ., with z_i ∈ {u, v}, we define D_w(x) = φ(z_1) φ(z_2) · · ·.
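This construction translates directly into code: decompose x into the return words of w and rename them. A sketch (our function names; a long finite prefix stands in for the infinite word, so the decomposition is truncated at its end):

```python
def derivated(w, x):
    """Renaming of the return-word decomposition of the prefix x with respect to w."""
    assert x.startswith(w)
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    rets = [x[i:j] for i, j in zip(occ, occ[1:])]
    u = rets[0]                                  # u appears before v
    v = next(r for r in rets if r != u)
    phi = {u: x[0], v: "1" if x[0] == "0" else "0"}   # phi(u) = first letter of x
    return "".join(phi[r] for r in rets)

# f of Example 2, slope [0, 3, 2, 3, 2, ...]
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev
print(derivated("001", t)[:10])     # 0101010010
```

The first ten output letters are the renaming of the pattern u v u v u v u u v u of decomposition (1).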
The word D_w(x) is called the derivated word of x with respect to w. The derivated word D_w(x) is a renaming by 0 and 1 of the occurrences of u and v in the decomposition of x in terms of its return words. This definition is better understood with an example.

Example 12. Once again let f be the characteristic Sturmian word of slope α = [0, 3, 2, 3, 2, . . .], and consider the return words of the prefix w = 001. The two return words of w in f are u = 001 and v = 0010. Thus we set φ(u) = ℓ(f) = 0 and φ(v) = 1 − ℓ(f) = 1. Hence, from (1), we see that the derivated word of f with respect to 001 starts with 0 1 0 1 0 1 0 0 1 0.

Remark 13. Note that the images φ(u) and φ(v) were chosen so that any derivated word of x starts with the same letter as x.

Remark 14. If two prefixes w, w′ of x have the same return words u and v, then their derivated words coincide. Thus, we may call D_w(x) = D_{w′}(x) the derivated word of x with respect to the return words u and v.

Proposition 9 above describes some prefix of the derivated word of f with respect to its prefix w. With the notation of that proposition, D_w(f) has the prefix φ(u)^{a_{n+1}−i} φ(v). In the next proposition we determine the precise form of the whole derivated word D_w(f). Its proof uses Proposition 9.

Proposition 15. Let f_α be a characteristic Sturmian word of slope α, where α is given by its continued fraction expansion [0, a_1 + 1, a_2, . . .]. For a prefix w of f_α whose return words are t_n, t_n^i t_{n−1} (n ≥ 1, i ∈ {0, . . . , a_{n+1} − 1}, or n = 0, i ∈ {1, . . . , a_{n+1} − 1}), the derivated word D_w(f_α) of f_α is the characteristic Sturmian word of slope
• [0, a_{n+1} + 1 − i, a_{n+2}, a_{n+3}, . . .] if a_1 > 0; and
• [0, 1, a_{n+1} − i, a_{n+2}, a_{n+3}, . . .] if a_1 = 0.

In order to prove Proposition 15 we need two lemmas. The first lemma, which can be found in [9], is based on the uniqueness of the decomposition of a Sturmian word with respect to the return words of some prefix of it.

Lemma 16 (Durand [9]).
Let x be an infinite Sturmian word, w a prefix of x and let u, v be the two return words of w, such that u appears before v in the decomposition of x. Let θ be the morphism defined by θ(ℓ(x)) = u and θ(1 − ℓ(x)) = v. Then (a) θ(D_w(x)) = x and (b) if d is a word such that θ(d) = x, then d = D_w(x).

We denote by E the morphism 0 → 1, 1 → 0. Notice that E² is the identity mapping. The next lemma relates a characteristic Sturmian word f_α and its image E(f_α), with respect to their associated standard sequences and their derivated words.

Lemma 17. Let f_α be the characteristic Sturmian word of slope α, where α has continued fraction expansion [0, a_1 + 1, a_2, . . .].
(a) E(f_α) = f_{1−α}, and 1 − α has continued fraction expansion
• [0, 1, a_1, a_2, . . .] if a_1 > 0 and
• [0, a_2 + 1, a_3, . . .] if a_1 = 0.
(b) If (t_n)_n and (s_n)_n are the standard sequences associated to f_α and f_{1−α}, respectively, then
• if a_1 > 0 then E(t_n) = s_{n+1} for all n ≥ 0 and
• if a_1 = 0 then E(t_n) = s_{n−1} for all n ≥ 1.
(c) Let n ≥ 1 and i ∈ {0, . . . , a_{n+1} − 1}, or n = 0 and i ∈ {1, . . . , a_{n+1} − 1}. Then d is derivated from f_α with respect to the return words t_n, t_n^i t_{n−1} if and only if E(d) is derivated from f_{1−α} with respect to the return words E(t_n), E(t_n^i t_{n−1}). If a_1 > 0, this is also true for n = 0.

Proof. (a) The fact that E(f_α) = f_{1−α} is proved in [12, Corollary 2.2.20]. The form of the continued fraction of 1 − α comes from the definition of continued fractions.
(b) Suppose first that a_1 > 0. Then the continued fraction of 1 − α is [0, 1, a_1, a_2, . . .]. Moreover, (s_n)_n, the standard sequence associated to f_{1−α}, is given by

s_0 = 0,  s_1 = 1,  s_n = s_{n−1}^{a_{n−1}} s_{n−2}  (n ≥ 2).

We prove that E(t_n) = s_{n+1}, for all n ≥ 0, by induction on n. For 0 and 1 we have

E(t_0) = 1 = s_1,  E(t_1) = E(0^{a_1} 1) = 1^{a_1} 0 = s_1^{a_1} s_0 = s_2.

Now, let n ≥ 2 and suppose that the claim is true for n − 1 and n − 2. Then

E(t_n) = E(t_{n−1}^{a_n} t_{n−2}) = s_n^{a_n} s_{n−1} = s_{n+1},
which completes the induction.

Suppose now that a_1 = 0. Since E(f_{1−α}) = f_α, and in the continued fraction of 1 − α the first non-zero entry is strictly greater than 1, we can use the first case to conclude that, for all n ≥ 0, E(s_n) = t_{n+1}. Thus E(t_n) = s_{n−1}, for all n ≥ 1, as desired.
(c) Clearly, by Proposition 9 and by (b), if t_n and t_n^i t_{n−1} are the return words of some prefix w of f_α, then E(t_n) and E(t_n^i t_{n−1}) are the return words of the prefix E(w) of f_{1−α}. The definition of E permits us to conclude that d is derivated from f_α if and only if E(d) is derivated from f_{1−α}.

Remark 18. Lemma 17(c) tells us, in particular, that for a prefix w of f_α, E(D_w(f_α)) = D_{E(w)}(E(f_α)) = D_{E(w)}(f_{1−α}).

Proof of Proposition 15. Suppose first that a_1 > 0 and let d = D_w(f_α). Notice that f_α begins with 0, and thus d also begins with 0. Thus, by Proposition 9, 0^{a_{n+1}−i} 1 is a prefix of d. Let θ be the morphism

0 → t_n,  1 → t_n^i t_{n−1}.

We define a sequence of finite words (r_m)_m by setting

r_0 = 0,  r_1 = 0^{a_{n+1}−i} 1,  r_m = r_{m−1}^{a_{m+n}} r_{m−2}  (m ≥ 2).    (2)
Let us see that θ(r_m) = t_{m+n}, for all m ≥ 0. We use induction on m. For m = 0 and m = 1 we have

θ(r_0) = t_n,  θ(r_1) = t_n^{a_{n+1}−i} t_n^i t_{n−1} = t_n^{a_{n+1}} t_{n−1} = t_{n+1}.

Suppose now that m ≥ 2 and that the claim is true for m − 1 and m − 2. Then

θ(r_m) = θ(r_{m−1}^{a_{m+n}} r_{m−2}) = t_{m+n−1}^{a_{m+n}} t_{m+n−2} = t_{m+n}.
Now, if we let d′ = lim_m r_m, we obtain θ(d′) = f_α and hence, by Lemma 16, d = d′. Thus, by (2), d is the characteristic Sturmian word whose slope has continued fraction expansion [0, a_{n+1} + 1 − i, a_{n+2}, a_{n+3}, . . .].

Now let a_1 = 0. By Lemma 17, the derivated word of f_α with respect to the return words t_n, t_n^i t_{n−1} is the image by E of the derivated word d of E(f_α) = f_{1−α} with respect to the return words E(t_n) and E(t_n^i t_{n−1}). The continued fraction expansion of 1 − α is [0, a_2 + 1, a_3, . . .], and E(t_n) = s_{n−1}, E(t_n^i t_{n−1}) = s_{n−1}^i s_{n−2}. Thus, by the first part of the proof, the slope of d has the continued fraction expansion [0, a_{n+1} + 1 − i, a_{n+2}, a_{n+3}, . . .]. Therefore E(d) is the characteristic Sturmian word whose slope has the continued fraction expansion [0, 1, a_{n+1} − i, a_{n+2}, a_{n+3}, . . .].
Example 19. Let f be as in Example 12. It is easy to see that f has exactly five derivated words: they are the characteristic Sturmian words whose slopes are

[0, 2, 2, 3, 2, 3, . . .],  [0, 3, 3, 2, 3, 2, . . .],  [0, 2, 3, 2, 3, . . .],  [0, 4, 2, 3, 2, 3, . . .]  and  [0, 3, 2, 3, 2, . . .],

each expansion continuing periodically.
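The count of five can be reproduced from Proposition 15 alone: starting from the partial quotients a_1, a_2, a_3, . . . = 2, 2, 3, 2, 3, . . . of the slope of Example 12, apply the map (n, i) ↦ [0, a_{n+1} + 1 − i, a_{n+2}, . . .] over the admissible pairs and count the distinct results. A sketch (our encoding: each slope is represented by its first partial quotient after 0 together with the two-term period of its tail):

```python
def a(k):
    """Partial quotients of the slope [0, 3, 2, 3, 2, ...]: a_1 = 2, then a_k alternates 2, 3."""
    return 2 if k == 1 or k % 2 == 0 else 3

def derivated_slope(n, i):
    """Slope of the derivated word per Proposition 15 (case a_1 > 0): head a_{n+1} + 1 - i,
    then the periodic tail a_{n+2}, a_{n+3}, ..., encoded here by its two-term period."""
    return (a(n + 1) + 1 - i, (a(n + 2), a(n + 3)))

slopes = set()
for n in range(10):                     # index ranges as in Proposition 15
    for i in range(1 if n == 0 else 0, a(n + 1)):
        slopes.add(derivated_slope(n, i))
print(len(slopes))                      # 5
```

The five encodings correspond exactly to the expansions listed above; the pair (3, (2, 3)), i.e. [0, 3, 2, 3, . . .], is the slope of f itself.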
The next result relates Proposition 15 and [9, Theorem 2.5]. We start by introducing some definitions concerning morphisms. A morphism is non-trivial if it is neither the identity nor E, and it is non-erasing if the image of each letter is non-empty. Given a morphism θ, we say that a word x is a fixed point of θ if θ(x) = x. Moreover, an infinite word x is morphic if there exists a non-erasing morphism θ such that θ(a) = as with a ∈ {0, 1}, s ≠ ε, and x = θ^ω(a), the limit of the words θⁿ(a) (in particular, x is a fixed point of θ). The infinite word x is called substitutive if it is the image by a literal morphism (i.e., a morphism for which the image of a letter is a letter) of a morphic word.

Theorem 20. For a characteristic Sturmian word f_α, the following conditions are equivalent:
(a) the continued fraction expansion of α is ultimately periodic,
(b) the set of all derivated words of f_α (with respect to prefixes of f_α) is finite,
(c) f_α is substitutive.

Proof. The equivalence of (b) and (c) is [9, Theorem 2.5]. Let us now prove that (a) and (b) are equivalent. We consider first the case a_1 > 0. Suppose that the set of all derivated words of f_α is finite. Then, applying Proposition 15, there exist m, n, i and j, with m < n, such that the derivated word of f_α with respect to t_m, t_m^i t_{m−1} and the derivated word of f_α with respect to t_n, t_n^j t_{n−1} coincide, that is, the continued fraction expansions

[0, a_{m+1} + 1 − i, a_{m+2}, a_{m+3}, . . .]  and  [0, a_{n+1} + 1 − j, a_{n+2}, a_{n+3}, . . .]

are equal. Therefore a_{m+k} = a_{n+k} for all k ≥ 2, and [0, a_1 + 1, a_2, a_3, . . .] is ultimately periodic (with period a_{m+2}, . . . , a_{n+1}). Conversely, suppose that [0, a_1 + 1, a_2, a_3, . . .] is ultimately periodic. It is clear from Proposition 15 that there are only finitely many words derivated from f_α. The equivalence of (a) and (b) for a_1 = 0 is proved similarly.
5. An application

As an application of the previous results, we obtain a new proof of Theorem 21 in terms of return words and derivated words. This theorem was first proved by Crisp et al. in [8]. Both Berstel and Séébold in [5] and Komatsu and van der Poorten in [11] have presented alternative proofs. This theorem states three equivalences, as in Theorem 20, in the case where α is a Sturm number, that is, its continued fraction expansion is of one of the following types:
(i) [0, a_1 + 1, a_2, . . . , a_n], with a_n ≥ a_1 ≥ 1,
(ii) [0, 1, a_1, a_2, . . . , a_n], with a_n ≥ a_1,
where in both cases the block a_2, . . . , a_n repeats periodically. It is easy to see that α is a Sturm number of type (i) if and only if 1 − α is a Sturm number of type (ii). The main ingredients of our proof of Theorem 21 are Proposition 15 and the fact that if a characteristic Sturmian word is a fixed point of a morphism θ, then {θ(0), θ(1)} is an unordered standard pair (see [12, Proposition 2.3.11, Theorem 2.3.12] for a proof of this result).

Theorem 21. For a characteristic Sturmian word f_α, the following conditions are equivalent:
(a) α is a Sturm number,
(b) there exists a non-empty prefix w of f_α such that D_w(f_α) = f_α,
(c) f_α is a fixed point of a (non-erasing, non-trivial) morphism.

Proof. [(a) ⇒ (b)] Suppose first that α is a Sturm number of the form [0, a_1 + 1, a_2, . . . , a_n], where a_n ≥ a_1 ≥ 1. Consider the pair of words t_{n−1}, t_{n−1}^i t_{n−2}, where i = a_n − a_1. We have 0 ≤ i ≤ a_n − 1, and thus t_{n−1}, t_{n−1}^i t_{n−2} are the return words of some prefix of f_α.
By Proposition 15, the derivated word of f_α with respect to those return words is the characteristic Sturmian word whose slope has the continued fraction expansion

[0, a_n + 1 − (a_n − a_1), a_{n+1}, a_{n+2}, . . . , a_{2n−1}, . . .] = [0, a_1 + 1, a_{n+1}, a_{n+2}, . . . , a_{2n−1}, . . .] = [0, a_1 + 1, a_2, . . . , a_n, . . .],

which is exactly the continued fraction expansion of α. Therefore f_α is derivated from itself.

Suppose now that α is a Sturm number of the form [0, 1, a_1, a_2, . . . , a_n], with a_n ≥ a_1. Then, by Lemma 17, 1 − α is a Sturm number of the other form. Now, applying the first part of the proof, we have that f_{1−α} is a derivated word of itself. Thus, E(f_{1−α}) = f_α is also a derivated word of itself (see Remark 18).

[(b) ⇒ (a)] Suppose that f_α starts with 0 and that the continued fraction expansion of α is

[0, a_1 + 1, a_2, . . .]    (3)
(in particular a_1 ≥ 1). Since f_α is a derivated word of itself, by Proposition 15 there exist m, i, with m > 0, such that the continued fraction expansion of α is

[0, a_{m+1} + 1 − i, a_{m+2}, a_{m+3}, . . .].    (4)
Comparing (3) and (4), we get a_1 + 1 = a_{m+1} + 1 − i, a_2 = a_{m+2}, a_3 = a_{m+3}, etc. That is, the continued fraction expansion of α is [0, a_1 + 1, a_2, . . . , a_{m+1}] with the block a_2, . . . , a_{m+1} repeating periodically, and a_{m+1} = a_1 + i. Thus a_{m+1} ≥ a_1. Therefore α is a Sturm number. Suppose now that f_α starts with 1. Since f_α is a derivated word of itself, E(f_α) = f_{1−α} is also a derivated word of itself. Thus 1 − α is a Sturm number and hence α is also a Sturm number.

[(b) ⇒ (c)] There exists a non-empty prefix w of f_α such that f_α = D_w(f_α). Let u and v be the return words of w, and let ℓ(f_α) denote the first letter of f_α. Hence, by Lemma 16, the morphism θ, defined by
θ(ℓ(f_α)) = u,  θ(1 − ℓ(f_α)) = v,

verifies θ(f_α) = f_α.

[(c) ⇒ (b)] Let θ be a morphism such that θ(f_α) = f_α. We want to show that θ(0), θ(1) are the return words of a non-empty prefix w of f_α. It will follow that D_w(f_α) = f_α. Since θ has a fixed point which is a characteristic word, by [12, Proposition 2.3.11 and Theorem 2.3.12], {θ(0), θ(1)} is an unordered standard pair. In particular, θ(0) and θ(1) are primitive words. Moreover, this pair is different from {0, 1} since θ is non-trivial.

Claim. Any unordered standard pair different from {0, 1} is either {0, 0^k 1}, {1, 1^k 0}, or {u, u^k u′}, for some word u, some non-empty prefix u′ of u and some k ≥ 1.

Proof of Claim. The proof is by induction on the way standard pairs (u, v) are constructed. For the base case, the standard pairs different from (0, 1) are (10, 1) and (0, 01), which verify the claim. It is easy to check that if (u, v) verifies the claim, then the next pairs (vu, v) and (u, uv) also verify the claim.
Fig. 4. Occurrences i and i + |u| of w in f .
We start by considering the case |θ(0)| < |θ(1)|. Suppose first that θ(0) = u and θ(1) = u^k u′, with u′ a non-empty prefix of u and k ≥ 1. The word u^k u′ is a prefix of f_α, since 0^{a_1} 1 is a prefix of f_α and θ(0^{a_1} 1) = u^{a_1+k} u′ is a prefix of θ(f_α) = f_α. Let us show that θ(0), θ(1) are the return words of w = u^k u′ in f_α. The word 01 is clearly a factor of f_α (otherwise f_α would be ultimately periodic). Hence θ(01) = u^{k+1} u′ is also a factor of f_α. Therefore there is an occurrence i of w in f_α, with i ≥ 0, such that i + |u| is also an occurrence of w. The situation is represented in Fig. 4. There is no occurrence of w between i and i + |u|, for otherwise u would be a strict factor of uu, contradicting its primitivity. Therefore θ(0) = u is a return word of w in f_α. As for the other return word, observe that there exists l ≥ 0 such that 1 0^l 1 is a factor of f_α (otherwise f_α would be ultimately periodic). Thus θ(1 0^l 1), and in particular u^k u′ u^k u′ = ww, are factors of f_α. Thus there are two occurrences j and j + |w| of w in f_α, for some j ≥ 0. There is no intermediate occurrence of w since w = θ(1) is primitive. It follows that θ(1) = u^k u′ is the other return word of w in f_α. Suppose now that
θ(0) = 0 and θ(1) = 0^k 1, and consider the prefix w = 0^k of f_α. The proof is similar to the previous one. Thanks to the factor θ(01) of f_α, we verify that 0 = θ(0) is a return word of w in f_α. Thanks to the factor θ(1 0^l 1) of f_α, we verify that 0^k 1 = θ(1) is the other return word of w in f_α. The case θ(0) = 1 and θ(1) = 1^k 0 is similar. Finally, if |θ(0)| > |θ(1)|, the proof is analogous.

Remark 22. In Theorem 21, statement (c) may be replaced by “f_α is a morphic word”. Indeed, a characteristic Sturmian word is morphic if and only if it is the fixed point of a (non-erasing, non-trivial) morphism. In order to prove this claim, let θ be a non-erasing, non-trivial morphism, and let f_α be a characteristic Sturmian word such that θ(f_α) = f_α. Suppose, without loss of generality, that the first letter of f_α is 0. Then θ(0) = 0w, for some word w. Notice that w cannot be the empty word. Indeed, on one hand, it follows from the proof of Theorem 21 that both θ(0) and θ(1) must start with the same letter (in this case, 0). On the other hand, if k is the first occurrence of 1 in f_α, that is, 0^k 1 is a prefix of f_α, then θ(0^k 1) = 0^k θ(1). Since f_α is a fixed point of θ, it follows that the first letter of θ(1) is 1, which is a contradiction. Thus w is non-empty, and by [12, Theorem 1.2.8] θ^ω(0) is the only fixed point of θ that starts with 0. Hence f_α = θ^ω(0), and f_α is a morphic word.
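Theorem 21(b) can be observed directly on the word f of Example 2, whose slope is a Sturm number of type (i): for suitable prefixes w, the derivated word D_w(f) is f again. The sketch below (our code, comparing finite prefixes only) searches for such prefixes:

```python
def derivated(w, x):
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    rets = [x[i:j] for i, j in zip(occ, occ[1:])]
    u = rets[0]
    v = next(r for r in rets if r != u)
    phi = {u: x[0], v: "1" if x[0] == "0" else "0"}
    return "".join(phi[r] for r in rets)

# long prefix of f, slope [0, 3, 2, 3, 2, ...]
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev

hits = [L for L in range(1, 30) if derivated(t[:L], t)[:50] == t[:50]]
print(hits)
```

The hits fill exactly the lengths in ]8, 15] = I_{2,1}: by Proposition 15, the pair n = 2, i = 1 yields slope [0, 3, 2, 3, . . .], which is the slope of f itself.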
Example 23. By Theorem 21, the word f from Example 19 is a morphic word since it is a derivated word of itself.
Acknowledgments

The first author acknowledges the support of Fundação para a Ciência e a Tecnologia (Grant no. SFRH/BPD/11489/2002) and of the Centro de Álgebra da Universidade de Lisboa. Her participation in this work is part of the project POCTI “Fundamental and Applied Algebra” of Fundação para a Ciência e a Tecnologia and FEDER. She would also like to thank the Institut d’Informatique of the Université de Mons-Hainaut for its hospitality.
Appendix

In this appendix we present the proofs of Propositions 3 and 4 for n ∈ {0, 1}. We start with the case n = 0, given by the following:

Proposition 24. Let n = 0 and i ∈ {1, . . . , a_1}. Let w = 0^i be the prefix of f of length in ]i q_0 + q_{−1} − 2, (i + 1) q_0 + q_{−1} − 2] = {i}. The shortest return word of w is t_0 = 0, and its first occurrence is 0 if i < a_1, and a_2 q_1 if i = a_1. The longest return word of w is t_0^i t_{−1} = 0^i 1, and its first occurrence is (a_1 − i) q_0.

Proof. If a_1 = 0 then the set {1, . . . , a_1} is empty; thus we may assume that a_1 > 0. Notice that (0^{a_1} 1)^{a_2} 0^{a_1+1} is a prefix of f. Studying this prefix, it is clear that the two return words of w = 0^i are t_0 = 0 and t_0^i t_{−1} = 0^i 1. Moreover, the first occurrence of 0 as a return word of w is 0 if i < a_1 and a_2 q_1 if i = a_1, while the first occurrence of 0^i 1 is a_1 − i = (a_1 − i) q_0.

The next proposition is Proposition 9 in the case n = 1.

Proposition 25. Let n = 1 and i ∈ {1, . . . , a_2}. Let w be a prefix of f of length in the interval ]i q_1 + q_0 − 2, (i + 1) q_1 + q_0 − 2] = [i q_1, (i + 1) q_1 − 1]. The shortest return word of w is t_1, and its first occurrence is 0 if i < a_2, and a_3 q_2 if i = a_2. The longest return word of w is t_1^i t_0, and its first occurrence is (a_2 − i) q_1.
Fig. 5. Illustration of an occurrence of t_1^i in ]a_3 q_2, q_3[.
Proof. By Lemma 7, for each interval [i q_1, (i + 1) q_1 − 1], it is enough to determine the return words of the prefix w = t_1^i of f (notice that |w| ∈ [i q_1, (i + 1) q_1 − 1]). It is easy to see that t_2^{a_3} t_1^{i+1} is a prefix of f and t_2^{a_3} t_1^{i+1} = t_2^{a_3} t_1^i t_1 = t_2^{a_3} t_1 t_1^i. Thus, a_3 q_2 and q_3 are occurrences of t_1^i in f. Moreover, there is no occurrence of t_1^i in ]a_3 q_2, q_3[. Indeed, if t_1^i occurred in that interval, we would obtain a situation as shown in Fig. 5 (the top line represents the prefix t_2^{a_3} t_1 t_1^i of f, and the bottom line represents the beginning of an occurrence of t_1^i in ]a_3 q_2, q_3[). We would hence have t_1 as a strict factor of t_1 t_1, contradicting the primitivity of t_1. Hence t_1 is a return word of w in f.

Now, by Lemma 6, t_2 t_1^i t_0 = t_1^{a_2−i} t_1^i t_0 t_1^i t_0 is also a prefix of f. Thus, both (a_2 − i) q_1 and q_2 are occurrences of t_1^i in f. Moreover, there is no occurrence of t_1^i between (a_2 − i) q_1 and q_2. Indeed, remember that t_1 = 0^{a_1} 1 and t_0 = 0. Therefore t_1^i t_0 is the longest return word of w in f.

We now locate the first occurrence of the two return words of w. Let i < a_2. Since t_2 = t_1^{a_2} t_0 is a prefix of f, we see that 0, q_1, . . . , (a_2 − i) q_1 are occurrences of w = t_1^i in f. Therefore the shortest return word t_1 occurs at position 0, and the first occurrence of the longest return word t_1^i t_0 is greater than or equal to (a_2 − i) q_1. Since we have already seen that (a_2 − i) q_1 is indeed an occurrence of the return word t_1^i t_0, we conclude that it is its first occurrence.

Let now i = a_2. From the above we have that the first occurrence of the shortest return word t_1 is bounded above by a_3 q_2. Let us see that t_1 cannot appear earlier as a return word of w = t_1^{a_2}. It will also follow that the first occurrence of the longest return word t_1^{a_2} t_0 = t_2 is 0. Any occurrence of t_1 as a return word of w corresponds to an occurrence of t_1 w = t_1^{a_2+1}. Now, if a_1 = 0, then t_1 = 1 and t_2 = 1^{a_2} 0. Hence, considering the prefix t_2^{a_3} t_1^{a_2+1} of f, it is clear that the first occurrence of t_1 w in f is a_3 q_2. On the other hand, if a_1 > 0, then t_2 is a prefix of t_1 w. Thus, any occurrence of t_1 w smaller than a_3 q_2 is of the form k q_2, with k ∈ {0, . . . , a_3 − 1}, since t_2 is primitive. Keeping in mind that t_1 is a prefix of t_2, it follows that t_1 = t_0 t_1⁻ (see Fig. 6), where t_1⁻ denotes t_1 deprived of its last letter; this is not possible since t_0 = 0 and t_1 = 0^{a_1} 1.
Fig. 6. Illustration of occurrences of t_1 as a return word of t_1^{a_2} before a_3 q_2.
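The appendix cases can themselves be verified numerically. The sketch below (our code) checks Proposition 24 on the word of Example 2 (a_1 = a_2 = 2, q_0 = 1, q_1 = 3): for w = 0^i, the return word 0 first occurs at 0 when i < a_1 and at a_2 q_1 when i = a_1, while the return word 0^i 1 first occurs at (a_1 − i) q_0.

```python
def first_return_position(w, x, r):
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return min((i for i, j in zip(occ, occ[1:]) if x[i:j] == r), default=None)

# prefix of f for slope [0, 3, 2, 3, 2, ...]
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev
a1, a2, q0, q1 = 2, 2, 1, 3

for i in (1, 2):                                   # i ranges over {1, ..., a_1}
    w = "0" * i
    expected_short = 0 if i < a1 else a2 * q1      # first occurrence of t_0 = 0
    assert first_return_position(w, t, "0") == expected_short
    assert first_return_position(w, t, "0" * i + "1") == (a1 - i) * q0
print("Proposition 24 verified for i = 1, 2")
```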
References

[1] J.-P. Allouche, J. Shallit, Automatic Sequences—Theory, Applications, Generalizations, Cambridge University Press, Cambridge, 2003.
[2] I.M. Araújo, V. Bruyère, Sturmian words and a criterium by Michaux–Villemaire, Theoret. Comput. Sci., in press, doi:10.1016/j.tcs.2005.01.010 (appeared in Proc. Fourth Internat. Conf. on Words, Turku, Finland, 2003, pp. 83–94).
[3] P. Arnoux, G. Rauzy, Représentation géométrique de suites de complexité 2n + 1, Bull. Soc. Math. France 119 (2) (1991) 199–215.
[4] J. Berstel, Recent results in Sturmian words, in: Developments in Language Theory, Vol. II, Magdeburg, 1995, World Scientific Publishing, River Edge, NJ, 1996, pp. 13–24.
[5] J. Berstel, P. Séébold, Morphismes de Sturm, Bull. Belg. Math. Soc. Simon Stevin 1 (2) (1994) 175–189 (Journées Montoises, Mons, 1992).
[6] V. Berthé, Fréquences des facteurs des suites sturmiennes, Theoret. Comput. Sci. 165 (1996) 295–309.
[8] D. Crisp, W. Moran, A. Pollington, P. Shiue, Substitution invariant cutting sequences, J. Théor. Nombres Bordeaux 5 (1) (1993) 123–137.
[9] F. Durand, A characterization of substitutive sequences using return words, Discrete Math. 179 (1998) 89–101.
[10] I. Fagnot, L. Vuillon, Generalized balances in Sturmian words, Discrete Appl. Math. 121 (1–3) (2002) 83–101.
[11] T. Komatsu, A.J. van der Poorten, Substitution invariant Beatty sequences, Japan J. Math. (N.S.) 22 (2) (1996) 349–354.
[12] M. Lothaire, Algebraic Combinatorics on Words, Encyclopedia of Mathematics and its Applications, Cambridge University Press, Cambridge, 2002.
[13] C. Michaux, R. Villemaire, Presburger arithmetic and recognizability of sets of natural numbers by automata: new proofs of Cobham’s and Semenov’s theorems, Ann. Pure Appl. Logic 77 (1996) 251–277.
[14] L. Vuillon, A characterization of Sturmian words by return words, European J. Combin. 22 (2) (2001) 263–275.
Theoretical Computer Science 340 (2005) 220 – 239 www.elsevier.com/locate/tcs
Codes of central Sturmian words✩

Arturo Carpi a,c,∗, Aldo de Luca b,c

a Dipartimento di Matematica e Informatica dell’Università di Perugia, via Vanvitelli 1, 06123 Perugia, Italy
b Dipartimento di Matematica e Applicazioni, Università di Napoli “Federico II”, via Cintia, Monte S. Angelo, 80126 Napoli, Italy
c Istituto di Cibernetica del C. N. R. “E. Caianiello”, Pozzuoli (NA), Italy
Abstract

A central Sturmian word, or simply central word, is a word having two coprime periods p and q and length equal to p + q − 2. We consider sets of central words which are codes. Some general properties of central codes are shown. In particular, we prove that a non-trivial maximal central code is infinite. Moreover, it is not maximal as a code. A central code is called a prefix central code if it is a prefix code. We prove that a central code is a prefix (resp., maximal prefix) central code if and only if the set of its ‘generating words’ is a prefix (resp., maximal prefix) code. A suitable arithmetization of the theory is obtained by considering the bijection ρ, called the ratio of periods, from the set of all central words to the set of all positive irreducible fractions, defined as follows: ρ(ε) = 1/1 and ρ(w) = p/q (resp., ρ(w) = q/p) if w begins with the letter a (resp., the letter b), p is the minimal period of w, and q = |w| − p + 2. We prove that a central code X is prefix (resp., maximal prefix) if and only if ρ(X) is an independent (resp., independent and full) set of fractions. Finally, two interesting classes of prefix central codes are considered. One is the class of Farey codes, which are naturally associated with the Farey series; we prove that Farey codes are maximal prefix central codes. The other is given by uniform central codes. A noteworthy property related to the number of occurrences of the letter a in the words of a maximal uniform central code is proved.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Sturmian word; Code; Central word
☆ The work for this paper has been supported by the Italian Ministry of Education under Project COFIN 2003 "Linguaggi Formali e Automi: metodi, modelli e applicazioni".
∗ Corresponding author. Dipartimento di Matematica e Informatica dell’Università di Perugia, via Vanvitelli 1,
06123 Perugia, Italy. E-mail addresses:
[email protected] (A. Carpi),
[email protected] (A. de Luca). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.021
1. Introduction

Sturmian words are infinite sequences of symbols over a finite alphabet which are not eventually periodic and have the minimal possible subword complexity: for any integer n ≥ 0, the number of subwords of length n of any Sturmian word is equal to n + 1. Sturmian words are of great interest both from the theoretical and the applicative point of view, so that there exists a large literature on the subject. We refer to the recent overviews on Sturmian words by Berstel and Séébold [4, Chapter 2] and by Allouche and Shallit [1, Chapters 9–10].

A geometrical definition of a Sturmian word is the following: consider the sequence of the cuts (cutting sequence) in a squared lattice made by a ray having an irrational slope. A horizontal cut is denoted by the letter b, a vertical cut by a, and a cut with a corner by ab or ba. Sturmian words represented by a ray starting from the origin are usually called standard or characteristic.

The most famous Sturmian word is the Fibonacci word

f = abaababaabaababaababaabaababaabaab · · · ,

which is the limit of the sequence of words (f_n)_{n≥0}, inductively defined as f_0 = b, f_1 = a, and f_{n+1} = f_n f_{n−1} for n ≥ 1.

Standard Sturmian words can be equivalently defined in the following way, which is a natural generalization of the definition of the Fibonacci word. Let c_0, c_1, ..., c_n, ... be any sequence of integers such that c_0 ≥ 0 and c_i > 0 for i > 0. We define, inductively, the sequence of words (s_n)_{n≥0}, where

s_0 = b, s_1 = a, and s_{n+1} = s_n^{c_{n−1}} s_{n−1} for n ≥ 1.

The sequence (s_n)_{n≥0} converges to a limit s which is an infinite standard Sturmian word. Any standard Sturmian word is obtained in this way. We shall denote by Stand the set of all the words s_n, n ≥ 0, of any standard sequence (s_n)_{n≥0}. Any word of Stand is called a finite standard Sturmian word, or generalized Fibonacci word.

In the study of combinatorial properties of Sturmian words a very important role is played by the set PER of all palindromic prefixes of all standard Sturmian words. The words of PER have been called central Sturmian words, or simply central words, in [4]. It has been proved in [6] that a word is central if and only if it has two coprime periods p and q and length equal to p + q − 2.

In this paper, we consider sets of central words which are codes, i.e., bases of free submonoids of {a, b}∗. There are several motivations for this research. From the theoretical point of view central codes have interesting combinatorial properties. In particular, a suitable arithmetization of the theory can be given. Moreover, the words of a central code are palindromes which satisfy some strong constraints which can be useful for applications (coding with constraints [10], error-correcting codes). Finally, we believe that these codes can be of some interest in discrete geometry (for instance, to represent polygonal lines in a discrete plane).

In Section 4 some general properties of central codes are shown. In particular, we prove that a non-trivial maximal central code X is PER-complete, i.e., any central word is a factor
of a word of X∗. As a consequence of this proposition and of some technical lemmas, we prove that a non-trivial maximal central code is infinite. Moreover, it is not maximal as a code.

In Section 5, we consider prefix central codes, i.e., central codes such that no word of the code is a prefix of another word of the code. We prove that a central code is a prefix (resp., maximal prefix) central code if and only if the set of its 'generating words' is a prefix (resp., maximal prefix) code. A suitable arithmetization of the theory is obtained by considering the bijection θ, called ratio of periods, from the set of all central words to the set I of all positive irreducible fractions, defined as: θ(ε) = 1/1 and θ(w) = p/q (resp., θ(w) = q/p) if w begins with the letter a (resp., the letter b), p is the minimal period of w, and q = |w| − p + 2. A suitable derivation relation on the set I is introduced. A subset H of I is called independent if no fraction of the set can be derived from another one. A subset H of I is called full if for any element p/q of I either from p/q one can derive an element of H or there exists an element of H from which one can derive p/q. We prove that a central code X is prefix (resp., maximal prefix) if and only if θ(X) is an independent (resp., independent and full) set of fractions.

In Section 6, we consider for any positive integer n the set Φ_n of all central words w having minimal period p, q = |w| − p + 2 ≤ n + 1, and |w| ≥ n. One can prove that for each n, Φ_n is a maximal prefix central code called the Farey code of order n, since it is naturally associated with the Farey series of order n.

Finally, in Section 7, we consider the class of uniform central codes. A central code is uniform of order n if all the words of the code have length equal to n. For any n the maximal uniform central code of order n is given by U_n = PER ∩ {a, b}^n. The following noteworthy property, related to the number of occurrences |w|_a of the letter a in a word w of the maximal uniform central code U_n, is proved: for any k, 0 ≤ k ≤ n, there exists a (unique) word w ∈ U_n such that |w|_a = k if and only if gcd(n + 2, k + 1) = 1.

2. Preliminaries

Let A be a finite non-empty set, or alphabet, and A∗ the free monoid generated by A. The elements of A are usually called letters and those of A∗ words. The identity element of A∗ is called the empty word and denoted by ε. We set A+ = A∗ \ {ε}.

A word w ∈ A+ can be written uniquely as a sequence of letters as w = w_1 w_2 · · · w_n, with w_i ∈ A, 1 ≤ i ≤ n, n > 0. The integer n is called the length of w and denoted |w|. The length of ε is 0. For any w ∈ A∗ and a ∈ A, |w|_a denotes the number of occurrences of the letter a in w.

Let w ∈ A∗. The word u is a factor (or subword) of w if there exist words p, q such that w = puq. A factor u of w is called proper if u ≠ w. If w = uq, for some word q (resp., w = pu, for some word p), then u is called a prefix (resp., a suffix) of w. For any w ∈ A∗, we denote by Fact w the set of its factors. For any X ⊆ A∗, we set Fact X = ∪_{u∈X} Fact u.
An element of Fact X will be also called a factor of X.
A set X is called dense if any word of A∗ is a factor of X. A set which is not dense is called thin. If X is a finite set we denote by ℓ(X) the maximal length of the words of X. Any word of A∗ of length greater than ℓ(X) is not a factor of X, so that X is thin.

Let Y ⊆ A∗. A set X is called Y-complete if Y ⊆ Fact X∗. A set X which is A∗-complete, i.e., such that X∗ is dense, is called simply complete.

Let p be a positive integer. A word w = w_1 · · · w_n, w_i ∈ A, 1 ≤ i ≤ n, has period p if the following condition is satisfied: for all 1 ≤ i, j ≤ n, if i ≡ j (mod p), then w_i = w_j. From the definition one has that any integer q ≥ |w| is a period of w. As is well known, a word w has a period p ≤ |w| if and only if there exist words u, v, s such that

w = us = sv, |u| = |v| = p.

We shall denote by π_w the minimal period of w. We can uniquely represent w as w = r^k r′, where |r| = π_w, k ≥ 1, and r′ is a proper prefix of r. We shall call r the fractional root or, simply, root of w.

Let w = w_1 · · · w_n, w_i ∈ A, 1 ≤ i ≤ n. The reversal of w is the word w∼ = w_n · · · w_1. One defines also ε∼ = ε. A word is called a palindrome if it is equal to its reversal.

A code X over a given alphabet A is the base of a free submonoid of A∗, i.e., any non-empty word of X∗ can be uniquely factorized by words of X (cf. [3]). A code X over A is prefix (resp., suffix) if no word of X is a prefix (resp., suffix) of another word of X. A code is biprefix if it is both prefix and suffix. A code X over the alphabet A is maximal if it is not properly included in another code over the same alphabet. As is well known, any maximal code is complete. Conversely, a thin and complete code is maximal. A prefix code is a maximal prefix code if it is not properly included in another prefix code over the same alphabet.

The following two lemmas will be useful in the sequel.

Lemma 1. Let X be a code over the alphabet A and w ∈ A∗ be a word having root ζ. If ζ ∉ Fact X∗, then X ∪ {w} is a code.

Proof. Suppose that Y = X ∪ {w} is not a code. There would exist h, k > 0 and words y_1, ..., y_h, y′_1, ..., y′_k ∈ Y such that y_1 ≠ y′_1 and y_1 · · · y_h = y′_1 · · · y′_k. Since X is a code and w does not belong to Fact X∗, one easily derives that w has to occur in both sides of the previous equation, i.e., there exist minimal positive integers i and j such that w = y_i = y′_j. Setting u = y_1 · · · y_{i−1} and v = y′_1 · · · y′_{j−1}, one has uwα = vwβ with u, v ∈ X∗, u ≠ v, and α, β ∈ Y∗.
With no loss of generality, we can assume |u| > |v|. Then one has u = vλ and λwα = wβ, with λ ∈ A+. From this latter equation one obtains

λw = wδ with δ ∈ A+.

This equation shows that |λ| is a period of λw and then of w, so that |λ| ≥ |ζ|. Thus, ζ is a prefix of λ and, consequently, a factor of u = vλ. Hence ζ ∈ Fact X∗, which is a contradiction.

Lemma 2. Let X be a prefix code over the alphabet A and w ∈ A∗ be a word such that wA∗ ∩ X∗ = ∅. Then Y = X ∪ {w} is a code.

Proof. Suppose that Y is not a code. There would exist h, k > 0 and words y_1, ..., y_h, y′_1, ..., y′_k ∈ Y such that y_1 ≠ y′_1 and y_1 · · · y_h = y′_1 · · · y′_k. Since X is prefix one has y_1 = w or y′_1 = w. Without loss of generality, we may suppose that y_1 = w. Since wA∗ ∩ X∗ = ∅ there exists j ≥ 2 such that y′_1, ..., y′_{j−1} ∈ X and y′_j = w. Hence, one has y′_1 · · · y′_{j−1} = u ∈ X+ and

uw = wv, with v ∈ A∗.

Let n be a positive integer such that |u^n| ≥ |w|. One has u^n w = wv^n, so that u^n = wλ for a suitable λ ∈ A∗. Thus, wA∗ ∩ X∗ ≠ ∅, which is a contradiction.

3. Central words

In the study of combinatorial properties of Sturmian words a crucial role is played by the set PER of all finite words w having two periods p and q such that gcd(p, q) = 1 and |w| = p + q − 2. We assume that ε ∈ PER (this is formally coherent with the definition if one takes p = q = 1).

The set PER was introduced in [6] where its main properties were studied. In particular, it has been proved that PER is equal to the set of the palindromic prefixes of all standard Sturmian words. The words of PER have been called central in [4]. As is well known, central words are over a two-letter alphabet {a, b} that, in the sequel, will be denoted by A.

The set PER has remarkable structural properties. The set of all finite factors of all Sturmian words equals the set of factors of PER. Moreover, the set Stand of all finite standard Sturmian words is given by

Stand = A ∪ PER{ab, ba}.   (1)
Thus, any finite standard Sturmian word which is not a single letter is obtained by appending ab or ba to a central word. The following useful characterization of central words is a slight generalization of a statement proved in [5] (see also [8]). We report the proof for the sake of completeness.
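The inductive construction of the words s_n from the introduction, and the relation (1) between standard and central words, can be checked mechanically. The following is a minimal Python sketch (the function names are ours, not from the paper): it builds the initial words of a standard sequence from a directive sequence and verifies that every s_n of length at least 2 is a palindrome followed by ab or ba.

```python
def standard_sequence(c, steps):
    # s_0 = b, s_1 = a, s_{n+1} = s_n^{c_{n-1}} s_{n-1} for n >= 1,
    # where c = (c_0, c_1, ...) is the directive sequence.
    s = ["b", "a"]
    for n in range(1, steps):
        s.append(s[n] * c[n - 1] + s[n - 1])
    return s

# The directive sequence (1, 1, 1, ...) yields the Fibonacci word.
fib = standard_sequence([1] * 9, 9)

def splits_as_central(s):
    # Eq. (1): a finite standard word that is not a single letter is a
    # central (hence palindromic) word followed by ab or ba.
    w, xy = s[:-2], s[-2:]
    return xy in ("ab", "ba") and w == w[::-1]
```

With the all-ones directive sequence, the successive words are exactly the finite Fibonacci words f_0, f_1, f_2, ... of the introduction.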
Proposition 3. A word w is central if and only if w is a power of a single letter of A or it satisfies the equation

w = w_1 ab w_2 = w_2 ba w_1   (2)

with w_1, w_2 ∈ A∗. Moreover, in this latter case, w_1 and w_2 are central words, p = |w_1| + 2 and q = |w_2| + 2 are coprime periods of w, and min{p, q} is the minimal period of w.

Proof. In view of the results of [5, Lemma 4], it is sufficient to prove that any word w satisfying Eq. (2) is a central word. Indeed, in such a case, w has the two periods p = |w_1 ab| and q = |w_2 ba|, and |w| = p + q − 2. Moreover, gcd(p, q) = 1. In fact, suppose that gcd(p, q) = d ≥ 2. By the theorem of Fine and Wilf (see, e.g., [9]) the word w would have the period d. Thus, w_1 ab = z^{p/d} and w_2 ba = z^{q/d}, where z is the prefix of w of length d. We reach a contradiction since from the first equation the last letter of z has to be b, while from the second equation it has to be a. Since p and q are coprime, the word w is central. Finally, we observe that since w is a palindrome, w_1 and w_2 are palindromes and prefixes of a central word, so that they are central words.

The following corollary will be useful in the sequel.

Corollary 4. If w ∈ PER has the factor x^n with x ∈ A and n > 0, then x^{n−1} is a prefix (and suffix) of w.

Proof. We can assume, with no loss of generality, that x = a. If w is a power of a letter, the statement is trivially true. If, on the contrary, w is not a power of a letter, then by Proposition 3, w = w_1 ab w_2 = w_2 ba w_1 with w_1, w_2 ∈ PER. The word a^n is a factor of w_2 or of w_1 or a prefix of aw_1. In the first two cases, by induction on |w|, we can assume that w_2 or w_1 has the prefix a^{n−1}; in the third case, w_1 has the prefix a^{n−1}. Thus, in all cases, a^{n−1} is a prefix of w.

For any word w we denote by w^(−) the shortest palindrome having the suffix w. The word w^(−) is called the palindromic left-closure of w. For any set of words X, we set X^(−) = {w^(−) | w ∈ X}. The following lemmas were proved in [5].

Lemma 5. For any w ∈ PER, one has (aw)^(−), (bw)^(−) ∈ PER. More precisely, if w = w_1 ab w_2 = w_2 ba w_1, then

(aw)^(−) = w_2 ba w_1 ab w_2,  (bw)^(−) = w_1 ab w_2 ba w_1.

If w = x^n with {x, y} = A, then (xw)^(−) = x^{n+1} and (yw)^(−) = x^n y x^n.

Lemma 6. Let u, w ∈ PER and x ∈ A. If ux is a prefix of w, then also (xu)^(−) is a prefix of w.
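The characterization of central words by two coprime periods p, q with |w| = p + q − 2 lends itself to a direct brute-force test. The sketch below (helper names are ours) also checks two facts stated in the paper: central words are palindromes, and there are exactly φ(n + 2) central words of each length n (cf. Section 7).

```python
from math import gcd
from itertools import product

def is_central(w):
    # w is central iff it has periods p and q = len(w) - p + 2 with
    # gcd(p, q) = 1; a "period" larger than len(w) holds vacuously.
    n = len(w)
    def has_period(p):
        return all(w[i] == w[i + p] for i in range(n - p))
    return any(has_period(p) and has_period(n - p + 2) and gcd(p, n - p + 2) == 1
               for p in range(1, n + 2))

def central_words(n):
    # All central words of length n over {a, b}.
    return [w for w in map("".join, product("ab", repeat=n)) if is_central(w)]
```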
By Proposition 3 and Lemma 5 one easily derives that if u = (xw)^(−) with w ∈ PER and x ∈ A, then

|u| = π_u + |w|.   (3)

The following method to generate central words was introduced in [5]. By the preceding lemma, we can define the map

ψ : A∗ → PER

as follows: ψ(ε) = ε and, for all v ∈ A∗, x ∈ A,

ψ(vx) = (x ψ(v))^(−).

The map ψ : A∗ → PER is a bijection. The word v is called the generating word of ψ(v). One has that for all v, u ∈ A∗

ψ(vu) ∈ A∗ψ(v) ∩ ψ(v)A∗.   (4)
Example 7. Let w = abba. One has

ψ(a) = a,  ψ(ab) = aba,  ψ(abb) = ababa,  ψ(abba) = ababaababa.

As usual, one can extend ψ to the subsets of A∗ by setting, for all X ⊆ A∗, ψ(X) = {ψ(x) | x ∈ X}. In particular, one has ψ(aA∗) = PER_a and ψ(bA∗) = PER_b, where

PER_a = PER ∩ aA∗ and PER_b = PER ∩ bA∗.
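The palindromic left-closure and the map ψ can be implemented directly; the sketch below (names are ours) reproduces Example 7.

```python
def left_closure(w):
    # w^(-): the shortest palindrome having w as a suffix.
    # If u is the longest palindromic prefix of w = uv, then w^(-) = v~ w.
    for i in range(len(w), 0, -1):
        if w[:i] == w[:i][::-1]:
            return w[i:][::-1] + w
    return w  # w is empty

def psi(v):
    # psi(epsilon) = epsilon and psi(vx) = (x psi(v))^(-).
    w = ""
    for x in v:
        w = left_closure(x + w)
    return w
```

By Eq. (4), ψ(vu) must begin and end with ψ(v); the test below also checks this on the example.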
Let I be the set of all irreducible positive fractions. We consider the map θ : PER → I, called the ratio of periods, defined as follows: let w ∈ PER \ {ε}, p be the minimal period of w, and q = |w| + 2 − p. We set

θ(w) = p/q if w ∈ PER_a,  θ(w) = q/p if w ∈ PER_b.

Moreover, θ(ε) = 1/1.

As is well known [5], the map θ is a bijection. We recall that for all w ∈ PER, the numbers |w|_a + 1 and |w|_b + 1 are coprime. Moreover, the function η : PER → I defined, for any w ∈ PER, by

η(w) = (|w|_b + 1)/(|w|_a + 1)

is a bijection [2], called the slope. Since θ and η are both bijections, the value of each of them is determined by the other.
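Both maps can be computed directly from a central word. A minimal sketch (names are ours), returning fractions as (numerator, denominator) pairs:

```python
def min_period(w):
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[i] == w[i + p] for i in range(n - p)))

def theta(w):
    # Ratio of periods: p/q with p the minimal period and q = |w| - p + 2,
    # inverted when w starts with the letter b; theta(epsilon) = 1/1.
    if not w:
        return (1, 1)
    p = min_period(w)
    q = len(w) - p + 2
    return (p, q) if w[0] == "a" else (q, p)

def eta(w):
    # Slope: (|w|_b + 1) / (|w|_a + 1).
    return (w.count("b") + 1, w.count("a") + 1)
```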
We introduce in I the binary relation ⇒ defined as follows: for p/q, r/s ∈ I, one sets

p/q ⇒ r/s if p ≤ q, r ∈ {p, q}, s = p + q, or p ≥ q, s ∈ {p, q}, r = p + q.

One easily verifies that the graph of this relation is a complete binary tree with root 1/1. We denote by ⇒∗ the reflexive and transitive closure of ⇒. For instance, one has 1/2 ⇒ 2/3 ⇒ 2/5 ⇒ 5/7, so that 1/2 ⇒∗ 5/7.

From Lemma 5 one derives that for any w, w′ ∈ PER one has

θ(w) ⇒ θ(w′) if and only if w′ = (xw)^(−), with x ∈ A.   (5)

We say that a subset H of I is independent if for any pair of fractions p/q, r/s ∈ H such that p/q ⇒∗ r/s one has p/q = r/s. A subset H of I is full if for any fraction p/q ∈ I there exists a fraction r/s ∈ H such that p/q ⇒∗ r/s or r/s ⇒∗ p/q.

One introduces the Farey map Fa = θ ∘ ψ. Thus for any x ∈ A∗ one has Fa(x) = θ(ψ(x)) ∈ I. Since θ and ψ are bijections, also Fa is a bijection.

Lemma 8. Let x, x′ ∈ A∗. One has that Fa(x) ⇒∗ Fa(x′) if and only if x is a prefix of x′.

Proof. It is sufficient to prove that for any pair of words x, x′ ∈ A∗, one has Fa(x) ⇒ Fa(x′) if and only if x′ ∈ xA. We suppose that x ∈ aA∗ (the case where x ∈ bA∗ or x = ε can be dealt with similarly). We set Fa(x) = p/q. Therefore, by Eq. (5),

{Fa(xa), Fa(xb)} = {p/(p + q), q/(p + q)}.

Thus, p/q ⇒ Fa(x′) if and only if Fa(x′) ∈ {Fa(xa), Fa(xb)}. Since Fa is a bijection, this last condition is equivalent to x′ ∈ xA.

Corollary 9. A set X ⊆ A+ is a prefix code if and only if Fa(X) is an independent set.

Proof. Let x and x′ be two distinct elements of X. By the previous lemma, x is a proper prefix of x′ if and only if Fa(x) ⇒∗ Fa(x′). This implies that X is a prefix code if and only if Fa(X) is an independent set.

Corollary 10. A prefix code X ⊆ A∗ is maximal if and only if Fa(X) is a full set.

Proof. A prefix code X is maximal if and only if for any word w ∈ A∗ there exists a word x ∈ X such that either w is a prefix of x or x is a prefix of w. By Lemma 8 this occurs if and only if Fa(w) ⇒∗ Fa(x) or Fa(x) ⇒∗ Fa(w). This implies that X is a maximal prefix code if and only if Fa(X) is a full set.
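The derivation relation ⇒ is easy to explore programmatically. The sketch below (names are ours) computes the fractions derivable in one step and the successive levels of the tree rooted at 1/1.

```python
from fractions import Fraction

def children(f):
    # The fractions derivable from p/q in one step of the relation.
    p, q = f.numerator, f.denominator
    kids = set()
    if p <= q:
        kids |= {Fraction(p, p + q), Fraction(q, p + q)}
    if p >= q:
        kids |= {Fraction(p + q, p), Fraction(p + q, q)}
    return kids

def levels(depth):
    # Successive levels of the derivation tree rooted at 1/1.
    out = [{Fraction(1, 1)}]
    for _ in range(depth):
        out.append(set().union(*(children(f) for f in out[-1])))
    return out
```

Since gcd(p, q) = 1 implies gcd(p, p + q) = gcd(q, p + q) = 1, the Fraction values never get reduced, and the levels double in size as expected for a complete binary tree.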
4. Central codes

In this section we shall consider sets of central words which are codes. These codes, which are over a two-letter alphabet, will be called Sturmian central codes or, simply, central codes. For instance, the sets X_1 = {a, b}, X_2 = {b, aa, aba}, X_3 = {aa, aabaa, babbab}, and X_4 = {b^2} ∪ (ab)∗a are central codes.

Proposition 11. A central code is thin.

Proof. This is a consequence of the fact that the set PER is thin. Indeed, for instance, as is well known, the word aabb is not a factor of any Sturmian word.

A central code is maximal if it is not properly included in another central code. By using a classical argument based on Zorn's lemma, whose hypothesis is satisfied by the family of central codes, one easily derives that any central code is included in a maximal central code.

Proposition 12. A maximal central code is PER-complete.

Proof. Let X be a maximal central code. By contradiction, suppose that there exists a word f ∈ PER such that f ∉ Fact X∗. Let p be the minimal period of f and q = |f| − p + 2. If v is the generating word of f, by Eqs. (5) and (4) one derives that there exist letters x, y ∈ A such that g = ψ(vxy) ∈ PER has minimal period p + q and prefix f. Thus, f is a prefix of the root ζ of g, so that ζ ∉ Fact X∗. By Lemma 1, X ∪ {g} would be a code, which is central, contradicting the maximality of X as a central code.

Now, we shall prove (cf. Corollary 18) that the unique finite maximal central code is A. We need some preliminary technical lemmas.

Lemma 13. Let X be a central code and u ∈ A∗. The following statements hold:
(1) If baau ∈ X∗, then b ∈ X and aau ∈ X∗.
(2) If X ≠ A and aba^3 u ∈ X∗, then aba ∈ X and aau ∈ X∗.

Proof. If baau ∈ X∗, there exist v ∈ X∗ and x ∈ X such that

baau = xv.

By Corollary 4 no central word has the prefix baa, so that x is necessarily a proper prefix of baa. Hence, since X is a central code, x = b and v = aau ∈ X∗.

If aba^3 u ∈ X∗, there exist v ∈ X∗ and x ∈ X such that

aba^3 u = xv.

By Corollary 4 no central word has the prefix aba^3, so that x is necessarily a proper prefix of aba^3, i.e., x = aba or x = a. In the first case, v = aau ∈ X∗. In the second case, v = ba^3 u so that, by Statement 1, one has b ∈ X, i.e., X = A.
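Whether a finite set of words is a code can be decided with the classical Sardinas–Patterson algorithm (a standard tool, not used in the paper itself, but convenient for experiments). The sketch below checks the example sets X_1, X_2, X_3 given at the beginning of this section.

```python
def is_code(X):
    # Sardinas-Patterson test: X is a code iff the empty word never
    # appears in the sequence of quotient sets.
    X = set(X)
    def quotients(A, B):
        # A^{-1} B = {w : a w = b for some a in A, b in B}
        return {b[len(a):] for a in A for b in B if b.startswith(a)}
    S = quotients(X, X) - {""}
    seen = set()
    while S and frozenset(S) not in seen:
        if "" in S:
            return False
        seen.add(frozenset(S))
        S = quotients(X, S) | quotients(S, X)
    return "" not in S
```

For instance, {a, ab, bab} is not a code, since a · bab = ab · ab.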
Lemma 14. Let X be a finite PER-complete central code. Then a ∈ X or b ∈ X.

Proof. Consider the word w = (aab)^n aaa (baa)^n with 3n ≥ ℓ(X). As one easily verifies, w = ψ(a^2 b^n a), so that w ∈ PER. Since X is PER-complete, there exist words λ, μ ∈ A∗ such that λwμ ∈ X∗. We have to distinguish three cases:
(1) λ(aab)^n a, aa(baa)^n μ ∈ X∗,
(2) λ(aab)^n aa, a(baa)^n μ ∈ X∗,
(3) λ(aab)^n = δu, (baa)^n μ = vγ with x = uaaav ∈ X, δ, γ ∈ X∗, u, v ∈ A∗.

Let us consider Case 1. If a ∈ X, then the statement is true. Thus suppose a ∉ X. Since ℓ(X) ≤ 3n, one derives that the first factor in the X-factorization of aa(baa)^n μ, which has to be a palindrome, has the form aa(baa)^i with 0 ≤ i < n. This implies that (baa)^{n−i} μ ∈ X∗. By Lemma 13 one derives b ∈ X. Case 2 can be dealt with symmetrically.

Now let us consider Case 3. As x ∈ PER has the factor aaa, by Corollary 4 it must have the suffix aa. Since (baa)^n μ = vγ and |v| < ℓ(X) ≤ 3n, one derives v = (baa)^i with 0 ≤ i < n. This implies that γ = (baa)^{n−i} μ ∈ X∗. By Lemma 13 one obtains again b ∈ X.

Lemma 15. Let X be a finite PER-complete central code. Then one has b ∈ X or aba ∈ X. Symmetrically, one has a ∈ X or bab ∈ X.

Proof. Consider the word w = (aaab)^n aaa with 4(n − 1) ≥ ℓ(X). As one easily verifies, w = ψ(a^3 b^n), so that w ∈ PER. Since X is PER-complete, there exist words λ, μ ∈ A∗ such that λwμ ∈ X∗. Since ℓ(X) ≤ 4(n − 1), one has

λ(aaab)^i a^p, a^q (baaa)^j μ ∈ X∗

with i, j ≥ 1, i + j = n, p, q ≥ 0, and p + q = 3. We distinguish three cases, according to the values of q.

Case q = 0. As (baaa)^j μ ∈ X∗, by Lemma 13 it follows that b ∈ X.

Case q = 1. If X = A, then trivially, b ∈ X. If, on the contrary, X ≠ A, since a(baaa)^j μ ∈ X∗, by Lemma 13 one derives aba ∈ X.

Case q > 1. Since p = 3 − q ≤ 1 and a^p (baaa)^i λ∼ ∈ X∗, one reaches the result by a similar argument.

Lemma 16. Let X be a finite PER-complete central code. Then there exist h, k ≥ 0 such that (ab)^h a, (ba)^k b ∈ X.
Proof. Consider the word w = (ab)^n a with n such that |w| = 2n + 1 ≥ 3ℓ(X). As one easily verifies, w = ψ(ab^n), so that w ∈ PER. Since X is PER-complete, there exist words λ, μ ∈ A∗ such that λwμ ∈ X∗. Since |w| ≥ 3ℓ(X), one derives that w has a factor in X^2, i.e.,

w = αxyβ, with x, y ∈ X and α, β ∈ A∗.

We shall suppose |α| even (the opposite case is similarly dealt with). One has

α = (ab)^i,  xyβ = (ab)^{n−i} a,  0 ≤ i < n.

As x is a palindrome, one obtains

x = (ab)^h a,  yβ = (ba)^{n−i−h},  0 ≤ h < n − i

and, similarly,

y = (ba)^k b,  β = a(ba)^{n−i−h−k−1},  0 ≤ k < n − i − h,

which concludes the proof.
Proposition 17. Let X be a finite PER-complete central code. Then X = A.

Proof. If a, b ∈ X, then X = A and the statement holds true. Let us then suppose that b ∉ X. By Lemma 14, one has a ∈ X and, by Lemma 15, aba ∈ X. Moreover, by Lemma 16, there exists k > 0 such that (ba)^k b ∈ X. This yields a contradiction as the word (ab)^{k+2} a has two distinct X-factorizations, namely,

(a)((ba)^k b)(aba) = (aba)((ba)^k b)(a).
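The double factorization exhibited at the end of the proof can be verified mechanically for small k (the function name is ours):

```python
def two_factorizations(k):
    # (a)((ba)^k b)(aba) and (aba)((ba)^k b)(a) from the proof of
    # Proposition 17; both must spell the word (ab)^(k+2) a.
    middle = "ba" * k + "b"
    return "a" + middle + "aba", "aba" + middle + "a"

u, v = two_factorizations(3)
```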
By the previous proposition and Proposition 12 it follows:

Corollary 18. Let X be a finite maximal central code. Then X = A.

The following proposition gives an example of an infinite maximal central code. The proof, which is rather technical, is reported in the appendix.

Proposition 19. The set X = PER \ D, where

D = ∪_{i≥0} (((ab)^i a)∗ ∪ ((ba)^i b)∗),

is a maximal central code.

Proposition 20. There exists a PER-complete central code which is not a maximal central code.
Proof. Let X = PER \ D be the maximal central code considered in Proposition 19 and set

Y = X \ {aabaa}.

Since the word aabaa is a factor of aabaabaa ∈ Y, one has Fact X = Fact Y. Let us prove that for any z ∈ D one has z ∈ Fact X. We can assume with no loss of generality that z = ((ab)^i a)^j, with i, j ≥ 0. Moreover, we can suppose j ≥ 2, since (ab)^i a is a factor of ((ab)^i a)^2. As one easily verifies,

(bz)^(−) = ((ab)^i a)^j ba ((ab)^i a)^{j−1} ∈ PER \ D = X,

so that z ∈ Fact X. Thus D ⊆ Fact X. Since PER = X ∪ D, it follows PER ⊆ Fact X = Fact Y. Therefore, in view of the previous proposition, Y is a PER-complete code which is not a maximal central code.

Lemma 21. The pairs (b^2, a^2) and (a^2, b^2) are synchronizing pairs of any central code X, i.e., for all u, v ∈ A∗,

ub^2 a^2 v ∈ X∗ ⇒ ub^2, a^2 v ∈ X∗,  ua^2 b^2 v ∈ X∗ ⇒ ua^2, b^2 v ∈ X∗.

Proof. Since b^2 a^2 is not a factor of PER, if ub^2 a^2 v ∈ X∗, then one of the following three cases occurs:

ub, ba^2 v ∈ X∗,   (6)
ub^2, a^2 v ∈ X∗,   (7)
ub^2 a, av ∈ X∗.   (8)

If Eq. (6) holds, then by Lemma 13 one has b ∈ X and a^2 v ∈ X∗, so that Eq. (7) is satisfied. If Eq. (8) holds, one obtains ab^2 u∼ ∈ X∗ (recall that X∗ is closed under reversal, the words of X being palindromes), so that by Lemma 13, with the letters a and b interchanged, one obtains a ∈ X and b^2 u∼ ∈ X∗. Hence, ub^2 ∈ X∗, so that Eq. (7) is satisfied again. This proves that (b^2, a^2) is a synchronizing pair. In a symmetric way one proves that also (a^2, b^2) is a synchronizing pair.

Proposition 22. A central code X ≠ A is not complete.

Proof. Let X be a complete central code. We consider the word a^2 b^2 a^3 b^3 a^2 b^2. There exist u, v ∈ A∗ such that

u a^2 b^2 a^3 b^3 a^2 b^2 v ∈ X∗.

By the preceding lemma, one derives b^2, b^3, a^2, a^3 ∈ X∗. Since X is a code, it follows a, b ∈ X∗, i.e., X = A.

As any maximal code is complete, by the previous proposition one derives that a central code X ≠ A is not maximal as a code.
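The fact underlying Lemma 21, namely that b^2 a^2 (and, symmetrically, a^2 b^2) is not a factor of any central word, can be confirmed by exhaustive search over small lengths. In the sketch below (names are ours), is_central implements the coprime-periods characterization of Section 3.

```python
from math import gcd
from itertools import product

def is_central(w):
    # Coprime-periods characterization of central words.
    n = len(w)
    def has_period(p):
        return all(w[i] == w[i + p] for i in range(n - p))
    return any(has_period(p) and has_period(n - p + 2) and gcd(p, n - p + 2) == 1
               for p in range(1, n + 2))

# All central words of length at most 12.
small_central = [w for n in range(13)
                 for w in map("".join, product("ab", repeat=n)) if is_central(w)]
```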
5. Prefix central codes

In this section, we shall consider central codes which are prefix codes. Since the words of such codes are palindromes, one has that a prefix central code is also a suffix code and then a biprefix code. For instance, the set X = {a, bab, bb} is a prefix central code.

Proposition 23. A central code Y is prefix if and only if Y = ψ(X), with X a prefix code.

Proof. Let Y = ψ(X). If X is a prefix code, then, as proved in [5], Y is a prefix code. Conversely, suppose that X is not a prefix code. Then there exist x_1, x_2 ∈ X and ζ ∈ A+ such that x_1 = x_2 ζ. By Eq. (4), ψ(x_1) = ψ(x_2 ζ) = ψ(x_2)λ for a suitable λ ∈ A+. Hence, Y is not a prefix code.

We call the pre-code of a prefix central code Y the prefix code X such that Y = ψ(X). For instance, the pre-code of {a, bab, bb} is the prefix code {a, ba, bb}, and the pre-code of the prefix central code {aba, bb, babab, babbab} is the prefix code {ab, bb, baa, bab}. The pre-code of the prefix central code {a^n b a^n | n ≥ 0} is the prefix code a∗b.

For any X ⊆ A∗ and all n > 1 we set ψ^n(X) = ψ(ψ^{n−1}(X)), where ψ^1(X) = ψ(X). From Proposition 23 one derives the following:

Corollary 24. If X is a prefix code, then for all n ≥ 1, ψ^n(X) is a prefix central code.

Proposition 23 shows that the property of being a prefix code is preserved by ψ and ψ^{−1}. On the contrary, the property of being a code is not, in general, preserved by ψ or ψ^{−1}, as shown by the following example.

Example 25. The set X = {ab, ba, abbb} is a code whereas the set ψ(X) = {aba, bab, abababa} is not a code. Conversely, the set X = {a, ab, bab} is not a code whereas ψ(X) = {a, aba, babbab} is a code.

Proposition 26. A central code Y is prefix if and only if θ(Y) is an independent set.

Proof. By Proposition 23, Y is prefix if and only if Y = ψ(X), with X a prefix code. By Corollary 9, this occurs if and only if Fa(X) = θ(Y) is an independent set.

A prefix central code is a maximal prefix central code if it is not properly included in another prefix central code.

Proposition 27. A prefix central code X is a maximal prefix central code if and only if for all w ∈ PER, wA∗ ∩ XA∗ ≠ ∅.

Proof. If there exists w ∈ PER such that wA∗ ∩ XA∗ = ∅, then X ∪ {w} is a prefix central code properly containing X, so that X is not a maximal prefix central code.
If X is not a maximal prefix central code, there exists at least one word w ∈ PER such that w is not a prefix of any word of X and no word of X is a prefix of w. This implies that wA∗ ∩ XA∗ = ∅.

Proposition 28. A prefix central code is a maximal prefix central code if and only if its pre-code is a maximal prefix code.

Proof. Let Y be a maximal prefix central code and X be its pre-code. By Proposition 23, X is a prefix code. Suppose that X is properly included in a prefix code X′ over A. Since ψ is a bijection, Y = ψ(X) ⊂ ψ(X′). By Proposition 23, ψ(X′) is a prefix central code which properly contains Y, which contradicts the maximality of Y as a prefix central code.

Conversely, suppose that the pre-code X of the prefix central code Y is a maximal prefix code. If Y were properly included in another prefix central code Y′, one would have X ⊂ ψ^{−1}(Y′). By Proposition 23, ψ^{−1}(Y′) is a prefix code, so that we reach a contradiction with the maximality of X.

Proposition 29. A central code Y is a maximal prefix central code if and only if θ(Y) is an independent and full set.

Proof. By Propositions 23 and 28, Y is a maximal prefix central code if and only if Y = ψ(X), with X a maximal prefix code. By Corollaries 9 and 10, this occurs if and only if Fa(X) = θ(Y) is an independent and full set.

Remark 30. We observe that a maximal prefix central code X ≠ A is not maximal as a prefix code. Indeed, as is well known, any maximal prefix code is right-complete, i.e., for any w ∈ A∗, wA∗ ∩ X∗ ≠ ∅, whereas by Proposition 22 a prefix central code X ≠ A is not even complete. By Corollary 18, a finite maximal prefix central code X ≠ A cannot be maximal as a central code. More generally, we shall see (cf. Corollary 32) that any non-trivial maximal central code cannot be prefix.

Proposition 31. Let X ≠ A be a prefix central code. There exists w ∈ PER such that wA∗ ∩ X∗ = ∅.

Proof. Let x ∈ X. Without loss of generality, we may suppose that the first letter of x is a. There exists a word u ∈ A∗ such that y = xbaau ∈ PER. Indeed, by Eq. (1), xba is a finite standard Sturmian word, so that z = xbaxba is a prefix of a standard Sturmian word; since xbaa is a prefix of z, it is a prefix of a word of PER. If yA∗ ∩ X∗ = ∅, the statement is proved. Let us then suppose that yA∗ ∩ X∗ ≠ ∅. Thus there exists v ∈ A∗ such that yv = xbaauv ∈ X∗. Since X is a prefix code, one has baauv ∈ X∗ and, by Lemma 13, b ∈ X.

Now, let us consider the word bbabb = ψ(bba) ∈ PER. If bbabbA∗ ∩ X∗ = ∅, the statement is proved. Suppose that bbabbA∗ ∩ X∗ ≠ ∅. Since b ∈ X and X is a prefix code, it follows that abbA∗ ∩ X∗ ≠ ∅. By Lemma 13 one obtains a ∈ X, i.e., X = A, which is a contradiction.

By Lemma 2 and Proposition 31 one derives the following:
Corollary 32. A prefix central code X ≠ A is not a maximal central code.

6. Farey codes

For any positive integer n, we consider the set

F_n = {p/q ∈ I | 1 ≤ p ≤ q ≤ n}.

As is well known, by ordering the elements of F_n in an increasing way, one obtains the Farey series of order n (cf. [7]). Now, set

G_n = {p/q ∈ F_{n+1} | p + q − 2 ≥ n}

and

Φ_{n,a} = {s ∈ PER_a | θ(s) ∈ G_n},  Φ_{n,b} = {s ∈ PER_b | θ(s)^{−1} ∈ G_n}.

The set Φ_n = Φ_{n,a} ∪ Φ_{n,b} is a prefix central code [5] called the Farey code of order n. The words of Φ_{n,b} are obtained from those of Φ_{n,a} by interchanging the letter a with b. The pre-codes of Φ_{n,a}, Φ_{n,b}, and Φ_n will be respectively denoted by F_{n,a}, F_{n,b}, and F_n. The prefix code F_n = F_{n,a} ∪ F_{n,b} will be called the Farey pre-code of order n.

Example 33. In the following table, we report the elements of G_6 with the corresponding words of the prefix code Φ_{6,a} and their lengths. In the last column are reported the elements of the pre-code F_{6,a}.

θ(w)    word of Φ_{6,a}    length    word of F_{6,a}
1/7     aaaaaa              6        aaaaaa
2/7     abababa             7        abbb
3/7     aabaabaa            8        aabb
4/7     aabaaabaa           9        aaba
3/5     abaaba              6        aba
5/7     ababaababa         10        abba
4/5     aaabaaa             7        aaab
5/6     aaaabaaaa           9        aaaab
6/7     aaaaabaaaaa        11        aaaaab
Some interesting properties of Farey codes have been proved in [5]. We limit ourselves to recall that for all n > 0,

Card Φ_n = Σ_{i=1}^{n+1} φ(i),

where φ is Euler's totient function.
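The Farey code Φ_n and its cardinality can be computed by brute force. In the sketch below (names are ours), we use the description from the introduction: w ∈ Φ_n iff w is central, |w| ≥ n, and |w| − π_w + 2 ≤ n + 1, which bounds |w| by 2n.

```python
from math import gcd
from itertools import product

def is_central(w):
    # Coprime-periods characterization of central words.
    n = len(w)
    def has_period(p):
        return all(w[i] == w[i + p] for i in range(n - p))
    return any(has_period(p) and has_period(n - p + 2) and gcd(p, n - p + 2) == 1
               for p in range(1, n + 2))

def min_period(w):
    return next(p for p in range(1, len(w) + 1)
                if all(w[i] == w[i + p] for i in range(len(w) - p)))

def farey_code(n):
    # Central words w with n <= |w| and |w| - min_period(w) + 2 <= n + 1.
    return [w for ln in range(n, 2 * n + 1)
            for w in map("".join, product("ab", repeat=ln))
            if is_central(w) and len(w) - min_period(w) + 2 <= n + 1]

def totient(m):
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)
```

For n = 6 this reproduces the words of Example 33 (together with their images under the exchange of a and b), and the cardinality formula above gives 18.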
Proposition 34. For all n 1, the Farey code of order n is a maximal prefix central code. Proof. We shall prove that the set (n ) is independent and full, so that the result will follow from Proposition 29. One has that (n ) = {p/q | p/q ∈ Gn or q/p ∈ Gn }. First, we prove the independence. Let p/q and r/s be distinct elements of (n ) such that p/q ⇒∗ r/s. We suppose p < q (the case where p > q is similarly dealt with). There exists a sequence of irreducible fractions pi /qi , i = 1, . . . , m, such that pm r p1 p ⇒ ··· ⇒ = . ⇒ q q1 qm s Hence, q1 = p + q n + 2, so that s = qm q1 n + 2 and r < s. This contradicts the assumption that r/s ∈ (n ). Now, we prove the fullness of (n ). Let r/s be an element of I. We suppose r < s (the cases where r > s or r = s = 1 are similarly dealt with). First we consider the case that s n + 2. There exists a sequence of irreducible fractions pi /qi , i = 1, . . . , m, such that pm r p1 1 ⇒ ··· ⇒ = . ⇒ 1 q1 qm s Let k be the minimal integer such that qk n + 2. One has qk−1 n + 1 and pk−1 + qk−1 = qk n + 2, so that pk−1 /qk−1 ∈ Gn and pk−1 /qk−1 ⇒∗ r/s. Now, we consider the case that s < n + 2. Let k be the minimal integer such that kr + s n + 2. One has (k − 1)r + s n + 1 so that r/((k − 1)r + s) ∈ Gn and r/s ⇒∗ r/((k − 1)r + s). As a consequence of Proposition 23 one has: Proposition 35. For all n 1, the Farey pre-code of order n is a maximal prefix code. The following proposition gives an equivalent definition for Farey codes. Proposition 36. For any n > 0 one has
Φn = {w ∈ PER | n ≤ |w| ≤ n + π(w) − 1}.

Proof. First we suppose w ∈ PERa and set η(w) = p/q, so that p = π(w) and q = |w| − π(w) + 2. One has w ∈ Φn,a if and only if p/q belongs to the Farey series Fn+1 and p + q − 2 = |w| ≥ n. Since p/q ∈ Fn+1 if and only if q = |w| − π(w) + 2 ≤ n + 1, one derives that w ∈ Φn,a if and only if n ≤ |w| ≤ n + π(w) − 1. If w ∈ PERb, by a similar argument one obtains that w ∈ Φn,b if and only if n ≤ |w| ≤ n + π(w) − 1. From this the assertion follows. □

From Proposition 36 one derives immediately that for all n > 0,

Φn+1 \ Φn = {w ∈ PER | |w| = n + π(w)}

(9)
and
Φn \ Φn+1 = Un,
(10)
where Un = PER ∩ A^n. The following proposition shows a relation between Farey codes of consecutive orders.

Proposition 37. For any n > 0 one has

Φn+1 = (Φn \ Un) ∪ (AUn)^(−).

Proof. From Eqs. (9) and (10) one derives
Φn+1 = (Φn \ Un) ∪ {w ∈ PER | |w| = n + π(w)}.

Thus it is sufficient to prove that

(AUn)^(−) = {w ∈ PER | |w| = n + π(w)}.
(11)
Let us suppose w = (xv)^(−), with x ∈ A and v ∈ Un. Then w ∈ PER and, by Eq. (3), |w| = π(w) + n. This proves the inclusion “⊆”. Conversely, suppose that w ∈ PER and |w| = n + π(w). Let u ∈ PER and x ∈ A be such that w = (xu)^(−). Since, by Eq. (3), |w| = |u| + π(w), one derives |u| = n, so that u ∈ Un and w ∈ (AUn)^(−). This proves the inclusion “⊇”. □

Example 38. Consider the case n = 5. One has
Φ5,a = {a^5, ababa, aba^2ba, a^2ba^2, a^3ba^3, a^4ba^4}

and

U5,a = U5 ∩ aA* = {a^5, ababa, a^2ba^2}.

Moreover,

(AU5,a)^(−) = {a^6, a^5ba^5, ababa^2baba, abababa, a^2ba^3ba^2, a^2ba^2ba^2}.

The set Φ6,a is given in Example 33. As one easily verifies, Φ6,a = (Φ5,a \ U5,a) ∪ (AU5,a)^(−). In a similar way, setting U5,b = U5 ∩ bA* one obtains Φ6,b = (Φ5,b \ U5,b) ∪ (AU5,b)^(−), so that Φ6 = (Φ5 \ U5) ∪ (AU5)^(−).

7. Uniform central codes

Let n be a positive integer. A central code X is uniform of order n if X ⊆ A^n. In this case X ⊆ Un, so that Un is the maximal uniform central code of order n. As is well known [6], for any n, Card Un = φ(n + 2).
For instance, one has

U5 = {aaaaa, aabaa, ababa, babab, bbabb, bbbbb},
U7 = {aaaaaaa, aaabaaa, abababa, bababab, bbbabbb, bbbbbbb}.

From Eqs. (9) and (11) one derives the following noteworthy relation between maximal uniform codes and Farey codes:

Φn+1 \ Φn = (AUn)^(−)  for all n > 0.
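This relation, and the recursion of Proposition 37 it comes from, can be checked mechanically on Example 38. The sketch below (illustrative only, not from the paper) implements the operator (·)^(−) as the left palindromic closure, i.e., the shortest palindrome having its argument as a suffix, and rebuilds Φ6,a from the word lists of Example 38.

```python
def left_closure(w):
    # (w)^(-): the shortest palindrome having w as a suffix
    for i in range(len(w), -1, -1):
        if w[:i] == w[:i][::-1]:            # longest palindromic prefix w[:i]
            return w[i:][::-1] + w
    return w

phi5a = {"aaaaa", "ababa", "abaaba", "aabaa", "aaabaaa", "aaaabaaaa"}
u5a = {"aaaaa", "ababa", "aabaa"}           # U_5 restricted to words starting with a

au5a = {left_closure(x + v) for x in "ab" for v in u5a}   # (A U_{5,a})^(-)

phi6a = (phi5a - u5a) | au5a                # Proposition 37, a-part
assert phi6a == {"aaaaaa", "aaaaabaaaaa", "ababaababa", "abababa",
                 "aabaaabaa", "aabaabaa", "abaaba", "aaabaaa", "aaaabaaaa"}
assert phi6a - phi5a == au5a                # Phi_{n+1} \ Phi_n = (A U_n)^(-)
```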
Proposition 39. Let n > 0 and 0 ≤ k ≤ n. There exists a (unique) word w ∈ Un such that |w|a = k if and only if gcd(n + 2, k + 1) = 1.

Proof. We recall that the slope of a central word is a bijection of PER onto I. Thus, if w ∈ Un and |w|a = k, then the slope of w is

(|w|b + 1)/(|w|a + 1) = (n − k + 1)/(k + 1)

(12)

with gcd(n − k + 1, k + 1) = gcd(n + 2, k + 1) = 1. Conversely, if gcd(n − k + 1, k + 1) = gcd(n + 2, k + 1) = 1 then, since the slope is a bijection, there exists a word w ∈ PER satisfying Eq. (12), so that |w| = n and |w|a = k. □

From the previous proposition, one derives the following:

Corollary 40. There exists a (unique) word w ∈ Un such that |w|a = k for all k, 0 ≤ k ≤ n, if and only if n + 2 is a prime.

Example 41. In the case n = 7, the set of numbers which are coprime with 9 is {1, 2, 4, 5, 7, 8}. Hence, for w ∈ U7 we have |w|a ∈ {0, 1, 3, 4, 6, 7}. In the case n = 5, since n + 2 = 7 is prime, {|w|a | w ∈ U5} = {0, 1, 2, 3, 4, 5}.

Appendix
Proof of Proposition 19. One easily verifies that

D = ψ(ab*a* ∪ ba*b* ∪ ε).
(A.1)
From the rational identity

A* \ (ab*a* ∪ ba*b* ∪ ε) = ab*a+bA* ∪ ba*b+aA*

one derives X = ψ(ab*a+bA* ∪ ba*b+aA*). Let us prove that X is a code. By contradiction, suppose that one has

x1 · · · xm = x′1 · · · x′n,  |x1| < |x′1|,

(A.2)

with x1, . . . , xm, x′1, . . . , x′n ∈ X, m, n > 0. One has m ≥ 2. Moreover, we may suppose without loss of generality that x2 ∈ PERa. Thus, x2 has a prefix

ψ(ab^i a^j b) = (a(ba)^i)^{j+1} ba (a(ba)^i)^j,  i ≥ 0, j ≥ 1.
From Eq. (A.2) one derives

x1 (a(ba)^i)^{j+1} baa δ = x′1 · · · x′n,

(A.3)

for a suitable δ ∈ A*. Hence, x′1 has the prefix x1 a. By Lemma 6, x′1 has the prefix (ax1)^(−). By Lemma 5, (ax1)^(−) has the form x1 ab s = s ba x1, with s ∈ PER. Now let y be the longest prefix of x′1 of the form

y = x1 ab z = z ba x1,  with z ∈ PER.

(A.4)

By Proposition 3, y ∈ PER. We set x′1 = yζ with ζ ∈ A*. By Eq. (A.3) one gets i > 0 and

a(ba)^{i−1} (a(ba)^i)^j baa δ = z ζ x′2 · · · x′n.

(A.5)

Since z is a palindrome, one has to consider the following cases:

Case 1: z = ε. By Eq. (A.4) one has y = x1 ab = ba x1, so that x1 = (ba)^p b, p ≥ 0. Thus x1 ∈ D, which is a contradiction.

Case 2: z = a(ba)^h, 0 ≤ h ≤ i − 1. By Eq. (A.4) one has y = x1 a(ba)^{h+1} = a(ba)^{h+1} x1, so that x1 = (a(ba)^{h+1})^p, p ≥ 0. Thus x1 ∈ D, which is a contradiction.

Case 3: z = a(ba)^{i−1} (a(ba)^i)^k a(ba)^{i−1}, 0 ≤ k ≤ j − 1. By Eq. (A.5) one gets

ba (a(ba)^i)^{j−k−1} baa δ = ζ x′2 · · · x′n.

(A.6)
If ζ ≠ ε, then the first letter of ζ is b. Thus, x′1 = yζ has the prefix yb and consequently the prefix (by)^(−). By Lemma 5 it follows that

(by)^(−) = (b x1 ab z)^(−) = x1 ab z ba x1 = x1 ab y = y ba x1.

This contradicts the maximality of y. If ζ = ε, by Eq. (A.6) one derives that x′2 has the prefix baa or babaa (according to whether k < j − 1 or k = j − 1). This is a contradiction, as by Corollary 4 no central word has such prefixes.

Case 4: z has the prefix a(ba)^{i−1} (a(ba)^i)^j b. Set u = a(ba)^{i−1} (a(ba)^i)^{j−1} a(ba)^{i−1} ∈ PER. Since ub is a prefix of z, also (bu)^(−) should be a prefix of z. By Lemma 5 one has (bu)^(−) = a(ba)^{i−1} (a(ba)^i)^j a(ba)^{i−1}. Thus, z has the prefix a(ba)^{i−1} (a(ba)^i)^j a, which is a contradiction.

This proves that X is a central code. To prove that X is a maximal central code one has to show that, for all y ∈ D, X ∪ {y} is not a central code. In view of Eq. (A.1) it is sufficient to consider the case that y = ψ(ab^i a^j), with i, j ≥ 0 (the case y = ψ(ba^i b^j) is dealt with similarly). One easily checks that in this case y ψ(ab^i ab) y = ψ(ab^i a^{j+2} b), which proves the assertion, since ψ(ab^i ab), ψ(ab^i a^{j+2} b) ∈ X. □

References

[1] J.-P. Allouche, J. Shallit, Automatic Sequences, Cambridge University Press, Cambridge, UK, 2003.
[2] J. Berstel, A. de Luca, Sturmian words, Lyndon words and trees, Theoret. Comput. Sci. 178 (1997) 171–203.
[3] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[4] J. Berstel, P. Séébold, Sturmian words, in: M. Lothaire (Ed.), Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, UK, 2002, pp. 45–110.
[5] A. de Luca, Sturmian words: structure, combinatorics, and their arithmetics, Theoret. Comput. Sci. 183 (1997) 45–82.
[6] A. de Luca, F. Mignosi, On some combinatorial properties of Sturmian words, Theoret. Comput. Sci. 136 (1994) 361–385.
[7] G.H. Hardy, E.M. Wright, An Introduction to the Theory of Numbers, Clarendon Press, Oxford University Press, Oxford, UK, 1968.
[8] L. Ilie, W. Plandowski, Two-variable word equations, Theoret. Inform. Appl. 34 (2000) 467–501.
[9] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983; second ed., Cambridge University Press, Cambridge, UK, 1997.
[10] A. Restivo, Codes and local constraints, Theoret. Comput. Sci. 72 (1990) 55–64.
Theoretical Computer Science 340 (2005) 240 – 256 www.elsevier.com/locate/tcs
An enhanced property of factorizing codes

Clelia De Felice 1

Dipartimento di Informatica e Applicazioni, Università di Salerno, 84081 Baronissi (SA), Italy
Abstract

The investigation of the factorizing codes C, i.e., codes satisfying Schützenberger's factorization conjecture, has been carried out from different viewpoints, one of them being the description of structural properties of the words in C. In this framework, we can now improve an already published result. More precisely, given a factorizing code C over a two-letter alphabet A = {a, b}, it was proved by De Felice that the words in the set C1 = C ∩ a*ba* could be arranged over a matrix related to special factorizations of the cyclic groups. We now prove that, in addition, these matrices can be recursively constructed starting with those corresponding to prefix/suffix codes.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Variable length codes; Formal languages; Factorizations of cyclic groups
1. Introduction In this paper, a subset C of a free monoid A∗ is a (variable-length) code if each word in A∗ has at most one factorization into words of C, i.e., C is the base of a free submonoid of A∗ [1]. This algebraic approach was initiated by Schützenberger in [24] and subsequently developed mainly by his school. The theory of codes is rich in significant results, which have been obtained by using several different methods (combinatorial, probabilistic, algebraic) and tools from automata, formal power series and semigroup theory.
E-mail address: [email protected].
1 Partially supported by MIUR Project “Linguaggi Formali e Automi: Metodi, Modelli e Applicazioni” (2003) and by 60% Project “Linguaggi formali e codici: modelli e caratterizzazioni strutturali” (University of Salerno, 2004).
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.tcs.2005.03.022
C. De Felice / Theoretical Computer Science 340 (2005) 240 – 256
Nevertheless some basic problems are still open. One of the most difficult of these, the factorization conjecture, was proposed by Schützenberger as follows: given a finite maximal code C, there would be finite subsets P, S of A* such that C − 1 = P(A − 1)S, with X denoting the characteristic polynomial of X [1,2,5]. We refer to Section 2 for all the known results concerning this conjecture. Any code C which satisfies the above equality is finite, maximal, and is called a factorizing code, where a finite maximal code is a maximal object in the class of finite codes for the order of set inclusion. For example, finite biprefix maximal codes are factorizing [1]. This note deals with the investigation of the class of the factorizing codes C. This research line has been carried out from different viewpoints, one of them being the description of structural properties of the words in C. Continuing the investigation initiated in [9], here we enhance a property of the sets C1 ⊆ a*ba* such that C1 = C ∩ a*ba* for a factorizing code C over a two-letter alphabet A = {a, b}. Precisely, we already know that C1 satisfies the property reported below:

Property 1.1. The words in C1 can be arranged over a matrix C1 = (a^{rp,q} b a^{vp,q})_{1≤p≤m, 1≤q≤ℓ} such that, for each row Rp = {rp,q | q ∈ {1, . . . , ℓ}} and each column Tq = {vp,q | p ∈ {1, . . . , m}} in this matrix, (Rp, Tq) is a Hajós factorization of Zn.

We recall that a pair (R, T) of subsets of ℕ is a factorization of Zn if for each z ∈ {0, . . . , n − 1} there exists a unique pair (r, t), with r ∈ R and t ∈ T, such that r + t = z (mod n). The general structure of the pairs (R, T) is still unknown, but two simple families of these pairs can be recursively constructed: Krasner factorizations and Hajós factorizations (see Section 4 for precise definitions). The latter factorizations seem to have an important role in the description of the structure of factorizing codes (see [6–9]).
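The notion of factorization of Zn just recalled is easy to state as a check. A minimal sketch (not part of the paper; the positive example is the Krasner factorization of Z12 used again in Section 5):

```python
def is_factorization(R, T, n):
    # (R, T) factorizes Z_n iff every residue 0..n-1 occurs exactly once
    # among the sums r + t (mod n), with r in R and t in T
    sums = sorted((r + t) % n for r in R for t in T)
    return sums == list(range(n))

assert is_factorization({0, 2, 4}, {0, 1, 6, 7}, 12)
# a non-example: the residue 1 occurs twice and 3 never occurs
assert not is_factorization({0, 1}, {0, 1}, 4)
```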
In this paper we prove that, for each factorizing code C, an arrangement of C1 = C ∩ a*ba* satisfying Property 1.1 can be recursively constructed by a natural two-dimensional generalization of the Hajós method. This improved version of the result given in [9] is interesting in its own right, but it has additional appeal since, as conjectured in [12], given a set C1 satisfying this property, there would exist a factorizing code C such that C1 = C ∩ a*ba*. As we have already said, we take into account codes over a two-letter alphabet but, as done in [9], extending the results presented here to alphabets of larger size should not be difficult. This paper is organized as follows. Section 2 contains all the basic definitions and results concerning codes. Section 3 summarizes the contents of the subsequent sections and outlines the main result. In Section 4 we have gathered basics on the factorizations of cyclic groups, and in Sections 5 and 6 we have collected intermediate results, subsequently used in Section 7 to show the above-mentioned property of the factorizing codes.
2. Basics Given a finite alphabet A, let A∗ be the free monoid generated by it. We denote by 1 the empty word and we set A+ = A∗ \ 1.
A subset C of A* is a code if C* is a free submonoid of A* of base C. In other words, C is a code if, for any c1, . . . , ch, c′1, . . . , c′k ∈ C, we have:

c1 · · · ch = c′1 · · · c′k  ⇒  h = k and ci = c′i for all i ∈ {1, . . . , h}.
Examples of codes can easily be constructed by considering, for instance, the class of the prefix codes, C being prefix if C ∩ CA+ = ∅. A more complex class is that of maximal codes. A code C is maximal over A if C is not a proper subset of another code over A. As one of Schützenberger's basic theorems shows, a finite code C is maximal if and only if C is complete, that is, C* ∩ A*wA* ≠ ∅ for any w ∈ A* [1]. The class of codes which we consider in this paper is that of the factorizing codes, introduced by Schützenberger. The definition of such codes is given in terms of polynomials. Here, we denote by ℤ⟨A⟩ the ring of the noncommutative polynomials in variables A and coefficients in the ring ℤ of the integers, and by ℕ⟨A⟩ the semiring of the noncommutative polynomials in variables A and coefficients in the semiring ℕ of the nonnegative integers [2]. P ≥ 0 means P ∈ ℕ⟨A⟩. As usual, the value of P ∈ ℤ⟨A⟩ on w ∈ A* is denoted by (P, w) and is referred to as the coefficient of w in P. The characteristic polynomial of a finite language X ⊆ A*, denoted X, is the polynomial X = Σ_{x∈X} x. Henceforth, we will at times identify X with its characteristic polynomial even if this is not stated explicitly. A (finite) code C over A is factorizing if there exist two finite subsets P, S of A* such that:

C − 1 = P(A − 1)S.
(1)
For instance, a finite maximal prefix code C is factorizing, by taking S = {1} and P equal to the set of the proper prefixes of the words in C [1]. If C is a factorizing code then C is a finite maximal code [1]. However, it is not known whether every finite maximal code is factorizing. This problem is known as the factorization conjecture [1,2,25].

Conjecture 2.1 (Schützenberger). Any finite maximal code is factorizing.

Some partial results are known and are mentioned below. The first examples of families of factorizing codes can be found in [3,4]. Subsequently, Reutenauer obtained the result that was closest to a solution of the conjecture [2,21,22]. He proved that Eq. (1) holds for any finite maximal code C if we substitute P, S with P, S ∈ ℤ⟨A⟩. Results concerning problems which are closely connected to the factorization conjecture can be found in [17,18]. Another class of results has been obtained by considering finite maximal codes over a two-letter alphabet A = {a, b} having a constraint on the number of the occurrences of the letter b in each word. More precisely, consider a finite maximal code C over A such that each word in C has at most m occurrences of the letter b. C is also named an m-code. If m is less than or equal to three, then C is factorizing [7,13,19]. Moreover, C is also factorizing if b^m ∈ C and m is a prime number or m = 4 [26]. For m ≤ 3, the structure of the m-codes has also been described and is related to the solutions of some inequalities which are, in turn, related to the factorizations of the cyclic groups [6,7,19]. Furthermore, other results which relate words in a finite maximal code to the factorizations of the cyclic groups can be found in [16,20].
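For a concrete instance of Eq. (1), take the maximal prefix code C = {a, ba, bb}: with S = {1} and P = {1, b} (the proper prefixes of the words of C), one has (1 + b)(a + b − 1) = a + ba + bb − 1. The sketch below (illustrative only, not from the paper) checks this identity, representing a noncommutative polynomial as a dictionary from words to coefficients, with the empty string standing for 1.

```python
from collections import Counter

def pmul(P, Q):
    # product of noncommutative polynomials {word: coefficient}
    R = Counter()
    for u, cu in P.items():
        for v, cv in Q.items():
            R[u + v] += cu * cv
    return {w: c for w, c in R.items() if c != 0}

P = {"": 1, "b": 1}                  # proper prefixes of C = {a, ba, bb}
S = {"": 1}
A_minus_1 = {"a": 1, "b": 1, "": -1}

C_minus_1 = {"a": 1, "ba": 1, "bb": 1, "": -1}
assert pmul(pmul(P, A_minus_1), S) == C_minus_1   # Eq. (1) for this code
```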
3. Outline of the results

The aim of this paper is to prove that, for a given factorizing code C, the words in C ∩ a*ba*, i.e., the words in C having exactly one occurrence of b, satisfy a special property. In Section 1, we introduced factorizations of cyclic groups. A special class of these is the so-called Hajós factorizations. There exist at least two recursive definitions of this class of factorizations and they are recalled in Section 4. In this note we will introduce a two-dimensional extension of Hajós factorizations such that they still admit a recursive construction. More precisely, we consider a sequence (R1, T1), . . . , (Rm, Tm) of Hajós factorizations and we consider matrices with integer entries in which each row is one of the Rj's. Obviously, several matrices exist with this property, but one of them exists, named a good arrangement of R1, . . . , Rm, which can be obtained starting with simpler good arrangements and by a natural two-dimensional extension of the Hajós method (Section 5). Finally, we introduce the crossed two-dimensional Hajós factorizations. Namely, given a sequence (R1, T1), . . . , (Rm, Tm) of Hajós factorizations, we consider matrices having pairs (r, v) of integers as elements, with r ∈ Rj and v ∈ Ti. We focus our attention on arrangements such that a recursive algorithm exists constructing them. Once again these are called good arrangements (Section 7). We prove that, for a given factorizing code C, the words in C ∩ a*ba* can be canonically associated with a matrix which is a good arrangement of a crossed two-dimensional Hajós factorization.
4. Hajós factorizations and their recursive constructions

In [14], Hajós gave a method, slightly corrected later by Sands in [23], for the construction of a class of factorizations of an abelian group (G, +) which are of special interest in the construction of factorizing codes. As done in [8], we report this method for the cyclic group Zn of order n (Definition 4.1). The corresponding factorizations will be named Hajós factorizations. The operation ∘ also intervenes: for subsets S = {s1, . . . , sq}, T of Zn, S ∘ T denotes the family of subsets of Zn having the form {si + ti | i ∈ {1, . . . , q}}, where {t1, . . . , tq} is any multiset of elements of T having the same cardinality as S. Furthermore, it is convenient to translate the definitions in a polynomial form. For a polynomial in ℕ⟨a⟩, the notation a^H = Σ_{n∈ℕ} (H, n) a^n will be used, with H being a finite multiset of nonnegative integers. Therefore, if H1, H2, . . . , Hk are finite multisets of nonnegative integers, the expression a^{H1} b a^{H2} · · · b a^{Hk} is a notation for the product of the formal power series a^{H1}, b, a^{H2}, . . . , a^{Hk}. For instance, a^{{2,3}} b a^{{1,5}} = a^2ba + a^2ba^5 + a^3ba + a^3ba^5. Computation rules are also defined: a^{M+L} = a^M a^L, a^{M∪L} = a^M + a^L, a^{M∘L} = a^M ∘ a^L, a^∅ = 0, a^0 = 1. Finally, let X1, X2 ⊆ ℕ and let n ∈ ℕ. The equation X1 = X2 (mod n) means that for each x1 ∈ X1 a unique x2 ∈ X2 exists with x1 = x2 (mod n), and for each x2 ∈ X2 a unique x1 ∈ X1 exists with x1 = x2 (mod n).

Definition 4.1. Let R, T be subsets of ℕ. (R, T) is a Hajós factorization of Zn if and only if there exists a chain of divisors of n:

k0 = 1 | k1 | k2 | . . . | ks = n,
(2)
such that:

a^R ∈ ((a^{k1} − 1)/(a − 1)) ∘ ((a^{k2} − 1)/(a^{k1} − 1)) · ((a^{k3} − 1)/(a^{k2} − 1)) ∘ · · · ((a^n − 1)/(a^{ks−1} − 1)),

(3)

a^T ∈ ((a^{k1} − 1)/(a − 1)) · ((a^{k2} − 1)/(a^{k1} − 1)) ∘ ((a^{k3} − 1)/(a^{k2} − 1)) · · · · ((a^n − 1)/(a^{ks−1} − 1)).

(4)
Furthermore, we have R, T ⊆ {0, . . . , n − 1}.

Observing the definition of the Hajós factorizations, we can obtain a recursive construction of them with ease. This recursive construction, which will be widely used in this paper, was first given in [16] as a direct result, then it was proved in [11] for the sake of completeness, and now it is illustrated in Proposition 4.1.

Proposition 4.1 (Lam [16]). Let R, T ⊆ {0, . . . , n − 1} and suppose that (R, T) is a Hajós factorization of Zn with respect to the chain k0 = 1 | k1 | k2 | . . . | ks = n of divisors of n. Then either (R, T) = (R1, T1) or (R, T) = (T1, R1), where (R1, T1) satisfies one of the two following conditions:
(1) There exists t ∈ {0, . . . , n − 1} such that R1 = {0, . . . , n − 1} and T1 = {t}. Furthermore, s = 1.
(2) R1 = R^(1) + {0, 1, . . . , g − 1}h, T1 ∈ T^(1) ∘ {0, 1, . . . , g − 1}h, (R^(1), T^(1)) being a Hajós factorization of Zh, g, h ∈ ℕ, n = gh, R^(1), T^(1) ⊆ {0, . . . , h − 1}. The chain of divisors defining (R^(1), T^(1)) is k0 = 1 | k1 | k2 | . . . | ks−1 = h.

Theorem 4.1 is one of the results which allow us to link factorizing codes and Hajós factorizations of Zn. In Theorem 4.1 a crucial role is played by particular factorizations defined as follows. Starting with the chain of divisors of n in Eq. (2), let us consider the two polynomials a^I and a^J defined by:
a^I = Π_{j even, 1≤j≤s} (a^{kj} − 1)/(a^{kj−1} − 1),    a^J = Π_{j odd, 1≤j≤s} (a^{kj} − 1)/(a^{kj−1} − 1).

(5)
The two polynomials above have been considered by Krasner and Ranulac in [15] and are the simplest examples of Hajós factorizations of Zn. In the same paper they proved that a pair (I, J) satisfies Eqs. (5) if and only if (I, J) satisfies the following property: for any z ∈ {0, . . . , n − 1} there exists a unique (i, j), with i ∈ I and j ∈ J, such that i + j = z, i.e., a^I a^J = (a^n − 1)/(a − 1). (I, J) is called a Krasner factorization.

Theorem 4.1 (De Felice [8]). For R, T ⊆ {0, . . . , n − 1} the following conditions are equivalent:
(1) (R, T) is a Hajós factorization of Zn.
(2) There exists a Krasner factorization (I, J) of Zn such that (I, T), (R, J) are (Hajós) factorizations of Zn.
(3) There exist L, M ⊆ ℕ and a Krasner factorization (I, J) of Zn such that:

a^R = a^I (1 + a^M (a − 1)),    a^T = a^J (1 + a^L (a − 1)).

(6)
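Condition (3) can be tested concretely. The sketch below (an illustration with arbitrarily chosen data, not taken from the paper) starts from the Krasner factorization I = {0, 2, 4}, J = {0, 1, 6, 7} of Z12, builds R and T from Eqs. (6) with the choices M = {0} and L = {1}, and then checks the factorization properties of condition (2).

```python
def poly(S):
    # characteristic polynomial a^S, represented as {exponent: coefficient}
    return {s: 1 for s in S}

def padd(P, Q):
    R = dict(P)
    for e, c in Q.items():
        R[e] = R.get(e, 0) + c
    return {e: c for e, c in R.items() if c != 0}

def pmul(P, Q):
    R = {}
    for e, c in P.items():
        for f, d in Q.items():
            R[e + f] = R.get(e + f, 0) + c * d
    return {e: c for e, c in R.items() if c != 0}

def support(P):
    assert all(c == 1 for c in P.values())   # P must be characteristic
    return set(P)

def is_factorization(R, T, n):
    return sorted((r + t) % n for r in R for t in T) == list(range(n))

I, J, n = {0, 2, 4}, {0, 1, 6, 7}, 12
a_minus_1 = {1: 1, 0: -1}

# Eqs. (6): a^R = a^I (1 + a^M (a - 1)), a^T = a^J (1 + a^L (a - 1))
R = support(pmul(poly(I), padd({0: 1}, pmul(poly({0}), a_minus_1))))
T = support(pmul(poly(J), padd({0: 1}, pmul(poly({1}), a_minus_1))))

assert R == {1, 3, 5} and T == {0, 3, 6, 9}
assert is_factorization(R, J, n) and is_factorization(I, T, n)   # condition (2)
```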
Furthermore, (2) ⇔ (3) also holds for R, T ⊆ ℕ.

As stated in Theorem 4.1, the equivalence between conditions (2) and (3) still holds under the more general hypothesis that R, T are arbitrary subsets of ℕ (not necessarily with max R < n, max T < n). In order to keep this general framework, in the next part of this paper, for R, T ⊆ ℕ, we will say that (R, T) is a Hajós factorization of Zn if (R(n), T(n)) satisfies the conditions contained in Definition 4.1, where, for a subset X of ℕ and n ∈ ℕ, we denote by X(n) the subset of {0, . . . , n − 1} such that X(n) = X (mod n). This is equivalent, as Lemma 4.1 below shows, to defining Hajós factorizations of Zn as those pairs satisfying Eqs. (6). The recursive construction of the solutions of Eqs. (6), given in [6], allowed us to obtain another recursive construction of the Hajós factorizations, given in [8].

Lemma 4.1 (De Felice [10]). Let (I, J) be a Krasner factorization of Zn. Let R′, R, M be subsets of ℕ such that a^{R′} = a^I (1 + a^M (a − 1)) and a^R = a^{R′(n)}. Then, M′ ⊆ ℕ exists such that a^R = a^I (1 + a^{M′} (a − 1)) and I + max M′ + 1 ⊆ {0, . . . , n − 1}. Furthermore, if we set R = {r1, . . . , rq}, R′ = {r1 + ℓ1 n, . . . , rq + ℓq n}, for ℓ1, . . . , ℓq ≥ 0, and if we set a^H = a^{r1 + {0, n, . . . , (ℓ1 − 1)n}} + · · · + a^{rq + {0, n, . . . , (ℓq − 1)n}}, then we have a disjoint union M = M′ ∪ M″, with M″ ⊆ ℕ, a^{M″} = a^J a^H and a^{R′} = a^R + a^I (a − 1) a^{M″}.

It is worthy of note that there is a relationship between Krasner factorizations and Hajós factorizations which goes beyond the observation that the former are simple examples of the latter. Firstly, Theorem 4.1 points out that, for each Hajós factorization (R, T), we can associate a Krasner factorization (I, J) with (R, T), called a Krasner companion factorization of (R, T) in [16]. Secondly, given a Hajós factorization (R, T) of Zn such that (R(n), T(n)) is defined by Eqs. (3), (4), a Krasner companion factorization (I, J) is naturally associated with (R, T): in order to get (I, J) we have to erase from Eq. (3) the polynomials Pj = (a^{kj} − 1)/(a^{kj−1} − 1) with j odd, and from Eq. (4) the polynomials Pj with j even [8]. (I, J) will be called the Krasner companion factorization of (R, T) with respect to the chain of divisors of n given in Eq. (2). Proposition 4.2 shows how these two notions are related to each other.

Proposition 4.2. Each Krasner companion factorization (I, J) of (R, T) is a Krasner companion factorization of (R, T) with respect to a chain of divisors of n which defines (R, T).

Proof. Let (I, J) be a Krasner companion factorization of (R, T), i.e., suppose that (R, T) satisfies Eqs. (6). Since (I, J) is also a Krasner companion factorization of (R(n), T(n)), we can suppose that R, T ⊆ {0, . . . , n − 1}. We prove the statement by induction on the length s of the chain k0 = 1 | k1 | k2 | . . . | ks = n of divisors of n which defines (I, J). If s = 1 and (I, J) = ({0}, {0, . . . , n − 1}) then a^R = 1 + a^M (a − 1), a^T = [(a^n − 1)/(a − 1)] + a^L (a^n − 1), and (R, T) satisfies condition (1) in Proposition 4.1 (see also
[6]). Thus, (I, J) is a Krasner companion factorization of (R, T) with respect to the chain k0 = 1 | k1 = n which defines (R, T). Suppose s > 1. By using Eqs. (5), there exist g, h ∈ ℕ such that n = gh and I = I^(1) + {0, 1, . . . , g − 1}h, J = J^(1), (I^(1), J^(1)) being a Krasner factorization of Zh defined by k0 = 1 | k1 | k2 | . . . | ks−1 = h. Since R ⊆ {0, . . . , n − 1} and a^R = a^I (1 + a^M (a − 1)) ≥ 0, we have max I + max M + 1 < n, which implies max I^(1) + max M + 1 < h. Thus, for each t ∈ ℕ, we have (a^{I^(1)} (1 + a^M (a − 1)), a^t) = 0 if t ≥ h, otherwise (a^{I^(1)} (1 + a^M (a − 1)), a^t) = (a^I (1 + a^M (a − 1)), a^t). Consequently, we have a^{R^(1)} = a^{I^(1)} (1 + a^M (a − 1)) ≥ 0, R = R^(1) + {0, 1, . . . , g − 1}h (see also [10,12]). In addition, by using Lemma 4.1 we also have a^{T(h)} = a^{J^(1)} (1 + a^{L′} (a − 1)) ≥ 0. Thus, by Theorem 4.1, (R^(1), T(h)) is a Hajós factorization of Zh having (I^(1), J) as a Krasner companion factorization, where (I^(1), J) is defined by the chain k0 = 1 | k1 | k2 | . . . | ks−1. By the induction hypothesis, (I^(1), J) is a Krasner companion factorization of (R^(1), T(h)) with respect to this chain, which defines (R^(1), T(h)). Since R = R^(1) + {0, 1, . . . , g − 1}h, T ∈ T(h) ∘ {0, 1, . . . , g − 1}h, we conclude that (I, J) is a Krasner companion factorization of (R, T) with respect to the chain k0 = 1 | k1 | k2 | . . . | ks = n of divisors of n which defines (R, T). □

Let us consider Hajós factorizations (R1, T1), . . . , (Rm, Tm) having the same Krasner companion factorization (I, J). In the next part of this paper, all the elements denoted by the same symbol R with different indices will refer to the same element in the Krasner pair, i.e., the statement “(R1, T1), . . . , (Rm, Tm) have (I, J) as Krasner companion factorization” will mean that (Ri, J) and (I, Ti) are factorizations of Zn, i ∈ {1, . . . , m}. Furthermore, by using Proposition 4.2, we can conclude that (R1, T1), . . .
, (Rm, Tm) can be defined by the same chain of divisors and have the same Krasner companion factorization (I, J) with respect to this chain of divisors.

5. Two-dimensional Hajós factorizations

In the next part of this paper, matrices with entries in A* or in ℕ will also be considered, and A = (ap,q)_{1≤p≤m, 1≤q≤ℓ} will be an alternative notation for the matrix of size m × ℓ:

    a1,1  . . .  a1,ℓ
    a2,1  . . .  a2,ℓ
     ·            ·
    am,1  . . .  am,ℓ

Given a matrix A = (ap,q)_{1≤p≤m, 1≤q≤ℓ} with entries in ℕ and a positive integer n, n ≥ 2, we denote A(n) = (a′p,q)_{1≤p≤m, 1≤q≤ℓ}, where, for each p, q, 1 ≤ p ≤ m, 1 ≤ q ≤ ℓ, we have a′p,q = ap,q (mod n), 0 ≤ a′p,q ≤ n − 1. We also denote h + A = (bp,q)_{1≤p≤m, 1≤q≤ℓ}, where, for each p, q, 1 ≤ p ≤ m, 1 ≤ q ≤ ℓ, we have bp,q = h + ap,q, and A ∪ B = (ap,q)_{1≤p≤m, 1≤q≤2ℓ}, where B = (ap,q)_{1≤p≤m, ℓ+1≤q≤2ℓ}. Finally, ∪_{i=1}^{n} Ai = (∪_{i=1}^{n−1} Ai) ∪ An.

Given X, with X ⊆ A* (resp. X ⊆ ℕ), an arrangement of X will be an arrangement of the elements of X in a matrix with entries in A* (resp. ℕ) and size |X|. We now define
special arrangements of Hajós factorizations by a natural two-dimensional extension of the Hajós method.

Definition 5.1. Let (R1, T1), . . . , (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. An arrangement D = (rp,q)_{1≤p≤m, 1≤q≤ℓ} of ∪_{p=1}^{m} Rp having the Rp's as rows is a good arrangement of (R1, . . . , Rm) (with respect to the rows) if D can be recursively constructed using the following three rules.
(1) D is a good arrangement of ∪_{p=1}^{m} Rp (with respect to the rows) if D(n) is a good arrangement of ∪_{p=1}^{m} (Rp)(n) (with respect to the rows).
(2) Suppose that (Rp, Tp) satisfies condition (1) in Proposition 4.1, for all p ∈ {1, . . . , m}. If Rp = {rp} with rp ∈ {0, . . . , n − 1}, then D is the matrix with only one column having rp as the pth entry. If Rp = {rp,0, . . . , rp,n−1} with rp,i = i, then D = (rp,j)_{1≤p≤m, 0≤j≤n−1}.
(3) Suppose that (Rp, Tp) satisfies condition (2) in Proposition 4.1, for all p ∈ {1, . . . , m}, i.e., either Rp = Rp^(1) + {0, h, . . . , (g − 1)h} or Rp ∈ Rp^(1) ∘ {0, h, . . . , (g − 1)h}. Let D^(1) be a good arrangement of ∪_{p=1}^{m} Rp^(1). In the first case, we set D = ∪_{k=0}^{g−1} (kh + D^(1)). In the second case, D is obtained by taking D^(1) and then substituting in it each rp,q^(1) ∈ Rp^(1) with the corresponding rp,q^(1) + ℓp,q h ∈ Rp.

Let (R1, T1), . . . , (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. It goes without saying that we can consider arrangements of ∪_{p=1}^{m} Rp having the Rp's as columns and, therefore, we can give a dual notion of a good arrangement of ∪_{p=1}^{m} Rp with respect to the columns. This arrangement will be the transpose matrix of a good arrangement of ∪_{p=1}^{m} Rp with respect to the rows.

Example 5.1. It is easy to see that ({0}, {0, 1}), ({1}, {1, 2}) and ({2}, {2, 3}) are Hajós factorizations of Z2 having ({0}, {0, 1}) as a Krasner companion factorization.
According to Definition 5.1, D1 is a good arrangement whereas D2 is not a good arrangement, where we set:

    D1 = 0 1        D2 = 0 1
         2 1             1 2
         2 3             2 3
Indeed, (D1)(2) satisfies condition (2) in Definition 5.1 whereas (D2)(2) does not satisfy the same condition (2). As another example, ({0, 2, 4}, {0, 1, 6, 7}), ({0, 2, 4}, {1, 2, 8, 19}) and ({0, 2, 4}, {2, 3, 8, 9}) are Hajós factorizations of Z12 having ({0, 2, 4}, {0, 1, 6, 7}) as a Krasner companion factorization. According to Definition 5.1, D3 is a good arrangement whereas D4 is not a good arrangement, where we set:

    D3 = 0 1 6  7        D4 = 0  1 6 7
         2 1 8 19             8 19 2 1
         2 3 8  9             8  9 2 3
Indeed, (D3)_(12) satisfies condition (3) in Definition 5.1 since we have (D3)_(12) = ∪_{k=0}^{1} (6k + (D3)^(1)_(12)), where (D3)^(1)_(12) = D1 is a good arrangement. On the contrary, D4 is not a good arrangement since, in view of Proposition 5.1, there exists a unique good arrangement of {0, 1, 6, 7} ∪ {1, 2, 8, 19} ∪ {2, 3, 8, 9} and that is D3. Note that for each column Wq = (r_{1,q}, r_{2,q}, r_{3,q}) of D3, 1 ≤ q ≤ 4, an ordered sequence Iq = (i_{1,q}, i_{2,q}, i_{3,q}) of elements of {0, 2, 4} exists satisfying r_{1,q} + i_{1,q} = r_{2,q} + i_{2,q} = r_{3,q} + i_{3,q} = n_q (mod 12). Indeed, we have 0 + 2 = 2 + 0 = 2 + 0 = 2 (mod 12), 1 + 2 = 1 + 2 = 3 + 0 = 3 (mod 12), 6 + 2 = 8 + 0 = 8 + 0 = 8 (mod 12), 7 + 2 = 19 + 2 = 9 + 0 = 9 (mod 12). In Proposition 6.1, we will prove that each good arrangement satisfies this special property.

Proposition 5.1. Let (R1, T1), ..., (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. There exists a (unique) good arrangement D of ∪_{p=1}^{m} Rp with respect to the rows (resp. columns).

Proof. As we observed at the end of Section 4, if (R1, T1), ..., (Rm, Tm) are Hajós factorizations of Zn having (I, J) as a Krasner companion factorization, then (R1, T1), ..., (Rm, Tm) can be defined by a same chain of divisors of n of length s. Thus, (R1, T1), ..., (Rm, Tm) satisfy the same condition contained in Proposition 4.1. The proof is by induction on s, and we prove the statement for good arrangements with respect to the rows (an analogous argument can be used for good arrangements with respect to the columns). Suppose Rp, Tp ⊆ {0, ..., n − 1} for p ∈ {1, ..., m}. If s = 1, then (R1, T1), ..., (Rm, Tm) satisfy condition (1) in Proposition 4.1 and (a unique) D exists which satisfies condition (2) in Definition 5.1. Thus, let s > 1. Hence, (R1, T1), ..., (Rm, Tm) satisfy condition (2) in Proposition 4.1. Therefore, looking at condition (3) in Definition 5.1, (a unique) D exists since (a unique) D^(1) exists by induction hypothesis. If Rp, Tp are not contained in {0, ..., n − 1}, then ((Rp)_(n), (Tp)_(n)) are Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. Thus, by the argument above, a (unique) good arrangement D_(n) of ∪_{p=1}^{m} (Rp)_(n) exists. Hence, looking at condition (1) in Definition 5.1, (a unique) D also exists.
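The column property used in the example above can be checked mechanically. A small sketch (Python; the matrix D3 and the shift set {0, 2, 4} are read from the example, the helper name is ours) that searches, for each column, shifts making all entries agree mod 12:

```python
from itertools import product

def column_shifts(D, I, n):
    """For each column of the arrangement D, search for a sequence of
    shifts taken from I that makes all entries of the column equal mod n.
    Returns the list of common values n_q (None for a failing column)."""
    result = []
    for col in zip(*D):
        found = None
        for shifts in product(I, repeat=len(col)):
            sums = {(r + i) % n for r, i in zip(col, shifts)}
            if len(sums) == 1:       # all entries of the column coincide mod n
                found = sums.pop()
                break
        result.append(found)
    return result

# D3 from the example: rows {0,1,6,7}, {2,1,8,19}, {2,3,8,9}; n = 12, I = {0,2,4}
D3 = [[0, 1, 6, 7],
      [2, 1, 8, 19],
      [2, 3, 8, 9]]
print(column_shifts(D3, [0, 2, 4], 12))  # [2, 3, 8, 9], the n_q of the example
```

The output reproduces exactly the four sums 2, 3, 8, 9 (mod 12) displayed in the example.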
6. A property of good arrangements of Hajós factorizations

In this section we will prove technical results concerning good arrangements of Hajós factorizations which will be subsequently used in the proof of Proposition 7.3. The argument used in the proof of Proposition 6.1 has also been used in the proof of another result stated in [9]. Nevertheless, the complete proof of Proposition 6.1 is reported here for the sake of completeness.

Proposition 6.1. Let (R1, T1), ..., (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. Let D = (r_{p,q})_{1≤p≤m, 1≤q≤ℓ} be the good arrangement of ∪_{p=1}^{m} Rp with respect to the rows. Then, the two following conditions are
satisfied:
(a) For each column Wq = (r_{1,q}, ..., r_{m,q}) of D, there exists an ordered sequence Jq = (j_{1,q}, ..., j_{m,q}) of elements of J satisfying

  r_{1,q} + j_{1,q} = r_{2,q} + j_{2,q} = ... = r_{m,q} + j_{m,q} = n_q (mod n).    (7)

(b) Suppose that Rp, Tp ⊆ {0, ..., n − 1}. Then, for each column Wq = (r_{1,q}, ..., r_{m,q}) of D, there exists an ordered sequence Jq = (j_{1,q}, ..., j_{m,q}) of elements of J satisfying

  r_{1,q} + j_{1,q} = r_{2,q} + j_{2,q} = ... = r_{m,q} + j_{m,q} = n_q.    (8)
The n_q's are all different.

Proof. Let (R1, T1), ..., (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. Let D be a good arrangement of ∪_{p=1}^{m} Rp with respect to the rows. Let us demonstrate that the statement is proved if we prove condition (b). Indeed, suppose that (Rp, Tp) ≠ ((Rp)_(n), (Tp)_(n)). Using condition (b), the good arrangement D_(n) of ∪_{p=1}^{m} (Rp)_(n) satisfies Eq. (8). On the other hand, when we change in Eq. (8) the elements in a column Wq of D_(n) with the elements in the corresponding column in D, according to condition (1) in Definition 5.1, the sum defines the same integer mod n and so Eq. (7) holds, i.e., D satisfies condition (a).

We prove condition (b) by using induction on the length s of the common chain of divisors of n given in Eq. (2) and defining (R1, T1), ..., (Rm, Tm) (Definition 4.1).

Let us firstly suppose that s = 1. Then, (Rp, Tp) satisfies condition (1) in Proposition 4.1. If Rp = {r_{p,0}, ..., r_{p,n−1}} = {0, ..., n − 1}, then J = {0} and obviously D (defined by condition (2) in Definition 5.1) satisfies Eq. (8). Otherwise we have Rp = {r_p} ⊆ {0, ..., n − 1}, J = {0, ..., n − 1}. Set r_max = max{r_p | 1 ≤ p ≤ m}. We obviously have

  r_1 + (r_max − r_1) = r_2 + (r_max − r_2) = ... = r_m + (r_max − r_m),

where r_max − r_p ∈ {0, ..., n − 1} = J. Thus, D (defined by condition (2) in Definition 5.1) satisfies Eq. (8).

Let us suppose that condition (b) holds for good arrangements of Hajós factorizations (Rp, Tp) defined by starting with a chain of divisors of length less than s > 1, and let k_0 = 1 | k_1 | k_2 | ... | k_s = n be the chain of divisors of n associated with (Rp, Tp). Thus, (Rp, Tp) satisfies condition (2) in Proposition 4.1. Then we have I ≠ {0}, J ≠ {0}. Furthermore, either Rp = Rp^(1) + {0, h, ..., (g − 1)h}, J = J^(1), or Rp ∈ Rp^(1) ∘ {0, h, ..., (g − 1)h}, J = J^(1) + {0, h, ..., (g − 1)h}, with (Rp^(1), Tp^(1)) being a Hajós factorization of Zh having the Krasner companion factorization (I^(1), J^(1)), g > 1, n = gh, with respect to the chain k_0 = 1 | k_1 | k_2 | ... | k_{s−1} = h of divisors of h = k_{s−1} of length less than s and defining (Rp^(1), Tp^(1)). Furthermore, Rp^(1), Tp^(1) ⊆ {0, ..., h − 1}. By induction hypothesis, the good arrangement D^(1) of ∪_{p=1}^{m} Rp^(1) satisfies condition (b). Thus, for each column Wq^(1) = (r^(1)_{1,q}, ..., r^(1)_{m,q}) of D^(1), an ordered sequence Jq^(1) = (j^(1)_{1,q}, ..., j^(1)_{m,q}) of elements of J^(1) exists satisfying

  r^(1)_{1,q} + j^(1)_{1,q} = r^(1)_{2,q} + j^(1)_{2,q} = ... = r^(1)_{m,q} + j^(1)_{m,q}.    (9)
Firstly, we suppose Rp = Rp^(1) + {0, h, ..., (g − 1)h}. Then, for each λ ∈ {0, ..., g − 1}, in virtue of Eq. (9), we have:

  r^(1)_{1,q} + λh + j^(1)_{1,q} = r^(1)_{2,q} + λh + j^(1)_{2,q} = ... = r^(1)_{m,q} + λh + j^(1)_{m,q}.

Looking at Definition 5.1, we see that the good arrangement D of ∪_{p=1}^{m} Rp (defined by condition (3) in Definition 5.1) satisfies condition (b).

We now suppose Rp ∈ Rp^(1) ∘ {0, h, ..., (g − 1)h}. Let λ_{p,q} ∈ {0, ..., g − 1} be such that r^(1)_{p,q} + λ_{p,q} h ∈ Rp. Thanks to Eq. (9), we have

  (r^(1)_{1,q} + λ_{1,q} h) + (j^(1)_{1,q} + (λ_{max,q} − λ_{1,q})h) = ... = (r^(1)_{m,q} + λ_{m,q} h) + (j^(1)_{m,q} + (λ_{max,q} − λ_{m,q})h),

where λ_{max,q} = max{λ_{p,q} | 1 ≤ p ≤ m}. As λ_{max,q} − λ_{p,q} ∈ {0, ..., g − 1}, then j^(1)_{p,q} + (λ_{max,q} − λ_{p,q})h ∈ J. Looking at Definition 5.1, we see that the good arrangement D of ∪_{p=1}^{m} Rp (defined by condition (3) in Definition 5.1) satisfies condition (b).

Finally, the n_q's are all different since (Rp, J) is a Hajós factorization of Zn (if n_q = n_{q'} then we would have r_{p,q} + j_{p,q} = r_{p,q'} + j_{p,q'} (mod n) with j_{p,q}, j_{p,q'} ∈ J, r_{p,q}, r_{p,q'} ∈ Rp, r_{p,q} ≠ r_{p,q'}, a contradiction).

Proposition 6.2. Let A = (z_{p,q})_{0≤p≤m−1, 0≤q≤n−1} be a matrix of size m × n satisfying the following conditions:
(1) For each p ∈ {0, ..., m − 1}, we have R'p = {z_{p,q} | 0 ≤ q ≤ n − 1} = {0, ..., n − 1} (mod n).
(2) For each q, q' ∈ {0, ..., n − 1} and for each p, p' ∈ {0, ..., m − 1}, we have z_{p,q} = z_{p',q'} (mod n) if and only if q = q'.
(3) There exist a Krasner factorization (I, J) of Zn and Hajós factorizations (R0, T0), ..., (R_{m−1}, T_{m−1}) of Zn having (I, J) as a Krasner companion factorization, such that A is an arrangement of ∪_{p=0}^{m−1} (Rp + J) with (z_{p,q})_{0≤q≤n−1} = Rp + J, 0 ≤ p ≤ m − 1.

Set ℓ = |I|. Then, for each q ∈ {0, ..., ℓ − 1}, an ordered sequence Jq = (j_{0,q}, ..., j_{m−1,q}) of elements of J exists and ℓ = |I| columns (z_{p,n_q})_{0≤p≤m−1} in A also exist such that, for each p ∈ {0, ..., m − 1} and q ∈ {0, ..., ℓ − 1}, we have z_{p,n_q} = r_{p,q} + j_{p,q}, with D = (r_{p,q})_{0≤p≤m−1, 0≤q≤ℓ−1} being a good arrangement of ∪_{p=0}^{m−1} Rp with respect to the rows.

Proof. Let D = (r_{p,q})_{0≤p≤m−1, 0≤q≤ℓ−1} be the good arrangement of ∪_{p=0}^{m−1} Rp with respect to the rows, where we obviously have ℓ = |I|. In virtue of Proposition 6.1, for each q ∈ {0, ..., ℓ − 1}, an ordered sequence Jq = (j_{0,q}, ..., j_{m−1,q}) of elements of J exists satisfying Eq. (7), i.e., r_{p,q} + j_{p,q} = n_q (mod n). Now, let us consider the integers n_q defined by Eq. (7). In view of condition (2) in the statement, for each q ∈ {0, ..., ℓ − 1}, there is a unique column (z_{p,n_q})_{0≤p≤m−1} in A associated with n_q, i.e., such that z_{p,n_q} = n_q (mod n). Thus, in view of condition (3) in the statement, we have z_{p,n_q} = r_{p,q} + j_{p,q} for the unique pair (r_{p,q}, j_{p,q}) ∈ Rp × J such that r_{p,q} + j_{p,q} = n_q (mod n). Clearly, the columns (z_{p,n_q})_{0≤p≤m−1}, 0 ≤ q ≤ ℓ − 1, satisfy the conditions contained in the statement.
Finally, we explicitly note that we can state a dual version of Propositions 6.1 and 6.2 for good arrangements with respect to the columns.
7. Crossed two-dimensional Hajós factorizations

Given a sequence (R1, T1), ..., (Rm, Tm) of Hajós factorizations, we now consider matrices having pairs (r, v) of integers as elements and such that the good arrangement of ∪_{p=1}^{m} Rp (resp. ∪_{q=1}^{ℓ} Tq), with respect to the rows (resp. columns), can be obtained by taking the induced arrangement having the first (resp. second) elements of the pairs as entries (Definition 7.2). We prove that for a given factorizing code C, the words in C ∩ a*ba*, i.e., the words in C with one occurrence of b, can be canonically associated with one of these special matrices. We recall that for a finite subset X of A*, we set X_k = X ∩ (a*b)^k a*.

Definition 7.1. Let C1 = (a^{r_{p,q}} b a^{v_{p,q}})_{1≤p≤m, 1≤q≤ℓ} be an arrangement of C1 ⊆ a*ba*. The matrix R = (r_{p,q})_{1≤p≤m, 1≤q≤ℓ} is the induced arrangement of the rows Rp = {r_{p,q} | q ∈ {1, ..., ℓ}} and the matrix T = (v_{p,q})_{1≤p≤m, 1≤q≤ℓ} is the induced arrangement of the columns Tq = {v_{p,q} | p ∈ {1, ..., m}}. Furthermore, R_{p,w} = {a^{r_{p,q}} b a^{v_{p,q}} | 1 ≤ q ≤ ℓ} (resp. T_{q,w} = {a^{r_{p,q}} b a^{v_{p,q}} | 1 ≤ p ≤ m}) is a word-row (resp. a word-column) of C1, for 1 ≤ p ≤ m (resp. 1 ≤ q ≤ ℓ).

Definition 7.2. An arrangement C1 = (a^{r_{p,q}} b a^{v_{p,q}})_{1≤p≤m, 1≤q≤ℓ} of C1 ⊆ a*ba* is a good arrangement (with (I, J) as a Krasner associated pair) if it satisfies the following three conditions:
(1) For each row Rp and each column Tq, 1 ≤ p ≤ m, 1 ≤ q ≤ ℓ, (Rp, Tq) is a Hajós factorization of Zn having (I, J) as a Krasner companion factorization with respect to a chain of divisors of n = |C1|.
(2) The induced arrangement of the rows is a good arrangement of ∪_{p=1}^{m} Rp with respect to the rows.
(3) The induced arrangement of the columns is a good arrangement of ∪_{q=1}^{ℓ} Tq with respect to the columns.

Example 7.1. C1 = a^{0,2,4} b + a^{3,5} b a^3 + a b a^5 has the following good arrangement (with ({0, 2, 4}, {0, 1}) as a Krasner associated pair):

         a^0 b     a^2 b      a^4 b
  C1 =
         a b a^5   a^3 b a^3  a^5 b a^3

Analogously, for C1 = a^{0,2,4,12,14,16} b a^{0,6,21} + a^{0,4,8,12,16,20} b a^3 we have the following good arrangement (with ({0, 2, 4, 12, 14, 16}, {0, 1, 6, 7}) as a Krasner associated pair):

         a^0 b       a^2 b       a^4 b       a^12 b       a^14 b       a^16 b
  C1 =   a^0 b a^3   a^8 b a^3   a^4 b a^3   a^12 b a^3   a^20 b a^3   a^16 b a^3
         a^0 b a^6   a^2 b a^6   a^4 b a^6   a^12 b a^6   a^14 b a^6   a^16 b a^6
         a^0 b a^21  a^2 b a^21  a^4 b a^21  a^12 b a^21  a^14 b a^21  a^16 b a^21
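Condition (1) of Definition 7.2 is easy to test mechanically. A sketch (Python; helper names ours) that checks, on the first arrangement of Example 7.1, that every pair (row Rp, column Tq) is a Hajós factorization of Z6:

```python
def is_hajos_pair(R, T, n):
    """(R, T) is a Hajos factorization of Z_n iff the |R|*|T| sums r + t
    are pairwise distinct mod n and cover Z_n."""
    sums = [(r + t) % n for r in R for t in T]
    return len(sums) == n and len(set(sums)) == n

# The 2x3 arrangement of Example 7.1, entries as (left, right) exponents of a^r b a^v
arr = [[(0, 0), (2, 0), (4, 0)],
       [(1, 5), (3, 3), (5, 3)]]
n = 6
rows = [{r for r, v in row} for row in arr]          # R_p: left exponents per row
cols = [{v for r, v in col} for col in zip(*arr)]    # T_q: right exponents per column
ok = all(is_hajos_pair(R, T, n) for R in rows for T in cols)
print(ok)  # True
```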
Let us recall two known equations associated with the sets C1 of words with one b in a factorizing code C. Let P, S be finite subsets of A* such that C = P(A − 1)S + 1. As a direct result, we have C0 = P0(a − 1)S0 + 1 and Cr = Σ_{i+j=r} Pi(a − 1)Sj + Σ_{i+j=r−1} Pi b Sj, for r > 0 [7]. Consequently, there exist n ∈ N and a Krasner factorization (I, J) of Zn such that:

  C0 = a^n,   P0 = a^I,   S0 = a^J,   a^I a^J = (a^n − 1)/(a − 1).    (10)

Furthermore, if we set P1 = Σ_{i∈I} a^i b a^{L_i}, S1 = Σ_{j∈J} a^{M_j} b a^j, with I, J, L_i, M_j ⊆ N, we have:

  C1 = C ∩ a*ba* = a^I b a^J + Σ_{i∈I} a^i b a^{L_i} (a − 1) a^J + Σ_{j∈J} a^{M_j} (a − 1) a^I b a^j.    (11)
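In Eq. (10), the polynomial identity a^I a^J = (a^n − 1)/(a − 1) says exactly that (I, J) is a Krasner factorization of Zn: every element of Zn is written exactly once as i + j mod n. A sketch of this check (Python; the function name is ours):

```python
def is_krasner_factorization(I, J, n):
    """a^I * a^J = 1 + a + ... + a^{n-1} holds iff every element of Z_n
    is obtained exactly once as i + j mod n with i in I, j in J."""
    sums = sorted((i + j) % n for i in I for j in J)
    return sums == list(range(n))

print(is_krasner_factorization({0, 2, 4}, {0, 1}, 6))     # True  (Example 7.1)
print(is_krasner_factorization({0, 1, 4, 5}, {0, 2}, 8))  # True  (Example 7.2)
print(is_krasner_factorization({0, 1, 2}, {0, 3}, 8))     # False
```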
Proposition 7.1 (De Felice [9]). Let C1 be a subset of a*ba* which satisfies Eqs. (10) and (11). Then, there exists a unique arrangement A'1 = (a^{z_{p,q}} b a^{t_{p,q}})_{0≤p≤n−1, 0≤q≤n−1} of a^J C1 a^I which satisfies the following properties, for p, q ∈ {0, ..., n − 1}:
(1) R'p = {z_{p,q} | q ∈ {0, ..., n − 1}} = {0, ..., n − 1} (mod n), T'q = {t_{p,q} | p ∈ {0, ..., n − 1}} = {0, ..., n − 1} (mod n).
(2) Two words a^{z_{p,q}} b a^{t_{p,q}}, a^{z_{p',q'}} b a^{t_{p',q'}} have the same exponent z_{p,q} = z_{p',q'} = q (mod n) (resp. t_{p,q} = t_{p',q'} = p (mod n)) if and only if q = q' (resp. p = p'), i.e., they belong to the same word-column T'_{q,w} (resp. word-row R'_{p,w}).
(3) For the word-rows R'_{p,w} and the word-columns T'_{q,w} it holds: for all i ∈ I, j ∈ J, a^r b a^v ∈ C1,

  a^{r+j} b a^{v+i} ∈ R'_{p,w} ⇒ a^J a^r b a^{v+i} ⊆ R'_{p,w},
  a^{r+j} b a^{v+i} ∈ T'_{q,w} ⇒ a^{r+j} b a^v a^I ⊆ T'_{q,w}.

Proposition 7.2 (De Felice [9]). For every word-row R'_{p,w} (resp. word-column T'_{q,w}) in A'1, a subset R_{p,w} = a^{r_{p,1}} b a^{v_{p,1}} + ... + a^{r_{p,ℓ}} b a^{v_{p,ℓ}} (resp. T_{q,w} = a^{r_{1,q}} b a^{v_{1,q}} + ... + a^{r_{m,q}} b a^{v_{m,q}}) of words in C1 exists such that:

  R'_{p,w} = a^J (a^{r_{p,1}} b a^{v_{p,1}+i_{p,1}} + ... + a^{r_{p,ℓ}} b a^{v_{p,ℓ}+i_{p,ℓ}})
  (resp. T'_{q,w} = (a^{r_{1,q}+j_{1,q}} b a^{v_{1,q}} + ... + a^{r_{m,q}+j_{m,q}} b a^{v_{m,q}}) a^I),

where the order of the elements is not taken into account and i_{p,1}, ..., i_{p,ℓ} ∈ I (resp. j_{1,q}, ..., j_{m,q} ∈ J) are not necessarily different. Furthermore, let Rp = {r_{p,g} | a^{r_{p,g}} b a^{v_{p,g}} ∈ R_{p,w}} and Tq = {v_{g,q} | a^{r_{g,q}} b a^{v_{g,q}} ∈ T_{q,w}}. Then, for p, q ∈ {0, ..., n − 1}, (Rp, Tq) is a Hajós factorization of Zn having (I, J) as a Krasner companion factorization and it holds:

  a^{R'p} = a^J a^{Rp},   a^{T'q} = a^{Tq} a^I.
Let C1 be a subset of a*ba* which satisfies Eq. (11) with (I, J) defined by Eq. (10). Then C1 satisfies the conditions contained in Proposition 7.1; let A'1 be the corresponding arrangement of a^J C1 a^I. In Proposition 7.3 below, we show that there exists a good
arrangement B of C1 with (I, J) as a Krasner associated pair. In the proof of this result, we construct B starting with A'1 and with the induced arrangement A' of the rows in A'1, by the following matrix transformations:

  A'1 →(1) B'1 (defined by |I| columns in A'1 which are selected according to Proposition 6.2),
  B'1 →(2) B1 (defined by erasing the elements of a^J on the left of b in B'1),
  B1 →(3) B' (defined by the dual version of (1), i.e., by |J| selected rows in B1),
  B' →(4) B (defined by the dual version of (2), i.e., by erasing the elements of a^I on the right of b in B').

Proposition 7.3. Let C1 be a subset of a*ba* which satisfies Eq. (11) with (I, J) defined by Eq. (10). Then, there exists a good arrangement of C1 with (I, J) as a Krasner associated pair.

Proof. Let C1 be a subset of a*ba* which satisfies Eq. (11) with (I, J) defined by Eq. (10). Then, C1 satisfies the conditions contained in Propositions 7.1 and 7.2, and we will use the same notations as in these propositions. Let (Rp, Tq) be the Hajós factorizations of Zn defined in Proposition 7.2 and let A'1 be the arrangement of a^J C1 a^I satisfying the conditions contained in Proposition 7.1. Consider the induced arrangement A' = (z_{p,q})_{0≤p≤n−1, 0≤q≤n−1} of the rows in A'1. By using Proposition 6.2, an ordered sequence Jq = (j_{0,q}, ..., j_{n−1,q}) of elements of J exists and ℓ = |I| columns A'' = (z_{p,n_q})_{0≤p≤n−1, 0≤q≤ℓ−1} in A' also exist such that, for each p ∈ {0, ..., n − 1} and q ∈ {0, ..., ℓ − 1}, we have z_{p,n_q} = r_{p,q} + j_{p,q}, with D = (r_{p,q})_{0≤p≤n−1, 0≤q≤ℓ−1} being a good arrangement of ∪_{p=0}^{n−1} Rp with respect to the rows. Consider the columns B'1 = (a^{r_{p,q}+j_{p,q}} b a^{t_{p,n_q}})_{0≤p≤n−1, 0≤q≤ℓ−1} of A'1 such that the induced arrangement of the rows of B'1 is A''.

We claim that, when we erase in B'1 the elements of a^J on the left of b (i.e., when we consider the matrix defined by the word-columns T'_{n_q,w} = (a^{r_{0,q}} b a^{v_{0,q}} + ... + a^{r_{n−1,q}} b a^{v_{n−1,q}}) a^I), we obtain an arrangement B1 of C1 a^I. Intuitively, when we erase the elements of a^J on the left in a word-row R'_{p,w}, we obtain |J| copies of a subset of C1 a^I: B1 is obtained by selecting one copy of each element in this subset. In detail, for each word a^r b a^v ∈ C1, there exist a word-row R'_{p,w} of A'1 and i ∈ I such that all the elements of a^{r+J} b a^{v+i} are elements of R'_{p,w}. Thus, r ∈ Rp and there exist q, r_{p,q}, j_{p,q} such that r = r_{p,q} and z_{p,n_q} = r_{p,q} + j_{p,q}. Since a^r a^J b a* ∩ R'_{p,w} = a^{r+J} b a^{v+i}, we have that a^r b a^{v+i} is in T'_{n_q,w}. Furthermore, when we consider in B1 the corresponding arrangement of the exponents of the a's on the left of b, we find the good arrangement D.

We now find the required arrangement of C1 by using the same argument as above with respect to the columns and to the exponents of the a's on the right of b. Indeed, each word-column T'_{n_q,w} in B1 is also a word-column in A'1. Thus, B1 (and so B'1) maintains all the properties of A'1 contained in Propositions 7.1 and 7.2 with respect to the columns. In particular, the induced arrangement T' = (t_{p,n_q})_{0≤p≤n−1, 0≤q≤ℓ−1} of the columns in B1 is an arrangement of ∪_{q=0}^{ℓ−1} (Tq + I) such that (Tq + I) is the qth column, 0 ≤ q ≤ ℓ − 1.
Furthermore, we have T'_{n_q} = Tq + I = {t_{p,n_q} | p ∈ {0, ..., n − 1}} = {0, ..., n − 1} (mod n), and two words a^{r_{p,q}} b a^{t_{p,n_q}}, a^{r_{p',q}} b a^{t_{p',n_q}} have the same exponent t_{p,n_q} = t_{p',n_q} = p (mod n) if and only if p = p', i.e., they belong to the same word-row R'_{p,w}.

Then, by using the dual version of Proposition 6.2, for each p ∈ {0, ..., m − 1}, with m = |J|, an ordered sequence Ip = (i_{p,0}, ..., i_{p,ℓ−1}) of elements of I exists and m = |J| rows T'' = (t_{n_p,n_q})_{0≤q≤ℓ−1} in T' also exist such that, for each p ∈ {0, ..., m − 1} and q ∈ {0, ..., ℓ − 1}, we have t_{n_p,n_q} = v_{p,q} + i_{p,q}, with D' = (v_{p,q})_{0≤p≤m−1, 0≤q≤ℓ−1} being a good arrangement of ∪_{q=0}^{ℓ−1} Tq with respect to the columns. Consider the rows B' = (a^{r_{p,q}} b a^{v_{p,q}+i_{p,q}})_{0≤p≤m−1, 0≤q≤ℓ−1} of B1 such that the induced arrangement of the columns of B' is T''. Let us prove that when we erase in B' the elements of a^I on the right of b, we obtain a good arrangement B = (a^{r_{p,q}} b a^{v_{p,q}})_{0≤p≤m−1, 0≤q≤ℓ−1} of C1.

Firstly, B is an arrangement of C1. Intuitively, when we erase the elements of a^I on the right in a word-column T'_{n_q,w}, we obtain |I| copies of a subset of C1: B is obtained by selecting one copy of each element in this subset. In detail, we have already observed that B1 is an arrangement of C1 a^I which maintains all the properties of A'1 contained in Propositions 7.1 and 7.2 with respect to the columns. Now, for each a^r b a^v ∈ C1, the word a^r b a^{v+i} belongs to a column in B1 for some i ∈ I (since B1 is an arrangement of C1 a^I), and so all the elements of a^r b a^{v+I} are in a word-column T'_{n_q,w} in B1, in view of condition (3) in Proposition 7.1. Thus, there exist p, v_{p,q}, i_{p,q} such that v = v_{p,q} and t_{n_p,n_q} = v_{p,q} + i_{p,q}. Since a* b a^{v+I} ∩ T'_{n_q,w} = a^r b a^{v+I}, we have a^r b a^v ∈ B and B is an arrangement of C1. Finally, when we consider B, we see that the induced arrangement of the rows is a set of rows in D and the induced arrangement of the columns is D'. Thus, B is a good arrangement of C1.
Suppose that (I, J) is a Krasner factorization of Zn and suppose that C1 has a good arrangement with (I, J) as a Krasner associated pair. A natural question which arises is whether the set C1 ∪ a^n is a code; partial results towards a positive answer to this question have been given in [12]. We end this section with an example which shows that the hypothesis of the existence of this special arrangement is necessary. Indeed, in Example 7.2 we point out that there exist sets C1 of words, with C1 ⊆ a*ba*, which are not codes but which have arrangements over a matrix such that for any row Rp and any column Tq, (Rp, Tq) is a Hajós factorization of Zn.

Example 7.2. Consider C1 = {b, aba, a^4 b a, a^5 b, a^4 b a^2, a^3 b a^3, b a^3, a^7 b a^2}. C1 is not a code since (b a^3)(a b a) = b (a^4 b a). Observe that n is uniquely defined by n = |C1|, and thus n = 8. We have two possible arrangements of C1 over a matrix such that for any row Rp and any column Tq, (Rp, Tq) is a Hajós factorization of Z8; they correspond to the chain 1 | 2 | 4 | 8 of divisors of 8. They are not good arrangements and are reported below:

         a^0 b a^0   a^1 b a^1   a^4 b a^1   a^5 b a^0
  C1 =
         a^4 b a^2   a^3 b a^3   a^0 b a^3   a^7 b a^2
with corresponding Krasner pair I = {0, 1, 4, 5}, J = {0, 2} and
         a^4 b a^1   a^0 b a^0
  C1 =   a^5 b a^0   a^1 b a^1
         a^7 b a^2   a^3 b a^3
         a^0 b a^3   a^4 b a^2
with corresponding Krasner pair I = {0, 4}, J = {0, 1, 2, 3}. We also observe that codes exist which have no (good) arrangement, namely X = {ba, ab, ba^2, a^3 b a^2}. We know that X + a^4 has no factorizing completion, but we do not know whether X + a^4 has a finite completion. If this finite completion existed, then it would be a counterexample to the factorization conjecture.
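Both claims of Example 7.2 can be verified mechanically. A sketch (Python; helper names ours) checking the Hajós property of the first arrangement and exhibiting the double factorization that shows C1 is not a code:

```python
def is_hajos_pair(R, T, n):
    # (R, T) is a Hajos factorization of Z_n iff the sums r + t are
    # pairwise distinct mod n and cover Z_n
    sums = [(r + t) % n for r in R for t in T]
    return len(sums) == n == len(set(sums))

# First arrangement of Example 7.2, entries as (left, right) exponents of a^r b a^v
arr = [[(0, 0), (1, 1), (4, 1), (5, 0)],
       [(4, 2), (3, 3), (0, 3), (7, 2)]]
rows = [{r for r, v in row} for row in arr]          # R_p
cols = [{v for r, v in col} for col in zip(*arr)]    # T_q
print(all(is_hajos_pair(R, T, 8) for R in rows for T in cols))  # True

# Nevertheless C1 is not a code: (b a^3)(a b a) = (b)(a^4 b a)
print("baaa" + "aba" == "b" + "aaaaba")  # True
```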
References

[1] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[2] J. Berstel, C. Reutenauer, Rational Series and Their Languages, EATCS Monographs on Theoretical Computer Science, Vol. 12, Springer, Berlin, 1988.
[3] J.M. Boë, Sur les codes factorisants, in: D. Perrin (Ed.), Théorie des Codes, LITP, 1979, pp. 1–8.
[4] J.M. Boë, Sur les codes synchronisants coupants, in: A. de Luca (Ed.), Non Commutative Structures in Algebra and Geometric Combinatorics, Quaderni della Ric. Sci. del C.N.R., Vol. 109, 1981, pp. 7–10.
[5] V. Bruyère, M. Latteux, Variable-length maximal codes, in: Proc. ICALP '96, Lecture Notes in Computer Science, Vol. 1099, Springer, 1996, pp. 24–47.
[6] C. De Felice, Construction of a family of finite maximal codes, Theoret. Comput. Sci. 63 (1989) 157–184.
[7] C. De Felice, A partial result about the factorization conjecture for finite variable-length codes, Discrete Math. 122 (1993) 137–152.
[8] C. De Felice, An application of Hajós factorizations to variable-length codes, Theoret. Comput. Sci. 164 (1996) 223–252.
[9] C. De Felice, On a property of the factorizing codes, Internat. J. Algebra Comput. (special issue dedicated to M.P. Schützenberger) 9 (1999) 325–345.
[10] C. De Felice, On some Schützenberger conjectures, Inform. Comput. 168 (2001) 144–155.
[11] C. De Felice, On a complete set of operations for factorizing codes, Theoret. Inform. Appl. (2005), to appear.
[12] C. De Felice, Solving inequalities with factorizing codes: part 1, manuscript, 2005.
[13] C. De Felice, C. Reutenauer, Solution partielle de la conjecture de factorisation des codes, C.R. Acad. Sci. Paris 302 (1986) 169–170.
[14] G. Hajós, Sur la factorisation des groupes abéliens, Časopis Pěst. Mat. Fys. 74 (1950) 157–162.
[15] M. Krasner, B. Ranulac, Sur une propriété des polynômes de la division du cercle, C.R. Acad. Sci. Paris 204 (1937) 397–399.
[16] N.H. Lam, Hajós factorizations and completion of codes, Theoret. Comput. Sci. 182 (1997) 245–256.
[17] D. Perrin, M.P. Schützenberger, Un problème élémentaire de la théorie de l'information, in: Théorie de l'Information, Colloques Internat. CNRS, Vol. 276, Cachan, 1977, pp. 249–260.
[18] D. Perrin, M.P. Schützenberger, A conjecture on sets of differences of integer pairs, J. Combin. Theory Ser. B 30 (1981) 91–93.
[19] A. Restivo, On codes having no finite completions, Discrete Math. 17 (1977) 309–316.
[20] A. Restivo, S. Salemi, T. Sportelli, Completing codes, RAIRO Inform. Théor. Appl. 23 (1989) 135–147.
[21] C. Reutenauer, Sulla fattorizzazione dei codici, Ricerche Mat. 32 (1983) 115–130.
[22] C. Reutenauer, Non commutative factorization of variable-length codes, J. Pure Appl. Algebra 36 (1985) 167–186.
[23] A.D. Sands, On the factorisation of finite abelian groups, Acta Math. Acad. Sci. Hungar. 8 (1957) 65–86.
[24] M.P. Schützenberger, Une théorie algébrique du codage, Séminaire Dubreil–Pisot 1955–56, exposé no. 15, 1955, 24 pp.
[25] M.P. Schützenberger, Codes à longueur variable, manuscript, 1965; reprinted in: D. Perrin (Ed.), Théorie des Codes, LITP, 1979, pp. 247–271.
[26] L. Zhang, C.K. Gu, Two classes of factorizing codes: (p, p)-codes and (4, 4)-codes, in: M. Ito, H. Jürgensen (Eds.), Words, Languages and Combinatorics II, World Scientific, Singapore, 1994, pp. 477–483.
Theoretical Computer Science 340 (2005) 257 – 272 www.elsevier.com/locate/tcs
Tile rewriting grammars and picture languages ☆

Stefano Crespi Reghizzi, Matteo Pradella∗

DEI - Politecnico di Milano and CNR IEIIT-MI, Piazza Leonardo da Vinci 32, I-20133 Milano, Italy
Abstract Tile rewriting grammars (TRG) are a new model for defining picture languages. A rewriting rule changes a homogeneous rectangular subpicture into an isometric one tiled with specified tiles. Derivation and language generation with TRG rules are similar to context-free grammars. A normal form and some closure properties are presented. We prove this model has greater generative capacity than the tiling systems of Giammarresi and Restivo and the grammars of Matz, another generalization of context-free string grammars to 2D. Examples are shown for pictures made by nested frames and spirals. © 2005 Elsevier B.V. All rights reserved. Keywords: Picture languages; 2D languages; Tiling systems; Context-free grammars; Locally testable languages
1. Introduction

In the past, several proposals have been made for applying the generative grammar approach to picture (or 2D) languages, but in our opinion none of them matches the elegance and descriptive adequacy that made context-free (CF) grammars so successful for string languages. A picture is a rectangular array of terminal symbols (the pixels). A survey of formal models for picture languages is [3], where different approaches are compared and related: tiling systems, cellular automata, and grammars. The latter had been
☆ A preliminary version is [6]. Work partially supported by MIUR, Progetto "Linguaggi formali e automi, teoria e applicazioni".
∗ Corresponding author. Tel.: +39 02 2399 3495; fax: +39 02 2399 3666.
E-mail addresses:
[email protected] (S. Crespi Reghizzi),
[email protected] (M. Pradella). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.041
S. Crespi Reghizzi, M. Pradella / Theoretical Computer Science 340 (2005) 257 – 272
surveyed in more detail by Siromoney [7]. Classical 2D grammars can be grouped into two categories 1 called matrix and array grammars. The array grammars, introduced by Rosenfeld, impose the constraint that the left and right parts of a rewriting rule must be isometric arrays; this condition overcomes the inherent problem of “shearing” which pops up while substituting a subarray in a host array. Siromoney’s matrix grammars are parallel-sequential in nature, in the sense that first a horizontal string of nonterminals is derived sequentially, using the horizontal productions; and then the vertical derivations proceed in parallel, applying a set of vertical productions. Several variations have been made, for instance [1]. A particular case is the 2D right-linear grammars in [3]. Matz’s context-free picture grammars [5] rely on the notion of row and column concatenation and their closures. A rule is like a string CF one, but the right part is a 2D regular expression. The shearing problem is avoided because, say, row concatenation is a partial operation which is only defined on pictures of identical width. Exploring a different course, our new model, tile rewriting grammar (TRG), intuitively combines Rosenfeld’s isometric rewriting rules with the tiling system (TS) of Giammarresi and Restivo [2]. The latter defines the family of recognizable 2D languages (the same accepted by on-line tessellation automata of Inoue and Nakamura [4]). A TRG rule is a schema having a nonterminal symbol to the left and a local 2D language to the right over terminals and nonterminals; that is the right part is specified by a set of fixed size tiles. As in matrix grammars, the shearing problem is avoided by an isometric constraint, but the size of a TRG rule need not be fixed. The left part denotes any rectangle filled with the same nonterminal. Whatever size the left part takes, the same size is assigned to the right part. 
To make this idea effective, we impose a tree partial order on the areas which are rewritten. A progressively refined equivalence relation implements the partial ordering. Derivations can then be visualized in 3D as well-nested prisms, the analogue of the syntax trees of string grammars. To our knowledge, this approach is novel and is able to generate an interesting gamut of pictures: grids, spirals, and in particular a language of nested frames, which is in some way the analogue of a Dyck language. Section 2 lists the basic definitions. Section 3 presents the definition of TRG grammars and derivations, gives two examples, and proves the basic properties of the model: canonical derivation, uselessness of concave rules, normal forms, closures for some operations. Section 4 compares TRG with other models, proving that its generative capacity exceeds that of TS and of Matz's CF picture grammars. The appendix contains the grammar of Archimedean spirals.
2. Basic definitions

Most of the following notations and definitions are from [3].
1 Leaving aside the graph grammar models because they generate graphs, not 2D matrices.
Definition 1. For a finite alphabet Σ, the set of pictures is Σ**. For h, k ≥ 1, Σ^(h,k) denotes the set of pictures of size (h, k) (we will use the notation |p| = (h, k), |p|_row = h, |p|_col = k). The symbol #, not in Σ, is used when needed as a boundary symbol; p̂ refers to the bordered version of picture p. That is, for p ∈ Σ^(h,k),

        p(1,1) ... p(1,k)              #  #      ...  #      #
  p =     ...       ...          p̂ =  #  p(1,1) ...  p(1,k) #
        p(h,1) ... p(h,k)              #   ...   ...   ...   #
                                       #  p(h,1) ...  p(h,k) #
                                       #  #      ...  #      #
A pixel is an element p(i, j). If all pixels are identical to C ∈ Σ, the picture is called homogeneous and denoted as a C-picture. Row and column concatenations are denoted ⊖ and ⦶, respectively. p ⊖ q is defined iff p and q have the same number of columns; the resulting picture is the vertical juxtaposition of p over q. p^{⊖k} is the vertical juxtaposition of k copies of p; p^{⊖*} is the corresponding closure. ⦶, p^{⦶k}, p^{⦶*} are the column analogues. The pixel-by-pixel cartesian product (written p ⊗ q) is defined iff |p| = |q| and is such that, for all i, j, (p ⊗ q)(i, j) = ⟨p(i, j), q(i, j)⟩.

Definition 2. Let p be a picture of size (h, k). A subpicture of p at position (i, j) is a picture q such that, if (h', k') is the size of q, then h' ≤ h, k' ≤ k, and there exist integers i, j (with i ≤ h − h' + 1, j ≤ k − k' + 1) such that q(i', j') = p(i + i' − 1, j + j' − 1) for all 1 ≤ i' ≤ h', 1 ≤ j' ≤ k'. We write q ⊑_(i,j) p, or use the shortcut q ⊑ p ≡ ∃i, j (q ⊑_(i,j) p). Moreover, if q ⊑_(i,j) p, we define coor_(i,j)(q, p) as the set of coordinates of p where q is located: coor_(i,j)(q, p) = {(x, y) | i ≤ x ≤ i + |q|_row − 1 ∧ j ≤ y ≤ j + |q|_col − 1}. Conventionally, coor_(i,j)(q, p) = ∅ if q is not a subpicture of p. If q coincides with p, we write coor(p) instead of coor_(1,1)(p, p).
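Encoding a picture as a tuple of equal-length strings, the operations of Definitions 1 and 2 can be sketched as follows (Python; function names are ours):

```python
def row_cat(p, q):
    """p over q; defined only when the widths agree (Definition 1)."""
    assert len(p[0]) == len(q[0]), "row concatenation needs equal width"
    return p + q

def col_cat(p, q):
    """p beside q; defined only when the heights agree."""
    assert len(p) == len(q), "column concatenation needs equal height"
    return tuple(rp + rq for rp, rq in zip(p, q))

def is_subpicture_at(q, p, i, j):
    """q occurs in p at position (i, j), 1-based as in Definition 2."""
    h, k = len(q), len(q[0])
    if i + h - 1 > len(p) or j + k - 1 > len(p[0]):
        return False
    return all(q[x][y] == p[i - 1 + x][j - 1 + y]
               for x in range(h) for y in range(k))

p = row_cat(("ab",), ("cd",))                  # ("ab", "cd")
print(col_cat(p, p))                           # ("abab", "cdcd")
print(is_subpicture_at(("b", "d"), p, 1, 2))   # True
```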
Definition 3. Let π be an equivalence relation on coor(p), written (x, y) ∼_π (x', y'). Two subpictures q ⊑_(i,j) p, q' ⊑_(i',j') p are π-equivalent, written q ∼_π q', iff for all pairs (x, y) ∈ coor_(i,j)(q, p) and (x', y') ∈ coor_(i',j')(q', p) it holds (x, y) ∼_π (x', y').
A homogeneous C-subpicture q ⊑ p is called maximal with respect to relation π iff for every π-equivalent C-subpicture q' we have coor(q, p) ∩ coor(q', p) = ∅ ∨ coor(q', p) ⊆ coor(q, p). In other words, q is maximal if any C-subpicture which is π-equivalent to q either is a subpicture of q or does not overlap it. 2

2 Maximality as used in [6] is different: it corresponds to the condition coor(q, p) ⊇ coor(q', p).
Definition 4. For a picture p ∈ Σ** the set of subpictures (or tiles) of size (h, k) is B_{h,k}(p) = {q ∈ Σ^(h,k) | q ⊑ p}. We assume B_{1,k} to be defined only on Σ^(1,*) (horizontal strings), and B_{h,1} only on Σ^(*,1) (vertical strings). For brevity, for tiles of size (1, 2), (2, 1), or (2, 2), we introduce the following notation:

  [[p]] = B_{1,2}(p)  if |p| = (1, k), k > 1,
          B_{2,1}(p)  if |p| = (h, 1), h > 1,
          B_{2,2}(p)  if |p| = (h, k), h, k > 1.

Definition 5. Consider a set of tiles Θ ⊆ Σ^(i,j). The locally testable language in the strict sense defined by Θ (written LOC_u(Θ) 3) is the set of pictures p ∈ Σ** such that B_{i,j}(p) ⊆ Θ. The locally testable language defined by a finite set of tile sets (written LOC_{u,eq}({Θ1, Θ2, ..., Θn}) 4) is the set of pictures p ∈ Σ** such that, for some k, B_{i,j}(p) = Θk. The bordered locally testable language defined by a finite set of tile sets (written LOC_eq({Θ1, Θ2, ..., Θn})) is the set of pictures p ∈ Σ** such that, for some k, B_{i,j}(p̂) = Θk.

Definition 6 (Substitution). If p, q, q' are pictures, q ⊑_(i,j) p, and q, q' have the same size, then p[q'/q]_(i,j) denotes the picture obtained by replacing the occurrence of q at position (i, j) in p with q'.

Definition 7. The (vertical) mirror image and the (clockwise) rotation of a picture p (with |p| = (h, k)), respectively, are defined as follows:

              p(h,1) ... p(h,k)            p(h,1) ... p(1,1)
  Mirror(p) =   ...       ...    ,  p^R =    ...       ...
              p(1,1) ... p(1,k)            p(h,k) ... p(1,k)

Note that the sizes of Mirror(p) and p^R are, respectively, (h, k) and (k, h).

3. Tile rewriting grammars

The main definition follows.

Definition 8. A Tile Rewriting Grammar (TRG, in short grammar) is a tuple (Σ, N, S, R), where Σ is the terminal alphabet, N is a set of nonterminal symbols, S ∈ N is the starting symbol, and R is a set of rules. R may contain two kinds of rules:

  Fixed size: A → t, where A ∈ N, t ∈ (Σ ∪ N)^(h,k), with h, k > 0;
  Variable size: A → Θ, where A ∈ N, Θ ⊆ (Σ ∪ N)^(h,k), with 1 ≤ h, k ≤ 2.

3 To avoid confusion with LOC defined in [3], we mark these with "u" (which stands for unbordered, because they do not use boundary symbols).
4 eq stands for equality test.
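The tile sets B_{h,k}, the mirror image, and the clockwise rotation of Definitions 4 and 7 admit a direct sketch (Python; pictures as tuples of strings, function names ours):

```python
def tiles(p, h, k):
    """B_{h,k}(p): the set of (h, k)-subpictures of p (Definition 4)."""
    H, K = len(p), len(p[0])
    return {tuple(p[i + x][j:j + k] for x in range(h))
            for i in range(H - h + 1) for j in range(K - k + 1)}

def mirror(p):
    """Vertical mirror image (Definition 7): reverse the order of the rows."""
    return tuple(reversed(p))

def rotate(p):
    """Clockwise rotation p^R: entry (i, j) of p^R is p(h - j + 1, i)."""
    return tuple("".join(row[j] for row in reversed(p)) for j in range(len(p[0])))

p = ("ab",
     "cd")
print(sorted(tiles(p, 2, 2)))   # [('ab', 'cd')]
print(mirror(p))                # ('cd', 'ab')
print(rotate(p))                # ('ca', 'db')
```

Note that, as stated above, rotating a picture of size (h, k) yields one of size (k, h).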
S. Crespi Reghizzi, M. Pradella / Theoretical Computer Science 340 (2005) 257 – 272
Intuitively, a fixed size rule is intended to match a subpicture of (small) bounded size, identical to the right part t. A variable size rule matches any subpicture of any size which can be tiled using all the elements of the tile set ω. However, fixed size rules are not a special case of variable size rules.

Definition 9. Consider a grammar G = (Σ, N, S, R), let p, p′ ∈ (Σ ∪ N)^(h,k) be pictures of identical size, and let π, π′ be equivalence relations over coor(p). We say that (p′, π′) is derived in one step from (p, π), written (p, π) ⇒G (p′, π′), iff for some A ∈ N and for some rule ρ : A → … ∈ R there exists in p an A-subpicture r ⊑_(m,n) p, maximal with respect to π, such that:
• p′ is obtained by substituting r with a picture s, i.e. p′ = p[s/r]_(m,n), where s is defined as follows:
  Fixed size: if ρ = A → t, then s = t;
  Variable size: if ρ = A → ω, then s ∈ LOC_{u,eq}(ω).
• Let z be coor_(m,n)(r, p), and let γ be the π-equivalence class containing z. Then π′ is equal to π on all the equivalence classes δ ≠ γ; in π′, γ is divided into two equivalence classes, z and its complement with respect to γ (= ∅ if z = γ). More formally,

π′ = π \ {((x1, y1), (x2, y2)) | (x1, y1) ∈ z xor (x2, y2) ∈ z}.

The subpicture r is named the application area of rule ρ in the derivation step. We say that (q, σ) is derivable from (p, π) in n steps, written (p, π) ⇒ⁿG (q, σ), iff p = q and π = σ when n = 0, or there are a picture r and an equivalence relation τ such that (p, π) ⇒^(n−1)G (r, τ) and (r, τ) ⇒G (q, σ). We use the abbreviation (p, π) ⇒∗G (q, σ) for a derivation with n ≥ 0 steps.

Definition 10. The picture language defined by a grammar G (written L(G)) is the set of p ∈ Σ** such that, if |p| = (h, k), then

(S^(h,k), coor(p) × coor(p)) ⇒∗G (p, π),   (1)

where the relation π is arbitrary. For short we write S ⇒∗G p.

Note that the derivation starts with an S-picture isometric with the terminal picture to be generated, and with the universal equivalence relation over the coordinates. The equivalence relations computed by each step of (1) are called geminal relations. When writing examples by hand, it is convenient to visualize the equivalence classes of a geminal relation by appending the same numerical subscript to the pixels of the application area rewritten by a derivation step. The final equivalence classes represent, in some sense, a 2D generalization of the parenthesis structure that parenthesized context-free string grammars assign to a sentence.
Example 11 (Chinese boxes). G = (Σ, N, S, R), where Σ = {⌜, ⌝, ⌞, ⌟, ◦}, N = {S}, and R consists of one fixed size and one variable size rule:

S → ⌜ ⌝
    ⌞ ⌟  ;

S → { ⌜ ◦   ◦ ◦   ◦ ⌝   ◦ S   S S   S ◦   ◦ S   S S   S ◦
      ◦ S , S S , S ◦ , ◦ S , S S , S ◦ , ⌞ ◦ , ◦ ◦ , ◦ ⌟ } .

For brevity and readability, we will often specify a set of tiles by a sample picture exhibiting the tiles as its subpictures. We write | to separate alternative right parts of rules with the same left part (analogously to string grammars). The previous grammar becomes

S → ⌜ ◦ ◦ ⌝       ⌜ ⌝
    ◦ S S ◦   |
    ◦ S S ◦       ⌞ ⌟ .
    ⌞ ◦ ◦ ⌟

A picture in L(G) is
⌜ ◦ ◦ ◦ ◦ ⌝
◦ ⌜ ◦ ◦ ⌝ ◦
◦ ◦ ⌜ ⌝ ◦ ◦
◦ ◦ ⌞ ⌟ ◦ ◦
◦ ⌞ ◦ ◦ ⌟ ◦
⌞ ◦ ◦ ◦ ◦ ⌟
and is obtained by applying the variable size rule twice and then the fixed size rule. We show a complete derivation for a more general version of this language in the following example.

Example 12 (2D Dyck analogue). The next language Lbox, a superset of Chinese boxes, can be defined by a sort of blanking rule. But since terminals cannot be deleted without shearing the picture, we replace them with a character b (blank or background).

Empty frame: Let k ≥ 0. An empty frame is a picture defined by the regular expression

(⌜ ⦶ ◦^k ⦶ ⌝) ⊖ (◦ ⦶ b^k ⦶ ◦)^⊖k ⊖ (⌞ ⦶ ◦^k ⦶ ⌟),

i.e. a box bordered by ◦ (with corner symbols), containing just b's; ⦶ and ⊖ denote column and row concatenation.

Blanking: The blanking of an empty frame p is the picture del(p) obtained by applying the projection del(x) = b, x ∈ Σ ∪ {b}.

A picture p is in Lbox iff, by repeatedly applying del to subpictures which are empty frames, an empty frame is obtained. To obtain the grammar, we add the following rules to the Chinese boxes grammar:

S → S S       X X
    S S   |   X X   ,        X → S S
    X X       S S                 S S .
    X X       S S

To illustrate, in Fig. 1 we list the derivation steps of a picture. Nonterminals in the same equivalence class are marked with the same subscript. Although this language can be viewed as a 2D analogue of a Dyck string language, variations are possible and we do not claim the same algebraic properties as in 1D.
Fig. 1. Example derivation with marked application areas.
3.1. Basic properties

The next two statements, which follow immediately from Definitions 3 and 9, may be viewed as a 2D formulation of well-known properties of 1D CF derivations. Let p1 ⇒ ⋯ ⇒ p_(n+1) be a derivation, and r1 ⊑_(i1,j1) p1, …, rn ⊑_(in,jn) pn the corresponding application areas.

Disjointness of application areas: For any pf, pg, f < g, one of the following holds:
(1) coor_(ig,jg)(rg, pg) ⊆ coor_(if,jf)(rf, pf);
(2) coor_(if,jf)(rf, pf) ∩ coor_(ig,jg)(rg, pg) = ∅.
That is, the application area of a later step is either totally placed within the application area of a previous step, or it does not overlap it. As a consequence, a derivation can be represented in 3D as a well-nested forest of rectangular prisms, the analogue of derivation trees of string languages.

Canonical derivation: The previous derivation is lexicographic iff f < g implies (if, jf) ≤lex (ig, jg) (where ≤lex is the usual lexicographic order). Then the following result holds:

L(G) ≡ {p | S ⇒∗G p and ⇒∗G is a lexicographic derivation}.
Definition 13. A rule ρ of a grammar G is useful if there exists a derivation S ⇒∗G p ∈ Σ** which makes use of ρ at some step; otherwise ρ is called useless.

Definition 14. Consider a grammar G = (Σ, N, S, R). A variable size rule A → ω is called concave iff ω contains an element of the following set:

{ x A   A x   A A   A A
  A A , A A , x A , A x } ,

where A ∈ N, x ∈ N ∪ Σ, x ≠ A.

Theorem 15. A concave rule is useless.

Proof. By contradiction: if A → ω, a concave rule, is used in a derivation, then LOC_{u,eq} in Definition 9 compels the use of every tile in ω. But concave tiles generate pictures having a concave area filled with the same nonterminal, say A, and the geminal relation updated by the derivation step is such that this whole area is in the same equivalence class. But Definition 3 then makes it impossible to find, at subsequent steps, an A-subpicture which is maximal with respect to the geminal relation; hence the derivation fails to produce a terminal picture.

A useful grammar transformation consists of moving terminal symbols to fixed size rules.
Definition 16. A grammar G is in terminal normal form iff the only rules with terminals have the form A → x, x ∈ Σ, i.e. they are unitary rules.

Theorem 17. Every grammar G = (Σ, N, S, R) has an equivalent grammar G′ = (Σ, N′, S, R′) in terminal normal form.

Proof. To construct G′, we eliminate terminals from variable size rules and from nonunitary fixed size rules. N′ contains N, and for every terminal a we have in N′ two nonterminals ⟨a, 0⟩ and ⟨a, 1⟩. The idea is to replace every homogeneous a-subpicture with a chequered area of ⟨a, 0⟩ and ⟨a, 1⟩, in which every application area has size (1, 1).

Let Ch0^(m,n) (Ch1^(m,n), respectively) be a chequerboard of size (m, n) made of 0 and 1 symbols, starting with a 0 (1, resp.) at the top-leftmost position. Let η : (Σ ∪ N) × {0, 1} → N′ be the projection defined as η(a, k) = ⟨a, k⟩ if a ∈ Σ, and η(A, k) = A if A ∈ N. The mapping Chequer : P((Σ ∪ N)^(m,n)) → P((N′)^(m,n)) is defined by

Chequer(ω) = { η(t ⊗ t′) | t ∈ ω ∧ t′ ∈ {Ch0^|t|, Ch1^|t|} }.

Then, for every variable size rule X → ω in G, the following rules are in G′:

X → ω′, for every ω′ ⊆ Chequer(ω) such that Chequer⁻¹(ω′) = ω.

For every nonunitary fixed size rule X → t, the rule X → η(t ⊗ Ch0^|t|) is in G′. Moreover, the unitary fixed size rules ⟨a, 0⟩ → a, ⟨a, 1⟩ → a are in G′. G′ is by construction in terminal normal form.
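The chequerboards Ch0 and Ch1 used in this construction are simple to generate; a minimal sketch (the function name is ours, not the paper's):

```python
def chequerboard(m, n, start=0):
    """Ch_start of size (m, n): cell (i, j) holds start XOR ((i + j) mod 2),
    so the top-left cell is `start` and adjacent cells always differ."""
    return [[(start + i + j) % 2 for j in range(n)] for i in range(m)]

assert chequerboard(2, 3, 0) == [[0, 1, 0], [1, 0, 1]]   # Ch0
assert chequerboard(2, 3, 1) == [[1, 0, 1], [0, 1, 0]]   # Ch1
```

The key property exploited by the proof is visible here: no two horizontally or vertically adjacent cells carry the same symbol, so every application area in the chequered region has size (1, 1).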
By construction, the rules in G′ maintain the same structure and applicability as the rules of G, as far as nonterminals in N are concerned. The only difference resides in the derived terminal subpictures, which are replaced in G′ by chequered subpictures made of new nonterminals; these maintain information about the terminal symbol originally derivable in G in the same area. The chequered structure of these subpictures admits only unitary application areas. Therefore, starting from these subpictures, and using the unitary terminal rules introduced in R′, it is always possible to derive homogeneous terminal subpictures, identical to those derivable from G.

Example 18 (Terminal normal form of Example 11). It is possible to obtain the equivalent terminal normal form grammar by using the construction presented in Theorem 17. For ease of reading, we write the nonterminals ⟨a, k⟩, a ∈ Σ, k ∈ {0, 1}, as a_k. The resulting grammar (without useless rules) is the following:

S → ⌜0 ◦1 ◦0 ⌝1       ⌜1 ◦0 ◦1 ⌝0       ⌜0 ⌝1
    ◦1 S  S  ◦0       ◦0 S  S  ◦1   |
    ◦0 S  S  ◦1   |   ◦1 S  S  ◦0       ⌞1 ⌟0
    ⌞1 ◦0 ◦1 ⌟0       ⌞0 ◦1 ◦0 ⌟1
together with the unitary rules x0 → x and x1 → x for every terminal x ∈ Σ (in particular, ◦0 → ◦ and ◦1 → ◦).

3.2. Closure properties

For simplicity, in the following theorem we suppose that L(G1), L(G2) contain pictures of size at least (2, 2).

Theorem 19. The family L(TRG) is closed under union, column/row concatenation, column/row closure operations, rotation, and alphabetical mapping (or projection).

Proof. Consider two grammars G1 = (Σ, N1, A, R1) and G2 = (Σ, N2, B, R2). Suppose for simplicity that N1 ∩ N2 = ∅, S ∉ N1 ∪ N2, and that G1, G2 generate pictures having size at least (2, 2). Then it is easy to show that, in each case below, the grammar G = (Σ, N1 ∪ N2 ∪ {S}, S, R1 ∪ R2 ∪ R) has the stated language.

Union ∪:

R = { S → A A   ,   S → B B }
          A A           B B

is such that L(G) = L(G1) ∪ L(G2).

Concatenation ⦶/⊖:

R = { S → A A B B }
          A A B B

is such that L(G) = L(G1) ⦶ L(G2). The row concatenation case is analogous.
Closures ∗⦶/∗⊖: G = (Σ, N1 ∪ {S}, S, R1 ∪ R), where

R = { S → A A S S   |   A A }
          A A S S       A A

is such that L(G) = L(G1)^∗⦶. The row closure case is analogous.

Rotation R: Construct the grammar G = (Σ, N1, A, R′), where R′ is such that, if B → t ∈ R1 is a fixed size rule, then B → t^R is in R′; and if B → ω ∈ R1 is a variable size rule, then B → ω′ is in R′, with t ∈ ω implying t^R ∈ ω′. It is easy to verify that L(G) = L(G1)^R.

Projection π: Without loss of generality, we suppose G1 is in terminal normal form (Theorem 17). Consider a projection π : Σ1 → Σ2. It is immediate to build a grammar G′ = (Σ2, N1, A, R2) such that L(G′) = π(L(G1)): simply apply π to the unitary rules. That is, if X → x ∈ R1, then X → π(x) ∈ R2, while the other rules of G1 remain in R2 unchanged.

4. Comparison with other models

We first compare with CF string grammars, then with TS, and finally with Matz's 2D CF grammars.

4.1. String grammars

If in Definition 8 we choose h = 1, then a TRG defines a string language. Such 1D TRGs are easily proved to be equivalent to CF string grammars. 5 In fact, the TRG model for string languages is tantamount to a notational variant [6] of classical CF grammars, where the right parts of rules are local languages.

4.2. Tiling systems and 2D CF grammars

The next comparison has to face two technical difficulties: TS are defined by local languages with boundary symbols, which are not present in TRG, and the test of which tiles are present uses inclusion in TS, equality in TRG. First we prove that a class of local languages is strictly included in L(TRG).

Lemma 20. L(LOC_{u,eq}) ⊆ L(TRG).

Proof. Consider a local 2D language over Σ defined (without boundaries) by the set of sets of allowed tiles {ϑ1, ϑ2, …, ϑn}, ϑi ⊆ Σ^(2,2). An equivalent grammar is S → ϑ1 | ϑ2 | … | ϑn.

5 However, the empty string cannot be generated by a 1D TRG.
To simplify the comparison with TS, we reformulate them using the terms of Definition 5, showing their equivalence. Then we prove strict inclusion with respect to TRG. First we recall the original definition.

Definition 21 (Giammarresi and Restivo [3, Definition 7.2]). A tiling system (TS) is a 4-tuple T = (Σ, Γ, ϑ, π), where Σ and Γ are two finite alphabets, ϑ is a finite set of tiles over the alphabet Γ ∪ {#}, and π : Γ → Σ is a projection; the language defined by T is the projection by π of the local language defined by ϑ. (1)

Definition 22. The tiling systems TS_eq and TS_{u,eq} are defined as a TS, with the following respective changes:
• Replace the local language defined by (1) with LOC_eq({ϑ1, ϑ2, …, ϑn}), where each ϑi is a finite set of tiles over Γ.
• Replace the local language defined by (1) with LOC_{u,eq}({ϑ1, ϑ2, …, ϑn}), where each ϑi is a finite set of tiles over Γ. In TS_{u,eq} there is no boundary symbol #.

Lemma 23. L(TS_eq) ≡ L(TS).

Proof. First, L(TS) ⊆ L(TS_eq). This is easy because, if we consider the tile set ϑ of a TS, by taking {ϑ1, ϑ2, …, ϑn} = P(ϑ) (the powerset) we obtain an equivalent TS_eq. Second, we have to prove that L(TS_eq) ⊆ L(TS). In [3], the family of languages L(LOC_eq(Θ)), where Θ is a set of sets of tiles, is proved to be a proper subset of L(TS) (Theorem 7.8). But L(TS) is closed with respect to projection, and L(TS_eq) is the closure with respect to projection of L(LOC_eq(Θ)). Therefore, L(TS_eq) ⊆ L(TS).

Next we prove that boundary symbols can be removed.

Lemma 24. L(TS_{u,eq}) ≡ L(TS_eq).

Proof (Sketch). Part L(TS_eq) ⊆ L(TS_{u,eq}): Let T = (Σ, Γ, {ϑ1, ϑ2, …, ϑn}, π) be a TS_eq. For every tile set ϑi, separate the tiles containing the boundary symbol # (call this subset ϑ′i) from the other tiles (ϑ″i). That is, ϑi = ϑ′i ∪ ϑ″i. Introduce a new alphabet Γ′ and a bijective mapping br : Γ → Γ′. We use the symbols in Γ′ to encode the boundary, and new tile sets Δi to contain them: for every tile t in ϑ″i, if there is a tile in ϑ′i which overlaps with t, then encode this boundary in a new tile t′ and put it in the set Δi. For example, suppose

a b   ∈ ϑ″1
c d

overlaps with

# #   ∈ ϑ′1     and with     d #   ∈ ϑ′1;
a b                          # #

then both

br(a) br(b)        and        a     br(b)
c     d                       br(c) br(d)

are in Δ1. Consider a TS_{u,eq} T′ = (Σ, Γ ∪ Γ′, Θ, π′), where π′ extends π to Γ′ as follows: π′(br(a)) = π′(a) = π(a), a ∈ Γ; ubr : Γ ∪ Γ′ → Γ is defined as ubr(a) = br⁻¹(a) if a ∈ Γ′, and ubr(a) = a otherwise, and it is naturally extended to tiles and tile sets. Θ is the set

{ ϑ | ϑ ⊆ ϑ″i ∪ Δi ∧ ubr(ϑ) = ϑ″i ∧ ϑ ∩ Δi ≠ ∅ ∧ 1 ≤ i ≤ n }.

The proof that L(T) = L(T′) is straightforward and is omitted.

Part L(TS_{u,eq}) ⊆ L(TS_eq): Let T = (Σ, Γ, {ϑ1, ϑ2, …, ϑn}, π) be a TS_{u,eq}. To construct an equivalent TS_eq, we introduce the boundary tile sets Δi, defined as follows. For every tile

a b   ∈ ϑi,
c d

the following tiles are in Δi:

# #   # #   # #   # a   b #   # c   c d   d #
# a , a b , b # , # c , d # , # # , # # , # # .

Consider a TS_eq T′ = (Σ, Γ, Θ, π), where Θ is the set

{ ϑi ∪ ϑ | ϑ ⊆ Δi ∧ ϑ ≠ ∅ ∧ 1 ≤ i ≤ n }.

It is easy to show that L(T) = L(T′).
Example 7.2 of [3], the language of squares over the alphabet {a}, is defined by the following TS_{u,eq}:

ϑ1 = 1 0 0 0       ϑ2 = 1 0 0       ϑ3 = 1 0
     0 2 0 0            0 2 0            0 3 ,
     0 0 2 0            0 0 3
     0 0 0 3

π(0) = π(1) = π(2) = π(3) = a.

Theorem 25. L(TS) ⊆ L(TRG).

Proof. It follows from Theorems 19, 20, 23 and 24, and the fact that L(TS_{u,eq}) is the closure of L(LOC_{u,eq}) with respect to projection.
The following strict inclusion is an immediate consequence of the fact that, for 1D languages, L(T S) ⊂ L(CF ), and L(TRG) = L(CF ) \ { }. But we prefer to prove it by exhibiting an interesting picture language, made by the vertical concatenation of two specularly symmetrical rectangles. Theorem 26. L(T S) = L(TRG). Proof. Let = {a, b}. Consider the 2D language of palindromic columns, such as a b b a
b a a b
b b b b
L = {p | p = s Mirror(s) ∧ s ∈ (h,k) , h > 1, k 1}. Consider the grammar G: X S S X S X S→ | | , X S S X S X a b X X a b | . X→ | | X X a b a b It is easy to see that L(G) = L. We prove by contradiction that L ∈ / L(T S). Suppose that L ∈ L(T S). Therefore L is a projection of a local language L defined over some alphabet . Let a = || and b = ||, with a b. For an integer n, let Ln = {p | p = s Mirror(s) ∧ |s| = (n, n)}. Clearly, |Ln | = a n . Let L n be the set of pictures in L over whose projections are in Ln . By choice of b and by construction of Ln there are at most bn possibilities for the nth and (n + 1)th rows in the pictures of L n , because this is the number of mirrored stripe pictures of size (2, n) over . 2 For n sufficiently large a n bn . Therefore, for such n, there will be two different pictures p = sp Mirror(sp ), q = sq Mirror(sq ) such that the corresponding p = sp sp
, q = sq sq
have the same nth and (n+1)th rows. This implies that, by definition of local language, pictures v = sp sq
, w = sq sp
belong to L n , too. Therefore, pictures (v ) = sp Mirror(sq ), and (w ) = sq Mirror(sp ) belong to Ln . But this is a contradiction. 2
We conclude by comparing with a different generalization of CF grammars to two dimensions, Matz's CF picture grammars (CFPG) [5], a model syntactically very similar to string CF grammars. The main difference is that the right parts of their rules use the ⦶, ⊖ concatenation operators. Nonterminals denote unbounded rectangular pictures. Derivation is analogous to
string grammars, but the resulting regular expression may or may not define a picture (e.g. a ⦶ (b ⊖ b) does not generate any picture).

Theorem 27. L(CFPG) ⊆ L(TRG).

Proof (Sketch). Consider a Matz's CFPG grammar in Chomsky normal form. It may contain three types of rules: A → B ⦶ C; A → B ⊖ C; A → a. Moreover, suppose that B ≠ C (this is always possible if we permit copy rules like A → B). Then A → B ⦶ C corresponds to the following TRG rules:

A → B B C C       B C C       B B C       B C
    B B C C   |   B C C   |   B B C   |   B C
  | B B C C   |   B C C   |   B B C   |   B C .

To obtain A → B, just delete C from the previous rules. The ⊖ case is analogous to ⦶, while A → a is trivial.

Theorem 28. L(CFPG) ≠ L(TRG).

Proof. It is a consequence of Theorems 25, 26, and 27, and the fact from [5] that L(TS) ⊄ L(CFPG).

An example of a TRG but not CFPG language is the following. We know from [5] that the "cross" language, which consists of two perpendicular b-lines on a background of a's, is not in L(CFPG). It is easy to show that the following grammar defines the language:

S → A A b B B
    A A b B B
    b b b b b  ,
    C C b D D
    C C b D D

A → a a   ,   B → a a   ,   C → a a   ,   D → a a   .
    a a           a a           a a           a a

The fine control on line connections provided by TRG rules allows the definition of complex recursive patterns, exemplified by the spirals presented in the appendix.

5. Conclusions

The new TRG model extends context-free string grammars to two dimensions. Each rule rewrites a homogeneous rectangle as an isometric one, tiled with a specified tile set. In a derivation, the rectangles rewritten at each step are partially ordered by the subpicture relation, which can be represented in three dimensions by a forest of well-nested prisms, the analogue of syntax trees for strings. Spirals and nested boxes are typical examples handled by TRG.
The generative capacity of TRG is greater than that of two previous models: TS and Matz's context-free picture grammars. Practical applicability to picture processing tasks (such as pattern recognition and image compression) remains to be investigated; it will ultimately depend on the expressive power of the new model and on the availability of good parsing algorithms. The analogy with string grammars raises, for the educated formal linguist, a variety of questions, such as the formulation of a pumping lemma. For comparison with other models, several questions may be considered, e.g. whether the TRG and TS families coincide on a unary alphabet, or the generative capacity of nonrecursive TRG versus TS.
Acknowledgements Antonio Restivo called our attention to the problem of “2D Dyck languages”. We thank Alessandra Cherubini, Pierluigi San Pietro, Alessandra Savelli, and Daniele Scarpazza for their comments.
Appendix A

Grammar for defining discrete Archimedean spirals with step 3. 6 [The rule displays for S and for the auxiliary nonterminals A, B, C, D, H, K, Q, V, W, over the alphabet {•, ·}, are omitted.]

6 By Daniele Paolo Scarpazza.
An example picture:
References [1] H. Fernau, R. Freund, Bounded parallelism in array grammars used for character recognition, in: P. Perner, P. Wang, A. Rosenfeld (Eds.), Advances in Structural and Syntactical Pattern Recognition (Proc. of the SSPR’96), Vol. 1121, Springer, Berlin, 1996, pp. 40–49. [2] D. Giammarresi, A. Restivo, Recognizable picture languages, Internat. J. Pattern Recogn. Artif. Intell. 6 (2–3) (1992) 241–256 (Special Issue on Parallel Image Processing). [3] D. Giammarresi, A. Restivo, Two-dimensional languages, in: A. Salomaa, G. Rozenberg (Eds.), Handbook of Formal Languages, Vol. 3, Beyond Words, Springer, Berlin, 1997, pp. 215–267. [4] K. Inoue, A. Nakamura, Some properties of two-dimensional on-line tessellation acceptors, Inform. Sci. 13 (1977) 95–121. [5] O. Matz, Regular expressions and context-free grammars for picture languages, in: Proc. of the 14th Annu. Symp. on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science, Vol. 1200, Lübeck, Germany, 27 February–1 March 1997, Springer, Berlin, pp. 283–294. [6] S. Crespi Reghizzi, M. Pradella, Tile rewriting grammars, in: Proc. of the Seventh Internat. Conf. on Developments in Language Theory (DLT 2003), Lecture Notes in Computer Science, Vol. 2710, Szeged, Hungary, July 2003, Springer, Berlin, pp. 206–217. [7] R. Siromoney, Advances in Array Languages, in: H. Ehrig, M. Nagl, G. Rozenberg, A. Rosenfeld (Eds.), Proc. of Third Internat. Workshop on Graph-Grammars and Their Application to Computer Science, Lecture Notes in Computer Science, Vol. 291, Springer, Berlin, 1987, pp. 549–563.
Theoretical Computer Science 340 (2005) 273 – 279 www.elsevier.com/locate/tcs
Counting bordered and primitive words with a fixed weight Tero Harjua,∗ , Dirk Nowotkab a Department of Mathematics, Turku Centre for Computer Science (TUCS), University of Turku,
FIN-20014 Turku, Finland b Institute of Formal Methods in Computer Science, University of Stuttgart, D-70569 Stuttgart, Germany
Abstract A word w is primitive if it is not a proper power of another word, and w is unbordered if it has no prefix that is also a suffix of w. We study the number of primitive and unbordered words w with a fixed weight, that is, words for which the Parikh vector of w is a fixed vector. Moreover, we estimate the number of words that have a unique border. © 2005 Elsevier B.V. All rights reserved. Keywords: Combinatorics on words; Borders; Primitive words; Möbius function
1. Introduction

Let w denote a finite word over some alphabet A. We say that w is bordered if there is a non-empty proper prefix x of w that is also a suffix of w. If there is no such x, then w is called unbordered. We say that w is primitive if w = x^k, for some k ∈ N, implies that k = 1 and x = w. We often assume that the alphabet is ordered, A = {a1, a2, …, aq}. In this case, for a word w ∈ A∗, let Φ(w) denote the Parikh vector (|w|a1, |w|a2, …, |w|aq) of w, where |w|a denotes the number of occurrences of the letter a in w. We also say that w has weight Φ(w). The number of primitive words and of unbordered words of a fixed length over an alphabet of a fixed size is well-known; see for example [1–5,7] and the sequences A027375, A003000,

∗ Corresponding author. Fax: +358 2 3336595.
E-mail addresses: harju@utu.fi (T. Harju),
[email protected] (D. Nowotka). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.040
T. Harju, D. Nowotka / Theoretical Computer Science 340 (2005) 273 – 279
A019308, and A019309 in Sloane's database of integer sequences [6]. We will recall these results with short arguments and extend them to the case where the words we consider have a fixed weight. Moreover, we estimate the number of words that have exactly one border. Section 2 contains results on counting the number of primitive words. Section 3 investigates the number of bordered words. Finally, we deal with the number of words with exactly one border in Section 4.

In the rest of this section we fix our notation. For more general definitions see [2]. Let A be a finite, non-empty set called an alphabet. The elements of A are called letters. A finite sequence of letters is called a (finite) word. Let A∗ denote the monoid of all finite words over A, where ε denotes the empty word. Let |w| denote the length of w, and let |w|a denote the number of occurrences of a in w, where a ∈ A. If w = uv, then u is called a prefix of w, denoted by u ≤p w, and v is called a suffix of w, denoted by v ≤s w. A word w is called bordered if there exist non-empty words x, y, and z such that w = xy = zx, and x is called a border of w. Let X be a set; then |X| denotes the cardinality of X.

The Möbius function μ : N → Z is defined as follows:

μ(n) = (−1)^t   if n = p1 p2 … pt for distinct primes pi,
μ(n) = 1        if n = 1,
μ(n) = 0        if n is divisible by a square.

The Möbius inversion formula for two functions f and g is given by:

g(n) = Σ_{d|n} f(d)   if and only if   f(n) = Σ_{d|n} μ(d) g(n/d).
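Both the Möbius function and the inversion formula can be checked computationally; a small Python sketch (ours, not the paper's, using trial division):

```python
def mobius(n):
    """Möbius function: mu(1) = 1, mu(n) = (-1)^t if n is a product of t
    distinct primes, and mu(n) = 0 if a square divides n."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # square factor found
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

# Sanity check of the inversion formula on an arbitrary function f.
f = lambda n: n * n
g = lambda n: sum(f(d) for d in divisors(n))      # g(n) = sum_{d|n} f(d)
assert all(f(n) == sum(mobius(d) * g(n // d) for d in divisors(n))
           for n in range(1, 40))
```

The final assertion is exactly the inversion formula above, verified for n up to 39.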
2. Primitive words

Let Pq(n) denote the number of primitive words of length n over an alphabet of size q. It is well-known, see for example [3,2] and the sequence A027375 in [6], that

Pq(n) = Σ_{d|n} μ(d) q^(n/d).   (1)

Indeed, let A with |A| = q be a finite alphabet. Every word w has a unique primitive root v for which w = v^d for some d|n, where n = |w|. Since there are exactly q^n words of length n,

q^n = Σ_{d|n} Pq(d).

We are in the divisor poset, where the Möbius inversion gives (1). In this paper we investigate the number of primitive words with a fixed weight, that is, where each letter has a fixed number of occurrences. Consider an ordered alphabet A = {a1, a2, …, aq}
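Formula (1) is easy to cross-check against a brute-force enumeration; a sketch in Python (helper names are ours):

```python
from itertools import product

def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def count_primitive(q, n):
    """P_q(n) = sum over d | n of mu(d) * q^(n/d), formula (1)."""
    return sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0)

def is_primitive(w):
    """w is primitive iff it is not a proper power x^k, k > 1."""
    return not any(len(w) % d == 0 and w == w[:d] * (len(w) // d)
                   for d in range(1, len(w)))

assert count_primitive(2, 6) == 54
assert all(count_primitive(2, n) ==
           sum(1 for t in product("ab", repeat=n) if is_primitive("".join(t)))
           for n in range(1, 9))
```

For instance, of the 64 binary words of length 6, the ten non-primitive ones (powers of a, b, ab, ba, and of the six primitive length-3 words) are excluded, leaving 54.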
of q ≥ 1 letters. For a word w ∈ A∗, let Φ(w) denote (|w|a1, |w|a2, …, |w|aq), which is called the Parikh vector of w. For a given vector k = (k1, k2, …, kq), let

𝒫(k) = {w | w primitive and Φ(w) = k},

and let P(k) = |𝒫(k)|. Clearly, if w ∈ 𝒫(k), then |w| = Σ_{i=1}^q ki. Also, denote by gcd(k) the greatest common divisor of the components ki. If d | gcd(k), then denote k/d = (k1/d, k2/d, …, kq/d). The multinomial coefficients under consideration are

(n over k) = (n over k1, k2, …, kq) = n! / (k1! k2! ⋯ kq!),

where n = Σ_{i=1}^q ki.
Theorem 1. Let k = (k1, k2, …, kq) be a vector with n = Σ_{i=1}^q ki. Then

P(k) = Σ_{d | gcd(k)} μ(d) (n/d over k/d).
Proof. We use the principle of inclusion and exclusion to prove our claim. Let the distinct prime divisors of gcd(k) be p1, p2, …, pt. For an integer d | gcd(k), define

Qd = {w | w = u^d where Φ(u) = k/d}.

If w ∈ Qd, then Φ(w) = k. Clearly, |Qd| equals the number of all words u, primitive and imprimitive alike, of length n/d such that u has the Parikh vector k/d. Therefore,

|Qd| = (n/d over k/d).   (2)

Notice also that if d|e, then Qe ⊆ Qd, and hence

I(k) = ∪_{i=1}^t Q_{pi}   (3)

is the set of all imprimitive words of length n with Parikh vector k. By the principle of inclusion and exclusion, we then have

|∪_{i=1}^t Q_{pi}| = Σ_{∅ ≠ Y ⊆ [1,t]} (−1)^(|Y|−1) |∩_{i∈Y} Q_{pi}|,   (4)

where ∩_{i∈Y} Q_{pi} = Q_{p(Y)} for p(Y) = Π_{i∈Y} pi. Hence, by (2),

|I(k)| = Σ_{∅ ≠ Y ⊆ [1,t]} (−1)^(|Y|−1) |Q_{p(Y)}|
       = − Σ_{∅ ≠ Y ⊆ [1,t]} (−1)^|Y| (n/p(Y) over k/p(Y))
       = − Σ_{d | gcd(k), d > 1} μ(d) (n/d over k/d),

by the definition of the Möbius function μ. This proves the claim, because

P(k) = (n over k) − |I(k)|.
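Theorem 1 can likewise be checked by enumeration for small Parikh vectors; a sketch under the same conventions (function names are ours):

```python
from functools import reduce
from itertools import permutations
from math import factorial, gcd

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def multinomial(ks):
    """n! / (k1! k2! ... kq!) with n = sum of ks."""
    r = factorial(sum(ks))
    for k in ks:
        r //= factorial(k)
    return r

def P(ks):
    """Theorem 1: number of primitive words with Parikh vector ks."""
    g = reduce(gcd, ks)
    return sum(mobius(d) * multinomial(tuple(k // d for k in ks))
               for d in range(1, g + 1) if g % d == 0)

def is_primitive(w):
    return not any(len(w) % d == 0 and w == w[:d] * (len(w) // d)
                   for d in range(1, len(w)))

# Brute force: primitive binary words with two a's and two b's.
words = {"".join(t) for t in permutations("aabb")}
assert P((2, 2)) == sum(1 for w in words if is_primitive(w))   # 6 words, abab/baba excluded
```

Here P((2, 2)) = μ(1)·(4 over 2,2) + μ(2)·(2 over 1,1) = 6 − 2 = 4, matching the enumeration.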
3. Unbordered words

Let Uq(n) denote the number of all unbordered words of length n over an alphabet of size q. The following formula for Uq(n) is well-known; see for example [1,4,5,7] and also the sequences A003000, A019308, A019309 in [6]. Surely, we have Uq(1) = q and, for n ≥ 1,

Uq(2n + 1) = q Uq(2n),   (5)
Uq(2n) = q Uq(2n − 1) − Uq(n).   (6)

Indeed, case (5) is clear since a word of odd length is unbordered if and only if it is unbordered after its middle letter (at position n + 1) is deleted. For case (6), consider that a word w of even length is unbordered if and only if it is unbordered after one of its middle letters (say, at position n + 1) is deleted, except if w = auau and au is unbordered, where a is an arbitrary letter.

Note that there is an alternative way to obtain Uq(n) by considering the following immediate result.

Lemma 2. Let w be a bordered word, and let u be its shortest border. Then
(1) 2|u| ≤ |w|,
(2) u is unbordered, and
(3) u is the only unbordered border of w.

Let Bq(n) denote the number of all bordered words of length n over an alphabet of size q. Lemma 2 shows that it is enough, for every unbordered border u with |u| ≤ n/2, to count the number of words of length n − 2|u|, which is q^(n−2|u|). So we have

Bq(n) = Σ_{1 ≤ i ≤ n/2} Uq(i) q^(n−2i).   (7)

This gives the formulas (5) and (6) for Uq(n), where Uq(n) = q^n − Bq(n) for every q > 1 and where Uq(1) = q.
In this paper we investigate the number of unbordered words with a fixed weight. Let us fix a binary alphabet A = {a, b} for now. Let U(n, k) denote the number of all binary unbordered words of length n that have a fixed weight k, in the sense that, for every such word w, we have |w|b = k and |w|a = n − k. It is easy to check that U(1, 0) = U(1, 1) = 1, that U(n, k) = 0 if n ≤ k and k > 1, and that U(n, 0) = 0 if n > 1.

Theorem 3. If 0 < k < n then

U(n, k) = U(n − 1, k) + U(n − 1, k − 1) − E(n, k),   (8)

where

E(n, k) = U(n/2, k/2) if n and k are even;  0 otherwise.
Proof. Suppose first that w has odd length 2n + 1. Each word w = ucv, with c ∈ A and |u| = |v| = n, contributing to U(2n + 1, k) is obtained by adding a middle letter c to an unbordered word uv of even length. If c = a then uv contributes to U(2n, k), and if c = b then uv contributes to U(2n, k − 1).

Assume then that w has even length 2n. If w = cudv, with c, d ∈ A and |u| = |v| = n − 1, then it contributes to U(2n, k′) if and only if cuv is unbordered (so it contributed to either U(2n − 1, k′) or U(2n − 1, k′ − 1)) and cu ≠ dv (that is, borderedness is not obtained by adding a letter to cuv such that w is a square). Consider the case where cuv is unbordered but cudv is not, that is, cu = dv. Then w = cucu and cuu is unbordered. Note that cuu is unbordered if and only if cu is unbordered. Let |cu|b = k. We have that cuu contributes to U(2n − 1, 2k) (if c = a) or to U(2n − 1, 2k − 1) (if c = b) if and only if cu contributes to U(n, k), which is therefore subtracted in case |w|b = 2k.

Eq. (8) can be generalized to alphabets of arbitrary size q. For this, consider an ordered alphabet {a1, a2, …, aq} of size q, and let U(k) denote the number of all unbordered words w of length n = Σ_{i=1}^q ki that have a fixed weight Φ(w) = k = (k1, k2, …, kq). Moreover, let k[ki − 1] denote (k1, …, ki−1, ki − 1, ki+1, …, kq). If there exists 1 ≤ j ≤ q such that kj = 1 and ki = 0 for all i ≠ j, then only the letter aj contributes to U(k). Hence U(k) = 1 if Σ_{i=1}^q ki = 1 and ki ≥ 0 for all 1 ≤ i ≤ q.

Theorem 4. If Σ_{i=1}^q ki > 0 then

U(k) = Σ_{1 ≤ i ≤ q, ki > 0} U(k[ki − 1]) − E(k),

where

E(k) = U(k/2) if ki is even for all 1 ≤ i ≤ q;  0 otherwise.
T. Harju, D. Nowotka / Theoretical Computer Science 340 (2005) 273 – 279

Proof. Indeed, the arguments of adding a letter at position |w|/2 of a word w are similar to those of Theorem 3. For the explanation of E(k) we note that a bordered word (created by adding a middle letter) is a square ai u ai u, for some 1 ≤ i ≤ q. Note that the length of w and the number of occurrences of every letter is even in that case. Now, w is only counted if ai u is unbordered, that is, if ai u contributes to U(k/2), which must therefore be subtracted.
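The recursion of Theorem 3 is easy to check against a brute-force enumeration. The following Python sketch does so (the function names are ours, not the authors'):

```python
from itertools import product

def unbordered(w):
    # a word is bordered if some proper nonempty prefix is also a suffix
    return not any(w[:i] == w[-i:] for i in range(1, len(w)))

def U_brute(n, k):
    # count binary unbordered words of length n with exactly k occurrences of b
    return sum(1 for w in product("ab", repeat=n)
               if w.count("b") == k and unbordered(w))

def U(n, k):
    # the recursion of Theorem 3 together with its boundary values
    if n == 1:
        return 1 if k in (0, 1) else 0
    if k <= 0 or k >= n:
        return 0
    E = U(n // 2, k // 2) if n % 2 == 0 and k % 2 == 0 else 0
    return U(n - 1, k) + U(n - 1, k - 1) - E
```

The same kind of exhaustive check applies verbatim to the multivariate recursion of Theorem 4.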
4. Words with a unique border

In this section we count the number of words that have one and only one border. Let us start with an obvious result which belongs to folklore.

Lemma 5. Let w be a bordered word, and let u be its shortest border. If w has a border v with |v| > |u|, then |v| ≥ 2|u|.

Proof. Indeed, if, for the shortest border u, we had |v| < 2|u|, then u would overlap itself (since u is both a prefix and a suffix of v), and hence u would be bordered, contradicting Lemma 2(2).

In order to estimate the number of words with exactly one border, we make the following two observations.

Lemma 6. Let u be a fixed unbordered word of length s. Then the number of words of length r of the form xuyux is the number of bordered words of length r − 2s, that is, Bq(r − 2s).

Indeed, every word of the form xyx produces exactly one word of the form xuyux, and the condition xuyux = x′uy′ux′ would imply that u is bordered; a contradiction.

Lemma 7. Let u be a fixed unbordered word of length s. Then the number of words of length r of the form zuz is the number of words of length (r − s)/2.

Indeed, each word z produces exactly one word of the form zuz, and the condition zuz = z′uz′ implies that z = z′.

Let k ≤ n, and let Bq(n, k) denote the number of all words of length n over an alphabet of size q that have exactly one border, of length k. It is clear that Bq(1, k) = Bq(n, 0) = 0, for all 1 ≤ n and 0 ≤ k, and that Bq(n, k) = 0 if n < 2k; see Lemma 2(1).

Theorem 8. If 1 ≤ 2k ≤ n then

Bq(n, k) = Uq(k) (q^{n−2k} − Wq(n − 2k, k) − Eq(n − 2k, k)),

where

Wq(r, s) = Bq(r − 2s) if 2s < r,  Wq(r, s) = 1 if 2s = r,  and Wq(r, s) = 0 otherwise,

and

Eq(r, s) = q^{(r−s)/2} if s < r < 3s and r − s is even,  Eq(r, s) = 1 if s = r,  and Eq(r, s) = 0 otherwise.
Proof. Indeed, following the argument of Lemma 2(2), we count all unbordered words of length k (that is, Uq(k)), which are the possible borders of a word of length n. For every such border we have to count the number of different combinations of letters for the rest of the n − 2k letters, that is, q^{n−2k}. However, we have to exclude those cases where new borders are created. Given an unbordered border u of length k, we have the following cases for words with more than one border: uxuyuxu and uzuzu, where x, y, z ∈ A*. These two cases are taken care of by Wq(r, s) and Eq(r, s), where both terms equal 1 if u^4 and u^3 are counted; see also Lemmas 6 and 7. Note that the latter case is included in the former one if and only if |u| ≤ |z| (where the "only if" part comes from the fact that u is unbordered, and hence, it does not overlap itself); therefore r < 3s is required in Eq(r, s).

Clearly, the number Bq(n) of words of length n over an alphabet of size q with exactly one border is

Bq(n) = Σ_{1 ≤ i ≤ n/2} Bq(n, i).
Theoretical Computer Science 340 (2005) 280 – 290 www.elsevier.com/locate/tcs
Growth of repetition-free words—a review

Jean Berstel

Institut Gaspard-Monge (IGM), Université de Marne-la-Vallée, 5 Boulevard Descartes, F-77454 Marne-la-Vallée Cedex 2, France
Abstract

This survey reviews recent results on repetitions in words, with emphasis on the estimations for the number of repetition-free words.
© 2005 Published by Elsevier B.V.

Keywords: Repetitions in words; Square-free words; Overlap-free words; Combinatorics on words
1. Introduction

A repetition is any bordered word. Quite recently, several new contributions were made to the field of repetition-free words, and to counting repetition-free words. The aim of this survey is to give a brief account of some of the methods and results.

The terminology deserves some comments. Let α > 1 be a rational number. A nonempty word w is an α-power if there exist words x, x′ with x′ a prefix of x and an integer n, such that w = x^n x′ and α = n + |x′|/|x| = |w|/|x|. For example, the French word entente is a 7/3-power, and the English word outshout is an 8/5-power. If α = 2 or α = 3, we speak about a square and a cube, as for murmur or kokoko (the examples are taken from [41]). A word w is an overlap if it is an α-power for some α > 2. For instance, entente is an overlap.

Let α > 1 be a real number. A word w is said to avoid α-powers, or to be α-power-free, if it contains no factor that is a β-power for β ≥ α. A word w is α^+-power-free if it contains no factor that is a β-power for β > α. Thus, a word is overlap-free if and only if it is 2^+-power-free.
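The exponent |w|/|x| in this definition is governed by the smallest period of w. A small Python sketch (our own helper, not part of the survey) recovers the examples above:

```python
from fractions import Fraction

def exponent(w):
    # |w| divided by the least period of w
    n = len(w)
    p = next(p for p in range(1, n + 1)
             if all(w[i] == w[i + p] for i in range(n - p)))
    return Fraction(n, p)
```

Then exponent("entente") is 7/3, exponent("outshout") is 8/5, and exponent("murmur") is 2.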
E-mail address: [email protected].
0304-3975/$ - see front matter © 2005 Published by Elsevier B.V. doi:10.1016/j.tcs.2005.03.039
This review reports results on the growth of the number of α-power-free words of length n over a q-letter alphabet. In some cases, the growth is bounded by a polynomial in n; in other cases, it is shown to be exponential in n. We consider overlap-free words in the next section, square-free words in Section 3, and some generalizations in the final section. For basics and complements, the reader should consult the book of Allouche and Shallit [3].
2. Counting overlap-free words

We first review estimations for the number of overlap-free words over a binary alphabet. Let V be the set of binary overlap-free words and let v(n) be the number of overlap-free binary words of length n. This sequence starts with 2, 4, 6, 10, 14, 20 (Sloane's sequence A007777, see [42]). It is clear that V is factorial (factor-closed). It follows that, as for any factorial set, one has v(n + m) ≤ v(n)v(m). Thus the sequence (v(n)) is submultiplicative, or equivalently the sequence (log v(n)) is subadditive. This in turn implies, by a well-known argument, that the sequence v(n)^{1/n} has a limit, or equivalently, that the limit

h(V) = lim_{n→∞} (1/n) log v(n)
exists. The number h(V) is called the (topological) entropy of the set V. For a general discussion about the entropy of square-free words, see [4]. The entropy of the set of square-free words is strictly positive, as we will see later. On the contrary, the entropy of the set of overlap-free words is zero. This is a consequence of the following result of Restivo and Salemi [34,35].

Theorem 1. The number v(n) of binary overlap-free words of length n is bounded from above by a polynomial in n.

They proved that v(n) is bounded by n^4. The proof is based on the following structural property of overlap-free words, which we state in the more general setting of [22]. Recall first that the Thue–Morse morphism μ is defined by

μ: 0 → 01, 1 → 10.
Lemma 2. Let 2 < α ≤ 7/3, and let x be a word that avoids α-powers. There exist words u, y, v with u, v ∈ {e, 0, 1, 00, 11} and y avoiding α-powers such that x = u μ(y) v. This factorization is unique if |x| ≥ 7.

First, observe that the lemma does not hold for α > 7/3, since x = 0110110 is a 7/3-power and has no factorization of the required form. Next, consider as an example the word x = 011001100, which is a 9/4-power and contains no higher repetition. One gets x = μ(0101)·0, and y = 0101 itself avoids repetitions of exponent greater than 9/4.
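The factorization of Lemma 2 can be computed mechanically: strip each admissible pair of ends u, v and try to invert the Thue–Morse morphism on what remains. A sketch (names ours):

```python
MU = {"0": "01", "1": "10"}

def mu(w):
    return "".join(MU[c] for c in w)

def mu_inverse(w):
    # invert the Thue-Morse morphism, or return None if w is not in mu({0,1}*)
    if len(w) % 2:
        return None
    inv = {"01": "0", "10": "1"}
    y = ""
    for i in range(0, len(w), 2):
        if w[i:i + 2] not in inv:
            return None
        y += inv[w[i:i + 2]]
    return y

def factorizations(x):
    # all ways to write x = u mu(y) v with u, v in {e, 0, 1, 00, 11}
    ends = ["", "0", "1", "00", "11"]
    out = []
    for u in ends:
        for v in ends:
            if len(u) + len(v) <= len(x) and x.startswith(u) and x.endswith(v):
                core = x[len(u):len(x) - len(v)] if v else x[len(u):]
                y = mu_inverse(core)
                if y is not None:
                    out.append((u, y, v))
    return out
```

For x = 011001100 this returns the single factorization (u, y, v) = (e, 0101, 0), matching the example in the text.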
It follows from the lemma that an overlap-free word x has a factorization

x = u1 μ(u2) ⋯ μ^{h−1}(uh) μ^h(xh) μ^{h−1}(vh) ⋯ μ(v2) v1,

where each ui and vi has length at most 2, and xh has length at most 4. A simple computation shows that log |x| − 3 < h ≤ log |x|. Thus the value of h and each ui, vi and xh may take only a finite number of values, from which the total number of overlap-free words of length n is bounded by c·d^{log n} = c·n^{log d} for some constants c and d.

Another consequence of the lemma is that the Thue–Morse word t = μ^ω(0) is not only overlap-free but avoids 7/3-powers. A clever generalization, by Rampersad [32], of a proof of [39,40] shows that t (and its opposite t̄) is the only infinite binary word avoiding 7/3-powers that is a fixed point of a nontrivial morphism.

Restivo and Salemi's theorem says that v(n) ≤ C n^s for some real s. The upper bound log 15 for s given by Restivo and Salemi has been improved by Kfoury [24] to 1.7, by Kobayashi [25] to 1.5866, and by Lepistö in his Master's thesis [26] to 1.37; Kobayashi [25] also gives a lower bound. So

Theorem 3. There are constants C1 and C2 such that C1 n^r < v(n) < C2 n^s, where r = 1.155… and s = 1.37….

One might ask what the "real" limit is. In fact, a result by Cassaigne [12] shows that there is no limit. More precisely, he proves

Theorem 4. Set r = lim inf (log v(n))/(log n) and s = lim sup (log v(n))/(log n). Then r < 1.276 and 1.332 < s.
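The initial values of v(n) quoted at the beginning of this section are easy to reproduce by direct enumeration (helper names ours):

```python
from itertools import product

def has_overlap(w):
    # an overlap is a factor of the form cxcxc: length 2p+1 with period p
    n = len(w)
    return any(all(w[m] == w[m + p] for m in range(i, i + p + 1))
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p))

def v(n):
    return sum(1 for w in product("01", repeat=n) if not has_overlap(w))
```

This reproduces v(1), …, v(6) = 2, 4, 6, 10, 14, 20 (Sloane's A007777).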
It is quite remarkable that the sequence v(n) is 2-regular. This was shown by Carpi [9] (see [3] for the definition of regular sequences).

As we shall see in the next section, the number of square-free ternary words grows exponentially. In fact, Brandenburg [6] also proves that the number of binary cube-free words grows exponentially. The exact frontier between polynomial and exponential growth has been shown to be the exponent 7/3 by Karhumäki and Shallit [22].

Theorem 5. There are only polynomially many binary words of length n that avoid 7/3-powers, but there are exponentially many binary words that avoid 7/3^+-powers.

3. Counting square-free words

We report now estimations for the number of square-free words over a ternary alphabet. Let S be the set of ternary square-free words and let s(n) be the number of square-free ternary words of length n. Since S is factorial (factor-closed), the sequence (s(n)) is submultiplicative and the (topological) entropy h(S) exists. We will show that h(S) is not zero, and give bounds for h(S). The
sequence s(n) starts with 3, 6, 12, 18, 30, 42, 60 (Sloane's sequence A006156, see [42]). The sequence s(n) is tabulated for n ≤ 90 in [4] and for 91 ≤ n ≤ 110 in [21].

3.1. Getting upper bounds

There is a simple method to get upper bounds for the number of ternary square-free words, based on using better and better approximations by regular languages. Clearly, any square-free word over A = {0, 1, 2} contains no factor 00, 11 or 22, so S ⊂ A* \ A*{00, 11, 22}A*. Since the latter is a regular set, its generating function is a rational function. It is easily seen to be f(t) = (1 + t)/(1 − 2t). Indeed, once an initial letter is fixed in a word of this set, there are exactly two choices for the next letter (this recalls Pansiot's encoding [31], see also [28]). So s(n) ≤ 2^n + 2^{n−1} for n ≥ 1. Moreover, since a word of length at most 3 is square-free if and only if it is in A* \ A*{00, 11, 22}A*, the equality s(n) = 2^n + 2^{n−1} holds for n ≤ 3, and thus s(2) = 6 and s(3) = 12.

One can continue in this way: clearly none of the 6 squares of length 4, namely 0101, 0202, 1010, 1212, 2020, 2121, is a factor of a word in S, and it suffices to compute the generating function of the set A* \ A*XA*, where X = {00, 11, 22, 0101, 0202, 1010, 1212, 2020, 2121}, to get a better upper bound for s(n). Some of these generating functions are given explicitly in [36]. For words without squares of length 2 or 4, the series is (1 + 2t + 2t^2 + 3t^3)/(1 − t − t^2) (see [36]). Again, a direct argument gives the reason: a ternary word without squares of length 2 or 4 either ends with aba for letters a ≠ b, or with abc where the letters a, b, c are distinct. Denote by u_n (resp. by v_n) the number of words of the first (of the second) type, and by s^(2)(n) the total number. Then it is easily seen that, for n ≥ 4, u_n = v_{n−1} and v_n = s^(2)(n − 1), and consequently s^(2)(n) = s^(2)(n − 1) + s^(2)(n − 2). This shows of course that s(n) ≤ C·φ^n, for some constant C, with φ = (1 + √5)/2 the golden ratio.
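Both the initial values of s(n) and the agreement s(n) = 2^n + 2^{n−1} for n ≤ 3 can be confirmed by direct enumeration (helper names ours):

```python
from itertools import product

def square_free(w):
    n = len(w)
    return not any(w[i:i + l] == w[i + l:i + 2 * l]
                   for l in range(1, n // 2 + 1)
                   for i in range(n - 2 * l + 1))

def s(n):
    return sum(1 for w in product("012", repeat=n) if square_free(w))
```

This reproduces s(1), …, s(7) = 3, 6, 12, 18, 30, 42, 60 (Sloane's A006156).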
More generally, we consider any finite alphabet A, a finite set X, and the set K = A* \ A*XA*. We may assume that X contains no proper factor of one of its elements, so it is a code. Since the set K is a quite particular regular set, we will compute its generating function by using special techniques. There exist at least two (related) ways to compute these generating functions.

First, we consider the semaphore code C = A*X \ A*XA+. Semaphore codes (see e.g. [5]) were introduced by Schützenberger [38] under the name J codes. The computation below of course also recalls recurrent events in the sense of Feller [18]. The set C is the set of words that have a suffix in X but have no other factor in X. Thus the set K is also the set of proper prefixes of elements of C, and since C is a maximal prefix code, one has

C* K = A*.   (1)

Next, one has (see [5] or [27])

K x = Σ_{y∈X} Cy Ry,x   (x ∈ X),   (2)

where Cy = C ∩ A*y and Ry,x is the correlation set of y and x, given by

Ry,x = {z^{−1} x | z ∈ S(y) ∩ P(x)}.
Here, S(y) (resp. P(x)) is the set of proper suffixes of y (resp. proper prefixes of x). Of course,

C = Σ_{y∈X} Cy.   (3)
Eqs. (1)–(3) are Card X + 2 equations in Card X + 2 unknowns and allow one to compute the languages or their generating series. As an example, consider X = {00, 11, 22}. Denote by fZ the generating function of the set Z. Then Eqs. (1)–(3) translate into

(1 − 3t) fK = 1 − fC,
t^2 fK = (1 + t) fCaa   (a ∈ A),
fC = 3 fCaa,

since Raa,aa = {1, a} and Raa,bb = ∅ for a ≠ b. Thus 3t^2 fK = (1 + t) fC and (1 − 3t) fK = 1 − fC = 1 − (3t^2/(1 + t)) fK, whence

fK = 1/(1 − 3t + 3t^2/(1 + t)) = (1 + t)/(1 − 2t).
The second technique is called the "Goulden–Jackson clustering technique" in [29]. The idea is to mark occurrences of words in X in a word, and to weight a marked word by an indicator of the number of its marks: if a word w has r marks, then its weight is (−1)^r t^{|w|}. As an example, if X is the singleton X = {010}, the word w = 01001010 contains three occurrences of 010, and so exists in 2^3 = 8 marked versions, one for each subset of these occurrences. Let us write ŵ for a marked version of w, and p(ŵ) for its weight. The sum of the weights of the marked versions of a word w is 0 if w contains a factor in X, and is 1 otherwise. In other terms, the generating series of the set K = A* \ A*XA* is

fK = Σ_ŵ p(ŵ),

where the sum is over all marked versions of all words. Now, it appears that this series is rather easy to compute when one considers clusters: a cluster is a nonempty marked word in which every position is covered by a marked occurrence, and which is not the product of two other clusters. Thus, for X = {010}, the fully marked word 01001010 is not a cluster, since it is the product of the two clusters 010 and 01010. A marked word is a unique product of unmarked letters and of clusters. Thus, a marked word is either the empty word, or its last letter is not marked, or it ends with a cluster, so that

fK = 1 + fK(t)·kt + fK(t)·p(C),

where k is the size of the alphabet and p(C) is the generating series of the set C of clusters. It follows that

fK(t) = 1/(1 − kt − p(C)).   (4)

A cluster ends with a word in X. Let Cx = C ∩ A*x be the set of clusters ending in x. Then the generating series p(Cx) are the solutions of the system

p(Cx) = −t^{|x|} − Σ_{y∈X} (y : x) p(Cy)   (x ∈ X),   (5)
where y : x is the (strict) correlation polynomial of y and x, defined by

y : x = Σ_{z ∈ Ry,x \ {e}} t^{|z|}.

Eq. (5) is a system of linear equations, and the number of equations is the size of X. Solving this system gives the desired expression. Consider the example X = {010} over A = {0, 1}. Then the generating series of K = A* \ A*010A* is

fK(t) = 1/(1 − 2t − p(C010)),

and p(C010) = −t^3 − t^2 p(C010), whence p(C010) = −t^3/(1 + t^2) and

fK(t) = 1/(1 − 2t + t^3/(1 + t^2)) = (1 + t^2)/(1 − 2t + t^2 − t^3).
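Eqs. (4) and (5) translate directly into a computation on truncated power series. The sketch below (our own code; coefficient lists are truncated at order N) reproduces both examples: the semaphore-code result fK = (1 + t)/(1 − 2t) for X = {00, 11, 22}, and the series for X = {010}:

```python
N = 12  # truncation order for all power series

def mul(a, b):
    # truncated product of two coefficient lists
    c = [0] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < N:
                    c[i + j] += ai * bj
    return c

def correlation(y, x):
    # strict correlation polynomial y:x as a coefficient list:
    # a proper overlap y[-k:] == x[:k] contributes t^(|x|-k)
    c = [0] * N
    for k in range(1, min(len(x), len(y))):
        if y[len(y) - k:] == x[:k]:
            c[len(x) - k] += 1
    return c

def cluster_series(X):
    # solve Eq. (5) for the p(C_x) by fixed-point iteration
    p = {x: [0] * N for x in X}
    for _ in range(N):
        for x in X:
            new = [0] * N
            new[len(x)] = -1
            for y in X:
                prod = mul(correlation(y, x), p[y])
                new = [a - b for a, b in zip(new, prod)]
            p[x] = new
    return [sum(col) for col in zip(*p.values())]

def f_K(X, k):
    # coefficients of 1/(1 - k t - p(C)), Eq. (4)
    pc = cluster_series(X)
    f = [0] * N
    f[0] = 1
    for n in range(1, N):
        f[n] = k * f[n - 1] + sum(pc[j] * f[n - j] for j in range(1, n + 1))
    return f
```

The fixed-point iteration for Eq. (5) converges because every correlation polynomial has positive valuation.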
Both methods are just two equivalent formulations of the same computation, as pointed out to me by Dominique Perrin. When X = {x} is a singleton, Eq. (2) indeed becomes Kx = CR with R = Rx,x, and, in noncommuting variables, Eq. (1) is just K(1 − A) = 1 − C, so K(1 − A) = 1 − KxR^{−1}, whence

K(1 − A + xR^{−1}) = 1.   (6)

Now, the coefficients of the series −xR^{−1} are precisely the weights of the clusters of x. So Eq. (6), converted to a generating series, yields precisely Eq. (4)! In the general case one considers the (row) vectors X̃ = (x)_{x∈X} and C̃ = (Cx)_{x∈X} and the X × X matrix R = (Rx,y)_{x,y∈X}. Then Eq. (2) is K X̃ = C̃ R, and the same computation as above gives

K (1 − A + Σ_{x∈X} (X̃ R^{−1})_x) = 1.
The computation of the generating functions for sets K of the form above, or, more generally, of the series Σ_{w∈K} π(w) t^{|w|}, where π is a probability distribution on A*, is an important issue in concrete mathematics [20], in the theory of codes [5] and in computational biology (see e.g. Chapters 1, 6 and 7 in [27]). Extensions are in [30,33]. In their paper [29], Noonan and Zeilberger present a package that allows one to compute the generating functions and their asymptotic behaviour for the regular sets of words without squares yy of length |y| = ℓ, for ℓ up to 23. Richard and Grimm [36] go one step further, to ℓ = 24. The entropy of the set of square-free ternary words is now known to be at most 1.30194.
3.2. Getting lower bounds

In order to get an exponential lower bound on the number of ternary square-free words, there are two related methods, initiated by Brandenburg [6] and Brinkhuis [7]. The first method is used for instance in [22]; the second one, which now gives the sharper bounds, was recently used in [2]. Both rely on the notion of a finite square-free substitution from A* into B*, for some alphabet B. Let us recall that a substitution in formal language theory is a morphism f from some free monoid A* into the monoid of subsets of B*, that is, a function satisfying f(e) = {e} and f(xy) = f(x)f(y), where the product on the right-hand side is the product of the sets f(x) and f(y) in B*. The substitution is finite if f(a) is a finite set for each letter a ∈ A (and so for each word w ∈ A*); it is called square-free if each word in f(w) is square-free whenever w is a square-free word on A. For an overview of recent results about power-free morphisms in connection with open problems, see [37].

Brandenburg's method goes as follows. Let A = {0, 1, 2} and let B = {0, 1, 2, 0̄, 1̄, 2̄}. Let g : B* → A* be the morphism that erases bars. Define a substitution f by f(a) = g^{−1}(a). Clearly, f is finite and square-free. Also, each square-free word w of length n over A is mapped onto 2^n square-free words of length n over B. The second step consists in finding a square-free morphism h from B* into A*. Assume that h is uniform of length r. Then each square-free word w of length n over B is mapped onto a square-free word of length rn over A by the morphism h. It follows that there are 2^n square-free words of length rn for each square-free word of length n, that is, s(rn) ≥ 2^n s(n). Since s(n) is submultiplicative, one has s(rn) ≤ s(n)^r. Combining the two inequalities yields s(n)^{r−1} ≥ 2^n, i.e. s(n) ≥ 2^{n/(r−1)}, which proves that the growth is exponential.

It remains to give a square-free morphism h from B* into A*, where B = {0, 1, 2, 0̄, 1̄, 2̄}. It appears that

h: 0 → 0102012021012102010212
   1 → 0102012021201210120212
   2 → 0102012102010210120212
   0̄ → 0102012102120210120212
   1̄ → 0102012101202101210212
   2̄ → 0102012101202120121012

is a square-free morphism. Here r = 22, and consequently s(n) ≥ 2^{n/21}. The following is a slight variation of Brandenburg's result:

Theorem 6. The number s(n) of square-free ternary words of length n satisfies the inequality s(n) ≥ 6 · 1.032^n.
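Square-freeness of such a morphism must of course be proved, but it can at least be spot-checked mechanically. In the sketch below (our own code; the barred letters of B are written 0', 1', 2'), we verify that each image has length 22, that every single image is square-free, and that images of any two distinct letters concatenate to a square-free word:

```python
def square_free(w):
    n = len(w)
    return not any(w[i:i + l] == w[i + l:i + 2 * l]
                   for l in range(1, n // 2 + 1)
                   for i in range(n - 2 * l + 1))

h = {
    "0":  "0102012021012102010212",
    "1":  "0102012021201210120212",
    "2":  "0102012102010210120212",
    "0'": "0102012102120210120212",
    "1'": "0102012101202101210212",
    "2'": "0102012101202120121012",
}

lengths_ok = all(len(img) == 22 for img in h.values())
images_ok = all(square_free(img) for img in h.values())
# images of the two-letter square-free words xy (any pair of distinct letters)
pairs_ok = all(square_free(h[a] + h[b]) for a in h for b in h if a != b)
```

This is only a sanity check, not Brandenburg's proof.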
A more direct method was initiated by Brinkhuis [7]. He considers a 25-uniform substitution f from A* into itself, defined by

f: 0 → {U0, V0}, 1 → {U1, V1}, 2 → {U2, V2},

where U0 = x1x̃, V0 = y0ỹ, and x = 012021020102, y = 012021201021. The words U1, …, V2 are obtained by applying the circular permutation (0, 1, 2). He proves that f is square-free, and thus every square-free word w of length n is mapped onto 2^n square-free words of length 25n. His bound is only 2^{n/24}.

The substitution f can be viewed as the composition of an inverse morphism and a morphism, when U0, …, V2 are considered as letters and then each of these letters is mapped to the corresponding word. However, the second mapping is certainly not square-free, since the image of U0 V0 contains the square 00. Thus, the construction of Brinkhuis is stronger. Indeed, Ekhad and Zeilberger [17] found an 18-uniform square-free substitution of the same form as Brinkhuis' and thus reduced the bound from 2^{n/24} to 2^{n/17}. A relaxed version of Brinkhuis' construction is used by Grimm [21] to derive the better bound 65^{n/40}, and by Sun [43] to improve this bound to 110^{n/42}.

4. Other bounds

We review briefly other bounds on the number of repetition-free words. Concerning cube-free binary words, already Brandenburg [6] gave the following bounds.

Theorem 7. The number c(n) of binary cube-free words of length n satisfies 2·1.080^n < 2·2^{n/9} ≤ c(n) ≤ 2·1251^{(n−1)/17} < 1.315·1.522^n.

The upper bound was improved by Edlin [16] to B·1.4576^n, for some constant B, by using the "cluster" method. Next, we consider Abelian repetitions. An Abelian square is a nonempty word uu′, where u and u′ are commutatively equivalent, that is, u′ is a permutation of u. For instance, 012102 is an Abelian square. It is easy to see that there is no infinite Abelian square-free word over three letters. The existence of an infinite word over four letters without Abelian squares was demonstrated by Keränen [23].
The question of the existence of exponentially many quaternary words without Abelian squares was also settled positively by Carpi [10]. He uses an argument similar to Brinkhuis' but much more involved (Abelian square-free morphisms from alphabets with more than four letters into alphabets with four letters seem not to exist [8]). He shows

Theorem 8. The number d(n) of quaternary words avoiding Abelian squares satisfies d(n) ≥ C·2^{19n/(85^3 − 85)} for some constant C.

This result should be compared to the following, concerning ternary words without Abelian cubes [2].
Theorem 9. The number r(n) of ternary words avoiding Abelian cubes grows faster than 2^{n/24}.

The number of ternary words avoiding Abelian cubes is 1, 3, 9, 24, 66, 180, …; it is the sequence A096168 in [42]. The authors consider the 6-uniform substitution

h: 0 → 001002, 1 → 110112, 2 → {002212, 122002}.

This substitution does not preserve Abelian cube-freeness, since h(0101) = 0010·02110·11200·10021·10112 contains an Abelian cube (the three middle blocks are commutatively equivalent). However, the set {h^n(0) : n ≥ 0} is shown to avoid Abelian cubes.

There is an interesting intermediate situation between the commutative and the noncommutative case, namely the case where, for the definition of squares, only some of the letters are allowed to commute. To be precise, consider a set θ of commutation relations of the form ab = ba for letters a, b, and define the relation u ∼ v mod θ as the transitive closure of the relation uabv ∼ ubav for all words u, v and all relations ab = ba in θ. A θ-square is a word uu′ such that u ∼ u′ mod θ. If θ is empty, a θ-square is just a square, and if θ is the set of all ab = ba for a ≠ b, a θ-square is an Abelian square. Since there is an infinite quaternary word that avoids Abelian squares, the same holds for θ-squares. For 3 letters, the situation is on the edge, since there exist infinite square-free words, but no infinite Abelian square-free word. The result proved by Cori and Formisano [13] is:

Theorem 10. If the set θ of commutation relations contains at most one relation, then the set of ternary words avoiding θ-squares is infinite; otherwise it is finite.

It has been proved by the same authors [14] that the number of these words grows only polynomially with the length. This result is different from [11], where square-free words in partially commutative monoids are investigated.

Another variation concerns circular words. A circular word avoids α-powers if all its conjugates avoid α-powers.
For instance, 001101 is a circular 2^+-power-free word, because each word in the set {001101, 011010, 110100, 101001, 010011, 100110} is 2^+-power-free. On the contrary, the word 0101101 is cube-free, but its conjugate 1010101 is not cube-free, and not even 3^+-power-free; so, viewed as a circular word, 0101101 is not 3^+-power-free. It is proved in [1] that there exist infinitely many 5/2^+-power-free binary circular words, whereas every circular word of length 5 either contains a cube or a 5/2-power. This improves a previous result [15] showing that there are infinitely many cube-free circular binary words; see also [19]. No information is available about the growth of the number of these words.
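These circular examples can be verified with a small exponent computation (helper names ours):

```python
from fractions import Fraction

def max_exponent(w):
    # largest exponent |u|/p over all factors u of w (p = least period of u)
    best = Fraction(0)
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = w[i:j]
            p = next(p for p in range(1, len(u) + 1)
                     if all(u[m] == u[m + p] for m in range(len(u) - p)))
            best = max(best, Fraction(len(u), p))
    return best

def rotations(w):
    return {w[i:] + w[:i] for i in range(len(w))}
```

A word is 2^+-power-free exactly when its maximal factor exponent is at most 2, so the circular claims above reduce to checks over all rotations.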
Acknowledgements

Many thanks to the anonymous referee who contributed additional references and corrected several misprints.
References

[1] A. Aberkane, J. Currie, There exist binary circular 5/2+ power free words of every length, Electron. J. Combin. 11 (2004) R10.
[2] A. Aberkane, J. Currie, N. Rampersad, The number of ternary words avoiding Abelian cubes grows exponentially, in: Workshop on Word Avoidability, Complexity and Morphisms, LaRIA Techn. Report 2004-07, 2004, pp. 21–24.
[3] J.-P. Allouche, J. Shallit, Automatic Sequences: Theory, Applications, Generalizations, Cambridge University Press, Cambridge, 2003.
[4] M. Baake, V. Elser, U. Grimm, The entropy of square-free words, Math. Comput. Modelling 26 (1997) 13–26.
[5] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[6] F.-J. Brandenburg, Uniformly growing k-th power-free homomorphisms, Theoret. Comput. Sci. 23 (1983) 69–82.
[7] J. Brinkhuis, Nonrepetitive sequences on three symbols, Quart. J. Math. Oxford 34 (1983) 145–149.
[8] A. Carpi, On Abelian power-free morphisms, Internat. J. Algebra Comput. 3 (1993) 151–167.
[9] A. Carpi, Overlap-free words and finite automata, Theoret. Comput. Sci. 115 (2) (1993) 243–260.
[10] A. Carpi, On the number of Abelian square-free words on four letters, Discrete Appl. Math. 81 (1998) 155–167.
[11] A. Carpi, A. De Luca, Square-free words on partially commutative monoids, Inform. Proc. Lett. 22 (1986) 125–131.
[12] J. Cassaigne, Counting overlap-free words, in: P. Enjalbert, A. Finkel, K. Wagner (Eds.), STACS '93, Lecture Notes in Computer Science, Vol. 665, Springer, Berlin, 1993, pp. 216–225.
[13] R. Cori, M. Formisano, Partially Abelian squarefree words, RAIRO Inform. Théoret. Appl. 24 (6) (1990) 509–520.
[14] R. Cori, M. Formisano, On the number of partially Abelian squarefree words on a three-letter alphabet, Theoret. Comput. Sci. 81 (1) (1991) 147–153.
[15] J. Currie, D. Fitzpatrick, Circular words avoiding patterns, in: M. Ito, M. Toyama (Eds.), Developments in Language Theory, DLT 2002, Lecture Notes in Computer Science, Springer, Berlin, 2004, pp. 319–325.
[16] A. Edlin, The number of binary cube-free words of length up to 47 and their numerical analysis, J. Differential Equations Appl. 5 (1999) 153–154.
[17] S.B. Ekhad, D. Zeilberger, There are more than 2^{n/17} n-letter ternary square-free words, J. Integer Seq. (1998) (Article 98.1.9).
[18] W. Feller, An Introduction to Probability Theory and its Applications, Wiley, New York, 1966.
[19] D. Fitzpatrick, There are binary cube-free circular words of length n contained within the Thue–Morse word for all positive integers n, Electron. J. Combin. 11 (2004) R14.
[20] R. Graham, D. Knuth, O. Patashnik, Concrete Mathematics, Addison-Wesley, Reading, MA, 1989.
[21] U. Grimm, Improved bounds on the number of ternary square-free words, J. Integer Seq. (2001) (Article 01.2.7).
[22] J. Karhumäki, J. Shallit, Polynomial versus exponential growth in repetition-free binary words, J. Combin. Theory Ser. A 105 (2004) 335–347.
[23] V. Keränen, Abelian squares are avoidable on 4 letters, in: ICALP '92, Lecture Notes in Computer Science, Vol. 623, Springer, Berlin, 1992, pp. 41–52.
[24] R. Kfoury, A linear time algorithm to decide whether a binary word contains an overlap, RAIRO Inform. Théoret. Appl. 22 (1988) 135–145.
[25] Y. Kobayashi, Enumeration of irreducible binary words, Discrete Appl. Math. 20 (1988) 221–232.
[26] A. Lepistö, A characterization of 2+-free words over a binary alphabet, Master's Thesis, University of Turku, Finland, 1995.
[27] M. Lothaire, Applied Combinatorics on Words, Cambridge University Press, Cambridge, 2005.
[28] J. Moulin-Ollagnier, Proof of Dejean's conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters, Theoret. Comput. Sci. 95 (1992) 187–205.
[29] J. Noonan, D. Zeilberger, The Goulden–Jackson cluster method: extensions, applications and implementations, J. Differential Equations Appl. 5 (1999) 355–377.
[30] P. Nicodème, B. Salvy, P. Flajolet, Motif statistics, Theoret. Comput. Sci. 287 (2002) 593–617.
[31] J.-J. Pansiot, A propos d'une conjecture de F. Dejean sur les répétitions dans les mots, Discrete Appl. Math. 7 (1984) 297–311.
[32] N. Rampersad, Words avoiding 7/3-powers and the Thue–Morse morphism, 2003, Preprint available at http://www.arxiv.org/abs/math.CO/0307401.
[33] M. Régnier, A unified approach to word occurrence probabilities, Discrete Appl. Math. 104 (2000) 259–280.
[34] A. Restivo, S. Salemi, On weakly square free words, Bull. EATCS 21 (1983) 49–56.
[35] A. Restivo, S. Salemi, Overlap-free words on two symbols, in: M. Nivat, D. Perrin (Eds.), Automata on Infinite Words, Lecture Notes in Computer Science, Vol. 192, Springer, Berlin, 1985, pp. 198–206.
[36] C. Richard, U. Grimm, On the entropy and letter frequencies of ternary square-free words, arXiv:math.CO/0302302, July 2004.
[37] G. Richomme, P. Séébold, Conjectures and results on morphisms generating k-power-free words, Internat. J. Found. Comput. Sci. 15 (2) (2004) 307–316.
[38] M.-P. Schützenberger, On the synchronizing properties of certain prefix codes, Inform. Control 7 (1964) 23–36.
[39] P. Séébold, Morphismes itérés, mot de Morse et mot de Fibonacci, C. R. Acad. Sci. Paris 295 (1982) 439–441.
[40] P. Séébold, Sequences generated by infinitely iterated morphisms, Discrete Appl. Math. 11 (1985) 255–264.
[41] J. Shallit, Avoidability in words: recent results and open problems, in: Workshop on Word Avoidability, Complexity and Morphisms, LaRIA Techn. Report 2004-07, 2004, pp. 1–4.
[42] N.J.A. Sloane, The on-line encyclopedia of integer sequences, http://www.research.att.com/∼njas/sequences/.
[43] X. Sun, New lower bound on the number of ternary square-free words, J. Integer Seq. (2003) (Article 03.2.2).
Theoretical Computer Science 340 (2005) 291 – 321 www.elsevier.com/locate/tcs
Algebraic recognizability of regular tree languages

Zoltán Ésik a,b,∗,1, Pascal Weil c,2

a Department of Computer Science, University of Szeged, Hungary
b Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
c LaBRI, CNRS, Université Bordeaux-1, France
Abstract

We propose a new algebraic framework to discuss and classify recognizable tree languages, and to characterize interesting classes of such languages. Our algebraic tool, called preclones, encompasses the classical notion of syntactic Σ-algebra or minimal tree automaton, but adds new expressivity to it. The main result in this paper is a variety theorem à la Eilenberg, but we also discuss important examples of logically defined classes of recognizable tree languages, whose characterization and decidability were established in recent papers (by Benedikt and Ségoufin, and by Bojańczyk and Walukiewicz) and can be naturally formulated in terms of pseudovarieties of preclones. Finally, this paper constitutes the foundation for another paper by the same authors, where first-order definable tree languages receive an algebraic characterization. © 2005 Elsevier B.V. All rights reserved.

Keywords: Recognizability; Regular tree languages; Variety theorem; Pseudovariety; Preclones
∗ Corresponding author.
E-mail addresses: [email protected] (Z. Ésik), [email protected] (P. Weil).
1 Partial support from the National Foundation of Hungary for Scientific Research, Grant T46686, is gratefully acknowledged.
2 Partial support from the ACI Sécurité Informatique (projet VERSYDIS) of the French Ministère de la Recherche is gratefully acknowledged. Part of this work was done while P. Weil was an invited professor at the University of Nebraska in Lincoln.
doi:10.1016/j.tcs.2005.03.038

1. Introduction

The notion of recognizability emerged in the 1960s (Eilenberg, Mezei, Wright, and others, cf. [17,30]) and has been the subject of considerable attention since, notably because
of its close connections with automata-theoretic formalisms and with logical definability, cf. [6,15,18,38] for some early papers. Recognizability was first considered for sets (languages) of finite words, cf. [16] and the references therein. The general idea is to use the algebraic structure of the domain, say, the monoid structure on the set of all finite words, to describe some of its subsets, and to use algebraic considerations to discuss the combinatorial or logical properties of these subsets. More precisely, a set of words is said to be recognizable if it is a union of classes of a (locally) finite congruence. The same concept was adapted to the case of finite trees, traces, finite graphs, etc., cf. [17,30,14,9], where it always entertains close connections with logical definability [11,12]. It follows rather directly from this definition of (algebraic) recognizability that a finite, or finitary, algebraic structure can be canonically associated with each recognizable subset L, called its syntactic structure. Moreover, the algebraic properties of the syntactic structure of L reflect its combinatorial and logical properties. The archetypal example is that of star-free languages of finite words: they are exactly the languages whose syntactic monoid is aperiodic, cf. [34]. They are also exactly the languages that can be defined by a first-order sentence over the predicate < (FO[<]), cf. [29], and the languages that can be defined by a temporal logic formula, cf. [27,22,7]. In particular, every algorithm we know for deciding the FO[<]-definability of a regular language L works by checking, more or less explicitly, whether the syntactic monoid of L is aperiodic. Let Σ be a ranked alphabet. In this paper, we are interested in sets of finite Σ-labeled trees, or tree languages.
It has been known since the 1960s [17,30,15] that the tree languages that are definable in monadic second-order logic are exactly the so-called regular tree languages, that is, those accepted by bottom-up tree automata. Moreover, deterministic tree automata suffice to accept these languages, and each regular tree language admits a unique minimal deterministic automaton. From the algebraic point of view, the set of all Σ-labeled trees can be viewed in a natural way as a (free) Σ-algebra, where Σ is now seen as a signature. Moreover, a deterministic bottom-up tree automaton can be identified with a finite Σ-algebra with some distinguished (final) elements. Thus regular tree languages are also the recognizable subsets of the free Σ-algebra. The situation however is not entirely satisfying, because we know very little about the structure of finite Σ-algebras, and very few classes of tree languages have been characterized in algebraic terms; see [26,32,33] for attempts to use Σ-algebra-theoretic considerations (and some variants) for the purpose of classifying tree languages. In particular, the important problem of deciding whether a regular tree language is FO[<]-definable remained open [33]. Based on the word language case, it is tempting to guess that an answer to this problem ought to be found using algebraic methods. In this paper, we introduce a new algebraic framework to handle tree languages. More precisely, we consider algebras called preclones (they lack some of the operations and axioms of clones [13]). Precise definitions are given in Section 2.1. Let us simply say here that, in contrast with the more classical monoids or Σ-algebras, preclones have infinitely many sorts, one for each integer n ≥ 0. As a result, there is no nontrivial finite preclone. The corresponding notion is that of finitary preclones, which have a finite number of elements of each sort. An important class of preclones is given by the transformations T(Q) of a set Q.
The elements of sort (or rank) n are the mappings from Q^n into Q, and the (preclone)
composition operation is the usual composition of mappings. Note that T(Q) is finitary if Q is finite. It turns out that the finite Σ-labeled trees can be identified with the 0-sort of the free preclone generated by Σ. The naturally defined syntactic preclone of a tree language L is finitary if and only if L is regular. In fact, if S is the syntactic Σ-algebra of L, the syntactic preclone is the sub-preclone of T(S) generated by the elements of Σ (if σ ∈ Σ is an operation of rank r, it defines a mapping from S^r into S, and hence an element of sort r in T(S)). Note that this provides an effectively constructible description of the syntactic preclone of L. It is important to note that the class of recognizable tree languages in the preclone-theoretic sense is exactly the same as the usual one: we are simply adding more algebraic structure to the finitary minimal object associated with a regular tree language, and thus we give ourselves a more expressive language to capture families of tree languages. In order to justify the introduction of such an algebraic framework, we must show not only that it offers a well-structured framework that accounts for the basic notions concerning tree languages, but also that it allows the characterization of interesting classes of tree languages. The first objective is captured in the definition of varieties of tree languages, and their connection with pseudovarieties of finitary preclones, by means of an Eilenberg-type theorem. This is not unexpected, but it requires combinatorially much more complex proofs than in the classical word case, the details of which can be found below in Section 5.1. As for the second objective, we offer several elements. First, the readers will find in this paper a few simple but hopefully illuminating examples, which illustrate similarities and differences with the classical examples from the theory of word languages.
Second, we discuss a couple of important recent results on the characterization of certain classes of tree languages: one concerns the tree languages that are definable in the first-order logic of successors (FO(Succ)), and is due to Benedikt and Ségoufin [3]; the second one concerns the tree languages defined in the logics EF and EX, and is due to Bojańczyk and Walukiewicz [5]. Neither of these remarkable results can be expressed directly in terms of syntactic Σ-algebras; neither mentions preclones (of course), but both use mappings of arity greater than 1 on Σ-algebras, that is, they can be naturally expressed in terms of preclones, as we explain in Sections 5.2.2 and 5.2.3. It is also very interesting to note that the conditions that characterize these various classes of tree languages include the semigroup-theoretic characterization of their word language analogues, but cannot be reduced to them. Another such result, which was our original motivation to introduce the formalism of preclones, is a nice algebraic characterization of FO[<]-definable tree languages (and of a number of extensions of FO[<], such as the introduction of additional, modular quantifiers), briefly discussed in Section 5.2.4. Let us say immediately that we do not know yet whether this characterization can be turned into a decision algorithm! In order to keep this paper within a reasonable number of pages, this characterization will be the subject of another paper by the same authors [21]. The main results of this upcoming paper can be found, along with an outline of the present paper, in [20]. To summarize the plan of the paper, Section 2 introduces the algebraic framework of preclones, discussing in particular the all-important cases of free preclones, in which tree languages live (Section 2.2), and of preclones associated with tree automata (Section 2.3).
Section 2.4 discusses in some detail the notion of finite determination for a preclone, a finiteness condition different from being finitary, which is crucial in the sequel. Section 2.5 is
included for completeness (and can be skipped at first reading): its aim is to make explicit the connection between our preclones and other known algebraic structures, namely magmoids and strict monoidal categories. Recognizable tree languages are the subject of Section 3. Here tree languages are meant to be any subset of some ΣM_k, and the preclone structure on ΣM naturally induces a notion of recognizability, as well as a notion of syntactic preclone (Section 3.1). As pointed out earlier, the usual recognizable tree languages, that is, subsets of ΣM_0, fall nicely in this framework, and there is a tight connection between the minimal automaton of such a language and its syntactic preclone (Section 3.2). Specific examples are given in Section 3.3. Pseudovarieties of finitary preclones are discussed in detail in Section 4. As it turns out, this notion is not a direct translation of the classical notion for semigroups or monoids, due to the infinite number of sorts. The technical treatment of these classes is rather complex, and we deal with it thoroughly, since it is the foundation of our construction. We show in particular that pseudovarieties are characterized by their finitely determined elements (Section 4.1), and we describe the pseudovarieties generated by a given set of finitary preclones, showing in particular that membership in a 1-generated pseudovariety is decidable (Section 4.2). Finally, we introduce varieties of tree languages and we establish the variety theorem in Section 5.1. Section 5.2 presents the examples described above, based on the results by Benedikt and Ségoufin [3] and by Bojańczyk and Walukiewicz [5].

2. The algebraic framework

In this section, we introduce the notion of preclones, a multi-sorted kind of algebra which is our central tool in this paper. In the sequel, if n is an integer, [n] denotes the set of integers {1, . . . , n}. In particular, [0] denotes the empty set.

2.1.
Preclones and preclone-generators pairs

Let Q be a set and let T_n(Q) denote the set of n-ary transformations of Q, that is, mappings from Q^n to Q. Let then T(Q) be the sequence of sets of transformations T(Q) = (T_n(Q))_{n≥0}, which will be called the preclone of transformations of Q. The set T_1(Q) of transformations of Q is a monoid under the composition of functions. Composition can be considered on T(Q) in general: if f ∈ T_n(Q) and g_i ∈ T_{m_i}(Q) (1 ≤ i ≤ n), then the composite h = f(g_1, . . . , g_n), defined in the natural way, is an element of T_m(Q) where m = ∑_{i∈[n]} m_i:

h(q_{1,1}, . . . , q_{n,m_n}) = f(g_1(q_{1,1}, . . . , q_{1,m_1}), . . . , g_n(q_{n,1}, . . . , q_{n,m_n}))

for all q_{i,j} ∈ Q, 1 ≤ i ≤ n, 1 ≤ j ≤ m_i. This composition operation and its associativity properties are exactly what is captured in the notion of a preclone. In general, a preclone is a many-sorted algebra S = ((S_n)_{n≥0}, •, 1). The elements of the sets S_n, where n ranges over the nonnegative integers, are said to have rank n. The composition operation • associates with each f ∈ S_n and g_1 ∈ S_{m_1}, . . . , g_n ∈ S_{m_n} an element •(f, g_1, . . . , g_n) ∈ S_m, of rank m = ∑_{i∈[n]} m_i. We usually write f · (g_1 ⊕ · · · ⊕ g_n)
for •(f, g_1, . . . , g_n). Finally, the constant 1 is in S_1. Moreover, we require the following three equational axioms:

(f · (g_1 ⊕ · · · ⊕ g_n)) · (h_1 ⊕ · · · ⊕ h_m) = f · ((g_1 · h̄_1) ⊕ · · · ⊕ (g_n · h̄_n)),   (1)

where f, g_1, . . . , g_n are as above, h_j ∈ S_{k_j} (j ∈ [m]), and, if we denote ∑_{j∈[i]} m_j by m_[i], then h̄_i = h_{m_[i−1]+1} ⊕ · · · ⊕ h_{m_[i]} for each i ∈ [n];

1 · f = f,   (2)

f · (1 ⊕ · · · ⊕ 1) = f,   (3)
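The rank bookkeeping behind this composition can be made concrete with a small Python sketch (a hypothetical encoding, not from the paper: n-ary transformations of Q are plain functions, and the caller supplies the ranks):

```python
from itertools import accumulate

def compose(f, n, gs, ms):
    """Preclone composition in T(Q): f is n-ary, gs[i] has rank ms[i].
    Returns the composite h of rank m1 + ... + mn, with
    h(q_{1,1},...,q_{n,m_n}) = f(g1(q_{1,1},...), ..., gn(...))."""
    assert len(gs) == n == len(ms)
    offsets = [0] + list(accumulate(ms))  # argument slice boundaries
    def h(*qs):
        assert len(qs) == offsets[-1]
        return f(*(g(*qs[offsets[i]:offsets[i + 1]]) for i, g in enumerate(gs)))
    return h

# Example over Q = {0, 1}: f = binary OR, g1 = unary identity, g2 = nullary constant 1
f = lambda a, b: a | b
g1 = lambda a: a
g2 = lambda: 1
h = compose(f, 2, [g1, g2], [1, 0])  # composite of rank 1 + 0 = 1
assert h(0) == 1 and h(1) == 1       # h(q) = q OR 1
```

Note how each of the m arguments of the composite is routed to exactly one g_i, as required by the preclone (as opposed to clone) composition.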
where f ∈ S_n and 1 appears n times on the left-hand side of the last equation. Note that Axiom (1) generalizes associativity, and Axioms (2) and (3) can be said to state that 1 is a neutral element.

Remark 2.1. The elements of rank 1 of a preclone form a monoid.

It is immediately verified that T(Q), the preclone of transformations of a set Q, is indeed a preclone for the natural composition of functions, with the identity function id_Q as 1. Preclones are an abstraction of sets of n-ary transformations of a set, which generalizes the abstraction from transformation monoids to monoids.

Remark 2.2. Clones [13], or equivalently Lawvere theories [4,19], are another, more classical abstraction. Readers interested in the comparison between clones and preclones will have no difficulty tracing their differences in the sequel. We will simply point out here the fact that, in contrast with the definition of the clone of transformations of Q, each of the m arguments of the composite f(g_1, . . . , g_n) above is used in exactly one of the g_i's, the first m_1 in g_1, the next m_2 in g_2, etc.

We observe that a preclone with at least one element of rank greater than 1 must have elements of arbitrarily high rank, and hence cannot be finite. We say that a preclone S is finitary if and only if each S_n is finite. For instance, the preclone of transformations T(Q) is finitary if and only if the set Q is finite. The notions of morphism between preclones, sub-preclone, congruence and quotient are defined as usual [25,39]. Note that, as is customary for multi-sorted algebras, a morphism maps elements of rank n to elements of the same rank, and a congruence only relates elements of the same rank. To facilitate discussions, we introduce the following shorthand notation. An n-tuple (g_1, . . . , g_n) of elements of S will often be written as a formal ⊕-sum: g_1 ⊕ · · · ⊕ g_n. Moreover, if g_i ∈ S_{m_i} (1 ≤ i ≤ n), we say that g_1 ⊕ · · · ⊕ g_n has total rank m = ∑_{i∈[n]} m_i.
Finally, we denote by S_{n,m} the set of all n-tuples of total rank m. With this notation, S_{1,n} = S_n. The n-tuple 1 ⊕ · · · ⊕ 1 ∈ S_{n,n} is denoted by n. If G is a subset of S, we also denote by G_{n,m} the set of n-tuples of elements of G of total rank m. Observe that a preclone morphism φ: S → T naturally extends to a map φ: S_{n,m} → T_{n,m} for each n, m ≥ 0, by mapping g_1 ⊕ · · · ⊕ g_n to φ(g_1) ⊕ · · · ⊕ φ(g_n). For technical reasons, it will often be preferable to work with pairs (S, A) consisting of a preclone S and a (possibly empty) set A of generators of S. We call such pairs
preclone-generators pairs, or pg-pairs. The notions of morphisms and congruences must be revised accordingly: in particular, a morphism of pg-pairs from (S, A) to (T, B) must map A into B. A pg-pair (S, A) is said to be finitary if S is finitary and A is finite. Besides preclones of transformations of the form T(Q), fundamental examples of preclones and pg-pairs are the free preclones and the preclones associated with a tree automaton. These are discussed in the next sections.

2.2. Trees and free preclones

Let Σ be a ranked alphabet, say, Σ = (Σ_n)_{n≥0}, and let (v_k)_{k≥1} be a sequence of variable names. We let ΣM_n be the set of finite trees whose inner nodes are labeled by elements of Σ (according to their rank), whose leaves are labeled by elements of Σ_0 ∪ {v_1, . . . , v_n}, and whose frontier (the left-to-right sequence of variables appearing in the tree) is the word v_1 · · · v_n: that is, each variable occurs exactly once, and in the natural order. Note that ΣM_0 is the set of finite Σ-labeled trees. We let ΣM = (ΣM_n)_{n≥0}.
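Assuming a nested-tuple encoding of these trees-with-variables (hypothetical, chosen here for illustration only), the substitution-based composition described next can be sketched as:

```python
# A tree is ("var", i) for the variable v_i, or (sigma, [children]) for a letter.
def frontier(t):
    """Left-to-right sequence of variable indices occurring in t."""
    if t[0] == "var":
        return [t[1]]
    return [i for c in t[1] for i in frontier(c)]

def shift(t, d):
    """Renumber every variable v_i in t to v_{i+d}."""
    if t[0] == "var":
        return ("var", t[1] + d)
    return (t[0], [shift(c, d) for c in t[1]])

def compose(f, gs):
    """Substitute gs[i-1] for v_i in f, renumbering the variables of
    g_1, ..., g_n consecutively so the result again has frontier v_1...v_m."""
    ranks = [len(frontier(g)) for g in gs]
    offs = [sum(ranks[:i]) for i in range(len(gs))]
    def subst(t):
        if t[0] == "var":
            i = t[1]
            return shift(gs[i - 1], offs[i - 1])
        return (t[0], [subst(c) for c in t[1]])
    return subst(f)

one = ("var", 1)                         # the tree 1: a single vertex labeled v1
f = ("sigma", [("var", 1), ("var", 2)])  # a rank-2 letter viewed as a tree
g = ("sigma", [("var", 1), ("var", 2)])
t = compose(f, [g, one])                 # composite of rank 2 + 1 = 3
assert frontier(t) == [1, 2, 3]
```

The assertion checks the defining property of ΣM_n: the composite's frontier is again v_1 · · · v_m in the natural order.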
[Figure: a tree f ∈ ΣM_n with frontier v_1, v_2, . . . , v_n, and the composite tree f · (g_1 ⊕ · · · ⊕ g_n), in which the trees g_1, g_2, . . . , g_n have been substituted for the variables.]
If f ∈ ΣM_n and g_1, . . . , g_n ∈ ΣM, the composite tree f · (g_1 ⊕ · · · ⊕ g_n) is obtained by substituting the root of the tree g_i for the variable v_i in f for each i, and renumbering consecutively the variables in the frontiers of g_1, . . . , g_n. Let also 1 ∈ ΣM_1 be the tree with a single vertex, labeled v_1. Then (ΣM, ·, 1) is a preclone. Each letter σ ∈ Σ of rank n can be identified with the tree with root labeled σ, where the root's children are leaves labeled v_1, . . . , v_n. It is easily verified that every rank-preserving map from Σ to a preclone S can be extended in a unique fashion to a preclone morphism from ΣM into S. That is:

Proposition 2.3. ΣM is the free preclone generated by Σ, and (ΣM, Σ) is the free pg-pair generated by Σ.

Remark 2.4. If Σ_n = ∅ for each n ≠ 1, then ΣM_n = ∅ for all n ≠ 1, and ΣM_1 can be identified with the set of all finite words on the alphabet Σ_1. If at least one Σ_n with n > 1 is nonempty, then infinitely many ΣM_n are nonempty, and if in addition Σ_0 ≠ ∅, then each ΣM_n is nonempty.

2.3. Examples of preclones

We already discussed preclones of transformations and free preclones. The next important class of examples is that of preclones (and pg-pairs) associated with Σ-algebras and tree
automata. We also discuss a few simple examples of preclones that will be useful in the sequel.

2.3.1. Preclone associated with an automaton

Let Σ be a ranked alphabet as above and let Q be a Σ-algebra: that is, Q is a set and each element σ ∈ Σ_n defines an n-ary transformation of Q, i.e., a mapping σ_Q: Q^n → Q. Recall that Q, equipped with a set F ⊆ Q of final states, can also be described as a (deterministic, bottom-up) tree automaton accepting trees in ΣM_0, cf. [15,38,23,24,8]. More precisely, the mapping σ → σ_Q induces a morphism of Σ-algebras from ΣM_0, viewed here as the initial Σ-algebra (i.e., the algebra of Σ-terms), to Q, say, val: ΣM_0 → Q, and the tree language accepted by Q is the set val^{−1}(F) of trees which evaluate to an element of F. Now, since the elements of Σ_n can be viewed also as elements of T_n(Q), the mapping σ → σ_Q also extends to a preclone morphism φ: ΣM → T(Q), whose restriction to the rank 0 elements is exactly the morphism val. The range of φ is called the preclone associated with Q, and the pg-pair associated with Q, written pg(Q), is the pair (φ(ΣM), φ(Σ)). We observe in particular that a morphism of Σ-algebras ψ: Q → Q′ induces a morphism of pg-pairs pg(Q) → pg(Q′) in a functorial way. Conversely, if Q is a set and φ: ΣM → T(Q) is a preclone morphism such that φ(ΣM_0) = Q, letting σ_Q = φ(σ) endows the set Q with a structure of Σ-algebra, for which the associated preclone is the range of φ. In the sequel, when discussing decidability issues concerning preclones, we will say that a preclone is effectively given if it is given as the preclone associated with a finite Σ-algebra Q, that is, by a finite set of generators in T(Q). By definition, such a preclone is finitary.

2.3.2. Simple examples of preclones

The following examples of preclones and pg-pairs will be discussed throughout the rest of this paper.

Example 2.5.
Let B be the 2-element set B = {true, false}, and let T∃ be the subset of T(B) whose rank n elements are the n-ary or function and the n-ary constant true, written or_n and true_n, respectively. One verifies easily that T∃ is a preclone, which is generated by the binary or_2 function and the nullary constants true_0 and false_0. That is, if Σ consists of these three generators, T∃ is the preclone associated with the Σ-automaton whose state set is B. Moreover, the rank 1 elements of T∃ form a 2-element monoid, isomorphic to the multiplicative monoid {0, 1}, and known as U_1 in the literature on monoid theory, e.g. [31].

Example 2.6. Let p be an integer, p ≥ 2, and let B_p = {0, 1, . . . , p − 1}. We let T_p be the subset of T(B_p) whose rank n elements (n ≥ 0) are the mappings f_{n,r}: (r_1, . . . , r_n) → r_1 + · · · + r_n + r mod p, for 0 ≤ r < p. It is not difficult to verify that T_p is a preclone, and that it is generated by the nullary constant 0, the unary increment function f_{1,1} and the binary sum f_{2,0}. As in Example 2.5, T_p can be seen as the preclone associated with a p-state automaton. Moreover, its rank 1 elements form a monoid isomorphic to the cyclic group of order p.
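A quick computational check of Example 2.6, as a Python sketch (the encoding of the maps f_{n,r} is assumed here, not taken from the paper): composing f_{n,r} with f_{m_1,r_1}, . . . , f_{m_n,r_n} yields f_{m, r+r_1+···+r_n mod p}, so T_p is closed under composition.

```python
p = 5  # any p >= 2 works

def f(n, r):
    """The rank-n element f_{n,r} of T_p, as a mapping from B_p^n to B_p."""
    return lambda *args: (sum(args) + r) % p

def compose_ranks(r, rs):
    """Closure under composition: the composite of f_{n,r} with
    f_{m1,r1}, ..., f_{mn,rn} is f_{m, (r + r1 + ... + rn) mod p}."""
    return (r + sum(rs)) % p

# Check on concrete arguments: f_{2,1} composed with (f_{1,2}, f_{2,0})
lhs = f(2, 1)(f(1, 2)(4), f(2, 0)(3, 3))       # plug the g_i results into f
rhs = f(3, compose_ranks(1, [2, 0]))(4, 3, 3)  # the predicted composite f_{3,3}
assert lhs == rhs
```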
Example 2.7. Let again B = {true, false}, and let T_path be the subset of T(B) whose rank 0 elements are the nullary constants true_0 and false_0, and whose rank n elements (n > 0) are the n-ary constants true_n and false_n, and the n-ary partial disjunctions or_P (if P ⊆ [n], or_P is the disjunction of the i-th arguments, i ∈ P). One verifies easily that T_path is a preclone, which is generated by the binary or_2 function, the nullary constants true_0 and false_0, and the unary constant false_1. The rank 1 elements of T_path form a 3-element monoid, isomorphic to the multiplicative monoid {1, a, b} with xy = y for x, y ≠ 1, known as U_2 in the literature on monoid theory, e.g. [31].

2.4. Representation of preclones

Section 2.3.1 shows the importance of the representation of preclones as preclones of transformations. It is not difficult to establish the following analogue of Cayley's theorem.

Proposition 2.8. Every preclone can be embedded in a preclone of transformations.

Proof. Suppose that S is a preclone and let Q be the disjoint union of the sets S_n, n ≥ 0. For each f ∈ S_n, let f̂ be the function Q^n → Q given by f̂(g_1, . . . , g_n) = f · (g_1 ⊕ · · · ⊕ g_n). The assignment f → f̂ defines an injective morphism S → T(Q). □
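The Cayley-style representation in the proof of Proposition 2.8 can be illustrated on the preclone T_p of Example 2.6, encoding each element f_{n,r} as the pair (n, r) (an assumed encoding, for illustration): the element acts on tuples of preclone elements by composition.

```python
# Elements of T_p (Example 2.6) encoded as pairs (n, r); composition follows
# f_{n,r} · (f_{m1,r1} ⊕ ... ⊕ f_{mn,rn}) = f_{m, (r + r1 + ... + rn) mod p}.
p = 3

def dot(f, gs):
    n, r = f
    assert len(gs) == n
    m = sum(g[0] for g in gs)
    return (m, (r + sum(g[1] for g in gs)) % p)

def hat(f):
    """Cayley representation: f acts on n-tuples of preclone elements,
    hat(f)(g1, ..., gn) = f · (g1 ⊕ ... ⊕ gn)."""
    return lambda *gs: dot(f, gs)

f = (2, 1)  # f_{2,1}
assert hat(f)((1, 2), (0, 0)) == dot(f, [(1, 2), (0, 0)]) == (1, 0)
```

Since hat(f) determines f (apply it, e.g., to a tuple of identities), the assignment f → hat(f) is injective, as in the proof above.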
This result however is not very satisfactory: it does not tell us whether a finitary preclone can be embedded in the preclone of transformations of a finite set. It is actually not always the case, and this leads to the following discussion. Let k ≥ 0. We say that a preclone S is k-determined if distinct elements can be separated by k-ary equations. Formally, let ∼_k denote the following equivalence relation: for all f, g ∈ S_n (n ≥ 0),

f ∼_k g ⟺ f · h = g · h for all h ∈ S_{n,ℓ} with ℓ ≤ k.

Note that for each ℓ ≤ k, ∼_k is the identity relation on S_ℓ. We call S k-determined if the relation ∼_k is the identity relation on each S_n, n ≥ 0, and we say that S is finitely determined if it is k-determined for some integer k. We also say that a pg-pair (S, A) is k-determined (resp. finitely determined) if S is.

Example 2.9. The preclone of transformations of a set is 0-determined.

We observe the two following easy lemmas:

Lemma 2.10. For each k, ∼_k is a congruence relation.

Proof. Let f, g ∈ S_n be ∼_k-equivalent. For each i ∈ [n], let f_i, g_i ∈ S_{m_i} be such that f_i ∼_k g_i. We want to show that f · (f_1 ⊕ · · · ⊕ f_n) ∼_k g · (g_1 ⊕ · · · ⊕ g_n). Let m = ∑_{i∈[n]} m_i and let h ∈ S_{m,ℓ} for some ℓ ≤ k. Then h is an m-tuple, and we let h̄_1 be the tuple of the first m_1 terms of h, h̄_2 consist of the next m_2 elements, etc., until h̄_n,
which consists of the last m_n elements of h. Note that each h̄_i lies in some S_{m_i,ℓ_i} and that ∑_{i∈[n]} ℓ_i = ℓ. In particular, ℓ_i ≤ k for each i, and we have

f · (f_1 ⊕ · · · ⊕ f_n) · h = f · (f_1 · h̄_1 ⊕ · · · ⊕ f_n · h̄_n)
  = f · (g_1 · h̄_1 ⊕ · · · ⊕ g_n · h̄_n)
  = g · (g_1 · h̄_1 ⊕ · · · ⊕ g_n · h̄_n)
  = g · (g_1 ⊕ · · · ⊕ g_n) · h. □
Lemma 2.11. For each k ≥ 0, the quotient preclone S/∼_k is k-determined.

Proof. Let T = S/∼_k and let [f], [g] ∈ T_n, where [f] denotes the ∼_k-equivalence class of f (necessarily in S_n). Let ℓ ≤ k and assume that [f] · [h] = [g] · [h] for each h ∈ S_{n,ℓ}. Then f · h ∼_k g · h for each such h. But f · h and g · h lie in S_ℓ, and we already noted that ∼_k is the identity relation on S_ℓ (since ℓ ≤ k). Thus f · h = g · h for all h ∈ S_{n,ℓ}, and since this holds for each ℓ ≤ k, we have f ∼_k g, and hence [f] = [g]. □

We say that a preclone morphism φ: S → T is k-injective if it is injective on each S_ℓ with ℓ ≤ k. The next lemma, relating k-determination and k-injectivity, will be used to discuss embeddability of a finitary preclone in the preclone of transformations of a finite set.

Lemma 2.12. Let S be a k-determined preclone. If φ: S → T is a k-injective morphism, then φ is injective.

Proof. If φ(f) = φ(g) for some f, g ∈ S_n, then φ(f · h) = φ(f) · φ(h) = φ(g) · φ(h) = φ(g · h) for all h ∈ S_{n,ℓ} with ℓ ≤ k. Since φ is k-injective, it follows that f · h = g · h for all h ∈ S_{n,ℓ} with ℓ ≤ k, and since S is k-determined, this implies f = g. □

Proposition 2.13. Let S be a finitary and finitely determined preclone. Then there is a finite set Q such that S embeds in T(Q). If in addition S is 0-determined, the set Q can be taken equal to S_0.

Proof. We modify the construction in the proof of Proposition 2.8. Let k ≥ 0 be such that S is k-determined, and let Q = {⊥} ∪ ⋃_{i≤k} S_i, where the sets S_i are assumed to be pairwise disjoint and ⊥ is a new symbol, not in any of those sets. For each f ∈ S_n (n ≥ 0), let φ(f) = f̂: Q^n → Q be the function defined by

f̂(q_1, . . . , q_n) = f · (q_1 ⊕ · · · ⊕ q_n) if q_1 ∈ S_{m_1}, . . . , q_n ∈ S_{m_n} with ∑_{i∈[n]} m_i ≤ k, and f̂(q_1, . . . , q_n) = ⊥ otherwise.

It is easy to check that φ is a morphism. By Lemma 2.12, φ is injective. To conclude, we observe that if k = 0, we can choose Q = S_0, since ∑_{i=1}^{n} m_i ≤ 0 is possible if and only if each m_i = 0. □

For later use we also note the following technical results:
Proposition 2.14. Let S and T be preclones, with T k-determined. Let G be a (ranked) generating set of S and let φ: G → T be a rank-preserving map whose range includes all of T_ℓ, for each ℓ ≤ k. Then φ can be extended to a preclone morphism φ̄: S → T iff for all g ∈ G_n, n ≥ 0, and for all h ∈ G_{n,ℓ} with ℓ ≤ k,

φ(g · h) = φ(g) · φ(h).   (4)
Proof. Condition (4) is obviously necessary, and we show that it is sufficient. Let f ∈ S_n, n ≥ 0. Any possible image of f by a preclone morphism is an element t ∈ T_n such that, if h ∈ T_{n,ℓ} for some ℓ ≤ k and if h′ ∈ G_{n,ℓ} is such that φ(h′) = h, then t · h = φ(f · h′). Since T is k-determined and each T_ℓ (ℓ ≤ k) is in the range of φ, the element t is completely determined by f. That is, if an extension φ̄ of φ exists, then it is unique. We now show the existence of this extension. We want to assign an image to an arbitrary element f of S, and we proceed by induction on the height of an expression of f in terms of the elements of G; such an expression exists since G generates S. If f ∈ G, we let φ̄(f) = φ(f). Note also that if f = 1, then we let φ̄(f) = 1. If f ∉ G, then f = g · h for some g ∈ G ∩ S_n and some h = h_1 ⊕ · · · ⊕ h_n. By induction, the elements φ̄(h_i) are well defined for each i ∈ [n]. We then let φ̄(f) = φ(g) · (φ̄(h_1) ⊕ · · · ⊕ φ̄(h_n)). To show that φ̄(f) is well defined, we consider a different decomposition of f, say, f = g′ · h′ with h′ = h′_1 ⊕ · · · ⊕ h′_m. If f has rank ℓ ≤ k, then h ∈ S_{n,ℓ} and h′ ∈ S_{m,ℓ}, so h = h̄ and h′ = h̄′ for some h̄ ∈ G_{n,ℓ} and h̄′ ∈ G_{m,ℓ}. By Condition (4), we have

φ(g) · φ(h̄) = φ(g · h̄) = φ(f)

and by symmetry, φ(g′) · φ(h̄′) = φ(g) · φ(h̄). So φ̄ is well defined on all the elements of S of rank at most k. Now if f has rank ℓ > k, let x ∈ T_{ℓ,p} with p ≤ k. Then there exists x′ ∈ G_{ℓ,p} such that x = φ(x′). Note that h · x′ and h′ · x′ are well defined, in S_{n,p} and in S_{m,p}, respectively. In particular, φ̄(h · x′) is well defined, and equal to φ̄(h) · x. Similarly, φ̄(h′ · x′) is well defined, and equal to φ̄(h′) · x. It follows that

(φ(g) · φ̄(h)) · x = φ(g) · (φ̄(h) · x) = φ(g) · φ̄(h · x′) = φ̄(g · (h · x′)) = φ̄(f · x′).

By symmetry, (φ(g′) · φ̄(h′)) · x = (φ(g) · φ̄(h)) · x, and since T is k-determined, we have φ(g) · φ̄(h) = φ(g′) · φ̄(h′). Thus, φ̄ is well defined on S. By essentially the same argument, one verifies that φ̄ preserves composition. □

Corollary 2.15.
Let S and T be k-determined preclones that are generated by their elements of rank at most k. If there exist bijections from S_ℓ to T_ℓ for each ℓ ≤ k that preserve all compositions of the form f · g, where f ∈ S_n and g ∈ S_{n,ℓ} with n, ℓ ≤ k, then S and T are isomorphic.

Proof. Using the fact that S is generated by its elements of rank at most k and T is k-determined, Proposition 2.14 shows that the given bijections extend to a morphism from S to T. This morphism is onto since T as well is generated by its elements of rank at most k. It
is also k-injective by construction, and hence it is injective by Lemma 2.12, since S is k-determined. □

2.5. Preclones, magmoids and strict monoidal categories

The point of this short subsection is to verify the close connection between the category of preclones and the category of magmoids, cf. [2], which are in turn a special case of strict monoidal categories, cf. [28]. We recall that a magmoid is a category M whose objects are the nonnegative integers, equipped with an associative bifunctor ⊕ such that 0 ⊕ x = x = x ⊕ 0. A morphism of magmoids is a functor that preserves objects and ⊕. We say that a magmoid M is determined by its scalar morphisms if each morphism f: n → m can be written in a unique way as a ⊕-sum f_1 ⊕ · · · ⊕ f_n, where each f_i is a morphism with source 1. Moreover, there is a morphism from 0 to n if and only if n = 0 (in which case this morphism is unique).

Proposition 2.16. The category of preclones is equivalent to the full subcategory of magmoids spanned by those magmoids which are determined by their scalar morphisms.

Proof. With each preclone S, we associate a category whose objects are the nonnegative integers and whose morphisms n → m are the elements of S_{n,m}, that is, the n-tuples of elements of S of total rank m. Composition is defined in the following way: let f = f_1 ⊕ · · · ⊕ f_n ∈ S_{n,m} and g = g_1 ⊕ · · · ⊕ g_m ∈ S_{m,p}, and suppose that f_i ∈ S_{m_i}, i ∈ [n] (so that m = ∑_{i∈[n]} m_i). For each i, let ḡ_i = g_{m_1+···+m_{i−1}+1} ⊕ · · · ⊕ g_{m_1+···+m_i}. Then we let f · g = f_1 · ḡ_1 ⊕ · · · ⊕ f_n · ḡ_n. The identity morphism at object n is the n-tuple 1 ⊕ · · · ⊕ 1. Note that when n = 0, this is the unique morphism 0 → 0, and there are no morphisms from 0 to n if n ≠ 0. One may then regard ⊕ as a bifunctor S × S → S that maps a pair (f, g), with f = f_1 ⊕ · · · ⊕ f_n ∈ S_{n,p} and g = g_1 ⊕ · · · ⊕ g_m ∈ S_{m,q}, to the morphism f_1 ⊕ · · · ⊕ f_n ⊕ g_1 ⊕ · · · ⊕ g_m from n + m to p + q. Then S, equipped with the bifunctor ⊕, is a magmoid.
Moreover, S, as a magmoid, is determined by its scalar morphisms. It is clear that each preclone morphism determines a functor between the corresponding magmoids which is the identity function on objects and preserves ⊕, and is thus a morphism of magmoids. Conversely, if M is a magmoid determined by its scalar morphisms, then its morphisms with source 1 constitute a preclone S; moreover, M is isomorphic to the magmoid determined by S. □

3. Recognizable tree languages

As discussed in the introduction, the theory of (regular) tree languages is well developed [23,24,8] (see also Section 2.3.1 above). Here we slightly extend the notion of tree languages, to mean any subset of some ΣM_k, k ≥ 0. In the classical setting, tree languages are subsets of ΣM_0.
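Before formalizing recognizability, recall the automaton view of Section 2.3.1: a tree language is recognized through the evaluation morphism val and a set F of final states. For the 2-state algebra of Example 2.5, this can be sketched in Python (a hypothetical tuple encoding of trees, not from the paper):

```python
# Bottom-up evaluation val: trees over {or2, true0, false0} -> B = {True, False},
# for the 2-state algebra of Example 2.5.  A tree is (symbol, [children]).
ops = {
    "or2":    lambda a, b: a or b,
    "true0":  lambda: True,
    "false0": lambda: False,
}

def val(t):
    sym, children = t
    return ops[sym](*[val(c) for c in children])

def in_L(t):
    """Membership in L = val^{-1}({True}): the trees that evaluate to True,
    i.e. the trees containing at least one true0 leaf."""
    return val(t)

t1 = ("or2", [("false0", []), ("or2", [("true0", []), ("false0", [])])])
t2 = ("or2", [("false0", []), ("false0", [])])
assert in_L(t1) and not in_L(t2)
```

This is exactly the restriction to rank 0 of the preclone morphism φ: ΣM → T(B) of Section 2.3.1, so L is recognizable in the sense defined next.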
302
Z. Ésik, P. Weil / Theoretical Computer Science 340 (2005) 291 – 321
The preclone structure on M, described in Section 2, leads in a standard fashion to a definition of recognizable tree languages [17,30,9,12,40]. This is discussed in some detail in Section 3.1. As we will see in Section 3.2, recognizability extends the classical notion of regularity for tree languages, and it gives us richer algebraic tools to discuss these languages. Further examples are given in Section 3.3.

3.1. Syntactic preclones

Suppose that φ : M → S is a preclone morphism, or a morphism φ : (M, Σ) → (S, A). We say that a subset L of Mk is recognized by φ if L = φ−1(φ(L)), or equivalently, if L = φ−1(F) for some F ⊆ Sk. Moreover, we say that L is recognized by S, or by (S, A), if L is recognized by some morphism M → S or (M, Σ) → (S, A). Finally, we say that a subset L of Mk is recognizable if it is recognized by a finitary preclone, or pg-pair. As usual, the notion of recognizability can be expressed equivalently by stating that L is saturated by some locally finite congruence on M, that is, L is a union of classes of a congruence which has finite index on each sort [10,11,40].

With every subset L ⊆ Mk, recognizable or not, we associate a congruence on M, called the syntactic congruence of L. This relation is defined as follows. First, an n-ary context in Mk is a tuple (u, k1, v, k2) where
• k1, k2 are nonnegative integers,
• u ∈ M(k1+1+k2), and
• v = v1 ⊕ · · · ⊕ vn ∈ Mn,ℓ, with k = k1 + ℓ + k2.
We say that (u, k1, v, k2) is an L-context of an element f ∈ Mn if u · (k1 ⊕ f · v ⊕ k2) ∈ L. Recall that k denotes the ⊕-sum of k terms equal to 1. Below, when k1 and k2 are clear from the context (or do not play any role), we will write just (u, v) to denote the context (u, k1, v, k2).
For each f, g ∈ Mn, we let f ∼L g if and only if f and g have the same L-contexts.

Proposition 3.1. The relation ∼L, associated with a subset L of Mk, is a preclone congruence which saturates L.

Proof. Suppose that f, f′ ∈ Mn and g, g′ ∈ Mn,m with f ∼L f′, g = g1 ⊕ · · · ⊕ gn, g′ = g′1 ⊕ · · · ⊕ g′n and gi ∼L g′i for each 1 ≤ i ≤ n. We prove that f · g ∼L f′ · g′. Let mi be
the rank of gi and g′i, so that m = Σi∈[n] mi, and consider any m-ary context (u, k1, v, k2) in Mk. Then v ∈ Mm,ℓ with ℓ = k − (k1 + k2). Thus, v is an m-tuple, and we let w1 be the ⊕-sum of the first m1 terms of v, w2 be the ⊕-sum of the following m2 terms of v, etc., until finally wn is the ⊕-sum of the last mn terms of v. In particular, we may write v = w1 ⊕ · · · ⊕ wn. Since f ∼L f′, we have
u · (k1 ⊕ f · g · v ⊕ k2) ∈ L ⟺ u · (k1 ⊕ f′ · g · v ⊕ k2) ∈ L.
It suffices to consider the n-ary context (u, k1, g · v, k2), where g · v = (g1 ⊕ · · · ⊕ gn) · v stands for g1 · w1 ⊕ · · · ⊕ gn · wn. Moreover, since gi ∼L g′i, we have
u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′(i−1) · w(i−1) ⊕ gi · wi ⊕ g(i+1) · w(i+1) ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′(i−1) · w(i−1) ⊕ g′i · wi ⊕ g(i+1) · w(i+1) ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
for each 1 ≤ i ≤ n. To justify this statement, it suffices to consider the following mi-ary context (for gi and g′i):
(u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′(i−1) · w(i−1) ⊕ 1 ⊕ g(i+1) · w(i+1) ⊕ · · · ⊕ gn · wn) ⊕ k2), k1 + ℓ(i,1), wi, ℓ(i,2) + k2),
where ℓ(i,1) is the sum of the ranks of w1, . . . , w(i−1), and ℓ(i,2) is the sum of the ranks of w(i+1), . . . , wn. We now have
u · (k1 ⊕ f · g · v ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · (g1 · w1 ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · (g′1 · w1 ⊕ g2 · w2 ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
...
⟺ u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′n · wn) ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · g′ · v ⊕ k2) ∈ L.
This completes the proof that ∼L is a congruence. Next we observe that an element f ∈ Mk is in L if and only if the k-ary context (1, k) is an L-context of f; it follows immediately that ∼L saturates L.

We denote by (ML, ΣL) the quotient pg-pair (M/∼L, Σ/∼L), called the syntactic pg-pair of L. ML is the syntactic preclone of L and the projection morphism ηL : M → ML, or ηL : (M, Σ) → (ML, ΣL), is the syntactic morphism of L. We note the following, expected result:

Proposition 3.2. The syntactic congruence of a subset L of Mk is the coarsest preclone congruence which saturates L. A preclone morphism φ : M → S (resp. a morphism of pg-pairs φ : (M, Σ) → (S, A)) recognizes L if and only if φ can be factored through the syntactic morphism ηL. In particular, L is recognizable if and only if ∼L is locally finite, if and only if ML is finitary.
Proof. Let ≈ be a congruence saturating L and assume that f, g ∈ Mn are ≈-equivalent. Let (u, k1, v, k2) be an n-ary context: then u · (k1 ⊕ f · v ⊕ k2) ≈ u · (k1 ⊕ g · v ⊕ k2), and since ≈ saturates L, u · (k1 ⊕ f · v ⊕ k2) ∈ L iff u · (k1 ⊕ g · v ⊕ k2) ∈ L. Since this holds for all n-ary contexts in Mk, it follows that f ∼L g.

We also note that syntactic preclones are finitely determined.

Proposition 3.3. The syntactic preclone of a subset L of Mk is k-determined.

Proof. We show that if f, g ∈ Mn and f · h ∼L g · h for all h ∈ Mn,ℓ with ℓ ≤ k, then f ∼L g, that is, f and g have the same L-contexts. Let (u, k1, v, k2) be an L-context of f. Note in particular that v ∈ Mn,p with k = k1 + p + k2. It follows that f · v ∈ Mp, and that f · v ∼L g · v. Moreover, (u, k1, p, k2) is an L-context of f · v. But in that case, (u, k1, p, k2) is also an L-context of g · v, and hence (u, k1, v, k2) is an L-context of g, which concludes the proof.

3.2. The usual notion of regular tree languages

We now turn to tree languages in the usual sense, that is, subsets of M0. For these sets, there exists a well-known notion of (bottom-up) tree automaton, whose expressive power is equivalent to monadic second-order definability, to certain rational expressions, and to recognizability by a finite Σ-algebra [23,24] (see Section 2.3.1). The tree languages captured by these mechanisms are said to be regular. It is an essential remark (Theorem 3.4 below) that the regular tree languages are exactly the subsets of M0 that are recognized by a finitary preclone. Recall that the minimal tree automaton of a regular tree language is the least deterministic tree automaton accepting it; the Σ-algebra associated with this automaton is called the syntactic Σ-algebra of the language.
It is characterized by the fact that the natural morphism from the initial Σ-algebra to the syntactic Σ-algebra of L factors through every morphism of Σ-algebras which recognizes L (see Section 2.3.1 and [23,24,1]).

Theorem 3.4. A tree language L ⊆ M0 is recognizable if and only if it is regular. Moreover, the syntactic preclone (resp. pg-pair) of L is the preclone (resp. pg-pair) associated with its syntactic Σ-algebra.

Proof. Let Q be the syntactic Σ-algebra of L, let (S, A) be its syntactic pg-pair, and let η : (M, Σ) → (S, A) be its syntactic morphism. As discussed in Section 2.3.1, the pg-pair associated with Q, pg(Q), recognizes L, and hence the syntactic morphism of L factors through an onto morphism of pg-pairs pg(Q) → (S, A). In particular, if L is regular, then Q is finite, so pg(Q) is finitary, and so is (S, A): thus L is recognizable.

Conversely, assume that L is recognizable, so that (S, A) is finitary. Since (S, A) is also 0-determined (Proposition 3.3), it is isomorphic to a sub-pg-pair of T(S0) by Proposition 2.13. Using again the discussion in Section 2.3.1, S0 has a natural structure of Σ-algebra (via the morphism η), such that (S, A) = pg(S0) and such that S0 recognizes L as a Σ-algebra. In particular, L is recognized by a finite Σ-algebra, and hence L is regular.
Moreover, the recognizing morphism M0 → S0 is the restriction to M0 of η, the syntactic morphism of L. Therefore there exists an onto morphism of Σ-algebras S0 → Q, which in turn induces a morphism of preclones from (S, A) = pg(S0) onto pg(Q). Since Q and S0 are finite, it follows that the morphisms between them described above are isomorphisms, and this implies that pg(Q) is isomorphic to (S, A).

While not difficult, Theorem 3.4 is important because it shows that we are not introducing a new class of recognizable tree languages. We are simply associating with each regular tree language a finitary algebraic structure which is richer than its syntactic Σ-algebra (a.k.a. minimal deterministic tree automaton). This theorem also implies that the syntactic pg-pair of a recognizable tree language has an effectively computable finite presentation.

Remark 3.5. If L ⊆ M0, the definition of the syntactic congruence of L involves the consideration of n-ary contexts in M0. Such contexts are necessarily of the form (u, 0, v, 0), where u ∈ M1 and v ∈ Mn,0, which somewhat simplifies matters.

3.3. More examples of recognizable tree languages

The examples in this section are directly related to the preclones discussed in Section 2.3.2. Let Σ be a ranked Boolean alphabet, that is, a ranked alphabet such that each Σn is either empty or equal to {0n, 1n}, and Σ0 and at least one Σn (n ≥ 2) are nonempty. Let k ≥ 0 be an integer.

3.3.1. Verifying the occurrence of a letter

Let Kk(∃) be the set of all trees in Mk containing at least one vertex labeled 1n (for some n). Then Kk(∃) is recognizable, by a morphism into the preclone T∃ (see Example 2.5). Let φ : M → T∃ be the morphism of preclones given by φ(0n) = orn (and φ(00) = false0) and φ(1n) = truen whenever Σn ≠ ∅. It is not difficult to verify that φ−1(truek) = Kk(∃). Moreover, φ(Σ) contains a generating set of T∃, so φ is onto, and the syntactic morphism of Kk(∃) factors through φ.
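Restricted to ground trees (k = 0), evaluating the morphism φ into T∃ is just a bottom-up disjunction. The following Python sketch uses our own tree encoding (a pair of a bit and a list of children), which is an assumption for illustration and not the paper's notation.

```python
# Illustration (our own encoding): a ground tree over the Boolean ranked
# alphabet is (bit, children).  Evaluating phi into T_exists:
#   phi(1_n) = true_n   -- constant true, ignoring the arguments,
#   phi(0_n) = or_n     -- disjunction of the children's values
#   (phi(0_0) = false_0, the empty disjunction).
# The result is true exactly when some vertex is labeled 1.

def phi_exists(tree):
    bit, children = tree
    if bit == 1:
        return True
    return any(phi_exists(c) for c in children)
```

For example, a tree whose only 1-labeled vertex sits below the root evaluates to true, while an all-0 tree evaluates to false.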
But T∃ has at most 2 elements of each rank, so any proper quotient M′ of T∃ has exactly one element of rank n for some integer n. One can then show that M′ cannot recognize Kk(∃). Thus the syntactic pg-pair of Kk(∃) is (T∃, φ(Σ)).

If Σ is any ranked alphabet such that Σ0 and at least one Σn (n > 1) are nonempty, if Σ′ is a proper nonempty subset of Σ, and Kk(Σ′) is the set of all trees in Mk containing at least one node labeled in Σ′, then Kk(Σ′) too has syntactic preclone T∃. The verification of this fact can be done using a morphism of preclones mapping each letter of rank n to 1n if it lies in Σ′, and to 0n otherwise.

3.3.2. Counting the occurrences of a letter

Let p, r be integers such that 0 ≤ r < p and let Kk(∃rp) consist of the trees in Mk such that the number of vertices labeled 1n (for some n) is congruent to r modulo p. Then Kk(∃rp) is recognizable, by a morphism into the preclone Tp (see Example 2.6). Let indeed φ : M → Tp be the morphism given by φ(0n) = fn,0 and φ(1n) = fn,1 whenever Σn ≠ ∅. Then one verifies that φ−1(fk,r) = Kk(∃rp). Moreover, φ(Σ) contains a generating set of Tp, so φ is onto, and the syntactic morphism of Kk(∃rp) factors through
φ. An elementary verification then establishes that no proper quotient of Tp can recognize Kk(∃rp), and hence the syntactic pg-pair of Kk(∃rp) is (Tp, φ(Σ)). As above, this can be extended to recognizing the set of all trees in Mk where the number of nodes labeled in some proper nonempty subset of Σ is congruent to r modulo p.

Using the same idea, one can also handle tree languages defined by counting the number of occurrences of certain letters modulo p threshold q. It suffices to consider, in analogy with the mod p case, the languages of the form Kk(∃rp,q), and the preclone Tp,q, a sub-preclone of T(Bp+q), whose rank n elements are the mappings fr : (r1, . . . , rn) → r1 + · · · + rn + r, where the sum is taken modulo p threshold q. Note that this notion generalizes both examples above, since Tp = Tp,0 and T∃ = T1,1.

3.3.3. Identification of a path

Let Kk(path) be the set of all trees in Mk such that all the vertices along at least one maximal path from the root to a leaf are labeled 1n (for the appropriate values of n). Then Kk(path) is recognized by the preclone Tpath (see Example 2.7). Let indeed φ : M → Tpath be the morphism given by φ(0n) = falsen, φ(10) = true0 and φ(1n) = orn (n ≠ 0). One can then verify that φ−1(truek) = Kk(path).

3.3.4. Identification of the next modality

Let Kk(next) consist of all trees in Mk such that each maximal path has length at least two and the children of the root are labeled 1n (for the appropriate n). We show that Kk(next) is recognizable. Recall that B = {true, false}, and let φ : M → T(B × B) be the morphism given as follows:
• φ(00) is the nullary constant (false, false)0,
• φ(10) is the nullary constant (false, true)0,
• if n > 0, then φ(0n) is the n-ary map ((x1, y1), . . . , (xn, yn)) → (∧i yi, false),
• if n > 0, then φ(1n) is the n-ary map ((x1, y1), . . . , (xn, yn)) → (∧i yi, true).
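The four clauses above can be run directly on ground trees. The Python sketch below uses our own (bit, children) encoding, an assumption for illustration: the second component of the computed pair records whether the root is labeled 1, and the first component records membership in the language.

```python
# Illustration (our own encoding): evaluate the pair-valued morphism phi
# into T(B x B) on a ground tree (bit, children).
#   second component: "the root is labeled 1"
#   first component:  "the root is internal and all its children are
#                      labeled 1", i.e. membership in K_0(next).

def phi_next(tree):
    bit, children = tree
    if not children:
        # nullary constants (false, false)_0 and (false, true)_0
        return (False, bit == 1)
    pairs = [phi_next(c) for c in children]
    first = all(y for (_x, y) in pairs)   # /\_i y_i
    return (first, bit == 1)

def in_K_next(tree):
    return phi_next(tree)[0]
```

A root with two 1-labeled children is accepted; a lone leaf, or a root with a 0-labeled child, is not.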
One can verify by structural induction that for each element x ∈ Mk, the second component of φ(x) is true if and only if the root of x is labeled 1n for some n, and the first component of φ(x) is true if and only if every child of the root of x is labeled 1n for some n, that is, if and only if x ∈ Kk(next). Thus Kk(next) is recognized by the morphism φ.

4. Pseudovarieties of preclones

In the usual setting of one-sorted algebras, a pseudovariety is a class of finite algebras closed under taking finite direct products, sub-algebras and quotients. Because we are dealing with preclones, which are infinitely sorted, we need to consider finitary algebras instead of finite ones, and to adopt more constraining closure properties in the definition. (We discuss in Remark 4.18 an alternative approach, which consists in introducing stricter finiteness conditions on the preclones themselves, namely in considering only finitely generated, finitely determined, finitary preclones.)

We say that a class of finitary preclones is a pseudovariety if it is closed under finite direct product, sub-preclones, quotients, finitary unions of ω-chains and finitary inverse limits of
ω-diagrams. Here, we say that a union T = ∪n T(n) of an ω-chain of preclones T(n), n ≥ 0, is finitary exactly when T is finitary. Finitary inverse limits lim←n T(n) of ω-diagrams ψn : T(n+1) → T(n), n ≥ 0, are defined in the same way.

Remark 4.1. To be perfectly rigorous, we actually require pseudovarieties to be closed under taking preclones isomorphic to a finitary ω-union or to a finitary inverse limit of an ω-diagram of their elements.

Remark 4.2. Recall that the inverse limit T of the ω-diagram (ψn)n≥0, written T = lim←n T(n) if the ψn : T(n+1) → T(n) are clear, is the sub-preclone of the direct product ∏n T(n) whose set of elements of rank m consists of those sequences (xn)n≥0, with xn an element of rank m of T(n), such that ψn(xn+1) = xn for all n ≥ 0. We call the coordinate projections πp : lim←n T(n) → T(p) the induced projection morphisms. They fit in a commutative diagram in which T projects, via the πn, onto each term of the chain · · · → T(n+1) → T(n) → · · · → T(0).
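The compatible-sequence description of an inverse limit can be made concrete with a small sketch. The instance below is our own illustration, under the assumption that each T(n) is the set {0, . . . , n} with ψn the truncation min(·, n); each natural number k then corresponds to the compatible sequence n ↦ min(k, n).

```python
# Sketch (our own instance, for illustration): an element of lim T^(n)
# is a sequence (x_n) with psi_n(x_{n+1}) = x_n.  Take T^(n) = {0,...,n}
# and psi_n = truncation at n.

def psi(n, x):
    """Projection T^(n+1) -> T^(n): truncate at n."""
    return min(x, n)

def element(k):
    """The compatible sequence representing the natural number k."""
    return lambda n: min(k, n)

def compatible(x, upto):
    """Check psi_n(x_{n+1}) = x_n for n = 0, ..., upto - 1."""
    return all(psi(n, x(n + 1)) == x(n) for n in range(upto))
```

For instance, the sequence for k = 3 reads 0, 1, 2, 3, 3, 3, . . . and satisfies the compatibility condition at every level.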
The inverse limit has the following universal property: whenever S is a preclone and the morphisms φn : S → T(n) satisfy φn = ψn ◦ φn+1 for each n ≥ 0, then there is a unique morphism φ : S → lim←n T(n) with πn ◦ φ = φn for all n. This morphism maps an element s ∈ S to the sequence (φn(s))n≥0.

Example 4.3. Here we show that the inverse limit of an ω-diagram of 1-generated finitary preclones need not be finitary. Let Σ = {σ}, where σ has rank 1, and consider the free preclone ΣM. Note that ΣM has only elements of rank 1, and that ΣM1 can be identified with the monoid σ*. For each n ≥ 0, let ≈n be the congruence defined by letting σ^k ≈n σ^ℓ if and only if k = ℓ, or k, ℓ ≥ n. Let T(n) = ΣM/≈n. Then T(n) is again generated by σ, and it can be identified with the monoid {0, 1, . . . , n} under addition threshold n. In particular, T(n) is a finitary preclone. Since ≈n+1-equivalent elements of ΣM are also ≈n-equivalent, there is a natural morphism of preclones from T(n+1) to T(n), mapping σ to itself, and the inverse limit of the resulting ω-diagram is ΣM itself, which is not finitary.

Pseudovarieties of preclones can be characterized using the notion of division: we say that a preclone S divides a preclone T, written S < T, if S is a quotient of a sub-preclone of T. It is immediately verified that a nonempty class of finitary preclones is a pseudovariety if and only if it is closed with respect to division, binary direct product, finitary unions of ω-chains and finitary inverse limits of ω-diagrams.

Example 4.4. It is immediate that the intersection of a collection of pseudovarieties of preclones is a pseudovariety. It follows that if K is a class of finitary preclones, then the pseudovariety generated by K is well defined, as the least pseudovariety containing K. In
particular, the elements of this pseudovariety, written ⟨K⟩, can be described in terms of the elements of K, taking sub-preclones, quotients, direct products, finitary unions of ω-chains and inverse limits of ω-diagrams. See Section 4.2 below. We discuss other examples in Section 5.2.

We first explore the relation between pseudovarieties and their finitely determined elements, then we discuss pseudovarieties generated by a class of preclones, and finally, we explore some additional closure properties of pseudovarieties.

4.1. Pseudovarieties and their finitely determined elements

Proposition 4.5. Let S be a preclone.
• S is isomorphic to the inverse limit lim←n S(n) of an ω-diagram, where each S(n) is an n-determined quotient of S.
• If S is finitary, then S is isomorphic to the union of an ω-chain ∪n≥0 T(n), where each T(n) is the inverse limit of an ω-diagram of finitely generated, finitely determined divisors of S.

Proof. Let S(n) = S/∼n (where ∼n is defined in Section 2.4) and let πn : S → S(n) be the corresponding projection. Since ∼n+1-related elements of S are also ∼n-related, there exists a morphism of preclones ψn : S(n+1) → S(n) such that πn = ψn ◦ πn+1. Thus the πn determine a morphism φ : S → lim←n S(n), such that φ(s) = (πn(s))n for each s ∈ S (Remark 4.2). Moreover, since ∼n is the identity relation on the elements of S of rank at most n, we find that for each k ≤ n, πn establishes a bijection between the elements of rank k of S and those of S(n). In particular, φ is injective, since each element of S has rank k for some finite integer k. Furthermore, for each k ≤ n, πn establishes a bijection between the elements of rank k, and it follows that each element of rank k of lim←n S(n) is the φ-image of its kth component. That is, φ is onto. Finally, Lemma 2.11 shows that each S(n) is n-determined. This concludes the proof of the first statement.

We now assume that S is finitary, and we let T(m) be the sub-preclone generated by the elements of S of rank at most m.
Then T(m) is finitely generated, and the first statement shows that T(m) is the inverse limit of an ω-diagram of finitely generated, finitely determined quotients of T(m), which are in particular divisors of S.

The following corollary is immediate:

Corollary 4.6. Every pseudovariety of preclones is uniquely determined by its finitely generated, finitely determined elements.

We can go a little further, and show that a pseudovariety is determined by the syntactic preclones it contains.

Proposition 4.7. Let S be a finitely generated, k-determined, finitary preclone, let A be a finite ranked set and let φ : AM → S be an onto morphism. Then S divides the direct product
of the syntactic preclones of the languages φ−1(s), where s runs over the (finitely many) elements of S of rank at most k.

Proof. It suffices to show that if x, y ∈ AMn for some n ≥ 0 and x ∼φ−1(s) y for each s ∈ Sℓ, ℓ ≤ k, then φ(x) = φ(y).

First, suppose that x and y have rank n ≤ k, and let s = φ(x). Then (1, 0, n, 0) is a φ−1(s)-context of x, so it is also a φ−1(s)-context of y, and we have φ(y) = s = φ(x). Now, if x and y have rank n > k, let v ∈ Sn,p for some p ≤ k. Since φ is onto, there exists an element z ∈ AMn,p such that φ(z) = v. For each s ∈ Sℓ, ℓ ≤ k, we have x ∼φ−1(s) y, and hence also x · z ∼φ−1(s) y · z. The previous discussion shows therefore that φ(x · z) = φ(y · z), that is, φ(x) · v = φ(y) · v. Since S is k-determined, it follows that φ(x) = φ(y).

Corollary 4.8. Every pseudovariety of preclones is uniquely determined by the syntactic preclones it contains.

Proof. This follows directly from Corollary 4.6 and Proposition 4.7.
4.2. The pseudovariety generated by a class of preclones

Let I, H, S, P, L, U denote, respectively, the operators of taking all isomorphic images, homomorphic images, sub-preclones, finite direct products, finitary inverse limits of ω-diagrams, and finitary ω-unions over a class of finitary preclones. The following fact is a special case of a well-known result in universal algebra.

Lemma 4.9. If K is a class of finitary preclones, then HSP(K) is the least class of finitary preclones containing K, closed under homomorphic images, sub-preclones and finite direct products.

Next, we observe the following elementary facts:

Lemma 4.10. For all classes K of finitary preclones, we have
(1) PL(K) ⊆ LP(K),
(2) PU(K) ⊆ UP(K),
(3) SL(K) ⊆ LS(K),
(4) SU(K) ⊆ US(K).
Proof. To prove the first inclusion, suppose that S is the direct product of the finitary preclones S(i), i ∈ [n], where each S(i) is the limit of an ω-diagram of preclones S(i,k) in K, determined by a family of morphisms ψi,k : S(i,k+1) → S(i,k), k ≥ 0. For each k, let T(k) be the direct product ∏i∈[n] S(i,k), and let ψk = ∏i∈[n] ψi,k : T(k+1) → T(k). It is a routine matter to verify that S is isomorphic to the limit of the ω-diagram determined by the family of morphisms ψk : T(k+1) → T(k), k ≥ 0. Thus, S ∈ LP(K).

To prove the second inclusion, let (S(i,k))k≥0, for each i ∈ [n], be an ω-chain of finitary preclones in K. Let us assume that each S(i) = ∪k≥0 S(i,k) is finitary, and let S = ∏i∈[n] S(i). If s = (s1, . . . , sn) ∈ S, then each si belongs to S(i,ki) for some ki. Thus s ∈ ∏i∈[n] S(i,k), where k = max ki, and we have shown that S = ∪k≥0 ∏i∈[n] S(i,k), so that S ∈ UP(K).
To prove the third inclusion, let T be a sub-preclone of lim←n S(n), the finitary inverse limit of an ω-diagram ψn : S(n+1) → S(n) of elements of K. Let πn : T → S(n) be the natural projections (restricted to T), and let T(n) = πn(T). Then T(n) is a sub-preclone of S(n) for each n. Moreover, the restrictions of the ψn to T(n+1) define an ω-diagram of sub-preclones of elements of K, and it is an elementary verification that T = lim←n T(n). Since T is finitary, we have proved that T ∈ LS(K).

As for the last inclusion, let T be a sub-preclone of a finitary union ∪k≥0 S(k) with S(k) ∈ K for all k ≥ 0. Let T(k) = S(k) ∩ T for each k ≥ 0. Then each T(k) is a sub-preclone of S(k) and T = ∪k≥0 T(k). It follows that T ∈ US(K).

Our proof of the third inclusion actually yields the following result.

Corollary 4.11. If a finitary preclone S embeds in an inverse limit lim←n S(n), then S is isomorphic to a (finitary) inverse limit lim←n T(n), where each T(n) is a finitary sub-preclone of S(n).

We can be more precise than Lemma 4.10 for what concerns finitely generated, finitely determined preclones.

Lemma 4.12. Let T be a preclone which embeds in the union of an ω-chain (S(n))n. If T is finitely generated, then T embeds in S(n) for all large enough n.

Proof. Since T is finitely generated, its set of generators is entirely contained in some S(k), and hence T embeds in each S(n), n ≥ k.

Lemma 4.13. Let T be a quotient of the union of an ω-chain (S(n))n. If T is finitely generated, then T is a quotient of S(n) for all large enough n.

Proof. Let φ be a surjective morphism from S = ∪n S(n) onto T. Since T is finitely generated, there exists an integer k such that φ(S(k)) contains all the generators of T, and this implies that the restriction of φ to S(k) (and to each S(n), n ≥ k) is onto.

Lemma 4.14. Let T be a preclone which embeds in the inverse limit lim←n S(n) of an ω-diagram, and for each n, let πn : T → S(n) be the natural projection (restricted to T).
If T is finitary, then for each k, πn is k-injective for all large enough n. If in addition T is finitely determined, then T embeds in S(n) for all large enough n.

Proof. Since T is finitary, Tk is finite for each integer k, and hence there exists an integer nk such that πn is injective on Tk for each n ≥ nk. In particular, for each integer k, πn is k-injective for all large enough n. The last part of the statement follows from Lemma 2.12.

Lemma 4.15. Let T be a quotient of the finitary inverse limit lim←n S(n) of an ω-diagram. If T is finitely determined, then T is a quotient of a sub-preclone of one of the S(n).
Proof. Let S = lim←n S(n) and let πn : S → S(n) be the corresponding projection. Let also φ : S → T be an onto morphism, and let k ≥ 0 be an integer such that T is k-determined. By Lemma 4.14, πn is k-injective for some integer n. Consider the preclone πn(S) ⊆ S(n). We claim that the assignment πn(s) ↦ φ(s) defines a surjective morphism πn(S) → T. The only nontrivial point is to verify that this assignment is well defined. Let s, s′ ∈ Sp and suppose that πn(s) = πn(s′). We want to show that φ(s) = φ(s′), and for that purpose, we show that φ(s) · v = φ(s′) · v for each v ∈ Tp,ℓ, ℓ ≤ k (since T is k-determined). Since φ is onto, there exists w ∈ Sp,ℓ such that v = φ(w). In particular, φ(s) · v = φ(s · w) and similarly, φ(s′) · v = φ(s′ · w). Moreover, we have πn(s · w) = πn(s′ · w). Now s · w and s′ · w lie in Sℓ, and πn is injective on Sℓ, so s · w = s′ · w. It follows that φ(s) · v = φ(s′) · v, and hence φ(s) = φ(s′).

We are now ready to describe the finitely generated, finitely determined elements of the pseudovariety generated by a given class of finitary preclones.

Proposition 4.16. Let K be a class of finitary preclones. A finitely generated, finitely determined, finitary preclone belongs to the pseudovariety ⟨K⟩ generated by K if and only if it divides a finite direct product of preclones in K, i.e., it lies in HSP(K).

Proof. It is easily verified that ⟨K⟩ = ∪n Vn, where V0 = K and Vn+1 = HSPUHSPL(Vn). We show by induction on n that if T is a finitely generated, finitely determined preclone in Vn, then T ∈ HSP(K). The case n = 0 is trivial, and we now assume that T ∈ Vn+1. By Lemma 4.10, T lies in HUSPHLSP(Vn). Then Lemma 4.13 shows that T is in fact in HSPHLSP(Vn), which is equal to HSPLSP(Vn) by Lemma 4.9, and is contained in HLSP(Vn) by Lemma 4.10 again. Now Lemma 4.15 shows that T lies in fact in HSP(Vn), and we conclude by induction that T ∈ HSP(K).

Corollary 4.17. If K is a class of finitary preclones, then ⟨K⟩ = IULHSP(K).

Proof. The containment IULHSP(K) ⊆ ⟨K⟩
is immediate. To show the reverse inclusion, we consider a finitary preclone T ∈ ⟨K⟩. Then T = ∪n T(n), where T(n) denotes the sub-preclone of T generated by the elements of rank at most n. Now each T(n) is finitely generated, and by Proposition 4.5, it is isomorphic to the inverse limit of the ω-diagram formed by the finitely generated, finitely determined preclones T(n)/∼m, m ≥ 0. By Proposition 4.16, each of these preclones is in HSP(K), so T ∈ IULHSP(K).

Remark 4.18. As indicated in the first paragraph of Section 4, Proposition 4.16 hints at an alternative treatment of the notion of pseudovarieties of preclones, limited to the consideration of finitely generated, finitely determined, finitary preclones. Say that a class K of finitely generated, finitely determined, finitary preclones is a relative pseudovariety if, whenever a finitely generated, finitely determined, finitary preclone S divides a finite direct product of preclones in K, then S is in fact in K. For each pseudovariety V, the class Vfin of all its finitary, finitely generated, finitely determined members is a relative pseudovariety, and the map V → Vfin is injective by Corollary 4.6. Moreover, Proposition 4.16 can be used
to show that this map is onto. That is, the map V → Vfin is an order-preserving bijective correspondence (with respect to the inclusion order) between pseudovarieties and relative pseudovarieties of preclones.

Proposition 4.16 also leads to the following useful result. Recall that a finitely generated preclone S is effectively given if we are given a finite generating set A as transformations of finite arity of a given finite set Q, see Section 2.3.1.

Corollary 4.19. Let S and T be effectively given, finitely generated, finitely determined preclones. Then it is decidable whether T belongs to the pseudovariety of preclones generated by S.

Proof. Let A (resp. B) be the given set of generators of S (resp. T) and let V be the pseudovariety generated by S. By Proposition 4.16, T ∈ V if and only if T divides a direct power of S, say, T < S^m. Since B is finite, almost all the sets Bk are empty. We claim that the exponent m can be bounded by
∏ over those k with Bk ≠ ∅ of |Ak|^|Bk|.
Indeed, there exists a sub-preclone S′ ⊆ S^m and an onto morphism S′ → T. Since B generates T, we may assume without loss of generality that this morphism defines a bijection from a set A′ of generators of S′ to B, and in particular, we may identify Bk with A′k, a subset of (Ak)^m. Next, one verifies that if m is greater than the bound in the claim, then there exist 1 ≤ i < j ≤ m such that for all k and x ∈ A′k, the ith and the jth components of x are equal; but then the exponent can be decreased by 1. Thus, it suffices to test whether or not T divides S^m, where m is given by the above formula. But as discussed above, this holds if and only if A^m contains a set A′ and a rank-preserving bijection from A′ to B which can be extended to a morphism from the sub-preclone of S^m generated by A′ to T. By Proposition 2.14, and since S and T are effectively given and T is finitely determined, this can be checked algorithmically.

4.3. Closure properties of pseudovarieties

Here we record additional closure properties of pseudovarieties of preclones.

Lemma 4.20. Let V be a pseudovariety of preclones and let T be a finitary preclone. If T embeds in the inverse limit of an ω-diagram of preclones in V, then T ∈ V.

Proof. The lemma follows immediately from Corollary 4.11.
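Looking back at Corollary 4.19, the exponent bound used in its proof is a simple arithmetic quantity. The Python sketch below computes it from the rank profile of the two generating sets; the rank data in the test is invented purely for illustration.

```python
from math import prod

# Bound on the exponent m from the proof of Corollary 4.19: given
# |A_k| (generators of S of rank k) and |B_k| (generators of T of
# rank k), m may be taken at most the product, over the ranks k with
# B_k nonempty, of |A_k| ** |B_k|.

def exponent_bound(a_sizes, b_sizes):
    """a_sizes[k] = |A_k|, b_sizes[k] = |B_k|; absent ranks count as 0."""
    return prod(a_sizes.get(k, 0) ** bk
                for k, bk in b_sizes.items() if bk > 0)
```

For instance, with |A0| = 2, |A2| = 3 and |B0| = 1, |B2| = 2, the bound is 2 · 3² = 18, so only divisions T < S^m with m ≤ 18 need to be tested.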
Proposition 4.21. Let V be a pseudovariety of preclones and let S be a finitary preclone. If for each n ≥ 0 there exists a morphism φn : S → S(n) such that S(n) ∈ V and φn is injective on elements of rank exactly n, then S ∈ V.

Proof. Without loss of generality, we may assume that each φn is surjective. For each n ≥ 0, consider the direct product T(n) = S(0) × · · · × S(n), which is in V, and let ψn denote the
natural projection of T(n+1) onto T(n). Let also ηn : S → T(n) be the target tupling of the morphisms φi, i ≤ n, let T be the inverse limit lim←n T(n) determined by the morphisms ψn, and let πn : T → T(n) be the corresponding projection morphisms. Note that each ηn is n-injective, and equals the composite of ηn+1 and ψn. Thus, there exists a (unique) morphism η : S → T such that the composite of η and πn is ηn for each n. It follows from the n-injectivity of each ηn that η is injective. Thus, S embeds in the inverse limit of an ω-diagram of preclones in V, and we conclude by Lemma 4.20.

We note the following easy corollary of Proposition 4.21:

Corollary 4.22. Let V be a pseudovariety of preclones. Let S be a finitary preclone such that distinct elements of equal rank can be separated by a morphism from S to a preclone in V. Then S ∈ V.

Proof. For any distinct elements f, g of equal rank n, let φf,g : S → Sf,g be a morphism such that Sf,g ∈ V and φf,g(f) ≠ φf,g(g). For any integer n, let φn be the target tupling of the finite collection of morphisms φf,g with f, g ∈ Sn. Then φn is injective on Sn and we conclude by Proposition 4.21.

4.4. Pseudovarieties of pg-pairs

The formal treatment of pseudovarieties of pg-pairs is similar to the above treatment of pseudovarieties of preclones, but for the following remarks. We define a pseudovariety of pg-pairs to be a class of finitary pg-pairs closed under finite direct product, sub-pg-pairs, quotients and finitary inverse limits of ω-diagrams. Our first remark is that, in this case, we do not need to mention finitary unions of ω-chains: indeed, finitary pg-pairs are finitely generated, so the union of an ω-chain, if it is finitary, amounts to a finite union.

Next, the notion of inverse limit of ω-diagrams of pg-pairs needs some clarification. Consider a sequence of morphisms of pg-pairs, say ψn : (S(n+1), A(n+1)) → (S(n), A(n)). That is, each ψn is a preclone morphism from S(n+1) to S(n) which maps A(n+1) into A(n).
We can then form the inverse limit limn S (n) of the -diagram determined by the preclone morphisms n , and the inverse limit limn A(n) determined by the set mappings n . The inverse limit limn (S (n) , A(n) ) of the -diagram determined by the morphisms of pg-pairs n (as determined by the appropriate universal limit, see Remark 4.2) is the pg-pair (S, A), where A = limn A(n) and S is the subpreclone of limn S (n) generated by A. Recall that this inverse limit is called finitary exactly when S is finitary and A is finite (see Example 4.3). We now establish the close connection between this inverse limit and the inverse limit of the underlying -diagram of preclones, when the latter is finitary. Proposition 4.23. Let n : (S (n+1) , A(n+1) ) → (S (n) , A(n) ) be an -diagram of pg-pairs. Let S = limn S (n) and let and (T , A) = limn (S (n) , A(n) ). If S is finitary, then S = T . Proof. We need to show that A generates S. Without loss of generality, we may assume that each n maps A(n+1) surjectively onto A(n) , and we denote by n the restriction of n to
314
Z. Ésik, P. Weil / Theoretical Computer Science 340 (2005) 291 – 321
A(n+1) . By definition, A is the inverse limit of the -diagram given by the n , and we denote by n : A → A(n) the corresponding projection. We also denote by n and n the extensions of these mappings to preclone morphisms A(n+1) M → A(n) M and AM → A(n) M. It is not difficult to verify that AM is the inverse limit of the -diagram given by the n , and that the n are the corresponding projections.
Moreover, each k is onto (even from A to A(k) ). Let indeed ak ∈ A(k) . Since the n are onto, we can define by induction a sequence (an )n k such that n (an+1 ) = an for each n k. This sequence can be completed with the iterated images of ak by k−1 , . . . , 0 to yield an element of A whose kth projection is ak . Since A(n) generates S (n) , the morphism n : A(n) M → S (n) induced by idA(n) is surjective. Moreover, the composites n ◦ n+1 and n ◦ n coincide.
It follows that the morphisms n ◦ n : AM → S (n) and n ◦ n+1 ◦ n+1 coincide, and hence there exists a morphism : AM → S such that n ◦ = n ◦ n for each n. Since n and n are onto, it follows that each n is surjective. We now use the fact that S is finitary. By Lemma 4.14, n is k-injective for each large enough n. Now let s ∈ Sk . We want to show that s ∈ (AM). Let nk be such that n is k-injective for each n nk . We can choose an element tnk ∈ A(nk ) M such that
nk (tnk ) = nk (s). Then, by induction, we can construct a sequence (tn )n of elements such that n (tn+1 ) = tn for each n 0. We need to show that n (tn ) = n (s) for each n. This equality is immediate for n nk , and we assume by induction that it holds for some n nk . We have
n ( n+1 (tn+1 )) = n ( n (tn+1 )) = n (tn ) = n (s) = n ( n+1 (tn+1 )). Since n and n+1 are surjective, since they are injective on Sk , and since n ◦ n+1 = n , (n+1) we find that n is injective on Sk , and hence n+1 (tn+1 ) = n+1 (s), as expected. Thus (tn )n ∈ AM and (t) = s, which concludes the proof that S is generated by A.
5. Varieties of tree languages

Let V = (V_{Σ,k})_{Σ,k} be a collection of nonempty classes of recognizable tree languages L ⊆ ΣM_k, where Σ runs over the finite ranked alphabets and k runs over the nonnegative integers. We call V a variety of tree languages, or a tree language variety, if each V_{Σ,k} is closed under the Boolean operations, and V is closed under inverse morphisms between free preclones generated by finite ranked sets, and under quotients defined as follows. Let L ⊆ ΣM_k be a tree language, let k_1 and k_2 be nonnegative integers, u ∈ ΣM_{k_1+1+k_2} and v ∈ ΣM_{n,k}. Then the left quotient (u, k_1, k_2)^{−1}L and the right quotient Lv^{−1} are defined by

(u, k_1, k_2)^{−1}L = {t ∈ ΣM_n | u · (k_1 ⊕ t ⊕ k_2) ∈ L}, where k = k_1 + n + k_2,
Lv^{−1} = {t ∈ ΣM_n | t · v ∈ L},

that is, (u, k_1, k_2)^{−1}L is the set of elements of ΣM_n for which (u, k_1, n, k_2) is an L-context, and Lv^{−1} is the set of elements of ΣM_n for which (1, 0, v, 0) is an L-context. Below we will write just u^{−1}L for (u, k_1, k_2)^{−1}L if k_1 and k_2 are understood, or play no role.

A literal variety of tree languages is defined similarly, but instead of closure under inverse morphisms between finitely generated free preclones, we require closure under inverse morphisms between finitely generated free pg-pairs. Thus, if L ⊆ ΣM_k is in a literal variety V and φ : ΔM → ΣM is a preclone morphism with Σ, Δ finite and φ(Δ) ⊆ Σ, then φ^{−1}(L) is also in V.

5.1. Varieties of tree languages vs. pseudovarieties of preclones

The aim of this section is to prove an Eilenberg correspondence between pseudovarieties of preclones (resp. pg-pairs), and varieties (resp. literal varieties) of tree languages. For each pseudovariety V of preclones (resp. pg-pairs), let var(V) = (V_{Σ,k})_{Σ,k}, where V_{Σ,k} denotes the class of the tree languages L ⊆ ΣM_k whose syntactic preclone (resp. pg-pair) belongs to V. It follows from Proposition 3.2 that var(V) consists of all those tree languages that can be recognized by a preclone (resp. pg-pair) in V.
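For intuition, the left and right quotients defined above have a direct word-language analogue (rank-free, with concatenation in place of tree substitution); the following sketch and its function names are illustrative, not from the paper:

```python
def left_quotient(u, L):
    # word analogue of the left quotient u^{-1}L = {t | u t in L}
    return {w[len(u):] for w in L if w.startswith(u)}

def right_quotient(L, v):
    # word analogue of the right quotient L v^{-1} = {t | t v in L}
    return {w[:len(w) - len(v)] for w in L if w.endswith(v)}

L = {"ab", "aab", "ba"}
print(left_quotient("a", L))   # {'b', 'ab'}
print(right_quotient(L, "b"))  # {'a', 'aa'}
```

As in the tree case, a recognizable word language has only finitely many distinct quotients.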
Conversely, if W is a variety (resp. a literal variety) of tree languages, we let psv(W) be the class of all finitary preclones (resp. pg-pairs) S that only accept languages in W, i.e., such that φ^{−1}(F) ⊆ ΣM_k belongs to W for all morphisms φ : ΣM → S (resp. φ : (ΣM, Σ) → (S, A)), k ≥ 0 and F ⊆ S_k.

Theorem 5.1. The mappings var and psv are mutually inverse lattice isomorphisms between the lattice of pseudovarieties of preclones (resp. pg-pairs) and the lattice of varieties (resp. literal varieties) of tree languages.

Proof. We only prove the theorem for pseudovarieties of pg-pairs and literal varieties of tree languages. It is clear that for each pseudovariety V of finitary pg-pairs, if var(V) = (V_{Σ,k})_{Σ,k}, then each V_{Σ,k} is closed under complementation and contains the languages ∅ and ΣM_k. The closure of V_{Σ,k} under union follows in the standard way from the closure of V under direct product: if L, L′ ⊆ ΣM_k are recognized by morphisms into pg-pairs (S, A) and (S′, A′) in V, then L ∪ L′ is recognized by a morphism into (S, A) × (S′, A′). Thus V_{Σ,k} is closed under the Boolean operations.

We now show that V is closed under quotients. Let L ⊆ ΣM_k be in V_{Σ,k}, let φ : (ΣM, Σ) → (S, A) be a morphism recognizing L, with (S, A) ∈ V and L = φ^{−1}(φ(L)), and let F = φ(L). Let (u, k_1, v, k_2) be an n-ary context, that is, u ∈ ΣM_{k_1+1+k_2}, v ∈ ΣM_{n,ℓ} and k_1 + ℓ + k_2 = k. Now let F′ = {f ∈ S_ℓ | φ(u) · (k_1 ⊕ f ⊕ k_2) ∈ F}. Then for any t ∈ ΣM_ℓ, φ(t) ∈ F′ if and only if φ(u) · (k_1 ⊕ φ(t) ⊕ k_2) ∈ F, if and only if φ(u · (k_1 ⊕ t ⊕ k_2)) ∈ F, if and only if u · (k_1 ⊕ t ⊕ k_2) ∈ L. Thus φ^{−1}(F′) = (u, k_1, k_2)^{−1}L, which is therefore in V_{Σ,ℓ}. Now let F″ = {f ∈ S_n | f · φ(v) ∈ F}. It follows as above that Lv^{−1} = φ^{−1}(F″) and hence Lv^{−1} ∈ V_{Σ,n}.

Before we proceed, let us observe that we just showed the following: if L ⊆ ΣM_k is a recognizable tree language, then for each n ≥ 0 there are only finitely many distinct sets of the form ((u, k_1, k_2)^{−1}L)v^{−1}, where (u, k_1, v, k_2) is an n-ary context of ΣM_k.

Next, let φ : (ΔM, Δ) → (ΣM, Σ) be a morphism of pg-pairs and L ⊆ ΣM_k. If L is recognized by a morphism ψ : (ΣM, Σ) → (S, A), then φ^{−1}(L) is recognized by the composite morphism ψ ∘ φ, and the closure of V under inverse morphisms between free pg-pairs follows immediately. Thus the mapping var does associate with each pseudovariety of pg-pairs a literal variety of tree languages, and it clearly preserves the inclusion order.

Now consider the mapping psv: we first verify that if W is a literal variety of tree languages, then the class psv(W) is a pseudovariety. Recall that, if (S, A) < (T, B), then any language recognized by (S, A) is also recognized by (T, B), so if each language recognized by (T, B) belongs to W, then the same holds for (S, A). Note also that any language recognized by the direct product (S, A) × (T, B) is a finite union of intersections of the form L ∩ M, where L is recognized by (S, A) and M by (T, B); thus psv(W) is closed under binary direct products.

Finally, if (S, A) = lim_n (S^(n), A^(n)) is the finitary inverse limit of an ω-diagram of finitary pg-pairs, then Lemma 4.14 shows that the languages recognized by (S, A) are recognized by almost all of the (S^(n), A^(n)). Thus (S, A) ∈ psv(W), which concludes the proof that psv(W) is a pseudovariety of pg-pairs.

Let W be a literal variety of tree languages, and let V = var(psv(W)). We now show that V = W. Since V consists of all the tree languages recognized by a pg-pair in psv(W), it is clear that V ⊆ W. Now let L ∈ W_{Σ,k}, and let (M_L, A_L) be its syntactic pg-pair. To prove that (M_L, A_L) ∈ psv(W), it suffices to show that if φ : (ΔM, Δ) → (M_L, A_L) is a morphism of pg-pairs and x ∈ M_L, then φ^{−1}(x) ∈ W. Since a morphism of pg-pairs maps generators to generators, up to renaming and identifying letters (which can be done by morphisms between free pg-pairs), we may assume that φ is the syntactic morphism of L. Thus φ^{−1}(x) is an equivalence class [w] of the syntactic congruence of L, and hence

φ^{−1}(x) = ⋂_{w ∈ ((u,k_1,k_2)^{−1}L)v^{−1}} ((u, k_1, k_2)^{−1}L)v^{−1} ∩ ⋂_{w ∉ ((u,k_1,k_2)^{−1}L)v^{−1}} ((u, k_1, k_2)^{−1}L̄)v^{−1},

where L̄ denotes the complement of L. If x has rank n, the intersections in this formula run over the n-ary contexts (u, k_1, v, k_2), and as observed above, these intersections are in fact finite. It follows that φ^{−1}(x) ∈ W. This concludes the verification that V = W, so var ∘ psv is the identity mapping, and in particular var is surjective and psv is injective.
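The two-sided contexts used in the proof above have a word-language analogue, where the syntactic class of w is determined by the set of pairs (u, v) such that uwv ∈ L; a small illustrative sketch (the function names and the length truncation are ours, not from the paper):

```python
from itertools import product as iproduct

def syntactic_classes(L, alphabet, max_len):
    # Group words by their sets of L-contexts (u, v) with u + w + v in L;
    # words and contexts are truncated to length max_len for finiteness.
    words = ["".join(p) for n in range(max_len + 1)
             for p in iproduct(alphabet, repeat=n)]
    def ctx(w):
        return frozenset((u, v) for u in words for v in words
                         if u + w + v in L)
    classes = {}
    for w in words:
        classes.setdefault(ctx(w), []).append(w)
    return list(classes.values())

print(syntactic_classes({"ab"}, "ab", 2))
# [[''], ['a'], ['b'], ['aa', 'ba', 'bb'], ['ab']]
```

For a regular language, taking max_len large enough stabilizes the partition, mirroring the finiteness of the intersections observed in the proof.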
It is clear that both maps var and psv preserve the inclusion order. In order to conclude that they are mutually inverse bijections, it suffices to verify that var is injective. If V and W are pseudovarieties such that var(V) = var(W), then a tree language has its syntactic preclone in V if and only if its syntactic preclone is in W. Thus V and W contain the same syntactic preclones, and it follows from Corollary 4.8 that V = W.

Remark 5.2. Three further variety theorems for finite trees exist in the literature. They differ from the variety theorem proved above in that they use different notions of morphism, quotient and syntactic algebra. The variety theorem in [1,35] is formulated for tree language varieties over a fixed ranked alphabet, and the morphisms are homomorphisms between finitely generated free algebras, whereas the "general variety theorem" of [36] allows for tree languages over different ranked alphabets and a more general notion of morphism, closely related to the morphisms of free pg-pairs. On the other hand, the morphisms in [19] are much more general than those in [1,35,36] or in the present paper: they even include nonlinear tree morphisms, which allow for the duplication of a variable. Another difference is that the tree language varieties in [1,35,36] involve only left quotients, whereas the varieties presented here (and those of [19]) are defined using two-sided quotients. The notion of syntactic algebra is also different in these papers: minimal tree automata in [1,35], a variant of minimal tree automata in [36], minimal clones (or Lawvere theories) in [19], and minimal preclones, or pg-pairs, here. We refer to [19, Section 14] for a more detailed comparative discussion.
As noted above, the abundance of variety theorems for finite trees is due to the fact that there are several reasonable ways of defining morphisms and quotients, and the choice of these notions is reflected in the corresponding notion of syntactic algebra. No variety theorem is known for the 3-sorted algebras proposed in [41].

5.2. Examples of varieties of tree languages

5.2.1. Small examples
As a practice example, we describe the variety of tree languages associated with the pseudovariety ⟨T∃⟩ generated by T∃ (see Section 2.3.2). Let Σ be a finite ranked alphabet and let L ⊆ ΣM_k be a tree language accepted by a preclone in ⟨T∃⟩. Then the syntactic preclone S of L lies in ⟨T∃⟩. Recall that a syntactic preclone is finitely generated and finitely determined: it follows from Proposition 4.16 that S divides a product of a finite number of copies of T∃. By a standard argument, L is therefore a (positive) Boolean combination of languages recognized by a morphism from ΣM to T∃.

Now let φ : ΣM → T∃ be a morphism. As discussed in Section 3.3, a tree language in ΣM recognized by φ is either of the form K_k(Σ′) for some Σ′ ⊆ Σ, or it is the complement of such a language. From there, and using the same reasoning as in the analogous case concerning word languages, one can verify that a language L ⊆ ΣM_k is accepted by a preclone in ⟨T∃⟩ if and only if L is a Boolean combination of languages of the form K_k(Σ′) (Σ′ ⊆ Σ), or equivalently, L is a Boolean combination of languages of the form L_k(Σ′), Σ′ ⊆ Σ, where L_k(Σ′) is the set of all Σ-trees of rank k for which the set of node labels is exactly Σ′.
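The languages L_k(Σ′), consisting of the trees whose set of node labels is exactly Σ′, are easy to test for membership on a toy tree encoding; the encoding and the helper names below are ours, not the paper's formalism:

```python
# A tree is encoded as (label, [subtrees]).
def labels(t):
    lab, children = t
    s = {lab}
    for c in children:
        s |= labels(c)
    return s

def in_L_exact(t, sigma_prime):
    # membership in L(Sigma'): the node-label set of t is exactly Sigma'
    return labels(t) == set(sigma_prime)

t = ("f", [("a", []), ("g", [("a", [])])])
print(labels(t))                       # the set {'f', 'a', 'g'}
print(in_L_exact(t, {"f", "g", "a"}))  # True
print(in_L_exact(t, {"f", "g"}))       # False
```

Boolean combinations of such tests capture exactly the languages described above.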
Similarly, and referring again to Section 3.3 for notation, one can give a description of the variety of tree languages associated with the pseudovariety generated by T_p, or by T_{p,q}, using the languages of the form K_k(∃^r_p) or K_k(∃^r_{p,q}) instead of the K_k(∃).

5.2.2. FO[Succ]-definable tree languages
In a recent paper [3], Benedikt and Ségoufin considered the class of FO[Succ]-definable tree languages. Note that the logical language FO[Succ] does not allow the predicate <, so FO[Succ] is a fragment of FO[<]. We refer the reader to [3] for precise definitions, and we point out here that the characterization established there can be expressed in the framework developed in the present paper. More precisely, the results of Benedikt and Ségoufin establish that FO[Succ]-definable tree languages form a variety of languages, and that the corresponding pseudovariety of preclones consists of the preclones S such that
(1) the semigroup S_1 satisfies x^ℓ = x^{ℓ+1} and exfyezf = ezfyexf for all elements e, f, x, y, z such that e = e² and f = f², where ℓ = |S_1|;
(2) for each x ∈ S_2, e ∈ S_1 such that e = e², and s, t ∈ S_0, we have x · (e·s ⊕ e·t) = x · (e·t ⊕ e·s).
In particular, FO[Succ]-definability is decidable for regular tree languages.

It is argued in [3] that FO[Succ]-definable tree languages are exactly the locally threshold testable languages, for general model-theoretic reasons, but that this fact alone does not directly yield a decision procedure. The result stated above is analogous to the characterization of FO[Succ]-definability for recognizable word languages: more precisely, Condition (1) suffices for languages of words and their syntactic semigroups. Condition (2), which makes sense for trees but not for words, must be added to Condition (1) to characterize FO[Succ]-definability for tree languages.

5.2.3.
Some classes of languages definable in modal logic
Bojańczyk and Walukiewicz also characterized interesting logically defined classes of tree languages [5]. Again, their results are not couched in terms of preclones, but they can conveniently be expressed in this way. These authors consider three fragments of CTL*: TL(EX), TL(EF) and TL(EX + EF). Here EX (resp. EF) denotes the modality whereby a tree t satisfies EXφ (resp. EFφ) if some child of the root (resp. some node properly below the root) of t satisfies φ. The sets of formulas constructed using one or both of these modalities, plus Boolean operations and letter constants, form the logical languages TL(EX), TL(EF) and TL(EX + EF).

Bojańczyk and Walukiewicz first observe that a tree language L is TL(EX)-definable if and only if there exists an integer k such that membership of a tree t in L depends only on the fragment of t consisting of the nodes of depth at most k. They then show that these tree languages form a variety, and that the corresponding pseudovariety of preclones consists of the preclones S such that the semigroup S_1 satisfies ex = e for each idempotent e. Note that this is exactly the same characterization as for languages of finite words [31].

For the characterization of TL(EF)-definable languages, let us first define the following relation ≤ on a preclone S: if s, t ∈ S_n, we say that s ≤ t if s = u · t for some u ∈ S_1. It is easily verified that ≤ is a quasi-order. (The direction of the order is reversed from that used by Bojańczyk and Walukiewicz, to enhance the analogy with the R- and L-orders in semigroup theory.)
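The TL(EX) observation, that membership in L depends only on the fragment of nodes of depth at most k, can be experimented with on a toy tree encoding (label, [subtrees]); the encoding and names are ours:

```python
def truncate(t, k):
    # keep only the fragment of t consisting of nodes of depth at most k
    lab, children = t
    if k == 0:
        return (lab, [])
    return (lab, [truncate(c, k - 1) for c in children])

# For a TL(EX)-definable L with bound k, two trees with equal depth-k
# fragments are indistinguishable: truncate(t, k) == truncate(s, k)
# implies (t in L) == (s in L).
t = ("f", [("a", [("b", [])]), ("g", [])])
print(truncate(t, 1))  # ('f', [('a', []), ('g', [])])
```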
Now let (S, A) be the syntactic pg-pair of a tree language L ⊆ ΣM_0. Then L is TL(EF)-definable if and only if
• S_1 satisfies the pseudoidentity v(uv)^ω = (uv)^ω (where x^ω designates the unique idempotent which is a power of x); that is, S_1 is L-trivial, or equivalently, the relation ≤ is an order relation;
• a · (s_1 ⊕ ··· ⊕ s_n) = a · (s_σ(1) ⊕ ··· ⊕ s_σ(n)) for each a ∈ A_n and s_1, ..., s_n ∈ S_0, and for each permutation σ of [n];
• a · (s_1 ⊕ s_2 ⊕ s_3 ⊕ ··· ⊕ s_n) = a · (s_2 ⊕ s_2 ⊕ s_3 ⊕ ··· ⊕ s_n) for each a ∈ A_n and s_1, ..., s_n ∈ S_0 such that s_2 ≤ s_1;
• if b, c ∈ A_p and y ∈ S_{p,0} are such that, for each d ∈ A_p, we have d · (b·y ⊕ ··· ⊕ b·y) = d · y = d · (c·y ⊕ ··· ⊕ c·y), then a · (z ⊕ b·y) = a · (z ⊕ c·y) for each a ∈ A_n and z ∈ S_{n−1,0}.
This characterization directly implies the decidability of TL(EF)-definability.

Bojańczyk and Walukiewicz also give an interesting characterization of the TL(EF)-definable languages in terms of so-called type dependency. In particular, they show that a tree language is TL(EF)-definable if and only if its syntactic preclone S is such that, whenever a is the syntactic equivalence class of a letter in Σ_n, and the t_i are syntactic equivalence classes of trees in ΣM_0, then the value of the product a · (t_1 ⊕ ··· ⊕ t_n) depends only on a and on the set {t | t_i ≤ t for some 1 ≤ i ≤ n}.

The characterization of TL(EX + EF)-definable languages given in [5] can also be restated in similar, albeit more complex, terms.

5.2.4. FO[<]-definable tree languages
The characterization and decidability of FO[<]-definable regular tree languages is an open problem that has attracted some efforts over the years, as discussed in the introduction. We obtained an algebraic characterization of FO[<]-definable regular tree languages in terms of pseudovarieties of preclones, as reported in [20]. A detailed proof of this result will appear in [21], and the present paper lays the foundations for that proof.
Let us note here that this characterization is analogous to the characterization of FO[<]-definable languages of finite words in the following sense: it is established in [21] that FO[<]-definable tree languages form a variety of tree languages, whose associated pseudovariety of preclones is the least pseudovariety containing the preclone T∃ and closed under a suitable notion of block product. It was pointed out in Example 2.5 that the rank 1 elements of T∃ form the 2-element monoid U_1 = {1, 0}, and it is a classical result of language theory that the least pseudovariety of monoids containing U_1 and closed under block product is associated with the variety of FO[<]-definable word languages [37]. It is also known that, in the word case, this pseudovariety is exactly that of aperiodic monoids, and membership in it is decidable, which shows that FO[<]-definability is decidable for recognizable word languages. At the moment, we do not have an analogue of this result, and we do not know whether FO[<]-definability is decidable for regular tree languages.

Our result [20,21] actually applies to a larger class of logically defined regular tree languages, based on the use of Lindström quantifiers. First-order logic is thus a particular case of our result, which also yields (for instance) an algebraic characterization of first-order logic with modular quantifiers added.
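In the word case just mentioned, aperiodicity is decidable by direct inspection of a finite monoid's multiplication table, since a finite monoid M is aperiodic if and only if x^n = x^{n+1} for n = |M|; a small sketch, with the table encoding ours:

```python
def is_aperiodic(elements, mul):
    # mul[x][y] is the product xy; test x^n == x^{n+1} with n = len(elements)
    n = len(elements)
    for x in elements:
        p = x
        for _ in range(n - 1):
            p = mul[p][x]      # after the loop, p = x^n
        if mul[p][x] != p:     # x^{n+1} != x^n: a nontrivial group hides in M
            return False
    return True

# U1 = {1, 0}, as in the text: aperiodic.
u1 = {"1": {"1": "1", "0": "0"}, "0": {"1": "0", "0": "0"}}
print(is_aperiodic(["1", "0"], u1))  # True

# The two-element group Z/2Z is not.
z2 = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
print(is_aperiodic([0, 1], z2))      # False
```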
References

[1] J. Almeida, On pseudovarieties, varieties of languages, filters of congruences, pseudoidentities and related topics, Algebra Univ. 27 (1990) 333–350.
[2] A. Arnold, M. Dauchet, Théorie des magmoïdes I and II, RAIRO Theoret. Inform. Appl. 12 (1978) 235–257; 13 (1979) 135–154 (in French).
[3] M. Benedikt, L. Ségoufin, Regular tree languages definable in FO, in: V. Diekert, B. Durand (Eds.), STACS 2005, Lecture Notes in Computer Science, Vol. 3404, Springer, Berlin, 2005, pp. 327–339.
[4] S.L. Bloom, Z. Ésik, Iteration Theories, Springer, Berlin, 1993.
[5] M. Bojańczyk, I. Walukiewicz, Characterizing EF and EX tree logics, in: P. Gardner, N. Yoshida (Eds.), CONCUR 2004, Lecture Notes in Computer Science, Vol. 3170, Springer, Berlin, 2004.
[6] J.R. Büchi, Weak second-order arithmetic and finite automata, Z. Math. Logik Grundlagen Math. 6 (1960) 66–92.
[7] J. Cohen, J.-E. Pin, D. Perrin, On the expressive power of temporal logic, J. Comput. System Sci. 46 (1993) 271–294.
[8] H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, M. Tommasi, Tree Automata Techniques and Applications, available at http://www.grappa.univ-lille3.fr/tata (release October 2002).
[9] B. Courcelle, The monadic second-order logic of graphs. I. Recognizable sets of finite graphs, Inform. and Comput. 85 (1990) 12–75.
[10] B. Courcelle, Basic notions of universal algebra for language theory and graph grammars, Theoret. Comput. Sci. 163 (1996) 1–54.
[11] B. Courcelle, The expression of graph properties and graph transformations in monadic second order logic, in: G. Rozenberg (Ed.), Handbook of Graph Grammars and Computing by Graph Transformations, Vol. 1, World Scientific, Singapore, 1997, pp. 313–400.
[12] B. Courcelle, P. Weil, The recognizability of sets of graphs is a robust property, Theoret. Comput. Sci., to appear.
[13] K. Denecke, S.L. Wismath, Universal Algebra and Applications in Theoretical Computer Science, Chapman & Hall, New York, 2002.
[14] V. Diekert, Combinatorics on Traces, Lecture Notes in Computer Science, Vol. 454, Springer, Berlin, 1990.
[15] J. Doner, Tree acceptors and some of their applications, J. Comput. System Sci. 4 (1970) 406–451.
[16] S. Eilenberg, Automata, Languages, and Machines, Vols. A and B, Academic Press, New York, 1974, 1976.
[17] S. Eilenberg, J.B. Wright, Automata in general algebras, Inform. and Control 11 (1967) 452–470.
[18] C.C. Elgot, Decision problems of finite automata design and related arithmetics, Trans. Amer. Math. Soc. 98 (1961) 21–51.
[19] Z. Ésik, A variety theorem for trees and theories, Publ. Math. 54 (1999) 711–762.
[20] Z. Ésik, P. Weil, On logically defined recognizable tree languages, in: P.K. Pandya, J. Radhakrishnan (Eds.), Proc. FST TCS 2003, Lecture Notes in Computer Science, Vol. 2914, Springer, Berlin, 2003, pp. 195–207.
[21] Z. Ésik, P. Weil, Algebraic characterization of logically defined tree languages, in preparation.
[22] D.M. Gabbay, A. Pnueli, S. Shelah, J. Stavi, On the temporal analysis of fairness, in: Proc. 12th ACM Symp. Principles of Programming Languages, Las Vegas, 1980, pp. 163–173.
[23] F. Gécseg, M. Steinby, Tree Automata, Akadémiai Kiadó, Budapest, 1984.
[24] F. Gécseg, M. Steinby, Tree languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 3, Springer, Berlin, 1997, pp. 1–68.
[25] G. Grätzer, Universal Algebra, Springer, Berlin, 1979.
[26] U. Heuter, First-order properties of trees, star-free expressions, and aperiodicity, in: R. Cori, M. Wirsing (Eds.), STACS 88, Lecture Notes in Computer Science, Vol. 294, Springer, Berlin, 1988, pp. 136–148.
[27] J.A. Kamp, Tense logic and the theory of linear order, Ph.D. Thesis, UCLA, 1968.
[28] S. MacLane, Categories for the Working Mathematician, Springer, Berlin, 1971.
[29] R. McNaughton, S. Papert, Counter-Free Automata, MIT Press, Cambridge, MA, 1971.
[30] J. Mezei, J.B. Wright, Algebraic automata and context-free sets, Inform. and Control 11 (1967) 3–29.
[31] J.-E. Pin, Variétés de langages formels, Masson, Paris, 1984 (English translation: Varieties of Formal Languages, Plenum, New York, 1986).
[32] A. Potthoff, Modulo counting quantifiers over finite trees, in: J.-C. Raoult (Ed.), CAAP '92, Lecture Notes in Computer Science, Vol. 581, Springer, Berlin, 1992, pp. 265–278.
[33] A. Potthoff, First order logic on finite trees, in: P.D. Mosses, M. Nielsen, M.I. Schwartzbach (Eds.), TAPSOFT '95, Lecture Notes in Computer Science, Vol. 915, Springer, Berlin, 1995, pp. 125–139.
[34] M.P. Schützenberger, On finite monoids having only trivial subgroups, Inform. and Control 8 (1965) 190–194.
[35] M. Steinby, A theory of tree language varieties, in: M. Nivat, A. Podelski (Eds.), Tree Automata and Languages, North-Holland, Amsterdam, 1992, pp. 57–81.
[36] M. Steinby, General varieties of tree languages, Theoret. Comput. Sci. 205 (1998) 1–43.
[37] H. Straubing, Finite Automata, Formal Logic, and Circuit Complexity, Birkhäuser, Boston, MA, 1994.
[38] J.W. Thatcher, J.B. Wright, Generalized finite automata theory with an application to a decision problem of second-order logic, Math. Systems Theory 2 (1968) 57–81.
[39] W. Wechler, Universal Algebra, EATCS Monographs on Theoretical Computer Science, Vol. 10, Springer, Berlin, 1992.
[40] P. Weil, Algebraic recognizability of languages, in: J. Fiala, V. Koubek, J. Kratochvíl (Eds.), MFCS 2004, Lecture Notes in Computer Science, Vol. 3153, Springer, Berlin, 2004, pp. 149–175.
[41] Th. Wilke, An algebraic characterization of frontier testable tree languages, Theoret. Comput. Sci. 154 (1996) 85–106.
Theoretical Computer Science 340 (2005) 322 – 333 www.elsevier.com/locate/tcs
Commutation with codes

Juhani Karhumäki^a,∗, Michel Latteux^b, Ion Petre^c

a Department of Mathematics, University of Turku and Turku Centre for Computer Science, Turku 20014, Finland
b LIFL, URA CNRS 369, Université des Sciences et Technologies de Lille, F-59655 Villeneuve d'Ascq, France
c Department of Computer Science, Åbo Akademi University and Turku Centre for Computer Science, Turku 20520, Finland
Abstract

The centralizer of a set of words X is the largest set of words C(X) commuting with X: XC(X) = C(X)X. It has been a long-standing open question, due to Conway [J.H. Conway, Regular Algebra and Finite Machines, Chapman & Hall, London, 1971], whether the centralizer of any rational set is rational. While the answer turned out to be negative in general, see [M. Kunc, Proc. of ICALP 2004, Lecture Notes in Computer Science, Vol. 3142, Springer, Berlin, 2004, pp. 870–881], we prove here that the situation is different for codes: the centralizer of any rational code is rational and, if the code is finite, then the centralizer is finitely generated. This result has previously been proved only for binary and ternary sets of words, in a series of papers by the authors, and for prefix codes, in an ingenious paper by Ratoandromanana [B. Ratoandromanana, RAIRO Inform. Theor. 23(4) (1989) 425–444]; many of the techniques we use in this paper follow her ideas. We also give in this paper an elementary proof for the prefix case.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Codes; Commutation; Centralizer; Conway's problem; Prefix codes
1. Introduction

The centralizer of a set of words X is the largest set of words C(X) commuting with X: XC(X) = C(X)X. It is easy to see that the centralizer is well-defined for any language X; indeed, C(X) is the union of all languages commuting with X. It is important to note that for any language X, X* ⊆ C(X) and C(X) is a monoid. Conway raised the following problem
related to centralizers, see [8, p. 55] (note that Conway uses the term "normalizer"), more than thirty years ago:

Conway's Problem. Is it true that the centralizer of any rational language is rational?

This problem has recently received much attention. In a series of papers by the authors and others, see [5,10,11,13–16,22,23,26], it has been proved that the problem has a positive answer for sets with at most three words and for rational prefix codes. It has also been proved in [14] that the centralizer of any recursive language is co-RE. However, it has recently been proved in a breakthrough paper [18], see also [17] for related issues, that Conway's problem has a negative answer in general: there are finite languages with non-RE centralizer. The surprising power of finite sets of words is also shown in a related result of [12], showing that the equivalence problem for finite substitutions on ab*c is undecidable!

Ratoandromanana raised a related question in [23] concerning the commutation with codes. In a paper displaying an impressive array of technical results related to codes, she proved that the commutation with prefix codes can be characterized as in free monoids: if X is a prefix code, then for any language L commuting with X, L = ρ(X)^I for some I ⊆ ℕ, where ρ(X) is the primitive root of X. In particular, this implies that the centralizer of any prefix code X is ρ(X)* and thus, Conway's problem has a positive answer for rational prefix codes. Two conjectures are stated in [23]:

Conjecture 1 (Ratoandromanana [23]). Two codes commute if and only if they have a common root.

Conjecture 2 (Ratoandromanana [23]). Any code has a unique primitive root.

These two conjectures, which have remained open until now, provide evidence that the commutation with codes has very special properties, and in particular that Conway's problem may have a positive answer for codes. We prove in this paper that this is indeed the case:

Theorem 1. The centralizer of any rational code is rational.
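For finite languages, the commutation XY = YX underlying these statements can be tested directly by comparing the two concatenation products; a small sketch (the example sets are ours):

```python
def product(X, Y):
    # concatenation product of two languages
    return {x + y for x in X for y in Y}

def commutes(X, Y):
    return product(X, Y) == product(Y, X)

# Two sets of powers of the primitive word ab commute:
print(commutes({"ab", "abab"}, {"ab", "ababab"}))  # True
print(commutes({"a", "b"}, {"ab"}))                # False
```

Of course, no such finite check settles commutation with an infinite set like the centralizer; that is what the structural results of the paper are for.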
We also prove that the centralizer of any finite code is finitely generated. It is worth mentioning that throughout the paper we essentially use the techniques of [23], at times refined and extended from prefix codes to arbitrary codes. We also give in Section 4 an elementary proof of Ratoandromanana's result [23] that C(X) = ρ(X)*, for any prefix code X.

2. Definitions

For basic notions and results of combinatorics on words we refer to [3,19,20], and for those of the theory of codes to [2]. For details on the notion of centralizer and the commutation of languages we refer to [14,15,22]. In the sequel, Σ denotes a finite alphabet, Σ* the set of all finite words over Σ and Σ^ω the set of all (right) infinite words over Σ. We denote by 1 the empty word and by |u| the
length of u ∈ Σ*. For a word u ∈ Σ*, u^ω denotes the infinite word uuu..., while for a language L ⊆ Σ*, L^ω = {u_1u_2u_3... | u_n ∈ L, n ≥ 1} ⊆ Σ^ω. For a language L ⊆ Σ*, we denote by l(L) the length of a shortest word in L and we let L_min = {u ∈ L | |u| = l(L)}.

We say that a word u is a prefix of a word v, denoted u ≤ v, if v = uw for some w ∈ Σ*. We say that u and v are prefix comparable if either u ≤ v or v ≤ u. A language L is called a prefix code if no two distinct words of L are prefix comparable. The following result is well known.

Lemma 2 (Berstel and Perrin [2], Perrin [21]). The set of prefix codes forms a free semigroup. In particular, any prefix code has a unique primitive root.

For a word u and a language L, we say that v_1 ... v_n is an L-factorization of u if u = v_1 ... v_n and v_i ∈ L for all 1 ≤ i ≤ n. For an infinite word α, we say that v_1v_2...v_n... is an L-factorization of α if α = v_1v_2...v_n... and v_i ∈ L for all i ≥ 1. A relation over L is an equality u_1 ... u_m = v_1 ... v_n, with u_i, v_j ∈ L for all 1 ≤ i ≤ m, 1 ≤ j ≤ n; the relation is trivial if m = n and u_i = v_i for all 1 ≤ i ≤ m. We say that L is a code if any word of Σ* has at most one L-factorization. Equivalently, L is a code if and only if all relations over L are trivial. The following simple result is often useful in our considerations.

Lemma 3. For any language L ⊆ Σ⁺ and any u ∈ L, z ∈ C(L), we have (zu)^ω ∈ L^ω.

Proof. Let z_1 = z, u_1 = u and, for all n ≥ 1, define z_{n+1} ∈ C(L) and u_{n+1} ∈ L such that z_nu_n = u_{n+1}z_{n+1}. Then, by induction on n, it follows that (z_1u_1)^n = u_2u_3...u_nu_{n+1}z_{n+1}z_n...z_2 for all n ≥ 1, and so (zu)^ω = u_2u_3...u_n... ∈ L^ω. Indeed, since 1 ∉ L, the two infinite words have arbitrarily long common prefixes, and so they coincide.

3. Preliminary results

We prove in this section several results related to the commutation of arbitrary sets of words. We will use these results in the following sections, when we discuss the commutation with codes and prefix codes.
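The code and prefix-code properties defined above are decidable for finite sets; a sketch using the classical Sardinas–Patterson procedure (standard in the theory of codes [2], though not part of this paper's arguments):

```python
def lq(A, B):
    # left quotient A^{-1}B = {w | a w in B for some a in A}
    return {b[len(a):] for a in A for b in B if b.startswith(a)}

def is_prefix_code(L):
    return not any(u != v and v.startswith(u) for u in L for v in L)

def is_code(L):
    # Sardinas-Patterson: L is a code iff the empty word never occurs in
    # the dangling-suffix sets U_1 = L^{-1}L \ {1}, U_{i+1} = L^{-1}U_i + U_i^{-1}L.
    U = lq(L, L) - {""}
    seen = set()
    while U and frozenset(U) not in seen:
        if "" in U:
            return False
        seen.add(frozenset(U))
        U = lq(L, U) | lq(U, L)
    return "" not in U

print(is_prefix_code({"ab", "ba"}))  # True
print(is_code({"ab", "ba"}))         # True
print(is_code({"a", "ab", "ba"}))    # False: a.ba = ab.a
```

The procedure terminates because every U_i consists of suffixes of words of L, of which there are finitely many.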
For any sets R ⊆ Σ∗, S ⊆ Σ∗ × Σ∗ and any nonnegative integer n ∈ N, we denote by R≤n, S≤n the sets

R≤n = {u ∈ R | |u| ≤ n},  S≤n = {(u, v) ∈ S | |uv| ≤ n}.

We also denote by Rn, Sn the sets

Rn = {u ∈ R | |u| = n},  Sn = {(u, v) ∈ S | |uv| = n}.
For two sets of words X, Y ⊆ Σ∗, we say that the product XY is unambiguous if x1y1 = x2y2 implies x1 = x2 and y1 = y2, for any x1, x2 ∈ X and y1, y2 ∈ Y.

Lemma 4. Let A, B be some subsets of Σ+ such that the product AB is unambiguous. Then
(i) If AB ⊆ BA, then AB = BA and the product BA is unambiguous.
(ii) For any n ≥ 1, if (AB)
k > 0 and α ∈ X+ such that (xy)^k α ∈ X+.

Proof. It follows from Lemma 2 of [23] that there exists α ∈ X+ such that (yx)^k α ∈ X+, for some k ≥ 1. But then (xy)^k xα = x(yx)^k α ∈ X+.

The following are results of Ratoandromanana [23] that we will use often in our considerations.
Lemma 8 (Ratoandromanana [23], Lemma 3). For any code X and any language Y such that YX ⊆ XY or XY ⊆ YX, if X ∩ Y ≠ ∅, then X ⊆ Y.

Lemma 9 (Ratoandromanana [23], Proposition 7). Two codes X, Y commute if and only if there are positive integers m, n such that X^m = Y^n.

Lemma 10 (Ratoandromanana [23], Lemma 10). For any code X consider the set 𝒞(X) = {Y | Y is a code commuting with X}. Then 𝒞(X) is a commutative stable semigroup. In particular, for any two codes Y, Z commuting with X, YZ is a code and YZ = ZY.

4. The commutation with prefix codes

We characterize in this section the commutation with prefix codes, proving that for any prefix code X, C(X) = ρ(X)∗ and LX = XL implies L = ρ(X)^I, where ρ(X) is the primitive root of X and I ⊆ N. These results were originally proved in Ratoandromanana [23] using ingenious combinatorial techniques on words and prefix codes. Following the ideas in [23], we give here simpler proofs of those results. There are two crucial ingredients in our proof. First, we observe that the products LX and XL are unambiguous for any language L commuting with X. Second, we prove that for any such L, there is a prefix code P(L) ⊆ L that commutes with X, thus being able to exploit the fact that the set of prefix codes forms a free semigroup. We prove several lemmata first.

Lemma 11. For any prefix code X and any language L commuting with X, both LX and XL are unambiguous.

Proof. The result follows from Lemma 4 and the fact that XL is necessarily unambiguous since X is a prefix code.

For a set of words A over the alphabet Σ, let Com(A) = {L ⊆ Σ∗ | LA = AL} and P(A) = A \ AΣ+. Note that P(A) is a prefix code for any A and if A ≠ ∅, then P(A) ≠ ∅. Indeed, Amin ⊆ P(A).

Lemma 12. For any prefix code X, if L ∈ Com(X), then P(L) ∈ Com(X).

Proof. If P(L)X ⊆ XP(L), then we are done by Lemma 4. So, let us assume the contrary and let lx be a shortest word in P(L)X \ XP(L), with l ∈ P(L), x ∈ X, and let n = |lx|. Then (P(L)X)
The following result is proved in [23] in the case of codes, using some involved arguments and results. For the sake of completeness, we give here a simple proof in the case of prefix codes, which are the focus of this section. The techniques used here are essentially those of [23].

Lemma 13 (Ratoandromanana [23], Lemma 17). For any prefix code X and any language L, if X^i L = LX^i, for some nonnegative integer i, then X^i(L \ X∗) = (L \ X∗)X^i.

Proof. Let L1 = L ∩ X∗, L2 = L \ X∗. If X^i L = LX^i, then X^i L1 ∪ X^i L2 = L1 X^i ∪ L2 X^i. Let us assume that X^i L2 ∩ L1 X^i ≠ ∅. Then there are x1, x2 ∈ X^i and l1 ∈ L1, l2 ∈ L2 such that x2l2 = l1x1. Thus, x2l2 ∈ X∗ and, since X is a prefix code, l2 ∈ X∗, a contradiction. Thus, X^i L2 ⊆ L2 X^i. Since X^i L2 is unambiguous, it follows by Lemma 4 that X^i L2 = L2 X^i.

We are now ready to characterize the centralizer of a prefix code. Based on this characterization we then answer Conway's problem and characterize the commutation with prefix codes.

Theorem 14. Let X be a prefix code, ρ(X) its primitive root, and C(X) its centralizer. Then C(X) = ρ(X)∗.

Proof. Assume that C(X) ≠ ρ(X)∗. Then, by Lemma 13, the language L = C(X) \ ρ(X)∗ ≠ ∅ commutes with X and so, by Lemma 12, P(L) is a prefix code commuting with X. Thus, P(L) = ρ(X)^t, for some nonnegative integer t. This is a contradiction since P(L) ⊆ L and L ∩ ρ(X)∗ = ∅.

Corollary 15. For any prefix code X, if the set of words L commutes with X, then L = ∪_{i∈I} ρ(X)^i, for some I ⊆ N.

Proof. To prove the claim of the corollary, it is enough to prove that for any n ≥ 0, if L ∩ ρ(X)^n ≠ ∅, then ρ(X)^n ⊆ L. This follows from [23, Lemma 18], but for the sake of completeness, we include a short proof here. Let u1, …, un ∈ ρ(X) be such that u1…un ∈ L and let α1, …, αn be arbitrary elements of ρ(X). Let also X = ρ(X)^k, k ≥ 1. Then, since X^n L = LX^n and (α1…αn)^k ∈ ρ(X)^{nk} = X^n, it follows that u1…un(α1…αn)^k ∈ LX^n = X^n L = ρ(X)^{kn} L.
Since L ⊆ ρ(X)∗ and ρ(X) is a prefix code, this can only lead to a trivial ρ(X)-relation, i.e., α1…αn ∈ L. Thus, ρ(X)^n ⊆ L, proving the claim.

Corollary 16. Conway's problem has an affirmative answer for rational prefix codes: for any rational prefix code X, both ρ(X) and C(X) are rational and C(X) = ρ(X)∗.

Proof. It is not difficult, see, e.g., [4] or [24], to prove that for any rational language R such that R = R0^n, for some language R0 and some positive integer n, there is a rational language
R1 such that R0 ⊆ R1 and R = R1^n. Using this observation it follows that ρ(X), and thus also C(X), must be rational.
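Theorem 14 can be probed on finite approximations: for a prefix code X with primitive root P, the products X·P∗ and P∗·X must agree on every word up to a fixed length. The sketch below (ours, not from the paper) uses the toy prefix code P = {a, ba}, assumed primitive, and X = P².

```python
from itertools import product

P = {"a", "ba"}                            # a prefix code, assumed primitive
X = {u + v for u, v in product(P, P)}      # X = P^2, so the candidate centralizer is P*

def star_upto(L, n):
    # all concatenations of words of L of total length <= n (a truncation of L*)
    out, frontier = {""}, {""}
    while frontier:
        frontier = {w + v for w in frontier for v in L if len(w + v) <= n}
        out |= frontier
    return out

n = 10
C = star_upto(P, n)                        # truncated candidate centralizer rho(X)*
left = {x + u for x in X for u in C if len(x + u) <= n}
right = {u + x for u in C for x in X if len(u + x) <= n}
assert left == right                       # X·C and C·X agree on words up to length n
```

This is of course only a length-bounded consistency check, not a proof; the theorem says the equality persists for all lengths and that no larger commuting set exists.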
5. The commutation with codes

We describe in this section the form of the centralizer of any code. In particular, we prove that the centralizer of any rational code is rational, thus giving a positive answer to Conway's problem in the case of codes. It also follows that the centralizer of any finite code is finitely generated. One of the crucial ingredients in our proof is that for any code X and any language L commuting with X, the products LX and XL are unambiguous.

Theorem 17. For any code X, the products XC(X) and C(X)X are unambiguous.

Proof. Assume that XC(X) is ambiguous, i.e., there are x, y ∈ X, u, v ∈ C(X) such that xu = yv and x ≠ y. By Lemma 7, there exist k > 0 and α ∈ X+ such that z = (xu)^k α ∈ X+. Then z^ω = ((xu)^k α)^ω = (x(ux)^{k−1}uα)^ω = x((ux)^{k−1}uαx)^ω = x(wx)^ω, where w = (ux)^{k−1}uα ∈ C(X). As it is easy to see, for any β ∈ C(X) and any t ∈ X, (βt)^ω ∈ X^ω and so, (wx)^ω ∈ X^ω. Consequently, z^ω ∈ xX^ω. Analogously, since xu = yv, z^ω = ((yv)^k α)^ω ∈ yX^ω and so, z^ω has two different X-factorizations. It is not difficult now to see that this leads to a contradiction. For the sake of completeness, we give here a simple argument on how to conclude it, but note that the same follows also from a result of [9] stating that X is a code if and only if for any z ∈ X+, z^ω has exactly one X-factorization. Assume that there is a word z ∈ X+ such that z^ω has a second X-factorization z^ω = β1β2…, βi ∈ X, different from the one obtained by repeating an X-factorization of z. By the pigeonhole principle, it follows that there are i < j such that β1…βi = z^{ni}π and β1…βj = z^{nj}π, for some nonnegative integers ni < nj and a proper prefix π of z. It is easy to see then that β1…βj = z^{nj−ni}β1…βi, a contradiction since X is a code.

Corollary 18. For any code X and any language L commuting with X, the products LX and XL are unambiguous.

Proof. If XL were ambiguous, then necessarily XC(X) would be ambiguous since L ⊆ C(X).

Lemma 19. Let X be a code, n a positive integer, and L a language commuting with X^n, with l(L) = l(X). Then X ⊆ L.

Proof.
Clearly, Lmin and Xmin are two commuting prefix codes and since l(L) = l(X), it follows that Lmin = Xmin .
Since X^n is a code, the product X^n L is unambiguous by Corollary 18. Assume now that there exists a word x ∈ X \ L. Let us consider u = xs^n with s ∈ Lmin = Xmin. Then u = (xs^{n−1})s ∈ X^n L = LX^n. Since x ∉ L, u ∈ (L \ X∗)X^n = X^n(L \ X∗). This is a contradiction since u = xs^n ∈ X^n(L ∩ X).

The following result was proved in [23, Lemma 24] for prefix codes. We extend it here to arbitrary codes, using essentially the techniques in [23].

Lemma 20. Let X be a code and L a language commuting with X. If l(X) = k·l(L), for some k > 1, then there exists a code Y such that X = Y^k.

Proof. Clearly, Xmin Lmin = Lmin Xmin and since l(X) = k·l(L), it follows that Xmin = (Lmin)^k. Thus, X ∩ L^k ≠ ∅ and it follows from Lemma 8 that X ⊆ L^k. Let l0 ∈ Lmin and Y = {y ∈ L | l0^{k−1}y, y l0^{k−1} ∈ X}. We prove that Y is a code and X = Y^k. Clearly, Y ≠ ∅; e.g., Lmin ⊆ Y.

Claim 1. If x = l1…lk ∈ X, with li ∈ L, then l2…lk l0 ∈ X and l0 l1…lk−1 ∈ X.

Proof of Claim 1. We have u = l2…lk(l0)^k ∈ L^{k−1}X = XL^{k−1}, so u = wy, with w ∈ X and y ∈ L^{k−1}. Note that x(l0)^k = l1u = l1wy. Since l1w ∈ LX = XL, we deduce that l1w = x′l′, for some x′ ∈ X, l′ ∈ L. Consequently, x(l0)^k = x′(l′y), with x, x′ ∈ X and (l0)^k, l′y ∈ L^k. Since XL^k is unambiguous by Corollary 18, it follows that x = x′ and (l0)^k = l′y. Now, l0 ∈ Lmin and so, l′ = l0 and y = (l0)^{k−1}. Then, since l2…lk(l0)^k = wy, it follows that w = l2…lk l0 ∈ X. The second part of Claim 1 is proved analogously.

Using Claim 1, we can easily deduce Claim 2.

Claim 2. If x = l1…lk ∈ X, with li ∈ L, then for any i ∈ {1, …, k}, li l0^{k−1} ∈ X and l0^{k−1} li ∈ X.

Claim 3. Any word x ∈ X ⊆ L^k has a unique L-factorization in L^k.

Proof of Claim 3. Assume that x = l1l2…lk = l1′l2′…lk′ ∈ X, with li, li′ ∈ L, for all i = 1, 2, …, k. Then ((l0)^{k−1}l1)(l2…lk) = ((l0)^{k−1}l1′)(l2′…lk′), with (l0)^{k−1}l1, (l0)^{k−1}l1′ ∈ X according to Claim 2.
Since XL^{k−1} is unambiguous, we obtain that (l0)^{k−1}l1 = (l0)^{k−1}l1′ and so l1 = l1′. Then, according to Claim 1, l2…lk l0 = l2′…lk′ l0 ∈ X, etc.

Claim 4. If y ∈ Y and x = l1l2…lk ∈ X, with li ∈ L, then l2…lk y ∈ X.

Proof of Claim 4. Since Y ⊆ L, xy ∈ XL = LX and so, xy = l1′x′, with x′ ∈ X and l1′ ∈ L. We will prove that l1′ = l1; then x′ = l2…lk y ∈ X, proving the claim. Clearly, x′l0^{k−1} ∈ XL^{k−1} = L^{k−1}X and so, x′l0^{k−1} = l2′′…lk′′x′′, with x′′ ∈ X and li′′ ∈ L. It follows from the definition of Y that y l0^{k−1} ∈ X. Consequently, (l1l2…lk)(y l0^{k−1}) = xy l0^{k−1} = l1′x′l0^{k−1} = (l1′l2′′…lk′′)x′′.
Since L^kX is unambiguous by Corollary 18, it follows that l1l2…lk = l1′l2′′…lk′′ and y l0^{k−1} = x′′. Now, l1′l2′′…lk′′ = x ∈ X and it follows by Claim 3 that l1 = l1′, concluding the proof of Claim 4.

We can now prove that X ⊆ Y^k. For this, let x ∈ X. As observed in the beginning of the proof, X ⊆ L^k and so, x = l1…lk, with li ∈ L. From Claim 2 it follows that li l0^{k−1}, l0^{k−1}li ∈ X for all i = 1, 2, …, k and so, li ∈ Y, for all i. Consequently, X ⊆ Y^k. For the reverse inclusion, consider y1, …, yk ∈ Y and x = l1…lk ∈ X, with li ∈ L. It follows from Claim 4 by induction that li…lk y1…yi−1 ∈ X, for all i = 2, 3, …, k; one more application of Claim 4 then gives y1…yk ∈ X, i.e., Y^k ⊆ X. It follows then by Claim 3 that X = Y^k. It also follows that Y is a code, concluding the proof.

Lemma 21. Let X be a code and L ⊆ Σ+ be a language commuting with X. Then there exists a code Y commuting with X such that Lmin = Ymin and Y ⊆ L. Moreover, if X is rational, then Y is rational.

Proof. Set t = l(X) and s = l(L). Since LX^s = X^sL, X^s is a code and l(X^s) = t·l(L), it follows from Lemma 20 that there exists a code Y such that X^s = Y^t. Then LY^t = Y^tL, with l(Y) = l(L), and so Lmin = Ymin ⊆ Y, implying by Lemma 19 that Y ⊆ L. Moreover, from Lemma 9 we also obtain that Y commutes with X. Observe now that if X is a rational code, then X^s, and so Y^t, is a rational code. It follows then that Y is rational.

The following result describes the form of all monoids commuting with a given code.

Theorem 22. For any code X and any monoid M commuting with X, there exist codes C1, …, Ck commuting with X such that M = (C1 ∪ ⋯ ∪ Ck)∗. Moreover, if X is rational, then M is rational.

Proof. Let M0 = M \ {1}. It is a result of [23] ([23, Lemma 4]) that M0X = XM0. Thus, by Lemma 21, there exists a code C1 ⊆ M0 commuting with X with (C1)min = (M0)min. Let B1 = C1. For all i ≥ 1 consider Bi = C1 ∪ ⋯ ∪ Ci ⊆ M0 and Mi = M \ Bi∗.
Since M is a monoid, Bi∗ ⊆ M and so, by Lemma 5, we have that MiX = XMi. If Mi ≠ ∅, then by Lemma 21 there exists a code Ci+1 ⊆ Mi commuting with X such that (Ci+1)min = (Mi)min. Assume that for all j ≥ 1, Mj ≠ ∅, and set d = gcd{l(Cj) | j ≥ 1}. Then d = gcd{l(C1), l(C2), …, l(Cn)}, for some n ≥ 1. Clearly, by construction, l(Cp) < l(Cp+1), for all p ≥ 1. Thus, there is h > n such that l(Ch) = t1l(C1) + ⋯ + tnl(Cn), for some nonnegative integers t1, …, tn (since l(Ch) is a multiple of d and every sufficiently large multiple of d lies in the numerical semigroup generated by l(C1), …, l(Cn)). Let us consider Y = (C1)^{t1} … (Cn)^{tn}. From Lemma 10 it follows that Y is a code commuting with Ch. Since l(Ch) = l(Y), we get that Ymin = (Ch)min, hence Ch ∩ Y ≠ ∅. Consequently, Ch = Y ⊆ Bn∗, a contradiction since Ch ⊆ Mn = M \ Bn∗. Now let k be the least integer such that Mk = ∅. Then M = (C1 ∪ ⋯ ∪ Ck)∗. The second part of the claim follows from Lemma 21: C1, …, Ck are rational and so, M is rational.

The main result of this paper follows now as a simple consequence of Theorem 22, since the centralizer of any language is a monoid.
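Lemma 20 asserts the existence of a k-th root Y with X = Y^k; for tiny finite codes, such a root can also be found by an exhaustive search over subsets of prefixes of the words of X (a naive sketch of ours, exponential in general, nothing like the constructive proof above; Y below is a toy two-word code).

```python
from itertools import combinations, product

def square_roots(X):
    # brute force: all sets Y with Y·Y == X; any root word must be a prefix
    # of some word of X, so it suffices to search subsets of those prefixes
    prefixes = {w[:i] for w in X for i in range(1, len(w) + 1)}
    roots = []
    for r in range(1, len(prefixes) + 1):
        for Y in combinations(sorted(prefixes), r):
            if {u + v for u, v in product(Y, Y)} == set(X):
                roots.append(set(Y))
    return roots

Y = {"a", "ab"}                            # a two-word code (the two words do not commute)
X = {u + v for u, v in product(Y, Y)}      # X = Y^2 = {aa, aab, aba, abab}
assert Y in square_roots(X)

# Lemma 9 direction: since X = Y^2, the codes X and Y commute
cat = lambda A, B: {u + v for u in A for v in B}
assert cat(X, Y) == cat(Y, X)
```

The search is only feasible because the candidate set of prefixes is tiny here; the paper's argument, by contrast, constructs the root directly from the combinatorics of the claims above.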
Theorem 23. The centralizer of any rational code is rational.

The following result also follows from Theorem 22 in the case of finite codes.

Theorem 24. Any monoid commuting with a finite code is finitely generated. In particular, the centralizer of a finite code is a finitely generated monoid.

Proof. Let X be a finite code and M a monoid commuting with X. Then, by Theorem 22, M = (C1 ∪ ⋯ ∪ Ck)∗ with Ci codes commuting with X, for all i = 1, 2, …, k. Thus, by Lemma 9, Ci^{ti} = X^{si}, for some positive integers ti, si. Thus, each Ci is finite, proving the claim.
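The unambiguity of products, which drives Theorem 17 and Corollary 18, is also mechanically checkable for finite sets. A small checker (ours, illustrative only; the sets A, B, P are toy examples):

```python
from itertools import product

def product_unambiguous(A, B):
    # AB is unambiguous iff every word of AB arises from a unique pair (a, b)
    seen = {}
    for a, b in product(A, B):
        if seen.setdefault(a + b, (a, b)) != (a, b):
            return False
    return True

A, B = {"a", "ab"}, {"ba", "a"}
assert not product_unambiguous(A, B)   # ambiguity: a·ba = ab·a = "aba"

P = {"a", "ba"}
assert product_unambiguous(P, P)       # products over a prefix code are unambiguous
```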
6. Conclusions

The behavior of codes under commutation is special. While the centralizer of a finite set is not necessarily recursively enumerable, we describe here the form of the centralizer of a code and prove that it is necessarily rational if the code is rational. Moreover, if the code is finite, then the centralizer is finitely generated. The crucial difference between codes and arbitrary sets of words seems to be in the fact that for a code X, the product XC(X) is unambiguous, as proved in Theorem 17. We also give in this paper a simple, self-contained proof for the case of prefix codes, proving that for any prefix code X, C(X) = ρ(X)∗, a result originally proved in [23]. In proving our results, we exploited a series of deep results on commutation proved in Ratoandromanana [23]. Two conjectures proposed in [23], related to commutation with codes, remain however open.

Conjecture 1 (Conjecture 1, [23]). Two codes commute if and only if they have a common root.

Conjecture 2 (Conjecture 2, [23]). Any code has a unique primitive root.

Two other conjectures have been given in the literature in connection with the commutation of codes, see, e.g., [11,14,15].

Conjecture 3. The centralizer of a code is a free monoid.

Conjecture 4. For any code X, if LX = XL, then there is a code R such that X = R^m and L = R^I = ∪_{i∈I} R^i, for some m ≥ 1, I ⊆ N.

Note that the characterization conjectured above holds for the commutation of polynomials and formal power series with coefficients in a field, see [1,6,7,25]. We prove here that in fact Conjectures 1–4 are equivalent.

Theorem 25. Conjectures 1–4 are equivalent.
Proof. Let X be a code. We prove first that Conjectures 1 and 2 are equivalent. Assuming that Conjecture 1 holds, suppose that the code X has two distinct primitive roots Y and Z, X = Y^i = Z^j. It then follows from Lemma 9 that Y and Z commute and, according to Conjecture 1, they have a common root. Since they are primitive, it follows that Y = Z, a contradiction. To prove the reverse implication, assume that Conjecture 2 holds and consider now two commuting codes X, Y and their unique primitive roots U, V: X = U^s, Y = V^t. Then, by Lemma 9, X^i = Y^j, for some i, j > 0, and so U, V are primitive roots of the code U^{si} = V^{tj}. It follows then from Conjecture 2 that U = V, i.e., X, Y have a common root.

We prove now that Conjectures 1 and 2 imply Conjecture 3. Let Z be the primitive root of the code X. Then Z∗ commutes with X and so, Z∗ ⊆ C(X). Assume that C(X) \ Z∗ ≠ ∅. Then, by Lemma 5, C(X) \ Z∗ commutes with X and then, by Lemma 21, it follows that there is a code Y ⊆ C(X) \ Z∗ such that XY = YX. Thus, by Lemma 10, YZ = ZY and so, from Conjecture 1 it follows that there is a code R such that Y = R^m, Z = R^n. Since Z is primitive, we have R = Z and so Y = Z^m, contradicting the fact that Y ∩ Z∗ = ∅.

We prove now that Conjecture 3 implies Conjecture 4. It follows from Conjecture 3 that C(X) = Z∗, for some code Z. It follows then from Theorem 22 that XZ = ZX and then from Lemma 9 that X^i = Z^j, for some i, j > 0. Consider now a language L commuting with X. Then L commutes also with X^i, i.e., with Z^j. Since L ⊆ C(X) = Z∗, it follows from Lemma 18 of [23] that L = Z^I, where I = {p ≥ 0 | Z^p ⊆ L}, concluding Conjecture 4. Note that this also implies that Z is the unique primitive root of X.

We prove now that Conjecture 4 implies Conjecture 1. Consider two codes X, Y such that XY = YX. It then follows from Conjecture 4 that there is a set V such that X = V^I, Y = V^J, for some I, J ⊆ N.
Then necessarily V is a code, I, J are singletons and V is a common root of X and Y.

Acknowledgements

The authors gratefully acknowledge the detailed referee reports that helped to improve the presentation of the paper. Juhani Karhumäki was supported by the Academy of Finland under Grant 44087. Ion Petre was supported by the Academy of Finland under Grant 203667.

References

[1] G. Bergman, Centralizers in free associative algebras, Trans. Amer. Math. Soc. 137 (1969) 327–344.
[2] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[3] C. Choffrut, J. Karhumäki, Combinatorics of words, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 1, Springer, Berlin, 1997, pp. 329–438.
[4] C. Choffrut, J. Karhumäki, On Fatou properties of rational languages, in: C. Martin-Vide, V. Mitrana (Eds.), Where Mathematics, Computer Science, Linguistics and Biology Meet, Kluwer, Dordrecht, 2000.
[5] C. Choffrut, J. Karhumäki, N. Ollinger, The commutation of finite sets: a challenging problem, Theoret. Comput. Sci. 273 (1–2) (2002) 69–79.
[6] P.M. Cohn, Factorization in noncommuting power series rings, Proc. Cambridge Philos. Soc. 58 (1962) 452–464.
[7] P.M. Cohn, Centralisateurs dans les corps libres, in: J. Berstel (Ed.), Séries formelles, Paris, 1978, pp. 45–54.
[8] J.H. Conway, Regular Algebra and Finite Machines, Chapman & Hall, London, 1971.
[9] J. Devolder, M. Latteux, I. Litovsky, L. Staiger, Codes and infinite words, Acta Cybernet. 11 (1994) 241–256.
[10] J. Karhumäki, Challenges of commutation: an advertisement, in: Proc. of FCT 2001, Lecture Notes in Computer Science, Vol. 2138, Springer, Berlin, 2001, pp. 15–23.
[11] J. Karhumäki, M. Latteux, I. Petre, The commutation with ternary sets of words, Theory Comput. Systems 38 (2) (2005) 161–169.
[12] J. Karhumäki, L. Lisovik, The equivalence problem for finite substitutions on ab∗c, with applications, IJFCS 14 (2003) 699–710; preliminary version in: Lecture Notes in Computer Science, Vol. 2380, Springer, Berlin, 2002, pp. 812–820.
[13] J. Karhumäki, I. Petre, On the centralizer of a finite set, in: Proc. of ICALP 2000, Lecture Notes in Computer Science, Vol. 1853, Springer, Berlin, 2000, pp. 536–546.
[14] J. Karhumäki, I. Petre, Conway's problem for three-word sets, Theoret. Comput. Sci. 289 (1) (2002) 705–725.
[15] J. Karhumäki, I. Petre, Two problems on commutation of languages, in: G. Paun, G. Rozenberg, A. Salomaa (Eds.), Current Trends in Theoretical Computer Science (The Challenge of the New Century), Vol. 2, World Scientific, 2004, pp. 477–494.
[16] J. Karhumäki, I. Petre, The branching point approach to Conway's problem, in: Lecture Notes in Computer Science, Vol. 2300, Springer, Berlin, 2002, pp. 69–76.
[17] M. Kunc, Regular solutions of language inequalities and well quasi-orders, in: Proc. of ICALP 2004, Lecture Notes in Computer Science, Vol. 3142, Springer, Berlin, 2004, pp. 870–881; final version in Theoret. Comput. Sci. (2005), to appear.
[18] M. Kunc, The power of commuting with finite sets of words, in: Lecture Notes in Computer Science, Vol. 3404, Springer, Berlin, 2005, pp. 569–580.
[19] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983.
[20] M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002.
[21] D. Perrin, Codes conjugués, Inform. Control 20 (1972) 222–231.
[22] I. Petre, Commutation problems on sets of words and formal power series, Ph.D. Thesis, University of Turku, 2002.
[23] B. Ratoandromanana, Codes et motifs, RAIRO Inform. Theor. 23 (4) (1989) 425–444.
[24] A. Restivo, Some decision results for recognizable sets in arbitrary monoids, in: Proc. of ICALP 1978, Lecture Notes in Computer Science, Vol. 62, Springer, Berlin, 1978, pp. 363–371.
[25] C. Reutenauer, Centralisers of noncommutative series and polynomials, in: M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002, pp. 312–329.
[26] A. Salomaa, S. Yu, On the decomposition of finite languages, in: G. Rozenberg, W. Thomas (Eds.), Developments in Language Theory, World Scientific, Singapore, 2000, pp. 22–31.
Theoretical Computer Science 340 (2005) 334 – 348 www.elsevier.com/locate/tcs
Palindromic factors of billiard words

J.-P. Borel a,1, C. Reutenauer b,∗

a LACO, UMR CNRS 6090, 123 avenue Albert Thomas, F-87060 Limoges Cedex, France
b UQÀM, Département de Mathématiques, case postale 8888, succursale Centre-Ville, Montréal, Québec, Canada H3C 3P8
Abstract

We study palindromic factors of billiard words, in any dimension. There are differences between the two-dimensional case and higher dimensions. Arbitrarily long palindromic factors exist in any dimension, but arbitrarily long palindromic prefixes exist, in general, only in dimension 2.
© 2005 Elsevier B.V. All rights reserved.

MSC: 68R15

Keywords: Words; Languages; Sturmian; Billiard; Palindromes
1. Introduction and notations

1.1. Billiard and Christoffel words in dimension 2

Let α be a positive irrational number. In several ways one may associate to it a Sturmian word on the alphabet A := {a, b}. We use here a geometrical approach. Consider the grid G on the first quadrant of the plane: it is the set of vertical half-lines with integer x-coordinate and of horizontal half-lines with integer y-coordinate. The half-line D through the origin O with slope α divides G into two parts. We construct the word uα and the billiard word cα (cutting sequence) as follows:

∗ Corresponding author. Tel.: +1 514 9873000x3228; fax: +1 514 9878274.
E-mail addresses: [email protected] (J.-P. Borel), [email protected] (C. Reutenauer). 1 Research partially supported by Région Limousin.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.036
J.-P. Borel, C. Reutenauer / Theoretical Computer Science 340 (2005) 334 – 348
Fig. 1 (panels (1), (2) and (3)).
Denoting by a the horizontal segment and by b the vertical one, uα encodes the discrete path immediately under the half-line D; hence in the example uα = ababbababbabb… (Fig. 1(1)). Looking at the squares crossed by D (grey in Fig. 1(2)) and their blackened sides, the billiard word cα encodes the sequence of sides crossed by D (a for a vertical side, b for a horizontal one). Here cα = babbababbabb…. Equivalently, cα encodes the discrete path joining the centers of the crossed squares (see Fig. 1(3)). It is easily seen that uα = a cα. The words uα are called Christoffel words [7] and are particular Sturmian words. Regarding factors, this does not restrict generality. See [1,5,15] for the theory of Sturmian words. Christoffel words are known since Bernoulli and have many applications in mathematics and physics; they are related to continued fractions, Farey sequences, and the Stern–Brocot tree (see e.g. [12] for the latter).

1.2. Billiard words in dimension 3

Let D be the half-line of origin O, in k-dimensional space, parallel to the vector (α1, α2, …, αk), with αi positive. We consider the sequence of k-cubes crossed by D, and the facets joining each cube to the next: a facet is a subset of the cube formed by all points having a fixed integer ith coordinate. This ith coordinate will be encoded by ai, and thus we obtain a sequence on the alphabet A = {a1, a2, …, ak}, encoding the facets crossed by the half-line D. This works as soon as one has

αi/αj ∉ Q    (1)
for any i ≠ j: indeed, in this case, each facet is crossed in its interior, so that the corresponding intersection point has a unique integer coordinate, its ith coordinate.2 In this way we obtain the billiard word cα1,α2,…,αk, or cα, if we denote α := (α1, α2, …, αk). Note that, as in Fig. 1(3), cα also encodes the discrete path joining the centers of the k-cubes crossed by D. We use this interpretation in the sequel.

It is known that the number of finite factors of length n of cα, in dimension 2, is equal to n + 1. This well-known property characterises the Sturmian words in dimension 2; in dimension 3, the number of finite factors of length n is equal to n² + n + 1, see [2]; in dimension k, it is

Σ_{i=0}^{min(k−1,n)} C(n, i) C(k−1, i) i!,

where C(n, i) denotes the binomial coefficient, see [3].

1.3. Finite billiard words

Let M := (m1, m2, …, mk) ∈ N^k, where the mi are pairwise relatively prime. The segment OM crosses several k-cubes and one defines, as before, a finite word cM on the same alphabet, called the (finite) billiard word associated to M. One has

|cM|ai = mi − 1, 1 ≤ i ≤ k,  |cM| = Σ_{i=1}^{k} mi − k.
Note that, as usual, |v| is the length of the word v, and |v|a its a-degree. Observe that cM is a palindrome, that is, equal to its reversal. We denote by ṽ the reversal of the word v. For a palindrome, one has v = ṽ by definition.

2. Main results

2.1. Dimension 2

Everything is known in this case, and palindromic factors and prefixes of Sturmian words have been intensively studied; they even characterize Sturmian words, see [10,11,14].

2.1.1. Palindromic prefixes

Theorem 2.1. The palindromic prefixes of the infinite billiard word cα are finite billiard words; for all n > 0 they are the prefixes of length pn + qn − 2, for all the main and intermediate convergents pn/qn of the continued fraction expansion of the real number α.

This result is stated in [4,8,9], in a slightly different formulation.

2 Note that the previous condition holds if the coordinates αi are Q-linearly independent, which is necessary for some results.
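The encodings of Sections 1.2–1.3 are straightforward to simulate. The sketch below (ours, not from the paper) generates billiard words by merging the crossing times n/αi of the ray t·α, then checks the factor-complexity counts quoted above and the palindromicity of a finite billiard word; the particular directions (1, √2, √3) and the point M = (5, 7) are our own choices of examples.

```python
import heapq
from math import sqrt

def billiard_word(alpha, length):
    # letter i is emitted when the ray t*alpha crosses an integer i-th coordinate,
    # i.e. at the times n/alpha[i]; all crossing times are merged in increasing order
    heap = [(1 / a, i, 1) for i, a in enumerate(alpha)]
    heapq.heapify(heap)
    word = []
    while len(word) < length:
        t, i, n = heapq.heappop(heap)
        word.append(i)
        heapq.heappush(heap, ((n + 1) / alpha[i], i, n + 1))
    return word

def finite_billiard_word(M):
    # finite billiard word c_M of the segment from O to M (two coprime coordinates)
    m1, m2 = M
    crossings = [(p / m1, 0) for p in range(1, m1)] + [(q / m2, 1) for q in range(1, m2)]
    return [c for _, c in sorted(crossings)]

def complexity(word, n):
    # number of distinct factors of length n
    return len({tuple(word[j:j + n]) for j in range(len(word) - n + 1)})

# dimension 3, direction with pairwise irrational ratios: p(n) = n^2 + n + 1
w = billiard_word((1.0, sqrt(2), sqrt(3)), 20000)
for n in range(1, 5):
    assert complexity(w, n) == n * n + n + 1

# dimension 2: p(n) = n + 1 (Sturmian)
v = billiard_word((1.0, sqrt(2)), 5000)
assert complexity(v, 5) == 6

# a finite billiard word c_M is a palindrome, of length m1 + m2 - 2
cM = finite_billiard_word((5, 7))
assert cM == cM[::-1] and len(cM) == 10
```

The complexity checks are, of course, empirical: they confirm the formulas on a long but finite prefix of the infinite word.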
Fig. 2. (The Stern–Brocot tree of 10/23, with the factorization of each Christoffel word: 0/1: a; 1/0: b; 1/1: a|b; 1/2: a|ab; 1/3: a|aab; 2/5: aaab|aab; 3/7: aaabaab|aab; 4/9: aaabaabaab|aab; 7/16: aaabaabaab|aaabaabaabaab; 10/23: aaabaabaab|aaabaabaabaaabaabaaba.)
Consider an example. Let

uα = a cα = aaabaabaabaaabaabaabaaabaabaabaab …

obtained by taking α in the interval 7/16 < α < 10/23. These two numbers have the continued fraction expansions

7/16 = 1/(2 + 1/(3 + 1/2))  and  10/23 = 1/(2 + 1/(3 + 1/3)),

denoted by [0; 2, 3, 2] and [0; 2, 3, 3] = [0; 2, 3, 2, 1]. In other words, α = [0; 2, 3, 3, …]. The sequence of intermediate and main convergents is

0/1, 1/1, 1/2, 1/3, 2/5, 3/7, 4/9, 7/16, 10/23, …
which may be read on the Stern–Brocot tree of 10/23, see Fig. 2. In Fig. 2, we have indicated the factorization of each Christoffel word, coming from the words above. To each such Christoffel word, which will be of the form aub, associate its factor u. These words u are exactly the palindromic prefixes of cα (here, those of length ≤ 33):

cα = aabaabaabaaabaabaabaaabaabaabaab …

a  aa  aaa  aaabaab  aaabaabaab  aaabaabaabaaabaabaabaaa  aaabaabaabaaabaabaabaaabaabaabaab
ε  a  aabaa  aabaabaa  aabaabaabaaabaabaabaa  aaababaabaaabaabaabaaabaabaabaa

We have listed the Christoffel words on the first line, and the associated billiard words on the second. Note that there exist palindromic factors which are not prefixes, such as b, aba, aaa or baab. But they are central factors of the previous palindromes.
Fig. 3.
2.1.2. Palindromic factors

It is well known that the language L of factors of cα is stable under reversal: v ∈ L ⇒ ṽ ∈ L. Let v be any palindromic factor of cα. It is sometimes possible to extend v into another factor ava or bvb, and to iterate. In the following, we call a central factor of a finite word u any factor v such that u = v1 v v2, with v1 and v2 of the same length.

Theorem 2.2. Each palindromic factor of a billiard word cα is a central factor of some palindromic prefix of cα.

This result is obtained in [8]. Note that palindromic factors characterize Sturmian words, see [11].

Proof. Let v be a maximal palindromic factor of cα in a nonprefix position; maximal means that neither ava nor bvb is a factor of cα. Consider the figure representing the sequence of squares encoded by v. Since v is a factor of cα, either avb or bva is a factor of cα. We consider the first case, as in Fig. 3, corresponding to v = aba. Then the line D enters the figure through a vertical side, stays inside it, and leaves the figure through a horizontal side. By the symmetry with respect to the center of the figure (since v is a palindrome), there exists a parallel line entering by a horizontal and leaving by a vertical segment (in the figure, the dotted line with long segments). Hence the parallel line beginning at the lower left point of the figure stays in the figure (in the figure, the dotted line). This means that v is a prefix of cα.

2.2.

Thus there are infinitely many palindromic prefixes in an infinite billiard word, and palindromic factors are factors of the palindromic prefixes. They appear infinitely often, since billiard words are recurrent (even uniformly recurrent). The work of Laurent Vuillon [17] gives even more precise information on their appearance, through the notion of first return of a factor.
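Theorem 2.2 is also easy to test empirically. In the sketch below (ours; the slope √2 and the window sizes are arbitrary choices), every short palindromic factor found near the beginning of the word is located as a central factor of one of its palindromic prefixes.

```python
from math import sqrt

def cutting_sequence(alpha, n):
    # cutting sequence of the ray y = alpha*x ('a': vertical line crossed, 'b': horizontal)
    crossings = [(m, "a") for m in range(1, n + 1)]
    crossings += [(k / alpha, "b") for k in range(1, int(alpha * n) + 2)]
    return "".join(c for _, c in sorted(crossings))[:n]

def central_factor(v, p):
    # v is a central factor of p if p = v1 v v2 with |v1| = |v2|
    d = len(p) - len(v)
    return d >= 0 and d % 2 == 0 and p[d // 2 : d // 2 + len(v)] == v

c = cutting_sequence(sqrt(2), 300)
pal_prefixes = [c[:l] for l in range(1, 301) if c[:l] == c[:l][::-1]]

# every palindromic factor observed in the window is central in some palindromic prefix
for size in range(1, 6):
    for j in range(100 - size):
        v = c[j:j + size]
        if v == v[::-1]:
            assert any(central_factor(v, p) for p in pal_prefixes)
```

The window and factor lengths are kept small so that the relevant palindromic prefixes certainly fall inside the generated portion of the word.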
2.3. Dimension 3

2.3.1. Prefix integer point

Let M := (m1, m2, …, mk) ∈ N^k and H the orthogonal projection of M on the line through O parallel to (α1, α2, …, αk).

Definition 2.1. A point in R^k is a 2-integer point if at least two of its coordinates are integers.

Definition 2.2. M = (m1, m2, …, mk) is called an integer prefix of (α1, α2, …, αk) if the triangle OMH does not contain any 2-integer point, except O and M.

2.3.2. Palindromic prefixes

Denote by πij the mapping that associates to a word u on {a1, a2, …, ak} the word on {ai, aj} obtained by erasing all other letters. Then πij(cα1,α2,…,αk) is the billiard word cαi,αj.

Theorem 2.3. A prefix v of cα1,α2,…,αk is palindromic if and only if each πij(v) is a palindromic prefix of cαi,αj.

Theorem 2.4.
• For almost all (α1, α2, …, αk) ∈ R^k_+, in the sense of Lebesgue, the word cα1,α2,…,αk has only finitely many palindromic prefixes.
• There exist (α1, α2, …, αk) such that cα1,α2,…,αk has infinitely many palindromic prefixes.

We shall prove these results in the sequel. We give, for the last property, a proof that will imply that the corresponding lines are dense. In the first case, the number of palindromic prefixes may be very small: we give an example in dimension 3 where the only nonempty palindromic prefix is the first letter. According to Theorem 2.3, in order to have palindromic prefixes, there must be some synchronization between the palindromic prefixes of the words πij(cα1,α2,…,αk), hence between the corresponding convergents.

2.3.3. Palindromic factors

As said in the abstract, the situation is the same in any dimension.

Theorem 2.5. Each factor of cα1,α2,…,αk is a factor of some palindromic factor of cα1,α2,…,αk. In particular, arbitrarily long palindromic factors exist.
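Theorem 2.3 lends itself to a direct check: since πij of the k-dimensional billiard word is the two-dimensional billiard word cαi,αj, a prefix is palindromic exactly when all its pairwise projections are palindromic. The sketch below (ours; the direction (1, √2, √3) is an arbitrary example) verifies this on a long prefix.

```python
import heapq
from math import sqrt

def billiard_word(alpha, length):
    # merge the crossing times n/alpha[i] of the ray t*alpha, in increasing order
    heap = [(1 / a, i, 1) for i, a in enumerate(alpha)]
    heapq.heapify(heap)
    w = []
    while len(w) < length:
        t, i, n = heapq.heappop(heap)
        w.append(i)
        heapq.heappush(heap, ((n + 1) / alpha[i], i, n + 1))
    return w

w = billiard_word((1.0, sqrt(2), sqrt(3)), 900)

def proj(v, i, j):
    # the map pi_ij: erase every letter other than i and j
    return [x for x in v if x in (i, j)]

is_pal = lambda v: v == v[::-1]
for l in range(1, 601):
    v = w[:l]
    # Theorem 2.3: the prefix is palindromic iff all three pairwise projections are
    assert is_pal(v) == all(is_pal(proj(v, i, j)) for i, j in [(0, 1), (0, 2), (1, 2)])
```

One direction is immediate (a projection of a palindrome is a palindrome); the content of the theorem is the converse, which the loop exercises on 600 prefixes.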
3. Integer prefix point, up–down method and synchronization

3.1. Integer prefix point
We consider D as before: it is the half-line of origin O parallel to the vector α := (α_1, α_2, …, α_k), with the condition α_i/α_j ∉ Q for all i ≠ j.
Proposition 3.1. Let M = (m_1, m_2, …, m_k) ∈ N^k. Let H be the orthogonal projection of M onto D, and T the intersection of D with the facets of the k-dimensional parallelepiped P of long diagonal OM. Then the following conditions are equivalent:
(i) M is an integer prefix point of (α_1, α_2, …, α_k).
(i′) There is no 2-integer point in the triangle OMT, except O and M.
(ii) The triangle OMH intersects the integer k-cubes only in their facets, except for the points O and M.
(iii) The finite billiard word c_M is a prefix of the infinite billiard word c_α.

Proof. Let M′ be any point on HM and consider the finite word v′ encoding the intersection of the segment OM′ with the facets of the k-dimensional grid in R^k. When M′ = H, v′ = v is a prefix of c_α. When M′ = M, v′ = c_M. Property (iii) means that v′ is constant when M′ varies on the segment MH: each segment OM′ meets the k-cubes by the same facets. Hence that segment never contains a 2-integer point. Hence (ii) is true, and (i) is equivalent to (ii). Now, the triangle OMT is contained in OMH, thus (i) implies (i′). Conversely, if (i′) is true, when D leaves P it enters some k-cube of which M is a vertex. This means that H is interior to this k-cube, and each point of the triangle MTH not on MT is in the interior of this k-cube, hence has no integer coordinate, and this proves (i). □

In dimension 2, these integer prefix points are well known, and 2-integer points are exactly the integer points. Consider the billiard word c_α, with α = (α_1, α_2) and α_1/α_2 irrational. Let D be the half-line of origin O and slope α_2/α_1.

Proposition 3.2. Let v be a prefix of c_α and M the point (m_1, m_2), with m_i := |v|_{a_i} + 1. The following conditions are equivalent:
(1) v is a palindrome.
(1′) v = c_M.
(2) The triangle OMH contains no integer point except O and M.
(3) The distance MH is minimal among all distances M′H′, where M′ is an integer point on the same side of D as M, and such that its orthogonal projection H′ onto D is between O and H.
(4) m_2/m_1 is an intermediate or main convergent of α_2/α_1.

The equivalence of (2)–(4) is in [6], and the equivalence between (1) and (4), i.e. Theorem 2.1, is in [8].

3.2. Up and down

3.2.1. On prefix integer points
Let M = (m_1, m_2, …, m_k) be a prefix integer point of α = (α_1, α_2, …, α_k), and let i ≠ j in {1, …, k}. Let M_ij ∈ Z² be the projection (m_i, m_j), and α_ij = (α_i, α_j).

Proposition 3.3. M is a prefix integer point of α if and only if each M_ij is a prefix integer point of α_ij.
Proof. Denote also by π_ij the projection R^k → R² which sends α onto (α_i, α_j). Then π_ij(D) is the half-line D_ij, π_ij(T) = T_ij, and T_ij is the point where D_ij leaves the rectangle of diagonal OM_ij. Hence OM_ijT_ij = π_ij(OMT). If the triangle OMT contains no point other than O and M having at least two integer coordinates, the same holds for all triangles OM_ijT_ij, and hence each M_ij is a prefix integer point of (α_i, α_j). If however there is some 2-integer point in the triangle OMT, different from O and M, let i, j correspond to its integer coordinates. Then, by projection, there is some integer point in the triangle OM_ijT_ij, different from O and M_ij. □

3.2.2. On palindromes

Proposition 3.4. Let v be a finite word on A = {a_1, a_2, …, a_k}. Then v is a palindrome if and only if all its projections π_ij(v) are palindromes.

Proof. The condition is clearly necessary. Suppose now that v is not a palindrome: v ≠ ṽ. Then v = w a_i w′ and ṽ = w a_j w″, where w is the longest common prefix of v and ṽ, and thus i ≠ j. Then:
π_ij(v) = π_ij(w) a_i π_ij(w′) ≠ π_ij(w) a_j π_ij(w″) = π_ij(ṽ),

and π_ij(ṽ) is the reversal of π_ij(v), which means that π_ij(v) is not a palindrome. □
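Proposition 3.4 can be checked exhaustively on small alphabets. A minimal sketch (our own function names) verifies it for all words of length up to 8 over three letters:

```python
from itertools import product

def proj(v, letters):
    """pi_ij: erase from v every letter outside the given pair."""
    return ''.join(c for c in v if c in letters)

def is_pal(w):
    return w == w[::-1]

A = 'abc'
pairs = [('a', 'b'), ('a', 'c'), ('b', 'c')]
for n in range(1, 9):
    for v in map(''.join, product(A, repeat=n)):
        # v is a palindrome iff all three pairwise projections are
        assert is_pal(v) == all(is_pal(proj(v, p)) for p in pairs)
```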
Theorem 2.3 is an immediate consequence, since for v a prefix of c_{α_1,α_2,…,α_k}, π_ij(v) is a prefix of π_ij(c_{α_1,α_2,…,α_k}) = c_{α_i,α_j}.

4. Auxiliary results on continued fractions

4.1. A probabilistic result
Let q = q_n be the denominator of a main convergent p_n/q_n of the real number α (see [13, Chapter X] or [16] for general results on continued fractions). Then by [13, Theorem 171, p. 140]:

  |α − p_n/q_n| ≤ 1/(q_n q_{n+1}).

Thus, since q_{n+1} = a_{n+1} q_n + q_{n−1}, where all these numbers are positive integers, one has

  ‖q_n α‖ ≤ 1/q_{n+1} ≤ 1/q_n.
We denote as usual by ‖x‖ the distance from x to the closest integer. When x varies between 0 and 1, the inequality ‖qx‖ ≤ 1/q is satisfied in a union of q + 1 disjoint intervals whose lengths sum to 2/q. Taking the uniform probability, we deduce that the probability that x
in [0, 1] has q as the denominator of some main convergent is bounded by 2/q. Note that this probability exists, since the set of all corresponding x is a finite union of intervals, up to rational numbers. Note also that this argument does not work for intermediate convergents, since ‖qx‖ may then be bigger than 1/q.

Proposition 4.1. Let q be a positive integer ≥ 2, and 0 < x < 1. Then the probability P_q that q be a denominator of a main or intermediate convergent of x satisfies:

  P_q ≤ 2/√q + 2/(√q − 1).

This result is certainly not optimal, but it is sufficient for our purposes. Note that P_q exists for the same reason as above.

Proof. Let q be the denominator of some intermediate convergent of α. Then one has, for the denominators q_n and q_{n+1} of two successive main convergents:

  q_n ≤ q < q_{n+1},  q = a q_n + q_{n−1},  1 ≤ a ≤ a_{n+1} − 1,

where all these numbers are positive integers. Moreover, the intermediate convergent is

  p/q = (a p_n + p_{n−1})/(a q_n + q_{n−1}).

We have two cases.
• If a > √q − 1, then q = a q_n + q_{n−1} > (√q − 1) q_n, so q_n < q/(√q − 1). Moreover,

  ‖q_n α‖ ≤ 1/q_{n+1} ≤ 1/q,

and so α belongs to a set whose probability is 2/q. Since q_n may take any integer value between 1 and q/(√q − 1), the probability that x has q as the denominator of a main or intermediate convergent with a > √q − 1 is bounded by

  Σ_{1 ≤ i < q/(√q−1)} 2/q ≤ 2/(√q − 1).

• If a ≤ √q − 1, then

  q = a q_n + q_{n−1} < (a + 1) q_n ≤ √q · q_n,

which implies q_n > √q, and

  ‖q α‖ < ‖q_{n−1} α‖ ≤ 1/q_n < 1/√q.

Indeed, the first inequality is a consequence of the theory of continued fractions, since the points (p, q) associated to main and intermediate convergents on the same side of D get closer and closer to D. The probability that x satisfies this inequality is bounded by 2/√q.

We conclude by summing the two bounds above. □
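The denominators of main and intermediate convergents counted in Proposition 4.1 are easy to enumerate from the partial quotients. A small sketch (function names are ours; the floating-point expansion is reliable only for the first few partial quotients):

```python
def cf_digits(x, n):
    """First n partial quotients of x (floating-point sketch)."""
    digits = []
    for _ in range(n):
        a = int(x)
        digits.append(a)
        x = 1.0 / (x - a)
    return digits

def denominators(digits):
    """Denominators of main and intermediate convergents, in order."""
    q_prev, q = 0, 1          # q_{-1}, q_0
    out = []
    for a in digits[1:]:
        for b in range(1, a):            # intermediate convergents
            out.append(b * q + q_prev)
        q_prev, q = q, a * q + q_prev    # main convergent
        out.append(q)
    return out

beta = (15 + 5 ** 0.5) / 10    # [1; 1, 2, 1, 1, ...]
gamma = (1 + 5 ** 0.5) / 2     # [1; 1, 1, 1, 1, ...] (golden number)
db = denominators(cf_digits(beta, 12))
dg = denominators(cf_digits(gamma, 12))
```

For these two numbers, used in the example of Section 5.3, the sequences start (1, 2, 3, 4, 7, 11, …) and (1, 2, 3, 5, 8, 13, …), and within this range they share only the values 1, 2, 3.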
4.2. Synchronization of convergents

Proposition 4.2. For almost every positive real number α, the set of positive real numbers β having infinitely many denominators of intermediate or main convergents in common with α has Lebesgue measure 0.

Proof. Let f(n) be an increasing function such that the series Σ_{n≥1} 1/f(n) converges. Then for almost all positive real numbers α the sequence of partial quotients a_n of α satisfies a_n ≤ f(n) for large n, see [16]. We only consider these numbers α, for f(n) := n²; hence there exists a positive constant C such that a_n ≤ C n².

Let S := Σ 1/√q, where the sum is over all denominators q of intermediate or main convergents of α. Then

  S ≤ Σ_{n≥1} a_n/√q_n < 2C Σ_{n≥1} n² / ((√5 + 1)/2)^{n/2} < ∞,

where (q_n) is the sequence of denominators of main convergents of α, and where in S we have grouped the q with q_n ≤ q < q_{n+1}, hence 1/√q ≤ 1/√q_n; there are a_n such q. The last inequality follows from the fact that the denominators of main convergents are minimal for the golden number φ := [1; 1, 1, …], where they equal the Fibonacci sequence:

  F_n = ((5 + √5)/10) ((√5 + 1)/2)^n + ((5 − √5)/10) ((1 − √5)/2)^n > (1/2) ((√5 + 1)/2)^n.

Thanks to Proposition 4.1, the series Σ P_q converges. Then the Borel–Cantelli lemma implies that the set of β ∈ [0, 1] having infinitely many of these q as denominators of (main or intermediate) convergents has measure 0. Since the denominators of convergents depend only on the fractional part of β, so does the set of positive real numbers β. □

5. Proof of the main results

5.1. Existence of arbitrarily long palindromic factors

Proposition 5.1. Let the α_j be linearly independent over Q. The word c_{α_1,α_2,…,α_k} contains arbitrarily long palindromic factors. More precisely:
• it contains arbitrarily long palindromic factors of even length,
• for each i, it contains arbitrarily long palindromic factors of odd length with central letter a_i.

In the proof below, we use two-sided infinite words and two-sided billiard words; their definition is straightforward.

Proof. Let C := (1/2, 1/2, …, 1/2). Consider the line D′ passing through C and parallel to (α_1, α_2, …, α_k). The associated billiard word c is well-defined, since the quotients α_i/α_j
are irrational for i = j . Moreover, due to the linear independence over Q, c1 ,2 ,...,k has the same factors as c. Observe that C is a center of symmetry for the integer lattice, and after C is for the above line D . Hence the right infinite word defined by the half-line D+ before C. Thus the reversal of the left infinite word defined by the symmetric half-line D− c = v v. Hence for each prefix w of v, w w is a palindrome of even length which is factor of c, hence of c1 ,2 ,...,k . We conclude since w is arbitrary long. For the second assertion, we argue similarly, by replacing C by the point ( 21 , 21 , . . . , 21 , 0, 1 1 2 , . . . , 2 ) with a 0 in ith position. 5.2. Palindromic prefixes: general case Let k 3 and v a palindromic prefix of c1 ,2 ,...,k and M = (m1 , m2 , . . . , mk ) its integer prefix point corresponding to v. Then, by Proposition 3.3, (m2 , m1 ) and (m3 , m1 ) are integer prefix points of (3 , 1 ) and (3 , 1 ). Hence m1 is a denominator of some intermediate or main convergent of 2 /1 and 3 /1 . Hence for almost all (2 , 3 ), and a fortiori for almost all (1 , 2 , . . . , k ), m1 is bounded. This means that the number of occurrences of letter a1 in v is bounded, and thus v is of bounded length, since c1 ,2 ,...,k has infinitely many occurrences of letter a1 . Even, for these (1 , 2 , . . . , k ), there are only finitely many palindromic prefixes. This proves the first part of Theorem 2.4. 5.3. An example Let
√ √ 15 + 5 1 + 5 (, , ) = 1, , . 10 2
The expansion into continued fractions are √ 15 + 5 = = [1; 1, 2, 1, 1, 1, . . .] and 10 √ 1+ 5 = = [1; 1, 1, 1, 1, 1, . . .]. 2 The sequence of denominators of intermediate and main convergents are respectively (1, 1, 2, 3, 4, 7, 11, . . .)
and
(1, 2, 3, 5, 8, 13, . . .)
and all convergents are main convergents, except that corresponding to 2 in the left-hand case. The remaining denominators satisfy the same recursion qn+2 = qn+1 + qn and hence the values that appear cannot be equal: in the first case, denominators are qn = Fn−1 −Fn−5 for n 6, and qn = Fn in the second one. Only (1, 2, 3) are common to both sequences: indeed Fn−1 = Fn − Fn−2 < Fn − Fn−4 < Fn for n 4. The billiard word c,, is c,, = bcabcbcabcbacbcabcbcabcbacbcbacbacbcbacbbc . . .
whose projections onto {a, b} and {a, c} are

  c_{α,β} = babbabbababbabbabbababbabb … and c_{α,γ} = caccacaccaccacaccacaccacc … .

The synchronization principle shows that the palindromic prefixes of c_{α,β,γ} correspond to palindromic prefixes of c_{α,β} and c_{α,γ} having the same number of occurrences of the letter a. Since in dimension 2 palindromic prefixes correspond to denominators of convergents (main or intermediate), with q = 1 + (number of a's), only common values of q are possible, and hence the number of a's can only be 0, 1 or 2. Hence, looking for these prefixes, we find for c_{α,β} the words b, bab and babbab, and for c_{α,γ} the words c, cac and caccac. Moving up to c_{α,β,γ}, we find, according to the number of a's (which is 0, 1 or 2):
• for 0, b or bc,
• for 1, bcabc,
• for 2, bcabcbcabcb.
Hence only b is a palindromic prefix of c_{α,β,γ}.

5.4. Arbitrarily long palindromic prefixes

We need the following result on continued fractions, due to Lagrange [13, Theorem 184, p. 153].

Proposition 5.2. Let x be a real number and p/q a rational number such that

  |x − p/q| < 1/(2q²).

Then p/q is a main convergent of x.

Let n be any fixed integer. We consider any numbers α_1, α_2, …, α_k whose continued fraction expansions are given up to order n. We deduce from them the existence of a positive constant K bigger than all the convergents p/q of these α_i, and bigger than all the ratios p/(qα_i), where p/q is a convergent of α_i. The corresponding denominators are denoted by q_j^{(i)}, 1 ≤ i ≤ k and 0 ≤ j ≤ n. The expansion is extended two ranks further:
• rank n + 1: for each i,

  q_{n+1}^{(i)} = a_{n+1}^{(i)} q_n^{(i)} + q_{n−1}^{(i)},

where a_{n+1}^{(i)} is chosen in such a way that the q_{n+1}^{(i)} =: ℓ_{n+1}^{(i)} are distinct prime numbers. This is possible, by Dirichlet's theorem, since q_n^{(i)} and q_{n−1}^{(i)} are relatively prime. We may even assume that

  a_{n+1}^{(i)} > A := 4K².
• rank n + 2: since the ℓ_{n+1}^{(i)} are distinct prime numbers, by the Chinese remainder theorem we may find an arbitrarily big integer Q such that

  Q ≡ q_n^{(i)} (mod ℓ_{n+1}^{(i)}) for each i.

Thus we may find a_{n+2}^{(i)} such that

  Q = q_{n+2}^{(i)} = a_{n+2}^{(i)} q_{n+1}^{(i)} + q_n^{(i)}.

This construction is iterated, and leads to numbers α_1, α_2, …, α_k such that, from some rank on, they have the synchronization property at all steps of type "n + 2" above. For such a rank, the convergent of each α_i can be written

  p_{n+2}^{(i)} / q_{n+2}^{(i)} = P^{(i)} / Q,

as the denominator q_{n+2}^{(i)} does not depend on i; the index n + 2 is dropped for simplicity, for a given n.

Lemma 1. P^{(i)}/P^{(j)} is a main convergent of α_i/α_j if i ≠ j.

Proof. Since the coefficients a_{n+3}^{(i)} are of type "n + 1" above, we have the corresponding inequality:

  |α_i − P^{(i)}/Q| = |α_i − p_{n+2}^{(i)}/q_{n+2}^{(i)}| ≤ 1/(q_{n+2}^{(i)} q_{n+3}^{(i)}) < 1/(a_{n+3}^{(i)} (q_{n+2}^{(i)})²) < 1/(A Q²).

Hence

  |α_i/α_j − P^{(i)}/P^{(j)}| = |(Qα_i − P^{(i)}) P^{(j)} − (Qα_j − P^{(j)}) P^{(i)}| / (Q α_j P^{(j)})
    ≤ (|α_i − P^{(i)}/Q| P^{(j)} + |α_j − P^{(j)}/Q| P^{(i)}) / (α_j P^{(j)})
    < (1/(A Q²)) · (P^{(j)} + P^{(i)}) / (α_j P^{(j)})
    < 2K²/(A (P^{(j)})²) = 1/(2 (P^{(j)})²),

which implies the lemma by Proposition 5.2. □
Hence (P^{(1)}, P^{(2)}, …, P^{(k)}) is an integer prefix point of (α_1, α_2, …, α_k), and the prefix of length P^{(1)} + P^{(2)} + ⋯ + P^{(k)} − k of c_{α_1,α_2,…,α_k} is a palindrome. There are infinitely many such points, since this happens at ranks n + 2, n + 4, n + 6, and so on.
5.4.1. An example
Let

  α = ((√5 − 1)/2, (√5 + 1)/2, (√5 + 3)/2, (√5 + 5)/2, 1) ∈ R⁵.
The denominators of the convergents of the α_i/α_5 are the same, and correspond to the Fibonacci sequence. The α_i are not independent over Q, but α_i/α_j is irrational if i ≠ j. The word w = c_{α_1,α_2,α_3,α_4,α_5} is

  w = dcdbcdedcbdcadbcdedcbdcdedcbadcdbcdedcbdcdabcdedcdbcdedcbdacdbcdedcddcde … .

The number 2 in the Fibonacci sequence corresponds to the approximations (1/2, 3/2, 5/2, 7/2, 2/2), hence to the prefix of length 13 = 1 + 3 + 5 + 7 + 2 − 5. This prefix is dcdbcdedcbdca, which is not a palindrome (but almost: replace the last letter a by d, which is the following letter in w). The number 8 in the Fibonacci sequence corresponds to the approximations (5/8, 13/8, 21/8, 29/8, 8/8), hence to the prefix of length 71 = 5 + 13 + 21 + 29 + 8 − 5. This prefix is

  dcdbcdedcbdcadbcdedcbdcdedcbadcdbcdedcbdcdabcdedcdbcdedcbdacdbcdedcbdcd.

It is a palindrome. This occurs again with 34, which yields a palindromic prefix of length 317, and from there with periodicity 3 on the Fibonacci sequence. In this way all palindromic prefixes of w are obtained; curiously, they are all of odd length with central letter e, except for the first two, d and dcd.

Acknowledgements

The authors thank Srecko Brlek for a preliminary computation in dimension 3, and Aldo de Luca for an important mail exchange on this subject.

References

[1] J.-P. Allouche, J. Shallit, Automatic Sequences, Cambridge University Press, Cambridge, 2003.
[2] P. Arnoux, C. Mauduit, I. Shiokawa, J.I. Tamura, Complexity of sequences defined by billiard in the cube, Bull. Soc. Math. France 122 (1994) 1–12.
[3] Y. Baryshnikov, Complexity of trajectories in rectangular billiards, Comm. Math. Phys. 174 (1995) 43–56.
[4] J. Berstel, A. de Luca, Sturmian words, Lyndon words and trees, Theoret. Comput. Sci. 178 (1997) 171–203.
[5] J. Berstel, P. Seebold, Sturmian words, in: M. Lothaire (Ed.), Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002.
[6] J.-P. Borel, F. Laubie, Quelques mots sur la droite projective réelle, J. Théorie Nombres Bordeaux 5 (1993) 23–51.
[7] E.B. Christoffel, Observatio arithmetica, Ann. Mat. 6 (1875) 148–152.
[8] A. de Luca, Sturmian words: structure, combinatorics, and their arithmetics, Theoret. Comput. Sci. 183 (1997) 45–82.
[9] A. de Luca, Combinatorics of standard Sturmian words, in: Structures in Logic and Computer Science, Lecture Notes in Computer Science, Vol. 1261, 1997, pp. 249–267.
[10] X. Droubay, Palindromes in the Fibonacci word, Inform. Process. Lett. 55 (1995) 217–221.
[11] X. Droubay, G. Pirillo, Palindromes and Sturmian words, Theoret. Comput. Sci. 223 (1999) 73–85.
[12] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics, second ed., Addison-Wesley, Reading, MA, 1994.
[13] G.H. Hardy, E.M. Wright, An Introduction to the Theory of Numbers, fourth ed., Clarendon Press, Oxford, 1975.
[14] G. Pirillo, A new characteristic property of the palindrome prefixes of a standard Sturmian word, Sém. Lothar. Combin. 43 (1999) (electronic, see http://www.mat.univie.ac.at/∼slc/).
[15] N. Pytheas Fogg, Substitutions in Dynamics, Arithmetics and Combinatorics, in: V. Berthé et al. (Eds.), Lecture Notes in Mathematics, Vol. 1794, 2002.
[16] A.M. Rockett, P. Szüsz, Continued Fractions, World Scientific, Singapore, 1992.
[17] L. Vuillon, A characterization of Sturmian words by return words, European J. Combin. 22 (2001) 263–275.
Theoretical Computer Science 340 (2005) 349 – 363 www.elsevier.com/locate/tcs
Regular splicing languages and subclasses Paola Bonizzoni∗ , Giancarlo Mauri Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano – Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
Abstract

Recent developments in the theory of finite splicing systems have revealed surprising connections between long-standing notions of formal language theory and the splicing operation. More precisely, the syntactic monoid and the Schützenberger constant interact strongly with the investigation of regular splicing languages. This paper surveys structural characterizations of classes of regular splicing languages based on these two notions, and discusses basic questions that motivate further investigations in this field. In particular, we improve the result in [6] that provides a structural characterization of reflexive symmetric splicing languages, by showing that it can be extended to the class of all reflexive splicing languages: this is the largest class for which a characterization is known.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Automata; Regular languages; Molecular computing; Splicing systems
1. Introduction

The formal model of splicing system was originally introduced in [13] to investigate the potential of a fundamental biological mechanism occurring in nature: restriction enzymes act on double-stranded DNA molecules by cutting them into segments that can be joined in the presence of a ligase enzyme. The original definition of splicing system was formulated to describe the biochemical processes involved in this molecular cut-and-paste phenomenon. Later the notion was reformulated by G. Paun at a less restrictive level of

∗ Corresponding author.
E-mail addresses: [email protected] (P. Bonizzoni), [email protected] (G. Mauri). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.035
generality, giving rise to the model of splicing operation that was then commonly adopted in splicing systems theory and is nowadays a standard [19]. Since a splicing system is a formal device for generating languages, called splicing languages, the splicing operation has been deeply investigated in the framework of formal language theory, establishing a link between the biomolecular sciences and language theory [20]. Moreover, this close connection has contributed a renewed interest in the development of language theory. Conversely, theoretical results in splicing systems theory have suggested new ideas in the framework of biomolecular science, for example the design of automated enzymatic processes.

In this paper, we focus on the original concept of finite splicing system, which is closest to the real biological process: the splicing operation is meant to act through a finite set of rules (modelling enzymes) on a finite set of initial strings (modelling DNA sequences). Under this formal model, a splicing system is a generative mechanism for languages, which turn out to be regular. This basic result was first proved in [9], and later proved in several other papers using different approaches (see for example [23,17,26]). However, not all regular languages can be generated by splicing, and a characterization of the class of regular splicing languages is still unknown. This open question is related to several challenging issues concerning splicing theory and formal language theory that motivate a continuous development of research in this direction [12,6].

Considerable progress has been made in the investigation of the generative power of finite splicing systems. For a better understanding of the basic issues in this field, it is necessary to classify splicing systems w.r.t. the splicing operation. In the literature three main splicing operations have been introduced, known as the Head, Paun and Pixton operations.
Each splicing operation leads to a distinct class of splicing languages (known also as Head, Paun and Pixton splicing languages). Actually, it turns out that the relationship between the different classes of splicing languages can be explained by using the classical notion of the Paun splicing operation and viewing the set of splicing rules as inducing a binary relation. A set R of rules consists of ordered pairs of factored words, denoted ((u_1, u_2)$(u_3, u_4)) and called rules, where u_1u_2 and u_3u_4 are splicing sites. The set R specifies a binary relation between factored sites that can be reflexive, symmetric or a transitive relation, as shown in [4]. It turns out that distinct classes of splicing languages are generated by splicing systems where R is a binary relation obeying different restrictions. For instance, when R is restricted to be reflexive, symmetric and transitive, it allows one to characterize the splicing languages generated by the original Head splicing operation. On the other hand, Paun splicing languages are generated by splicing systems where the set R of rules is not necessarily symmetric or reflexive. In particular, reflexivity and symmetry are natural properties for splicing systems as originally defined in [15]; indeed, reflexive and symmetric splicing systems are the most important from a modelling perspective.

The first characterization of reflexive symmetric splicing languages was given in [6] by using the concept of constant, introduced by Schützenberger [24]. Every language L in this class is constructed from a finite set of constants for L: L is expressed as a finite union of constant languages and split languages, where a split language is obtained by a single iteration of a splicing operation over constant languages.
In this paper, we discuss this result, which constitutes a first significant progress in this research field, as it sheds light on the fundamental concepts of formal language theory that can explain
the generative power of the splicing operation and how they can be used in this framework: these are the concepts of constant and of syntactic congruence. Moreover, we improve the structural characterization given in [6] by showing that it generalizes to all reflexive (i.e., not necessarily symmetric) splicing languages: this result is stated in Proposition 4.2. Furthermore, by observing that a reflexive regular splicing language L is characterized by one iteration step of splicing rules applied to constant languages, we prove that a recent characterization of reflexive languages given in [12] can be obtained as a corollary of Proposition 4.2.

Two fundamental questions arise when dealing with splicing systems.
• Question A (recognition): give an effective procedure to decide whether a regular language is an X-splicing language (X reflexive, symmetric).
• Question B (synthesis): give an effective procedure to construct, given an X-splicing language L, a splicing system S with X-rules such that L = L(S).

In the paper, we address these two questions by presenting and analyzing the main results related to them appearing in the literature. In particular, Question A has been solved for the class of reflexive splicing languages (in [12] a decision procedure for this class has been proposed), and for special subclasses of regular languages. Clearly, the problem is strictly related to the question of providing a structural characterization of splicing languages. A graph-based algorithm that solves this question for null context splicing languages (NCS) is proposed in [7]. Other decision results have been given for larger classes of languages including the class NCS, such as the classes of FCS languages and of marker languages, characterized by the notions of constant and of syntactic monoid. Question B has been solved for the class of reflexive symmetric languages in [6].

The paper is organized as follows.
In Section 2 the basic notions on finite splicing systems are given; Sections 3 and 4 are devoted to Question B, while in Section 5 we discuss results concerning Question A. Finally, in Section 6 we list open problems in this research field.
2. Finite splicing systems

In this section, we give the basic notions of finite splicing systems theory and of formal language theory that have been used to investigate subclasses of splicing languages.

Let A be a finite alphabet. We denote the empty word over A by 1. Given w ∈ A⁺, a 2-factor of w is an ordered pair (w_1, w_2) such that w_1, w_2 ∈ A* and w = w_1w_2. A rule r consists of a pair of 2-factors (u_1, u_2) and (u_3, u_4) and is denoted ((u_1, u_2)$(u_3, u_4)); each of the words u_1u_2 and u_3u_4 is called a splicing site of rule r. A set R of rules specifies a binary relation between 2-factors of sites that can be reflexive, symmetric or even a transitive relation [4]. Precisely, R is reflexive if ((u_1, u_2)$(u_3, u_4)) ∈ R implies ((u_1, u_2)$(u_1, u_2)) ∈ R and ((u_3, u_4)$(u_3, u_4)) ∈ R, while R is symmetric if ((u_1, u_2)$(u_3, u_4)) ∈ R implies ((u_3, u_4)$(u_1, u_2)) ∈ R.
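Viewing R as a finite binary relation on 2-factors, reflexivity and symmetry are directly checkable. A minimal sketch (rule encoding and placeholder sites are ours):

```python
# A rule ((u1,u2)$(u3,u4)) is encoded as a pair of 2-factors (tuples).
def is_reflexive(R):
    # every rule (s$t) forces (s$s) and (t$t) into R
    return all((s, s) in R and (t, t) in R for (s, t) in R)

def is_symmetric(R):
    # every rule (s$t) forces (t$s) into R
    return all((t, s) in R for (s, t) in R)

r = (('u1', 'u2'), ('u3', 'u4'))   # placeholder sites
R1 = {r}
R2 = {r, (r[1], r[0]), (r[0], r[0]), (r[1], r[1])}
assert not is_reflexive(R1) and not is_symmetric(R1)
assert is_reflexive(R2) and is_symmetric(R2)
```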
Given x, y ∈ A⁺, a rule r = ((u_1, u_2)$(u_3, u_4)) applies to x, y if x has the splicing site u_1u_2 as a factor and y has the splicing site u_3u_4 as a factor, that is, x = x_1u_1u_2x_2 and y = y_1u_3u_4y_2. The result of the splicing operation on x, y by rule r is the word w = x_1u_1u_4y_2, which is said to be generated by splicing of x, y by r. If R is symmetric, then given a rule r = ((u_1, u_2)$(u_3, u_4)) ∈ R, also r̄ = ((u_3, u_4)$(u_1, u_2)) ∈ R, and the word w̄ = y_1u_3u_2x_2 is generated by splicing of x, y by r̄.

Let L ⊆ A*. We define the closure of L by R as the set cl(L, R) of all words that are obtained as the result of a splicing operation on a pair of words in L by a rule r ∈ R. A splicing system S is a triple S = (A, I, R), where A is the alphabet of the system, R is a set of splicing rules and I ⊆ A* an initial language. Given a splicing system S = (A, I, R), the iterated splicing is defined as follows:
  σ⁰(I) = I, and for i ≥ 0,
  σ^{i+1}(I) = σ^i(I) ∪ cl(σ^i(I), R),
  σ*(I) = ⋃_{i ≥ 0} σ^i(I).
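The iterated splicing defined above is directly executable on small instances. A minimal sketch of the Paun 1-splicing step and its iteration (function names are ours; the length cap is an artificial device to force termination, since splicing languages may be infinite):

```python
def splice(x, y, rule):
    """One 1-splicing step: from x = x1 u1u2 x2 and y = y1 u3u4 y2
    produce x1 u1 u4 y2, over all factorizations."""
    (u1, u2), (u3, u4) = rule
    s1, s2 = u1 + u2, u3 + u4
    results = set()
    for i in range(len(x) + 1):
        if x.startswith(s1, i):
            for j in range(len(y) + 1):
                if y.startswith(s2, j):
                    results.add(x[:i] + u1 + u4 + y[j + len(s2):])
    return results

def iterated_splicing(I, R, max_len=20):
    """sigma^*(I) truncated to words of bounded length."""
    lang = set(I)
    while True:
        new = {w for x in lang for y in lang for r in R
               for w in splice(x, y, r) if len(w) <= max_len}
        if new <= lang:
            return lang
        lang |= new

R = {(('a', 'b'), ('b', 'a'))}
assert splice('aba', 'aba', next(iter(R))) == {'aa'}
assert iterated_splicing({'aba'}, R) == {'aba', 'aa'}
```

In this toy example the rule ((a,b)$(b,a)) applied to aba and aba produces aa, after which no rule applies any further, so the iteration closes.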
In this paper we deal with finite splicing systems, that is, splicing systems where I and R are finite sets: in this case S is called an H-system and L(S) = σ*(I) is the splicing language generated by S. Thus, in the rest of the paper, by a splicing system we mean a finite splicing system, and a splicing language is a language generated by a finite splicing system. For convenience, we assume that all rules in R are useful for the language L(S), that is, for each rule r ∈ R there exist w, x, y ∈ L(S) such that w is generated by splicing of x, y by r. A splicing language L is a reflexive or symmetric splicing language if L = L(S), where S = (A, I, R) and R is reflexive or symmetric, respectively.

It must be pointed out that in the literature at least two other notions of splicing rules and splicing operations have been proposed, known as the Head and Pixton splicing operations. In [8] it has been shown that splicing systems based on the Pixton splicing operation are more powerful than those based on the standard (Paun) splicing, and that these in turn are more powerful than Head splicing systems. A classification of these different notions of splicing may be given by using the standard (Paun) splicing operation adopted in this paper, simply by requiring that the set R of rules be a specific (symmetric, reflexive or transitive) binary relation over 2-factors, as pointed out partially in [4]. The relationship between symmetric and nonsymmetric splicing languages has been investigated in [25]. The class of splicing languages (called 1-splicing languages and introduced in [20]) properly includes the class of symmetric splicing languages, as proved in [25] (these are equivalent to the 2-splicing languages introduced in [20]); indeed, the languages of Lemma 4.3 show the strict inclusion.

Remark 2.1. Observe that we use a definition of cl(I, R) based on the 1-splicing operation [20].
This notion is generalized to the case of the 2-splicing operation by defining the set cl_2(I, R) = {x_1u_1u_4y_2, y_1u_3u_2x_2 : x_1u_1u_2x_2, y_1u_3u_4y_2 ∈ I, ((u_1, u_2)$(u_3, u_4)) ∈ R}, while cl(I, R) = {x_1u_1u_4y_2 : x_1u_1u_2x_2, y_1u_3u_4y_2 ∈ I, ((u_1, u_2)$(u_3, u_4)) ∈ R}.
Given a set R of rules, let us denote by sym(R) the symmetric closure of R. Formally, sym(R) = {(s_1$s_2), (s_2$s_1) : (s_1$s_2) ∈ R}. Then it is immediate to verify that cl_2(I, R) = cl(I, sym(R)). Vice versa, given cl(I, R) where R is a set of symmetric rules, then cl(I, R) = cl_2(I, R).

In [26], a proof that splicing languages are regular is given, thus providing an alternative proof of the known inclusion of splicing languages in the class of regular languages. Actually, this main result in splicing theory was first proved in [9], but there are several other proofs using different approaches (see for example [23,17]). For example, in [17] an algorithm has been given to construct a finite state automaton that recognizes the language generated by a splicing system (A, L, R) that is not necessarily finite, where L is a regular language and R a finite set. Clearly, this result proves the existence of a finite state automaton recognizing a splicing language generated by a finite splicing system, i.e., in the special case where L is finite.

A fundamental property introduced in several papers [6,5,11,12], relating rules to a language L and used to build splicing systems that generate a language, is the closure of L under a set R of rules, stated below.

Definition 2.1. A language L is closed under a set R of rules iff cl(L, R) ⊆ L.

We conclude the section by giving the basic notions of formal language theory used in the paper: the notions of constant and of syntactic monoid. When dealing with a finite state automaton A = (A, Q, δ, q_0, F) recognizing a regular language L, we assume that A is trim, that is, each state is accessible and coaccessible, and that A is the minimal automaton of L (see [21] for basic notions). Here δ is the transition function of the deterministic automaton A, q_0 is the initial state, and F the set of final states [2,16].
Given a regular language L, the reduced graph GA(L) denotes the transition diagram of the minimal automaton A recognizing L. A path in the reduced graph GA(L) is a finite sequence π = (q, a1, q1)(q1, a2, q2) . . . (qn−1, an, qn) where δ(q, a1) = q1 and, for each i = 1, . . . , n − 1, δ(qi, ai+1) = qi+1. An abbreviated notation for such a path is π = (q, a1a2 · · · an, qn), and a1a2 . . . an is called the label of π. A path π = (q, x, qn), with x ∈ A+, is a closed path iff q = qn. Moreover, we say that q, q1, . . . , qn are the states crossed by the path (q, a1 · · · an, qn) and, for each i ∈ {1, . . . , n − 1}, qi is an internal state crossed by the same path. Given w ∈ A+, Qw denotes the set {q′ ∈ Q : δ(q, w) = q′ for some q ∈ Q}. Given m ∈ A+, we define the left context and the right context of m w.r.t. L as the languages ContL(L, m) = {x ∈ A∗ : xmy ∈ L for some y ∈ A∗} and ContR(L, m) = {y ∈ A∗ : xmy ∈ L for some x ∈ A∗}, respectively. Moreover, given q ∈ Qm, ContR,q(L, m) = {y ∈ A∗ : δ(q, my) ∈ F}. A word w ∈ A+ is a constant for a regular language L iff, given xwy ∈ L and zwv ∈ L, then xwv, zwy ∈ L, i.e., ContL(L, w)wContR(L, w) ⊆ L. The notion of constant was introduced by Schützenberger [24]. A word w ∈ A+ is singular iff |Qw| = 1. A characterization of the constants of a regular language L in terms of the reduced graph GA(L) is given in Proposition 2.1. This result is more or less folklore and its proof can be found in [6].
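For a finite language, the defining exchange property of a constant can be tested by brute force; a small sketch (the function name is ours), which checks ContL(L, w)·w·ContR(L, w) ⊆ L directly:

```python
def is_constant(w, L):
    """Test Schützenberger's constant property for w over a *finite*
    language L: whenever x·w·y ∈ L and z·w·v ∈ L, then x·w·v ∈ L
    (and symmetrically z·w·y ∈ L). For a regular L one would instead
    inspect the minimal automaton, in the spirit of Proposition 2.1."""
    # all (left, right) context pairs of occurrences of w in words of L
    ctx = [(u[:i], u[i + len(w):]) for u in L
           for i in range(len(u) - len(w) + 1) if u.startswith(w, i)]
    # recombine every left context with every right context
    return all(x + w + v in L for x, _ in ctx for _, v in ctx)
```

For example, b is a constant for {ab, cb}, while a is not a constant for {ab, ba}.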
Proposition 2.1. Let L ⊆ A∗ be a regular language and let GA(L) be the reduced graph for L. A word m ∈ A∗ is a constant for L if and only if there exists qm ∈ Q such that each path in GA(L) having m as a label is of the form π = (q, m, qm) for some q ∈ Q.

The syntactic congruence plays a central role in the development of regular language theory [21,22]. The syntactic congruence ≡L w.r.t. a language L is a binary relation over words: w ≡L z iff for all x, y ∈ A∗, xwy ∈ L if and only if xzy ∈ L. The quotient of A∗ by the congruence ≡L is the syntactic monoid of L, denoted M(L). In the paper, the equivalence class of a word x is denoted by [x].
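For a finite language, the syntactic congruence can be computed exactly by bounding the contexts: any context pair (x, y) with xwy ∈ L satisfies |x|, |y| ≤ m, the maximal word length of L, and all test words longer than m share the class of the zero. A sketch (naming is ours):

```python
import itertools

def syntactic_classes(L, alphabet):
    """Classes of the syntactic congruence of a *finite* language L,
    represented by the test words of length <= m + 1, where m is the
    maximal word length of L (every longer word has an empty context
    set and falls into the class of the zero)."""
    m = max(map(len, L), default=0)
    def all_words(k):
        return ["".join(t) for n in range(k + 1)
                for t in itertools.product(alphabet, repeat=n)]
    def contexts(w):
        # all (x, y) with x·w·y in L; bounded enumeration is exact here
        return frozenset((x, y) for x in all_words(m) for y in all_words(m)
                         if x + w + y in L)
    classes = {}
    for w in all_words(m + 1):
        classes.setdefault(contexts(w), []).append(w)
    return list(classes.values())
```

For L = {ab} over {a, b} this yields five classes, matching the five elements 1, [a], [b], [ab] and the zero of the syntactic monoid of {ab}.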
3. Classes of splicing languages

The notion of constant appears to be crucial in characterizing the computational power of splicing systems. Indeed, the structure of reflexive languages, as well as of other special classes of splicing languages below the regular ones, is defined in terms of constants, as proved in recent papers [6,12].

3.1. Constant and splicing languages

The first characterization of classes of splicing languages in terms of the concept of constant is given in the seminal work on the splicing operation [13]: the class of null context splicing languages (NCS, in short) is equal to that of strictly locally testable languages. This result is based on a characterization of strictly locally testable languages (SLT) by means of the concept of constant given by De Luca and Restivo in [10]: there, SLT languages are characterized as those languages for which there exists a positive integer k such that every string in A∗ of length k is a constant. Let us recall that null context splicing languages are those languages generated by a system S = (A, I, R) where each rule r ∈ R is of the form ((x, 1)$(x, 1)) or ((1, x)$(1, x)), for x ∈ A+. A crucial notion in finite splicing theory, first introduced in [14], is that of a constant language.

Definition 3.1 (constant language for m). Let L be a regular language and m be a word in A∗ that is a constant for L. A constant language in L for m is the language L(m, L) ⊆ L such that L(m, L) = ContL(L, m)mContR(L, m). A language L is simply a constant language if L(m, L) = L.

In the paper, for simplicity, we use the notation L(m) to denote a constant language L(m, L) in L. Null context splicing languages are properly included in a larger class of languages investigated in [14] that we call finitely constant generated splicing languages, or simply FCS languages.
These languages are the splicing languages generated by systems with one-sided rules that are reflexive, which generalize the rules of NCS languages: one-sided rules are rules of the form ((1, v)$(1, u)) or ((v, 1)$(u, 1)), for u, v ∈ A∗.
The language L = b+ab+ is an example of an FCS language that is not an NCS language, as indeed L is not strictly locally testable. Moreover, note that an NCS language is not necessarily a constant language: this holds, for instance, for L = a∗ ∪ b∗, which is an NCS language that is the union of two constant languages over two distinct symbols of the alphabet. As for NCS languages, a nice characterization of FCS languages is given in terms of constants in [14]: a language L is an FCS language iff it is the union of a finite set with a finite set of constant languages in L for a set M of constants of L (these languages are called finitely constant based languages in [12]). This result is stated in the following theorem.

Theorem 3.1 (FCS languages (Head [14])). Let L ⊂ A∗ be a regular splicing language. Then the following are equivalent.
(1) L is generated by a splicing system S = (A, I, R), where each rule r ∈ R is one-sided and R is reflexive.
(2) L = ⋃_{m ∈ M} L(m) ∪ X, where X is a finite subset of A∗, L(m) is a constant language in L for m ∈ M ⊆ A∗ and M is a finite set of constants for L.

3.2. Syntactic monoid and splicing languages

The notion of syntactic monoid has been used in [3] and in [5] in order to characterize new classes of regular languages generated by splicing. An example of how the use of the syntactic monoid may provide new insight into the investigation of splicing languages is obtained by naturally extending the notion of a constant language, introducing congruence classes in place of constants. Precisely, in [3] it has been proved that regular languages of the form L = L1[x]1L2, where L1 and L2 are regular languages and [x]1 is a marker defined by means of a syntactic congruence class [x] of M(L), are splicing languages, called marker languages. More precisely, a marker [x]1 for the congruence class [x] is defined as [x]1 = [x], if [x] is finite; otherwise [x]1 = [x] ∪ {1}, where x is a label of a closed path that is singular.
Marker languages form a class of regular splicing languages which is not comparable with the class of FCS languages [5]. More precisely, there are regular languages that are marker languages of the form L1[x]1L2 and are not in the class FCS, even though they are generated by splicing, as shown in the following example.

Example 3.1. The regular language L = L1(ab+a)∗L2 = b+a²da(ab+a)∗ada²b+, with L1 = b+a²da and L2 = ada²b+, is a marker language which is not in the class FCS. First observe that (ab+a)+ is a syntactic congruence class of the language L, and thus [aba] ∪ {1} is a marker, as aba is the singular label of a closed path. The language L is not in the class FCS, as it is an infinite union of constant languages, as proved below. Let us first show that no factor of the language L1ab+ = b+aL2 is a constant. Indeed, assume that z is a factor of w ∈ L1ab+, that is, w = w1zw2. As w ∈ b+aL2, it follows that there exists a word y ∈ L such that y = ww, as L ⊇ L1ab+ b+aL2. Given y = w1zw2w1zw2, if z is a constant, by the definition of a constant it holds that w1zw2 ∈ L, that is, there exists bⁱa²da²bʲ ∈ L, for i, j > 0. This fact leads to a contradiction, as each word in L must
contain two occurrences of the symbol d. Consequently, a constant of L must be a factor of L, but not a factor of L1ab+ = b+aL2. Thus, each constant z of L must be of the form z1abⁱaz2, with i > 0, for z1, z2 ∈ A∗. Indeed, each factor abⁱa is a constant by Proposition 2.1, as it is a singular word, and thus every word having abⁱa as a factor is also a constant, by a known property proved in [6] and in [10]. But, for each i > 0, there exists an infinite set of words in L that do not have z as a factor, thus implying that there exists no finite set M of constants of L such that L = ⋃_{m ∈ M} L(m) ∪ X, where X is a finite subset of A∗.
4. Reflexive symmetric splicing languages

In this section, we illustrate the characterization of reflexive symmetric splicing languages given in [6] and show that this result extends to all reflexive splicing languages. Moreover, we relate this result to a decision algorithm proposed in [12] for reflexive splicing languages. Again, the notion of constant is fundamental in giving a structural description of regular splicing languages. Indeed, a reflexive symmetric splicing language L is characterized in terms of a finite set M of constants for the language L. More precisely, L is defined in finite terms as a finite union of languages obtained by one single iteration of the splicing operation. The first intermediate significant result relating splicing languages to constants was proved for symmetric and reflexive languages in [6]: it states that the splicing sites of the rules of a symmetric and reflexive splicing language L are constants for the language. Actually, we can improve this result by showing in Proposition 4.1 that reflexivity is a necessary and sufficient condition for a splicing language to satisfy the above stated property (an independent proof of the following lemma is given in [12]).

Lemma 4.1. Let L be a language and r = ((u1, u2)$(u1, u2)) a rule. Then L is closed under rule r iff u1u2 is a constant for L.

Proof. Let L be closed under rule r. Let w1 = xu1u2y ∈ L and w2 = zu1u2v ∈ L. Since r can be applied to w1, w2 in either order, it is immediate that xu1u2v ∈ L and zu1u2y ∈ L. Consequently, u1u2 is a constant for L. Vice versa, assume that u1u2 is a constant for L. By the definition of a constant, given xu1u2y ∈ L and zu1u2v ∈ L, then xu1u2v ∈ L and zu1u2y ∈ L, thus implying that L is closed under r.

Thus we can state the first characterization theorem for reflexive splicing languages.

Proposition 4.1. Let L be a regular language.
Then L is a reflexive splicing language iff there exists a splicing system S = (A, I, R) such that L(S) = L and the sites of the rules in R are constants for L.

Proof. If L is reflexive, then there exists a system S = (A, I, R), where R is a reflexive set of rules and L = L(S). Then for each pair sij = (ui, uj) such that (sij$s) or (s$sij) is a rule in R, the rule (sij$sij) is in R. By Lemma 4.1, uiuj is a constant for L. Vice
versa, if each site uiuj of a rule r is a constant, the rule (sij$sij), with sij = (ui, uj), may be added to R since, by Lemma 4.1, L is closed under it. This fact implies that there exists a splicing system S′ = (A, I, R′), with a reflexive set of rules R′, such that L = L(S′) and thus L is reflexive.

Given a finite set M of constants for the language L, we define the set F(M) of 2-factors of words in M (a 2-factor in F(M) is named a split of a constant in [6]): F(M) = {(mi1, mi2) : mi1mi2 ∈ M}. A binary relation over F(M) induces a set RM of rules, precisely RM ⊆ {(s1$s2) : s1, s2 ∈ F(M)}: let us call RM a set of constant-based rules over M. A splicing operation is defined for a pair of constant languages L(mi), L(mj) by a rule r ∈ RM if each of the constants mi and mj is a distinct site of the rule r. Formally, given r = ((u1, u2)$(u3, u4)) such that u1u2 = mi, u3u4 = mj, and L(mi) = Li1u1u2Li2, L(mj) = Lj1u3u4Lj2, the result of a splicing operation of L(mi), L(mj) by r is the language Li1u1u4Lj2, denoted as SPLIT(L(mi), L(mj), r) and called a split language. Clearly, a split language is obtained as cl(L(mi) ∪ L(mj), r) (see Section 2).

Remark 4.1. In [6], the notion of a split language is introduced by using the 2-splicing operation. More precisely, the split language of L(mi) and L(mj) by a rule rij consists of cl2(L(mi) ∪ L(mj), rij). But, by Remark 2.1, it is immediate that cl2(L(mi) ∪ L(mj), rij) = cl(L(mi) ∪ L(mj), sym({rij})).

By the above remark, the characterization theorem for reflexive symmetric splicing languages in [6] can also be stated as follows:

Theorem 4.1. Let L be a regular language. Then L is a reflexive symmetric splicing language iff there exist a finite set X ⊂ A∗, a finite set M of constants for L and a set RM of constant-based rules over M such that RM is symmetric and

L = ⋃_{mi ∈ M} L(mi) ∪ ⋃_{rij ∈ RM} SPLIT(L(mi), L(mj), rij) ∪ X.
In [6], Theorem 4.1 is proved under the additional hypothesis that X is a finite set of words such that no factor of a word in X is a constant m ∈ M. Given a rule rij ∈ RM, the language L′ of all words in L that have the splicing site m of rij as a factor is uniquely specified by the expression ContL(L, m)mContR(L, m), i.e., L′ = L(m). Based on this observation, the finite union of split languages can be denoted by the closure of the union of constant languages under the rules in RM.

Lemma 4.2. Let RM be a set of constant-based rules over M. Then it holds that

⋃_{rij ∈ RM} SPLIT(L(mi), L(mj), rij) = cl(⋃_{mi ∈ M} L(mi), RM).
Proof. Clearly, ⋃_{rij ∈ RM} SPLIT(L(mi), L(mj), rij) ⊆ cl(⋃_{mi ∈ M} L(mi), RM). Now, given rij ∈ RM and x, y ∈ ⋃_{mi ∈ M} L(mi) such that rij applies to x, y, it holds that x ∈ L(mi) and y ∈ L(mj), where mi, mj are the two splicing sites of rij. Consequently, it holds that cl(⋃_{mi ∈ M} L(mi), rij) ⊆ SPLIT(L(mi), L(mj), rij), which concludes the proof of the lemma.

By using Proposition 4.1, in the following we show that Theorem 4.1 can be generalized to all reflexive (symmetric or nonsymmetric) splicing languages.

Proposition 4.2. Let L be a regular language. The following are equivalent:
(1) L is a reflexive (symmetric) splicing language.
(2) There exist a finite set X ⊂ A∗, a finite set M of constants for L and a set RM of (symmetric) constant-based rules over M such that

L = ⋃_{mi ∈ M} L(mi) ∪ cl(⋃_{mi ∈ M} L(mi), RM) ∪ X.  (1)
Proof. The proof of the implication (2) → (1) is obtained by showing that the proof of the same implication given for Theorem 4.1 in [6] holds in general, without assuming that RM is necessarily symmetric. Thus let us now show that (1) → (2) holds. If (1) holds, then there exists a splicing system S = (A, I, R) such that L = L(S) and R is reflexive. By Proposition 4.1 the set M of sites of rules in R is a finite set of constants. Thus RM = R is a set of constant-based rules over M, and L is closed under RM. By this fact it holds that L ⊇ L′ = ⋃_{mi ∈ M} L(mi) ∪ cl(⋃_{mi ∈ M} L(mi), RM). Thus L′ ∪ I ⊆ L, where I is the initial language of S. Let us now show, by induction on the length of a word w ∈ L, that L ⊆ L′ ∪ I. Clearly, if w ∈ I, then w ∈ L′ ∪ I. Thus assume that w ∈ L, w ∉ I. Then w ∈ cl({x, y}, r), for some r ∈ R and x, y ∈ L. By induction x, y ∈ L′ ∪ I and consequently w ∈ cl(⋃_{mi ∈ M} L(mi), RM), thus proving that w ∈ L′.

Example 4.1. The regular language L = a+ba+ba+ ∪ a+ca+ba+ is a reflexive symmetric splicing language. Indeed, given the set M = {c, bab} of constants for L and the constant languages L1 = a+m1a+ and L2 = a+m2a+ba+, where m1 = bab, m2 = c, then L = L1 ∪ L2 ∪ Split(L1 ∪ L2, r), where r = ((b, ab)$(ac, 1)) ∈ RM.

The following remark has been stated in [6].

Remark 4.2. Given a regular language L, a constant language L(m) is a special case of a split language, as indeed L(m) = SPLIT(L(m), L(m), r), where r = ((m, 1)$(m, 1)) is a constant-based rule.

Then we obtain, as a corollary of Theorem 4.1, the following characterization of reflexive splicing languages, proved in [12].

Corollary 1. Let L be a regular language. Then the following are equivalent:
(1) L is a reflexive (symmetric) splicing language.
(2) There exists a set R of reflexive (symmetric) rules such that L is closed under R and L = cl(L, R) ∪ X, for X ⊂ A∗ a finite set.

Proof. By using Remark 4.2 and Lemmas 4.2 and 4.1, we can show that statement (2) is equivalent to statement (2) of Proposition 4.2. Consequently, by a direct application of Proposition 4.2, the equivalence of the two statements holds.

There exist splicing languages that are reflexive and not symmetric, as stated in Lemma 4.3. Indeed, by applying a theorem stated in [25], we can show that L1 = a∗ ∪ a∗ba∗ and L2 = a∗ ∪ da∗ ∪ a∗c are not symmetric languages, while Lemma 4.3 shows that these languages are reflexive.

Lemma 4.3. The languages L1 = a∗ ∪ a∗ba∗ and L2 = a∗ ∪ da∗ ∪ a∗c are splicing languages that are reflexive and not symmetric.

Proof. The language L1 can be expressed as L(b) ∪ cl(L(b), R), where R = {r}, r = ((1, b)$(b, 1)) and L(b) = a∗ba∗ is a constant language. Similarly, the language L2 = L(d) ∪ L(c) ∪ cl(L(c) ∪ L(d), r′), where L(d) = da∗, L(c) = a∗c and r′ = ((1, c)$(d, 1)). Then, by Proposition 4.2, L1 and L2 are reflexive splicing languages.

The existence of nonreflexive splicing languages has been proved in [12]: as shown there, a∗b∗a∗b∗a∗ ∪ a∗b∗a∗ is an example of a symmetric, nonreflexive splicing language, while b∗a∗b∗a∗ ∪ a∗b∗a∗ ∪ a∗ provides an example of a splicing language that is neither symmetric nor reflexive. The language a∗b∗a∗ is an example of a reflexive splicing language that is not in FCS, as shown in [12].
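The first claim in the proof of Lemma 4.3 can be checked mechanically on length-bounded truncations: splicing L(b) = a∗ba∗ with itself under r = ((1, b)$(b, 1)) produces exactly the words of a∗, so L(b) ∪ cl(L(b), {r}) agrees with L1 = a∗ ∪ a∗ba∗ up to the chosen bound. A small sketch (the bound N is an arbitrary choice of ours):

```python
import itertools

N = 6  # length bound for the finite truncations (arbitrary choice)

def words(max_len):
    return ["".join(t) for n in range(max_len + 1)
            for t in itertools.product("ab", repeat=n)]

# truncation of the constant language L(b) = a* b a*
Lb = {w for w in words(N) if w.count("b") == 1}

# one round of 1-splicing of Lb with itself under r = ((1, b)$(b, 1)):
# from x = x1·b·x2 and y = y1·b·y2 the rule produces x1·y2
spliced = {x.split("b")[0] + y.split("b")[1] for x in Lb for y in Lb}

# truncation of L1 = a* ∪ a* b a*
L1 = {w for w in words(N) if w.count("b") <= 1}

assert {w for w in Lb | spliced if len(w) <= N} == L1
```

The products of the rule contain no b, which is also why L1 is obtained after a single splicing round.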
5. Decision algorithms for subclasses of regular splicing languages

A characterization theorem extending the result for reflexive languages to all regular splicing languages is still unknown. Indeed, a procedure to decide whether a regular language is a splicing language is still unknown. On the other hand, we still do not know how to use the characterization of Theorem 4.1 to obtain a procedure to decide whether a regular language is a reflexive splicing language. Indeed, this question is a generalization of the problem posed in [14]: find a decision procedure for the class of FCS languages. However, partial results have been achieved in [12], where it is proved that we can decide whether a regular language is reflexive. The design of algorithms to solve decision problems for regular splicing languages and subclasses of splicing languages is a topic that is still unexplored. In the following we list basic results that have been achieved in different papers and are strongly related to the solution of the above-mentioned questions. These results are stated below and then detailed by the lemmas and remarks that follow.
• A decision procedure to establish when a language L is closed w.r.t. a given set R of rules (see Lemmas 5.1, 5.2 and Remark 5.1).
• A standard procedure for the construction of an initial language and of basic rules to generate constant generated splicing languages or marker splicing languages, given the reduced graph for the language [6,5,11].
• A characterization of splice sites in terms of the syntactic congruence (see Lemma 5.3).

The following lemma has been proved in [6] for symmetric splicing systems, but it can be easily extended to the general case.

Lemma 5.1. Let S = (A, I, R) be a splicing system, let L ⊆ A∗ be a regular language and let A be the automaton recognizing L. Then L = L(A) is closed with respect to a rule r = ((u1, u2)$(u3, u4)) ∈ R if and only if for each pair (p, q) ∈ Qu1u2 × Qu3u4 we have

ContR,p(L, u1u2) ⊆ ContR,q(L, u3u4).  (2)
More precisely, the following result is used to prove containment relations between languages.

Lemma 5.2 (Bonizzoni et al. [3]). Let S = (A, I, R) be a splicing system and let L ⊆ A∗ be a regular language such that I ⊆ L. If L is closed under R, then L(S) ⊆ L.

Remark 5.1. There is an effective procedure to decide whether a language L is closed under a set R of rules, given the automaton A for L. Indeed, given w ∈ A+ and p ∈ Qw, ContR,p(L, w) is a regular language (see the definition in Section 2).

Remark 5.2. There is an effective procedure to build a splicing system that generates a given reflexive splicing language (see [5], and [6] for a simpler construction).

The following property relates splice sites w.r.t. the syntactic congruence and has been proved in several papers [5,12].

Lemma 5.3. Let L ⊆ A∗ be a regular language that is closed under the rule r = ((u1, u2)$(u3, u4)). Then L is closed under each rule r̄ = ((u′1, u′2)$(u′3, u′4)), where u′i ∈ [ui], for i ∈ {1, 2, 3, 4}.

5.1. Decision algorithms for reflexive and symmetric splicing languages

The characterization Theorem 4.1 (and Proposition 4.2) leads to an effective algorithm to decide whether a regular language L is a reflexive symmetric splicing language, whenever a bound on the size of each rule in R can be given. Assume that, given L, such a bound is specified by the value Bound(L). Then the set of rules generating L consists of the largest set RM of constant-based rules over the set M such that L is closed under RM, where M is the finite set of all constants of L of length at most Bound(L): by Remark 5.1, such a set has an effective construction algorithm. Since, given two regular languages X, Y, it is decidable whether X = Y, the equation of Theorem 4.1 can be tested by classical algorithms, and thus it is immediate to obtain the required decision procedure. Actually, this algorithmic approach has been proposed in [12] to find a procedural application of Corollary 1.
Such a procedure is based on an upper bound for Bound(L) in terms of the size of the syntactic monoid for L.
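On length-bounded truncations, the closure test of Definition 2.1 (decidable exactly, by Remark 5.1, via the automaton) can be prototyped by brute force; function name and rule encoding are ours:

```python
import itertools

def closed_under(Ltrunc, rule, max_len):
    """Necessary-condition sketch of Definition 2.1 on a finite truncation
    Ltrunc of a language: every 1-splicing product of length <= max_len of
    two words of Ltrunc must stay in Ltrunc. The rule ((u1,u2)$(u3,u4))
    is encoded as the tuple (u1, u2, u3, u4)."""
    u1, u2, u3, u4 = rule
    s, t = u1 + u2, u3 + u4          # the two splicing sites
    for x in Ltrunc:
        for y in Ltrunc:
            for i in range(len(x) - len(s) + 1):
                if not x.startswith(s, i):
                    continue
                for j in range(len(y) - len(t) + 1):
                    if y.startswith(t, j):
                        w = x[:i] + u1 + u4 + y[j + len(t):]
                        if len(w) <= max_len and w not in Ltrunc:
                            return False
    return True

# truncation of a* b a* up to length 5
N = 5
Lb = {"".join(t) for n in range(N + 1)
      for t in itertools.product("ab", repeat=n) if t.count("b") == 1}
```

For example, a∗ba∗ is closed under ((b, 1)$(b, 1)) (its site b is a constant, cf. Lemma 4.1), but not under ((1, b)$(b, 1)), whose products lose the b.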
5.2. Decision algorithms for NCS and marker languages

A decision algorithm for marker languages, based on properties of markers, is given in [5]. An almost unexplored approach to the development of decision algorithms for the classes of regular splicing languages discussed in the previous sections is based on properties of the reduced graphs recognizing such languages. An example in this direction is given in [7], where a characterization of NCS languages in terms of a property of the reduced graph of the automaton recognizing such languages is proposed. More precisely, in [7], using the algorithmic approach proposed in [18] to recognize locally testable languages, the graph properties that relate SLT languages to their reduced graphs are investigated, and a graph-based algorithm to recognize SLT languages and other subclasses of regular languages is given. Recently, we discovered that similar results have been achieved independently in [1] in a different framework.

6. Conclusions and open problems

Finite splicing systems theory has revealed that there are extensive interactions between the splicing operation and two classical tools of formal language theory: the constant and the syntactic congruence. Even though many theorists have moved their attention towards new models for molecular computation, we believe that finite splicing systems theory still hides promising developments, mainly from the point of view of formal language theory, as well as concerning the original motivation of finding procedures for building simple models to describe enzymatic processes. In this paper, we have discussed the most significant progress in this theory made to understand the structure of regular splicing languages. We improve the result given in [6] by showing that the larger class of regular languages that has a structural characterization is that of reflexive splicing languages. It remains a challenging open question to drop the reflexivity assumption.
In this paper, we also discuss the most recent progress made towards the solution of two fundamental questions in this theory: the development of decision algorithms for classes of regular splicing languages, and the synthesis of splicing systems for such languages. In this direction, some basic questions are still open, and we believe that it will be fruitful for the formal language theory of splicing systems to look for their solution. Below, we just list some intriguing open questions.
• Question 1: Is there a nice characterization of reflexive splicing languages in terms of classes of syntactic monoids, as for marker languages [3], or in terms of reduced graph properties, as for NCS languages?
• Question 2: Find a characterization of the finite set of constants that are used in Theorem 4.1.
• Question 3: Investigate Boolean closure properties of reflexive and nonreflexive splicing languages.
We conclude this list by pointing out an intriguing conjecture proposed in [12] and mentioned in [11].

Conjecture 1. A splicing language must have constants.
Acknowledgements

The authors would like to thank C. De Felice and R. Zizza for long discussions on the topics covered in the paper. This work was partially supported by the MIUR Project “Linguaggi Formali e Automi: teoria ed applicazioni” and by the contribution of the EU Commission under the Fifth Framework Programme project MolCoNet IST-2001-32008.
References

[1] M.P. Béal, Codage Symbolique, Masson, 1993.
[2] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[3] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Decision problems on linear and circular splicing, in: M. Ito, M. Toyama (Eds.), Proc. DLT 2002, Lecture Notes in Computer Science, Vol. 2450, Springer, Berlin, 2003, pp. 78–92.
[4] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Regular languages generated by reflexive finite linear splicing systems, in: Z. Esik, Z. Fülöp (Eds.), Proc. DLT 2003, Lecture Notes in Computer Science, Vol. 2710, Springer, Berlin, 2003, pp. 134–145.
[5] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Linear splicing and syntactic monoid, 2004, submitted for publication.
[6] P. Bonizzoni, C. De Felice, R. Zizza, The structure of reflexive regular splicing languages via Schützenberger constants, Theoret. Comput. Sci. 334 (1–3) (2005) 71–98.
[7] P. Bonizzoni, C. Ferretti, G. Mauri, Splicing systems with marked rules, Romanian J. Inform. Sci. Technol. 1 (4) (1998) 295–306.
[8] P. Bonizzoni, C. Ferretti, G. Mauri, R. Zizza, Separating some splicing models, Inform. Process. Lett. 79 (6) (2001) 255–259.
[9] K. Culik, T. Harju, Splicing semigroups of dominoes and DNA, Discrete Appl. Math. 31 (1991) 261–277.
[10] A. De Luca, A. Restivo, A characterization of strictly locally testable languages and its application to subsemigroups of a free semigroup, Inform. Control 44 (1980) 300–319.
[11] E. Goode, Constants and splicing systems, Ph.D. Thesis, Binghamton University, 1999.
[12] E. Goode, D. Pixton, Recognizing splicing languages: syntactic monoids and simultaneous pumping, 2004, submitted for publication. (Available from http://www.math.binghamton.edu/dennis/Papers/index.html.)
[13] T. Head, Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviours, Bull. Math. Biol. 49 (1987) 737–759.
[14] T. Head, Splicing languages generated with one sided context, in: Gh. Paun (Ed.), Computing with Biomolecules, Theory and Experiments, Springer, Singapore, 1998.
[15] T. Head, Gh. Paun, D. Pixton, Language theory and molecular genetics: generative mechanisms suggested by DNA recombination, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 2, Springer, Berlin, 1996, pp. 295–360.
[16] J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA, 2001.
[17] S.M. Kim, Computational modeling for genetic splicing systems, SIAM J. Comput. 26 (1997) 1284–1309.
[18] S.M. Kim, R. McNaughton, Computing the order of a locally testable automaton, SIAM J. Comput. 23 (1994) 1193–1215.
[19] G. Paun, On the splicing operation, Discrete Appl. Math. 70 (1996) 57–79.
[20] G. Paun, G. Rozenberg, A. Salomaa, DNA Computing, New Computing Paradigms, Springer, Berlin, 1998.
[21] D. Perrin, Finite automata, in: J. Van Leeuwen (Ed.), Handbook of Theoretical Computer Science, Vol. B, Elsevier, Amsterdam, 1990, pp. 1–57.
[22] J.-E. Pin, Variétés de langages formels, Masson, Paris, 1984; English translation: Varieties of Formal Languages, Plenum, New York, 1986.
[23] D. Pixton, Regularity of splicing languages, Discrete Appl. Math. 69 (1996) 101–124.
[24] M.P. Schützenberger, Sur certaines opérations de fermeture dans les langages rationnels, Symposia Math. 15 (1975) 245–253.
[25] S. Verlan, R. Zizza, 1-splicing vs. 2-splicing: separating results, in: Proc. WORDS 2003, Turku, Finland, 2003, pp. 320–331.
[26] S. Verlan, Head systems and applications to bio-informatics, Ph.D. Thesis, University of Metz, 2004.
Theoretical Computer Science 340 (2005) 364 – 380 www.elsevier.com/locate/tcs
Collage of two-dimensional words Christian Choffrut∗ , Berke Durak Université Paris VII, L.I.A.F.A., 2 Place Jussieu, 75221 Paris, France
Abstract

We consider a new operation on one-dimensional (resp. two-dimensional) word languages, obtained by piling up, one on top of the other, words of a given recognizable language (resp. two-dimensional recognizable language) on a previously empty one-dimensional (resp. two-dimensional) array. The resulting language is the set of words “seen from above”: a position in the array is labeled by the topmost letter. We show that in the one-dimensional case the resulting language is always recognizable. This is no longer true in the two-dimensional case, as shown by a counter-example, and we investigate in which particular cases the result may still hold.
© 2005 Published by Elsevier B.V.

Keywords: Regular languages; Picture languages
1. Introduction

The present paper deals with the notion of a recognizable collection of pictures, a picture being a matrix whose entries (pixels) are taken in a finite alphabet (colors). The reader unfamiliar with the formal definition might find it suggestive to think of the set of chessboards of arbitrary dimension, or of the set of squares with, say, their north-west to south-east diagonal marked with some particular color, as typical examples. Assume we are given a collection of strips of wallpapers of different textures in such a way that it forms a recognizable collection. Assume further that, starting from an empty frame, we can paste these strips one at a time, in any arbitrary way, with possible overlapping but without rotation. At each position, the visible pixel is that belonging to the last pasted
∗ Corresponding author.
E-mail addresses: [email protected] (C. Choffrut), [email protected] (B. Durak).
URLs: http://www.liafa.jussieu.fr/∼cc (C. Choffrut), http://www.liafa.jussieu.fr/∼durak.
doi:10.1016/j.tcs.2005.03.034
strip. This is reminiscent of the so-called painter’s algorithm achieving hidden-face elimination in computer graphics, where the objects nearest to the observer are painted last. Our result says that if we start with a recognizable collection of strips reduced to one column (resp. to one row), then all possible collages again form a recognizable collection. This property is obtained by studying the particular case of one-dimensional pictures, i.e., words, and by extending it to two-dimensional pictures via row- (or column-) Kleene concatenation. Furthermore, we show that this closure property no longer holds when this hypothesis fails: using a counting argument, we show that there exists a finite language consisting of two strips whose collage is not recognizable. There exist simple general conditions guaranteeing recognizability of the collage in terms of the parameters of the collage, such as the maximum number of levels of strips. In the case where the alphabet is unary, yielding binary pictures with a color for the background and a color for the foreground, the collage is recognizable whatever the collection of strips (it may even be non-recursive). As far as we know, the operation of collage as we mean it here is new. In [7, Proposition 5.1], the author considers the operation consisting of tiling a picture with non-overlapping strips and shows a closure property for recognizable pictures. Concerning one-dimensional pictures, the notion of quasiperiodicity, which is remotely connected to our notion of collage, was introduced in [1]. In our terminology, it is a collage of a one-dimensional picture with a unique strip, as explained above, where the overlapping occurrences of the strip are required to match. A final word of caution though: the term collage was coined in [5] as a means of defining pictures via recursive geometric functions in the spirit of fractals [2]. We use it here in a different sense, which we think appropriate for its kinship with the art movement in painting of the first decades of the 20th century.
2. The unidimensional case

Given a finite alphabet Σ, we denote by Σ* the free monoid of words or strings over Σ, and by ε the empty string. The product or concatenation of two words u and v is simply denoted by uv. For a string w ∈ Σ*, we denote by |w| its length and by w[i] the i-th symbol of w, i = 1, . . . , |w|. A string z ∈ Σ* is a subword or factor of w if there exist two strings u, v ∈ Σ* such that w = uzv, and we write z = w[i . . . j] where |u| = i − 1 and |uz| = j. If t ∈ Σ* has the same length as z, the substitution of t for z in w results in the word utv, which we write w → utv. We say that t is placed at position i. The notations →^r for the r-th iterate and →* for the reflexive and transitive closure of → are used with their standard meaning.

Given a subset W ⊆ Σ* of patches, the operation of collage consists of producing words in (Σ ∪ {⊥})* (⊥ is a new symbol not in Σ) by starting with a word of the form ⊥^n and then repeatedly replacing random factors of the current word with elements of W. A word thus obtained is called a collage of W. Formally, C_0(W) = ⊥* and for all k ≥ 0

C_{k+1}(W) = {w′ | ∃w ∈ C_k(W), w → w′}.

The set of collages of W is the union Collage(W) = ∪_{k≥0} C_k(W). We say that position 0 < j ≤ n of w ∈ C_k(W) is covered by an occurrence u ∈ W whenever there exists an
integer ℓ < k and two words w′ and w″ such that

⊥^n →^ℓ w′ → w″ →^{k−ℓ−1} w

holds, where for some w_1, w_2, w_3 ∈ (Σ ∪ {⊥})* we have w′ = w_1 w_2 w_3, w″ = w_1 u w_3 and |w_1| < j ≤ |w_1| + |u|. An occurrence u of W placed on the interval 1, . . . , n is obscured by some occurrence v placed at some later time whenever the subintervals corresponding to the two occurrences intersect.

Example 1. Consider n = 11 and assume the words aba, bbbbc, ca and abaabcab belong to the subset W and are placed respectively at the positions 2, 4, 10 and 1, in that order. The resulting word is at the top of Table 1. Position 4 is covered by the occurrences aba, bbbbc and abaabcab. Position 9 is covered by no occurrence, and position 2 is covered by the occurrences aba and abaabcab. Said differently, the collage is the word obtained when reading the rectangular array "from above".

It is convenient to define the structure obtained by packing the words aba, ca, bbbbc and abaabcab. This is achieved by removing all spaces between vertically aligned letters. In the previous example, each occurrence of a letter of a word of W would "fall" in its slot as long as it does not hit another letter or the floor of the structure (the indices in the following examples are just meant for clarifying further explanations and refer to the order of the word to which each letter belongs while processing the collage). This leads to a sequence of columns of varying height (possibly height 0), as shown in Table 2. The next two definitions provide a more formal approach (it might prove useful to have the previous example in mind).

Definition 1. Let W be a collection of words, let n be an integer and let P be a finite sequence of pairs (x, u) ∈ N × W called a stack. The pile Pile^n_P defined by these data is an array of n words in Σ* which is defined by induction on the length of the sequence P as follows.

Table 1
The occurrences of Example 1 (later occurrences above earlier ones) and the resulting collage

position    1  2  3  4  5  6  7  8  9  10  11
abaabcab    a  b  a  a  b  c  a  b
ca                                     c   a
bbbbc                b  b  b  b  c
aba             a  b  a
collage     a  b  a  a  b  c  a  b  ⊥  c   a

Table 2
The pile resulting from Example 1 (each column read from bottom to top)

position    1    2     3     4       5     6     7     8     9   10   11
column      a4   a1b4  b1a4  a1b2a4  b2b4  b2c4  b2a4  c2b4  ε   c3   a3
If P has length 0 then Pile^n_P is an array of n empty words. Otherwise, let P′ be the sequence P deprived of its last pair (x, u). Let ℓ be the length of u. For all 1 ≤ i ≤ n we define

Pile^n_P[i] = Pile^n_{P′}[i] u[i − x + 1]   if x ≤ i ≤ x + ℓ − 1,
Pile^n_P[i] = Pile^n_{P′}[i]                otherwise.

The height of the pile is the maximum of the lengths of its entries. In the running example, the pile is the array with 11 elements (a4, a1b4, b1a4, a1b2a4, b2b4, b2c4, b2a4, c2b4, ε, c3, a3) and its height is 3. Now we state a precise definition of what is meant by "seen from above".

Definition 2. Given a pile p = Pile^n_P, its associated collage is the word w of length n, denoted by Collage^n_P, where for i = 1, . . . , n the following holds:

w[i] = ⊥         if p[i] = ε,
w[i] = a ∈ Σ     if p[i] = ua, u ∈ Σ*, a ∈ Σ.

In other words, the collage is obtained by selecting the topmost letter of each column and by taking ⊥ when the column is empty. If the resulting word w does not contain the symbol ⊥, i.e., if w ∈ Σ*, it is said to be completely covered by W. We extend the operation of collage to subsets by setting for all W ⊆ Σ*

Collage(W) = {w ∈ (Σ ∪ {⊥})* | w = Collage^n_P, n ∈ N, P ∈ (N × W)*}.   (1)

Observation. There are other natural definitions of the collage of a language. Indeed, we may suppress the condition that the occurrences of W are contained in the interval 1, . . . , n by allowing them to be clipped to the interval. Another possibility is not to fix the length of the resulting word a priori, i.e., to achieve the collage along the infinite integer line and consider the smallest interval which contains all pasted occurrences. As far as recognizable languages are concerned, the closure property is equally valid in these three cases. Observe, however, that the closure property no longer holds for context-free languages. Indeed, we leave it to the reader to verify that the collage of the context-free language L = {c a^n b^m d ∈ {a, b, c, d}* | n > m} is not context-free.

Theorem 1. If W ⊆ Σ* is recognizable then so is Collage(W).

Proof.
First observe that it suffices to prove that the set of completely covered words is recognizable. Indeed, if we denote this set by Covered(W), then we have Collage(W) = ⊥* (Covered(W) ⊥⁺)* Covered(W) ⊥*. The crux of the proof is that the pile defining a given collage may be assumed to be of bounded height.

Lemma 2. Let N be the number of states of an automaton recognizing W. For each w ∈ Covered(W) there exists a pile of height at most 2N whose associated collage is w.
Proof. There is no loss of generality in assuming that, given w ∈ Collage(W), the associated pile Pile^n_P, where P = (x_t, u_t)_{1 ≤ t ≤ r}, satisfies the condition that (x_t, u_t) is not obscured by (x_{t+1}, u_{t+1}) for 1 ≤ t ≤ r − 1, since otherwise we can eliminate (x_t, u_t) to start with. We can further assume that no factor of length N of a word u_t is obscured by the set of words {u_{t+1}, u_{t+2}, . . . , u_r}. Indeed, consider the factor u_t[i . . . i + N − 1] and assume that for all i ≤ j ≤ i + N − 1, the position x_t + j − 1 is covered by some u_s where t < s ≤ r.
[Figure: the factor u_t[i . . . i + N − 1] of the word u_t, with |u_t| = m, entirely obscured by later occurrences.]
Let q (resp. p) be the state of the automaton after reading u_t[1 . . . i − 1] (resp. u_t[1 . . . i + N − 1]) starting from the initial state. Let v (resp. z) be a word of length less than N taking q to some final state (resp. the initial state to p). Replacing the pair (x_t, u_t) by the pair (x_t, u_t[1 . . . i − 1]v), followed by the pair (x_t + i + N − |z|, z u_t[i + N . . . m]), where m = |u_t|, results in the same collage.
[Figure: the pair (x_t, u_t) replaced by the two shorter occurrences u_t[1 . . . i − 1]v and z u_t[i + N . . . m].]
Now assume that a pile satisfying the preliminary claim has height greater than or equal to 2N at a position 1 ≤ i ≤ n. Let 1 ≤ s_1 < · · · < s_k ≤ r be the maximal increasing sequence of indices such that u_{s_1}, u_{s_2}, . . . , u_{s_k} cover position i, and for t = 1, . . . , k let [l_t, r_t] be the interval covered by the sequence u_{s_t}, u_{s_{t+1}}, . . . , u_{s_k}, with k ≥ 2N by hypothesis. The sequence [l_1, r_1] ⊃ [l_2, r_2] ⊃ · · · ⊃ [l_k, r_k] is strictly decreasing by the preliminary remark, and each element contains the position i. Then either i − l_1 − 1 or r_1 − i − 1 is greater than N, a contradiction.

We now turn to the proof of our theorem. We call a language over a given alphabet marked local whenever it is possible to partition the alphabet Σ = I ∪ H ∪ F in such a way that a word belongs to the language if and only if its initial letter belongs to I, its final letter belongs to F, the remaining letters belong to H, and the transitions between consecutive letters belong to a subset V ⊆ Σ × Σ. This is a strengthening of the standard notion of local
languages. It is clear that the recognizable language W is the image of some marked local language U under a letter-to-letter morphism f. It is also clear that the collage of W is the image of the collage of U under the morphism f. Consequently, we may assume without loss of generality that W itself is marked local. Because of Lemma 2, it suffices to consider piles of words of length bounded by some integer h in order to generate all words in the collage of W. If we prove that the set of piles, viewed as words over the alphabet Γ = ∪_{0 ≤ i ≤ h} Σ^i, is recognizable, then the collage itself is recognizable. This is achieved by showing that the language Pile(W) is marked local. Indeed, the possible initial (resp. final) letters (over the alphabet Γ) are of the form a_1 · · · a_k with 0 < k ≤ h and a_i ∈ I (resp. a_i ∈ F) for i = 1, . . . , k. The allowed transitions are the pairs (A, B), where A = a_1 · · · a_k and B = b_1 · · · b_ℓ satisfy the following condition. Let a_{i_1}, a_{i_2}, . . . , a_{i_r} with 1 ≤ i_1 < i_2 < · · · < i_r be the sequence of the letters of the alphabet I ∪ H occurring in A, and let b_{j_1}, b_{j_2}, . . . , b_{j_s} with 1 ≤ j_1 < j_2 < · · · < j_s be the sequence of the letters of the alphabet F ∪ H occurring in B. Then r = s and the pairs (a_{i_t}, b_{j_t}) belong to V for t = 1, . . . , r.

We illustrate this last construction with the example of Table 2, without distinguishing explicitly between the three subalphabets I, F and H. Consider the transition between columns 3 and 4. Then A = b1a4 and B = a1b2a4. For A the subsequence of letters in I ∪ H is b1, a4; for B the subsequence of letters in F ∪ H is a1, a4. Similarly, consider the transition between columns 4 and 5. Then A = a1b2a4 and B = b2b4. For A the subsequence of letters in I ∪ H is b2, a4, and for B the subsequence of letters in F ∪ H is b2, a4.
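As a concrete check of Definitions 1 and 2, the pile and collage of Example 1 can be recomputed mechanically. The following sketch (ours, not from the paper; the blank symbol ⊥ is written as "_") builds each column bottom-up and reads off the topmost letters:

```python
def pile(n, stack):
    # stack: pairs (x, u) with a 1-indexed position x; each column
    # accumulates letters bottom-to-top in the order of the stack
    cols = [""] * n
    for x, u in stack:
        for i, a in enumerate(u):
            cols[x - 1 + i] += a
    return cols

def collage(n, stack, blank="_"):
    # the topmost letter of each column, the blank for an empty column
    return "".join(c[-1] if c else blank for c in pile(n, stack))

# Example 1: aba at 2, bbbbc at 4, ca at 10, abaabcab at 1, in that order
stack = [(2, "aba"), (4, "bbbbc"), (10, "ca"), (1, "abaabcab")]
print(collage(11, stack))                    # abaabcab_ca
print(max(len(c) for c in pile(11, stack)))  # 3, the height of the pile
```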
3. Preliminaries on picture languages

Here we borrow the terminology from the chapter of the Handbook of Formal Languages written by Giammarresi and Restivo [4]. The reader is also referred to [6]. We restrict ourselves to the results which are necessary for a self-contained exposition of our work.

The definitions for the free monoid extend to two-dimensional strings in a rather natural way. A two-dimensional string (or picture) is a two-dimensional rectangular array of elements of Σ. The size of a picture p is the pair (r(p), c(p)) of its numbers of rows and columns, also denoted by (r, c) when the picture p is understood. The element at position (i, j) with 1 ≤ i ≤ r, 1 ≤ j ≤ c, also called a pixel, is denoted by p[i, j]. As for usual arrays, the indices grow from top to bottom for the rows and from left to right for the columns. The set of all pictures over Σ is denoted by Σ^{*×*}. The subset of all pictures with n columns (resp. with p rows, with p rows and n columns) is denoted by Σ^{*×n} (resp. Σ^{p×*}, Σ^{p×n}). A two-dimensional language over Σ is a subset of Σ^{*×*}.

3.1. Different characterizations

The first attempt at defining a procedure for recognizing pictures is credited to Blum and Hewitt in 1967 [3]. Their model is an extension of the ordinary two-way one-tape automata obtained by allowing the read head to move in all four cardinal directions. It was however superseded by the more powerful and robust class of recognizable languages.
There are different and equivalent definitions of recognizable picture languages, see [4, Theorem 8.7]. In particular the notions of tiling systems and of (some type of) regular expressions lead to the same family. The notion of tiling system is the most suitable for our purpose, and we recall it now.

3.1.1. Tiling systems

Before running a procedure on the pictures, we border them with occurrences of a symbol # ∉ Σ.

[Example: a picture over the alphabet {1, 2, 3} surrounded by a frame of # symbols; the layout cannot be recovered from this copy.]
We first define a local language as a language consisting of all pictures whose 2 × 2-subpictures belong to a fixed subset of (Σ ∪ {#})^{2×2}. For example, a suitable set of ten 2 × 2-subpictures over {0, 1, #} defines the set of all rectangular chessboards with an odd number of rows and columns.

[The ten 2 × 2 tiles cannot be recovered from this copy.]

Formally, we have

Definition 3. A local system is a pair (Σ, Δ) where Σ is a finite alphabet and Δ a subset of (Σ ∪ {#})^{2×2}. The language defined by the system is the set of all pictures whose 2 × 2-subpictures (after bordering) belong to Δ.

The definition of the more general family of recognizable picture languages requires the notion of projection, which is a mapping h from an alphabet Γ into some other alphabet Σ; it extends to pictures by substituting the color h(a) ∈ Σ for the color a ∈ Γ at each pixel, resulting in a picture of the same size. Formally, we have

Definition 4. A tiling system is a quadruple (Γ, Σ, Δ, h) where Γ and Σ are finite alphabets, Δ is a subset of (Γ ∪ {#})^{2×2} and h : Γ → Σ is a projection. The language recognized by the tiling system is the projection by h of the local language recognized by the local system (Γ, Δ). A language is tiling recognizable if it is recognized by some tiling system.
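Membership in a local language is easy to test in code. The sketch below (ours; the tile set is not the paper's ten chessboard tiles but is derived, for illustration, from a small 3 × 3 chessboard) borders a picture with '#', collects its 2 × 2 tiles, and checks them against an allowed set:

```python
BORDER = "#"

def bordered(pic):
    # surround a picture (a list of equal-length strings) with a frame of '#'
    top = BORDER * (len(pic[0]) + 2)
    return [top] + [BORDER + row + BORDER for row in pic] + [top]

def tiles(pic):
    # the set of 2x2 subpictures of the bordered picture,
    # each tile written as (top row, bottom row)
    b = bordered(pic)
    return {(b[i][j:j + 2], b[i + 1][j:j + 2])
            for i in range(len(b) - 1) for j in range(len(b[0]) - 1)}

def in_local_language(pic, allowed):
    return tiles(pic) <= allowed

# hypothetical tile set, read off a 3x3 chessboard rather than listed by hand
theta = tiles(["010", "101", "010"])
print(in_local_language(["01010", "10101", "01010", "10101", "01010"], theta))  # True
print(in_local_language(["00", "00"], theta))  # False
```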
With the previous example, identifying 0 and 1 defines the collection of all pictures with uniform contents and an odd number of rows and of columns. Using this characterization, it can be seen that the collection of all squares is tiling recognizable but not local.

3.1.2. Regular expressions

The allowed operations are union, intersection (not complementation), row- and column-concatenation, which are partial operations, and row- and column-Kleene closure. By the row-concatenation of two picture languages P, Q ⊆ Σ^{*×*} is meant the language, denoted P ⊖ Q, of all pictures obtained by taking two arbitrary pictures p ∈ P and q ∈ Q with the same number of columns and putting p on top of q. The Kleene row-concatenation closure of a language P ⊆ Σ^{*×*} is the set of all pictures obtained by taking a finite sequence of pictures p_1, . . . , p_n ∈ P with the same number of columns and putting p_1 on top of p_2, . . . , p_{n−1} on top of p_n. The notions of column-concatenation and column-Kleene closure are defined dually by concatenating from left to right. The column-concatenation of two picture languages P, Q ⊆ Σ^{*×*} is denoted by P ⦶ Q.

The fundamental result of this theory is that the collection of languages recognized by some tiling system is identical to the smallest family of picture languages comprising all finite languages and closed under union, intersection, row- and column-concatenation, Kleene row- and column-concatenation closure, and projection. Henceforth, this collection is called the family of recognizable picture languages.

Example 2.
[Example 2 displays two pictures p and q over {0, 1, 2} together with their row- and column-concatenations; the layout cannot be recovered from this copy.]
3.2. A necessary condition

Recognizable languages of strings are characterized by the finiteness of the number of distinct right (or left) contexts. For picture languages there exists a weaker version of this criterion. Indeed, it can be shown that for such a language to be recognizable, the number of non-equivalent pictures of a given size may not grow too quickly relative to the size of the picture. More precisely, given two pictures p and q, both with r rows and c columns,
and a picture language X, we say that p and q are equivalent relative to X when for all pictures h_top, h_bottom, v_left, v_right of suitable sizes, we have

v_left ⦶ (h_top ⊖ p ⊖ h_bottom) ⦶ v_right ∈ X  ⇔  v_left ⦶ (h_top ⊖ q ⊖ h_bottom) ⦶ v_right ∈ X.
Given a picture language X and two integers r, c, we denote by f(r, c) the number of non-equivalent pictures of size (r, c) relative to X. Then we have a weak form of syntactic characterization.

Proposition 3. If the language X is recognizable then there exists an integer k such that for all pairs (r, c), the number of non-equivalent pictures of size (r, c) relative to X is less than k^{r+c}.

3.3. A new closure property

Because of the fundamental theorem on recognizable picture languages, this family is closed under projection. It can be proven that it is not closed under complementation [4, Theorem 7.5]. The following property transforms the contents of the pictures, not their size. It is inspired by the bit blitting operations used in computer graphics.

Let Σ_1, Σ_2 and Σ_3 be three alphabets and let f : Σ_1 × Σ_2 → Σ_3 be a function. Given two pictures p_1 ∈ Σ_1^{r×c} and p_2 ∈ Σ_2^{r×c}, define F(p_1, p_2) as the picture p_3 ∈ Σ_3^{r×c} where p_3[i, j] = f(p_1[i, j], p_2[i, j]). This operation extends naturally to pairs of picture languages. For example, on pictures over the binary alphabet {0, 1}, if we take f to be logical disjunction and if X, Y ⊆ {0, 1}^{*×*}, then F(X, Y) is the set of pictures obtained by combining one picture of X and one picture of Y of the same size with a logical OR. The following proposition shows that the resulting language is recognizable if X and Y are.

Proposition 4. If X ⊆ Σ_1^{*×*} and Y ⊆ Σ_2^{*×*} are recognizable languages then F(X, Y) is recognizable.

Proof. Let (Γ_1, Σ_1, Δ_1, h_1) and (Γ_2, Σ_2, Δ_2, h_2) be tiling systems recognizing X and Y. Define Γ_3 = Γ_1 × Γ_2, Δ_3 = {t | π_1(t) ∈ Δ_1 ∧ π_2(t) ∈ Δ_2} and h_3(x, y) = f(h_1(x), h_2(y)), where π_1 : Γ_1 × Γ_2 → Γ_1 and π_2 : Γ_1 × Γ_2 → Γ_2 are the projections. The system (Γ_3, Σ_3, Δ_3, h_3) recognizes F(X, Y).
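On individual pictures, the operation F is a one-liner. The sketch below (ours, with pictures represented as lists of lists) takes f to be logical OR on binary pictures:

```python
def blit(f, p1, p2):
    # F(p1, p2)[i][j] = f(p1[i][j], p2[i][j]); both pictures have the same size
    return [[f(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]

x = [[0, 1], [0, 0]]
y = [[0, 0], [1, 0]]
print(blit(lambda a, b: a | b, x, y))  # [[0, 1], [1, 0]]
```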
4. Collage of pictures

Here we extend to the two-dimensional case the notions introduced in Section 2. The operation consists of "piling up" pictures belonging to a given collection one on top of the other, above a horizontal surface filled with the blank symbol ⊥. The result is the picture seen from above, the top symbol at each position obscuring all symbols under it. We directly define the collage of a picture instead of proceeding as in the previous one-dimensional case with the intermediate notion of pile.

Definition 5. Let P ⊆ Σ^{*×*} be a collection of pictures, called patches, let (r, c) be a pair of integers and let S be a finite sequence of triples (x, y, p) ∈ N² × P, called a stack. The collage Collage^{(r,c)}_S is the r × c array of symbols in Σ ∪ {⊥} defined by induction on the number of elements in S as follows. If S = ∅ then Collage^{(r,c)}_S is the r × c array whose entries are all equal to the letter ⊥. Otherwise, let S′ be the sequence S deprived of its last triple (x, y, p), and set

Collage^{(r,c)}_S[i, j] = p[i − x + 1, j − y + 1]      if i ∈ [x, x + r(p) − 1] and j ∈ [y, y + c(p) − 1],
Collage^{(r,c)}_S[i, j] = Collage^{(r,c)}_{S′}[i, j]   otherwise.

Example 3. The sequence of triples S = ((1, 1, p_1), (4, 2, p_2), (4, 2, p_3), (2, 2, p_4), (4, 1, p_5)) with the following patches
[The five patches p_1, . . . , p_5, written over the letters 1, . . . , 5, and the resulting collage cannot be recovered from this copy.]
We are interested in studying the languages of pictures obtained by applying the collage operation to a recognizable picture language. This requires extending the operation to subsets of pictures.
Definition 6. Given a set of patches P ⊆ Σ^{*×*}, its collage closure is the set

Collage(P) = {p ∈ (Σ ∪ {⊥})^{*×*} | p = Collage^{(r,c)}_S for some r, c and S}.   (2)
When the resulting picture p does not contain the symbol ⊥, it is said to be completely covered by P.

4.1. Closure and non-closure properties

The two-dimensional case does not enjoy closure properties as nice as those of the one-dimensional case, as far as the collage operation is concerned. Indeed, the collage of a set of recognizable patches is not recognizable in general. However, there are a number of hypotheses under which this property still holds. As an immediate consequence of Section 2, this is the case when the patches all have a unique column (or all have a unique row). Another type of restriction is when the stacks associated with a collage have bounded height. Unary alphabets are special in the sense that the collage closure is recognizable regardless of the set of patches we start with (it may even be non-recursive). We finally give an example of a set of two patches whose collage is non-recognizable.

Proposition 5. Let P ⊆ Σ^{*×1} (resp. P ⊆ Σ^{1×*}) be a recognizable language of patches. Then the picture language Collage(P) ⊆ (Σ ∪ {⊥})^{*×*} is again recognizable.

Proof. Indeed, a picture in Collage(P) is a row- (resp. column-) concatenation of unidimensional collages. We may conclude by Theorem 1 and by the definition of recognizable picture languages.

We may also bound the height of the stack, which is defined as follows. Let S = (x_1, y_1, p_1), . . . , (x_k, y_k, p_k) be a stack of k elements. Intuitively, we may consider the elements as falling on the ground and being prevented from hitting it only by previously fallen elements occupying an overlapping position. The ℓ-th element of the stack is placed at a particular integer altitude z_ℓ ≥ 0 such that two elements at the same altitude do not overlap, while minimizing the maximum altitude, which by definition is the height of the stack. Observe that the number of patches covering a particular position may be bounded even if the height is not; think of a staircase, for example.
Formally, given a collage Collage^{(r,c)}_S, we define the height h(i, j) of each pixel 1 ≤ i ≤ r, 1 ≤ j ≤ c by induction on the cardinality k of the stack S. If k = 0 then h(i, j) = 0. Otherwise, let S′ be the stack S deprived of its last triple (x, y, p) and let h′(i, j) be the height of the pixel (i, j) in this collage. Then, setting I = [x, x + r(p) − 1] × [y, y + c(p) − 1], we have

h(i, j) = 1 + max{h′(k, ℓ) | (k, ℓ) ∈ I}   if (i, j) ∈ I,
h(i, j) = h′(i, j)                          otherwise.

The height of the stack is the maximum value of h(i, j) when (i, j) runs over the picture.
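The inductive definition of h(i, j) can be transcribed as follows (a sketch under the same conventions as before: 1-indexed corners, patches as lists of strings fitting inside the frame; the whole rectangle covered by a patch is raised to a common level):

```python
def stack_height(r, c, stack):
    # h[i][j] follows the induction: a new patch lands one level above the
    # highest pixel it covers, and all covered pixels are raised to that level
    h = [[0] * c for _ in range(r)]
    for x, y, p in stack:
        cells = [(i, j) for i in range(x - 1, x - 1 + len(p))
                        for j in range(y - 1, y - 1 + len(p[0]))]
        level = 1 + max(h[i][j] for i, j in cells)
        for i, j in cells:
            h[i][j] = level
    return max(max(row) for row in h)

# a staircase of overlapping 2x2 patches: every position is covered
# at most twice, yet the height grows with the number of steps
P = ["aa", "aa"]
print(stack_height(4, 4, [(1, 1, P), (2, 2, P), (3, 3, P)]))  # 3
```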
Proposition 6. Let P ⊆ Σ^{*×*} be a recognizable language of patches and let k be an integer. The set Collage_{h≤k}(P) of collages of P which can be obtained by stacks of height k or less is recognizable.

Proof. In the case k = 1, the proposition is a consequence of [7, Proposition 5.1], asserting that tilings by a recognizable picture language are recognizable. Indeed, a tiling is a collage of patches such that no two patches overlap and the whole picture is covered by some patch. Then a tiling by the recognizable picture language P ∪ {⊥}^{*×*} is precisely a collage of height 1. Let P′ be this language. Let f : (Σ ∪ {⊥})² → Σ ∪ {⊥} be defined by f(x, y) = x if x ≠ ⊥ and f(⊥, y) = y. This function allows us to combine layers of tilings of P′ by treating ⊥ as a transparent color. We then have, with the notations of Proposition 4,

Collage_h(P) = F(P′, F(P′, . . . , F(P′, P′) · · ·)),   with F applied h − 1 times.

Using the closure under union allows us to complete the proof.
In the one-letter case, the resulting pictures are binary, with e.g. 1 standing for the letter a and 0 for the symbol ⊥.

Theorem 7. If P ⊆ {a}^{*×*} is an arbitrary picture language then the picture language Collage(P) is recognizable.

Proof. Let us start with some elementary observations. If p and q belong to P with r(p) ≤ r(q) and c(p) ≤ c(q), then each occurrence of the rectangle q is a union of occurrences of the rectangle p. Thus, Collage(P) equals Collage(Q) where Q is the set of minimal patches in P, minimal being meant componentwise. By Dickson's Lemma, asserting in particular that every subset of N² has finitely many minimal elements, the subset Q is finite. In the unary case, the collage corresponds to taking the logical OR on pixels; thus by Proposition 4, where the function f achieves the logical disjunction of the pixels, we see that it suffices to consider the case where Q is reduced to a unique element.

Let P = {a^{r×c}} be a singleton. Consider an element p ∈ Collage(P). Every pixel (i, j) of p can belong to a number of occurrences. Since there is only one non-blank letter, the order in which these patches are laid is irrelevant. The number of times a given patch is laid at a given position is also irrelevant. The positions of the patches, however, are relevant. We may therefore consider the set B_{i,j} of pairs (k, ℓ) such that the rectangle a^{r×c} can be placed in p with its top left corner at position (i − k + 1, j − ℓ + 1), that is, if the subpicture p[i − k + 1 . . . i − k + r] × [j − ℓ + 1 . . . j − ℓ + c] is made of all a's. Let Γ be the power set of {1, . . . , r} × {1, . . . , c}. A tiling system (Γ, Σ, Δ, h) recognizing Collage(P) is specified as follows. The primary alphabet is Σ = {a, ⊥} and the auxiliary alphabet is Γ. The projection h maps ∅ to ⊥ and every other element to a. Consider the following four
subsets of (Γ ∪ {#})^{2×2}, a tile being written (x, y; z, t) with top row x y and bottom row z t:

Δ_1 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ x ∧ ℓ < c ⇒ (k, ℓ + 1) ∈ y},
Δ_2 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ y ∧ ℓ > 1 ⇒ (k, ℓ − 1) ∈ x},
Δ_3 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ x ∧ k < r ⇒ (k + 1, ℓ) ∈ z},
Δ_4 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ z ∧ k > 1 ⇒ (k − 1, ℓ) ∈ x}.

The set Δ_1 (resp. Δ_2, Δ_3, Δ_4) enforces coherent propagation of the hypotheses towards the right (resp. leftwards, downwards and upwards). We set Δ = Δ_1 ∩ Δ_2 ∩ Δ_3 ∩ Δ_4. We make the somewhat untidy convention that a condition of the form (k, ℓ) ∈ x is false whenever x = #. It is clear that all pictures in Collage(P) are recognized by the tiling system. Conversely, assume a picture is recognized by the system. Then it suffices to observe that if (k, ℓ) is an element of the subset of Γ which labels the pixel at position (i, j), then all pixels at positions (i + α, j + β) satisfying i − k + 1 ≤ i + α ≤ i − k + r and j − ℓ + 1 ≤ j + β ≤ j − ℓ + c are labeled by a subset containing the element (k + α, ℓ + β), proving thus that the picture is a union of occurrences of the rectangle.

4.2. The general case

We show in this paragraph that even if the language P of patches is finite, Collage(P) might no longer be recognizable. Actually, we prove it with a set P consisting of two patches of dimensions 1 × 3 and 3 × 1, respectively. Indeed, consider the language of pictures over the alphabet {a, b, e} (b suggesting the beginning and e the end) consisting of the horizontal patch

b a e   (3)

and of the vertical patch

b
a
e   (4)
Theorem 8. The language Collage(P), where P consists of the two horizontal and vertical patches (3) and (4), is not recognizable.

Proof. Given a permutation σ on the set of integers {1, . . . , p}, we construct a picture with 3p − 1 rows and 3p + 1 columns. This picture is based on a structure composed of p different paths in the discrete plane, each path being itself composed of a horizontal line followed by
Fig. 1. The permutation σ(1) = 1, σ(2) = 4, σ(3) = 2, σ(4) = 3.
Fig. 2. A path associated with the pair (i, σ(i)).
a vertical one, see Fig. 1. The coordinates of the end points of these two lines are respectively

(3i − 2, 1) and (3i − 2, 3σ(i) + 1),
(3i − 2, 3σ(i)) and (3p − 1, 3σ(i)).

Now we view each path of this picture as obtained by pasting, one on top of the previous one, from left to right and from top to bottom, occurrences of the horizontal and then of the vertical patch. The horizontal line starts at position (3i − 2, 1), has length 3σ(i) + 1 and is covered by occurrences of the horizontal patch with periodic shift, resulting in the sequence of labels b, b, a, b, b, a, . . . , b, b, a followed by the final sequence b, b, b, e, see Fig. 2. The vertical path starts at position (3i − 2, 3σ(i)), has length 3(p − i) + 2 and is covered by occurrences of the vertical patch starting with the sequence of labels b, b and followed by a periodic sequence b, a, b, b, a, . . . , b, b, a, b (Fig. 3).
Fig. 3. The 4 paths associated with the permutation of Fig. 1.
Fig. 4. A general view with a context creating a loop by closing the path connecting i and σ(i).
Consequently, the picture consists of p different strips built on the previous p paths, which are covered by piling up occurrences of the two patches in such a way that the collage of a patch is done on top of the previous patch. Furthermore, the order in which the collages of two strips are performed is irrelevant, as the strips intersect on an element labeled by a letter belonging to both patches.

Consider two different permutations σ and τ and assume σ(i) ≠ τ(i) for some 1 ≤ i ≤ p. Then it is not difficult to design a context which, as Fig. 4 suggests intuitively, connects σ(i) back to i and adds the minimum information so that all paths associated with the integers j ≠ i represent a legal collage of patches. The latter is done by simply appending an a below the e at positions (3p − 1, 3σ(j)) for all j ≠ i.
Fig. 5. A closer view at the loop.
Fig. 6. The sub-picture associated with the current permutation surrounded by a context creating a loop.
Since all permutations on {1, . . . , p} define pictures having contexts discriminating them from all other permutations, there exist p! pairwise non-equivalent pictures whose numbers of rows and columns are in O(p), contradicting Proposition 3 and completing the proof (Figs. 5 and 6).
380
C. Choffrut, B. Durak / Theoretical Computer Science 340 (2005) 364 – 380
References [1] A. Apostolico, A. Ehrenfeucht, Efficient detection of quasiperiodicities in strings, Theoret. Comput. Sci. 119 (1) (1993) 247–265. [2] M. Barnsley, Fractals Everywhere, Academic Press, Boston, 1988. [3] M. Blum, C. Hewitt, Automata on two-dimensional tape, in: IEEE Symposium on Switching and Automata Theory, 1967, pp. 155–160. [4] D. Giammarresi, A. Restivo, Two-dimensional languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages: Beyond Words, Vol. 3, Springer, Berlin, 1997, pp. 215–267. [5] A. Habel, H.-J. Kreowski, Collage grammars, in: H. Ehrig, H.-J. Kreowski, G. Rozenberg (Eds.), Proc. Fourth Internat. Workshop on Graph Grammars and their Applications to Computer Science, Lecture Notes in Computer Science, Vol. 532, 1991, pp. 411–429. [6] K. Inoue, I. Takanami, A survey of two-dimensional automata theory, Proc. Fifth Internat. Meeting of Young Computer Scientists, Lecture Notes in Computer Science, Vol. 381, 1990, pp. 72–91. [7] D. Simplot, A characterization of recognizable picture languages by tilings by finite sets, Theoret. Comput. Sci. 218 (2) (1999) 297–323.
Theoretical Computer Science 340 (2005) 381 – 393 www.elsevier.com/locate/tcs
Codes and sofic constraints Marie-Pierre Béal∗ , Dominique Perrin Institut Gaspard-Monge, Université de Marne-la-Vallée, 77454 Marne-la-Vallée Cedex 2, France
Abstract We study the notion of a code in a sofic subshift. We first give a generalization of the Kraft–McMillan inequality to this case. We then prove that the polynomial of the alphabet in an irreducible sofic shift divides the polynomial of any finite code which is complete for this sofic shift. This settles a conjecture from Reutenauer. © 2005 Elsevier B.V. All rights reserved. Keywords: Kraft-McMillan inequality; Sofic shifts; Symbolic dynamics; Variable-length codes
1. Introduction

There is a rich and fruitful interplay between two theories which arose at first independently. One is the theory of automata and formal languages, born in the context of theoretical computer science. The other is symbolic dynamics, which arose from the theory of dynamical systems in topology and probability theory. The theory of variable-length codes is one of the contact points between these domains, with a counterpart in symbolic dynamics in renewal systems and finite-to-one maps. Antonio Restivo initiated in [6] a new direction by studying systematically the notion of a code in a subshift of finite type. In particular, he studied the relationship between maximal and complete codes, with results that essentially extend those known in the case of the free monoid, or equivalently of the full shift in symbolic dynamics. In this paper, we continue this exploration. First of all, we adopt a definition which is not exactly the same. To be more specific, we consider a sofic shift S and a subset X of the
∗ Corresponding author.
E-mail addresses: [email protected] (M.-P. Béal), [email protected] (D. Perrin). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.033
M.-P. Béal, D. Perrin / Theoretical Computer Science 340 (2005) 381 – 393
set F_S of factors of S. We consider such a set X which is a code. Observe that we do not require, as in [6,7], that X* ⊂ F_S. This modifies the subsequent notions of an S-complete or S-maximal code. Actually, our notion also has connections with the definition of a code in a graph introduced by Christophe Reutenauer in [8], as we shall see below. The case of bifix codes was studied by Clelia De Felice in [3]. Codes in subshifts also have a connection with the study of permutation groups in syntactic semigroups (see [5]).

We first show that one may generalize to this situation the classical Kraft–McMillan inequality. In fact, we associate to each set X of words a series p_X(z) which, in the case where S is the full shift, reduces to the generating series of the words of X by length, i.e. p_X(z) = Σ_{n≥1} u_n z^n, where u_n is the number of words of length n in X. Let h(S) be the entropy of S and let λ_S be such that h(S) = −log(λ_S). The precise definition of the entropy is given in the next section. It uses a logarithm, and the same base has to be used in both definitions of λ_S and h(S). In particular, λ_S = 1/k when S is the full shift on k symbols. We prove that if X is a code, then p_X(λ_S) ≤ 1. Actually, we obtain this result as a corollary of a more general one, corresponding to an assignment of real values to the letters generalizing the notion of a Bernoulli distribution (Theorem 1). We say that X is S-complete if X ⊂ F_S ⊂ F(X*), where F(X*) denotes the set of factors of the words in X*. We prove that, when X is regular, p_X(λ_S) = 1 if and only if X is S-complete. This is again obtained as a corollary of a more general result (Theorem 2).

We prove in the second part of the paper a generalization of a result of [8] concerning a multivariate polynomial p(X) associated with a code X, called the determinant of the code. It says that, for a finite S-complete code, where S is an irreducible sofic shift, this polynomial is divisible by the polynomial p(A).
The proof uses the results of the previous section, in contrast with the algebraic arguments of [8], which use the notion of a syntactic category, a generalization of the syntactic semigroup.
2. Codes and sofic shifts

Let A be a finite alphabet. We denote by A* the set of finite words and by A^Z the set of bi-infinite words on A. A subshift is a closed subset S of A^Z which is invariant under the shift transformation σ (i.e. σ(S) = S) defined by σ((a_i)_{i∈Z}) = (a_{i+1})_{i∈Z}. A finite automaton is a finite multigraph labeled by a finite alphabet A. It is denoted A = (Q, E), where Q is a finite set of states and E a finite set of edges labeled by A. A sofic shift is the set of labels of all bi-infinite paths on a finite automaton. A sofic shift is irreducible if there is such a finite automaton with a strongly connected graph. If A is deterministic, then for any state p ∈ Q and any word u there is at most one path labeled u going out of p. We denote by p·u the target of this path when it exists. Irreducible sofic shifts have a unique (up to isomorphism of automata) minimal deterministic automaton, that is, a deterministic automaton having the fewest states among all deterministic automata representing the shift. This automaton is called the right Fischer cover of the shift. A subshift of finite type is the set of bi-infinite words on a finite alphabet avoiding a finite set of finite words. It is a sofic shift. An edge shift is the set of the labels of all bi-infinite paths on a finite automaton whose edges have distinct labels. The full shift on the finite alphabet A is the set of all bi-infinite sequences on A, i.e. A^Z.
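As a hypothetical illustration (the transition table and helper names are ours, not the paper's), a deterministic automaton can be encoded as a partial transition map, with the target p·u computed by iterating it; the two-state table below encodes the right Fischer cover of the golden mean system discussed in the examples that follow.

```python
# Right Fischer cover of the golden mean shift: states {1, 2}; edges
# a: 1 -> 1, b: 1 -> 2, a: 2 -> 1 (our encoding of Fig. 1).
GOLDEN_MEAN = {(1, 'a'): 1, (1, 'b'): 2, (2, 'a'): 1}

def target(p, u, delta=GOLDEN_MEAN):
    """Return p.u, the state reached from p by reading u, or None if no path."""
    for letter in u:
        if (p, letter) not in delta:
            return None
        p = delta[(p, letter)]
    return p
```

For instance, target(1, 'ab') gives 2, while target(1, 'bb') is undefined, reflecting that bb is a forbidden word of this shift.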
Fig. 1. The golden mean system (states 1 and 2; a loop labeled a on state 1, an edge labeled b from 1 to 2 and an edge labeled a from 2 to 1).
Let S be a subshift on the alphabet A. We denote by F_S the set of finite factors, or blocks, of words in S. We denote by h(S) the entropy of S. It is equal to the entropy h(L) of the language L = F_S, where

h(L) = lim sup_{n→∞} (1/n) log Card(L ∩ A^n).
We denote by λ_S the unique positive real number such that h(S) = log(1/λ_S). The positive real number λ_S is the radius of convergence of the generating series of L by length.

Example 1. If S is the full shift on A, then λ_S = 1/Card(A).

Example 2. Let S be the irreducible subshift of finite type on A = {a, b} defined by the finite set of forbidden words I = {bb}. It is the so-called golden mean system, and λ_S is the inverse of the golden mean, solution of λ_S^2 = 1 − λ_S. The right Fischer cover of S is represented in Fig. 1.

A set of finite words X on an alphabet A is a code if and only if whenever x_1 x_2 … x_n = y_1 y_2 … y_m, where x_i, y_j ∈ X and n, m are positive integers, one has n = m and x_i = y_i for 1 ≤ i ≤ n. If X is a set of finite words on a finite alphabet, X* denotes the set of all finite concatenations of words of X.

Let S be a sofic shift. A set X on the alphabet A is said to be complete in S, or S-complete, if X ⊂ F_S and any word in F_S is a factor of a word in X*. Observe that we do not require that X* ⊂ F_S. A code X is S-maximal if X ⊂ F_S and it is maximal for this property.

Let S be an irreducible sofic shift over the alphabet A. Let (Q, E) be its right Fischer cover. Let φ be the morphism from A* into the monoid of Q × Q matrices over the monoid A* ∪ {0} defined as follows. For each word u, the matrix φ(u) is defined by

φ(u)_{pq} = u if p·u = q, and 0 otherwise.

The elements of the matrix φ(u) can be considered as subsets of A*, interpreting 0 as the empty set. The morphism φ can be extended by linearity to the semiring P(A*) of subsets of A*. Thus, it becomes a morphism from P(A*) to the semiring of Q × Q matrices on P(A*). For any subset X of A*, we have
φ(X^n) = φ(X)^n
and

φ(X*) = Σ_{n≥0} φ(X^n).
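As a hedged numerical aside (our own sketch, not from the paper): for an irreducible sofic shift presented by its right Fischer cover, h(S) is the logarithm of the spectral radius of the cover's adjacency matrix, so λ_S is its reciprocal. A dependency-free power iteration recovers λ_S for the golden mean system.

```python
# Power iteration for the spectral radius of a nonnegative matrix;
# M is the adjacency matrix of the golden mean cover (state 1 has two
# outgoing edges, state 2 has one).
def spectral_radius(M, iters=300):
    n = len(M)
    v = [1.0] * n
    r = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        r = max(abs(x) for x in w)
        v = [x / r for x in w]
    return r

M = [[1, 1], [1, 0]]                 # golden mean cover adjacency matrix
lam_S = 1.0 / spectral_radius(M)     # inverse of the golden ratio, ~0.618
```

The computed value satisfies λ_S^2 = 1 − λ_S, matching Example 2.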
We denote by π an assignment of positive real values to the elements of A. We extend it to a semigroup morphism from A* ∪ {0} to R. We denote by u ↦ f_u(z) the morphism from A* into the monoid of Q × Q matrices with coefficients in the polynomial ring R[z], defined for each word u by

f_u(z)_{pq} = π(u) z^{|u|} if p·u = q, and 0 otherwise.

If U is a set of words, we denote by f_U(z) the matrix whose coefficients are real power series, defined by

f_U(z) = Σ_{u∈U} f_u(z).
Actually, f_U(z) can also be considered as a series whose coefficients are real matrices. In this sense, we will talk about the radius of convergence of f_U(z) and denote it by ρ(f_U(z)). It is the minimum of the radii of convergence of its elements when viewed as a matrix. Note that, for a set of words U, the elements of the matrix f_U(z) are obtained from the elements of φ(U) as the generating series of the values by π. More precisely, f_U(z)_{pq} = Σ π(u) z^{|u|}, where the sum runs over all u ∈ φ(U)_{pq}. If X is a code, then f_{X^n}(z) = (f_X(z))^n and f_{X*}(z) = (I − f_X(z))^{−1}.

We consider the polynomial d(z) = det(I − f_A(z)). We say that an assignment π is admissible if the following condition is satisfied: the value z = 1 is a root of d(z) and |μ| ≥ 1 for any other root μ. There is always at least one admissible assignment for each nonempty alphabet. For instance, one can show that the assignment defined by π(a) = λ_S for any a ∈ A is admissible (see for example [4]). An admissible assignment can be seen as a generalization of a Bernoulli distribution on the alphabet. Actually, when S is the full shift on A, both definitions coincide. Indeed, in this case, f_A(1) = Σ_{a∈A} π(a), and thus π is admissible if and only if Σ_{a∈A} π(a) = 1. Note that, when π is admissible, the radius of convergence of f_{A*}(z) is 1. Indeed, for every complex number z such that |z| < 1, det(I − f_A(z)) ≠ 0 and thus f_{A*}(z) converges. The following example describes the admissible assignments in the case of the golden mean system.
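A hedged numerical check (ours, consistent with Example 3 below): for the golden mean system with π(a) = π(b) = λ_S, the polynomial d(z) = 1 − pz − pqz^2 vanishes at z = 1 and its other root has modulus at least 1, so this assignment is admissible.

```python
lam = (5 ** 0.5 - 1) / 2      # lambda_S for the golden mean system
p = q = lam

def d(z):
    # d(z) = det(I - f_A(z)) = 1 - p z - p q z^2 for this cover
    return 1 - p * z - p * q * z * z

# Roots of p*q*z^2 + p*z - 1 = 0, i.e. of -d(z).
disc = (p * p + 4 * p * q) ** 0.5
roots = [(-p + disc) / (2 * p * q), (-p - disc) / (2 * p * q)]
```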
Example 3. We consider again the golden mean system of Example 2. The morphism φ is defined by

φ(a) = [a, 0; a, 0],  φ(b) = [0, b; 0, 0].

Let π(a) = p, π(b) = q. We have

f_a(z) = [pz, 0; pz, 0],  f_b(z) = [0, qz; 0, 0],  f_A(z) = [pz, qz; pz, 0].
Hence d(z) = 1 − pz − pqz^2. Thus π is admissible if and only if p(1 + q) = 1. In particular, π(a) = π(b) = λ_S is admissible.

Our first result is the following statement.

Theorem 1. Let S be an irreducible sofic shift and let π be an admissible assignment. If X ⊂ A+ is a code, then the series f_X(z) converges for z = 1 and det(I − f_X(1)) ≥ 0.

Proof. Let A = (Q, E) be the right Fischer cover of S. For any two states p, q ∈ Q, φ(X*)_{pq} ⊆ φ(A*)_{pq}. Thus ρ(f_{X*}(z)) ≥ ρ(f_{A*}(z)). Since π is admissible, ρ(f_{A*}(z)) = 1. It follows that ρ(f_{X*}(z)) ≥ 1. Since X is a code, f_{X*}(z) = (I − f_X(z))^{−1}. Thus det(I − f_X(z)) ≠ 0 for 0 ≤ z < 1. Since det(I − f_X(0)) = 1, we obtain det(I − f_X(z)) > 0 for 0 ≤ z < 1 by continuity. Again by continuity we conclude that det(I − f_X(1)) ≥ 0.

Example 4. Let S be the golden mean system and let X = {aa, ab}. Let π(a) = p, π(b) = q be an admissible assignment, i.e. such that p(1 + q) = 1. We have

f_X(z) = [p^2 z^2, pq z^2; p^2 z^2, pq z^2].

Hence det(I − f_X(1)) = 1 − p^2 − pq, which is at most equal to 1.

We will now prove a complement to Theorem 1 describing the equality case. The proof uses the following lemma, stating a classical property of regular languages. If u is a word and L a set of finite words, u^{−1}L denotes the set {w ∈ A* | uw ∈ L}.

Lemma 1. If L is a regular language of finite words, then there is a finite subset P of A* such that the set of factors F(L) of L satisfies F(L) ⊆ ⋃_{v,w∈P} v^{−1}Lw^{−1}.

Proof. Let A = (Q, I, T, E) be a finite automaton recognizing the language L, with I the set of initial states, T the set of terminal states, and E the set of edges. We may assume that, for each state q ∈ Q, there exist (i, t) ∈ I × T and two words v_q, w_q labeling paths from i to q and from q to t, respectively. For each word u ∈ F(L), there exist p, q ∈ Q such that u labels a path from p to q. Thus v_p u w_q ∈ L. This shows that F(L) ⊆ ⋃_{p,q∈Q} v_p^{−1} L w_q^{−1}.
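A small illustrative check of Theorem 1 on Example 4 (our code, with the admissibility constraint p(1 + q) = 1 from Example 3): for X = {aa, ab}, the quantity det(I − f_X(1)) = 1 − p^2 − pq stays between 0 and 1 over sampled admissible assignments.

```python
# For X = {aa, ab} in the golden mean system, f_X(1) = [[p^2, pq], [p^2, pq]].
# Theorem 1 predicts det(I - f_X(1)) >= 0 whenever pi is admissible.
def det_I_minus_fX1(p, q):
    fX1 = [[p * p, p * q], [p * p, p * q]]
    return (1 - fX1[0][0]) * (1 - fX1[1][1]) - fX1[0][1] * fX1[1][0]

dets = []
for q in [0.1, 0.5, 1.0, 2.0, 5.0]:
    p = 1.0 / (1.0 + q)                  # admissibility: p(1 + q) = 1
    dets.append(det_I_minus_fX1(p, q))
```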
Theorem 2. Let S be an irreducible sofic shift and let π be an admissible assignment. If X ⊂ F_S is a regular code, then X is S-complete if and only if det(I − f_X(1)) = 0.

Proof. Let A = (Q, E) be the right Fischer cover of S. Let p be a state of A. If X is S-complete, any word u in F_S with p·u = p is a factor of a word in X*. If u is a factor of a word in X, then there are states s, t ∈ Q such that u is a factor of a word in φ(X)_{st}, since X ⊂ F_S. If u is not a factor of a word in X, by considering the words u^n, for n ≥ 1, we get that u is a factor of some word in φ(X*)_{st}, for some states s, t ∈ Q. It follows that, in any case, there are two states s, t ∈ Q such that u is a factor of a word in φ(X*)_{st}. Since X is regular, φ(X*)_{st} is also regular. It follows from Lemma 1 that, for each p ∈ Q, there is a finite set of words U such that

φ(A*)_{pp} ⊆ ⋃_{s,t∈Q} ⋃_{v,w∈U} v^{−1} φ(X*)_{st} w^{−1}.
For any s, t ∈ Q and any v, w ∈ A*,

ρ(f_{v^{−1}φ(X*)_{st}w^{−1}}(z)) ≥ ρ(f_{φ(X*)_{st}}(z)) ≥ ρ(f_{X*}(z)).

Thus ρ(f_{A*}(z)) ≥ ρ(f_{X*}(z)). Since π is admissible, ρ(f_{A*}(z)) = 1. By Theorem 1, ρ(f_{X*}(z)) ≥ 1. Hence ρ(f_{X*}(z)) = 1. Since X is a regular code, for any two states p, q ∈ Q, f_{X*}(z)_{pq} is a rational series with nonnegative real coefficients. By [2, Lemma 2.3, p. 82], either f_{X*}(z)_{pq} is a polynomial, or the minimal modulus of the poles of f_{X*}(z)_{pq} is itself a pole. Since f_{X*}(z) = (I − f_X(z))^{−1}, it follows that det(I − f_X(z)) vanishes at 1.

Conversely, let us assume that X is not S-complete. There is a word u ∈ F_S such that u is not a factor of any word in X*. Thus X* ⊆ A* − A*uA*. Because S is irreducible and π assigns a positive real value to each letter, the matrix f_A(1) is an irreducible nonnegative real matrix. Thus, it follows from [4, Theorem 4.4.7] that, for p, q ∈ Q,

ρ(f_{A*−A*uA*}(z)_{pq}) > ρ(f_{A*}(z)_{pq}).

We get that, for any states p, q ∈ Q,

1 = ρ(f_{A*}(z)) ≤ ρ(f_{A*}(z)_{pq}) < ρ(f_{A*−A*uA*}(z)_{pq}) ≤ ρ(f_{X*}(z)_{pq}).

Hence ρ(f_{X*}(z)) > 1. This implies that det(I − f_X(1)) > 0.

We now derive from Theorems 1 and 2 two corollaries which constitute a generalization of the Kraft–McMillan inequality. Let π be the admissible assignment defined by π(a) = λ_S for any a ∈ A. For a subset X of A*, we denote for convenience

p_X(z) = 1 − det(I − f_X(z/λ_S)).

Thus p_X(λ_S) = 1 − det(I − f_X(1)). When S is the full shift on k symbols, λ_S = 1/k and p_X(z) = Σ_{n≥1} u_n z^n, where u_n is the number of words of length n in X. Thus the inequality p_X(λ_S) ≤ 1 takes the form Σ_{n≥1} u_n k^{−n} ≤ 1, which is the Kraft–McMillan inequality.

Corollary 3. Let S be an irreducible sofic shift. If X ⊂ A+ is a code, then p_X(λ_S) ≤ 1.
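To make the reduction to the classical inequality concrete, here is a small sanity check (our example code words, not the paper's): in the full shift on k symbols, λ_S = 1/k and p_X(λ_S) ≤ 1 is exactly Σ_{n≥1} u_n k^{−n} ≤ 1.

```python
# Kraft-McMillan sum for a finite code over a k-letter alphabet.
def kraft_sum(code, k):
    return sum(k ** -len(x) for x in code)

s_complete = kraft_sum(['0', '10', '11'], 2)   # complete prefix code: sum = 1
s_partial = kraft_sum(['0', '10'], 2)          # not complete: sum < 1
```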
Fig. 2. The even system (states 1 and 2; a loop labeled b on state 1, an edge labeled a from 1 to 2 and an edge labeled a from 2 to 1).
Corollary 4. Let S be an irreducible sofic shift and let X ⊂ F_S be a regular code. The code X is S-complete if and only if p_X(λ_S) = 1.

The following examples illustrate the different possible cases. The first two give examples of S-complete codes.

Example 5. Let S be the golden mean system. The set X = a + ab is both a code and an S-complete set. We have p_X(z) = z + z^2 and thus p_X(λ_S) = 1.

Example 6. Let S be the golden mean system again. Let X = aa + ab + ba. The set X is an S-complete code since it is formed of all factors of length 2 of S. We have p_X(z) = 3z^2 − z^4 and p_X(λ_S) = 3λ_S^2 − λ_S^4 = 1.

The last example shows a case of a code which is not S-complete.

Example 7. Let S be the even system represented in Fig. 2. The set X = b(a^2)*b is a code. It is not S-complete. The value of λ_S is the same as for the golden mean system. We have

f_X(z/λ_S) = [z^2(z^2)*, 0; 0, 0]

and

p_X(z) = 1 − (1 − z^2(z^2)*) = z^2(z^2)* = z^2/(1 − z^2).

Hence p_X(λ_S) = 2/(1 + √5) < 1.
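A hedged numerical recap of Examples 5–7 (our computation): p_X(λ_S) equals 1 for the two S-complete codes and falls strictly below 1 for the incomplete code b(a^2)*b.

```python
lam = (5 ** 0.5 - 1) / 2             # lambda_S, shared by both systems

p_ex5 = lam + lam ** 2               # X = a + ab          (golden mean)
p_ex6 = 3 * lam ** 2 - lam ** 4      # X = aa + ab + ba    (golden mean)
p_ex7 = lam ** 2 / (1 - lam ** 2)    # X = b(a^2)* b       (even system)
```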
We briefly investigate the relation between the extremal properties of being S-complete and S-maximal. It is shown in [7] (and it is also a consequence of a result of [1]) that, when S is a subshift of finite type, any S-maximal code is S-complete (with the definition of an S-code given in [7]). The result still holds for shifts of finite type with our definitions of S-maximal and S-complete codes. Conversely, there is an example in [8] of an S-complete code which is not S-maximal. Indeed, consider the shift S described in Fig. 3. The code X = {ab} is S-complete and not S-maximal, since it is included in the code {ab, ba}.
Fig. 3. A shift of null entropy (states 1 and 2; an edge labeled b from 1 to 2 and an edge labeled a from 2 to 1).
3. Factorization

In this section, we consider the case where X is a finite code and S is an irreducible sofic shift with right Fischer cover A. We prove that the polynomial of the alphabet for S divides the polynomial of any finite code X which is complete for S. Both polynomials are defined below. This settles a conjecture of Reutenauer given in [8, p. 150]. In [8, p. 150], Reutenauer proves the same result when S is an edge shift satisfying a constraint called condition (0): each state of its right Fischer cover has a loop.

Let us denote by φ̄ the morphism obtained from φ by taking the commutative image of the elements. For any finite code X, the polynomial p(X) = det(I − φ̄(X)) is in Z[A], the set of polynomials over Z in the commuting variables a in A. The polynomials det(I − f_X(z)) and p_X(z) are of course closely related to p(X). Indeed, for any point x = (π(a)z)_{a∈A}, where π is an assignment, det(I − f_X(z)) = p(X)(x), where p(X)(x) denotes the value of p(X) at the point x.

Theorem 5. Let S be an irreducible sofic shift. When X is a finite S-complete code, the polynomial p(X) is a multiple of the polynomial p(A).

Proof. We first assume that S is an irreducible edge shift. It is known that det(I − φ̄(A)) is equal to

1 + Σ_{k≥1} Σ_{c_1,…,c_k} (−1)^k ℓ(c_1) … ℓ(c_k),

where the second sum is over all simple cycling paths c_1, …, c_k of S which pairwise do not share any state. If c is such a simple cycling path, ℓ(c) denotes its label seen as a monomial of N[A]. Since S is an irreducible edge shift, the partial degree of p(A) in each letter is at most 1 and all its monomials have coefficients 1 or −1. Moreover, it is proven in [8, Theorem 3] that p(A) is an irreducible polynomial of Z[A]. Let a be a letter appearing in a simple cycling path c with ℓ(c) = au, and let B = A − {a}. Thus u is a word of commutative letters in B. It follows from the above remarks that p(A) = −au(1 − q) + 1 + r, where q and r are polynomials in Z[B] with no constant term. The polynomials p(X) and p(A) can be seen as polynomials in a with coefficients in Z[B]. We now divide p(X) by p(A) in Z[A, 1/(u(1 − q))]. Thus p(X) = p(A)s + t,
where s and t are polynomials in Z[A, 1/(u(1 − q))] which are, respectively, the quotient and the remainder of this division. The degree of t in a is zero. It follows that there is a positive integer n such that p(X)(u(1 − q))^n = p(A)s' + t', with s' ∈ Z[A], t' ∈ Z[B]. Let π be the admissible assignment defined by π(a) = λ_S for all a ∈ A. We denote by ω the point ω = (λ_S)_{a∈A} and we let w = (λ_S)_{b∈B}. We get p(A)(ω) = det(I − φ̄(A))(ω) = det(I − f_A(1)) = 0. Hence (au(1 − q))(ω) = (1 + r)(ω), or

λ_S = (1 + r)(w) / ((u(1 − q))(w)).
We denote by B(w, ε) the ball of dimension Card(B) and radius ε > 0 centered at w. A positive ball is a ball containing only points with positive coordinates. There is a positive ball B(w, ε) such that, for any point x = (x_1, …, x_{|B|}) in B(w, ε),

(1 + r)(x) / ((u(1 − q))(x)) > 0,

and the assignment π_x defined by

π_x(b) = x_b if b ∈ B,  π_x(a) = (1 + r)(x) / ((u(1 − q))(x)),

is admissible. Indeed, since S is irreducible, by the Perron–Frobenius theorem, z = 1 is a simple root of det(I − f_A(z)) with the assignment π. Moreover, the roots of modulus 1 are e^{2ikπ/m} for 0 ≤ k ≤ m − 1, where m is a positive integer. By definition of π_x, z = 1 is still a root of det(I − f_A(z)) with the assignment π_x. When x is close enough to w, z = 1 is the single positive real root of modulus less than or equal to 1. Thus, again by the Perron–Frobenius theorem, any root of det(I − f_A(z)) with the assignment π_x has a modulus greater than or equal to 1. Hence π_x is admissible.

By Theorem 2, p(X)(ψ) = 0 and p(A)(ψ) = 0 for any point ψ = (π(a))_{a∈A} where π is an admissible assignment. We get that t' vanishes at any point in B(w, ε). Because B(w, ε) has dimension Card(B), we conclude that t' vanishes, and p(A) divides p(X)(u(1 − q))^n. Since p(A) is irreducible and has a monomial containing the letter a, p(A) divides p(X) in Z[A].

We now extend the result for irreducible edge shifts to irreducible sofic shifts. Let S be an irreducible sofic shift and let A be its right Fischer cover. We denote by A' the automaton obtained from A as follows. For any a ∈ A, if there are m edges labeled by a in A, one labels these edges by a_1, …, a_m, respectively, in A'. We denote by A' the finite alphabet formed of the letters with indices. Since all edges of A' have distinct labels, A' is the right Fischer cover of an irreducible edge shift S'. We denote by γ the map assigning
a to each a_i, for any a ∈ A; we also denote by γ its extension to a morphism from A'* to A*. Note that, for each path from p to q labeled u in A, there is a unique path from p to q labeled u' in A' with γ(u') = u, since A is deterministic.

Let X be a finite S-complete code. We define the finite set of words X' as the set of labels v of all paths in A' such that γ(v) ∈ X. We claim that X' is a code. Indeed, whenever u_1 u_2 … u_r = v_1 v_2 … v_s, with r, s positive integers and u_i, v_j ∈ X', we have

γ(u_1)γ(u_2) … γ(u_r) = γ(v_1)γ(v_2) … γ(v_s).

Since X is a code, r = s and γ(u_i) = γ(v_i) for any 1 ≤ i ≤ r. It follows that |u_i| = |v_i| for any 1 ≤ i ≤ r and finally u_i = v_i.

We claim that X' is a finite S'-complete code. Since A is a minimal deterministic automaton recognizing the sofic shift S, it has a strongly connected graph and a synchronizing word (or reset sequence) u. A synchronizing word of a deterministic automaton is a word u such that the set Q·u = {p·u | p ∈ Q} has cardinality one; say Q·u = {q_0}. Moreover, u is the label of a path from p to q_0 in A, where p is some state in Q.

Let w be a factor in F_S. There is a path from q to r labeled w in A, and a path from q to r labeled w' in A' with γ(w') = w. Let x, y and z be words of A* labeling paths from q_0 to p, from q_0 to q, and from r to p, respectively. Hence we have in A a path

q_0 → p → q_0 → q → r → p → q_0

whose consecutive edges are labeled x, u, y, w, z, u. Since X is S-complete, xuywzu is a factor of a word in X*. Moreover, there are two states s, t ∈ Q such that xuywzu is a factor of a word in φ(X*)_{st} (see the beginning of the proof of Theorem 2). As a consequence, and since u is synchronizing, there are words g, h ∈ A* such that

s → q_0 → q → r → p → q_0 → t,

with consecutive labels g, y, w, z, u, h, is a path in A labeled by a word in X*. Let

s → q_0 → q → r → p → q_0 → t,

with consecutive labels g', y', w'', z', u', h', be the unique path in A' such that γ(l') = l for l = g, y, w, z, u and h. We have g'y'w''z'u'h' ∈ X'*. Moreover, since the paths labeled w' and w'' both go from q to r in A' and γ(w') = γ(w'') = w, we have w'' = w'. It follows that w' is a factor of a word in X'*, which proves the claim.

Finally, we apply the result obtained in the case of irreducible edge shifts to S' and X'. It follows that p(A') divides p(X') in Z[A']. By removing the indices of the letters in A' (or, equivalently, by applying γ), we obtain that p(A) divides p(X) in Z[A].

The following example illustrates Theorem 5 in the case of an irreducible edge shift.

Example 8. If S is the irreducible edge shift described in Fig. 4, then

φ(a) = [a, 0; 0, 0],  φ(b) = [0, b; 0, 0],  φ(c) = [0, 0; c, 0].
Fig. 4. An edge shift (states 1 and 2; a loop labeled a on state 1, an edge labeled b from 1 to 2 and an edge labeled c from 2 to 1).
Fig. 5. The edge shift S' (states 1 and 2; a loop labeled a_1 on state 1, an edge labeled b from 1 to 2 and an edge labeled a_2 from 2 to 1).
Let X be the finite S-complete code {aa, ab, ca, cb, bc}. We get

φ(X) = [a^2 + bc, ab; ca, cb].

We have p(X) = 1 − a^2 − 2bc + b^2c^2 = (1 + a − bc)(1 − a − bc) = (1 + a − bc)p(A).

The next example illustrates the proof of Theorem 5 in the case of an irreducible subshift of finite type which is not an edge shift.

Example 9. If S is the golden mean subshift, then

φ(a) = [a, 0; a, 0],  φ(b) = [0, b; 0, 0].

Let X be the finite S-complete code X = {aa, ab, aab}. The shift S' is recognized by the automaton in Fig. 5. Note that S' is, up to a renaming of the alphabet, the edge shift of Example 8. We have X' = {a_1a_1, a_1b, a_1a_1b, a_2a_1, a_2b, a_2a_1b}, and

φ(X') = [a_1a_1, a_1b + a_1a_1b; a_2a_1, a_2b + a_2a_1b].

We get p(X') = (1 + a_1)(1 − a_1 − a_2b) = (1 + a_1)p(A'). As a consequence, p(X) = (1 + a)p(A), with p(A) = 1 − a − ab.
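A numeric spot-check of Example 8 (ours, including the sign of the middle factor): the expansion 1 − a^2 − 2bc + b^2c^2 of p(X) coincides with (1 + a − bc)·p(A) at randomly sampled points, witnessing the divisibility by p(A) = 1 − a − bc.

```python
import random

random.seed(0)
max_err = 0.0
for _ in range(200):
    a, b, c = (random.uniform(-2, 2) for _ in range(3))
    pX = 1 - a**2 - 2*b*c + (b*c)**2      # p(X) for X = {aa, ab, ca, cb, bc}
    pA = 1 - a - b*c                      # p(A) for the edge shift of Fig. 4
    max_err = max(max_err, abs(pX - (1 + a - b*c) * pA))
```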
Fig. 6. A sofic shift (states 1 and 2; loops labeled a on both states, an edge labeled b from 1 to 2 and an edge labeled c from 2 to 1).
We may observe that the factorization of p(X) can be obtained from the following factorization of the matrix I − φ(X):

I − φ(X) = (I + φ(a))(I − φ(A))(I + φ(b)).

This phenomenon is linked, in the case of the full shift, to Reutenauer's non-commutative factorization theorem [2]. We do not know whether this theorem holds for shifts of finite type.

The last example below shows an irreducible sofic shift S which is not a shift of finite type. We give an example of an S-complete code X for which there are two ways to find the factorization of p(X).

Example 10. If S is the irreducible sofic shift described in Fig. 6, then

φ(a) = [a, 0; 0, a],  φ(b) = [0, b; 0, 0],  φ(c) = [0, 0; c, 0].

Let X be the finite S-complete code {aa, ab, ac, ba, bc, cb, ca}. We get

φ(X) = [a^2 + bc, ab + ba; ac + ca, a^2 + cb].

We have

p(X) = 1 − 2a^2 − 2bc + a^4 + b^2c^2 − 2a^2bc = (1 − 2a + a^2 − bc)(1 + 2a + a^2 − bc) = p(A)(1 + 2a + a^2 − bc).

From 1 − A^2 = (1 − A)(1 + A), we get I − φ(A^2) = (I − φ(A))(I + φ(A)). It follows that det(I − φ̄(A^2)) = det(I − φ̄(A)) det(I + φ̄(A)). Since det(I − φ̄(A^2)) = p(X), det(I − φ̄(A)) = p(A), and φ̄(A) = [a, b; c, a], we recover the factorization of p(X):

p(X) = det(I − φ̄(A^2)) = det(I − φ̄(A)) det(I + φ̄(A)) = p(A)(1 + 2a + a^2 − bc).
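The matrix identity invoked in Example 10 can be spot-checked numerically (our sketch): det(I − φ̄(A)^2) = det(I − φ̄(A))·det(I + φ̄(A)) for φ̄(A) = [a, b; c, a].

```python
import random

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

random.seed(1)
identity_ok = True
for _ in range(100):
    a, b, c = (random.uniform(-1, 1) for _ in range(3))
    M = [[a, b], [c, a]]                 # commutative image of phi(A)
    M2 = mul2(M, M)
    lhs = det2([[1 - M2[0][0], -M2[0][1]], [-M2[1][0], 1 - M2[1][1]]])
    rhs = det2([[1 - a, -b], [-c, 1 - a]]) * det2([[1 + a, b], [c, 1 + a]])
    identity_ok = identity_ok and abs(lhs - rhs) < 1e-9
```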
Acknowledgements

We thank an anonymous reviewer for constructive suggestions which helped us to improve the presentation of this article.

References

[1] J. Ashley, B. Marcus, D. Perrin, S. Tuncel, Surjective extensions of sliding-block codes, SIAM J. Discrete Math. 6 (1993) 582–611.
[2] J. Berstel, Ch. Reutenauer, Rational Series and their Languages, Springer, Berlin, 1988.
[3] C. De Felice, Finite biprefix sets of paths in a graph, Theoret. Comput. Sci. 58 (1988) 103–128 (Thirteenth Internat. Colloq. on Automata, Languages and Programming, Rennes, 1986).
[4] D.A. Lind, B.H. Marcus, An Introduction to Symbolic Dynamics and Coding, Cambridge University Press, Cambridge, 1995.
[5] D. Perrin, G. Rindone, Syntactic groups, Bull. Belgian Math. Soc., to appear.
[6] A. Restivo, Codes and local constraints, Theoret. Comput. Sci. 72 (1990) 55–64.
[7] A. Restivo, Codes with constraints, in: Mots, Lang. Raison. Calc., Hermès, Paris, 1990, pp. 358–366.
[8] Ch. Reutenauer, Ensembles libres de chemins dans un graphe, Bull. Soc. Math. France 114 (1986) 135–152.
Theoretical Computer Science 340 (2005) 394 – 407 www.elsevier.com/locate/tcs
Small size quantum automata recognizing some regular languages★

Alberto Bertoni, Carlo Mereghetti∗, Beatrice Palano

Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
Abstract

Given a class {p_α | α ∈ I} of stochastic events induced by M-state 1-way quantum finite automata (1qfa) on alphabet Σ, we investigate the size (number of states) of 1qfa's that ε-approximate a convex linear combination of {p_α | α ∈ I}, and we apply the results to the synthesis of small size 1qfa's. We obtain:
• An O((Md/ε^3) log^2(d/ε^2)) general size bound, where d is the Vapnik dimension of {p_α(w) | w ∈ Σ*}.
• For commutative n-periodic events p on Σ with |Σ| = H, we prove an O((H log n)/ε^2) size bound for inducing an ε-approximation of 1/2 + (1/2)p whenever ||F(p̂)||_1 ≤ n^H, where F(p̂) is the discrete Fourier transform of (the vector p̂ associated with) p.
• If the characteristic function χ_L of an n-periodic unary language L satisfies ||F(χ̂_L)||_1 ≤ n, then L is recognized with isolated cut-point by a 1qfa with O(log n) states. Vice versa, if L is recognized with isolated cut-point by a 1qfa with O(log n) states, then ||F(χ̂_L)||_1 = O(n log n).
© 2005 Elsevier B.V. All rights reserved.

Keywords: Stochastic events; Quantum automata
★ Partially supported by MURST, under the projects “Linguaggi formali: teoria ed applicazioni” and “FIRB: Complessità descrizionale di automi e strutture correlate”.
∗ Corresponding author.
E-mail addresses: [email protected] (A. Bertoni), [email protected] (C. Mereghetti), [email protected] (B. Palano). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.032
A. Bertoni et al. / Theoretical Computer Science 340 (2005) 394 – 407
1. Introduction

One-way quantum finite automata (1qfa, for short) [2,4,7,8] are particularly interesting computational devices, since they represent a theoretical model for a quantum computer with finite memory. 1qfa's exhibit both advantages and disadvantages with respect to their classical (deterministic or probabilistic) counterparts. Basically, quantum superposition offers some computational advantages over probabilistic superposition. On the other hand, quantum dynamics are reversible: because of this limitation, it is sometimes impossible to simulate deterministic automata by quantum automata. In this paper, we develop techniques for constructing small size 1qfa's, possibly more succinct than equivalent deterministic or probabilistic automata [13,16,18].

Given a 1qfa A on input alphabet Σ, its behavior is the stochastic event p_A : Σ* → [0, 1], where p_A(w) is the probability that A accepts w. The language accepted by A with cut-point λ is the set L_{A,λ} = {w ∈ Σ* | p_A(w) > λ}; the cut-point λ is isolated by ε > 0 if |p_A(w) − λ| ≥ ε for every w ∈ Σ*.

First of all, we study the problem of approximating stochastic events by using (measure-once [3,6,10]) 1qfa's. More precisely, we investigate the following problem: given a family {p_α | α ∈ I} of stochastic events induced by M-state 1qfa's A_α on input alphabet Σ, find a “succinct” 1qfa A inducing an ε-approximation of a convex linear combination q of the p_α's, i.e., satisfying |p_A(w) − q(w)| ≤ ε for every w ∈ Σ*.

After giving preliminary notions in Section 2, we formulate our problem as a problem of uniform convergence of empirical averages to their expectations in Section 3. By using general results (see, e.g., [1]), we prove an O((Md/ε^3) log^2(d/ε^2)) bound on the number of states for 1qfa's ε-approximating q, where d is the Vapnik dimension of the class {p_α(w) | w ∈ Σ*}. As we will briefly observe at the end of the section, our technique can be directly used to solve the same problem for probabilistic automata.
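As a hedged illustration of the stochastic event p_A (our own sketch; the names, the end-marker-free convention, and the rotation automaton are ours, not the paper's): a measure-once 1qfa applies one unitary per input symbol and measures once at the end, and p_A(w) is the squared norm of the final state's projection onto the accepting subspace.

```python
import math

def qfa_accept_prob(word, unitaries, init, accepting):
    """Row-vector state evolved by one unitary per symbol, then a single
    projective measurement onto the accepting basis states."""
    state = list(init)
    for sym in word:
        U = unitaries[sym]
        state = [sum(state[i] * U[i][j] for i in range(len(state)))
                 for j in range(len(state))]
    return sum(abs(state[j]) ** 2 for j in accepting)

# Unary example: each 'a' rotates the state by pi/4, so p(a^k) = cos(k*pi/4)^2.
theta = math.pi / 4
R = {'a': [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]}
```

For instance, with initial state (1, 0) and accepting state 0, this automaton gives p(ε) = 1, p(a) = 1/2 and p(aa) = 0.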
In Section 4, we specialize the previous result to a particular subclass of stochastic events: the n-periodic commutative events. An event p : Σ* → [0, 1] is called n-periodic commutative if, for every w ∈ Σ*, p(w) depends only on the number modulo n of occurrences in w of each symbol in Σ. In this case, we prove a bound O((M|Σ|/ε^2) log n) for 1qfa's inducing ε-approximations of convex linear combinations of n-periodic commutative events on the alphabet Σ.

In Section 5, we relate the ℓ_1-norm of the discrete Fourier transform of any given event to its approximability by 1qfa's with O(log n) states. As an application, we consider the languages L_{n,H} ⊆ Σ*, with |Σ| = H, consisting of those words for which the number of occurrences of each symbol in Σ is a multiple of n. We prove that L_{n,H} is recognizable with isolated cut-point by an O(H log n)-state 1qfa, while every nondeterministic automaton recognizing L_{n,H} requires at least n^H states.

In Section 6, the unary case (i.e., |Σ| = 1) is studied. We show that if the ℓ_1-norm of the discrete Fourier transform of the characteristic function of an n-periodic unary language L does not exceed n, then L is recognized with isolated cut-point by a 1qfa with O(log n) states. Vice versa, if an n-periodic unary language L is recognized with isolated cut-point by a 1qfa with O(log n) states, then the ℓ_1-norm of the discrete Fourier transform of the characteristic function of L does not exceed O(n log n). As an application, we consider the languages L_{n,1}, and we compare Q(n) with S(n), where Q(n) (S(n)) is the
minimum number of states of 1qfa's (probabilistic automata) accepting L_{n,1}. We prove that S(n)/Q(n) = Θ(log n/log log n). Moreover, if n factorizes into a constant number of prime factors, then S(n) is "exponentially greater" than Q(n).

2. Preliminaries

2.1. Linear algebra

We quickly recall some notation from linear algebra. For more details, we refer the reader to, e.g., [11,12]. We denote by C the field of complex numbers and by C^{n×m} the set of n × m matrices with entries in C. Given a complex number z ∈ C, its conjugate is denoted by z̄, and its modulus is |z| = √(z z̄). The adjoint of a matrix M ∈ C^{n×m} is the matrix M† ∈ C^{m×n}, where (M†)_{ij} = M̄_{ji}. For matrices A ∈ C^{n×n} and B ∈ C^{m×m} and for vectors π ∈ C^{1×n} and ξ ∈ C^{1×m}, their direct sums are, respectively,

  A ⊕ B = ( A 0 ; 0 B ),   π ⊕ ξ = (π_1, ..., π_n, ξ_1, ..., ξ_m).

A Hilbert space of dimension n is the linear space C^{1×n} equipped with sum and product by elements of C, in which the inner product (π, ξ) = πξ† is defined. If (π, ξ) = 0, we say that π is orthogonal to ξ. The norm of a vector π ∈ C^{1×n} is defined as ‖π‖ = √(π, π). Two subspaces X, Y are orthogonal if every vector in X is orthogonal to every vector in Y; in this case, the linear space generated by X ∪ Y is denoted by X ∔ Y. A matrix M ∈ C^{n×n} is said to be unitary whenever MM† = I = M†M, where I is the identity matrix; moreover, a matrix M is unitary if and only if it preserves the norm, i.e., ‖πM‖ = ‖π‖ for each vector π ∈ C^{1×n}. The eigenvalues of unitary matrices are complex numbers of modulus 1, i.e., they are of the form e^{iϑ}, for some real ϑ. M is said to be Hermitian whenever M = M†. Given a Hermitian matrix O ∈ C^{n×n}, let c_1, ..., c_s be its eigenvalues and E_1, ..., E_s the corresponding eigenspaces. It is well known that each eigenvalue c_k is real, that E_i is orthogonal to E_j for any i ≠ j, and that E_1 ∔ ··· ∔ E_s = C^{1×n}. Each vector π ∈ C^{1×n} can be uniquely decomposed as π = π_1 + ··· + π_s, where π_j ∈ E_j. The linear transformation π ↦ π_j is the projector P_j onto the subspace E_j. It is easy to see that Σ_{j=1}^s P_j = I. The Hermitian matrix O is biunivocally determined by its eigenvalues and its eigenspaces or, equivalently, by its projectors: in fact, we have O = c_1 P_1 + ··· + c_s P_s. We denote by N the set of non-negative integers and by Z the set of integers; for the sake of readability, we let ⟨x⟩_n = x mod n, for any x ∈ Z. We let Z_n = {⟨x⟩_n | x ∈ Z}, equipped with the operations modulo n.

2.2. Axiomatics for quantum mechanics in short

Here, we use the previous formalism to describe quantum systems. Given a set Q = {q_1, ..., q_m}, every q_i can be represented by its characteristic vector e_i ∈ {0,1}^{1×m} having 1 at the ith position and 0 elsewhere. A quantum state on Q is a superposition π = Σ_{k=1}^m α_k e_k, where the coefficients α_k are complex amplitudes and
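The norm-preservation property of unitary matrices recalled above can be checked mechanically. The following sketch (plain Python; the matrix, the state, and all names are our own illustrative choices) verifies ‖πM‖ = ‖π‖ for a small rotation matrix:

```python
import math

# Hypothetical 2x2 unitary (a real rotation): M M† = I.
theta = 0.3
U = [[complex(math.cos(theta)), complex(-math.sin(theta))],
     [complex(math.sin(theta)),  complex(math.cos(theta))]]

def vec_mat(pi, M):
    """Row vector times matrix: (pi M)_j = sum_i pi_i M_ij."""
    return [sum(pi[i] * M[i][j] for i in range(len(pi))) for j in range(len(M))]

def norm(pi):
    """||pi|| = sqrt((pi, pi)), with inner product (pi, xi) = pi xi†."""
    return math.sqrt(sum(abs(x) ** 2 for x in pi))

pi = [complex(0.6), complex(0, 0.8)]            # a unit-norm state
assert abs(norm(pi) - 1.0) < 1e-12
assert abs(norm(vec_mat(pi, U)) - norm(pi)) < 1e-12  # unitarity preserves the norm
```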
Σ_{k=1}^m |α_k|² = 1. Every e_k is called a pure state. Given an alphabet Σ = {σ_1, ..., σ_H}, with every symbol σ_i we associate a unitary transformation U(σ_i) : C^{1×m} → C^{1×m}. An observable is described by an m × m Hermitian matrix O = c_1 P_1 + ··· + c_s P_s. Suppose that, at a given time, a quantum system is described by the quantum state π. Then, we can operate:
(1) Evolution U(σ_i). The new quantum state ξ = πU(σ_i) is reached; this dynamics is reversible, since π = ξU†(σ_i).
(2) Measurement of O. Every result in {c_1, ..., c_s} can be obtained; c_j is obtained with probability ‖πP_j‖², and the state after such a measurement is πP_j/‖πP_j‖. The state transformation induced by a measurement is typically irreversible.

2.3. One-way quantum finite automata, stochastic events and languages

Several models of quantum automata have been proposed in the literature. Basically, they differ in the measurement policy [2,4,7,8]. In this paper, we consider only the measure-once model. Measure-once 1qfa's [3,6,10] are the simplest model of quantum automata. In this model, the transformation on a symbol of the input alphabet is realized by a unitary operator, and a unique measurement is performed at the end of the computation. In what follows, we simply write 1qfa, understanding the designation "measure-once". Let Σ* be the free monoid of words generated by the finite alphabet Σ. For any w ∈ Σ*, we denote by #_σ(w) the number of occurrences of the symbol σ ∈ Σ within w. Clearly, the length of w is Σ_{σ∈Σ} #_σ(w). A stochastic event on Σ* is a function p : Σ* → [0,1]. A 1qfa with q control states on the input alphabet Σ is a system A = (π, {U(σ)}_{σ∈Σ}, P), where π ∈ C^{1×q}, for each σ ∈ Σ, U(σ) ∈ C^{q×q} is a unitary matrix, and P ∈ C^{q×q} is a projector that biunivocally determines the observable O = 1·P + 0·(I − P). For the sake of simplicity, we denote the family {U(σ)}_{σ∈Σ} by simply writing U(σ). The stochastic event induced by A is the function p_A : Σ* → [0,1] defined, for any σ_1 ··· σ_k ∈ Σ*, by

  p_A(σ_1 ··· σ_k) = ‖ π (∏_{i=1}^k U(σ_i)) P ‖².    (1)

Sometimes, it will be more convenient to specify the 1qfa A in the equivalent form A = (π, U(σ), F), where F ⊆ {1, ..., q} indexes the (final) states spanning the subspace onto which P projects. In this case, the event induced by A writes as

  p_A(σ_1 ··· σ_k) = Σ_{j∈F} |( π ∏_{i=1}^k U(σ_i) )_j|².    (2)

The reader may easily verify that Eq. (2) coincides with Eq. (1). Given an event p : Σ* → [0,1] and a real λ ∈ [0,1], the language L ⊆ Σ* defined by p with cut-point λ is the set L = {w ∈ Σ* | p(w) > λ}.
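To illustrate Eq. (2), here is a minimal sketch (plain Python; the automaton and all names are our own illustrative choices, not taken from the paper) of a unary measure-once 1qfa whose single input symbol acts as a rotation by 2π/n, so that the induced event is p_A(a^k) = cos²(2πk/n):

```python
import math

n = 5
theta = 2 * math.pi / n
# U(a): rotation by 2*pi/n (unitary); pi: initial superposition; F: final states.
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
pi = [1.0, 0.0]
F = {0}

def step(v, M):
    return [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]

def event(k):
    """p_A(a^k) = sum_{j in F} |(pi U(a)^k)_j|^2, i.e. Eq. (2) for w = a^k."""
    v = pi
    for _ in range(k):
        v = step(v, U)
    return sum(abs(v[j]) ** 2 for j in F)

# p_A(a^k) = cos^2(2*pi*k/n): equal to 1 exactly when k is a multiple of n.
assert abs(event(n) - 1.0) < 1e-12
assert event(1) < 1.0
```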
The cut-point λ is said to be isolated if there exists a positive real δ such that |p(w) − λ| ≥ δ, for any w ∈ Σ*. Moreover, if p is induced by the 1qfa A, then L is said to be recognized by A with cut-point λ (isolated by δ).

2.4. Uniform convergence of empirical averages of random variables to their expectations

Bernoulli's theorem (see, e.g., [15]) states that the relative frequency of an event A in a sequence of independent trials converges, in probability, to the probability of A. More precisely, given a space I on which a probability measure P is defined, let A ⊂ I and χ_A : I → {0,1} be its characteristic function. Observe that the expectation E[χ_A] is the probability P_A of A and, for a sequence C^(S) of independent trials x_1, ..., x_S, the empirical average (1/S) Σ_{t=1}^S χ_A(x_t) is the relative frequency ν_A(C^(S)) of the elements of A in C^(S). Bernoulli's theorem states that, for every probability distribution P on I, we have

  lim_{S→∞} Prob{ |ν_A(C^(S)) − P_A| ≥ ε } = 0 for every ε > 0.

In [19,20], the more general problem of the uniform convergence of relative frequencies to their probabilities is studied. For a class D ⊂ 2^I, we say that the uniform convergence of relative frequencies to their probabilities holds for D if and only if, for every probability distribution P on I, we have

  lim_{S→∞} Prob{ sup_{A∈D} |ν_A(C^(S)) − P_A| ≥ ε } = 0 for every ε > 0.

To characterize the classes D for which this uniform convergence holds, the relevant combinatorial measure called the Vapnik–Chervonenkis dimension is introduced in [20]: a set of points {x_1, x_2, ..., x_t} is shattered by D if {(χ_A(x_1), χ_A(x_2), ..., χ_A(x_t)) | A ∈ D} = {0,1}^t. The maximal cardinality of the sets shattered by D is called the Vapnik–Chervonenkis dimension of D (VC-dim(D), for short). The main result in [20] states that the uniform convergence of relative frequencies to their probabilities holds for D if and only if VC-dim(D) < ∞. Several attempts have been made to extend the VC-dim to arbitrary random variables. Here, we are interested in random variables of the form f : I → [0,1]. In this framework, a useful measure is the Vapnik dimension:

Definition 1. Given a class B of functions f : I → [0,1] and α ∈ (0,1), a subset A ⊂ I is said to be shattered by B if, for every X ⊂ A, there exists g ∈ B for which x ∈ X implies g(x) ≥ α, and x ∈ A − X implies g(x) < α. The Vapnik dimension V-dim(B) is the maximal cardinality of the shattered subsets of I.

If B is finite, a simple bound for V-dim(B) is easily seen to be

  V-dim(B) ≤ log |B|.    (3)
The following theorem gives a quantitative measure of the uniform convergence of empirical averages of random variables f : I → [0,1] to their expectations. It is an immediate consequence of Theorem 3.6 and Lemmas 2.3 and 2.4 in [1]:

Theorem 1 (Alon et al. [1]). Let B be the class of functions {f_w : I → [0,1] | w ∈ Σ*}, and P a probability distribution over I. Let ϕ(w) be the expectation of f_w according to P, and ϕ_S(w) = (1/S) Σ_{t=1}^S f_w(α_t) an empirical average, where α_1, ..., α_S are drawn independently at random according to P. Then, for every probability distribution P and every ε, δ > 0, we get

  Prob{ sup_{w∈Σ*} |ϕ_S(w) − ϕ(w)| ≥ ε } < δ

for

  S = O( (d/ε³) log²(d/ε²) + (1/ε²) log(1/δ) )

and d = V-dim(B).
3. Approximating the convex closure of classes of stochastic events: the general case

The problem we shall be dealing with concerns the analysis of 1qfa's whose induced events approximate given stochastic events in the following sense:

Definition 2. An ε-approximation in L∞ of a given stochastic event p : Σ* → [0,1] is any stochastic event q : Σ* → [0,1] satisfying

  sup_{w∈Σ*} {|p(w) − q(w)|} ≤ ε.

Given a family Φ = {ϕ_α : Σ* → [0,1] | α ∈ I} of stochastic events induced by M-state 1qfa's (π_α, U_α(σ), P_α), let Φ̃ be the convex closure of Φ, i.e., the class of stochastic events obtained as convex linear combinations ϕ(w) = Σ_{α∈I} b_α ϕ_α(w), with b_α ≥ 0 and Σ_{α∈I} b_α = 1. We are interested in estimating the number of states of 1qfa's inducing stochastic events that ε-approximate ϕ ∈ Φ̃. Since b_α ≥ 0 and Σ_{α∈I} b_α = 1, we can interpret the b_α's as a probability distribution on I. Then, for any w ∈ Σ*, ϕ_α(w) becomes a random variable with expectation

  E[ϕ_α(w)] = Σ_{α∈I} b_α ϕ_α(w) = ϕ(w).
We can approximate such an expectation by an empirical average of the events in Φ. To this purpose, we design the following algorithm:

ALGORITHM 1
for t := 1 to S do
  α[t] := an element of I, independently chosen with probability b_α;
output the 1qfa A defined as

  A = ( ⊕_t (1/√S) π_{α[t]},  ⊕_t U_{α[t]}(σ),  ⊕_t P_{α[t]} ).

It is easy to verify that the 1qfa A output by the previous algorithm has S·M states, and induces the stochastic event ϕ_S : Σ* → [0,1] defined, for any w ∈ Σ*, as

  ϕ_S(w) = (1/S) Σ_{t=1}^S ϕ_{α[t]}(w).

Moreover, notice that ϕ_S is an empirical average of the events in Φ. Now, if

  Prob{ sup_{w∈Σ*} |ϕ_S(w) − ϕ(w)| ≥ ε } < 1    (4)

holds true, then the existence of a 1qfa—with (S·M) states—inducing an ε-approximation of the given stochastic event ϕ is guaranteed. Estimating

  Prob{ sup_{w∈Σ*} | (1/S) Σ_{t=1}^S ϕ_{α[t]}(w) − E[ϕ_α(w)] | ≥ ε }

is a classical problem of uniform convergence of empirical averages to their expectations, a problem addressed in Section 2. A general solution in terms of the Vapnik dimension of the class of random variables {ϕ_α(w) | w ∈ Σ*} directly follows from Theorem 1:

Theorem 2. Let {ϕ_α | α ∈ I} be a class of stochastic events induced by M-state 1qfa's, with d = V-dim({ϕ_α(w) | w ∈ Σ*}). Then every convex linear combination ϕ(w) = Σ_{α∈I} b_α ϕ_α(w) can be ε-approximated by a 1qfa with O((Md/ε³) log²(d/ε²)) states.

To apply this result to the synthesis of small size 1qfa's, we must require that:
(1) the Vapnik dimension of the family Φ must be finite;
(2) the class of events given by convex linear combinations of events in the family Φ must not be trivial.
In the next section, we consider a class of events satisfying both these conditions. We end this section with a quick comment on the applicability of the technique presented here in the realm of probabilistic automata. A probabilistic automaton is similar to a 1qfa: the main difference is that its transition matrices and superpositions are stochastic instead of unitary (we refer to, e.g., [16,18] for details). As the reader may easily verify, our technique can be directly used to evaluate the size of probabilistic automata ε-approximating
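The sampling step of Algorithm 1 can be simulated numerically. The sketch below (plain Python; the family of events and all parameters are illustrative assumptions of ours, and we average event values directly instead of building the direct-sum 1qfa) shows the empirical average ϕ_S tracking the convex combination ϕ:

```python
import math, random

random.seed(1)

# Hypothetical family: unary events phi_alpha(k) = cos^2(pi*alpha*k/12),
# mixed with weights b_alpha as in the convex closure described above.
alphas = [1, 2, 3]
b = {1: 0.5, 2: 0.3, 3: 0.2}
phi = lambda a, k: math.cos(math.pi * a * k / 12) ** 2

def mixture(k):
    """The target convex combination phi(k) = sum_alpha b_alpha phi_alpha(k)."""
    return sum(b[a] * phi(a, k) for a in alphas)

def empirical(k, S):
    """phi_S(k): average of S events drawn with probabilities b_alpha,
    mimicking the sampling loop of Algorithm 1."""
    draws = random.choices(alphas, weights=[b[a] for a in alphas], k=S)
    return sum(phi(a, k) for a in draws) / S

S = 20000
err = max(abs(empirical(k, S) - mixture(k)) for k in range(12))
assert err < 0.05   # the empirical average tracks the expectation
```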
convex linear combinations of stochastic events, thus obtaining the analogue of Theorem 2 for probabilistic automata.

4. The commutative periodic case

We recall that a language is recognized with isolated cut-point by a 1qfa if and only if it is a group language [3,6], i.e., it can be recognized by a deterministic automaton where, for any input symbol, the corresponding transition function is a permutation [17]. In this section, we consider the case where all such permutations commute. This naturally leads to the following.

Definition 3. Given an alphabet Σ = {σ_1, σ_2, ..., σ_H}, a stochastic event ϕ : Σ* → [0,1] is said to be n-periodic commutative if there exists a function ϕ̂ : Z_n^H → [0,1] such that, for any w ∈ Σ*, we have

  ϕ(w) = ϕ̂(⟨#_{σ_1}(w)⟩_n, ⟨#_{σ_2}(w)⟩_n, ..., ⟨#_{σ_H}(w)⟩_n).

Hence, ϕ̂ can be viewed as a real vector whose components are indexed by Z_n^H. From now on, we will always denote by p̂ the vector associated with the n-periodic commutative event p, according to Definition 3. Now let Φ = {ϕ_α | α ∈ I} be a class of n-periodic commutative events induced by M-state 1qfa's, and set B = {ϕ_α(w) | w ∈ Σ*}. Since

  {ϕ_α(w) | w ∈ Σ*} = {ϕ̂_α(k_1, k_2, ..., k_H) | 0 ≤ k_1, k_2, ..., k_H < n},

we have that |B| ≤ n^H. By directly using the simple bound of inequality (3), we get V-dim(B) ≤ H log n. Hence, from Theorem 2, we get that we can ε-approximate any convex linear combination of events in Φ by 1qfa's with O((M·H log n/ε³)(log log n + log(H/ε²))²) states, i.e., almost logarithmic in n. We can improve this bound with a simple direct approach. We use Hoeffding's inequality [9]: if the X_i's are i.i.d. random variables with values in [0,1] and expectation μ, then for any ε ≥ 0 and S ≥ 1,

  Prob{ |(1/S) Σ_{i=1}^S X_i − μ| ≥ ε } ≤ 2e^{−2ε²S}.    (5)

This tool enables us to prove

Theorem 3. Given a family Φ of n-periodic commutative events induced by M-state 1qfa's on an alphabet with H symbols, any event in the convex closure of Φ can be ε-approximated by the event induced by a 1qfa with O((M·H/ε²) log n) states.

Proof. Let Σ = {σ_1, ..., σ_H}, and let Φ = {ϕ_α : Σ* → [0,1] | α ∈ I} be the class of n-periodic commutative events. Let ϕ(w) = Σ_{α∈I} b_α ϕ_α(w) be a convex linear combination
of events in Φ. By using the construction in Algorithm 1, we are able to realize the event ϕ_S(w) such that

  Prob{ sup_{w∈Σ*} |ϕ_S(w) − ϕ(w)| ≥ ε }
    = Prob{ max_{0≤k_1,...,k_H<n} |ϕ̂_S(k_1, ..., k_H) − ϕ̂(k_1, ..., k_H)| ≥ ε }
    ≤ n^H · max_{0≤k_1,...,k_H<n} Prob{ |ϕ̂_S(k_1, ..., k_H) − ϕ̂(k_1, ..., k_H)| ≥ ε }   (by the union bound)
    ≤ n^H · 2e^{−2ε²S}   (by Hoeffding's inequality (5)).

By requiring n^H · 2e^{−2ε²S} < 1, i.e., by taking S = O((H/ε²) log n), we get the result, since the output 1qfa has S·M states. □
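The sample size hidden in the last step can be made explicit: solving n^H · 2e^{−2ε²S} < 1 for S gives S > (H ln n + ln 2)/(2ε²), which is the claimed O((H/ε²) log n) rate. A small sketch (plain Python; the parameter values are our own illustrative choices):

```python
import math

def required_samples(n, H, eps):
    """Smallest integer S with n**H * 2 * exp(-2 * eps**2 * S) < 1,
    i.e. S > (H ln n + ln 2) / (2 eps^2) -- the O((H/eps^2) log n) rate."""
    return math.floor((H * math.log(n) + math.log(2)) / (2 * eps ** 2)) + 1

S = required_samples(n=1024, H=3, eps=0.1)
# S is the first integer past the threshold:
assert 1024 ** 3 * 2 * math.exp(-2 * 0.1 ** 2 * S) < 1
assert 1024 ** 3 * 2 * math.exp(-2 * 0.1 ** 2 * (S - 1)) >= 1
```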
5. Approximating a family of periodic commutative events

In this section, we study a class of n-periodic commutative events that are approximable by events induced by O(log n)-state 1qfa's. In particular, we investigate the relation between such approximability and the ℓ₁-norm of the discrete Fourier transform of these events. We first need to briefly recall the notion of the multidimensional discrete Fourier transform. Given an alphabet Σ = {σ_1, ..., σ_H}, let p : Σ* → [0,1] be an n-periodic commutative event, and p̂ its associated vector. The discrete Fourier transform of p̂ is the complex vector P = F(p̂), where P : Z_n^H → C and

  P(j_1, ..., j_H) = Σ_{0≤k_1,...,k_H<n} p̂(k_1, ..., k_H) e^{i(2π/n)(k_1 j_1 + ··· + k_H j_H)}.

By the well-known inversion formula, we have

  p̂(k_1, ..., k_H) = (1/n^H) Σ_{0≤j_1,...,j_H<n} P(j_1, ..., j_H) e^{−i(2π/n)(k_1 j_1 + ··· + k_H j_H)}.    (6)
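The transform and the inversion formula (6) are easy to check numerically in the unary case H = 1. The sketch below (plain Python; the event vector is an arbitrary illustrative choice of ours) recovers p̂ from P = F(p̂):

```python
import cmath, math

n = 6
# Hypothetical 6-periodic unary event, given by its vector p_hat.
p_hat = [1.0, 0.25, 0.0, 0.5, 0.0, 0.25]

def dft(v):
    """P(j) = sum_k p_hat(k) e^{i(2*pi/n) k j}  (the transform above, H = 1)."""
    return [sum(v[k] * cmath.exp(1j * 2 * math.pi * k * j / n)
                for k in range(n)) for j in range(n)]

def inverse(P):
    """Inversion formula (6): p_hat(k) = (1/n) sum_j P(j) e^{-i(2*pi/n) k j}."""
    return [sum(P[j] * cmath.exp(-1j * 2 * math.pi * k * j / n)
                for j in range(n)).real / n for k in range(n)]

P = dft(p_hat)
recovered = inverse(P)
assert all(abs(a - b) < 1e-12 for a, b in zip(p_hat, recovered))
l1_norm = sum(abs(z) for z in P)   # ||F(p_hat)||_1, used throughout this section
```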
Theorem 4. Let p : Σ* → [0,1] be an n-periodic commutative event on an alphabet with H symbols. Then, the event 1/2 + (1/2)(n^H/‖F(p̂)‖₁) p is ε-approximable by the event induced by a 1qfa with O(H log n/ε²) states.

Proof. If Σ = {σ_1, ..., σ_H}, let p̂ : Z_n^H → [0,1] be the vector associated with the n-periodic commutative event p, and P = F(p̂). Set P(j_1, ..., j_H) = ρ(j_1, ..., j_H) e^{iϑ(j_1,...,j_H)}, where ρ(j_1, ..., j_H) and ϑ(j_1, ..., j_H) are the modulus and the phase of
P(j_1, ..., j_H), respectively. By recalling Eq. (6), and observing that p̂ has values in [0,1], we get

  (n^H p̂(k_1, ..., k_H))/‖F(p̂)‖₁ = Σ_{0≤j_1,...,j_H<n} (ρ(j_1, ..., j_H)/‖F(p̂)‖₁) e^{−i((2π/n)(k_1 j_1 + ··· + k_H j_H) − ϑ(j_1,...,j_H))}.
Since the left-hand side is real and Σ_{j_1,...,j_H} ρ(j_1, ..., j_H)/‖F(p̂)‖₁ = 1, the event 1/2 + (1/2)(n^H/‖F(p̂)‖₁) p is a convex linear combination, with weights ρ(j_1, ..., j_H)/‖F(p̂)‖₁, of the n-periodic commutative events 1/2 + (1/2) cos((2π/n)(k_1 j_1 + ··· + k_H j_H) − ϑ(j_1, ..., j_H)), each of which is induced by a 1qfa with O(1) states. The result then follows from Theorem 3. □

In particular, whenever ‖F(p̂)‖₁ ≤ n^H, we obtain:

Corollary 1. Let p : Σ* → [0,1] be an n-periodic commutative event on an alphabet with H symbols, with ‖F(p̂)‖₁ ≤ n^H. Then the event 1/2 + 1/2 p is ε-approximable by the event induced by a 1qfa with O(H log n/ε²) states.
In the following example, we use this result to show that, from a descriptional point of view, quantum automata can be more powerful than nondeterministic automata in accepting certain languages.
Example 1. Given an alphabet Σ = {σ_1, ..., σ_H}, define the language L_{n,H} = {w ∈ Σ* | ⟨#_{σ_1}(w)⟩_n = ··· = ⟨#_{σ_H}(w)⟩_n = 0}. If L_{n,H} is recognized by a 1-way nondeterministic finite automaton A, then A has at least n^H states. In fact, suppose that every non-final state of A has an outgoing path leading to a final state. If the number of states of A were less than n^H, then, by a simple counting argument, there would exist two distinct words x = σ_1^{k_1} ··· σ_H^{k_H} and y = σ_1^{s_1} ··· σ_H^{s_H}, with 0 ≤ k_1, ..., k_H, s_1, ..., s_H < n, which, given as input, would take A to the same state q. Let now z = σ_1^{j_1} ··· σ_H^{j_H} be a word taking A from q to a final state. We get that both xz and yz belong to L_{n,H}, that is, ⟨k_i + j_i⟩_n = ⟨s_i + j_i⟩_n = 0, for 1 ≤ i ≤ H. This implies that k_i = s_i for 1 ≤ i ≤ H, against the hypothesis x ≠ y. On the contrary, there exists a 1qfa accepting L_{n,H} with isolated cut-point which is exponentially more succinct both in the period n and in the cardinality H of the input alphabet. In fact, the language L_{n,H} can be defined by the n-periodic commutative event p whose associated function is

  p̂(k_1, ..., k_H) = 1 if k_1 = ··· = k_H = 0, and 0 otherwise.

Now let P = F(p̂). For every 0 ≤ j_1, ..., j_H < n, we have P(j_1, ..., j_H) = 1 and hence ‖P‖₁ = n^H. By applying Corollary 1, we have that the event 1/2 + 1/2 p is 1/8-approximable by a 1qfa with O(H log n) states, thus accepting L_{n,H} with isolated cut-point.
6. The unary case

In this section, we focus on the particular case of unary alphabets, e.g., Σ = {a}. Languages defined by (periodic) unary events are called (periodic) unary languages; periodic unary languages are exactly the unary group languages. In this section we point out a relation between the minimum size of a 1qfa recognizing a unary periodic language L and the ℓ₁-norm of the discrete Fourier transform of its characteristic function χ_L. The first result is a direct consequence of Corollary 1:

Theorem 5. Let p : {a}* → [0,1] be an n-periodic event with ‖F(p̂)‖₁ ≤ n, and let L be a unary language defined by p with cut-point λ isolated by 4ε. Then L can be recognized by a 1qfa with cut-point 1/2 + λ/2 isolated by ε and O((1/ε²) log n) states.

As an application, we exhibit a class of unary languages recognizable by 1qfa's with fewer states than the equivalent probabilistic automata [16,18].

Example 2. Consider the language L_n = {a^{kn} | k ∈ N}; let Q(n) (resp., S(n)) be the minimum number of states of 1qfa's (resp., probabilistic automata) accepting L_n with isolated cut-point. By Example 1, we have that L_n = L_{n,1} is recognized with isolated cut-point by a 1qfa with O(log n) states, yielding Q(n) = O(log n). If n is prime, the same upper bound is obtained in [2] by different techniques.
By recalling a result in [14], we have that if n = p_1^{α_1} ··· p_k^{α_k} is the prime factorization of n, then S(n) = Θ(Σ_{j=1}^k p_j^{α_j}). By a direct computation, it can be shown that the global minimum of the function f(x_1, ..., x_k) = Σ_{j=1}^k e^{x_j}, with constraints Σ_{j=1}^k x_j = log n and x_j ≥ 0, is k e^{(1/k) log n}. This implies S(n) = Ω(k e^{(1/k) log n}). Observe that n = p_1^{α_1} ··· p_k^{α_k} implies n ≥ k!, whence k ≤ (log n/log log n)(1 + o(1)). Since f(k) = k e^{(1/k) log n} is monotone decreasing in the interval [1, log n], we get S(n) = Ω(log² n/log log n). In conclusion, having Q(n) = O(log n), we obtain S(n)/Q(n) = Ω(log n/log log n). Furthermore, if n factorizes into a constant number of prime factors, then S(n) is "exponentially greater" than Q(n).

We have previously stated that if the ℓ₁-norm of the discrete Fourier transform of an n-periodic event p is bounded by n, then 1/2 + 1/2 p is approximable by small size 1qfa's. Now, we study the converse problem. We bound the ℓ₁-norm of the discrete Fourier transform of periodic events induced by 1qfa's in terms of the number of states.

Theorem 6. Let p : {a}* → [0,1] be an n-periodic event induced by an s-state 1qfa. Then ‖F(p̂)‖₁ ≤ ns.

Proof. Let A = (π, U(a), F) be the 1qfa inducing the event p, i.e., p(a^k) = Σ_{j∈F} |(π U^k(a))_j|². Since U(a) is unitary, it can be decomposed as U(a) = UΛU†, where U is a unitary matrix and Λ is a diagonal matrix whose elements are the eigenvalues e^{iϑ_l} of U(a), for 1 ≤ l ≤ s. Thus, we can write p(a^k) = Σ_{j∈F} |(π U diag(e^{ikϑ_1}, ..., e^{ikϑ_s}) U†)_j|². From [5, Lemma 3], we know that e^{i(ϑ_l − ϑ_r)} = e^{−i(2π/n) z_lr}, where z_lr ∈ Z_n. By setting π̃ = πU, with a direct computation we get

  p(a^k) = Σ_{1≤l,r≤s} ( π̃_l π̃*_r Σ_{j∈F} U_lj U*_rj ) e^{−i(2π/n) k z_lr}.    (7)

Calling (P(0), ..., P(n−1)) = F(p̂), we have (see Eq. (6))

  p(a^k) = Σ_{t=0}^{n−1} (P(t)/n) e^{−i(2π/n) k t}.    (8)

By comparing Eqs. (7) and (8), we get

  P(t)/n = Σ_{{(l,r) | z_lr = t}} π̃_l π̃*_r Σ_{j∈F} U_lj U*_rj.

Hence

  ‖F(p̂)‖₁/n = Σ_{t=0}^{n−1} | Σ_{{(l,r) | z_lr = t}} π̃_l π̃*_r Σ_{j∈F} U_lj U*_rj | ≤ Σ_{l,r} |π̃_l| |π̃_r| Σ_{j∈F} |U_lj| |U_rj|.

Since Σ_{j∈F} |U_lj| |U_rj| ≤ Σ_{j=1}^s |U_lj| |U_rj| ≤ √(Σ_j |U_lj|²) √(Σ_j |U_rj|²) = 1 by the Schwarz inequality, we have

  ‖F(p̂)‖₁/n ≤ ( Σ_{l=1}^s |π̃_l| )².

Again by the Schwarz inequality, Σ_{l=1}^s |π̃_l| · 1 ≤ √s √(Σ_{l=1}^s |π̃_l|²) = √s. Thus ‖F(p̂)‖₁ ≤ ns. □
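The bound of Theorem 6 can be verified numerically on a concrete automaton. The sketch below (plain Python; the 2-state rotation 1qfa is our own illustrative choice) computes the induced n-periodic event, its discrete Fourier transform, and checks ‖F(p̂)‖₁ ≤ ns:

```python
import cmath, math

n, s = 12, 2
theta = 2 * math.pi / n
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]   # eigenvalues e^{+-i 2*pi/n}
pi = [1.0, 0.0]
F = {0}

def event(k):
    """p(a^k) for the 1qfa (pi, U(a), F)."""
    v = list(pi)
    for _ in range(k):
        v = [sum(v[i] * U[i][j] for i in range(s)) for j in range(s)]
    return sum(abs(v[j]) ** 2 for j in F)

p_hat = [event(k) for k in range(n)]        # the event is n-periodic
P = [sum(p_hat[k] * cmath.exp(1j * 2 * math.pi * k * j / n) for k in range(n))
     for j in range(n)]
l1 = sum(abs(z) for z in P)
assert l1 <= n * s + 1e-9                    # Theorem 6: ||F(p_hat)||_1 <= n*s
```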
Finally, we relate the size of 1qfa's accepting periodic languages to the ℓ₁-norm of the discrete Fourier transform of the corresponding characteristic functions.

Theorem 7. Let L be an n-periodic language and χ_L its characteristic function.
(1) If χ_L is 1/8-approximable by ϕ with ‖F(ϕ̂)‖₁ ≤ n, then L is recognizable by an O(log n)-state 1qfa with cut-point isolated by 1/8.
(2) If L is recognizable by an O(log n)-state 1qfa with cut-point isolated by 1/8, then χ_L is 3/7-approximable by ϕ with ‖F(ϕ̂)‖₁ = O(n log n).

Proof.
(1) Since ‖F(ϕ̂)‖₁ ≤ n, by Corollary 1 there exists an O(log n)-state 1qfa inducing a 1/16-approximation ψ of (1 + ϕ)/2. Moreover, for any k ≥ 0, |ϕ(a^k) − χ_L(a^k)| ≤ 1/8. Therefore,

  | (1 + χ_L(a^k))/2 − ψ(a^k) | ≤ | (1 + ϕ(a^k))/2 − ψ(a^k) | + | ϕ(a^k)/2 − χ_L(a^k)/2 | ≤ 1/16 + 1/16 = 1/8.

So, L is accepted by the 1qfa inducing ψ with cut-point 3/4 isolated by 1/8.
(2) If L is recognized by an O(log n)-state 1qfa with cut-point isolated by 1/8, then it is easy to find an O(log n)-state 1qfa A recognizing L with cut-point 1/2 isolated by 1/14. The event ϕ induced by A is a 3/7-approximation of χ_L. By applying Theorem 6, we get ‖F(ϕ̂)‖₁ = O(n log n). □

Acknowledgements

The authors wish to thank an anonymous referee for helpful comments.

References

[1] N. Alon, S. Ben-David, N. Cesa-Bianchi, D. Haussler, Scale-sensitive dimensions, uniform convergence, and learnability, J. ACM 44 (1997) 615–631.
[2] A. Ambainis, R. Freivalds, 1-way quantum finite automata: strengths, weaknesses and generalizations, in: Proc. 39th Symp. on Foundations of Computer Science, 1998, pp. 332–342.
[3] A. Bertoni, M. Carpentieri, Regular languages accepted by quantum automata, Inform. Comput. 165 (2001) 174–182.
[4] A. Bertoni, C. Mereghetti, B. Palano, Quantum computing: 1-way quantum automata, in: Proc. Seventh Conf. on Developments in Language Theory, Lecture Notes in Computer Science, Vol. 2710, Springer, Berlin, 2003, pp. 1–20.
[5] A. Bertoni, C. Mereghetti, B. Palano, Golomb rulers and difference sets for succinct quantum automata, Internat. J. Foundations Comput. Sci. 14 (2003) 871–888.
[6] A. Brodsky, N. Pippenger, Characterizations of 1-way quantum finite automata, SIAM J. Comput. 31 (2001) 1456–1478.
[7] J. Gruska, Quantum Computing, McGraw-Hill, New York, 1999.
[8] J. Gruska, Descriptional complexity issues in quantum computing, J. Automata, Languages Combinatorics 5 (2000) 191–218.
[9] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (1963) 13–30.
[10] C. Moore, J. Crutchfield, Quantum automata and quantum grammars, Theoret. Comput. Sci. 237 (2000) 275–306. A preliminary version of this work appeared as a Technical Report in 1997.
[11] M. Marcus, H. Minc, Introduction to Linear Algebra, Macmillan, New York, 1965 (reprinted by Dover, 1988).
[12] M. Marcus, H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber & Schmidt, Boston, MA, 1964 (reprinted by Dover, 1992).
[13] C. Mereghetti, B. Palano, On the size of one-way quantum finite automata with periodic behaviors, Theoret. Inform. Appl. 36 (2002) 277–291.
[14] C. Mereghetti, B. Palano, G. Pighizzini, Note on the succinctness of deterministic, nondeterministic, probabilistic and quantum finite automata, Theoret. Inform. Appl. 35 (2001) 477–490.
[15] A.M. Mood, F.A. Graybill, D.C. Boes, Introduction to the Theory of Statistics, McGraw-Hill, New York, 1983.
[16] A. Paz, Introduction to Probabilistic Automata, Academic Press, New York, 1971.
[17] J.E. Pin, On languages accepted by finite reversible automata, in: Proc. 14th Internat. Colloq. on Automata, Languages and Programming, Lecture Notes in Computer Science, Vol. 267, Springer, Berlin, 1987, pp. 237–249.
[18] M. Rabin, Probabilistic automata, Inform. Control 6 (1963) 230–245.
[19] V.N. Vapnik, Inductive principles of the search for empirical dependencies, in: Proc. Second Annual Workshop on Computational Learning Theory, Morgan Kaufmann, Los Altos, CA, 1989, pp. 1–21.
[20] V.N. Vapnik, A.Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory Probab. Appl. 16 (1971) 264–280.
Theoretical Computer Science 340 (2005) 408 – 431 www.elsevier.com/locate/tcs
New operations and regular expressions for two-dimensional languages over one-letter alphabet

Marcella Anselmo^a,∗, Dora Giammarresi^b, Maria Madonia^c

a Dipartimento Informatica ed Applicazioni, Università di Salerno, 84081 Baronissi (Sa), Italy
b Dipartimento Matematica, Università di Roma “Tor Vergata”, via Ricerca Scientifica, 00133 Roma, Italy
c Dipartimento Matematica e Informatica, Università di Catania, Viale Andrea Doria 6/a, 95125 Catania, Italy
Abstract

We consider the problem of defining regular expressions to characterize the class of recognizable picture languages in the case of a one-letter alphabet. We define a diagonal concatenation and its star, and consider two families, L(D) and L(CRD), of languages denoted by regular expressions involving these operations plus the classical ones. L(D) is characterized both in terms of rational relations and in terms of two-dimensional automata moving only right and down. L(CRD) is included in REC and contains the languages defined by three-way automata, while languages in L(CRD) necessarily satisfy certain regularity conditions. Finally, we introduce new definitions of advanced stars, expressing the need for conceptually different definitions of iteration.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Two-dimensional language; Regular expression
1. Introduction

The theory of one-dimensional string languages has been well founded and investigated since the 1950s. For some years now, the increasing interest in pattern recognition and image

☆ Work partially supported by MIUR Cofin: Linguaggi Formali e Automi: Metodi, Modelli e Applicazioni.
∗ Corresponding author.
E-mail addresses: [email protected] (M. Anselmo), [email protected] (D. Giammarresi), [email protected] (M. Madonia).
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.tcs.2005.03.031
processing has also motivated research on two-dimensional string languages, and nowadays this is a field of intense investigation. The aim is the generalization, or possibly the extension, of the richness of the theory of one-dimensional languages to two dimensions. A first line of work has been devoted to the study of two-dimensional languages defined by finite state devices, with the aim of finding a counterpart of what regular languages are in one dimension. Many approaches have been presented in the literature, considering all the ways to define regular languages: finite automata, grammars, logic and regular expressions. In 1991, a unifying point of view was presented by A. Restivo and D. Giammarresi, who defined the family REC of recognizable picture languages (see [7]). This class seems to be the candidate as “the” generalization of the class of regular one-dimensional languages. Indeed REC is well characterized from very different points of view and thus inherits several properties from the class of regular string (one-dimensional) languages. It is characterized in terms of projections of local languages (tiling systems), of some finite-state automata, of logic formulas, and of regular expressions with alphabetic mapping. The approach by regular expressions is, however, not completely satisfactory: the concatenation and star operations involved there are partial functions, and moreover an external operation of alphabetic mapping is needed. Hence, in [7], the problem of stating a Kleene-like theorem for the theory of recognizable picture languages remains open. Several papers have recently been devoted to finding a better formulation of regular expressions for two-dimensional languages. In [15], O. Matz addresses the problem of finding more powerful expressions to represent recognizable picture languages and suggests regular expressions where the iteration is over combinations of operators, rather than over languages.
The author shows that the power of these expressions does not exceed the family REC, but it remains open whether or not it exhausts it. In [18] a tiling operation is introduced as an extension of the Kleene star to pictures, and a characterization of REC involving a morphism and intersection is given. The paper [19] compares star-free picture expressions with first-order logic. The aim of this paper is to look for a homogeneous notion of regular expression that could extend more naturally the concept of regular expression from one-dimensional languages. In this framework, we propose some new operations on pictures and picture languages and study the families of languages that can be generated using classical and new operations. The paper focuses on one-letter alphabets. This is a particular case of the more general setting of alphabets with several letters. However, it is not only a simpler case to handle, but a necessary and meaningful case to start with. Indeed, studying two-dimensional languages over one-letter alphabets means studying the “shapes” of pictures: if a picture language is in REC, then necessarily the language of its shapes is in REC. This approach allows us to separate the twofold nature of a picture: its shape and its content. The classical concatenation operations on pictures and picture languages are the row and column concatenations and their closures. Regular expressions that use only Boolean operations and this kind of concatenations and closures, however, cannot define a large number of two-dimensional languages in REC. As an example, take the simple language of “squares” (that is, pictures with the number of rows equal to the number of columns). The major problem with this kind of regular expressions is that they cannot describe any relationship existing between the two dimensions of the pictures. Such operations are useful
to express some regularity either on the number of rows or on the number of columns, but not between them. This is the reason why we introduce, in the one-letter case, a new concatenation operation between pictures: the diagonal concatenation. The diagonal concatenation introduces the possibility of constructing new pictures while forcing some dependence between their dimensions. Moreover, an important aspect of the diagonal concatenation is that it is a total function between pictures. This allows us to find a quite clean double characterization of D-regular languages, the picture languages denoted by regular expressions containing union, diagonal concatenation and its closure: they are exactly those picture languages in which the dimensions are related by a rational relation, and also exactly those picture languages recognizable by particular two-dimensional automata moving only right and down. Unfortunately, an analogous situation no longer holds when we also introduce row and column concatenations in regular expressions, essentially because they are partial functions. The class of CRD-regular languages, the languages denoted by regular expressions with union, column, row and diagonal concatenations and their closures, lies strictly between the class of languages recognized by three-way deterministic automata and REC. The main result for CRD-regular languages is a necessary condition regarding a sort of “regularity” in the possible “extensions” of a picture in a given language to a bigger picture in the language. In a CRD-regular language: if a picture is sufficiently “long”, then we can concatenate some picture to it infinitely often by columns; if a picture is sufficiently “high”, then we can concatenate some picture to it infinitely often by rows; if a picture is sufficiently “big”, then we can concatenate some picture to it infinitely often in diagonal.
This result generalizes, in some sense, what regularity implies in one-dimensional languages over a one-letter alphabet. We also provide a collection of examples classically considered in the literature, specifying for each of them whether or not it belongs to the classes of two-dimensional languages considered throughout the paper. Examining some examples of languages not captured by the CRD formalism, we find that the “extensions” of a picture cannot be obtained by iterating the concatenation of the same picture, independently of the picture to which we concatenate. On the contrary, for some languages, a kind of iteration that generates pictures in a “non-uniform” way is needed, depending on the picture just obtained. This is a new situation with respect to the one-dimensional case. Such considerations show the necessity of a more complex definition of regular expressions in order to denote a wider class of two-dimensional languages in REC. We introduce the definitions of some advanced stars. They allow us to capture a wider class of languages that still remains inside the class REC. All definitions are given in such a way as to synchronize the steps of iteration on a picture with the picture just constructed. Observe that in this case we exploit the partial nature of the column and row concatenation operations. We conclude by discussing some ideas for extending all these definitions to the general alphabet case. The paper is organized as follows. In Section 2 we recall some preliminary definitions and results used later in the paper. Section 3 contains the main results: it presents our proposals for possible classes of regular expressions. Moreover, it contains a table summarizing a wide collection of examples. In Section 4 we define new star operations that allow us to describe many more languages (over a one-letter alphabet) in REC, while in Section 5 we draw some
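Since pictures over a one-letter alphabet are determined by their sizes, the diagonal concatenation can be sketched as an operation on pairs (rows, columns). The following sketch (plain Python, all names ours) shows how iterating it from a single cell captures the language of squares, a relationship between the two dimensions that row and column concatenations alone cannot express:

```python
# One-letter pictures are identified with their size (rows, cols); the
# diagonal concatenation combines both dimensions at once, forcing a
# dependence between them -- and it is a total operation.
def diag(p, q):
    return (p[0] + q[0], p[1] + q[1])

def diag_star(p, bound):
    """Sizes reachable by iterating diagonal concatenation of p, up to bound."""
    out, cur = {(0, 0)}, (0, 0)
    while cur[0] <= bound and cur[1] <= bound:
        cur = diag(cur, p)
        out.add(cur)
    return out

# The "squares" language {(n, n)}: the diagonal star of a single cell.
squares = diag_star((1, 1), 5)
assert (3, 3) in squares and (2, 3) not in squares
```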
M. Anselmo et al. / Theoretical Computer Science 340 (2005) 408 – 431
411
conclusions and proposals for extending the results of this paper to two-dimensional languages over general alphabets. A preliminary version of this paper appeared in [1].
2. Preliminaries In this section we recall terminology for two-dimensional languages. Then, we briefly describe some machine-based models for recognizing two-dimensional languages and summarize the major results concerning the class REC of recognizable two-dimensional languages, that is, the class that seems to generalize best the family of regular string languages to two dimensions. We assume the reader is familiar with the basic terminology and properties of the theory of one-dimensional languages, as can be found for example in [8]. We will first introduce some definitions about two-dimensional languages by borrowing and extending notation from the theory of one-dimensional languages. Next, we will give formal definitions of the classical concatenation operations between two-dimensional strings (pictures) and two-dimensional languages. The notations used can mainly be found in [7]. Let Σ be a finite alphabet. A two-dimensional string (or a picture) over Σ is a two-dimensional rectangular array of elements of Σ. The set of all two-dimensional strings over Σ is denoted by Σ∗∗. A two-dimensional language over Σ is a subset of Σ∗∗. Given a picture p ∈ Σ∗∗, let ℓ1(p) denote the number of rows of p and ℓ2(p) denote the number of columns of p. The pair (ℓ1(p), ℓ2(p)) is called the size of the picture p. Unlike the one-dimensional case, we can define an infinite number of empty pictures, namely all the pictures of size (n, 0) and of size (0, m), for all m, n ≥ 0, which we call empty columns and empty rows, and denote by λn,0 and λ0,m respectively. The empty picture is the only picture of size (0, 0) and it will be denoted by λ0,0. We indicate by Λcol and Λrow the language of all empty columns and the language of all empty rows, respectively. We first give some simple examples of two-dimensional languages. Example 1. Let Σ = {a} be a one-letter alphabet. The set of pictures of a's with three columns is a two-dimensional language over Σ. It can be formally described as L = {p | ℓ2(p) = 3} ⊆ Σ∗∗.
As another example, let L be the subset of Σ∗∗ that contains all the pictures with the shape of a "square". More formally, L = {p | ℓ1(p) = ℓ2(p)} ⊆ Σ∗∗. We now recall the classical concatenation operations between pictures and picture languages. Let p and q be two pictures over an alphabet Σ, of size (n, m) and (n′, m′), with m, n, m′, n′ ≥ 0, respectively. Definition 2. The column concatenation of p and q (denoted by p ⦶ q) and the row concatenation of p and q (denoted by p ⊖ q) are partial operations, defined only if n = n′ and if m = m′, respectively: p ⦶ q is the picture obtained by juxtaposing q to the right of p, and p ⊖ q is the picture obtained by juxtaposing q below p.
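As an illustration of Definition 2, here is a minimal sketch (ours, not the authors'; `make_picture`, `col_concat` and `row_concat` are illustrative names): pictures over the one-letter alphabet are modelled as lists of equal-length rows, and each operation raises an error when its side condition fails.

```python
# Hedged sketch: a picture over {a} as a list of equal-length rows.
# Empty pictures are degenerate in this model (a 0-row picture is []),
# so the examples below stick to non-empty pictures.

def make_picture(rows, cols):
    """Picture of a's of size (rows, cols)."""
    return [['a'] * cols for _ in range(rows)]

def col_concat(p, q):
    """p (column-concatenated with) q: glue q to the right of p.
    Partial: defined only when the row counts agree."""
    if len(p) != len(q):
        raise ValueError("column concatenation needs equal row counts")
    return [rp + rq for rp, rq in zip(p, q)]

def row_concat(p, q):
    """p (row-concatenated with) q: glue q below p.
    Partial: defined only when the column counts agree."""
    if p and q and len(p[0]) != len(q[0]):
        raise ValueError("row concatenation needs equal column counts")
    return p + q
```

For instance, a (2, 3) picture column-concatenated with a (2, 1) picture gives a (2, 4) picture, while concatenating pictures with different row counts is undefined.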
Moreover, we set p ⦶ λn,0 = p and p ⊖ λ0,m = p, that is, the empty columns and the empty rows are the neutral elements for the column and the row concatenation operations, respectively. As in string language theory, these definitions of picture concatenation can be extended to concatenations between sets of pictures. Let L1, L2 ⊆ Σ∗∗; the column concatenation and the row concatenation of L1 and L2 are defined respectively by L1 ⦶ L2 = {p ⦶ q | p ∈ L1, q ∈ L2} and L1 ⊖ L2 = {p ⊖ q | p ∈ L1, q ∈ L2}.
By iterating the concatenation operations, we can define the column and row transitive closures, which are somehow "two-dimensional Kleene stars". Let L be a picture language. Definition 3. The column closure (column star) and the row closure (row star) of L are defined as L∗⦶ = ⋃_{i≥0} L^{i⦶} and L∗⊖ = ⋃_{i≥0} L^{i⊖}, where L^{0⦶} = Λcol, L^{1⦶} = L, L^{n⦶} = L ⦶ L^{(n−1)⦶} and L^{0⊖} = Λrow, L^{1⊖} = L, L^{n⊖} = L ⊖ L^{(n−1)⊖}. 2.1. Automata for two-dimensional languages In this section we briefly review different kinds of automata that read two-dimensional tapes. All models reduce to conventional automata when restricted to operate on one-row pictures. One of the first attempts at formalizing the concept of "recognizable picture language" was made by M. Blum and C. Hewitt, who in 1967 introduced a model of finite automaton that reads a two-dimensional tape (cf. [3]). A deterministic (non-deterministic) four-way automaton, denoted by 4DFA (4NFA), is defined as an extension of the two-way automaton that recognizes strings (cf. [8]) by allowing it to move in four directions: Left, Right, Up, Down. For example, a 4DFA can recognize squares by starting its computation in the top-left corner of a given picture and moving alternately one step right and one step down (i.e. following the diagonal) until it reaches the bottom-right corner. The families of picture languages recognized by 4DFA and 4NFA are denoted by L(4DFA) and L(4NFA), respectively. An important result (cf. [3]) states that, unlike in the one-dimensional case, the family L(4DFA) is strictly included in the family L(4NFA). Both families L(4DFA) and L(4NFA) are closed under Boolean union and intersection operations. The family L(4DFA) is also closed under complement, while for L(4NFA) this is not known. From several points of view, four-way automata may appear to be a reasonable model of computation for two-dimensional tapes, and they have been widely studied, but they have a major drawback. In fact, it can be proved that neither L(4DFA) nor L(4NFA) is closed under the row and column concatenation and closure operations [12]. In [14], a weaker model called the three-way automaton is also considered, in both a non-deterministic and a deterministic version (referred to as 3NFA and 3DFA, respectively), which is
allowed to move right, left and down only. The family L(3NFA) is strictly included in L(4NFA). Another interesting model of two-dimensional automaton is the two-dimensional on-line tessellation acceptor (denoted by 2-OTA), introduced in [9]. In a sense, the 2-OTA is an infinite array of identical finite-state automata in a two-dimensional space. The computation proceeds by diagonals, starting from the top-left corner of the picture and moving towards the bottom-right one. Depending on the kind of underlying automata, we have a deterministic or a non-deterministic version of the 2-OTA. Despite the fact that this model is quite different in principle from the four-way automaton, also in this case the family of languages corresponding to deterministic 2-OTA is strictly included in the one corresponding to the non-deterministic model. In [9] it is proved that the family of two-dimensional languages recognized by a 2-OTA, L(2-OTA), is closed under union and intersection, and also under row/column concatenation and row/column star, while it is not closed under complement. Moreover, L(2-OTA) properly includes the family L(4NFA). The only trouble with the 2-OTA model is that it is quite difficult to manage. 2.2. Tiling systems and the class REC A different way to define (recognize) picture languages was introduced by A. Restivo and D. Giammarresi in [6]. It generalizes to two dimensions the characterization of regular languages by means of local string languages and alphabetic mappings (the local set together with the mapping is an alternative description of the state graph of an automaton). We recall that a local language of strings is defined by means of a finite set of strings of length 2. As a natural generalization, a local picture language L over an alphabet Γ is defined by means of a finite set of pictures of size (2, 2) (called tiles) that represent all allowed sub-pictures of the pictures in L.
To be more precise, this set is defined over Γ ∪ {#}, where # is a border symbol that we assume always surrounds the pictures. A tiling system for a language L over Σ is a pair consisting of a local language over an alphabet Γ and an alphabetic mapping π : Γ → Σ. The mapping π can be extended in the obvious way from the alphabet Γ to pictures over Γ and to picture languages over Γ. Then, we say that a language L ⊆ Σ∗∗ is recognizable by tiling systems if there exist a local language L′ over Γ and a mapping π : Γ → Σ such that L = π(L′). The family of two-dimensional languages recognizable by tiling systems is denoted by REC. As an example, consider again the language L of squares over the one-letter alphabet Σ = {a}. Then L is in REC, since it can be obtained as π(L′), where L′ is the language of squares over Γ = {0, 1} that have 1 in the diagonal positions and 0 elsewhere, and π(0) = π(1) = a. The family REC is closed under Boolean union and intersection but not under complement. It is also closed under all row and column concatenations and stars. Moreover, by definition, it is closed under alphabetic mappings. This notion of recognizability by tiling systems turns out to be very robust: in [11], it is proved that REC = L(2-OTA). Moreover, finite tiling systems also have a natural logical meaning: in [7] it is shown that the family REC and the family of languages defined by existential monadic second-order formulas coincide. This is actually the generalization to two-dimensional languages of Büchi's theorem for strings. The class REC can also be characterized in terms of regular expressions, as specified in Section 2.3.
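The squares example can be checked mechanically. The following sketch (ours, not the authors'; `bordered`, `tiles`, `diag_square` and `project` are illustrative names) builds the local pictures over {0, 1} with 1 on the diagonal, extracts the 2×2 tiles of the #-bordered picture, and applies the projection π(0) = π(1) = a; the point is that the tile set stabilizes for large enough squares, so finitely many tiles describe the whole local language.

```python
# Hedged sketch of the tiling-system example: diagonal squares over
# Gamma = {0, 1} projected onto the one-letter alphabet {a}.

def bordered(p):
    """Surround a non-empty picture p (list of rows) with '#' borders."""
    w = len(p[0])
    top = [['#'] * (w + 2)]
    return top + [['#'] + row + ['#'] for row in p] + [['#'] * (w + 2)]

def tiles(p):
    """The set of all 2x2 sub-pictures of the bordered picture."""
    b = bordered(p)
    return {(b[i][j], b[i][j + 1], b[i + 1][j], b[i + 1][j + 1])
            for i in range(len(b) - 1) for j in range(len(b[0]) - 1)}

def diag_square(n):
    """n x n local picture with 1 on the diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def project(p):
    """Alphabetic mapping pi sending both local symbols to 'a'."""
    return [['a' for _ in row] for row in p]
```

On this family the tiles of the 3×3 square are a strict subset of those of the 4×4 one (the all-zero tile first appears at size 4), and from size 4 on the tile set no longer grows.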
2.3. Regular expressions and the class REC The characterizations of the family REC show that it captures, in some sense, the unification of the concept of recognizability from the two different points of view of descriptive and computational models, which is one of the main properties of the class of recognizable string languages. It thus seems natural to ask whether one can also prove a sort of two-dimensional Kleene theorem. Using the row and column concatenation and closure operations, it is possible to express two-dimensional languages by means of simpler languages. Nevertheless, it can be observed that these classical operations are useful to express some regularity either on the number of rows or on the number of columns, but they cannot describe any relationship between the two dimensions of the pictures. As an example, already in the case of a one-letter alphabet, languages such as the language of "squares" (see Example 1) cannot be described using only the classical operations. More precisely, O. Matz [15] has characterized the class of languages that can be obtained starting from finite languages and applying Boolean operations, column and row concatenations and stars, as the class of languages that are a finite union of Cartesian products of ultimately periodic string languages. Furthermore, it can be shown (cf. [7]) that to describe the whole class REC we also need to allow alphabetic mappings between the regular operations. This characterization of REC in terms of regular expressions seems not completely satisfactory, because it is not purely constructive and involves some external operations. Therefore the problem of proving a sort of two-dimensional Kleene theorem is still under investigation.
Furthermore, such considerations are a clear sign that, going from one to two dimensions, we find a very rich family of languages that requires a non-straightforward generalization of the one-dimensional definitions and techniques. In the next section we define a new operation on picture languages and consider the class of languages that can thereby be denoted.
3. The diagonal concatenation and related regular expressions In this section we introduce a new operation on picture languages over a one-letter alphabet. We propose several types of regular expressions involving the new operation, comparing the resulting classes of languages with known families of picture languages. Throughout this section, we assume that all languages are over the one-letter alphabet Σ = {a}. Remark 4. When a one-letter alphabet is considered, any picture p ∈ Σ∗∗ is characterized by its size alone. Therefore it can be equivalently represented either by a pair of words in Σ∗, where the first is equal to the first column of p and the second to the first row of p, i.e. (a^ℓ1(p), a^ℓ2(p)), or more simply by its size, i.e. (ℓ1(p), ℓ2(p)). Remark 5. Considering the one-letter alphabet case amounts to considering the "shapes" of pictures. Indeed, if L ⊆ Σ∗∗, with |Σ| ≥ 2, is in REC, then the language obtained by mapping Σ onto a one-letter alphabet {a} is still in REC, since REC is closed under alphabetic mappings.
Therefore, for a language to be in REC, it is a necessary condition that the language of its shapes be in REC. Let us denote by CR = {∪, ⦶, ⊖, ∗⦶, ∗⊖} the set of classical operations on picture languages (C for "columns" and R for "rows"), and by L(CR) the class of languages (over a one-letter alphabet) that can be denoted by a regular expression involving only operations in CR and starting from finite languages. O. Matz [15] has characterized L(CR) as the class of languages that are a finite union of Cartesian products of ultimately periodic string languages, and he has shown that L(CR) is closed under intersection. 3.1. D-regular expressions We introduce a new simple definition of concatenation of two pictures in the particular case of a one-letter alphabet. The definition is motivated by the need for an operation between pictures that can express a relationship between the dimensions of the pictures. We use this new concatenation to construct regular expressions and to define a class of languages. This class is characterized in terms of the relations between the dimensions of the pictures and in terms of the four-way automata recognizing them. Let p and q be two pictures of size (n, m) and (n′, m′), respectively, over a one-letter alphabet. Definition 6. The diagonal concatenation of p and q (denoted by p ⦸ q) is the picture over Σ of size (n + n′, m + m′), obtained by placing q diagonally below and to the right of p. Observe that, unlike the classical row and column concatenations, the diagonal concatenation is a total operation. As usual, it can be extended to the diagonal concatenation between languages. Moreover, the Kleene closure of ⦸ can be defined as follows. Let L be a picture language over a one-letter alphabet. Definition 7. The diagonal closure (or diagonal star) of L, denoted by L∗⦸, is defined as L∗⦸ = ⋃_{i≥0} L^{i⦸}, where L^{0⦸} = {λ0,0}, L^{1⦸} = L, and L^{n⦸} = L ⦸ L^{(n−1)⦸}.
Example 8. Let Ln,n be the language of squares (see Example 1), that is, Ln,n = {p | ℓ1(p) = ℓ2(p) ≥ 0}. It can easily be shown that Ln,n = {(1, 1)}∗⦸ = {λ0,1 ⦸ λ1,0}∗⦸, observing that λ0,1 ⦸ λ1,0 is the picture (1, 1). Example 9. Let L2n,2m be the language of rectangular pictures with even dimensions, that is, L2n,2m = {p | ℓ1(p) = 2n, ℓ2(p) = 2m, n, m ≥ 0}. We have L2n,2m = {{(2, 2)}∗⦶}∗⊖, and also L2n,2m = {λ0,2}∗⦸ ⦸ {λ2,0}∗⦸, using the diagonal concatenation.
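Using the identification of a one-letter picture with its size (Remark 4), the diagonal concatenation and its closure can be sketched as follows (our code, illustrative only; `diag_star` enumerates the closure up to a size bound to keep the sets finite):

```python
# Hedged sketch: a one-letter picture is identified with its size
# (rows, cols), so diagonal concatenation is componentwise addition
# of sizes and is a total operation.

def diag_concat(p, q):
    """(n, m) diagonally concatenated with (n', m') is (n+n', m+m')."""
    return (p[0] + q[0], p[1] + q[1])

def diag_star(language, bound):
    """All iterated diagonal concatenations of pictures of `language`
    whose dimensions do not exceed `bound` (starting from (0, 0))."""
    closure, frontier = {(0, 0)}, {(0, 0)}
    while frontier:
        new = {diag_concat(p, q) for p in frontier for q in language}
        frontier = {p for p in new
                    if p not in closure and max(p) <= bound}
        closure |= frontier
    return closure
```

With this sketch, the diagonal star of {(1, 1)} yields exactly the squares (Example 8), and the diagonal concatenation of the diagonal stars of {λ2,0} and {λ0,2} yields the even-sided rectangles (Example 9).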
Proposition 10. The family REC is closed under diagonal concatenation and diagonal star. Proof. The proof uses techniques similar to those for the closure of REC under row (or column) concatenation and star (see [6] for more details). A tiling system for L = L1 ⦸ L2 can be defined as follows. Let the local languages for L1 and L2 be given by a set of tiles Θ1 over an alphabet Γ1 and a set of tiles Θ2 over an alphabet Γ2, respectively. Moreover, we can always assume that Γ1 and Γ2 are disjoint. Then the local language for L is defined over the alphabet Γ = Γ1 ∪ Γ2 ∪ {x}, where x is a new symbol not in Γ1 ∪ Γ2. The set of tiles Θ is defined, using Θ1 and Θ2, in such a way as to represent pictures of the form
p s′
s q
where p and q belong to the local languages of L1 and L2, respectively, and s, s′ are pictures filled with the symbol x. For example, the non-border tiles of Θ consist of all non-border tiles in Θ1 and Θ2, plus the tile containing only x, plus the tiles obtained by replacing by x all border symbols in the right-border and bottom-border tiles of Θ1 and in the left-border and top-border tiles of Θ2, plus tiles of the form
a x
x b
where a and b are symbols in bottom-right corner tiles of Θ1 and top-left corner tiles of Θ2, respectively. Observe that these last tiles are the ones that "glue" bottom-right corners of pictures in L1 to top-left corners of pictures in L2. Finally, the projection from Γ to Σ maps all symbols to the unique symbol of Σ. Regarding the closure under diagonal star, the tiling system for L∗⦸ can be defined as above, using two different local languages (i.e. over disjoint local alphabets) for L. The diagonal concatenation can be used to generate families of picture languages, starting from atomic languages. Formally, let us denote D = {∪, ⦸, ∗⦸}; the elements of D are called diagonal-regular operations, briefly D-regular operations. Definition 11. A diagonal-regular expression (D-RE) is defined recursively as follows: (1) ∅, (λ0,0), (λ0,1), (λ1,0) are D-RE. (2) if α, β are D-RE then (α) ∪ (β), (α) ⦸ (β), (α)∗⦸ are D-RE. Every D-RE denotes a language, using the standard notation. Languages denoted by D-RE are called diagonal-regular languages, briefly D-regular languages. The class of D-regular languages is denoted by L(D). Observe that a language containing a single picture (n, m) can be denoted by the D-RE En,m = (λ1,0^n⦸) ⦸ (λ0,1^m⦸). We will now characterize D-regular languages in terms of rational relations and in terms of certain 4FA. For this, let us recall (see [2]) that a rational relation over alphabets Σ and Δ is a rational subset of the monoid (Σ∗ × Δ∗, ·, (ε, ε)), where the operation · is the componentwise product defined by (u1, v1) · (u2, v2) = (u1u2, v1v2) for any (u1, v1), (u2, v2) ∈ Σ∗ × Δ∗. When Σ = Δ = {a}, there is a natural correspondence between pictures over Σ and relations over Σ × Σ. For any relation T ⊆ Σ∗ × Σ∗ we define the picture language L(T) = {p | ℓ1(p) = |r1| and ℓ2(p) = |r2| for some (r1, r2) ∈ T}.
Vice versa, for any picture language L ⊆ Σ∗∗ we define the relation R(L) = {(r1, r2) ∈ Σ∗ × Σ∗ | |r1| = ℓ1(p) and |r2| = ℓ2(p) for some p ∈ L}. Remark 12. We recall that a 4NFA M over a one-letter alphabet is equivalent to a two-way two-tape automaton M1 (cf. [10]). In fact, let H1 and H2 be the first and the second heads of M1, respectively; then M1 simulates M as follows. If the input head H of M moves down (up) one square, M1 moves H1 right (left) one square without moving H2, and if H moves right (left) one square, M1 moves H2 right (left) without moving H1. Proposition 13. Let Σ be a one-letter alphabet and let L ⊆ Σ∗∗. Then L is a D-regular language if and only if L = L(T) for some rational relation T ⊆ Σ∗ × Σ∗, if and only if L = L(A) for some 4NFA A that moves only right and down. Proof. In light of Remark 4, the componentwise concatenation in M = Σ∗ × Σ∗ corresponds exactly to the diagonal concatenation in Σ∗∗. It is well known that a rational subset of any monoid M is either empty or can be expressed, starting from singletons, by a finite number of applications of the (rational) operations ∪, · (product) and ·-closure (star). Thus L is a D-regular language if and only if L = L(T) for some rational relation T ⊆ Σ∗ × Σ∗. On the other hand, it is well known that T ⊆ Σ∗ × Σ∗ is a rational relation iff it is accepted by a (finite) transducer, that is, a (finite) automaton over Σ∗ × Σ∗. Further, such an automaton can be viewed as a (finite) one-way automaton with two tapes (cf. [17]). Then, in analogy with Remark 12, one-way two-tape automata are equivalent to 4NFA that move only right and down. Example 14. Let Ln,n be the language of squares, as in Example 8. We have Ln,n ∈ L(D). Indeed, it can easily be shown that Ln,n is denoted by the following D-RE: En,n = (λ0,1 ⦸ λ1,0)∗⦸. We have Ln,n = L(T), where T is the rational relation T = {(a^n, a^n) | n ≥ 0}.
Further, Ln,n = L(A), where A is the 4NFA that, starting in the top-left corner, moves along the main diagonal until it eventually reaches the bottom-right corner and accepts. More generally, the languages Ln,n+i = {p | ℓ1(p) = n, ℓ2(p) = n + i, n ≥ 1}, for some i ≥ 0, are denoted by the expressions En,n+i = En,n ⦶ ((E1,i)∗⊖), where E1,i = (λ0,1^i⦸) ⦸ λ1,0 denotes the language {(1, i)}. Example 15. Let L2n,2m be the language of even-sided pictures, as in Example 9, that is, L2n,2m = {p ∈ Σ∗∗ | ℓ1(p) = 2n, ℓ2(p) = 2m, n, m ≥ 0}. We have L2n,2m = L(T), where T is the rational relation T = {(a^2n, a^2m) | n, m ≥ 0}. Further, L2n,2m = L(A), where A is the 4NFA that, starting in the top-left corner, moves down checking the parity of the number of rows and then to the right checking the parity of the number of columns, eventually accepting in the bottom-right corner. Therefore L2n,2m ∈ L(D). Indeed, L2n,2m is also in L(CR), by the characterization of L(CR) given in [15]. Corollary 16. L(CR) ⊂ L(D). Proof. Following [15], L(CR) is the class of languages that are a finite union of Cartesian products of ultimately periodic string languages. Let L ∈ L(CR) and suppose
L = ⋃_{i=1,…,k} Ai × Bi. Then L can be recognized by a 4NFA (moving only right and down) that non-deterministically checks whether a picture belongs to some Ai × Bi, by first checking that the first row belongs to Ai and then that the last column belongs to Bi. Hence L ∈ L(D) by Proposition 13. Moreover, the inclusion is strict since, for example, the language of squares is in L(D) (see Example 14) but is not in L(CR), since it is not a finite union of Cartesian products of ultimately periodic string languages. In the same way that L(D) corresponds to the class of rational relations, L(CR) corresponds to its subclass of recognizable relations. Corollary 17. L(D) is closed under intersection and complement. Proof. The result follows from the characterization of L(D) in terms of rational relations in Proposition 13, and from the closure under intersection and complement of rational relations over a one-letter alphabet [2]. The following example shows that, also in the case of a one-letter alphabet, four-way automata that move only right and down are strictly less powerful than 3DFA. Example 18. Let L be the following picture language over a one-letter alphabet: L = {(kn, n) | k, n ≥ 0}. Language L can easily be recognized by a 3DFA that, starting in the top-left corner, moves along the main diagonal until it reaches the right boundary, then moves along the secondary diagonal until it reaches the left boundary, and so on, until it eventually reaches the bottom-right corner and accepts. By Proposition 13, language L cannot be recognized by a four-way automaton that moves only right and down, since {(a^kn, a^n) | k, n ≥ 0} is not a rational relation (see [4]). 3.2. CRD-regular expressions In this section we consider regular expressions that involve the column, row and diagonal concatenations and stars defined in the previous sections. We refer to them as CRD-regular expressions.
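The zigzag walk of the 3DFA of Example 18 can be simulated on sizes (a hedged sketch of ours, not the paper's construction; to make border visits happen exactly every n rows, the bounce is implemented as one straight-down step at each vertical border before the sweep direction reverses):

```python
# Hedged sketch: zigzag 3DFA-style walk on a one-letter picture given
# by its size; accepts iff the number of rows is a multiple of the
# number of columns, i.e. iff the picture is in {(k*n, n)}.

def accepts_row_multiple(rows, cols):
    if rows == 0:
        return True                  # (0, m) = (k*m, m) with k = 0
    if cols == 0:
        return False                 # no picture (n, 0), n > 0, is in L
    j, d = 1, +1                     # column of the head, sweep direction
    for _ in range(rows - 1):        # one step per remaining row
        if (j == cols and d == +1) or (j == 1 and d == -1):
            d = -d                   # turn: one step straight down
        else:
            j += d                   # diagonal step
    # accept iff the last row completes a full sweep at a border
    return (j == cols and d == +1) or (j == 1 and d == -1)
```

Each full sweep (including the turn step) consumes exactly `cols` rows, so the walk ends on a border precisely when `rows` is a multiple of `cols`.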
We show that, in the case of a one-letter alphabet, the class L(CRD) of corresponding languages is strictly included in the family REC and strictly contains L(3DFA). Further, we show that there are languages accepted by a four-way automaton that do not belong to L(CRD). The main result is a necessary condition for languages in L(CRD) that expresses a sort of "regularity" in the possible "extensions" of a picture (pictures containing the given one as a subpicture) inside the language. Let us denote CRD = {∪, ⦶, ⊖, ⦸, ∗⦶, ∗⊖, ∗⦸}, where C, R, D stand for "column", "row" and "diagonal". The elements of CRD are called CRD-regular operations. Definition 19. A CRD-regular expression (CRD-RE) is defined recursively as follows: (1) ∅, (λ0,0), (λ0,1), (λ1,0) are CRD-RE. (2) if α, β are CRD-RE then (α) ∪ (β), (α) ⦶ (β), (α)∗⦶, (α) ⊖ (β), (α)∗⊖, (α) ⦸ (β), (α)∗⦸ are CRD-RE.
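To experiment with such expressions, here is a small bounded evaluator (ours, illustrative only; the names `cc`, `rc`, `dc` and `star` are not from the paper): languages are finite sets of sizes, the three concatenations follow Definitions 2 and 6, and the stars are truncated at a size bound so that the sets stay finite.

```python
# Hedged sketch: bounded evaluation of CRD-regular operations on
# one-letter languages, with a picture identified with its size.

def cc(L1, L2):
    """Column concatenation: defined only for equal row counts."""
    return {(n, m1 + m2) for (n, m1) in L1 for (n2, m2) in L2 if n == n2}

def rc(L1, L2):
    """Row concatenation: defined only for equal column counts."""
    return {(n1 + n2, m) for (n1, m) in L1 for (n2, m2) in L2 if m == m2}

def dc(L1, L2):
    """Diagonal concatenation: total, adds sizes componentwise."""
    return {(n1 + n2, m1 + m2) for (n1, m1) in L1 for (n2, m2) in L2}

def star(op, L, unit, bound):
    """Iterated op-concatenation of L starting from `unit`,
    truncated at dimension `bound`."""
    closure, frontier = set(unit), set(unit)
    while frontier:
        new = op(frontier, L)
        frontier = {p for p in new if p not in closure and max(p) <= bound}
        closure |= frontier
    return closure

# Example 18's language {(k*n, n)} as the row star of the squares:
B = 12
squares = star(dc, {(1, 1)}, {(0, 0)}, B)     # diagonal star of {(1, 1)}
row_unit = {(0, m) for m in range(B + 1)}     # empty rows, truncated
multiples = star(rc, squares, row_unit, B)    # {(k*n, n)}, truncated
```

Within the bound, `multiples` contains exactly the sizes whose row count is a multiple of the column count, together with the empty rows that form the unit of the row star.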
Every CRD-RE denotes a language, using the standard notation. Languages denoted by CRD-RE are called CRD-regular languages. The family of CRD-regular two-dimensional languages (over one-letter alphabets) will be denoted by L(CRD). Observe that L(CRD) is contained in REC, since REC is closed under all operations in CRD. A more precise positioning of L(CRD) inside REC is established in Proposition 30 below. Example 20. Let L = {(n, k1(n + 1) + k2(n + 2) + k3(n + 3)) | n, k1, k2, k3 ≥ 0}. Consider the languages Ln,n+i denoted by the expressions En,n+i = En,n ⦶ ((E1,i)∗⊖), as in Example 14. Language L ∈ L(CRD), since it can be denoted by the following CRD-RE: E = (En,n+1)∗⦶ ⦶ (En,n+2)∗⦶ ⦶ (En,n+3)∗⦶. Example 21. Let L = {(hn, hkn + n) | n, h, k ≥ 0}. Language L belongs to L(CRD). Indeed, L = L1 ⦶ L2, where L1 = {(n, kn) | n, k ≥ 0} and L2 = {(hm, m) | m, h ≥ 0}. If En,n is a D-RE for the language of squares (see Example 14), a CRD-RE for L is E = ((En,n)∗⦶) ⦶ ((En,n)∗⊖). We now present some "regularity" conditions necessarily satisfied by CRD-regular languages. They generalize, in some sense, what regularity means for one-dimensional languages as far as the possible extensions of a picture inside a regular language are concerned. Indeed, it is well known that a string language over a one-letter alphabet Σ = {a} is regular if and only if it is ultimately periodic. In particular, if L ⊆ {a}∗ is a regular language and a^n ∈ L is a sufficiently long string, then there exists a string a^m such that a^n(a^m)∗ ⊆ L. We show that a generalization of this necessary condition holds for two-dimensional languages in L(CRD): if a picture is sufficiently "long" then we can concatenate some picture to it infinitely often by columns; if a picture is sufficiently "high" then we can concatenate some picture to it infinitely often by rows; if a picture is sufficiently "big" then we can concatenate some picture to it infinitely often in diagonal. Let Σ = {a} and L ⊆ Σ∗∗.
Let us define, for any n, m ≥ 0, the following string languages: Cn = {a^m | (n, m) ∈ L} and Rm = {a^n | (n, m) ∈ L}. Proposition 22. Let L ⊆ {a}∗∗ and L ∈ REC. Then, for any n, m ≥ 0, Cn and Rm are regular languages. Proof. For any alphabet Σ and fixed n, the fixed-height-n word language of L ⊆ Σ∗∗ is the language L(n), over the alphabet Σ^(n,1) of one-column pictures of height n, of all the strings of columns of height n that compose pictures in L. In [16], it is shown that if L is in REC, then L(n) is regular, for any alphabet Σ and any integer n. In the special case of a one-letter alphabet, we can identify any column in {a}^(n,1) with a, and we have that L(n) is regular iff Cn is regular. An analogous reasoning implies the regularity of Rm. The proof of the following proposition is only sketched here; a more complete proof is given in Appendix A. Proposition 23. Let L be a CRD-regular language. Then there exist increasing functions α, β, γ, δ : N → N and μ, ν : N × N → N, and n̄, m̄ ∈ N, such
that for any p = (n, m) ∈ L we have: (1) if m > α(n) then p ⦶ q∗⦶ ⊆ L for q = (n, γ(n)) with γ(n) ≠ 0; (2) if n > β(m) then p ⊖ q∗⊖ ⊆ L for q = (δ(m), m) with δ(m) ≠ 0; (3) if n ≥ n̄ and m ≥ m̄ then p ⦸ q∗⦸ ⊆ L for some q = (nq, mq) with nq, mq ≠ 0, nq ≤ μ(n, m), mq ≤ ν(n, m). Proof (Sketch). First we show how to choose the functions α, β, γ, δ. From Proposition 22, we know that the sets Cn = {a^m | (n, m) ∈ L} and Rm = {a^n | (n, m) ∈ L} are regular and therefore ultimately periodic. So we can define α (β, resp.) in relation to the stem of the minimal automaton of Cn (Rm, resp.), and γ (δ, resp.) in relation to the period of Cn (Rm, resp.), in such a way that these functions are increasing. Now we sketch how to choose n̄, m̄, μ and ν for a CRD-regular language L, by induction on the number of operators in a CRD-regular expression r that denotes L. For the basis, if L = ∅ then the proposition is vacuously true. If L = {λ0,0}, or L = {λ0,1}, or L = {λ1,0}, then we can choose n̄, m̄, μ and ν in such a way that the proposition is always vacuously true, taking care to define γ(n) = 1, δ(m) = 1, μ(n, m) = ν(n, m) = 1, so that q ≠ λ0,0. Suppose now that r contains at least one operator. There are seven different cases, depending on the form of r: r = r1 ∪ r2, r = r1 ⦶ r2, r = r1 ⊖ r2, r = r1 ⦸ r2, r = r1∗⦶, r = r1∗⊖, or r = r1∗⦸. In each of the seven cases, r1 and r2 denote languages L1 and L2, respectively, that satisfy the conditions. Let αi, βi, γi, δi, μi, νi, n̄i, m̄i be the functions and the values for Li, with i = 1, 2. The values n̄ and m̄ for L are chosen in such a way that a "big" picture (i.e. p = (n, m) with n ≥ n̄ and m ≥ m̄) in L always decomposes into pictures in L1 and in L2 that are either "big", or "long" (i.e. m > αi(n), for i = 1, 2), or "high" (i.e. n > βi(m), i = 1, 2). Therefore n̄ may depend on n̄1, n̄2, but also on the other functions of L1 and L2.
As an example, when L = L1 ⦸ L2, any "big" p = p1 ⦸ p2, where p1 ∈ L1, p2 ∈ L2, can be such that p1 and p2 are either both "big", or one of fixed size and the other one "big", or one "high" and the other one "long". The functions μ, ν ensure a bound on the size of a picture q that can be diagonally concatenated infinitely many times to a big p. Such a picture q is constructed from some corresponding pictures q1 for p1 and q2 for p2. The main problem is due to the partiality of the column and row concatenations, which requires that q1 and q2 have the same number of rows or columns. This problem is solved by concatenating q1 and q2 with themselves as many times as necessary. For example, q1^k1⊖ and q2^k2⊖ have the same number of rows if we choose k1 = ℓ1(q2) and k2 = ℓ1(q1) (a more refined version could consider a least common multiple). Special care is needed to handle also the case where p is an empty column or an empty row. The regularity conditions in Proposition 23 are stated in such a way that a finite number of pictures that could "disturb" this regularity are set aside, by properly defining the bounds on the size (namely n̄, m̄, α, β). Observe that such "small" pictures may indeed have an infinite number of extensions in some direction (horizontal, vertical, diagonal). This situation is illustrated in Example 24 below.
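The repetition trick in the proof sketch can be made concrete (a sketch under our naming; sizes are (rows, columns) pairs with nonzero row counts): repeating q1 ℓ1(q2) times and q2 ℓ1(q1) times by rows equalizes the row counts, and taking a least common multiple gives the smaller refinement mentioned above.

```python
from math import lcm

def equalize_rows(q1, q2):
    """Given sizes q1 = (r1, c1) and q2 = (r2, c2) with r1, r2 > 0,
    return repetition counts (k1, k2) such that the row-concatenation
    powers q1^(k1) and q2^(k2) have the same number of rows,
    using the least-common-multiple refinement."""
    r = lcm(q1[0], q2[0])
    return r // q1[0], r // q2[0]
```

For instance, for q1 of 4 rows and q2 of 6 rows, the naive choice repeats them 6 and 4 times (24 rows each), while the lcm refinement needs only 3 and 2 repetitions (12 rows each).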
Example 24. For a language L1 = {(n0, m0)} consisting of a single picture, the functions α1, β1, γ1, δ1, μ1, ν1 and the integers n̄1, m̄1 of Proposition 23 can be chosen as α1(n) = m0, β1(m) = n0, γ1(n) = δ1(m) = μ1(n, m) = ν1(n, m) = 1, n̄1 = n0 + 1, and m̄1 = m0 + 1. Indeed, conditions (1), (2) and (3) of the proposition are then vacuously true. Consider now the language of squares L2 = Ln,n = {p | ℓ1(p) = ℓ2(p) ≥ 0}. The functions α2, β2, γ2, δ2, μ2, ν2 and the integers n̄2, m̄2 of Proposition 23 for L2 can be chosen as follows: α2(n) = β2(n) = n, γ2(n) = δ2(n) = 1, n̄2 = m̄2 = 0, and μ2(n, m) = ν2(n, m) = 1. Finally, consider L = L1 ∪ L2 = {(n0, m0)} ∪ Ln,n and suppose (n0, m0) ∉ L2. According to Proposition 23 (case 1), the functions α, β, γ, δ, μ, ν and the integers n̄, m̄ are the following: α(n) = max{n, m0}; γ(n) = 1; β(m) = max{n0, m}; δ(m) = 1; n̄ = max{n̄1, n̄2} = n0 + 1; m̄ = max{m̄1, m̄2} = m0 + 1; μ(n, m) = max{μ1(n, m), μ2(n, m)} = 1 and ν(n, m) = max{ν1(n, m), ν2(n, m)} = 1. Observe that, even if the picture (n0, m0) has an infinite number of extensions, we cannot find a picture q that can be concatenated to it infinitely often in diagonal. The choice of n̄ and m̄ is made in such a way that (n0, m0) does not satisfy the conditions n0 ≥ n̄ and m0 ≥ m̄, and hence it does not "disturb" the regularity of L. Remark 25. Proposition 23 states that for any p there exists a picture q that can be concatenated to p as many times as we want. This picture may indeed depend on p, as shown in the following Example 26. Example 26. Let L = {(kn, n) | k, n ≥ 0} = (Ln,n)∗⊖, where Ln,n is the language of squares. The functions α, β, γ, δ, μ, ν and the integers n̄, m̄ of Proposition 23 can be chosen as follows: α(n) = n, β(m) = 0, γ(n) = 1, δ(m) = max{m, 1}, n̄ = m̄ = 0, μ(n, m) = max{m, 1} and ν(n, m) = 1. Note that the size of the picture q in the cases p = (n, m) with n > β(m), or with n ≥ n̄ and m ≥ m̄, depends on the size of p. This situation is indeed unavoidable.
For example, when p = (k n , n ), any q = (nq , n ) such that p ❡q ∗ ⊆ L is such that q = (k n , n ), thus depending on the number of columns of p, as pointed out in Remark 25.

Example 27. Let L = {(hn, hn + n) | n, h ≥ 0}. We have L = Ln,n ❡(Ln,n )∗ , where L1 = Ln,n is the language of squares. The functions 1 , 1 , 1 , 1 , 1 , 1 and integers n1 , m1 as in Proposition 23 for L1 can be chosen as follows: 1 (n) = 1 (n) = n, 1 (n) = 1 (n) = 1, n1 = m1 = 0, and 1 (n, m) = 1 (n, m) = 1. The functions 2 , 2 , 2 , 2 , 2 , 2 and integers n2 , m2 as in Proposition 23 for L2 = (Ln,n )∗ can be chosen as follows: 2 (n) = n, 2 (m) = 0, 2 (n) = 1, 2 (m) = max{m, 1}, n2 = m2 = 0, 2 (n, m) = max{m, 1} and 2 (n, m) = 1. According to Proposition 23 (case 2), the functions , , , , , and integers n, m are the following: (n) = 2n; (n) = 1; (m) = m − 1; (n) = 1; n = 1; m = 0; (n, m) = max{m, 1} and finally (n, m) = max{m, 1} + 1. Observe that Proposition 23 ensures that, for any picture p = (n′, m′) = (hn, hn + n) with n′ ≥ 1, m′ ≥ 0, there exists q = (nq , mq ) with nq , mq ≠ 0, nq ≤ max{m′, 1}, mq ≤ max{m′, 1} + 1 such that p \❡q ∗ ⊆ L. Indeed such a picture can be chosen as q = (n, n), which really satisfies n ≤ m′ = hn + n and n ≤ m′ + 1 = hn + n + 1. For example, in Fig. 1, given p1 = (6, 6) ∈ L1 we choose q1 = (1, 1) and p1 \❡q1∗ ⊆ L1 .
Fig. 1. Extensions of p1 , p2 and p as in Example 27.
Given p2 = (6, 2) ∈ L2 we choose q2 = (2, 2) and we have p2 ❡q2∗ ⊆ L2 . Then, if we consider p = p1 ❡p2 we can choose q = (2, 2) according to Proposition 23 (case 2) and we have p \❡q ∗ ⊆ L.

Proposition 23 can be used to prove that some picture languages are not CRD-regular, as shown in the following examples.

Example 28. Let L = {(n, n^2) | n ≥ 0}. We show that L ∉ L(CRD), by proving that it does not satisfy condition (3) in Proposition 23. Indeed, suppose on the contrary that there exist n, m ∈ N and functions as in the proposition. Observe that in L, for any n ≥ 0, there is only one picture with n rows and one picture with n^2 columns. Hence the pictures of L with number of rows less than or equal to n or number of columns less than or equal to m are finite in number. Since L is infinite, there exists a picture p = (n, n^2) ∈ L such that n > n and n^2 > m. Therefore there exists q = (nq , mq ) with nq , mq ≠ 0 such that p \❡q ∗ ⊆ L. Consider p1 = p \❡q = (n + nq , n^2 + mq ). We must have n^2 + mq = (n + nq )^2 and thus mq = (n + nq )^2 − n^2 . Consider now p2 = p \❡q \❡q = (n + 2nq , n^2 + 2mq ); we have n^2 + 2mq = n^2 + 2(n + nq )^2 − 2n^2 = n^2 + 4n·nq + 2nq^2 ≠ (n + 2nq )^2 (since nq ≠ 0), contradicting p2 ∈ L.

Example 29. Let L = {(2^n, 2^n) | n ≥ 0}. We show that L ∉ L(CRD), by proving that it does not satisfy condition (3) in Proposition 23. Indeed, suppose on the contrary that there exist n, m ∈ N and functions as in the proposition. Observe that in L, for any n ≥ 0, there is only one picture with 2^n rows and one picture with 2^n columns. Hence the pictures of L with number of rows less than or equal to n or number of columns less than or equal to m are finite in number. Since L is infinite, there exists a picture p = (2^n, 2^n) ∈ L such that 2^n > n and 2^n > m. Therefore there exists q = (nq , mq ) with nq , mq ≠ 0 such that p \❡q ∗ ⊆ L. Consider p1 = p \❡q = (2^n + nq , 2^n + mq ). Since p1 ∈ L, we have 2^n + nq = 2^n + mq = 2^{n+k} for some k ≠ 0 (since nq ≠ 0), and thus nq = mq = 2^{n+k} − 2^n. Consider now p2 = p \❡q \❡q = (2^n + 2nq , 2^n + 2mq ); we have 2^n + 2mq = 2^n + 2(2^{n+k} − 2^n) = 2^n(1 + 2^{k+1} − 2) = 2^n(2^{k+1} − 1). Therefore 2^n + 2mq is the product of a power of 2 and an odd number different from 1, so it cannot be a power of 2, contradicting p2 ∈ L.

We now show that the family of CRD-regular languages lies between the class L(3DFA) and REC. On the other hand, there are languages that belong to L(4NFA) and that are not CRD-regular.

Proposition 30. L(3DFA) ⊂ L(CRD) ⊂ REC.

Proof. Let L ∈ L(3DFA). Following [14], L is a finite union of languages R whose elements are (f, g), where f = a0 + a1 n and g = h(b0 + b1 n) + b2 n + b3 k + b4 with a0 , a1 , b0 , b1 , b2 , b3 , b4 positive integers and n, h, k positive integer variables. We show that any language R of this form is in L(CRD). Let En,n be a CRD-RE for the language of squares (see Example 14). The language La0 ,a1 ,b0 ,b1 = {(a0 + a1 n, b0 + b1 n) | n ∈ N} can be denoted by the CRD-RE Ea0 ,a1 ,b0 ,b1 = ((a0 , b0 ) \❡((En,n )a1 )b1 ). Then E = (Ea0 ,a1 ,b0 ,b1 )∗ ❡((a0 , 1)∗ ❡E0,a1 ,0,b2 ) ❡(((1, b3 )∗ )∗ ) ❡((1, b4 )∗ ) is a CRD-RE for the language R. Moreover, the inclusion L(3DFA) ⊂ L(CRD) is strict: in fact the language L = {(n, k1 (n + 1) + k2 (n + 2) + k3 (n + 3))} of Example 20 is in L(CRD), but it cannot be recognized by a 3DFA (cf. [14]). CRD-regular languages are contained in REC because REC is closed under union, column and row concatenations and stars (cf. [6]) and under diagonal concatenation and star (cf. Proposition 10).
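The arithmetic at the heart of Example 29 is easy to check mechanically. The sketch below (Python; the helper name is ours) verifies, for small n and k, that once the first pumped picture forces the diagonal increment d = 2^{n+k} − 2^n, the second pumped picture has side 2^n(2^{k+1} − 1) and so cannot be a power of 2:

```python
def is_power_of_two(x: int) -> bool:
    """True iff x = 2^j for some j >= 0 (bit trick: a power of two
    has a single set bit)."""
    return x > 0 and (x & (x - 1)) == 0

# Example 29: p = (2^n, 2^n). If the diagonal pumping condition held,
# p1 = (2^n + d, 2^n + d) would have to lie in L, forcing
# d = 2^(n+k) - 2^n with k >= 1, but then p2 = (2^n + 2d, 2^n + 2d)
# has side 2^n * (2^(k+1) - 1), an odd multiple (> 1) of a power of two.
for n in range(1, 8):
    for k in range(1, 8):
        d = 2 ** (n + k) - 2 ** n
        side1 = 2 ** n + d
        side2 = 2 ** n + 2 * d
        assert is_power_of_two(side1)                    # p1 stays in L ...
        assert side2 == 2 ** n * (2 ** (k + 1) - 1)
        assert not is_power_of_two(side2)                # ... but p2 cannot
print("Example 29 arithmetic confirmed for small n, k")
```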
Moreover, one can find examples of languages in REC that are not CRD-regular, such as the language L = {(2^n, 2^n) | n ≥ 0} (see Example 29) or L = {(n, n^2) | n ≥ 0}. Regarding the comparison with the class of languages recognized by four-way automata, consider the language L = {(2^n, 2^n) | n ≥ 0}. As shown in Example 29, L is not a CRD-regular language, but J. Kari and C. Moore [13] showed that L is recognized by a 4DFA. On the other hand, the class L(4DFA) seems not to be closed under concatenation and star operations (although the one-letter alphabet case is still open, it seems that, for example, the column closure of the language in Example 21 cannot be recognized by a 4NFA).

3.3. A collection of examples

In this section, we give a collection of examples of two-dimensional languages and classify them with respect to their machine type and regular-expression type. Languages are given by their representative element, where n, m, h, k ≥ 1 are integer variables and c ≥ 1 is an integer constant. Moreover f1 (n) = a1 + · · · + an , where a1 , . . . , an are all
chosen in a finite subset of N, and f2 (n) = k1 (n + 1) + k2 (n + 2) + k3 (n + 3), where k1 , k2 , k3 ≥ 1 are integer variables. Each entry states whether the language with the given representative element belongs to the corresponding class.

Element        2DFA  2NFA  3DFA  3NFA  4DFA  4NFA  D-RE  CRD-RE  REC
(n, n)          Y     Y     Y     Y     Y     Y     Y      Y      Y
(2, 2n)         Y     Y     Y     Y     Y     Y     Y      Y      Y
(2n, 2n)        Y     Y     Y     Y     Y     Y     Y      Y      Y
(2n, 2m)        Y     Y     Y     Y     Y     Y     Y      Y      Y
(n, cn)         Y     Y     Y     Y     Y     Y     Y      Y      Y
(n, f1(n))      N     Y     Y     Y     Y     Y     Y      Y      Y
(kn, n)         N     N     Y     Y     Y     Y     N      Y      Y
(n, f2(n))      N     N     N     Y     Y     Y     N      Y      Y
(n, kn)         N     N     N     N     Y     Y     N      Y      Y
(2^n, 2^n)      N     N     N     N     Y     Y     N      N      Y
(hn, hkn+n)     N     N     N     N     N     Y     N      Y      Y
(n, n^2)        N     N     N     N     N     N     N      N      Y
(n^2, n)        N     N     N     N     N     N     N      N      Y
(n^2, n^2)      N     N     N     N     N     N     N      N      Y
(n, 2^n)        N     N     N     N     N     N     N      N      Y
(n, n!)         N     N     N     N     N     N     N      N      N
4. Advanced star operations

Using the three types of concatenation operations (row, column and diagonal) and the three corresponding stars, we get regular expressions describing a quite large family of two-dimensional languages over a one-letter alphabet. Unfortunately, all those operations together are not enough to describe the whole family REC, because REC contains very “complex’’ languages even in the case of a one-letter alphabet. For example, REC contains languages of the form L = {(n, f (n)) | n > 0}, as well as L = {(f (n), g(n)) | n > 0}, where f (n), g(n) are polynomial or exponential functions in n (see [5] for details). Observe that the peculiarities of the “classical’’ star operations (from which such column, row or diagonal stars are defined) are mainly the following: (a) they are a simple iteration of one kind (row, column or diagonal) of concatenation between pictures; (b) they correspond to an iterative process that at each step always adds (concatenates) the same set. We can say that they correspond to the idea of the iteration for some recursive H defined as H (1) = S and H (n + 1) = H (n)S, where S is a given set. In this section we define new types of iteration operations, to which we will refer as advanced stars, which turn out to be much more powerful than the “classical’’ ones. We will use subscripts “r” and “d” with the meaning of “right” and “down’’, respectively.

Definition 31. Let L, Lr , Ld be two-dimensional languages. The star of L with respect to (Lr , Ld ) is defined as

L^(Lr ,Ld )∗ = ⋃_{i≥0} L^(Lr ,Ld )i,

where L^(Lr ,Ld )0 = {λ0,0 }, L^(Lr ,Ld )1 = L, and L^(Lr ,Ld )i+1 is the set of pictures p′ obtained by placing some p ∈ L^(Lr ,Ld )i in the top-left corner, some pr ∈ Lr to its right, some pd ∈ Ld below it, and some q ∈ Σ∗∗ in the remaining bottom-right corner.
Remark that the operation we defined cannot be simulated by a sequence of ❡ and ❡ operations, because to get p′ we first concatenate p ❡pr and p ❡pd , then we overlay them and finally we fill the “hole’’ with a picture q ∈ Σ∗∗ . For this reason this definition is conceptually different from the one given by O. Matz in [15]. Moreover, observe that such an advanced star is based on a reverse principle with respect to the diagonal star: we “decide’’ what to concatenate to the right of and below the given picture and then fill the hole in the bottom-right corner. This implies that, at the (i + 1)th step of the iteration, we are forced to select pictures pr ∈ Lr and pd ∈ Ld that have the same number of rows and the same number of columns, respectively, as the pictures generated at the ith step. Therefore, we actually exploit the fact that column and row concatenations are partial operations to somehow synchronize each step of the iteration with the choice of pictures in Lr and Ld . We now state the following proposition.

Proposition 32. If L, Lr , Ld are languages in REC, then L^(Lr ,Ld )∗ is in REC.

Proof. We give only a few hints for the proof, because it can be carried out using the techniques shown in the proof of Proposition 10. The idea is to assume that the tiling systems for L, Lr , Ld are over disjoint local alphabets and to define a local language M over an alphabet equal to the union of the three together with a new symbol x. Language M contains pictures composed of blocks p′ , pr′ , pd′ and s, where p′ , pr′ and pd′ belong to the local languages for L, Lr and Ld , respectively, and s is any picture filled with the symbol x. Then the set of tiles for L′ = L^(Lr ,Ld )∗ can be defined by taking two “different copies’’ (i.e., over disjoint local alphabets) of the language M and different local languages for Lr and Ld , and defining tiles according to the definition of pictures in L′ .
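Over a one-letter alphabet a picture is determined by its size, so the advanced star of Definition 31 can be simulated on pairs (rows, columns). The sketch below (Python; the representation and all names are our choices, and languages are truncated to finite lists) reproduces the instance used in the application to L = {(n, n^2)}: the corner picture q ∈ Σ∗∗ imposes no constraint on sizes, so only the row/column synchronization with Lr and Ld matters:

```python
def advanced_star_step(current, L_r, L_d):
    """One step of the advanced star on sizes: glue p with some pr
    (same number of rows) on the right and some pd (same number of
    columns) below; the bottom-right corner is filled freely."""
    nxt = set()
    for (n, m) in current:
        for (nr, mr) in L_r:
            if nr != n:          # pr must share p's number of rows
                continue
            for (nd, md) in L_d:
                if md != m:      # pd must share p's number of columns
                    continue
                nxt.add((n + nd, m + mr))
    return nxt

# L = {(n, n^2)} as the advanced star of M = {(1, 1)} with respect to
# Mr = {(n, 2n + 1)} and Md = {(1, n)} (truncated to a finite bound here).
bound = 12
Mr = [(n, 2 * n + 1) for n in range(bound)]
Md = [(1, m) for m in range(bound * bound)]

sizes = {(1, 1)}
seen = set(sizes)
for _ in range(bound - 2):
    sizes = advanced_star_step(sizes, Mr, Md)
    seen |= sizes

print(sorted(seen))                       # only sizes of the form (i, i^2)
assert all(m == n * n for (n, m) in seen)
```

Each step adds 2i + 1 columns and 1 row to the current (i, i^2), exactly as described in the text.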
As an immediate application, consider the language L = {(n, n^2) | n ≥ 0} of Example 28. Then L can be defined as the advanced star of M = {(1, 1)} with respect to Mr = {(n, 2n + 1) | n ≥ 0} and Md = {(1, n) | n ≥ 0} (at the (i + 1)th step of the iteration we “add’’ (2i + 1) columns to the current i^2 ones and 1 row to the current i ones). Using the same principle, namely exchanging the languages Mr and Md , it is easy to define also the rotation of this language, i.e. the language {(n^2, n) | n ≥ 0}. Then also the language {(n^2, n^2) | n ≥ 0} can be defined as the advanced star of M = {(1, 1)} with respect to Nr = {(n^2, 2n + 1) | n ≥ 0} and Nd = {(2n + 1, n^2) | n ≥ 0}, where Nr (Nd ) can be obtained by column-concatenation (row-concatenation) of two copies of the previous languages and 1-row (1-column) pictures. Remark that, even using the above defined advanced star, it seems still not possible to define the language of Example 29 of pictures of size (2^n, 2^n), or the language of pictures of size (n, 2^n) and similar ones. In fact, for this kind of languages (recall that they are all
in REC), a definition would be needed that allows using as Lr and/or Ld the language itself. We give the following definition.

Definition 33. Let L, Ld be two-dimensional languages. The bi-iteration along the columns of L with respect to Ld is defined as

L^(∗,Ld )∗ = ⋃_{i≥0} L^(∗,Ld )i,

where L^(∗,Ld )0 = {λ0,0 }, L^(∗,Ld )1 = L, and L^(∗,Ld )i+1 is the set of pictures p′ obtained by placing p1 ∈ L^(∗,Ld )i top-left, p2 ∈ L^(∗,Ld )i top-right, pd ∈ Ld bottom-left and q ∈ Σ∗∗ bottom-right. Similarly we define the bi-iteration along the rows of L with respect to a language Lr , denoted by L^(Lr ,∗)∗ , where at the (i + 1)th step p′ is obtained by placing p1 ∈ L^(Lr ,∗)i top-left, pr ∈ Lr top-right, p2 ∈ L^(Lr ,∗)i bottom-left and q ∈ Σ∗∗ bottom-right. These notations naturally lead us to define also the bi-iteration along rows and columns, denoted by L^(∗,∗)∗ , where at the (i + 1)th step p′ is obtained by placing p1 , p3 ∈ L^(∗,∗)i top-left and top-right, p2 ∈ L^(∗,∗)i bottom-left and q ∈ Σ∗∗ bottom-right. Using the same techniques as in the proof of Proposition 32, one can prove that the family REC over a one-letter alphabet is closed under all such bi-iteration operations. It is immediate to verify that the language L of pictures of size (n, 2^n) can be obtained from the languages M = {(1, 1)} and Md = {(1, n) | n > 0} as L = M^(∗,Md )∗ . We conclude by observing that the language of Example 29 of pictures of size (2^n, 2^n) can be obtained as a bi-iteration along both rows and columns of the same language M = {(1, 1)}.

5. Towards the general alphabet case

In this paper, we have defined new operations between pictures so that a quite wide class of two-dimensional languages over a one-letter alphabet can be described in terms of regular expressions. All these languages belong to REC, the class of recognizable languages that best generalizes the class of regular string languages to two dimensions. The next step is surely to complete the definitions of some other kinds of “advanced’’ star operations, with the aim of proving a two-dimensional Kleene theorem in this simpler case of a one-letter alphabet. We also emphasize that an important goal of further work is to extend all these results to the general case of two-dimensional languages over any alphabet (i.e. the case with more
than one letter). Observe that the definitions of diagonal concatenation and star are hard to extend to the general case, even using their characterizations in terms of rational relations or in terms of automata with only two moving directions. The main problem is that, if p, q are two pictures over Σ, to define p \❡q we need to specify two pictures r, s such that p \❡q is the block picture with p top-left, r top-right, s bottom-left and q bottom-right. On the other hand, the formalism of the advanced stars appears to be a more reasonable approach to the general case. Recall that, in this case, we always need to specify four pictures (or four languages). We will use subscripts r, d and c with the meaning of “right”, “down” and “corner”, respectively. Then, we can give the following definition, which directly extends Definition 31.

Definition 34. Let L, Lr , Ld , Lc be two-dimensional languages over Σ. The star of L with respect to (Lr , Ld , Lc ) is defined as

L^(Lr ,Ld ,Lc )∗ = ⋃_{i≥0} L^(Lr ,Ld ,Lc )i,

where L^(Lr ,Ld ,Lc )0 = {λ0,0 }, L^(Lr ,Ld ,Lc )1 = L, and L^(Lr ,Ld ,Lc )i+1 is the set of pictures p′ obtained by placing p ∈ L^(Lr ,Ld ,Lc )i top-left, pr ∈ Lr top-right, pd ∈ Ld bottom-left and pc ∈ Lc bottom-right.

Remark that this kind of star operation is not the iteration of a “classical’’ concatenation operation. These operations seem to be able to describe several languages in REC, although the “regular expressions’’ for two-dimensional languages in the general case will turn out to be very complex.

Appendix A.

Proposition 23. Let L be a CRD-regular language. Then there exist
, , , : N → N, , : N × N → N increasing functions and n, m ∈ N such that for any p = (n, m) ∈ L we have: (1) if m > (n) then p ❡q ∗ ⊆ L for q = (n, (n)) with (n) ≠ 0, (2) if n > (m) then p ❡q ∗ ⊆ L for q = ((m), m) with (m) ≠ 0, (3) if n ≥ n, m ≥ m then p \❡q ∗ ⊆ L for some q = (nq , mq ) with nq , mq ≠ 0, nq ≤ (n, m), mq ≤ (n, m).

Proof. First let us see how to choose , , , in all these cases. From Proposition 22, we know that the sets Cn = {a^m | (n, m) ∈ L} and Rm = {a^n | (n, m) ∈ L} are regular and therefore ultimately periodic. So there exist hC , kC , hR , kR ∈ N such that a^j ∈ Cn ⇔ a^{j+kC} ∈ Cn for every j ≥ hC , and a^j ∈ Rm ⇔ a^{j+kR} ∈ Rm for every j ≥ hR . If we did not have to take care of the fact that , , , have to be increasing and that , have to be ≠ 0, it would be sufficient to set (n) = hC , (m) = hR , (n) = kC and
(m) = kR . But, to be sure that (n), (m) = 0 and to assure the increase of the functions, we set (n) = hC + s1 kC , (n) = hR + s2 kR , (n) = kC + s3 kC and (m) = kR + s4 kR , where kC = max{1, kC }, kR = max{1, kR } and s1 , s2 , s3 , s4 0 are the minimal integer such that (n) (n − 1), (n) (n − 1), (n) (n − 1) and (n) (n − 1). Let us now show how to choose n, m, and for a CRD-regular language L. Let r be a CRD-regular expression denoting L. The proof is by induction on the number of operators in r. For the basis, if L = ∅ then the proposition is vacuously true. If L = {0,0 }, then we can set n = 1, m = 1, (n) = 0, (m) = 0. If L = {1,0 } (resp. L = {0,1 }), then we can set n = 2 (resp. n = 1), m = 1 (resp. m = 2), (n) = 0 (resp. (n) = 1), (m) = 1 (resp. (m) = 0). In all these cases we can set (n) = 1, (m) = 1, (n, m) = (n, m) = 1. Assume now that the proposition is true for languages denoted by CRD-regular expression with less than i operators, i 1, and let r have i operators. There are seven cases depending ❡ on the form of r: (1) r = r1 ∪r2 , (2) r = r1 ❡r2 , (3) r = r1 ❡r2 , (4) r = r1 \❡r2 , (5) r = r1∗ , ❡ \❡ (6) r = r1∗ , or (7) r = r1∗ . In any of the seven cases, r1 and r2 denote some language L1 and L2 , respectively, that satisfies the condition. Let 1 , 1 , 1 , 1 , 1 , 1 , n1 , m1 be the functions and the values for L1 and let 2 , 2 , 2 , 2 , 2 , 2 , n2 , m2 be the functions and the values for L2 . Case 1: We have L = L1 ∪ L2 . We set (n, m) = max{ 1 (n, m), 2 (n, m)}, (n, m) = max{ 1 (n, m), 2 (n, m)}, n = max{n1 , n2 }, m = max{m1 , m2 }. Case 2: We have L = L1 ❡L2 . We set: (n, m) = max{ 1 (n, m) 2 (n, m), 1 (m) 2 (n, m), 2 (m) 1 (n, m)},
(n,m) = max{ 1(n,m) 2 (n,m) + 2 (n,m) 1 (n,m), 1 (m) 2 (n,m), 2 (m) 1 (n, m)}, n = max{n1 , n2 , 1 (m1 ), 2 (m2 )}, m = m1 + m2 . Now, let p = (n, m) ∈ L, with n n, m m. Clearly, p = p1 ❡p2 for some p1 = (np1 , mp1 ) = (n, mp1 ) ∈ L1 and p2 = (np2 , mp2 ) = (n, mp2 ) ∈ L2 . We have to consider three different cases: (2a) mp1 m1 and mp2 m2 , (2b) mp1 < m1 , (2c) mp2 < m2 . (2a) Since np1 n1 , mp1 m1 , np2 n2 and mp2 m2 , from the hypothesis on L1 \❡ and L2 , we have that p1 \❡q1∗ ⊆ L1 for some q1 = (nq1 , mq1 ) with nq1 , mq1 = 0, \❡ nq1 1 (n, mp1 ), mq1 1 (n, mp1 ) and that p2 \❡q2∗ ⊆ L2 for some q2 = (nq2 , mq2 ) with nq2 , mq2 = 0, nq2 2 (n, mp2 ), mq2 2 (n, mp2 ). \❡ Now let us set q = (nq1 nq2 , nq1 mq2 + nq2 mq1 ) = (nq , mq ). Then p \❡q ∗ ⊆ L with nq , mq = 0, nq = nq1 nq2 1 (n, mp1 ) 2 (n, mp2 ) 1 (n, m) 2 (n, m) and mq = nq1 mq2 + nq2 mq1 1 (n, m) 2 (n, m) + 2 (n, m) 1 (n, m). (2b) Since mp1 < m1 , then mp2 m2 (recall that mp1 + mp2 = m m = m1 + m2 ) and \❡ therefore p2 \❡q2∗ ⊆ L2 for some q2 = (nq2 , mq2 ) with nq2 , mq2 = 0, nq2 2 (n, mp2 ), ❡ mq2 2 (n, mp2 ). Moreover nq1 = n n 1 (m1 ) > 1 (mp1 ): therefore p1 ❡q1∗ ⊆ L1 for q1 = (nq1 , mq1 ) = (1 (mp1 ), mp1 ). Note that we have nq1 = 0. Let us set q = \❡ (nq1 nq2 , nq1 mq2 ) = (nq , mq ). Then we have p \❡q ∗ ⊆ L with nq , mq = 0, nq = nq1 nq2 1 (mp1 ) 2 (n, mp2 ) 1 (m) 2 (n, m) and mq = nq1 mq2 1 (mp1 ) 2 (n, mp2 ) 1 (m) 2 (n, m).
(2c) It is analogous to the previous case. Case 3: We have L = L1 ❡L2 and the proof is similar to that of the previous case. Case 4: We have L = L1 \❡L2 . We set: (n, m) = max{ 1 (n, m), 2 (n, m), 1 (m), 2 (m)},
(n, m) = max{ 1 (n, m), 2 (n, m), 2 (n), 1 (n)}, n = max{n1 + n2 , 1 (m1 ) + n2 , 2 (m2 ) + n1 }, m = max{m1 + m2 , 2 (n2 ) + m1 , 1 (n1 ) + m2 }. Now, let p = (n, m) ∈ L = L1 \❡L2 , with n n, m m. Clearly, p = p1 \❡p2 for some p1 = (np1 , mp1 ) ∈ L1 and p2 = (np2 , mp2 ) ∈ L2 . We have to consider two different cases 4(a) and (b): with nq , mq = 0 (4a) At least one of the following conditions (1) and (2) is verified
np1 n1 , (1) mp1 m1 .
(2)
np2 n2 , mp2 m2 .
\❡ If condition (1) is verified, then p1 \❡q1∗ ⊆ L1 for some q1 = (nq1 , mq1 ) with nq1 , mq1 = 0, nq1 1 (n, m), mq1 1 (n, m) and it suffices to set q = q1 . If, instead, condition (2) is \❡ verified, then p2 \❡q2∗ ⊆ L2 for some q2 = (nq2 , mq2 ) with nq2 , mq2 = 0, nq2 2 (n, m), mq2 2 (n, m) and it suffices to set q = q2 . (4b) If neither condition (1) nor condition (2) is verified, then, again, we have to consider two different subcases either np1 n1 , mp1 < m1 , np2 < n2 , mp2 m2 or np1 < n1 , mp1 m1 , np2 n2 , mp2 < m2 . We give the details only for the first subcase, since the other one can be handled in a similar way. So, in the first subcase, we have np1 = n−np2 n−np2 > n−n2 1 (m1 )+n2 −n2 = 1 (m1 ) > 1 (mp1 ) i.e., np1 > 1 (mp1 ) and mp2 = m − mp1 m − mp1 > m − m1 2 (n2 ) + m1 − m1 = 2 (n2 ) > 2 (np2 ) i.e., ❡ ❡ mp2 > 2 (np2 ). Therefore, p1 ❡q1∗ ⊆ L1 for q1 = (1 (mp1 ), mp1 ) and p2 ❡q2∗ ⊆ L2 for q2 = (np2 , 2 (np2 )). We set q = (nq , mq ) = (nq1 , mq2 ) = (1 (mp1 ), 2 (np2 )) \❡ and we will have p \❡q ∗ ⊆ L with nq , mq = 0, nq = 1 (mp1 ) 1 (m), mq = 2 (np2 ) 2 (n). m ❡ Case 5: We have L = L∗1 . We set (n, m) = max{ 1 (n, m), m 1 (n, m)1 (m)}, (n, m) m = max{m 1 (n, m) m−1 (n, m)1 (m), 1 (n, m)}, n = max{n1 , 1 (m1 )} and m = m1 . 1 Now, let p = (n, m) ∈ L, with n n, m m. If m = 0, then p ∈ L1 and we can apply the inductive hypothesis. If instead m = 0, then we have p = p1 ❡ · · · ❡pk with pi = (npi , mpi ) = (n, mpi ) ∈ L1 . Let us consider two different subcases 5(a) and (b). (5a) There exists some ™ ∈ {1, . . . , k} such that mpi m1 for every i = 1, . . . , ™ and mpi < m1 for every i = ™ + 1, . . . , k. Therefore, for every i = 1, . . . , ™, there exists qi = \❡ (nqi , mqi ) with nqi , mqi = 0, nqi 1 (npi , mpi ), mqi 1 (npi , mpi ), such that pi \❡qi∗ ⊆ L1 . Note that for i = 1, . . . , ™, we have nqi 1 (npi , mpi ) = 1 (n, mpi ) 1 (n, m), mqi 1 (npi , mpi ) = 1 (n, mpi ) 1 (n, m). Moreover, since for every i = ™ + 1, . . . 
, k, we have mpi < m1 , it follows that 1 (mpi ) < 1 (m1 ) n n = npi . So for every i = ❡ ™ + 1, . . . , k, there exists qi = (nqi , mqi ) = (1 (mqi ), mqi ) such that pi ❡qi∗ ⊆ L1 . We \❡ set q = (nq , mq ) = ( ki=1 nqi , ™i=1 (mqi kj =1,j =i nqj )). Then p \❡q ∗ ⊆ L, where
nq , mq = 0, with nq ™1 (n, m) 1 (mq™+1 ) . . . 1 (mqk ) m 1 (n, m)1 (m) and mq =
mq1 m−1 (n, m) 1 (m) + · · · + mq™ m−1 (n, m)1 (m) 1 (n, m)™ m−1 (n, m)1 (m) 1 1 1 m (n, m) (m) . m 1 (n, m) m−1 1 1 (5b) In this subcase, for every i=1, . . . , k, mpi <m1 . Therefore, as in case (5a), for ❡ every i=1, . . . , k, there exists qi =(nqi , mqi ) = (1 (mqi ), mqi ) such that pi ❡qi∗ ⊆L1 . k m m We set q=(nq , mq )=( i=1 nqi , m). Then nq , mq =0, nq 1 (m) m 1 (n, m)1 (m), m mq = m m 1 (n, m) m−1 (n, m)1 (m). 1 Case 6: This case is analogous to the previous one. \❡ Case 7: We have L = L∗1 . If there exists q = (nq , mq ) ∈ L with nq , mq = 0, we can \❡ set (n, m) = nq , (n, m) = mq , n = m = 0. Then for every p = (n, m) ∈ L = L∗1 we \❡ have p \❡q ∗ ⊆ L. If instead L ⊆ col (resp. L ⊆ row ), then we can set n = 0, m = 1 (resp. n = 1, m = 0) and condition (3) will be vacuously true. Note that in all the cases the choice of the functions , , , , and preserves their increase.
References [1] M. Anselmo, D. Giammarresi, M. Madonia, Regular expressions for two-dimensional languages over oneletter alphabet, in: C.S. Calude, E. Calude, M.J. Dinneen (Eds.), Proc. Development in Language Theory (DLT 04), Lecture Notes in Computer Science, Vol. 3340, Springer, Berlin, 2004, pp. 63–75. [2] J. Berstel, Transductions and Context-free Languages, Teubner, 1979. [3] M. Blum, C. Hewitt, Automata on a two-dimensional tape, IEEE Symp. on Switching and Automata Theory, 1967, pp. 155–160. [4] L. De Prophetis, S. Varricchio, Recognizability of rectangular pictures by Wang systems, J. Automat. Languages Combin. 2 (1997) 269–288. [5] D. Giammarresi, Two-dimensional languages and recognizable functions, in: G. Rozenberg, A. Salomaa (Eds.), Proc. Developments in Language Theory, Finland, 1993, World Scientific Publishing Co., Singapore, 1994. [6] D. Giammarresi, A. Restivo, Two-dimensional finite state recognizability, Fund. Inform. 25 (3,4) (1996) 399–422. [7] D. Giammarresi, A. Restivo, Two-dimensional languages, in: G. Rozenberg, et al. (Eds.), Handbook of Formal Languages, Vol. III, Springer, Berlin, 1997, pp. 215–268. [8] J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, MA, 1979. [9] K. Inoue, A. Nakamura, Some properties of two-dimensional on-line tessellation acceptors, Inform. Sci. 13 (1977) 95–121. [10] K. Inoue, A. Nakamura, Two-dimensional finite automata and unacceptable functions, Internat. J. Comput. Math. Sec. A 7 (1979) 207–213. [11] K. Inoue, I. Takanami, A characterization of recognizable picture languages, in: Proc. Second Internat. Colloq. on Parallel Image Processing, Lecture Notes in Computer Science, Vol. 654, Springer, Berlin, 1993. [12] K. Inoue, I. Takanami, A. Nakamura, A note on two-dimensional finite automata, Inform. Process. Lett. 7 (1) (1978) 49–52. [13] J. Kari, C. Moore, Rectangles and squares recognized by two-dimensional automata, In: J. Karhumaki, H. Maurer, G. 
Paun, G. Rozenberg (Eds.), Theory is Forever, Essays Dedicated to Arto Salomaa on the Occasion of his 70th Birthday, Lecture Notes in Computer Science, Vol. 3113, Springer, Berlin, 2004, pp. 134–144. [14] E.B. Kinber, Three-way automata on rectangular tapes over a one-letter alphabet, Inform. Sci. 35 (1985) 61–77. [15] O. Matz, Regular expressions and context-free grammars for picture languages, Proc. STACS’97, Lecture Notes in Computer Science, Vol. 1200, Springer, Berlin, 1997, pp. 283–294.
M. Anselmo et al. / Theoretical Computer Science 340 (2005) 408 – 431
431
[16] O. Matz, On piecewise testable, starfree, and recognizable picture languages, in: M. Nivat (Ed.), Foundations of Software Science and Computation Structures, Vol. 1378, Springer, Berlin, 1998. [17] M.O. Rabin, D. Scott, Finite automata and their decision problems, IBM J. Res. Dev. 3 (1959) 114–125. [18] D. Simplot, A characterization of recognizable picture languages by tilings by finite sets, Theoret. Comput. Sci. 218 (2) (1999) 297–323. [19] T. Wilke, Star-free picture expressions are strictly weaker than first-order logic, in: Proc. ICALP’97, Lecture Notes in Computer Science, Vol. 1256, Springer, Berlin, 1997, pp. 347–357.
Theoretical Computer Science 340 (2005) 432 – 442 www.elsevier.com/locate/tcs
Parsing with a finite dictionary Julien Clémenta , Jean-Pierre Duvalb,∗ , Giovanna Guaianab , Dominique Perrina , Giuseppina Rindonea a Institut Gaspard-Monge, Université de Marne-la-Vallée, France b LIFAR, Université de Rouen, France
Abstract We address the following issue: given a word w ∈ A∗ and a set of n nonempty words X, how does one determine efficiently whether w ∈ X ∗ or not? We discuss several methods, including an O(r × |w| + |X|) algorithm for this problem, where r ≤ n is the length of a longest suffix chain of X and |X| is the sum of the lengths of the words in X. We also consider the more general problem of providing all the decompositions of w in words of X. © 2005 Elsevier B.V. All rights reserved. Keywords: Finite automata; String matching
1. Introduction

The complexity of algorithms related to finite automata and regular expressions is well-known in general. In this article, we focus on a particular problem, namely the complexity of parsing a regular language of the form Y = X ∗ where X is a finite set of nonempty words. This type of language occurs often in applications, when X is a dictionary and Y is the set of texts obtained by arbitrary concatenations of strings from this dictionary. The time and space complexity can be an important issue in such applications, since the dictionaries used for natural languages can contain up to several million words.

∗ Corresponding author. Tel.: +33 235 146610; fax: +33 235 146763.
E-mail addresses: [email protected] (J. Clément), [email protected] (J.-P. Duval), [email protected] (G. Guaiana), [email protected] (D. Perrin), [email protected] (G. Rindone). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.030
As a consequence of general constructions from automata theory, any regular language can be parsed in time proportional to the product of the size of the regular expression and the length of the input word. This just amounts to simulating a nondeterministic automaton built in a standard way from the regular expression. Using a deterministic automaton produces a linear-time algorithm, but only after a determinization step which may itself be exponential. Our main result here is an algorithm allowing one to parse a regular language of the form X ∗ , with X finite, in time O(r × |w| + |X|), with r the length of a longest suffix chain in X, w the input word and |X| the sum of the lengths of the words in X (Sections 4 and 5). The quantity r, depending only on the set X, is upper bounded by Card(X). This algorithm allows one to get all the decompositions of the input word in words of X. We also discuss some further problems on the automata related to regular expressions of this type (Section 3). The motivation of our study is in the work of Schützenberger on this type of languages. He has shown that although the size of the automaton depends on the length of the words in X, several syntactic parameters depend only on the cardinality of X (see [14]). One of them is linked with the number of interpretations of a word in words of X, and is related to the problem considered here. A similar yet unsolved problem is the complexity of testing the unambiguity of the expression X ∗ , i.e. of testing whether X is a code. The standard algorithm [13] gives a quadratic complexity O(|X|^2 ), where |X| is the total length of the words of X. It was later lowered to O(Card(X) × |X|) by various authors [2,11,6,8]. However, it is not known whether there exists a linear algorithm (see [5]).
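The standard quadratic test for whether X is a code is usually attributed to Sardinas and Patterson. A direct (not complexity-tuned) rendering of that classical procedure is sketched below (Python; function names and the set-based representation are our choices): it iterates sets of "dangling suffixes" and reports a non-code exactly when the empty word appears:

```python
def quotient(A, B):
    """Left quotient A^{-1} B = { v : a v in B for some a in A }."""
    return {b[len(a):] for a in A for b in B if b.startswith(a)}

def is_code(X):
    """Sardinas-Patterson test: X* is unambiguous iff X is a code."""
    X = set(X)
    U = quotient(X, X) - {""}      # dangling suffixes after one overlap
    seen = set()
    while True:
        if "" in U:                # some word of X* has two factorizations
            return False
        fU = frozenset(U)
        if fU in seen or not U:    # no new dangling-suffix set: X is a code
            return True
        seen.add(fU)
        U = quotient(X, U) | quotient(U, X)

# {a, ab, bb} is a suffix code, hence a code; the five-word set used in
# the figures below is not: "ababb" = ab.abb = aba.bb.
assert is_code({"a", "ab", "bb"})
assert not is_code({"aa", "ab", "bb", "aba", "abb"})
```

Termination follows because every iterate U is a set of suffixes of words of X, of which there are finitely many.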
2. Preliminaries and notations

For a more complete description of automata and fundamentals of formal languages, the reader is referred to [7,9,12] and to [16] in particular for a recent overview on recognizable languages in free monoids. Let A be a finite alphabet. We denote by ε the empty word and by A∗ (resp. A+ ) the set of finite words (resp. nonempty finite words) on A. For a word w ∈ A∗ , we denote by |w| the length of w, by w[j ] for 0 ≤ j < |w| the letter of index j in w, and by w[j..k] = w[j ]w[j + 1] · · · w[k] the corresponding factor of w. For any decomposition w = uv with u, v ∈ A∗ we say that u and v are, respectively, a prefix and a suffix of w. The suffix v is said to be proper if v ≠ w. For a finite set X of words on A, we denote by Pref (X) and Suff (X) the sets of prefixes and suffixes of the words of X, respectively, by Card(X) the cardinality of X and by |X| the sum of the lengths of the words of X, that is |X| = Σx∈X |x|.
We denote a nondeterministic finite automaton over the alphabet A by A = (Q, δ, i, T ), where Q is the set of states, i ∈ Q is the initial state, T ⊆ Q is the set of terminal states and δ is the transition function. We write |A| for the number of states of A. We abbreviate by NFA a nondeterministic finite automaton and by DFA a deterministic finite automaton.
3. Using deterministic automata

Before presenting our algorithm, we examine what would be the classical approach to checking whether a word w is in X ∗ . A natural idea is to build an automaton for X ∗ . We consider the following process: (i) build a finite automaton for X, (ii) modify this automaton to accept X ∗ (doing so, we usually get an NFA), and (iii) finally get a DFA by a classical determinization procedure. An optional fourth step could be to minimize the resulting automaton.

Automata for a finite set of words X. First we consider three simple ways of building an automaton for a finite set of words X. (1) The “solar” automaton SX is obtained as follows: we build one automaton per word x ∈ X with |x| + 1 states and merge all the initial states (see Fig. 1). Note that this NFA is a tree with root i and that |SX | = |X| + 1. (2) The tree automaton TX (see Fig. 1): this is a tree which collects words sharing a common prefix. In terms of automata, the set of states corresponds to the set of prefixes and we have TX = (Q = Pref (X), δ, i = ε, T = X) with δ(p, a) = pa if p, pa ∈ Pref (X) and a ∈ A. This DFA has |TX | = Card(Pref (X)) states. (3) The minimal automaton MX (see Fig. 1). Given the set X, a more elaborate method is to build the minimal DFA MX recognizing X. For instance one can apply a minimization algorithm to the tree automaton TX in linear time with respect to |X| [10]. Of course MX is not necessarily a tree.

Automata for the language X ∗ . A straightforward way to build an NFA recognizing the language X ∗ from an automaton A = (Q, δ, i, T ) recognizing X is to add ε-transitions from each final state of T to the initial state i. We denote this automaton by star(A) (see Fig. 2). To save a little more space, we also merge all the terminal states without outgoing
Fig. 1. The solar NFA S_X (left), the tree DFA T_X (middle) and the minimal DFA M_X for X = {aa, ab, bb, aba, abb}.
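As an illustration, the tree DFA T_X can be sketched in a few lines. This is our own dictionary-based rendering, not code from the paper; the function names are ours.

```python
# Sketch: the tree (prefix) DFA T_X for a finite set of words X.
# States are the prefixes of the words of X, the initial state is the
# empty word, terminal states are the words of X themselves, and
# delta(p, a) = pa whenever pa is again a prefix of X.

def tree_automaton(X):
    states = {""} | {x[:i] for x in X for i in range(1, len(x) + 1)}
    delta = {(q[:-1], q[-1]): q for q in states if q}   # delta(p, a) = pa
    return states, delta, "", set(X)

def accepts(automaton, w):
    states, delta, init, finals = automaton
    p = init
    for a in w:
        if (p, a) not in delta:
            return False          # no transition: w is not a prefix of X
        p = delta[(p, a)]
    return p in finals
```

For X = {aa, ab, bb, aba, abb} of Fig. 1, this automaton has Card(Pref(X)) = 8 states.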
Fig. 2. The flower automaton merge(S_X) (left) and the NFA star(T_X) (right) for X = {aa, ab, bb, aba, abb}.
Fig. 3. An NFA for X* with the set X = A^k a + b (k = 5) of Example 1.
transitions with the initial state: this yields an automaton merge(A). Doing so with the solar automaton S_X, we obtain merge(S_X), the classical flower automaton of X* (see Fig. 2).

Applying the classical powerset construction to one of the previous automata accepting X*, we obtain a DFA for X*. Note that the determinization procedure gives the same result whether starting from star(S_X) or from star(T_X), due to their tree-like structures. The same is true with merge instead of star. In general, for an NFA A, the determinization procedure builds a DFA whose number of states is trivially bounded by 2^‖A‖. However, when we consider the particular case of X* with X finite, could an exponential blow-up really happen? The following example shows that the answer is positive.

Example 1. Let us consider X = A^k a + b with k > 0. It is easy to give an NFA for X* with k + 1 states (see Fig. 3). The determinization leads to a DFA with Θ(2^k) states.

Another question is whether we can relate the number of states of a DFA for X* to |X|. Until recently, it was thought that it could not exceed O(|X|^2), a bound which was shown to be reachable in [17], as stated in the following example.
Example 2. For an integer h > 1, take X = {a^(h−1), a^h}. The tree DFA T_X and the minimal DFA M_X are the same and have h + 1 = Θ(|X|) states. The minimal DFA for X* has Θ(|X|^2) states (see [17]).

Shallit showed in [15] with the following example that an exponential blow-up is also possible.

Example 3. Let h ≥ 3 and let X = {b} ∪ {a^i b a^(h−i−1) | 1 ≤ i ≤ h − 2} ∪ {b a^(h−2) b}. The minimal DFA accepting X* has exactly 3(h − 1)2^(h−3) + 2^(h−2) states [15]. Note that the size is exponential, of order Θ(h 2^h), whereas Card(X) = Θ(h) and |X| = Θ(h^2).

The problem of finding a tight upper bound for the number of states of the minimal DFA for X* in terms of the total length |X| is called by this author [15] the noncommutative Frobenius problem. The number of states of the minimal automata obtained for the family of sets used in Example 3 is Θ(h 2^h), where h = Θ(|X|^(1/2)). A priori, the upper bound for a DFA obtained by determinization of an NFA for X* with O(|X|) states is O(2^|X|). Experiments performed on the family of Example 3 show that the DFA obtained by determinization (before minimization) also has Θ(h 2^h) states, and not Θ(2^(h^2)). We do not know in general whether (i) it is possible that the minimal DFA for X* has Θ(2^|X|) states; (ii) it is possible that the DFA obtained by determinization has Θ(2^|X|) states.

Simulating the determinization process. A way to avoid the determinization step is to simulate the determinized automaton while parsing the word w. Given an NFA A accepting the language X* with X a finite set of words, this gives an algorithm of time complexity O(‖A‖ × |w|), and the space required to simulate the determinization process is O(‖A‖). Since the number of states of the NFA can be of order O(|X|), this approach gives a time complexity O(|X| × |w|) in the worst case. As an example of such a situation, we have the set X = {a^k b, a} with k > 0.
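The simulation of the determinized automaton can be sketched as follows. This is our own Python rendering (not the paper's code), using the flower automaton merge(S_X) as the underlying NFA.

```python
# Sketch: membership in X* by simulating the subset construction on the
# flower automaton merge(S_X), one letter at a time.  State 0 is the
# merged initial/terminal state; each word of X contributes a path of
# |x| - 1 fresh states whose last transition loops back to state 0.
from collections import defaultdict

def star_membership_by_simulation(X, w):
    trans = defaultdict(lambda: defaultdict(set))
    fresh = 1
    for x in X:
        cur = 0
        for a in x[:-1]:
            trans[cur][a].add(fresh)
            cur, fresh = fresh, fresh + 1
        trans[cur][x[-1]].add(0)   # reading the last letter closes a factor
    current = {0}                  # current state of the simulated DFA
    for a in w:
        nxt = set()
        for q in current:
            nxt |= trans[q][a]
        current = nxt
    return 0 in current            # w is in X* iff we can end at the root
```

The memory used is one subset of states, of size at most ‖merge(S_X)‖, which matches the space bound of the simulation discussed above.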
4. Using string matching machines

The methods discussed in the previous section do not lead to an optimal algorithm in O(|w|). Indeed, either we use a DFA and we face a computation which can be exponential in |X|, or we simulate the DFA and we obtain an algorithm in O(|X| × |w|). We now consider a different approach which leads to a lower complexity. Another advantage of the proposed approach is that it can solve a more general problem: indeed, we may be interested in obtaining the set of all decompositions of the input word over X. This cannot be achieved using a DFA accepting X* given, for instance, by the methods described in the previous section.
Let X = {x0 , . . . , xn−1 } be a set of n words of A+ . We present in this section an algorithm, using classic pattern matching techniques, which gives all the X-decompositions of w (the decompositions of w as concatenations of words of X). Then we derive a membership test for X∗ in O(Card(X) × |w|) time complexity. In the next section, we shall study a further improvement of this algorithm.
4.1. Decompositions

The following remark is the basis of our algorithm: an X-decomposition of w is always the extension of an X-decomposition of a prefix of w.

We consider the prefix w[0..i] of length i + 1 of w. The word w[0..i] admits an X-decomposition ending with a word x_ℓ if and only if w[0..i] = f x_ℓ for a word f in X*. In other terms, w[0..i] admits an X-decomposition ending with x_ℓ if and only if x_ℓ is a suffix of w[0..i] and w[0..i − |x_ℓ|] ∈ X*. We obtain all the X-decompositions of w[0..i] by examining all the words of X which are suffixes of w[0..i] and which extend a previous X-decomposition. Of course, when w[0..i] = w, we get all the X-decompositions of w.

So the idea of the algorithm is the following: build, for each word x_ℓ ∈ X, a deterministic automaton A_ℓ accepting the language A* x_ℓ, and use an auxiliary array D of size |w| such that D[i] = {ℓ ∈ [0..n − 1] | w[0..i] ∈ X* x_ℓ}. Then testing whether w[0..i] ends with the word x_ℓ is equivalent to checking that the automaton A_ℓ is in a terminal state after reading w[0..i]. Also, testing whether w[0..i − |x_ℓ|] ∈ X* is equivalent to checking that D[i − |x_ℓ|] ≠ ∅.

In the following algorithm, the input word w is read simultaneously by all the n automata, letter by letter, from left to right. We use, for technical convenience, an additional element D[−1] initialized to an arbitrary nonempty set (for instance {∞}), meaning that the prefix ε of w is always in X*. At the end of the scanning of w, provided D[|w| − 1] ≠ ∅, we can process the array D from the end to the beginning and recover all the X-decompositions, for instance with a recursive procedure like PRINTALLDECOMPOSITIONS() (see below).

For each word x_ℓ ∈ X, the automaton A_ℓ considered here is the minimal automaton which recognizes the language A* x_ℓ. This automaton is defined by A_ℓ = (Q_ℓ = Pref(x_ℓ), δ_ℓ, i_ℓ = ε, t_ℓ = x_ℓ), where the transition function δ_ℓ is defined, for p ∈ Pref(x_ℓ) and a ∈ A, by
δ_ℓ(p, a) = the longest suffix of pa which belongs to Pref(x_ℓ).

We use these principles in the following algorithm.
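The automaton A_ℓ can be built directly from this definition of δ_ℓ. The sketch below is our own Python (not from the paper); it computes the longest suffix naively by dropping letters from the left, rather than with the usual failure-function construction.

```python
# Sketch: the minimal DFA A_l recognizing A* x_l for a single word x_l.
# States are the prefixes of x_l and
#   delta(p, a) = the longest suffix of pa that is a prefix of x_l.

def single_word_automaton(x, alphabet):
    prefs = [x[:i] for i in range(len(x) + 1)]
    pref_set = set(prefs)
    delta = {}
    for p in prefs:
        for a in alphabet:
            q = p + a
            while q not in pref_set:   # drop letters from the left
                q = q[1:]              # first hit = longest such suffix
            delta[(p, a)] = q
    return delta, "", x                # transitions, initial, terminal

def ends_with(w, x, alphabet):
    # w[0..i] ends with x iff A_l is in its terminal state after reading w.
    delta, state, t = single_word_automaton(x, alphabet)
    for a in w:
        state = delta[(state, a)]
    return state == t
```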
ISDECOMPOSEDALL(w, X = {x_0, ..., x_{n−1}})
 1  Preprocessing step
 2  for ℓ ← 0 to n − 1 do
 3      A_ℓ ← AUTOMATONFROMWORD(x_ℓ)
 4  Main loop
 5  for ℓ ← 0 to n − 1 do
 6      p_ℓ is the current state of the automaton A_ℓ
 7      p_ℓ ← i_ℓ
 8  D[−1] ← {∞}
 9  for i ← 0 to |w| − 1 do
10      D[i] ← ∅
11      for ℓ ← 0 to n − 1 do
12          p_ℓ ← δ_ℓ(p_ℓ, w[i])
13          if p_ℓ = t_ℓ and D[i − |x_ℓ|] ≠ ∅ then
14              D[i] ← D[i] ∪ {ℓ}
15  return D

The algorithm returns an array of size O(Card(X) × |w|). The preprocessing step which builds the automata requires time O(|X|) and space O(|X| × Card(A)) (or O(|X|) if the automata are represented with the help of a failure function, as is usual in stringology [4,3]).

Note that we do not need to build all the automata A_ℓ in the preprocessing step. We can also choose to construct, in a lazy way, the accessible part of the automata (corresponding, for each automaton A_ℓ, to the prefixes of x_ℓ occurring in w) along the processing of the input word w. For the sake of clarity, we have chosen to distinguish the preprocessing step from the rest. In view of this remark, we could omit the complexity O(|X|) of the preprocessing step in the following proposition.

Proposition 4. The time and space complexity of the algorithm ISDECOMPOSEDALL() is O(Card(X) × |w| + |X|).

Given the array D computed by the procedure ISDECOMPOSEDALL() for a word w, it is quite straightforward to print all the decompositions of w thanks to the following two procedures:

PRINTALLDECOMPOSITIONS(w, X = {x_0, ..., x_{n−1}})
 1  D ← ISDECOMPOSEDALL(w, X)
 2  L ← emptyList
 3  RECPRINTALLDECOMPOSITIONS(D, |w| − 1, L)

RECPRINTALLDECOMPOSITIONS(D, h, L)
 1  if h = −1 then
 2      PRINT(L)
 3  else for j ∈ D[h] do
 4      RECPRINTALLDECOMPOSITIONS(D, h − |x_j|, x_j · L)
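A possible Python transcription of these procedures is sketched below (our own code: the automata A_ℓ are replaced by direct suffix comparisons, which keeps the role of the array D intact but costs a factor |x_ℓ| per test; D is shifted by one so that D[0] stands for the paper's D[−1]).

```python
# Sketch of IsDecomposedAll and the recursive printing procedure.

def is_decomposed_all(w, X):
    D = [set() for _ in range(len(w) + 1)]
    D[0] = {float("inf")}          # the empty prefix is always in X*
    for i in range(1, len(w) + 1):
        for l, x in enumerate(X):
            # x_l ends w[0..i-1] and the remaining prefix is in X*
            if i >= len(x) and w[i - len(x):i] == x and D[i - len(x)]:
                D[i].add(l)
    return D

def rec_print_all(D, X, h, L, out):
    if h == 0:
        out.append(list(L))        # a complete X-decomposition
    else:
        for j in sorted(D[h]):
            rec_print_all(D, X, h - len(X[j]), [X[j]] + L, out)

def all_decompositions(w, X):
    D = is_decomposed_all(w, X)
    out = []
    if D[len(w)]:
        rec_print_all(D, X, len(w), [], out)
    return out
```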
For a word w belonging to X*, the procedure PRINTALLDECOMPOSITIONS() prints every X-decomposition of w in the form x_{i_0} · x_{i_1} · · · x_{i_p}. If we want only one X-decomposition of w, it suffices to store in D[i] only one word x of X corresponding to an X-decomposition of w[0..i] ending with this x. The space required for the array then becomes O(|w|).

4.2. Membership test

When we are interested only in testing the membership of w in X*, we can simply use a Boolean array D, setting D[i] = true if and only if there exists x ∈ X such that w[0..i] ∈ X* x. Moreover, it suffices to use a circular Boolean array D[0..k] with k = max_{x∈X} |x| (instead of |w| + 1), and to compute indexes in this array modulo k + 1 (which means that, for m ∈ Z, one has D[m] = D[r] with 0 ≤ r < k + 1 and r ≡ m (mod k + 1)). This yields the following algorithm.

MEMBERSHIP(w, X = {x_0, ..., x_{n−1}})
 1  Preprocessing step
 2  for ℓ ← 0 to n − 1 do
 3      A_ℓ ← AUTOMATONFROMWORD(x_ℓ)
 4  Main loop
 5  for ℓ ← 0 to n − 1 do
 6      p_ℓ is the current state of the automaton A_ℓ
 7      p_ℓ ← i_ℓ
 8  D[−1] ← true
 9  for i ← 0 to |w| − 1 do
10      D[i] ← false
11      for ℓ ← 0 to n − 1 do
12          p_ℓ ← δ_ℓ(p_ℓ, w[i])
13      ℓ ← 0
14      do  if p_ℓ = t_ℓ and D[i − |x_ℓ|] = true then
15              D[i] ← true
16          ℓ ← ℓ + 1
17      while ℓ < n and D[i] = false
18  return D[|w| − 1]

We can easily modify the algorithm, while preserving the same complexity, by exiting whenever all the elements of the array D from 0 to k are false. In this case, w ∉ X*. The following proposition gives the complexity of the above algorithm.

Proposition 5. The time complexity of the algorithm MEMBERSHIP() is O(Card(X) × |w| + |X|).

The analysis of the space complexity shows that, except for the preprocessing step, the algorithm needs only O(max_{x∈X} |x|) additional space. In particular, the space complexity is independent of the length of the input word.
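The circular-array variant might look as follows in Python (again our own sketch, with direct suffix checks standing in for the automata; indices are taken modulo k + 1, and the entry for the empty prefix plays the role of D[−1]).

```python
# Sketch of the Membership test with a circular Boolean array of size
# k + 1, where k = max |x|.  The entry at position i mod (k+1) records
# whether the prefix of length i of w belongs to X*; old entries are
# overwritten once they can no longer be consulted.

def membership(w, X):
    k = max(len(x) for x in X)
    D = [False] * (k + 1)
    D[0] = True                    # the empty prefix is in X*
    for i in range(1, len(w) + 1):
        D[i % (k + 1)] = False
        for x in X:
            if i >= len(x) and w[i - len(x):i] == x and D[(i - len(x)) % (k + 1)]:
                D[i % (k + 1)] = True
                break              # one witness suffices
    return D[len(w) % (k + 1)]
```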
5. String matching automaton

In the preceding section, we used for each word x_ℓ ∈ X a distinct automaton A_ℓ recognizing A* x_ℓ. To get a more efficient algorithm, we resort in this section to the well-known Aho–Corasick algorithm [1], which builds from a finite set of words X a deterministic complete automaton (not necessarily minimal) A_X recognizing the language A*X. This automaton is the basis of many efficient algorithms on string matching problems and is often called the string matching automaton. It is a generalization of the automaton A_ℓ associated to a single word. Let us briefly recall its construction. We let A_X = (Pref(X), δ, ε, Pref(X) ∩ A*X) be the automaton where the set of states is Pref(X), the initial state is ε, the set of final states is Pref(X) ∩ A*X, and the transition function δ is defined by
δ(p, a) = the longest suffix of pa which belongs to Pref(X).

We associate to each word u ∈ A*, u ≠ ε, the word Border_X(u), or simply Border(u) when there is no ambiguity, defined by

Border(u) = the longest proper suffix of u which belongs to Pref(X).

The automaton A_X can easily be built from the tree T_X (cf. Section 3) of X by a breadth-first exploration using the Border function. Indeed, one has

δ(p, a) = pa                  if pa ∈ Pref(X),
          δ(Border(p), a)     if p ≠ ε and pa ∉ Pref(X),
          ε                   otherwise.

A state p is terminal for A_X if p is a word of X (i.e. p is terminal in the tree T_X of X) or if a proper suffix of p is a word of X. The automaton A_X can be built in time and space complexity O(|X|) if we use the function Border as a failure function (see [4,3] for implementation details). We will say, for simplicity, that a state of the automaton is marked if it corresponds to a word of X, and not marked otherwise.

A major difference induced by the Aho–Corasick automaton is that a terminal state p, marked or not, corresponds to an ordered set Suff(p) ∩ X of suffixes of p. The order considered is the suffix relation: u precedes v when v is a proper suffix of u. We denote by SuffixChain(p) the sequence of words in Suff(p) ∩ X ordered by this relation. To find easily the words of SuffixChain(p), we associate to each terminal state p of A_X the state

SuffixLink(p) = the longest proper suffix of p which belongs to X.

Thus we have
SuffixLink(p) = Border(p)               if Border(p) ∈ X,
                SuffixLink(Border(p))   if Border(p) ∉ X and Border(p) ≠ ε,
                undefined               otherwise.
Since SuffixLink(p) is computed in time O(|p|), the preprocessing can be done in time and space complexity O(|X|), i.e. the complexity of the Aho–Corasick algorithm.
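The whole test can be sketched in Python as follows. This is our own code, not the paper's: Border is computed naively from the set of prefixes instead of by the linear-time breadth-first construction, and suffix_chain enumerates SuffixChain(p) by following Border links.

```python
# Sketch of the Aho-Corasick-based membership test for X*.

def ac_membership(w, X):
    Xs = set(X)
    prefs = {""} | {x[:i] for x in Xs for i in range(1, len(x) + 1)}

    def border(p):                 # longest proper suffix of p in Pref(X)
        for j in range(1, len(p) + 1):
            if p[j:] in prefs:
                return p[j:]
        return ""

    def delta(p, a):               # transition of the automaton A_X
        while True:
            if p + a in prefs:
                return p + a
            if p == "":
                return ""
            p = border(p)

    def suffix_chain(p):           # the words of X that are suffixes of p
        while p:
            if p in Xs:
                yield p
            p = border(p)

    D = [False] * (len(w) + 1)     # D[i]: prefix of length i is in X*
    D[0] = True
    state = ""
    for i, a in enumerate(w, 1):
        state = delta(state, a)
        D[i] = any(D[i - len(x)] for x in suffix_chain(state))
    return D[len(w)]
```

For the set X = {a^2, a^4 b, a^3 ba, a^2 b, ab} and w = a^5 b of Example 7 below, the test follows the suffix chain of a^4 b and answers true.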
Fig. 4. For the set X = {a^2, a^4 b, a^3 ba, a^2 b, ab} of Example 7: tree T_X (left), Aho–Corasick automaton with links Border (middle) and the new links SuffixLink (right) to add to the Aho–Corasick automaton.
To decide whether an input word w belongs to X* or not (and possibly get its X-decompositions), we use the same technique as in the previous section, considering this time the automaton A_X (instead of the n automata A_ℓ). The immediate advantage is that each letter of the word w is read only once (meaning that only one transition is made in the automaton), whereas each letter was read n times before (once per automaton A_ℓ).

Let us suppose that, for the current prefix w[0..i] of w, the automaton A_X ends in a terminal state p. This means that w[0..i] = fp with f ∈ A* and p the longest suffix of w[0..i] in Pref(X) ∩ A*X. Consequently, w[0..i] ∈ X* if and only if w[0..i − |x|] ∈ X* for at least one word x of SuffixChain(p). This is easily checked using the marking of terminal states (whether they correspond exactly to a word of X or not), the function SuffixLink(p) and the array D (which plays exactly the same role as in the previous section). This yields our main result, stated in the following proposition.

Proposition 6. Let X be a finite set of words on A. The membership test of a word w in X* can be done in time O(r × |w| + |X|), where r is the maximal length of the suffix chains in X. The space complexity includes O(|X|) for the preprocessing step (building the Aho–Corasick automaton) and O(max_{x∈X} |x|) for the rest of the algorithm.

If X is a suffix code, the complexity, except for the preprocessing step, becomes O(|w|), which is optimal, whereas the worst case happens when all words are suffixes of one another, giving the same complexity O(Card(X) × |w|) as in the previous section. Note also that, in the particular case where X is a prefix code, it is easy to solve the membership problem for X* in an optimal time O(|w|) after an O(|X|) preprocessing step.

Example 7. Let X = {a^2, a^4 b, a^3 ba, a^2 b, ab}.
For the word w = a^5 b, it is necessary to follow the suffix chain SuffixChain(a^4 b) = (a^4 b, a^2 b, ab), since after parsing w the automaton is in the state corresponding to a^4 b, and the unique X-decomposition is a^5 b = a^2 · a^2 · ab. Fig. 4 shows the tree T_X (left), the automaton A_X with the links representing the
failure function Border (middle) and the links SuffixLink representing the suffix chains (right) to add to the Aho–Corasick automaton.

Acknowledgements

We thank the referee for pointing out to us the reference to Shallit [15] used in Example 3. The style for algorithms is algochl.sty from [3], and the automata are drawn with GasTeX.

References

[1] A.V. Aho, M.J. Corasick, Efficient string matching: an aid to bibliographic search, Commun. ACM 18 (6) (1975) 333–340.
[2] A. Apostolico, R. Giancarlo, Pattern matching implementation of a fast test for unique decipherability, Inform. Process. Lett. 18 (1984) 155–158.
[3] M. Crochemore, C. Hancart, T. Lecroq, Algorithmique du texte, Vuibert, 2001, 347pp.
[4] M. Crochemore, W. Rytter, Jewels of Stringology, World Scientific, Hong Kong, 2002, 310pp.
[5] Z. Galil, Open problems in stringology, in: A. Apostolico, Z. Galil (Eds.), Combinatorial Algorithms on Words, Springer, Berlin, 1985, pp. 1–8.
[6] C.M. Hoffmann, A note on unique decipherability, in: MFCS, Lecture Notes in Computer Science, Vol. 176, Springer, Berlin, New York, 1984, pp. 50–63.
[7] J. Hopcroft, R. Motwani, J. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, MA, 2001.
[8] R. McCloskey, An O(n^2) time algorithm for deciding whether a regular language is a code, J. Comput. Inform. 2 (1) (1996) 79–89. Special issue: Proc. Eighth Internat. Conf. on Computing and Information, ICCI'96.
[9] D. Perrin, Finite automata, in: J. van Leeuwen (Ed.), Handbook of Theoretical Computer Science, Vol. B: Formal Models and Semantics, Elsevier, Amsterdam, 1990, pp. 1–57.
[10] D. Revuz, Minimisation of acyclic deterministic automata in linear time, Theoret. Comput. Sci. 92 (1) (1992) 181–189.
[11] M. Rodeh, A fast test for unique decipherability based on suffix trees, IEEE Trans. Inform. Theory 28 (1982) 648–651.
[12] J. Sakarovitch, Eléments de théorie des automates, Vuibert, 2003.
[13] A. Sardinas, G.
Patterson, A necessary and sufficient condition for the unique decomposition of coded messages, in: IRE Convention Record, Part 8, 1953, pp. 104–108.
[14] M.-P. Schützenberger, A property of finitely generated submonoids of free monoids, in: G. Pollak (Ed.), Algebraic Theory of Semigroups, Proc. Sixth Algebraic Conf., Szeged, 1976, North-Holland, Amsterdam, 1979, pp. 545–576.
[15] J. Shallit, Regular expressions, enumeration and state complexity, invited talk at the Ninth Internat. Conf. on Implementation and Application of Automata (CIAA 2004), Queen's University, Kingston, Ontario, Canada, July 22–24, 2004.
[16] S. Yu, Regular languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Springer, Berlin, New York, 1997, pp. 41–110.
[17] S. Yu, State complexity of regular languages, in: Proc. Descriptional Complexity of Automata, Grammars and Related Structures, 1999, pp. 77–88.
Theoretical Computer Science 340 (2005) 443 – 456 www.elsevier.com/locate/tcs
A topological approach to transductions

Jean-Éric Pin^a,*, Pedro V. Silva^b

^a LIAFA, Université Paris VII and CNRS, Case 7014, 2 Place Jussieu, 75251 Paris Cedex 05, France
^b Centro de Matemática, Faculdade de Ciências, Universidade do Porto, R. Campo Alegre 687, 4169-007 Porto, Portugal
Abstract

This paper is a contribution to the mathematical foundations of the theory of automata. We give a topological characterization of the transductions τ from a monoid M into a monoid N such that, if R is a recognizable subset of N, τ^{-1}(R) is a recognizable subset of M. We impose two conditions on the monoids, which are fulfilled in all cases of practical interest: the monoids must be residually finite and, for every positive integer n, must have only finitely many congruences of index n. Our solution proceeds in two steps. First we show that such a monoid, equipped with the so-called Hall distance, is a metric space whose completion is compact. Next we prove that τ can be lifted to a map τ̂ from M into the set of compact subsets of the completion of N. This latter set, equipped with the Hausdorff metric, is again a compact monoid. Finally, our main result states that τ^{-1} preserves recognizable sets if and only if τ̂ is continuous.
© 2005 Elsevier B.V. All rights reserved.
1. Introduction

This paper is a contribution to the mathematical foundations of automata theory. We are mostly interested in the study of transductions τ from a monoid M into another monoid N such that, for every recognizable subset R of N, τ^{-1}(R) is a recognizable subset of M. We propose to call such transductions continuous, a term introduced in [7] in the case where M is a finitely generated free monoid.
∗ Corresponding author.
E-mail addresses: [email protected] (J.-É. Pin), [email protected] (P.V. Silva). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.029
J.-É. Pin, P.V. Silva / Theoretical Computer Science 340 (2005) 443 – 456
In mathematics, the word "continuous" generally refers to a topology. The aim of this paper is to find appropriate topologies for which our use of the term continuous coincides with its usual topological meaning. This problem was already solved when τ is a mapping from A* into B*. In this case, a result which goes back at least to the 1980s (see [14]) states that τ is continuous in our sense if and only if it is continuous for the profinite topology on A* and B*. We shall not attempt to define here the profinite topology, and the reader is referred to [3,4,21] for more details. This result actually extends to mappings from A* into a residually finite monoid N, thanks to a result of Berstel et al. [7] recalled below (Proposition 2.3).

However, a transduction τ: M → N is not a map from M into N, but a map from M into the set of subsets of N, which calls for a more sophisticated solution, since it does not suffice to find an appropriate topology on N. Our solution proceeds in two steps. We first show, under fairly general assumptions on M and N, which are fulfilled in all cases of practical interest, that M and N can be equipped with a metric, the Hall metric, for which they become metric monoids whose completion (as metric spaces) is compact. Next we prove that τ can be lifted to a map τ̂ from M into the monoid K(N̂) of compact subsets of N̂, the completion of N. The monoid K(N̂), equipped with the Hausdorff metric, is again a compact monoid. Finally, our main result states that τ is continuous in our sense if and only if τ̂ is continuous in the topological sense.

Our paper is organised as follows. Basic results on recognizable sets and transductions are recalled in Section 2. Section 3 is devoted to topology and is divided into several subsections: Section 3.1 is a reminder of basic notions in topology; metric monoids and the Hall metric are introduced in Sections 3.2 and 3.3, respectively. The connections between clopen and recognizable sets are discussed in Section 3.5, and Section 3.6 deals with the monoid of compact subsets of a compact monoid. Our main result on transductions is presented in Section 4. Examples like the transductions (x, n) ↦ x^n and x ↦ x^* are studied in Section 5. The paper ends with a short conclusion.
2. Recognizable languages and transductions

Recall that a subset P of a monoid M is recognizable if there exist a finite monoid F, a monoid morphism φ: M → F and a subset Q of F such that P = φ^{-1}(Q). The set of recognizable subsets of M is denoted by Rec(M). Recognizable subsets are closed under Boolean operations, quotients and inverse morphisms. By Kleene's theorem, a subset of a finitely generated free monoid is recognizable if and only if it is rational. The description of the recognizable subsets of a product of monoids was given by Mezei (see [5, p. 54] for a proof).

Theorem 2.1 (Mezei). Let M_1, ..., M_n be monoids. A subset of M_1 × · · · × M_n is recognizable if and only if it is a finite union of subsets of the form R_1 × · · · × R_n, where R_i ∈ Rec(M_i).

The following result is perhaps less known (see [5, p. 61]).
Proposition 2.2. Let A_1, ..., A_n be finite alphabets. Then Rec(A_1^* × A_2^* × · · · × A_n^*) is closed under concatenation product.

Given two monoids M and N, recall that a transduction τ from M into N is a relation on M and N, which we shall also consider as a map from M into the monoid of subsets of N. If X is a subset of M, we set

τ(X) = ⋃_{x∈X} τ(x).

Observe that "transductions commute with union": if (X_i)_{i∈I} is a family of subsets of M, then

τ(⋃_{i∈I} X_i) = ⋃_{i∈I} τ(X_i).
If τ: M → N is a transduction, then the inverse relation τ^{-1}: N → M is also a transduction, and if P is a subset of N, the following formula holds:

τ^{-1}(P) = {x ∈ M | τ(x) ∩ P ≠ ∅}.

A transduction τ: M → N preserves recognizable sets if, for every set R ∈ Rec(M), τ(R) ∈ Rec(N). It is said to be continuous if τ^{-1} preserves recognizable sets, that is, if for every set R ∈ Rec(N), τ^{-1}(R) ∈ Rec(M). Continuous transductions were characterized in [7] when M is a finitely generated free monoid. Recall that a transduction τ: M → N is rational if it is a rational subset of M × N. According to Berstel et al. [7], a transduction τ: A* → N is residually rational if, for any morphism φ: N → F, where F is a finite monoid, the transduction φ ∘ τ: A* → F is rational. We can now state:

Proposition 2.3 (Berstel et al. [7]). A transduction τ: A* → N is continuous if and only if it is residually rational.

3. Topology

The aim of this section is to give a topological characterization of the transductions τ from a monoid into another monoid such that τ^{-1} preserves recognizable sets. Even if topology is undoubtedly part of the background of the average mathematician, it is probably not a daily concern of the specialists in automata theory to whom this paper is addressed. For those readers whose memories of topology might be somewhat blurry, we start with a brief overview of some key concepts in topology used in this paper.

3.1. Basic notions in topology

A metric d on a set E is a map from E into the set of nonnegative real numbers satisfying the three following conditions, for all (x, y, z) ∈ E^3:
(1) d(x, y) = 0 if and only if x = y,
(2) d(y, x) = d(x, y),
(3) d(x, z) ≤ d(x, y) + d(y, z).
A metric is an ultrametric if (3) is replaced by the stronger condition
(3′) d(x, z) ≤ max{d(x, y), d(y, z)}.
A metric space is a set E together with a metric d on E. Given a positive real number ε and an element x in E, the open ball of center x and radius ε is the set B(x, ε) = {y ∈ E | d(x, y) < ε}.

A function φ from a metric space (E, d) into another metric space (E′, d′) is uniformly continuous if, for every ε > 0, there exists δ > 0 such that, for all (x, x′) ∈ E^2, d(x, x′) < δ implies d′(φ(x), φ(x′)) < ε. It is an isometry if, for all (x, x′) ∈ E^2, d′(φ(x), φ(x′)) = d(x, x′).

A sequence (x_n)_{n≥0} of elements of E converges to a limit x ∈ E if, for every ε > 0, there exists N such that, for all integers n > N, d(x_n, x) < ε. It is a Cauchy sequence if, for every positive real number ε > 0, there is an integer N such that, for all integers p, q ≥ N, d(x_p, x_q) < ε. A metric space E is said to be complete if every Cauchy sequence of elements of E converges to a limit. For any metric space E, one can construct a complete metric space Ê containing E as a dense subspace^1 and satisfying the following universal property: if F is any complete metric space and φ is any uniformly continuous function from E to F, then there exists a unique uniformly continuous function φ̂: Ê → F which extends φ. The space Ê is determined up to isometry by this property, and is called the completion of E.

Metric spaces are a special instance of the more general notion of topological space. A topology on a set E is a set T of subsets of E, called the open sets of the topology, satisfying the following conditions:
(1) ∅ and E are in T,
(2) T is closed under arbitrary union,
(3) T is closed under finite intersection.
The complement of an open set is called a closed set. The closure of a subset X of E, denoted by X̄, is the intersection of the closed sets containing X. A subset of E is dense if its closure is equal to E.
A topological space is a set E together with a topology on E. A map from a topological space into another one is continuous if the inverse image of each open set is an open set. A basis for a topology on E is a collection B of open subsets of E such that every open set is the union of elements of B. The open sets of the topology generated by B are by definition the arbitrary unions of elements of B. In the case of a metric space, the open balls form a basis of the topology.

A topological space (E, T) is Hausdorff if for each u, v ∈ E with u ≠ v, there exist disjoint open sets U and V such that u ∈ U and v ∈ V. A family of open sets (U_i)_{i∈I} is said to cover a topological space (E, T) if E = ⋃_{i∈I} U_i. A topological space (E, T) is said to be compact if it is Hausdorff and if, for each family of open sets covering E, there exists a finite subfamily that still covers E. To conclude this section, we remind the reader of a classical result on compact sets.

^1 See definition below.
Proposition 3.1. Let T and T′ be two topologies on a set E. Suppose that (E, T) is compact and that (E, T′) is Hausdorff. If T′ ⊆ T, then T = T′.

Proof. Consider the identity map ι from (E, T) into (E, T′). It is a continuous map, since T′ ⊆ T. Therefore, if F is closed in (E, T), it is compact, and its continuous image ι(F) in the Hausdorff space (E, T′) is also compact, and hence closed. Thus ι^{-1} is also continuous, whence T = T′.

3.2. Metric monoids

Let M be a monoid. A monoid morphism φ: M → N separates two elements u and v of M if φ(u) ≠ φ(v). By extension, we say that a monoid N separates two elements of M if there exists a morphism φ: M → N which separates them. A monoid is residually finite if any pair of distinct elements of M can be separated by a finite monoid. Residually finite monoids include finite monoids, free monoids, free groups and many others. They are closed under direct products, and thus monoids of the form A_1^* × A_2^* × · · · × A_n^* are also residually finite.

A metric monoid is a monoid equipped with a metric for which its multiplication is uniformly continuous. Finite monoids, equipped with the discrete metric, are examples of metric monoids. More precisely, if M is a finite monoid, the discrete metric d is defined by d(s, t) = 0 if s = t and d(s, t) = 1 otherwise. In the sequel, we shall systematically consider finite monoids as metric monoids. Morphisms between metric monoids are required to be uniformly continuous.

3.3. Hall metric

Any residually finite monoid M can be equipped with the Hall metric d, defined as follows. We first set, for all (u, v) ∈ M^2:

r(u, v) = min{Card(N) | N separates u and v}.

Then we set d(u, v) = 2^{−r(u,v)}, with the usual conventions min ∅ = +∞ and 2^{−∞} = 0. Let us first establish some general properties of d.

Proposition 3.2. In a residually finite monoid M, d is an ultrametric. Furthermore, the relations d(uw, vw) ≤ d(u, v) and d(wu, wv) ≤ d(u, v) hold for every (u, v, w) ∈ M^3.

Proof. It is clear that d(u, v) = d(v, u).
Suppose that d(u, v) = 0. Then u cannot be separated from v by any finite monoid, and since M is residually finite, this shows that u = v. Finally, let (u, v, w) ∈ M^3. First assume that u ≠ w. Since M is residually finite, u and w can be separated by some finite monoid F. Therefore F separates either u and v, or v and w. It follows that min{r(u, v), r(v, w)} ≤ r(u, w), and hence d(u, w) ≤ max{d(u, v), d(v, w)}. This relation clearly also holds if u = w.
The second assertion is trivial: a finite monoid separating uw and vw certainly separates u and v. Therefore d(uw, vw) ≤ d(u, v) and, dually, d(wu, wv) ≤ d(u, v).

The next two propositions state two fundamental properties of the Hall metric.

Proposition 3.3. Multiplication on M is uniformly continuous for the Hall metric. Thus (M, d) is a metric monoid.

Proof. It is a consequence of the following relation

d(uv, u′v′) ≤ max{d(uv, uv′), d(uv′, u′v′)} ≤ max{d(v, v′), d(u, u′)}

which follows from Proposition 3.2.

Proposition 3.4. Let M be a residually finite monoid. Then any morphism from (M, d) onto a finite discrete monoid is uniformly continuous.

Proof. Let φ be a morphism from M onto a finite monoid F. Then, by definition of d, d(u, v) < 2^{−|F|} implies φ(u) = φ(v). Thus φ is uniformly continuous.

The completion of the metric space (M, d), denoted by (M̂, d̂), is called the Hall completion of M. Since multiplication on M is uniformly continuous, it extends, in a unique way, into a multiplication on M̂, which is again uniformly continuous. In particular, M̂ is a metric, complete monoid. Similarly, Proposition 3.4 extends to M̂: any morphism from (M̂, d̂) onto a finite discrete monoid is uniformly continuous.

We now characterize the residually finite monoids M such that M̂ is compact.

Proposition 3.5. Let M be a residually finite monoid. Then M̂ is compact if and only if, for every positive integer n, there are only finitely many congruences of index n on M.
Then r(x, y) > n and thus the monoids of size ≤ n cannot separate x from y. It follows that x ∼ y for each ∼ ∈ C_n and thus x ∼_n y. Therefore ∼_n is a congruence of finite index, whose index is at most |F|. Now each congruence of C_n is coarser than ∼_n and, since there are only finitely many congruences coarser than ∼_n, C_n is finite.

Conversely, assume that, for every positive integer n, there are only finitely many congruences of index ≤ n on M. Given ε > 0, let n be an integer such that 2^{-n} < ε. Since C_n is finite, ∼_n is a congruence of finite index on M. Let F be a finite set of representatives of the classes of ∼_n. If x ∈ F and x ∼_n y, then φ(x) = φ(y) for each morphism φ from M onto a monoid of size ≤ n. Thus r(x, y) > n and so d(x, y) < 2^{-n} < ε. It follows that M is covered by a finite number of open balls of radius ε. Therefore M̂ is compact. □

An important sufficient condition is given in the following corollary.
Corollary 3.6. Let M be a residually finite monoid. If M is finitely generated, then M̂ is compact.

Proof. Let n > 0. There are only finitely many monoids of size n. Since M is finitely generated, there are only finitely many morphisms from M onto a monoid of size n. Now, since any congruence of index n is the kernel of such a morphism, there are only finitely many congruences on M of index n. It follows by Proposition 3.5 that M̂ is compact. □

3.4. Hall-compact monoids

Proposition 3.5 justifies the following terminology. We will say that a monoid M is Hall-compact if it is residually finite and if, for every positive integer n, there are only finitely many congruences of index n on M. Proposition 3.5 can now be rephrased as follows:

"A residually finite monoid M is Hall-compact if and only if M̂ is compact."

and Corollary 3.6 states that

"Every residually finite and finitely generated monoid is Hall-compact."

The class of Hall-compact monoids includes most of the examples used in practice: finitely generated free monoids (resp. groups), finitely generated free commutative monoids (resp. groups), finite monoids, trace monoids, finite products of such monoids, etc. The next proposition shows that the converse to Corollary 3.6 does not hold.

Proposition 3.7. There exists a residually finite, nonfinitely generated monoid M such that M̂ is compact.

Proof. Let P be the set of all prime numbers and let M = ⊕_{p∈P} Z/pZ, where Z/pZ denotes the additive cyclic group of order p. It is clear that M is residually finite. Furthermore, in a finitely generated commutative group, the subgroup consisting of all elements of finite period is finite [12]. It follows that M is not finitely generated. Let n > 0 and let φ: M → N be a morphism from M onto a finite monoid of size n. Since M is a commutative group, N is also a commutative group. For every prime p > n, the order of the image of a generator of Z/pZ must divide p and be ≤ n; hence the image of this generator must be 0.
Consequently, any such morphism is determined by the images of the generators of Z/pZ for p ≤ n, and so there are only finitely many of them. Therefore there are only finitely many congruences on M of index n and so M̂ is compact by Proposition 3.5. □

3.5. Clopen sets versus recognizable sets

Recall that a clopen subset of a topological space is a subset which is both open and closed. A topological space is zero-dimensional if its clopen subsets form a basis for its topology.

Proposition 3.8. Let M be a residually finite monoid. Then (M, d) and (M̂, d̂) are zero-dimensional.
Proof. The open balls of the form

B(x, 2^{-n}) = {y ∈ M | d(x, y) < 2^{-n}},
B(x, 2^{-n}) = {y ∈ M̂ | d̂(x, y) < 2^{-n}},

where x belongs to M (resp. M̂) and n is a positive integer, form a basis of the Hall topology of M (resp. M̂). But these balls are clopen since

{y | d(x, y) < 2^{-n}} = {y | d(x, y) ≤ 2^{-(n+1)}}.

It follows that (M, d) and (M̂, d̂) are zero-dimensional. □
Proposition 3.8 implies that if M is a Hall-compact monoid, then M̂ is profinite (see [1,3,4,21] for the definition of profinite monoids and several equivalent properties), but we will not use this result in this paper.

We now give three results relating clopen sets and recognizable sets. The first one is due to Hunter [9, Lemma 4], the second one summarizes results due to Numakura [13] (see also [17,2]) and the third result is stated in [3] for free profinite monoids. For the convenience of the reader, we present a self-contained proof of the second and the third results.

Recall that the syntactic congruence of a subset P of a monoid M is defined, for all u, v ∈ M, by

u ∼_P v if and only if, for all (x, y) ∈ M², xuy ∈ P ⇔ xvy ∈ P.
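As an aside, this definition can be checked by brute force on small examples. The following sketch (ours, not from the paper; all names are ours) computes the ∼_P-classes of a finite monoid given by its multiplication table, by comparing, for each element u, the set of contexts (x, y) such that xuy ∈ P.

```python
# Illustrative sketch: compute the syntactic congruence classes of a subset P
# of a small finite monoid. Elements are 0..n-1 and mul[a][b] is the product.

def syntactic_classes(mul, P):
    n = len(mul)
    P = set(P)

    def contexts(u):
        # All pairs (x, y) such that x * u * y lies in P.
        return frozenset((x, y) for x in range(n) for y in range(n)
                         if mul[mul[x][u]][y] in P)

    # Two elements are syntactically equivalent iff they share the same contexts.
    classes = {}
    for u in range(n):
        classes.setdefault(contexts(u), []).append(u)
    return list(classes.values())

# Z/4Z under addition, with P the even residues:
mul = [[(a + b) % 4 for b in range(4)] for a in range(4)]
print(syntactic_classes(mul, {0, 2}))   # → [[0, 2], [1, 3]]
```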
The syntactic congruence ∼_P is the coarsest congruence of M which saturates P.

Lemma 3.9 (Hunter's Lemma [9]). In a compact monoid, the syntactic congruence of a clopen set is clopen.

Proposition 3.10. In a compact monoid, every clopen subset is recognizable. If M is a residually finite monoid, then every recognizable subset of M̂ is clopen.

Proof. Let M be a compact monoid, let P be a clopen subset of M and let ∼_P be its syntactic congruence. By Hunter's Lemma, ∼_P is clopen. Thus, for each x ∈ M, there exists an open neighborhood G of x such that G × G ⊆ ∼_P. Therefore G is contained in the ∼_P-class of x. This proves that the ∼_P-classes form an open partition of M. By compactness, this partition is finite, and hence P is recognizable.

Suppose now that M is a residually finite monoid and let P be a recognizable subset of M̂. Let φ: M̂ → F be the syntactic morphism of P. Since P is recognizable, F is finite and, by Proposition 3.4, φ is uniformly continuous. Now P = φ^{-1}(Q) for some subset Q of F. Since F is discrete and finite, Q is a clopen subset of F and hence P is also clopen. □

The last result of this subsection is a clone of a standard result on free profinite monoids (see [3] for instance).
Proposition 3.11. Let M be a Hall-compact monoid, let P be a subset of M and let P̄ be its closure in M̂. The following conditions are equivalent:
(1) P is recognizable,
(2) P = K ∩ M for some clopen subset K of M̂,
(3) P̄ is clopen in M̂ and P = P̄ ∩ M,
(4) P̄ is recognizable in M̂ and P = P̄ ∩ M.

Proof. (1) implies (2). Let φ: M → F be the syntactic morphism of P and let Q = φ(P). Since F is finite, φ is uniformly continuous by Proposition 3.4 and extends to a uniformly continuous morphism φ̂: M̂ → F. Thus K = φ̂^{-1}(Q) is clopen and satisfies K ∩ M = P.

(2) implies (3). Suppose that P = K ∩ M for some clopen subset K of M̂. Then the equality P = P̄ ∩ M follows from the following sequence of inclusions, in which cl denotes closure in M̂ (note that cl(K) = K since K is closed):

P ⊆ P̄ ∩ M = cl(K ∩ M) ∩ M ⊆ cl(K) ∩ M = K ∩ M = P.

Furthermore, since K is open and M is dense in M̂, K ∩ M is dense in K. Thus P̄ = cl(K ∩ M) = cl(K) = K. Thus P̄ is clopen in M̂.

The equivalence of (3) and (4) follows from Proposition 3.10, which shows that, in M̂, the notions of clopen set and of recognizable set are equivalent.

(4) implies (1). Let φ: M̂ → F be the syntactic morphism of P̄ and let Q = φ(P̄). Let ψ be the restriction of φ to M. Then we have P = P̄ ∩ M = φ^{-1}(Q) ∩ M = ψ^{-1}(Q). Thus P is recognizable. □

3.6. The monoid of compact subsets of a compact monoid

Let M be a compact monoid and let K(M) be the monoid of compact subsets of M. The Hausdorff metric on K(M) is defined as follows. For K, K′ ∈ K(M), let
ρ(K, K′) = sup_{x∈K} inf_{x′∈K′} d(x, x′),

h(K, K′) = max(ρ(K, K′), ρ(K′, K)) if K and K′ are nonempty,
h(K, K′) = 0 if K and K′ are empty,
h(K, K′) = 1 otherwise.

The last case occurs when one and only one of K and K′ is empty. By a standard result of topology, K(M), equipped with this metric, is compact. The next result states a property of clopen sets which will be crucial in the proof of our main result.

Proposition 3.12. Let M be a Hall-compact monoid, let C be a clopen subset of M̂ and let κ: K(M̂) → K(M̂) be the map defined by κ(K) = K ∩ C. Then κ is uniformly continuous for the Hausdorff metric.

Proof. Since C is open, every element x ∈ C belongs to some open ball B(x, ε_x) contained in C. Since M̂ is compact, C is also compact and can be covered by a finite number of these open balls, say (B(x_i, ε_i))_{1≤i≤n}. Let ε > 0 and let δ = min{1, ε, ε_1, ..., ε_n}. Suppose that h(K, K′) < δ with K ≠ K′. Then K, K′ ≠ ∅, d(x, K′) < δ for every x ∈ K and d(x′, K) < δ for every x′ ∈ K′.
Suppose that x ∈ K ∩ C. Since d(x, K′) < δ, we have d(x, x′) < δ for some x′ ∈ K′. Furthermore, x ∈ B(x_i, ε_i) for some i ∈ {1, ..., n}. Since d is an ultrametric, the relations d(x, x_i) < ε_i and d(x, x′) < ε_i imply that d(x′, x_i) < ε_i and thus x′ ∈ B(x_i, ε_i). Now, since B(x_i, ε_i) is contained in C, x′ ∈ K′ ∩ C and hence d(x, K′ ∩ C) < δ ≤ ε. By symmetry, d(x′, K ∩ C) < ε for every x′ ∈ K′ ∩ C. Hence h(K ∩ C, K′ ∩ C) < ε and κ is uniformly continuous. □

4. Transductions

Let M and N be Hall-compact monoids and let τ: M → N be a transduction. Then K(N̂), equipped with the Hausdorff metric, is also a compact monoid. Define a map τ̂: M → K(N̂) by setting, for each x ∈ M, τ̂(x) = cl(τ(x)), the closure of τ(x) in N̂.

Theorem 4.1. The transduction τ^{-1} preserves the recognizable sets if and only if τ̂ is uniformly continuous.

Proof. Suppose that τ^{-1} preserves the recognizable sets. Let ε > 0. Since N̂ is compact, it can be covered by a finite number of open balls of radius ε/2, say

N̂ = ∪_{1≤i≤k} B(x_i, ε/2).
Since N̂ is zero-dimensional by Proposition 3.8, its clopen subsets constitute a basis for its topology. Thus every open ball B(x_i, ε/2) is a union of clopen sets and N̂ is a union of clopen sets, each of which is contained in a ball of radius ε/2. By compactness, we may assume that this union is finite. Thus

N̂ = ∪_{1≤j≤n} C_j,

where each C_j is a clopen set contained in, say, B(x_{i_j}, ε/2). It follows now from Proposition 3.11 that C_j ∩ N is a recognizable subset of N. Since τ^{-1} preserves the recognizable sets, the sets L_j = τ^{-1}(C_j ∩ N) are also recognizable. By Proposition 3.4, the syntactic morphism of L_j is uniformly continuous and thus there exists δ_j such that d(u, v) < δ_j implies u ∼_{L_j} v. Taking δ = min{δ_j | 1 ≤ j ≤ n}, we have, for all (u, v) ∈ M²,

d(u, v) < δ ⇒ u ∼_{L_j} v for all j ∈ {1, ..., n}.

We claim that, whenever d(u, v) < δ, we have h(τ̂(u), τ̂(v)) < ε. By definition,

L_j = {x ∈ M | τ(x) ∩ C_j ∩ N ≠ ∅}.

Suppose first that τ(u) = ∅. Then u ∉ ∪_{1≤j≤n} L_j. Since u ∼_{L_j} v for every j, it follows that v ∉ ∪_{1≤j≤n} L_j, so τ(v) ∩ C_j ∩ N = ∅ for 1 ≤ j ≤ n. Since N = ∪_{1≤j≤n}(C_j ∩ N), it follows that τ(v) = ∅. By symmetry, we conclude that τ(u) = ∅ if and only if τ(v) = ∅. Thus we may assume that both τ(u) and τ(v) are nonempty.

Let y ∈ τ(u). Then y ∈ C_j ∩ N for some j ∈ {1, ..., n} and so u ∈ L_j. Since u ∼_{L_j} v, it follows that v ∈ L_j and
hence there exists some z ∈ τ(v) such that z ∈ C_j ∩ N. Since C_j ⊆ B(x_{i_j}, ε/2), we obtain d(x_{i_j}, y) < ε/2 and d(x_{i_j}, z) < ε/2, whence d(y, z) < ε/2 since d is an ultrametric. Thus d(y, τ(v)) < ε/2. Since τ(u) is dense in τ̂(u), it follows that d(x, τ̂(v)) ≤ ε/2 for every x ∈ τ̂(u) and so

ρ(τ̂(u), τ̂(v)) ≤ ε/2 < ε.

By symmetry, ρ(τ̂(v), τ̂(u)) < ε and hence h(τ̂(u), τ̂(v)) < ε, as required.

Next we show that if τ̂ is uniformly continuous, then τ^{-1} preserves the recognizable sets. First, τ̂ can be extended to a uniformly continuous mapping τ̌: M̂ → K(N̂). Let L be a recognizable subset of N. By Proposition 3.11, L = C ∩ N for some clopen subset C of N̂. Let

R = {K ∈ K(N̂) | K ∩ C ≠ ∅}.

We show that R is a clopen subset of K(N̂). Let κ: K(N̂) → K(N̂) be the map defined by κ(K) = K ∩ C. By Proposition 3.12, κ is uniformly continuous and, since R = κ^{-1}({∅}^c) = [κ^{-1}({∅})]^c, it suffices to show that {∅} is a clopen subset of K(N̂). Since B(∅, 1) = {∅}, {∅} is open. Let K ∈ {∅}^c. Since ∅ ∉ B(K, 1), we have B(K, 1) ⊆ {∅}^c and so {∅}^c is also open. Therefore {∅} is clopen and so is R.

Since τ̌ is continuous, τ̌^{-1}(R) is a clopen subset of M̂ and so M ∩ τ̌^{-1}(R) is recognizable by Proposition 3.11. Now

M ∩ τ̌^{-1}(R) = {u ∈ M | τ̌(u) ∈ R} = {u ∈ M | τ̂(u) ∈ R} = {u ∈ M | τ̂(u) ∩ C ≠ ∅}.

Since C is open, we have τ̂(u) ∩ C ≠ ∅ if and only if τ(u) ∩ C ≠ ∅, hence

M ∩ τ̌^{-1}(R) = {u ∈ M | τ(u) ∩ C ≠ ∅} = {u ∈ M | τ(u) ∩ L ≠ ∅} = τ^{-1}(L)

and so τ^{-1}(L) is a recognizable subset of M. Thus τ^{-1} preserves the recognizable sets. □
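To make the Hausdorff metric of Section 3.6 concrete, here is a small computational sketch (ours, not from the paper) for finite subsets, using a toy ultrametric on the integers as a stand-in for the Hall metric; all names are ours.

```python
# Hausdorff metric h on finite subsets, given an underlying metric d,
# following the three-case definition of Section 3.6.

def h(K, Kp, d):
    if not K and not Kp:
        return 0.0            # both empty
    if not K or not Kp:
        return 1.0            # exactly one empty
    # asymmetric distance rho(A, B) = max over A of the distance to B
    delta = lambda A, B: max(min(d(x, y) for y in B) for x in A)
    return max(delta(K, Kp), delta(Kp, K))

# Toy ultrametric on Z: d(x, y) = 2^{-v} where v is the 2-adic valuation
# of x - y. Like the Hall metric, it satisfies the ultrametric inequality.
def d2(x, y):
    if x == y:
        return 0.0
    n, v = abs(x - y), 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return 2.0 ** -v

print(h({0, 4}, {8}, d2))   # → 0.25
```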
5. Examples of continuous transductions

A large number of examples of continuous transductions can be found in the literature [6–8,10,11,15,16,18,20]. We state without proof two elementary results: continuous transductions are closed under composition and include the constant transductions.

Proposition 5.1. Let L ⊆ N and let τ_L: M → N be the transduction defined by τ_L(x) = L. Then τ_L is continuous.
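The continuity of the constant transduction is easy to see directly: the preimage of R is all of M when L meets R and empty otherwise, hence recognizable in both cases. A minimal sketch (ours, with hypothetical finite data) of this observation:

```python
# The constant transduction sends every x in M to the same set L, so the
# preimage of R is M itself when L intersects R, and empty otherwise --
# both trivially recognizable.
def preimage_const(L, R, M):
    return set(M) if set(L) & set(R) else set()

print(sorted(preimage_const({1, 2}, {2, 3}, {'u', 'v'})))   # → ['u', 'v']
```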
Theorem 5.2. The composition of two continuous transductions is a continuous transduction.

Continuous transductions are also closed under product, in the following sense:

Proposition 5.3. Let τ_1: M → N_1 and τ_2: M → N_2 be continuous transductions. Then the transduction τ: M → N_1 × N_2 defined by τ(x) = τ_1(x) × τ_2(x) is continuous.

Proof. Let R ∈ Rec(N_1 × N_2). By Mezei's Theorem, we have R = ∪_{i=1}^{n} K_i × L_i for some K_i ∈ Rec(N_1) and L_i ∈ Rec(N_2). Hence
τ^{-1}(R) = {x ∈ M | τ(x) ∩ R ≠ ∅}
         = {x ∈ M | (τ_1(x) × τ_2(x)) ∩ (∪_{i=1}^{n} K_i × L_i) ≠ ∅}
         = ∪_{i=1}^{n} {x ∈ M | τ_1(x) ∩ K_i ≠ ∅ and τ_2(x) ∩ L_i ≠ ∅}
         = ∪_{i=1}^{n} (τ_1^{-1}(K_i) ∩ τ_2^{-1}(L_i)).
Since τ_1 and τ_2 are continuous, each of the sets τ_1^{-1}(K_i) and τ_2^{-1}(L_i) is recognizable and thus τ^{-1}(R) is recognizable. It follows that τ is continuous. □
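The set computation in the proof above can be replayed on finite toy data. In this sketch (ours; the data and names are hypothetical), transductions are set-valued maps on a finite domain and R consists of a single product block K × L:

```python
# Verify, on toy data, that the preimage of a product block K x L under the
# product transduction equals the intersection of the individual preimages.

def preimage(tau, R):
    # x is in the preimage of R when tau(x) meets R.
    return {x for x, S in tau.items() if S & R}

tau1 = {0: {'a'}, 1: {'a', 'b'}, 2: set()}
tau2 = {0: {'y'}, 1: {'x'}, 2: {'x'}}
# product transduction: tau(x) = tau1(x) x tau2(x)
tau = {x: {(u, v) for u in tau1[x] for v in tau2[x]} for x in tau1}

K, L = {'a'}, {'x'}
R = {(u, v) for u in K for v in L}
lhs = preimage(tau, R)
rhs = preimage(tau1, K) & preimage(tau2, L)
assert lhs == rhs == {1}
```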
Further examples will be presented in a forthcoming paper. We just mention here a simple but nontrivial example. An automata-theoretic proof of this result was given in [19]; we provide here a purely algebraic proof.

Proposition 5.4. The function σ: M × N → M defined by σ(x, n) = x^n is continuous.

Proof. Let R ∈ Rec(M). Then
σ^{-1}(R) = {(x, n) ∈ M × N | x^n ∈ R}.

Let φ: M → F be the syntactic morphism of R and, for each s ∈ F, let P_s = {n ∈ N | s^n ∈ φ(R)}. Then we have

σ^{-1}(R) = {(x, n) ∈ M × N | x^n ∈ R}
         = {(x, n) ∈ M × N | φ(x) = s for some s ∈ F such that s^n ∈ φ(R)}
         = {(x, n) ∈ M × N | x ∈ φ^{-1}(s) for some s ∈ F such that n ∈ P_s}
         = ∪_{s∈F} φ^{-1}(s) × P_s.
Each set φ^{-1}(s) is recognizable by construction, and thus it suffices to show that P_s ∈ Rec(N) for each s ∈ F. Given a finite cyclic monoid generated by a and some element b of this monoid, the set {n ∈ N | a^n = b} is either empty or an arithmetic progression. Applying this fact to the finite cyclic submonoid of F generated by s, and observing that P_s is the finite union of the sets {n ∈ N | s^n = b} for b ∈ φ(R), we conclude that P_s ∈ Rec(N), as required. Thus σ^{-1}(R) ∈ Rec(M × N) and hence σ is continuous. □
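The fact about finite cyclic monoids used above is easy to check numerically. In this sketch (ours, not from the paper), the cyclic monoid generated by a is modelled by multiplication modulo m, and exponents are truncated at an arbitrary bound:

```python
# In a finite cyclic monoid, {n >= 1 | a^n = b} is empty or an arithmetic
# progression. Here the monoid generated by a is taken modulo m.
def exponents(a, b, m, limit=60):
    return [n for n in range(1, limit) if pow(a, n, m) == b]

E = exponents(2, 4, 12)     # powers of 2 mod 12: 2, 4, 8, 4, 8, ...
print(E[:4])                # → [2, 4, 6, 8]
```

Here the progression has first term 2 and common difference 2, reflecting the index and period of the cyclic submonoid generated by 2 modulo 12.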
Corollary 5.5. The transduction π: M → M defined by π(x) = x* is continuous.
Proof. Let τ_N: M → N be the constant transduction defined by τ_N(x) = N. By Proposition 5.1, τ_N is continuous. Since the identity map is trivially continuous, it follows from Proposition 5.3 that the transduction ι: M → M × N defined by ι(x) = {x} × N is continuous. Let σ: M × N → M be defined by σ(x, n) = x^n. By Proposition 5.4, σ is continuous. Since π = σ ∘ ι, it follows from Theorem 5.2 that π is continuous. □

6. Conclusion

We gave topological arguments justifying the name "continuous" for transductions whose inverse preserves the recognizable sets. It remains to be seen whether this approach can be pushed further, using purely topological arguments such as fixpoint theorems, to obtain new results on transductions and recognizable sets.

Acknowledgements

The second author acknowledges support from FCT, through CMUP and the project POCTI/MAT/37670/2001, with funds from the programs POCTI and POSI, supported by national sources and the European Community fund FEDER.

References

[1] J. Almeida, Residually finite congruences and quasi-regular subsets in uniform algebras, Portugal. Math. 46 (3) (1989) 313–328.
[2] J. Almeida, Finite semigroups: an introduction to a unified theory of pseudovarieties, in: G.M.S. Gomes, J.-E. Pin, P. Silva (Eds.), Semigroups, Algorithms, Automata and Languages, World Scientific, Singapore, 2002, pp. 3–64.
[3] J. Almeida, Profinite semigroups and applications, in: Proc. SMS-NATO ASI Structural Theory of Automata, Semigroups and Universal Algebra, University of Montréal, July 2003, Preprint, in press.
[4] J. Almeida, P. Weil, Relatively free profinite monoids: an introduction and examples, in: J. Fountain (Ed.), NATO Advanced Study Institute Semigroups, Formal Languages and Groups, Vol. 466, Kluwer Academic Publishers, Dordrecht, 1995, pp. 73–117.
[5] J. Berstel, Transductions and Context-free Languages, Teubner, Stuttgart, 1979.
[6] J. Berstel, L. Boasson, O. Carton, B. Petazzoni, J.-E. Pin, Operations preserving recognizable languages, in: A. Lingas, B.J. Nilsson (Eds.), Proc. FCT'2003, Lecture Notes in Computer Science, Vol. 2751, Springer, Berlin, 2003, pp. 343–354.
[7] J. Berstel, L. Boasson, O. Carton, B. Petazzoni, J.-E. Pin, Operations preserving recognizable languages, Theoret. Comput. Sci. (2005), in press.
[8] J.H. Conway, Regular Algebra and Finite Machines, Chapman & Hall, London, 1971.
[9] R. Hunter, Certain finitely generated compact zero-dimensional semigroups, J. Austral. Math. Soc. (Ser. A) 44 (1988) 265–270.
[10] S.R. Kosaraju, Correction to "Regularity preserving functions", SIGACT News 6 (3) (1974) 22.
[11] S.R. Kosaraju, Regularity preserving functions, SIGACT News 6 (2) (1974) 16–17.
[12] S. Lang, Algebra, Graduate Texts in Mathematics, Vol. 211, Springer, New York, 2002.
[13] K. Numakura, Theorems on compact totally disconnected semigroups and lattices, Proc. Amer. Math. Soc. 8 (1957) 623–626.
[14] M. Petkovšek, A metric-space view of infinite words, unpublished, personal communication.
[15] J.-E. Pin, J. Sakarovitch, Operations and transductions that preserve rationality, in: Proc. Sixth GI Conf., Lecture Notes in Computer Science, Vol. 145, Springer, Berlin, 1983, pp. 617–628.
[16] J.-E. Pin, J. Sakarovitch, Une application de la représentation matricielle des transductions, Theoret. Comput. Sci. 35 (1985) 271–293.
[17] J.-E. Pin, P. Weil, Uniformities on free semigroups, Internat. J. Algebra Comput. 9 (1999) 431–453.
[18] J.I. Seiferas, R. McNaughton, Regularity-preserving relations, Theoret. Comput. Sci. 2 (1976) 147–154.
[19] P.V. Silva, An application of first order logic to the study of recognizable languages, Internat. J. Algebra Comput. 14 (5/6) (2004) 785–799.
[20] R.E. Stearns, J. Hartmanis, Regularity preserving modifications of regular expressions, Inform. Control 6 (1963) 55–69.
[21] P. Weil, Profinite methods in semigroup theory, Internat. J. Algebra Comput. 12 (2002) 137–178.