Theoretical Computer Science 340 (2005) 179 – 185 www.elsevier.com/locate/tcs
Preface

A good friend, a good companion, a brilliant research partner, an enemy of any redundancy and useless diversion, restless in tackling scientific problems, an enthusiastic follower of his colleagues' research, a person that people around him can rely on. These are only some of the qualities of Antonio Restivo that we (the editors, with very different backgrounds) have come up with. These qualities have supported him throughout a research career that began when he was a young theoretical physicist at the "Istituto di Cibernetica" (IC), Arco Felice, Naples, where he carried out his own Copernican revolution by moving to Theoretical Computer Science. Fundamental for this decision was meeting the research team of the Istituto di Cibernetica, directed by E.R. Caianiello, and especially meeting Professor M.P. Schützenberger: Antonio was very impressed by the way they approached research problems and by the respect they had for competence and hard work.

The following lines from the Divina Commedia by Dante, an author often quoted by Professor Schützenberger, have become a sort of common inheritance of the research team of IC, since they can be seen as a link between scientific discovery and the truth.

Vie più che 'ndarno da riva si parte,
perché non torna tal qual è si move,
chi pesca per lo vero e non ha l'arte. 1

(Dante Alighieri, Divina Commedia, Paradiso, Canto XIII, Lines 121–123)

The art of research is the ability to go straight to the heart of a problem whenever you believe that this is the right direction, even if going this way implies a high cost and even if it means going against the general opinion. But it is also the "complementary" ability to detect, and hence avoid, false avenues of research.
These abilities, together with the ability to ask the right questions and to generate new ones, have led him to many relationships of genuine scientific cooperation with researchers all over the world, in a sort of successful globalization of scientific research. This special issue is a testimony to the positive influence that Antonio Restivo has had, and still has, on his fellow researchers. Many papers in this issue contain clear traces of his influence.
1 It is even worse than vain to go off shore for one who wants to find the truth but hasn’t got the art, since he’ll come back not as he was before.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.019
His influence extends beyond individual researchers to whole research institutes. Some of these are particularly dear to Antonio's heart: places where he worked or currently works, and where some of his dearest friends and students still work, such as Palermo, Napoli and Paris. These are the places where he spent his formative years, and with which he still cooperates, sharing his vast knowledge and his enthusiasm for research.

We conclude by quoting Leonardo da Vinci's words, a statement that Antonio wanted included on the students' website for a computer science course at Palermo University; he thinks that this quotation encapsulates the fundamental meaning of research.

Quelli che s'innamoran di pratica senza scienza, son come 'l nocchiere, ch'entra in naviglio senza timone o bussola, che mai ha certezza dove si vada. 2

(Leonardo da Vinci)

To Antonio Restivo, a master of Theoretical Informatics, on the occasion of his 60th birthday.

A. de Luca, F. Mignosi, D. Perrin, G. Rozenberg
Leiden University, Leiden Institute of Advanced Computer Science, Niels Bohrweg 1, Leiden 2333 CA, Netherlands
E-mail address:
[email protected]
Publications by Antonio Restivo

[1] R. Ascoli, G. Epifanio, A. Restivo, On the mathematical description of quantized fields, Comm. Math. Phys. 18 (1970) 291–300. [2] A. Restivo, Codes and aperiodic languages, in: K.-H. Böhling, K. Indermark (Eds.), Automatentheorie und Formale Sprachen, Lecture Notes in Computer Science, Vol. 2, Springer, Berlin, 1973, pp. 175–181. [3] R. Ascoli, G. Epifanio, A. Restivo, *-Algebrae of unbounded operators in scalar product spaces, Riv. Mat. Univ. Parma 3 (1974) 1–12. [4] A. Restivo, S. Termini, An algorithm for deciding whether a strictly locally testable submonoid is free, Cahiers Math. Université de Montpellier, Vol. 3, 1974, pp. 299–303. [5] A. Restivo, On a question of McNaughton and Papert, Inform. Control 25(1) (1974) 93–101. [6] A. Restivo, A combinatorial property of codes having finite synchronization delay, Theoret. Comput. Sci. 1(2) (1975) 95–101.
2 Those who fall in love with Practice without Theory are like the seaman on a boat without a steering wheel or a compass, who is never sure where he’ll land.
[7] A. Restivo, A characterization of bounded regular sets, in: H. Barkhage (Ed.), Automata Theory and Formal Languages, Lecture Notes in Computer Science, Vol. 33, Springer, Berlin, 1975, pp. 239–244. [8] A. Restivo, S. Termini, On a family of rational languages, in: E. Caianiello (Ed.), New Concepts and Technologies in Parallel Information Processing, Nato Advanced Study Institutes Series, Series E, Noordhoff, Leyden, 1975, pp. 349–357. [9] A. Restivo, On a family of codes related to factorization of cyclotomic polynomials, in: S. Michaelson, R. Milner (Eds.), ICALP, Edinburgh University Press, 1976, pp. 38–44. [10] L. Boasson, A. Restivo, Une caractérisation des langages algébriques bornés, ITA 11(3) (1977) 203–205. [11] A. Restivo, Mots sans répétitions et langages rationnels bornés, ITA 11(3) (1977) 197–202. [12] A. Restivo, On codes having no finite completions, Discrete Math. 17(3) (1977) 309–316. [13] A. Restivo, Some decision results for recognizable sets in arbitrary monoids, in: G. Ausiello, C. Böhm (Eds.), ICALP, Lecture Notes in Computer Science, Vol. 62, Springer, Berlin, 1978, pp. 363–371. [14] A. de Luca, A. Restivo, Synchronization and maximality for very pure subsemigroups of a free semigroup, in: J. Becvár (Ed.), MFCS, Lecture Notes in Computer Science, Vol. 74, Springer, Berlin, 1979, pp. 363–371. [15] J. Berstel, D. Perrin, J.F. Perrot, A. Restivo, Sur le théorème du defaut, J. Algebra 60 (1979) 169–180. [16] A. de Luca, D. Perrin, A. Restivo, S. Termini, Synchronization and simplification, Discrete Math. 27 (1979) 287–308. [17] J.-M. Boë, A. de Luca, A. Restivo, Minimal complete sets of words, Theoret. Comput. Sci. 12 (1980) 325–332. [18] A. de Luca, A. Restivo, On some properties of very pure codes, Theoret. Comput. Sci. 10 (1980) 157–170. [19] A. de Luca, A. Restivo, A characterization of strictly locally testable languages and its applications to subsemigroups of a free semigroup, Inform. Control 44(3) (1980) 300–319. [20] A. de Luca, A. 
Restivo, On some properties of local testability, in: J.W. de Bakker, J. van Leeuwen (Eds.), ICALP, Lecture Notes in Computer Science, Vol. 85, Springer, Berlin, 1980, pp. 385–393. [21] S. Mauceri, A. Restivo, A family of codes commutatively equivalent to prefix codes, Inform. Process. Lett. 12(1) (1981) 1–4. [22] A. Restivo, Some remarks on complete subsets of a free monoid, 1981, pp. 19–25. [23] A. de Luca, A. Restivo, A synchronization property of pure subsemigroups of a free semigroup, 1981, pp. 233–240. [24] A. de Luca, A. Restivo, S. Salemi, On the centers of a language, Theoret. Comput. Sci. 24 (1983) 21–34. [25] A. Restivo, C. Reutenauer, Some applications of a theorem of Shirshov to language theory, Inform. Control 57(2/3) (1983) 205–213. [26] A. Restivo, S. Salemi, On weakly square free words, Bull. EATCS 21 (1983) 49–57.
[27] A. Restivo, C. Reutenauer, On cancellation properties of languages which are supports of rational power series, J. Comput. System Sci. 29(2) (1984) 153–159. [28] A. de Luca, A. Restivo, A finiteness condition for finitely generated semigroups, Semigroup Forum 28(1–3) (1984) 123–134. [29] A. Restivo, C. Reutenauer, On the Burnside problem for semigroups, J. Algebra 89(1) (1984) 102–104. [30] A. de Luca, A. Restivo, Representations of integers and language theory, in: M. Chytil, V. Koubek (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 176, Springer, Berlin, 1984, pp. 407–415. [31] A. Restivo, C. Reutenauer, Cancellation, pumping and permutation in formal languages, in: J. Paredaens (Ed.), ICALP, Lecture Notes in Computer Science, Vol. 172, Springer, Berlin, 1984, pp. 414–422. [32] A. Restivo, S. Salemi, Overlap-free words on two symbols, in: M. Nivat, D. Perrin (Eds.), Automata on Infinite Words, Lecture Notes in Computer Science, Vol. 192, Springer, Berlin, 1984, pp. 198–206. [33] A. Restivo, Rational languages and the Burnside problem, Theoret. Comput. Sci. 40 (1985) 13–30. [34] C. de Felice, A. Restivo, Some results on finite maximal codes, ITA 19(4) (1985) 383–403. [35] A. Restivo, S. Salemi, Some decision results on nonrepetitive words, Series F, Vol. 12, NATO Adv. Sci. Inst., Springer, Berlin, 1985, pp. 289–295. [36] A. de Luca, A. Restivo, Star-free sets of integers, Theoret. Comput. Sci. 43 (1986) 265–275. [37] A. de Luca, A. Restivo, On a generalization of a conjecture of Ehrenfeucht, Bull. EATCS 30 (1986) 84–90. [38] A. Restivo, Codes and automata, in: J.-E. Pin (Ed.), Formal Properties of Finite Automata and Applications, Lecture Notes in Computer Science, Vol. 386, Springer, Berlin, 1988, pp. 186–198. [39] A. Restivo, Permutation properties and the Fibonacci semigroup, Semigroup Forum 38(3) (1989) 337–345. [40] A. Restivo, Finitely generated sofic systems, Theoret. Comput. Sci. 65(2) (1989) 265–270. [41] A. Restivo, S. Salemi, T.
Sportelli, Completing codes, ITA 23(2) (1989) 135–147. [42] A. Restivo, A note on multiset decipherable codes, IEEE Trans. Inform. Theory 35(3) (1989) 662. [43] A. Restivo, Coding sequences with constraints, in: R. Capocelli (Ed.), Sequences, Springer, New York, 1990, pp. 530–540. [44] A. Restivo, Codes and local constraints, Theoret. Comput. Sci. 72(1) (1990) 55–64. [45] A. Restivo, Codes with constraint, in: M.P. Schützenberger, M. Lothaire (Eds.), Mots, Langue, raisonnement, calcul, Hermes, 1990, pp. 358–366. [46] G. Guaiana, A. Restivo, S. Salemi, Complete subgraphs of bipartite graphs and applications to trace languages, ITA 24 (1990) 409–418. [47] G. Guaiana, A. Restivo, S. Salemi, On aperiodic trace languages, in: C. Choffrut, M. Jantzen (Eds.), STACS, Lecture Notes in Computer Science, Vol. 480, Springer, Berlin, 1991, pp. 76–88.
[48] G. Guaiana, A. Restivo, S. Salemi, Star-free trace languages, Theoret. Comput. Sci. 97(2) (1992) 301–311. [49] A. Restivo, A note on renewal systems, Theoret. Comput. Sci. 94(2) (1992) 367–371. [50] D. Giammarresi, A. Restivo, Recognizable picture languages, IJPRAI 6(2–3) (1992) 241–256. [51] R. Montalbano, A. Restivo, The star height one problem for irreducible automata, in: R. Capocelli et al. (Eds.), Sequences II, Springer, New York, 1993, pp. 457–469. [52] D. Giammarresi, A. Restivo, S. Seibert, W. Thomas, Monadic second-order logic over pictures and recognizability by tiling systems, in: P. Enjalbert, E.W. Mayr, K.W. Wagner (Eds.), STACS, Lecture Notes in Computer Science, Vol. 775, Springer, Berlin, 1994, pp. 365–375. [53] R. Montalbano, A. Restivo, On the star height of rational languages, Internat. J. Algebra Comput. 4(3) (1994) 427–441. [54] D. Giammarresi, S. Mantaci, F. Mignosi, A. Restivo, A periodicity theorem for trees, in: B. Pehrson, I. Simon (Eds.), Technology and Foundations—Information Processing ’94, Vol. 1, Proceedings of the IFIP 13th World Computer Congress, Hamburg, Germany, 28 August–2 September 1994, North-Holland, Amsterdam, 1994, pp. 473–478. [55] M. Anselmo, A. Restivo, Factorizing languages, in: B. Pehrson, I. Simon (Eds.), Technology and Foundations—Information Processing ’94, Vol. 1, Proceedings of the IFIP 13th World Computer Congress, Hamburg, Germany, 28 August–2 September 1994, North-Holland, Amsterdam, 1994, pp. 445–450. [56] F. Mignosi, A. Restivo, S. Salemi, A periodicity theorem on words and applications, in: J. Wiedermann, P. Hájek (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 969, Springer, Berlin, 1995, pp. 337–348. [57] D. Giammarresi, A. Restivo, S. Seibert, W. Thomas, Monadic second-order logic over rectangular pictures and recognizability by tiling systems, Inform. Comput. 125(1) (1996) 32–45. [58] D. Giammarresi, A. Restivo, Two-dimensional finite state recognizability, Fund. Inform.
25(3) (1996) 399–422. [59] M. Anselmo, A. Restivo, On languages factorizing the free monoid, Internat. J. Algebra Comput. 6(4) (1996) 413–427. [60] M.-P. Béal, F. Mignosi, A. Restivo, Minimal forbidden words and symbolic dynamics, in: C. Puech, R. Reischuk (Eds.), STACS, Lecture Notes in Computer Science, Vol. 1046, Springer, Berlin, 1996, pp. 555–566. [61] D. Giammarresi, S. Mantaci, F. Mignosi, A. Restivo, Congruences, automata and periodicities, in: J. Almeida, P.V. Silva, G.M.S. Gomes (Eds.), Semigroups, Automata and Languages, World Scientific, River Edge, NJ, 1996, pp. 125–135. [62] S. Mantaci, A. Restivo, Equations on trees, in: W. Penczek, A. Szalas (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 1113, Springer, Berlin, 1996, pp. 443–456. [63] M. Anselmo, C. De Felice, A. Restivo, On some factorization problems, Bull. Belg. Math. Soc. Simon Stevin 4(1) (1997) 25–43. [64] D. Giammarresi, A. Restivo, Two-dimensional languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 3, Springer, Berlin, 1997, pp. 215–267.
[65] D. Giammarresi, S. Mantaci, F. Mignosi, A. Restivo, Periodicities on trees, Theoret. Comput. Sci. 205(1–2) (1998) 145–181. [66] F. Mignosi, A. Restivo, S. Salemi, Periodicity and the golden ratio, Theoret. Comput. Sci. 204(1–2) (1998) 153–167. [67] M. Crochemore, F. Mignosi, A. Restivo, Automata and forbidden words, Inform. Process. Lett. 67(3) (1998) 111–117. [68] M. Crochemore, F. Mignosi, A. Restivo, Minimal forbidden words and factor automata, in: L. Brim, J. Gruska, J. Zlatuska (Eds.), MFCS, Lecture Notes in Computer Science, Vol. 1450, Springer, Berlin, 1998, pp. 665–673. [69] M.G. Castelli, F. Mignosi, A. Restivo, Fine and Wilf’s theorem for three periods and a generalization of Sturmian words, Theoret. Comput. Sci. 218(1) (1999) 83–94. [70] F. Mignosi, A. Restivo, On negative informations in language theory, Aust. Comput. Sci. Commun. 21(3) (1999) 60–72. [71] S. Mantaci, A. Restivo, On the defect theorem for trees, Publ. Math. Debrecen 54 (1999) 923–932. [72] D. Giammarresi, A. Restivo, Extending formal language hierarchies to higher dimensions, ACM Comput. Surv. 31(3es) (1999) 12. [73] F. Mignosi, A. Restivo, M. Sciortino, Forbidden factors in finite and infinite words, in: J. Karhumäki, H.A. Maurer, G. Paun, G. Rozenberg (Eds.), Jewels are Forever, Springer, Berlin, 1999, pp. 339–350. [74] M. Crochemore, F. Mignosi, A. Restivo, S. Salemi, Text compression using antidictionaries, in: J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), ICALP, Lecture Notes in Computer Science, Vol. 1644, Springer, Berlin, 1999, pp. 261–270. [75] M.-P. Béal, F. Mignosi, A. Restivo, M. Sciortino, Forbidden words in symbolic dynamics, Adv. Appl. Math. 25(2) (2000) 163–193. [76] J.-P. Duval, F. Mignosi, A. Restivo, Recurrence and periodicity in infinite words from local periods, Theoret. Comput. Sci. 262(1) (2001) 269–284. [77] S. Mantaci, A. Restivo, Codes and equations on trees, Theoret. Comput. Sci. 255(1–2) (2001) 483–509. [78] F. Mignosi, A. Restivo, M.
Sciortino, Forbidden factors and fragment assembly, ITA 35(6) (2001) 565–577. [79] F. Mignosi, A. Restivo, M. Sciortino, Forbidden factors and fragment assembly, in: W. Kuich, G. Rozenberg, A. Salomaa (Eds.), Developments in Language Theory 2001, Lecture Notes in Computer Science, Vol. 2295, Springer, Berlin, 2002, pp. 349–358. [80] A. Restivo, S. Salemi, Words and patterns, in: Developments in Language Theory 2001, Lecture Notes in Computer Science, Vol. 2295, Springer, Berlin, 2002, pp. 117–129. [81] A. Restivo, S.R. Della Rocca, L. Roversi (Eds.), Theoretical Computer Science, Proceedings of the Seventh Italian Conference, ICTCS 2001, Torino, Italy, 4–6 October 2001, Lecture Notes in Computer Science, Vol. 2202, Springer, Berlin, 2001. [82] A. Restivo, P.V. Silva, On the lattice of prefix codes, Theoret. Comput. Sci. 289(1) (2002) 755–782. [83] F. Mignosi, A. Restivo, M. Sciortino, Words and forbidden factors, Theoret. Comput. Sci. 273(1–2) (2002) 99–117. [84] F. Mignosi, A. Restivo, Periodicity, Chapter in: M. Lothaire, Algebraic Combinatorics on Words, Encyclopedia of Mathematics and its Applications, Vol. 90, Cambridge University Press, Cambridge, 2002.
[85] A. Restivo, S. Salemi, Binary patterns in infinite binary words, in: W. Brauer, H. Ehrig, J. Karhumäki, A. Salomaa (Eds.), Formal and Natural Computing, Lecture Notes in Computer Science, Vol. 2300, Springer, Berlin, 2002, pp. 107–118. [86] F. Mignosi, A. Restivo, P.V. Silva, On Fine and Wilf’s theorem for bidimensional words, Theoret. Comput. Sci. 292(1) (2003) 245–262. [87] M.-P. Béal, M. Crochemore, F. Mignosi, A. Restivo, M. Sciortino, Computing forbidden words of regular languages, Fund. Inform. 56(1–2) (2003) 121–135. [88] S. Mantaci, A. Restivo, M. Sciortino, Burrows–Wheeler transform and Sturmian words, Inform. Process. Lett. 86(5) (2003) 241–246. [89] A. Restivo, P.V. Silva, Periodicity vectors for labelled trees, Discrete Appl. Math. 126(2–3) (2003) 241–260. [90] A. Gabriele, F. Mignosi, A. Restivo, M. Sciortino, Indexing structures for approximate string matching, in: R. Petreschi, G. Persiano, R. Silvestri (Eds.), CIAC, Lecture Notes in Computer Science, Vol. 2653, Springer, Berlin, 2003, pp. 140–151. [91] G. Castiglione, A. Restivo, Reconstruction of L-convex polyominoes, Electron. Notes in Discrete Math. 12 (2003). [92] G. Guaiana, A. Restivo, S. Salemi, On the trace product and some families of languages closed under partial commutations, J. Automat. Lang. Comb. 9(1) (2004) 61–79. [93] G. Castiglione, A. Restivo, S. Salemi, Patterns in words and languages, Discrete Appl. Math. 144(3) (2004) 237–246. [94] G. Castiglione, A. Restivo, Ordering and convex polyominoes, in: M. Margenstern (Ed.), MCU 2004, Lecture Notes in Computer Science, Vol. 3354, Springer, Berlin, 2004. [95] F. Burderi, A. Restivo, Varieties of codes and Kraft inequality, in: V. Diekert, B. Durand (Eds.), STACS 2005, Lecture Notes in Computer Science, Vol. 3404, Springer, Berlin, 2005.
Theoretical Computer Science 340 (2005) 186 – 187 www.elsevier.com/locate/tcs
Editorial
From Antonio’s former students

It was December 2004: Antonio's 60th birthday was approaching, and we, his former students, very much wished to dedicate a page of this volume to him. Hence, during the Christmas holidays, we planned a top-secret reunion at the Department of Mathematics in Palermo. We all agreed on one thing: we wished to contribute a special preface that would focus on some aspects of Antonio that probably only we, his students, are fortunate enough to know. We wished to tell of his kind attitude, which cannot be separated from his outstanding scientific profile. Perhaps this is what makes Antonio a "milestone" for us (this is incredible... everyone used the same term to "define" him!): Antonio looks at everybody in front of him with extreme attention to the person and to all his aspects and qualities. This is the way Antonio himself tells us about his "maestro" and friend M.P. Schützenberger, or about his best times at the Istituto di Cibernetica in Arco Felice, recounting many funny stories with his peculiar sense of humor and acting talent. We all feel that being his students is a great privilege.

But how can we convey all the times that Antonio has surprised us and provided us with renewed enthusiasm for our research? We all remember the many times when some of us got stuck on a research problem and Antonio would say: "There should be something on this subject in some LITP technical report back in 1987." He would then look over the several stacks of papers on his desk and shelves, declare "It should be in this stack", start going through it, and magically pull out one paper: needless to say, it contained exactly the solution to the problem! Or the times when, in the biology laboratories, looking at shelves displaying jars with strange insects, seeds or leaves preserved in formalin, we felt a bit uncomfortable with our theorems and counterexamples.
Then Antonio would say: "We should put our examples in jars, with labels like Example of a language with property X or Thue–Morse word, disproving conjecture Y, so as to find them when needed!"

Finally, when the top-secret meeting on that December afternoon in Palermo was over, we realized that we had spent most of the time just telling each other our respective fond memories and stories about working with Antonio. We all really had a great time, but we were still left with the problem of writing down all those memories in a way that would truly suit Antonio... Indeed, he does not like "commemorations" at all.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.023
After a while, we all agreed that the best thing would be to write, in the simplest way, what we all felt: many thanks, Antonio, for everything, and our hearty wishes for your birthday!!!

Marcella Anselmo, Dora Giammarresi, Giovanna Guaiana, Marina Madonia, Sabrina Mantaci, Filippo Mignosi, Giuseppina Rindone, Marinella Sciortino
Theoretical Computer Science 340 (2005) 188 – 203 www.elsevier.com/locate/tcs
Connections between subwords and certain matrix mappings

Arto Salomaa
Turku Centre for Computer Science, Academy of Finland, Lemminkäisenkatu 14, 20520 Turku, Finland
Abstract

Parikh matrices, recently introduced, have turned out to be a powerful tool in arithmetizing the theory of words. In particular, many inequalities between (scattered) subword occurrences have been obtained as consequences of the properties of the matrices. This paper continues the investigation of Parikh matrices and subword occurrences. In particular, we study certain inequalities, as well as information about subword occurrences sufficient to determine the whole word uniquely. Some algebraic considerations, facts about forbidden subwords, as well as some open problems are also included.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Parikh matrix; Subword; Scattered subword; Number of subwords; Inference from subsequences; Forbidden subword
1. Introduction

The purpose of this paper is to investigate the number of occurrences of a word u as a subword in a word w, in symbols, $|w|_u$. For us the term subword means that w, as a sequence of letters, contains u as a subsequence. More formally, we begin with the following fundamental definition.

✩ The paper is dedicated to Antonio Restivo on the occasion of his 60th birthday. I have been fortunate to meet Antonio every now and then through many decades. I have always found him a young colleague and friend with very bright ideas. Antonio has also been involved in successful cooperation with the Turku research group. I wish him all the best in the years to come, both in science and life.
E-mail address: asalomaa@utu.fi.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.024
Definition 1. A word u is a subword of a word w if there exist words $x_1, \ldots, x_n$ and $y_0, \ldots, y_n$, some of them possibly empty, such that

$$u = x_1 \cdots x_n \quad \text{and} \quad w = y_0 x_1 y_1 \cdots x_n y_n.$$

The word u is a factor of w if there are words x and y such that w = xuy. If the word x (resp. y) is empty, then u is also called a prefix (resp. suffix) of w.

Throughout this paper, we understand subwords and factors in this way. In classical language theory, [13], our subwords are usually called "scattered subwords", whereas our factors are called "subwords". The notation used throughout the article is $|w|_u$, the number of occurrences of the word u as a subword of the word w. Two occurrences are considered different if they differ by at least one position of some letter. (Formally, an occurrence can be viewed as a vector of length |u| whose components indicate the positions of the different letters of u in w.) Clearly, $|w|_u = 0$ if |w| < |u|. We also make the convention that, for any w and the empty word $\lambda$, $|w|_\lambda = 1$. In [14] the number $|w|_u$ is denoted as a "binomial coefficient":

$$|w|_u = \binom{w}{u}.$$

If w and u are words over a one-letter alphabet, $w = a^i$, $u = a^j$, then $|w|_u$ equals the ordinary binomial coefficient: $|w|_u = \binom{i}{j}$. Our convention concerning the empty word reduces to the fact that $\binom{i}{0} = 1$. (The convention is made also in [3,14].)

Assume that $\Sigma$ is an alphabet containing the letters a and b. A little reflection shows that, for any word w,

$$|w|_a \cdot |w|_b = |w|_{ab} + |w|_{ba}.$$

This simple equation can be viewed as a general fact about occurrences of subwords. It is also an instance of the linearization of subword histories investigated in [10]. A slight variation of the equation immediately leads to difficulties: no explicit characterization is known for the relation between $(|w|_u, |w|_v)$ and $(|w|_{uv}, |w|_{vu})$, where u, v, w are arbitrary words. (In general, we use small letters from the beginning of the English alphabet to denote letters of the formal alphabet.)

A general problem along these lines is the following. What numbers $|w|_u$ suffice to determine the word w uniquely? For instance, a word $w \in \{a, b\}^*$ is uniquely determined by the values $|w|_a = |w|_b = 4$, $|w|_{ab} = 15$.
Indeed, $w = a^3 b a b^3$. On the other hand, a word $w \in \{a, b\}^*$ of length 4 is not uniquely determined by the values $|w|_u$, $|u| \le 2$. Either one of the words abba and baab can be chosen as w, and still the equations

$$|w|_a = |w|_b = |w|_{ab} = |w|_{ba} = 2, \quad |w|_{aa} = |w|_{bb} = 1$$

are satisfied. A powerful tool for such problems is the notion of a Parikh matrix; the rest of this paper deals with this notion. The Parikh matrix associated to a word w tells us the numbers $|w|_u$ for certain specific words u. The original notion of a Parikh matrix was introduced in [9]. When dealing with the extended notion, [17], one has more leeway in the choice of the words u.
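The counting conventions above can be made concrete in a few lines of code. The sketch below is our own illustration, not part of the paper (in particular, the name `count_subword` is ours): it computes $|w|_u$ by standard dynamic programming and checks the identities discussed in this section.

```python
from math import comb

def count_subword(w: str, u: str) -> int:
    """Number |w|_u of occurrences of u as a (scattered) subword of w."""
    # dp[j] = number of occurrences of the prefix u[:j] in the part of w read so far
    dp = [1] + [0] * len(u)  # the empty word occurs exactly once
    for c in w:
        # scan u right to left so each position of w extends an occurrence at most once
        for j in range(len(u) - 1, -1, -1):
            if u[j] == c:
                dp[j + 1] += dp[j]
    return dp[len(u)]

# |w|_a * |w|_b = |w|_ab + |w|_ba, for any word w
w = "abbaaab"
assert count_subword(w, "a") * count_subword(w, "b") == \
       count_subword(w, "ab") + count_subword(w, "ba")

# w = a^3 b a b^3 satisfies |w|_a = |w|_b = 4 and |w|_ab = 15
assert count_subword("aaababbb", "ab") == 15

# abba and baab agree on every |w|_u with |u| <= 2
for u in ("a", "b", "aa", "ab", "ba", "bb"):
    assert count_subword("abba", u) == count_subword("baab", u)

# over a one-letter alphabet, |a^i|_{a^j} is the binomial coefficient C(i, j)
assert count_subword("a" * 7, "a" * 3) == comb(7, 3)
```

The right-to-left inner loop is what guarantees that a single letter of w extends each partial occurrence at most once.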
2. Parikh matrices

Parikh mappings (vectors), introduced in [12], express properties of words as numerical properties of vectors, yielding some fundamental language-theoretic consequences, [13,5]. Much information is lost in the transition from a word to a vector. A sharpening of the Parikh mapping, where more information is preserved than in the original Parikh mapping, was introduced in [9]. The new mapping uses upper triangular square matrices with nonnegative integer entries, 1's on the main diagonal and 0's below it. Two words with the same Parikh matrix always have the same Parikh vector, but two words with the same Parikh vector have in general different Parikh matrices. Thus, the Parikh matrix gives more information about a word than the Parikh vector. The set of all triangular matrices described above is denoted by $\mathcal{M}$, and the subset of all matrices of dimension $k \ge 1$ is denoted by $\mathcal{M}_k$. We are now ready to introduce the original notion of a Parikh matrix mapping.

Definition 2. Let $\Sigma = \{a_1, \ldots, a_k\}$ be an alphabet. The Parikh matrix mapping, denoted $\Psi_k$, is the morphism

$$\Psi_k : \Sigma^* \to \mathcal{M}_{k+1},$$

defined by the condition: if $\Psi_k(a_q) = (m_{i,j})_{1 \le i,j \le k+1}$, then for each $1 \le i \le k+1$, $m_{i,i} = 1$, $m_{q,q+1} = 1$, all other elements of the matrix $\Psi_k(a_q)$ being 0.

Observe that when defining the Parikh matrix mapping we have in mind, similarly as when defining the Parikh vector, a specific ordering of the alphabet. Knowledge of the Parikh matrices for different orderings of the alphabet will increase our knowledge of the word in question. If we consider letters without numerical indices, we assume the alphabetic ordering in the definition of Parikh matrices.

The Parikh matrix mapping is not injective, even for the alphabet {a, b}. For instance, consider the matrices

$$\begin{pmatrix} 1 & 4 & 6 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 5 & 8 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix}.$$
Then the five words baabaab, baaabba, abbaaab, abababa, aabbbaa are exactly the ones having the first matrix as their Parikh matrix. Similarly, the six words aababbaa, aabbaaba, abaababa, baaaabba, ababaaab, baaabaab are exactly the ones having the second matrix as their Parikh matrix. This example becomes clearer in view of the following theorem, [9], where the entries of the Parikh matrix are characterized. For the alphabet $\Sigma = \{a_1, \ldots, a_k\}$, we denote by $a_{i,j}$ the word $a_i a_{i+1} \cdots a_j$, where $1 \le i \le j \le k$.

Theorem 1. Consider $\Sigma = \{a_1, \ldots, a_k\}$ and $w \in \Sigma^*$. The matrix $\Psi_k(w) = (m_{i,j})_{1 \le i,j \le k+1}$ has the following properties:
• $m_{i,j} = 0$, for all $1 \le j < i \le k+1$,
• $m_{i,i} = 1$, for all $1 \le i \le k+1$,
• $m_{i,j+1} = |w|_{a_{i,j}}$, for all $1 \le i \le j \le k$.

By the second diagonal (and similarly the third diagonal, etc.) of a matrix in $\mathcal{M}_{k+1}$, we mean the diagonal of length k immediately above the main diagonal. (The diagonals from the third on are shorter.) Theorem 1 tells us that the second diagonal of the Parikh matrix of w gives the Parikh vector of w. The next diagonals give information about the order of letters in w by indicating the numbers $|w|_u$ for certain specific words u. Properties of Parikh matrices, notably the unambiguity of Parikh matrix mappings, have been investigated in [4,7–10,15,16].

For any word w over the alphabet {a, b, c, d}, Theorem 1 implies that

$$\Psi_4(w) = \begin{pmatrix}
1 & |w|_a & |w|_{ab} & |w|_{abc} & |w|_{abcd} \\
0 & 1 & |w|_b & |w|_{bc} & |w|_{bcd} \\
0 & 0 & 1 & |w|_c & |w|_{cd} \\
0 & 0 & 0 & 1 & |w|_d \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$

The problem of deciding whether or not a given matrix is a Parikh matrix is discussed in [8]. No nice general criterion is known. However, the following theorem, [8], characterizes exhaustively the entries in the second and third diagonals of a Parikh matrix.

Theorem 2. Arbitrary nonnegative integers may appear on the second diagonal of a Parikh matrix. Arbitrary integers $m_{i,i+2}$, $1 \le i \le k-1$, satisfying the condition $0 \le m_{i,i+2} \le m_{i,i+1} \, m_{i+1,i+2}$ (but no others) may appear on the third diagonal of a (k+1)-dimensional Parikh matrix.
Theorem 2 gives a complete characterization of Parikh matrices over binary alphabets, since in this case no further diagonals are present. In the general case, starting with arbitrary second and third diagonals satisfying the conditions of Theorem 2, the matrix can be completed to a Parikh matrix in at least one way.
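Definition 2 translates directly into code. The following sketch is our own illustration (the name `parikh_matrix` is not from the paper): it builds $\Psi_k(w)$ as the product of the letter matrices, recovers the five preimages of the first matrix of the non-injectivity example by brute force, and checks the third-diagonal bound of Theorem 2 on all short binary words.

```python
from itertools import product

def parikh_matrix(w: str, alphabet: str) -> list[list[int]]:
    """Psi_k(w) over the ordered alphabet a_1 ... a_k: the product of the
    (k+1) x (k+1) letter matrices, taken in the order the letters occur in w."""
    n = len(alphabet) + 1
    M = [[int(i == j) for j in range(n)] for i in range(n)]  # identity = Psi_k(empty word)
    for c in w:
        q = alphabet.index(c)
        L = [[int(i == j) for j in range(n)] for i in range(n)]
        L[q][q + 1] = 1  # the single off-diagonal 1 of Psi_k(a_q)
        M = [[sum(M[i][t] * L[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
    return M

# the first matrix of the non-injectivity example, with its five preimages
first = [[1, 4, 6], [0, 1, 3], [0, 0, 1]]
words = ["".join(t) for t in product("ab", repeat=7)
         if parikh_matrix("".join(t), "ab") == first]
assert sorted(words) == sorted(["baabaab", "baaabba", "abbaaab", "abababa", "aabbbaa"])

# Theorem 2, binary case: 0 <= m13 <= m12 * m23 for every Parikh matrix
for n in range(7):
    for t in product("ab", repeat=n):
        M = parikh_matrix("".join(t), "ab")
        assert 0 <= M[0][2] <= M[0][1] * M[1][2]
```

Since $\Psi_k$ is a morphism, multiplying the letter matrices left to right is all the construction requires.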
We will now introduce the generalized notion of a Parikh matrix due to [17]. We first recall the definition of the Kronecker delta: for letters a and b,

$$\delta_{a,b} = \begin{cases} 1 & \text{if } a = b, \\ 0 & \text{if } a \ne b. \end{cases}$$
Definition 3. Let u = b1 . . . bk be a word, where each bi , 1 i k, is a letter of the alphabet . The Parikh matrix mapping with respect to u, denoted u , is the morphism:
u : ∗ → Mk+1 , defined, for a ∈ , by the condition: if u (a) = Mu (a) = (mi,j )1 i,j (k+1) , then for each 1 i (k + 1), mi,i = 1, and for each 1 i k, mi,i+1 = a,bi , all other elements of the matrix Mu (a) being 0. Matrices of the form u (w), w ∈ ∗ , are referred to as generalized Parikh matrices. Thus, the Parikh matrix Mu (w) associated to a word w is obtained by multiplying the matrices Mu (a) associated to the letters a of w, in the order in which the letters appear in w. The above definition implies that if a letter a does not occur in u, then the matrix Mu (a) is the identity matrix. For instance, if u = baac, then
           1 0 0 0 0
           0 1 1 0 0
M_u(a) =   0 0 1 1 0
           0 0 0 1 0
           0 0 0 0 1

Similarly,

           1 1 0 0 0                 1 0 0 0 0
           0 1 0 0 0                 0 1 0 0 0
M_u(b) =   0 0 1 0 0   and M_u(c) =  0 0 1 0 0
           0 0 0 1 0                 0 0 0 1 1
           0 0 0 0 1                 0 0 0 0 1
In the original definition of a Parikh matrix, [9], the word u was chosen to be u = a_1 ... a_k, for the alphabet Σ = {a_1, ..., a_k}. In the general setup, the essential contents of Theorem 1 can be formulated as follows. For 1 ≤ i ≤ j ≤ k, denote U_{i,j} = b_i ... b_j, and denote the entries of the matrix M_u(w) by m_{i,j}.

Theorem 3. For all i and j, 1 ≤ i ≤ j ≤ k, we have m_{i,1+j} = |w|_{U_{i,j}}.
Going back to our example u = baac, we infer from Theorem 3 that, for any word w,

           1  |w|_b  |w|_ba  |w|_baa  |w|_baac
           0    1    |w|_a   |w|_aa   |w|_aac
M_u(w) =   0    0      1     |w|_a    |w|_ac
           0    0      0       1      |w|_c
           0    0      0       0        1

For w = a^3 c^3 b a c^2 a c we get

           1 1 2  1  1
           0 1 5 10 31
M_u(w) =   0 0 1  5 22
           0 0 0  1  6
           0 0 0  0  1
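Since Ψ_u is a morphism (Definition 3), M_u(w) can be computed as a product of per-letter matrices. The sketch below is our own illustration (the function names are not from the paper); it reproduces the numerical example just given.

```python
def letter_matrix(a, u):
    # M_u(a): identity except m_{i,i+1} = 1 whenever the i-th letter of u is a
    n = len(u) + 1
    M = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, b in enumerate(u):
        if b == a:
            M[i][i + 1] = 1
    return M

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def generalized_parikh(u, w):
    # multiply the letter matrices in the order the letters appear in w
    n = len(u) + 1
    M = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for a in w:
        M = mat_mul(M, letter_matrix(a, u))
    return M

# the example u = baac, w = a^3 c^3 b a c^2 a c from the text
M = generalized_parikh("baac", "aaacccbaccac")
assert M == [[1, 1, 2, 1, 1],
             [0, 1, 5, 10, 31],
             [0, 0, 1, 5, 22],
             [0, 0, 0, 1, 6],
             [0, 0, 0, 0, 1]]
```

The entries agree with Theorem 3: for instance, the corner entry 1 is |w|_baac, and the entry 31 is |w|_aac.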
3. Matrix-deducible inequalities

We begin with the following theorem. It concerns the occurrences of subwords of a certain general type: we consider decompositions xyz, and the occurrences of xyz, y, xy, and yz as subwords in an arbitrary w.

Theorem 4. The inequality |w|_xyz · |w|_y ≤ |w|_xy · |w|_yz holds for arbitrary words w, x, y, z.

Theorem 4 is due to [10]. A direct combinatorial proof is also given in [15]. The result can also be obtained using the following lemma, [10,17]. As in the preceding section, we denote by M_u(w) an arbitrary generalized Parikh matrix.

Lemma 1. The value of any minor of the matrix M_u(w) is a nonnegative integer.

The inequality presented in Theorem 4 is referred to as the Cauchy inequality for words. It can be claimed to be a fundamental property of words, because of its generality and because it reduces to an equality in a great variety of cases. The choice of the name is motivated by the resemblance to the well-known algebraic Cauchy inequality for real numbers and also by the methods used in the proof. The reader is referred to [10] for further details. No general theory exists concerning the cases when the Cauchy inequality actually reduces to an equality. We now present some considerations in this direction. We begin with a simple example. Consider the words

w = a^{i_1} b^{j_1} c^{k_1},  x = a^{i_2},  y = b^{j_2},  z = c^{k_2}.

(As usual, a, b, c stand for letters.) Clearly, |w|_y equals the binomial coefficient C(j_1, j_2). Straightforward calculations show that

|w|_y · |w|_xyz = C(i_1, i_2) C(j_1, j_2)^2 C(k_1, k_2) = |w|_xy · |w|_yz.
For instance, the setup w = a^4 b^4 c^4, x = a, y = b, z = c^2 yields the value 384 for both sides of the equation. In general, if w = x_1 y_1 z_1 and

|w|_x = |x_1|_x = m,  |w|_y = |y_1|_y = n,  |w|_z = |z_1|_z = p,

then both sides of the Cauchy inequality equal m n^2 p and, thus, the inequality is not proper. Consider words over a one-letter alphabet. If the words w, x, y, z are of lengths n, i, j, k, respectively, then the inequality assumes the form

C(n, j) C(n, i+j+k) ≤ C(n, i+j) C(n, j+k),

which is easily verified to be true. Here we have an equality exactly in case i = 0 or k = 0. Assume that

y = a^i b^j a^k,  x = a^{i_1},  z = a^{k_1}

and w = a^{i+i_1+i′} b^{j+j′} a^{k+k_1+k′}. Then again it is easy to verify that the inequality is not proper. More general results can be obtained using the linearization of subword histories presented in [10]. Consider the equation |w|_a · |w|_b = |w|_ab + |w|_ba mentioned in Section 1, valid for any word w and letters a and b. According to the terminology introduced in [10], we speak of the subword history a × b − ab − ba in the word w, defined by the equation

SH(w, a × b − ab − ba) = |w|_a · |w|_b − |w|_ab − |w|_ba.

Thus, our simple equation tells us that, for any word w, SH(w, a × b − ab − ba) = 0. Secondly, our equation can be written in the form SH(w, a × b) = SH(w, ab + ba). In other words, independently of w, the subword history a × b assumes the same value as the subword history ab + ba in w. In such a case we say that the two subword histories are equivalent. Our equation also shows how a particular subword history involving the operation × possesses an equivalent linear subword history, that is, an equivalent subword history not involving the operation ×. It was established in [10] that this holds true in general: the operation × can be eliminated from all subword histories. The proof uses the
shuffle u ⧢ v of two words u and v. By definition, u ⧢ v consists of all words

u_0 v_0 u_1 v_1 ... u_k v_k,

where k ≥ 0, u_i, v_i ∈ Σ* for 0 ≤ i ≤ k, and u = u_0 ... u_k, v = v_0 ... v_k. It is fairly straightforward to prove
that if u and v are words over disjoint alphabets, then the subword histories u × v and Σ_{x ∈ u ⧢ v} x are equivalent. This result forms the basis of the general linearization technique: for arbitrary u and v, one first provides the letters of v with primes, forcing the two words to be over disjoint alphabets. One then forms the shuffle, arguing at the same time with "reduced" words and multiplicities. For instance, 2abba + abab + baba + 2baab + aba + bab is a linear subword history equivalent to ab × ba. The following example is more sophisticated. Consider the special case

A = (ab) × (aabb) ≤ (aab) × (abb) = B

of the Cauchy inequality. The equivalent linear subword histories are in this case

A = aba^2b^2 + 4a^2bab^2 + 9a^3b^3 + a^2b^2ab + abab^2 + a^2bab + 6a^2b^3 + 6a^3b^2 + 4a^2b^2,

the linear subword history equivalent to B being obtained by adding

a^2bab^2 + a^2b^2ab + aba^2b^2 + ababab + ab^2a^2b + 2abab^2 + a^2bab + ab^2ab + abab

to A. This gives the following conclusion. (The result can also be inferred without reference to Theorem 4.) For any word w, we have |w|_ab · |w|_aabb ≤ |w|_aab · |w|_abb. The equality holds exactly in case w does not contain the subword abab (and the right side is nonzero). The same argument is applicable to more general words. Consider the inequality |w|_xyz · |w|_y ≤ |w|_xy · |w|_yz, where x = a^m, y = ab, z = b^n, m, n ≥ 1. By analyzing the linear subword histories arising from the two sides of the inequality, we see that every term on the left side gives rise to a unique term on the right side and, moreover, the eventual additional terms on the right side all possess the subword abab. Thus, we obtain the following result.

Theorem 5. The inequality

|w|_ab · |w|_{a^m b^n} ≤ |w|_{a^m b} · |w|_{ab^n},  m, n ≥ 2,

holds for all words w and is strict exactly in case w contains the subword abab (and the right side is nonzero).

Numerous inequalities can be deduced from Lemma 1, by Theorem 1 or Theorem 3. The following general result is along these lines.
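Both the basic linearization identity and Theorem 5 in the case m = n = 2 can be confirmed by brute force over all short binary words. The following check is our own illustration (the counting helper is a standard dynamic program, not code from the paper).

```python
from itertools import product

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

for n in range(8):
    for tup in product("ab", repeat=n):
        w = "".join(tup)
        # the identity behind subword histories: SH(w, a x b) = SH(w, ab + ba)
        assert count(w, "a") * count(w, "b") == count(w, "ab") + count(w, "ba")
        # Theorem 5 with m = n = 2: strict exactly when w contains abab
        left = count(w, "ab") * count(w, "aabb")
        right = count(w, "aab") * count(w, "abb")
        assert left <= right
        assert (left < right) == (count(w, "abab") > 0)
```

Note that if w contains abab as a subword, then it also contains aab and abb, so the right side is automatically nonzero; this is why the parenthetical condition of Theorem 5 disappears from the last assertion.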
Theorem 6. Let k ≥ 1 and let w, x_1, ..., x_k be arbitrary words. Further, let M_det be an arbitrary minor of the matrix

        1  |w|_{x_1}  |w|_{x_1 x_2}  ...  |w|_{x_1 ... x_k}
        0      1        |w|_{x_2}    ...  |w|_{x_2 ... x_k}
M =     .      .            .         .           .
        .      .            .         1      |w|_{x_k}
        0     ...          ...        0           1

Then M_det ≥ 0.

For instance, the subsequent inequalities are obtained by Theorem 6. The letters u, w, x, y, z, y_1, ..., y_n stand for arbitrary words. A suitable combination of the inequalities gives (partial) results about the cases when an inequality is strict. However, a general theory is missing.

|w|_xy ≤ |w|_x · |w|_y,
|w|_y · |w|_xyz ≤ |w|_xy · |w|_yz  (Cauchy inequality),
|w|_{y_1} ··· |w|_{y_n} · |w|_{x y_1 ... y_n z} ≤ |w|_{x y_1} · |w|_{y_1 y_2} ··· |w|_{y_n z},
|w|_x · |w|_yz + |w|_xy · |w|_z ≤ |w|_xyz + |w|_x · |w|_y · |w|_z,
|w|_yz · |w|_xyzu + |w|_xy · |w|_z · |w|_yzu + |w|_y · |w|_xyz · |w|_zu ≤ |w|_xy · |w|_yz · |w|_zu + |w|_y · |w|_z · |w|_xyzu + |w|_xyz · |w|_yzu,
|w|_x · |w|_y · |w|_zu + |w|_x · |w|_yz · |w|_u + |w|_xy · |w|_z · |w|_u + |w|_xyzu ≤ |w|_x · |w|_yzu + |w|_xy · |w|_zu + |w|_xyz · |w|_u + |w|_x · |w|_y · |w|_z · |w|_u.
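The minor condition of Theorem 6 can also be tested numerically. The sketch below (our own illustration; names and the sample words are ours) builds the matrix M for one choice of w and x_1, ..., x_k and checks that every minor is nonnegative, using a naive cofactor determinant that suffices for such small matrices.

```python
from itertools import combinations

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def subword_matrix(w, xs):
    # the matrix M of Theorem 6: entry (i, j+1) counts x_{i+1} ... x_{j+1} concatenated
    k = len(xs)
    M = [[1 if i == j else 0 for j in range(k + 1)] for i in range(k + 1)]
    for i in range(k):
        for j in range(i, k):
            M[i][j + 1] = count(w, "".join(xs[i:j + 1]))
    return M

def det(M):
    # cofactor expansion along the first row (fine for these tiny matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

M = subword_matrix("abbaababba", ["a", "b", "ab"])
n = len(M)
for size in range(1, n + 1):
    for rows in combinations(range(n), size):
        for cols in combinations(range(n), size):
            assert det([[M[i][j] for j in cols] for i in rows]) >= 0
```

Since M is a submatrix of the generalized Parikh matrix M_u(w) with u = x_1 x_2 ... x_k (on the row and column indices given by the cut positions of u), its minors are minors of M_u(w), so Lemma 1 guarantees that every assertion passes.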
4. Sufficient conditions for complete inference

A very central problem concerning words, also important in numerous applications, is to find some elements (factors, subwords, etc.) of words that characterize the word, so that, instead of the word itself, it suffices to investigate the elements. For instance, one might be able to characterize a word in terms of some specific factors or subwords. Here the characterization can be total or partial: the elements considered may determine the word uniquely or only to a certain extent. A characterization in terms of factors, optimal in a specific sense, was given in [2]. Here we consider characterizations in terms of subwords. A general problem is the following: which numbers |w|_u suffice to determine the word w uniquely? In addressing the general problem, one should specify a class of subwords u such that the values |w|_u, where u ranges over this class, determine w uniquely. Such a class could consist of all words of at most a given length. Indeed, a notion often mentioned but not much investigated in the literature, [1,6,13,15], is that of a t-spectrum. For a fixed t ≥ 1, the t-spectrum of a word w tells all the values |w|_u, where |u| ≤ t. Following the notation of
formal power series, [5], the t-spectrum of a word w over Σ can be viewed as a polynomial in N_0⟨Σ*⟩ of degree at most t. For instance, the polynomial aa + bb + 2ab + 2ba + 2a + 2b is the 2-spectrum of the word abba, as well as of the word baab. In general, one can consider, for each t, the maximal length such that any word of this length is uniquely determined by its t-spectrum. See [15] for further details. This function of t is discussed in detail in [3], where the original formulation of the problems is credited to L.O. Kalashnik. For instance, the two different words abbaaab, baaabba (resp. ab^2 a^3 b a^2 b^2 a, b a^3 b a b^2 a^3 b) have the same 3-spectrum (resp. 4-spectrum), and are both of length 7 (resp. 12), [6]. This shows that the maximal length is at most 6 for t = 3, and at most 11 for t = 4.

Perhaps one should not always consider subwords of the same length and take all of them. Sometimes very few words (of different lengths) determine the word uniquely. Consider words w over the alphabet {a, b}. We will now show how w can be uniquely inferred from certain values |w|_u. A good choice for the words u is the sequence ab^i, i = 0, 1, 2, ... . Indeed, as shown in the following lemma, the word w can be uniquely inferred from its Parikh vector (r, s) and the numbers |w|_{ab^i}, 1 ≤ i ≤ min(r, s).

Lemma 2. Assume that w and w′ are words over the alphabet {a, b} with the same Parikh vector (r, s) and that

|w|_{ab^i} = |w′|_{ab^i},  1 ≤ i ≤ min(r, s).

Then w = w′.

Proof. Recall that the Parikh vector of a word w is the vector (|w|_a, |w|_b). Notice that under our hypotheses one has |w|_{ab^i} = |w′|_{ab^i}, 1 ≤ i ≤ r. Indeed, this is trivial if r ≤ s, while if s < r, then |w|_{ab^i} = |w′|_{ab^i} = 0 for s + 1 ≤ i ≤ r. Thus, in order to prove the statement, it is sufficient to show that the numbers r, s and

|w|_{ab^i},  1 ≤ i ≤ r,

determine the word w uniquely. Consider the r occurrences of the letter a in w, and denote by x_i, 1 ≤ i ≤ r, the number of occurrences of b to the right of the ith occurrence of a, when the occurrences of a are counted from left to right. Thus,

s ≥ x_1 ≥ x_2 ≥ ... ≥ x_r ≥ 0.    (1)
Denote |w|_{ab^i} = σ_i, 1 ≤ i ≤ r. We obtain the following system of equations:

Σ_{i=1}^{r} C(x_i, j) = σ_j,  j = 1, ..., r.

(This follows because, for instance, each subword occurrence of ab^2 in w is obtained by taking the ith occurrence of a, for some i where 1 ≤ i ≤ r, and an arbitrary pair of the x_i occurrences of b to the right of this a.) When the binomial coefficients are written out as polynomials, the system of equations takes the form

Σ_{i=1}^{r} x_i^j = P_j(σ_1, ..., σ_j),  j = 1, ..., r,

where each P_j is a linear polynomial with positive integer coefficients. (The latter can be given explicitly, but this is irrelevant for our purposes.) For instance, we obtain for r = 4:

x_1 + x_2 + x_3 + x_4 = σ_1,
x_1^2 + x_2^2 + x_3^2 + x_4^2 = 2σ_2 + σ_1,
x_1^3 + x_2^3 + x_3^3 + x_4^3 = 6σ_3 + 6σ_2 + σ_1,
x_1^4 + x_2^4 + x_3^4 + x_4^4 = 24σ_4 + 36σ_3 + 14σ_2 + σ_1.

It is well known that this system has a unique unordered solution (over the complex field), given by the roots of a suitable polynomial of degree r. This is, indeed, a straightforward consequence of the Newton-Girard formulas relating the coefficients of a polynomial and the sums of the powers of its roots. We derive that there is at most one ordered solution (x_1, ..., x_r) where the x_i, 1 ≤ i ≤ r, are integers satisfying (1). Finally, the word w is uniquely inferred from the numbers x_i and s. For instance, the values

|w|_a = 4,  |w|_b = 11,  |w|_ab = 18,  |w|_{ab^2} = 48,  |w|_{ab^3} = 92,  |w|_{ab^4} = 128

yield the (unique) word w = b^2 a b^5 a^2 b^3 a b. This concludes the proof.
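The uniqueness in Lemma 2 can be confirmed directly. The brute-force sketch below is our own illustration (the names are ours, and exhaustive search replaces the Newton-Girard argument): it enumerates all words with the given Parikh vector and keeps those matching the values |w|_{ab^i}.

```python
from itertools import combinations

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def infer(r, s, sigma):
    # sigma[i-1] = |w|_{ab^i}; brute force over all words with Parikh vector (r, s)
    n, sols = r + s, []
    for apos in combinations(range(n), r):
        apos = set(apos)
        w = "".join("a" if i in apos else "b" for i in range(n))
        if all(count(w, "a" + "b" * i) == sigma[i - 1]
               for i in range(1, min(r, s) + 1)):
            sols.append(w)
    return sols

# the example from the proof: the data pin down a single word
assert infer(4, 11, [18, 48, 92, 128]) == ["bbabbbbbaabbbab"]
```

The single solution is exactly w = b^2 a b^5 a^2 b^3 a b, as stated in the proof.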
Lemma 3. The statement of Lemma 2 holds true if the sequence ab^i, 1 ≤ i ≤ min(r, s), is replaced by any of the three sequences

a^i b,  b a^i,  b^i a,  1 ≤ i ≤ min(r, s).

Proof. The claim concerning the sequence b a^i follows from Lemma 2 by interchanging the letters a and b. Consider the sequence a^i b. Clearly, |w|_{a^i b} = |mi(w)|_{b a^i}, where mi(w) is the mirror image of w. Thus, mi(w) and, therefore, also w is uniquely determined by the given numerical values. Finally, the claim concerning b^i a follows again by interchanging a and b.

Theorem 7. For any integer l, a word w of length l over the alphabet {a, b} can be uniquely inferred from at most ⌊l/2⌋ + 2 specific values |w|_u.
Proof. The result follows directly from Lemma 2 (or from Lemma 3), because min(r, s) ≤ ⌊l/2⌋.

For instance, the values |w|_a, |w|_b, |w|_ab, |w|_{ab^2} uniquely determine a word w of length 5. The result is optimal in the sense that no three among these values suffice for the same purpose. The 5002 values

|w|_a,  |w|_b,  |w|_{ab^i},  1 ≤ i ≤ 5000,
uniquely determine a word w of length 10^4. The 12-spectrum of a word consists of somewhat more values but, according to [3], only words of length less than 600 are uniquely determined by it. To infer uniquely words of length 10^4, the 18-spectrum is needed, [3]. In the consideration of spectra, attention may be restricted to binary alphabets, [3]. The situation is different if one just wants to have a "good" set of values |w|_u for the inference of w. If the alphabet is bigger than binary, one may consider the letters pairwise or try some direct approach. In any case, one has to extend results such as Lemmas 2 and 3. Some results about the injectivity of Parikh matrix mappings have been presented in [4,8,16]. The above considerations can be used to establish an injectivity result for generalized Parikh matrix mappings. We base our discussion on Lemma 2; Lemma 3 yields analogous results. Consider the generalized Parikh matrix mapping (over the alphabet {a, b})

Ψ = Ψ_u,  where u = ab^t, t ≥ 1.

Thus, the matrices Ψ(w) are (t + 2)-dimensional. In the matrix Ψ(a) the only nonzero entry above the main diagonal is the entry (1, 2), whereas in the matrix Ψ(b) all entries (j, j+1), 2 ≤ j ≤ t + 1, equal 1. By Theorem 3, we have for an arbitrary word w:

         1  |w|_a  |w|_ab  ...  |w|_{ab^t}
         0    1    |w|_b   ...  |w|_{b^t}
Ψ(w) =   .    .      .      .       .
         .    .      .      1    |w|_b
         0   ...    ...     0       1

Observe also that, for any word w, the value |w|_b uniquely determines all values |w|_{b^i}, i ≥ 1. Hence, the following result is a consequence of Lemma 2 and Theorem 3.

Theorem 8. If the equation Ψ(w) = Ψ(w′) holds for different words w and w′, then |w| = |w′| > 2t.

Theorem 8 gives a numerical characterization of binary words in terms of matrices. It can be extended to arbitrary words by considering the letters pairwise. However, this method is not very efficient. It is likely that there are better direct ways for the characterization.
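Theorem 8 implies in particular that all binary words of length at most 2t receive distinct matrices under Ψ_u with u = ab^t. A quick exhaustive check of this consequence for t = 2 (our own illustration; the names are ours):

```python
from itertools import product

def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def gp_matrix(u, w):
    # generalized Parikh matrix via Theorem 3: m_{i,j+1} = |w|_{b_i ... b_j}
    k = len(u)
    M = [[1 if i == j else 0 for j in range(k + 1)] for i in range(k + 1)]
    for i in range(k):
        for j in range(i, k):
            M[i][j + 1] = count(w, u[i:j + 1])
    return M

t = 2
u = "a" + "b" * t                     # u = ab^t
words = ["".join(p) for n in range(2 * t + 1) for p in product("ab", repeat=n)]
mats = {tuple(map(tuple, gp_matrix(u, w))) for w in words}
# no two distinct words of length <= 2t share a matrix
assert len(mats) == len(words)
```

The 31 words of length at most 4 indeed yield 31 distinct matrices; the matrix records |w|_a, |w|_b, |w|_ab and |w|_{ab^2}, which by Lemma 2 suffice here since min(r, s) ≤ 2.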
5. Forbidden subwords

Forbidden factors of words and infinite words have been widely investigated. A forbidden factor of a word w is simply a word that does not occur as a factor of w. Forbidden factors are sometimes of fundamental importance in determining the structure of the word itself. A word u is a minimal forbidden factor of a word w if u is a forbidden factor of w but all proper factors of u are factors of w. This notion has relevant connections with automata theory, text compression and symbolic dynamics. The reader is referred to [11] and the references given therein. Analogous notions can be defined for subwords as well.

Definition 4. A word u is a forbidden subword of w if |w|_u = 0. A forbidden subword u of w is minimal if all proper factors v of u satisfy |w|_v > 0.

The purpose of this section is only to point out a direct connection between (minimal) forbidden subwords and generalized Parikh matrices. We hope to return to forbidden subwords in another contribution.

Theorem 9. A word u is a forbidden subword of a word w if and only if the entry in the upper right corner of the generalized Parikh matrix M_u(w) equals 0. A forbidden subword u of w is minimal exactly in case all other entries above the main diagonal in M_u(w) are positive.

Theorem 9 follows from the definitions and Theorem 3. For instance, consider w = baababb and u = abba. Then

           1 3 8 7 0
           0 1 4 6 1
M_u(w) =   0 0 1 4 4
           0 0 0 1 3
           0 0 0 0 1

showing that u is a minimal forbidden subword of w. Observe that u is a minimal forbidden subword of w also under the following modified definition: a forbidden subword u of w is minimal if all proper subwords of u are also subwords of w. Minimality in this modified sense cannot be directly characterized by generalized Parikh matrices. A transposition in w, resulting in w′ = baabbab, gives the matrix

            1 3 7 6 2
            0 1 4 6 3
M_u(w′) =   0 0 1 4 5
            0 0 0 1 3
            0 0 0 0 1

and, thus, u is not forbidden.
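Definition 4 and the example above can be checked directly from subword counts, without building the matrices at all. A minimal sketch (ours, not from the paper):

```python
def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

w, u = "baababb", "abba"
assert count(w, u) == 0               # abba is a forbidden subword of w ...
proper_factors = {u[i:j] for i in range(len(u))
                  for j in range(i + 1, len(u) + 1)} - {u}
assert all(count(w, f) > 0 for f in proper_factors)   # ... and a minimal one
assert count("baabbab", u) == 2       # after the transposition, not forbidden
```

The three assertions correspond exactly to the corner entry 0, the positive off-diagonal entries of M_u(w), and the corner entry 2 of M_u(w′) in the matrices above.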
The defining condition for a forbidden subword, |w|_u = 0, concerns estimations of the number |w|_u. The inequalities discussed in Section 3 readily yield such estimations. For instance, the following result is a consequence of Theorem 4.

Lemma 4. For any words w, u, x, y, z, where u = xyz, we have

|w|_u ≤ (|w|_xy · |w|_yz) / |w|_y.
6. Conclusion. Open problems

Various algebraic considerations concerning Parikh matrices have been presented in the literature, see for instance [7]. Parikh matrices are not closed under the ordinary addition of matrices. A special operation ⊕ was introduced for matrices in M_k in [7]. The entries above the main diagonal in the matrix M_1 ⊕ M_2 are obtained from the corresponding entries in M_1 and M_2 by addition. (Thus, the main diagonal of the matrix M_1 ⊕ M_2 consists of 1's.) If we are dealing with binary alphabets, then Theorem 2 implies that the "sum" M_1 ⊕ M_2 of two Parikh matrices M_1 and M_2 is again a Parikh matrix. The same conclusion holds for the "product" M_1 ⊗ M_2 of two Parikh matrices M_1 and M_2, defined by entry-wise multiplication. Indeed, if in both M_1 and M_2 the only element of the third diagonal is within the bounds allowed in Theorem 2, the same holds true for the corresponding element in M_1 ⊗ M_2. Thus, we obtain the following result.

Theorem 10. Parikh matrices over the alphabet {a, b} constitute a commutative semiring with identity, with respect to the operators ⊕ and ⊗.

If the alphabet consists of three or more letters, then M_1 ⊕ M_2 is not necessarily a Parikh matrix for Parikh matrices M_1 and M_2. As pointed out in [7], the matrix M_1 ⊕ M_2 is not a Parikh matrix if M_1 and M_2 are the Parikh matrices resulting from the words abc and b, respectively. As regards the operation ⊗, it is not easy to find similar examples. Indeed, it is an open problem whether or not the set of Parikh matrices is closed under ⊗. The matrices satisfying the property of Lemma 1 are closed under ⊗. (In other words, if every minor of the matrices M_1 and M_2 is a nonnegative integer, the same holds true for the matrix M_1 ⊗ M_2.) However, not every matrix (in M) having this property is a Parikh matrix, [8,10]. Problems concerning the operation ⊗ belong to the more general problem area concerning suitable algebraic operations for Parikh matrices.
For instance, would the Kronecker product of matrices be suitable for some characterizations? Properly chosen algebraic operations might contribute significantly to the general characterization and injectivity problems of Parikh matrices, [4,8,10,16]. We conclude by mentioning some other open problems. Finding numerical values, such as those in Lemmas 2 and 3, from which a word can be uniquely inferred is a problem area of considerable practical significance, [3,6,15]. What is a minimal or otherwise optimal set of such values? Our considerations above deal with binary alphabets. In the general case one can of course consider the letters pairwise, but a more direct approach is called for.
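In the binary case, the closure claims behind Theorem 10 amount to checking Theorem 2's bound on the single third-diagonal entry. A small sketch illustrating this (our own; the operation names oplus/otimes and the sample words are ours):

```python
def count(w, u):
    # number of occurrences of u as a scattered subword of w
    dp = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def psi(w):
    # Parikh matrix of w over {a, b}
    return [[1, count(w, "a"), count(w, "ab")],
            [0, 1, count(w, "b")],
            [0, 0, 1]]

def oplus(A, B):
    # add the entries above the main diagonal, keep the 1's on the diagonal
    return [[A[i][j] + B[i][j] if j > i else A[i][j]
             for j in range(3)] for i in range(3)]

def otimes(A, B):
    # entry-wise product (the diagonal 1's are preserved automatically)
    return [[A[i][j] * B[i][j] for j in range(3)] for i in range(3)]

M1, M2 = psi("aab"), psi("abab")
for M in (oplus(M1, M2), otimes(M1, M2)):
    # Theorem 2's bound: the result is again a binary Parikh matrix
    assert 0 <= M[0][2] <= M[0][1] * M[1][2]
```

By Theorem 2, the bound 0 ≤ m_{1,3} ≤ m_{1,2} m_{2,3} is exactly what it takes for a 3-dimensional matrix of this shape to be a Parikh matrix, so both results are realized by some binary word.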
A number of open problems relate Parikh matrices with languages. Given a language L ⊆ Σ*, we denote by M(L) the set of Parikh matrices associated to the words in L. Is the equation M(L_1) = M(L_2) decidable when L_1 and L_2 come from a specific language family? This problem is open even for regular languages. Related problems are mentioned in [8]. One can also specify an alphabet and some values |w|_u, and study the set of words w where each of these values is met. For instance, the regular language b*(a^3 b + a b^3 + abab)a* results from the value |w|_ab = 3, whereas the language (b a^3 b + abab)a* results from the combination of the values |w|_ab = 3 and |w|_b = 2. The combination of the values |w|_aba = 1 and |w|_{babab^6} = 5 yields the unique word b^5 abab^6. The conditions

|w|_a = |w|_b = |w|_ab = i,  for some i ≥ 1,

lead to a rather involved non-context-free language. From finitely many conditions a regular language always results. Thus, if we fix values for arbitrary entries in a (generalized) Parikh matrix, then the set of all words whose Parikh matrix has the fixed values in the corresponding entries is regular. Infinite languages are obtained by leaving open some entries in the second diagonal.

Subword histories were considered above in Section 3. The equality problem, that is, the problem of deciding whether two subword histories assume the same value for all words, was settled in [10]. The corresponding inequality problem is open: given two subword histories SH_1 and SH_2, is the value of SH_1 for an arbitrary word w less than or equal to that of SH_2? For instance, baab ≤ bab + baaab because |w|_baab ≤ |w|_bab + |w|_baaab holds for all w. In the general case it is not even known whether the problem is decidable. The case of one-letter alphabets is easy to settle. By [10], attention may be restricted to linear subword histories. One can also show that the inequality u ≤ v holds between two "monomial" subword histories u and v only in case u = v.

Acknowledgements

The author is grateful to the referee for useful suggestions.
References

[1] J. Berstel, J. Karhumäki, Combinatorics on words—a tutorial, EATCS Bull. 79 (2003) 178–228.
[2] A. Carpi, A. De Luca, Words and special factors, Theoret. Comput. Sci. 259 (2001) 145–182.
[3] M. Dudik, L.J. Schulman, Reconstruction from subsequences, J. Combin. Theory A 103 (2002) 337–348.
[4] S. Fossé, G. Richomme, Some characterizations of Parikh matrix equivalent binary words, Inform. Proc. Lett. 92 (2004) 77–82.
[5] W. Kuich, A. Salomaa, Semirings, Automata, Languages, Springer, Berlin, 1986.
[6] J. Maňuch, Characterization of a word by its subwords, in: G. Rozenberg, W. Thomas (Eds.), Developments in Language Theory, World Scientific, Singapore, 2000, pp. 210–219.
[7] A. Mateescu, Algebraic aspects of Parikh matrices, in: J. Karhumäki, H. Maurer, G. Păun, G. Rozenberg (Eds.), Theory is Forever, Springer, Berlin, 2004, pp. 170–180.
[8] A. Mateescu, A. Salomaa, Matrix indicators for subword occurrences and ambiguity, Int. J. Found. Comput. Sci. 15 (2004) 277–292.
[9] A. Mateescu, A. Salomaa, K. Salomaa, S. Yu, A sharpening of the Parikh mapping, Theoret. Inform. Appl. 35 (2001) 551–564.
[10] A. Mateescu, A. Salomaa, S. Yu, Subword histories and Parikh matrices, J. Comput. System Sci. 68 (2004) 1–21.
[11] F. Mignosi, A. Restivo, M. Sciortino, Forbidden factors in finite and infinite words, in: J. Karhumäki, H. Maurer, G. Păun, G. Rozenberg (Eds.), Jewels are Forever, Springer, Berlin, 1999, pp. 339–350.
[12] R.J. Parikh, On context-free languages, J. Assoc. Comput. Mach. 13 (1966) 570–581.
[13] G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages 1–3, Springer, Berlin, 1997.
[14] J. Sakarovitch, I. Simon, Subwords, in: M. Lothaire (Ed.), Combinatorics on Words, Addison-Wesley, Reading, MA, 1983, pp. 105–142.
[15] A. Salomaa, Counting (scattered) subwords, EATCS Bull. 81 (2003) 165–179.
[16] A. Salomaa, On the injectivity of Parikh matrix mappings, Fund. Inform. 64 (2005) 391–404.
[17] T.-F. Şerbănuţă, Extending Parikh matrices, Theoret. Comput. Sci. 310 (2004) 233–246.
Theoretical Computer Science 340 (2005) 204 – 219 www.elsevier.com/locate/tcs
Words derivated from Sturmian words

Isabel M. Araújo^a,1, Véronique Bruyère^b,*

a Departamento de Matemática, Universidade de Évora, Rua Romão Ramalho, 59, 7000-671 Évora, Portugal
b Institut d'Informatique, Université de Mons-Hainaut, Le Pentagone, av. du Champ de Mars, 6, 7000 Mons, Belgium

Abstract

A return word of a factor of a Sturmian word starts at an occurrence of that factor and ends exactly before its next occurrence. Derivated words encode the unique decomposition of a word in terms of return words. Vuillon has proved that each factor of a Sturmian word has exactly two return words. We determine these two return words, as well as their first occurrence, for the prefixes of characteristic Sturmian words. We then characterize words derivated from a characteristic Sturmian word and give their precise form. Finally, we apply our results to obtain a new proof of the characterization of characteristic Sturmian words which are fixed points of morphisms.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Combinatorics on words; Sturmian words; Return words

1. Introduction

The concepts of return words and derivated words were introduced by Durand in [9]. Given a Sturmian word x, a return word of a factor w of x is a word that starts at an occurrence of w in x and ends exactly before the next occurrence of w. Derivated words encode the unique decomposition of a word in terms of its return words. In [14], Vuillon characterized Sturmian words in terms of their return words by showing that an infinite word is Sturmian if and only if each non-empty factor w of x has exactly

* Corresponding author. E-mail addresses: [email protected] (I.M. Araújo), [email protected] (V. Bruyère).
1 Also at Centro de Álgebra da Universidade de Lisboa, Avenida Professor Gama Pinto, 2, 1649-003 Lisboa, Portugal.
doi:10.1016/j.tcs.2005.03.020
two distinct return words. In [2], the authors considered the shortest of those return words and its first occurrence. That permitted to answer negatively a question posed by Michaux and Villemaire in [13]. In this paper, we are interested in both return words as well as in the associated derivated word. Thus, in Section 2, we introduce the classes of Sturmian words and characteristic Sturmian words. In Section 3, we give the exact form of the return words of prefixes of characteristic Sturmian words, together with their first occurrence. That allows us to fully characterize derivated words of characteristic Sturmian words, which we do in Section 4. Finally, in Section 5, we apply our results to obtain an alternative proof of the characterization of characteristic Sturmian words which are fixed points of morphisms given in [8].

2. Sturmian words

An infinite word x is Sturmian if the number of its distinct factors of length n is exactly n + 1. The function p_x : N → N such that p_x(n) is the number of distinct factors of x of length n is called the complexity of the infinite word x. It is well known that any non-ultimately periodic word satisfies p_x(n) ≥ n + 1 for all n ∈ N; in this sense Sturmian words are words of minimal complexity among infinite non-ultimately periodic words. It is clear from the definition that any Sturmian word is necessarily binary. Moreover, all words considered in this paper are binary. There is a vast amount of literature on Sturmian words and their study is an active area of research. Both Chapter 2 of [12] and the survey [4] are comprehensive introductions to Sturmian words and contain many references to recent works. Allouche and Shallit's recent book [1] also contains two chapters on the subject.

We now define the subclass of characteristic Sturmian words. For an irrational α ∈ ]0, 1[ we define a sequence (t_n)_n of finite words by

t_0 = 0,  t_1 = 0^{a_1} 1,  t_n = t_{n-1}^{a_n} t_{n-2}  (n ≥ 2),

where [0, a_1 + 1, a_2, ...] is the continued fraction expansion of α (a_1 ≥ 0 and a_i ≥ 1 for i ≥ 2). It is also usual to consider t_{-1} = 1, which permits to write t_1 = t_0^{a_1} t_{-1}. We then define the infinite word

f_α = lim_{n→∞} t_n,

which is called the characteristic Sturmian word of slope α. The sequence (t_n)_n is called the associated standard sequence of f_α. To each characteristic Sturmian word we may associate the sequence (q_n)_n of the lengths of the words t_n of the above sequence. Clearly, (q_n)_n is given by

q_0 = 1,  q_1 = a_1 + 1,  q_n = a_n q_{n-1} + q_{n-2}  (n ≥ 2).

Any characteristic Sturmian word is indeed a Sturmian word. This fact is a consequence of the study of Sturmian words as mechanical words (see [12, Chapter 2]). It can also be proved in this context (see [12, Proposition 2.1.18]) that every Sturmian word has the same set of factors as a well chosen characteristic Sturmian word. Notice that any t_n is a
prefix of both t_m, for m ≥ n ≥ 1, and of f_α. On the other hand, if a_1 = 0, then t_0 = 0 is a prefix of neither t_n, for n ≥ 1, nor f_α.

A pair of finite words (u, v) is standard if there is a finite sequence of pairs of words (0, 1) = (u_0, v_0), (u_1, v_1), ..., (u_k, v_k) = (u, v) such that for each i ∈ {1, ..., k}, either u_i = v_{i-1} u_{i-1} and v_i = v_{i-1}, or u_i = u_{i-1} and v_i = u_{i-1} v_{i-1}. An unordered standard pair is a set {u, v} such that either (u, v) or (v, u) is a standard pair. A word of a standard pair is called a standard word. Any standard word is primitive (see [12, Proposition 2.2.3]), any word in a standard sequence (t_n)_n is a standard word, and every standard word occurs in some standard sequence (see [12, Section 2.2.2]).

A factor u of a word x is left (respectively right) special if 0u and 1u (respectively u0 and u1) are factors of x; it is bispecial if it is both left and right special. It is easy to see that a word x is Sturmian if and only if it has exactly one left (respectively right) special factor of each length. For a characteristic Sturmian word f_α, the set of left special factors is its set of prefixes, and its set of right special factors is the set of reversals of prefixes (see [12, Section 2.1.3]). Moreover, the bispecial factors of a characteristic Sturmian word f_α are the prefixes of f_α which are palindromes.

The next lemma lists some useful facts about characteristic Sturmian words. For a finite word w of length greater than or equal to 2, we denote by c(w) the word obtained from w by swapping its last two letters. For a non-empty word w, we also use the word obtained from w by deleting its last letter. We say that a factor u of w is a strict factor of w if u is neither a prefix nor a suffix of w.

Lemma 1. With the above notation, for any n ∈ N,
(a) t_n t_{n-1} = c(t_{n-1} t_n); in particular, t_n t_{n-1} and t_{n-1} t_n coincide once their last two letters are deleted,
(b) t_n t_{n-1} is not a strict factor, nor a suffix, of t_n t_{n-1} t_n,
(c) the prefixes of f_α which are palindromes are the prefixes of length a q_n + q_{n-1} − 2, with 1 ≤ a ≤ a_{n+1}.

Proof. A proof of (a) appears in [1], statement (b) is an easy consequence of [2, Lemma 3.8(iv)], and (c) can be found in [6].

3. Return words

Given a non-empty factor w of a Sturmian word x = x_0 x_1 ... (where each x_i is a letter of x), an integer i is said to be an occurrence of w in x if x_i x_{i+1} ... x_{i+|w|-1} = w. For adjacent occurrences i, j, i < j, of w in x, the word x_i x_{i+1} ... x_{j-1} is said to be a return word of w in x (or, more precisely, the return word of the occurrence i of w in x). That is, a return word of w in x is a word that starts at an occurrence of w in x
I.M. Araújo, V. Bruyère / Theoretical Computer Science 340 (2005) 204 – 219
and ends exactly before the next occurrence of w. Note that a return word of w always has w as a prefix, or is a prefix of w; the latter happens when two occurrences of w overlap. Return words were first defined by Durand in [9].

Example 2. Consider the characteristic Sturmian word f, where α has the periodic continued fraction expansion [0, 3, 2, 3, 2, . . .]. According to the definition of the sequence (t_n)_n, in this case we have t_0 = 0, t_1 = 001 and t_2 = 0010010. Moreover, the word

00100100010010001001000100100100010010

is a prefix of f. Let us look for the return words of the factor w = 001. Reading off the segments between consecutive occurrences of that factor in this prefix, we obtain the decomposition

u v u v u v u u v u,    (1)

where u = 001 and v = 0010. These are the two return words of 001 that we find in that prefix of f.
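The notions just introduced are easy to experiment with. The sketch below (plain Python; all names are ours) builds a long prefix of f via the recursion t_n = t_{n−1}^{a_n} t_{n−2} for the slope of Example 2, read as the periodic expansion [0, 3, 2, 3, 2, . . .] (so a_1 = 2 and a_n alternates between 2 and 3 afterwards), and extracts the return words of w = 001 by scanning occurrences.

```python
def standard_prefix(a_seq):
    """t_0 = 0, t_1 = 0^{a_1} 1, t_n = t_{n-1}^{a_n} t_{n-2}; a_seq = [a_1, a_2, ...]."""
    t_prev, t = "0", "0" * a_seq[0] + "1"   # t_0, t_1
    for a_n in a_seq[1:]:
        t_prev, t = t, t * a_n + t_prev     # t_n = t_{n-1}^{a_n} t_{n-2}
    return t

def return_words(w, x):
    """Segments of x between consecutive occurrences of w."""
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return {x[i:j] for i, j in zip(occ, occ[1:])}

# slope [0, 3, 2, 3, 2, ...]  =>  a_1 = 2, then 2, 3, 2, 3, ...
f = standard_prefix([2, 2, 3, 2, 3, 2])
print(f[:38])                           # the prefix displayed above
print(sorted(return_words("001", f)))   # ['001', '0010']
```

Every gap between consecutive occurrences of 001 has length 3 or 4, which is exactly the u/v pattern of decomposition (1).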
In [14], Vuillon shows that an infinite binary word x is Sturmian if and only if each non-empty factor of x has exactly two return words. In [2] we studied the shortest of those return words and its first occurrence. That study allowed us, in particular, to answer negatively a question posed by Michaux and Villemaire in [13]. In the next proposition we recall this result from [2] (see Fig. 1).

Proposition 3 (Araújo and Bruyère [2, Proposition 3.2]). Let n ≥ 2. With the above notation, the shortest return word of a prefix w of f of length in the interval I_n = ]q_n + q_{n−1} − 2, q_{n+1} + q_n − 2] is t_n, and its first occurrence as a return word of w is 0 if |w| ≤ q_{n+1} − 2, and a_{n+2} q_{n+1} otherwise.

We are now also interested in the other return word of a prefix of a characteristic Sturmian word. The next proposition gives its form and its first occurrence (see Fig. 2).

Proposition 4. Let n ≥ 2 and i ∈ {1, . . . , a_{n+1}}. With the above notation, the longest return word of a prefix w of f of length in the interval I_{n,i} = ]i q_n + q_{n−1} − 2, (i + 1) q_n + q_{n−1} − 2] is t_n^i t_{n−1}, and its first occurrence as a return word of w is (a_{n+1} − i) q_n.

Remark 5. The interval I_n considered in Proposition 3 is the union of the intervals I_{n,i}, with i ∈ {1, . . . , a_{n+1}}, considered in Proposition 4. In particular, it is clear that q_{n+1} − 2 = a_{n+1} q_n + q_{n−1} − 2. Also, notice that in Proposition 4, when i = a_{n+1}, the longest return word of w is t_{n+1} and its first occurrence is 0.

In order to prove Proposition 4 we start by proving two lemmas. The first one gives us a special decomposition of a prefix of f. The second lemma points out a strategy to prove Proposition 4.

Lemma 6. For n ≥ 0 and i ∈ {1, . . . , a_{n+1}}, t_{n+1} t_n^i t_{n−1} is a prefix of f.
Fig. 1. The shortest return word of a prefix of f and its first occurrence.
Fig. 2. The longest return word of a prefix of f and its first occurrence.
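Before the formal proofs, Propositions 3 and 4 can be checked numerically. The sketch below (our code; it assumes only the recursion t_n = t_{n−1}^{a_n} t_{n−2} and q_n = |t_n|) takes the slope of Example 2 with n = 2, i = 1, and verifies that the two return words of the prefix of length i q_n + q_{n−1} are t_n and t_n^i t_{n−1}, with the stated first occurrences.

```python
def build_t(a):
    """Standard sequence for slope [0, a_1+1, a_2, ...]; a[n] holds a_n (a[0] unused)."""
    t = {-1: "1", 0: "0", 1: "0" * a[1] + "1"}
    for n in range(2, len(a)):
        t[n] = t[n - 1] * a[n] + t[n - 2]
    return t

def first_return_position(w, x, r):
    """First occurrence of w in x whose return word is r (None if absent)."""
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return min((i for i, j in zip(occ, occ[1:]) if x[i:j] == r), default=None)

a = [None, 2, 2, 3, 2, 3, 2]        # a_1, ..., a_6 for the slope [0, 3, 2, 3, 2, ...]
t = build_t(a)
f = t[6]                            # a prefix of f of length 433
q = {k: len(t[k]) for k in t}
n, i = 2, 1
w = f[: i * q[n] + q[n - 1]]        # prefix of length i*q_n + q_{n-1} = 10

longest = t[n] * i + t[n - 1]       # t_n^i t_{n-1}
print(first_return_position(w, f, t[n]))      # 0              (Proposition 3)
print(first_return_position(w, f, longest))   # 14 = (a_3 - 1) q_2  (Proposition 4)
```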
Proof. Let n ≥ 0 and i ∈ {1, . . . , a_{n+1}}. We have that

t_{n+3} = t_{n+2}^{a_{n+3}} t_{n+1} = t_{n+1}^{a_{n+2}} t_n t_{n+2}^{a_{n+3}−1} t_{n+1},

so t_{n+1}^{a_{n+2}} t_n is a prefix of f. If a_{n+2} > 1, it follows that t_{n+1} t_{n+1} = t_{n+1} t_n^{a_{n+1}} t_{n−1} is a prefix of f. Thus t_{n+1} t_n^i t_{n−1} is a prefix of f (recall that t_{n−1} is a prefix of t_n and i ≤ a_{n+1}). If, on the other hand, a_{n+2} = 1, we see that t_{n+1} t_n t_{n+1} = t_{n+1} t_n t_n^{a_{n+1}} t_{n−1} is a prefix of f, and therefore so is t_{n+1} t_n^i t_{n−1}.

Lemma 7. Let n ≥ 0 and i ∈ {1, . . . , a_{n+1}}. The occurrences in f of the prefixes of f with lengths in the interval I_{n,i} = ]i q_n + q_{n−1} − 2, (i + 1) q_n + q_{n−1} − 2] coincide. Moreover, if w and w′ are two such prefixes, and j is an occurrence of w and w′ in f, then a word u is a return word for the occurrence j of w if and only if it is a return word for the occurrence j of w′.

Proof. For the first part of the proof, it is enough to show that, given w = x_0 . . . x_{k−1} and w′ = x_0 . . . x_k, with k, k + 1 ∈ I_{n,i}, any occurrence of w is an occurrence of w′ in f. Notice that k < (i + 1) q_n + q_{n−1} − 2. Hence, by Lemma 1(c), w is a prefix of f which is not a palindrome. Therefore w is not bispecial and, in particular, it is not right special (recall that w, being a prefix of f, is a left special factor). Thus, the only factor of f of length k + 1 which begins with w is w′. Therefore, any occurrence of w in f is an occurrence of w′ in f.
Fig. 3. Illustration of occurrences of t_n^i t_{n−1} in ](a_{n+1} − i) q_n, q_{n+1}[.
The second statement follows immediately from the first one and the definition of return word.

Proof of Proposition 4. Let n ≥ 2 and i ∈ {1, . . . , a_{n+1}}. By Lemma 7, it is enough to prove the result for the prefix of f of length i q_n + q_{n−1}, namely t_n^i t_{n−1}. Notice that this length belongs to the interval I_{n,i}, since n ≥ 2. By Lemma 6, t_{n+1} t_n^i t_{n−1} = t_n^{a_{n+1}−i} t_n^i t_{n−1} t_n^i t_{n−1} is a prefix of f. Thus both (a_{n+1} − i) q_n and q_{n+1} are occurrences of t_n^i t_{n−1} in f. Moreover, there is no occurrence of t_n^i t_{n−1} between (a_{n+1} − i) q_n and q_{n+1}. Indeed, if we suppose otherwise, there are two cases to be considered:
(a) t_n^i t_{n−1} occurs at a position in the interval ](a_{n+1} − i) q_n, a_{n+1} q_n],
(b) t_n^i t_{n−1} occurs at a position in the interval ]a_{n+1} q_n, q_{n+1}[.
The two cases are illustrated in Fig. 3, in which the first line represents the prefix t_{n+1} t_n = t_n^{a_{n+1}−i} t_n^i t_{n−1} t_n of f, and the other lines represent the beginning of occurrences of t_n^i t_{n−1} as described in cases (a) and (b) (keeping in mind that t_{n−1} is a prefix of t_n). Case (a) implies that t_n t_{n−1} is a strict factor, or a suffix, of t_n t_{n−1} t_n, contradicting Lemma 1(b). In case (b) we obtain t_n as a strict factor, or a suffix, of t_n t_{n−1}, which contradicts the primitivity of t_n. We therefore conclude that t_n^i t_{n−1} is a return word of t_n^i t_{n−1} in f.

We shall now determine the first occurrence of t_n^i t_{n−1} as a return word of t_n^i t_{n−1} in f. By the first part of the proof we already know that this first occurrence is bounded above by (a_{n+1} − i) q_n. Now, if i = a_{n+1}, then (a_{n+1} − i) q_n = 0 and therefore, in this case, 0 is the first occurrence of t_n^i t_{n−1} as a return word of t_n^i t_{n−1} in f. Suppose now that i < a_{n+1}. Observing the prefix t_{n+1} = t_n^{a_{n+1}} t_{n−1} of f, we see that 0, q_n, . . . , (a_{n+1} − i) q_n are occurrences of t_n^i t_{n−1}. Therefore, the only return word that appears before position (a_{n+1} − i) q_n is t_n, the shortest return word.
We conclude that the first occurrence of t_n^i t_{n−1} as a return word of t_n^i t_{n−1} in f is (a_{n+1} − i) q_n.

Propositions 3 and 4 are actually valid for n ≥ 0, though we have proved them only for n ≥ 2. The proofs for smaller values of n have to be made separately and are rather technical; they appear in an appendix at the end of this paper.

Example 8. Let f and w = 001 be as in Example 2. Thus |w| = 3 and hence |w| ∈ ]q_1 + q_0 − 2, 2 q_1 + q_0 − 2] = [3, 5]. Therefore we are in the case n = 1, i = 1, and the return words of w in f are indeed the words t_1 = 001 and t_1 t_0 = 0010 found in Example 2. Moreover, applying Propositions 3 and 4, we have that the first occurrence of t_1 as a return word is at position 0, while the first occurrence of t_1 t_0 as a return word is at position (a_2 − i) q_1 = 3, as observed in Example 2.
Now, for n ≥ 0, consider the interval I_n = ∪_{i=1}^{a_{n+1}} I_{n,i} as in Propositions 3 and 4. We may define I_{n+1,0} = I_{n,a_{n+1}}, and

J_n = ∪_{i=0}^{a_{n+1}−1} I_{n,i}  if n > 0,    J_n = ∪_{i=1}^{a_{n+1}−1} I_{n,i}  if n = 0.
Notice that J_n corresponds to shifting I_n to the left. From Propositions 3 and 4, we have that the set of return words of a prefix w with length in I_{n,a_{n+1}} = I_{n+1,0} is {t_n, t_n^{a_{n+1}} t_{n−1}}, which can also be written as {t_{n+1}, t_{n+1}^0 t_n}. Thus, combining Propositions 3 and 4, we obtain

Proposition 9. Let n ≥ 1 and i ∈ {0, . . . , a_{n+1} − 1}, or n = 0 and i ∈ {1, . . . , a_{n+1} − 1}. Let w be a prefix of f of length in I_{n,i}. Then the return words of w in f are t_n and t_n^i t_{n−1}. Moreover, the first occurrence of t_n as a return word of w is 0, and the first occurrence of t_n^i t_{n−1} as a return word of w is (a_{n+1} − i) q_n.

The change of indices from Propositions 3 and 4 to Proposition 9 will be very useful in the proofs of the results in the remainder of the paper. Therefore, we will refer to Proposition 9 whenever we make use of the return words of a prefix of a characteristic Sturmian word.

Remark 10. Notice that working with characteristic Sturmian words is not a restriction, since every Sturmian word has the same set of factors as a well-chosen characteristic Sturmian word. In Proposition 9, we study the return words of the prefixes of f. Since the prefixes of a characteristic Sturmian word coincide with the left special factors of any Sturmian word with the same set of factors, Proposition 9 actually gives us the form of the return words of the left special factors of a Sturmian word.

Remark 11. In [14], Vuillon uses factor graphs of a Sturmian word x to study the return words of x. Factor graphs are efficient tools to study the factors of Sturmian words (for definitions and applications see, for instance, [3,6,10]); they are formed by two cycles intersecting each other in either a single vertex or a simple path. Vuillon, while proving that an infinite word is Sturmian if and only if each factor has exactly two return words, shows that the form of the return words of x depends on the labels of the above-mentioned cycles.
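Vuillon's characterization can be probed experimentally: in a sufficiently long prefix of a Sturmian word, every short factor should exhibit exactly two return words. A sketch (our helper names, on the word f of Example 2; the prefix is long enough that boundary effects do not hide a return word for the factor lengths tested):

```python
def return_words(w, x):
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return {x[i:j] for i, j in zip(occ, occ[1:])}

# prefix of f for the slope [0, 3, 2, 3, 2, ...]: t_n = t_{n-1}^{a_n} t_{n-2}
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev     # |t| grows to 3409

# every factor of length at most 7 has exactly two return words
for length in range(1, 8):
    assert all(len(return_words(t[k:k + length], t)) == 2 for k in range(30))
print("each tested factor has exactly two return words")
```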
4. Derivated words

Let us now introduce the concept of derivated word, proposed by Durand in [9]. Let x be a Sturmian word, let w be a prefix of x and let u, v be the two return words of w. Then x can be written in a unique way as a concatenation of the words u and v. Suppose, without loss of generality, that u appears before v in that concatenation. Denote by ℓ(x) the first letter of x. We define a bijection φ : {u, v} → {0, 1} by putting φ(u) = ℓ(x) and φ(v) = 1 − ℓ(x). In this way, if x = z_1 z_2 . . ., with z_i ∈ {u, v}, we define D_w(x) = φ(z_1) φ(z_2) · · ·.
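This construction translates directly into code: decompose x into the return words of w and rename them. A sketch (our function names; a long finite prefix stands in for the infinite word, so the decomposition is truncated at its end):

```python
def derivated(w, x):
    """Renaming of the return-word decomposition of the prefix x with respect to w."""
    assert x.startswith(w)
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    rets = [x[i:j] for i, j in zip(occ, occ[1:])]
    u = rets[0]                                  # u appears before v
    v = next(r for r in rets if r != u)
    phi = {u: x[0], v: "1" if x[0] == "0" else "0"}   # phi(u) = first letter of x
    return "".join(phi[r] for r in rets)

# f of Example 2, slope [0, 3, 2, 3, 2, ...]
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev
print(derivated("001", t)[:10])     # 0101010010
```

The first ten output letters are the renaming of the pattern u v u v u v u u v u of decomposition (1).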
The word D_w(x) is called the derivated word of x with respect to w. The derivated word D_w(x) is a renaming by 0 and 1 of the occurrences of u and v in the decomposition of x in terms of its return words. This definition is better understood with an example.

Example 12. Once again let f be the characteristic Sturmian word of slope α = [0, 3, 2, 3, 2, . . .], and consider the return words of the prefix w = 001. The two return words of w in f are u = 001 and v = 0010. Thus we set φ(u) = ℓ(f) = 0 and φ(v) = 1 − ℓ(f) = 1. Hence, from (1), we see that the derivated word of f with respect to 001 starts with 0 1 0 1 0 1 0 0 1 0.

Remark 13. Note that the images φ(u) and φ(v) were chosen so that any derivated word of x starts with the same letter as x.

Remark 14. If two prefixes w, w′ of x have the same return words u and v, then their derivated words coincide. Thus, we may call D_w(x) = D_{w′}(x) the derivated word of x with respect to the return words u and v.

Proposition 9 above describes some prefix of the derivated word of f with respect to its prefix w. With the notation of that proposition, D_w(f) has the prefix φ(u)^{a_{n+1}−i} φ(v). In the next proposition we determine the precise form of the whole derivated word D_w(f). Its proof uses Proposition 9.

Proposition 15. Let f_α be a characteristic Sturmian word of slope α, where α is given by its continued fraction expansion [0, a_1 + 1, a_2, . . .]. For a prefix w of f_α whose return words are t_n, t_n^i t_{n−1} (n ≥ 1, i ∈ {0, . . . , a_{n+1} − 1}, or n = 0, i ∈ {1, . . . , a_{n+1} − 1}), the derivated word D_w(f_α) of f_α is the characteristic Sturmian word of slope
• [0, a_{n+1} + 1 − i, a_{n+2}, a_{n+3}, . . .] if a_1 > 0; and
• [0, 1, a_{n+1} − i, a_{n+2}, a_{n+3}, . . .] if a_1 = 0.

In order to prove Proposition 15 we need two lemmas. The first lemma, which can be found in [9], is based on the uniqueness of the decomposition of a Sturmian word with respect to the return words of some prefix of it.

Lemma 16 (Durand [9]).
Let x be an infinite Sturmian word, w a prefix of x and let u, v be the two return words of w, such that u appears before v in the decomposition of x. Let θ be the morphism defined by θ(ℓ(x)) = u and θ(1 − ℓ(x)) = v. Then (a) θ(D_w(x)) = x and (b) if d is a word such that θ(d) = x, then d = D_w(x).

We denote by E the morphism 0 → 1, 1 → 0. Notice that E² is the identity mapping. The next lemma relates a characteristic Sturmian word f_α and its image E(f_α), with respect to their associated standard sequences and their derivated words.

Lemma 17. Let f_α be the characteristic Sturmian word of slope α, where α has continued fraction expansion [0, a_1 + 1, a_2, . . .].
(a) E(f_α) = f_{1−α}, and 1 − α has continued fraction expansion
• [0, 1, a_1, a_2, . . .] if a_1 > 0 and
• [0, a_2 + 1, a_3, . . .] if a_1 = 0.
(b) If (t_n)_n and (s_n)_n are the standard sequences associated to f_α and f_{1−α}, respectively, then
• if a_1 > 0 then E(t_n) = s_{n+1} for all n ≥ 0 and
• if a_1 = 0 then E(t_n) = s_{n−1} for all n ≥ 1.
(c) Let n ≥ 1 and i ∈ {0, . . . , a_{n+1} − 1}, or n = 0 and i ∈ {1, . . . , a_{n+1} − 1}. Then d is derivated from f_α with respect to the return words t_n, t_n^i t_{n−1} if and only if E(d) is derivated from f_{1−α} with respect to the return words E(t_n), E(t_n^i t_{n−1}). If a_1 > 0, this is also true for n = 0.

Proof. (a) The fact that E(f_α) = f_{1−α} is proved in [12, Corollary 2.2.20]. The form of the continued fraction of 1 − α comes from the definition of continued fractions.
(b) Suppose first that a_1 > 0. Then the continued fraction of 1 − α is [0, 1, a_1, a_2, . . .]. Moreover, (s_n)_n, the standard sequence associated to f_{1−α}, is given by

s_0 = 0,  s_1 = 1,  s_n = s_{n−1}^{a_{n−1}} s_{n−2}  (n ≥ 2).

We prove that E(t_n) = s_{n+1}, for all n ≥ 0, by induction on n. For 0 and 1 we have

E(t_0) = 1 = s_1,  E(t_1) = E(0^{a_1} 1) = 1^{a_1} 0 = s_1^{a_1} s_0 = s_2.

Now, let n ≥ 2 and suppose that the claim is true for n − 1 and n − 2. Then

E(t_n) = E(t_{n−1}^{a_n} t_{n−2}) = s_n^{a_n} s_{n−1} = s_{n+1},
which completes the induction.

Suppose now that a_1 = 0. Since E(f_{1−α}) = f_α, and in the continued fraction of 1 − α the first non-zero entry is strictly greater than 1, we can use the first case to conclude that, for all n ≥ 0, E(s_n) = t_{n+1}. Thus E(t_n) = s_{n−1}, for all n ≥ 1, as desired.
(c) Clearly, by Proposition 9 and by (b), if t_n and t_n^i t_{n−1} are the return words of some prefix w of f_α, then E(t_n) and E(t_n^i t_{n−1}) are the return words of the prefix E(w) of f_{1−α}. The definition of E permits us to conclude that d is derivated from f_α if and only if E(d) is derivated from f_{1−α}.

Remark 18. Lemma 17(c) tells us, in particular, that for a prefix w of f_α, E(D_w(f_α)) = D_{E(w)}(E(f_α)) = D_{E(w)}(f_{1−α}).

Proof of Proposition 15. Suppose first that a_1 > 0 and let d = D_w(f_α). Notice that f_α begins with 0, and thus d also begins with 0. Thus, by Proposition 9, 0^{a_{n+1}−i} 1 is a prefix of d. Let θ be the morphism

0 → t_n,  1 → t_n^i t_{n−1}.

We define a sequence of finite words (r_m)_m by setting

r_0 = 0,  r_1 = 0^{a_{n+1}−i} 1,  r_m = r_{m−1}^{a_{m+n}} r_{m−2}  (m ≥ 2).    (2)
Let us see that θ(r_m) = t_{m+n}, for all m ≥ 0. We use induction on m. For m = 0 and m = 1 we have

θ(r_0) = t_n,  θ(r_1) = t_n^{a_{n+1}−i} t_n^i t_{n−1} = t_n^{a_{n+1}} t_{n−1} = t_{n+1}.

Suppose now that m ≥ 2 and that the claim is true for m − 1 and m − 2. Then

θ(r_m) = θ(r_{m−1}^{a_{m+n}} r_{m−2}) = t_{m+n−1}^{a_{m+n}} t_{m+n−2} = t_{m+n}.
Now, if we let d′ = lim_m r_m, we obtain θ(d′) = f_α and hence, by Lemma 16, d = d′. Thus, by (2), d is the characteristic Sturmian word whose slope has continued fraction expansion [0, a_{n+1} + 1 − i, a_{n+2}, a_{n+3}, . . .].

Now let a_1 = 0. By Lemma 17, the derivated word of f_α with respect to the return words t_n, t_n^i t_{n−1} is the image by E of the derivated word d of E(f_α) = f_{1−α} with respect to the return words E(t_n) and E(t_n^i t_{n−1}). The continued fraction expansion of 1 − α is [0, a_2 + 1, a_3, . . .], and E(t_n) = s_{n−1}, E(t_n^i t_{n−1}) = s_{n−1}^i s_{n−2}. Thus, by the first part of the proof, the slope of d has the continued fraction expansion [0, a_{n+1} + 1 − i, a_{n+2}, a_{n+3}, . . .]. Therefore E(d) is the characteristic Sturmian word whose slope has the continued fraction expansion [0, 1, a_{n+1} − i, a_{n+2}, a_{n+3}, . . .].
Example 19. Let f be as in Example 12. It is easy to see that f has exactly five derivated words: they are the characteristic Sturmian words whose slopes are

[0, 2, 2, 3, 2, 3, . . .],  [0, 3, 3, 2, 3, 2, . . .],  [0, 2, 3, 2, 3, . . .],  [0, 4, 2, 3, 2, 3, . . .]  and  [0, 3, 2, 3, 2, . . .],

each expansion continuing periodically.
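The count of five can be reproduced from Proposition 15 alone: starting from the partial quotients a_1, a_2, a_3, . . . = 2, 2, 3, 2, 3, . . . of the slope of Example 12, apply the map (n, i) ↦ [0, a_{n+1} + 1 − i, a_{n+2}, . . .] over the admissible pairs and count the distinct results. A sketch (our encoding: each slope is represented by its first partial quotient after 0 together with the two-term period of its tail):

```python
def a(k):
    """Partial quotients of the slope [0, 3, 2, 3, 2, ...]: a_1 = 2, then a_k alternates 2, 3."""
    return 2 if k == 1 or k % 2 == 0 else 3

def derivated_slope(n, i):
    """Slope of the derivated word per Proposition 15 (case a_1 > 0): head a_{n+1} + 1 - i,
    then the periodic tail a_{n+2}, a_{n+3}, ..., encoded here by its two-term period."""
    return (a(n + 1) + 1 - i, (a(n + 2), a(n + 3)))

slopes = set()
for n in range(10):                     # index ranges as in Proposition 15
    for i in range(1 if n == 0 else 0, a(n + 1)):
        slopes.add(derivated_slope(n, i))
print(len(slopes))                      # 5
```

The five encodings correspond exactly to the expansions listed above; the pair (3, (2, 3)), i.e. [0, 3, 2, 3, . . .], is the slope of f itself.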
The next result relates Proposition 15 and [9, Theorem 2.5]. We start by introducing some definitions concerning morphisms. A morphism is non-trivial if it is neither the identity nor E, and it is non-erasing if the image of each letter is non-empty. Given a morphism θ, we say that a word x is a fixed point of θ if θ(x) = x. Moreover, an infinite word x is morphic if there exists a non-erasing morphism θ such that θ(a) = as with a ∈ {0, 1}, s ≠ ε, and x = θ^ω(a), the limit of the words θⁿ(a) (in particular, x is a fixed point of θ). The infinite word x is called substitutive if it is the image by a literal morphism (i.e., a morphism for which the image of a letter is a letter) of a morphic word.

Theorem 20. For a characteristic Sturmian word f_α, the following conditions are equivalent:
(a) the continued fraction expansion of α is ultimately periodic,
(b) the set of all derivated words of f_α (with respect to prefixes of f_α) is finite,
(c) f_α is substitutive.

Proof. The equivalence of (b) and (c) is [9, Theorem 2.5]. Let us now prove that (a) and (b) are equivalent. We consider first the case a_1 > 0. Suppose that the set of all derivated words of f_α is finite. Then, applying Proposition 15, there exist m, n, i and j, with m < n, such that the derivated word of f_α with respect to t_m, t_m^i t_{m−1} and the derivated word of f_α with respect to t_n, t_n^j t_{n−1} coincide, that is, the continued fraction expansions

[0, a_{m+1} + 1 − i, a_{m+2}, a_{m+3}, . . .]  and  [0, a_{n+1} + 1 − j, a_{n+2}, a_{n+3}, . . .]

are equal. Therefore a_{m+k} = a_{n+k} for all k ≥ 2, and [0, a_1 + 1, a_2, a_3, . . .] is ultimately periodic (with period a_{m+2}, . . . , a_{n+1}). Conversely, suppose that [0, a_1 + 1, a_2, a_3, . . .] is ultimately periodic. It is clear from Proposition 15 that there are only finitely many words derivated from f_α. The equivalence of (a) and (b) for a_1 = 0 is proved similarly.
5. An application

As an application of the previous results, we obtain a new proof of Theorem 21 in terms of return words and derivated words. This theorem was first proved by Crisp et al. in [8]. Both Berstel and Séébold in [5] and Komatsu and van der Poorten in [11] have presented alternative proofs. This theorem states three equivalences, as in Theorem 20, in the case where α is a Sturm number, that is, its continued fraction expansion is of one of the following types:
(i) [0, a_1 + 1, a_2, . . . , a_n], with a_n ≥ a_1 ≥ 1,
(ii) [0, 1, a_1, a_2, . . . , a_n], with a_n ≥ a_1,
where in both cases the block a_2, . . . , a_n repeats periodically. It is easy to see that α is a Sturm number of type (i) if and only if 1 − α is a Sturm number of type (ii). The main ingredients of our proof of Theorem 21 are Proposition 15 and the fact that if a characteristic Sturmian word is a fixed point of a morphism θ, then {θ(0), θ(1)} is an unordered standard pair (see [12, Proposition 2.3.11, Theorem 2.3.12] for a proof of this result).

Theorem 21. For a characteristic Sturmian word f_α, the following conditions are equivalent:
(a) α is a Sturm number,
(b) there exists a non-empty prefix w of f_α such that D_w(f_α) = f_α,
(c) f_α is a fixed point of a (non-erasing, non-trivial) morphism.

Proof. [(a) ⇒ (b)] Suppose first that α is a Sturm number of the form [0, a_1 + 1, a_2, . . . , a_n], where a_n ≥ a_1 ≥ 1. Consider the pair of words t_{n−1}, t_{n−1}^i t_{n−2}, where i = a_n − a_1. We have 0 ≤ i ≤ a_n − 1, and thus t_{n−1}, t_{n−1}^i t_{n−2} are the return words of some prefix of f_α.
By Proposition 15, the derivated word of f_α with respect to those return words is the characteristic Sturmian word whose slope has the continued fraction expansion

[0, a_n + 1 − (a_n − a_1), a_{n+1}, a_{n+2}, . . . , a_{2n−1}, . . .] = [0, a_1 + 1, a_{n+1}, a_{n+2}, . . . , a_{2n−1}, . . .] = [0, a_1 + 1, a_2, . . . , a_n, . . .],

which is exactly the continued fraction expansion of α. Therefore f_α is derivated from itself.

Suppose now that α is a Sturm number of the form [0, 1, a_1, a_2, . . . , a_n], with a_n ≥ a_1. Then, by Lemma 17, 1 − α is a Sturm number of the other form. Now, applying the first part of the proof, we have that f_{1−α} is a derivated word of itself. Thus, E(f_{1−α}) = f_α is also a derivated word of itself (see Remark 18).

[(b) ⇒ (a)] Suppose that f_α starts with 0 and that the continued fraction expansion of α is

[0, a_1 + 1, a_2, . . .]    (3)
(in particular a_1 ≥ 1). Since f_α is a derivated word of itself, by Proposition 15 there exist m, i, with m > 0, such that the continued fraction expansion of α is

[0, a_{m+1} + 1 − i, a_{m+2}, a_{m+3}, . . .].    (4)
Comparing (3) and (4), we get a_1 + 1 = a_{m+1} + 1 − i, a_2 = a_{m+2}, a_3 = a_{m+3}, etc. That is, the continued fraction expansion of α is [0, a_1 + 1, a_2, . . . , a_{m+1}] with the block a_2, . . . , a_{m+1} repeating periodically, and a_{m+1} = a_1 + i. Thus a_{m+1} ≥ a_1. Therefore α is a Sturm number. Suppose now that f_α starts with 1. Since f_α is a derivated word of itself, E(f_α) = f_{1−α} is also a derivated word of itself. Thus 1 − α is a Sturm number and hence α is also a Sturm number.

[(b) ⇒ (c)] There exists a non-empty prefix w of f_α such that f_α = D_w(f_α). Let u and v be the return words of w, and let ℓ(f_α) denote the first letter of f_α. Hence, by Lemma 16, the morphism θ, defined by
θ(ℓ(f_α)) = u,  θ(1 − ℓ(f_α)) = v,

verifies θ(f_α) = f_α.

[(c) ⇒ (b)] Let θ be a morphism such that θ(f_α) = f_α. We want to show that θ(0), θ(1) are the return words of a non-empty prefix w of f_α. It will follow that D_w(f_α) = f_α. Since θ has a fixed point which is a characteristic word, by [12, Proposition 2.3.11 and Theorem 2.3.12], {θ(0), θ(1)} is an unordered standard pair. In particular, θ(0) and θ(1) are primitive words. Moreover, this pair is different from {0, 1} since θ is non-trivial.

Claim. Any unordered standard pair different from {0, 1} is either {0, 0^k 1}, {1, 1^k 0}, or {u, u^k u′}, for some word u, some non-empty prefix u′ of u and some k ≥ 1.

Proof of Claim. The proof is by induction on the way standard pairs (u, v) are constructed. For the base case, the standard pairs different from (0, 1) are (10, 1) and (0, 01), which verify the claim. It is easy to check that if (u, v) verifies the claim, then the next pairs (vu, v) and (u, uv) also verify the claim.
Fig. 4. Occurrences i and i + |u| of w in f .
We start by considering the case |θ(0)| < |θ(1)|. Suppose first that θ(0) = u and θ(1) = u^k u′, with u′ a non-empty prefix of u and k ≥ 1. The word u^k u′ is a prefix of f_α, since 0^{a_1} 1 is a prefix of f_α and θ(0^{a_1} 1) = u^{a_1+k} u′ is a prefix of θ(f_α) = f_α. Let us show that θ(0), θ(1) are the return words of w = u^k u′ in f_α. The word 01 is clearly a factor of f_α (otherwise f_α would be ultimately periodic). Hence θ(01) = u^{k+1} u′ is also a factor of f_α. Therefore there is an occurrence i of w in f_α, with i ≥ 0, such that i + |u| is also an occurrence of w. The situation is represented in Fig. 4. There is no occurrence of w between i and i + |u|, for otherwise u would be a strict factor of uu, contradicting its primitivity. Therefore θ(0) = u is a return word of w in f_α. As for the other return word, observe that there exists l ≥ 0 such that 1 0^l 1 is a factor of f_α (otherwise f_α would be ultimately periodic). Thus θ(1 0^l 1), and in particular u^k u′ u^k u′ = ww, are factors of f_α. Thus there are two occurrences j and j + |w| of w in f_α, for some j ≥ 0. There is no intermediate occurrence of w since w = θ(1) is primitive. It follows that θ(1) = u^k u′ is the other return word of w in f_α. Suppose now that
θ(0) = 0 and θ(1) = 0^k 1, and consider the prefix w = 0^k of f_α. The proof is similar to the previous one. Thanks to the factor θ(01) of f_α, we verify that 0 = θ(0) is a return word of w in f_α. Thanks to the factor θ(1 0^l 1) of f_α, we verify that 0^k 1 = θ(1) is the other return word of w in f_α. The case θ(0) = 1 and θ(1) = 1^k 0 is similar. Finally, if |θ(0)| > |θ(1)|, the proof is analogous.

Remark 22. In Theorem 21, statement (c) may be replaced by “f_α is a morphic word”. Indeed, a characteristic Sturmian word is morphic if and only if it is the fixed point of a (non-erasing, non-trivial) morphism. In order to prove this claim, let θ be a non-erasing, non-trivial morphism, and let f_α be a characteristic Sturmian word such that θ(f_α) = f_α. Suppose, without loss of generality, that the first letter of f_α is 0. Then θ(0) = 0w, for some word w. Notice that w cannot be the empty word. Indeed, on one hand, it follows from the proof of Theorem 21 that both θ(0) and θ(1) must start with the same letter (in this case, 0). On the other hand, if k is the first occurrence of 1 in f_α, that is, 0^k 1 is a prefix of f_α, then θ(0^k 1) = 0^k θ(1). Since f_α is a fixed point of θ, it follows that the first letter of θ(1) is 1, which is a contradiction. Thus w is non-empty, and by [12, Theorem 1.2.8] θ^ω(0) is the only fixed point of θ that starts with 0. Hence f_α = θ^ω(0), and f_α is a morphic word.
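Theorem 21(b) can be observed directly on the word f of Example 2, whose slope is a Sturm number of type (i): for suitable prefixes w, the derivated word D_w(f) is f again. The sketch below (our code, comparing finite prefixes only) searches for such prefixes:

```python
def derivated(w, x):
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    rets = [x[i:j] for i, j in zip(occ, occ[1:])]
    u = rets[0]
    v = next(r for r in rets if r != u)
    phi = {u: x[0], v: "1" if x[0] == "0" else "0"}
    return "".join(phi[r] for r in rets)

# long prefix of f, slope [0, 3, 2, 3, 2, ...]
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev

hits = [L for L in range(1, 30) if derivated(t[:L], t)[:50] == t[:50]]
print(hits)
```

The hits fill exactly the lengths in ]8, 15] = I_{2,1}: by Proposition 15, the pair n = 2, i = 1 yields slope [0, 3, 2, 3, . . .], which is the slope of f itself.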
Example 23. By Theorem 21, the word f from Example 19 is a morphic word since it is a derivated word of itself.
Acknowledgments

The first author acknowledges the support of Fundação para a Ciência e a Tecnologia (Grant no. SFRH/BPD/11489/2002) and of the Centro de Álgebra da Universidade de Lisboa. Her participation in this work is part of the project POCTI “Fundamental and Applied Algebra” of Fundação para a Ciência e a Tecnologia and FEDER. She would also like to thank the Institut d’Informatique of the Université de Mons-Hainaut for its hospitality.
Appendix

In this appendix we present the proofs of Propositions 3 and 4 for n ∈ {0, 1}. We start with the case n = 0, given by the following:

Proposition 24. Let n = 0 and i ∈ {1, . . . , a_1}. Let w = 0^i be the prefix of f of length in ]i q_0 + q_{−1} − 2, (i + 1) q_0 + q_{−1} − 2] = {i}. The shortest return word of w is t_0 = 0, and its first occurrence is 0 if i < a_1, and a_2 q_1 if i = a_1. The longest return word of w is t_0^i t_{−1} = 0^i 1, and its first occurrence is (a_1 − i) q_0.

Proof. If a_1 = 0 then the set {1, . . . , a_1} is empty; thus we may assume that a_1 > 0. Notice that (0^{a_1} 1)^{a_2} 0^{a_1+1} is a prefix of f. Studying this prefix, it is clear that the two return words of w = 0^i are t_0 = 0 and t_0^i t_{−1} = 0^i 1. Moreover, the first occurrence of 0 as a return word of w is 0 if i < a_1 and a_2 q_1 if i = a_1, while the first occurrence of 0^i 1 is a_1 − i = (a_1 − i) q_0.

The next proposition is Proposition 9 in the case n = 1.

Proposition 25. Let n = 1 and i ∈ {1, . . . , a_2}. Let w be a prefix of f of length in the interval ]i q_1 + q_0 − 2, (i + 1) q_1 + q_0 − 2] = [i q_1, (i + 1) q_1 − 1]. The shortest return word of w is t_1, and its first occurrence is 0 if i < a_2, and a_3 q_2 if i = a_2. The longest return word of w is t_1^i t_0, and its first occurrence is (a_2 − i) q_1.
Fig. 5. Illustration of an occurrence of t_1^i in ]a_3 q_2, q_3[.
Proof. By Lemma 7, for each interval [i q_1, (i + 1) q_1 − 1], it is enough to determine the return words of the prefix w = t_1^i of f (notice that |w| ∈ [i q_1, (i + 1) q_1 − 1]). It is easy to see that t_2^{a_3} t_1^{i+1} is a prefix of f and t_2^{a_3} t_1^{i+1} = t_2^{a_3} t_1^i t_1 = t_2^{a_3} t_1 t_1^i. Thus, a_3 q_2 and q_3 are occurrences of t_1^i in f. Moreover, there is no occurrence of t_1^i in ]a_3 q_2, q_3[. Indeed, if t_1^i occurred in that interval, we would obtain a situation as shown in Fig. 5 (the top line represents the prefix t_2^{a_3} t_1 t_1^i of f, and the bottom line represents the beginning of an occurrence of t_1^i in ]a_3 q_2, q_3[). We would hence have t_1 as a strict factor of t_1 t_1, contradicting the primitivity of t_1. Hence t_1 is a return word of w in f.

Now, by Lemma 6, t_2 t_1^i t_0 = t_1^{a_2−i} t_1^i t_0 t_1^i t_0 is also a prefix of f. Thus, both (a_2 − i) q_1 and q_2 are occurrences of t_1^i in f. Moreover, there is no occurrence of t_1^i between (a_2 − i) q_1 and q_2. Indeed, remember that t_1 = 0^{a_1} 1 and t_0 = 0. Therefore t_1^i t_0 is the longest return word of w in f.

We now locate the first occurrence of the two return words of w. Let i < a_2. Since t_2 = t_1^{a_2} t_0 is a prefix of f, we see that 0, q_1, . . . , (a_2 − i) q_1 are occurrences of w = t_1^i in f. Therefore the shortest return word t_1 occurs at position 0, and the first occurrence of the longest return word t_1^i t_0 is greater than or equal to (a_2 − i) q_1. Since we have already seen that (a_2 − i) q_1 is indeed an occurrence of the return word t_1^i t_0, we conclude that it is its first occurrence.

Let now i = a_2. From the above we have that the first occurrence of the shortest return word t_1 is bounded above by a_3 q_2. Let us see that t_1 cannot appear earlier as a return word of w = t_1^{a_2}. It will also follow that the first occurrence of the longest return word t_1^{a_2} t_0 = t_2 is 0. Any occurrence of t_1 as a return word of w corresponds to an occurrence of t_1 w = t_1^{a_2+1}. Now, if a_1 = 0, then t_1 = 1 and t_2 = 1^{a_2} 0. Hence, considering the prefix t_2^{a_3} t_1^{a_2+1} of f, it is clear that the first occurrence of t_1 w in f is a_3 q_2. On the other hand, if a_1 > 0, then t_2 is a prefix of t_1 w. Thus, any occurrence of t_1 w smaller than a_3 q_2 is of the form k q_2, with k ∈ {0, . . . , a_3 − 1}, since t_2 is primitive. Keeping in mind that t_1 is a prefix of t_2, it follows that t_1 = t_0 t_1⁻ (see Fig. 6), where t_1⁻ denotes t_1 deprived of its last letter; this is not possible since t_0 = 0 and t_1 = 0^{a_1} 1.
Fig. 6. Illustration of occurrences of t_1 as a return word of t_1^{a_2} before a_3 q_2.
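The appendix cases can themselves be verified numerically. The sketch below (our code) checks Proposition 24 on the word of Example 2 (a_1 = a_2 = 2, q_0 = 1, q_1 = 3): for w = 0^i, the return word 0 first occurs at 0 when i < a_1 and at a_2 q_1 when i = a_1, while the return word 0^i 1 first occurs at (a_1 − i) q_0.

```python
def first_return_position(w, x, r):
    occ = [i for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w]
    return min((i for i, j in zip(occ, occ[1:]) if x[i:j] == r), default=None)

# prefix of f for slope [0, 3, 2, 3, 2, ...]
t_prev, t = "0", "001"
for a_n in [2, 3, 2, 3, 2]:
    t_prev, t = t, t * a_n + t_prev
a1, a2, q0, q1 = 2, 2, 1, 3

for i in (1, 2):                                   # i ranges over {1, ..., a_1}
    w = "0" * i
    expected_short = 0 if i < a1 else a2 * q1      # first occurrence of t_0 = 0
    assert first_return_position(w, t, "0") == expected_short
    assert first_return_position(w, t, "0" * i + "1") == (a1 - i) * q0
print("Proposition 24 verified for i = 1, 2")
```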
References

[1] J.-P. Allouche, J. Shallit, Automatic Sequences—Theory, Applications, Generalizations, Cambridge University Press, Cambridge, 2003.
[2] I.M. Araújo, V. Bruyère, Sturmian words and a criterium by Michaux–Villemaire, Theoret. Comput. Sci., in press, doi:10.1016/j.tcs.2005.01.010 (appeared in Proc. Fourth Internat. Conf. on Words, Turku, Finland, 2003, pp. 83–94).
[3] P. Arnoux, G. Rauzy, Représentation géométrique de suites de complexité 2n + 1, Bull. Soc. Math. France 119 (2) (1991) 199–215.
[4] J. Berstel, Recent results in Sturmian words, in: Developments in Language Theory, Vol. II, Magdeburg, 1995, World Scientific Publishing, River Edge, NJ, 1996, pp. 13–24.
[5] J. Berstel, P. Séébold, Morphismes de Sturm, Bull. Belg. Math. Soc. Simon Stevin 1 (2) (1994) 175–189 (Journées Montoises, Mons, 1992).
[6] V. Berthé, Fréquences des facteurs des suites sturmiennes, Theoret. Comput. Sci. 165 (1996) 295–309.
[8] D. Crisp, W. Moran, A. Pollington, P. Shiue, Substitution invariant cutting sequences, J. Théor. Nombres Bordeaux 5 (1) (1993) 123–137.
[9] F. Durand, A characterization of substitutive sequences using return words, Discrete Math. 179 (1998) 89–101.
[10] I. Fagnot, L. Vuillon, Generalized balances in Sturmian words, Discrete Appl. Math. 121 (1–3) (2002) 83–101.
[11] T. Komatsu, A.J. van der Poorten, Substitution invariant Beatty sequences, Japan J. Math. (N.S.) 22 (2) (1996) 349–354.
[12] M. Lothaire, Algebraic Combinatorics on Words, Encyclopedia of Mathematics and its Applications, Cambridge University Press, Cambridge, 2002.
[13] C. Michaux, R. Villemaire, Presburger arithmetic and recognizability of sets of natural numbers by automata: new proofs of Cobham’s and Semenov’s theorems, Ann. Pure Appl. Logic 77 (1996) 251–277.
[14] L. Vuillon, A characterization of Sturmian words by return words, European J. Combin. 22 (2) (2001) 263–275.
Theoretical Computer Science 340 (2005) 220 – 239 www.elsevier.com/locate/tcs
Codes of central Sturmian words✩

Arturo Carpi a,c,∗, Aldo de Luca b,c

a Dipartimento di Matematica e Informatica dell’Università di Perugia, via Vanvitelli 1, 06123 Perugia, Italy
b Dipartimento di Matematica e Applicazioni, Università di Napoli “Federico II”, via Cintia, Monte S. Angelo, 80126 Napoli, Italy
c Istituto di Cibernetica del C. N. R. “E. Caianiello”, Pozzuoli (NA), Italy
Abstract

A central Sturmian word, or simply central word, is a word having two coprime periods p and q and length equal to p + q − 2. We consider sets of central words which are codes. Some general properties of central codes are shown. In particular, we prove that a non-trivial maximal central code is infinite. Moreover, it is not maximal as a code. A central code is called a prefix central code if it is a prefix code. We prove that a central code is a prefix (resp., maximal prefix) central code if and only if the set of its ‘generating words’ is a prefix (resp., maximal prefix) code. A suitable arithmetization of the theory is obtained by considering the bijection ρ, called the ratio of periods, from the set of all central words to the set of all positive irreducible fractions, defined as follows: ρ(ε) = 1/1 and ρ(w) = p/q (resp., ρ(w) = q/p) if w begins with the letter a (resp., the letter b), p is the minimal period of w, and q = |w| − p + 2. We prove that a central code X is prefix (resp., maximal prefix) if and only if ρ(X) is an independent (resp., independent and full) set of fractions. Finally, two interesting classes of prefix central codes are considered. One is the class of Farey codes, which are naturally associated with the Farey series; we prove that Farey codes are maximal prefix central codes. The other is given by uniform central codes. A noteworthy property related to the number of occurrences of the letter a in the words of a maximal uniform central code is proved.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Sturmian word; Code; Central word
☆ The work for this paper has been supported by the Italian Ministry of Education under Project COFIN 2003 "Linguaggi Formali e Automi: metodi, modelli e applicazioni".
∗ Corresponding author. Dipartimento di Matematica e Informatica dell’Università di Perugia, via Vanvitelli 1,
06123 Perugia, Italy. E-mail addresses:
[email protected] (A. Carpi),
[email protected] (A. de Luca). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.021
1. Introduction

Sturmian words are infinite sequences of symbols over a finite alphabet which are not eventually periodic and have the minimal possible subword complexity: for any integer n ≥ 0, the number of subwords of length n of any Sturmian word is equal to n + 1. Sturmian words are of great interest both from the theoretical and the applicative point of view, so that there exists a large literature on the subject. We refer to the recent overviews on Sturmian words by Berstel and Séébold [4, Chapter 2] and by Allouche and Shallit [1, Chapters 9–10].

A geometrical definition of a Sturmian word is the following: consider the sequence of the cuts (cutting sequence) in a squared lattice made by a ray having an irrational slope. A horizontal cut is denoted by the letter b, a vertical cut by a, and a cut with a corner by ab or ba. Sturmian words represented by a ray starting from the origin are usually called standard or characteristic.

The most famous Sturmian word is the Fibonacci word

f = abaababaabaababaababaabaababaabaab · · · ,

which is the limit of the sequence of words (f_n)_{n≥0}, inductively defined as f_0 = b, f_1 = a, and f_{n+1} = f_n f_{n−1} for n ≥ 1.

Standard Sturmian words can be equivalently defined in the following way, which is a natural generalization of the definition of the Fibonacci word. Let c_0, c_1, ..., c_n, ... be any sequence of integers such that c_0 ≥ 0 and c_i > 0 for i > 0. We define, inductively, the sequence of words (s_n)_{n≥0}, where

s_0 = b, s_1 = a, and s_{n+1} = s_n^{c_{n−1}} s_{n−1} for n ≥ 1.

The sequence (s_n)_{n≥0} converges to a limit s which is an infinite standard Sturmian word. Any standard Sturmian word is obtained in this way. We shall denote by Stand the set of all the words s_n, n ≥ 0, of any standard sequence (s_n)_{n≥0}. Any word of Stand is called a finite standard Sturmian word, or generalized Fibonacci word.

In the study of combinatorial properties of Sturmian words a very important role is played by the set PER of all palindromic prefixes of all standard Sturmian words. The words of PER have been called central Sturmian words, or simply central words, in [4]. It has been proved in [6] that a word is central if and only if it has two coprime periods p and q and length equal to p + q − 2.

In this paper, we consider sets of central words which are codes, i.e., bases of free submonoids of {a, b}∗. There are several motivations for this research. From the theoretical point of view central codes have interesting combinatorial properties. In particular, a suitable arithmetization of the theory can be given. Moreover, the words of a central code are palindromes which satisfy some strong constraints which can be useful for applications (coding with constraints [10], error-correcting codes). Finally, we believe that these codes can be of some interest in discrete geometry (for instance, to represent polygonal lines in a discrete plane).

In Section 4 some general properties of central codes are shown. In particular, we prove that a non-trivial maximal central code X is PER-complete, i.e., any central word is a factor
of a word of X∗. As a consequence of this proposition and of some technical lemmas, we prove that a non-trivial maximal central code is infinite. Moreover, it is not maximal as a code.

In Section 5, we consider prefix central codes, i.e., central codes such that no word of the code is a prefix of another word of the code. We prove that a central code is a prefix (resp., maximal prefix) central code if and only if the set of its 'generating words' is a prefix (resp., maximal prefix) code. A suitable arithmetization of the theory is obtained by considering the bijection θ, called ratio of periods, from the set of all central words to the set I of all positive irreducible fractions, defined as: θ(ε) = 1/1 and θ(w) = p/q (resp., θ(w) = q/p) if w begins with the letter a (resp., the letter b), p is the minimal period of w, and q = |w| − p + 2. A suitable derivation relation on the set I is introduced. A subset H of I is called independent if no fraction of the set can be derived from another one. A subset H of I is called full if for any element p/q of I either from p/q one can derive an element of H or there exists an element of H from which one can derive p/q. We prove that a central code X is prefix (resp., maximal prefix) if and only if θ(X) is an independent (resp., independent and full) set of fractions.

In Section 6, we consider for any positive integer n the set Φ_n of all central words w having minimal period p, q = |w| − p + 2 ≤ n + 1, and |w| ≥ n. One can prove that for each n, Φ_n is a maximal prefix central code called the Farey code of order n, since it is naturally associated with the Farey series of order n.

Finally, in Section 7, we consider the class of uniform central codes. A central code is uniform of order n if all the words of the code have length equal to n. For any n the maximal uniform central code of order n is given by U_n = PER ∩ {a, b}^n. The following noteworthy property, related to the number of occurrences |w|_a of the letter a in a word w of the maximal uniform central code U_n, is proved: for any k, 0 ≤ k ≤ n, there exists a (unique) word w ∈ U_n such that |w|_a = k if and only if gcd(n + 2, k + 1) = 1.

2. Preliminaries

Let A be a finite non-empty set, or alphabet, and A∗ the free monoid generated by A. The elements of A are usually called letters and those of A∗ words. The identity element of A∗ is called the empty word and denoted by ε. We set A+ = A∗ \ {ε}.

A word w ∈ A+ can be written uniquely as a sequence of letters as w = w_1 w_2 · · · w_n, with w_i ∈ A, 1 ≤ i ≤ n, n > 0. The integer n is called the length of w and denoted |w|. The length of ε is 0. For any w ∈ A∗ and a ∈ A, |w|_a denotes the number of occurrences of the letter a in w.

Let w ∈ A∗. The word u is a factor (or subword) of w if there exist words p, q such that w = puq. A factor u of w is called proper if u ≠ w. If w = uq, for some word q (resp., w = pu, for some word p), then u is called a prefix (resp., a suffix) of w. For any w ∈ A∗, we denote by Fact w the set of its factors. For any X ⊆ A∗, we set Fact X = ∪_{u∈X} Fact u.
An element of Fact X will be also called a factor of X.
A set X is called dense if any word of A∗ is a factor of X. A set which is not dense is called thin. If X is a finite set we denote by ℓ(X) the maximal length of the words of X. Any word of A∗ of length greater than ℓ(X) is not a factor of X, so that X is thin.

Let Y ⊆ A∗. A set X is called Y-complete if Y ⊆ Fact X∗. A set X which is A∗-complete, i.e., such that X∗ is dense, is called simply complete.

Let p be a positive integer. A word w = w_1 · · · w_n, w_i ∈ A, 1 ≤ i ≤ n, has period p if the following condition is satisfied: for all 1 ≤ i, j ≤ n, if i ≡ j (mod p), then w_i = w_j. From the definition one has that any integer q ≥ |w| is a period of w. As is well known, a word w has a period p ≤ |w| if and only if there exist words u, v, s such that

w = us = sv, |u| = |v| = p.

We shall denote by π_w the minimal period of w. We can uniquely represent w as w = r^k r′, where |r| = π_w, k ≥ 1, and r′ is a proper prefix of r. We shall call r the fractional root or, simply, root of w.

Let w = w_1 · · · w_n, w_i ∈ A, 1 ≤ i ≤ n. The reversal of w is the word w∼ = w_n · · · w_1. One defines also ε∼ = ε. A word is called a palindrome if it is equal to its reversal.

A code X over a given alphabet A is the base of a free submonoid of A∗, i.e., any non-empty word of X∗ can be uniquely factorized by words of X (cf. [3]). A code X over A is prefix (resp., suffix) if no word of X is a prefix (resp., suffix) of another word of X. A code is biprefix if it is both prefix and suffix. A code X over the alphabet A is maximal if it is not properly included in another code over the same alphabet. As is well known, any maximal code is complete. Conversely, a thin and complete code is maximal. A prefix code is a maximal prefix code if it is not properly included in another prefix code over the same alphabet.

The following two lemmas will be useful in the sequel.

Lemma 1. Let X be a code over the alphabet A and w ∈ A∗ be a word having root ζ. If ζ ∉ Fact X∗, then X ∪ {w} is a code.

Proof. Suppose that Y = X ∪ {w} is not a code. There would exist h, k > 0 and words y_1, ..., y_h, y′_1, ..., y′_k ∈ Y such that y_1 ≠ y′_1 and y_1 · · · y_h = y′_1 · · · y′_k. Since X is a code and w does not belong to Fact X∗, one easily derives that w has to occur in both sides of the previous equation, i.e., there exist minimal positive integers i and j such that w = y_i = y′_j. Setting u = y_1 · · · y_{i−1} and v = y′_1 · · · y′_{j−1}, one has uwα = vwβ with u, v ∈ X∗, u ≠ v, and α, β ∈ Y∗.
With no loss of generality, we can assume |u| > |v|. Then one has u = vλ and λwα = wβ, with λ ∈ A+. From this latter equation one obtains

λw = wδ with δ ∈ A+.

This equation shows that |λ| is a period of λw and then of w, so that |λ| ≥ |ζ|. Thus, ζ is a prefix of λ and, consequently, a factor of u = vλ. Hence ζ ∈ Fact X∗, which is a contradiction.

Lemma 2. Let X be a prefix code over the alphabet A and w ∈ A∗ be a word such that wA∗ ∩ X∗ = ∅. Then Y = X ∪ {w} is a code.

Proof. Suppose that Y is not a code. There would exist h, k > 0 and words y_1, ..., y_h, y′_1, ..., y′_k ∈ Y such that y_1 ≠ y′_1 and y_1 · · · y_h = y′_1 · · · y′_k. Since X is prefix one has y_1 = w or y′_1 = w. Without loss of generality, we may suppose that y_1 = w. Since wA∗ ∩ X∗ = ∅ there exists j ≥ 2 such that y′_1, ..., y′_{j−1} ∈ X and y′_j = w. Hence, one has y′_1 · · · y′_{j−1} = u ∈ X+ and

uw = wv, with v ∈ A∗.

Let n be a positive integer such that |u^n| ≥ |w|. One has u^n w = wv^n, so that u^n = wλ for a suitable λ ∈ A∗. Thus, wA∗ ∩ X∗ ≠ ∅, which is a contradiction.

3. Central words

In the study of combinatorial properties of Sturmian words a crucial role is played by the set PER of all finite words w having two periods p and q such that gcd(p, q) = 1 and |w| = p + q − 2. We assume that ε ∈ PER (this is formally coherent with the definition if one takes p = q = 1).

The set PER was introduced in [6] where its main properties were studied. In particular, it has been proved that PER is equal to the set of the palindromic prefixes of all standard Sturmian words. The words of PER have been called central in [4]. As is well known, central words are over a two-letter alphabet {a, b} that, in the sequel, will be denoted by A.

The set PER has remarkable structural properties. The set of all finite factors of all Sturmian words equals the set of factors of PER. Moreover, the set Stand of all finite standard Sturmian words is given by

Stand = A ∪ PER{ab, ba}.   (1)
Thus, any finite standard Sturmian word which is not a single letter is obtained by appending ab or ba to a central word. The following useful characterization of central words is a slight generalization of a statement proved in [5] (see also [8]). We report the proof for the sake of completeness.
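The inductive construction of the words s_n from the introduction, and the relation (1) between standard and central words, can be checked mechanically. The following is a minimal Python sketch (the function names are ours, not from the paper): it builds the initial words of a standard sequence from a directive sequence and verifies that every s_n of length at least 2 is a palindrome followed by ab or ba.

```python
def standard_sequence(c, steps):
    # s_0 = b, s_1 = a, s_{n+1} = s_n^{c_{n-1}} s_{n-1} for n >= 1,
    # where c = (c_0, c_1, ...) is the directive sequence.
    s = ["b", "a"]
    for n in range(1, steps):
        s.append(s[n] * c[n - 1] + s[n - 1])
    return s

# The directive sequence (1, 1, 1, ...) yields the Fibonacci word.
fib = standard_sequence([1] * 9, 9)

def splits_as_central(s):
    # Eq. (1): a finite standard word that is not a single letter is a
    # central (hence palindromic) word followed by ab or ba.
    w, xy = s[:-2], s[-2:]
    return xy in ("ab", "ba") and w == w[::-1]
```

With the all-ones directive sequence, the successive words are exactly the finite Fibonacci words f_0, f_1, f_2, ... of the introduction.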
Proposition 3. A word w is central if and only if w is a power of a single letter of A or it satisfies the equation

w = w_1 ab w_2 = w_2 ba w_1   (2)

with w_1, w_2 ∈ A∗. Moreover, in this latter case, w_1 and w_2 are central words, p = |w_1| + 2 and q = |w_2| + 2 are coprime periods of w, and min{p, q} is the minimal period of w.

Proof. In view of the results of [5, Lemma 4], it is sufficient to prove that any word w satisfying Eq. (2) is a central word. Indeed, in such a case, w has the two periods p = |w_1 ab| and q = |w_2 ba|, and |w| = p + q − 2. Moreover, gcd(p, q) = 1. In fact, suppose that gcd(p, q) = d ≥ 2. By the theorem of Fine and Wilf (see, e.g., [9]) the word w would have the period d. Thus, w_1 ab = z^{p/d} and w_2 ba = z^{q/d}, where z is the prefix of w of length d. We reach a contradiction since from the first equation the last letter of z has to be b, while from the second equation it has to be a. Since p and q are coprime, the word w is central. Finally, we observe that since w is a palindrome, w_1 and w_2 are palindromes and prefixes of a central word, so that they are central words.

The following corollary will be useful in the sequel.

Corollary 4. If w ∈ PER has the factor x^n with x ∈ A and n > 0, then x^{n−1} is a prefix (and suffix) of w.

Proof. We can assume, with no loss of generality, that x = a. If w is a power of a letter, the statement is trivially true. If, on the contrary, w is not a power of a letter, then by Proposition 3, w = w_1 ab w_2 = w_2 ba w_1 with w_1, w_2 ∈ PER. The word a^n is a factor of w_2 or of w_1 or a prefix of aw_1. In the first two cases, by induction on |w|, we can assume that w_2 or w_1 has the prefix a^{n−1}; in the third case, w_1 has the prefix a^{n−1}. Thus, in all cases, a^{n−1} is a prefix of w.

For any word w we denote by w^(−) the shortest palindrome having the suffix w. The word w^(−) is called the palindromic left-closure of w. For any set of words X, we set X^(−) = {w^(−) | w ∈ X}. The following lemmas were proved in [5].

Lemma 5. For any w ∈ PER, one has (aw)^(−), (bw)^(−) ∈ PER. More precisely, if w = w_1 ab w_2 = w_2 ba w_1, then

(aw)^(−) = w_2 ba w_1 ab w_2,  (bw)^(−) = w_1 ab w_2 ba w_1.

If w = x^n with {x, y} = A, then (xw)^(−) = x^{n+1} and (yw)^(−) = x^n y x^n.

Lemma 6. Let u, w ∈ PER and x ∈ A. If ux is a prefix of w, then also (xu)^(−) is a prefix of w.
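The characterization of central words by two coprime periods p, q with |w| = p + q − 2 lends itself to a direct brute-force test. The sketch below (helper names are ours) also checks two facts stated in the paper: central words are palindromes, and there are exactly φ(n + 2) central words of each length n (cf. Section 7).

```python
from math import gcd
from itertools import product

def is_central(w):
    # w is central iff it has periods p and q = len(w) - p + 2 with
    # gcd(p, q) = 1; a "period" larger than len(w) holds vacuously.
    n = len(w)
    def has_period(p):
        return all(w[i] == w[i + p] for i in range(n - p))
    return any(has_period(p) and has_period(n - p + 2) and gcd(p, n - p + 2) == 1
               for p in range(1, n + 2))

def central_words(n):
    # All central words of length n over {a, b}.
    return [w for w in map("".join, product("ab", repeat=n)) if is_central(w)]
```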
By Proposition 3 and Lemma 5 one easily derives that if u = (xw)^(−) with w ∈ PER and x ∈ A, then

|u| = π_u + |w|.   (3)

The following method to generate central words was introduced in [5]. By the preceding lemma, we can define the map

ψ : A∗ → PER

as follows: ψ(ε) = ε and, for all v ∈ A∗, x ∈ A,

ψ(vx) = (x ψ(v))^(−).

The map ψ : A∗ → PER is a bijection. The word v is called the generating word of ψ(v). One has that for all v, u ∈ A∗

ψ(vu) ∈ A∗ψ(v) ∩ ψ(v)A∗.   (4)
Example 7. Let w = abba. One has

ψ(a) = a,  ψ(ab) = aba,  ψ(abb) = ababa,  ψ(abba) = ababaababa.

As usual, one can extend ψ to the subsets of A∗ by setting, for all X ⊆ A∗, ψ(X) = {ψ(x) | x ∈ X}. In particular, one has ψ(aA∗) = PER_a and ψ(bA∗) = PER_b, where

PER_a = PER ∩ aA∗ and PER_b = PER ∩ bA∗.
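The palindromic left-closure and the map ψ can be implemented directly; the sketch below (names are ours) reproduces Example 7.

```python
def left_closure(w):
    # w^(-): the shortest palindrome having w as a suffix.
    # If u is the longest palindromic prefix of w = uv, then w^(-) = v~ w.
    for i in range(len(w), 0, -1):
        if w[:i] == w[:i][::-1]:
            return w[i:][::-1] + w
    return w  # w is empty

def psi(v):
    # psi(epsilon) = epsilon and psi(vx) = (x psi(v))^(-).
    w = ""
    for x in v:
        w = left_closure(x + w)
    return w
```

By Eq. (4), ψ(vu) must begin and end with ψ(v); the test below also checks this on the example.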
Let I be the set of all irreducible positive fractions. We consider the map θ : PER → I, called the ratio of periods, defined as follows: let w ∈ PER \ {ε}, p be the minimal period of w, and q = |w| + 2 − p. We set

θ(w) = p/q if w ∈ PER_a,  θ(w) = q/p if w ∈ PER_b.

Moreover, θ(ε) = 1/1.

As is well known [5], the map θ is a bijection. We recall that for all w ∈ PER, the numbers |w|_a + 1 and |w|_b + 1 are coprime. Moreover, the function η : PER → I defined, for any w ∈ PER, by

η(w) = (|w|_b + 1)/(|w|_a + 1)

is a bijection [2], called the slope. Since θ and η are both bijections, the value of each of them is determined by the other.
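Both maps can be computed directly from a central word. A minimal sketch (names are ours), returning fractions as (numerator, denominator) pairs:

```python
def min_period(w):
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[i] == w[i + p] for i in range(n - p)))

def theta(w):
    # Ratio of periods: p/q with p the minimal period and q = |w| - p + 2,
    # inverted when w starts with the letter b; theta(epsilon) = 1/1.
    if not w:
        return (1, 1)
    p = min_period(w)
    q = len(w) - p + 2
    return (p, q) if w[0] == "a" else (q, p)

def eta(w):
    # Slope: (|w|_b + 1) / (|w|_a + 1).
    return (w.count("b") + 1, w.count("a") + 1)
```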
We introduce in I the binary relation ⇒ defined as follows: for p/q, r/s ∈ I, one sets

p/q ⇒ r/s if p ≤ q, r ∈ {p, q}, s = p + q, or p ≥ q, s ∈ {p, q}, r = p + q.

One easily verifies that the graph of this relation is a complete binary tree with root 1/1. We denote by ⇒∗ the reflexive and transitive closure of ⇒. For instance, one has 1/2 ⇒ 2/3 ⇒ 2/5 ⇒ 5/7, so that 1/2 ⇒∗ 5/7.

From Lemma 5 one derives that for any w, w′ ∈ PER one has

θ(w) ⇒ θ(w′) if and only if w′ = (xw)^(−), with x ∈ A.   (5)

We say that a subset H of I is independent if for any pair of fractions p/q, r/s ∈ H such that p/q ⇒∗ r/s one has p/q = r/s. A subset H of I is full if for any fraction p/q ∈ I there exists a fraction r/s ∈ H such that p/q ⇒∗ r/s or r/s ⇒∗ p/q.

One introduces the Farey map Fa = θ ∘ ψ. Thus for any x ∈ A∗ one has Fa(x) = θ(ψ(x)) ∈ I. Since θ and ψ are bijections, also Fa is a bijection.

Lemma 8. Let x, x′ ∈ A∗. One has that Fa(x) ⇒∗ Fa(x′) if and only if x is a prefix of x′.

Proof. It is sufficient to prove that for any pair of words x, x′ ∈ A∗, one has Fa(x) ⇒ Fa(x′) if and only if x′ ∈ xA. We suppose that x ∈ aA∗ (the case where x ∈ bA∗ or x = ε can be dealt with similarly). We set Fa(x) = p/q. Therefore, by Eq. (5),

{Fa(xa), Fa(xb)} = {p/(p + q), q/(p + q)}.

Thus, p/q ⇒ Fa(x′) if and only if Fa(x′) ∈ {Fa(xa), Fa(xb)}. Since Fa is a bijection, this last condition is equivalent to x′ ∈ xA.

Corollary 9. A set X ⊆ A+ is a prefix code if and only if Fa(X) is an independent set.

Proof. Let x and x′ be two distinct elements of X. By the previous lemma, x is a proper prefix of x′ if and only if Fa(x) ⇒∗ Fa(x′). This implies that X is a prefix code if and only if Fa(X) is an independent set.

Corollary 10. A prefix code X ⊆ A∗ is maximal if and only if Fa(X) is a full set.

Proof. A prefix code X is maximal if and only if for any word w ∈ A∗ there exists a word x ∈ X such that either w is a prefix of x or x is a prefix of w. By Lemma 8 this occurs if and only if Fa(w) ⇒∗ Fa(x) or Fa(x) ⇒∗ Fa(w). This implies that X is a maximal prefix code if and only if Fa(X) is a full set.
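The derivation relation ⇒ is easy to explore programmatically. The sketch below (names are ours) computes the fractions derivable in one step and the successive levels of the tree rooted at 1/1.

```python
from fractions import Fraction

def children(f):
    # The fractions derivable from p/q in one step of the relation.
    p, q = f.numerator, f.denominator
    kids = set()
    if p <= q:
        kids |= {Fraction(p, p + q), Fraction(q, p + q)}
    if p >= q:
        kids |= {Fraction(p + q, p), Fraction(p + q, q)}
    return kids

def levels(depth):
    # Successive levels of the derivation tree rooted at 1/1.
    out = [{Fraction(1, 1)}]
    for _ in range(depth):
        out.append(set().union(*(children(f) for f in out[-1])))
    return out
```

Since gcd(p, q) = 1 implies gcd(p, p + q) = gcd(q, p + q) = 1, the Fraction values never get reduced, and the levels double in size as expected for a complete binary tree.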
4. Central codes

In this section we shall consider sets of central words which are codes. These codes, which are over a two-letter alphabet, will be called Sturmian central codes or, simply, central codes. For instance, the sets X_1 = {a, b}, X_2 = {b, aa, aba}, X_3 = {aa, aabaa, babbab}, and X_4 = {b^2} ∪ (ab)∗a are central codes.

Proposition 11. A central code is thin.

Proof. This is a consequence of the fact that the set PER is thin. Indeed, for instance, as is well known, the word aabb is not a factor of any Sturmian word.

A central code is maximal if it is not properly included in another central code. By using a classical argument based on Zorn's lemma, whose hypothesis is satisfied by the family of central codes, one easily derives that any central code is included in a maximal central code.

Proposition 12. A maximal central code is PER-complete.

Proof. Let X be a maximal central code. By contradiction, suppose that there exists a word f ∈ PER such that f ∉ Fact X∗. Let p be the minimal period of f and q = |f| − p + 2. If v is the generating word of f, by Eqs. (5) and (4) one derives that there exist letters x, y ∈ A such that g = ψ(vxy) ∈ PER has minimal period p + q and prefix f. Thus, f is a prefix of the root ζ of g, so that ζ ∉ Fact X∗. By Lemma 1, X ∪ {g} would be a code, which is central, contradicting the maximality of X as a central code.

Now, we shall prove (cf. Corollary 18) that the unique finite maximal central code is A. We need some preliminary technical lemmas.

Lemma 13. Let X be a central code and u ∈ A∗. The following statements hold:
(1) If baau ∈ X∗, then b ∈ X and aau ∈ X∗.
(2) If X ≠ A and aba^3 u ∈ X∗, then aba ∈ X and aau ∈ X∗.

Proof. If baau ∈ X∗, there exist v ∈ X∗ and x ∈ X such that

baau = xv.

By Corollary 4 no central word has the prefix baa, so that x is necessarily a proper prefix of baa. Hence, since X is a central code, x = b and v = aau ∈ X∗.

If aba^3 u ∈ X∗, there exist v ∈ X∗ and x ∈ X such that

aba^3 u = xv.

By Corollary 4 no central word has the prefix aba^3, so that x is necessarily a proper prefix of aba^3, i.e., x = aba or x = a. In the first case, v = aau ∈ X∗. In the second case, v = ba^3 u so that, by Statement 1, one has b ∈ X, i.e., X = A.
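Whether a finite set of words is a code can be decided with the classical Sardinas–Patterson algorithm (a standard tool, not used in the paper itself, but convenient for experiments). The sketch below checks the example sets X_1, X_2, X_3 given at the beginning of this section.

```python
def is_code(X):
    # Sardinas-Patterson test: X is a code iff the empty word never
    # appears in the sequence of quotient sets.
    X = set(X)
    def quotients(A, B):
        # A^{-1} B = {w : a w = b for some a in A, b in B}
        return {b[len(a):] for a in A for b in B if b.startswith(a)}
    S = quotients(X, X) - {""}
    seen = set()
    while S and frozenset(S) not in seen:
        if "" in S:
            return False
        seen.add(frozenset(S))
        S = quotients(X, S) | quotients(S, X)
    return "" not in S
```

For instance, {a, ab, bab} is not a code, since a · bab = ab · ab.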
Lemma 14. Let X be a finite PER-complete central code. Then a ∈ X or b ∈ X.

Proof. Consider the word w = (aab)^n aaa (baa)^n with 3n ≥ ℓ(X). As one easily verifies, w = ψ(a^2 b^n a), so that w ∈ PER. Since X is PER-complete, there exist words λ, μ ∈ A∗ such that λwμ ∈ X∗. We have to distinguish three cases:
(1) λ(aab)^n a, aa(baa)^n μ ∈ X∗,
(2) λ(aab)^n aa, a(baa)^n μ ∈ X∗,
(3) λ(aab)^n = δu, (baa)^n μ = vγ with x = uaaav ∈ X, δ, γ ∈ X∗, u, v ∈ A∗.

Let us consider Case 1. If a ∈ X, then the statement is true. Thus suppose a ∉ X. Since ℓ(X) ≤ 3n, one derives that the first factor in the X-factorization of aa(baa)^n μ, which has to be a palindrome, has the form aa(baa)^i with 0 ≤ i < n. This implies that (baa)^{n−i} μ ∈ X∗. By Lemma 13 one derives b ∈ X. Case 2 can be dealt with symmetrically.

Now let us consider Case 3. As x ∈ PER has the factor aaa, by Corollary 4 it must have the suffix aa. Since (baa)^n μ = vγ and |v| < ℓ(X) ≤ 3n, one derives v = (baa)^i with 0 ≤ i < n. This implies that γ = (baa)^{n−i} μ ∈ X∗. By Lemma 13 one obtains again b ∈ X.

Lemma 15. Let X be a finite PER-complete central code. Then one has b ∈ X or aba ∈ X. Symmetrically, one has a ∈ X or bab ∈ X.

Proof. Consider the word w = (aaab)^n aaa with 4(n − 1) ≥ ℓ(X). As one easily verifies, w = ψ(a^3 b^n), so that w ∈ PER. Since X is PER-complete, there exist words λ, μ ∈ A∗ such that λwμ ∈ X∗. Since ℓ(X) ≤ 4(n − 1), one has

λ(aaab)^i a^p, a^q (baaa)^j μ ∈ X∗

with i, j ≥ 1, i + j = n, p, q ≥ 0, and p + q = 3. We distinguish three cases, according to the values of q.

Case q = 0. As (baaa)^j μ ∈ X∗, by Lemma 13 it follows that b ∈ X.

Case q = 1. If X = A, then trivially, b ∈ X. If, on the contrary, X ≠ A, since a(baaa)^j μ ∈ X∗, by Lemma 13 one derives aba ∈ X.

Case q > 1. Since p = 3 − q ≤ 1 and a^p (baaa)^i λ∼ ∈ X∗, one reaches the result by a similar argument.

Lemma 16. Let X be a finite PER-complete central code. Then there exist h, k ≥ 0 such that (ab)^h a, (ba)^k b ∈ X.
Proof. Consider the word w = (ab)^n a with n such that |w| = 2n + 1 ≥ 3ℓ(X). As one easily verifies, w = ψ(ab^n), so that w ∈ PER. Since X is PER-complete, there exist words λ, μ ∈ A∗ such that λwμ ∈ X∗. Since |w| ≥ 3ℓ(X), one derives that w has a factor in X^2, i.e.,

w = αxyβ, with x, y ∈ X and α, β ∈ A∗.

We shall suppose |α| even (the opposite case is similarly dealt with). One has

α = (ab)^i,  xyβ = (ab)^{n−i} a,  0 ≤ i < n.

As x is a palindrome, one obtains

x = (ab)^h a,  yβ = (ba)^{n−i−h},  0 ≤ h < n − i

and, similarly,

y = (ba)^k b,  β = a(ba)^{n−i−h−k−1},  0 ≤ k < n − i − h,

which concludes the proof.
Proposition 17. Let X be a finite PER-complete central code. Then X = A.

Proof. If a, b ∈ X, then X = A and the statement holds true. Let us then suppose that b ∉ X. By Lemma 14, one has a ∈ X and, by Lemma 15, aba ∈ X. Moreover, by Lemma 16, there exists k > 0 such that (ba)^k b ∈ X. This yields a contradiction as the word (ab)^{k+2} a has two distinct X-factorizations, namely,

(a)((ba)^k b)(aba) = (aba)((ba)^k b)(a).
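The double factorization exhibited at the end of the proof can be verified mechanically for small k (the function name is ours):

```python
def two_factorizations(k):
    # (a)((ba)^k b)(aba) and (aba)((ba)^k b)(a) from the proof of
    # Proposition 17; both must spell the word (ab)^(k+2) a.
    middle = "ba" * k + "b"
    return "a" + middle + "aba", "aba" + middle + "a"

u, v = two_factorizations(3)
```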
By the previous proposition and Proposition 12 it follows:

Corollary 18. Let X be a finite maximal central code. Then X = A.

The following proposition gives an example of an infinite maximal central code. The proof, which is rather technical, is reported in the appendix.

Proposition 19. The set X = PER \ D, where

D = ∪_{i≥0} (((ab)^i a)∗ ∪ ((ba)^i b)∗),

is a maximal central code.

Proposition 20. There exists a PER-complete central code which is not a maximal central code.
Proof. Let X = PER \ D be the maximal central code considered in Proposition 19 and set

Y = X \ {aabaa}.

Since the word aabaa is a factor of aabaabaa ∈ Y, one has Fact X = Fact Y. Let us prove that for any z ∈ D one has z ∈ Fact X. We can assume with no loss of generality that z = ((ab)^i a)^j, with i, j ≥ 0. Moreover, we can suppose j ≥ 2, since (ab)^i a is a factor of ((ab)^i a)^2. As one easily verifies,

(bz)^(−) = ((ab)^i a)^j ba ((ab)^i a)^{j−1} ∈ PER \ D = X,

so that z ∈ Fact X. Thus D ⊆ Fact X. Since PER = X ∪ D, it follows PER ⊆ Fact X = Fact Y. Therefore, in view of the previous proposition, Y is a PER-complete code which is not a maximal central code.

Lemma 21. The pairs (b^2, a^2) and (a^2, b^2) are synchronizing pairs of any central code X, i.e., for all u, v ∈ A∗,

ub^2 a^2 v ∈ X∗ ⇒ ub^2, a^2 v ∈ X∗,  ua^2 b^2 v ∈ X∗ ⇒ ua^2, b^2 v ∈ X∗.

Proof. Since b^2 a^2 is not a factor of PER, if ub^2 a^2 v ∈ X∗, then one of the following three cases occurs:

ub, ba^2 v ∈ X∗,   (6)
ub^2, a^2 v ∈ X∗,   (7)
ub^2 a, av ∈ X∗.   (8)

If Eq. (6) holds, then by Lemma 13 one has b ∈ X and a^2 v ∈ X∗, so that Eq. (7) is satisfied. If Eq. (8) holds, one obtains ab^2 u∼ ∈ X∗ (recall that X∗ is closed under reversal, the words of X being palindromes), so that by Lemma 13, with the letters a and b interchanged, one obtains a ∈ X and b^2 u∼ ∈ X∗. Hence, ub^2 ∈ X∗, so that Eq. (7) is satisfied again. This proves that (b^2, a^2) is a synchronizing pair. In a symmetric way one proves that also (a^2, b^2) is a synchronizing pair.

Proposition 22. A central code X ≠ A is not complete.

Proof. Let X be a complete central code. We consider the word a^2 b^2 a^3 b^3 a^2 b^2. There exist u, v ∈ A∗ such that

u a^2 b^2 a^3 b^3 a^2 b^2 v ∈ X∗.

By the preceding lemma, one derives b^2, b^3, a^2, a^3 ∈ X∗. Since X is a code, it follows a, b ∈ X∗, i.e., X = A.

As any maximal code is complete, by the previous proposition one derives that a central code X ≠ A is not maximal as a code.
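The fact underlying Lemma 21, namely that b^2 a^2 (and, symmetrically, a^2 b^2) is not a factor of any central word, can be confirmed by exhaustive search over small lengths. In the sketch below (names are ours), is_central implements the coprime-periods characterization of Section 3.

```python
from math import gcd
from itertools import product

def is_central(w):
    # Coprime-periods characterization of central words.
    n = len(w)
    def has_period(p):
        return all(w[i] == w[i + p] for i in range(n - p))
    return any(has_period(p) and has_period(n - p + 2) and gcd(p, n - p + 2) == 1
               for p in range(1, n + 2))

# All central words of length at most 12.
small_central = [w for n in range(13)
                 for w in map("".join, product("ab", repeat=n)) if is_central(w)]
```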
5. Prefix central codes

In this section, we shall consider central codes which are prefix codes. Since the words of such codes are palindromes, one has that a prefix central code is also a suffix code and then a biprefix code. For instance, the set X = {a, bab, bb} is a prefix central code.

Proposition 23. A central code Y is prefix if and only if Y = ψ(X), with X a prefix code.

Proof. Let Y = ψ(X). If X is a prefix code, then, as proved in [5], Y is a prefix code. Conversely, suppose that X is not a prefix code. Then there exist x_1, x_2 ∈ X and ζ ∈ A+ such that x_1 = x_2 ζ. By Eq. (4), ψ(x_1) = ψ(x_2 ζ) = ψ(x_2)λ for a suitable λ ∈ A+. Hence, Y is not a prefix code.

We call the pre-code of a prefix central code Y the prefix code X such that Y = ψ(X). For instance, the pre-code of {a, bab, bb} is the prefix code {a, ba, bb}, and the pre-code of the prefix central code {aba, bb, babab, babbab} is the prefix code {ab, bb, baa, bab}. The pre-code of the prefix central code {a^n b a^n | n ≥ 0} is the prefix code a∗b.

For any X ⊆ A∗ and all n > 1 we set ψ^n(X) = ψ(ψ^{n−1}(X)), where ψ^1(X) = ψ(X). From Proposition 23 one derives the following:

Corollary 24. If X is a prefix code, then for all n ≥ 1, ψ^n(X) is a prefix central code.

Proposition 23 shows that the property of being a prefix code is preserved by ψ and ψ^{−1}. On the contrary, the property of being a code is not, in general, preserved by ψ or ψ^{−1}, as shown by the following example.

Example 25. The set X = {ab, ba, abbb} is a code whereas the set ψ(X) = {aba, bab, abababa} is not a code. Conversely, the set X = {a, ab, bab} is not a code whereas ψ(X) = {a, aba, babbab} is a code.

Proposition 26. A central code Y is prefix if and only if θ(Y) is an independent set.

Proof. By Proposition 23, Y is prefix if and only if Y = ψ(X), with X a prefix code. By Corollary 9, this occurs if and only if Fa(X) = θ(Y) is an independent set.

A prefix central code is a maximal prefix central code if it is not properly included in another prefix central code.

Proposition 27. A prefix central code X is a maximal prefix central code if and only if for all w ∈ PER, wA∗ ∩ XA∗ ≠ ∅.

Proof. If there exists w ∈ PER such that wA∗ ∩ XA∗ = ∅, then X ∪ {w} is a prefix central code properly containing X, so that X is not a maximal prefix central code.
If X is not a maximal prefix central code, there exists at least one word w ∈ PER such that w is not a prefix of any word of X and no word of X is a prefix of w. This implies that wA∗ ∩ XA∗ = ∅.

Proposition 28. A prefix central code is a maximal prefix central code if and only if its pre-code is a maximal prefix code.

Proof. Let Y be a maximal prefix central code and X be its pre-code. By Proposition 23, X is a prefix code. Suppose that X is properly included in a prefix code X′ over A. Since ψ is a bijection, Y = ψ(X) ⊂ ψ(X′). By Proposition 23, ψ(X′) is a prefix central code which properly contains Y, which contradicts the maximality of Y as a prefix central code.

Conversely, suppose that the pre-code X of the prefix central code Y is a maximal prefix code. If Y were properly included in another prefix central code Y′, one would have X ⊂ ψ^{−1}(Y′). By Proposition 23, ψ^{−1}(Y′) is a prefix code, so that we reach a contradiction with the maximality of X.

Proposition 29. A central code Y is a maximal prefix central code if and only if θ(Y) is an independent and full set.

Proof. By Propositions 23 and 28, Y is a maximal prefix central code if and only if Y = ψ(X), with X a maximal prefix code. By Corollaries 9 and 10, this occurs if and only if Fa(X) = θ(Y) is an independent and full set.

Remark 30. We observe that a maximal prefix central code X ≠ A is not maximal as a prefix code. Indeed, as is well known, any maximal prefix code is right-complete, i.e., for any w ∈ A∗, wA∗ ∩ X∗ ≠ ∅, whereas by Proposition 22 a prefix central code X ≠ A is not even complete. By Corollary 18, a finite maximal prefix central code X ≠ A cannot be maximal as a central code. More generally, we shall see (cf. Corollary 32) that any non-trivial maximal central code cannot be prefix.

Proposition 31. Let X ≠ A be a prefix central code. There exists w ∈ PER such that wA∗ ∩ X∗ = ∅.

Proof. Let x ∈ X. Without loss of generality, we may suppose that the first letter of x is a. There exists a word u ∈ A∗ such that y = xbaau ∈ PER. Indeed, by Eq. (1), xba is a finite standard Sturmian word, so that z = xbaxba is a prefix of a standard Sturmian word; since xbaa is a prefix of z, it is a prefix of a word of PER. If yA∗ ∩ X∗ = ∅, the statement is proved. Let us then suppose that yA∗ ∩ X∗ ≠ ∅. Thus there exists v ∈ A∗ such that yv = xbaauv ∈ X∗. Since X is a prefix code, one has baauv ∈ X∗ and, by Lemma 13, b ∈ X.

Now, let us consider the word bbabb = ψ(bba) ∈ PER. If bbabbA∗ ∩ X∗ = ∅, the statement is proved. Suppose that bbabbA∗ ∩ X∗ ≠ ∅. Since b ∈ X and X is a prefix code, it follows that abbA∗ ∩ X∗ ≠ ∅. By Lemma 13 one obtains a ∈ X, i.e., X = A, which is a contradiction.

By Lemma 2 and Proposition 31 one derives the following:
Corollary 32. A prefix central code X ≠ A is not a maximal central code.

6. Farey codes

For any positive integer n, we consider the set

F_n = {p/q ∈ I | 1 ≤ p ≤ q ≤ n}.

As is well known, by ordering the elements of F_n in an increasing way, one obtains the Farey series of order n (cf. [7]). Now, set

G_n = {p/q ∈ F_{n+1} | p + q − 2 ≥ n}

and

Φ_{n,a} = {s ∈ PER_a | θ(s) ∈ G_n},  Φ_{n,b} = {s ∈ PER_b | θ(s)^{−1} ∈ G_n}.

The set Φ_n = Φ_{n,a} ∪ Φ_{n,b} is a prefix central code [5] called the Farey code of order n. The words of Φ_{n,b} are obtained from those of Φ_{n,a} by interchanging the letter a with b. The pre-codes of Φ_{n,a}, Φ_{n,b}, and Φ_n will be respectively denoted by F_{n,a}, F_{n,b}, and F_n. The prefix code F_n = F_{n,a} ∪ F_{n,b} will be called the Farey pre-code of order n.

Example 33. In the following table, we report the elements of G_6 with the corresponding words of the prefix code Φ_{6,a} and their lengths. In the last column are reported the elements of the pre-code F_{6,a}.

θ(w)    word of Φ_{6,a}    length    word of F_{6,a}
1/7     aaaaaa              6        aaaaaa
2/7     abababa             7        abbb
3/7     aabaabaa            8        aabb
4/7     aabaaabaa           9        aaba
3/5     abaaba              6        aba
5/7     ababaababa         10        abba
4/5     aaabaaa             7        aaab
5/6     aaaabaaaa           9        aaaab
6/7     aaaaabaaaaa        11        aaaaab
Some interesting properties of Farey codes have been proved in [5]. We limit ourselves to recall that for all n > 0,

Card Φ_n = Σ_{i=1}^{n+1} φ(i),

where φ is Euler's totient function.
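The Farey code Φ_n and its cardinality can be computed by brute force. In the sketch below (names are ours), we use the description from the introduction: w ∈ Φ_n iff w is central, |w| ≥ n, and |w| − π_w + 2 ≤ n + 1, which bounds |w| by 2n.

```python
from math import gcd
from itertools import product

def is_central(w):
    # Coprime-periods characterization of central words.
    n = len(w)
    def has_period(p):
        return all(w[i] == w[i + p] for i in range(n - p))
    return any(has_period(p) and has_period(n - p + 2) and gcd(p, n - p + 2) == 1
               for p in range(1, n + 2))

def min_period(w):
    return next(p for p in range(1, len(w) + 1)
                if all(w[i] == w[i + p] for i in range(len(w) - p)))

def farey_code(n):
    # Central words w with n <= |w| and |w| - min_period(w) + 2 <= n + 1.
    return [w for ln in range(n, 2 * n + 1)
            for w in map("".join, product("ab", repeat=ln))
            if is_central(w) and len(w) - min_period(w) + 2 <= n + 1]

def totient(m):
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)
```

For n = 6 this reproduces the words of Example 33 (together with their images under the exchange of a and b), and the cardinality formula above gives 18.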
Proposition 34. For all n 1, the Farey code of order n is a maximal prefix central code. Proof. We shall prove that the set (n ) is independent and full, so that the result will follow from Proposition 29. One has that (n ) = {p/q | p/q ∈ Gn or q/p ∈ Gn }. First, we prove the independence. Let p/q and r/s be distinct elements of (n ) such that p/q ⇒∗ r/s. We suppose p < q (the case where p > q is similarly dealt with). There exists a sequence of irreducible fractions pi /qi , i = 1, . . . , m, such that pm r p1 p ⇒ ··· ⇒ = . ⇒ q q1 qm s Hence, q1 = p + q n + 2, so that s = qm q1 n + 2 and r < s. This contradicts the assumption that r/s ∈ (n ). Now, we prove the fullness of (n ). Let r/s be an element of I. We suppose r < s (the cases where r > s or r = s = 1 are similarly dealt with). First we consider the case that s n + 2. There exists a sequence of irreducible fractions pi /qi , i = 1, . . . , m, such that pm r p1 1 ⇒ ··· ⇒ = . ⇒ 1 q1 qm s Let k be the minimal integer such that qk n + 2. One has qk−1 n + 1 and pk−1 + qk−1 = qk n + 2, so that pk−1 /qk−1 ∈ Gn and pk−1 /qk−1 ⇒∗ r/s. Now, we consider the case that s < n + 2. Let k be the minimal integer such that kr + s n + 2. One has (k − 1)r + s n + 1 so that r/((k − 1)r + s) ∈ Gn and r/s ⇒∗ r/((k − 1)r + s). As a consequence of Proposition 23 one has: Proposition 35. For all n 1, the Farey pre-code of order n is a maximal prefix code. The following proposition gives an equivalent definition for Farey codes. Proposition 36. For any n > 0 one has
Φn = {w ∈ PER | n ≤ |w| ≤ n + π(w) − 1}.

Proof. First we suppose w ∈ PERa and set η(w) = p/q, so that p = π(w) and q = |w| − π(w) + 2. One has w ∈ Φn,a if and only if p/q belongs to the Farey series Fn+1 and p + q − 2 = |w| ≥ n. Since p/q ∈ Fn+1 if and only if q = |w| − π(w) + 2 ≤ n + 1, one derives that w ∈ Φn,a if and only if n ≤ |w| ≤ n + π(w) − 1. If w ∈ PERb, by a similar argument one obtains that w ∈ Φn,b if and only if n ≤ |w| ≤ n + π(w) − 1. From this the assertion follows. □

From Proposition 36 one derives immediately that for all n > 0,

Φn+1 \ Φn = {w ∈ PER | |w| = n + π(w)}

(9)
and
Φn \ Φn+1 = Un,
(10)
where Un = PER ∩ A^n. The following proposition shows a relation between Farey codes of consecutive orders.

Proposition 37. For any n > 0 one has

Φn+1 = (Φn \ Un) ∪ (AUn)^(−).

Proof. From Eqs. (9) and (10) one derives
Φn+1 = (Φn \ Un) ∪ {w ∈ PER | |w| = n + π(w)}.

Thus it is sufficient to prove that

(AUn)^(−) = {w ∈ PER | |w| = n + π(w)}.
(11)
Let us suppose w = (xv)^(−), with x ∈ A and v ∈ Un. Then w ∈ PER and, by Eq. (3), |w| = π(w) + n. This proves the inclusion “⊆”. Conversely, suppose that w ∈ PER and |w| = n + π(w). Let u ∈ PER and x ∈ A be such that w = (xu)^(−). Since, by Eq. (3), |w| = |u| + π(w), one derives |u| = n, so that u ∈ Un and w ∈ (AUn)^(−). This proves the inclusion “⊇”. □

Example 38. Consider the case n = 5. One has
Φ5,a = {a^5, ababa, aba^2ba, a^2ba^2, a^3ba^3, a^4ba^4}

and

U5,a = U5 ∩ aA* = {a^5, ababa, a^2ba^2}.

Moreover,

(AU5,a)^(−) = {a^6, a^5ba^5, ababa^2baba, abababa, a^2ba^3ba^2, a^2ba^2ba^2}.

The set Φ6,a is given in Example 33. As one easily verifies, Φ6,a = (Φ5,a \ U5,a) ∪ (AU5,a)^(−). In a similar way, setting U5,b = U5 ∩ bA* one obtains Φ6,b = (Φ5,b \ U5,b) ∪ (AU5,b)^(−), so that Φ6 = (Φ5 \ U5) ∪ (AU5)^(−).

7. Uniform central codes

Let n be a positive integer. A central code X is uniform of order n if X ⊆ A^n. In this case X ⊆ Un, so that Un is the maximal uniform central code of order n. As is well known [6], for any n, Card Un = φ(n + 2).
For instance, one has

U5 = {aaaaa, aabaa, ababa, babab, bbabb, bbbbb},
U7 = {aaaaaaa, aaabaaa, abababa, bababab, bbbabbb, bbbbbbb}.

From Eqs. (9) and (11) one derives the following noteworthy relation between maximal uniform codes and Farey codes:

Φn+1 \ Φn = (AUn)^(−)  for all n > 0.
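This relation, and the recursion of Proposition 37 it comes from, can be checked mechanically on Example 38. The sketch below (illustrative only, not from the paper) implements the operator (·)^(−) as the left palindromic closure, i.e., the shortest palindrome having its argument as a suffix, and rebuilds Φ6,a from the word lists of Example 38.

```python
def left_closure(w):
    # (w)^(-): the shortest palindrome having w as a suffix
    for i in range(len(w), -1, -1):
        if w[:i] == w[:i][::-1]:            # longest palindromic prefix w[:i]
            return w[i:][::-1] + w
    return w

phi5a = {"aaaaa", "ababa", "abaaba", "aabaa", "aaabaaa", "aaaabaaaa"}
u5a = {"aaaaa", "ababa", "aabaa"}           # U_5 restricted to words starting with a

au5a = {left_closure(x + v) for x in "ab" for v in u5a}   # (A U_{5,a})^(-)

phi6a = (phi5a - u5a) | au5a                # Proposition 37, a-part
assert phi6a == {"aaaaaa", "aaaaabaaaaa", "ababaababa", "abababa",
                 "aabaaabaa", "aabaabaa", "abaaba", "aaabaaa", "aaaabaaaa"}
assert phi6a - phi5a == au5a                # Phi_{n+1} \ Phi_n = (A U_n)^(-)
```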
Proposition 39. Let n > 0 and 0 ≤ k ≤ n. There exists a (unique) word w ∈ Un such that |w|a = k if and only if gcd(n + 2, k + 1) = 1.

Proof. We recall that the slope of a central word is a bijection of PER onto I. Thus, if w ∈ Un and |w|a = k, then the slope of w is

(|w|b + 1)/(|w|a + 1) = (n − k + 1)/(k + 1)

(12)

with gcd(n − k + 1, k + 1) = gcd(n + 2, k + 1) = 1. Conversely, if gcd(n − k + 1, k + 1) = gcd(n + 2, k + 1) = 1 then, since the slope is a bijection, there exists a word w ∈ PER satisfying Eq. (12), so that |w| = n and |w|a = k. □

From the previous proposition, one derives the following:

Corollary 40. There exists a (unique) word w ∈ Un such that |w|a = k for all k, 0 ≤ k ≤ n, if and only if n + 2 is a prime.

Example 41. In the case n = 7, the set of numbers which are coprime with 9 is {1, 2, 4, 5, 7, 8}. Hence, for w ∈ U7 we have |w|a ∈ {0, 1, 3, 4, 6, 7}. In the case n = 5, since n + 2 = 7 is prime, {|w|a | w ∈ U5} = {0, 1, 2, 3, 4, 5}.

Appendix
Proof of Proposition 19. One easily verifies that

D = ψ(ab*a* ∪ ba*b* ∪ ε).
(A.1)
From the rational identity

A* \ (ab*a* ∪ ba*b* ∪ ε) = ab*a+bA* ∪ ba*b+aA*

one derives X = ψ(ab*a+bA* ∪ ba*b+aA*). Let us prove that X is a code. By contradiction, suppose that one has

x1 · · · xm = x′1 · · · x′n,  |x1| < |x′1|,

(A.2)

with x1, . . . , xm, x′1, . . . , x′n ∈ X, m, n > 0. One has m ≥ 2. Moreover, we may suppose without loss of generality that x2 ∈ PERa. Thus, x2 has a prefix

ψ(ab^i a^j b) = (a(ba)^i)^{j+1} ba (a(ba)^i)^j,  i ≥ 0, j ≥ 1.
From Eq. (A.2) one derives

x1 (a(ba)^i)^{j+1} baa δ = x′1 · · · x′n,

(A.3)

for a suitable δ ∈ A*. Hence, x′1 has the prefix x1 a. By Lemma 6, x′1 has the prefix (ax1)^(−). By Lemma 5, (ax1)^(−) has the form x1 ab s = s ba x1, with s ∈ PER. Now let y be the longest prefix of x′1 of the form

y = x1 ab z = z ba x1,  with z ∈ PER.

(A.4)

By Proposition 3, y ∈ PER. We set x′1 = yζ with ζ ∈ A*. By Eq. (A.3) one gets i > 0 and

a(ba)^{i−1} (a(ba)^i)^j baa δ = z ζ x′2 · · · x′n.

(A.5)

Since z is a palindrome, one has to consider the following cases:

Case 1: z = ε. By Eq. (A.4) one has y = x1 ab = ba x1, so that x1 = (ba)^p b, p ≥ 0. Thus x1 ∈ D, which is a contradiction.

Case 2: z = a(ba)^h, 0 ≤ h ≤ i − 1. By Eq. (A.4) one has y = x1 a(ba)^{h+1} = a(ba)^{h+1} x1, so that x1 = (a(ba)^{h+1})^p, p ≥ 0. Thus x1 ∈ D, which is a contradiction.

Case 3: z = a(ba)^{i−1} (a(ba)^i)^k a(ba)^{i−1}, 0 ≤ k ≤ j − 1. By Eq. (A.5) one gets

ba (a(ba)^i)^{j−k−1} baa δ = ζ x′2 · · · x′n.

(A.6)
If ζ ≠ ε, then the first letter of ζ is b. Thus, x′1 = yζ has the prefix yb and consequently the prefix (by)^(−). By Lemma 5 it follows that

(by)^(−) = (b x1 ab z)^(−) = x1 ab z ba x1 = x1 ab y = y ba x1.

This contradicts the maximality of y. If ζ = ε, by Eq. (A.6) one derives that x′2 has the prefix baa or babaa (according to whether k < j − 1 or k = j − 1). This is a contradiction, as by Corollary 4 no central word has such prefixes.

Case 4: z has the prefix a(ba)^{i−1} (a(ba)^i)^j b. Set u = a(ba)^{i−1} (a(ba)^i)^{j−1} a(ba)^{i−1} ∈ PER. Since ub is a prefix of z, also (bu)^(−) should be a prefix of z. By Lemma 5 one has (bu)^(−) = a(ba)^{i−1} (a(ba)^i)^j a(ba)^{i−1}. Thus, z has the prefix a(ba)^{i−1} (a(ba)^i)^j a, which is a contradiction.

This proves that X is a central code. To prove that X is a maximal central code one has to show that, for all y ∈ D, X ∪ {y} is not a central code. In view of Eq. (A.1) it is sufficient to consider the case that y = ψ(ab^i a^j), with i, j ≥ 0 (the case y = ψ(ba^i b^j) is dealt with similarly). One easily checks that in this case y ψ(ab^i ab) y = ψ(ab^i a^{j+2} b), which proves the assertion, since ψ(ab^i ab), ψ(ab^i a^{j+2} b) ∈ X. □

References

[1] J.-P. Allouche, J. Shallit, Automatic Sequences, Cambridge University Press, Cambridge, UK, 2003.
[2] J. Berstel, A. de Luca, Sturmian words, Lyndon words and trees, Theoret. Comput. Sci. 178 (1997) 171–203.
[3] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[4] J. Berstel, P. Séébold, Sturmian words, in: M. Lothaire (Ed.), Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, UK, 2002, pp. 45–110.
[5] A. de Luca, Sturmian words: structure, combinatorics, and their arithmetics, Theoret. Comput. Sci. 183 (1997) 45–82.
[6] A. de Luca, F. Mignosi, On some combinatorial properties of Sturmian words, Theoret. Comput. Sci. 136 (1994) 361–385.
[7] G.H. Hardy, E.M. Wright, An Introduction to the Theory of Numbers, Clarendon Press, Oxford University Press, Oxford, UK, 1968.
[8] L. Ilie, W. Plandowski, Two-variable word equations, Theoret. Inform. Appl. 34 (2000) 467–501.
[9] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983; second ed., Cambridge University Press, Cambridge, UK, 1997.
[10] A. Restivo, Codes and local constraints, Theoret. Comput. Sci. 72 (1990) 55–64.
Theoretical Computer Science 340 (2005) 240 – 256 www.elsevier.com/locate/tcs
An enhanced property of factorizing codes

Clelia De Felice 1

Dipartimento di Informatica e Applicazioni, Università di Salerno, 84081 Baronissi (SA), Italy
Abstract

The investigation of the factorizing codes C, i.e., codes satisfying Schützenberger's factorization conjecture, has been carried out from different viewpoints, one of them being the description of structural properties of the words in C. In this framework, we can now improve an already published result. More precisely, given a factorizing code C over a two-letter alphabet A = {a, b}, it was proved by De Felice that the words in the set C1 = C ∩ a*ba* could be arranged over a matrix related to special factorizations of the cyclic groups. We now prove that, in addition, these matrices can be recursively constructed starting with those corresponding to prefix/suffix codes.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Variable length codes; Formal languages; Factorizations of cyclic groups
1. Introduction In this paper, a subset C of a free monoid A∗ is a (variable-length) code if each word in A∗ has at most one factorization into words of C, i.e., C is the base of a free submonoid of A∗ [1]. This algebraic approach was initiated by Schützenberger in [24] and subsequently developed mainly by his school. The theory of codes is rich in significant results, which have been obtained by using several different methods (combinatorial, probabilistic, algebraic) and tools from automata, formal power series and semigroup theory.
E-mail address: [email protected].
1 Partially supported by MIUR Project “Linguaggi Formali e Automi: Metodi, Modelli e Applicazioni” (2003) and by 60% Project “Linguaggi formali e codici: modelli e caratterizzazioni strutturali” (University of Salerno, 2004).
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.tcs.2005.03.022
C. De Felice / Theoretical Computer Science 340 (2005) 240 – 256
Nevertheless some basic problems are still open. One of the most difficult of these, the factorization conjecture, was proposed by Schützenberger as follows: given a finite maximal code C, there would be finite subsets P, S of A* such that C − 1 = P(A − 1)S, with X denoting the characteristic polynomial of X [1,2,5]. We refer to Section 2 for all the known results concerning this conjecture. Any code C which satisfies the above equality is finite, maximal, and is called a factorizing code, where a finite maximal code is a maximal object in the class of finite codes for the order of set inclusion. For example, finite biprefix maximal codes are factorizing [1]. This note deals with the investigation of the class of the factorizing codes C. This research line has been carried out from different viewpoints, one of them being the description of structural properties of the words in C. Continuing the investigation initiated in [9], here we enhance a property of the sets C1 ⊆ a*ba* such that C1 = C ∩ a*ba* for a factorizing code C over a two-letter alphabet A = {a, b}. Precisely, we already know that C1 satisfies the property reported below:

Property 1.1. The words in C1 can be arranged over a matrix C1 = (a^{rp,q} b a^{vp,q})_{1≤p≤m, 1≤q≤ℓ} such that, for each row Rp = {rp,q | q ∈ {1, . . . , ℓ}} and each column Tq = {vp,q | p ∈ {1, . . . , m}} in this matrix, (Rp, Tq) is a Hajós factorization of Zn.

We recall that a pair (R, T) of subsets of ℕ is a factorization of Zn if for each z ∈ {0, . . . , n − 1} there exists a unique pair (r, t), with r ∈ R and t ∈ T, such that r + t = z (mod n). The general structure of the pairs (R, T) is still unknown, but two simple families of these pairs can be recursively constructed: Krasner factorizations and Hajós factorizations (see Section 4 for precise definitions). The latter factorizations seem to have an important role in the description of the structure of factorizing codes (see [6–9]).
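The notion of factorization of Zn just recalled is easy to state as a check. A minimal sketch (not part of the paper; the positive example is the Krasner factorization of Z12 used again in Section 5):

```python
def is_factorization(R, T, n):
    # (R, T) factorizes Z_n iff every residue 0..n-1 occurs exactly once
    # among the sums r + t (mod n), with r in R and t in T
    sums = sorted((r + t) % n for r in R for t in T)
    return sums == list(range(n))

assert is_factorization({0, 2, 4}, {0, 1, 6, 7}, 12)
# a non-example: the residue 1 occurs twice and 3 never occurs
assert not is_factorization({0, 1}, {0, 1}, 4)
```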
In this paper we prove that, for each factorizing code C, an arrangement of C1 = C ∩ a*ba* satisfying Property 1.1 can be recursively constructed by a natural two-dimensional generalization of the Hajós method. This improved version of the result given in [9] is interesting in its own right, but it has additional appeal since, as conjectured in [12], given a set C1 satisfying this property, there would exist a factorizing code C such that C1 = C ∩ a*ba*. As we have already said, we take into account codes over a two-letter alphabet but, as done in [9], extending the results presented here to alphabets of larger size should not be difficult. This paper is organized as follows. Section 2 contains all the basic definitions and results concerning codes. Section 3 summarizes the contents of the subsequent sections and outlines the main result. In Section 4 we have gathered basics on the factorizations of cyclic groups, and in Sections 5 and 6 we have collected intermediate results, subsequently used in Section 7 to show the above-mentioned property of the factorizing codes.
2. Basics Given a finite alphabet A, let A∗ be the free monoid generated by it. We denote by 1 the empty word and we set A+ = A∗ \ 1.
A subset C of A* is a code if C* is a free submonoid of A* of base C. In other words, C is a code if, for any c1, . . . , ch, c′1, . . . , c′k ∈ C, we have:

c1 · · · ch = c′1 · · · c′k  ⇒  h = k and ci = c′i for all i ∈ {1, . . . , h}.
Examples of codes can easily be constructed by considering, for instance, the class of the prefix codes, C being prefix if C ∩ CA+ = ∅. A more complex class is that of maximal codes. A code C is maximal over A if C is not a proper subset of another code over A. As one of Schützenberger's basic theorems shows, a finite code C is maximal if and only if C is complete, that is, C* ∩ A*wA* ≠ ∅ for any w ∈ A* [1]. The class of codes which we consider in this paper is that of the factorizing codes, introduced by Schützenberger. The definition of such codes is given in terms of polynomials. Here, we denote by ℤ⟨A⟩ the ring of the noncommutative polynomials in variables A and coefficients in the ring ℤ of the integers, and by ℕ⟨A⟩ the semiring of the noncommutative polynomials in variables A and coefficients in the semiring ℕ of the nonnegative integers [2]. P ≥ 0 means P ∈ ℕ⟨A⟩. As usual, the value of P ∈ ℤ⟨A⟩ on w ∈ A* is denoted by (P, w) and is referred to as the coefficient of w in P. The characteristic polynomial of a finite language X ⊆ A*, denoted X, is the polynomial X = Σ_{x∈X} x. Henceforth, we will at times identify X with its characteristic polynomial even if this is not stated explicitly. A (finite) code C over A is factorizing if there exist two finite subsets P, S of A* such that:

C − 1 = P(A − 1)S.
(1)
For instance, a finite maximal prefix code C is factorizing, by taking S = {1} and P equal to the set of the proper prefixes of the words in C [1]. If C is a factorizing code then C is a finite maximal code [1]. However, it is not known whether every finite maximal code is factorizing. This problem is known as the factorization conjecture [1,2,25].

Conjecture 2.1 (Schützenberger). Any finite maximal code is factorizing.

Some partial results are known and are mentioned below. The first examples of families of factorizing codes can be found in [3,4]. Subsequently, Reutenauer obtained the result that was closest to a solution of the conjecture [2,21,22]. He proved that Eq. (1) holds for any finite maximal code C if we substitute P, S with P, S ∈ ℤ⟨A⟩. Results concerning problems which are closely connected to the factorization conjecture can be found in [17,18]. Another class of results has been obtained by considering finite maximal codes over a two-letter alphabet A = {a, b} having a constraint on the number of the occurrences of the letter b in each word. More precisely, consider a finite maximal code C over A such that each word in C has at most m occurrences of the letter b. C is also named an m-code. If m is less than or equal to three, then C is factorizing [7,13,19]. Moreover, C is also factorizing if b^m ∈ C and m is a prime number or m = 4 [26]. For m ≤ 3, the structure of the m-codes has also been described and is related to the solutions of some inequalities which are, in turn, related to the factorizations of the cyclic groups [6,7,19]. Furthermore, other results which relate words in a finite maximal code to the factorizations of the cyclic groups can be found in [16,20].
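For a concrete instance of Eq. (1), take the maximal prefix code C = {a, ba, bb}: with S = {1} and P = {1, b} (the proper prefixes of the words of C), one has (1 + b)(a + b − 1) = a + ba + bb − 1. The sketch below (illustrative only, not from the paper) checks this identity, representing a noncommutative polynomial as a dictionary from words to coefficients, with the empty string standing for 1.

```python
from collections import Counter

def pmul(P, Q):
    # product of noncommutative polynomials {word: coefficient}
    R = Counter()
    for u, cu in P.items():
        for v, cv in Q.items():
            R[u + v] += cu * cv
    return {w: c for w, c in R.items() if c != 0}

P = {"": 1, "b": 1}                  # proper prefixes of C = {a, ba, bb}
S = {"": 1}
A_minus_1 = {"a": 1, "b": 1, "": -1}

C_minus_1 = {"a": 1, "ba": 1, "bb": 1, "": -1}
assert pmul(pmul(P, A_minus_1), S) == C_minus_1   # Eq. (1) for this code
```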
3. Outline of the results

The aim of this paper is to prove that, for a given factorizing code C, the words in C ∩ a*ba*, i.e., the words in C having exactly one occurrence of b, satisfy a special property. In Section 1, we introduced factorizations of cyclic groups. A special class of these is the so-called Hajós factorizations. There exist at least two recursive definitions of this class of factorizations and they are recalled in Section 4. In this note we will introduce a two-dimensional extension of Hajós factorizations such that they still admit a recursive construction. More precisely, we consider a sequence (R1, T1), . . . , (Rm, Tm) of Hajós factorizations and we consider matrices with integer entries in which each row is one of the Rj's. Obviously, several matrices exist with this property, but one of them exists, named a good arrangement of R1, . . . , Rm, which can be obtained starting with simpler good arrangements and by a natural two-dimensional extension of the Hajós method (Section 5). Finally, we introduce the crossed two-dimensional Hajós factorizations. Namely, given a sequence (R1, T1), . . . , (Rm, Tm) of Hajós factorizations, we consider matrices having pairs (r, v) of integers as elements, with r ∈ Rj and v ∈ Ti. We focus our attention on arrangements such that a recursive algorithm exists constructing them. Once again these are called good arrangements (Section 7). We prove that, for a given factorizing code C, the words in C ∩ a*ba* can be canonically associated with a matrix which is a good arrangement of a crossed two-dimensional Hajós factorization.
4. Hajós factorizations and their recursive constructions

In [14], Hajós gave a method, slightly corrected later by Sands in [23], for the construction of a class of factorizations of an abelian group (G, +) which are of special interest in the construction of factorizing codes. As done in [8], we report this method for the cyclic group Zn of order n (Definition 4.1). The corresponding factorizations will be named Hajós factorizations. The operation ∘ also intervenes: for subsets S = {s1, . . . , sq}, T of Zn, S ∘ T denotes the family of subsets of Zn having the form {si + ti | i ∈ {1, . . . , q}}, where {t1, . . . , tq} is any multiset of elements of T having the same cardinality as S. Furthermore, it is convenient to translate the definitions in a polynomial form. For a polynomial in ℕ⟨a⟩, the notation a^H = Σ_{n∈ℕ} (H, n) a^n will be used, with H being a finite multiset of nonnegative integers. Therefore, if H1, H2, . . . , Hk are finite multisets of nonnegative integers, the expression a^{H1} b a^{H2} · · · b a^{Hk} is a notation for the product of the formal power series a^{H1}, b, a^{H2}, . . . , a^{Hk}. For instance, a^{{2,3}} b a^{{1,5}} = a^2ba + a^2ba^5 + a^3ba + a^3ba^5. Computation rules are also defined: a^{M+L} = a^M a^L, a^{M∪L} = a^M + a^L, a^{M∘L} = a^M ∘ a^L, a^∅ = 0, a^0 = 1. Finally, let X1, X2 ⊆ ℕ and let n ∈ ℕ. The equation X1 = X2 (mod n) means that for each x1 ∈ X1 a unique x2 ∈ X2 exists with x1 = x2 (mod n), and for each x2 ∈ X2 a unique x1 ∈ X1 exists with x1 = x2 (mod n).

Definition 4.1. Let R, T be subsets of ℕ. (R, T) is a Hajós factorization of Zn if and only if there exists a chain of divisors of n:

k0 = 1 | k1 | k2 | . . . | ks = n,
(2)
such that:

a^R ∈ ((a^{k1} − 1)/(a − 1)) ∘ ((a^{k2} − 1)/(a^{k1} − 1)) · ((a^{k3} − 1)/(a^{k2} − 1)) ∘ · · · ((a^n − 1)/(a^{ks−1} − 1)),

(3)

a^T ∈ ((a^{k1} − 1)/(a − 1)) · ((a^{k2} − 1)/(a^{k1} − 1)) ∘ ((a^{k3} − 1)/(a^{k2} − 1)) · · · · ((a^n − 1)/(a^{ks−1} − 1)).

(4)
Furthermore, we have R, T ⊆ {0, . . . , n − 1}.

Observing the definition of the Hajós factorizations, we can obtain a recursive construction of them with ease. This recursive construction, which will be widely used in this paper, was first given in [16] as a direct result, then it was proved in [11] for the sake of completeness, and now it is illustrated in Proposition 4.1.

Proposition 4.1 (Lam [16]). Let R, T ⊆ {0, . . . , n − 1} and suppose that (R, T) is a Hajós factorization of Zn with respect to the chain k0 = 1 | k1 | k2 | . . . | ks = n of divisors of n. Then either (R, T) = (R1, T1) or (R, T) = (T1, R1), where (R1, T1) satisfies one of the two following conditions:
(1) There exists t ∈ {0, . . . , n − 1} such that R1 = {0, . . . , n − 1} and T1 = {t}. Furthermore, s = 1.
(2) R1 = R^(1) + {0, 1, . . . , g − 1}h, T1 ∈ T^(1) ∘ {0, 1, . . . , g − 1}h, (R^(1), T^(1)) being a Hajós factorization of Zh, g, h ∈ ℕ, n = gh, R^(1), T^(1) ⊆ {0, . . . , h − 1}. The chain of divisors defining (R^(1), T^(1)) is k0 = 1 | k1 | k2 | . . . | ks−1 = h.

Theorem 4.1 is one of the results which allow us to link factorizing codes and Hajós factorizations of Zn. In Theorem 4.1 a crucial role is played by particular factorizations defined as follows. Starting with the chain of divisors of n in Eq. (2), let us consider the two polynomials a^I and a^J defined by:
a^I = Π_{j even, 1≤j≤s} (a^{kj} − 1)/(a^{kj−1} − 1),    a^J = Π_{j odd, 1≤j≤s} (a^{kj} − 1)/(a^{kj−1} − 1).

(5)
The two polynomials above have been considered by Krasner and Ranulac in [15] and are the simplest examples of Hajós factorizations of Zn. In the same paper they proved that a pair (I, J) satisfies Eqs. (5) if and only if (I, J) satisfies the following property: for any z ∈ {0, . . . , n − 1} there exists a unique (i, j), with i ∈ I and j ∈ J, such that i + j = z, i.e., a^I a^J = (a^n − 1)/(a − 1). (I, J) is called a Krasner factorization.

Theorem 4.1 (De Felice [8]). For R, T ⊆ {0, . . . , n − 1} the following conditions are equivalent:
(1) (R, T) is a Hajós factorization of Zn.
(2) There exists a Krasner factorization (I, J) of Zn such that (I, T), (R, J) are (Hajós) factorizations of Zn.
(3) There exist L, M ⊆ ℕ and a Krasner factorization (I, J) of Zn such that:

a^R = a^I (1 + a^M (a − 1)),    a^T = a^J (1 + a^L (a − 1)).

(6)
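Condition (3) can be tested concretely. The sketch below (an illustration with arbitrarily chosen data, not taken from the paper) starts from the Krasner factorization I = {0, 2, 4}, J = {0, 1, 6, 7} of Z12, builds R and T from Eqs. (6) with the choices M = {0} and L = {1}, and then checks the factorization properties of condition (2).

```python
def poly(S):
    # characteristic polynomial a^S, represented as {exponent: coefficient}
    return {s: 1 for s in S}

def padd(P, Q):
    R = dict(P)
    for e, c in Q.items():
        R[e] = R.get(e, 0) + c
    return {e: c for e, c in R.items() if c != 0}

def pmul(P, Q):
    R = {}
    for e, c in P.items():
        for f, d in Q.items():
            R[e + f] = R.get(e + f, 0) + c * d
    return {e: c for e, c in R.items() if c != 0}

def support(P):
    assert all(c == 1 for c in P.values())   # P must be characteristic
    return set(P)

def is_factorization(R, T, n):
    return sorted((r + t) % n for r in R for t in T) == list(range(n))

I, J, n = {0, 2, 4}, {0, 1, 6, 7}, 12
a_minus_1 = {1: 1, 0: -1}

# Eqs. (6): a^R = a^I (1 + a^M (a - 1)), a^T = a^J (1 + a^L (a - 1))
R = support(pmul(poly(I), padd({0: 1}, pmul(poly({0}), a_minus_1))))
T = support(pmul(poly(J), padd({0: 1}, pmul(poly({1}), a_minus_1))))

assert R == {1, 3, 5} and T == {0, 3, 6, 9}
assert is_factorization(R, J, n) and is_factorization(I, T, n)   # condition (2)
```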
Furthermore, (2) ⇔ (3) also holds for R, T ⊆ ℕ.

As stated in Theorem 4.1, the equivalence between conditions (2) and (3) still holds under the more general hypothesis that R, T are arbitrary subsets of ℕ (not necessarily with max R < n, max T < n). In order to keep this general framework, in the next part of this paper, for R, T ⊆ ℕ, we will say that (R, T) is a Hajós factorization of Zn if (R(n), T(n)) satisfies the conditions contained in Definition 4.1, where, for a subset X of ℕ and n ∈ ℕ, we denote by X(n) the subset of {0, . . . , n − 1} such that X(n) = X (mod n). This is equivalent, as Lemma 4.1 below shows, to defining Hajós factorizations of Zn as those pairs satisfying Eqs. (6). The recursive construction of the solutions of Eqs. (6), given in [6], allowed us to obtain another recursive construction of the Hajós factorizations, given in [8].

Lemma 4.1 (De Felice [10]). Let (I, J) be a Krasner factorization of Zn. Let R′, R, M be subsets of ℕ such that a^{R′} = a^I (1 + a^M (a − 1)) and a^R = a^{R′(n)}. Then, M′ ⊆ ℕ exists such that a^R = a^I (1 + a^{M′} (a − 1)) and I + max M′ + 1 ⊆ {0, . . . , n − 1}. Furthermore, if we set R = {r1, . . . , rq}, R′ = {r1 + ℓ1 n, . . . , rq + ℓq n}, for ℓ1, . . . , ℓq ≥ 0, and if we set a^H = a^{r1 + {0, n, . . . , (ℓ1 − 1)n}} + · · · + a^{rq + {0, n, . . . , (ℓq − 1)n}}, then we have a disjoint union M = M′ ∪ M″, with M″ ⊆ ℕ, a^{M″} = a^J a^H and a^{R′} = a^R + a^I (a − 1) a^{M″}.

It is worthy of note that there is a relationship between Krasner factorizations and Hajós factorizations which goes beyond the observation that the former are simple examples of the latter. Firstly, Theorem 4.1 points out that, for each Hajós factorization (R, T), we can associate a Krasner factorization (I, J) with (R, T), called a Krasner companion factorization of (R, T) in [16]. Secondly, given a Hajós factorization (R, T) of Zn such that (R(n), T(n)) is defined by Eqs. (3), (4), a Krasner companion factorization (I, J) is naturally associated with (R, T): in order to get (I, J) we have to erase from Eq. (3) the polynomials Pj = (a^{kj} − 1)/(a^{kj−1} − 1) with j odd, and from Eq. (4) the polynomials Pj with j even [8]. (I, J) will be called the Krasner companion factorization of (R, T) with respect to the chain of divisors of n given in Eq. (2). Proposition 4.2 shows how these two notions are related to each other.

Proposition 4.2. Each Krasner companion factorization (I, J) of (R, T) is a Krasner companion factorization of (R, T) with respect to a chain of divisors of n which defines (R, T).

Proof. Let (I, J) be a Krasner companion factorization of (R, T), i.e., suppose that (R, T) satisfies Eqs. (6). Since (I, J) is also a Krasner companion factorization of (R(n), T(n)), we can suppose that R, T ⊆ {0, . . . , n − 1}. We prove the statement by induction on the length s of the chain k0 = 1 | k1 | k2 | . . . | ks = n of divisors of n which defines (I, J). If s = 1 and (I, J) = ({0}, {0, . . . , n − 1}) then a^R = 1 + a^M (a − 1), a^T = [(a^n − 1)/(a − 1)] + a^L (a^n − 1), and (R, T) satisfies condition (1) in Proposition 4.1 (see also
[6]). Thus, (I, J) is a Krasner companion factorization of (R, T) with respect to the chain k0 = 1 | k1 = n which defines (R, T). Suppose s > 1. By using Eqs. (5), there exist g, h ∈ ℕ such that n = gh and I = I^(1) + {0, 1, . . . , g − 1}h, J = J^(1), (I^(1), J^(1)) being a Krasner factorization of Zh defined by k0 = 1 | k1 | k2 | . . . | ks−1 = h. Since R ⊆ {0, . . . , n − 1} and a^R = a^I (1 + a^M (a − 1)) ≥ 0, we have max I + max M + 1 < n, which implies max I^(1) + max M + 1 < h. Thus, for each t ∈ ℕ, we have (a^{I^(1)} (1 + a^M (a − 1)), a^t) = 0 if t ≥ h, otherwise (a^{I^(1)} (1 + a^M (a − 1)), a^t) = (a^I (1 + a^M (a − 1)), a^t). Consequently, we have a^{R^(1)} = a^{I^(1)} (1 + a^M (a − 1)) ≥ 0, R = R^(1) + {0, 1, . . . , g − 1}h (see also [10,12]). In addition, by using Lemma 4.1 we also have a^{T(h)} = a^{J^(1)} (1 + a^{L′} (a − 1)) ≥ 0. Thus, by Theorem 4.1, (R^(1), T(h)) is a Hajós factorization of Zh having (I^(1), J) as a Krasner companion factorization, where (I^(1), J) is defined by the chain k0 = 1 | k1 | k2 | . . . | ks−1. By the induction hypothesis, (I^(1), J) is a Krasner companion factorization of (R^(1), T(h)) with respect to this chain, which defines (R^(1), T(h)). Since R = R^(1) + {0, 1, . . . , g − 1}h, T ∈ T(h) ∘ {0, 1, . . . , g − 1}h, we conclude that (I, J) is a Krasner companion factorization of (R, T) with respect to the chain k0 = 1 | k1 | k2 | . . . | ks = n of divisors of n which defines (R, T). □

Let us consider Hajós factorizations (R1, T1), . . . , (Rm, Tm) having the same Krasner companion factorization (I, J). In the next part of this paper, all the elements denoted by the same symbol R with different indices will refer to the same element in the Krasner pair, i.e., the statement “(R1, T1), . . . , (Rm, Tm) have (I, J) as Krasner companion factorization” will mean that (Ri, J) and (I, Ti) are factorizations of Zn, i ∈ {1, . . . , m}. Furthermore, by using Proposition 4.2, we can conclude that (R1, T1), . . .
, (Rm, Tm) can be defined by the same chain of divisors and have the same Krasner companion factorization (I, J) with respect to this chain of divisors.

5. Two-dimensional Hajós factorizations

In the next part of this paper, matrices with entries in A* or in ℕ will also be considered, and A = (ap,q)_{1≤p≤m, 1≤q≤ℓ} will be an alternative notation for the matrix of size m × ℓ:

    a1,1  . . .  a1,ℓ
    a2,1  . . .  a2,ℓ
     ·            ·
    am,1  . . .  am,ℓ

Given a matrix A = (ap,q)_{1≤p≤m, 1≤q≤ℓ} with entries in ℕ and a positive integer n, n ≥ 2, we denote A(n) = (a′p,q)_{1≤p≤m, 1≤q≤ℓ}, where, for each p, q, 1 ≤ p ≤ m, 1 ≤ q ≤ ℓ, we have a′p,q = ap,q (mod n), 0 ≤ a′p,q ≤ n − 1. We also denote h + A = (bp,q)_{1≤p≤m, 1≤q≤ℓ}, where, for each p, q, 1 ≤ p ≤ m, 1 ≤ q ≤ ℓ, we have bp,q = h + ap,q, and A ∪ B = (ap,q)_{1≤p≤m, 1≤q≤2ℓ}, where B = (ap,q)_{1≤p≤m, ℓ+1≤q≤2ℓ}. Finally, ∪_{i=1}^{n} Ai = (∪_{i=1}^{n−1} Ai) ∪ An.

Given X, with X ⊆ A* (resp. X ⊆ ℕ), an arrangement of X will be an arrangement of the elements of X in a matrix with entries in A* (resp. ℕ) and size |X|. We now define
special arrangements of Hajós factorizations by a natural two-dimensional extension of the Hajós method.

Definition 5.1. Let (R1, T1), . . . , (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. An arrangement D = (rp,q)_{1≤p≤m, 1≤q≤ℓ} of ∪_{p=1}^{m} Rp having the Rp's as rows is a good arrangement of (R1, . . . , Rm) (with respect to the rows) if D can be recursively constructed using the following three rules.
(1) D is a good arrangement of ∪_{p=1}^{m} Rp (with respect to the rows) if D(n) is a good arrangement of ∪_{p=1}^{m} (Rp)(n) (with respect to the rows).
(2) Suppose that (Rp, Tp) satisfies condition (1) in Proposition 4.1, for all p ∈ {1, . . . , m}. If Rp = {rp} with rp ∈ {0, . . . , n − 1}, then D is the matrix with only one column having rp as the pth entry. If Rp = {rp,0, . . . , rp,n−1} with rp,i = i, then D = (rp,j)_{1≤p≤m, 0≤j≤n−1}.
(3) Suppose that (Rp, Tp) satisfies condition (2) in Proposition 4.1, for all p ∈ {1, . . . , m}, i.e., either Rp = Rp^(1) + {0, h, . . . , (g − 1)h} or Rp ∈ Rp^(1) ∘ {0, h, . . . , (g − 1)h}. Let D^(1) be a good arrangement of ∪_{p=1}^{m} Rp^(1). In the first case, we set D = ∪_{k=0}^{g−1} (kh + D^(1)). In the second case, D is obtained by taking D^(1) and then substituting in it each rp,q^(1) ∈ Rp^(1) with the corresponding rp,q^(1) + ℓp,q h ∈ Rp.

Let (R1, T1), . . . , (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. It goes without saying that we can consider arrangements of ∪_{p=1}^{m} Rp having the Rp's as columns and, therefore, we can give a dual notion of a good arrangement of ∪_{p=1}^{m} Rp with respect to the columns. This arrangement will be the transpose matrix of a good arrangement of ∪_{p=1}^{m} Rp with respect to the rows.

Example 5.1. It is easy to see that ({0}, {0, 1}), ({1}, {1, 2}) and ({2}, {2, 3}) are Hajós factorizations of Z2 having ({0}, {0, 1}) as a Krasner companion factorization.
According to Definition 5.1, D1 is a good arrangement whereas D2 is not a good arrangement, where we set:

    D1 = 0 1        D2 = 0 1
         2 1             1 2
         2 3             2 3
Indeed, (D1)(2) satisfies condition (2) in Definition 5.1 whereas (D2)(2) does not satisfy the same condition (2). As another example, ({0, 2, 4}, {0, 1, 6, 7}), ({0, 2, 4}, {1, 2, 8, 19}) and ({0, 2, 4}, {2, 3, 8, 9}) are Hajós factorizations of Z12 having ({0, 2, 4}, {0, 1, 6, 7}) as a Krasner companion factorization. According to Definition 5.1, D3 is a good arrangement whereas D4 is not a good arrangement, where we set:

    D3 = 0 1 6  7        D4 = 0  1 6 7
         2 1 8 19             8 19 2 1
         2 3 8  9             8  9 2 3
Indeed, (D3)_(12) satisfies condition (3) in Definition 5.1 since we have (D3)_(12) = ∪_{k=0}^{1} (6k + (D3)^(1)_(12)), where (D3)^(1)_(12) = D1 is a good arrangement. On the contrary, D4 is not a good arrangement since, in view of Proposition 5.1, there exists a unique good arrangement of {0, 1, 6, 7} ∪ {1, 2, 8, 19} ∪ {2, 3, 8, 9} and that is D3. Note that for each column Wq = (r_{1,q}, r_{2,q}, r_{3,q}) of D3, 1 ≤ q ≤ 4, an ordered sequence Iq = (i_{1,q}, i_{2,q}, i_{3,q}) of elements of {0, 2, 4} exists satisfying r_{1,q} + i_{1,q} = r_{2,q} + i_{2,q} = r_{3,q} + i_{3,q} = n_q (mod 12). Indeed, we have 0 + 2 = 2 + 0 = 2 + 0 = 2 (mod 12), 1 + 2 = 1 + 2 = 3 + 0 = 3 (mod 12), 6 + 2 = 8 + 0 = 8 + 0 = 8 (mod 12), 7 + 2 = 19 + 2 = 9 + 0 = 9 (mod 12). In Proposition 6.1, we will prove that each good arrangement satisfies this special property.

Proposition 5.1. Let (R1, T1), ..., (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. There exists a (unique) good arrangement D of ∪_{p=1}^{m} Rp with respect to the rows (resp. columns).

Proof. As we observed at the end of Section 4, if (R1, T1), ..., (Rm, Tm) are Hajós factorizations of Zn having (I, J) as a Krasner companion factorization, then (R1, T1), ..., (Rm, Tm) can be defined by a same chain of divisors of n of length s. Thus, (R1, T1), ..., (Rm, Tm) satisfy the same condition contained in Proposition 4.1. The proof is by induction on s, and we prove the statement for good arrangements with respect to the rows (an analogous argument can be used for good arrangements with respect to the columns). Suppose Rp, Tp ⊆ {0, ..., n − 1} for p ∈ {1, ..., m}. If s = 1, then (R1, T1), ..., (Rm, Tm) satisfy condition (1) in Proposition 4.1 and (a unique) D exists which satisfies condition (2) in Definition 5.1. Thus, let s > 1. Hence, (R1, T1), ..., (Rm, Tm) satisfy condition (2) in Proposition 4.1. Therefore, looking at condition (3) in Definition 5.1, (a unique) D exists since (a unique) D^(1) exists by induction hypothesis. If Rp, Tp are not contained in {0, ..., n − 1}, then ((Rp)_(n), (Tp)_(n)) are Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. Thus, by the argument above, a (unique) good arrangement D_(n) of ∪_{p=1}^{m} (Rp)_(n) exists. Hence, looking at condition (1) in Definition 5.1, (a unique) D also exists.
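The column property used in the example above can be checked mechanically. A small sketch (Python; the matrix D3 and the shift set {0, 2, 4} are read from the example, the helper name is ours) that searches, for each column, shifts making all entries agree mod 12:

```python
from itertools import product

def column_shifts(D, I, n):
    """For each column of the arrangement D, search for a sequence of
    shifts taken from I that makes all entries of the column equal mod n.
    Returns the list of common values n_q (None for a failing column)."""
    result = []
    for col in zip(*D):
        found = None
        for shifts in product(I, repeat=len(col)):
            sums = {(r + i) % n for r, i in zip(col, shifts)}
            if len(sums) == 1:       # all entries of the column coincide mod n
                found = sums.pop()
                break
        result.append(found)
    return result

# D3 from the example: rows {0,1,6,7}, {2,1,8,19}, {2,3,8,9}; n = 12, I = {0,2,4}
D3 = [[0, 1, 6, 7],
      [2, 1, 8, 19],
      [2, 3, 8, 9]]
print(column_shifts(D3, [0, 2, 4], 12))  # [2, 3, 8, 9], the n_q of the example
```

The output reproduces exactly the four sums 2, 3, 8, 9 (mod 12) displayed in the example.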
6. A property of good arrangements of Hajós factorizations

In this section we will prove technical results concerning good arrangements of Hajós factorizations which will be subsequently used in the proof of Proposition 7.3. The argument used in the proof of Proposition 6.1 has also been used in the proof of another result stated in [9]. Nevertheless, the complete proof of Proposition 6.1 is reported here for the sake of completeness.

Proposition 6.1. Let (R1, T1), ..., (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. Let D = (r_{p,q})_{1≤p≤m, 1≤q≤ℓ} be the good arrangement of ∪_{p=1}^{m} Rp with respect to the rows. Then, the two following conditions are
satisfied:
(a) For each column Wq = (r_{1,q}, ..., r_{m,q}) of D, there exists an ordered sequence Jq = (j_{1,q}, ..., j_{m,q}) of elements of J satisfying

  r_{1,q} + j_{1,q} = r_{2,q} + j_{2,q} = ... = r_{m,q} + j_{m,q} = n_q (mod n).    (7)

(b) Suppose that Rp, Tp ⊆ {0, ..., n − 1}. Then, for each column Wq = (r_{1,q}, ..., r_{m,q}) of D, there exists an ordered sequence Jq = (j_{1,q}, ..., j_{m,q}) of elements of J satisfying

  r_{1,q} + j_{1,q} = r_{2,q} + j_{2,q} = ... = r_{m,q} + j_{m,q} = n_q.    (8)
The n_q's are all different.

Proof. Let (R1, T1), ..., (Rm, Tm) be Hajós factorizations of Zn having (I, J) as a Krasner companion factorization. Let D be a good arrangement of ∪_{p=1}^{m} Rp with respect to the rows. Let us demonstrate that the statement is proved if we prove condition (b). Indeed, suppose that (Rp, Tp) ≠ ((Rp)_(n), (Tp)_(n)). Using condition (b), the good arrangement D_(n) of ∪_{p=1}^{m} (Rp)_(n) satisfies Eq. (8). On the other hand, when we change in Eq. (8) the elements in a column Wq of D_(n) with the elements in the corresponding column in D, according to condition (1) in Definition 5.1, the sum defines the same integer mod n and so Eq. (7) holds, i.e., D satisfies condition (a).

We prove condition (b) by using induction on the length s of the common chain of divisors of n given in Eq. (2) and defining (R1, T1), ..., (Rm, Tm) (Definition 4.1).

Let us firstly suppose that s = 1. Then, (Rp, Tp) satisfies condition (1) in Proposition 4.1. If Rp = {r_{p,0}, ..., r_{p,n−1}} = {0, ..., n − 1}, then J = {0} and obviously D (defined by condition (2) in Definition 5.1) satisfies Eq. (8). Otherwise we have Rp = {r_p} ⊆ {0, ..., n − 1}, J = {0, ..., n − 1}. Set r_max = max{r_p | 1 ≤ p ≤ m}. We obviously have

  r_1 + (r_max − r_1) = r_2 + (r_max − r_2) = ... = r_m + (r_max − r_m),

where r_max − r_p ∈ {0, ..., n − 1} = J. Thus, D (defined by condition (2) in Definition 5.1) satisfies Eq. (8).

Let us suppose that condition (b) holds for good arrangements of Hajós factorizations (Rp, Tp) defined by starting with a chain of divisors of length less than s > 1, and let k_0 = 1 | k_1 | k_2 | ... | k_s = n be the chain of divisors of n associated with (Rp, Tp). Thus, (Rp, Tp) satisfies condition (2) in Proposition 4.1. Then we have I ≠ {0}, J ≠ {0}. Furthermore, either Rp = Rp^(1) + {0, h, ..., (g − 1)h}, J = J^(1), or Rp ∈ Rp^(1) ∘ {0, h, ..., (g − 1)h}, J = J^(1) + {0, h, ..., (g − 1)h}, with (Rp^(1), Tp^(1)) being a Hajós factorization of Zh having the Krasner companion factorization (I^(1), J^(1)), g > 1, n = gh, with respect to the chain k_0 = 1 | k_1 | k_2 | ... | k_{s−1} = h of divisors of h = k_{s−1} of length less than s and defining (Rp^(1), Tp^(1)). Furthermore, Rp^(1), Tp^(1) ⊆ {0, ..., h − 1}. By induction hypothesis, the good arrangement D^(1) of ∪_{p=1}^{m} Rp^(1) satisfies condition (b). Thus, for each column Wq^(1) = (r^(1)_{1,q}, ..., r^(1)_{m,q}) of D^(1), an ordered sequence Jq^(1) = (j^(1)_{1,q}, ..., j^(1)_{m,q}) of elements of J^(1) exists satisfying

  r^(1)_{1,q} + j^(1)_{1,q} = r^(1)_{2,q} + j^(1)_{2,q} = ... = r^(1)_{m,q} + j^(1)_{m,q}.    (9)
Firstly, we suppose Rp = Rp^(1) + {0, h, ..., (g − 1)h}. Then, for each λ ∈ {0, ..., g − 1}, in virtue of Eq. (9), we have:

  r^(1)_{1,q} + λh + j^(1)_{1,q} = r^(1)_{2,q} + λh + j^(1)_{2,q} = ... = r^(1)_{m,q} + λh + j^(1)_{m,q}.

Looking at Definition 5.1, we see that the good arrangement D of ∪_{p=1}^{m} Rp (defined by condition (3) in Definition 5.1) satisfies condition (b).

We now suppose Rp ∈ Rp^(1) ∘ {0, h, ..., (g − 1)h}. Let λ_{p,q} ∈ {0, ..., g − 1} be such that r^(1)_{p,q} + λ_{p,q} h ∈ Rp. Thanks to Eq. (9), we have

  (r^(1)_{1,q} + λ_{1,q} h) + (j^(1)_{1,q} + (λ_{max,q} − λ_{1,q})h) = ... = (r^(1)_{m,q} + λ_{m,q} h) + (j^(1)_{m,q} + (λ_{max,q} − λ_{m,q})h),

where λ_{max,q} = max{λ_{p,q} | 1 ≤ p ≤ m}. As λ_{max,q} − λ_{p,q} ∈ {0, ..., g − 1}, then j^(1)_{p,q} + (λ_{max,q} − λ_{p,q})h ∈ J. Looking at Definition 5.1, we see that the good arrangement D of ∪_{p=1}^{m} Rp (defined by condition (3) in Definition 5.1) satisfies condition (b).

Finally, the n_q's are all different since (Rp, J) is a Hajós factorization of Zn (if n_q = n_{q'} then we would have r_{p,q} + j_{p,q} = r_{p,q'} + j_{p,q'} (mod n) with j_{p,q}, j_{p,q'} ∈ J, r_{p,q}, r_{p,q'} ∈ Rp, r_{p,q} ≠ r_{p,q'}, a contradiction).

Proposition 6.2. Let A = (z_{p,q})_{0≤p≤m−1, 0≤q≤n−1} be a matrix of size m × n satisfying the following conditions:
(1) For each p ∈ {0, ..., m − 1}, we have R'p = {z_{p,q} | 0 ≤ q ≤ n − 1} = {0, ..., n − 1} (mod n).
(2) For each q, q' ∈ {0, ..., n − 1} and for each p, p' ∈ {0, ..., m − 1}, we have z_{p,q} = z_{p',q'} (mod n) if and only if q = q'.
(3) There exist a Krasner factorization (I, J) of Zn and Hajós factorizations (R0, T0), ..., (R_{m−1}, T_{m−1}) of Zn having (I, J) as a Krasner companion factorization, such that A is an arrangement of ∪_{p=0}^{m−1} (Rp + J) with (z_{p,q})_{0≤q≤n−1} = Rp + J, 0 ≤ p ≤ m − 1.

Set ℓ = |I|. Then, for each q ∈ {0, ..., ℓ − 1}, an ordered sequence Jq = (j_{0,q}, ..., j_{m−1,q}) of elements of J exists and ℓ = |I| columns (z_{p,n_q})_{0≤p≤m−1} in A also exist such that, for each p ∈ {0, ..., m − 1} and q ∈ {0, ..., ℓ − 1}, we have z_{p,n_q} = r_{p,q} + j_{p,q}, with D = (r_{p,q})_{0≤p≤m−1, 0≤q≤ℓ−1} being a good arrangement of ∪_{p=0}^{m−1} Rp with respect to the rows.

Proof. Let D = (r_{p,q})_{0≤p≤m−1, 0≤q≤ℓ−1} be the good arrangement of ∪_{p=0}^{m−1} Rp with respect to the rows, where we obviously have ℓ = |I|. In virtue of Proposition 6.1, for each q ∈ {0, ..., ℓ − 1}, an ordered sequence Jq = (j_{0,q}, ..., j_{m−1,q}) of elements of J exists satisfying Eq. (7), i.e., r_{p,q} + j_{p,q} = n_q (mod n). Now, let us consider the integers n_q defined by Eq. (7). In view of condition (2) in the statement, for each q ∈ {0, ..., ℓ − 1}, there is a unique column (z_{p,n_q})_{0≤p≤m−1} in A associated with n_q, i.e., such that z_{p,n_q} = n_q (mod n). Thus, in view of condition (3) in the statement, we have z_{p,n_q} = r_{p,q} + j_{p,q} for the unique pair (r_{p,q}, j_{p,q}) ∈ Rp × J such that r_{p,q} + j_{p,q} = n_q (mod n). Clearly, the columns (z_{p,n_q})_{0≤p≤m−1}, 0 ≤ q ≤ ℓ − 1, satisfy the conditions contained in the statement.
Finally, we explicitly note that we can state a dual version of Propositions 6.1 and 6.2 for good arrangements with respect to the columns.
7. Crossed two-dimensional Hajós factorizations

Given a sequence (R1, T1), ..., (Rm, Tm) of Hajós factorizations, we now consider matrices having pairs (r, v) of integers as elements and such that the good arrangement of ∪_{p=1}^{m} Rp (resp. ∪_{q=1}^{ℓ} Tq), with respect to the rows (resp. columns), can be obtained by taking the induced arrangement having the first (resp. second) elements of the pairs as entries (Definition 7.2). We prove that for a given factorizing code C, the words in C ∩ a*ba*, i.e., the words in C with one occurrence of b, can be canonically associated with one of these special matrices. We recall that for a finite subset X of A*, we set X_k = X ∩ (a*b)^k a*.

Definition 7.1. Let C1 = (a^{r_{p,q}} b a^{v_{p,q}})_{1≤p≤m, 1≤q≤ℓ} be an arrangement of C1 ⊆ a*ba*. The matrix R = (r_{p,q})_{1≤p≤m, 1≤q≤ℓ} is the induced arrangement of the rows Rp = {r_{p,q} | q ∈ {1, ..., ℓ}} and the matrix T = (v_{p,q})_{1≤p≤m, 1≤q≤ℓ} is the induced arrangement of the columns Tq = {v_{p,q} | p ∈ {1, ..., m}}. Furthermore, R_{p,w} = {a^{r_{p,q}} b a^{v_{p,q}} | 1 ≤ q ≤ ℓ} (resp. T_{q,w} = {a^{r_{p,q}} b a^{v_{p,q}} | 1 ≤ p ≤ m}) is a word-row (resp. a word-column) of C1, for 1 ≤ p ≤ m (resp. 1 ≤ q ≤ ℓ).

Definition 7.2. An arrangement C1 = (a^{r_{p,q}} b a^{v_{p,q}})_{1≤p≤m, 1≤q≤ℓ} of C1 ⊆ a*ba* is a good arrangement (with (I, J) as a Krasner associated pair) if it satisfies the following three conditions:
(1) For each row Rp and each column Tq, 1 ≤ p ≤ m, 1 ≤ q ≤ ℓ, (Rp, Tq) is a Hajós factorization of Zn having (I, J) as a Krasner companion factorization with respect to a chain of divisors of n = |C1|.
(2) The induced arrangement of the rows is a good arrangement of ∪_{p=1}^{m} Rp with respect to the rows.
(3) The induced arrangement of the columns is a good arrangement of ∪_{q=1}^{ℓ} Tq with respect to the columns.

Example 7.1. C1 = a^{0,2,4} b + a^{3,5} b a^3 + a b a^5 has the following good arrangement (with ({0, 2, 4}, {0, 1}) as a Krasner associated pair):

         a^0 b     a^2 b      a^4 b
  C1 =
         a b a^5   a^3 b a^3  a^5 b a^3

Analogously, for C1 = a^{0,2,4,12,14,16} b a^{0,6,21} + a^{0,4,8,12,16,20} b a^3 we have the following good arrangement (with ({0, 2, 4, 12, 14, 16}, {0, 1, 6, 7}) as a Krasner associated pair):

         a^0 b       a^2 b       a^4 b       a^12 b       a^14 b       a^16 b
  C1 =   a^0 b a^3   a^8 b a^3   a^4 b a^3   a^12 b a^3   a^20 b a^3   a^16 b a^3
         a^0 b a^6   a^2 b a^6   a^4 b a^6   a^12 b a^6   a^14 b a^6   a^16 b a^6
         a^0 b a^21  a^2 b a^21  a^4 b a^21  a^12 b a^21  a^14 b a^21  a^16 b a^21
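Condition (1) of Definition 7.2 is easy to test mechanically. A sketch (Python; helper names ours) that checks, on the first arrangement of Example 7.1, that every pair (row Rp, column Tq) is a Hajós factorization of Z6:

```python
def is_hajos_pair(R, T, n):
    """(R, T) is a Hajos factorization of Z_n iff the |R|*|T| sums r + t
    are pairwise distinct mod n and cover Z_n."""
    sums = [(r + t) % n for r in R for t in T]
    return len(sums) == n and len(set(sums)) == n

# The 2x3 arrangement of Example 7.1, entries as (left, right) exponents of a^r b a^v
arr = [[(0, 0), (2, 0), (4, 0)],
       [(1, 5), (3, 3), (5, 3)]]
n = 6
rows = [{r for r, v in row} for row in arr]          # R_p: left exponents per row
cols = [{v for r, v in col} for col in zip(*arr)]    # T_q: right exponents per column
ok = all(is_hajos_pair(R, T, n) for R in rows for T in cols)
print(ok)  # True
```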
Let us recall two known equations associated with the sets C1 of words with one b in a factorizing code C. Let P, S be finite subsets of A* such that C = P(A − 1)S + 1. As a direct result, we have C0 = P0(a − 1)S0 + 1 and Cr = Σ_{i+j=r} Pi(a − 1)Sj + Σ_{i+j=r−1} Pi b Sj, for r > 0 [7]. Consequently, there exist n ∈ N and a Krasner factorization (I, J) of Zn such that:

  C0 = a^n,   P0 = a^I,   S0 = a^J,   a^I a^J = (a^n − 1)/(a − 1).    (10)

Furthermore, if we set P1 = Σ_{i∈I} a^i b a^{L_i}, S1 = Σ_{j∈J} a^{M_j} b a^j, with I, J, L_i, M_j ⊆ N, we have:

  C1 = C ∩ a*ba* = a^I b a^J + Σ_{i∈I} a^i b a^{L_i} (a − 1) a^J + Σ_{j∈J} a^{M_j} (a − 1) a^I b a^j.    (11)
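In Eq. (10), the polynomial identity a^I a^J = (a^n − 1)/(a − 1) says exactly that (I, J) is a Krasner factorization of Zn: every element of Zn is written exactly once as i + j mod n. A sketch of this check (Python; the function name is ours):

```python
def is_krasner_factorization(I, J, n):
    """a^I * a^J = 1 + a + ... + a^{n-1} holds iff every element of Z_n
    is obtained exactly once as i + j mod n with i in I, j in J."""
    sums = sorted((i + j) % n for i in I for j in J)
    return sums == list(range(n))

print(is_krasner_factorization({0, 2, 4}, {0, 1}, 6))     # True  (Example 7.1)
print(is_krasner_factorization({0, 1, 4, 5}, {0, 2}, 8))  # True  (Example 7.2)
print(is_krasner_factorization({0, 1, 2}, {0, 3}, 8))     # False
```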
Proposition 7.1 (De Felice [9]). Let C1 be a subset of a*ba* which satisfies Eqs. (10) and (11). Then, there exists a unique arrangement A'1 = (a^{z_{p,q}} b a^{t_{p,q}})_{0≤p≤n−1, 0≤q≤n−1} of a^J C1 a^I which satisfies the following properties, for p, q ∈ {0, ..., n − 1}:
(1) R'p = {z_{p,q} | q ∈ {0, ..., n − 1}} = {0, ..., n − 1} (mod n), T'q = {t_{p,q} | p ∈ {0, ..., n − 1}} = {0, ..., n − 1} (mod n).
(2) Two words a^{z_{p,q}} b a^{t_{p,q}}, a^{z_{p',q'}} b a^{t_{p',q'}} have the same exponent z_{p,q} = z_{p',q'} = q (mod n) (resp. t_{p,q} = t_{p',q'} = p (mod n)) if and only if q = q' (resp. p = p'), i.e., they belong to the same word-column T'_{q,w} (resp. word-row R'_{p,w}).
(3) For the word-rows R'_{p,w} and the word-columns T'_{q,w} it holds: for all i ∈ I, j ∈ J, a^r b a^v ∈ C1,

  a^{r+j} b a^{v+i} ∈ R'_{p,w} ⇒ a^J a^r b a^{v+i} ⊆ R'_{p,w},
  a^{r+j} b a^{v+i} ∈ T'_{q,w} ⇒ a^{r+j} b a^v a^I ⊆ T'_{q,w}.

Proposition 7.2 (De Felice [9]). For every word-row R'_{p,w} (resp. word-column T'_{q,w}) in A'1, a subset R_{p,w} = a^{r_{p,1}} b a^{v_{p,1}} + ... + a^{r_{p,ℓ}} b a^{v_{p,ℓ}} (resp. T_{q,w} = a^{r_{1,q}} b a^{v_{1,q}} + ... + a^{r_{m,q}} b a^{v_{m,q}}) of words in C1 exists such that:

  R'_{p,w} = a^J (a^{r_{p,1}} b a^{v_{p,1}+i_{p,1}} + ... + a^{r_{p,ℓ}} b a^{v_{p,ℓ}+i_{p,ℓ}})
  (resp. T'_{q,w} = (a^{r_{1,q}+j_{1,q}} b a^{v_{1,q}} + ... + a^{r_{m,q}+j_{m,q}} b a^{v_{m,q}}) a^I),

where the order of the elements is not taken into account and i_{p,1}, ..., i_{p,ℓ} ∈ I (resp. j_{1,q}, ..., j_{m,q} ∈ J) are not necessarily different. Furthermore, let Rp = {r_{p,g} | a^{r_{p,g}} b a^{v_{p,g}} ∈ R_{p,w}} and Tq = {v_{g,q} | a^{r_{g,q}} b a^{v_{g,q}} ∈ T_{q,w}}. Then, for p, q ∈ {0, ..., n − 1}, (Rp, Tq) is a Hajós factorization of Zn having (I, J) as a Krasner companion factorization and it holds:

  a^{R'p} = a^J a^{Rp},   a^{T'q} = a^{Tq} a^I.
Let C1 be a subset of a*ba* which satisfies Eq. (11) with (I, J) defined by Eq. (10). Then C1 satisfies the conditions contained in Proposition 7.1; let A'1 be the corresponding arrangement of a^J C1 a^I. In Proposition 7.3 below, we show that there exists a good
arrangement B of C1 with (I, J) as a Krasner associated pair. In the proof of this result, we construct B starting with A'1 and with the induced arrangement A' of the rows in A'1, by the following matrix transformations:

  A'1 →(1) B'1 (defined by |I| columns in A'1 which are selected according to Proposition 6.2),
  B'1 →(2) B1 (defined by erasing the elements of a^J on the left of b in B'1),
  B1 →(3) B' (defined by the dual version of (1), i.e., by |J| selected rows in B1),
  B' →(4) B (defined by the dual version of (2), i.e., by erasing the elements of a^I on the right of b in B').

Proposition 7.3. Let C1 be a subset of a*ba* which satisfies Eq. (11) with (I, J) defined by Eq. (10). Then, there exists a good arrangement of C1 with (I, J) as a Krasner associated pair.

Proof. Let C1 be a subset of a*ba* which satisfies Eq. (11) with (I, J) defined by Eq. (10). Then, C1 satisfies the conditions contained in Propositions 7.1 and 7.2, and we will use the same notations as in these propositions. Let (Rp, Tq) be the Hajós factorizations of Zn defined in Proposition 7.2 and let A'1 be the arrangement of a^J C1 a^I satisfying the conditions contained in Proposition 7.1. Consider the induced arrangement A' = (z_{p,q})_{0≤p≤n−1, 0≤q≤n−1} of the rows in A'1. By using Proposition 6.2, an ordered sequence Jq = (j_{0,q}, ..., j_{n−1,q}) of elements of J exists and ℓ = |I| columns A'' = (z_{p,n_q})_{0≤p≤n−1, 0≤q≤ℓ−1} in A' also exist such that, for each p ∈ {0, ..., n − 1} and q ∈ {0, ..., ℓ − 1}, we have z_{p,n_q} = r_{p,q} + j_{p,q}, with D = (r_{p,q})_{0≤p≤n−1, 0≤q≤ℓ−1} being a good arrangement of ∪_{p=0}^{n−1} Rp with respect to the rows. Consider the columns B'1 = (a^{r_{p,q}+j_{p,q}} b a^{t_{p,n_q}})_{0≤p≤n−1, 0≤q≤ℓ−1} of A'1 such that the induced arrangement of the rows of B'1 is A''.

We claim that, when we erase in B'1 the elements of a^J on the left of b (i.e., when we consider the matrix defined by the word-columns T'_{n_q,w} = (a^{r_{0,q}} b a^{v_{0,q}} + ... + a^{r_{n−1,q}} b a^{v_{n−1,q}}) a^I), we obtain an arrangement B1 of C1 a^I. Intuitively, when we erase the elements of a^J on the left in a word-row R'_{p,w}, we obtain |J| copies of a subset of C1 a^I: B1 is obtained by selecting one copy of each element in this subset. In detail, for each word a^r b a^v ∈ C1, there exist a word-row R'_{p,w} of A'1 and i ∈ I such that all the elements of a^{r+J} b a^{v+i} are elements of R'_{p,w}. Thus, r ∈ Rp and there exist q, r_{p,q}, j_{p,q} such that r = r_{p,q} and z_{p,n_q} = r_{p,q} + j_{p,q}. Since a^r a^J b a* ∩ R'_{p,w} = a^{r+J} b a^{v+i}, we have that a^r b a^{v+i} is in T'_{n_q,w}. Furthermore, when we consider in B1 the corresponding arrangement of the exponents of the a's on the left of b, we find the good arrangement D.

We now find the required arrangement of C1 by using the same argument as above with respect to the columns and to the exponents of the a's on the right of b. Indeed, each word-column T'_{n_q,w} in B1 is also a word-column in A'1. Thus, B1 (and so B'1) maintains all the properties of A'1 contained in Propositions 7.1 and 7.2 with respect to the columns. In particular, the induced arrangement T' = (t_{p,n_q})_{0≤p≤n−1, 0≤q≤ℓ−1} of the columns in B1 is an arrangement of ∪_{q=0}^{ℓ−1} (Tq + I) such that (Tq + I) is the qth column, 0 ≤ q ≤ ℓ − 1.
Furthermore, we have T'_{n_q} = Tq + I = {t_{p,n_q} | p ∈ {0, ..., n − 1}} = {0, ..., n − 1} (mod n), and two words a^{r_{p,q}} b a^{t_{p,n_q}}, a^{r_{p',q}} b a^{t_{p',n_q}} have the same exponent t_{p,n_q} = t_{p',n_q} = p (mod n) if and only if p = p', i.e., they belong to the same word-row R'_{p,w}.

Then, by using the dual version of Proposition 6.2, for each p ∈ {0, ..., m − 1}, with m = |J|, an ordered sequence Ip = (i_{p,0}, ..., i_{p,ℓ−1}) of elements of I exists and m = |J| rows T'' = (t_{n_p,n_q})_{0≤q≤ℓ−1} in T' also exist such that, for each p ∈ {0, ..., m − 1} and q ∈ {0, ..., ℓ − 1}, we have t_{n_p,n_q} = v_{p,q} + i_{p,q}, with D' = (v_{p,q})_{0≤p≤m−1, 0≤q≤ℓ−1} being a good arrangement of ∪_{q=0}^{ℓ−1} Tq with respect to the columns. Consider the rows B' = (a^{r_{p,q}} b a^{v_{p,q}+i_{p,q}})_{0≤p≤m−1, 0≤q≤ℓ−1} of B1 such that the induced arrangement of the columns of B' is T''. Let us prove that when we erase in B' the elements of a^I on the right of b, we obtain a good arrangement B = (a^{r_{p,q}} b a^{v_{p,q}})_{0≤p≤m−1, 0≤q≤ℓ−1} of C1.

Firstly, B is an arrangement of C1. Intuitively, when we erase the elements of a^I on the right in a word-column T'_{n_q,w}, we obtain |I| copies of a subset of C1: B is obtained by selecting one copy of each element in this subset. In detail, we have already observed that B1 is an arrangement of C1 a^I which maintains all the properties of A'1 contained in Propositions 7.1 and 7.2 with respect to the columns. Now, for each a^r b a^v ∈ C1, the word a^r b a^{v+i} belongs to a column in B1 for some i ∈ I (since B1 is an arrangement of C1 a^I), and so all the elements of a^r b a^{v+I} are in a word-column T'_{n_q,w} in B1, in view of condition (3) in Proposition 7.1. Thus, there exist p, v_{p,q}, i_{p,q} such that v = v_{p,q} and t_{n_p,n_q} = v_{p,q} + i_{p,q}. Since a* b a^{v+I} ∩ T'_{n_q,w} = a^r b a^{v+I}, we have a^r b a^v ∈ B and B is an arrangement of C1. Finally, when we consider B, we see that the induced arrangement of the rows is a set of rows in D and the induced arrangement of the columns is D'. Thus, B is a good arrangement of C1.
Suppose that (I, J) is a Krasner factorization of Zn and suppose that C1 has a good arrangement with (I, J) as a Krasner associated pair. A natural question which arises is whether the set C1 ∪ a^n is a code; partial results towards a positive answer to this question have been given in [12]. We end this section with an example which shows that the hypothesis of the existence of this special arrangement is necessary. Indeed, in Example 7.2 we point out that there exist sets C1 of words, with C1 ⊆ a*ba*, which are not codes but which have arrangements over a matrix such that for any row Rp and any column Tq, (Rp, Tq) is a Hajós factorization of Zn.

Example 7.2. Consider C1 = {b, aba, a^4 b a, a^5 b, a^4 b a^2, a^3 b a^3, b a^3, a^7 b a^2}. C1 is not a code since (b a^3)(a b a) = b (a^4 b a). Observe that n is uniquely defined by n = |C1|, and thus n = 8. We have two possible arrangements of C1 over a matrix such that for any row Rp and any column Tq, (Rp, Tq) is a Hajós factorization of Z8; they correspond to the chain 1 | 2 | 4 | 8 of divisors of 8. They are not good arrangements and are reported below:

         a^0 b a^0   a^1 b a^1   a^4 b a^1   a^5 b a^0
  C1 =
         a^4 b a^2   a^3 b a^3   a^0 b a^3   a^7 b a^2
with corresponding Krasner pair I = {0, 1, 4, 5}, J = {0, 2} and
         a^4 b a^1   a^0 b a^0
  C1 =   a^5 b a^0   a^1 b a^1
         a^7 b a^2   a^3 b a^3
         a^0 b a^3   a^4 b a^2
with corresponding Krasner pair I = {0, 4}, J = {0, 1, 2, 3}. We also observe that codes exist which have no (good) arrangement, namely X = {ba, ab, ba^2, a^3 b a^2}. We know that X + a^4 has no factorizing completion, but we do not know whether X + a^4 has a finite completion. If this finite completion existed, then it would be a counterexample to the factorization conjecture.
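Both claims of Example 7.2 can be verified mechanically. A sketch (Python; helper names ours) checking the Hajós property of the first arrangement and exhibiting the double factorization that shows C1 is not a code:

```python
def is_hajos_pair(R, T, n):
    # (R, T) is a Hajos factorization of Z_n iff the sums r + t are
    # pairwise distinct mod n and cover Z_n
    sums = [(r + t) % n for r in R for t in T]
    return len(sums) == n == len(set(sums))

# First arrangement of Example 7.2, entries as (left, right) exponents of a^r b a^v
arr = [[(0, 0), (1, 1), (4, 1), (5, 0)],
       [(4, 2), (3, 3), (0, 3), (7, 2)]]
rows = [{r for r, v in row} for row in arr]          # R_p
cols = [{v for r, v in col} for col in zip(*arr)]    # T_q
print(all(is_hajos_pair(R, T, 8) for R in rows for T in cols))  # True

# Nevertheless C1 is not a code: (b a^3)(a b a) = (b)(a^4 b a)
print("baaa" + "aba" == "b" + "aaaaba")  # True
```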
References

[1] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[2] J. Berstel, C. Reutenauer, Rational Series and Their Languages, EATCS Monographs on Theoretical Computer Science, Vol. 12, Springer, Berlin, 1988.
[3] J.M. Boë, Sur les codes factorisants, in: D. Perrin (Ed.), Théorie des Codes, LITP, 1979, pp. 1–8.
[4] J.M. Boë, Sur les codes synchronisants coupants, in: A. de Luca (Ed.), Non Commutative Structures in Algebra and Geometric Combinatorics, Quaderni della Ric. Sci. del C.N.R., Vol. 109, 1981, pp. 7–10.
[5] V. Bruyère, M. Latteux, Variable-length maximal codes, in: Proc. ICALP '96, Lecture Notes in Computer Science, Vol. 1099, Springer, 1996, pp. 24–47.
[6] C. De Felice, Construction of a family of finite maximal codes, Theoret. Comput. Sci. 63 (1989) 157–184.
[7] C. De Felice, A partial result about the factorization conjecture for finite variable-length codes, Discrete Math. 122 (1993) 137–152.
[8] C. De Felice, An application of Hajós factorizations to variable-length codes, Theoret. Comput. Sci. 164 (1996) 223–252.
[9] C. De Felice, On a property of the factorizing codes, Internat. J. Algebra Comput. (special issue dedicated to M.P. Schützenberger) 9 (1999) 325–345.
[10] C. De Felice, On some Schützenberger conjectures, Inform. Comput. 168 (2001) 144–155.
[11] C. De Felice, On a complete set of operations for factorizing codes, Theoret. Inform. Appl. (2005), to appear.
[12] C. De Felice, Solving inequalities with factorizing codes: part 1, manuscript, 2005.
[13] C. De Felice, C. Reutenauer, Solution partielle de la conjecture de factorisation des codes, C.R. Acad. Sci. Paris 302 (1986) 169–170.
[14] G. Hajós, Sur la factorisation des groupes abéliens, Časopis Pěst. Mat. Fys. 74 (1950) 157–162.
[15] M. Krasner, B. Ranulac, Sur une propriété des polynômes de la division du cercle, C.R. Acad. Sci. Paris 204 (1937) 397–399.
[16] N.H. Lam, Hajós factorizations and completion of codes, Theoret. Comput. Sci. 182 (1997) 245–256.
[17] D. Perrin, M.P. Schützenberger, Un problème élémentaire de la théorie de l'information, in: Théorie de l'Information, Colloques Internat. CNRS, Vol. 276, Cachan, 1977, pp. 249–260.
[18] D. Perrin, M.P. Schützenberger, A conjecture on sets of differences of integer pairs, J. Combin. Theory Ser. B 30 (1981) 91–93.
[19] A. Restivo, On codes having no finite completions, Discrete Math. 17 (1977) 309–316.
[20] A. Restivo, S. Salemi, T. Sportelli, Completing codes, RAIRO Inform. Théor. Appl. 23 (1989) 135–147.
[21] C. Reutenauer, Sulla fattorizzazione dei codici, Ricerche Mat. 32 (1983) 115–130.
[22] C. Reutenauer, Non commutative factorization of variable-length codes, J. Pure Appl. Algebra 36 (1985) 167–186.
[23] A.D. Sands, On the factorisation of finite abelian groups, Acta Math. Acad. Sci. Hungar. 8 (1957) 65–86.
[24] M.P. Schützenberger, Une théorie algébrique du codage, Séminaire Dubreil–Pisot 1955–56, exposé no. 15, 1955, 24 pp.
[25] M.P. Schützenberger, Codes à longueur variable, manuscript, 1965; reprinted in: D. Perrin (Ed.), Théorie des Codes, LITP, 1979, pp. 247–271.
[26] L. Zhang, C.K. Gu, Two classes of factorizing codes: (p, p)-codes and (4, 4)-codes, in: M. Ito, H. Jürgensen (Eds.), Words, Languages and Combinatorics II, World Scientific, Singapore, 1994, pp. 477–483.
Theoretical Computer Science 340 (2005) 257 – 272 www.elsevier.com/locate/tcs
Tile rewriting grammars and picture languages ☆

Stefano Crespi Reghizzi, Matteo Pradella∗

DEI - Politecnico di Milano and CNR IEIIT-MI, Piazza Leonardo da Vinci 32, I-20133 Milano, Italy
Abstract Tile rewriting grammars (TRG) are a new model for defining picture languages. A rewriting rule changes a homogeneous rectangular subpicture into an isometric one tiled with specified tiles. Derivation and language generation with TRG rules are similar to context-free grammars. A normal form and some closure properties are presented. We prove this model has greater generative capacity than the tiling systems of Giammarresi and Restivo and the grammars of Matz, another generalization of context-free string grammars to 2D. Examples are shown for pictures made by nested frames and spirals. © 2005 Elsevier B.V. All rights reserved. Keywords: Picture languages; 2D languages; Tiling systems; Context-free grammars; Locally testable languages
1. Introduction

In the past, several proposals have been made for applying the generative grammar approach to picture (or 2D) languages, but in our opinion none of them matches the elegance and descriptive adequacy that made context-free (CF) grammars so successful for string languages. A picture is a rectangular array of terminal symbols (the pixels). A survey of formal models for picture languages is [3], where different approaches are compared and related: tiling systems, cellular automata, and grammars. The latter had been
☆ A preliminary version is [6]. Work partially supported by MIUR, Progetto "Linguaggi formali e automi, teoria e applicazioni".
∗ Corresponding author. Tel.: +39 02 2399 3495; fax: +39 02 2399 3666.
E-mail addresses:
[email protected] (S. Crespi Reghizzi),
[email protected] (M. Pradella). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.041
S. Crespi Reghizzi, M. Pradella / Theoretical Computer Science 340 (2005) 257 – 272
surveyed in more detail by Siromoney [7]. Classical 2D grammars can be grouped into two categories 1 called matrix and array grammars. The array grammars, introduced by Rosenfeld, impose the constraint that the left and right parts of a rewriting rule must be isometric arrays; this condition overcomes the inherent problem of “shearing” which pops up while substituting a subarray in a host array. Siromoney’s matrix grammars are parallel-sequential in nature, in the sense that first a horizontal string of nonterminals is derived sequentially, using the horizontal productions; and then the vertical derivations proceed in parallel, applying a set of vertical productions. Several variations have been made, for instance [1]. A particular case is the 2D right-linear grammars in [3]. Matz’s context-free picture grammars [5] rely on the notion of row and column concatenation and their closures. A rule is like a string CF one, but the right part is a 2D regular expression. The shearing problem is avoided because, say, row concatenation is a partial operation which is only defined on pictures of identical width. Exploring a different course, our new model, tile rewriting grammar (TRG), intuitively combines Rosenfeld’s isometric rewriting rules with the tiling system (TS) of Giammarresi and Restivo [2]. The latter defines the family of recognizable 2D languages (the same accepted by on-line tessellation automata of Inoue and Nakamura [4]). A TRG rule is a schema having a nonterminal symbol to the left and a local 2D language to the right over terminals and nonterminals; that is the right part is specified by a set of fixed size tiles. As in matrix grammars, the shearing problem is avoided by an isometric constraint, but the size of a TRG rule need not be fixed. The left part denotes any rectangle filled with the same nonterminal. Whatever size the left part takes, the same size is assigned to the right part. 
To make this idea effective, we impose a tree partial order on the areas which are rewritten. A progressively refined equivalence relation implements the partial ordering. Derivations can then be visualized in 3D as well-nested prisms, the analogue of the syntax trees of string grammars. To our knowledge, this approach is novel and is able to generate an interesting gamut of pictures: grids, spirals, and in particular a language of nested frames, which is in some way the analogue of a Dyck language. Section 2 lists the basic definitions. Section 3 presents the definition of TRG grammars and derivations, gives two examples, and proves the basic properties of the model: canonical derivation, uselessness of concave rules, normal forms, closures for some operations. Section 4 compares TRG with other models, proving that its generative capacity exceeds that of TS and of Matz's CF picture grammars. The appendix contains the grammar of Archimedean spirals.
2. Basic definitions

Most of the following notations and definitions are from [3].
1 Leaving aside the graph grammar models because they generate graphs, not 2D matrices.
Definition 1. For a finite alphabet Σ, the set of pictures is Σ**. For h, k ≥ 1, Σ^(h,k) denotes the set of pictures of size (h, k) (we will use the notation |p| = (h, k), |p|_row = h, |p|_col = k). The symbol #, not in Σ, is used when needed as a boundary symbol; p̂ refers to the bordered version of picture p. That is, for p ∈ Σ^(h,k),

        p(1,1) ... p(1,k)              #  #      ...  #      #
  p =     ...       ...          p̂ =  #  p(1,1) ...  p(1,k) #
        p(h,1) ... p(h,k)              #   ...   ...   ...   #
                                       #  p(h,1) ...  p(h,k) #
                                       #  #      ...  #      #
A pixel is an element p(i, j). If all pixels are identical to C ∈ Σ, the picture is called homogeneous and denoted as a C-picture. Row and column concatenations are denoted ⊖ and ⦶, respectively. p ⊖ q is defined iff p and q have the same number of columns; the resulting picture is the vertical juxtaposition of p over q. p^{⊖k} is the vertical juxtaposition of k copies of p; p^{⊖*} is the corresponding closure. ⦶, p^{⦶k}, p^{⦶*} are the column analogues. The pixel-by-pixel cartesian product (written p ⊗ q) is defined iff |p| = |q| and is such that, for all i, j, (p ⊗ q)(i, j) = ⟨p(i, j), q(i, j)⟩.

Definition 2. Let p be a picture of size (h, k). A subpicture of p at position (i, j) is a picture q such that, if (h', k') is the size of q, then h' ≤ h, k' ≤ k, and there exist integers i, j (with i ≤ h − h' + 1, j ≤ k − k' + 1) such that q(i', j') = p(i + i' − 1, j + j' − 1) for all 1 ≤ i' ≤ h', 1 ≤ j' ≤ k'. We write q ⊑_(i,j) p, or use the shortcut q ⊑ p ≡ ∃i, j (q ⊑_(i,j) p). Moreover, if q ⊑_(i,j) p, we define coor_(i,j)(q, p) as the set of coordinates of p where q is located: coor_(i,j)(q, p) = {(x, y) | i ≤ x ≤ i + |q|_row − 1 ∧ j ≤ y ≤ j + |q|_col − 1}. Conventionally, coor_(i,j)(q, p) = ∅ if q is not a subpicture of p. If q coincides with p, we write coor(p) instead of coor_(1,1)(p, p).
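Encoding a picture as a tuple of equal-length strings, the operations of Definitions 1 and 2 can be sketched as follows (Python; function names are ours):

```python
def row_cat(p, q):
    """p over q; defined only when the widths agree (Definition 1)."""
    assert len(p[0]) == len(q[0]), "row concatenation needs equal width"
    return p + q

def col_cat(p, q):
    """p beside q; defined only when the heights agree."""
    assert len(p) == len(q), "column concatenation needs equal height"
    return tuple(rp + rq for rp, rq in zip(p, q))

def is_subpicture_at(q, p, i, j):
    """q occurs in p at position (i, j), 1-based as in Definition 2."""
    h, k = len(q), len(q[0])
    if i + h - 1 > len(p) or j + k - 1 > len(p[0]):
        return False
    return all(q[x][y] == p[i - 1 + x][j - 1 + y]
               for x in range(h) for y in range(k))

p = row_cat(("ab",), ("cd",))                  # ("ab", "cd")
print(col_cat(p, p))                           # ("abab", "cdcd")
print(is_subpicture_at(("b", "d"), p, 1, 2))   # True
```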
Definition 3. Let π be an equivalence relation on coor(p), written (x, y) ∼_π (x', y'). Two subpictures q ⊑_(i,j) p, q' ⊑_(i',j') p are π-equivalent, written q ∼_π q', iff for all pairs (x, y) ∈ coor_(i,j)(q, p) and (x', y') ∈ coor_(i',j')(q', p) it holds (x, y) ∼_π (x', y').
A homogeneous C-subpicture q ⊑ p is called maximal with respect to relation π iff for every π-equivalent C-subpicture q' we have coor(q, p) ∩ coor(q', p) = ∅ ∨ coor(q', p) ⊆ coor(q, p). In other words, q is maximal if any C-subpicture which is π-equivalent to q either is a subpicture of q or does not overlap it. 2

2 Maximality as used in [6] is different: it corresponds to the condition coor(q, p) ⊇ coor(q', p).
Definition 4. For a picture p ∈ Σ** the set of subpictures (or tiles) of size (h, k) is B_{h,k}(p) = {q ∈ Σ^(h,k) | q ⊑ p}. We assume B_{1,k} to be defined only on Σ^(1,*) (horizontal strings), and B_{h,1} only on Σ^(*,1) (vertical strings). For brevity, for tiles of size (1, 2), (2, 1), or (2, 2), we introduce the following notation:

  [[p]] = B_{1,2}(p)  if |p| = (1, k), k > 1,
          B_{2,1}(p)  if |p| = (h, 1), h > 1,
          B_{2,2}(p)  if |p| = (h, k), h, k > 1.

Definition 5. Consider a set of tiles Θ ⊆ Σ^(i,j). The locally testable language in the strict sense defined by Θ (written LOC_u(Θ) 3) is the set of pictures p ∈ Σ** such that B_{i,j}(p) ⊆ Θ. The locally testable language defined by a finite set of tile sets (written LOC_{u,eq}({Θ1, Θ2, ..., Θn}) 4) is the set of pictures p ∈ Σ** such that, for some k, B_{i,j}(p) = Θk. The bordered locally testable language defined by a finite set of tile sets (written LOC_eq({Θ1, Θ2, ..., Θn})) is the set of pictures p ∈ Σ** such that, for some k, B_{i,j}(p̂) = Θk.

Definition 6 (Substitution). If p, q, q' are pictures, q ⊑_(i,j) p, and q, q' have the same size, then p[q'/q]_(i,j) denotes the picture obtained by replacing the occurrence of q at position (i, j) in p with q'.

Definition 7. The (vertical) mirror image and the (clockwise) rotation of a picture p (with |p| = (h, k)), respectively, are defined as follows:

              p(h,1) ... p(h,k)            p(h,1) ... p(1,1)
  Mirror(p) =   ...       ...    ,  p^R =    ...       ...
              p(1,1) ... p(1,k)            p(h,k) ... p(1,k)

Note that the sizes of Mirror(p) and p^R are, respectively, (h, k) and (k, h).

3. Tile rewriting grammars

The main definition follows.

Definition 8. A Tile Rewriting Grammar (TRG, in short grammar) is a tuple (Σ, N, S, R), where Σ is the terminal alphabet, N is a set of nonterminal symbols, S ∈ N is the starting symbol, and R is a set of rules. R may contain two kinds of rules:

  Fixed size: A → t, where A ∈ N, t ∈ (Σ ∪ N)^(h,k), with h, k > 0;
  Variable size: A → Θ, where A ∈ N, Θ ⊆ (Σ ∪ N)^(h,k), with 1 ≤ h, k ≤ 2.

3 To avoid confusion with LOC defined in [3], we mark these with "u" (which stands for unbordered, because they do not use boundary symbols).
4 eq stands for equality test.
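The tile sets B_{h,k}, the mirror image, and the clockwise rotation of Definitions 4 and 7 admit a direct sketch (Python; pictures as tuples of strings, function names ours):

```python
def tiles(p, h, k):
    """B_{h,k}(p): the set of (h, k)-subpictures of p (Definition 4)."""
    H, K = len(p), len(p[0])
    return {tuple(p[i + x][j:j + k] for x in range(h))
            for i in range(H - h + 1) for j in range(K - k + 1)}

def mirror(p):
    """Vertical mirror image (Definition 7): reverse the order of the rows."""
    return tuple(reversed(p))

def rotate(p):
    """Clockwise rotation p^R: entry (i, j) of p^R is p(h - j + 1, i)."""
    return tuple("".join(row[j] for row in reversed(p)) for j in range(len(p[0])))

p = ("ab",
     "cd")
print(sorted(tiles(p, 2, 2)))   # [('ab', 'cd')]
print(mirror(p))                # ('cd', 'ab')
print(rotate(p))                # ('ca', 'db')
```

Note that, as stated above, rotating a picture of size (h, k) yields one of size (k, h).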
S. Crespi Reghizzi, M. Pradella / Theoretical Computer Science 340 (2005) 257 – 272
Intuitively, a fixed size rule is intended to match a subpicture of (small) bounded size, identical to the right part t. A variable size rule matches any subpicture of any size which can be tiled using all the elements of the tile set ω. However, fixed size rules are not a special case of variable size rules.

Definition 9. Consider a grammar G = (Σ, N, S, R), let p, p′ ∈ (Σ ∪ N)^(h,k) be pictures of identical size, and let π, π′ be equivalence relations over coor(p). We say that (p′, π′) is derived in one step from (p, π), written (p, π) ⇒G (p′, π′), iff for some A ∈ N and for some rule ρ : A → … ∈ R there exists in p an A-subpicture r ⊑_(m,n) p, maximal with respect to π, such that:
• p′ is obtained by substituting r with a picture s, i.e. p′ = p[s/r]_(m,n), where s is defined as follows:
  Fixed size: if ρ = A → t, then s = t;
  Variable size: if ρ = A → ω, then s ∈ LOC_{u,eq}(ω).
• Let z be coor_(m,n)(r, p), and let γ be the π-equivalence class containing z. Then π′ is equal to π on all the equivalence classes δ ≠ γ; in π′, γ is divided into two equivalence classes, z and its complement with respect to γ (= ∅ if z = γ). More formally,

π′ = π \ {((x1, y1), (x2, y2)) | (x1, y1) ∈ z xor (x2, y2) ∈ z}.

The subpicture r is named the application area of rule ρ in the derivation step. We say that (q, σ) is derivable from (p, π) in n steps, written (p, π) ⇒ⁿG (q, σ), iff p = q and π = σ when n = 0, or there are a picture r and an equivalence relation τ such that (p, π) ⇒^(n−1)G (r, τ) and (r, τ) ⇒G (q, σ). We use the abbreviation (p, π) ⇒∗G (q, σ) for a derivation with n ≥ 0 steps.

Definition 10. The picture language defined by a grammar G (written L(G)) is the set of p ∈ Σ** such that, if |p| = (h, k), then

(S^(h,k), coor(p) × coor(p)) ⇒∗G (p, π),   (1)

where the relation π is arbitrary. For short we write S ⇒∗G p.

Note that the derivation starts with an S-picture isometric with the terminal picture to be generated, and with the universal equivalence relation over the coordinates. The equivalence relations computed by each step of (1) are called geminal relations. When writing examples by hand, it is convenient to visualize the equivalence classes of a geminal relation by appending the same numerical subscript to the pixels of the application area rewritten by a derivation step. The final equivalence classes represent, in some sense, a 2D generalization of the parenthesis structure that parenthesized context-free string grammars assign to a sentence.
Example 11 (Chinese boxes). G = (Σ, N, S, R), where Σ = {⌜, ⌝, ⌞, ⌟, ◦}, N = {S}, and R consists of one fixed size and one variable size rule:

S → ⌜ ⌝
    ⌞ ⌟  ;

S → { ⌜ ◦   ◦ ◦   ◦ ⌝   ◦ S   S S   S ◦   ◦ S   S S   S ◦
      ◦ S , S S , S ◦ , ◦ S , S S , S ◦ , ⌞ ◦ , ◦ ◦ , ◦ ⌟ } .

For brevity and readability, we will often specify a set of tiles by a sample picture exhibiting the tiles as its subpictures. We write | to separate alternative right parts of rules with the same left part (analogously to string grammars). The previous grammar becomes

S → ⌜ ◦ ◦ ⌝       ⌜ ⌝
    ◦ S S ◦   |
    ◦ S S ◦       ⌞ ⌟ .
    ⌞ ◦ ◦ ⌟

A picture in L(G) is
⌜ ◦ ◦ ◦ ◦ ⌝
◦ ⌜ ◦ ◦ ⌝ ◦
◦ ◦ ⌜ ⌝ ◦ ◦
◦ ◦ ⌞ ⌟ ◦ ◦
◦ ⌞ ◦ ◦ ⌟ ◦
⌞ ◦ ◦ ◦ ◦ ⌟
and is obtained by applying the variable size rule twice and then the fixed size rule. We show a complete derivation for a more general version of this language in the following example.

Example 12 (2D Dyck analogue). The next language Lbox, a superset of Chinese boxes, can be defined by a sort of blanking rule. But since terminals cannot be deleted without shearing the picture, we replace them with a character b (blank or background).

Empty frame: Let k ≥ 0. An empty frame is a picture defined by the regular expression

(⌜ ⦶ ◦^k ⦶ ⌝) ⊖ (◦ ⦶ b^k ⦶ ◦)^⊖k ⊖ (⌞ ⦶ ◦^k ⦶ ⌟),

i.e. a box bordered by ◦ (with corner symbols), containing just b's; ⦶ and ⊖ denote column and row concatenation.

Blanking: The blanking of an empty frame p is the picture del(p) obtained by applying the projection del(x) = b, x ∈ Σ ∪ {b}.

A picture p is in Lbox iff, by repeatedly applying del to subpictures which are empty frames, an empty frame is obtained. To obtain the grammar, we add the following rules to the Chinese boxes grammar:

S → S S       X X
    S S   |   X X   ,        X → S S
    X X       S S                 S S .
    X X       S S

To illustrate, in Fig. 1 we list the derivation steps of a picture. Nonterminals in the same equivalence class are marked with the same subscript. Although this language can be viewed as a 2D analogue of a Dyck string language, variations are possible and we do not claim the same algebraic properties as in 1D.
Fig. 1. Example derivation with marked application areas.
3.1. Basic properties

The next two statements, which follow immediately from Definitions 3 and 9, may be viewed as a 2D formulation of well-known properties of 1D CF derivations. Let p1 ⇒ ⋯ ⇒ p_(n+1) be a derivation, and r1 ⊑_(i1,j1) p1, …, rn ⊑_(in,jn) pn the corresponding application areas.

Disjointness of application areas: For any pf, pg, f < g, one of the following holds:
(1) coor_(ig,jg)(rg, pg) ⊆ coor_(if,jf)(rf, pf);
(2) coor_(if,jf)(rf, pf) ∩ coor_(ig,jg)(rg, pg) = ∅.
That is, the application area of a later step is either totally placed within the application area of a previous step, or it does not overlap it. As a consequence, a derivation can be represented in 3D as a well-nested forest of rectangular prisms, the analogue of derivation trees of string languages.

Canonical derivation: The previous derivation is lexicographic iff f < g implies (if, jf) ≤lex (ig, jg) (where ≤lex is the usual lexicographic order). Then the following result holds:

L(G) ≡ {p | S ⇒∗G p and ⇒∗G is a lexicographic derivation}.
Definition 13. A rule ρ of a grammar G is useful if there exists a derivation S ⇒∗G p ∈ Σ** which makes use of ρ at some step; otherwise ρ is called useless.

Definition 14. Consider a grammar G = (Σ, N, S, R). A variable size rule A → ω is called concave iff ω contains an element of the following set:

{ x A   A x   A A   A A
  A A , A A , x A , A x } ,

where A ∈ N, x ∈ N ∪ Σ, x ≠ A.

Theorem 15. A concave rule is useless.

Proof. By contradiction: if A → ω, a concave rule, is used in a derivation, then LOC_{u,eq} in Definition 9 compels the use of every tile in ω. But concave tiles generate pictures having a concave area filled with the same nonterminal, say A, and the geminal relation updated by the derivation step is such that this whole area is in the same equivalence class. But Definition 3 then makes it impossible to find, at subsequent steps, an A-subpicture which is maximal with respect to the geminal relation; hence the derivation fails to produce a terminal picture.

A useful grammar transformation consists of moving terminal symbols to fixed size rules.
Definition 16. A grammar G is in terminal normal form iff the only rules with terminals have the form A → x, x ∈ Σ, i.e. they are unitary rules.

Theorem 17. Every grammar G = (Σ, N, S, R) has an equivalent grammar G′ = (Σ, N′, S, R′) in terminal normal form.

Proof. To construct G′, we eliminate terminals from variable size rules and from nonunitary fixed size rules. N′ contains N, and for every terminal a we have in N′ two nonterminals ⟨a, 0⟩ and ⟨a, 1⟩. The idea is to replace every homogeneous a-subpicture with a chequered area of ⟨a, 0⟩ and ⟨a, 1⟩, in which every application area has size (1, 1).

Let Ch0^(m,n) (Ch1^(m,n), respectively) be a chequerboard of size (m, n) made of 0 and 1 symbols, starting with a 0 (1, resp.) at the top-leftmost position. Let η : (Σ ∪ N) × {0, 1} → N′ be the projection defined as η(a, k) = ⟨a, k⟩ if a ∈ Σ, and η(A, k) = A if A ∈ N. The mapping Chequer : P((Σ ∪ N)^(m,n)) → P((N′)^(m,n)) is defined by

Chequer(ω) = { η(t ⊗ t′) | t ∈ ω ∧ t′ ∈ {Ch0^|t|, Ch1^|t|} }.

Then, for every variable size rule X → ω in G, the following rules are in G′:

X → ω′, for every ω′ ⊆ Chequer(ω) such that Chequer⁻¹(ω′) = ω.

For every nonunitary fixed size rule X → t, the rule X → η(t ⊗ Ch0^|t|) is in G′. Moreover, the unitary fixed size rules ⟨a, 0⟩ → a, ⟨a, 1⟩ → a are in G′. G′ is by construction in terminal normal form.
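The chequerboards Ch0 and Ch1 used in this construction are simple to generate; a minimal sketch (the function name is ours, not the paper's):

```python
def chequerboard(m, n, start=0):
    """Ch_start of size (m, n): cell (i, j) holds start XOR ((i + j) mod 2),
    so the top-left cell is `start` and adjacent cells always differ."""
    return [[(start + i + j) % 2 for j in range(n)] for i in range(m)]

assert chequerboard(2, 3, 0) == [[0, 1, 0], [1, 0, 1]]   # Ch0
assert chequerboard(2, 3, 1) == [[1, 0, 1], [0, 1, 0]]   # Ch1
```

The key property exploited by the proof is visible here: no two horizontally or vertically adjacent cells carry the same symbol, so every application area in the chequered region has size (1, 1).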
By construction, the rules in G′ maintain the same structure and applicability as the rules of G, as far as nonterminals in N are concerned. The only difference resides in the derived terminal subpictures, which are replaced in G′ by chequered subpictures made of new nonterminals; these maintain information about the terminal symbol originally derivable in G in the same area. The chequered structure of these subpictures admits only unitary application areas. Therefore, starting from these subpictures, and using the unitary terminal rules introduced in R′, it is always possible to derive homogeneous terminal subpictures, identical to those derivable from G.

Example 18 (Terminal normal form of Example 11). It is possible to obtain the equivalent terminal normal form grammar by using the construction presented in Theorem 17. For ease of reading, we write the nonterminals ⟨a, k⟩, a ∈ Σ, k ∈ {0, 1}, as a_k. The resulting grammar (without useless rules) is the following:

S → ⌜0 ◦1 ◦0 ⌝1       ⌜1 ◦0 ◦1 ⌝0       ⌜0 ⌝1
    ◦1 S  S  ◦0       ◦0 S  S  ◦1   |
    ◦0 S  S  ◦1   |   ◦1 S  S  ◦0       ⌞1 ⌟0
    ⌞1 ◦0 ◦1 ⌟0       ⌞0 ◦1 ◦0 ⌟1
together with the unitary rules x0 → x and x1 → x for every terminal x ∈ Σ (in particular, ◦0 → ◦ and ◦1 → ◦).

3.2. Closure properties

For simplicity, in the following theorem we suppose that L(G1), L(G2) contain pictures of size at least (2, 2).

Theorem 19. The family L(TRG) is closed under union, column/row concatenation, column/row closure operations, rotation, and alphabetical mapping (or projection).

Proof. Consider two grammars G1 = (Σ, N1, A, R1) and G2 = (Σ, N2, B, R2). Suppose for simplicity that N1 ∩ N2 = ∅, S ∉ N1 ∪ N2, and that G1, G2 generate pictures having size at least (2, 2). Then it is easy to show that, in each case below, the grammar G = (Σ, N1 ∪ N2 ∪ {S}, S, R1 ∪ R2 ∪ R) has the stated language.

Union ∪:

R = { S → A A   ,   S → B B }
          A A           B B

is such that L(G) = L(G1) ∪ L(G2).

Concatenation ⦶/⊖:

R = { S → A A B B }
          A A B B

is such that L(G) = L(G1) ⦶ L(G2). The row concatenation case is analogous.
Closures ∗⦶/∗⊖: G = (Σ, N1 ∪ {S}, S, R1 ∪ R), where

R = { S → A A S S   |   A A }
          A A S S       A A

is such that L(G) = L(G1)^∗⦶. The row closure case is analogous.

Rotation R: Construct the grammar G = (Σ, N1, A, R′), where R′ is such that, if B → t ∈ R1 is a fixed size rule, then B → t^R is in R′; and if B → ω ∈ R1 is a variable size rule, then B → ω′ is in R′, with t ∈ ω implying t^R ∈ ω′. It is easy to verify that L(G) = L(G1)^R.

Projection π: Without loss of generality, we suppose G1 is in terminal normal form (Theorem 17). Consider a projection π : Σ1 → Σ2. It is immediate to build a grammar G′ = (Σ2, N1, A, R2) such that L(G′) = π(L(G1)): simply apply π to the unitary rules. That is, if X → x ∈ R1, then X → π(x) ∈ R2, while the other rules of G1 remain in R2 unchanged.

4. Comparison with other models

We first compare with CF string grammars, then with TS, and finally with Matz's 2D CF grammars.

4.1. String grammars

If in Definition 8 we choose h = 1, then a TRG defines a string language. Such 1D TRGs are easily proved to be equivalent to CF string grammars. 5 In fact, the TRG model for string languages is tantamount to a notational variant [6] of classical CF grammars, where the right parts of rules are local languages.

4.2. Tiling systems and 2D CF grammars

The next comparison has to face two technical difficulties: TS are defined by local languages with boundary symbols, which are not present in TRG, and the test of which tiles are present uses inclusion in TS, equality in TRG. First we prove that a class of local languages is strictly included in L(TRG).

Lemma 20. L(LOC_{u,eq}) ⊆ L(TRG).

Proof. Consider a local 2D language over Σ defined (without boundaries) by the set of sets of allowed tiles {ϑ1, ϑ2, …, ϑn}, ϑi ⊆ Σ^(2,2). An equivalent grammar is S → ϑ1 | ϑ2 | … | ϑn.

5 However, the empty string cannot be generated by a 1D TRG.
To simplify the comparison with TS, we reformulate them using the terms of Definition 5, showing their equivalence. Then we prove strict inclusion with respect to TRG. First we recall the original definition.

Definition 21 (Giammarresi and Restivo [3, Definition 7.2]). A tiling system (TS) is a 4-tuple T = (Σ, Γ, ϑ, π), where Σ and Γ are two finite alphabets, ϑ is a finite set of tiles over the alphabet Γ ∪ {#}, and π : Γ → Σ is a projection; the language defined by T is the projection by π of the local language defined by ϑ. (1)

Definition 22. The tiling systems TS_eq and TS_{u,eq} are defined as a TS, with the following respective changes:
• Replace the local language defined by (1) with LOC_eq({ϑ1, ϑ2, …, ϑn}), where each ϑi is a finite set of tiles over Γ.
• Replace the local language defined by (1) with LOC_{u,eq}({ϑ1, ϑ2, …, ϑn}), where each ϑi is a finite set of tiles over Γ. In TS_{u,eq} there is no boundary symbol #.

Lemma 23. L(TS_eq) ≡ L(TS).

Proof. First, L(TS) ⊆ L(TS_eq). This is easy because, if we consider the tile set ϑ of a TS, by taking {ϑ1, ϑ2, …, ϑn} = P(ϑ) (the powerset) we obtain an equivalent TS_eq. Second, we have to prove that L(TS_eq) ⊆ L(TS). In [3], the family of languages L(LOC_eq(Θ)), where Θ is a set of sets of tiles, is proved to be a proper subset of L(TS) (Theorem 7.8). But L(TS) is closed with respect to projection, and L(TS_eq) is the closure with respect to projection of L(LOC_eq(Θ)). Therefore, L(TS_eq) ⊆ L(TS).

Next we prove that boundary symbols can be removed.

Lemma 24. L(TS_{u,eq}) ≡ L(TS_eq).

Proof (Sketch). Part L(TS_eq) ⊆ L(TS_{u,eq}): Let T = (Σ, Γ, {ϑ1, ϑ2, …, ϑn}, π) be a TS_eq. For every tile set ϑi, separate the tiles containing the boundary symbol # (call this subset ϑ′i) from the other tiles (ϑ″i). That is, ϑi = ϑ′i ∪ ϑ″i. Introduce a new alphabet Γ′ and a bijective mapping br : Γ → Γ′. We use the symbols in Γ′ to encode the boundary, and new tile sets Δi to contain them: for every tile t in ϑ″i, if there is a tile in ϑ′i which overlaps with t, then encode this boundary in a new tile t′ and put it in the set Δi. For example, suppose

a b   ∈ ϑ″1
c d

overlaps with

# #   ∈ ϑ′1     and with     d #   ∈ ϑ′1;
a b                          # #

then both

br(a) br(b)        and        a     br(b)
c     d                       br(c) br(d)

are in Δ1. Consider a TS_{u,eq} T′ = (Σ, Γ ∪ Γ′, Θ, π′), where π′ extends π to Γ′ as follows: π′(br(a)) = π′(a) = π(a), a ∈ Γ; ubr : Γ ∪ Γ′ → Γ is defined as ubr(a) = br⁻¹(a) if a ∈ Γ′, and ubr(a) = a otherwise, and it is naturally extended to tiles and tile sets. Θ is the set

{ ϑ | ϑ ⊆ ϑ″i ∪ Δi ∧ ubr(ϑ) = ϑ″i ∧ ϑ ∩ Δi ≠ ∅ ∧ 1 ≤ i ≤ n }.

The proof that L(T) = L(T′) is straightforward and is omitted.

Part L(TS_{u,eq}) ⊆ L(TS_eq): Let T = (Σ, Γ, {ϑ1, ϑ2, …, ϑn}, π) be a TS_{u,eq}. To construct an equivalent TS_eq, we introduce the boundary tile sets Δi, defined as follows. For every tile

a b   ∈ ϑi,
c d

the following tiles are in Δi:

# #   # #   # #   # a   b #   # c   c d   d #
# a , a b , b # , # c , d # , # # , # # , # # .

Consider a TS_eq T′ = (Σ, Γ, Θ, π), where Θ is the set

{ ϑi ∪ ϑ | ϑ ⊆ Δi ∧ ϑ ≠ ∅ ∧ 1 ≤ i ≤ n }.

It is easy to show that L(T) = L(T′).
Example 7.2 of [3], the language of squares over the alphabet {a}, is defined by the following TS_{u,eq}:

ϑ1 = 1 0 0 0       ϑ2 = 1 0 0       ϑ3 = 1 0
     0 2 0 0            0 2 0            0 3 ,
     0 0 2 0            0 0 3
     0 0 0 3

π(0) = π(1) = π(2) = π(3) = a.

Theorem 25. L(TS) ⊆ L(TRG).

Proof. It follows from Theorems 19, 20, 23 and 24, and the fact that L(TS_{u,eq}) is the closure of L(LOC_{u,eq}) with respect to projection.
The following strict inclusion is an immediate consequence of the fact that, for 1D languages, L(T S) ⊂ L(CF ), and L(TRG) = L(CF ) \ { }. But we prefer to prove it by exhibiting an interesting picture language, made by the vertical concatenation of two specularly symmetrical rectangles. Theorem 26. L(T S) = L(TRG). Proof. Let = {a, b}. Consider the 2D language of palindromic columns, such as a b b a
b a a b
b b b b
L = {p | p = s Mirror(s) ∧ s ∈ (h,k) , h > 1, k 1}. Consider the grammar G: X S S X S X S→ | | , X S S X S X a b X X a b | . X→ | | X X a b a b It is easy to see that L(G) = L. We prove by contradiction that L ∈ / L(T S). Suppose that L ∈ L(T S). Therefore L is a projection of a local language L defined over some alphabet . Let a = || and b = ||, with a b. For an integer n, let Ln = {p | p = s Mirror(s) ∧ |s| = (n, n)}. Clearly, |Ln | = a n . Let L n be the set of pictures in L over whose projections are in Ln . By choice of b and by construction of Ln there are at most bn possibilities for the nth and (n + 1)th rows in the pictures of L n , because this is the number of mirrored stripe pictures of size (2, n) over . 2 For n sufficiently large a n bn . Therefore, for such n, there will be two different pictures p = sp Mirror(sp ), q = sq Mirror(sq ) such that the corresponding p = sp sp
, q = sq sq
have the same nth and (n+1)th rows. This implies that, by definition of local language, pictures v = sp sq
, w = sq sp
belong to L n , too. Therefore, pictures (v ) = sp Mirror(sq ), and (w ) = sq Mirror(sp ) belong to Ln . But this is a contradiction. 2
We conclude by comparing with a different generalization of CF grammars to two dimensions, Matz's CF picture grammars (CFPG) [5], a model syntactically very similar to string CF grammars. The main difference is that the right parts of their rules use the ⦶, ⊖ concatenation operators. Nonterminals denote unbounded rectangular pictures. Derivation is analogous to
string grammars, but the resulting regular expression may or may not define a picture (e.g. a ⦶ (b ⊖ b) does not generate any picture).

Theorem 27. L(CFPG) ⊆ L(TRG).

Proof (Sketch). Consider a Matz's CFPG grammar in Chomsky normal form. It may contain three types of rules: A → B ⦶ C; A → B ⊖ C; A → a. Moreover, suppose that B ≠ C (this is always possible if we permit copy rules like A → B). Then A → B ⦶ C corresponds to the following TRG rules:

A → B B C C       B C C       B B C       B C
    B B C C   |   B C C   |   B B C   |   B C
  | B B C C   |   B C C   |   B B C   |   B C .

To obtain A → B, just delete C from the previous rules. The ⊖ case is analogous to ⦶, while A → a is trivial.

Theorem 28. L(CFPG) ≠ L(TRG).

Proof. It is a consequence of Theorems 25, 26, and 27, and the fact from [5] that L(TS) ⊄ L(CFPG).

An example of a TRG but not CFPG language is the following. We know from [5] that the "cross" language, which consists of two perpendicular b-lines on a background of a's, is not in L(CFPG). It is easy to show that the following grammar defines the language:

S → A A b B B
    A A b B B
    b b b b b  ,
    C C b D D
    C C b D D

A → a a   ,   B → a a   ,   C → a a   ,   D → a a   .
    a a           a a           a a           a a

The fine control on line connections provided by TRG rules allows the definition of complex recursive patterns, exemplified by the spirals presented in the appendix.

5. Conclusions

The new TRG model extends context-free string grammars to two dimensions. Each rule rewrites a homogeneous rectangle as an isometric one, tiled with a specified tile set. In a derivation, the rectangles rewritten at each step are partially ordered by the subpicture relation, which can be represented in three dimensions by a forest of well-nested prisms, the analogue of syntax trees for strings. Spirals and nested boxes are typical examples handled by TRG.
The generative capacity of TRG is greater than that of two previous models: TS and Matz's context-free picture grammars. Practical applicability to picture processing tasks (such as pattern recognition and image compression) remains to be investigated; it will ultimately depend on the expressive power of the new model and on the availability of good parsing algorithms. The analogy with string grammars raises, for the educated formal linguist, a variety of questions, such as the formulation of a pumping lemma. For comparison with other models, several questions may be considered, e.g. whether the TRG and TS families coincide on a unary alphabet, or the generative capacity of nonrecursive TRG versus TS.
Acknowledgements Antonio Restivo called our attention to the problem of “2D Dyck languages”. We thank Alessandra Cherubini, Pierluigi San Pietro, Alessandra Savelli, and Daniele Scarpazza for their comments.
Appendix A

Grammar for defining discrete Archimedean spirals with step 3. 6 [The rule displays for S and for the auxiliary nonterminals A, B, C, D, H, K, Q, V, W, over the alphabet {•, ·}, are omitted.]

6 By Daniele Paolo Scarpazza.
An example picture:
References [1] H. Fernau, R. Freund, Bounded parallelism in array grammars used for character recognition, in: P. Perner, P. Wang, A. Rosenfeld (Eds.), Advances in Structural and Syntactical Pattern Recognition (Proc. of the SSPR’96), Vol. 1121, Springer, Berlin, 1996, pp. 40–49. [2] D. Giammarresi, A. Restivo, Recognizable picture languages, Internat. J. Pattern Recogn. Artif. Intell. 6 (2–3) (1992) 241–256 (Special Issue on Parallel Image Processing). [3] D. Giammarresi, A. Restivo, Two-dimensional languages, in: A. Salomaa, G. Rozenberg (Eds.), Handbook of Formal Languages, Vol. 3, Beyond Words, Springer, Berlin, 1997, pp. 215–267. [4] K. Inoue, A. Nakamura, Some properties of two-dimensional on-line tessellation acceptors, Inform. Sci. 13 (1977) 95–121. [5] O. Matz, Regular expressions and context-free grammars for picture languages, in: Proc. of the 14th Annu. Symp. on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science, Vol. 1200, Lübeck, Germany, 27 February–1 March 1997, Springer, Berlin, pp. 283–294. [6] S. Crespi Reghizzi, M. Pradella, Tile rewriting grammars, in: Proc. of the Seventh Internat. Conf. on Developments in Language Theory (DLT 2003), Lecture Notes in Computer Science, Vol. 2710, Szeged, Hungary, July 2003, Springer, Berlin, pp. 206–217. [7] R. Siromoney, Advances in Array Languages, in: H. Ehrig, M. Nagl, G. Rozenberg, A. Rosenfeld (Eds.), Proc. of Third Internat. Workshop on Graph-Grammars and Their Application to Computer Science, Lecture Notes in Computer Science, Vol. 291, Springer, Berlin, 1987, pp. 549–563.
Theoretical Computer Science 340 (2005) 273 – 279 www.elsevier.com/locate/tcs
Counting bordered and primitive words with a fixed weight Tero Harjua,∗ , Dirk Nowotkab a Department of Mathematics, Turku Centre for Computer Science (TUCS), University of Turku,
FIN-20014 Turku, Finland b Institute of Formal Methods in Computer Science, University of Stuttgart, D-70569 Stuttgart, Germany
Abstract A word w is primitive if it is not a proper power of another word, and w is unbordered if it has no prefix that is also a suffix of w. We study the number of primitive and unbordered words w with a fixed weight, that is, words for which the Parikh vector of w is a fixed vector. Moreover, we estimate the number of words that have a unique border. © 2005 Elsevier B.V. All rights reserved. Keywords: Combinatorics on words; Borders; Primitive words; Möbius function
1. Introduction

Let w denote a finite word over some alphabet A. We say that w is bordered if there is a non-empty proper prefix x of w that is also a suffix of w. If there is no such x, then w is called unbordered. We say that w is primitive if w = x^k, for some k ∈ N, implies that k = 1 and x = w. We often assume that the alphabet is ordered, A = {a1, a2, …, aq}. In this case, for a word w ∈ A∗, let Φ(w) denote the Parikh vector (|w|a1, |w|a2, …, |w|aq) of w, where |w|a denotes the number of occurrences of the letter a in w. We also say that w has weight Φ(w). The number of primitive words and of unbordered words of a fixed length over an alphabet of a fixed size is well-known; see for example [1–5,7] and the sequences A027375, A003000,

∗ Corresponding author. Fax: +358 2 3336595.
E-mail addresses: harju@utu.fi (T. Harju),
[email protected] (D. Nowotka). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.040
T. Harju, D. Nowotka / Theoretical Computer Science 340 (2005) 273 – 279
A019308, and A019309 in Sloane's database of integer sequences [6]. We will recall these results with short arguments and extend them to the case where the words we consider have a fixed weight. Moreover, we estimate the number of words that have exactly one border. Section 2 contains results on counting the number of primitive words. Section 3 investigates the number of bordered words. Finally, we deal with the number of words with exactly one border in Section 4.

In the rest of this section we fix our notation. For more general definitions see [2]. Let A be a finite, non-empty set called an alphabet. The elements of A are called letters. A finite sequence of letters is called a (finite) word. Let A∗ denote the monoid of all finite words over A, where ε denotes the empty word. Let |w| denote the length of w, and let |w|a denote the number of occurrences of a in w, where a ∈ A. If w = uv, then u is called a prefix of w, denoted by u ≤p w, and v is called a suffix of w, denoted by v ≤s w. A word w is called bordered if there exist non-empty words x, y, and z such that w = xy = zx, and x is called a border of w. Let X be a set; then |X| denotes the cardinality of X.

The Möbius function μ : N → Z is defined as follows:

μ(n) = (−1)^t   if n = p1 p2 … pt for distinct primes pi,
μ(n) = 1        if n = 1,
μ(n) = 0        if n is divisible by a square.

The Möbius inversion formula for two functions f and g is given by:

g(n) = Σ_{d|n} f(d)   if and only if   f(n) = Σ_{d|n} μ(d) g(n/d).
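Both the Möbius function and the inversion formula can be checked computationally; a small Python sketch (ours, not the paper's, using trial division):

```python
def mobius(n):
    """Möbius function: mu(1) = 1, mu(n) = (-1)^t if n is a product of t
    distinct primes, and mu(n) = 0 if a square divides n."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # square factor found
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

# Sanity check of the inversion formula on an arbitrary function f.
f = lambda n: n * n
g = lambda n: sum(f(d) for d in divisors(n))      # g(n) = sum_{d|n} f(d)
assert all(f(n) == sum(mobius(d) * g(n // d) for d in divisors(n))
           for n in range(1, 40))
```

The final assertion is exactly the inversion formula above, verified for n up to 39.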
2. Primitive words

Let Pq(n) denote the number of primitive words of length n over an alphabet of size q. It is well-known, see for example [3,2] and the sequence A027375 in [6], that

Pq(n) = Σ_{d|n} μ(d) q^(n/d).   (1)

Indeed, let A with |A| = q be a finite alphabet. Every word w has a unique primitive root v for which w = v^d for some d|n, where n = |w|. Since there are exactly q^n words of length n,

q^n = Σ_{d|n} Pq(d).

We are in the divisor poset, where the Möbius inversion gives (1). In this paper we investigate the number of primitive words with a fixed weight, that is, where each letter has a fixed number of occurrences. Consider an ordered alphabet A = {a1, a2, …, aq}
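Formula (1) is easy to cross-check against a brute-force enumeration; a sketch in Python (helper names are ours):

```python
from itertools import product

def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def count_primitive(q, n):
    """P_q(n) = sum over d | n of mu(d) * q^(n/d), formula (1)."""
    return sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0)

def is_primitive(w):
    """w is primitive iff it is not a proper power x^k, k > 1."""
    return not any(len(w) % d == 0 and w == w[:d] * (len(w) // d)
                   for d in range(1, len(w)))

assert count_primitive(2, 6) == 54
assert all(count_primitive(2, n) ==
           sum(1 for t in product("ab", repeat=n) if is_primitive("".join(t)))
           for n in range(1, 9))
```

For instance, of the 64 binary words of length 6, the ten non-primitive ones (powers of a, b, ab, ba, and of the six primitive length-3 words) are excluded, leaving 54.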
of q ≥ 1 letters. For a word w ∈ A∗, let Φ(w) denote (|w|a1, |w|a2, …, |w|aq), which is called the Parikh vector of w. For a given vector k = (k1, k2, …, kq), let

𝒫(k) = {w | w primitive and Φ(w) = k},

and let P(k) = |𝒫(k)|. Clearly, if w ∈ 𝒫(k), then |w| = Σ_{i=1}^q ki. Also, denote by gcd(k) the greatest common divisor of the components ki. If d | gcd(k), then denote k/d = (k1/d, k2/d, …, kq/d). The multinomial coefficients under consideration are

(n over k) = (n over k1, k2, …, kq) = n! / (k1! k2! ⋯ kq!),

where n = Σ_{i=1}^q ki.
Theorem 1. Let k = (k1, k2, …, kq) be a vector with n = Σ_{i=1}^q ki. Then

P(k) = Σ_{d | gcd(k)} μ(d) (n/d over k/d).
Proof. We use the principle of inclusion and exclusion to prove our claim. Let the distinct prime divisors of gcd(k) be p1, p2, …, pt. For an integer d | gcd(k), define

Qd = {w | w = u^d where Φ(u) = k/d}.

If w ∈ Qd, then Φ(w) = k. Clearly, |Qd| equals the number of all words u, primitive and imprimitive alike, of length n/d such that u has the Parikh vector k/d. Therefore,

|Qd| = (n/d over k/d).   (2)

Notice also that if d|e, then Qe ⊆ Qd, and hence

I(k) = ∪_{i=1}^t Q_{pi}   (3)

is the set of all imprimitive words of length n with Parikh vector k. By the principle of inclusion and exclusion, we then have

|∪_{i=1}^t Q_{pi}| = Σ_{∅ ≠ Y ⊆ [1,t]} (−1)^(|Y|−1) |∩_{i∈Y} Q_{pi}|,   (4)

where ∩_{i∈Y} Q_{pi} = Q_{p(Y)} for p(Y) = Π_{i∈Y} pi. Hence, by (2),

|I(k)| = Σ_{∅ ≠ Y ⊆ [1,t]} (−1)^(|Y|−1) |Q_{p(Y)}|
       = − Σ_{∅ ≠ Y ⊆ [1,t]} (−1)^|Y| (n/p(Y) over k/p(Y))
       = − Σ_{d | gcd(k), d > 1} μ(d) (n/d over k/d),

by the definition of the Möbius function μ. This proves the claim, because

P(k) = (n over k) − |I(k)|.
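Theorem 1 can likewise be checked by enumeration for small Parikh vectors; a sketch under the same conventions (function names are ours):

```python
from functools import reduce
from itertools import permutations
from math import factorial, gcd

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def multinomial(ks):
    """n! / (k1! k2! ... kq!) with n = sum of ks."""
    r = factorial(sum(ks))
    for k in ks:
        r //= factorial(k)
    return r

def P(ks):
    """Theorem 1: number of primitive words with Parikh vector ks."""
    g = reduce(gcd, ks)
    return sum(mobius(d) * multinomial(tuple(k // d for k in ks))
               for d in range(1, g + 1) if g % d == 0)

def is_primitive(w):
    return not any(len(w) % d == 0 and w == w[:d] * (len(w) // d)
                   for d in range(1, len(w)))

# Brute force: primitive binary words with two a's and two b's.
words = {"".join(t) for t in permutations("aabb")}
assert P((2, 2)) == sum(1 for w in words if is_primitive(w))   # 6 words, abab/baba excluded
```

Here P((2, 2)) = μ(1)·(4 over 2,2) + μ(2)·(2 over 1,1) = 6 − 2 = 4, matching the enumeration.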
3. Unbordered words

Let Uq(n) denote the number of all unbordered words of length n over an alphabet of size q. The following formula for Uq(n) is well-known; see for example [1,4,5,7] and also the sequences A003000, A019308, A019309 in [6]. Surely, we have Uq(1) = q and, for n ≥ 1,

Uq(2n + 1) = q Uq(2n),   (5)
Uq(2n) = q Uq(2n − 1) − Uq(n).   (6)

Indeed, case (5) is clear since a word of odd length is unbordered if and only if it is unbordered after its middle letter (at position n + 1) is deleted. For case (6), consider that a word w of even length is unbordered if and only if it is unbordered after one of its middle letters (say, at position n + 1) is deleted, except if w = auau and au is unbordered, where a is an arbitrary letter.

Note that there is an alternative way to obtain Uq(n) by considering the following immediate result.

Lemma 2. Let w be a bordered word, and let u be its shortest border. Then
(1) 2|u| ≤ |w|,
(2) u is unbordered, and
(3) u is the only unbordered border of w.

Let Bq(n) denote the number of all bordered words of length n over an alphabet of size q. Lemma 2 shows that it is enough, for every unbordered border u with |u| ≤ n/2, to count the number of words of length n − 2|u|, which is q^(n−2|u|). So we have

Bq(n) = Σ_{1 ≤ i ≤ n/2} Uq(i) q^(n−2i).   (7)

This gives the formulas (5) and (6) for Uq(n), where Uq(n) = q^n − Bq(n) for every q > 1 and where Uq(1) = q.
In this paper we investigate the number of unbordered words with a fixed weight. Let us fix a binary alphabet A = {a, b} for now. Let U(n, k) denote the number of all binary unbordered words of length n that have a fixed weight k, in the sense that, for every such word w, we have |w|b = k and |w|a = n − k. It is easy to check that U(1, 0) = U(1, 1) = 1, that U(n, k) = 0 if n ≤ k and k > 1, and that U(n, 0) = 0 if n > 1.

Theorem 3. If 0 < k < n then

U(n, k) = U(n − 1, k) + U(n − 1, k − 1) − E(n, k),   (8)

where

E(n, k) = U(n/2, k/2) if n and k are even;  0 otherwise.
Proof. Suppose first that w has odd length 2n + 1. Each word w = ucv, with c ∈ A and |u| = |v| = n, contributing to U(2n + 1, k) is obtained by adding a middle letter c to an unbordered word uv of even length. If c = a then uv contributes to U(2n, k), and if c = b then uv contributes to U(2n, k − 1).

Assume then that w has even length 2n. If w = cudv, with c, d ∈ A and |u| = |v| = n − 1, then it contributes to U(2n, k′) if and only if cuv is unbordered (so it contributed to either U(2n − 1, k′) or U(2n − 1, k′ − 1)) and cu ≠ dv (that is, borderedness is not obtained by adding a letter to cuv such that w is a square). Consider the case where cuv is unbordered but cudv is not, that is, cu = dv. Then w = cucu and cuu is unbordered. Note that cuu is unbordered if and only if cu is unbordered. Let |cu|b = k. We have that cuu contributes to U(2n − 1, 2k) (if c = a) or to U(2n − 1, 2k − 1) (if c = b) if and only if cu contributes to U(n, k), which is therefore subtracted in case |w|b = 2k.

Eq. (8) can be generalized to alphabets of arbitrary size q. For this, consider an ordered alphabet {a1, a2, …, aq} of size q, and let U(k) denote the number of all unbordered words w of length n = Σ_{i=1}^q ki that have a fixed weight Φ(w) = k = (k1, k2, …, kq). Moreover, let k[ki − 1] denote (k1, …, ki−1, ki − 1, ki+1, …, kq). If there exists 1 ≤ j ≤ q such that kj = 1 and ki = 0 for all i ≠ j, then only the letter aj contributes to U(k). Hence U(k) = 1 if Σ_{i=1}^q ki = 1 and ki ≥ 0 for all 1 ≤ i ≤ q.

Theorem 4. If Σ_{i=1}^q ki > 0 then

U(k) = Σ_{1 ≤ i ≤ q, ki > 0} U(k[ki − 1]) − E(k),

where

E(k) = U(k/2) if ki is even for all 1 ≤ i ≤ q;  0 otherwise.
T. Harju, D. Nowotka / Theoretical Computer Science 340 (2005) 273 – 279

Proof. Indeed, the arguments of adding a letter at position |w|/2 of a word w are similar to those of Theorem 3. For the explanation of E(k) we note that a bordered word (created by adding a middle letter) is a square ai u ai u, for some 1 ≤ i ≤ q. Note that the length of w and the number of occurrences of every letter is even in that case. Now, w is only counted if ai u is unbordered, that is, if ai u contributes to U(k/2), which must therefore be subtracted.
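The recursion of Theorem 3 is easy to check against a brute-force enumeration. The following Python sketch does so (the function names are ours, not the authors'):

```python
from itertools import product

def unbordered(w):
    # a word is bordered if some proper nonempty prefix is also a suffix
    return not any(w[:i] == w[-i:] for i in range(1, len(w)))

def U_brute(n, k):
    # count binary unbordered words of length n with exactly k occurrences of b
    return sum(1 for w in product("ab", repeat=n)
               if w.count("b") == k and unbordered(w))

def U(n, k):
    # the recursion of Theorem 3 together with its boundary values
    if n == 1:
        return 1 if k in (0, 1) else 0
    if k <= 0 or k >= n:
        return 0
    E = U(n // 2, k // 2) if n % 2 == 0 and k % 2 == 0 else 0
    return U(n - 1, k) + U(n - 1, k - 1) - E
```

The same kind of exhaustive check applies verbatim to the multivariate recursion of Theorem 4.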
4. Words with a unique border

In this section we count the number of words that have one and only one border. Let us start with an obvious result which belongs to folklore.

Lemma 5. Let w be a bordered word, and let u be its shortest border. If w has a border v with |v| > |u|, then |v| ≥ 2|u|.

Proof. Indeed, if, for the shortest border u, we had |v| < 2|u|, then u would overlap itself (since u is both a prefix and a suffix of v), and hence u would be bordered, contradicting Lemma 2(2).

In order to estimate the number of words with exactly one border, we make the following two observations.

Lemma 6. Let u be a fixed unbordered word of length s. Then the number of words of length r of the form xuyux is the number of bordered words of length r − 2s, that is, Bq(r − 2s).

Indeed, every word of the form xyx produces exactly one word of the form xuyux, and the condition xuyux = x′uy′ux′ would imply that u is bordered; a contradiction.

Lemma 7. Let u be a fixed unbordered word of length s. Then the number of words of length r of the form zuz is the number of words of length (r − s)/2.

Indeed, each word z produces exactly one word of the form zuz, and the condition zuz = z′uz′ implies that z = z′.

Let k ≤ n, and let Bq(n, k) denote the number of all words of length n over an alphabet of size q that have exactly one border, of length k. It is clear that Bq(1, k) = Bq(n, 0) = 0, for all 1 ≤ n and 0 ≤ k, and that Bq(n, k) = 0 if n < 2k; see Lemma 2(1).

Theorem 8. If 1 ≤ 2k ≤ n then

Bq(n, k) = Uq(k) (q^{n−2k} − Wq(n − 2k, k) − Eq(n − 2k, k)),

where

Wq(r, s) = Bq(r − 2s) if 2s < r,  Wq(r, s) = 1 if 2s = r,  and Wq(r, s) = 0 otherwise,

and

Eq(r, s) = q^{(r−s)/2} if s < r < 3s and r − s is even,  Eq(r, s) = 1 if s = r,  and Eq(r, s) = 0 otherwise.
Proof. Indeed, following the argument of Lemma 2(2), we count all unbordered words of length k (that is, Uq(k)), which are the possible borders of a word of length n. For every such border we have to count the number of different combinations of letters for the rest of the n − 2k letters, that is, q^{n−2k}. However, we have to exclude those cases where new borders are created. Given an unbordered border u of length k, we have the following cases for words with more than one border: uxuyuxu and uzuzu, where x, y, z ∈ A*. These two cases are taken care of by Wq(r, s) and Eq(r, s), where both terms equal 1 if u^4 and u^3 are counted; see also Lemmas 6 and 7. Note that the latter case is included in the former one if and only if |u| ≤ |z| (where the "only if" part comes from the fact that u is unbordered, and hence, it does not overlap itself); therefore r < 3s is required in Eq(r, s).

Clearly, the number Bq(n) of words of length n over an alphabet of size q with exactly one border is

Bq(n) = Σ_{1 ≤ i ≤ n/2} Bq(n, i).
Theoretical Computer Science 340 (2005) 280 – 290 www.elsevier.com/locate/tcs
Growth of repetition-free words—a review

Jean Berstel

Institut Gaspard-Monge (IGM), Université de Marne-la-Vallée, 5 Boulevard Descartes, F-77454 Marne-la-Vallée Cedex 2, France
Abstract

This survey reviews recent results on repetitions in words, with emphasis on the estimations for the number of repetition-free words.
© 2005 Published by Elsevier B.V.

Keywords: Repetitions in words; Square-free words; Overlap-free words; Combinatorics on words
1. Introduction

A repetition is any bordered word. Quite recently, several new contributions were made to the field of repetition-free words, and to counting repetition-free words. The aim of this survey is to give a brief account of some of the methods and results.

The terminology deserves some comments. Let α > 1 be a rational number. A nonempty word w is an α-power if there exist words x, x′ with x′ a prefix of x and an integer n, such that w = x^n x′ and α = n + |x′|/|x| = |w|/|x|. For example, the French word entente is a 7/3-power, and the English word outshout is an 8/5-power. If α = 2 or α = 3, we speak about a square and a cube, as for murmur or kokoko (the examples are taken from [41]). A word w is an overlap if it is an α-power for some α > 2. For instance, entente is an overlap.

Let α > 1 be a real number. A word w is said to avoid α-powers, or to be α-power-free, if it contains no factor that is a β-power for β ≥ α. A word w is α^+-power-free if it contains no factor that is a β-power for β > α. Thus, a word is overlap-free if and only if it is 2^+-power-free.
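The exponent |w|/|x| in this definition is governed by the smallest period of w. A small Python sketch (our own helper, not part of the survey) recovers the examples above:

```python
from fractions import Fraction

def exponent(w):
    # |w| divided by the least period of w
    n = len(w)
    p = next(p for p in range(1, n + 1)
             if all(w[i] == w[i + p] for i in range(n - p)))
    return Fraction(n, p)
```

Then exponent("entente") is 7/3, exponent("outshout") is 8/5, and exponent("murmur") is 2.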
E-mail address: [email protected].
0304-3975/$ - see front matter © 2005 Published by Elsevier B.V. doi:10.1016/j.tcs.2005.03.039
This review reports results on the growth of the number of α-power-free words of length n over a q-letter alphabet. In some cases, the growth is bounded by a polynomial in n; in other cases, it is shown to be exponential in n. We consider overlap-free words in the next section, square-free words in Section 3, and some generalizations in the final section. For basics and complements, the reader should consult the book of Allouche and Shallit [3].
2. Counting overlap-free words

We first review estimations for the number of overlap-free words over a binary alphabet. Let V be the set of binary overlap-free words and let v(n) be the number of overlap-free binary words of length n. This sequence starts with 2, 4, 6, 10, 14, 20 (Sloane's sequence A007777, see [42]). It is clear that V is factorial (factor-closed). It follows that, as for any factorial set, one has v(n + m) ≤ v(n)v(m). Thus the sequence (v(n)) is submultiplicative, or equivalently the sequence (log v(n)) is subadditive. This in turn implies, by a well-known argument, that the sequence v(n)^{1/n} has a limit, or equivalently, that the limit

h(V) = lim_{n→∞} (1/n) log v(n)
exists. The number h(V) is called the (topological) entropy of the set V. For a general discussion about the entropy of square-free words, see [4]. The entropy of the set of square-free words is strictly positive, as we will see later. On the contrary, the entropy of the set of overlap-free words is zero. This is a consequence of the following result of Restivo and Salemi [34,35].

Theorem 1. The number v(n) of binary overlap-free words of length n is bounded from above by a polynomial in n.

They proved that v(n) is bounded by n^4. The proof is based on the following structural property of overlap-free words, which we state in the more general setting of [22]. Recall first that the Thue–Morse morphism μ is defined by

μ: 0 → 01, 1 → 10.
Lemma 2. Let 2 < α ≤ 7/3, and let x be a word that avoids α-powers. There exist words u, y, v with u, v ∈ {e, 0, 1, 00, 11} and y avoiding α-powers such that x = u μ(y) v. This factorization is unique if |x| ≥ 7.

First, observe that the lemma does not hold for α > 7/3, since x = 0110110 is a 7/3-power and has no factorization of the required form. Next, consider as an example the word x = 011001100, which is a 9/4-power and contains no higher repetition. One gets x = μ(0101)·0, and y = 0101 itself avoids repetitions of exponent greater than 9/4.
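The factorization of Lemma 2 can be computed mechanically: strip each admissible pair of ends u, v and try to invert the Thue–Morse morphism on what remains. A sketch (names ours):

```python
MU = {"0": "01", "1": "10"}

def mu(w):
    return "".join(MU[c] for c in w)

def mu_inverse(w):
    # invert the Thue-Morse morphism, or return None if w is not in mu({0,1}*)
    if len(w) % 2:
        return None
    inv = {"01": "0", "10": "1"}
    y = ""
    for i in range(0, len(w), 2):
        if w[i:i + 2] not in inv:
            return None
        y += inv[w[i:i + 2]]
    return y

def factorizations(x):
    # all ways to write x = u mu(y) v with u, v in {e, 0, 1, 00, 11}
    ends = ["", "0", "1", "00", "11"]
    out = []
    for u in ends:
        for v in ends:
            if len(u) + len(v) <= len(x) and x.startswith(u) and x.endswith(v):
                core = x[len(u):len(x) - len(v)] if v else x[len(u):]
                y = mu_inverse(core)
                if y is not None:
                    out.append((u, y, v))
    return out
```

For x = 011001100 this returns the single factorization (u, y, v) = (e, 0101, 0), matching the example in the text.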
It follows from the lemma that an overlap-free word x has a factorization

x = u1 μ(u2) ⋯ μ^{h−1}(uh) μ^h(xh) μ^{h−1}(vh) ⋯ μ(v2) v1,

where each ui and vi has length at most 2, and xh has length at most 4. A simple computation shows that log |x| − 3 < h ≤ log |x|. Thus the value of h and each ui, vi and xh may take only a finite number of values, from which the total number of overlap-free words of length n is bounded by c·d^{log n} = c·n^{log d} for some constants c and d.

Another consequence of the lemma is that the Thue–Morse word t = μ^ω(0) is not only overlap-free but avoids 7/3-powers. A clever generalization, by Rampersad [32], of a proof of [39,40] shows that t (and its opposite t̄) is the only infinite binary word avoiding 7/3-powers that is a fixed point of a nontrivial morphism.

Restivo and Salemi's theorem says that v(n) ≤ C n^s for some real s. The upper bound log 15 for s given by Restivo and Salemi has been improved by Kfoury [24] to 1.7, by Kobayashi [25] to 1.5866, and by Lepistö in his Master's thesis [26] to 1.37; Kobayashi [25] also gives a lower bound. So

Theorem 3. There are constants C1 and C2 such that C1 n^r < v(n) < C2 n^s, where r = 1.155… and s = 1.37….

One might ask what the "real" limit is. In fact, a result by Cassaigne [12] shows that there is no limit. More precisely, he proves

Theorem 4. Set r = lim inf (log v(n))/(log n) and s = lim sup (log v(n))/(log n). Then r < 1.276 and 1.332 < s.
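The initial values of v(n) quoted at the beginning of this section are easy to reproduce by direct enumeration (helper names ours):

```python
from itertools import product

def has_overlap(w):
    # an overlap is a factor of the form cxcxc: length 2p+1 with period p
    n = len(w)
    return any(all(w[m] == w[m + p] for m in range(i, i + p + 1))
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p))

def v(n):
    return sum(1 for w in product("01", repeat=n) if not has_overlap(w))
```

This reproduces v(1), …, v(6) = 2, 4, 6, 10, 14, 20 (Sloane's A007777).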
It is quite remarkable that the sequence v(n) is 2-regular. This was shown by Carpi [9] (see [3] for the definition of regular sequences).

As we shall see in the next section, the number of square-free ternary words grows exponentially. In fact, Brandenburg [6] also proves that the number of binary cube-free words grows exponentially. The exact frontier between polynomial and exponential growth has been shown to be the exponent 7/3 by Karhumäki and Shallit [22].

Theorem 5. There are only polynomially many binary words of length n that avoid 7/3-powers, but there are exponentially many binary words that avoid 7/3^+-powers.

3. Counting square-free words

We report now estimations for the number of square-free words over a ternary alphabet. Let S be the set of ternary square-free words and let s(n) be the number of square-free ternary words of length n. Since S is factorial (factor-closed), the sequence (s(n)) is submultiplicative and the (topological) entropy h(S) exists. We will show that h(S) is not zero, and give bounds for h(S). The
sequence s(n) starts with 3, 6, 12, 18, 30, 42, 60 (Sloane's sequence A006156, see [42]). The sequence s(n) is tabulated for n ≤ 90 in [4] and for 91 ≤ n ≤ 110 in [21].

3.1. Getting upper bounds

There is a simple method to get upper bounds for the number of ternary square-free words, based on using better and better approximations by regular languages. Clearly, any square-free word over A = {0, 1, 2} contains no factor 00, 11 or 22, so S ⊂ A* \ A*{00, 11, 22}A*. Since the latter is a regular set, its generating function is a rational function. It is easily seen to be f(t) = (1 + t)/(1 − 2t). Indeed, once an initial letter is fixed in a word of this set, there are exactly two choices for the next letter (this recalls Pansiot's encoding [31], see also [28]). So s(n) ≤ 2^n + 2^{n−1} for n ≥ 1. Moreover, since a word of length at most 3 is square-free if and only if it is in A* \ A*{00, 11, 22}A*, the equality s(n) = 2^n + 2^{n−1} holds for n ≤ 3, and thus s(2) = 6 and s(3) = 12.

One can continue in this way: clearly none of the 6 squares of length 4, namely 0101, 0202, 1010, 1212, 2020, 2121, is a factor of a word in S, and it suffices to compute the generating function of the set A* \ A*XA*, where X = {00, 11, 22, 0101, 0202, 1010, 1212, 2020, 2121}, to get a better upper bound for s(n). Some of these generating functions are given explicitly in [36]. For words without squares of length 2 or 4, the series is (1 + 2t + 2t^2 + 3t^3)/(1 − t − t^2) (see [36]). Again, a direct argument gives the reason: a ternary word without squares of length 2 or 4 either ends with aba for letters a ≠ b, or with abc where the letters a, b, c are distinct. Denote by u_n (resp. by v_n) the number of words of the first (of the second) type, and by s^(2)(n) the total number. Then it is easily seen that, for n ≥ 4, u_n = v_{n−1} and v_n = s^(2)(n − 1), and consequently s^(2)(n) = s^(2)(n − 1) + s^(2)(n − 2). This shows of course that s(n) ≤ C·φ^n, for some constant C, with φ = (1 + √5)/2 the golden ratio.
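Both the initial values of s(n) and the agreement s(n) = 2^n + 2^{n−1} for n ≤ 3 can be confirmed by direct enumeration (helper names ours):

```python
from itertools import product

def square_free(w):
    n = len(w)
    return not any(w[i:i + l] == w[i + l:i + 2 * l]
                   for l in range(1, n // 2 + 1)
                   for i in range(n - 2 * l + 1))

def s(n):
    return sum(1 for w in product("012", repeat=n) if square_free(w))
```

This reproduces s(1), …, s(7) = 3, 6, 12, 18, 30, 42, 60 (Sloane's A006156).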
More generally, we consider any finite alphabet A, a finite set X, and the set K = A* \ A*XA*. We may assume that X contains no proper factor of one of its elements, so it is a code. Since the set K is a quite particular regular set, we will compute its generating function by using special techniques. There exist at least two (related) ways to compute these generating functions.

First, we consider the semaphore code C = A*X \ A*XA+. Semaphore codes (see e.g. [5]) were introduced by Schützenberger [38] under the name J codes. The computation below of course also recalls recurrent events in the sense of Feller [18]. The set C is the set of words that have a suffix in X but have no other factor in X. Thus the set K is also the set of proper prefixes of elements of C, and since C is a maximal prefix code, one has

C* K = A*.   (1)

Next, one has (see [5] or [27])

K x = Σ_{y∈X} Cy Ry,x   (x ∈ X),   (2)

where Cy = C ∩ A*y and Ry,x is the correlation set of y and x, given by

Ry,x = {z^{−1} x | z ∈ S(y) ∩ P(x)}.
Here, S(y) (resp. P(x)) is the set of proper suffixes of y (resp. proper prefixes of x). Of course,

C = Σ_{y∈X} Cy.   (3)
Eqs. (1)–(3) are Card X + 2 equations in Card X + 2 unknowns and allow one to compute the languages or their generating series. As an example, consider X = {00, 11, 22}. Denote by fZ the generating function of the set Z. Then Eqs. (1)–(3) translate into

(1 − 3t) fK = 1 − fC,
t^2 fK = (1 + t) fCaa   (a ∈ A),
fC = 3 fCaa,

since Raa,aa = {1, a} and Raa,bb = ∅ for a ≠ b. Thus 3t^2 fK = (1 + t) fC and (1 − 3t) fK = 1 − fC = 1 − (3t^2/(1 + t)) fK, whence

fK = 1/(1 − 3t + 3t^2/(1 + t)) = (1 + t)/(1 − 2t).
The second technique is called the "Goulden–Jackson clustering technique" in [29]. The idea is to mark occurrences of words in X in a word, and to weight a marked word by an indicator of the number of its marks: if a word w has r marks, then its weight is (−1)^r t^{|w|}. As an example, if X is the singleton X = {010}, the word w = 01001010 contains three occurrences of 010, and so exists in 2^3 = 8 marked versions, one for each subset of these occurrences. Let us write ŵ for a marked version of w, and p(ŵ) for its weight. The sum of the weights of the marked versions of a word w is 0 if w contains a factor in X, and is 1 otherwise. In other terms, the generating series of the set K = A* \ A*XA* is

fK = Σ_ŵ p(ŵ),

where the sum is over all marked versions of all words. Now, it appears that this series is rather easy to compute when one considers clusters: a cluster is a nonempty marked word in which every position is covered by a marked occurrence, and which is not the product of two other clusters. Thus, for X = {010}, the fully marked word 01001010 is not a cluster, since it is the product of the two clusters 010 and 01010. A marked word is a unique product of unmarked letters and of clusters. Thus, a marked word is either the empty word, or its last letter is not marked, or it ends with a cluster, so that

fK = 1 + fK(t)·kt + fK(t)·p(C),

where k is the size of the alphabet and p(C) is the generating series of the set C of clusters. It follows that

fK(t) = 1/(1 − kt − p(C)).   (4)

A cluster ends with a word in X. Let Cx = C ∩ A*x be the set of clusters ending in x. Then the generating series p(Cx) are the solutions of the system

p(Cx) = −t^{|x|} − Σ_{y∈X} (y : x) p(Cy)   (x ∈ X),   (5)
where y : x is the (strict) correlation polynomial of y and x, defined by

y : x = Σ_{z ∈ Ry,x \ {e}} t^{|z|}.

Eq. (5) is a system of linear equations, and the number of equations is the size of X. Solving this system gives the desired expression. Consider the example X = {010} over A = {0, 1}. Then the generating series of K = A* \ A*010A* is

fK(t) = 1/(1 − 2t − p(C010)),

and p(C010) = −t^3 − t^2 p(C010), whence p(C010) = −t^3/(1 + t^2) and

fK(t) = 1/(1 − 2t + t^3/(1 + t^2)) = (1 + t^2)/(1 − 2t + t^2 − t^3).
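Eqs. (4) and (5) translate directly into a computation on truncated power series. The sketch below (our own code; coefficient lists are truncated at order N) reproduces both examples: the semaphore-code result fK = (1 + t)/(1 − 2t) for X = {00, 11, 22}, and the series for X = {010}:

```python
N = 12  # truncation order for all power series

def mul(a, b):
    # truncated product of two coefficient lists
    c = [0] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < N:
                    c[i + j] += ai * bj
    return c

def correlation(y, x):
    # strict correlation polynomial y:x as a coefficient list:
    # a proper overlap y[-k:] == x[:k] contributes t^(|x|-k)
    c = [0] * N
    for k in range(1, min(len(x), len(y))):
        if y[len(y) - k:] == x[:k]:
            c[len(x) - k] += 1
    return c

def cluster_series(X):
    # solve Eq. (5) for the p(C_x) by fixed-point iteration
    p = {x: [0] * N for x in X}
    for _ in range(N):
        for x in X:
            new = [0] * N
            new[len(x)] = -1
            for y in X:
                prod = mul(correlation(y, x), p[y])
                new = [a - b for a, b in zip(new, prod)]
            p[x] = new
    return [sum(col) for col in zip(*p.values())]

def f_K(X, k):
    # coefficients of 1/(1 - k t - p(C)), Eq. (4)
    pc = cluster_series(X)
    f = [0] * N
    f[0] = 1
    for n in range(1, N):
        f[n] = k * f[n - 1] + sum(pc[j] * f[n - j] for j in range(1, n + 1))
    return f
```

The fixed-point iteration for Eq. (5) converges because every correlation polynomial has positive valuation.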
Both methods are just two equivalent formulations of the same computation, as pointed out to me by Dominique Perrin. When X = {x} is a singleton, Eq. (2) indeed becomes Kx = CR with R = Rx,x, and, in noncommuting variables, Eq. (1) is just K(1 − A) = 1 − C, so K(1 − A) = 1 − KxR^{−1}, whence

K(1 − A + xR^{−1}) = 1.   (6)

Now, the coefficients of the series −xR^{−1} are precisely the weights of the clusters of x. So Eq. (6), converted to a generating series, yields precisely Eq. (4)! In the general case one considers the (row) vectors X̃ = (x)_{x∈X} and C̃ = (Cx)_{x∈X} and the X × X matrix R = (Rx,y)_{x,y∈X}. Then Eq. (2) is K X̃ = C̃ R, and the same computation as above gives

K (1 − A + Σ_{x∈X} (X̃ R^{−1})_x) = 1.
The computation of the generating functions for sets K of the form above, or, more generally, of the series Σ_{w∈K} π(w) t^{|w|}, where π is a probability distribution on A*, is an important issue in concrete mathematics [20], in the theory of codes [5] and in computational biology (see e.g. Chapters 1, 6 and 7 in [27]). Extensions are in [30,33]. In their paper [29], Noonan and Zeilberger present a package that allows one to compute the generating functions and their asymptotic behaviour for the regular sets of words without squares yy of length |y| = ℓ, for ℓ up to 23. Richard and Grimm [36] go one step further, to ℓ = 24. The entropy of the set of square-free ternary words is now known to be at most 1.30194.
3.2. Getting lower bounds

In order to get an exponential lower bound on the number of ternary square-free words, there are two related methods, initiated by Brandenburg [6] and Brinkhuis [7]. The first method is used for instance in [22]; the second one, which now gives the sharper bounds, was recently used in [2]. Both rely on the notion of a finite square-free substitution from A* into B*, for some alphabet B. Let us recall that a substitution in formal language theory is a morphism f from some free monoid A* into the monoid of subsets of B*, that is, a function satisfying f(e) = {e} and f(xy) = f(x)f(y), where the product on the right-hand side is the product of the sets f(x) and f(y) in B*. The substitution is finite if f(a) is a finite set for each letter a ∈ A (and so for each word w ∈ A*); it is called square-free if each word in f(w) is square-free whenever w is a square-free word on A. For an overview of recent results about power-free morphisms in connection with open problems, see [37].

Brandenburg's method goes as follows. Let A = {0, 1, 2} and let B = {0, 1, 2, 0̄, 1̄, 2̄}. Let g : B* → A* be the morphism that erases bars. Define a substitution f by f(a) = g^{−1}(a). Clearly, f is finite and square-free. Also, each square-free word w of length n over A is mapped onto 2^n square-free words of length n over B. The second step consists in finding a square-free morphism h from B* into A*. Assume that h is uniform of length r. Then each square-free word w of length n over B is mapped onto a square-free word of length rn over A by the morphism h. It follows that there are 2^n square-free words of length rn for each square-free word of length n, that is, s(rn) ≥ 2^n s(n). Since s(n) is submultiplicative, one has s(rn) ≤ s(n)^r. Combining the two inequalities yields s(n)^{r−1} ≥ 2^n, i.e. s(n) ≥ 2^{n/(r−1)}, which proves that the growth is exponential.

It remains to give a square-free morphism h from B* into A*, where B = {0, 1, 2, 0̄, 1̄, 2̄}. It appears that

h: 0 → 0102012021012102010212
   1 → 0102012021201210120212
   2 → 0102012102010210120212
   0̄ → 0102012102120210120212
   1̄ → 0102012101202101210212
   2̄ → 0102012101202120121012

is a square-free morphism. Here r = 22, and consequently s(n) ≥ 2^{n/21}. The following is a slight variation of Brandenburg's result:

Theorem 6. The number s(n) of square-free ternary words of length n satisfies the inequality s(n) ≥ 6 · 1.032^n.
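Square-freeness of such a morphism must of course be proved, but it can at least be spot-checked mechanically. In the sketch below (our own code; the barred letters of B are written 0', 1', 2'), we verify that each image has length 22, that every single image is square-free, and that images of any two distinct letters concatenate to a square-free word:

```python
def square_free(w):
    n = len(w)
    return not any(w[i:i + l] == w[i + l:i + 2 * l]
                   for l in range(1, n // 2 + 1)
                   for i in range(n - 2 * l + 1))

h = {
    "0":  "0102012021012102010212",
    "1":  "0102012021201210120212",
    "2":  "0102012102010210120212",
    "0'": "0102012102120210120212",
    "1'": "0102012101202101210212",
    "2'": "0102012101202120121012",
}

lengths_ok = all(len(img) == 22 for img in h.values())
images_ok = all(square_free(img) for img in h.values())
# images of the two-letter square-free words xy (any pair of distinct letters)
pairs_ok = all(square_free(h[a] + h[b]) for a in h for b in h if a != b)
```

This is only a sanity check, not Brandenburg's proof.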
A more direct method was initiated by Brinkhuis [7]. He considers a 25-uniform substitution f from A* into itself, defined by

f: 0 → {U0, V0}, 1 → {U1, V1}, 2 → {U2, V2},

where U0 = x1x̃, V0 = y0ỹ, and x = 012021020102, y = 012021201021. The words U1, …, V2 are obtained by applying the circular permutation (0, 1, 2). He proves that f is square-free, and thus every square-free word w of length n is mapped onto 2^n square-free words of length 25n. His bound is only 2^{n/24}.

The substitution f can be viewed as the composition of an inverse morphism and a morphism, when U0, …, V2 are considered as letters and then each of these letters is mapped to the corresponding word. However, the second mapping is certainly not square-free, since the image of U0 V0 contains the square 00. Thus, the construction of Brinkhuis is stronger. Indeed, Ekhad and Zeilberger [17] found an 18-uniform square-free substitution of the same form as Brinkhuis' and thus reduced the bound from 2^{n/24} to 2^{n/17}. A relaxed version of Brinkhuis' construction is used by Grimm [21] to derive the better bound 65^{n/40}, and by Sun [43] to improve this bound to 110^{n/42}.

4. Other bounds

We review briefly other bounds on the number of repetition-free words. Concerning cube-free binary words, already Brandenburg [6] gave the following bounds.

Theorem 7. The number c(n) of binary cube-free words of length n satisfies 2·1.080^n < 2·2^{n/9} ≤ c(n) ≤ 2·1251^{(n−1)/17} < 1.315·1.522^n.

The upper bound was improved by Edlin [16] to B·1.4576^n, for some constant B, by using the "cluster" method. Next, we consider Abelian repetitions. An Abelian square is a nonempty word uu′, where u and u′ are commutatively equivalent, that is, u′ is a permutation of u. For instance, 012102 is an Abelian square. It is easy to see that there is no infinite Abelian square-free word over three letters. The existence of an infinite word over four letters without Abelian squares was demonstrated by Keränen [23].
The question of the existence of exponentially many quaternary words without Abelian squares was also settled positively by Carpi [10]. He uses an argument similar to Brinkhuis' but much more involved (Abelian square-free morphisms from alphabets with more than four letters into alphabets with four letters seem not to exist [8]). He shows

Theorem 8. The number d(n) of quaternary words avoiding Abelian squares satisfies d(n) ≥ C·2^{19n/(85^3 − 85)} for some constant C.

This result should be compared to the following, concerning ternary words without Abelian cubes [2].
Theorem 9. The number r(n) of ternary words avoiding Abelian cubes grows faster than 2^{n/24}.

The number of ternary words avoiding Abelian cubes is 1, 3, 9, 24, 66, 180, …; it is the sequence A096168 in [42]. The authors consider the 6-uniform substitution

h: 0 → 001002, 1 → 110112, 2 → {002212, 122002}.

This substitution does not preserve Abelian cube-freeness, since h(0101) = 0010·02110·11200·10021·10112 contains an Abelian cube (the three middle blocks are commutatively equivalent). However, the set {h^n(0) : n ≥ 0} is shown to avoid Abelian cubes.

There is an interesting intermediate situation between the commutative and the noncommutative case, namely the case where, for the definition of squares, only some of the letters are allowed to commute. To be precise, consider a set θ of commutation relations of the form ab = ba for letters a, b, and define the relation u ∼ v mod θ as the transitive closure of the relation uabv ∼ ubav for all words u, v and all relations ab = ba in θ. A θ-square is a word uu′ such that u ∼ u′ mod θ. If θ is empty, a θ-square is just a square, and if θ is the set of all ab = ba for a ≠ b, a θ-square is an Abelian square. Since there is an infinite quaternary word that avoids Abelian squares, the same holds for θ-squares. For 3 letters, the situation is on the edge, since there exist infinite square-free words, but no infinite Abelian square-free word. The result proved by Cori and Formisano [13] is:

Theorem 10. If the set θ of commutation relations contains at most one relation, then the set of ternary words avoiding θ-squares is infinite; otherwise it is finite.

It has been proved by the same authors [14] that the number of these words grows only polynomially with the length. This result is different from [11], where square-free words in partially commutative monoids are investigated.

Another variation concerns circular words. A circular word avoids α-powers if all its conjugates avoid α-powers.
For instance, 001101 is a circular 2^+-power-free word, because each word in the set {001101, 011010, 110100, 101001, 010011, 100110} is 2^+-power-free. On the contrary, the word 0101101 is cube-free, but its conjugate 1010101 is not cube-free, and not even 3^+-power-free; so, viewed as a circular word, 0101101 is not 3^+-power-free. It is proved in [1] that there exist infinitely many 5/2^+-power-free binary circular words, whereas every circular word of length 5 either contains a cube or a 5/2-power. This improves a previous result [15] showing that there are infinitely many cube-free circular binary words; see also [19]. No information is available about the growth of the number of these words.
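These circular examples can be verified with a small exponent computation (helper names ours):

```python
from fractions import Fraction

def max_exponent(w):
    # largest exponent |u|/p over all factors u of w (p = least period of u)
    best = Fraction(0)
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = w[i:j]
            p = next(p for p in range(1, len(u) + 1)
                     if all(u[m] == u[m + p] for m in range(len(u) - p)))
            best = max(best, Fraction(len(u), p))
    return best

def rotations(w):
    return {w[i:] + w[:i] for i in range(len(w))}
```

A word is 2^+-power-free exactly when its maximal factor exponent is at most 2, so the circular claims above reduce to checks over all rotations.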
Acknowledgements

Many thanks to the anonymous referee who contributed additional references and corrected several misprints.
References

[1] A. Aberkane, J. Currie, There exist binary circular 5/2+ power free words of every length, Electron. J. Combin. 11 (2004) R10.
[2] A. Aberkane, J. Currie, N. Rampersad, The number of ternary words avoiding Abelian cubes grows exponentially, in: Workshop on Word Avoidability, Complexity and Morphisms, LaRIA Techn. Report 2004-07, 2004, pp. 21–24.
[3] J.-P. Allouche, J. Shallit, Automatic Sequences: Theory, Applications, Generalizations, Cambridge University Press, Cambridge, 2003.
[4] M. Baake, V. Elser, U. Grimm, The entropy of square-free words, Math. Comput. Modelling 26 (1997) 13–26.
[5] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[6] F.-J. Brandenburg, Uniformly growing k-th power-free homomorphisms, Theoret. Comput. Sci. 23 (1983) 69–82.
[7] J. Brinkhuis, Nonrepetitive sequences on three symbols, Quart. J. Math. Oxford 34 (1983) 145–149.
[8] A. Carpi, On Abelian power-free morphisms, Internat. J. Algebra Comput. 3 (1993) 151–167.
[9] A. Carpi, Overlap-free words and finite automata, Theoret. Comput. Sci. 115 (2) (1993) 243–260.
[10] A. Carpi, On the number of Abelian square-free words on four letters, Discrete Appl. Math. 81 (1998) 155–167.
[11] A. Carpi, A. De Luca, Square-free words on partially commutative monoids, Inform. Proc. Lett. 22 (1986) 125–131.
[12] J. Cassaigne, Counting overlap-free words, in: P. Enjalbert, A. Finkel, K. Wagner (Eds.), STACS '93, Lecture Notes in Computer Science, Vol. 665, Springer, Berlin, 1993, pp. 216–225.
[13] R. Cori, M. Formisano, Partially Abelian squarefree words, RAIRO Inform. Théoret. Appl. 24 (6) (1990) 509–520.
[14] R. Cori, M. Formisano, On the number of partially Abelian squarefree words on a three-letter alphabet, Theoret. Comput. Sci. 81 (1) (1991) 147–153.
[15] J. Currie, D. Fitzpatrick, Circular words avoiding patterns, in: M. Ito, M. Toyama (Eds.), Developments in Language Theory, DLT 2002, Lecture Notes in Computer Science, Springer, Berlin, 2004, pp. 319–325.
[16] A. Edlin, The number of binary cube-free words of length up to 47 and their numerical analysis, J. Differential Equations Appl. 5 (1999) 153–154.
[17] S.B. Ekhad, D. Zeilberger, There are more than 2^{n/17} n-letter ternary square-free words, J. Integer Seq. (1998) (Article 98.1.9).
[18] W. Feller, An Introduction to Probability Theory and its Applications, Wiley, New York, 1966.
[19] D. Fitzpatrick, There are binary cube-free circular words of length n contained within the Thue–Morse word for all positive integers n, Electron. J. Combin. 11 (2004) R14.
[20] R. Graham, D. Knuth, O. Patashnik, Concrete Mathematics, Addison-Wesley, Reading, MA, 1989.
[21] U. Grimm, Improved bounds on the number of ternary square-free words, J. Integer Seq. (2001) (Article 01.2.7).
[22] J. Karhumäki, J. Shallit, Polynomial versus exponential growth in repetition-free binary words, J. Combin. Theory Ser. A 105 (2004) 335–347.
[23] V. Keränen, Abelian squares are avoidable on 4 letters, in: ICALP '92, Lecture Notes in Computer Science, Vol. 623, Springer, Berlin, 1992, pp. 41–52.
[24] R. Kfoury, A linear time algorithm to decide whether a binary word contains an overlap, RAIRO Inform. Théoret. Appl. 22 (1988) 135–145.
[25] Y. Kobayashi, Enumeration of irreducible binary words, Discrete Appl. Math. 20 (1988) 221–232.
[26] A. Lepistö, A characterization of 2+-free words over a binary alphabet, Master's Thesis, University of Turku, Finland, 1995.
[27] M. Lothaire, Applied Combinatorics on Words, Cambridge University Press, Cambridge, 2005.
[28] J. Moulin-Ollagnier, Proof of Dejean's conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters, Theoret. Comput. Sci. 95 (1992) 187–205.
[29] J. Noonan, D. Zeilberger, The Goulden–Jackson cluster method: extensions, applications and implementations, J. Differential Equations Appl. 5 (1999) 355–377.
[30] P. Nicodème, B. Salvy, P. Flajolet, Motif statistics, Theoret. Comput. Sci. 287 (2002) 593–617.
[31] J.-J. Pansiot, A propos d'une conjecture de F. Dejean sur les répétitions dans les mots, Discrete Appl. Math. 7 (1984) 297–311.
[32] N. Rampersad, Words avoiding 7/3-powers and the Thue–Morse morphism, 2003, Preprint available at http://www.arxiv.org/abs/math.CO/0307401.
[33] M. Régnier, A unified approach to word occurrence probabilities, Discrete Appl. Math. 104 (2000) 259–280.
[34] A. Restivo, S. Salemi, On weakly square free words, Bull. EATCS 21 (1983) 49–56.
[35] A. Restivo, S. Salemi, Overlap-free words on two symbols, in: M. Nivat, D. Perrin (Eds.), Automata on Infinite Words, Lecture Notes in Computer Science, Vol. 192, Springer, Berlin, 1985, pp. 198–206.
[36] C. Richard, U. Grimm, On the entropy and letter frequencies of ternary square-free words, arXiv:math.CO/0302302, July 2004.
[37] G. Richomme, P. Séébold, Conjectures and results on morphisms generating k-power-free words, Internat. J. Found. Comput. Sci. 15 (2) (2004) 307–316.
[38] M.-P. Schützenberger, On the synchronizing properties of certain prefix codes, Inform. Control 7 (1964) 23–36.
[39] P. Séébold, Morphismes itérés, mot de Morse et mot de Fibonacci, C. R. Acad. Sci. Paris 295 (1982) 439–441.
[40] P. Séébold, Sequences generated by infinitely iterated morphisms, Discrete Appl. Math. 11 (1985) 255–264.
[41] J. Shallit, Avoidability in words: recent results and open problems, in: Workshop on Word Avoidability, Complexity and Morphisms, LaRIA Techn. Report 2004-07, 2004, pp. 1–4.
[42] N.J.A. Sloane, The on-line encyclopedia of integer sequences, http://www.research.att.com/∼njas/sequences/.
[43] X. Sun, New lower bound on the number of ternary square-free words, J. Integer Seq. (2003) (Article 03.2.2).
Theoretical Computer Science 340 (2005) 291 – 321 www.elsevier.com/locate/tcs
Algebraic recognizability of regular tree languages

Zoltán Ésik a,b,∗,1, Pascal Weil c,2

a Department of Computer Science, University of Szeged, Hungary
b Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
c LaBRI, CNRS, Université Bordeaux-1, France
Abstract

We propose a new algebraic framework to discuss and classify recognizable tree languages, and to characterize interesting classes of such languages. Our algebraic tool, called preclones, encompasses the classical notion of syntactic Σ-algebra or minimal tree automaton, but adds new expressivity to it. The main result in this paper is a variety theorem à la Eilenberg, but we also discuss important examples of logically defined classes of recognizable tree languages, whose characterization and decidability were established in recent papers (by Benedikt and Ségoufin, and by Bojańczyk and Walukiewicz) and can be naturally formulated in terms of pseudovarieties of preclones. Finally, this paper constitutes the foundation for another paper by the same authors, where first-order definable tree languages receive an algebraic characterization. © 2005 Elsevier B.V. All rights reserved.

Keywords: Recognizability; Regular tree languages; Variety theorem; Pseudovariety; Preclones
∗ Corresponding author.
E-mail addresses: [email protected] (Z. Ésik), [email protected] (P. Weil).
1 Partial support from the National Foundation of Hungary for Scientific Research, Grant T46686, is gratefully acknowledged.
2 Partial support from the ACI Sécurité Informatique (projet VERSYDIS) of the French Ministère de la Recherche is gratefully acknowledged. Part of this work was done while P. Weil was an invited professor at the University of Nebraska in Lincoln.
doi:10.1016/j.tcs.2005.03.038

1. Introduction

The notion of recognizability emerged in the 1960s (Eilenberg, Mezei, Wright, and others, cf. [17,30]) and has been the subject of considerable attention since, notably because
of its close connections with automata-theoretic formalisms and with logical definability, cf. [6,15,18,38] for some early papers. Recognizability was first considered for sets (languages) of finite words, cf. [16] and the references therein. The general idea is to use the algebraic structure of the domain, say, the monoid structure on the set of all finite words, to describe some of its subsets, and to use algebraic considerations to discuss the combinatorial or logical properties of these subsets. More precisely, a set of words is said to be recognizable if it is a union of classes of a (locally) finite congruence. The same concept was adapted to the case of finite trees, traces, finite graphs, etc., cf. [17,30,14,9], where it always entertains close connections with logical definability [11,12]. It follows rather directly from this definition of (algebraic) recognizability that a finite, or finitary, algebraic structure can be canonically associated with each recognizable subset L, called its syntactic structure. Moreover, the algebraic properties of the syntactic structure of L reflect its combinatorial and logical properties. The archetypal example is that of star-free languages of finite words: they are exactly the languages whose syntactic monoid is aperiodic, cf. [34]. They are also exactly the languages that can be defined by a first-order sentence over the predicate < (FO[<]), cf. [29], and the languages that can be defined by a temporal logic formula, cf. [27,22,7]. In particular, every algorithm we know for deciding the FO[<]-definability of a regular language L works by checking, more or less explicitly, whether the syntactic monoid of L is aperiodic. Let Σ be a ranked alphabet. In this paper, we are interested in sets of finite Σ-labeled trees, or tree languages.
It has been known since the 1960s [17,30,15] that the tree languages that are definable in monadic second-order logic are exactly the so-called regular tree languages, that is, those accepted by bottom-up tree automata. Moreover, deterministic tree automata suffice to accept these languages, and each regular tree language admits a unique minimal deterministic automaton. From the algebraic point of view, the set of all Σ-labeled trees can be viewed in a natural way as a (free) Σ-algebra, where Σ is now seen as a signature. Moreover, a deterministic bottom-up tree automaton can be identified with a finite Σ-algebra with some distinguished (final) elements. Thus regular tree languages are also the recognizable subsets of the free Σ-algebra. The situation however is not entirely satisfying, because we know very little about the structure of finite Σ-algebras, and very few classes of tree languages have been characterized in algebraic terms; see [26,32,33] for attempts to use Σ-algebra-theoretic considerations (and some variants) for the purpose of classifying tree languages. In particular, the important problem of deciding whether a regular tree language is FO[<]-definable remained open [33]. Based on the word language case, it is tempting to guess that an answer to this problem ought to be found using algebraic methods. In this paper, we introduce a new algebraic framework to handle tree languages. More precisely, we consider algebras called preclones (they lack some of the operations and axioms of clones [13]). Precise definitions are given in Section 2.1. Let us simply say here that, in contrast with the more classical monoids or Σ-algebras, preclones have infinitely many sorts, one for each integer n ≥ 0. As a result, there is no nontrivial finite preclone. The corresponding notion is that of finitary preclones, which have a finite number of elements of each sort. An important class of preclones is given by the transformations T(Q) of a set Q.
The elements of sort (or rank) n are the mappings from Q^n into Q, and the (preclone)
composition operation is the usual composition of mappings. Note that T(Q) is finitary if Q is finite. It turns out that the finite Σ-labeled trees can be identified with the 0-sort of the free preclone generated by Σ. The naturally defined syntactic preclone of a tree language L is finitary if and only if L is regular. In fact, if S is the syntactic Σ-algebra of L, the syntactic preclone is the sub-preclone of T(S) generated by the elements of Σ (if σ ∈ Σ is an operation of rank r, it defines a mapping from S^r into S, and hence an element of sort r in T(S)). Note that this provides an effectively constructible description of the syntactic preclone of L. It is important to note that the class of recognizable tree languages in the preclone-theoretic sense is exactly the same as the usual one: we are simply adding more algebraic structure to the finitary minimal object associated with a regular tree language, and thus we give ourselves a more expressive language to capture families of tree languages. In order to justify the introduction of such an algebraic framework, we must show not only that it offers a well-structured framework that accounts for the basic notions concerning tree languages, but also that it allows the characterization of interesting classes of tree languages. The first objective is captured in the definition of varieties of tree languages, and their connection with pseudovarieties of finitary preclones, by means of an Eilenberg-type theorem. This is not unexpected, but it requires combinatorially much more complex proofs than in the classical word case, the details of which can be found below in Section 5.1. As for the second objective, we offer several elements. First, the readers will find in this paper a few simple but hopefully illuminating examples, which illustrate similarities and differences with the classical examples from the theory of word languages.
Second, we discuss a couple of important recent results on the characterization of certain classes of tree languages: one concerns the tree languages that are definable in the first-order logic of successors (FO(Succ)), and is due to Benedikt and Ségoufin [3]; the second one concerns the tree languages defined in the logics EF and EX, and is due to Bojańczyk and Walukiewicz [5]. Neither of these remarkable results can be expressed directly in terms of syntactic Σ-algebras; neither mentions preclones (of course), but both use mappings of arity greater than 1 on Σ-algebras, that is, they can be naturally expressed in terms of preclones, as we explain in Sections 5.2.2 and 5.2.3. It is also very interesting to note that the conditions that characterize these various classes of tree languages include the semigroup-theoretic characterization of their word language analogues, but cannot be reduced to them. Another such result, which was our original motivation to introduce the formalism of preclones, is a nice algebraic characterization of FO[<]-definable tree languages (and of a number of extensions of FO[<], such as the introduction of additional, modular quantifiers), briefly discussed in Section 5.2.4. Let us say immediately that we do not know yet whether this characterization can be turned into a decision algorithm! In order to keep this paper within a reasonable number of pages, this characterization will be the subject of another paper by the same authors [21]. The main results of this upcoming paper can be found, along with an outline of the present paper, in [20]. To summarize the plan of the paper, Section 2 introduces the algebraic framework of preclones, discussing in particular the all-important cases of free preclones, in which tree languages live (Section 2.2), and of preclones associated with tree automata (Section 2.3).
Section 2.4 discusses in some detail the notion of finite determination for a preclone, a finiteness condition different from being finitary, which is crucial in the sequel. Section 2.5 is
included for completeness (and can be skipped at first reading): its aim is to make explicit the connection between our preclones and other known algebraic structures, namely magmoids and strict monoidal categories. Recognizable tree languages are the subject of Section 3. Here tree languages are meant to be any subset of some ΣM_k, and the preclone structure on ΣM naturally induces a notion of recognizability, as well as a notion of syntactic preclone (Section 3.1). As pointed out earlier, the usual recognizable tree languages, that is, subsets of ΣM_0, fall nicely in this framework, and there is a tight connection between the minimal automaton of such a language and its syntactic preclone (Section 3.2). Specific examples are given in Section 3.3. Pseudovarieties of finitary preclones are discussed in detail in Section 4. As it turns out, this notion is not a direct translation of the classical notion for semigroups or monoids, due to the infinite number of sorts. The technical treatment of these classes is rather complex, and we deal with it thoroughly, since it is the foundation of our construction. We show in particular that pseudovarieties are characterized by their finitely determined elements (Section 4.1), and we describe the pseudovarieties generated by a given set of finitary preclones, showing in particular that membership in a 1-generated pseudovariety is decidable (Section 4.2). Finally, we introduce varieties of tree languages and we establish the variety theorem in Section 5.1. Section 5.2 presents the examples described above, based on the results by Benedikt and Ségoufin [3] and by Bojańczyk and Walukiewicz [5].

2. The algebraic framework

In this section, we introduce the notion of preclones, a multi-sorted kind of algebra which is our central tool in this paper. In the sequel, if n is an integer, [n] denotes the set of integers {1, . . . , n}. In particular, [0] denotes the empty set.

2.1.
Preclones and preclone-generators pairs

Let Q be a set and let T_n(Q) denote the set of n-ary transformations of Q, that is, mappings from Q^n to Q. Let then T(Q) be the sequence of sets of transformations T(Q) = (T_n(Q))_{n≥0}, which will be called the preclone of transformations of Q. The set T_1(Q) of transformations of Q is a monoid under the composition of functions. Composition can be considered on T(Q) in general: if f ∈ T_n(Q) and g_i ∈ T_{m_i}(Q) (1 ≤ i ≤ n), then the composite h = f(g_1, . . . , g_n), defined in the natural way, is an element of T_m(Q) where m = ∑_{i∈[n]} m_i:

h(q_{1,1}, . . . , q_{n,m_n}) = f(g_1(q_{1,1}, . . . , q_{1,m_1}), . . . , g_n(q_{n,1}, . . . , q_{n,m_n}))

for all q_{i,j} ∈ Q, 1 ≤ i ≤ n, 1 ≤ j ≤ m_i. This composition operation and its associativity properties are exactly what is captured in the notion of a preclone. In general, a preclone is a many-sorted algebra S = ((S_n)_{n≥0}, •, 1). The elements of the sets S_n, where n ranges over the nonnegative integers, are said to have rank n. The composition operation • associates with each f ∈ S_n and g_1 ∈ S_{m_1}, . . . , g_n ∈ S_{m_n} an element •(f, g_1, . . . , g_n) ∈ S_m, of rank m = ∑_{i∈[n]} m_i. We usually write f · (g_1 ⊕ · · · ⊕ g_n)
for •(f, g_1, . . . , g_n). Finally, the constant 1 is in S_1. Moreover, we require the following three equational axioms:

(f · (g_1 ⊕ · · · ⊕ g_n)) · (h_1 ⊕ · · · ⊕ h_m) = f · ((g_1 · h̄_1) ⊕ · · · ⊕ (g_n · h̄_n)),   (1)

where f, g_1, . . . , g_n are as above, h_j ∈ S_{k_j} (j ∈ [m]), and, if we denote ∑_{j∈[i]} m_j by m_[i], then h̄_i = h_{m_[i−1]+1} ⊕ · · · ⊕ h_{m_[i]} for each i ∈ [n];

1 · f = f,   (2)

f · (1 ⊕ · · · ⊕ 1) = f,   (3)
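The rank bookkeeping behind this composition can be made concrete with a small Python sketch (a hypothetical encoding, not from the paper: n-ary transformations of Q are plain functions, and the caller supplies the ranks):

```python
from itertools import accumulate

def compose(f, n, gs, ms):
    """Preclone composition in T(Q): f is n-ary, gs[i] has rank ms[i].
    Returns the composite h of rank m1 + ... + mn, with
    h(q_{1,1},...,q_{n,m_n}) = f(g1(q_{1,1},...), ..., gn(...))."""
    assert len(gs) == n == len(ms)
    offsets = [0] + list(accumulate(ms))  # argument slice boundaries
    def h(*qs):
        assert len(qs) == offsets[-1]
        return f(*(g(*qs[offsets[i]:offsets[i + 1]]) for i, g in enumerate(gs)))
    return h

# Example over Q = {0, 1}: f = binary OR, g1 = unary identity, g2 = nullary constant 1
f = lambda a, b: a | b
g1 = lambda a: a
g2 = lambda: 1
h = compose(f, 2, [g1, g2], [1, 0])  # composite of rank 1 + 0 = 1
assert h(0) == 1 and h(1) == 1       # h(q) = q OR 1
```

Note how each of the m arguments of the composite is routed to exactly one g_i, as required by the preclone (as opposed to clone) composition.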
where f ∈ S_n and 1 appears n times on the left-hand side of the last equation. Note that Axiom (1) generalizes associativity, and Axioms (2) and (3) can be said to state that 1 is a neutral element.

Remark 2.1. The elements of rank 1 of a preclone form a monoid.

It is immediately verified that T(Q), the preclone of transformations of a set Q, is indeed a preclone for the natural composition of functions, with the identity function id_Q as 1. Preclones are an abstraction of sets of n-ary transformations of a set, which generalizes the abstraction from transformation monoids to monoids.

Remark 2.2. Clones [13], or equivalently Lawvere theories [4,19], are another, more classical abstraction. Readers interested in the comparison between clones and preclones will have no difficulty tracing their differences in the sequel. We will simply point out here the fact that, in contrast with the definition of the clone of transformations of Q, each of the m arguments of the composite f(g_1, . . . , g_n) above is used in exactly one of the g_i's, the first m_1 in g_1, the next m_2 in g_2, etc.

We observe that a preclone with at least one element of rank greater than 1 must have elements of arbitrarily high rank, and hence cannot be finite. We say that a preclone S is finitary if and only if each S_n is finite. For instance, the preclone of transformations T(Q) is finitary if and only if the set Q is finite. The notions of morphism between preclones, sub-preclone, congruence and quotient are defined as usual [25,39]. Note that, as is customary for multi-sorted algebras, a morphism maps elements of rank n to elements of the same rank, and a congruence only relates elements of the same rank. To facilitate discussions, we introduce the following shorthand notation. An n-tuple (g_1, . . . , g_n) of elements of S will often be written as a formal ⊕-sum: g_1 ⊕ · · · ⊕ g_n. Moreover, if g_i ∈ S_{m_i} (1 ≤ i ≤ n), we say that g_1 ⊕ · · · ⊕ g_n has total rank m = ∑_{i∈[n]} m_i.
Finally, we denote by S_{n,m} the set of all n-tuples of total rank m. With this notation, S_{1,n} = S_n. The n-tuple 1 ⊕ · · · ⊕ 1 ∈ S_{n,n} is denoted by n. If G is a subset of S, we also denote by G_{n,m} the set of n-tuples of elements of G of total rank m. Observe that a preclone morphism φ: S → T naturally extends to a map φ: S_{n,m} → T_{n,m} for each n, m ≥ 0, by mapping g_1 ⊕ · · · ⊕ g_n to φ(g_1) ⊕ · · · ⊕ φ(g_n). For technical reasons, it will often be preferable to work with pairs (S, A) consisting of a preclone S and a (possibly empty) set A of generators of S. We call such pairs
preclone-generators pairs, or pg-pairs. The notions of morphisms and congruences must be revised accordingly: in particular, a morphism of pg-pairs from (S, A) to (T, B) must map A into B. A pg-pair (S, A) is said to be finitary if S is finitary and A is finite. Besides preclones of transformations of the form T(Q), fundamental examples of preclones and pg-pairs are the free preclones and the preclones associated with a tree automaton. These are discussed in the next sections.

2.2. Trees and free preclones

Let Σ be a ranked alphabet, say, Σ = (Σ_n)_{n≥0}, and let (v_k)_{k≥1} be a sequence of variable names. We let ΣM_n be the set of finite trees whose inner nodes are labeled by elements of Σ (according to their rank), whose leaves are labeled by elements of Σ_0 ∪ {v_1, . . . , v_n}, and whose frontier (the left-to-right sequence of variables appearing in the tree) is the word v_1 · · · v_n: that is, each variable occurs exactly once, and in the natural order. Note that ΣM_0 is the set of finite Σ-labeled trees. We let ΣM = (ΣM_n)_{n≥0}.
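Assuming a nested-tuple encoding of these trees-with-variables (hypothetical, chosen here for illustration only), the substitution-based composition described next can be sketched as:

```python
# A tree is ("var", i) for the variable v_i, or (sigma, [children]) for a letter.
def frontier(t):
    """Left-to-right sequence of variable indices occurring in t."""
    if t[0] == "var":
        return [t[1]]
    return [i for c in t[1] for i in frontier(c)]

def shift(t, d):
    """Renumber every variable v_i in t to v_{i+d}."""
    if t[0] == "var":
        return ("var", t[1] + d)
    return (t[0], [shift(c, d) for c in t[1]])

def compose(f, gs):
    """Substitute gs[i-1] for v_i in f, renumbering the variables of
    g_1, ..., g_n consecutively so the result again has frontier v_1...v_m."""
    ranks = [len(frontier(g)) for g in gs]
    offs = [sum(ranks[:i]) for i in range(len(gs))]
    def subst(t):
        if t[0] == "var":
            i = t[1]
            return shift(gs[i - 1], offs[i - 1])
        return (t[0], [subst(c) for c in t[1]])
    return subst(f)

one = ("var", 1)                         # the tree 1: a single vertex labeled v1
f = ("sigma", [("var", 1), ("var", 2)])  # a rank-2 letter viewed as a tree
g = ("sigma", [("var", 1), ("var", 2)])
t = compose(f, [g, one])                 # composite of rank 2 + 1 = 3
assert frontier(t) == [1, 2, 3]
```

The assertion checks the defining property of ΣM_n: the composite's frontier is again v_1 · · · v_m in the natural order.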
[Figure: a tree f ∈ ΣM_n with frontier v_1, v_2, . . . , v_n, and the composite tree f · (g_1 ⊕ · · · ⊕ g_n), in which the trees g_1, g_2, . . . , g_n have been substituted for the variables.]
If f ∈ ΣM_n and g_1, . . . , g_n ∈ ΣM, the composite tree f · (g_1 ⊕ · · · ⊕ g_n) is obtained by substituting the root of the tree g_i for the variable v_i in f for each i, and renumbering consecutively the variables in the frontiers of g_1, . . . , g_n. Let also 1 ∈ ΣM_1 be the tree with a single vertex, labeled v_1. Then (ΣM, ·, 1) is a preclone. Each letter σ ∈ Σ of rank n can be identified with the tree with root labeled σ, where the root's children are leaves labeled v_1, . . . , v_n. It is easily verified that every rank-preserving map from Σ to a preclone S can be extended in a unique fashion to a preclone morphism from ΣM into S. That is:

Proposition 2.3. ΣM is the free preclone generated by Σ, and (ΣM, Σ) is the free pg-pair generated by Σ.

Remark 2.4. If Σ_n = ∅ for each n ≠ 1, then ΣM_n = ∅ for all n ≠ 1, and ΣM_1 can be identified with the set of all finite words on the alphabet Σ_1. If at least one Σ_n with n > 1 is nonempty, then infinitely many ΣM_n are nonempty, and if in addition Σ_0 ≠ ∅, then each ΣM_n is nonempty.

2.3. Examples of preclones

We already discussed preclones of transformations and free preclones. The next important class of examples is that of preclones (and pg-pairs) associated with Σ-algebras and tree
automata. We also discuss a few simple examples of preclones that will be useful in the sequel.

2.3.1. Preclone associated with an automaton

Let Σ be a ranked alphabet as above and let Q be a Σ-algebra: that is, Q is a set and each element σ ∈ Σ_n defines an n-ary transformation of Q, i.e., a mapping σ_Q: Q^n → Q. Recall that Q, equipped with a set F ⊆ Q of final states, can also be described as a (deterministic, bottom-up) tree automaton accepting trees in ΣM_0, cf. [15,38,23,24,8]. More precisely, the mapping σ → σ_Q induces a morphism of Σ-algebras from ΣM_0, viewed here as the initial Σ-algebra (i.e., the algebra of Σ-terms), to Q, say, val: ΣM_0 → Q, and the tree language accepted by Q is the set val^{−1}(F) of trees which evaluate to an element of F. Now, since the elements of Σ_n can be viewed also as elements of T_n(Q), the mapping σ → σ_Q also extends to a preclone morphism φ: ΣM → T(Q), whose restriction to the rank 0 elements is exactly the morphism val. The range of φ is called the preclone associated with Q, and the pg-pair associated with Q, written pg(Q), is the pair (φ(ΣM), φ(Σ)). We observe in particular that a morphism of Σ-algebras ψ: Q → Q′ induces a morphism of pg-pairs pg(Q) → pg(Q′) in a functorial way. Conversely, if Q is a set and φ: ΣM → T(Q) is a preclone morphism such that φ(ΣM_0) = Q, letting σ_Q = φ(σ) endows the set Q with a structure of Σ-algebra, for which the associated preclone is the range of φ. In the sequel, when discussing decidability issues concerning preclones, we will say that a preclone is effectively given if it is given as the preclone associated with a finite Σ-algebra Q, that is, by a finite set of generators in T(Q). By definition, such a preclone is finitary.

2.3.2. Simple examples of preclones

The following examples of preclones and pg-pairs will be discussed throughout the rest of this paper.

Example 2.5.
Let B be the 2-element set B = {true, false}, and let T∃ be the subset of T(B) whose rank n elements are the n-ary or function and the n-ary constant true, written or_n and true_n, respectively. One verifies easily that T∃ is a preclone, which is generated by the binary or_2 function and the nullary constants true_0 and false_0. That is, if Σ consists of these three generators, T∃ is the preclone associated with the Σ-automaton whose state set is B. Moreover, the rank 1 elements of T∃ form a 2-element monoid, isomorphic to the multiplicative monoid {0, 1}, and known as U_1 in the literature on monoid theory, e.g. [31].

Example 2.6. Let p be an integer, p ≥ 2, and let B_p = {0, 1, . . . , p − 1}. We let T_p be the subset of T(B_p) whose rank n elements (n ≥ 0) are the mappings f_{n,r}: (r_1, . . . , r_n) → r_1 + · · · + r_n + r mod p, for 0 ≤ r < p. It is not difficult to verify that T_p is a preclone, and that it is generated by the nullary constant 0, the unary increment function f_{1,1} and the binary sum f_{2,0}. As in Example 2.5, T_p can be seen as the preclone associated with a p-state automaton. Moreover, its rank 1 elements form a monoid isomorphic to the cyclic group of order p.
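A quick computational check of Example 2.6, as a Python sketch (the encoding of the maps f_{n,r} is assumed here, not taken from the paper): composing f_{n,r} with f_{m_1,r_1}, . . . , f_{m_n,r_n} yields f_{m, r+r_1+···+r_n mod p}, so T_p is closed under composition.

```python
p = 5  # any p >= 2 works

def f(n, r):
    """The rank-n element f_{n,r} of T_p, as a mapping from B_p^n to B_p."""
    return lambda *args: (sum(args) + r) % p

def compose_ranks(r, rs):
    """Closure under composition: the composite of f_{n,r} with
    f_{m1,r1}, ..., f_{mn,rn} is f_{m, (r + r1 + ... + rn) mod p}."""
    return (r + sum(rs)) % p

# Check on concrete arguments: f_{2,1} composed with (f_{1,2}, f_{2,0})
lhs = f(2, 1)(f(1, 2)(4), f(2, 0)(3, 3))       # plug the g_i results into f
rhs = f(3, compose_ranks(1, [2, 0]))(4, 3, 3)  # the predicted composite f_{3,3}
assert lhs == rhs
```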
Example 2.7. Let again B = {true, false}, and let T_path be the subset of T(B) whose rank 0 elements are the nullary constants true_0 and false_0, and whose rank n elements (n > 0) are the n-ary constants true_n and false_n, and the n-ary partial disjunctions or_P (if P ⊆ [n], or_P is the disjunction of the i-th arguments, i ∈ P). One verifies easily that T_path is a preclone, which is generated by the binary or_2 function, the nullary constants true_0 and false_0, and the unary constant false_1. The rank 1 elements of T_path form a 3-element monoid, isomorphic to the multiplicative monoid {1, a, b} with xy = y for x, y ≠ 1, known as U_2 in the literature on monoid theory, e.g. [31].

2.4. Representation of preclones

Section 2.3.1 shows the importance of the representation of preclones as preclones of transformations. It is not difficult to establish the following analogue of Cayley's theorem.

Proposition 2.8. Every preclone can be embedded in a preclone of transformations.

Proof. Suppose that S is a preclone and let Q be the disjoint union of the sets S_n, n ≥ 0. For each f ∈ S_n, let f̂ be the function Q^n → Q given by f̂(g_1, . . . , g_n) = f · (g_1 ⊕ · · · ⊕ g_n). The assignment f → f̂ defines an injective morphism S → T(Q). □
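The Cayley-style representation in the proof of Proposition 2.8 can be illustrated on the preclone T_p of Example 2.6, encoding each element f_{n,r} as the pair (n, r) (an assumed encoding, for illustration): the element acts on tuples of preclone elements by composition.

```python
# Elements of T_p (Example 2.6) encoded as pairs (n, r); composition follows
# f_{n,r} · (f_{m1,r1} ⊕ ... ⊕ f_{mn,rn}) = f_{m, (r + r1 + ... + rn) mod p}.
p = 3

def dot(f, gs):
    n, r = f
    assert len(gs) == n
    m = sum(g[0] for g in gs)
    return (m, (r + sum(g[1] for g in gs)) % p)

def hat(f):
    """Cayley representation: f acts on n-tuples of preclone elements,
    hat(f)(g1, ..., gn) = f · (g1 ⊕ ... ⊕ gn)."""
    return lambda *gs: dot(f, gs)

f = (2, 1)  # f_{2,1}
assert hat(f)((1, 2), (0, 0)) == dot(f, [(1, 2), (0, 0)]) == (1, 0)
```

Since hat(f) determines f (apply it, e.g., to a tuple of identities), the assignment f → hat(f) is injective, as in the proof above.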
This result however is not very satisfactory: it does not tell us whether a finitary preclone can be embedded in the preclone of transformations of a finite set. It is actually not always the case, and this leads to the following discussion. Let k ≥ 0. We say that a preclone S is k-determined if distinct elements can be separated by k-ary equations. Formally, let ∼_k denote the following equivalence relation: for all f, g ∈ S_n (n ≥ 0),

f ∼_k g ⟺ f · h = g · h for all h ∈ S_{n,ℓ} with ℓ ≤ k.

Note that for each ℓ ≤ k, ∼_k is the identity relation on S_ℓ. We call S k-determined if the relation ∼_k is the identity relation on each S_n, n ≥ 0, and we say that S is finitely determined if it is k-determined for some integer k. We also say that a pg-pair (S, A) is k-determined (resp. finitely determined) if S is.

Example 2.9. The preclone of transformations of a set is 0-determined.

We observe the two following easy lemmas:

Lemma 2.10. For each k, ∼_k is a congruence relation.

Proof. Let f, g ∈ S_n be ∼_k-equivalent. For each i ∈ [n], let f_i, g_i ∈ S_{m_i} be such that f_i ∼_k g_i. We want to show that f · (f_1 ⊕ · · · ⊕ f_n) ∼_k g · (g_1 ⊕ · · · ⊕ g_n). Let m = ∑_{i∈[n]} m_i and let h ∈ S_{m,ℓ} for some ℓ ≤ k. Then h is an m-tuple, and we let h̄_1 be the tuple of the first m_1 terms of h, h̄_2 consist of the next m_2 elements, etc., until h̄_n,
which consists of the last m_n elements of h. Note that each h̄_i lies in some S_{m_i,ℓ_i} and that ∑_{i∈[n]} ℓ_i = ℓ. In particular, ℓ_i ≤ k for each i, and we have

f · (f_1 ⊕ · · · ⊕ f_n) · h = f · (f_1 · h̄_1 ⊕ · · · ⊕ f_n · h̄_n)
  = f · (g_1 · h̄_1 ⊕ · · · ⊕ g_n · h̄_n)
  = g · (g_1 · h̄_1 ⊕ · · · ⊕ g_n · h̄_n)
  = g · (g_1 ⊕ · · · ⊕ g_n) · h. □
Lemma 2.11. For each k ≥ 0, the quotient preclone S/∼_k is k-determined.

Proof. Let T = S/∼_k and let [f], [g] ∈ T_n, where [f] denotes the ∼_k-equivalence class of f (necessarily in S_n). Let ℓ ≤ k and assume that [f] · [h] = [g] · [h] for each h ∈ S_{n,ℓ}. Then f · h ∼_k g · h for each such h. But f · h and g · h lie in S_ℓ, and we already noted that ∼_k is the identity relation on S_ℓ (since ℓ ≤ k). Thus f · h = g · h for all h ∈ S_{n,ℓ}, and since this holds for each ℓ ≤ k, we have f ∼_k g, and hence [f] = [g]. □

We say that a preclone morphism φ: S → T is k-injective if it is injective on each S_ℓ with ℓ ≤ k. The next lemma, relating k-determination and k-injectivity, will be used to discuss embeddability of a finitary preclone in the preclone of transformations of a finite set.

Lemma 2.12. Let S be a k-determined preclone. If φ: S → T is a k-injective morphism, then φ is injective.

Proof. If φ(f) = φ(g) for some f, g ∈ S_n, then φ(f · h) = φ(f) · φ(h) = φ(g) · φ(h) = φ(g · h) for all h ∈ S_{n,ℓ} with ℓ ≤ k. Since φ is k-injective, it follows that f · h = g · h for all h ∈ S_{n,ℓ} with ℓ ≤ k, and since S is k-determined, this implies f = g. □

Proposition 2.13. Let S be a finitary and finitely determined preclone. Then there is a finite set Q such that S embeds in T(Q). If in addition S is 0-determined, the set Q can be taken equal to S_0.

Proof. We modify the construction in the proof of Proposition 2.8. Let k ≥ 0 be such that S is k-determined, and let Q = {⊥} ∪ ⋃_{i≤k} S_i, where the sets S_i are assumed to be pairwise disjoint and ⊥ is a new symbol, not in any of those sets. For each f ∈ S_n (n ≥ 0), let φ(f) = f̂: Q^n → Q be the function defined by

f̂(q_1, . . . , q_n) = f · (q_1 ⊕ · · · ⊕ q_n) if q_1 ∈ S_{m_1}, . . . , q_n ∈ S_{m_n} with ∑_{i∈[n]} m_i ≤ k, and f̂(q_1, . . . , q_n) = ⊥ otherwise.

It is easy to check that φ is a morphism. By Lemma 2.12, φ is injective. To conclude, we observe that if k = 0, we can choose Q = S_0, since ∑_{i=1}^{n} m_i ≤ 0 is possible if and only if each m_i = 0. □

For later use we also note the following technical results:
Proposition 2.14. Let S and T be preclones, with T k-determined. Let G be a (ranked) generating set of S and let φ: G → T be a rank-preserving map whose range includes all of T_ℓ, for each ℓ ≤ k. Then φ can be extended to a preclone morphism φ̄: S → T iff for all g ∈ G_n, n ≥ 0, and for all h ∈ G_{n,ℓ} with ℓ ≤ k,

φ(g · h) = φ(g) · φ(h).   (4)
Proof. Condition (4) is obviously necessary, and we show that it is sufficient. Let f ∈ S_n, n ≥ 0. Any possible image of f by a preclone morphism is an element t ∈ T_n such that, if h ∈ T_{n,ℓ} for some ℓ ≤ k and if h′ ∈ G_{n,ℓ} is such that φ(h′) = h, then t · h = φ(f · h′). Since T is k-determined and each T_ℓ (ℓ ≤ k) is in the range of φ, the element t is completely determined by f. That is, if an extension φ̄ of φ exists, then it is unique. We now show the existence of this extension. We want to assign an image to an arbitrary element f of S, and we proceed by induction on the height of an expression of f in terms of the elements of G; such an expression exists since G generates S. If f ∈ G, we let φ̄(f) = φ(f). Note also that if f = 1, then we let φ̄(f) = 1. If f ∉ G, then f = g · h for some g ∈ G ∩ S_n and some h = h_1 ⊕ · · · ⊕ h_n. By induction, the elements φ̄(h_i) are well defined for each i ∈ [n]. We then let φ̄(f) = φ(g) · (φ̄(h_1) ⊕ · · · ⊕ φ̄(h_n)). To show that φ̄(f) is well defined, we consider a different decomposition of f, say, f = g′ · h′ with h′ = h′_1 ⊕ · · · ⊕ h′_m. If f has rank ℓ ≤ k, then h ∈ S_{n,ℓ} and h′ ∈ S_{m,ℓ}, so h = h̄ and h′ = h̄′ for some h̄ ∈ G_{n,ℓ} and h̄′ ∈ G_{m,ℓ}. By Condition (4), we have

φ(g) · φ(h̄) = φ(g · h̄) = φ(f)

and by symmetry, φ(g′) · φ(h̄′) = φ(g) · φ(h̄). So φ̄ is well defined on all the elements of S of rank at most k. Now if f has rank ℓ > k, let x ∈ T_{ℓ,p} with p ≤ k. Then there exists x′ ∈ G_{ℓ,p} such that x = φ(x′). Note that h · x′ and h′ · x′ are well defined, in S_{n,p} and in S_{m,p}, respectively. In particular, φ̄(h · x′) is well defined, and equal to φ̄(h) · x. Similarly, φ̄(h′ · x′) is well defined, and equal to φ̄(h′) · x. It follows that

(φ(g) · φ̄(h)) · x = φ(g) · (φ̄(h) · x) = φ(g) · φ̄(h · x′) = φ̄(g · (h · x′)) = φ̄(f · x′).

By symmetry, (φ(g′) · φ̄(h′)) · x = (φ(g) · φ̄(h)) · x, and since T is k-determined, we have φ(g) · φ̄(h) = φ(g′) · φ̄(h′). Thus, φ̄ is well defined on S. By essentially the same argument, one verifies that φ̄ preserves composition. □

Corollary 2.15.
Let S and T be k-determined preclones that are generated by their elements of rank at most k. If there exist bijections from S_ℓ to T_ℓ for each ℓ ≤ k that preserve all compositions of the form f · g, where f ∈ S_n and g ∈ S_{n,ℓ} with n, ℓ ≤ k, then S and T are isomorphic.

Proof. Using the fact that S is generated by its elements of rank at most k and T is k-determined, Proposition 2.14 shows that the given bijections extend to a morphism from S to T. This morphism is onto since T as well is generated by its elements of rank at most k. It
is also k-injective by construction, and hence it is injective by Lemma 2.12, since S is k-determined. □

2.5. Preclones, magmoids and strict monoidal categories

The point of this short subsection is to verify the close connection between the category of preclones and the category of magmoids, cf. [2], which are in turn a special case of strict monoidal categories, cf. [28]. We recall that a magmoid is a category M whose objects are the nonnegative integers, equipped with an associative bifunctor ⊕ such that 0 ⊕ x = x = x ⊕ 0. A morphism of magmoids is a functor that preserves objects and ⊕. We say that a magmoid M is determined by its scalar morphisms if each morphism f: n → m can be written in a unique way as a ⊕-sum f_1 ⊕ · · · ⊕ f_n, where each f_i is a morphism with source 1. Moreover, there is a morphism from 0 to n if and only if n = 0 (in which case this morphism is unique).

Proposition 2.16. The category of preclones is equivalent to the full subcategory of magmoids spanned by those magmoids which are determined by their scalar morphisms.

Proof. With each preclone S, we associate a category whose objects are the nonnegative integers and whose morphisms n → m are the elements of S_{n,m}, that is, the n-tuples of elements of S of total rank m. Composition is defined in the following way: let f = f_1 ⊕ · · · ⊕ f_n ∈ S_{n,m} and g = g_1 ⊕ · · · ⊕ g_m ∈ S_{m,p}, and suppose that f_i ∈ S_{m_i}, i ∈ [n] (so that m = ∑_{i∈[n]} m_i). For each i, let ḡ_i = g_{m_1+···+m_{i−1}+1} ⊕ · · · ⊕ g_{m_1+···+m_i}. Then we let f · g = f_1 · ḡ_1 ⊕ · · · ⊕ f_n · ḡ_n. The identity morphism at object n is the n-tuple 1 ⊕ · · · ⊕ 1. Note that when n = 0, this is the unique morphism 0 → 0, and there are no morphisms from 0 to n if n ≠ 0. One may then regard ⊕ as a bifunctor S × S → S that maps a pair (f, g), with f = f_1 ⊕ · · · ⊕ f_n ∈ S_{n,p} and g = g_1 ⊕ · · · ⊕ g_m ∈ S_{m,q}, to the morphism f_1 ⊕ · · · ⊕ f_n ⊕ g_1 ⊕ · · · ⊕ g_m from n + m to p + q. Then S, equipped with the bifunctor ⊕, is a magmoid.
Moreover, S, as a magmoid, is determined by its scalar morphisms. It is clear that each preclone morphism determines a functor between the corresponding magmoids which is the identity function on objects and preserves ⊕, and is thus a morphism of magmoids. Conversely, if M is a magmoid determined by its scalar morphisms, then its morphisms with source 1 constitute a preclone S; moreover, M is isomorphic to the magmoid determined by S. □

3. Recognizable tree languages

As discussed in the introduction, the theory of (regular) tree languages is well developed [23,24,8] (see also Section 2.3.1 above). Here we slightly extend the notion of tree languages, to mean any subset of some ΣM_k, k ≥ 0. In the classical setting, tree languages are subsets of ΣM_0.
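Before formalizing recognizability, recall the automaton view of Section 2.3.1: a tree language is recognized through the evaluation morphism val and a set F of final states. For the 2-state algebra of Example 2.5, this can be sketched in Python (a hypothetical tuple encoding of trees, not from the paper):

```python
# Bottom-up evaluation val: trees over {or2, true0, false0} -> B = {True, False},
# for the 2-state algebra of Example 2.5.  A tree is (symbol, [children]).
ops = {
    "or2":    lambda a, b: a or b,
    "true0":  lambda: True,
    "false0": lambda: False,
}

def val(t):
    sym, children = t
    return ops[sym](*[val(c) for c in children])

def in_L(t):
    """Membership in L = val^{-1}({True}): the trees that evaluate to True,
    i.e. the trees containing at least one true0 leaf."""
    return val(t)

t1 = ("or2", [("false0", []), ("or2", [("true0", []), ("false0", [])])])
t2 = ("or2", [("false0", []), ("false0", [])])
assert in_L(t1) and not in_L(t2)
```

This is exactly the restriction to rank 0 of the preclone morphism φ: ΣM → T(B) of Section 2.3.1, so L is recognizable in the sense defined next.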
302
Z. Ésik, P. Weil / Theoretical Computer Science 340 (2005) 291 – 321
The preclone structure on M, described in Section 2, leads in a standard fashion to a definition of recognizable tree languages [17,30,9,12,40]. This is discussed in some detail in Section 3.1. As we will see in Section 3.2, recognizability extends the classical notion of regularity for tree languages, and it gives us richer algebraic tools to discuss these languages. Further examples are given in Section 3.3.

3.1. Syntactic preclones

Suppose that φ : M → S is a preclone morphism, or a morphism φ : (M, Σ) → (S, A). We say that a subset L of Mk is recognized by φ if L = φ−1(φ(L)), or equivalently, if L = φ−1(F) for some F ⊆ Sk. Moreover, we say that L is recognized by S, or by (S, A), if L is recognized by some morphism M → S or (M, Σ) → (S, A). Finally, we say that a subset L of Mk is recognizable if it is recognized by a finitary preclone, or pg-pair. As usual, the notion of recognizability can be expressed equivalently by stating that L is saturated by some locally finite congruence on M, that is, L is a union of classes of a congruence which has finite index on each sort [10,11,40].

With every subset L ⊆ Mk, recognizable or not, we associate a congruence on M, called the syntactic congruence of L. This relation is defined as follows. First, an n-ary context in Mk is a tuple (u, k1, v, k2) where
• k1, k2 are nonnegative integers,
• u ∈ M(k1+1+k2), and
• v = v1 ⊕ · · · ⊕ vn ∈ Mn,ℓ, with k = k1 + ℓ + k2.
We say that (u, k1, v, k2) is an L-context of an element f ∈ Mn if u · (k1 ⊕ f · v ⊕ k2) ∈ L. Recall that k denotes the ⊕-sum of k terms equal to 1. Below, when k1 and k2 are clear from the context (or do not play any role), we will write just (u, v) to denote the context (u, k1, v, k2).
For each f, g ∈ Mn, we let f ∼L g if and only if f and g have the same L-contexts.

Proposition 3.1. The relation ∼L, associated with a subset L of Mk, is a preclone congruence which saturates L.

Proof. Suppose that f, f′ ∈ Mn and g, g′ ∈ Mn,m with f ∼L f′, g = g1 ⊕ · · · ⊕ gn, g′ = g′1 ⊕ · · · ⊕ g′n and gi ∼L g′i for each 1 ≤ i ≤ n. We prove that f · g ∼L f′ · g′. Let mi be
the rank of gi and g′i, so that m = Σi∈[n] mi, and consider any m-ary context (u, k1, v, k2) in Mk. Then v ∈ Mm,ℓ with ℓ = k − (k1 + k2). Thus, v is an m-tuple, and we let w1 be the ⊕-sum of the first m1 terms of v, w2 be the ⊕-sum of the following m2 terms of v, etc., until finally wn is the ⊕-sum of the last mn terms of v. In particular, we may write v = w1 ⊕ · · · ⊕ wn. Since f ∼L f′, we have
u · (k1 ⊕ f · g · v ⊕ k2) ∈ L ⟺ u · (k1 ⊕ f′ · g · v ⊕ k2) ∈ L.
It suffices to consider the n-ary context (u, k1, g · v, k2), where g · v = (g1 ⊕ · · · ⊕ gn) · v stands for g1 · w1 ⊕ · · · ⊕ gn · wn. Moreover, since gi ∼L g′i, we have
u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′(i−1) · w(i−1) ⊕ gi · wi ⊕ g(i+1) · w(i+1) ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′(i−1) · w(i−1) ⊕ g′i · wi ⊕ g(i+1) · w(i+1) ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
for each 1 ≤ i ≤ n. To justify this statement, it suffices to consider the following mi-ary context (for gi and g′i):
(u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′(i−1) · w(i−1) ⊕ 1 ⊕ g(i+1) · w(i+1) ⊕ · · · ⊕ gn · wn) ⊕ k2), k1 + ℓ(i,1), wi, ℓ(i,2) + k2),
where ℓ(i,1) is the sum of the ranks of w1, . . . , w(i−1), and ℓ(i,2) is the sum of the ranks of w(i+1), . . . , wn. We now have
u · (k1 ⊕ f · g · v ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · (g1 · w1 ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · (g′1 · w1 ⊕ g2 · w2 ⊕ · · · ⊕ gn · wn) ⊕ k2) ∈ L
...
⟺ u · (k1 ⊕ f′ · (g′1 · w1 ⊕ · · · ⊕ g′n · wn) ⊕ k2) ∈ L
⟺ u · (k1 ⊕ f′ · g′ · v ⊕ k2) ∈ L.
This completes the proof that ∼L is a congruence. Next we observe that an element f ∈ Mk is in L if and only if the k-ary context (1, k) is an L-context of f; it follows immediately that ∼L saturates L.

We denote by (ML, ΣL) the quotient pg-pair (M/∼L, Σ/∼L), called the syntactic pg-pair of L. ML is the syntactic preclone of L and the projection morphism ηL : M → ML, or ηL : (M, Σ) → (ML, ΣL), is the syntactic morphism of L. We note the following, expected result:

Proposition 3.2. The syntactic congruence of a subset L of Mk is the coarsest preclone congruence which saturates L. A preclone morphism φ : M → S (resp. a morphism of pg-pairs φ : (M, Σ) → (S, A)) recognizes L if and only if φ can be factored through the syntactic morphism ηL. In particular, L is recognizable if and only if ∼L is locally finite, if and only if ML is finitary.
Proof. Let ≈ be a congruence saturating L and assume that f, g ∈ Mn are ≈-equivalent. Let (u, k1, v, k2) be an n-ary context: then u · (k1 ⊕ f · v ⊕ k2) ≈ u · (k1 ⊕ g · v ⊕ k2), and since ≈ saturates L, u · (k1 ⊕ f · v ⊕ k2) ∈ L iff u · (k1 ⊕ g · v ⊕ k2) ∈ L. Since this holds for all n-ary contexts in Mk, it follows that f ∼L g.

We also note that syntactic preclones are finitely determined.

Proposition 3.3. The syntactic preclone of a subset L of Mk is k-determined.

Proof. We show that if f, g ∈ Mn and f · h ∼L g · h for all h ∈ Mn,ℓ with ℓ ≤ k, then f ∼L g, that is, f and g have the same L-contexts. Let (u, k1, v, k2) be an L-context of f. Note in particular that v ∈ Mn,p with k = k1 + p + k2. It follows that f · v ∈ Mp, and that f · v ∼L g · v. Moreover, (u, k1, p, k2) is an L-context of f · v. But in that case, (u, k1, p, k2) is also an L-context of g · v, and hence (u, k1, v, k2) is an L-context of g, which concludes the proof.

3.2. The usual notion of regular tree languages

We now turn to tree languages in the usual sense, that is, subsets of M0. For these sets, there exists a well-known notion of (bottom-up) tree automaton, whose expressive power is equivalent to monadic second-order definability, to certain rational expressions, and to recognizability by a finite Σ-algebra [23,24] (see Section 2.3.1). The tree languages captured by these mechanisms are said to be regular. It is an essential remark (Theorem 3.4 below) that the regular tree languages are exactly the subsets of M0 that are recognized by a finitary preclone. Recall that the minimal tree automaton of a regular tree language is the least deterministic tree automaton accepting it; the Σ-algebra associated with this automaton is called the syntactic Σ-algebra of the language.
It is characterized by the fact that the natural morphism from the initial Σ-algebra to the syntactic Σ-algebra of L factors through every morphism of Σ-algebras which recognizes L (see Section 2.3.1 and [23,24,1]).

Theorem 3.4. A tree language L ⊆ M0 is recognizable if and only if it is regular. Moreover, the syntactic preclone (resp. pg-pair) of L is the preclone (resp. pg-pair) associated with its syntactic Σ-algebra.

Proof. Let Q be the syntactic Σ-algebra of L, let (S, A) be its syntactic pg-pair, and let η : (M, Σ) → (S, A) be its syntactic morphism. As discussed in Section 2.3.1, the pg-pair associated with Q, pg(Q), recognizes L, and hence the syntactic morphism of L factors through an onto morphism of pg-pairs pg(Q) → (S, A). In particular, if L is regular, then Q is finite, so pg(Q) is finitary, and so is (S, A): thus L is recognizable.

Conversely, assume that L is recognizable, so that (S, A) is finitary. Since (S, A) is also 0-determined (Proposition 3.3), it is isomorphic to a sub-pg-pair of T(S0) by Proposition 2.13. Using again the discussion in Section 2.3.1, S0 has a natural structure of Σ-algebra (via the morphism η), such that (S, A) = pg(S0) and such that S0 recognizes L as a Σ-algebra. In particular, L is recognized by a finite Σ-algebra, and hence L is regular.
Moreover, the recognizing morphism M0 → S0 is the restriction to M0 of η, the syntactic morphism of L. Therefore there exists an onto morphism of Σ-algebras S0 → Q, which in turn induces a morphism of preclones from (S, A) = pg(S0) onto pg(Q). Since Q and S0 are finite, it follows that the morphisms between them described above are isomorphisms, and this implies that pg(Q) is isomorphic to (S, A).

While not difficult, Theorem 3.4 is important because it shows that we are not introducing a new class of recognizable tree languages. We are simply associating with each regular tree language a finitary algebraic structure which is richer than its syntactic Σ-algebra (a.k.a. minimal deterministic tree automaton). This theorem also implies that the syntactic pg-pair of a recognizable tree language has an effectively computable finite presentation.

Remark 3.5. If L ⊆ M0, the definition of the syntactic congruence of L involves the consideration of n-ary contexts in M0. Such contexts are necessarily of the form (u, 0, v, 0), where u ∈ M1 and v ∈ Mn,0, which somewhat simplifies matters.

3.3. More examples of recognizable tree languages

The examples in this section are directly related to the preclones discussed in Section 2.3.2. Let Σ be a ranked Boolean alphabet, that is, a ranked alphabet such that each Σn is either empty or equal to {0n, 1n}, and Σ0 and at least one Σn (n ≥ 2) are nonempty. Let k ≥ 0 be an integer.

3.3.1. Verifying the occurrence of a letter

Let Kk(∃) be the set of all trees in Mk containing at least one vertex labeled 1n (for some n). Then Kk(∃) is recognizable, by a morphism into the preclone T∃ (see Example 2.5). Let φ : M → T∃ be the morphism of preclones given by φ(0n) = orn (and φ(00) = false0) and φ(1n) = truen whenever Σn ≠ ∅. It is not difficult to verify that φ−1(truek) = Kk(∃). Moreover, φ(Σ) contains a generating set of T∃, so φ is onto, and the syntactic morphism of Kk(∃) factors through φ.
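Restricted to ground trees (k = 0), evaluating the morphism φ into T∃ is just a bottom-up disjunction. The following Python sketch uses our own tree encoding (a pair of a bit and a list of children), which is an assumption for illustration and not the paper's notation.

```python
# Illustration (our own encoding): a ground tree over the Boolean ranked
# alphabet is (bit, children).  Evaluating phi into T_exists:
#   phi(1_n) = true_n   -- constant true, ignoring the arguments,
#   phi(0_n) = or_n     -- disjunction of the children's values
#   (phi(0_0) = false_0, the empty disjunction).
# The result is true exactly when some vertex is labeled 1.

def phi_exists(tree):
    bit, children = tree
    if bit == 1:
        return True
    return any(phi_exists(c) for c in children)
```

For example, a tree whose only 1-labeled vertex sits below the root evaluates to true, while an all-0 tree evaluates to false.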
But T∃ has at most 2 elements of each rank, so any proper quotient M′ of T∃ has exactly one element of rank n for some integer n. One can then show that M′ cannot recognize Kk(∃). Thus the syntactic pg-pair of Kk(∃) is (T∃, φ(Σ)).

If Σ is any ranked alphabet such that Σ0 and at least one Σn (n > 1) are nonempty, if Σ′ is a proper nonempty subset of Σ, and Kk(Σ′) is the set of all trees in Mk containing at least one node labeled in Σ′, then Kk(Σ′) too has syntactic preclone T∃. The verification of this fact can be done using a morphism of preclones mapping each letter of rank n to 1n if it lies in Σ′, and to 0n otherwise.

3.3.2. Counting the occurrences of a letter

Let p, r be integers such that 0 ≤ r < p and let Kk(∃rp) consist of the trees in Mk such that the number of vertices labeled 1n (for some n) is congruent to r modulo p. Then Kk(∃rp) is recognizable, by a morphism into the preclone Tp (see Example 2.6). Let indeed φ : M → Tp be the morphism given by φ(0n) = fn,0 and φ(1n) = fn,1 whenever Σn ≠ ∅. Then one verifies that φ−1(fk,r) = Kk(∃rp). Moreover, φ(Σ) contains a generating set of Tp, so φ is onto, and the syntactic morphism of Kk(∃rp) factors through
φ. An elementary verification then establishes that no proper quotient of Tp can recognize Kk(∃rp), and hence the syntactic pg-pair of Kk(∃rp) is (Tp, φ(Σ)). As above, this can be extended to recognizing the set of all trees in Mk where the number of nodes labeled in some proper nonempty subset of Σ is congruent to r modulo p.

Using the same idea, one can also handle tree languages defined by counting the number of occurrences of certain letters modulo p threshold q. It suffices to consider, in analogy with the mod p case, the languages of the form Kk(∃rp,q), and the preclone Tp,q, a sub-preclone of T(Bp+q), whose rank n elements are the mappings fr : (r1, . . . , rn) → r1 + · · · + rn + r, where the sum is taken modulo p threshold q. Note that this notion generalizes both examples above, since Tp = Tp,0 and T∃ = T1,1.

3.3.3. Identification of a path

Let Kk(path) be the set of all trees in Mk such that all the vertices along at least one maximal path from the root to a leaf are labeled 1n (for the appropriate values of n). Then Kk(path) is recognized by the preclone Tpath (see Example 2.7). Let indeed φ : M → Tpath be the morphism given by φ(0n) = falsen, φ(10) = true0 and φ(1n) = orn (n ≠ 0). One can then verify that φ−1(truek) = Kk(path).

3.3.4. Identification of the next modality

Let Kk(next) consist of all trees in Mk such that each maximal path has length at least two and the children of the root are labeled 1n (for the appropriate n). We show that Kk(next) is recognizable. Recall that B = {true, false}, and let φ : M → T(B × B) be the morphism given as follows:
• φ(00) is the nullary constant (false, false)0,
• φ(10) is the nullary constant (false, true)0,
• if n > 0, then φ(0n) is the n-ary map ((x1, y1), . . . , (xn, yn)) → (∧i yi, false),
• if n > 0, then φ(1n) is the n-ary map ((x1, y1), . . . , (xn, yn)) → (∧i yi, true).
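The four clauses above can be run directly on ground trees. The Python sketch below uses our own (bit, children) encoding, an assumption for illustration: the second component of the computed pair records whether the root is labeled 1, and the first component records membership in the language.

```python
# Illustration (our own encoding): evaluate the pair-valued morphism phi
# into T(B x B) on a ground tree (bit, children).
#   second component: "the root is labeled 1"
#   first component:  "the root is internal and all its children are
#                      labeled 1", i.e. membership in K_0(next).

def phi_next(tree):
    bit, children = tree
    if not children:
        # nullary constants (false, false)_0 and (false, true)_0
        return (False, bit == 1)
    pairs = [phi_next(c) for c in children]
    first = all(y for (_x, y) in pairs)   # /\_i y_i
    return (first, bit == 1)

def in_K_next(tree):
    return phi_next(tree)[0]
```

A root with two 1-labeled children is accepted; a lone leaf, or a root with a 0-labeled child, is not.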
One can verify by structural induction that for each element x ∈ Mk, the second component of φ(x) is true if and only if the root of x is labeled 1n for some n, and the first component of φ(x) is true if and only if every child of the root of x is labeled 1n for some n, that is, if and only if x ∈ Kk(next). Thus Kk(next) is recognized by the morphism φ.

4. Pseudovarieties of preclones

In the usual setting of one-sorted algebras, a pseudovariety is a class of finite algebras closed under taking finite direct products, sub-algebras and quotients. Because we are dealing with preclones, which are infinitely sorted, we need to consider finitary algebras instead of finite ones, and to adopt more constraining closure properties in the definition. (We discuss in Remark 4.18 an alternative approach, which consists in introducing stricter finiteness conditions on the preclones themselves, namely in considering only finitely generated, finitely determined, finitary preclones.)

We say that a class of finitary preclones is a pseudovariety if it is closed under finite direct product, sub-preclones, quotients, finitary unions of ω-chains and finitary inverse limits of
ω-diagrams. Here, we say that a union T = ∪n T(n) of an ω-chain of preclones T(n), n ≥ 0, is finitary exactly when T is finitary. Finitary inverse limits lim←n T(n) of ω-diagrams ψn : T(n+1) → T(n), n ≥ 0, are defined in the same way.

Remark 4.1. To be perfectly rigorous, we actually require pseudovarieties to be closed under taking preclones isomorphic to a finitary ω-union or to a finitary inverse limit of an ω-diagram of their elements.

Remark 4.2. Recall that the inverse limit T of the ω-diagram (ψn)n≥0, written T = lim←n T(n) if the ψn : T(n+1) → T(n) are clear, is the sub-preclone of the direct product ∏n T(n) whose set of elements of rank m consists of those sequences (xn)n≥0, with xn an element of rank m of T(n), such that ψn(xn+1) = xn for all n ≥ 0. We call the coordinate projections πp : lim←n T(n) → T(p) the induced projection morphisms. They fit in a commutative diagram in which T projects, via the πn, onto each term of the chain · · · → T(n+1) → T(n) → · · · → T(0).
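The compatible-sequence description of an inverse limit can be made concrete with a small sketch. The instance below is our own illustration, under the assumption that each T(n) is the set {0, . . . , n} with ψn the truncation min(·, n); each natural number k then corresponds to the compatible sequence n ↦ min(k, n).

```python
# Sketch (our own instance, for illustration): an element of lim T^(n)
# is a sequence (x_n) with psi_n(x_{n+1}) = x_n.  Take T^(n) = {0,...,n}
# and psi_n = truncation at n.

def psi(n, x):
    """Projection T^(n+1) -> T^(n): truncate at n."""
    return min(x, n)

def element(k):
    """The compatible sequence representing the natural number k."""
    return lambda n: min(k, n)

def compatible(x, upto):
    """Check psi_n(x_{n+1}) = x_n for n = 0, ..., upto - 1."""
    return all(psi(n, x(n + 1)) == x(n) for n in range(upto))
```

For instance, the sequence for k = 3 reads 0, 1, 2, 3, 3, 3, . . . and satisfies the compatibility condition at every level.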
The inverse limit has the following universal property: whenever S is a preclone and the morphisms φn : S → T(n) satisfy φn = ψn ◦ φn+1 for each n ≥ 0, then there is a unique morphism φ : S → lim←n T(n) with πn ◦ φ = φn for all n. This morphism maps an element s ∈ S to the sequence (φn(s))n≥0.

Example 4.3. Here we show that the inverse limit of an ω-diagram of 1-generated finitary preclones need not be finitary. Let Σ = {σ}, where σ has rank 1, and consider the free preclone ΣM. Note that ΣM has only elements of rank 1, and that ΣM1 can be identified with the monoid σ*. For each n ≥ 0, let ≈n be the congruence defined by letting σ^k ≈n σ^ℓ if and only if k = ℓ, or k, ℓ ≥ n. Let T(n) = ΣM/≈n. Then T(n) is again generated by σ, and it can be identified with the monoid {0, 1, . . . , n} under addition threshold n. In particular, T(n) is a finitary preclone. Since ≈n+1-equivalent elements of ΣM are also ≈n-equivalent, there is a natural morphism of preclones from T(n+1) to T(n), mapping σ to itself, and the inverse limit of the resulting ω-diagram is ΣM itself, which is not finitary.

Pseudovarieties of preclones can be characterized using the notion of division: we say that a preclone S divides a preclone T, written S < T, if S is a quotient of a sub-preclone of T. It is immediately verified that a nonempty class of finitary preclones is a pseudovariety if and only if it is closed with respect to division, binary direct product, finitary unions of ω-chains and finitary inverse limits of ω-diagrams.

Example 4.4. It is immediate that the intersection of a collection of pseudovarieties of preclones is a pseudovariety. It follows that if K is a class of finitary preclones, then the pseudovariety generated by K is well defined, as the least pseudovariety containing K. In
particular, the elements of this pseudovariety, written ⟨K⟩, can be described in terms of the elements of K, taking sub-preclones, quotients, direct products, finitary unions of ω-chains and inverse limits of ω-diagrams. See Section 4.2 below. We discuss other examples in Section 5.2.

We first explore the relation between pseudovarieties and their finitely determined elements, then we discuss pseudovarieties generated by a class of preclones, and finally, we explore some additional closure properties of pseudovarieties.

4.1. Pseudovarieties and their finitely determined elements

Proposition 4.5. Let S be a preclone.
• S is isomorphic to the inverse limit lim←n S(n) of an ω-diagram, where each S(n) is an n-determined quotient of S.
• If S is finitary, then S is isomorphic to the union of an ω-chain ∪n≥0 T(n), where each T(n) is the inverse limit of an ω-diagram of finitely generated, finitely determined divisors of S.

Proof. Let S(n) = S/∼n (where ∼n is defined in Section 2.4) and let πn : S → S(n) be the corresponding projection. Since ∼n+1-related elements of S are also ∼n-related, there exists a morphism of preclones ψn : S(n+1) → S(n) such that πn = ψn ◦ πn+1. Thus the πn determine a morphism φ : S → lim←n S(n), such that φ(s) = (πn(s))n for each s ∈ S (Remark 4.2). Moreover, since ∼n is the identity relation on the elements of S of rank at most n, we find that for each k ≤ n, πn establishes a bijection between the elements of rank k of S and those of S(n). In particular, φ is injective, since each element of S has rank k for some finite integer k. Furthermore, for each k ≤ n, πn establishes a bijection between the elements of rank k, and it follows that each element of rank k of lim←n S(n) is the φ-image of its kth component. That is, φ is onto. Finally, Lemma 2.11 shows that each S(n) is n-determined. This concludes the proof of the first statement.

We now assume that S is finitary, and we let T(m) be the sub-preclone generated by the elements of S of rank at most m.
Then T(m) is finitely generated, and the first statement shows that T(m) is the inverse limit of an ω-diagram of finitely generated, finitely determined quotients of T(m), which are in particular divisors of S.

The following corollary is immediate:

Corollary 4.6. Every pseudovariety of preclones is uniquely determined by its finitely generated, finitely determined elements.

We can go a little further, and show that a pseudovariety is determined by the syntactic preclones it contains.

Proposition 4.7. Let S be a finitely generated, k-determined, finitary preclone, let A be a finite ranked set and let φ : AM → S be an onto morphism. Then S divides the direct product
of the syntactic preclones of the languages φ−1(s), where s runs over the (finitely many) elements of S of rank at most k.

Proof. It suffices to show that if x, y ∈ AMn for some n ≥ 0 and x ∼φ−1(s) y for each s ∈ Sℓ, ℓ ≤ k, then φ(x) = φ(y).

First, suppose that x and y have rank n ≤ k, and let s = φ(x). Then (1, 0, n, 0) is a φ−1(s)-context of x, so it is also a φ−1(s)-context of y, and we have φ(y) = s = φ(x). Now, if x and y have rank n > k, let v ∈ Sn,p for some p ≤ k. Since φ is onto, there exists an element z ∈ AMn,p such that φ(z) = v. For each s ∈ Sℓ, ℓ ≤ k, we have x ∼φ−1(s) y, and hence also x · z ∼φ−1(s) y · z. The previous discussion shows therefore that φ(x · z) = φ(y · z), that is, φ(x) · v = φ(y) · v. Since S is k-determined, it follows that φ(x) = φ(y).

Corollary 4.8. Every pseudovariety of preclones is uniquely determined by the syntactic preclones it contains.

Proof. This follows directly from Corollary 4.6 and Proposition 4.7.
4.2. The pseudovariety generated by a class of preclones

Let I, H, S, P, L, U denote, respectively, the operators of taking all isomorphic images, homomorphic images, sub-preclones, finite direct products, finitary inverse limits of ω-diagrams, and finitary ω-unions over a class of finitary preclones. The following fact is a special case of a well-known result in universal algebra.

Lemma 4.9. If K is a class of finitary preclones, then HSP(K) is the least class of finitary preclones containing K, closed under homomorphic images, sub-preclones and finite direct products.

Next, we observe the following elementary facts:

Lemma 4.10. For all classes K of finitary preclones, we have
(1) PL(K) ⊆ LP(K),
(2) PU(K) ⊆ UP(K),
(3) SL(K) ⊆ LS(K),
(4) SU(K) ⊆ US(K).
Proof. To prove the first inclusion, suppose that S is the direct product of the finitary preclones S(i), i ∈ [n], where each S(i) is the limit of an ω-diagram of preclones S(i,k) in K, determined by a family of morphisms ψi,k : S(i,k+1) → S(i,k), k ≥ 0. For each k, let T(k) be the direct product ∏i∈[n] S(i,k), and let ψk = ∏i∈[n] ψi,k : T(k+1) → T(k). It is a routine matter to verify that S is isomorphic to the limit of the ω-diagram determined by the family of morphisms ψk : T(k+1) → T(k), k ≥ 0. Thus, S ∈ LP(K).

To prove the second inclusion, let (S(i,k))k≥0, for each i ∈ [n], be an ω-chain of finitary preclones in K. Let us assume that each S(i) = ∪k≥0 S(i,k) is finitary, and let S = ∏i∈[n] S(i). If s = (s1, . . . , sn) ∈ S, then each si belongs to S(i,ki) for some ki. Thus s ∈ ∏i∈[n] S(i,k), where k = max ki, and we have shown that S = ∪k≥0 ∏i∈[n] S(i,k), so that S ∈ UP(K).
To prove the third inclusion, let T be a sub-preclone of lim←n S(n), the finitary inverse limit of an ω-diagram ψn : S(n+1) → S(n) of elements of K. Let πn : T → S(n) be the natural projections (restricted to T), and let T(n) = πn(T). Then T(n) is a sub-preclone of S(n) for each n. Moreover, the restrictions of the ψn to T(n+1) define an ω-diagram of sub-preclones of elements of K, and it is an elementary verification that T = lim←n T(n). Since T is finitary, we have proved that T ∈ LS(K).

As for the last inclusion, let T be a sub-preclone of a finitary union ∪k≥0 S(k) with S(k) ∈ K for all k ≥ 0. Let T(k) = S(k) ∩ T for each k ≥ 0. Then each T(k) is a sub-preclone of S(k) and T = ∪k≥0 T(k). It follows that T ∈ US(K).

Our proof of the third inclusion actually yields the following result.

Corollary 4.11. If a finitary preclone S embeds in an inverse limit lim←n S(n), then S is isomorphic to a (finitary) inverse limit lim←n T(n), where each T(n) is a finitary sub-preclone of S(n).

We can be more precise than Lemma 4.10 for what concerns finitely generated, finitely determined preclones.

Lemma 4.12. Let T be a preclone which embeds in the union of an ω-chain (S(n))n. If T is finitely generated, then T embeds in S(n) for all large enough n.

Proof. Since T is finitely generated, its set of generators is entirely contained in some S(k), and hence T embeds in each S(n), n ≥ k.

Lemma 4.13. Let T be a quotient of the union of an ω-chain (S(n))n. If T is finitely generated, then T is a quotient of S(n) for all large enough n.

Proof. Let φ be a surjective morphism from S = ∪n S(n) onto T. Since T is finitely generated, there exists an integer k such that φ(S(k)) contains all the generators of T, and this implies that the restriction of φ to S(k) (and to each S(n), n ≥ k) is onto.

Lemma 4.14. Let T be a preclone which embeds in the inverse limit lim←n S(n) of an ω-diagram, and for each n, let πn : T → S(n) be the natural projection (restricted to T).
If T is finitary, then for each k, πn is k-injective for all large enough n. If in addition T is finitely determined, then T embeds in S(n) for all large enough n.

Proof. Since T is finitary, Tk is finite for each integer k, and hence there exists an integer nk such that πn is injective on Tk for each n ≥ nk. In particular, for each integer k, πn is k-injective for all large enough n. The last part of the statement follows from Lemma 2.12.

Lemma 4.15. Let T be a quotient of the finitary inverse limit lim←n S(n) of an ω-diagram. If T is finitely determined, then T is a quotient of a sub-preclone of one of the S(n).
Proof. Let S = lim←n S(n) and let πn : S → S(n) be the corresponding projection. Let also φ : S → T be an onto morphism, and let k ≥ 0 be an integer such that T is k-determined. By Lemma 4.14, πn is k-injective for some integer n. Consider the preclone πn(S) ⊆ S(n). We claim that the assignment πn(s) ↦ φ(s) defines a surjective morphism πn(S) → T. The only nontrivial point is to verify that this assignment is well defined. Let s, s′ ∈ Sp and suppose that πn(s) = πn(s′). We want to show that φ(s) = φ(s′), and for that purpose, we show that φ(s) · v = φ(s′) · v for each v ∈ Tp,ℓ, ℓ ≤ k (since T is k-determined). Since φ is onto, there exists w ∈ Sp,ℓ such that v = φ(w). In particular, φ(s) · v = φ(s · w) and similarly, φ(s′) · v = φ(s′ · w). Moreover, we have πn(s · w) = πn(s′ · w). Now s · w and s′ · w lie in Sℓ, and πn is injective on Sℓ, so s · w = s′ · w. It follows that φ(s) · v = φ(s′) · v, and hence φ(s) = φ(s′).

We are now ready to describe the finitely generated, finitely determined elements of the pseudovariety generated by a given class of finitary preclones.

Proposition 4.16. Let K be a class of finitary preclones. A finitely generated, finitely determined, finitary preclone belongs to the pseudovariety ⟨K⟩ generated by K if and only if it divides a finite direct product of preclones in K, i.e., it lies in HSP(K).

Proof. It is easily verified that ⟨K⟩ = ∪n Vn, where V0 = K and Vn+1 = HSPUHSPL(Vn). We show by induction on n that if T is a finitely generated, finitely determined preclone in Vn, then T ∈ HSP(K). The case n = 0 is trivial, and we now assume that T ∈ Vn+1. By Lemma 4.10, T lies in HUSPHLSP(Vn). Then Lemma 4.13 shows that T is in fact in HSPHLSP(Vn), which is equal to HSPLSP(Vn) by Lemma 4.9, and is contained in HLSP(Vn) by Lemma 4.10 again. Now Lemma 4.15 shows that T lies in fact in HSP(Vn), and we conclude by induction that T ∈ HSP(K).

Corollary 4.17. If K is a class of finitary preclones, then ⟨K⟩ = IULHSP(K).

Proof. The containment IULHSP(K) ⊆ ⟨K⟩
is immediate. To show the reverse inclusion, we consider a finitary preclone T ∈ ⟨K⟩. Then T = ∪n T(n), where T(n) denotes the sub-preclone of T generated by the elements of rank at most n. Now each T(n) is finitely generated, and by Proposition 4.5, it is isomorphic to the inverse limit of the ω-diagram formed by the finitely generated, finitely determined preclones T(n)/∼m, m ≥ 0. By Proposition 4.16, each of these preclones is in HSP(K), so T ∈ IULHSP(K).

Remark 4.18. As indicated in the first paragraph of Section 4, Proposition 4.16 hints at an alternative treatment of the notion of pseudovarieties of preclones, limited to the consideration of finitely generated, finitely determined, finitary preclones. Say that a class K of finitely generated, finitely determined, finitary preclones is a relative pseudovariety if, whenever a finitely generated, finitely determined, finitary preclone S divides a finite direct product of preclones in K, then S is in fact in K. For each pseudovariety V, the class Vfin of all its finitary, finitely generated, finitely determined members is a relative pseudovariety, and the map V → Vfin is injective by Corollary 4.6. Moreover, Proposition 4.16 can be used
to show that this map is onto. That is, the map V → Vfin is an order-preserving bijective correspondence (with respect to the inclusion order) between pseudovarieties and relative pseudovarieties of preclones.

Proposition 4.16 also leads to the following useful result. Recall that a finitely generated preclone S is effectively given if we are given a finite generating set A as transformations of finite arity of a given finite set Q, see Section 2.3.1.

Corollary 4.19. Let S and T be effectively given, finitely generated, finitely determined preclones. Then it is decidable whether T belongs to the pseudovariety of preclones generated by S.

Proof. Let A (resp. B) be the given set of generators of S (resp. T) and let V be the pseudovariety generated by S. By Proposition 4.16, T ∈ V if and only if T divides a direct power of S, say, T < S^m. Since B is finite, almost all the sets Bk are empty. We claim that the exponent m can be bounded by
∏ over those k with Bk ≠ ∅ of |Ak|^|Bk|.
Indeed, there exists a sub-preclone S′ ⊆ S^m and an onto morphism S′ → T. Since B generates T, we may assume without loss of generality that this morphism defines a bijection from a set A′ of generators of S′ to B, and in particular, we may identify Bk with A′k, a subset of (Ak)^m. Next, one verifies that if m is greater than the bound in the claim, then there exist 1 ≤ i < j ≤ m such that for all k and x ∈ A′k, the ith and the jth components of x are equal; but then the exponent can be decreased by 1. Thus, it suffices to test whether or not T divides S^m, where m is given by the above formula. But as discussed above, this holds if and only if A^m contains a set A′ and a rank-preserving bijection from A′ to B which can be extended to a morphism from the sub-preclone of S^m generated by A′ to T. By Proposition 2.14, and since S and T are effectively given and T is finitely determined, this can be checked algorithmically.

4.3. Closure properties of pseudovarieties

Here we record additional closure properties of pseudovarieties of preclones.

Lemma 4.20. Let V be a pseudovariety of preclones and let T be a finitary preclone. If T embeds in the inverse limit of an ω-diagram of preclones in V, then T ∈ V.

Proof. The lemma follows immediately from Corollary 4.11.
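Looking back at Corollary 4.19, the exponent bound used in its proof is a simple arithmetic quantity. The Python sketch below computes it from the rank profile of the two generating sets; the rank data in the test is invented purely for illustration.

```python
from math import prod

# Bound on the exponent m from the proof of Corollary 4.19: given
# |A_k| (generators of S of rank k) and |B_k| (generators of T of
# rank k), m may be taken at most the product, over the ranks k with
# B_k nonempty, of |A_k| ** |B_k|.

def exponent_bound(a_sizes, b_sizes):
    """a_sizes[k] = |A_k|, b_sizes[k] = |B_k|; absent ranks count as 0."""
    return prod(a_sizes.get(k, 0) ** bk
                for k, bk in b_sizes.items() if bk > 0)
```

For instance, with |A0| = 2, |A2| = 3 and |B0| = 1, |B2| = 2, the bound is 2 · 3² = 18, so only divisions T < S^m with m ≤ 18 need to be tested.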
Proposition 4.21. Let V be a pseudovariety of preclones and let S be a finitary preclone. If for each n ≥ 0 there exists a morphism φn : S → S(n) such that S(n) ∈ V and φn is injective on elements of rank exactly n, then S ∈ V.

Proof. Without loss of generality, we may assume that each φn is surjective. For each n ≥ 0, consider the direct product T(n) = S(0) × · · · × S(n), which is in V, and let ψn denote the
natural projection of T(n+1) onto T(n). Let also ηn : S → T(n) be the target tupling of the morphisms φi, i ≤ n, let T be the inverse limit lim←n T(n) determined by the morphisms ψn, and let πn : T → T(n) be the corresponding projection morphisms. Note that each ηn is n-injective, and equals the composite of ηn+1 and ψn. Thus, there exists a (unique) morphism η : S → T such that the composite of η and πn is ηn for each n. It follows from the n-injectivity of each ηn that η is injective. Thus, S embeds in the inverse limit of an ω-diagram of preclones in V, and we conclude by Lemma 4.20.

We note the following easy corollary of Proposition 4.21:

Corollary 4.22. Let V be a pseudovariety of preclones. Let S be a finitary preclone such that distinct elements of equal rank can be separated by a morphism from S to a preclone in V. Then S ∈ V.

Proof. For any distinct elements f, g of equal rank n, let φf,g : S → Sf,g be a morphism such that Sf,g ∈ V and φf,g(f) ≠ φf,g(g). For any integer n, let φn be the target tupling of the finite collection of morphisms φf,g with f, g ∈ Sn. Then φn is injective on Sn and we conclude by Proposition 4.21.

4.4. Pseudovarieties of pg-pairs

The formal treatment of pseudovarieties of pg-pairs is similar to the above treatment of pseudovarieties of preclones, but for the following remarks. We define a pseudovariety of pg-pairs to be a class of finitary pg-pairs closed under finite direct product, sub-pg-pairs, quotients and finitary inverse limits of ω-diagrams. Our first remark is that, in this case, we do not need to mention finitary unions of ω-chains: indeed, finitary pg-pairs are finitely generated, so the union of an ω-chain, if it is finitary, amounts to a finite union.

Next, the notion of inverse limit of ω-diagrams of pg-pairs needs some clarification. Consider a sequence of morphisms of pg-pairs, say ψn : (S(n+1), A(n+1)) → (S(n), A(n)). That is, each ψn is a preclone morphism from S(n+1) to S(n) which maps A(n+1) into A(n).
We can then form the inverse limit limn S (n) of the -diagram determined by the preclone morphisms n , and the inverse limit limn A(n) determined by the set mappings n . The inverse limit limn (S (n) , A(n) ) of the -diagram determined by the morphisms of pg-pairs n (as determined by the appropriate universal limit, see Remark 4.2) is the pg-pair (S, A), where A = limn A(n) and S is the subpreclone of limn S (n) generated by A. Recall that this inverse limit is called finitary exactly when S is finitary and A is finite (see Example 4.3). We now establish the close connection between this inverse limit and the inverse limit of the underlying -diagram of preclones, when the latter is finitary. Proposition 4.23. Let n : (S (n+1) , A(n+1) ) → (S (n) , A(n) ) be an -diagram of pg-pairs. Let S = limn S (n) and let and (T , A) = limn (S (n) , A(n) ). If S is finitary, then S = T . Proof. We need to show that A generates S. Without loss of generality, we may assume that each n maps A(n+1) surjectively onto A(n) , and we denote by n the restriction of n to
314
Z. Ésik, P. Weil / Theoretical Computer Science 340 (2005) 291 – 321
A(n+1) . By definition, A is the inverse limit of the -diagram given by the n , and we denote by n : A → A(n) the corresponding projection. We also denote by n and n the extensions of these mappings to preclone morphisms A(n+1) M → A(n) M and AM → A(n) M. It is not difficult to verify that AM is the inverse limit of the -diagram given by the n , and that the n are the corresponding projections.
Moreover, each k is onto (even from A to A(k) ). Let indeed ak ∈ A(k) . Since the n are onto, we can define by induction a sequence (an )n k such that n (an+1 ) = an for each n k. This sequence can be completed with the iterated images of ak by k−1 , . . . , 0 to yield an element of A whose kth projection is ak . Since A(n) generates S (n) , the morphism n : A(n) M → S (n) induced by idA(n) is surjective. Moreover, the composites n ◦ n+1 and n ◦ n coincide.
It follows that the morphisms n ◦ n : AM → S (n) and n ◦ n+1 ◦ n+1 coincide, and hence there exists a morphism : AM → S such that n ◦ = n ◦ n for each n. Since n and n are onto, it follows that each n is surjective. We now use the fact that S is finitary. By Lemma 4.14, n is k-injective for each large enough n. Now let s ∈ Sk . We want to show that s ∈ (AM). Let nk be such that n is k-injective for each n nk . We can choose an element tnk ∈ A(nk ) M such that
nk (tnk ) = nk (s). Then, by induction, we can construct a sequence (tn )n of elements such that n (tn+1 ) = tn for each n 0. We need to show that n (tn ) = n (s) for each n. This equality is immediate for n nk , and we assume by induction that it holds for some n nk . We have
n ( n+1 (tn+1 )) = n ( n (tn+1 )) = n (tn ) = n (s) = n ( n+1 (tn+1 )). Since n and n+1 are surjective, since they are injective on Sk , and since n ◦ n+1 = n , (n+1) we find that n is injective on Sk , and hence n+1 (tn+1 ) = n+1 (s), as expected. Thus (tn )n ∈ AM and (t) = s, which concludes the proof that S is generated by A.
5. Varieties of tree languages

Let V = (V_{Σ,k})_{Σ,k} be a collection of nonempty classes of recognizable tree languages L ⊆ ΣM_k, where Σ runs over the finite ranked alphabets and k runs over the nonnegative integers. We call V a variety of tree languages, or a tree language variety, if each V_{Σ,k} is closed under the Boolean operations, and V is closed under inverse morphisms between free preclones generated by finite ranked sets, and under quotients defined as follows. Let L ⊆ ΣM_k be a tree language, let k_1 and k_2 be nonnegative integers, u ∈ ΣM_{k_1+1+k_2} and v ∈ ΣM_{n,k}. Then the left quotient (u, k_1, k_2)^{−1}L and the right quotient Lv^{−1} are defined by

(u, k_1, k_2)^{−1}L = {t ∈ ΣM_n | u · (k_1 ⊕ t ⊕ k_2) ∈ L}, where k = k_1 + n + k_2,
Lv^{−1} = {t ∈ ΣM_n | t · v ∈ L},

that is, (u, k_1, k_2)^{−1}L is the set of elements of ΣM_n for which (u, k_1, n, k_2) is an L-context, and Lv^{−1} is the set of elements of ΣM_n for which (1, 0, v, 0) is an L-context. Below we will write just u^{−1}L for (u, k_1, k_2)^{−1}L if k_1 and k_2 are understood, or play no role.

A literal variety of tree languages is defined similarly, but instead of closure under inverse morphisms between finitely generated free preclones, we require closure under inverse morphisms between finitely generated free pg-pairs. Thus, if L ⊆ ΣM_k is in a literal variety V and φ : ΔM → ΣM is a preclone morphism with Σ, Δ finite and φ(Δ) ⊆ Σ, then φ^{−1}(L) is also in V.

5.1. Varieties of tree languages vs. pseudovarieties of preclones

The aim of this section is to prove an Eilenberg correspondence between pseudovarieties of preclones (resp. pg-pairs), and varieties (resp. literal varieties) of tree languages. For each pseudovariety V of preclones (resp. pg-pairs), let var(V) = (V_{Σ,k})_{Σ,k}, where V_{Σ,k} denotes the class of the tree languages L ⊆ ΣM_k whose syntactic preclone (resp. pg-pair) belongs to V. It follows from Proposition 3.2 that var(V) consists of all those tree languages that can be recognized by a preclone (resp. pg-pair) in V.
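For intuition, the left and right quotients defined above have a direct word-language analogue (rank-free, with concatenation in place of tree substitution); the following sketch and its function names are illustrative, not from the paper:

```python
def left_quotient(u, L):
    # word analogue of the left quotient u^{-1}L = {t | u t in L}
    return {w[len(u):] for w in L if w.startswith(u)}

def right_quotient(L, v):
    # word analogue of the right quotient L v^{-1} = {t | t v in L}
    return {w[:len(w) - len(v)] for w in L if w.endswith(v)}

L = {"ab", "aab", "ba"}
print(left_quotient("a", L))   # {'b', 'ab'}
print(right_quotient(L, "b"))  # {'a', 'aa'}
```

As in the tree case, a recognizable word language has only finitely many distinct quotients.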
Conversely, if W is a variety (resp. a literal variety) of tree languages, we let psv(W) be the class of all finitary preclones (resp. pg-pairs) S that only accept languages in W, i.e., such that φ^{−1}(F) ⊆ ΣM_k belongs to W for all morphisms φ : ΣM → S (resp. φ : (ΣM, Σ) → (S, A)), k ≥ 0 and F ⊆ S_k.

Theorem 5.1. The mappings var and psv are mutually inverse lattice isomorphisms between the lattice of pseudovarieties of preclones (resp. pg-pairs) and the lattice of varieties (resp. literal varieties) of tree languages.

Proof. We only prove the theorem for pseudovarieties of pg-pairs and literal varieties of tree languages. It is clear that for each pseudovariety V of finitary pg-pairs, if var(V) = (V_{Σ,k})_{Σ,k}, then each V_{Σ,k} is closed under complementation and contains the languages ∅ and ΣM_k. The closure of V_{Σ,k} under union follows in the standard way from the closure of V under direct product: if L, L′ ⊆ ΣM_k are recognized by morphisms into pg-pairs (S, A) and (S′, A′) in V, then L ∪ L′ is recognized by a morphism into (S, A) × (S′, A′). Thus V_{Σ,k} is closed under the Boolean operations.

We now show that V is closed under quotients. Let L ⊆ ΣM_k be in V_{Σ,k}, let φ : (ΣM, Σ) → (S, A) be a morphism recognizing L, with (S, A) ∈ V and L = φ^{−1}(φ(L)), and let F = φ(L). Let (u, k_1, v, k_2) be an n-ary context, that is, u ∈ ΣM_{k_1+1+k_2}, v ∈ ΣM_{n,ℓ} and k_1 + ℓ + k_2 = k. Now let F′ = {f ∈ S_ℓ | φ(u) · (k_1 ⊕ f ⊕ k_2) ∈ F}. Then for any t ∈ ΣM_ℓ, φ(t) ∈ F′ if and only if φ(u) · (k_1 ⊕ φ(t) ⊕ k_2) ∈ F, if and only if φ(u · (k_1 ⊕ t ⊕ k_2)) ∈ F, if and only if u · (k_1 ⊕ t ⊕ k_2) ∈ L. Thus φ^{−1}(F′) = (u, k_1, k_2)^{−1}L, which is therefore in V_{Σ,ℓ}. Now let F″ = {f ∈ S_n | f · φ(v) ∈ F}. It follows as above that Lv^{−1} = φ^{−1}(F″) and hence Lv^{−1} ∈ V_{Σ,n}.

Before we proceed, let us observe that we just showed the following: if L ⊆ ΣM_k is a recognizable tree language, then for each n ≥ 0 there are only finitely many distinct sets of the form ((u, k_1, k_2)^{−1}L)v^{−1}, where (u, k_1, v, k_2) is an n-ary context of ΣM_k.

Next, let φ : (ΔM, Δ) → (ΣM, Σ) be a morphism of pg-pairs and L ⊆ ΣM_k. If L is recognized by a morphism ψ : (ΣM, Σ) → (S, A), then φ^{−1}(L) is recognized by the composite morphism ψ ∘ φ, and the closure of V under inverse morphisms between free pg-pairs follows immediately. Thus the mapping var does associate with each pseudovariety of pg-pairs a literal variety of tree languages, and it clearly preserves the inclusion order.

Now consider the mapping psv: we first verify that if W is a literal variety of tree languages, then the class psv(W) is a pseudovariety. Recall that, if (S, A) < (T, B), then any language recognized by (S, A) is also recognized by (T, B), so if each language recognized by (T, B) belongs to W, then the same holds for (S, A). Note also that any language recognized by the direct product (S, A) × (T, B) is a finite union of intersections of the form L ∩ M, where L is recognized by (S, A) and M by (T, B); thus psv(W) is closed under binary direct products.

Finally, if (S, A) = lim_n (S^(n), A^(n)) is the finitary inverse limit of an ω-diagram of finitary pg-pairs, then Lemma 4.14 shows that the languages recognized by (S, A) are recognized by almost all of the (S^(n), A^(n)). Thus (S, A) ∈ psv(W), which concludes the proof that psv(W) is a pseudovariety of pg-pairs.

Let W be a literal variety of tree languages, and let V = var(psv(W)). We now show that V = W. Since V consists of all the tree languages recognized by a pg-pair in psv(W), it is clear that V ⊆ W. Now let L ∈ W_{Σ,k}, and let (M_L, A_L) be its syntactic pg-pair. To prove that (M_L, A_L) ∈ psv(W), it suffices to show that if φ : (ΔM, Δ) → (M_L, A_L) is a morphism of pg-pairs and x ∈ M_L, then φ^{−1}(x) ∈ W. Since a morphism of pg-pairs maps generators to generators, up to renaming and identifying letters (which can be done by morphisms between free pg-pairs), we may assume that φ is the syntactic morphism of L. Thus φ^{−1}(x) is an equivalence class [w] of the syntactic congruence of L, and hence

φ^{−1}(x) = ⋂_{w ∈ ((u,k_1,k_2)^{−1}L)v^{−1}} ((u, k_1, k_2)^{−1}L)v^{−1} ∩ ⋂_{w ∉ ((u,k_1,k_2)^{−1}L)v^{−1}} ((u, k_1, k_2)^{−1}L̄)v^{−1},

where L̄ denotes the complement of L. If x has rank n, the intersections in this formula run over the n-ary contexts (u, k_1, v, k_2), and as observed above, these intersections are in fact finite. It follows that φ^{−1}(x) ∈ W. This concludes the verification that V = W, so var ∘ psv is the identity mapping, and in particular var is surjective and psv is injective.
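The two-sided contexts used in the proof above have a word-language analogue, where the syntactic class of w is determined by the set of pairs (u, v) such that uwv ∈ L; a small illustrative sketch (the function names and the length truncation are ours, not from the paper):

```python
from itertools import product as iproduct

def syntactic_classes(L, alphabet, max_len):
    # Group words by their sets of L-contexts (u, v) with u + w + v in L;
    # words and contexts are truncated to length max_len for finiteness.
    words = ["".join(p) for n in range(max_len + 1)
             for p in iproduct(alphabet, repeat=n)]
    def ctx(w):
        return frozenset((u, v) for u in words for v in words
                         if u + w + v in L)
    classes = {}
    for w in words:
        classes.setdefault(ctx(w), []).append(w)
    return list(classes.values())

print(syntactic_classes({"ab"}, "ab", 2))
# [[''], ['a'], ['b'], ['aa', 'ba', 'bb'], ['ab']]
```

For a regular language, taking max_len large enough stabilizes the partition, mirroring the finiteness of the intersections observed in the proof.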
It is clear that both maps var and psv preserve the inclusion order. In order to conclude that they are mutually inverse bijections, it suffices to verify that var is injective. If V and W are pseudovarieties such that var(V) = var(W), then a tree language has its syntactic preclone in V if and only if its syntactic preclone is in W. Thus V and W contain the same syntactic preclones, and it follows from Corollary 4.8 that V = W.

Remark 5.2. Three further variety theorems for finite trees exist in the literature. They differ from the variety theorem proved above in that they use different notions of morphism, quotient and syntactic algebra. The variety theorem in [1,35] is formulated for tree language varieties over a fixed ranked alphabet, and the morphisms are homomorphisms between finitely generated free algebras, whereas the "general variety theorem" of [36] allows for tree languages over different ranked alphabets and a more general notion of morphism, closely related to the morphisms of free pg-pairs. On the other hand, the morphisms in [19] are much more general than those in [1,35,36] or in the present paper: they even include nonlinear tree morphisms, which allow for the duplication of a variable. Another difference is that the tree language varieties in [1,35,36] involve only left quotients, whereas the varieties presented here (and those of [19]) are defined using two-sided quotients. The notion of syntactic algebra is also different in these papers: minimal tree automata in [1,35], a variant of minimal tree automata in [36], minimal clones (or Lawvere theories) in [19], and minimal preclones, or pg-pairs, here. We refer to [19, Section 14] for a more detailed comparative discussion.
As noted above, the abundance of variety theorems for finite trees is due to the fact that there are several reasonable ways of defining morphisms and quotients, and the choice of these notions is reflected in the corresponding notion of syntactic algebra. No variety theorem is known for the 3-sorted algebras proposed in [41].

5.2. Examples of varieties of tree languages

5.2.1. Small examples
As a practice example, we describe the variety of tree languages associated with the pseudovariety ⟨T∃⟩ generated by T∃ (see Section 2.3.2). Let Σ be a finite ranked alphabet and let L ⊆ ΣM_k be a tree language accepted by a preclone in ⟨T∃⟩. Then the syntactic preclone S of L lies in ⟨T∃⟩. Recall that a syntactic preclone is finitely generated and finitely determined: it follows from Proposition 4.16 that S divides a product of a finite number of copies of T∃. By a standard argument, L is therefore a (positive) Boolean combination of languages recognized by a morphism from ΣM to T∃.

Now let φ : ΣM → T∃ be a morphism. As discussed in Section 3.3, a tree language in ΣM recognized by φ is either of the form K_k(Σ′) for some Σ′ ⊆ Σ, or it is the complement of such a language. From there, and using the same reasoning as in the analogous case concerning word languages, one can verify that a language L ⊆ ΣM_k is accepted by a preclone in ⟨T∃⟩ if and only if L is a Boolean combination of languages of the form K_k(Σ′) (Σ′ ⊆ Σ), or equivalently, L is a Boolean combination of languages of the form L_k(Σ′), Σ′ ⊆ Σ, where L_k(Σ′) is the set of all Σ-trees of rank k for which the set of node labels is exactly Σ′.
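The languages L_k(Σ′), consisting of the trees whose set of node labels is exactly Σ′, are easy to test for membership on a toy tree encoding; the encoding and the helper names below are ours, not the paper's formalism:

```python
# A tree is encoded as (label, [subtrees]).
def labels(t):
    lab, children = t
    s = {lab}
    for c in children:
        s |= labels(c)
    return s

def in_L_exact(t, sigma_prime):
    # membership in L(Sigma'): the node-label set of t is exactly Sigma'
    return labels(t) == set(sigma_prime)

t = ("f", [("a", []), ("g", [("a", [])])])
print(labels(t))                       # the set {'f', 'a', 'g'}
print(in_L_exact(t, {"f", "g", "a"}))  # True
print(in_L_exact(t, {"f", "g"}))       # False
```

Boolean combinations of such tests capture exactly the languages described above.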
Similarly, and referring again to Section 3.3 for notation, one can give a description of the variety of tree languages associated with the pseudovariety generated by T_p, or by T_{p,q}, using the languages of the form K_k(∃^r_p) or K_k(∃^r_{p,q}) instead of the K_k(∃).

5.2.2. FO[Succ]-definable tree languages
In a recent paper [3], Benedikt and Ségoufin considered the class of FO[Succ]-definable tree languages. Note that the logical language FO[Succ] does not allow the predicate <, so FO[Succ] is a fragment of FO[<]. We refer the reader to [3] for precise definitions, and we point out here that the characterization established there can be expressed in the framework developed in the present paper. More precisely, the results of Benedikt and Ségoufin establish that FO[Succ]-definable tree languages form a variety of languages, and that the corresponding pseudovariety of preclones consists of the preclones S such that
(1) the semigroup S_1 satisfies x^ℓ = x^{ℓ+1} and exfyezf = ezfyexf for all elements e, f, x, y, z such that e = e² and f = f², where ℓ = |S_1|;
(2) for each x ∈ S_2, e ∈ S_1 such that e = e², and s, t ∈ S_0, we have x · (e·s ⊕ e·t) = x · (e·t ⊕ e·s).
In particular, FO[Succ]-definability is decidable for regular tree languages.

It is argued in [3] that FO[Succ]-definable tree languages are exactly the locally threshold testable languages, for general model-theoretic reasons, but that this fact alone does not directly yield a decision procedure. The result stated above is analogous to the characterization of FO[Succ]-definability for recognizable word languages: more precisely, Condition (1) suffices for languages of words and their syntactic semigroups. Condition (2), which makes sense for trees but not for words, must be added to Condition (1) to characterize FO[Succ]-definability for tree languages.

5.2.3.
Some classes of languages definable in modal logic
Bojańczyk and Walukiewicz also characterized interesting logically defined classes of tree languages [5]. Again, their results are not couched in terms of preclones, but they can conveniently be expressed in this way. These authors consider three fragments of CTL*: TL(EX), TL(EF) and TL(EX + EF). Here EX (resp. EF) denotes the modality whereby a tree t satisfies EXφ (resp. EFφ) if some child of the root (resp. some node properly below the root) of t satisfies φ. The sets of formulas constructed using one or both of these modalities, plus Boolean operations and letter constants, form the logical languages TL(EX), TL(EF) and TL(EX + EF).

Bojańczyk and Walukiewicz first observe that a tree language L is TL(EX)-definable if and only if there exists an integer k such that membership of a tree t in L depends only on the fragment of t consisting of the nodes of depth at most k. They then show that these tree languages form a variety, and that the corresponding pseudovariety of preclones consists of the preclones S such that the semigroup S_1 satisfies ex = e for each idempotent e. Note that this is exactly the same characterization as for languages of finite words [31].

For the characterization of TL(EF)-definable languages, let us first define the following relation ≤ on a preclone S: if s, t ∈ S_n, we say that s ≤ t if s = u · t for some u ∈ S_1. It is easily verified that ≤ is a quasi-order. (The direction of the order is reversed from that used by Bojańczyk and Walukiewicz, to enhance the analogy with the R- and L-orders in semigroup theory.)
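The TL(EX) observation, that membership in L depends only on the fragment of nodes of depth at most k, can be experimented with on a toy tree encoding (label, [subtrees]); the encoding and names are ours:

```python
def truncate(t, k):
    # keep only the fragment of t consisting of nodes of depth at most k
    lab, children = t
    if k == 0:
        return (lab, [])
    return (lab, [truncate(c, k - 1) for c in children])

# For a TL(EX)-definable L with bound k, two trees with equal depth-k
# fragments are indistinguishable: truncate(t, k) == truncate(s, k)
# implies (t in L) == (s in L).
t = ("f", [("a", [("b", [])]), ("g", [])])
print(truncate(t, 1))  # ('f', [('a', []), ('g', [])])
```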
Now let (S, A) be the syntactic pg-pair of a tree language L ⊆ ΣM_0. Then L is TL(EF)-definable if and only if
• S_1 satisfies the pseudoidentity v(uv)^ω = (uv)^ω (where x^ω designates the unique idempotent which is a power of x); that is, S_1 is L-trivial, or equivalently, the relation ≤ is an order relation;
• a · (s_1 ⊕ ··· ⊕ s_n) = a · (s_σ(1) ⊕ ··· ⊕ s_σ(n)) for each a ∈ A_n and s_1, ..., s_n ∈ S_0, and for each permutation σ of [n];
• a · (s_1 ⊕ s_2 ⊕ s_3 ⊕ ··· ⊕ s_n) = a · (s_2 ⊕ s_2 ⊕ s_3 ⊕ ··· ⊕ s_n) for each a ∈ A_n and s_1, ..., s_n ∈ S_0 such that s_2 ≤ s_1;
• if b, c ∈ A_p and y ∈ S_{p,0} are such that, for each d ∈ A_p, we have d · (b·y ⊕ ··· ⊕ b·y) = d · y = d · (c·y ⊕ ··· ⊕ c·y), then a · (z ⊕ b·y) = a · (z ⊕ c·y) for each a ∈ A_n and z ∈ S_{n−1,0}.
This characterization directly implies the decidability of TL(EF)-definability.

Bojańczyk and Walukiewicz also give an interesting characterization of the TL(EF)-definable languages in terms of so-called type dependency. In particular, they show that a tree language is TL(EF)-definable if and only if its syntactic preclone S is such that, whenever a is the syntactic equivalence class of a letter in Σ_n, and the t_i are syntactic equivalence classes of trees in ΣM_0, then the value of the product a · (t_1 ⊕ ··· ⊕ t_n) depends only on a and on the set {t | t_i ≤ t for some 1 ≤ i ≤ n}.

The characterization of TL(EX + EF)-definable languages given in [5] can also be restated in similar, albeit more complex, terms.

5.2.4. FO[<]-definable tree languages
The characterization and decidability of FO[<]-definable regular tree languages is an open problem that has attracted some efforts over the years, as discussed in the introduction. We obtained an algebraic characterization of FO[<]-definable regular tree languages in terms of pseudovarieties of preclones, as reported in [20]. A detailed proof of this result will appear in [21], and the present paper lays the foundations for that proof.
Let us note here that this characterization is analogous to the characterization of FO[<]-definable languages of finite words in the following sense: it is established in [21] that FO[<]-definable tree languages form a variety of tree languages, whose associated pseudovariety of preclones is the least pseudovariety containing the preclone T∃ and closed under a suitable notion of block product. It was pointed out in Example 2.5 that the rank 1 elements of T∃ form the 2-element monoid U_1 = {1, 0}, and it is a classical result of language theory that the least pseudovariety of monoids containing U_1 and closed under block product is associated with the variety of FO[<]-definable word languages [37]. It is also known that, in the word case, this pseudovariety is exactly that of aperiodic monoids, and membership in it is decidable, which shows that FO[<]-definability is decidable for recognizable word languages. At the moment, we do not have an analogue of this result, and we do not know whether FO[<]-definability is decidable for regular tree languages.

Our result [20,21] actually applies to a larger class of logically defined regular tree languages, based on the use of Lindström quantifiers. First-order logic is thus a particular case of our result, which also yields (for instance) an algebraic characterization of first-order logic with modular quantifiers added.
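In the word case just mentioned, aperiodicity is decidable by direct inspection of a finite monoid's multiplication table, since a finite monoid M is aperiodic if and only if x^n = x^{n+1} for n = |M|; a small sketch, with the table encoding ours:

```python
def is_aperiodic(elements, mul):
    # mul[x][y] is the product xy; test x^n == x^{n+1} with n = len(elements)
    n = len(elements)
    for x in elements:
        p = x
        for _ in range(n - 1):
            p = mul[p][x]      # after the loop, p = x^n
        if mul[p][x] != p:     # x^{n+1} != x^n: a nontrivial group hides in M
            return False
    return True

# U1 = {1, 0}, as in the text: aperiodic.
u1 = {"1": {"1": "1", "0": "0"}, "0": {"1": "0", "0": "0"}}
print(is_aperiodic(["1", "0"], u1))  # True

# The two-element group Z/2Z is not.
z2 = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
print(is_aperiodic([0, 1], z2))      # False
```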
References

[1] J. Almeida, On pseudovarieties, varieties of languages, filters of congruences, pseudoidentities and related topics, Algebra Univ. 27 (1990) 333–350.
[2] A. Arnold, M. Dauchet, Théorie des magmoïdes I and II, RAIRO Theoret. Inform. Appl. 12 (1978) 235–257; 13 (1979) 135–154 (in French).
[3] M. Benedikt, L. Ségoufin, Regular tree languages definable in FO, in: V. Diekert, B. Durand (Eds.), STACS 2005, Lecture Notes in Computer Science, Vol. 3404, Springer, Berlin, 2005, pp. 327–339.
[4] S.L. Bloom, Z. Ésik, Iteration Theories, Springer, Berlin, 1993.
[5] M. Bojańczyk, I. Walukiewicz, Characterizing EF and EX tree logics, in: P. Gardner, N. Yoshida (Eds.), CONCUR 2004, Lecture Notes in Computer Science, Vol. 3170, Springer, Berlin, 2004.
[6] J.R. Büchi, Weak second-order arithmetic and finite automata, Z. Math. Logik Grundlagen Math. 6 (1960) 66–92.
[7] J. Cohen, J.-E. Pin, D. Perrin, On the expressive power of temporal logic, J. Comput. System Sci. 46 (1993) 271–294.
[8] H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, M. Tommasi, Tree Automata Techniques and Applications, available at http://www.grappa.univ-lille3.fr/tata (release October 2002).
[9] B. Courcelle, The monadic second-order logic of graphs. I. Recognizable sets of finite graphs, Inform. and Comput. 85 (1990) 12–75.
[10] B. Courcelle, Basic notions of universal algebra for language theory and graph grammars, Theoret. Comput. Sci. 163 (1996) 1–54.
[11] B. Courcelle, The expression of graph properties and graph transformations in monadic second order logic, in: G. Rozenberg (Ed.), Handbook of Graph Grammars and Computing by Graph Transformations, Vol. 1, World Scientific, Singapore, 1997, pp. 313–400.
[12] B. Courcelle, P. Weil, The recognizability of sets of graphs is a robust property, Theoret. Comput. Sci., to appear.
[13] K. Denecke, S.L. Wismath, Universal Algebra and Applications in Theoretical Computer Science, Chapman & Hall, New York, 2002.
[14] V. Diekert, Combinatorics on Traces, Lecture Notes in Computer Science, Vol. 454, Springer, Berlin, 1990.
[15] J. Doner, Tree acceptors and some of their applications, J. Comput. System Sci. 4 (1970) 406–451.
[16] S. Eilenberg, Automata, Languages, and Machines, Vols. A and B, Academic Press, New York, 1974, 1976.
[17] S. Eilenberg, J.B. Wright, Automata in general algebras, Inform. and Control 11 (1967) 452–470.
[18] C.C. Elgot, Decision problems of finite automata design and related arithmetics, Trans. Amer. Math. Soc. 98 (1961) 21–51.
[19] Z. Ésik, A variety theorem for trees and theories, Publ. Math. 54 (1999) 711–762.
[20] Z. Ésik, P. Weil, On logically defined recognizable tree languages, in: P.K. Pandya, J. Radhakrishnan (Eds.), Proc. FST TCS 2003, Lecture Notes in Computer Science, Vol. 2914, Springer, Berlin, 2003, pp. 195–207.
[21] Z. Ésik, P. Weil, Algebraic characterization of logically defined tree languages, in preparation.
[22] D.M. Gabbay, A. Pnueli, S. Shelah, J. Stavi, On the temporal analysis of fairness, in: Proc. 12th ACM Symp. Principles of Programming Languages, Las Vegas, 1980, pp. 163–173.
[23] F. Gécseg, M. Steinby, Tree Automata, Akadémiai Kiadó, Budapest, 1984.
[24] F. Gécseg, M. Steinby, Tree languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 3, Springer, Berlin, 1997, pp. 1–68.
[25] G. Grätzer, Universal Algebra, Springer, Berlin, 1979.
[26] U. Heuter, First-order properties of trees, star-free expressions, and aperiodicity, in: R. Cori, M. Wirsing (Eds.), STACS 88, Lecture Notes in Computer Science, Vol. 294, Springer, Berlin, 1988, pp. 136–148.
[27] J.A. Kamp, Tense logic and the theory of linear order, Ph.D. Thesis, UCLA, 1968.
[28] S. MacLane, Categories for the Working Mathematician, Springer, Berlin, 1971.
[29] R. McNaughton, S. Papert, Counter-Free Automata, MIT Press, Cambridge, MA, 1971.
[30] J. Mezei, J.B. Wright, Algebraic automata and context-free sets, Inform. and Control 11 (1967) 3–29.
[31] J.-E. Pin, Variétés de langages formels, Masson, Paris, 1984 (English translation: Varieties of Formal Languages, Plenum, New York, 1986).
[32] A. Potthoff, Modulo counting quantifiers over finite trees, in: J.-C. Raoult (Ed.), CAAP '92, Lecture Notes in Computer Science, Vol. 581, Springer, Berlin, 1992, pp. 265–278.
[33] A. Potthoff, First order logic on finite trees, in: P.D. Mosses, M. Nielsen, M.I. Schwartzbach (Eds.), TAPSOFT '95, Lecture Notes in Computer Science, Vol. 915, Springer, Berlin, 1995, pp. 125–139.
[34] M.P. Schützenberger, On finite monoids having only trivial subgroups, Inform. and Control 8 (1965) 190–194.
[35] M. Steinby, A theory of tree language varieties, in: M. Nivat, A. Podelski (Eds.), Tree Automata and Languages, North-Holland, Amsterdam, 1992, pp. 57–81.
[36] M. Steinby, General varieties of tree languages, Theoret. Comput. Sci. 205 (1998) 1–43.
[37] H. Straubing, Finite Automata, Formal Logic, and Circuit Complexity, Birkhäuser, Boston, MA, 1994.
[38] J.W. Thatcher, J.B. Wright, Generalized finite automata theory with an application to a decision problem of second-order logic, Math. Systems Theory 2 (1968) 57–81.
[39] W. Wechler, Universal Algebra, EATCS Monographs on Theoretical Computer Science, Vol. 10, Springer, Berlin, 1992.
[40] P. Weil, Algebraic recognizability of languages, in: J. Fiala, V. Koubek, J. Kratochvíl (Eds.), MFCS 2004, Lecture Notes in Computer Science, Vol. 3153, Springer, Berlin, 2004, pp. 149–175.
[41] Th. Wilke, An algebraic characterization of frontier testable tree languages, Theoret. Comput. Sci. 154 (1996) 85–106.
Theoretical Computer Science 340 (2005) 322 – 333 www.elsevier.com/locate/tcs
Commutation with codes

Juhani Karhumäki^a,∗, Michel Latteux^b, Ion Petre^c

a Department of Mathematics, University of Turku and Turku Centre for Computer Science, Turku 20014, Finland
b LIFL, URA CNRS 369, Université des Sciences et Technologies de Lille, F-59655 Villeneuve d'Ascq, France
c Department of Computer Science, Åbo Akademi University and Turku Centre for Computer Science, Turku 20520, Finland
Abstract

The centralizer of a set of words X is the largest set of words C(X) commuting with X: XC(X) = C(X)X. It has been a long-standing open question, due to Conway [J.H. Conway, Regular Algebra and Finite Machines, Chapman & Hall, London, 1971], whether the centralizer of any rational set is rational. While the answer turned out to be negative in general, see [M. Kunc, Proc. of ICALP 2004, Lecture Notes in Computer Science, Vol. 3142, Springer, Berlin, 2004, pp. 870–881], we prove here that the situation is different for codes: the centralizer of any rational code is rational and, if the code is finite, then the centralizer is finitely generated. This result has previously been proved only for binary and ternary sets of words, in a series of papers by the authors, and for prefix codes, in an ingenious paper by Ratoandromanana [B. Ratoandromanana, RAIRO Inform. Theor. 23(4) (1989) 425–444]; many of the techniques we use in this paper follow her ideas. We also give in this paper an elementary proof for the prefix case.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Codes; Commutation; Centralizer; Conway's problem; Prefix codes
1. Introduction

The centralizer of a set of words X is the largest set of words C(X) commuting with X: XC(X) = C(X)X. It is easy to see that the centralizer is well-defined for any language X; indeed, C(X) is the union of all languages commuting with X. It is important to note that for any language X, X* ⊆ C(X) and C(X) is a monoid. Conway raised the following problem
related to centralizers, see [8, p. 55] (note that Conway uses the term "normalizer"), more than thirty years ago:

Conway's Problem. Is it true that the centralizer of any rational language is rational?

This problem has recently received much attention. In a series of papers by the authors and others, see [5,10,11,13–16,22,23,26], it has been proved that the problem has a positive answer for sets with at most three words and for rational prefix codes. It has also been proved in [14] that the centralizer of any recursive language is co-RE. However, it has recently been proved in a breakthrough paper [18], see also [17] for related issues, that Conway's problem has a negative answer in general: there are finite languages with non-RE centralizer. The surprising power of finite sets of words is also shown in a related result of [12], showing that the equivalence problem for finite substitutions on ab*c is undecidable!

Ratoandromanana raised a related question in [23] concerning the commutation with codes. In a paper displaying an impressive array of technical results related to codes, she proved that the commutation with prefix codes can be characterized as in free monoids: if X is a prefix code, then for any language L commuting with X, L = ρ(X)^I for some I ⊆ ℕ, where ρ(X) is the primitive root of X. In particular, this implies that the centralizer of any prefix code X is ρ(X)* and thus, Conway's problem has a positive answer for rational prefix codes. Two conjectures are stated in [23]:

Conjecture 1 (Ratoandromanana [23]). Two codes commute if and only if they have a common root.

Conjecture 2 (Ratoandromanana [23]). Any code has a unique primitive root.

These two conjectures, which have remained open until now, provide evidence that the commutation with codes has very special properties, and in particular that Conway's problem may have a positive answer for codes. We prove in this paper that this is indeed the case:

Theorem 1. The centralizer of any rational code is rational.
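For finite languages, the commutation XY = YX underlying these statements can be tested directly by comparing the two concatenation products; a small sketch (the example sets are ours):

```python
def product(X, Y):
    # concatenation product of two languages
    return {x + y for x in X for y in Y}

def commutes(X, Y):
    return product(X, Y) == product(Y, X)

# Two sets of powers of the primitive word ab commute:
print(commutes({"ab", "abab"}, {"ab", "ababab"}))  # True
print(commutes({"a", "b"}, {"ab"}))                # False
```

Of course, no such finite check settles commutation with an infinite set like the centralizer; that is what the structural results of the paper are for.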
We also prove that the centralizer of any finite code is finitely generated. It is worth mentioning that throughout the paper we essentially use the techniques of [23], at times refined and extended from prefix codes to arbitrary codes. We also give in Section 4 an elementary proof of Ratoandromanana's result [23] that C(X) = ρ(X)*, for any prefix code X.

2. Definitions

For basic notions and results of combinatorics on words we refer to [3,19,20], and for those of the theory of codes to [2]. For details on the notion of centralizer and the commutation of languages we refer to [14,15,22]. In the sequel, Σ denotes a finite alphabet, Σ* the set of all finite words over Σ and Σ^ω the set of all (right) infinite words over Σ. We denote by 1 the empty word and by |u| the
length of u ∈ Σ*. For a word u ∈ Σ*, u^ω denotes the infinite word uuu..., while for a language L ⊆ Σ*, L^ω = {u_1u_2u_3... | u_n ∈ L, n ≥ 1} ⊆ Σ^ω. For a language L ⊆ Σ*, we denote by l(L) the length of a shortest word in L and we let L_min = {u ∈ L | |u| = l(L)}.

We say that a word u is a prefix of a word v, denoted u ≤ v, if v = uw for some w ∈ Σ*. We say that u and v are prefix comparable if either u ≤ v or v ≤ u. A language L is called a prefix code if no two distinct words of L are prefix comparable. The following result is well known.

Lemma 2 (Berstel and Perrin [2], Perrin [21]). The set of prefix codes forms a free semigroup. In particular, any prefix code has a unique primitive root.

For a word u and a language L, we say that v_1 ... v_n is an L-factorization of u if u = v_1 ... v_n and v_i ∈ L for all 1 ≤ i ≤ n. For an infinite word α, we say that v_1v_2...v_n... is an L-factorization of α if α = v_1v_2...v_n... and v_i ∈ L for all i ≥ 1. A relation over L is an equality u_1 ... u_m = v_1 ... v_n, with u_i, v_j ∈ L for all 1 ≤ i ≤ m, 1 ≤ j ≤ n; the relation is trivial if m = n and u_i = v_i for all 1 ≤ i ≤ m. We say that L is a code if any word of Σ* has at most one L-factorization. Equivalently, L is a code if and only if all relations over L are trivial. The following simple result is often useful in our considerations.

Lemma 3. For any language L ⊆ Σ⁺ and any u ∈ L, z ∈ C(L), we have (zu)^ω ∈ L^ω.

Proof. Let z_1 = z, u_1 = u and, for all n ≥ 1, define z_{n+1} ∈ C(L) and u_{n+1} ∈ L such that z_nu_n = u_{n+1}z_{n+1}. Then, by induction on n, it follows that (z_1u_1)^n = u_2u_3...u_nu_{n+1}z_{n+1}z_n...z_2 for all n ≥ 1, and so (zu)^ω = u_2u_3...u_n... ∈ L^ω. Indeed, since 1 ∉ L, the two infinite words have arbitrarily long common prefixes, and so they coincide.

3. Preliminary results

We prove in this section several results related to the commutation of arbitrary sets of words. We will use these results in the following sections, when we discuss the commutation with codes and prefix codes.
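The code and prefix-code properties defined above are decidable for finite sets; a sketch using the classical Sardinas–Patterson procedure (standard in the theory of codes [2], though not part of this paper's arguments):

```python
def lq(A, B):
    # left quotient A^{-1}B = {w | a w in B for some a in A}
    return {b[len(a):] for a in A for b in B if b.startswith(a)}

def is_prefix_code(L):
    return not any(u != v and v.startswith(u) for u in L for v in L)

def is_code(L):
    # Sardinas-Patterson: L is a code iff the empty word never occurs in
    # the dangling-suffix sets U_1 = L^{-1}L \ {1}, U_{i+1} = L^{-1}U_i + U_i^{-1}L.
    U = lq(L, L) - {""}
    seen = set()
    while U and frozenset(U) not in seen:
        if "" in U:
            return False
        seen.add(frozenset(U))
        U = lq(L, U) | lq(U, L)
    return "" not in U

print(is_prefix_code({"ab", "ba"}))  # True
print(is_code({"ab", "ba"}))         # True
print(is_code({"a", "ab", "ba"}))    # False: a.ba = ab.a
```

The procedure terminates because every U_i consists of suffixes of words of L, of which there are finitely many.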
For any sets R ⊆ Σ∗, S ⊆ Σ∗ × Σ∗ and any nonnegative integer n ∈ N, we denote by R≤n, S≤n the sets

R≤n = {u ∈ R | |u| ≤ n},  S≤n = {(u, v) ∈ S | |uv| ≤ n}.

We also denote by Rn, Sn the sets

Rn = {u ∈ R | |u| = n},  Sn = {(u, v) ∈ S | |uv| = n}.
For two sets of words X, Y ⊆ Σ∗, we say that the product XY is unambiguous if x1y1 = x2y2 implies x1 = x2 and y1 = y2, for any x1, x2 ∈ X and y1, y2 ∈ Y.

Lemma 4. Let A, B be some subsets of Σ+ such that the product AB is unambiguous. Then
(i) If AB ⊆ BA, then AB = BA and the product BA is unambiguous.
(ii) For any n ≥ 1, if (AB)
k > 0 and α ∈ X+ such that (xy)^k α ∈ X+.

Proof. It follows from Lemma 2 of [23] that there exists α ∈ X+ such that (yx)^k α ∈ X+, for some k ≥ 1. But then (xy)^k xα = x(yx)^k α ∈ X+.

The following are results of Ratoandromanana [23] that we will use often in our considerations.
Lemma 8 (Ratoandromanana [23], Lemma 3). For any code X and any language Y such that YX ⊆ XY or XY ⊆ YX, if X ∩ Y ≠ ∅, then X ⊆ Y.

Lemma 9 (Ratoandromanana [23], Proposition 7). Two codes X, Y commute if and only if there are positive integers m, n such that X^m = Y^n.

Lemma 10 (Ratoandromanana [23], Lemma 10). For any code X consider the set 𝒞(X) = {Y | Y is a code commuting with X}. Then 𝒞(X) is a commutative stable semigroup. In particular, for any two codes Y, Z commuting with X, YZ is a code and YZ = ZY.

4. The commutation with prefix codes

We characterize in this section the commutation with prefix codes, proving that for any prefix code X, C(X) = ρ(X)∗ and LX = XL implies L = ρ(X)^I, where ρ(X) is the primitive root of X and I ⊆ N. These results were originally proved in Ratoandromanana [23] using ingenious combinatorial techniques on words and prefix codes. Following the ideas in [23], we give here simpler proofs of those results. There are two crucial ingredients in our proof. First, we observe that the products LX and XL are unambiguous for any language L commuting with X. Second, we prove that for any such L, there is a prefix code P(L) ⊆ L that commutes with X, thus being able to exploit the fact that the set of prefix codes forms a free semigroup. We prove several lemmata first.

Lemma 11. For any prefix code X and any language L commuting with X, both LX and XL are unambiguous.

Proof. The result follows from Lemma 4 and the fact that XL is necessarily unambiguous since X is a prefix code.

For a set of words A over the alphabet Σ, let Com(A) = {L ⊆ Σ∗ | LA = AL} and P(A) = A \ AΣ+. Note that P(A) is a prefix code for any A and if A ≠ ∅, then P(A) ≠ ∅. Indeed, Amin ⊆ P(A).

Lemma 12. For any prefix code X, if L ∈ Com(X), then P(L) ∈ Com(X).

Proof. If P(L)X ⊆ XP(L), then we are done by Lemma 4. So, let us assume the contrary and let lx be a shortest word in P(L)X \ XP(L), with l ∈ P(L), x ∈ X, and let n = |lx|. Then (P(L)X)
The following result is proved in [23] in the case of codes, using some involved arguments and results. For the sake of completeness, we give here a simple proof in the case of prefix codes, which are the focus of this section. The techniques used here are essentially those of [23].

Lemma 13 (Ratoandromanana [23], Lemma 17). For any prefix code X and any language L, if X^i L = LX^i, for some nonnegative integer i, then X^i(L \ X∗) = (L \ X∗)X^i.

Proof. Let L1 = L ∩ X∗, L2 = L \ X∗. If X^i L = LX^i, then X^i L1 ∪ X^i L2 = L1 X^i ∪ L2 X^i. Let us assume that X^i L2 ∩ L1 X^i ≠ ∅. Then there are x1, x2 ∈ X^i and l1 ∈ L1, l2 ∈ L2 such that x2l2 = l1x1. Thus, x2l2 ∈ X∗ and, since X is a prefix code, l2 ∈ X∗, a contradiction. Thus, X^i L2 ⊆ L2 X^i. Since X^i L2 is unambiguous, it follows by Lemma 4 that X^i L2 = L2 X^i.

We are now ready to characterize the centralizer of a prefix code. Based on this characterization we then answer Conway's problem and characterize the commutation with prefix codes.

Theorem 14. Let X be a prefix code, ρ(X) its primitive root, and C(X) its centralizer. Then C(X) = ρ(X)∗.

Proof. Assume that C(X) ≠ ρ(X)∗. Then, by Lemma 13, the language L = C(X) \ ρ(X)∗ ≠ ∅ commutes with X and so, by Lemma 12, P(L) is a prefix code commuting with X. Thus, P(L) = ρ(X)^t, for some nonnegative integer t. This is a contradiction since P(L) ⊆ L and L ∩ ρ(X)∗ = ∅.

Corollary 15. For any prefix code X, if the set of words L commutes with X, then L = ∪_{i∈I} ρ(X)^i, for some I ⊆ N.

Proof. To prove the claim of the corollary, it is enough to prove that for any n ≥ 0, if L ∩ ρ(X)^n ≠ ∅, then ρ(X)^n ⊆ L. This follows from [23, Lemma 18], but for the sake of completeness, we include a short proof here. Let u1, …, un ∈ ρ(X) be such that u1…un ∈ L and let α1, …, αn be arbitrary elements of ρ(X). Let also X = ρ(X)^k, k ≥ 1. Then, since X^n L = LX^n and (α1…αn)^k ∈ ρ(X)^{nk} = X^n, it follows that u1…un(α1…αn)^k ∈ LX^n = X^n L = ρ(X)^{kn} L.
Since L ⊆ ρ(X)∗ and ρ(X) is a prefix code, this can only lead to a trivial ρ(X)-relation, i.e., α1…αn ∈ L. Thus, ρ(X)^n ⊆ L, proving the claim.

Corollary 16. Conway's problem has an affirmative answer for rational prefix codes: for any rational prefix code X, both ρ(X) and C(X) are rational and C(X) = ρ(X)∗.

Proof. It is not difficult, see, e.g., [4] or [24], to prove that for any rational language R such that R = R0^n, for some language R0 and some positive integer n, there is a rational language
R1 such that R0 ⊆ R1 and R = R1^n. Using this observation it follows that ρ(X), and thus also C(X), must be rational.
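Theorem 14 can be probed on finite approximations: for a prefix code X with primitive root P, the products X·P∗ and P∗·X must agree on every word up to a fixed length. The sketch below (ours, not from the paper) uses the toy prefix code P = {a, ba}, assumed primitive, and X = P².

```python
from itertools import product

P = {"a", "ba"}                            # a prefix code, assumed primitive
X = {u + v for u, v in product(P, P)}      # X = P^2, so the candidate centralizer is P*

def star_upto(L, n):
    # all concatenations of words of L of total length <= n (a truncation of L*)
    out, frontier = {""}, {""}
    while frontier:
        frontier = {w + v for w in frontier for v in L if len(w + v) <= n}
        out |= frontier
    return out

n = 10
C = star_upto(P, n)                        # truncated candidate centralizer rho(X)*
left = {x + u for x in X for u in C if len(x + u) <= n}
right = {u + x for u in C for x in X if len(u + x) <= n}
assert left == right                       # X·C and C·X agree on words up to length n
```

This is of course only a length-bounded consistency check, not a proof; the theorem says the equality persists for all lengths and that no larger commuting set exists.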
5. The commutation with codes

We describe in this section the form of the centralizer of any code. In particular, we prove that the centralizer of any rational code is rational, thus giving a positive answer to Conway's problem in the case of codes. It also follows that the centralizer of any finite code is finitely generated. One of the crucial ingredients in our proof is that for any code X and any language L commuting with X, the products LX and XL are unambiguous.

Theorem 17. For any code X, the products XC(X) and C(X)X are unambiguous.

Proof. Assume that XC(X) is ambiguous, i.e., there are x, y ∈ X, u, v ∈ C(X) such that xu = yv and x ≠ y. By Lemma 7, there exist k > 0 and α ∈ X+ such that z = (xu)^k α ∈ X+. Then z^ω = ((xu)^k α)^ω = (x(ux)^{k−1}uα)^ω = x((ux)^{k−1}uαx)^ω = x(wx)^ω, where w = (ux)^{k−1}uα ∈ C(X). As it is easy to see, for any β ∈ C(X) and any t ∈ X, (βt)^ω ∈ X^ω and so, (wx)^ω ∈ X^ω. Consequently, z^ω ∈ xX^ω. Analogously, since xu = yv, z^ω = ((yv)^k α)^ω ∈ yX^ω and so, z^ω has two different X-factorizations. It is not difficult now to see that this leads to a contradiction. For the sake of completeness, we give here a simple argument on how to conclude it, but note that the same follows also from a result of [9] stating that X is a code if and only if for any z ∈ X+, z^ω has exactly one X-factorization. Assume that there is a word z ∈ X+ such that z^ω has a second X-factorization z^ω = β1β2…, βi ∈ X, different from the one obtained by repeating an X-factorization of z. By the pigeonhole principle, it follows that there are i < j such that β1…βi = z^{ni}π and β1…βj = z^{nj}π, for some nonnegative integers ni < nj and a proper prefix π of z. It is easy to see then that β1…βj = z^{nj−ni}β1…βi, a contradiction since X is a code.

Corollary 18. For any code X and any language L commuting with X, the products LX and XL are unambiguous.

Proof. If XL were ambiguous, then necessarily XC(X) would be ambiguous since L ⊆ C(X).

Lemma 19. Let X be a code, n a positive integer, and L a language commuting with X^n, with l(L) = l(X). Then X ⊆ L.

Proof.
Clearly, Lmin and Xmin are two commuting prefix codes and since l(L) = l(X), it follows that Lmin = Xmin .
Since X^n is a code, the product X^n L is unambiguous by Corollary 18. Assume now that there exists a word x ∈ X \ L. Let us consider u = xs^n with s ∈ Lmin = Xmin. Then u = (xs^{n−1})s ∈ X^n L = LX^n. Since x ∉ L, u ∈ (L \ X∗)X^n = X^n(L \ X∗). This is a contradiction since u = xs^n ∈ X^n(L ∩ X).

The following result was proved in [23, Lemma 24] for prefix codes. We extend it here to arbitrary codes, using essentially the techniques in [23].

Lemma 20. Let X be a code and L a language commuting with X. If l(X) = k·l(L), for some k > 1, then there exists a code Y such that X = Y^k.

Proof. Clearly, Xmin Lmin = Lmin Xmin and since l(X) = k·l(L), it follows that Xmin = (Lmin)^k. Thus, X ∩ L^k ≠ ∅ and it follows from Lemma 8 that X ⊆ L^k. Let l0 ∈ Lmin and Y = {y ∈ L | l0^{k−1}y, y l0^{k−1} ∈ X}. We prove that Y is a code and X = Y^k. Clearly, Y ≠ ∅; e.g., Lmin ⊆ Y.

Claim 1. If x = l1…lk ∈ X, with li ∈ L, then l2…lk l0 ∈ X and l0 l1…lk−1 ∈ X.

Proof of Claim 1. We have u = l2…lk(l0)^k ∈ L^{k−1}X = XL^{k−1}, so u = wy, with w ∈ X and y ∈ L^{k−1}. Note that x(l0)^k = l1u = l1wy. Since l1w ∈ LX = XL, we deduce that l1w = x′l′, for some x′ ∈ X, l′ ∈ L. Consequently, x(l0)^k = x′(l′y), with x, x′ ∈ X and (l0)^k, l′y ∈ L^k. Since XL^k is unambiguous by Corollary 18, it follows that x = x′ and (l0)^k = l′y. Now, l0 ∈ Lmin and so, l′ = l0 and y = (l0)^{k−1}. Then, since l2…lk(l0)^k = wy, it follows that w = l2…lk l0 ∈ X. The second part of Claim 1 is proved analogously.

Using Claim 1, we can easily deduce Claim 2.

Claim 2. If x = l1…lk ∈ X, with li ∈ L, then for any i ∈ {1, …, k}, li l0^{k−1} ∈ X and l0^{k−1} li ∈ X.

Claim 3. Any word x ∈ X ⊆ L^k has a unique L-factorization in L^k.

Proof of Claim 3. Assume that x = l1l2…lk = l1′l2′…lk′ ∈ X, with li, li′ ∈ L, for all i = 1, 2, …, k. Then ((l0)^{k−1}l1)(l2…lk) = ((l0)^{k−1}l1′)(l2′…lk′), with (l0)^{k−1}l1, (l0)^{k−1}l1′ ∈ X according to Claim 2.
Since XL^{k−1} is unambiguous, we obtain that (l0)^{k−1}l1 = (l0)^{k−1}l1′ and so l1 = l1′. Then, according to Claim 1, l2…lk l0 = l2′…lk′ l0 ∈ X, etc.

Claim 4. If y ∈ Y and x = l1l2…lk ∈ X, with li ∈ L, then l2…lk y ∈ X.

Proof of Claim 4. Since Y ⊆ L, xy ∈ XL = LX and so, xy = l1′x′, with x′ ∈ X and l1′ ∈ L. We will prove that l1′ = l1; then x′ = l2…lk y ∈ X, proving the claim. Clearly, x′l0^{k−1} ∈ XL^{k−1} = L^{k−1}X and so, x′l0^{k−1} = l2′′…lk′′x′′, with x′′ ∈ X and li′′ ∈ L. It follows from the definition of Y that y l0^{k−1} ∈ X. Consequently, (l1l2…lk)(y l0^{k−1}) = xy l0^{k−1} = l1′x′l0^{k−1} = (l1′l2′′…lk′′)x′′.
Since L^kX is unambiguous by Corollary 18, it follows that l1l2…lk = l1′l2′′…lk′′ and y l0^{k−1} = x′′. Now, l1′l2′′…lk′′ = x ∈ X and it follows by Claim 3 that l1 = l1′, concluding the proof of Claim 4.

We can now prove that X ⊆ Y^k. For this, let x ∈ X. As observed in the beginning of the proof, X ⊆ L^k and so, x = l1…lk, with li ∈ L. From Claim 2 it follows that li l0^{k−1}, l0^{k−1}li ∈ X for all i = 1, 2, …, k and so, li ∈ Y, for all i. Consequently, X ⊆ Y^k. For the reverse inclusion, consider y1, …, yk ∈ Y and x = l1…lk ∈ X, with li ∈ L. It follows from Claim 4 by induction that li…lk y1…yi−1 ∈ X, for all i = 2, 3, …, k; one more application of Claim 4 then gives y1…yk ∈ X, i.e., Y^k ⊆ X. It follows then by Claim 3 that X = Y^k. It also follows that Y is a code, concluding the proof.

Lemma 21. Let X be a code and L ⊆ Σ+ be a language commuting with X. Then there exists a code Y commuting with X such that Lmin = Ymin and Y ⊆ L. Moreover, if X is rational, then Y is rational.

Proof. Set t = l(X) and s = l(L). Since LX^s = X^sL, X^s is a code and l(X^s) = t·l(L), it follows from Lemma 20 that there exists a code Y such that X^s = Y^t. Then LY^t = Y^tL, with l(Y) = l(L), and so Lmin = Ymin ⊆ Y, implying by Lemma 19 that Y ⊆ L. Moreover, from Lemma 9 we also obtain that Y commutes with X. Observe now that if X is a rational code, then X^s, and so Y^t, is a rational code. It follows then that Y is rational.

The following result describes the form of all monoids commuting with a given code.

Theorem 22. For any code X and any monoid M commuting with X, there exist codes C1, …, Ck commuting with X such that M = (C1 ∪ ⋯ ∪ Ck)∗. Moreover, if X is rational, then M is rational.

Proof. Let M0 = M \ {1}. It is a result of [23] ([23, Lemma 4]) that M0X = XM0. Thus, by Lemma 21, there exists a code C1 ⊆ M0 commuting with X with (C1)min = (M0)min. Let B1 = C1. For all i ≥ 1 consider Bi = C1 ∪ ⋯ ∪ Ci ⊆ M0 and Mi = M \ Bi∗.
Since M is a monoid, Bi∗ ⊆ M and so, by Lemma 5, we have that MiX = XMi. If Mi ≠ ∅, then by Lemma 21 there exists a code Ci+1 ⊆ Mi commuting with X such that (Ci+1)min = (Mi)min. Assume that for all j ≥ 1, Mj ≠ ∅, and set d = gcd{l(Cj) | j ≥ 1}. Then d = gcd{l(C1), l(C2), …, l(Cn)}, for some n ≥ 1. Clearly, by construction, l(Cp) < l(Cp+1), for all p ≥ 1. Thus, there is h > n such that l(Ch) = t1l(C1) + ⋯ + tnl(Cn), for some nonnegative integers t1, …, tn (since l(Ch) is a multiple of d and every sufficiently large multiple of d lies in the numerical semigroup generated by l(C1), …, l(Cn)). Let us consider Y = (C1)^{t1} … (Cn)^{tn}. From Lemma 10 it follows that Y is a code commuting with Ch. Since l(Ch) = l(Y), we get that Ymin = (Ch)min, hence Ch ∩ Y ≠ ∅. Consequently, Ch = Y ⊆ Bn∗, a contradiction since Ch ⊆ Mn = M \ Bn∗. Now let k be the least integer such that Mk = ∅. Then M = (C1 ∪ ⋯ ∪ Ck)∗. The second part of the claim follows from Lemma 21: C1, …, Ck are rational and so, M is rational.

The main result of this paper follows now as a simple consequence of Theorem 22, since the centralizer of any language is a monoid.
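Lemma 20 asserts the existence of a k-th root Y with X = Y^k; for tiny finite codes, such a root can also be found by an exhaustive search over subsets of prefixes of the words of X (a naive sketch of ours, exponential in general, nothing like the constructive proof above; Y below is a toy two-word code).

```python
from itertools import combinations, product

def square_roots(X):
    # brute force: all sets Y with Y·Y == X; any root word must be a prefix
    # of some word of X, so it suffices to search subsets of those prefixes
    prefixes = {w[:i] for w in X for i in range(1, len(w) + 1)}
    roots = []
    for r in range(1, len(prefixes) + 1):
        for Y in combinations(sorted(prefixes), r):
            if {u + v for u, v in product(Y, Y)} == set(X):
                roots.append(set(Y))
    return roots

Y = {"a", "ab"}                            # a two-word code (the two words do not commute)
X = {u + v for u, v in product(Y, Y)}      # X = Y^2 = {aa, aab, aba, abab}
assert Y in square_roots(X)

# Lemma 9 direction: since X = Y^2, the codes X and Y commute
cat = lambda A, B: {u + v for u in A for v in B}
assert cat(X, Y) == cat(Y, X)
```

The search is only feasible because the candidate set of prefixes is tiny here; the paper's argument, by contrast, constructs the root directly from the combinatorics of the claims above.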
Theorem 23. The centralizer of any rational code is rational.

The following result also follows from Theorem 22 in the case of finite codes.

Theorem 24. Any monoid commuting with a finite code is finitely generated. In particular, the centralizer of a finite code is a finitely generated monoid.

Proof. Let X be a finite code and M a monoid commuting with X. Then, by Theorem 22, M = (C1 ∪ ⋯ ∪ Ck)∗ with Ci codes commuting with X, for all i = 1, 2, …, k. Thus, by Lemma 9, Ci^{ti} = X^{si}, for some positive integers ti, si. Thus, each Ci is finite, proving the claim.
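The unambiguity of products, which drives Theorem 17 and Corollary 18, is also mechanically checkable for finite sets. A small checker (ours, illustrative only; the sets A, B, P are toy examples):

```python
from itertools import product

def product_unambiguous(A, B):
    # AB is unambiguous iff every word of AB arises from a unique pair (a, b)
    seen = {}
    for a, b in product(A, B):
        if seen.setdefault(a + b, (a, b)) != (a, b):
            return False
    return True

A, B = {"a", "ab"}, {"ba", "a"}
assert not product_unambiguous(A, B)   # ambiguity: a·ba = ab·a = "aba"

P = {"a", "ba"}
assert product_unambiguous(P, P)       # products over a prefix code are unambiguous
```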
6. Conclusions

The behavior of codes under commutation is special. While the centralizer of a finite set is not necessarily recursively enumerable, we describe here the form of the centralizer of a code and prove that it is necessarily rational if the code is rational. Moreover, if the code is finite, then the centralizer is finitely generated. The crucial difference between codes and arbitrary sets of words seems to be in the fact that for a code X, the product XC(X) is unambiguous, as proved in Theorem 17. We also give in this paper a simple, self-contained proof for the case of prefix codes, proving that for any prefix code X, C(X) = ρ(X)∗, a result originally proved in [23]. In proving our results, we exploited a series of deep results on commutation proved in Ratoandromanana [23]. Two conjectures proposed in [23], related to commutation with codes, remain however open.

Conjecture 1 (Conjecture 1, [23]). Two codes commute if and only if they have a common root.

Conjecture 2 (Conjecture 2, [23]). Any code has a unique primitive root.

Two other conjectures have been given in the literature in connection with the commutation of codes, see, e.g., [11,14,15].

Conjecture 3. The centralizer of a code is a free monoid.

Conjecture 4. For any code X, if LX = XL, then there is a code R such that X = R^m and L = R^I = ∪_{i∈I} R^i, for some m ≥ 1, I ⊆ N.

Note that the characterization conjectured above holds for the commutation of polynomials and formal power series with coefficients in a field, see [1,6,7,25]. We prove here that in fact Conjectures 1–4 are equivalent.

Theorem 25. Conjectures 1–4 are equivalent.
Proof. Let X be a code. We prove first that Conjectures 1 and 2 are equivalent. Assuming that Conjecture 1 holds, suppose that the code X has two distinct primitive roots Y and Z, X = Y^i = Z^j. It then follows from Lemma 9 that Y and Z commute and, according to Conjecture 1, they have a common root. Since they are primitive, it follows that Y = Z, a contradiction. To prove the reverse implication, assume that Conjecture 2 holds and consider now two commuting codes X, Y and their unique primitive roots U, V: X = U^s, Y = V^t. Then, by Lemma 9, X^i = Y^j, for some i, j > 0, and so U, V are primitive roots of the code U^{si} = V^{tj}. It follows then from Conjecture 2 that U = V, i.e., X, Y have a common root.

We prove now that Conjectures 1 and 2 imply Conjecture 3. Let Z be the primitive root of the code X. Then Z∗ commutes with X and so, Z∗ ⊆ C(X). Assume that C(X) \ Z∗ ≠ ∅. Then, by Lemma 5, C(X) \ Z∗ commutes with X and then, by Lemma 21, it follows that there is a code Y ⊆ C(X) \ Z∗ such that XY = YX. Thus, by Lemma 10, YZ = ZY and so, from Conjecture 1 it follows that there is a code R such that Y = R^m, Z = R^n. Since Z is primitive, we have R = Z and so Y = Z^m, contradicting the fact that Y ∩ Z∗ = ∅.

We prove now that Conjecture 3 implies Conjecture 4. It follows from Conjecture 3 that C(X) = Z∗, for some code Z. It follows then from Theorem 22 that XZ = ZX and then from Lemma 9 that X^i = Z^j, for some i, j > 0. Consider now a language L commuting with X. Then L commutes also with X^i, i.e., with Z^j. Since L ⊆ C(X) = Z∗, it follows from Lemma 18 of [23] that L = Z^I, where I = {p ≥ 0 | Z^p ⊆ L}, concluding Conjecture 4. Note that this also implies that Z is the unique primitive root of X.

We prove now that Conjecture 4 implies Conjecture 1. Consider two codes X, Y such that XY = YX. It then follows from Conjecture 4 that there is a set V such that X = V^I, Y = V^J, for some I, J ⊆ N.
Then necessarily V is a code, I, J are singletons and V is a common root of X and Y.

Acknowledgements

The authors gratefully acknowledge the detailed referee reports that helped to improve the presentation of the paper. Juhani Karhumäki was supported by the Academy of Finland under Grant 44087. Ion Petre was supported by the Academy of Finland under Grant 203667.

References

[1] G. Bergman, Centralizers in free associative algebras, Trans. Amer. Math. Soc. 137 (1969) 327–344.
[2] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[3] C. Choffrut, J. Karhumäki, Combinatorics of words, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 1, Springer, Berlin, 1997, pp. 329–438.
[4] C. Choffrut, J. Karhumäki, On Fatou properties of rational languages, in: C. Martin-Vide, V. Mitrana (Eds.), Where Mathematics, Computer Science, Linguistics and Biology Meet, Kluwer, Dordrecht, 2000.
[5] C. Choffrut, J. Karhumäki, N. Ollinger, The commutation of finite sets: a challenging problem, Theoret. Comput. Sci. 273 (1–2) (2002) 69–79.
[6] P.M. Cohn, Factorization in noncommuting power series rings, Proc. Cambridge Philos. Soc. 58 (1962) 452–464.
[7] P.M. Cohn, Centralisateurs dans les corps libres, in: J. Berstel (Ed.), Séries formelles, Paris, 1978, pp. 45–54.
[8] J.H. Conway, Regular Algebra and Finite Machines, Chapman & Hall, London, 1971.
[9] J. Devolder, M. Latteux, I. Litovsky, L. Staiger, Codes and infinite words, Acta Cybernet. 11 (1994) 241–256.
[10] J. Karhumäki, Challenges of commutation: an advertisement, in: Proc. of FCT 2001, Lecture Notes in Computer Science, Vol. 2138, Springer, Berlin, 2001, pp. 15–23.
[11] J. Karhumäki, M. Latteux, I. Petre, The commutation with ternary sets of words, Theory Comput. Systems 38 (2) (2005) 161–169.
[12] J. Karhumäki, L. Lisovik, The equivalence problem for finite substitutions on ab∗c, with applications, IJFCS 14 (2003) 699–710; preliminary version in: Lecture Notes in Computer Science, Vol. 2380, Springer, Berlin, 2002, pp. 812–820.
[13] J. Karhumäki, I. Petre, On the centralizer of a finite set, in: Proc. of ICALP 2000, Lecture Notes in Computer Science, Vol. 1853, Springer, Berlin, 2000, pp. 536–546.
[14] J. Karhumäki, I. Petre, Conway's problem for three-word sets, Theoret. Comput. Sci. 289 (1) (2002) 705–725.
[15] J. Karhumäki, I. Petre, Two problems on commutation of languages, in: G. Paun, G. Rozenberg, A. Salomaa (Eds.), Current Trends in Theoretical Computer Science (The Challenge of the New Century), Vol. 2, World Scientific, 2004, pp. 477–494.
[16] J. Karhumäki, I. Petre, The branching point approach to Conway's problem, in: Lecture Notes in Computer Science, Vol. 2300, Springer, Berlin, 2002, pp. 69–76.
[17] M. Kunc, Regular solutions of language inequalities and well quasi-orders, in: Proc. of ICALP 2004, Lecture Notes in Computer Science, Vol. 3142, Springer, Berlin, 2004, pp. 870–881; final version in Theoret. Comput. Sci. (2005), to appear.
[18] M. Kunc, The power of commuting with finite sets of words, in: Lecture Notes in Computer Science, Vol. 3404, Springer, Berlin, 2005, pp. 569–580.
[19] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983.
[20] M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002.
[21] D. Perrin, Codes conjugués, Inform. Control 20 (1972) 222–231.
[22] I. Petre, Commutation problems on sets of words and formal power series, Ph.D. Thesis, University of Turku, 2002.
[23] B. Ratoandromanana, Codes et motifs, RAIRO Inform. Theor. 23 (4) (1989) 425–444.
[24] A. Restivo, Some decision results for recognizable sets in arbitrary monoids, in: Proc. of ICALP 1978, Lecture Notes in Computer Science, Vol. 62, Springer, Berlin, 1978, pp. 363–371.
[25] C. Reutenauer, Centralisers of noncommutative series and polynomials, in: M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002, pp. 312–329.
[26] A. Salomaa, S. Yu, On the decomposition of finite languages, in: G. Rozenberg, W. Thomas (Eds.), Developments in Language Theory, World Scientific, Singapore, 2000, pp. 22–31.
Theoretical Computer Science 340 (2005) 334 – 348 www.elsevier.com/locate/tcs
Palindromic factors of billiard words

J.-P. Borel a,1, C. Reutenauer b,∗

a LACO, UMR CNRS 6090, 123 avenue Albert Thomas, F-87060 Limoges Cedex, France
b UQÀM, Département de Mathématiques, case postale 8888, succursale Centre-Ville, Montréal, Québec, Canada H3C 3P8
Abstract

We study palindromic factors of billiard words, in any dimension. There are differences between the two-dimensional case and higher dimensions. Arbitrarily long palindromic factors exist in any dimension, but arbitrarily long palindromic prefixes exist, in general, only in dimension 2.
© 2005 Elsevier B.V. All rights reserved.

MSC: 68R15

Keywords: Words; Languages; Sturmian; Billiard; Palindromes
1. Introduction and notations

1.1. Billiard and Christoffel words in dimension 2

Let α be a positive irrational number. In several ways one may associate to it a Sturmian word on the alphabet A := {a, b}. We use here a geometrical approach. Consider the grid G on the first quadrant of the plane: it is the set of vertical half-lines with integer x-coordinate and of horizontal half-lines with integer y-coordinate. The half-line D through the origin O with slope α divides G into two parts. We construct the word uα and the billiard word cα (cutting sequence) as follows:

∗ Corresponding author. Tel.: +1 514 9873000x3228; fax: +1 514 9878274.
E-mail addresses: [email protected] (J.-P. Borel), [email protected] (C. Reutenauer). 1 Research partially supported by Région Limousin.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.036
J.-P. Borel, C. Reutenauer / Theoretical Computer Science 340 (2005) 334 – 348
Fig. 1 (panels (1), (2) and (3)).
Denoting by a the horizontal segment and by b the vertical one, uα encodes the discrete path immediately under the half-line D; hence in the example uα = ababbababbabb… (Fig. 1(1)). Looking at the squares crossed by D (grey in Fig. 1(2)) and their blackened sides, the billiard word cα encodes the sequence of sides crossed by D (a for a vertical side, b for a horizontal one). Here cα = babbababbabb…. Equivalently, cα encodes the discrete path joining the centers of the crossed squares (see Fig. 1(3)). It is easily seen that uα = a cα. The words uα are called Christoffel words [7] and are particular Sturmian words. Regarding factors, this does not restrict generality. See [1,5,15] for the theory of Sturmian words. Christoffel words are known since Bernoulli and have many applications in mathematics and physics; they are related to continued fractions, Farey sequences, and the Stern–Brocot tree (see e.g. [12] for the latter).

1.2. Billiard words in dimension 3

Let D be the half-line of origin O, in k-dimensional space, parallel to the vector (α1, α2, …, αk), with αi positive. We consider the sequence of k-cubes crossed by D, and the facets joining each cube to the next: a facet is a subset of the cube formed by all points having a fixed integer ith coordinate. This ith coordinate will be encoded by ai, and thus we obtain a sequence on the alphabet A = {a1, a2, …, ak}, encoding the facets crossed by the half-line D. This works as soon as one has

αi/αj ∉ Q    (1)
for any i ≠ j: indeed, in this case, each facet is crossed in its interior, so that the corresponding intersection point has a unique integer coordinate, its ith coordinate.2 In this way we obtain the billiard word cα1,α2,…,αk, or cα, if we denote α := (α1, α2, …, αk). Note that, as in Fig. 1(3), cα also encodes the discrete path joining the centers of the k-cubes crossed by D. We use this interpretation in the sequel.

It is known that the number of finite factors of length n of cα, in dimension 2, is equal to n + 1. This well-known property characterises the Sturmian words in dimension 2; in dimension 3, the number of finite factors of length n is equal to n² + n + 1, see [2]; in dimension k, it is

Σ_{i=0}^{min(k−1,n)} C(n, i) C(k−1, i) i!,

where C(n, i) denotes the binomial coefficient, see [3].

1.3. Finite billiard words

Let M := (m1, m2, …, mk) ∈ N^k, where the mi are pairwise relatively prime. The segment OM crosses several k-cubes and one defines, as before, a finite word cM on the same alphabet, called the (finite) billiard word associated to M. One has

|cM|ai = mi − 1, 1 ≤ i ≤ k,  |cM| = Σ_{i=1}^{k} mi − k.
Note that, as usual, |v| is the length of the word v, and |v|a its a-degree. Observe that cM is a palindrome, that is, equal to its reversal. We denote by ṽ the reversal of the word v. For a palindrome, one has v = ṽ by definition.

2. Main results

2.1. Dimension 2

Everything is known in this case, and palindromic factors and prefixes of Sturmian words have been intensively studied; they even characterize Sturmian words, see [10,11,14].

2.1.1. Palindromic prefixes

Theorem 2.1. The palindromic prefixes of the infinite billiard word cα are finite billiard words; for all n > 0 they are the prefixes of length pn + qn − 2, for all the main and intermediate convergents pn/qn of the continued fraction expansion of the real number α.

This result is stated in [4,8,9], in a slightly different formulation.

2 Note that the previous condition holds if the coordinates αi are Q-linearly independent, which is necessary for some results.
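The encodings of Sections 1.2–1.3 are straightforward to simulate. The sketch below (ours, not from the paper) generates billiard words by merging the crossing times n/αi of the ray t·α, then checks the factor-complexity counts quoted above and the palindromicity of a finite billiard word; the particular directions (1, √2, √3) and the point M = (5, 7) are our own choices of examples.

```python
import heapq
from math import sqrt

def billiard_word(alpha, length):
    # letter i is emitted when the ray t*alpha crosses an integer i-th coordinate,
    # i.e. at the times n/alpha[i]; all crossing times are merged in increasing order
    heap = [(1 / a, i, 1) for i, a in enumerate(alpha)]
    heapq.heapify(heap)
    word = []
    while len(word) < length:
        t, i, n = heapq.heappop(heap)
        word.append(i)
        heapq.heappush(heap, ((n + 1) / alpha[i], i, n + 1))
    return word

def finite_billiard_word(M):
    # finite billiard word c_M of the segment from O to M (two coprime coordinates)
    m1, m2 = M
    crossings = [(p / m1, 0) for p in range(1, m1)] + [(q / m2, 1) for q in range(1, m2)]
    return [c for _, c in sorted(crossings)]

def complexity(word, n):
    # number of distinct factors of length n
    return len({tuple(word[j:j + n]) for j in range(len(word) - n + 1)})

# dimension 3, direction with pairwise irrational ratios: p(n) = n^2 + n + 1
w = billiard_word((1.0, sqrt(2), sqrt(3)), 20000)
for n in range(1, 5):
    assert complexity(w, n) == n * n + n + 1

# dimension 2: p(n) = n + 1 (Sturmian)
v = billiard_word((1.0, sqrt(2)), 5000)
assert complexity(v, 5) == 6

# a finite billiard word c_M is a palindrome, of length m1 + m2 - 2
cM = finite_billiard_word((5, 7))
assert cM == cM[::-1] and len(cM) == 10
```

The complexity checks are, of course, empirical: they confirm the formulas on a long but finite prefix of the infinite word.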
Fig. 2. (The Stern–Brocot tree of 10/23, with the factorization of each Christoffel word: 0/1: a; 1/0: b; 1/1: a|b; 1/2: a|ab; 1/3: a|aab; 2/5: aaab|aab; 3/7: aaabaab|aab; 4/9: aaabaabaab|aab; 7/16: aaabaabaab|aaabaabaabaab; 10/23: aaabaabaab|aaabaabaabaaabaabaaba.)
Consider an example. Let

uα = a cα = aaabaabaabaaabaabaabaaabaabaabaab …

obtained by taking α in the interval 7/16 < α < 10/23. These two numbers have the continued fraction expansions

7/16 = 1/(2 + 1/(3 + 1/2))  and  10/23 = 1/(2 + 1/(3 + 1/3)),

denoted by [0; 2, 3, 2] and [0; 2, 3, 3] = [0; 2, 3, 2, 1]. In other words, α = [0; 2, 3, 3, …]. The sequence of intermediate and main convergents is

0/1, 1/1, 1/2, 1/3, 2/5, 3/7, 4/9, 7/16, 10/23, …
which may be read on the Stern–Brocot tree of 10/23, see Fig. 2. In Fig. 2, we have indicated the factorization of each Christoffel word, coming from the words above. To each such Christoffel word, which will be of the form aub, associate its factor u. These words u are exactly the palindromic prefixes of cα (here, those of length ≤ 33):

cα = aabaabaabaaabaabaabaaabaabaabaab …

a  aa  aaa  aaabaab  aaabaabaab  aaabaabaabaaabaabaabaaa  aaabaabaabaaabaabaabaaabaabaabaab
ε  a  aabaa  aabaabaa  aabaabaabaaabaabaabaa  aaababaabaaabaabaabaaabaabaabaa

We have listed the Christoffel words on the first line, and the associated billiard words on the second. Note that there exist palindromic factors which are not prefixes, such as b, aba, aaa or baab. But they are central factors of the previous palindromes.
Fig. 3.
2.1.2. Palindromic factors

It is well known that the language L of factors of cα is stable under reversal: v ∈ L ⇒ ṽ ∈ L. Let v be any palindromic factor of cα. It is sometimes possible to extend v into another factor ava or bvb, and to iterate. In the following, we call a central factor of a finite word u any factor v such that u = v1 v v2, with v1 and v2 of the same length.

Theorem 2.2. Each palindromic factor of a billiard word cα is a central factor of some palindromic prefix of cα.

This result is obtained in [8]. Note that palindromic factors characterize Sturmian words, see [11].

Proof. Let v be a maximal palindromic factor of cα in a nonprefix position; maximal means that neither ava nor bvb is a factor of cα. Consider the figure representing the sequence of squares encoded by v. Since v is a factor of cα, either avb or bva is a factor of cα. We consider the first case, as in Fig. 3, corresponding to v = aba. Then the line D enters the figure through a vertical side, stays inside it, and leaves the figure through a horizontal side. By the symmetry with respect to the center of the figure (since v is a palindrome), there exists a parallel line entering by a horizontal and leaving by a vertical segment (in the figure, the dotted line with long segments). Hence the parallel line beginning at the lower left point of the figure stays in the figure (in the figure, the dotted line). This means that v is a prefix of cα.

2.2.

Thus there are infinitely many palindromic prefixes in an infinite billiard word, and palindromic factors are factors of the palindromic prefixes. They appear infinitely often, since billiard words are recurrent (even uniformly recurrent). The work of Laurent Vuillon [17] gives even more precise information on their appearance, through the notion of first return of a factor.
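Theorem 2.2 is also easy to test empirically. In the sketch below (ours; the slope √2 and the window sizes are arbitrary choices), every short palindromic factor found near the beginning of the word is located as a central factor of one of its palindromic prefixes.

```python
from math import sqrt

def cutting_sequence(alpha, n):
    # cutting sequence of the ray y = alpha*x ('a': vertical line crossed, 'b': horizontal)
    crossings = [(m, "a") for m in range(1, n + 1)]
    crossings += [(k / alpha, "b") for k in range(1, int(alpha * n) + 2)]
    return "".join(c for _, c in sorted(crossings))[:n]

def central_factor(v, p):
    # v is a central factor of p if p = v1 v v2 with |v1| = |v2|
    d = len(p) - len(v)
    return d >= 0 and d % 2 == 0 and p[d // 2 : d // 2 + len(v)] == v

c = cutting_sequence(sqrt(2), 300)
pal_prefixes = [c[:l] for l in range(1, 301) if c[:l] == c[:l][::-1]]

# every palindromic factor observed in the window is central in some palindromic prefix
for size in range(1, 6):
    for j in range(100 - size):
        v = c[j:j + size]
        if v == v[::-1]:
            assert any(central_factor(v, p) for p in pal_prefixes)
```

The window and factor lengths are kept small so that the relevant palindromic prefixes certainly fall inside the generated portion of the word.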
2.3. Dimension 3

2.3.1. Prefix integer point

Let M := (m1, m2, …, mk) ∈ N^k and H the orthogonal projection of M on the line through O parallel to (α1, α2, …, αk).

Definition 2.1. A point in R^k is a 2-integer point if at least two of its coordinates are integers.

Definition 2.2. M = (m1, m2, …, mk) is called an integer prefix of (α1, α2, …, αk) if the triangle OMH does not contain any 2-integer point, except O and M.

2.3.2. Palindromic prefixes

Denote by πij the mapping that associates to a word u on {a1, a2, …, ak} the word on {ai, aj} obtained by erasing all other letters. Then πij(cα1,α2,…,αk) is the billiard word cαi,αj.

Theorem 2.3. A prefix v of cα1,α2,…,αk is palindromic if and only if each πij(v) is a palindromic prefix of cαi,αj.

Theorem 2.4.
• For almost all (α1, α2, …, αk) ∈ R^k_+, in the sense of Lebesgue, the word cα1,α2,…,αk has only finitely many palindromic prefixes.
• There exist (α1, α2, …, αk) such that cα1,α2,…,αk has infinitely many palindromic prefixes.

We shall prove these results in the sequel. We give, for the last property, a proof that will imply that the corresponding lines are dense. In the first case, the number of palindromic prefixes may be very small: we give an example in dimension 3 where the only nonempty palindromic prefix is the first letter. According to Theorem 2.3, in order to have palindromic prefixes, there must be some synchronization between the palindromic prefixes of the words πij(cα1,α2,…,αk), hence between the corresponding convergents.

2.3.3. Palindromic factors

As said in the abstract, the situation is the same in any dimension.

Theorem 2.5. Each factor of cα1,α2,…,αk is a factor of some palindromic factor of cα1,α2,…,αk. In particular, arbitrarily long palindromic factors exist.
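Theorem 2.3 lends itself to a direct check: since πij of the k-dimensional billiard word is the two-dimensional billiard word cαi,αj, a prefix is palindromic exactly when all its pairwise projections are palindromic. The sketch below (ours; the direction (1, √2, √3) is an arbitrary example) verifies this on a long prefix.

```python
import heapq
from math import sqrt

def billiard_word(alpha, length):
    # merge the crossing times n/alpha[i] of the ray t*alpha, in increasing order
    heap = [(1 / a, i, 1) for i, a in enumerate(alpha)]
    heapq.heapify(heap)
    w = []
    while len(w) < length:
        t, i, n = heapq.heappop(heap)
        w.append(i)
        heapq.heappush(heap, ((n + 1) / alpha[i], i, n + 1))
    return w

w = billiard_word((1.0, sqrt(2), sqrt(3)), 900)

def proj(v, i, j):
    # the map pi_ij: erase every letter other than i and j
    return [x for x in v if x in (i, j)]

is_pal = lambda v: v == v[::-1]
for l in range(1, 601):
    v = w[:l]
    # Theorem 2.3: the prefix is palindromic iff all three pairwise projections are
    assert is_pal(v) == all(is_pal(proj(v, i, j)) for i, j in [(0, 1), (0, 2), (1, 2)])
```

One direction is immediate (a projection of a palindrome is a palindrome); the content of the theorem is the converse, which the loop exercises on 600 prefixes.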
3. Integer prefix point, up–down method and synchronization

3.1. Integer prefix point
We consider D as before: it is the half-line of origin O parallel to the vector α := (α_1, α_2, …, α_k), with the condition α_i/α_j ∉ Q for all i ≠ j.
Proposition 3.1. Let M = (m_1, m_2, …, m_k) ∈ N^k. Let H be the orthogonal projection of M onto D, and T the intersection of D with the facets of the k-dimensional parallelepiped P of long diagonal OM. Then the following conditions are equivalent:
(i) M is an integer prefix point of (α_1, α_2, …, α_k).
(i′) There is no 2-integer point in the triangle OMT, except O and M.
(ii) The triangle OMH intersects the integer k-cubes only in their facets, except for the points O and M.
(iii) The finite billiard word c_M is a prefix of the infinite billiard word c_α.

Proof. Let M′ be any point on HM and consider the finite word v′ encoding the intersection of the segment OM′ with the facets of the k-dimensional grid in R^k. When M′ = H, v′ = v is a prefix of c_α. When M′ = M, v′ = c_M. Property (iii) means that v′ is constant when M′ varies on the segment MH: each segment OM′ meets the k-cubes by the same facets. Hence that segment never contains a 2-integer point. Hence (ii) is true, and (i) is equivalent to (ii). Now, the triangle OMT is contained in OMH, thus (i) implies (i′). Conversely, if (i′) is true, when D leaves P it enters some k-cube of which M is a vertex. This means that H is interior to this k-cube, and each point of the triangle MTH not on MT is in the interior of this k-cube, hence has no integer coordinate, and this proves (i). □

In dimension 2, these integer prefix points are well known, and 2-integer points are exactly the integer points. Consider the billiard word c_α, with α = (α_1, α_2) and α_1/α_2 irrational. Let D be the half-line of origin O and slope α_2/α_1.

Proposition 3.2. Let v be a prefix of c_α and M the point (m_1, m_2), with m_i := |v|_{a_i} + 1. The following conditions are equivalent:
(1) v is a palindrome.
(1′) v = c_M.
(2) The triangle OMH contains no integer point except O and M.
(3) The distance MH is minimal among all distances M′H′, where M′ is an integer point on the same side of D as M, and such that its orthogonal projection H′ onto D is between O and H.
(4) m_2/m_1 is an intermediate or main convergent of α_2/α_1.

The equivalence of (2)–(4) is in [6], and the equivalence between (1) and (4), i.e. Theorem 2.1, is in [8].

3.2. Up and down

3.2.1. On prefix integer points
Let M = (m_1, m_2, …, m_k) be a prefix integer point of α = (α_1, α_2, …, α_k), and let i ≠ j in {1, …, k}. Let M_ij ∈ Z² be the projection (m_i, m_j), and α_ij = (α_i, α_j).

Proposition 3.3. M is a prefix integer point of α if and only if each M_ij is a prefix integer point of α_ij.
Proof. Denote also by π_ij the projection R^k → R² which sends α onto (α_i, α_j). Then π_ij(D) is the half-line D_ij, π_ij(T) = T_ij, and T_ij is the point where D_ij leaves the rectangle of diagonal OM_ij. Hence OM_ijT_ij = π_ij(OMT). If the triangle OMT contains no point other than O and M having at least two integer coordinates, the same holds for all triangles OM_ijT_ij, and hence each M_ij is a prefix integer point of (α_i, α_j). If however there is some 2-integer point in the triangle OMT, different from O and M, let i, j correspond to its integer coordinates. Then, by projection, there is some integer point in the triangle OM_ijT_ij, different from O and M_ij. □

3.2.2. On palindromes

Proposition 3.4. Let v be a finite word on A = {a_1, a_2, …, a_k}. Then v is a palindrome if and only if all its projections π_ij(v) are palindromes.

Proof. The condition is clearly necessary. Suppose now that v is not a palindrome: v ≠ ṽ. Then v = w a_i w′ and ṽ = w a_j w″, where w is the longest common prefix of v and ṽ, and thus i ≠ j. Then:
π_ij(v) = π_ij(w) a_i π_ij(w′) ≠ π_ij(w) a_j π_ij(w″) = π_ij(ṽ),

and π_ij(ṽ) is the reversal of π_ij(v), which means that π_ij(v) is not a palindrome. □
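Proposition 3.4 can be checked exhaustively on small alphabets. A minimal sketch (our own function names) verifies it for all words of length up to 8 over three letters:

```python
from itertools import product

def proj(v, letters):
    """pi_ij: erase from v every letter outside the given pair."""
    return ''.join(c for c in v if c in letters)

def is_pal(w):
    return w == w[::-1]

A = 'abc'
pairs = [('a', 'b'), ('a', 'c'), ('b', 'c')]
for n in range(1, 9):
    for v in map(''.join, product(A, repeat=n)):
        # v is a palindrome iff all three pairwise projections are
        assert is_pal(v) == all(is_pal(proj(v, p)) for p in pairs)
```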
Theorem 2.3 is an immediate consequence, since for v a prefix of c_{α_1,α_2,…,α_k}, π_ij(v) is a prefix of π_ij(c_{α_1,α_2,…,α_k}) = c_{α_i,α_j}.

4. Auxiliary results on continued fractions

4.1. A probabilistic result
Let q = q_n be the denominator of a main convergent p_n/q_n of the real number α (see [13, Chapter X] or [16] for general results on continued fractions). Then by [13, Theorem 171, p. 140]:

  |α − p_n/q_n| ≤ 1/(q_n q_{n+1}).

Thus, since q_{n+1} = a_{n+1} q_n + q_{n−1}, where all these numbers are positive integers, one has

  ‖q_n α‖ ≤ 1/q_{n+1} ≤ 1/q_n.
We denote as usual by ‖x‖ the distance from x to the closest integer. When x varies between 0 and 1, the inequality ‖qx‖ ≤ 1/q is satisfied in a union of q + 1 disjoint intervals whose lengths sum to 2/q. Taking the uniform probability, we deduce that the probability that x
in [0, 1] has q as the denominator of some main convergent is bounded by 2/q. Note that this probability exists, since the set of all corresponding x is a finite union of intervals, up to rational numbers. Note also that this argument does not work for intermediate convergents, since ‖qx‖ may then be bigger than 1/q.

Proposition 4.1. Let q be a positive integer ≥ 2, and 0 < x < 1. Then the probability P_q that q be a denominator of a main or intermediate convergent of x satisfies:

  P_q ≤ 2/√q + 2/(√q − 1).

This result is certainly not optimal, but it is sufficient for our purposes. Note that P_q exists for the same reason as above.

Proof. Let q be the denominator of some intermediate convergent of α. Then one has, for the denominators q_n and q_{n+1} of two successive main convergents:

  q_n ≤ q < q_{n+1},  q = a q_n + q_{n−1},  1 ≤ a ≤ a_{n+1} − 1,

where all these numbers are positive integers. Moreover, the intermediate convergent is

  p/q = (a p_n + p_{n−1})/(a q_n + q_{n−1}).

We have two cases.
• If a > √q − 1, then q = a q_n + q_{n−1} > (√q − 1) q_n, so q_n < q/(√q − 1). Moreover,

  ‖q_n α‖ ≤ 1/q_{n+1} ≤ 1/q,

and so α belongs to a set whose probability is 2/q. Since q_n may take any integer value between 1 and q/(√q − 1), the probability that x has q as the denominator of a main or intermediate convergent with a > √q − 1 is bounded by

  Σ_{1 ≤ i < q/(√q−1)} 2/q ≤ 2/(√q − 1).

• If a ≤ √q − 1, then

  q = a q_n + q_{n−1} < (a + 1) q_n ≤ √q · q_n,

which implies q_n > √q, and

  ‖q α‖ < ‖q_{n−1} α‖ ≤ 1/q_n < 1/√q.

Indeed, the first inequality is a consequence of the theory of continued fractions, since the points (p, q) associated to main and intermediate convergents on the same side of D get closer and closer to D. The probability that x satisfies this inequality is bounded by 2/√q.

We conclude by summing the two bounds above. □
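The denominators of main and intermediate convergents counted in Proposition 4.1 are easy to enumerate from the partial quotients. A small sketch (function names are ours; the floating-point expansion is reliable only for the first few partial quotients):

```python
def cf_digits(x, n):
    """First n partial quotients of x (floating-point sketch)."""
    digits = []
    for _ in range(n):
        a = int(x)
        digits.append(a)
        x = 1.0 / (x - a)
    return digits

def denominators(digits):
    """Denominators of main and intermediate convergents, in order."""
    q_prev, q = 0, 1          # q_{-1}, q_0
    out = []
    for a in digits[1:]:
        for b in range(1, a):            # intermediate convergents
            out.append(b * q + q_prev)
        q_prev, q = q, a * q + q_prev    # main convergent
        out.append(q)
    return out

beta = (15 + 5 ** 0.5) / 10    # [1; 1, 2, 1, 1, ...]
gamma = (1 + 5 ** 0.5) / 2     # [1; 1, 1, 1, 1, ...] (golden number)
db = denominators(cf_digits(beta, 12))
dg = denominators(cf_digits(gamma, 12))
```

For these two numbers, used in the example of Section 5.3, the sequences start (1, 2, 3, 4, 7, 11, …) and (1, 2, 3, 5, 8, 13, …), and within this range they share only the values 1, 2, 3.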
4.2. Synchronization of convergents

Proposition 4.2. For almost every positive real number α, the set of positive real numbers β having infinitely many denominators of intermediate or main convergents in common with α has Lebesgue measure 0.

Proof. Let f(n) be an increasing function such that the series Σ_{n≥1} 1/f(n) converges. Then for almost all positive real numbers α the sequence of partial quotients a_n of α satisfies a_n ≤ f(n) for large n, see [16]. We only consider these numbers α, for f(n) := n²; hence there exists a positive constant C such that a_n ≤ C n².

Let S := Σ 1/√q, where the sum is over all denominators q of intermediate or main convergents of α. Then

  S ≤ Σ_{n≥1} a_n/√q_n < 2C Σ_{n≥1} n² / ((√5 + 1)/2)^{n/2} < ∞,

where (q_n) is the sequence of denominators of main convergents of α, and where in S we have grouped the q with q_n ≤ q < q_{n+1}, hence 1/√q ≤ 1/√q_n; there are a_n such q. The last inequality follows from the fact that the denominators of main convergents are minimal for the golden number φ := [1; 1, 1, …], where they equal the Fibonacci sequence:

  F_n = ((5 + √5)/10) ((√5 + 1)/2)^n + ((5 − √5)/10) ((1 − √5)/2)^n > (1/2) ((√5 + 1)/2)^n.

Thanks to Proposition 4.1, the series Σ P_q converges. Then the Borel–Cantelli lemma implies that the set of β ∈ [0, 1] having infinitely many of these q as denominators of (main or intermediate) convergents has measure 0. Since the denominators of convergents depend only on the fractional part of β, so does the set of positive real numbers β. □

5. Proof of the main results

5.1. Existence of arbitrarily long palindromic factors

Proposition 5.1. Let the α_j be linearly independent over Q. The word c_{α_1,α_2,…,α_k} contains arbitrarily long palindromic factors. More precisely:
• it contains arbitrarily long palindromic factors of even length,
• for each i, it contains arbitrarily long palindromic factors of odd length with central letter a_i.

In the proof below, we use two-sided infinite words and two-sided billiard words; their definition is straightforward.

Proof. Let C := (1/2, 1/2, …, 1/2). Consider the line D′ passing through C and parallel to (α_1, α_2, …, α_k). The associated billiard word c is well-defined, since the quotients α_i/α_j
are irrational for i = j . Moreover, due to the linear independence over Q, c1 ,2 ,...,k has the same factors as c. Observe that C is a center of symmetry for the integer lattice, and after C is for the above line D . Hence the right infinite word defined by the half-line D+ before C. Thus the reversal of the left infinite word defined by the symmetric half-line D− c = v v. Hence for each prefix w of v, w w is a palindrome of even length which is factor of c, hence of c1 ,2 ,...,k . We conclude since w is arbitrary long. For the second assertion, we argue similarly, by replacing C by the point ( 21 , 21 , . . . , 21 , 0, 1 1 2 , . . . , 2 ) with a 0 in ith position. 5.2. Palindromic prefixes: general case Let k 3 and v a palindromic prefix of c1 ,2 ,...,k and M = (m1 , m2 , . . . , mk ) its integer prefix point corresponding to v. Then, by Proposition 3.3, (m2 , m1 ) and (m3 , m1 ) are integer prefix points of (3 , 1 ) and (3 , 1 ). Hence m1 is a denominator of some intermediate or main convergent of 2 /1 and 3 /1 . Hence for almost all (2 , 3 ), and a fortiori for almost all (1 , 2 , . . . , k ), m1 is bounded. This means that the number of occurrences of letter a1 in v is bounded, and thus v is of bounded length, since c1 ,2 ,...,k has infinitely many occurrences of letter a1 . Even, for these (1 , 2 , . . . , k ), there are only finitely many palindromic prefixes. This proves the first part of Theorem 2.4. 5.3. An example Let
√ √ 15 + 5 1 + 5 (, , ) = 1, , . 10 2
The expansion into continued fractions are √ 15 + 5 = = [1; 1, 2, 1, 1, 1, . . .] and 10 √ 1+ 5 = = [1; 1, 1, 1, 1, 1, . . .]. 2 The sequence of denominators of intermediate and main convergents are respectively (1, 1, 2, 3, 4, 7, 11, . . .)
and
(1, 2, 3, 5, 8, 13, . . .)
and all convergents are main convergents, except that corresponding to 2 in the left-hand case. The remaining denominators satisfy the same recursion qn+2 = qn+1 + qn and hence the values that appear cannot be equal: in the first case, denominators are qn = Fn−1 −Fn−5 for n 6, and qn = Fn in the second one. Only (1, 2, 3) are common to both sequences: indeed Fn−1 = Fn − Fn−2 < Fn − Fn−4 < Fn for n 4. The billiard word c,, is c,, = bcabcbcabcbacbcabcbcabcbacbcbacbacbcbacbbc . . .
whose projections onto {a, b} and {a, c} are

  c_{α,β} = babbabbababbabbabbababbabb … and c_{α,γ} = caccacaccaccacaccacaccacc … .

The synchronization principle shows that the palindromic prefixes of c_{α,β,γ} correspond to palindromic prefixes of c_{α,β} and c_{α,γ} having the same number of occurrences of the letter a. Since in dimension 2 palindromic prefixes correspond to denominators of convergents (main or intermediate), with q = 1 + (number of a's), only common values of q are possible, and hence the number of a's can only be 0, 1 or 2. Hence, looking for these prefixes, we find for c_{α,β} the words b, bab and babbab, and for c_{α,γ} the words c, cac and caccac. Moving up to c_{α,β,γ}, we find, according to the number of a's (which is 0, 1 or 2):
• for 0, b or bc,
• for 1, bcabc,
• for 2, bcabcbcabcb.
Hence only b is a palindromic prefix of c_{α,β,γ}.

5.4. Arbitrarily long palindromic prefixes

We need the following result on continued fractions, due to Lagrange [13, Theorem 184, p. 153].

Proposition 5.2. Let x be a real number and p/q a rational number such that

  |x − p/q| < 1/(2q²).

Then p/q is a main convergent of x.

Let n be any fixed integer. We consider any numbers α_1, α_2, …, α_k whose continued fraction expansions are given up to order n. We deduce from them the existence of a positive constant K bigger than all the convergents p/q of these α_i, and bigger than all the ratios p/(qα_i), where p/q is a convergent of α_i. The corresponding denominators are denoted by q_j^{(i)}, 1 ≤ i ≤ k and 0 ≤ j ≤ n. The expansion is extended two ranks further:
• rank n + 1: for each i,

  q_{n+1}^{(i)} = a_{n+1}^{(i)} q_n^{(i)} + q_{n−1}^{(i)},

where a_{n+1}^{(i)} is chosen in such a way that the q_{n+1}^{(i)} =: ℓ_{n+1}^{(i)} are distinct prime numbers. This is possible, by Dirichlet's theorem, since q_n^{(i)} and q_{n−1}^{(i)} are relatively prime. We may even assume that

  a_{n+1}^{(i)} > A := 4K².
• rank n + 2: since the ℓ_{n+1}^{(i)} are distinct prime numbers, by the Chinese remainder theorem we may find an arbitrarily big integer Q such that

  Q ≡ q_n^{(i)} (mod ℓ_{n+1}^{(i)}) for each i.

Thus we may find a_{n+2}^{(i)} such that

  Q = q_{n+2}^{(i)} = a_{n+2}^{(i)} q_{n+1}^{(i)} + q_n^{(i)}.

This construction is iterated, and leads to numbers α_1, α_2, …, α_k such that, from some rank on, they have the synchronization property at all steps of type "n + 2" above. For such a rank, the convergent of each α_i can be written

  p_{n+2}^{(i)} / q_{n+2}^{(i)} = P^{(i)} / Q,

as the denominator q_{n+2}^{(i)} does not depend on i; the index n + 2 is dropped for simplicity, for a given n.

Lemma 1. P^{(i)}/P^{(j)} is a main convergent of α_i/α_j if i ≠ j.

Proof. Since the coefficients a_{n+3}^{(i)} are of type "n + 1" above, we have the corresponding inequality:

  |α_i − P^{(i)}/Q| = |α_i − p_{n+2}^{(i)}/q_{n+2}^{(i)}| ≤ 1/(q_{n+2}^{(i)} q_{n+3}^{(i)}) < 1/(a_{n+3}^{(i)} (q_{n+2}^{(i)})²) < 1/(A Q²).

Hence

  |α_i/α_j − P^{(i)}/P^{(j)}| = |(Qα_i − P^{(i)}) P^{(j)} − (Qα_j − P^{(j)}) P^{(i)}| / (Q α_j P^{(j)})
    ≤ (|α_i − P^{(i)}/Q| P^{(j)} + |α_j − P^{(j)}/Q| P^{(i)}) / (α_j P^{(j)})
    < (1/(A Q²)) · (P^{(j)} + P^{(i)}) / (α_j P^{(j)})
    < 2K²/(A (P^{(j)})²) = 1/(2 (P^{(j)})²),

which implies the lemma by Proposition 5.2. □
Hence (P^{(1)}, P^{(2)}, …, P^{(k)}) is an integer prefix point of (α_1, α_2, …, α_k), and the prefix of length P^{(1)} + P^{(2)} + ⋯ + P^{(k)} − k of c_{α_1,α_2,…,α_k} is a palindrome. There are infinitely many such points, since this happens at ranks n + 2, n + 4, n + 6, and so on.
5.4.1. An example
Let

  α = ((√5 − 1)/2, (√5 + 1)/2, (√5 + 3)/2, (√5 + 5)/2, 1) ∈ R⁵.
The denominators of the convergents of the α_i/α_5 are the same, and correspond to the Fibonacci sequence. The α_i are not independent over Q, but α_i/α_j is irrational if i ≠ j. The word w = c_{α_1,α_2,α_3,α_4,α_5} is

  w = dcdbcdedcbdcadbcdedcbdcdedcbadcdbcdedcbdcdabcdedcdbcdedcbdacdbcdedcddcde … .

The number 2 in the Fibonacci sequence corresponds to the approximations (1/2, 3/2, 5/2, 7/2, 2/2), hence to the prefix of length 13 = 1 + 3 + 5 + 7 + 2 − 5. This prefix is dcdbcdedcbdca, which is not a palindrome (but almost: replace the last letter a by d, which is the following letter in w). The number 8 in the Fibonacci sequence corresponds to the approximations (5/8, 13/8, 21/8, 29/8, 8/8), hence to the prefix of length 71 = 5 + 13 + 21 + 29 + 8 − 5. This prefix is

  dcdbcdedcbdcadbcdedcbdcdedcbadcdbcdedcbdcdabcdedcdbcdedcbdacdbcdedcbdcd.

It is a palindrome. This occurs again with 34, which yields a palindromic prefix of length 317, and from there with periodicity 3 on the Fibonacci sequence. In this way all palindromic prefixes of w are obtained; curiously, they are all of odd length with central letter e, except for the first two, d and dcd.

Acknowledgements

The authors thank Srecko Brlek for a preliminary computation in dimension 3, and Aldo de Luca for an important mail exchange on this subject.

References

[1] J.-P. Allouche, J. Shallit, Automatic Sequences, Cambridge University Press, Cambridge, 2003.
[2] P. Arnoux, C. Mauduit, I. Shiokawa, J.I. Tamura, Complexity of sequences defined by billiard in the cube, Bull. Soc. Math. France 122 (1994) 1–12.
[3] Y. Baryshnikov, Complexity of trajectories in rectangular billiards, Comm. Math. Phys. 174 (1995) 43–56.
[4] J. Berstel, A. de Luca, Sturmian words, Lyndon words and trees, Theoret. Comput. Sci. 178 (1997) 171–203.
[5] J. Berstel, P. Seebold, Sturmian words, in: M. Lothaire (Ed.), Algebraic Combinatorics on Words, Cambridge University Press, Cambridge, 2002.
[6] J.-P. Borel, F. Laubie, Quelques mots sur la droite projective réelle, J. Théorie Nombres Bordeaux 5 (1993) 23–51.
[7] E.B. Christoffel, Observatio arithmetica, Ann. Mat. 6 (1875) 148–152.
[8] A. de Luca, Sturmian words: structure, combinatorics, and their arithmetics, Theoret. Comput. Sci. 183 (1997) 45–82.
[9] A. de Luca, Combinatorics of standard Sturmian words, in: Structures in Logic and Computer Science, Lecture Notes in Computer Science, Vol. 1261, 1997, pp. 249–267.
[10] X. Droubay, Palindromes in the Fibonacci word, Inform. Process. Lett. 55 (1995) 217–221.
[11] X. Droubay, G. Pirillo, Palindromes and Sturmian words, Theoret. Comput. Sci. 223 (1999) 73–85.
[12] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics, second ed., Addison-Wesley, Reading, MA, 1994.
[13] G.H. Hardy, E.M. Wright, An Introduction to the Theory of Numbers, fourth ed., Clarendon Press, Oxford, 1975.
[14] G. Pirillo, A new characteristic property of the palindrome prefixes of a standard Sturmian word, Sém. Lothar. Combin. 43 (1999) (electronic, see http://www.mat.univie.ac.at/∼slc/).
[15] N. Pytheas Fogg, Substitutions in Dynamics, Arithmetics and Combinatorics, in: V. Berthé et al. (Eds.), Lecture Notes in Mathematics, Vol. 1794, 2002.
[16] A.M. Rockett, P. Szüsz, Continued Fractions, World Scientific, Singapore, 1992.
[17] L. Vuillon, A characterization of Sturmian words by return words, European J. Combin. 22 (2001) 263–275.
Theoretical Computer Science 340 (2005) 349 – 363 www.elsevier.com/locate/tcs
Regular splicing languages and subclasses Paola Bonizzoni∗ , Giancarlo Mauri Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano – Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
Abstract

Recent developments in the theory of finite splicing systems have revealed surprising connections between long-standing notions of formal language theory and the splicing operation. More precisely, the syntactic monoid and the Schützenberger constant interact strongly with the investigation of regular splicing languages. This paper surveys structural characterizations of classes of regular splicing languages based on these two notions, and discusses basic questions that motivate further investigations in this field. In particular, we improve the result in [6] that provides a structural characterization of reflexive symmetric splicing languages, by showing that it can be extended to the class of all reflexive splicing languages: this is the largest class for which a characterization is known.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Automata; Regular languages; Molecular computing; Splicing systems
1. Introduction

The formal model of splicing system was originally introduced in [13] to investigate the potential of a fundamental biological mechanism occurring in nature: restriction enzymes act on double-stranded DNA molecules by cutting them into segments that can be joined in the presence of a ligase enzyme. The original definition of splicing system was formulated to describe the biochemical processes involved in this molecular cut-and-paste phenomenon. Later the notion was reformulated by G. Paun at a less restrictive level of

∗ Corresponding author.
E-mail addresses: [email protected] (P. Bonizzoni), [email protected] (G. Mauri). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.035
generality, giving rise to the model of splicing operation that was then commonly adopted in splicing systems theory and is nowadays a standard [19]. Since a splicing system is a formal device for generating languages, called splicing languages, the splicing operation has been deeply investigated in the framework of formal language theory, establishing a link between the biomolecular sciences and language theory [20]. Moreover, this close connection has contributed a renewed interest in the development of language theory. Conversely, theoretical results in splicing systems theory have suggested new ideas in the framework of biomolecular science, for example the design of automated enzymatic processes.

In this paper, we focus on the original concept of finite splicing system, which is closest to the real biological process: the splicing operation is meant to act through a finite set of rules (modelling enzymes) on a finite set of initial strings (modelling DNA sequences). Under this formal model, a splicing system is a generative mechanism for languages, which turn out to be regular. This basic result was first proved in [9], and later proved in several other papers using different approaches (see for example [23,17,26]). However, not all regular languages can be generated by splicing, and a characterization of the class of regular splicing languages is still unknown. This open question is related to several challenging issues concerning splicing theory and formal language theory that motivate a continuous development of research in this direction [12,6].

Considerable progress has been made in the investigation of the generative power of finite splicing systems. For a better understanding of the basic issues in this field, it is necessary to classify splicing systems w.r.t. the splicing operation. In the literature three main splicing operations have been introduced, known as the Head, Paun and Pixton operations.
Each splicing operation leads to a distinct class of splicing languages (known also as Head, Paun and Pixton splicing languages). Actually, it turns out that the relationship between the different classes of splicing languages can be explained by using the classical notion of the Paun splicing operation and viewing the set of splicing rules as inducing a binary relation. A set R of rules consists of ordered pairs of factored words, denoted ((u_1, u_2)$(u_3, u_4)) and called rules, where u_1u_2 and u_3u_4 are splicing sites. The set R specifies a binary relation between factored sites that can be reflexive, symmetric or a transitive relation, as shown in [4]. It turns out that distinct classes of splicing languages are generated by splicing systems where R is a binary relation obeying different restrictions. For instance, when R is restricted to be reflexive, symmetric and transitive, it allows one to characterize the splicing languages generated by the original Head splicing operation. On the other hand, Paun splicing languages are generated by splicing systems where the set R of rules is not necessarily symmetric or reflexive. In particular, reflexivity and symmetry are natural properties for splicing systems as originally defined in [15]; indeed, reflexive and symmetric splicing systems are the most important from a modelling perspective.

The first characterization of reflexive symmetric splicing languages was given in [6] by using the concept of constant, introduced by Schützenberger [24]. Every language L in this class is constructed from a finite set of constants for L: L is expressed as a finite union of constant languages and split languages, where a split language is obtained by a single iteration of a splicing operation over constant languages.
In this paper, we discuss this result, which constitutes a first significant progress in this research field, as it sheds light on the fundamental concepts of formal language theory that can explain
the generative power of the splicing operation and how they can be used in this framework: these are the concepts of constant and of syntactic congruence. Moreover, we improve the structural characterization given in [6] by showing that it generalizes to all reflexive (i.e., not necessarily symmetric) splicing languages: this result is stated in Proposition 4.2. Furthermore, by observing that a reflexive regular splicing language L is characterized by one iteration step of splicing rules applied to constant languages, we prove that a recent characterization of reflexive languages given in [12] can be obtained as a corollary of Proposition 4.2.

Two fundamental questions arise when dealing with splicing systems.
• Question A (recognition): give an effective procedure to decide whether a regular language is an X-splicing language (X reflexive, symmetric).
• Question B (synthesis): give an effective procedure to construct, given an X-splicing language L, a splicing system S with X-rules such that L = L(S).

In the paper, we address these two questions by presenting and analyzing the main results related to them appearing in the literature. In particular, Question A has been solved for the class of reflexive splicing languages (in [12] a decision procedure for this class has been proposed), and for special subclasses of regular languages. Clearly, the problem is strictly related to the question of providing a structural characterization of splicing languages. A graph-based algorithm that solves this question for null context splicing languages (NCS) is proposed in [7]. Other decision results have been given for larger classes of languages including the class NCS, such as the classes of FCS languages and of marker languages, characterized by the notions of constant and of syntactic monoid. Question B has been solved for the class of reflexive symmetric languages in [6].

The paper is organized as follows.
In Section 2 the basic notions on finite splicing systems are given; Sections 3 and 4 are devoted to Question B, while in Section 5 we discuss results concerning Question A. Finally, in Section 6 we list open problems in this research field.
2. Finite splicing systems

In this section, we give the basic notions of finite splicing systems theory and of formal language theory that have been used to investigate subclasses of splicing languages.

Let A be a finite alphabet. We denote the empty word over A by 1. Given w ∈ A⁺, a 2-factor of w is an ordered pair (w_1, w_2) such that w_1, w_2 ∈ A* and w = w_1w_2. A rule r consists of a pair of 2-factors (u_1, u_2) and (u_3, u_4) and is denoted ((u_1, u_2)$(u_3, u_4)); each of the words u_1u_2 and u_3u_4 is called a splicing site of rule r. A set R of rules specifies a binary relation between 2-factors of sites that can be reflexive, symmetric or even a transitive relation [4]. Precisely, R is reflexive if ((u_1, u_2)$(u_3, u_4)) ∈ R implies ((u_1, u_2)$(u_1, u_2)) ∈ R and ((u_3, u_4)$(u_3, u_4)) ∈ R, while R is symmetric if ((u_1, u_2)$(u_3, u_4)) ∈ R implies ((u_3, u_4)$(u_1, u_2)) ∈ R.
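Viewing R as a finite binary relation on 2-factors, reflexivity and symmetry are directly checkable. A minimal sketch (rule encoding and placeholder sites are ours):

```python
# A rule ((u1,u2)$(u3,u4)) is encoded as a pair of 2-factors (tuples).
def is_reflexive(R):
    # every rule (s$t) forces (s$s) and (t$t) into R
    return all((s, s) in R and (t, t) in R for (s, t) in R)

def is_symmetric(R):
    # every rule (s$t) forces (t$s) into R
    return all((t, s) in R for (s, t) in R)

r = (('u1', 'u2'), ('u3', 'u4'))   # placeholder sites
R1 = {r}
R2 = {r, (r[1], r[0]), (r[0], r[0]), (r[1], r[1])}
assert not is_reflexive(R1) and not is_symmetric(R1)
assert is_reflexive(R2) and is_symmetric(R2)
```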
Given x, y ∈ A⁺, a rule r = ((u_1, u_2)$(u_3, u_4)) applies to x, y if x has the splicing site u_1u_2 as a factor and y has the splicing site u_3u_4 as a factor, that is, x = x_1u_1u_2x_2 and y = y_1u_3u_4y_2. The result of the splicing operation on x, y by rule r is the word w = x_1u_1u_4y_2, which is said to be generated by splicing of x, y by r. If R is symmetric, then given a rule r = ((u_1, u_2)$(u_3, u_4)) ∈ R, also r̄ = ((u_3, u_4)$(u_1, u_2)) ∈ R, and the word w̄ = y_1u_3u_2x_2 is generated by splicing of x, y by r̄.

Let L ⊆ A*. We define the closure of L by R as the set cl(L, R) of all words that are obtained as the result of a splicing operation on a pair of words in L by a rule r ∈ R. A splicing system S is a triple S = (A, I, R), where A is the alphabet of the system, R is a set of splicing rules and I ⊆ A* an initial language. Given a splicing system S = (A, I, R), the iterated splicing is defined as follows:
  σ⁰(I) = I, and for i ≥ 0,
  σ^{i+1}(I) = σ^i(I) ∪ cl(σ^i(I), R),
  σ*(I) = ⋃_{i ≥ 0} σ^i(I).
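The iterated splicing defined above is directly executable on small instances. A minimal sketch of the Paun 1-splicing step and its iteration (function names are ours; the length cap is an artificial device to force termination, since splicing languages may be infinite):

```python
def splice(x, y, rule):
    """One 1-splicing step: from x = x1 u1u2 x2 and y = y1 u3u4 y2
    produce x1 u1 u4 y2, over all factorizations."""
    (u1, u2), (u3, u4) = rule
    s1, s2 = u1 + u2, u3 + u4
    results = set()
    for i in range(len(x) + 1):
        if x.startswith(s1, i):
            for j in range(len(y) + 1):
                if y.startswith(s2, j):
                    results.add(x[:i] + u1 + u4 + y[j + len(s2):])
    return results

def iterated_splicing(I, R, max_len=20):
    """sigma^*(I) truncated to words of bounded length."""
    lang = set(I)
    while True:
        new = {w for x in lang for y in lang for r in R
               for w in splice(x, y, r) if len(w) <= max_len}
        if new <= lang:
            return lang
        lang |= new

R = {(('a', 'b'), ('b', 'a'))}
assert splice('aba', 'aba', next(iter(R))) == {'aa'}
assert iterated_splicing({'aba'}, R) == {'aba', 'aa'}
```

In this toy example the rule ((a,b)$(b,a)) applied to aba and aba produces aa, after which no rule applies any further, so the iteration closes.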
In this paper we deal with finite splicing systems, that is, splicing systems where I and R are finite sets: in this case S is called an H-system and L(S) = σ*(I) is the splicing language generated by S. Thus, in the rest of the paper, by a splicing system we mean a finite splicing system, and a splicing language is a language generated by a finite splicing system. For convenience, we assume that all rules in R are useful for the language L(S), that is, for each rule r ∈ R there exist w, x, y ∈ L(S) such that w is generated by splicing of x, y by r. A splicing language L is a reflexive or symmetric splicing language if L = L(S), where S = (A, I, R) and R is reflexive or symmetric, respectively.

It must be pointed out that in the literature at least two other notions of splicing rules and splicing operations have been proposed, known as the Head and Pixton splicing operations. In [8] it has been shown that splicing systems based on the Pixton splicing operation are more powerful than those based on the standard (Paun) splicing, and that these in turn are more powerful than Head splicing systems. A classification of these different notions of splicing may be given by using the standard (Paun) splicing operation adopted in this paper, simply by requiring that the set R of rules be a specific (symmetric, reflexive or transitive) binary relation over 2-factors, as pointed out partially in [4]. The relationship between symmetric and nonsymmetric splicing languages has been investigated in [25]. The class of splicing languages (called 1-splicing languages and introduced in [20]) properly includes the class of symmetric splicing languages, as proved in [25] (these are equivalent to the 2-splicing languages introduced in [20]); indeed, the languages of Lemma 4.3 show the strict inclusion.

Remark 2.1. Observe that we use a definition of cl(I, R) based on the 1-splicing operation [20].
This notion is generalized to the case of the 2-splicing operation by defining the set cl_2(I, R) = {x_1u_1u_4y_2, y_1u_3u_2x_2 : x_1u_1u_2x_2, y_1u_3u_4y_2 ∈ I, ((u_1, u_2)$(u_3, u_4)) ∈ R}, while cl(I, R) = {x_1u_1u_4y_2 : x_1u_1u_2x_2, y_1u_3u_4y_2 ∈ I, ((u_1, u_2)$(u_3, u_4)) ∈ R}.
Given a set R of rules, let us denote by sym(R) the symmetric closure of R. Formally, sym(R) = {(s_1$s_2), (s_2$s_1) : (s_1$s_2) ∈ R}. Then it is immediate to verify that cl_2(I, R) = cl(I, sym(R)). Vice versa, given cl(I, R) where R is a set of symmetric rules, then cl(I, R) = cl_2(I, R).

In [26], a proof that splicing languages are regular is given, thus providing an alternative proof of the known inclusion of splicing languages in the class of regular languages. Actually, this main result in splicing theory was first proved in [9], but there are several other proofs using different approaches (see for example [23,17]). For example, in [17] an algorithm has been given to construct a finite state automaton that recognizes the language generated by a splicing system (A, L, R) that is not necessarily finite, where L is a regular language and R a finite set. Clearly, this result proves the existence of a finite state automaton recognizing a splicing language generated by a finite splicing system, i.e., in the special case where L is finite.

A fundamental property introduced in several papers [6,5,11,12], relating rules to a language L and used to build splicing systems that generate a language, is the closure of L under a set R of rules, stated below.

Definition 2.1. A language L is closed under a set R of rules iff cl(L, R) ⊆ L.

We conclude the section by giving the basic notions of formal language theory used in the paper: the notions of constant and of syntactic monoid. When dealing with a finite state automaton A = (A, Q, δ, q_0, F) recognizing a regular language L, we assume that A is trim, that is, each state is accessible and coaccessible, and that A is the minimal automaton of L (see [21] for basic notions). Here δ is the transition function of the deterministic automaton A, q_0 is the initial state, and F the set of final states [2,16].
Given a regular language L, the reduced graph GA(L) denotes the transition diagram of the minimal automaton A recognizing L. A path in the reduced graph GA(L) is a finite sequence π = (q, a1, q1)(q1, a2, q2) . . . (qn−1, an, qn) where δ(q, a1) = q1 and, for each i = 1, . . . , n − 1, δ(qi, ai+1) = qi+1. An abbreviated notation for such a path is π = (q, a1a2 · · · an, qn), and a1a2 . . . an is called the label of π. A path π = (q, x, qn), with x ∈ A+, is a closed path iff q = qn. Moreover, we say that q, q1, . . . , qn are the states crossed by the path (q, a1 · · · an, qn) and, for each i ∈ {1, . . . , n − 1}, qi is an internal state crossed by the same path. Given w ∈ A+, Qw denotes the set {q′ ∈ Q : δ(q, w) = q′ for some q ∈ Q}. Given m ∈ A+, we define the left context and the right context of m w.r.t. L as the languages ContL(L, m) = {x ∈ A∗ : xmy ∈ L for some y ∈ A∗} and ContR(L, m) = {y ∈ A∗ : xmy ∈ L for some x ∈ A∗}, respectively. Moreover, given q ∈ Qm, ContR,q(L, m) = {y ∈ A∗ : δ(q, my) ∈ F}. A word w ∈ A+ is a constant for a regular language L iff, given xwy ∈ L and zwv ∈ L, then xwv, zwy ∈ L, i.e., ContL(L, w)wContR(L, w) ⊆ L. The notion of constant was introduced by Schützenberger [24]. A word w ∈ A+ is singular iff |Qw| = 1. A characterization of the constants of a regular language L in terms of the reduced graph GA(L) is given in Proposition 2.1. This result is more or less folklore and its proof can be found in [6].
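For a finite language, the defining exchange property of a constant can be tested by brute force; a small sketch (the function name is ours), which checks ContL(L, w)·w·ContR(L, w) ⊆ L directly:

```python
def is_constant(w, L):
    """Test Schützenberger's constant property for w over a *finite*
    language L: whenever x·w·y ∈ L and z·w·v ∈ L, then x·w·v ∈ L
    (and symmetrically z·w·y ∈ L). For a regular L one would instead
    inspect the minimal automaton, in the spirit of Proposition 2.1."""
    # all (left, right) context pairs of occurrences of w in words of L
    ctx = [(u[:i], u[i + len(w):]) for u in L
           for i in range(len(u) - len(w) + 1) if u.startswith(w, i)]
    # recombine every left context with every right context
    return all(x + w + v in L for x, _ in ctx for _, v in ctx)
```

For example, b is a constant for {ab, cb}, while a is not a constant for {ab, ba}.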
Proposition 2.1. Let L ⊆ A∗ be a regular language and let GA(L) be the reduced graph for L. A word m ∈ A∗ is a constant for L if and only if there exists qm ∈ Q such that each path in GA(L) having m as a label is of the form π = (q, m, qm) for some q ∈ Q.

The syntactic congruence plays a central role in the development of regular language theory [21,22]. The syntactic congruence ≡L w.r.t. a language L is a binary relation over words: w ≡L z iff for all x, y ∈ A∗, xwy ∈ L if and only if xzy ∈ L. The quotient of A∗ by the congruence ≡L is the syntactic monoid of L, denoted M(L). In the paper, the equivalence class of a word x is denoted by [x].
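For a finite language, the syntactic congruence can be computed exactly by bounding the contexts: any context pair (x, y) with xwy ∈ L satisfies |x|, |y| ≤ m, the maximal word length of L, and all test words longer than m share the class of the zero. A sketch (naming is ours):

```python
import itertools

def syntactic_classes(L, alphabet):
    """Classes of the syntactic congruence of a *finite* language L,
    represented by the test words of length <= m + 1, where m is the
    maximal word length of L (every longer word has an empty context
    set and falls into the class of the zero)."""
    m = max(map(len, L), default=0)
    def all_words(k):
        return ["".join(t) for n in range(k + 1)
                for t in itertools.product(alphabet, repeat=n)]
    def contexts(w):
        # all (x, y) with x·w·y in L; bounded enumeration is exact here
        return frozenset((x, y) for x in all_words(m) for y in all_words(m)
                         if x + w + y in L)
    classes = {}
    for w in all_words(m + 1):
        classes.setdefault(contexts(w), []).append(w)
    return list(classes.values())
```

For L = {ab} over {a, b} this yields five classes, matching the five elements 1, [a], [b], [ab] and the zero of the syntactic monoid of {ab}.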
3. Classes of splicing languages

The notion of constant appears to be crucial in characterizing the computational power of splicing systems. Indeed, the structure of reflexive languages, as well as of other special classes of splicing languages below the regular ones, is defined in terms of constants, as proved in recent papers [6,12].

3.1. Constant and splicing languages

The first characterization of classes of splicing languages in terms of the concept of constant is given in the seminal work on the splicing operation [13]: the class of null context splicing languages (NCS, in short) is equal to that of strictly locally testable languages. This result is based on a characterization of strictly locally testable languages (SLT) by means of the concept of constant given by De Luca and Restivo in [10]: there, SLT languages are characterized as those languages for which there exists a positive integer k such that every string in A∗ of length k is a constant. Let us recall that null context splicing languages are those languages generated by a system S = (A, I, R) where each rule r ∈ R is of the form ((x, 1)$(x, 1)) or ((1, x)$(1, x)), for x ∈ A+. A crucial notion in finite splicing theory, first introduced in [14], is that of a constant language.

Definition 3.1 (constant language for m). Let L be a regular language and m be a word in A∗ that is a constant for L. A constant language in L for m is the language L(m, L) ⊆ L such that L(m, L) = ContL(L, m)mContR(L, m). A language L is simply a constant language if L(m, L) = L.

In the paper, for simplicity, we use the notation L(m) to denote a constant language L(m, L) in L. Null context splicing languages are properly included in a larger class of languages investigated in [14] that we call finitely constant generated splicing languages, or simply FCS languages.
These languages are the splicing languages generated by systems with one-sided rules that are reflexive, which generalize the rules of NCS languages: one-sided rules are rules of the form ((1, v)$(1, u)) or ((v, 1)$(u, 1)), for u, v ∈ A∗.
The language L = b+ab+ is an example of an FCS language that is not an NCS language, as indeed L is not strictly locally testable. Moreover, note that an NCS language is not necessarily a constant language: this holds, for instance, for L = a∗ ∪ b∗, which is an NCS language that is the union of two constant languages over two distinct symbols of the alphabet. As for NCS languages, a nice characterization of FCS languages is given in terms of constants in [14]: a language L is an FCS language iff it is the union of a finite set with a finite set of constant languages in L for a set M of constants of L (these languages are called finitely constant based languages in [12]). This result is stated in the following theorem.

Theorem 3.1 (FCS languages (Head [14])). Let L ⊂ A∗ be a regular splicing language. Then the following are equivalent.
(1) L is generated by a splicing system S = (A, I, R), where each rule r ∈ R is one-sided and R is reflexive.
(2) L = ⋃_{m ∈ M} L(m) ∪ X, where X is a finite subset of A∗, L(m) is a constant language in L for m ∈ M ⊆ A∗ and M is a finite set of constants for L.

3.2. Syntactic monoid and splicing languages

The notion of syntactic monoid has been used in [3] and in [5] in order to characterize new classes of regular languages generated by splicing. An example of how the use of the syntactic monoid may provide new insight into the investigation of splicing languages is obtained by naturally extending the notion of a constant language, introducing congruence classes in place of constants. Precisely, in [3] it has been proved that regular languages of the form L = L1[x]1L2, where L1 and L2 are regular languages and [x]1 is a marker defined by means of a syntactic congruence class [x] of M(L), are splicing languages, called marker languages. More precisely, a marker [x]1 for the congruence class [x] is defined as [x]1 = [x], if [x] is finite; otherwise [x]1 = [x] ∪ {1}, where x is a label of a closed path that is singular.
Marker languages form a class of regular splicing languages which is not comparable with the class of FCS languages [5]. More precisely, there are regular languages that are marker languages of the form L1[x]1L2 and are not in the class FCS, even though they are generated by splicing, as shown in the following example.

Example 3.1. The regular language L = L1(ab+a)∗L2 = b+a²da(ab+a)∗ada²b+, with L1 = b+a²da and L2 = ada²b+, is a marker language which is not in the class FCS. First observe that (ab+a)+ is a syntactic congruence class of the language L, and thus [aba] ∪ {1} is a marker, as aba is the singular label of a closed path. The language L is not in the class FCS, as it is an infinite union of constant languages, as proved below. Let us first show that no factor of the language L1ab+ = b+aL2 is a constant. Indeed, assume that z is a factor of w ∈ L1ab+, that is, w = w1zw2. As w ∈ b+aL2, it follows that there exists a word y ∈ L such that y = ww, as L ⊇ L1ab+ b+aL2. Given y = w1zw2w1zw2, if z is a constant, by the definition of a constant it holds that w1zw2 ∈ L, that is, there exists bⁱa²da²bʲ ∈ L, for i, j > 0. This fact leads to a contradiction, as each word in L must
contain two occurrences of the symbol d. Consequently, a constant of L must be a factor of L, but not a factor of L1ab+ = b+aL2. Thus, each constant z of L must be of the form z1abⁱaz2, with i > 0, for z1, z2 ∈ A∗. Indeed, each factor abⁱa is a constant by Proposition 2.1, as it is a singular word, and thus every word having abⁱa as a factor is also a constant, by a known property proved in [6] and in [10]. But, for each i > 0, there exists an infinite set of words in L that do not have z as a factor, thus implying that there exists no finite set M of constants of L such that L = ⋃_{m ∈ M} L(m) ∪ X, where X is a finite subset of A∗.
4. Reflexive symmetric splicing languages

In this section, we illustrate the characterization of reflexive symmetric splicing languages given in [6] and show that this result extends to all reflexive splicing languages. Moreover, we relate this result to a decision algorithm proposed in [12] for reflexive splicing languages. Again, the notion of constant is fundamental in giving a structural description of regular splicing languages. Indeed, a reflexive symmetric splicing language L is characterized in terms of a finite set M of constants for the language L. More precisely, L is defined in finite terms as a finite union of languages obtained by one single iteration of the splicing operation. The first intermediate significant result relating splicing languages to constants was proved for symmetric and reflexive languages in [6]: it states that the splicing sites of the rules of a symmetric and reflexive splicing language L are constants for the language. Actually, we can improve this result by showing in Proposition 4.1 that reflexivity is a necessary and sufficient condition for a splicing language to satisfy the above stated property (an independent proof of the following lemma is given in [12]).

Lemma 4.1. Let L be a language and r = ((u1, u2)$(u1, u2)) a rule. Then L is closed under rule r iff u1u2 is a constant for L.

Proof. Let L be closed under rule r. Let w1 = xu1u2y ∈ L and w2 = zu1u2v ∈ L. Since r can be applied to w1, w2 in either order, it is immediate that xu1u2v ∈ L and zu1u2y ∈ L. Consequently, u1u2 is a constant for L. Vice versa, assume that u1u2 is a constant for L. By the definition of a constant, given xu1u2y ∈ L and zu1u2v ∈ L, then xu1u2v ∈ L and zu1u2y ∈ L, thus implying that L is closed under r.

Thus we can state the first characterization theorem for reflexive splicing languages.

Proposition 4.1. Let L be a regular language.
Then L is a reflexive splicing language iff there exists a splicing system S = (A, I, R) such that L(S) = L and the sites of the rules in R are constants for L.

Proof. If L is reflexive, then there exists a system S = (A, I, R), where R is a reflexive set of rules and L = L(S). Then for each pair sij = (ui, uj) such that (sij$s) or (s$sij) is a rule in R, the rule (sij$sij) is in R. By Lemma 4.1, uiuj is a constant for L. Vice
versa, if each site uiuj of a rule r is a constant, the rule (sij$sij), with sij = (ui, uj), may be added to R since, by Lemma 4.1, L is closed under it. This fact implies that there exists a splicing system S′ = (A, I, R′), with a reflexive set of rules R′, such that L = L(S′) and thus L is reflexive.

Given a finite set M of constants for the language L, we define the set F(M) of 2-factors of words in M (a 2-factor in F(M) is named a split of a constant in [6]): F(M) = {(mi1, mi2) : mi1mi2 ∈ M}. A binary relation over F(M) induces a set RM of rules, precisely RM ⊆ {(s1$s2) : s1, s2 ∈ F(M)}: let us call RM a set of constant-based rules over M. A splicing operation is defined for a pair of constant languages L(mi), L(mj) by a rule r ∈ RM if each of the constants mi and mj is a distinct site of the rule r. Formally, given r = ((u1, u2)$(u3, u4)) such that u1u2 = mi, u3u4 = mj, and L(mi) = Li1u1u2Li2, L(mj) = Lj1u3u4Lj2, the result of a splicing operation of L(mi), L(mj) by r is the language Li1u1u4Lj2, denoted as SPLIT(L(mi), L(mj), r) and called a split language. Clearly, a split language is obtained as cl(L(mi) ∪ L(mj), r) (see Section 2).

Remark 4.1. In [6], the notion of a split language is introduced by using the 2-splicing operation. More precisely, the split language of L(mi) and L(mj) by a rule rij consists of cl2(L(mi) ∪ L(mj), rij). But, by Remark 2.1, it is immediate that cl2(L(mi) ∪ L(mj), rij) = cl(L(mi) ∪ L(mj), sym({rij})).

By the above remark, the characterization theorem for reflexive symmetric splicing languages in [6] can also be stated as follows:

Theorem 4.1. Let L be a regular language. Then L is a reflexive symmetric splicing language iff there exist a finite set X ⊂ A∗, a finite set M of constants for L and a set RM of constant-based rules over M such that RM is symmetric and

L = ⋃_{mi ∈ M} L(mi) ∪ ⋃_{rij ∈ RM} SPLIT(L(mi), L(mj), rij) ∪ X.
In [6], Theorem 4.1 is proved under the additional hypothesis that X is a finite set of words such that no factor of a word in X is a constant m ∈ M. Given a rule rij ∈ RM, the language L′ of all words in L that have the splicing site m of rij as a factor is uniquely specified by the expression ContL(L, m)mContR(L, m), i.e., L′ = L(m). Based on this observation, the finite union of split languages can be denoted by the closure of the union of constant languages under the rules in RM.

Lemma 4.2. Let RM be a set of constant-based rules over M. Then it holds that

⋃_{rij ∈ RM} SPLIT(L(mi), L(mj), rij) = cl(⋃_{mi ∈ M} L(mi), RM).
Proof. Clearly, ⋃_{rij ∈ RM} SPLIT(L(mi), L(mj), rij) ⊆ cl(⋃_{mi ∈ M} L(mi), RM). Now, given rij ∈ RM and x, y ∈ ⋃_{mi ∈ M} L(mi) such that rij applies to x, y, it holds that x ∈ L(mi) and y ∈ L(mj), where mi, mj are the two splicing sites of rij. Consequently, it holds that cl(⋃_{mi ∈ M} L(mi), rij) ⊆ SPLIT(L(mi), L(mj), rij), which concludes the proof of the lemma.

By using Proposition 4.1, in the following we show that Theorem 4.1 can be generalized to all reflexive (symmetric or nonsymmetric) splicing languages.

Proposition 4.2. Let L be a regular language. The following are equivalent:
(1) L is a reflexive (symmetric) splicing language.
(2) There exist a finite set X ⊂ A∗, a finite set M of constants for L and a set RM of (symmetric) constant-based rules over M such that

L = ⋃_{mi ∈ M} L(mi) ∪ cl(⋃_{mi ∈ M} L(mi), RM) ∪ X.  (1)
Proof. The proof of the implication (2) → (1) is obtained by showing that the proof of the same implication given for Theorem 4.1 in [6] holds in general, without assuming that RM is necessarily symmetric. Thus let us now show that (1) → (2) holds. If (1) holds, then there exists a splicing system S = (A, I, R) such that L = L(S) and R is reflexive. By Proposition 4.1 the set M of sites of rules in R is a finite set of constants. Thus RM = R is a set of constant-based rules over M, and L is closed under RM. By this fact it holds that L ⊇ L′ = ⋃_{mi ∈ M} L(mi) ∪ cl(⋃_{mi ∈ M} L(mi), RM). Thus L′ ∪ I ⊆ L, where I is the initial language of S. Let us now show, by induction on the length of a word w ∈ L, that L ⊆ L′ ∪ I. Clearly, if w ∈ I, then w ∈ L′ ∪ I. Thus assume that w ∈ L, w ∉ I. Then w ∈ cl({x, y}, r), for some r ∈ R and x, y ∈ L. By induction x, y ∈ L′ ∪ I and consequently w ∈ cl(⋃_{mi ∈ M} L(mi), RM), thus proving that w ∈ L′.

Example 4.1. The regular language L = a+ba+ba+ ∪ a+ca+ba+ is a reflexive symmetric splicing language. Indeed, given the set M = {c, bab} of constants for L and the constant languages L1 = a+m1a+ and L2 = a+m2a+ba+, where m1 = bab, m2 = c, then L = L1 ∪ L2 ∪ Split(L1 ∪ L2, r), where r = ((b, ab)$(ac, 1)) ∈ RM.

The following remark has been stated in [6].

Remark 4.2. Given a regular language L, a constant language L(m) is a special case of a split language, as indeed L(m) = SPLIT(L(m), L(m), r), where r = ((m, 1)$(m, 1)) is a constant-based rule.

Then we obtain, as a corollary of Theorem 4.1, the following characterization of reflexive splicing languages, proved in [12].

Corollary 1. Let L be a regular language. Then the following are equivalent:
(1) L is a reflexive (symmetric) splicing language.
(2) There exists a set R of reflexive (symmetric) rules such that L is closed under R and L = cl(L, R) ∪ X, for X ⊂ A∗ a finite set.

Proof. By using Remark 4.2 and Lemmas 4.2 and 4.1, we can show that statement (2) is equivalent to statement (2) of Proposition 4.2. Consequently, by a direct application of Proposition 4.2, the equivalence of the two statements holds.

There exist splicing languages that are reflexive and not symmetric, as stated in Lemma 4.3. Indeed, by applying a theorem stated in [25], we can show that L1 = a∗ ∪ a∗ba∗ and L2 = a∗ ∪ da∗ ∪ a∗c are not symmetric languages, while Lemma 4.3 shows that these languages are reflexive.

Lemma 4.3. The languages L1 = a∗ ∪ a∗ba∗ and L2 = a∗ ∪ da∗ ∪ a∗c are splicing languages that are reflexive and not symmetric.

Proof. The language L1 can be expressed as L(b) ∪ cl(L(b), R), where R = {r}, r = ((1, b)$(b, 1)) and L(b) = a∗ba∗ is a constant language. Similarly, the language L2 = L(d) ∪ L(c) ∪ cl(L(c) ∪ L(d), r′), where L(d) = da∗, L(c) = a∗c and r′ = ((1, c)$(d, 1)). Then, by Proposition 4.2, L1 and L2 are reflexive splicing languages.

The existence of nonreflexive splicing languages has been proved in [12]: as shown there, a∗b∗a∗b∗a∗ ∪ a∗b∗a∗ is an example of a symmetric, nonreflexive splicing language, while b∗a∗b∗a∗ ∪ a∗b∗a∗ ∪ a∗ provides an example of a splicing language that is neither symmetric nor reflexive. The language a∗b∗a∗ is an example of a reflexive splicing language that is not in FCS, as shown in [12].
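The first claim in the proof of Lemma 4.3 can be checked mechanically on length-bounded truncations: splicing L(b) = a∗ba∗ with itself under r = ((1, b)$(b, 1)) produces exactly the words of a∗, so L(b) ∪ cl(L(b), {r}) agrees with L1 = a∗ ∪ a∗ba∗ up to the chosen bound. A small sketch (the bound N is an arbitrary choice of ours):

```python
import itertools

N = 6  # length bound for the finite truncations (arbitrary choice)

def words(max_len):
    return ["".join(t) for n in range(max_len + 1)
            for t in itertools.product("ab", repeat=n)]

# truncation of the constant language L(b) = a* b a*
Lb = {w for w in words(N) if w.count("b") == 1}

# one round of 1-splicing of Lb with itself under r = ((1, b)$(b, 1)):
# from x = x1·b·x2 and y = y1·b·y2 the rule produces x1·y2
spliced = {x.split("b")[0] + y.split("b")[1] for x in Lb for y in Lb}

# truncation of L1 = a* ∪ a* b a*
L1 = {w for w in words(N) if w.count("b") <= 1}

assert {w for w in Lb | spliced if len(w) <= N} == L1
```

The products of the rule contain no b, which is also why L1 is obtained after a single splicing round.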
5. Decision algorithms for subclasses of regular splicing languages

A characterization theorem extending the result for reflexive languages to all regular splicing languages is still unknown. Indeed, a procedure to decide whether a regular language is a splicing language is still unknown. On the other hand, we still do not know how to use the characterization of Theorem 4.1 to obtain a procedure to decide whether a regular language is a reflexive splicing language. Indeed, this question is a generalization of the problem posed in [14]: find a decision procedure for the class of FCS languages. However, partial results have been achieved in [12], where it is proved that we can decide whether a regular language is reflexive. The design of algorithms to solve decision problems for regular splicing languages and subclasses of splicing languages is a topic that is still unexplored. In the following we list basic results that have been achieved in different papers and are strongly related to the solution of the above-mentioned questions. These results are stated below and then detailed by the lemmas and remarks that follow.
• A decision procedure to establish when a language L is closed w.r.t. a given set R of rules (see Lemmas 5.1, 5.2 and Remark 5.1).
• A standard procedure for the construction of an initial language and of basic rules to generate constant generated splicing languages or marker splicing languages, given the reduced graph for the language [6,5,11].
• A characterization of splice sites in terms of the syntactic congruence (see Lemma 5.3).

The following lemma has been proved in [6] for symmetric splicing systems, but it can be easily extended to the general case.

Lemma 5.1. Let S = (A, I, R) be a splicing system, let L ⊆ A∗ be a regular language and let A be the automaton recognizing L. Then L = L(A) is closed with respect to a rule r = ((u1, u2)$(u3, u4)) ∈ R if and only if for each pair (p, q) ∈ Qu1u2 × Qu3u4 we have

ContR,p(L, u1u2) ⊆ ContR,q(L, u3u4).  (2)
More precisely, the following result is used to prove containment relations between languages.

Lemma 5.2 (Bonizzoni et al. [3]). Let S = (A, I, R) be a splicing system and let L ⊆ A∗ be a regular language such that I ⊆ L. If L is closed under R, then L(S) ⊆ L.

Remark 5.1. There is an effective procedure to decide whether a language L is closed under a set R of rules, given the automaton A for L. Indeed, given w ∈ A+ and p ∈ Qw, ContR,p(L, w) is a regular language (see the definition in Section 2).

Remark 5.2. There is an effective procedure to build a splicing system that generates a given reflexive splicing language (see [5], and [6] for a simpler construction).

The following property relates splice sites w.r.t. the syntactic congruence and has been proved in several papers [5,12].

Lemma 5.3. Let L ⊆ A∗ be a regular language that is closed under the rule r = ((u1, u2)$(u3, u4)). Then L is closed under each rule r̄ = ((u′1, u′2)$(u′3, u′4)), where u′i ∈ [ui], for i ∈ {1, 2, 3, 4}.

5.1. Decision algorithms for reflexive and symmetric splicing languages

The characterization Theorem 4.1 (and Proposition 4.2) leads to an effective algorithm to decide whether a regular language L is a reflexive symmetric splicing language, whenever a bound on the size of each rule in R can be given. Assume that, given L, such a bound is specified by the value Bound(L). Then the set of rules generating L consists of the largest set RM of constant-based rules over the set M such that L is closed under RM, where M is the finite set of all constants of L of length at most Bound(L): by Remark 5.1, such a set has an effective construction algorithm. Since, given two regular languages X, Y, it is decidable whether X = Y, the equation of Theorem 4.1 can be tested by classical algorithms, and thus it is immediate to obtain the required decision procedure. Actually, this algorithmic approach has been proposed in [12] to find a procedural application of Corollary 1.
Such a procedure is based on an upper bound for Bound(L) in terms of the size of the syntactic monoid for L.
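On length-bounded truncations, the closure test of Definition 2.1 (decidable exactly, by Remark 5.1, via the automaton) can be prototyped by brute force; function name and rule encoding are ours:

```python
import itertools

def closed_under(Ltrunc, rule, max_len):
    """Necessary-condition sketch of Definition 2.1 on a finite truncation
    Ltrunc of a language: every 1-splicing product of length <= max_len of
    two words of Ltrunc must stay in Ltrunc. The rule ((u1,u2)$(u3,u4))
    is encoded as the tuple (u1, u2, u3, u4)."""
    u1, u2, u3, u4 = rule
    s, t = u1 + u2, u3 + u4          # the two splicing sites
    for x in Ltrunc:
        for y in Ltrunc:
            for i in range(len(x) - len(s) + 1):
                if not x.startswith(s, i):
                    continue
                for j in range(len(y) - len(t) + 1):
                    if y.startswith(t, j):
                        w = x[:i] + u1 + u4 + y[j + len(t):]
                        if len(w) <= max_len and w not in Ltrunc:
                            return False
    return True

# truncation of a* b a* up to length 5
N = 5
Lb = {"".join(t) for n in range(N + 1)
      for t in itertools.product("ab", repeat=n) if t.count("b") == 1}
```

For example, a∗ba∗ is closed under ((b, 1)$(b, 1)) (its site b is a constant, cf. Lemma 4.1), but not under ((1, b)$(b, 1)), whose products lose the b.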
5.2. Decision algorithms for NCS and marker languages

A decision algorithm for marker languages, based on properties of markers, is given in [5]. An almost unexplored approach to the development of decision algorithms for the classes of regular splicing languages discussed in the previous sections is based on properties of the reduced graphs recognizing such languages. An example in this direction is given in [7], where a characterization of NCS languages in terms of a property of the reduced graph of the automaton recognizing such languages is proposed. More precisely, in [7], using the algorithmic approach proposed in [18] to recognize locally testable languages, the graph properties that relate SLT languages to their reduced graphs are investigated, and a graph-based algorithm to recognize SLT languages and other subclasses of regular languages is given. Recently, we discovered that similar results have been achieved independently in [1] in a different framework.

6. Conclusions and open problems

Finite splicing systems theory has revealed that there are extensive interactions between the splicing operation and two classical tools of formal language theory: the constant and the syntactic congruence. Even though many theorists have moved their attention towards new models for molecular computation, we believe that finite splicing systems theory still hides promising developments, mainly from the point of view of formal language theory, as well as concerning the original motivation of finding procedures for building simple models to describe enzymatic processes. In this paper, we have discussed the most significant progress in this theory made to understand the structure of regular splicing languages. We improve the result given in [6] by showing that the larger class of regular languages that has a structural characterization is that of reflexive splicing languages. It remains a challenging open question to drop the reflexivity assumption.
In this paper, we also discuss the most recent progress made towards the solution of two fundamental questions in this theory: the development of decision algorithms for classes of regular splicing languages, and the synthesis of splicing systems for such languages. In this direction, some basic questions are still open, and we believe that it will be fruitful for the formal language theory of splicing systems to look for their solution. Below, we just list some intriguing open questions.
• Question 1: Is there a nice characterization of reflexive splicing languages in terms of classes of syntactic monoids, as for marker languages [3], or in terms of reduced graph properties, as for NCS languages?
• Question 2: Find a characterization of the finite set of constants that are used in Theorem 4.1.
• Question 3: Investigate Boolean closure properties of reflexive and nonreflexive splicing languages.
We conclude this list by pointing out an intriguing conjecture proposed in [12] and mentioned in [11].

Conjecture 1. A splicing language must have constants.
Acknowledgements

The authors would like to thank C. De Felice and R. Zizza for long discussions on the topics covered in the paper. This work was partially supported by the MIUR Project “Linguaggi Formali e Automi: teoria ed applicazioni” and by the contribution of the EU Commission under the Fifth Framework Programme project MolCoNet IST-2001-32008.
References

[1] M.P. Béal, Codage Symbolique, Masson, 1993.
[2] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[3] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Decision problems on linear and circular splicing, in: M. Ito, M. Toyama (Eds.), Proc. DLT 2002, Lecture Notes in Computer Science, Vol. 2450, Springer, Berlin, 2003, pp. 78–92.
[4] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Regular languages generated by reflexive finite linear splicing systems, in: Z. Esik, Z. Fülöp (Eds.), Proc. DLT 2003, Lecture Notes in Computer Science, Vol. 2710, Springer, Berlin, 2003, pp. 134–145.
[5] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Linear splicing and syntactic monoid, 2004, submitted for publication.
[6] P. Bonizzoni, C. De Felice, R. Zizza, The structure of reflexive regular splicing languages via Schützenberger constants, Theoret. Comput. Sci. 334 (1–3) (2005) 71–98.
[7] P. Bonizzoni, C. Ferretti, G. Mauri, Splicing systems with marked rules, Romanian J. Inform. Sci. Technol. 1 (4) (1998) 295–306.
[8] P. Bonizzoni, C. Ferretti, G. Mauri, R. Zizza, Separating some splicing models, Inform. Process. Lett. 79 (6) (2001) 255–259.
[9] K. Culik, T. Harju, Splicing semigroups of dominoes and DNA, Discrete Appl. Math. 31 (1991) 261–277.
[10] A. De Luca, A. Restivo, A characterization of strictly locally testable languages and its application to subsemigroups of a free semigroup, Inform. Control 44 (1980) 300–319.
[11] E. Goode, Constants and splicing systems, Ph.D. Thesis, Binghamton University, 1999.
[12] E. Goode, D. Pixton, Recognizing splicing languages: syntactic monoids and simultaneous pumping, 2004, submitted for publication. (Available from http://www.math.binghamton.edu/dennis/Papers/index.html.)
[13] T. Head, Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviours, Bull. Math. Biol. 49 (1987) 737–759.
[14] T. Head, Splicing languages generated with one sided context, in: Gh. Paun (Ed.), Computing with Biomolecules, Theory and Experiments, Springer, Singapore, 1998.
[15] T. Head, Gh. Paun, D. Pixton, Language theory and molecular genetics: generative mechanisms suggested by DNA recombination, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Vol. 2, Springer, Berlin, 1996, pp. 295–360.
[16] J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA, 2001.
[17] S.M. Kim, Computational modeling for genetic splicing systems, SIAM J. Comput. 26 (1997) 1284–1309.
[18] S.M. Kim, R. McNaughton, Computing the order of a locally testable automaton, SIAM J. Comput. 23 (1994) 1193–1215.
[19] G. Paun, On the splicing operation, Discrete Appl. Math. 70 (1996) 57–79.
[20] G. Paun, G. Rozenberg, A. Salomaa, DNA Computing, New Computing Paradigms, Springer, Berlin, 1998.
[21] D. Perrin, Finite automata, in: J. Van Leeuwen (Ed.), Handbook of Theoretical Computer Science, Vol. B, Elsevier, Amsterdam, 1990, pp. 1–57.
[22] J.-E. Pin, Variétés de langages formels, Masson, Paris, 1984; English translation: Varieties of Formal Languages, Plenum, New York, 1986.
[23] D. Pixton, Regularity of splicing languages, Discrete Appl. Math. 69 (1996) 101–124.
[24] M.P. Schützenberger, Sur certaines opérations de fermeture dans les langages rationnels, Symposia Math. 15 (1975) 245–253.
[25] S. Verlan, R. Zizza, 1-splicing vs. 2-splicing: separating results, in: Proc. WORDS 2003, Turku, Finland, 2003, pp. 320–331.
[26] S. Verlan, Head systems and applications to bio-informatics, Ph.D. Thesis, University of Metz, 2004.
Theoretical Computer Science 340 (2005) 364 – 380 www.elsevier.com/locate/tcs
Collage of two-dimensional words Christian Choffrut∗ , Berke Durak Université Paris VII, L.I.A.F.A., 2 Place Jussieu, 75221 Paris, France
Abstract

We consider a new operation on one-dimensional (resp. two-dimensional) word languages, obtained by piling up, one on top of the other, words of a given recognizable language (resp. two-dimensional recognizable language) on a previously empty one-dimensional (resp. two-dimensional) array. The resulting language is the set of words “seen from above”: a position in the array is labeled by the topmost letter. We show that in the one-dimensional case the resulting language is always recognizable. This is no longer true in the two-dimensional case, as shown by a counter-example, and we investigate in which particular cases the result may still hold.
© 2005 Published by Elsevier B.V.

Keywords: Regular languages; Picture languages
1. Introduction

The present paper deals with the notion of a recognizable collection of pictures, a picture being a matrix whose entries (pixels) are taken in a finite alphabet (colors). The reader unfamiliar with the formal definition might find it suggestive to think of the set of chessboards of arbitrary dimension, or of the set of squares with, say, their north-west to south-east diagonal marked with some particular color, as typical examples. Assume we are given a collection of strips of wallpapers of different textures in such a way that it forms a recognizable collection. Assume further that, starting from an empty frame, we can paste these strips one at a time, in any arbitrary way, with possible overlapping but without rotation. At each position, the visible pixel is that belonging to the last pasted
∗ Corresponding author.
E-mail addresses: [email protected] (C. Choffrut), [email protected] (B. Durak).
URLs: http://www.liafa.jussieu.fr/∼cc (C. Choffrut), http://www.liafa.jussieu.fr/∼durak.
doi:10.1016/j.tcs.2005.03.034
strip. This is reminiscent of the so-called painter’s algorithm achieving hidden-face elimination in computer graphics, where the objects nearest to the observer are painted last. Our result says that if we start with a recognizable collection of strips reduced to one column (resp. to one row), then all possible collages again form a recognizable collection. This property is obtained by studying the particular case of one-dimensional pictures, i.e., words, and by extending it to two-dimensional pictures via row- (or column-) Kleene concatenation. Furthermore, we show that this closure property no longer holds when this hypothesis fails: using a counting argument, we show that there exists a finite language consisting of two strips whose collage is not recognizable. There exist simple general conditions guaranteeing recognizability of the collage in terms of the parameters of the collage, such as the maximum number of levels of strips. In the case where the alphabet is unary, yielding binary pictures with a color for the background and a color for the foreground, the collage is recognizable whatever the collection of strips (it may even be non-recursive). As far as we know, the operation of collage as we mean it here is new. In [7, Proposition 5.1], the author considers the operation consisting of tiling a picture with non-overlapping strips and shows a closure property for recognizable pictures. Concerning one-dimensional pictures, the notion of quasiperiodicity, which is remotely connected to our notion of collage, was introduced in [1]. In our terminology, it is a collage of a one-dimensional picture with a unique strip, as explained above, where the overlapping occurrences of the strip are required to match. A final word of caution though: the term collage was coined in [5] as a means of defining pictures via recursive geometric functions in the spirit of fractals [2]. We use it here in a different sense, which we think appropriate for its kinship with the art movement in painting of the first decades of the 20th century.
2. The unidimensional case

Given a finite alphabet Σ, we denote by Σ* the free monoid of words or strings over Σ, and by ε the empty string. The product or concatenation of two words u and v is simply denoted by uv. For a string w ∈ Σ*, we denote by |w| its length and by w[i] the i-th symbol of w, i = 1, . . . , |w|. A string z ∈ Σ* is a subword or factor of w if there exist two strings u, v ∈ Σ* such that w = uzv, and we write z = w[i . . . j] where |u| = i − 1 and |uz| = j. If t ∈ Σ* has the same length as z, the substitution of t for z in w results in the word utv, which we write w → utv. We say that t is placed at position i. The notations →^r for the r-th iterate and →* for the reflexive and transitive closure of → are used with their standard meaning.

Given a subset W ⊆ Σ* of patches, the operation of collage consists of producing words in (Σ ∪ {⊥})* (⊥ is a new symbol not in Σ) by starting with a word of the form ⊥^n and then repeatedly replacing random factors of the current word with elements of W. A word thus obtained is called a collage of W. Formally, C_0(W) = ⊥* and for all k ≥ 0

C_{k+1}(W) = {w′ | ∃w ∈ C_k(W), w → w′}.

The set of collages of W is the union Collage(W) = ∪_{k≥0} C_k(W). We say that position 0 < j ≤ n of w ∈ C_k(W) is covered by an occurrence u ∈ W whenever there exists an
integer ℓ < k and two words w′ and w″ such that

⊥^n →^ℓ w′ → w″ →^{k−ℓ−1} w

holds, where for some w_1, w_2, w_3 ∈ (Σ ∪ {⊥})* we have w′ = w_1 w_2 w_3, w″ = w_1 u w_3 and |w_1| < j ≤ |w_1| + |u|. An occurrence u of W placed on the interval 1, . . . , n is obscured by some occurrence v placed at some later time whenever the subintervals corresponding to the two occurrences intersect.

Example 1. Consider n = 11 and assume the words aba, bbbbc, ca and abaabcab belong to the subset W and are placed respectively at the positions 2, 4, 10 and 1, in that order. The resulting word is at the top of Table 1. Position 4 is covered by the occurrences aba, bbbbc and abaabcab. Position 9 is covered by no occurrence, and position 2 is covered by the occurrences aba and abaabcab. Said differently, the collage is the word obtained when reading the rectangular array "from above".

It is convenient to define the structure obtained by packing the words aba, ca, bbbbc and abaabcab. This is achieved by removing all spaces between vertically aligned letters. In the previous example, each occurrence of a letter of a word of W would "fall" in its slot as long as it does not hit another letter or the floor of the structure (the indices in the following examples are just meant for clarifying further explanations and refer to the order of the word to which each letter belongs while processing the collage). This leads to a sequence of columns of varying height (possibly height 0), as shown in Table 2. The next two definitions provide a more formal approach (it might prove useful to have the previous example in mind).

Definition 1. Let W be a collection of words, let n be an integer and let P be a finite sequence of pairs (x, u) ∈ N × W called a stack. The pile Pile^n_P defined by these data is an array of n words in Σ* which is defined by induction on the length of the sequence P as follows.

Table 1
The occurrences of Example 1 (later occurrences above earlier ones) and the resulting collage

position    1  2  3  4  5  6  7  8  9  10  11
abaabcab    a  b  a  a  b  c  a  b
ca                                     c   a
bbbbc                b  b  b  b  c
aba             a  b  a
collage     a  b  a  a  b  c  a  b  ⊥  c   a

Table 2
The pile resulting from Example 1 (each column read from bottom to top)

position    1    2     3     4       5     6     7     8     9   10   11
column      a4   a1b4  b1a4  a1b2a4  b2b4  b2c4  b2a4  c2b4  ε   c3   a3
If P has length 0 then Pile^n_P is an array of n empty words. Otherwise, let P′ be the sequence P deprived of its last pair (x, u). Let ℓ be the length of u. For all 1 ≤ i ≤ n we define

Pile^n_P[i] = Pile^n_{P′}[i] u[i − x + 1]   if x ≤ i ≤ x + ℓ − 1,
Pile^n_P[i] = Pile^n_{P′}[i]                otherwise.

The height of the pile is the maximum of the lengths of its entries. In the running example, the pile is the array with 11 elements (a4, a1b4, b1a4, a1b2a4, b2b4, b2c4, b2a4, c2b4, ε, c3, a3) and its height is 3. Now we state a precise definition of what is meant by "seen from above".

Definition 2. Given a pile p = Pile^n_P, its associated collage is the word w of length n, denoted by Collage^n_P, where for i = 1, . . . , n the following holds:

w[i] = ⊥         if p[i] = ε,
w[i] = a ∈ Σ     if p[i] = ua, u ∈ Σ*, a ∈ Σ.

In other words, the collage is obtained by selecting the topmost letter of each column and by taking ⊥ when the column is empty. If the resulting word w does not contain the symbol ⊥, i.e., if w ∈ Σ*, it is said to be completely covered by W. We extend the operation of collage to subsets by setting for all W ⊆ Σ*

Collage(W) = {w ∈ (Σ ∪ {⊥})* | w = Collage^n_P, n ∈ N, P ∈ (N × W)*}.   (1)

Observation. There are other natural definitions of the collage of a language. Indeed, we may suppress the condition that the occurrences of W are contained in the interval 1, . . . , n by allowing them to be clipped to the interval. Another possibility is not to fix the length of the resulting word a priori, i.e., to achieve the collage along the infinite integer line and consider the smallest interval which contains all pasted occurrences. As far as recognizable languages are concerned, the closure property is equally valid in these three cases. Observe, however, that the closure property no longer holds for context-free languages. Indeed, we leave it to the reader to verify that the collage of the context-free language L = {c a^n b^m d ∈ {a, b, c, d}* | n > m} is not context-free.

Theorem 1. If W ⊆ Σ* is recognizable then so is Collage(W).

Proof.
First observe that it suffices to prove that the set of completely covered words is recognizable. Indeed, if we denote this set by Covered(W), then we have Collage(W) = ⊥* (Covered(W) ⊥⁺)* Covered(W) ⊥*. The crux of the proof is that the pile defining a given collage may be assumed to be of bounded height.

Lemma 2. Let N be the number of states of an automaton recognizing W. For each w ∈ Covered(W) there exists a pile of height at most 2N whose associated collage is w.
Proof. There is no loss of generality in assuming that, given w ∈ Collage(W), the associated pile Pile^n_P, where P = (x_t, u_t)_{1 ≤ t ≤ r}, satisfies the condition that (x_t, u_t) is not obscured by (x_{t+1}, u_{t+1}) for 1 ≤ t ≤ r − 1, since otherwise we can eliminate (x_t, u_t) to start with. We can further assume that no factor of length N of a word u_t is obscured by the set of words {u_{t+1}, u_{t+2}, . . . , u_r}. Indeed, consider the factor u_t[i . . . i + N − 1] and assume that for all i ≤ j ≤ i + N − 1, the position x_t + j − 1 is covered by some u_s where t < s ≤ r.
[Figure: the factor u_t[i . . . i + N − 1] of the word u_t, with |u_t| = m, entirely obscured by later occurrences.]
Let q (resp. p) be the state of the automaton after reading u_t[1 . . . i − 1] (resp. u_t[1 . . . i + N − 1]) starting from the initial state. Let v (resp. z) be a word of length less than N taking q to some final state (resp. the initial state to p). Replacing the pair (x_t, u_t) by the pair (x_t, u_t[1 . . . i − 1]v), followed by the pair (x_t + i + N − |z|, z u_t[i + N . . . m]), where m = |u_t|, results in the same collage.
[Figure: the pair (x_t, u_t) replaced by the two shorter occurrences u_t[1 . . . i − 1]v and z u_t[i + N . . . m].]
Now assume that a pile satisfying the preliminary claim has height greater than or equal to 2N at a position 1 ≤ i ≤ n. Let 1 ≤ s_1 < · · · < s_k ≤ r be the maximal increasing sequence of indices such that u_{s_1}, u_{s_2}, . . . , u_{s_k} cover position i, and for t = 1, . . . , k let [l_t, r_t] be the interval covered by the sequence u_{s_t}, u_{s_{t+1}}, . . . , u_{s_k}, with k ≥ 2N by hypothesis. The sequence [l_1, r_1] ⊃ [l_2, r_2] ⊃ · · · ⊃ [l_k, r_k] is strictly decreasing by the preliminary remark, and each element contains the position i. Then either i − l_1 − 1 or r_1 − i − 1 is greater than N, a contradiction.

We now turn to the proof of our theorem. We call a language over a given alphabet marked local whenever it is possible to partition the alphabet Σ = I ∪ H ∪ F in such a way that a word belongs to the language if and only if its initial letter belongs to I, its final letter belongs to F, the remaining letters belong to H, and the transitions between consecutive letters belong to a subset V ⊆ Σ × Σ. This is a strengthening of the standard notion of local
languages. It is clear that the recognizable language W is the image of some marked local language U under a letter-to-letter morphism f. It is also clear that the collage of W is the image of the collage of U under the morphism f. Consequently, we may assume without loss of generality that W itself is marked local. Because of Lemma 2, it suffices to consider piles of words of length bounded by some integer h in order to generate all words in the collage of W. If we prove that the set of piles, viewed as words over the alphabet Γ = ∪_{0 ≤ i ≤ h} Σ^i, is recognizable, then the collage itself is recognizable. This is achieved by showing that the language Pile(W) is marked local. Indeed, the possible initial (resp. final) letters (over the alphabet Γ) are of the form a_1 · · · a_k with 0 < k ≤ h and a_i ∈ I (resp. a_i ∈ F) for i = 1, . . . , k. The allowed transitions are the pairs (A, B), where A = a_1 · · · a_k and B = b_1 · · · b_ℓ satisfy the following condition. Let a_{i_1}, a_{i_2}, . . . , a_{i_r} with 1 ≤ i_1 < i_2 < · · · < i_r be the sequence of the letters of the alphabet I ∪ H occurring in A, and let b_{j_1}, b_{j_2}, . . . , b_{j_s} with 1 ≤ j_1 < j_2 < · · · < j_s be the sequence of the letters of the alphabet F ∪ H occurring in B. Then r = s and the pairs (a_{i_t}, b_{j_t}) belong to V for t = 1, . . . , r.

We illustrate this last construction with the example of Table 2, without distinguishing explicitly between the three subalphabets I, F and H. Consider the transition between columns 3 and 4. Then A = b1a4 and B = a1b2a4. For A the subsequence of letters in I ∪ H is b1, a4; for B the subsequence of letters in F ∪ H is a1, a4. Similarly, consider the transition between columns 4 and 5. Then A = a1b2a4 and B = b2b4. For A the subsequence of letters in I ∪ H is b2, a4, and for B the subsequence of letters in F ∪ H is b2, a4.
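As a concrete check of Definitions 1 and 2, the pile and collage of Example 1 can be recomputed mechanically. The following sketch (ours, not from the paper; the blank symbol ⊥ is written as "_") builds each column bottom-up and reads off the topmost letters:

```python
def pile(n, stack):
    # stack: pairs (x, u) with a 1-indexed position x; each column
    # accumulates letters bottom-to-top in the order of the stack
    cols = [""] * n
    for x, u in stack:
        for i, a in enumerate(u):
            cols[x - 1 + i] += a
    return cols

def collage(n, stack, blank="_"):
    # the topmost letter of each column, the blank for an empty column
    return "".join(c[-1] if c else blank for c in pile(n, stack))

# Example 1: aba at 2, bbbbc at 4, ca at 10, abaabcab at 1, in that order
stack = [(2, "aba"), (4, "bbbbc"), (10, "ca"), (1, "abaabcab")]
print(collage(11, stack))                    # abaabcab_ca
print(max(len(c) for c in pile(11, stack)))  # 3, the height of the pile
```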
3. Preliminaries on picture languages

Here we borrow the terminology from the chapter of the Handbook of Formal Languages written by Giammarresi and Restivo [4]. The reader is also referred to [6]. We restrict ourselves to the results which are necessary for a self-contained exposition of our work.

The definitions for the free monoid extend to two-dimensional strings in a rather natural way. A two-dimensional string (or picture) is a two-dimensional rectangular array of elements of Σ. The size of a picture p is the pair (r(p), c(p)) of its numbers of rows and columns, also denoted by (r, c) when the picture p is understood. The element at position (i, j) with 1 ≤ i ≤ r, 1 ≤ j ≤ c, also called a pixel, is denoted by p[i, j]. As for usual arrays, the indices grow from top to bottom for the rows and from left to right for the columns. The set of all pictures over Σ is denoted by Σ^{*×*}. The subset of all pictures with n columns (resp. with p rows, with p rows and n columns) is denoted by Σ^{*×n} (resp. Σ^{p×*}, Σ^{p×n}). A two-dimensional language over Σ is a subset of Σ^{*×*}.

3.1. Different characterizations

The first attempt at defining a procedure for recognizing pictures is credited to Blum and Hewitt in 1967 [3]. Their model is an extension of the ordinary two-way one-tape automata obtained by allowing the read head to move in all four cardinal directions. It was however superseded by the more powerful and robust class of recognizable languages.
There are different and equivalent definitions of recognizable picture languages, see [4, Theorem 8.7]. In particular the notions of tiling systems and of (some type of) regular expressions lead to the same family. The notion of tiling system is the most suitable for our purpose, and we recall it now.

3.1.1. Tiling systems

Before running a procedure on the pictures, we border them with occurrences of a symbol # ∉ Σ.

[Example: a picture over the alphabet {1, 2, 3} surrounded by a frame of # symbols; the layout cannot be recovered from this copy.]
We first define a local language as a language consisting of all pictures whose 2 × 2-subpictures belong to a fixed subset of (Σ ∪ {#})^{2×2}. For example, a suitable set of ten 2 × 2-subpictures over {0, 1, #} defines the set of all rectangular chessboards with an odd number of rows and columns.

[The ten 2 × 2 tiles cannot be recovered from this copy.]

Formally, we have

Definition 3. A local system is a pair (Σ, Δ) where Σ is a finite alphabet and Δ a subset of (Σ ∪ {#})^{2×2}. The language defined by the system is the set of all pictures whose 2 × 2-subpictures (after bordering) belong to Δ.

The definition of the more general family of recognizable picture languages requires the notion of projection, which is a mapping h from an alphabet Γ into some other alphabet Σ; it extends to pictures by substituting the color h(a) ∈ Σ for the color a ∈ Γ at each pixel, resulting in a picture of the same size. Formally, we have

Definition 4. A tiling system is a quadruple (Γ, Σ, Δ, h) where Γ and Σ are finite alphabets, Δ is a subset of (Γ ∪ {#})^{2×2} and h : Γ → Σ is a projection. The language recognized by the tiling system is the projection by h of the local language recognized by the local system (Γ, Δ). A language is tiling recognizable if it is recognized by some tiling system.
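Membership in a local language is easy to test in code. The sketch below (ours; the tile set is not the paper's ten chessboard tiles but is derived, for illustration, from a small 3 × 3 chessboard) borders a picture with '#', collects its 2 × 2 tiles, and checks them against an allowed set:

```python
BORDER = "#"

def bordered(pic):
    # surround a picture (a list of equal-length strings) with a frame of '#'
    top = BORDER * (len(pic[0]) + 2)
    return [top] + [BORDER + row + BORDER for row in pic] + [top]

def tiles(pic):
    # the set of 2x2 subpictures of the bordered picture,
    # each tile written as (top row, bottom row)
    b = bordered(pic)
    return {(b[i][j:j + 2], b[i + 1][j:j + 2])
            for i in range(len(b) - 1) for j in range(len(b[0]) - 1)}

def in_local_language(pic, allowed):
    return tiles(pic) <= allowed

# hypothetical tile set, read off a 3x3 chessboard rather than listed by hand
theta = tiles(["010", "101", "010"])
print(in_local_language(["01010", "10101", "01010", "10101", "01010"], theta))  # True
print(in_local_language(["00", "00"], theta))  # False
```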
With the previous example, identifying 0 and 1 defines the collection of all pictures with uniform contents and an odd number of rows and of columns. Using this characterization, it can be seen that the collection of all squares is tiling recognizable but not local.

3.1.2. Regular expressions

The allowed operations are union, intersection (not complementation), row- and column-concatenation, which are partial operations, and row- and column-Kleene closure. By the row-concatenation of two picture languages P, Q ⊆ Σ^{*×*} is meant the language, denoted P ⊖ Q, of all pictures obtained by taking two arbitrary pictures p ∈ P and q ∈ Q with the same number of columns and putting p on top of q. The Kleene row-concatenation closure of a language P ⊆ Σ^{*×*} is the set of all pictures obtained by taking a finite sequence of pictures p_1, . . . , p_n ∈ P with the same number of columns and putting p_1 on top of p_2, . . . , p_{n−1} on top of p_n. The notions of column-concatenation and column-Kleene closure are defined dually by concatenating from left to right. The column-concatenation of two picture languages P, Q ⊆ Σ^{*×*} is denoted by P ⦶ Q.

The fundamental result of this theory is that the collection of languages recognized by some tiling system is identical to the smallest family of picture languages comprising all finite languages and closed under union, intersection, row- and column-concatenation, Kleene row- and column-concatenation closure, and projection. Henceforth, this collection is called the family of recognizable picture languages.

Example 2.
[Example 2 displays two pictures p and q over {0, 1, 2} together with their row- and column-concatenations; the layout cannot be recovered from this copy.]
3.2. A necessary condition

Recognizable languages of strings are characterized by the finiteness of the number of distinct right (or left) contexts. For picture languages there exists a weaker version of this criterion. Indeed, it can be shown that for such a language to be recognizable, the number of non-equivalent pictures of a given size may not grow too quickly relative to the size of the picture. More precisely, given two pictures p and q, both with r rows and c columns,
and a picture language X, we say that p and q are equivalent relative to X when for all pictures h_top, h_bottom, v_left, v_right of suitable sizes, we have

v_left ⦶ (h_top ⊖ p ⊖ h_bottom) ⦶ v_right ∈ X  ⇔  v_left ⦶ (h_top ⊖ q ⊖ h_bottom) ⦶ v_right ∈ X.
Given a picture language X and two integers r, c, we denote by f(r, c) the number of non-equivalent pictures of size (r, c) relative to X. Then we have a weak form of syntactic characterization.

Proposition 3. If the language X is recognizable then there exists an integer k such that for all pairs (r, c), the number of non-equivalent pictures of size (r, c) relative to X is less than k^{r+c}.

3.3. A new closure property

Because of the fundamental theorem on recognizable picture languages, this family is closed under projection. It can be proven that it is not closed under complementation [4, Theorem 7.5]. The following property transforms the contents of the pictures, not their size. It is inspired by the bit blitting operations used in computer graphics.

Let Σ_1, Σ_2 and Σ_3 be three alphabets and let f : Σ_1 × Σ_2 → Σ_3 be a function. Given two pictures p_1 ∈ Σ_1^{r×c} and p_2 ∈ Σ_2^{r×c}, define F(p_1, p_2) as the picture p_3 ∈ Σ_3^{r×c} where p_3[i, j] = f(p_1[i, j], p_2[i, j]). This operation extends naturally to pairs of picture languages. For example, on pictures over the binary alphabet {0, 1}, if we take f to be logical disjunction and if X, Y ⊆ {0, 1}^{*×*}, then F(X, Y) is the set of pictures obtained by combining one picture of X and one picture of Y of the same size with a logical OR. The following proposition shows that the resulting language is recognizable if X and Y are.

Proposition 4. If X ⊆ Σ_1^{*×*} and Y ⊆ Σ_2^{*×*} are recognizable languages then F(X, Y) is recognizable.

Proof. Let (Γ_1, Σ_1, Δ_1, h_1) and (Γ_2, Σ_2, Δ_2, h_2) be tiling systems recognizing X and Y. Define Γ_3 = Γ_1 × Γ_2, Δ_3 = {t | π_1(t) ∈ Δ_1 ∧ π_2(t) ∈ Δ_2} and h_3(x, y) = f(h_1(x), h_2(y)), where π_1 : Γ_1 × Γ_2 → Γ_1 and π_2 : Γ_1 × Γ_2 → Γ_2 are the projections. The system (Γ_3, Σ_3, Δ_3, h_3) recognizes F(X, Y).
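On individual pictures, the operation F is a one-liner. The sketch below (ours, with pictures represented as lists of lists) takes f to be logical OR on binary pictures:

```python
def blit(f, p1, p2):
    # F(p1, p2)[i][j] = f(p1[i][j], p2[i][j]); both pictures have the same size
    return [[f(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]

x = [[0, 1], [0, 0]]
y = [[0, 0], [1, 0]]
print(blit(lambda a, b: a | b, x, y))  # [[0, 1], [1, 0]]
```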
4. Collage of pictures

Here we extend to the two-dimensional case the notions introduced in Section 2. The operation consists of "piling up" pictures belonging to a given collection one on top of the other, above a horizontal surface filled with the blank symbol ⊥. The result is the picture seen from above, the top symbol at each position obscuring all symbols under it. We directly define the collage of a picture instead of proceeding as in the previous one-dimensional case with the intermediate notion of pile.

Definition 5. Let P ⊆ Σ^{*×*} be a collection of pictures, called patches, let (r, c) be a pair of integers and let S be a finite sequence of triples (x, y, p) ∈ N² × P, called a stack. The collage Collage^{(r,c)}_S is the r × c array of symbols in Σ ∪ {⊥} defined by induction on the number of elements in S as follows. If S = ∅ then Collage^{(r,c)}_S is the r × c array whose entries are all equal to the letter ⊥. Otherwise, let S′ be the sequence S deprived of its last triple (x, y, p), and set

Collage^{(r,c)}_S[i, j] = p[i − x + 1, j − y + 1]      if i ∈ [x, x + r(p) − 1] and j ∈ [y, y + c(p) − 1],
Collage^{(r,c)}_S[i, j] = Collage^{(r,c)}_{S′}[i, j]   otherwise.

Example 3. The sequence of triples S = ((1, 1, p_1), (4, 2, p_2), (4, 2, p_3), (2, 2, p_4), (4, 1, p_5)) with the following patches
[The five patches p_1, . . . , p_5, written over the letters 1, . . . , 5, and the resulting collage cannot be recovered from this copy.]
We are interested in studying the languages of pictures obtained by applying the collage operation to a recognizable picture language. This requires extending the operation to subsets of pictures.
Definition 6. Given a set of patches P ⊆ Σ^{*×*}, its collage closure is the set

Collage(P) = {p ∈ (Σ ∪ {⊥})^{*×*} | p = Collage^{(r,c)}_S for some r, c and S}.   (2)
When the resulting picture p does not contain the symbol ⊥, it is said to be completely covered by P.

4.1. Closure and non-closure properties

The two-dimensional case does not enjoy closure properties as nice as those of the one-dimensional case, as far as the collage operation is concerned. Indeed, the collage of a set of recognizable patches is not recognizable in general. However, there are a number of hypotheses under which this property still holds. As an immediate consequence of Section 2, this is the case when the patches all have a unique column (or all have a unique row). Another type of restriction is when the stacks associated with a collage have bounded height. Unary alphabets are special in the sense that the collage closure is recognizable regardless of the set of patches we start with (it may even be non-recursive). We finally give an example of a set of two patches whose collage is non-recognizable.

Proposition 5. Let P ⊆ Σ^{*×1} (resp. P ⊆ Σ^{1×*}) be a recognizable language of patches. Then the picture language Collage(P) ⊆ (Σ ∪ {⊥})^{*×*} is again recognizable.

Proof. Indeed, a picture in Collage(P) is a row- (resp. column-) concatenation of unidimensional collages. We may conclude by Theorem 1 and by the definition of recognizable picture languages.

We may also bound the height of the stack, which is defined as follows. Let S = (x_1, y_1, p_1), . . . , (x_k, y_k, p_k) be a stack of k elements. Intuitively, we may consider the elements as falling on the ground and being prevented from hitting it only by previously fallen elements occupying an overlapping position. The ℓ-th element of the stack is placed at a particular integer altitude z_ℓ ≥ 0 such that two elements at the same altitude do not overlap, while minimizing the maximum altitude, which by definition is the height of the stack. Observe that the number of patches covering a particular position may be bounded even if the height is not; think of a staircase, for example.
Formally, given a collage Collage^{(r,c)}_S, we define the height h(i, j) of each pixel 1 ≤ i ≤ r, 1 ≤ j ≤ c by induction on the cardinality k of the stack S. If k = 0 then h(i, j) = 0. Otherwise, let S′ be the stack S deprived of its last triple (x, y, p) and let h′(i, j) be the height of the pixel (i, j) in this collage. Then, setting I = [x, x + r(p) − 1] × [y, y + c(p) − 1], we have

h(i, j) = 1 + max{h′(k, ℓ) | (k, ℓ) ∈ I}   if (i, j) ∈ I,
h(i, j) = h′(i, j)                          otherwise.

The height of the stack is the maximum value of h(i, j) when (i, j) runs over the picture.
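The inductive definition of h(i, j) can be transcribed as follows (a sketch under the same conventions as before: 1-indexed corners, patches as lists of strings fitting inside the frame; the whole rectangle covered by a patch is raised to a common level):

```python
def stack_height(r, c, stack):
    # h[i][j] follows the induction: a new patch lands one level above the
    # highest pixel it covers, and all covered pixels are raised to that level
    h = [[0] * c for _ in range(r)]
    for x, y, p in stack:
        cells = [(i, j) for i in range(x - 1, x - 1 + len(p))
                        for j in range(y - 1, y - 1 + len(p[0]))]
        level = 1 + max(h[i][j] for i, j in cells)
        for i, j in cells:
            h[i][j] = level
    return max(max(row) for row in h)

# a staircase of overlapping 2x2 patches: every position is covered
# at most twice, yet the height grows with the number of steps
P = ["aa", "aa"]
print(stack_height(4, 4, [(1, 1, P), (2, 2, P), (3, 3, P)]))  # 3
```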
Proposition 6. Let P ⊆ Σ^{*×*} be a recognizable language of patches and let k be an integer. The set Collage_{h≤k}(P) of collages of P which can be obtained by stacks of height k or less is recognizable.

Proof. In the case k = 1, the proposition is a consequence of [7, Proposition 5.1], asserting that tilings by a recognizable picture language are recognizable. Indeed, a tiling is a collage of patches such that no two patches overlap and the whole picture is covered by some patch. Then a tiling by the recognizable picture language P ∪ {⊥}^{*×*} is precisely a collage of height 1. Let P′ be this language. Let f : (Σ ∪ {⊥})² → Σ ∪ {⊥} be defined by f(x, y) = x if x ≠ ⊥ and f(⊥, y) = y. This function allows us to combine layers of tilings of P′ by treating ⊥ as a transparent color. We then have, with the notations of Proposition 4,

Collage_h(P) = F(P′, F(P′, . . . , F(P′, P′) · · ·)),   with F applied h − 1 times.

Using the closure under union allows us to complete the proof.
In the one-letter case, the resulting pictures are binary, with e.g. 1 standing for the letter a and 0 for the symbol ⊥.

Theorem 7. If P ⊆ {a}^{*×*} is an arbitrary picture language then the picture language Collage(P) is recognizable.

Proof. Let us start with some elementary observations. If p and q belong to P with r(p) ≤ r(q) and c(p) ≤ c(q), then each occurrence of the rectangle q is a union of occurrences of the rectangle p. Thus, Collage(P) equals Collage(Q) where Q is the set of minimal patches in P, minimal being meant componentwise. By Dickson's Lemma, asserting in particular that every subset of N² has finitely many minimal elements, the subset Q is finite. In the unary case, the collage corresponds to taking the logical OR on pixels; thus by Proposition 4, where the function f achieves the logical disjunction of the pixels, we see that it suffices to consider the case where Q is reduced to a unique element.

Let P = {a^{r×c}} be a singleton. Consider an element p ∈ Collage(P). Every pixel (i, j) of p can belong to a number of occurrences. Since there is only one non-blank letter, the order in which these patches are laid is irrelevant. The number of times a given patch is laid at a given position is also irrelevant. The positions of the patches, however, are relevant. We may therefore consider the set B_{i,j} of pairs (k, ℓ) such that the rectangle a^{r×c} can be placed in p with its top left corner at position (i − k + 1, j − ℓ + 1), that is, if the subpicture p[i − k + 1 . . . i − k + r] × [j − ℓ + 1 . . . j − ℓ + c] is made of all a's. Let Γ be the power set of {1, . . . , r} × {1, . . . , c}. A tiling system (Γ, Σ, Δ, h) recognizing Collage(P) is specified as follows. The primary alphabet is Σ = {a, ⊥} and the auxiliary alphabet is Γ. The projection h maps ∅ to ⊥ and every other element to a. Consider the following four
subsets of (Γ ∪ {#})^{2×2}, a tile being written (x, y; z, t) with top row x y and bottom row z t:

Δ_1 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ x ∧ ℓ < c ⇒ (k, ℓ + 1) ∈ y},
Δ_2 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ y ∧ ℓ > 1 ⇒ (k, ℓ − 1) ∈ x},
Δ_3 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ x ∧ k < r ⇒ (k + 1, ℓ) ∈ z},
Δ_4 = {(x, y; z, t) | ∀k ∀ℓ: (k, ℓ) ∈ z ∧ k > 1 ⇒ (k − 1, ℓ) ∈ x}.

The set Δ_1 (resp. Δ_2, Δ_3, Δ_4) enforces coherent propagation of the hypotheses towards the right (resp. leftwards, downwards and upwards). We set Δ = Δ_1 ∩ Δ_2 ∩ Δ_3 ∩ Δ_4. We make the somewhat untidy convention that a condition of the form (k, ℓ) ∈ x is false whenever x = #. It is clear that all pictures in Collage(P) are recognized by the tiling system. Conversely, assume a picture is recognized by the system. Then it suffices to observe that if (k, ℓ) is an element of the subset of Γ which labels the pixel at position (i, j), then all pixels at positions (i + α, j + β) satisfying i − k + 1 ≤ i + α ≤ i − k + r and j − ℓ + 1 ≤ j + β ≤ j − ℓ + c are labeled by a subset containing the element (k + α, ℓ + β), proving thus that the picture is a union of occurrences of the rectangle.

4.2. The general case

We show in this paragraph that even if the language P of patches is finite, Collage(P) might no longer be recognizable. Actually, we prove it with a set P consisting of two patches of dimensions 1 × 3 and 3 × 1, respectively. Indeed, consider the language of pictures over the alphabet {a, b, e} (b suggesting the beginning and e the end) consisting of the horizontal patch

b a e   (3)

and of the vertical patch

b
a
e   (4)
Theorem 8. The language Collage(P), where P consists of the two horizontal and vertical patches (3) and (4), is not recognizable.

Proof. Given a permutation σ on the set of integers {1, . . . , p}, we construct a picture with 3p − 1 rows and 3p + 1 columns. This picture is based on a structure composed of p different paths in the discrete plane, each path being itself composed of a horizontal line followed by
Fig. 1. The permutation σ(1) = 1, σ(2) = 4, σ(3) = 2, σ(4) = 3.
Fig. 2. A path associated with the pair (i, σ(i)).
a vertical one, see Fig. 1. The coordinates of the end points of these two lines are respectively

(3i − 2, 1) and (3i − 2, 3σ(i) + 1),
(3i − 2, 3σ(i)) and (3p − 1, 3σ(i)).

Now we view each path of this picture as obtained by pasting, one on top of the previous one, from left to right and from top to bottom, occurrences of the horizontal and then of the vertical patch. The horizontal line starts at position (3i − 2, 1), has length 3σ(i) + 1 and is covered by occurrences of the horizontal patch with periodic shift, resulting in the sequence of labels b, b, a, b, b, a, . . . , b, b, a followed by the final sequence b, b, b, e, see Fig. 2. The vertical path starts at position (3i − 2, 3σ(i)), has length 3(p − i) + 2 and is covered by occurrences of the vertical patch starting with the sequence of labels b, b and followed by a periodic sequence b, a, b, b, a, . . . , b, b, a, b (Fig. 3).
Fig. 3. The 4 paths associated with the permutation of Fig. 1.
Fig. 4. A general view with a context creating a loop by closing the path connecting i and σ(i).
Consequently, the picture consists of p different strips built on the previous p paths, which are covered by piling up occurrences of the two patches in such a way that the collage of a patch is done on top of the previous patch. Furthermore, the order in which the collages of two strips are performed is irrelevant, as the strips intersect on an element labeled by a letter belonging to both patches.

Consider two different permutations σ and τ and assume σ(i) ≠ τ(i) for some 1 ≤ i ≤ p. Then it is not difficult to design a context which, as Fig. 4 suggests intuitively, connects σ(i) back to i and adds the minimum information so that all paths associated with the integers j ≠ i represent a legal collage of patches. The latter is done by simply appending an a below the e at positions (3p − 1, 3σ(j)) for all j ≠ i.
Fig. 5. A closer view at the loop.
Fig. 6. The sub-picture associated with the current permutation surrounded by a context creating a loop.
Since all permutations on {1, . . . , p} define pictures having contexts discriminating them from all other permutations, there exist p! pairwise non-equivalent pictures whose numbers of rows and columns are in O(p), contradicting Proposition 3 and completing the proof (Figs. 5 and 6).
380
C. Choffrut, B. Durak / Theoretical Computer Science 340 (2005) 364 – 380
References [1] A. Apostolico, A. Ehrenfeucht, Efficient detection of quasiperiodicities in strings, Theoret. Comput. Sci. 119 (1) (1993) 247–265. [2] M. Barnsley, Fractals Everywhere, Academic Press, Boston, 1988. [3] M. Blum, C. Hewitt, Automata on two-dimensional tape, in: IEEE Symposium on Switching and Automata Theory, 1967, pp. 155–160. [4] D. Giammarresi, A. Restivo, Two-dimensional languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages: Beyond Words, Vol. 3, Springer, Berlin, 1997, pp. 215–267. [5] A. Habel, H.-J. Kreowski, Collage grammars, in: H. Ehrig, H.-J. Kreowski, G. Rozenberg (Eds.), Proc. Fourth Internat. Workshop on Graph Grammars and their Applications to Computer Science, Lecture Notes in Computer Science, Vol. 532, 1991, pp. 411–429. [6] K. Inoue, I. Takanami, A survey of two-dimensional automata theory, Proc. Fifth Internat. Meeting of Young Computer Scientists, Lecture Notes in Computer Science, Vol. 381, 1990, pp. 72–91. [7] D. Simplot, A characterization of recognizable picture languages by tilings by finite sets, Theoret. Comput. Sci. 218 (2) (1999) 297–323.
Theoretical Computer Science 340 (2005) 381 – 393 www.elsevier.com/locate/tcs
Codes and sofic constraints Marie-Pierre Béal∗ , Dominique Perrin Institut Gaspard-Monge, Université de Marne-la-Vallée, 77454 Marne-la-Vallée Cedex 2, France
Abstract We study the notion of a code in a sofic subshift. We first give a generalization of the Kraft–McMillan inequality to this case. We then prove that the polynomial of the alphabet in an irreducible sofic shift divides the polynomial of any finite code which is complete for this sofic shift. This settles a conjecture from Reutenauer. © 2005 Elsevier B.V. All rights reserved. Keywords: Kraft-McMillan inequality; Sofic shifts; Symbolic dynamics; Variable-length codes
1. Introduction

There is a rich and fruitful interplay between two theories which arose at first independently. One is the theory of automata and formal languages, born in the context of theoretical computer science. The other is symbolic dynamics, which arose from the theory of dynamical systems in topology and probability theory. The theory of variable-length codes is one of the contact points between these domains, with a counterpart in symbolic dynamics in renewal systems and finite-to-one maps. Antonio Restivo initiated in [6] a new direction by studying systematically the notion of a code in a subshift of finite type. In particular, he studied the relationship between maximal and complete codes, with results that essentially extend those known in the case of the free monoid, or equivalently of the full shift in symbolic dynamics. In this paper, we continue this exploration. First of all, we adopt a definition which is not exactly the same. To be more specific, we consider a sofic shift S and a subset X of the
∗ Corresponding author.
E-mail addresses: [email protected] (M.-P. Béal), [email protected] (D. Perrin). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.033
M.-P. Béal, D. Perrin / Theoretical Computer Science 340 (2005) 381 – 393
set F_S of factors of S. We consider such a set X which is a code. Observe that we do not require, as in [6,7], that X* ⊂ F_S. This modifies the subsequent notions of an S-complete or S-maximal code. Actually, our notion also has connections with the definition of a code in a graph introduced by Christophe Reutenauer in [8], as we shall see below. The case of bifix codes was studied by Clelia De Felice in [3]. Codes in subshifts also have a connection with the study of permutation groups in syntactic semigroups (see [5]).

We first show that one may generalize to this situation the classical Kraft–McMillan inequality. In fact, we associate to each set X of words a series p_X(z) which, in the case where S is the full shift, reduces to the generating series of the words of X by length, i.e. p_X(z) = Σ_{n≥1} u_n z^n, where u_n is the number of words of length n in X. Let h(S) be the entropy of S and let λ_S be such that h(S) = −log(λ_S). The precise definition of the entropy is given in the next section. It uses a logarithm, and the same base has to be used in both definitions of λ_S and h(S). In particular, λ_S = 1/k when S is the full shift on k symbols. We prove that if X is a code, then p_X(λ_S) ≤ 1. Actually, we obtain this result as a corollary of a more general one, corresponding to an assignment of real values to the letters generalizing the notion of a Bernoulli distribution (Theorem 1). We say that X is S-complete if X ⊂ F_S ⊂ F(X*), where F(X*) denotes the set of factors of the words in X*. We prove that, when X is regular, p_X(λ_S) = 1 if and only if X is S-complete. This is again obtained as a corollary of a more general result (Theorem 2).

We prove in the second part of the paper a generalization of a result of [8] concerning a multivariate polynomial p(X) associated with a code X, called the determinant of the code. It says that, for a finite S-complete code, where S is an irreducible sofic shift, this polynomial is divisible by the polynomial p(A).
The proof uses the results of the previous section, in contrast with the algebraic arguments of [8], which use the notion of a syntactic category, a generalization of the syntactic semigroup.
2. Codes and sofic shifts

Let A be a finite alphabet. We denote by A* the set of finite words and by A^Z the set of bi-infinite words on A. A subshift is a closed subset S of A^Z which is invariant under the shift transformation σ (i.e. σ(S) = S) defined by σ((a_i)_{i∈Z}) = (a_{i+1})_{i∈Z}. A finite automaton is a finite multigraph labeled by a finite alphabet A. It is denoted A = (Q, E), where Q is a finite set of states and E a finite set of edges labeled by A. A sofic shift is the set of labels of all bi-infinite paths on a finite automaton. A sofic shift is irreducible if there is such a finite automaton with a strongly connected graph. If A is deterministic, then for any state p ∈ Q and any word u there is at most one path labeled u going out of p. We denote by p·u the target of this path when it exists. Irreducible sofic shifts have a unique (up to isomorphism of automata) minimal deterministic automaton, that is, a deterministic automaton having the fewest states among all deterministic automata representing the shift. This automaton is called the right Fischer cover of the shift. A subshift of finite type is the set of bi-infinite words on a finite alphabet avoiding a finite set of finite words. It is a sofic shift. An edge shift is the set of the labels of all bi-infinite paths on a finite automaton whose edges have distinct labels. The full shift on the finite alphabet A is the set of all bi-infinite sequences on A, i.e. A^Z.
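As a hypothetical illustration (the transition table and helper names are ours, not the paper's), a deterministic automaton can be encoded as a partial transition map, with the target p·u computed by iterating it; the two-state table below encodes the right Fischer cover of the golden mean system discussed in the examples that follow.

```python
# Right Fischer cover of the golden mean shift: states {1, 2}; edges
# a: 1 -> 1, b: 1 -> 2, a: 2 -> 1 (our encoding of Fig. 1).
GOLDEN_MEAN = {(1, 'a'): 1, (1, 'b'): 2, (2, 'a'): 1}

def target(p, u, delta=GOLDEN_MEAN):
    """Return p.u, the state reached from p by reading u, or None if no path."""
    for letter in u:
        if (p, letter) not in delta:
            return None
        p = delta[(p, letter)]
    return p
```

For instance, target(1, 'ab') gives 2, while target(1, 'bb') is undefined, reflecting that bb is a forbidden word of this shift.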
Fig. 1. The golden mean system (states 1 and 2; a loop labeled a on state 1, an edge labeled b from 1 to 2 and an edge labeled a from 2 to 1).
Let S be a subshift on the alphabet A. We denote by F_S the set of finite factors, or blocks, of words in S. We denote by h(S) the entropy of S. It is equal to the entropy h(L) of the language L = F_S, where

h(L) = lim sup_{n→∞} (1/n) log Card(L ∩ A^n).
We denote by λ_S the unique positive real number such that h(S) = log(1/λ_S). The positive real number λ_S is the radius of convergence of the generating series of L by length.

Example 1. If S is the full shift on A, then λ_S = 1/Card(A).

Example 2. Let S be the irreducible subshift of finite type on A = {a, b} defined by the finite set of forbidden words I = {bb}. It is the so-called golden mean system, and λ_S is the inverse of the golden mean, solution of λ_S^2 = 1 − λ_S. The right Fischer cover of S is represented in Fig. 1.

A set of finite words X on an alphabet A is a code if and only if whenever x_1 x_2 … x_n = y_1 y_2 … y_m, where x_i, y_j ∈ X and n, m are positive integers, one has n = m and x_i = y_i for 1 ≤ i ≤ n. If X is a set of finite words on a finite alphabet, X* denotes the set of all finite concatenations of words of X.

Let S be a sofic shift. A set X on the alphabet A is said to be complete in S, or S-complete, if X ⊂ F_S and any word in F_S is a factor of a word in X*. Observe that we do not require that X* ⊂ F_S. A code X is S-maximal if X ⊂ F_S and it is maximal for this property.

Let S be an irreducible sofic shift over the alphabet A. Let (Q, E) be its right Fischer cover. Let φ be the morphism from A* into the monoid of Q × Q matrices over the monoid A* ∪ {0} defined as follows. For each word u, the matrix φ(u) is defined by

φ(u)_{pq} = u if p·u = q, and 0 otherwise.

The elements of the matrix φ(u) can be considered as subsets of A*, interpreting 0 as the empty set. The morphism φ can be extended by linearity to the semiring P(A*) of subsets of A*. Thus, it becomes a morphism from P(A*) to the semiring of Q × Q matrices on P(A*). For any subset X of A*, we have
φ(X^n) = φ(X)^n
and

φ(X*) = Σ_{n≥0} φ(X^n).
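As a hedged numerical aside (our own sketch, not from the paper): for an irreducible sofic shift presented by its right Fischer cover, h(S) is the logarithm of the spectral radius of the cover's adjacency matrix, so λ_S is its reciprocal. A dependency-free power iteration recovers λ_S for the golden mean system.

```python
# Power iteration for the spectral radius of a nonnegative matrix;
# M is the adjacency matrix of the golden mean cover (state 1 has two
# outgoing edges, state 2 has one).
def spectral_radius(M, iters=300):
    n = len(M)
    v = [1.0] * n
    r = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        r = max(abs(x) for x in w)
        v = [x / r for x in w]
    return r

M = [[1, 1], [1, 0]]                 # golden mean cover adjacency matrix
lam_S = 1.0 / spectral_radius(M)     # inverse of the golden ratio, ~0.618
```

The computed value satisfies λ_S^2 = 1 − λ_S, matching Example 2.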
We denote by π an assignment of positive real values to the elements of A. We extend it to a semigroup morphism from A* ∪ {0} to R. We denote by u ↦ f_u(z) the morphism from A* into the monoid of Q × Q matrices with coefficients in the polynomial ring R[z], defined for each word u by

f_u(z)_{pq} = π(u) z^{|u|} if p·u = q, and 0 otherwise.

If U is a set of words, we denote by f_U(z) the matrix whose coefficients are real power series, defined by

f_U(z) = Σ_{u∈U} f_u(z).
Actually, f_U(z) can also be considered as a series whose coefficients are real matrices. In this sense, we will talk about the radius of convergence of f_U(z) and denote it by ρ(f_U(z)). It is the minimum of the radii of convergence of its elements when viewed as a matrix. Note that, for a set of words U, the elements of the matrix f_U(z) are obtained from the elements of φ(U) as the generating series of the values by π. More precisely, f_U(z)_{pq} = Σ π(u) z^{|u|}, where the sum runs over all u ∈ φ(U)_{pq}. If X is a code, then f_{X^n}(z) = (f_X(z))^n and f_{X*}(z) = (I − f_X(z))^{−1}.

We consider the polynomial d(z) = det(I − f_A(z)). We say that an assignment π is admissible if the following condition is satisfied: the value z = 1 is a root of d(z) and |μ| ≥ 1 for any other root μ. There is always at least one admissible assignment for each nonempty alphabet. For instance, one can show that the assignment defined by π(a) = λ_S for any a ∈ A is admissible (see for example [4]). An admissible assignment can be seen as a generalization of a Bernoulli distribution on the alphabet. Actually, when S is the full shift on A, both definitions coincide. Indeed, in this case, f_A(1) = Σ_{a∈A} π(a), and thus π is admissible if and only if Σ_{a∈A} π(a) = 1. Note that, when π is admissible, the radius of convergence of f_{A*}(z) is 1. Indeed, for every complex number z such that |z| < 1, det(I − f_A(z)) ≠ 0 and thus f_{A*}(z) converges. The following example describes the admissible assignments in the case of the golden mean system.
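A hedged numerical check (ours, consistent with Example 3 below): for the golden mean system with π(a) = π(b) = λ_S, the polynomial d(z) = 1 − pz − pqz^2 vanishes at z = 1 and its other root has modulus at least 1, so this assignment is admissible.

```python
lam = (5 ** 0.5 - 1) / 2      # lambda_S for the golden mean system
p = q = lam

def d(z):
    # d(z) = det(I - f_A(z)) = 1 - p z - p q z^2 for this cover
    return 1 - p * z - p * q * z * z

# Roots of p*q*z^2 + p*z - 1 = 0, i.e. of -d(z).
disc = (p * p + 4 * p * q) ** 0.5
roots = [(-p + disc) / (2 * p * q), (-p - disc) / (2 * p * q)]
```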
Example 3. We consider again the golden mean system of Example 2. The morphism φ is defined by

φ(a) = [a, 0; a, 0],  φ(b) = [0, b; 0, 0].

Let π(a) = p, π(b) = q. We have

f_a(z) = [pz, 0; pz, 0],  f_b(z) = [0, qz; 0, 0],  f_A(z) = [pz, qz; pz, 0].
Hence d(z) = 1 − pz − pqz^2. Thus π is admissible if and only if p(1 + q) = 1. In particular, π(a) = π(b) = λ_S is admissible.

Our first result is the following statement.

Theorem 1. Let S be an irreducible sofic shift and let π be an admissible assignment. If X ⊂ A+ is a code, then the series f_X(z) converges for z = 1 and det(I − f_X(1)) ≥ 0.

Proof. Let A = (Q, E) be the right Fischer cover of S. For any two states p, q ∈ Q, φ(X*)_{pq} ⊆ φ(A*)_{pq}. Thus ρ(f_{X*}(z)) ≥ ρ(f_{A*}(z)). Since π is admissible, ρ(f_{A*}(z)) = 1. It follows that ρ(f_{X*}(z)) ≥ 1. Since X is a code, f_{X*}(z) = (I − f_X(z))^{−1}. Thus det(I − f_X(z)) ≠ 0 for 0 ≤ z < 1. Since det(I − f_X(0)) = 1, we obtain det(I − f_X(z)) > 0 for 0 ≤ z < 1 by continuity. Again by continuity we conclude that det(I − f_X(1)) ≥ 0.

Example 4. Let S be the golden mean system and let X = {aa, ab}. Let π(a) = p, π(b) = q be an admissible assignment, i.e. such that p(1 + q) = 1. We have

f_X(z) = [p^2 z^2, pq z^2; p^2 z^2, pq z^2].

Hence det(I − f_X(1)) = 1 − p^2 − pq, which is at most equal to 1.

We will now prove a complement to Theorem 1 describing the equality case. The proof uses the following lemma, stating a classical property of regular languages. If u is a word and L a set of finite words, u^{−1}L denotes the set {w ∈ A* | uw ∈ L}.

Lemma 1. If L is a regular language of finite words, then there is a finite subset P of A* such that the set of factors F(L) of L satisfies F(L) ⊆ ⋃_{v,w∈P} v^{−1}Lw^{−1}.

Proof. Let A = (Q, I, T, E) be a finite automaton recognizing the language L, with I the set of initial states, T the set of terminal states, and E the set of edges. We may assume that, for each state q ∈ Q, there exist (i, t) ∈ I × T and two words v_q, w_q labeling paths from i to q and from q to t, respectively. For each word u ∈ F(L), there exist p, q ∈ Q such that u labels a path from p to q. Thus v_p u w_q ∈ L. This shows that F(L) ⊆ ⋃_{p,q∈Q} v_p^{−1} L w_q^{−1}.
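A small illustrative check of Theorem 1 on Example 4 (our code, with the admissibility constraint p(1 + q) = 1 from Example 3): for X = {aa, ab}, the quantity det(I − f_X(1)) = 1 − p^2 − pq stays between 0 and 1 over sampled admissible assignments.

```python
# For X = {aa, ab} in the golden mean system, f_X(1) = [[p^2, pq], [p^2, pq]].
# Theorem 1 predicts det(I - f_X(1)) >= 0 whenever pi is admissible.
def det_I_minus_fX1(p, q):
    fX1 = [[p * p, p * q], [p * p, p * q]]
    return (1 - fX1[0][0]) * (1 - fX1[1][1]) - fX1[0][1] * fX1[1][0]

dets = []
for q in [0.1, 0.5, 1.0, 2.0, 5.0]:
    p = 1.0 / (1.0 + q)                  # admissibility: p(1 + q) = 1
    dets.append(det_I_minus_fX1(p, q))
```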
Theorem 2. Let S be an irreducible sofic shift and let π be an admissible assignment. If X ⊂ F_S is a regular code, then X is S-complete if and only if det(I − f_X(1)) = 0.

Proof. Let A = (Q, E) be the right Fischer cover of S. Let p be a state of A. If X is S-complete, any word u in F_S with p·u = p is a factor of a word in X*. If u is a factor of a word in X, then there are states s, t ∈ Q such that u is a factor of a word in φ(X)_{st}, since X ⊂ F_S. If u is not a factor of a word in X, by considering the words u^n, for n ≥ 1, we get that u is a factor of some word in φ(X*)_{st}, for some states s, t ∈ Q. It follows that, in any case, there are two states s, t ∈ Q such that u is a factor of a word in φ(X*)_{st}. Since X is regular, φ(X*)_{st} is also regular. It follows from Lemma 1 that, for each p ∈ Q, there is a finite set of words U such that

φ(A*)_{pp} ⊆ ⋃_{s,t∈Q} ⋃_{v,w∈U} v^{−1} φ(X*)_{st} w^{−1}.
For any s, t ∈ Q and any v, w ∈ A*,

ρ(f_{v^{−1}φ(X*)_{st}w^{−1}}(z)) ≥ ρ(f_{φ(X*)_{st}}(z)) ≥ ρ(f_{X*}(z)).

Thus ρ(f_{A*}(z)) ≥ ρ(f_{X*}(z)). Since π is admissible, ρ(f_{A*}(z)) = 1. By Theorem 1, ρ(f_{X*}(z)) ≥ 1. Hence ρ(f_{X*}(z)) = 1. Since X is a regular code, for any two states p, q ∈ Q, f_{X*}(z)_{pq} is a rational series with nonnegative real coefficients. By [2, Lemma 2.3, p. 82], either f_{X*}(z)_{pq} is a polynomial, or the minimal modulus of the poles of f_{X*}(z)_{pq} is itself a pole. Since f_{X*}(z) = (I − f_X(z))^{−1}, it follows that det(I − f_X(z)) vanishes at 1.

Conversely, let us assume that X is not S-complete. There is a word u ∈ F_S such that u is not a factor of any word in X*. Thus X* ⊆ A* − A*uA*. Because S is irreducible and π assigns a positive real value to each letter, the matrix f_A(1) is an irreducible nonnegative real matrix. Thus, it follows from [4, Theorem 4.4.7] that, for p, q ∈ Q,

ρ(f_{A*−A*uA*}(z)_{pq}) > ρ(f_{A*}(z)_{pq}).

We get that, for any states p, q ∈ Q,

1 = ρ(f_{A*}(z)) ≤ ρ(f_{A*}(z)_{pq}) < ρ(f_{A*−A*uA*}(z)_{pq}) ≤ ρ(f_{X*}(z)_{pq}).

Hence ρ(f_{X*}(z)) > 1. This implies that det(I − f_X(1)) > 0.

We now derive from Theorems 1 and 2 two corollaries which constitute a generalization of the Kraft–McMillan inequality. Let π be the admissible assignment defined by π(a) = λ_S for any a ∈ A. For a subset X of A*, we denote for convenience

p_X(z) = 1 − det(I − f_X(z/λ_S)).

Thus p_X(λ_S) = 1 − det(I − f_X(1)). When S is the full shift on k symbols, λ_S = 1/k and p_X(z) = Σ_{n≥1} u_n z^n, where u_n is the number of words of length n in X. Thus the inequality p_X(λ_S) ≤ 1 takes the form Σ_{n≥1} u_n k^{−n} ≤ 1, which is the Kraft–McMillan inequality.

Corollary 3. Let S be an irreducible sofic shift. If X ⊂ A+ is a code, then p_X(λ_S) ≤ 1.
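To make the reduction to the classical inequality concrete, here is a small sanity check (our example code words, not the paper's): in the full shift on k symbols, λ_S = 1/k and p_X(λ_S) ≤ 1 is exactly Σ_{n≥1} u_n k^{−n} ≤ 1.

```python
# Kraft-McMillan sum for a finite code over a k-letter alphabet.
def kraft_sum(code, k):
    return sum(k ** -len(x) for x in code)

s_complete = kraft_sum(['0', '10', '11'], 2)   # complete prefix code: sum = 1
s_partial = kraft_sum(['0', '10'], 2)          # not complete: sum < 1
```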
Fig. 2. The even system (states 1 and 2; a loop labeled b on state 1, an edge labeled a from 1 to 2 and an edge labeled a from 2 to 1).
Corollary 4. Let S be an irreducible sofic shift and let X ⊂ F_S be a regular code. The code X is S-complete if and only if p_X(λ_S) = 1.

The following examples illustrate the different possible cases. The first two give examples of S-complete codes.

Example 5. Let S be the golden mean system. The set X = a + ab is both a code and an S-complete set. We have p_X(z) = z + z^2 and thus p_X(λ_S) = 1.

Example 6. Let S be the golden mean system again. Let X = aa + ab + ba. The set X is an S-complete code since it is formed of all factors of length 2 of S. We have p_X(z) = 3z^2 − z^4 and p_X(λ_S) = 3λ_S^2 − λ_S^4 = 1.

The last example shows a case of a code which is not S-complete.

Example 7. Let S be the even system represented in Fig. 2. The set X = b(a^2)*b is a code. It is not S-complete. The value of λ_S is the same as for the golden mean system. We have

f_X(z/λ_S) = [z^2(z^2)*, 0; 0, 0]

and

p_X(z) = 1 − (1 − z^2(z^2)*) = z^2(z^2)* = z^2/(1 − z^2).

Hence p_X(λ_S) = 2/(1 + √5) < 1.
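A hedged numerical recap of Examples 5–7 (our computation): p_X(λ_S) equals 1 for the two S-complete codes and falls strictly below 1 for the incomplete code b(a^2)*b.

```python
lam = (5 ** 0.5 - 1) / 2             # lambda_S, shared by both systems

p_ex5 = lam + lam ** 2               # X = a + ab          (golden mean)
p_ex6 = 3 * lam ** 2 - lam ** 4      # X = aa + ab + ba    (golden mean)
p_ex7 = lam ** 2 / (1 - lam ** 2)    # X = b(a^2)* b       (even system)
```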
We briefly investigate the relation between the extremal properties of being S-complete and S-maximal. It is shown in [7] (and it is also a consequence of a result of [1]) that, when S is a subshift of finite type, any S-maximal code is S-complete (with the definition of an S-code given in [7]). The result still holds for shifts of finite type with our definitions of S-maximal and S-complete codes. Conversely, there is an example in [8] of an S-complete code which is not S-maximal. Indeed, consider the shift S described in Fig. 3. The code X = {ab} is S-complete and not S-maximal, since it is included in the code {ab, ba}.
Fig. 3. A shift of null entropy (states 1 and 2; an edge labeled b from 1 to 2 and an edge labeled a from 2 to 1).
3. Factorization

In this section, we consider the case where X is a finite code and S is an irreducible sofic shift with right Fischer cover A. We prove that the polynomial of the alphabet for S divides the polynomial of any finite code X which is complete for S. Both polynomials are defined below. This settles a conjecture of Reutenauer given in [8, p. 150]. In [8, p. 150], Reutenauer proves the same result when S is an edge shift satisfying a constraint called condition (0): each state of its right Fischer cover has a loop.

Let us denote by φ̄ the morphism obtained from φ by taking the commutative image of the elements. For any finite code X, the polynomial p(X) = det(I − φ̄(X)) is in Z[A], the set of polynomials over Z in the commuting variables a in A. The polynomials det(I − f_X(z)) and p_X(z) are of course closely related to p(X). Indeed, for any point x = (π(a)z)_{a∈A}, where π is an assignment, det(I − f_X(z)) = p(X)(x), where p(X)(x) denotes the value of p(X) at the point x.

Theorem 5. Let S be an irreducible sofic shift. When X is a finite S-complete code, the polynomial p(X) is a multiple of the polynomial p(A).

Proof. We first assume that S is an irreducible edge shift. It is known that det(I − φ̄(A)) is equal to

1 + Σ_{k≥1} Σ_{c_1,…,c_k} (−1)^k ℓ(c_1) … ℓ(c_k),

where the second sum is over all simple cycling paths c_1, …, c_k of S which pairwise do not share any state. If c is such a simple cycling path, ℓ(c) denotes its label seen as a monomial of N[A]. Since S is an irreducible edge shift, the partial degree of p(A) in each letter is at most 1 and all its monomials have coefficients 1 or −1. Moreover, it is proven in [8, Theorem 3] that p(A) is an irreducible polynomial of Z[A]. Let a be a letter appearing in a simple cycling path c with ℓ(c) = au, and let B = A − {a}. Thus u is a word of commutative letters in B. It follows from the above remarks that p(A) = −au(1 − q) + 1 + r, where q and r are polynomials in Z[B] with no constant term. The polynomials p(X) and p(A) can be seen as polynomials in a with coefficients in Z[B]. We now divide p(X) by p(A) in Z[A, 1/(u(1 − q))]. Thus p(X) = p(A)s + t,
where s and t are polynomials in Z[A, 1/(u(1 − q))] which are, respectively, the quotient and the remainder of this division. The degree of t in a is zero. It follows that there is a positive integer n such that p(X)(u(1 − q))^n = p(A)s' + t', with s' ∈ Z[A], t' ∈ Z[B]. Let π be the admissible assignment defined by π(a) = λ_S for all a ∈ A. We denote by ω the point ω = (λ_S)_{a∈A} and we let w = (λ_S)_{b∈B}. We get p(A)(ω) = det(I − φ̄(A))(ω) = det(I − f_A(1)) = 0. Hence (au(1 − q))(ω) = (1 + r)(ω), or

λ_S = (1 + r)(w) / ((u(1 − q))(w)).
We denote by B(w, ε) the ball of dimension Card(B) and radius ε > 0 centered at w. A positive ball is a ball containing only points with positive coordinates. There is a positive ball B(w, ε) such that, for any point x = (x_1, …, x_{|B|}) in B(w, ε),

(1 + r)(x) / ((u(1 − q))(x)) > 0,

and the assignment π_x defined by

π_x(b) = x_b if b ∈ B,  π_x(a) = (1 + r)(x) / ((u(1 − q))(x)),

is admissible. Indeed, since S is irreducible, by the Perron–Frobenius theorem, z = 1 is a simple root of det(I − f_A(z)) with the assignment π. Moreover, the roots of modulus 1 are e^{2ikπ/m} for 0 ≤ k ≤ m − 1, where m is a positive integer. By definition of π_x, z = 1 is still a root of det(I − f_A(z)) with the assignment π_x. When x is close enough to w, z = 1 is the single positive real root of modulus less than or equal to 1. Thus, again by the Perron–Frobenius theorem, any root of det(I − f_A(z)) with the assignment π_x has a modulus greater than or equal to 1. Hence π_x is admissible.

By Theorem 2, p(X)(ψ) = 0 and p(A)(ψ) = 0 for any point ψ = (π(a))_{a∈A} where π is an admissible assignment. We get that t' vanishes at any point in B(w, ε). Because B(w, ε) has dimension Card(B), we conclude that t' vanishes, and p(A) divides p(X)(u(1 − q))^n. Since p(A) is irreducible and has a monomial containing the letter a, p(A) divides p(X) in Z[A].

We now extend the result for irreducible edge shifts to irreducible sofic shifts. Let S be an irreducible sofic shift and let A be its right Fischer cover. We denote by A' the automaton obtained from A as follows. For any a ∈ A, if there are m edges labeled by a in A, one labels these edges by a_1, …, a_m, respectively, in A'. We denote by A' the finite alphabet formed of the letters with indices. Since all edges of A' have distinct labels, A' is the right Fischer cover of an irreducible edge shift S'. We denote by γ the map assigning
a to each a_i, for any a ∈ A; we also denote by γ its extension to a morphism from A'* to A*. Note that, for each path from p to q labeled u in A, there is a unique path from p to q labeled u' in A' with γ(u') = u, since A is deterministic.

Let X be a finite S-complete code. We define the finite set of words X' as the set of labels v of all paths in A' such that γ(v) ∈ X. We claim that X' is a code. Indeed, whenever u_1 u_2 … u_r = v_1 v_2 … v_s, with r, s positive integers and u_i, v_j ∈ X', we have

γ(u_1)γ(u_2) … γ(u_r) = γ(v_1)γ(v_2) … γ(v_s).

Since X is a code, r = s and γ(u_i) = γ(v_i) for any 1 ≤ i ≤ r. It follows that |u_i| = |v_i| for any 1 ≤ i ≤ r and finally u_i = v_i.

We claim that X' is a finite S'-complete code. Since A is a minimal deterministic automaton recognizing the sofic shift S, it has a strongly connected graph and a synchronizing word (or reset sequence) u. A synchronizing word of a deterministic automaton is a word u such that the set Q·u = {p·u | p ∈ Q} has cardinality one; say Q·u = {q_0}. Moreover, u is the label of a path from p to q_0 in A, where p is some state in Q.

Let w be a factor in F_S. There is a path from q to r labeled w in A, and a path from q to r labeled w' in A' with γ(w') = w. Let x, y and z be words of A* labeling paths from q_0 to p, from q_0 to q, and from r to p, respectively. Hence we have in A a path

q_0 → p → q_0 → q → r → p → q_0

whose consecutive edges are labeled x, u, y, w, z, u. Since X is S-complete, xuywzu is a factor of a word in X*. Moreover, there are two states s, t ∈ Q such that xuywzu is a factor of a word in φ(X*)_{st} (see the beginning of the proof of Theorem 2). As a consequence, and since u is synchronizing, there are words g, h ∈ A* such that

s → q_0 → q → r → p → q_0 → t,

with consecutive labels g, y, w, z, u, h, is a path in A labeled by a word in X*. Let

s → q_0 → q → r → p → q_0 → t,

with consecutive labels g', y', w'', z', u', h', be the unique path in A' such that γ(l') = l for l = g, y, w, z, u and h. We have g'y'w''z'u'h' ∈ X'*. Moreover, since the paths labeled w' and w'' both go from q to r in A' and γ(w') = γ(w'') = w, we have w'' = w'. It follows that w' is a factor of a word in X'*, which proves the claim.

Finally, we apply the result obtained in the case of irreducible edge shifts to S' and X'. It follows that p(A') divides p(X') in Z[A']. By removing the indices of the letters in A' (or, equivalently, by applying γ), we obtain that p(A) divides p(X) in Z[A].

The following example illustrates Theorem 5 in the case of an irreducible edge shift.

Example 8. If S is the irreducible edge shift described in Fig. 4, then

φ(a) = [a, 0; 0, 0],  φ(b) = [0, b; 0, 0],  φ(c) = [0, 0; c, 0].
Fig. 4. An edge shift (states 1 and 2; a loop labeled a on state 1, an edge labeled b from 1 to 2 and an edge labeled c from 2 to 1).
Fig. 5. The edge shift S' (states 1 and 2; a loop labeled a_1 on state 1, an edge labeled b from 1 to 2 and an edge labeled a_2 from 2 to 1).
Let X be the finite S-complete code {aa, ab, ca, cb, bc}. We get

φ(X) = [a^2 + bc, ab; ca, cb].

We have p(X) = 1 − a^2 − 2bc + b^2c^2 = (1 + a − bc)(1 − a − bc) = (1 + a − bc)p(A).

The next example illustrates the proof of Theorem 5 in the case of an irreducible subshift of finite type which is not an edge shift.

Example 9. If S is the golden mean subshift, then

φ(a) = [a, 0; a, 0],  φ(b) = [0, b; 0, 0].

Let X be the finite S-complete code X = {aa, ab, aab}. The shift S' is recognized by the automaton in Fig. 5. Note that S' is, up to a renaming of the alphabet, the edge shift of Example 8. We have X' = {a_1a_1, a_1b, a_1a_1b, a_2a_1, a_2b, a_2a_1b}, and

φ(X') = [a_1a_1, a_1b + a_1a_1b; a_2a_1, a_2b + a_2a_1b].

We get p(X') = (1 + a_1)(1 − a_1 − a_2b) = (1 + a_1)p(A'). As a consequence, p(X) = (1 + a)p(A), with p(A) = 1 − a − ab.
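A numeric spot-check of Example 8 (ours, including the sign of the middle factor): the expansion 1 − a^2 − 2bc + b^2c^2 of p(X) coincides with (1 + a − bc)·p(A) at randomly sampled points, witnessing the divisibility by p(A) = 1 − a − bc.

```python
import random

random.seed(0)
max_err = 0.0
for _ in range(200):
    a, b, c = (random.uniform(-2, 2) for _ in range(3))
    pX = 1 - a**2 - 2*b*c + (b*c)**2      # p(X) for X = {aa, ab, ca, cb, bc}
    pA = 1 - a - b*c                      # p(A) for the edge shift of Fig. 4
    max_err = max(max_err, abs(pX - (1 + a - b*c) * pA))
```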
Fig. 6. A sofic shift (states 1 and 2; loops labeled a on both states, an edge labeled b from 1 to 2 and an edge labeled c from 2 to 1).
We may observe that the factorization of p(X) can be obtained from the following factorization of the matrix I − φ(X):

I − φ(X) = (I + φ(a))(I − φ(A))(I + φ(b)).

This phenomenon is linked, in the case of the full shift, to Reutenauer's non-commutative factorization theorem [2]. We do not know whether this theorem holds for shifts of finite type.

The last example below shows an irreducible sofic shift S which is not a shift of finite type. We give an example of an S-complete code X for which there are two ways to find the factorization of p(X).

Example 10. If S is the irreducible sofic shift described in Fig. 6, then

φ(a) = [a, 0; 0, a],  φ(b) = [0, b; 0, 0],  φ(c) = [0, 0; c, 0].

Let X be the finite S-complete code {aa, ab, ac, ba, bc, cb, ca}. We get

φ(X) = [a^2 + bc, ab + ba; ac + ca, a^2 + cb].

We have

p(X) = 1 − 2a^2 − 2bc + a^4 + b^2c^2 − 2a^2bc = (1 − 2a + a^2 − bc)(1 + 2a + a^2 − bc) = p(A)(1 + 2a + a^2 − bc).

From 1 − A^2 = (1 − A)(1 + A), we get I − φ(A^2) = (I − φ(A))(I + φ(A)). It follows that det(I − φ̄(A^2)) = det(I − φ̄(A)) det(I + φ̄(A)). Since det(I − φ̄(A^2)) = p(X), det(I − φ̄(A)) = p(A), and φ̄(A) = [a, b; c, a], we recover the factorization of p(X):

p(X) = det(I − φ̄(A^2)) = det(I − φ̄(A)) det(I + φ̄(A)) = p(A)(1 + 2a + a^2 − bc).
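The matrix identity invoked in Example 10 can be spot-checked numerically (our sketch): det(I − φ̄(A)^2) = det(I − φ̄(A))·det(I + φ̄(A)) for φ̄(A) = [a, b; c, a].

```python
import random

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

random.seed(1)
identity_ok = True
for _ in range(100):
    a, b, c = (random.uniform(-1, 1) for _ in range(3))
    M = [[a, b], [c, a]]                 # commutative image of phi(A)
    M2 = mul2(M, M)
    lhs = det2([[1 - M2[0][0], -M2[0][1]], [-M2[1][0], 1 - M2[1][1]]])
    rhs = det2([[1 - a, -b], [-c, 1 - a]]) * det2([[1 + a, b], [c, 1 + a]])
    identity_ok = identity_ok and abs(lhs - rhs) < 1e-9
```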
Acknowledgements

We thank an anonymous reviewer for constructive suggestions which helped us to improve the presentation of this article.

References

[1] J. Ashley, B. Marcus, D. Perrin, S. Tuncel, Surjective extensions of sliding-block codes, SIAM J. Discrete Math. 6 (1993) 582–611.
[2] J. Berstel, Ch. Reutenauer, Rational Series and their Languages, Springer, Berlin, 1988.
[3] C. De Felice, Finite biprefix sets of paths in a graph, Theoret. Comput. Sci. 58 (1988) 103–128 (Thirteenth Internat. Colloq. on Automata, Languages and Programming, Rennes, 1986).
[4] D.A. Lind, B.H. Marcus, An Introduction to Symbolic Dynamics and Coding, Cambridge University Press, Cambridge, 1995.
[5] D. Perrin, G. Rindone, Syntactic groups, Bull. Belgian Math. Soc., to appear.
[6] A. Restivo, Codes and local constraints, Theoret. Comput. Sci. 72 (1990) 55–64.
[7] A. Restivo, Codes with constraints, in: Mots, Lang. Raison. Calc., Hermès, Paris, 1990, pp. 358–366.
[8] Ch. Reutenauer, Ensembles libres de chemins dans un graphe, Bull. Soc. Math. France 114 (1986) 135–152.
Theoretical Computer Science 340 (2005) 394 – 407 www.elsevier.com/locate/tcs
Small size quantum automata recognizing some regular languages★

Alberto Bertoni, Carlo Mereghetti∗, Beatrice Palano

Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
Abstract

Given a class {p_α | α ∈ I} of stochastic events induced by M-state 1-way quantum finite automata (1qfa) on alphabet Σ, we investigate the size (number of states) of 1qfa's that ε-approximate a convex linear combination of {p_α | α ∈ I}, and we apply the results to the synthesis of small size 1qfa's. We obtain:
• An O((Md/ε^3) log^2(d/ε^2)) general size bound, where d is the Vapnik dimension of {p_α(w) | w ∈ Σ*}.
• For commutative n-periodic events p on Σ with |Σ| = H, we prove an O((H log n)/ε^2) size bound for inducing an ε-approximation of 1/2 + (1/2)p whenever ||F(p̂)||_1 ≤ n^H, where F(p̂) is the discrete Fourier transform of (the vector p̂ associated with) p.
• If the characteristic function χ_L of an n-periodic unary language L satisfies ||F(χ̂_L)||_1 ≤ n, then L is recognized with isolated cut-point by a 1qfa with O(log n) states. Vice versa, if L is recognized with isolated cut-point by a 1qfa with O(log n) states, then ||F(χ̂_L)||_1 = O(n log n).
© 2005 Elsevier B.V. All rights reserved.

Keywords: Stochastic events; Quantum automata
★ Partially supported by MURST, under the projects “Linguaggi formali: teoria ed applicazioni” and “FIRB: Complessità descrizionale di automi e strutture correlate”.
∗ Corresponding author.
E-mail addresses: [email protected] (A. Bertoni), [email protected] (C. Mereghetti), [email protected] (B. Palano). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.032
A. Bertoni et al. / Theoretical Computer Science 340 (2005) 394 – 407
1. Introduction

One-way quantum finite automata (1qfa, for short) [2,4,7,8] are particularly interesting computational devices, since they represent a theoretical model for a quantum computer with finite memory. 1qfa's exhibit both advantages and disadvantages with respect to their classical (deterministic or probabilistic) counterparts. Basically, quantum superposition offers some computational advantages over probabilistic superposition. On the other hand, quantum dynamics are reversible: because of this limitation, it is sometimes impossible to simulate deterministic automata by quantum automata. In this paper, we develop techniques for constructing small size 1qfa's, possibly more succinct than equivalent deterministic or probabilistic automata [13,16,18].

Given a 1qfa A on input alphabet Σ, its behavior is the stochastic event p_A : Σ* → [0, 1], where p_A(w) is the probability that A accepts w. The language accepted by A with cut-point λ is the set L_{A,λ} = {w ∈ Σ* | p_A(w) > λ}; the cut-point λ is isolated by ε > 0 if |p_A(w) − λ| ≥ ε for every w ∈ Σ*.

First of all, we study the problem of approximating stochastic events by using (measure-once [3,6,10]) 1qfa's. More precisely, we investigate the following problem: given a family {p_α | α ∈ I} of stochastic events induced by M-state 1qfa's A_α on input alphabet Σ, find a “succinct” 1qfa A inducing an ε-approximation of a convex linear combination q of the p_α's, i.e., satisfying |p_A(w) − q(w)| ≤ ε for every w ∈ Σ*.

After giving preliminary notions in Section 2, we formulate our problem as a problem of uniform convergence of empirical averages to their expectations in Section 3. By using general results (see, e.g., [1]), we prove an O((Md/ε^3) log^2(d/ε^2)) bound on the number of states for 1qfa's ε-approximating q, where d is the Vapnik dimension of the class {p_α(w) | w ∈ Σ*}. As we will briefly observe at the end of the section, our technique can be directly used to solve the same problem for probabilistic automata.
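As a hedged illustration of the stochastic event p_A (our own sketch; the names, the end-marker-free convention, and the rotation automaton are ours, not the paper's): a measure-once 1qfa applies one unitary per input symbol and measures once at the end, and p_A(w) is the squared norm of the final state's projection onto the accepting subspace.

```python
import math

def qfa_accept_prob(word, unitaries, init, accepting):
    """Row-vector state evolved by one unitary per symbol, then a single
    projective measurement onto the accepting basis states."""
    state = list(init)
    for sym in word:
        U = unitaries[sym]
        state = [sum(state[i] * U[i][j] for i in range(len(state)))
                 for j in range(len(state))]
    return sum(abs(state[j]) ** 2 for j in accepting)

# Unary example: each 'a' rotates the state by pi/4, so p(a^k) = cos(k*pi/4)^2.
theta = math.pi / 4
R = {'a': [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]}
```

For instance, with initial state (1, 0) and accepting state 0, this automaton gives p(ε) = 1, p(a) = 1/2 and p(aa) = 0.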
In Section 4, we specialize the previous result to a particular subclass of stochastic events: the n-periodic commutative events. An event p : Σ* → [0, 1] is called n-periodic commutative if, for every w ∈ Σ*, p(w) depends only on the number modulo n of occurrences in w of each symbol in Σ. In this case, we prove a bound O((M|Σ|/ε^2) log n) for 1qfa's inducing ε-approximations of convex linear combinations of n-periodic commutative events on the alphabet Σ.

In Section 5, we relate the ℓ_1-norm of the discrete Fourier transform of any given event to its approximability by 1qfa's with O(log n) states. As an application, we consider the languages L_{n,H} ⊆ Σ*, with |Σ| = H, consisting of those words for which the number of occurrences of each symbol in Σ is a multiple of n. We prove that L_{n,H} is recognizable with isolated cut-point by an O(H log n)-state 1qfa, while every nondeterministic automaton recognizing L_{n,H} requires at least n^H states.

In Section 6, the unary case (i.e., |Σ| = 1) is studied. We show that if the ℓ_1-norm of the discrete Fourier transform of the characteristic function of an n-periodic unary language L does not exceed n, then L is recognized with isolated cut-point by a 1qfa with O(log n) states. Vice versa, if an n-periodic unary language L is recognized with isolated cut-point by a 1qfa with O(log n) states, then the ℓ_1-norm of the discrete Fourier transform of the characteristic function of L does not exceed O(n log n). As an application, we consider the languages L_{n,1}, and we compare Q(n) with S(n), where Q(n) (S(n)) is the
minimum number of states of 1qfa's (probabilistic automata) accepting L_{n,1}. We prove that S(n)/Q(n) = Θ(log n/log log n). Moreover, if n factorizes into a constant number of prime factors, then S(n) is "exponentially greater" than Q(n).

2. Preliminaries

2.1. Linear algebra

We quickly recall some notation from linear algebra. For more details, we refer the reader to, e.g., [11,12]. We denote by C the field of complex numbers and by C^{n×m} the set of n × m matrices with entries in C. Given a complex number z ∈ C, its conjugate is denoted by z̄, and its modulus is |z| = √(z z̄). The adjoint of a matrix M ∈ C^{n×m} is the matrix M† ∈ C^{m×n}, where (M†)_{ij} = M̄_{ji}. For matrices A ∈ C^{n×n} and B ∈ C^{m×m} and for vectors π ∈ C^{1×n} and ξ ∈ C^{1×m}, their direct sums are, respectively,

  A ⊕ B = ( A 0 ; 0 B ),   π ⊕ ξ = (π_1, ..., π_n, ξ_1, ..., ξ_m).

A Hilbert space of dimension n is the linear space C^{1×n} equipped with sum and product by elements of C, in which the inner product (π, ξ) = πξ† is defined. If (π, ξ) = 0, we say that π is orthogonal to ξ. The norm of a vector π ∈ C^{1×n} is defined as ‖π‖ = √(π, π). Two subspaces X, Y are orthogonal if every vector in X is orthogonal to every vector in Y; in this case, the linear space generated by X ∪ Y is denoted by X ∔ Y. A matrix M ∈ C^{n×n} is said to be unitary whenever MM† = I = M†M, where I is the identity matrix; moreover, a matrix M is unitary if and only if it preserves the norm, i.e., ‖πM‖ = ‖π‖ for each vector π ∈ C^{1×n}. The eigenvalues of unitary matrices are complex numbers of modulus 1, i.e., they are of the form e^{iϑ}, for some real ϑ. M is said to be Hermitian whenever M = M†. Given a Hermitian matrix O ∈ C^{n×n}, let c_1, ..., c_s be its eigenvalues and E_1, ..., E_s the corresponding eigenspaces. It is well known that each eigenvalue c_k is real, that E_i is orthogonal to E_j for any i ≠ j, and that E_1 ∔ ··· ∔ E_s = C^{1×n}. Each vector π ∈ C^{1×n} can be uniquely decomposed as π = π_1 + ··· + π_s, where π_j ∈ E_j. The linear transformation π ↦ π_j is the projector P_j onto the subspace E_j. It is easy to see that Σ_{j=1}^s P_j = I. The Hermitian matrix O is biunivocally determined by its eigenvalues and its eigenspaces or, equivalently, by its projectors: in fact, we have O = c_1 P_1 + ··· + c_s P_s. We denote by N the set of non-negative integers and by Z the set of integers; for the sake of readability, we let ⟨x⟩_n = x mod n, for any x ∈ Z. We let Z_n = {⟨x⟩_n | x ∈ Z}, equipped with the operations modulo n.

2.2. Axiomatics for quantum mechanics in short

Here, we use the previous formalism to describe quantum systems. Given a set Q = {q_1, ..., q_m}, every q_i can be represented by its characteristic vector e_i ∈ {0,1}^{1×m} having 1 at the ith position and 0 elsewhere. A quantum state on Q is a superposition π = Σ_{k=1}^m α_k e_k, where the coefficients α_k are complex amplitudes and
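The norm-preservation property of unitary matrices recalled above can be checked mechanically. The following sketch (plain Python; the matrix, the state, and all names are our own illustrative choices) verifies ‖πM‖ = ‖π‖ for a small rotation matrix:

```python
import math

# Hypothetical 2x2 unitary (a real rotation): M M† = I.
theta = 0.3
U = [[complex(math.cos(theta)), complex(-math.sin(theta))],
     [complex(math.sin(theta)),  complex(math.cos(theta))]]

def vec_mat(pi, M):
    """Row vector times matrix: (pi M)_j = sum_i pi_i M_ij."""
    return [sum(pi[i] * M[i][j] for i in range(len(pi))) for j in range(len(M))]

def norm(pi):
    """||pi|| = sqrt((pi, pi)), with inner product (pi, xi) = pi xi†."""
    return math.sqrt(sum(abs(x) ** 2 for x in pi))

pi = [complex(0.6), complex(0, 0.8)]            # a unit-norm state
assert abs(norm(pi) - 1.0) < 1e-12
assert abs(norm(vec_mat(pi, U)) - norm(pi)) < 1e-12  # unitarity preserves the norm
```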
Σ_{k=1}^m |α_k|² = 1. Every e_k is called a pure state. Given an alphabet Σ = {σ_1, ..., σ_H}, with every symbol σ_i we associate a unitary transformation U(σ_i) : C^{1×m} → C^{1×m}. An observable is described by an m × m Hermitian matrix O = c_1 P_1 + ··· + c_s P_s. Suppose that, at a given time, a quantum system is described by the quantum state π. Then, we can operate:
(1) Evolution U(σ_i). The new quantum state ξ = πU(σ_i) is reached; this dynamics is reversible, since π = ξU†(σ_i).
(2) Measurement of O. Every result in {c_1, ..., c_s} can be obtained; c_j is obtained with probability ‖πP_j‖², and the state after such a measurement is πP_j/‖πP_j‖. The state transformation induced by a measurement is typically irreversible.

2.3. One-way quantum finite automata, stochastic events and languages

Several models of quantum automata have been proposed in the literature. Basically, they differ in the measurement policy [2,4,7,8]. In this paper, we consider only the measure-once model. Measure-once 1qfa's [3,6,10] are the simplest model of quantum automata. In this model, the transformation on a symbol of the input alphabet is realized by a unitary operator, and a unique measurement is performed at the end of the computation. In what follows, we simply write 1qfa, understanding the designation "measure-once". Let Σ* be the free monoid of words generated by the finite alphabet Σ. For any w ∈ Σ*, we denote by #_σ(w) the number of occurrences of the symbol σ ∈ Σ within w. Clearly, the length of w is Σ_{σ∈Σ} #_σ(w). A stochastic event on Σ* is a function p : Σ* → [0,1]. A 1qfa with q control states on the input alphabet Σ is a system A = (π, {U(σ)}_{σ∈Σ}, P), where π ∈ C^{1×q}, for each σ ∈ Σ, U(σ) ∈ C^{q×q} is a unitary matrix, and P ∈ C^{q×q} is a projector that biunivocally determines the observable O = 1·P + 0·(I − P). For the sake of simplicity, we denote the family {U(σ)}_{σ∈Σ} by simply writing U(σ). The stochastic event induced by A is the function p_A : Σ* → [0,1] defined, for any σ_1 ··· σ_k ∈ Σ*, by

  p_A(σ_1 ··· σ_k) = ‖ π (∏_{i=1}^k U(σ_i)) P ‖².    (1)

Sometimes, it will be more convenient to specify the 1qfa A in the equivalent form A = (π, U(σ), F), where F ⊆ {1, ..., q} indexes the (final) states spanning the subspace onto which P projects. In this case, the event induced by A writes as

  p_A(σ_1 ··· σ_k) = Σ_{j∈F} |( π ∏_{i=1}^k U(σ_i) )_j|².    (2)

The reader may easily verify that Eq. (2) coincides with Eq. (1). Given an event p : Σ* → [0,1] and a real λ ∈ [0,1], the language L ⊆ Σ* defined by p with cut-point λ is the set L = {w ∈ Σ* | p(w) > λ}.
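To illustrate Eq. (2), here is a minimal sketch (plain Python; the automaton and all names are our own illustrative choices, not taken from the paper) of a unary measure-once 1qfa whose single input symbol acts as a rotation by 2π/n, so that the induced event is p_A(a^k) = cos²(2πk/n):

```python
import math

n = 5
theta = 2 * math.pi / n
# U(a): rotation by 2*pi/n (unitary); pi: initial superposition; F: final states.
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
pi = [1.0, 0.0]
F = {0}

def step(v, M):
    return [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]

def event(k):
    """p_A(a^k) = sum_{j in F} |(pi U(a)^k)_j|^2, i.e. Eq. (2) for w = a^k."""
    v = pi
    for _ in range(k):
        v = step(v, U)
    return sum(abs(v[j]) ** 2 for j in F)

# p_A(a^k) = cos^2(2*pi*k/n): equal to 1 exactly when k is a multiple of n.
assert abs(event(n) - 1.0) < 1e-12
assert event(1) < 1.0
```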
The cut-point λ is said to be isolated if there exists a positive real δ such that |p(w) − λ| ≥ δ, for any w ∈ Σ*. Moreover, if p is induced by the 1qfa A, then L is said to be recognized by A with cut-point λ (isolated by δ).

2.4. Uniform convergence of empirical averages of random variables to their expectations

Bernoulli's theorem (see, e.g., [15]) states that the relative frequency of an event A in a sequence of independent trials converges, in probability, to the probability of A. More precisely, given a space I on which a probability measure P is defined, let A ⊂ I and χ_A : I → {0,1} be its characteristic function. Observe that the expectation E[χ_A] is the probability P_A of A and, for a sequence C^(S) of independent trials x_1, ..., x_S, the empirical average (1/S) Σ_{t=1}^S χ_A(x_t) is the relative frequency ν_A(C^(S)) of the elements of A in C^(S). Bernoulli's theorem states that, for every probability distribution P on I, we have

  lim_{S→∞} Prob{ |ν_A(C^(S)) − P_A| ≥ ε } = 0 for every ε > 0.

In [19,20], the more general problem of the uniform convergence of relative frequencies to their probabilities is studied. For a class D ⊂ 2^I, we say that the uniform convergence of relative frequencies to their probabilities holds for D if and only if, for every probability distribution P on I, we have

  lim_{S→∞} Prob{ sup_{A∈D} |ν_A(C^(S)) − P_A| ≥ ε } = 0 for every ε > 0.

To characterize the classes D for which this uniform convergence holds, the relevant combinatorial measure called the Vapnik–Chervonenkis dimension is introduced in [20]: a set of points {x_1, x_2, ..., x_t} is shattered by D if {(χ_A(x_1), χ_A(x_2), ..., χ_A(x_t)) | A ∈ D} = {0,1}^t. The maximal cardinality of the sets shattered by D is called the Vapnik–Chervonenkis dimension of D (VC-dim(D), for short). The main result in [20] states that the uniform convergence of relative frequencies to their probabilities holds for D if and only if VC-dim(D) < ∞. Several attempts have been made to extend the VC-dim to arbitrary random variables. Here, we are interested in random variables of the form f : I → [0,1]. In this framework, a useful measure is the Vapnik dimension:

Definition 1. Given a class B of functions f : I → [0,1] and α ∈ (0,1), a subset A ⊂ I is said to be shattered by B if, for every X ⊂ A, there exists g ∈ B for which x ∈ X implies g(x) ≥ α, and x ∈ A − X implies g(x) < α. The Vapnik dimension V-dim(B) is the maximal cardinality of the shattered subsets of I.

If B is finite, a simple bound for V-dim(B) is easily seen to be

  V-dim(B) ≤ log |B|.    (3)
The following theorem gives a quantitative measure of the uniform convergence of empirical averages of random variables f : I → [0,1] to their expectations. It is an immediate consequence of Theorem 3.6 and Lemmas 2.3 and 2.4 in [1]:

Theorem 1 (Alon et al. [1]). Let B be the class of functions {f_w : I → [0,1] | w ∈ Σ*}, and P a probability distribution over I. Let ϕ(w) be the expectation of f_w according to P, and ϕ_S(w) = (1/S) Σ_{t=1}^S f_w(α_t) an empirical average, where α_1, ..., α_S are drawn independently at random according to P. Then, for every probability distribution P and every ε, δ > 0, we get

  Prob{ sup_{w∈Σ*} |ϕ_S(w) − ϕ(w)| ≥ ε } < δ

for

  S = O( (d/ε³) log²(d/ε²) + (1/ε²) log(1/δ) )

and d = V-dim(B).
3. Approximating the convex closure of classes of stochastic events: the general case

The problem we shall be dealing with concerns the analysis of 1qfa's whose induced events approximate given stochastic events in the following sense:

Definition 2. An ε-approximation in L∞ of a given stochastic event p : Σ* → [0,1] is any stochastic event q : Σ* → [0,1] satisfying

  sup_{w∈Σ*} {|p(w) − q(w)|} ≤ ε.

Given a family Φ = {ϕ_α : Σ* → [0,1] | α ∈ I} of stochastic events induced by M-state 1qfa's (π_α, U_α(σ), P_α), let Φ̃ be the convex closure of Φ, i.e., the class of stochastic events obtained as convex linear combinations ϕ(w) = Σ_{α∈I} b_α ϕ_α(w), with b_α ≥ 0 and Σ_{α∈I} b_α = 1. We are interested in estimating the number of states of 1qfa's inducing stochastic events that ε-approximate ϕ ∈ Φ̃. Since b_α ≥ 0 and Σ_{α∈I} b_α = 1, we can interpret the b_α's as a probability distribution on I. Then, for any w ∈ Σ*, ϕ_α(w) becomes a random variable with expectation

  E[ϕ_α(w)] = Σ_{α∈I} b_α ϕ_α(w) = ϕ(w).
We can approximate such an expectation by an empirical average of the events in Φ. To this purpose, we design the following algorithm:

ALGORITHM 1
for t := 1 to S do
  α[t] := an element of I, independently chosen with probability b_α;
output the 1qfa A defined as

  A = ( ⊕_t (1/√S) π_{α[t]},  ⊕_t U_{α[t]}(σ),  ⊕_t P_{α[t]} ).

It is easy to verify that the 1qfa A output by the previous algorithm has S·M states, and induces the stochastic event ϕ_S : Σ* → [0,1] defined, for any w ∈ Σ*, as

  ϕ_S(w) = (1/S) Σ_{t=1}^S ϕ_{α[t]}(w).

Moreover, notice that ϕ_S is an empirical average of the events in Φ. Now, if

  Prob{ sup_{w∈Σ*} |ϕ_S(w) − ϕ(w)| ≥ ε } < 1    (4)

holds true, then the existence of a 1qfa—with (S·M) states—inducing an ε-approximation of the given stochastic event ϕ is guaranteed. Estimating

  Prob{ sup_{w∈Σ*} | (1/S) Σ_{t=1}^S ϕ_{α[t]}(w) − E[ϕ_α(w)] | ≥ ε }

is a classical problem of uniform convergence of empirical averages to their expectations, a problem addressed in Section 2. A general solution in terms of the Vapnik dimension of the class of random variables {ϕ_α(w) | w ∈ Σ*} directly follows from Theorem 1:

Theorem 2. Let {ϕ_α | α ∈ I} be a class of stochastic events induced by M-state 1qfa's, with d = V-dim({ϕ_α(w) | w ∈ Σ*}). Then every convex linear combination ϕ(w) = Σ_{α∈I} b_α ϕ_α(w) can be ε-approximated by a 1qfa with O((Md/ε³) log²(d/ε²)) states.

To apply this result to the synthesis of small size 1qfa's, we must require that:
(1) the Vapnik dimension of the family Φ must be finite;
(2) the class of events given by convex linear combinations of events in the family Φ must not be trivial.
In the next section, we consider a class of events satisfying both these conditions. We end this section with a quick comment on the applicability of the technique presented here in the realm of probabilistic automata. A probabilistic automaton is similar to a 1qfa: the main difference is that its transition matrices and superpositions are stochastic instead of unitary (we refer to, e.g., [16,18] for details). As the reader may easily verify, our technique can be directly used to evaluate the size of probabilistic automata ε-approximating
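The sampling step of Algorithm 1 can be simulated numerically. The sketch below (plain Python; the family of events and all parameters are illustrative assumptions of ours, and we average event values directly instead of building the direct-sum 1qfa) shows the empirical average ϕ_S tracking the convex combination ϕ:

```python
import math, random

random.seed(1)

# Hypothetical family: unary events phi_alpha(k) = cos^2(pi*alpha*k/12),
# mixed with weights b_alpha as in the convex closure described above.
alphas = [1, 2, 3]
b = {1: 0.5, 2: 0.3, 3: 0.2}
phi = lambda a, k: math.cos(math.pi * a * k / 12) ** 2

def mixture(k):
    """The target convex combination phi(k) = sum_alpha b_alpha phi_alpha(k)."""
    return sum(b[a] * phi(a, k) for a in alphas)

def empirical(k, S):
    """phi_S(k): average of S events drawn with probabilities b_alpha,
    mimicking the sampling loop of Algorithm 1."""
    draws = random.choices(alphas, weights=[b[a] for a in alphas], k=S)
    return sum(phi(a, k) for a in draws) / S

S = 20000
err = max(abs(empirical(k, S) - mixture(k)) for k in range(12))
assert err < 0.05   # the empirical average tracks the expectation
```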
convex linear combinations of stochastic events, thus obtaining the analogue of Theorem 2 for probabilistic automata.

4. The commutative periodic case

We recall that a language is recognized with isolated cut-point by a 1qfa if and only if it is a group language [3,6], i.e., it can be recognized by a deterministic automaton where, for any input symbol, the corresponding transition function is a permutation [17]. In this section, we consider the case where all such permutations commute. This naturally leads to the following.

Definition 3. Given an alphabet Σ = {σ_1, σ_2, ..., σ_H}, a stochastic event ϕ : Σ* → [0,1] is said to be n-periodic commutative if there exists a function ϕ̂ : Z_n^H → [0,1] such that, for any w ∈ Σ*, we have

  ϕ(w) = ϕ̂(⟨#_{σ_1}(w)⟩_n, ⟨#_{σ_2}(w)⟩_n, ..., ⟨#_{σ_H}(w)⟩_n).

Hence, ϕ̂ can be viewed as a real vector whose components are indexed by Z_n^H. From now on, we will always denote by p̂ the vector associated with the n-periodic commutative event p, according to Definition 3. Now let Φ = {ϕ_α | α ∈ I} be a class of n-periodic commutative events induced by M-state 1qfa's, and set B = {ϕ_α(w) | w ∈ Σ*}. Since

  {ϕ_α(w) | w ∈ Σ*} = {ϕ̂_α(k_1, k_2, ..., k_H) | 0 ≤ k_1, k_2, ..., k_H < n},

we have that |B| ≤ n^H. By directly using the simple bound of inequality (3), we get V-dim(B) ≤ H log n. Hence, from Theorem 2, we get that we can ε-approximate any convex linear combination of events in Φ by 1qfa's with O((M·H log n/ε³)(log log n + log(H/ε²))²) states, i.e., almost logarithmic in n. We can improve this bound with a simple direct approach. We use Hoeffding's inequality [9]: if the X_i's are i.i.d. random variables with values in [0,1] and expectation μ, then for any ε ≥ 0 and S ≥ 1,

  Prob{ |(1/S) Σ_{i=1}^S X_i − μ| ≥ ε } ≤ 2e^{−2ε²S}.    (5)

This tool enables us to prove

Theorem 3. Given a family Φ of n-periodic commutative events induced by M-state 1qfa's on an alphabet with H symbols, any event in the convex closure of Φ can be ε-approximated by the event induced by a 1qfa with O((M·H/ε²) log n) states.

Proof. Let Σ = {σ_1, ..., σ_H}, and let Φ = {ϕ_α : Σ* → [0,1] | α ∈ I} be the class of n-periodic commutative events. Let ϕ(w) = Σ_{α∈I} b_α ϕ_α(w) be a convex linear combination
of events in Φ. By using the construction in Algorithm 1, we are able to realize the event ϕ_S(w) such that

  Prob{ sup_{w∈Σ*} |ϕ_S(w) − ϕ(w)| ≥ ε }
    = Prob{ max_{0≤k_1,...,k_H<n} |ϕ̂_S(k_1, ..., k_H) − ϕ̂(k_1, ..., k_H)| ≥ ε }
    ≤ n^H · max_{0≤k_1,...,k_H<n} Prob{ |ϕ̂_S(k_1, ..., k_H) − ϕ̂(k_1, ..., k_H)| ≥ ε }   (by the union bound)
    ≤ n^H · 2e^{−2ε²S}   (by Hoeffding's inequality (5)).

By requiring n^H · 2e^{−2ε²S} < 1, i.e., by taking S = O((H/ε²) log n), we get the result, since the output 1qfa has S·M states. □
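The sample size hidden in the last step can be made explicit: solving n^H · 2e^{−2ε²S} < 1 for S gives S > (H ln n + ln 2)/(2ε²), which is the claimed O((H/ε²) log n) rate. A small sketch (plain Python; the parameter values are our own illustrative choices):

```python
import math

def required_samples(n, H, eps):
    """Smallest integer S with n**H * 2 * exp(-2 * eps**2 * S) < 1,
    i.e. S > (H ln n + ln 2) / (2 eps^2) -- the O((H/eps^2) log n) rate."""
    return math.floor((H * math.log(n) + math.log(2)) / (2 * eps ** 2)) + 1

S = required_samples(n=1024, H=3, eps=0.1)
# S is the first integer past the threshold:
assert 1024 ** 3 * 2 * math.exp(-2 * 0.1 ** 2 * S) < 1
assert 1024 ** 3 * 2 * math.exp(-2 * 0.1 ** 2 * (S - 1)) >= 1
```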
5. Approximating a family of periodic commutative events

In this section, we study a class of n-periodic commutative events that are approximable by events induced by O(log n)-state 1qfa's. In particular, we investigate the relation between such approximability and the ℓ₁-norm of the discrete Fourier transform of these events. We first need to briefly recall the notion of the multidimensional discrete Fourier transform. Given an alphabet Σ = {σ_1, ..., σ_H}, let p : Σ* → [0,1] be an n-periodic commutative event, and p̂ its associated vector. The discrete Fourier transform of p̂ is the complex vector P = F(p̂), where P : Z_n^H → C and

  P(j_1, ..., j_H) = Σ_{0≤k_1,...,k_H<n} p̂(k_1, ..., k_H) e^{i(2π/n)(k_1 j_1 + ··· + k_H j_H)}.

By the well-known inversion formula, we have

  p̂(k_1, ..., k_H) = (1/n^H) Σ_{0≤j_1,...,j_H<n} P(j_1, ..., j_H) e^{−i(2π/n)(k_1 j_1 + ··· + k_H j_H)}.    (6)
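The transform and the inversion formula (6) are easy to check numerically in the unary case H = 1. The sketch below (plain Python; the event vector is an arbitrary illustrative choice of ours) recovers p̂ from P = F(p̂):

```python
import cmath, math

n = 6
# Hypothetical 6-periodic unary event, given by its vector p_hat.
p_hat = [1.0, 0.25, 0.0, 0.5, 0.0, 0.25]

def dft(v):
    """P(j) = sum_k p_hat(k) e^{i(2*pi/n) k j}  (the transform above, H = 1)."""
    return [sum(v[k] * cmath.exp(1j * 2 * math.pi * k * j / n)
                for k in range(n)) for j in range(n)]

def inverse(P):
    """Inversion formula (6): p_hat(k) = (1/n) sum_j P(j) e^{-i(2*pi/n) k j}."""
    return [sum(P[j] * cmath.exp(-1j * 2 * math.pi * k * j / n)
                for j in range(n)).real / n for k in range(n)]

P = dft(p_hat)
recovered = inverse(P)
assert all(abs(a - b) < 1e-12 for a, b in zip(p_hat, recovered))
l1_norm = sum(abs(z) for z in P)   # ||F(p_hat)||_1, used throughout this section
```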
Theorem 4. Let p : Σ* → [0,1] be an n-periodic commutative event on an alphabet with H symbols. Then, the event 1/2 + (1/2)(n^H/‖F(p̂)‖₁) p is ε-approximable by the event induced by a 1qfa with O(H log n/ε²) states.

Proof. If Σ = {σ_1, ..., σ_H}, let p̂ : Z_n^H → [0,1] be the vector associated with the n-periodic commutative event p, and P = F(p̂). Set P(j_1, ..., j_H) = ρ(j_1, ..., j_H) e^{iϑ(j_1,...,j_H)}, where ρ(j_1, ..., j_H) and ϑ(j_1, ..., j_H) are the modulus and the phase of
P(j_1, ..., j_H), respectively. By recalling Eq. (6), and observing that p̂ has values in [0,1], we get

  (n^H p̂(k_1, ..., k_H))/‖F(p̂)‖₁ = Σ_{0≤j_1,...,j_H<n} (ρ(j_1, ..., j_H)/‖F(p̂)‖₁) e^{−i((2π/n)(k_1 j_1 + ··· + k_H j_H) − ϑ(j_1,...,j_H))}.
Since the left-hand side is real and Σ_{j_1,...,j_H} ρ(j_1, ..., j_H)/‖F(p̂)‖₁ = 1, the event 1/2 + (1/2)(n^H/‖F(p̂)‖₁) p is a convex linear combination, with weights ρ(j_1, ..., j_H)/‖F(p̂)‖₁, of the n-periodic commutative events 1/2 + (1/2) cos((2π/n)(k_1 j_1 + ··· + k_H j_H) − ϑ(j_1, ..., j_H)), each of which is induced by a 1qfa with O(1) states. The result then follows from Theorem 3. □

In particular, whenever ‖F(p̂)‖₁ ≤ n^H, we obtain:

Corollary 1. Let p : Σ* → [0,1] be an n-periodic commutative event on an alphabet with H symbols, with ‖F(p̂)‖₁ ≤ n^H. Then the event 1/2 + 1/2 p is ε-approximable by the event induced by a 1qfa with O(H log n/ε²) states.
In the following example, we use this result to show that, from a descriptional point of view, quantum automata can be more powerful than nondeterministic automata in accepting certain languages.
Example 1. Given an alphabet Σ = {σ_1, ..., σ_H}, define the language L_{n,H} = {w ∈ Σ* | ⟨#_{σ_1}(w)⟩_n = ··· = ⟨#_{σ_H}(w)⟩_n = 0}. If L_{n,H} is recognized by a 1-way nondeterministic finite automaton A, then A has at least n^H states. In fact, suppose that every non-final state of A has an outgoing path leading to a final state. If the number of states of A were less than n^H, then, by a simple counting argument, there would exist two distinct words x = σ_1^{k_1} ··· σ_H^{k_H} and y = σ_1^{s_1} ··· σ_H^{s_H}, with 0 ≤ k_1, ..., k_H, s_1, ..., s_H < n, which, given as input, would take A to the same state q. Let now z = σ_1^{j_1} ··· σ_H^{j_H} be a word taking A from q to a final state. We get that both xz and yz belong to L_{n,H}, that is, ⟨k_i + j_i⟩_n = ⟨s_i + j_i⟩_n = 0, for 1 ≤ i ≤ H. This implies that k_i = s_i for 1 ≤ i ≤ H, against the hypothesis x ≠ y. On the contrary, there exists a 1qfa accepting L_{n,H} with isolated cut-point which is exponentially more succinct both in the period n and in the cardinality H of the input alphabet. In fact, the language L_{n,H} can be defined by the n-periodic commutative event p whose associated function is

  p̂(k_1, ..., k_H) = 1 if k_1 = ··· = k_H = 0, and 0 otherwise.

Now let P = F(p̂). For every 0 ≤ j_1, ..., j_H < n, we have P(j_1, ..., j_H) = 1 and hence ‖P‖₁ = n^H. By applying Corollary 1, we have that the event 1/2 + 1/2 p is 1/8-approximable by a 1qfa with O(H log n) states, thus accepting L_{n,H} with isolated cut-point.
6. The unary case

In this section, we focus on the particular case of unary alphabets, e.g., Σ = {a}. Languages defined by (periodic) unary events are called (periodic) unary languages; periodic unary languages are exactly the unary group languages. In this section we point out a relation between the minimum size of a 1qfa recognizing a unary periodic language L and the ℓ₁-norm of the discrete Fourier transform of its characteristic function χ_L. The first result is a direct consequence of Corollary 1:

Theorem 5. Let p : {a}* → [0,1] be an n-periodic event with ‖F(p̂)‖₁ ≤ n, and let L be a unary language defined by p with cut-point λ isolated by 4ε. Then L can be recognized by a 1qfa with cut-point 1/2 + λ/2 isolated by ε and O((1/ε²) log n) states.

As an application, we exhibit a class of unary languages recognizable by 1qfa's with fewer states than the equivalent probabilistic automata [16,18].

Example 2. Consider the language L_n = {a^{kn} | k ∈ N}; let Q(n) (resp., S(n)) be the minimum number of states of 1qfa's (resp., probabilistic automata) accepting L_n with isolated cut-point. By Example 1, we have that L_n = L_{n,1} is recognized with isolated cut-point by a 1qfa with O(log n) states, yielding Q(n) = O(log n). If n is prime, the same upper bound is obtained in [2] by different techniques.
By recalling a result in [14], we have that if n = p_1^{α_1} ··· p_k^{α_k} is the prime factorization of n, then S(n) = Θ(Σ_{j=1}^k p_j^{α_j}). By a direct computation, it can be shown that the global minimum of the function f(x_1, ..., x_k) = Σ_{j=1}^k e^{x_j}, with constraints Σ_{j=1}^k x_j = log n and x_j ≥ 0, is k e^{(1/k) log n}. This implies S(n) = Ω(k e^{(1/k) log n}). Observe that n = p_1^{α_1} ··· p_k^{α_k} implies n ≥ k!, whence k ≤ (log n/log log n)(1 + o(1)). Since f(k) = k e^{(1/k) log n} is monotone decreasing in the interval [1, log n], we get S(n) = Ω(log² n/log log n). In conclusion, having Q(n) = O(log n), we obtain S(n)/Q(n) = Ω(log n/log log n). Furthermore, if n factorizes into a constant number of prime factors, then S(n) is "exponentially greater" than Q(n).

We have previously stated that if the ℓ₁-norm of the discrete Fourier transform of an n-periodic event p is bounded by n, then 1/2 + 1/2 p is approximable by small size 1qfa's. Now, we study the converse problem. We bound the ℓ₁-norm of the discrete Fourier transform of periodic events induced by 1qfa's in terms of the number of states.

Theorem 6. Let p : {a}* → [0,1] be an n-periodic event induced by an s-state 1qfa. Then ‖F(p̂)‖₁ ≤ ns.

Proof. Let A = (π, U(a), F) be the 1qfa inducing the event p, i.e., p(a^k) = Σ_{j∈F} |(π U^k(a))_j|². Since U(a) is unitary, it can be decomposed as U(a) = UΛU†, where U is a unitary matrix and Λ is a diagonal matrix whose elements are the eigenvalues e^{iϑ_l} of U(a), for 1 ≤ l ≤ s. Thus, we can write p(a^k) = Σ_{j∈F} |(π U diag(e^{ikϑ_1}, ..., e^{ikϑ_s}) U†)_j|². From [5, Lemma 3], we know that e^{i(ϑ_l − ϑ_r)} = e^{−i(2π/n) z_lr}, where z_lr ∈ Z_n. By setting π̃ = πU, with a direct computation we get

  p(a^k) = Σ_{1≤l,r≤s} ( π̃_l π̃*_r Σ_{j∈F} U_lj U*_rj ) e^{−i(2π/n) k z_lr}.    (7)

Calling (P(0), ..., P(n−1)) = F(p̂), we have (see Eq. (6))

  p(a^k) = Σ_{t=0}^{n−1} (P(t)/n) e^{−i(2π/n) k t}.    (8)

By comparing Eqs. (7) and (8), we get

  P(t)/n = Σ_{{(l,r) | z_lr = t}} π̃_l π̃*_r Σ_{j∈F} U_lj U*_rj.

Hence

  ‖F(p̂)‖₁/n = Σ_{t=0}^{n−1} | Σ_{{(l,r) | z_lr = t}} π̃_l π̃*_r Σ_{j∈F} U_lj U*_rj | ≤ Σ_{l,r} |π̃_l| |π̃_r| Σ_{j∈F} |U_lj| |U_rj|.

Since Σ_{j∈F} |U_lj| |U_rj| ≤ Σ_{j=1}^s |U_lj| |U_rj| ≤ √(Σ_j |U_lj|²) √(Σ_j |U_rj|²) = 1 by the Schwarz inequality, we have

  ‖F(p̂)‖₁/n ≤ ( Σ_{l=1}^s |π̃_l| )².

Again by the Schwarz inequality, Σ_{l=1}^s |π̃_l| · 1 ≤ √s √(Σ_{l=1}^s |π̃_l|²) = √s. Thus ‖F(p̂)‖₁ ≤ ns. □
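The bound of Theorem 6 can be verified numerically on a concrete automaton. The sketch below (plain Python; the 2-state rotation 1qfa is our own illustrative choice) computes the induced n-periodic event, its discrete Fourier transform, and checks ‖F(p̂)‖₁ ≤ ns:

```python
import cmath, math

n, s = 12, 2
theta = 2 * math.pi / n
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]   # eigenvalues e^{+-i 2*pi/n}
pi = [1.0, 0.0]
F = {0}

def event(k):
    """p(a^k) for the 1qfa (pi, U(a), F)."""
    v = list(pi)
    for _ in range(k):
        v = [sum(v[i] * U[i][j] for i in range(s)) for j in range(s)]
    return sum(abs(v[j]) ** 2 for j in F)

p_hat = [event(k) for k in range(n)]        # the event is n-periodic
P = [sum(p_hat[k] * cmath.exp(1j * 2 * math.pi * k * j / n) for k in range(n))
     for j in range(n)]
l1 = sum(abs(z) for z in P)
assert l1 <= n * s + 1e-9                    # Theorem 6: ||F(p_hat)||_1 <= n*s
```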
Finally, we relate the size of 1qfa's accepting periodic languages to the ℓ₁-norm of the discrete Fourier transform of the corresponding characteristic functions.

Theorem 7. Let L be an n-periodic language and χ_L its characteristic function.
(1) If χ_L is 1/8-approximable by ϕ with ‖F(ϕ̂)‖₁ ≤ n, then L is recognizable by an O(log n)-state 1qfa with cut-point isolated by 1/8.
(2) If L is recognizable by an O(log n)-state 1qfa with cut-point isolated by 1/8, then χ_L is 3/7-approximable by ϕ with ‖F(ϕ̂)‖₁ = O(n log n).

Proof.
(1) Since ‖F(ϕ̂)‖₁ ≤ n, by Corollary 1 there exists an O(log n)-state 1qfa inducing a 1/16-approximation ψ of (1 + ϕ)/2. Moreover, for any k ≥ 0, |ϕ(a^k) − χ_L(a^k)| ≤ 1/8. Therefore,

  | (1 + χ_L(a^k))/2 − ψ(a^k) | ≤ | (1 + ϕ(a^k))/2 − ψ(a^k) | + | ϕ(a^k)/2 − χ_L(a^k)/2 | ≤ 1/16 + 1/16 = 1/8.

So, L is accepted by the 1qfa inducing ψ with cut-point 3/4 isolated by 1/8.
(2) If L is recognized by an O(log n)-state 1qfa with cut-point isolated by 1/8, then it is easy to find an O(log n)-state 1qfa A recognizing L with cut-point 1/2 isolated by 1/14. The event ϕ induced by A is a 3/7-approximation of χ_L. By applying Theorem 6, we get ‖F(ϕ̂)‖₁ = O(n log n). □

Acknowledgements

The authors wish to thank an anonymous referee for helpful comments.

References

[1] N. Alon, S. Ben-David, N. Cesa-Bianchi, D. Haussler, Scale-sensitive dimensions, uniform convergence, and learnability, J. ACM 44 (1997) 615–631.
[2] A. Ambainis, R. Freivalds, 1-way quantum finite automata: strengths, weaknesses and generalizations, in: Proc. 39th Symp. on Foundations of Computer Science, 1998, pp. 332–342.
[3] A. Bertoni, M. Carpentieri, Regular languages accepted by quantum automata, Inform. Comput. 165 (2001) 174–182.
[4] A. Bertoni, C. Mereghetti, B. Palano, Quantum computing: 1-way quantum automata, in: Proc. Seventh Conf. on Developments in Language Theory, Lecture Notes in Computer Science, Vol. 2710, Springer, Berlin, 2003, pp. 1–20.
[5] A. Bertoni, C. Mereghetti, B. Palano, Golomb rulers and difference sets for succinct quantum automata, Internat. J. Foundations Comput. Sci. 14 (2003) 871–888.
[6] A. Brodsky, N. Pippenger, Characterizations of 1-way quantum finite automata, SIAM J. Comput. 31 (2001) 1456–1478.
[7] J. Gruska, Quantum Computing, McGraw-Hill, New York, 1999.
[8] J. Gruska, Descriptional complexity issues in quantum computing, J. Automata, Languages Combinatorics 5 (2000) 191–218.
[9] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (1963) 13–30.
[10] C. Moore, J. Crutchfield, Quantum automata and quantum grammars, Theoret. Comput. Sci. 237 (2000) 275–306. A preliminary version of this work appeared as a Technical Report in 1997.
[11] M. Marcus, H. Minc, Introduction to Linear Algebra, Macmillan, New York, 1965 (reprinted by Dover, 1988).
[12] M. Marcus, H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber & Schmidt, Boston, MA, 1964 (reprinted by Dover, 1992).
[13] C. Mereghetti, B. Palano, On the size of one-way quantum finite automata with periodic behaviors, Theoret. Inform. Appl. 36 (2002) 277–291.
[14] C. Mereghetti, B. Palano, G. Pighizzini, Note on the succinctness of deterministic, nondeterministic, probabilistic and quantum finite automata, Theoret. Inform. Appl. 35 (2001) 477–490.
[15] A.M. Mood, F.A. Graybill, D.C. Boes, Introduction to the Theory of Statistics, McGraw-Hill, New York, 1983.
[16] A. Paz, Introduction to Probabilistic Automata, Academic Press, New York, 1971.
[17] J.E. Pin, On languages accepted by finite reversible automata, in: Proc. 14th Internat. Colloq. on Automata, Languages and Programming, Lecture Notes in Computer Science, Vol. 267, Springer, Berlin, 1987, pp. 237–249.
[18] M. Rabin, Probabilistic automata, Inform. Control 6 (1963) 230–245.
[19] V.N. Vapnik, Inductive principles of the search for empirical dependencies, in: Proc. Second Annual Workshop on Computational Learning Theory, Morgan Kaufmann, Los Altos, CA, 1989, pp. 1–21.
[20] V.N. Vapnik, A.Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory Probab. Appl. 16 (1971) 264–280.
Theoretical Computer Science 340 (2005) 408 – 431 www.elsevier.com/locate/tcs
New operations and regular expressions for two-dimensional languages over one-letter alphabet

Marcella Anselmo^a,∗, Dora Giammarresi^b, Maria Madonia^c

a Dipartimento Informatica ed Applicazioni, Università di Salerno, 84081 Baronissi (Sa), Italy
b Dipartimento Matematica, Università di Roma “Tor Vergata”, via Ricerca Scientifica, 00133 Roma, Italy
c Dipartimento Matematica e Informatica, Università di Catania, Viale Andrea Doria 6/a, 95125 Catania, Italy
Abstract

We consider the problem of defining regular expressions to characterize the class of recognizable picture languages in the case of a one-letter alphabet. We define a diagonal concatenation and its star, and consider two families, L(D) and L(CRD), of languages denoted by regular expressions involving these operations plus the classical ones. L(D) is characterized both in terms of rational relations and in terms of two-dimensional automata moving only right and down. L(CRD) is included in REC and contains the languages defined by three-way automata, while languages in L(CRD) necessarily satisfy certain regularity conditions. Finally, we introduce new definitions of advanced stars, expressing the need for conceptually different definitions of iteration.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Two-dimensional language; Regular expression
1. Introduction

The theory of one-dimensional string languages has been well founded and investigated since the 1950s. For some years now, the increasing interest in pattern recognition and image

☆ Work partially supported by MIUR Cofin: Linguaggi Formali e Automi: Metodi, Modelli e Applicazioni.
∗ Corresponding author.
E-mail addresses: [email protected] (M. Anselmo), [email protected] (D. Giammarresi), [email protected] (M. Madonia).
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.tcs.2005.03.031
processing has also motivated research on two-dimensional string languages, and nowadays this is a field of intense investigation. The aim is the generalization, or possibly the extension, of the richness of the theory of one-dimensional languages to two dimensions. A first line of work has been devoted to the study of two-dimensional languages defined by finite state devices, with the aim of finding a counterpart of what regular languages are in one dimension. Many approaches have been presented in the literature, considering all the ways to define regular languages: finite automata, grammars, logic and regular expressions. In 1991, a unifying point of view was presented by A. Restivo and D. Giammarresi, who defined the family REC of recognizable picture languages (see [7]). This class seems to be the candidate as “the” generalization of the class of regular one-dimensional languages. Indeed REC is well characterized from very different points of view and thus inherits several properties from the class of regular string (one-dimensional) languages. It is characterized in terms of projections of local languages (tiling systems), of some finite-state automata, of logic formulas, and of regular expressions with alphabetic mapping. The approach by regular expressions is, however, not completely satisfactory: the concatenation and star operations involved there are partial functions, and moreover an external operation of alphabetic mapping is needed. Hence, in [7], the problem of stating a Kleene-like theorem for the theory of recognizable picture languages remains open. Several papers have recently been devoted to finding a better formulation of regular expressions for two-dimensional languages. In [15], O. Matz addresses the problem of finding more powerful expressions to represent recognizable picture languages and suggests regular expressions where the iteration is over combinations of operators, rather than over languages.
The author shows that the power of these expressions does not exceed the family REC, but it remains open whether or not it exhausts it. In [18] a tiling operation is introduced as an extension of the Kleene star to pictures, and a characterization of REC involving a morphism and intersection is given. The paper [19] compares star-free picture expressions with first-order logic. The aim of this paper is to look for a homogeneous notion of regular expression that could extend more naturally the concept of regular expression from one-dimensional languages. In this framework, we propose some new operations on pictures and picture languages and study the families of languages that can be generated using classical and new operations. The paper focuses on one-letter alphabets. This is a particular case of the more general setting of alphabets with several letters. However, it is not only a simpler case to handle, but a necessary and meaningful case to start with. Indeed, studying two-dimensional languages over one-letter alphabets means studying the “shapes” of pictures: if a picture language is in REC, then necessarily the language of its shapes is in REC. This approach allows us to separate the twofold nature of a picture: its shape and its content. The classical concatenation operations on pictures and picture languages are the row and column concatenations and their closures. Regular expressions that use only Boolean operations and this kind of concatenations and closures, however, cannot define a large number of two-dimensional languages in REC. As an example, take the simple language of “squares” (that is, pictures with the number of rows equal to the number of columns). The major problem with this kind of regular expressions is that they cannot describe any relationship existing between the two dimensions of the pictures. Such operations are useful
to express some regularity either on the number of rows or on the number of columns, but not between them. This is the reason why we introduce, in the one-letter case, a new concatenation operation between pictures: the diagonal concatenation. The diagonal concatenation introduces the possibility of constructing new pictures while forcing some dependence between their dimensions. Moreover, an important aspect of the diagonal concatenation is that it is a total function between pictures. This allows us to find a quite clean double characterization of D-regular languages, the picture languages denoted by regular expressions containing union, diagonal concatenation and its closure: they are exactly those picture languages in which the dimensions are related by a rational relation, and also exactly those picture languages recognizable by particular two-dimensional automata moving only right and down. Unfortunately, an analogous situation no longer holds when we also introduce row and column concatenations in regular expressions, essentially because they are partial functions. The class of CRD-regular languages, the languages denoted by regular expressions with union, column, row and diagonal concatenations and their closures, lies strictly between the class of languages recognized by three-way deterministic automata and REC. The main result for CRD-regular languages is a necessary condition regarding a sort of “regularity” in the possible “extensions” of a picture in a given language to a bigger picture in the language. In a CRD-regular language: if a picture is sufficiently “long”, then we can concatenate some picture to it infinitely often by columns; if a picture is sufficiently “high”, then we can concatenate some picture to it infinitely often by rows; if a picture is sufficiently “big”, then we can concatenate some picture to it infinitely often in diagonal.
This result generalizes, in some sense, what regularity implies in one-dimensional languages over a one-letter alphabet. We also provide a collection of examples classically considered in the literature, specifying for each of them whether or not it belongs to the classes of two-dimensional languages considered throughout the paper. Examining some examples of languages not captured by the CRD formalism, we find that the “extensions” of a picture cannot be obtained by iterating the concatenation of the same picture, independently of the picture to which we concatenate. On the contrary, for some languages, a kind of iteration that generates pictures in a “non-uniform” way is needed, depending on the picture just obtained. This is a new situation with respect to the one-dimensional case. Such considerations show the necessity of a more complex definition of regular expressions in order to denote a wider class of two-dimensional languages in REC. We introduce the definitions of some advanced stars. They allow us to capture a wider class of languages that still remains inside the class REC. All definitions are given in such a way as to synchronize the steps of iteration on a picture with the picture just constructed. Observe that in this case we exploit the partial nature of the column and row concatenation operations. We conclude by discussing some ideas for extending all these definitions to the general alphabet case. The paper is organized as follows. In Section 2 we recall some preliminary definitions and results used later in the paper. Section 3 contains the main results: it presents our proposals for possible classes of regular expressions. Moreover, it contains a table summarizing a wide collection of examples. In Section 4 we define new star operations that allow us to describe many more languages (over a one-letter alphabet) in REC, while in Section 5 we draw some
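Since pictures over a one-letter alphabet are determined by their sizes, the diagonal concatenation can be sketched as an operation on pairs (rows, columns). The following sketch (plain Python, all names ours) shows how iterating it from a single cell captures the language of squares, a relationship between the two dimensions that row and column concatenations alone cannot express:

```python
# One-letter pictures are identified with their size (rows, cols); the
# diagonal concatenation combines both dimensions at once, forcing a
# dependence between them -- and it is a total operation.
def diag(p, q):
    return (p[0] + q[0], p[1] + q[1])

def diag_star(p, bound):
    """Sizes reachable by iterating diagonal concatenation of p, up to bound."""
    out, cur = {(0, 0)}, (0, 0)
    while cur[0] <= bound and cur[1] <= bound:
        cur = diag(cur, p)
        out.add(cur)
    return out

# The "squares" language {(n, n)}: the diagonal star of a single cell.
squares = diag_star((1, 1), 5)
assert (3, 3) in squares and (2, 3) not in squares
```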
M. Anselmo et al. / Theoretical Computer Science 340 (2005) 408 – 431
411
conclusions and proposals for extending the results of this paper to two-dimensional languages over general alphabets. A preliminary version of this paper appeared in [1].
2. Preliminaries In this section we recall terminology for two-dimensional languages. Then, we briefly describe some machine-based models for recognizing two-dimensional languages and summarize the major results concerning the class REC of recognizable two-dimensional languages, that is, the class that seems to generalize best the family of regular string languages to two dimensions. We assume the reader is familiar with the basic terminology and properties of the theory of one-dimensional languages, as can be found for example in [8]. We will first introduce some definitions about two-dimensional languages by borrowing and extending notation from the theory of one-dimensional languages. Next, we will give formal definitions of the classical concatenation operations between two-dimensional strings (pictures) and two-dimensional languages. The notations used can mainly be found in [7]. Let Σ be a finite alphabet. A two-dimensional string (or a picture) over Σ is a two-dimensional rectangular array of elements of Σ. The set of all two-dimensional strings over Σ is denoted by Σ∗∗. A two-dimensional language over Σ is a subset of Σ∗∗. Given a picture p ∈ Σ∗∗, let ℓ1(p) denote the number of rows of p and ℓ2(p) denote the number of columns of p. The pair (ℓ1(p), ℓ2(p)) is called the size of the picture p. Unlike the one-dimensional case, we can define an infinite number of empty pictures, namely all the pictures of size (n, 0) and of size (0, m), for all m, n ≥ 0, which we call empty columns and empty rows, and denote by λn,0 and λ0,m respectively. The empty picture is the only picture of size (0, 0) and it will be denoted by λ0,0. We indicate by Λcol and Λrow the language of all empty columns and the language of all empty rows, respectively. We first give some simple examples of two-dimensional languages. Example 1. Let Σ = {a} be a one-letter alphabet. The set of pictures of a's with three columns is a two-dimensional language over Σ. It can be formally described as L = {p | ℓ2(p) = 3} ⊆ Σ∗∗.
As another example, let L be the subset of Σ∗∗ that contains all the pictures with the shape of a "square". More formally, L = {p | ℓ1(p) = ℓ2(p)} ⊆ Σ∗∗. We now recall the classical concatenation operations between pictures and picture languages. Let p and q be two pictures over an alphabet Σ, of size (n, m) and (n′, m′), with m, n, m′, n′ ≥ 0, respectively. Definition 2. The column concatenation of p and q (denoted by p ⦶ q) and the row concatenation of p and q (denoted by p ⊖ q) are partial operations, defined only if n = n′ and if m = m′, respectively: p ⦶ q is the picture obtained by juxtaposing q to the right of p, and p ⊖ q is the picture obtained by juxtaposing q below p.
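As an illustration of Definition 2, here is a minimal sketch (ours, not the authors'; `make_picture`, `col_concat` and `row_concat` are illustrative names): pictures over the one-letter alphabet are modelled as lists of equal-length rows, and each operation raises an error when its side condition fails.

```python
# Hedged sketch: a picture over {a} as a list of equal-length rows.
# Empty pictures are degenerate in this model (a 0-row picture is []),
# so the examples below stick to non-empty pictures.

def make_picture(rows, cols):
    """Picture of a's of size (rows, cols)."""
    return [['a'] * cols for _ in range(rows)]

def col_concat(p, q):
    """p (column-concatenated with) q: glue q to the right of p.
    Partial: defined only when the row counts agree."""
    if len(p) != len(q):
        raise ValueError("column concatenation needs equal row counts")
    return [rp + rq for rp, rq in zip(p, q)]

def row_concat(p, q):
    """p (row-concatenated with) q: glue q below p.
    Partial: defined only when the column counts agree."""
    if p and q and len(p[0]) != len(q[0]):
        raise ValueError("row concatenation needs equal column counts")
    return p + q
```

For instance, a (2, 3) picture column-concatenated with a (2, 1) picture gives a (2, 4) picture, while concatenating pictures with different row counts is undefined.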
Moreover, we set p ⦶ λn,0 = p and p ⊖ λ0,m = p, that is, the empty columns and the empty rows are the neutral elements for the column and the row concatenation operations, respectively. As in string language theory, these definitions of picture concatenation can be extended to concatenations between sets of pictures. Let L1, L2 ⊆ Σ∗∗; the column concatenation and the row concatenation of L1 and L2 are defined respectively by L1 ⦶ L2 = {p ⦶ q | p ∈ L1, q ∈ L2} and L1 ⊖ L2 = {p ⊖ q | p ∈ L1, q ∈ L2}.
By iterating the concatenation operations, we can define the column and row transitive closures, which are somehow "two-dimensional Kleene stars". Let L be a picture language. Definition 3. The column closure (column star) and the row closure (row star) of L are defined as L∗⦶ = ⋃_{i≥0} L^{i⦶} and L∗⊖ = ⋃_{i≥0} L^{i⊖}, where L^{0⦶} = Λcol, L^{1⦶} = L, L^{n⦶} = L ⦶ L^{(n−1)⦶} and L^{0⊖} = Λrow, L^{1⊖} = L, L^{n⊖} = L ⊖ L^{(n−1)⊖}. 2.1. Automata for two-dimensional languages In this section we briefly review different kinds of automata that read two-dimensional tapes. All models reduce to conventional automata when restricted to operate on one-row pictures. One of the first attempts at formalizing the concept of "recognizable picture language" was made by M. Blum and C. Hewitt, who in 1967 introduced a model of finite automaton that reads a two-dimensional tape (cf. [3]). A deterministic (non-deterministic) four-way automaton, denoted by 4DFA (4NFA), is defined as an extension of the two-way automaton that recognizes strings (cf. [8]) by allowing it to move in four directions: Left, Right, Up, Down. For example, a 4DFA can recognize squares by starting its computation in the top-left corner of a given picture and moving alternately one step right and one step down (i.e. following the diagonal) until it reaches the bottom-right corner. The families of picture languages recognized by 4DFA and 4NFA are denoted by L(4DFA) and L(4NFA), respectively. An important result (cf. [3]) states that, unlike in the one-dimensional case, the family L(4DFA) is strictly included in the family L(4NFA). Both families L(4DFA) and L(4NFA) are closed under Boolean union and intersection operations. The family L(4DFA) is also closed under complement, while for L(4NFA) this is not known. From several points of view, four-way automata may appear to be a reasonable model of computation for two-dimensional tapes, and they have been widely studied, but they have a major drawback. In fact, it can be proved that neither L(4DFA) nor L(4NFA) is closed under the row and column concatenation and closure operations [12]. In [14], a weaker model called the three-way automaton is also considered, in both a non-deterministic and a deterministic version (referred to as 3NFA and 3DFA, respectively), which is
allowed to move right, left and down only. The family L(3NFA) is strictly included in L(4NFA). Another interesting model of two-dimensional automaton is the two-dimensional on-line tessellation acceptor (denoted by 2-OTA), introduced in [9]. In a sense, the 2-OTA is an infinite array of identical finite-state automata in a two-dimensional space. The computation proceeds by diagonals, starting from the top-left corner of the picture and moving towards the bottom-right one. Depending on the kind of underlying automata, we have a deterministic or a non-deterministic version of the 2-OTA. Despite the fact that this model is quite different in principle from the four-way automaton, also in this case the family of languages corresponding to deterministic 2-OTA is strictly included in the one corresponding to the non-deterministic model. In [9] it is proved that the family of two-dimensional languages recognized by a 2-OTA, L(2-OTA), is closed under union and intersection, and also under row/column concatenation and row/column star, while it is not closed under complement. Moreover, L(2-OTA) properly includes the family L(4NFA). The only trouble with the 2-OTA model is that it is quite difficult to manage. 2.2. Tiling systems and the class REC A different way to define (recognize) picture languages was introduced by A. Restivo and D. Giammarresi in [6]. It generalizes to two dimensions the characterization of regular languages by means of local string languages and alphabetic mappings (the local set together with the mapping is an alternative description of the state graph of an automaton). We recall that a local language of strings is defined by means of a finite set of strings of length 2. As a natural generalization, a local picture language L over an alphabet Γ is defined by means of a finite set of pictures of size (2, 2) (called tiles) that represent all allowed sub-pictures of the pictures in L.
To be more precise, this set is defined over Γ ∪ {#}, where # is a border symbol that we assume always surrounds the pictures. A tiling system for a language L over Σ is a pair consisting of a local language over an alphabet Γ and an alphabetic mapping π : Γ → Σ. The mapping π can be extended in the obvious way from the alphabet Γ to pictures over Γ and to picture languages over Γ. Then, we say that a language L ⊆ Σ∗∗ is recognizable by tiling systems if there exist a local language L′ over Γ and a mapping π : Γ → Σ such that L = π(L′). The family of two-dimensional languages recognizable by tiling systems is denoted by REC. As an example, consider again the language L of squares over the one-letter alphabet Σ = {a}. Then L is in REC, since it can be obtained as π(L′), where L′ is the language of squares over Γ = {0, 1} that have 1 in the diagonal positions and 0 elsewhere, and π(0) = π(1) = a. The family REC is closed under Boolean union and intersection but not under complement. It is also closed under all row and column concatenations and stars. Moreover, by definition, it is closed under alphabetic mappings. This notion of recognizability by tiling systems turns out to be very robust: in [11], it is proved that REC = L(2-OTA). Moreover, finite tiling systems also have a natural logical meaning: in [7] it is shown that the family REC and the family of languages defined by existential monadic second-order formulas coincide. This is actually the generalization to two-dimensional languages of Büchi's theorem for strings. The class REC can also be characterized in terms of regular expressions, as specified in Section 2.3.
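The squares example can be checked mechanically. The following sketch (ours, not the authors'; `bordered`, `tiles`, `diag_square` and `project` are illustrative names) builds the local pictures over {0, 1} with 1 on the diagonal, extracts the 2×2 tiles of the #-bordered picture, and applies the projection π(0) = π(1) = a; the point is that the tile set stabilizes for large enough squares, so finitely many tiles describe the whole local language.

```python
# Hedged sketch of the tiling-system example: diagonal squares over
# Gamma = {0, 1} projected onto the one-letter alphabet {a}.

def bordered(p):
    """Surround a non-empty picture p (list of rows) with '#' borders."""
    w = len(p[0])
    top = [['#'] * (w + 2)]
    return top + [['#'] + row + ['#'] for row in p] + [['#'] * (w + 2)]

def tiles(p):
    """The set of all 2x2 sub-pictures of the bordered picture."""
    b = bordered(p)
    return {(b[i][j], b[i][j + 1], b[i + 1][j], b[i + 1][j + 1])
            for i in range(len(b) - 1) for j in range(len(b[0]) - 1)}

def diag_square(n):
    """n x n local picture with 1 on the diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def project(p):
    """Alphabetic mapping pi sending both local symbols to 'a'."""
    return [['a' for _ in row] for row in p]
```

On this family the tiles of the 3×3 square are a strict subset of those of the 4×4 one (the all-zero tile first appears at size 4), and from size 4 on the tile set no longer grows.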
2.3. Regular expressions and the class REC The characterizations of the family REC show that it captures, in some sense, the unification of the concept of recognizability from the two different points of view of descriptive and computational models, which is one of the main properties of the class of recognizable string languages. It thus seems natural to ask whether one can also prove a sort of two-dimensional Kleene theorem. Using the row and column concatenation and closure operations, it is possible to express two-dimensional languages by means of simpler languages. Nevertheless, it can be observed that these classical operations are useful to express some regularity either on the number of rows or on the number of columns, but they cannot describe any relationship between the two dimensions of the pictures. As an example, already in the case of a one-letter alphabet, languages such as the language of "squares" (see Example 1) cannot be described using only the classical operations. More precisely, O. Matz [15] has characterized the class of languages that can be obtained starting from finite languages and applying Boolean operations, column and row concatenations and stars, as the class of languages that are a finite union of Cartesian products of ultimately periodic string languages. Furthermore, it can be shown (cf. [7]) that to describe the whole class REC we also need to allow alphabetic mappings between the regular operations. This characterization of REC in terms of regular expressions seems not completely satisfactory, because it is not purely constructive and involves some external operations. Therefore the problem of proving a sort of two-dimensional Kleene theorem is still under investigation.
Furthermore, such considerations are a clear sign that, going from one to two dimensions, we find a very rich family of languages that requires a non-straightforward generalization of the one-dimensional definitions and techniques. In the next section we define a new operation on picture languages and consider the class of languages that can thereby be denoted.
3. The diagonal concatenation and related regular expressions In this section we introduce a new operation on picture languages over a one-letter alphabet. We propose several types of regular expressions involving the new operation, comparing the resulting classes of languages with known families of picture languages. Throughout this section, we assume that all languages are over the one-letter alphabet Σ = {a}. Remark 4. When a one-letter alphabet is considered, any picture p ∈ Σ∗∗ is characterized by its size alone. Therefore it can be equivalently represented either by a pair of words in Σ∗, where the first is equal to the first column of p and the second to the first row of p, i.e. (a^ℓ1(p), a^ℓ2(p)), or more simply by its size, i.e. (ℓ1(p), ℓ2(p)). Remark 5. Considering the one-letter alphabet case amounts to considering the "shapes" of pictures. Indeed, if L ⊆ Σ∗∗, with |Σ| ≥ 2, is in REC, then the language obtained by mapping Σ onto a one-letter alphabet {a} is still in REC, since REC is closed under alphabetic mappings.
Therefore, for a language to be in REC, it is a necessary condition that the language of its shapes be in REC. Let us denote by CR = {∪, ⦶, ⊖, ∗⦶, ∗⊖} the set of classical operations on picture languages (C for "columns" and R for "rows"), and by L(CR) the class of languages (over a one-letter alphabet) that can be denoted by a regular expression involving only operations in CR and starting from finite languages. O. Matz [15] has characterized L(CR) as the class of languages that are a finite union of Cartesian products of ultimately periodic string languages, and he has shown that L(CR) is closed under intersection. 3.1. D-regular expressions We introduce a new simple definition of concatenation of two pictures in the particular case of a one-letter alphabet. The definition is motivated by the need for an operation between pictures that can express a relationship between the dimensions of the pictures. We use this new concatenation to construct regular expressions and to define a class of languages. This class is characterized in terms of the relations between the dimensions of the pictures and in terms of the four-way automata recognizing them. Let p and q be two pictures of size (n, m) and (n′, m′), respectively, over a one-letter alphabet. Definition 6. The diagonal concatenation of p and q (denoted by p ⦸ q) is the picture over Σ of size (n + n′, m + m′), obtained by placing q diagonally below and to the right of p. Observe that, unlike the classical row and column concatenations, the diagonal concatenation is a total operation. As usual, it can be extended to the diagonal concatenation between languages. Moreover, the Kleene closure of ⦸ can be defined as follows. Let L be a picture language over a one-letter alphabet. Definition 7. The diagonal closure (or diagonal star) of L, denoted by L∗⦸, is defined as L∗⦸ = ⋃_{i≥0} L^{i⦸}, where L^{0⦸} = {λ0,0}, L^{1⦸} = L, and L^{n⦸} = L ⦸ L^{(n−1)⦸}.
Example 8. Let Ln,n be the language of squares (see Example 1), that is, Ln,n = {p | ℓ1(p) = ℓ2(p) ≥ 0}. It can easily be shown that Ln,n = {(1, 1)}∗⦸ = {λ0,1 ⦸ λ1,0}∗⦸, observing that λ0,1 ⦸ λ1,0 is the picture (1, 1). Example 9. Let L2n,2m be the language of rectangular pictures with even dimensions, that is, L2n,2m = {p | ℓ1(p) = 2n, ℓ2(p) = 2m, n, m ≥ 0}. We have L2n,2m = {{(2, 2)}∗⦶}∗⊖, and also L2n,2m = {λ0,2}∗⦸ ⦸ {λ2,0}∗⦸, using the diagonal concatenation.
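Using the identification of a one-letter picture with its size (Remark 4), the diagonal concatenation and its closure can be sketched as follows (our code, illustrative only; `diag_star` enumerates the closure up to a size bound to keep the sets finite):

```python
# Hedged sketch: a one-letter picture is identified with its size
# (rows, cols), so diagonal concatenation is componentwise addition
# of sizes and is a total operation.

def diag_concat(p, q):
    """(n, m) diagonally concatenated with (n', m') is (n+n', m+m')."""
    return (p[0] + q[0], p[1] + q[1])

def diag_star(language, bound):
    """All iterated diagonal concatenations of pictures of `language`
    whose dimensions do not exceed `bound` (starting from (0, 0))."""
    closure, frontier = {(0, 0)}, {(0, 0)}
    while frontier:
        new = {diag_concat(p, q) for p in frontier for q in language}
        frontier = {p for p in new
                    if p not in closure and max(p) <= bound}
        closure |= frontier
    return closure
```

With this sketch, the diagonal star of {(1, 1)} yields exactly the squares (Example 8), and the diagonal concatenation of the diagonal stars of {λ2,0} and {λ0,2} yields the even-sided rectangles (Example 9).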
Proposition 10. The family REC is closed under diagonal concatenation and diagonal star. Proof. The proof uses techniques similar to those for the closure of REC under row (or column) concatenation and star (see [6] for more details). A tiling system for L = L1 ⦸ L2 can be defined as follows. Let the local languages for L1 and L2 be given by a set of tiles Θ1 over an alphabet Γ1 and a set of tiles Θ2 over an alphabet Γ2, respectively. Moreover, we can always assume that Γ1 and Γ2 are disjoint. Then the local language for L is defined over the alphabet Γ = Γ1 ∪ Γ2 ∪ {x}, where x is a new symbol not in Γ1 ∪ Γ2. The set of tiles Θ is defined, using Θ1 and Θ2, in such a way as to represent pictures of the form
p s′
s q
where p and q belong to the local languages of L1 and L2, respectively, and s, s′ are pictures filled with the symbol x. For example, the non-border tiles of Θ consist of all non-border tiles in Θ1 and Θ2, plus the tile containing only x, plus the tiles obtained by replacing by x all border symbols in the right-border and bottom-border tiles of Θ1 and in the left-border and top-border tiles of Θ2, plus tiles of the form
a x
x b
where a and b are symbols in bottom-right corner tiles of Θ1 and top-left corner tiles of Θ2, respectively. Observe that these last tiles are the ones that "glue" bottom-right corners of pictures in L1 to top-left corners of pictures in L2. Finally, the projection from Γ to Σ maps all symbols to the unique symbol of Σ. Regarding the closure under diagonal star, the tiling system for L∗⦸ can be defined as above, using two different local languages (i.e. over disjoint local alphabets) for L. The diagonal concatenation can be used to generate families of picture languages, starting from atomic languages. Formally, let us denote D = {∪, ⦸, ∗⦸}; the elements of D are called diagonal-regular operations, briefly D-regular operations. Definition 11. A diagonal-regular expression (D-RE) is defined recursively as follows: (1) ∅, (λ0,0), (λ0,1), (λ1,0) are D-RE. (2) if α, β are D-RE then (α) ∪ (β), (α) ⦸ (β), (α)∗⦸ are D-RE. Every D-RE denotes a language, using the standard notation. Languages denoted by D-RE are called diagonal-regular languages, briefly D-regular languages. The class of D-regular languages is denoted by L(D). Observe that a language containing a single picture (n, m) can be denoted by the D-RE En,m = (λ1,0^n⦸) ⦸ (λ0,1^m⦸). We will now characterize D-regular languages in terms of rational relations and in terms of certain 4FA. For this, let us recall (see [2]) that a rational relation over alphabets Σ and Δ is a rational subset of the monoid (Σ∗ × Δ∗, ·, (ε, ε)), where the operation · is the componentwise product defined by (u1, v1) · (u2, v2) = (u1u2, v1v2) for any (u1, v1), (u2, v2) ∈ Σ∗ × Δ∗. When Σ = Δ = {a}, there is a natural correspondence between pictures over Σ and relations over Σ × Σ. For any relation T ⊆ Σ∗ × Σ∗ we define the picture language L(T) = {p | ℓ1(p) = |r1| and ℓ2(p) = |r2| for some (r1, r2) ∈ T}.
Vice versa, for any picture language L ⊆ Σ∗∗ we define the relation R(L) = {(r1, r2) ∈ Σ∗ × Σ∗ | |r1| = ℓ1(p) and |r2| = ℓ2(p) for some p ∈ L}. Remark 12. We recall that a 4NFA M over a one-letter alphabet is equivalent to a two-way two-tape automaton M1 (cf. [10]). In fact, let H1 and H2 be the first and the second heads of M1, respectively; then M1 simulates M as follows. If the input head H of M moves down (up) one square, M1 moves H1 right (left) one square without moving H2, and if H moves right (left) one square, M1 moves H2 right (left) without moving H1. Proposition 13. Let Σ be a one-letter alphabet and let L ⊆ Σ∗∗. Then L is a D-regular language if and only if L = L(T) for some rational relation T ⊆ Σ∗ × Σ∗, if and only if L = L(A) for some 4NFA A that moves only right and down. Proof. In light of Remark 4, the componentwise concatenation in M = Σ∗ × Σ∗ corresponds exactly to the diagonal concatenation in Σ∗∗. It is well known that a rational subset of any monoid M is either empty or can be expressed, starting from singletons, by a finite number of applications of the (rational) operations ∪, · (product) and ·-closure (star). Thus L is a D-regular language if and only if L = L(T) for some rational relation T ⊆ Σ∗ × Σ∗. On the other hand, it is well known that T ⊆ Σ∗ × Σ∗ is a rational relation iff it is accepted by a (finite) transducer, that is, a (finite) automaton over Σ∗ × Σ∗. Further, such an automaton can be viewed as a (finite) one-way automaton with two tapes (cf. [17]). Then, in analogy with Remark 12, one-way two-tape automata are equivalent to 4NFA that move only right and down. Example 14. Let Ln,n be the language of squares, as in Example 8. We have Ln,n ∈ L(D). Indeed, it can easily be shown that Ln,n is denoted by the following D-RE: En,n = (λ0,1 ⦸ λ1,0)∗⦸. We have Ln,n = L(T), where T is the rational relation T = {(a^n, a^n) | n ≥ 0}.
Further, Ln,n = L(A), where A is the 4NFA that, starting in the top-left corner, moves along the main diagonal until it eventually reaches the bottom-right corner and accepts. More generally, the languages Ln,n+i = {p | ℓ1(p) = n, ℓ2(p) = n + i, n ≥ 1}, for some i ≥ 0, are denoted by the expressions En,n+i = En,n ⦶ ((E1,i)∗⊖), where E1,i = (λ0,1^i⦸) ⦸ λ1,0 denotes the language {(1, i)}. Example 15. Let L2n,2m be the language of even-sided pictures, as in Example 9, that is, L2n,2m = {p ∈ Σ∗∗ | ℓ1(p) = 2n, ℓ2(p) = 2m, n, m ≥ 0}. We have L2n,2m = L(T), where T is the rational relation T = {(a^2n, a^2m) | n, m ≥ 0}. Further, L2n,2m = L(A), where A is the 4NFA that, starting in the top-left corner, moves down checking the parity of the number of rows and then to the right checking the parity of the number of columns, eventually accepting in the bottom-right corner. Therefore L2n,2m ∈ L(D). Indeed, L2n,2m is also in L(CR), by the characterization of L(CR) given in [15]. Corollary 16. L(CR) ⊂ L(D). Proof. Following [15], L(CR) is the class of languages that are a finite union of Cartesian products of ultimately periodic string languages. Let L ∈ L(CR) and suppose
L = ⋃_{i=1,…,k} Ai × Bi. Then L can be recognized by a 4NFA (moving only right and down) that non-deterministically checks whether a picture belongs to some Ai × Bi, by first checking that the first row belongs to Ai and then that the last column belongs to Bi. Hence L ∈ L(D) by Proposition 13. Moreover, the inclusion is strict since, for example, the language of squares is in L(D) (see Example 14) but is not in L(CR), since it is not a finite union of Cartesian products of ultimately periodic string languages. In the same way that L(D) corresponds to the class of rational relations, L(CR) corresponds to its subclass of recognizable relations. Corollary 17. L(D) is closed under intersection and complement. Proof. The result follows from the characterization of L(D) in terms of rational relations in Proposition 13, and from the closure under intersection and complement of rational relations over a one-letter alphabet [2]. The following example shows that, also in the case of a one-letter alphabet, four-way automata that move only right and down are strictly less powerful than 3DFA. Example 18. Let L be the following picture language over a one-letter alphabet: L = {(kn, n) | k, n ≥ 0}. Language L can easily be recognized by a 3DFA that, starting in the top-left corner, moves along the main diagonal until it reaches the right boundary, then moves along the secondary diagonal until it reaches the left boundary, and so on, until it eventually reaches the bottom-right corner and accepts. By Proposition 13, language L cannot be recognized by a four-way automaton that moves only right and down, since {(a^kn, a^n) | k, n ≥ 0} is not a rational relation (see [4]). 3.2. CRD-regular expressions In this section we consider regular expressions that involve the column, row and diagonal concatenations and stars defined in the previous sections. We refer to them as CRD-regular expressions.
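The zigzag walk of the 3DFA of Example 18 can be simulated on sizes (a hedged sketch of ours, not the paper's construction; to make border visits happen exactly every n rows, the bounce is implemented as one straight-down step at each vertical border before the sweep direction reverses):

```python
# Hedged sketch: zigzag 3DFA-style walk on a one-letter picture given
# by its size; accepts iff the number of rows is a multiple of the
# number of columns, i.e. iff the picture is in {(k*n, n)}.

def accepts_row_multiple(rows, cols):
    if rows == 0:
        return True                  # (0, m) = (k*m, m) with k = 0
    if cols == 0:
        return False                 # no picture (n, 0), n > 0, is in L
    j, d = 1, +1                     # column of the head, sweep direction
    for _ in range(rows - 1):        # one step per remaining row
        if (j == cols and d == +1) or (j == 1 and d == -1):
            d = -d                   # turn: one step straight down
        else:
            j += d                   # diagonal step
    # accept iff the last row completes a full sweep at a border
    return (j == cols and d == +1) or (j == 1 and d == -1)
```

Each full sweep (including the turn step) consumes exactly `cols` rows, so the walk ends on a border precisely when `rows` is a multiple of `cols`.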
We show that, in the case of a one-letter alphabet, the class L(CRD) of corresponding languages is strictly included in the family REC and strictly contains L(3DFA). Further, we show that there are languages accepted by a four-way automaton that do not belong to L(CRD). The main result is a necessary condition for languages in L(CRD) that expresses a sort of "regularity" in the possible "extensions" of a picture (pictures containing the given one as a subpicture) inside the language. Let us denote CRD = {∪, ⦶, ⊖, ⦸, ∗⦶, ∗⊖, ∗⦸}, where C, R, D stand for "column", "row" and "diagonal". The elements of CRD are called CRD-regular operations. Definition 19. A CRD-regular expression (CRD-RE) is defined recursively as follows: (1) ∅, (λ0,0), (λ0,1), (λ1,0) are CRD-RE. (2) if α, β are CRD-RE then (α) ∪ (β), (α) ⦶ (β), (α)∗⦶, (α) ⊖ (β), (α)∗⊖, (α) ⦸ (β), (α)∗⦸ are CRD-RE.
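To experiment with such expressions, here is a small bounded evaluator (ours, illustrative only; the names `cc`, `rc`, `dc` and `star` are not from the paper): languages are finite sets of sizes, the three concatenations follow Definitions 2 and 6, and the stars are truncated at a size bound so that the sets stay finite.

```python
# Hedged sketch: bounded evaluation of CRD-regular operations on
# one-letter languages, with a picture identified with its size.

def cc(L1, L2):
    """Column concatenation: defined only for equal row counts."""
    return {(n, m1 + m2) for (n, m1) in L1 for (n2, m2) in L2 if n == n2}

def rc(L1, L2):
    """Row concatenation: defined only for equal column counts."""
    return {(n1 + n2, m) for (n1, m) in L1 for (n2, m2) in L2 if m == m2}

def dc(L1, L2):
    """Diagonal concatenation: total, adds sizes componentwise."""
    return {(n1 + n2, m1 + m2) for (n1, m1) in L1 for (n2, m2) in L2}

def star(op, L, unit, bound):
    """Iterated op-concatenation of L starting from `unit`,
    truncated at dimension `bound`."""
    closure, frontier = set(unit), set(unit)
    while frontier:
        new = op(frontier, L)
        frontier = {p for p in new if p not in closure and max(p) <= bound}
        closure |= frontier
    return closure

# Example 18's language {(k*n, n)} as the row star of the squares:
B = 12
squares = star(dc, {(1, 1)}, {(0, 0)}, B)     # diagonal star of {(1, 1)}
row_unit = {(0, m) for m in range(B + 1)}     # empty rows, truncated
multiples = star(rc, squares, row_unit, B)    # {(k*n, n)}, truncated
```

Within the bound, `multiples` contains exactly the sizes whose row count is a multiple of the column count, together with the empty rows that form the unit of the row star.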
Every CRD-RE denotes a language, using the standard notation. Languages denoted by CRD-RE are called CRD-regular languages. The family of CRD-regular two-dimensional languages (over one-letter alphabets) will be denoted by L(CRD). Observe that L(CRD) is contained in REC, since REC is closed under all operations in CRD. A more precise positioning of L(CRD) inside REC is established in Proposition 30 below. Example 20. Let L = {(n, k1(n + 1) + k2(n + 2) + k3(n + 3)) | n, k1, k2, k3 ≥ 0}. Consider the languages Ln,n+i denoted by the expressions En,n+i = En,n ⦶ ((E1,i)∗⊖), as in Example 14. Language L ∈ L(CRD), since it can be denoted by the following CRD-RE: E = (En,n+1)∗⦶ ⦶ (En,n+2)∗⦶ ⦶ (En,n+3)∗⦶. Example 21. Let L = {(hn, hkn + n) | n, h, k ≥ 0}. Language L belongs to L(CRD). Indeed, L = L1 ⦶ L2, where L1 = {(n, kn) | n, k ≥ 0} and L2 = {(hm, m) | m, h ≥ 0}. If En,n is a D-RE for the language of squares (see Example 14), a CRD-RE for L is E = ((En,n)∗⦶) ⦶ ((En,n)∗⊖). We now present some "regularity" conditions necessarily satisfied by CRD-regular languages. They generalize, in some sense, what regularity means for one-dimensional languages as far as the possible extensions of a picture inside a regular language are concerned. Indeed, it is well known that a string language over a one-letter alphabet Σ = {a} is regular if and only if it is ultimately periodic. In particular, if L ⊆ {a}∗ is a regular language and a^n ∈ L is a sufficiently long string, then there exists a string a^m such that a^n(a^m)∗ ⊆ L. We show that a generalization of this necessary condition holds for two-dimensional languages in L(CRD): if a picture is sufficiently "long" then we can concatenate some picture to it infinitely often by columns; if a picture is sufficiently "high" then we can concatenate some picture to it infinitely often by rows; if a picture is sufficiently "big" then we can concatenate some picture to it infinitely often in diagonal. Let Σ = {a} and L ⊆ Σ∗∗.
Let us define, for any n, m ≥ 0, the following string languages: Cn = {a^m | (n, m) ∈ L} and Rm = {a^n | (n, m) ∈ L}. Proposition 22. Let L ⊆ {a}∗∗ and L ∈ REC. Then, for any n, m ≥ 0, Cn and Rm are regular languages. Proof. For any alphabet Σ and fixed n, the fixed-height-n word language of L ⊆ Σ∗∗ is the language L(n), over the alphabet Σ^(n,1) of one-column pictures of height n, of all the strings of columns of height n that compose pictures in L. In [16], it is shown that if L is in REC, then L(n) is regular, for any alphabet Σ and any integer n. In the special case of a one-letter alphabet, we can identify any column in {a}^(n,1) with a, and we have that L(n) is regular iff Cn is regular. An analogous reasoning implies the regularity of Rm. The proof of the following proposition is only sketched here; a more complete proof is given in Appendix A. Proposition 23. Let L be a CRD-regular language. Then there exist increasing functions α, β, γ, δ : N → N and μ, ν : N × N → N, and n̄, m̄ ∈ N, such
that for any p = (n, m) ∈ L we have: (1) if m > α(n) then p ⦶ q∗⦶ ⊆ L for q = (n, γ(n)) with γ(n) ≠ 0; (2) if n > β(m) then p ⊖ q∗⊖ ⊆ L for q = (δ(m), m) with δ(m) ≠ 0; (3) if n ≥ n̄ and m ≥ m̄ then p ⦸ q∗⦸ ⊆ L for some q = (nq, mq) with nq, mq ≠ 0, nq ≤ μ(n, m), mq ≤ ν(n, m). Proof (Sketch). First we show how to choose the functions α, β, γ, δ. From Proposition 22, we know that the sets Cn = {a^m | (n, m) ∈ L} and Rm = {a^n | (n, m) ∈ L} are regular and therefore ultimately periodic. So we can define α (β, resp.) in relation to the stem of the minimal automaton of Cn (Rm, resp.), and γ (δ, resp.) in relation to the period of Cn (Rm, resp.), in such a way that these functions are increasing. Now we sketch how to choose n̄, m̄, μ and ν for a CRD-regular language L, by induction on the number of operators in a CRD-regular expression r that denotes L. For the basis, if L = ∅ then the proposition is vacuously true. If L = {λ0,0}, or L = {λ0,1}, or L = {λ1,0}, then we can choose n̄, m̄, μ and ν in such a way that the proposition is always vacuously true, taking care to define γ(n) = 1, δ(m) = 1, μ(n, m) = ν(n, m) = 1, so that q ≠ λ0,0. Suppose now that r contains at least one operator. There are seven different cases, depending on the form of r: r = r1 ∪ r2, r = r1 ⦶ r2, r = r1 ⊖ r2, r = r1 ⦸ r2, r = r1∗⦶, r = r1∗⊖, or r = r1∗⦸. In each of the seven cases, r1 and r2 denote languages L1 and L2, respectively, that satisfy the conditions. Let αi, βi, γi, δi, μi, νi, n̄i, m̄i be the functions and the values for Li, with i = 1, 2. The values n̄ and m̄ for L are chosen in such a way that a "big" picture (i.e. p = (n, m) with n ≥ n̄ and m ≥ m̄) in L always decomposes into pictures in L1 and in L2 that are either "big", or "long" (i.e. m > αi(n), for i = 1, 2), or "high" (i.e. n > βi(m), i = 1, 2). Therefore n̄ may depend on n̄1, n̄2, but also on the other functions of L1 and L2.
As an example, when L = L1 ⦸ L2, any "big" p = p1 ⦸ p2, where p1 ∈ L1, p2 ∈ L2, can be such that p1 and p2 are either both "big", or one of fixed size and the other one "big", or one "high" and the other one "long". The functions μ, ν ensure a bound on the size of a picture q that can be diagonally concatenated infinitely many times to a big p. Such a picture q is constructed from some corresponding pictures q1 for p1 and q2 for p2. The main problem is due to the partiality of the column and row concatenations, which requires that q1 and q2 have the same number of rows or columns. This problem is solved by concatenating q1 and q2 with themselves as many times as necessary. For example, q1^k1⊖ and q2^k2⊖ have the same number of rows if we choose k1 = ℓ1(q2) and k2 = ℓ1(q1) (a more refined version could consider a least common multiple). Special care is needed to handle also the case where p is an empty column or an empty row. The regularity conditions in Proposition 23 are stated in such a way that a finite number of pictures that could "disturb" this regularity are set aside, by properly defining the bounds on the size (namely n̄, m̄, α, β). Observe that such "small" pictures may indeed have an infinite number of extensions in some direction (horizontal, vertical, diagonal). This situation is illustrated in Example 24 below.
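The repetition trick in the proof sketch can be made concrete (a sketch under our naming; sizes are (rows, columns) pairs with nonzero row counts): repeating q1 ℓ1(q2) times and q2 ℓ1(q1) times by rows equalizes the row counts, and taking a least common multiple gives the smaller refinement mentioned above.

```python
from math import lcm

def equalize_rows(q1, q2):
    """Given sizes q1 = (r1, c1) and q2 = (r2, c2) with r1, r2 > 0,
    return repetition counts (k1, k2) such that the row-concatenation
    powers q1^(k1) and q2^(k2) have the same number of rows,
    using the least-common-multiple refinement."""
    r = lcm(q1[0], q2[0])
    return r // q1[0], r // q2[0]
```

For instance, for q1 of 4 rows and q2 of 6 rows, the naive choice repeats them 6 and 4 times (24 rows each), while the lcm refinement needs only 3 and 2 repetitions (12 rows each).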
Example 24. For a language L1 = {(n0, m0)} consisting of a single picture, the functions α1, β1, γ1, δ1, μ1, ν1 and the integers n̄1, m̄1 of Proposition 23 can be chosen as α1(n) = m0, β1(m) = n0, γ1(n) = δ1(m) = μ1(n, m) = ν1(n, m) = 1, n̄1 = n0 + 1, and m̄1 = m0 + 1. Indeed, conditions (1), (2) and (3) of the proposition are then vacuously true. Consider now the language of squares L2 = Ln,n = {p | ℓ1(p) = ℓ2(p) ≥ 0}. The functions α2, β2, γ2, δ2, μ2, ν2 and the integers n̄2, m̄2 of Proposition 23 for L2 can be chosen as follows: α2(n) = β2(n) = n, γ2(n) = δ2(n) = 1, n̄2 = m̄2 = 0, and μ2(n, m) = ν2(n, m) = 1. Finally, consider L = L1 ∪ L2 = {(n0, m0)} ∪ Ln,n and suppose (n0, m0) ∉ L2. According to Proposition 23 (case 1), the functions α, β, γ, δ, μ, ν and the integers n̄, m̄ are the following: α(n) = max{n, m0}; γ(n) = 1; β(m) = max{n0, m}; δ(m) = 1; n̄ = max{n̄1, n̄2} = n0 + 1; m̄ = max{m̄1, m̄2} = m0 + 1; μ(n, m) = max{μ1(n, m), μ2(n, m)} = 1 and ν(n, m) = max{ν1(n, m), ν2(n, m)} = 1. Observe that, even if the picture (n0, m0) has an infinite number of extensions, we cannot find a picture q that can be concatenated to it infinitely often in diagonal. The choice of n̄ and m̄ is made in such a way that (n0, m0) does not satisfy the conditions n0 ≥ n̄ and m0 ≥ m̄, and hence it does not "disturb" the regularity of L. Remark 25. Proposition 23 states that for any p there exists a picture q that can be concatenated to p as many times as we want. This picture may indeed depend on p, as shown in the following Example 26. Example 26. Let L = {(kn, n) | k, n ≥ 0} = (Ln,n)∗⊖, where Ln,n is the language of squares. The functions α, β, γ, δ, μ, ν and the integers n̄, m̄ of Proposition 23 can be chosen as follows: α(n) = n, β(m) = 0, γ(n) = 1, δ(m) = max{m, 1}, n̄ = m̄ = 0, μ(n, m) = max{m, 1} and ν(n, m) = 1. Note that the size of the picture q in the cases p = (n, m) with n > β(m), or with n ≥ n̄ and m ≥ m̄, depends on the size of p. This situation is indeed unavoidable.
For example, when p = (k n , n ), any q = (nq , n ) such that p ❡q ∗ ⊆ L is such that q = (k n , n ), thus depending on the number of columns of p, as pointed out in Remark 25.

Example 27. Let L = {(hn, hn + n) | n, h ≥ 0}. We have L = Ln,n ❡(Ln,n )∗ , where L1 = Ln,n is the language of squares. The functions 1 , 1 , 1 , 1 , 1 , 1 and integers n1 , m1 as in Proposition 23 for L1 can be chosen as follows: 1 (n) = 1 (n) = n, 1 (n) = 1 (n) = 1, n1 = m1 = 0, and 1 (n, m) = 1 (n, m) = 1. The functions 2 , 2 , 2 , 2 , 2 , 2 and integers n2 , m2 as in Proposition 23 for L2 = (Ln,n )∗ can be chosen as follows: 2 (n) = n, 2 (m) = 0, 2 (n) = 1, 2 (m) = max{m, 1}, n2 = m2 = 0, 2 (n, m) = max{m, 1} and 2 (n, m) = 1. According to Proposition 23 (case 2), the functions , , , , , and integers n, m are the following: (n) = 2n; (n) = 1; (m) = m − 1; (n) = 1; n = 1; m = 0; (n, m) = max{m, 1} and finally (n, m) = max{m, 1} + 1. Observe that Proposition 23 ensures that, for any picture p = (n′, m′) = (hn, hn + n) with n′ ≥ 1, m′ ≥ 0, there exists q = (nq , mq ) with nq , mq ≠ 0, nq ≤ max{m′, 1}, mq ≤ max{m′, 1} + 1 such that p \❡q ∗ ⊆ L. Indeed such a picture can be chosen as q = (n, n), which really satisfies n ≤ m′ = hn + n and n ≤ m′ + 1 = hn + n + 1. For example, in Fig. 1, given p1 = (6, 6) ∈ L1 we choose q1 = (1, 1) and p1 \❡q1∗ ⊆ L1 .
Fig. 1. Extensions of p1 , p2 and p as in Example 27.
Given p2 = (6, 2) ∈ L2 we choose q2 = (2, 2) and we have p2 ❡q2∗ ⊆ L2 . Then, if we consider p = p1 ❡p2 we can choose q = (2, 2) according to Proposition 23 (case 2) and we have p \❡q ∗ ⊆ L.

Proposition 23 can be used to prove that some picture languages are not CRD-regular, as shown in the following examples.

Example 28. Let L = {(n, n^2) | n ≥ 0}. We show that L ∉ L(CRD), by proving that it does not satisfy condition (3) in Proposition 23. Indeed, suppose on the contrary that there exist n, m ∈ N and functions as in the proposition. Observe that in L, for any n ≥ 0, there is only one picture with n rows and one picture with n^2 columns. Hence the pictures of L with number of rows less than or equal to n or number of columns less than or equal to m are finite in number. Since L is infinite, there exists a picture p = (n, n^2) ∈ L such that n > n and n^2 > m. Therefore there exists q = (nq , mq ) with nq , mq ≠ 0 such that p \❡q ∗ ⊆ L. Consider p1 = p \❡q = (n + nq , n^2 + mq ). We must have n^2 + mq = (n + nq )^2 and thus mq = (n + nq )^2 − n^2 . Consider now p2 = p \❡q \❡q = (n + 2nq , n^2 + 2mq ); we have n^2 + 2mq = n^2 + 2(n + nq )^2 − 2n^2 = n^2 + 4n·nq + 2nq^2 ≠ (n + 2nq )^2 (since nq ≠ 0), contradicting p2 ∈ L.

Example 29. Let L = {(2^n, 2^n) | n ≥ 0}. We show that L ∉ L(CRD), by proving that it does not satisfy condition (3) in Proposition 23. Indeed, suppose on the contrary that there exist n, m ∈ N and functions as in the proposition. Observe that in L, for any n ≥ 0, there is only one picture with 2^n rows and one picture with 2^n columns. Hence the pictures of L with number of rows less than or equal to n or number of columns less than or equal to m are finite in number. Since L is infinite, there exists a picture p = (2^n, 2^n) ∈ L such that 2^n > n and 2^n > m. Therefore there exists q = (nq , mq ) with nq , mq ≠ 0 such that p \❡q ∗ ⊆ L. Consider p1 = p \❡q = (2^n + nq , 2^n + mq ). Since p1 ∈ L, we have 2^n + nq = 2^n + mq = 2^{n+k} for some k ≠ 0 (since nq ≠ 0), and thus nq = mq = 2^{n+k} − 2^n. Consider now p2 = p \❡q \❡q = (2^n + 2nq , 2^n + 2mq ); we have 2^n + 2mq = 2^n + 2(2^{n+k} − 2^n) = 2^n(1 + 2^{k+1} − 2) = 2^n(2^{k+1} − 1). Therefore 2^n + 2mq is the product of a power of 2 and an odd number different from 1, so it cannot be a power of 2, contradicting p2 ∈ L.

We now show that the family of CRD-regular languages lies between the class L(3DFA) and REC. On the other hand, there are languages that belong to L(4NFA) and that are not CRD-regular.

Proposition 30. L(3DFA) ⊂ L(CRD) ⊂ REC.

Proof. Let L ∈ L(3DFA). Following [14], L is a finite union of languages R whose elements are (f, g), where f = a0 + a1 n and g = h(b0 + b1 n) + b2 n + b3 k + b4 with a0 , a1 , b0 , b1 , b2 , b3 , b4 positive integers and n, h, k positive integer variables. We show that any language R of this form is in L(CRD). Let En,n be a CRD-RE for the language of squares (see Example 14). The language La0 ,a1 ,b0 ,b1 = {(a0 + a1 n, b0 + b1 n) | n ∈ N} can be denoted by the CRD-RE Ea0 ,a1 ,b0 ,b1 = ((a0 , b0 ) \❡((En,n )a1 )b1 ). Then E = (Ea0 ,a1 ,b0 ,b1 )∗ ❡((a0 , 1)∗ ❡E0,a1 ,0,b2 ) ❡(((1, b3 )∗ )∗ ) ❡((1, b4 )∗ ) is a CRD-RE for the language R. Moreover, the inclusion L(3DFA) ⊂ L(CRD) is strict: in fact the language L = {(n, k1 (n + 1) + k2 (n + 2) + k3 (n + 3))} of Example 20 is in L(CRD), but it cannot be recognized by a 3DFA (cf. [14]). CRD-regular languages are contained in REC because REC is closed under union, column and row concatenations and stars (cf. [6]) and under diagonal concatenation and star (cf. Proposition 10).
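The arithmetic at the heart of Example 29 is easy to check mechanically. The sketch below (Python; the helper name is ours) verifies, for small n and k, that once the first pumped picture forces the diagonal increment d = 2^{n+k} − 2^n, the second pumped picture has side 2^n(2^{k+1} − 1) and so cannot be a power of 2:

```python
def is_power_of_two(x: int) -> bool:
    """True iff x = 2^j for some j >= 0 (bit trick: a power of two
    has a single set bit)."""
    return x > 0 and (x & (x - 1)) == 0

# Example 29: p = (2^n, 2^n). If the diagonal pumping condition held,
# p1 = (2^n + d, 2^n + d) would have to lie in L, forcing
# d = 2^(n+k) - 2^n with k >= 1, but then p2 = (2^n + 2d, 2^n + 2d)
# has side 2^n * (2^(k+1) - 1), an odd multiple (> 1) of a power of two.
for n in range(1, 8):
    for k in range(1, 8):
        d = 2 ** (n + k) - 2 ** n
        side1 = 2 ** n + d
        side2 = 2 ** n + 2 * d
        assert is_power_of_two(side1)                    # p1 stays in L ...
        assert side2 == 2 ** n * (2 ** (k + 1) - 1)
        assert not is_power_of_two(side2)                # ... but p2 cannot
print("Example 29 arithmetic confirmed for small n, k")
```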
Moreover, one can find examples of languages in REC that are not CRD-regular, such as the language L = {(2^n, 2^n) | n ≥ 0} (see Example 29) or L = {(n, n^2) | n ≥ 0}. Regarding the comparison with the class of languages recognized by four-way automata, consider the language L = {(2^n, 2^n) | n ≥ 0}. As shown in Example 29, L is not a CRD-regular language, but J. Kari and C. Moore [13] showed that L is recognized by a 4DFA. On the other hand, the class L(4DFA) seems not to be closed under concatenation and star operations (although the one-letter alphabet case is still open, it seems that, for example, the column closure of the language in Example 21 cannot be recognized by a 4NFA).

3.3. A collection of examples

In this section, we give a collection of examples of two-dimensional languages and classify them with respect to their machine type and regular-expression type. Languages are given by their representative element, where n, m, h, k ≥ 1 are integer variables and c ≥ 1 is an integer constant. Moreover f1 (n) = a1 + · · · + an , where a1 , . . . , an are all
chosen in a finite subset of N, and f2 (n) = k1 (n + 1) + k2 (n + 2) + k3 (n + 3), where k1 , k2 , k3 ≥ 1 are integer variables. Each entry states whether the language with the given representative element belongs to the corresponding class.

Element        2DFA  2NFA  3DFA  3NFA  4DFA  4NFA  D-RE  CRD-RE  REC
(n, n)          Y     Y     Y     Y     Y     Y     Y      Y      Y
(2, 2n)         Y     Y     Y     Y     Y     Y     Y      Y      Y
(2n, 2n)        Y     Y     Y     Y     Y     Y     Y      Y      Y
(2n, 2m)        Y     Y     Y     Y     Y     Y     Y      Y      Y
(n, cn)         Y     Y     Y     Y     Y     Y     Y      Y      Y
(n, f1(n))      N     Y     Y     Y     Y     Y     Y      Y      Y
(kn, n)         N     N     Y     Y     Y     Y     N      Y      Y
(n, f2(n))      N     N     N     Y     Y     Y     N      Y      Y
(n, kn)         N     N     N     N     Y     Y     N      Y      Y
(2^n, 2^n)      N     N     N     N     Y     Y     N      N      Y
(hn, hkn+n)     N     N     N     N     N     Y     N      Y      Y
(n, n^2)        N     N     N     N     N     N     N      N      Y
(n^2, n)        N     N     N     N     N     N     N      N      Y
(n^2, n^2)      N     N     N     N     N     N     N      N      Y
(n, 2^n)        N     N     N     N     N     N     N      N      Y
(n, n!)         N     N     N     N     N     N     N      N      N
4. Advanced star operations

Using the three types of concatenation operations (row, column and diagonal) and the three corresponding stars, we get regular expressions describing a quite large family of two-dimensional languages over a one-letter alphabet. Unfortunately, all those operations together are not enough to describe the whole family REC, because REC contains very “complex’’ languages even in the case of a one-letter alphabet. For example, REC contains languages of the form L = {(n, f (n)) | n > 0}, as well as L = {(f (n), g(n)) | n > 0}, where f (n), g(n) are polynomial or exponential functions in n (see [5] for details). Observe that the peculiarities of the “classical’’ star operations (from which such column, row or diagonal stars are defined) are mainly the following: (a) they are a simple iteration of one kind (row, column or diagonal) of concatenation between pictures; (b) they correspond to an iterative process that at each step always adds (concatenates) the same set. We can say that they correspond to the idea of the iteration for some recursive H defined as H (1) = S and H (n + 1) = H (n)S, where S is a given set. In this section we define new types of iteration operations, to which we will refer as advanced stars, which turn out to be much more powerful than the “classical’’ ones. We will use subscripts “r” and “d” with the meaning of “right” and “down’’, respectively.

Definition 31. Let L, Lr , Ld be two-dimensional languages. The star of L with respect to (Lr , Ld ) is defined as

L^(Lr ,Ld )∗ = ⋃_{i≥0} L^(Lr ,Ld )i,

where L^(Lr ,Ld )0 = {λ0,0 }, L^(Lr ,Ld )1 = L, and L^(Lr ,Ld )i+1 is the set of pictures p′ obtained by placing some p ∈ L^(Lr ,Ld )i in the top-left corner, some pr ∈ Lr to its right, some pd ∈ Ld below it, and some q ∈ Σ∗∗ in the remaining bottom-right corner.
Remark that the operation we defined cannot be simulated by a sequence of ❡ and ❡ operations, because to get p′ we first concatenate p ❡pr and p ❡pd , then we overlay them and finally we fill the “hole’’ with a picture q ∈ Σ∗∗ . For this reason this definition is conceptually different from the one given by O. Matz in [15]. Moreover, observe that such an advanced star is based on a reverse principle with respect to the diagonal star: we “decide’’ what to concatenate to the right of and below the given picture and then fill the hole in the bottom-right corner. This implies that, at the (i + 1)th step of the iteration, we are forced to select pictures pr ∈ Lr and pd ∈ Ld that have the same number of rows and the same number of columns, respectively, as the pictures generated at the ith step. Therefore, we actually exploit the fact that column and row concatenations are partial operations to somehow synchronize each step of the iteration with the choice of pictures in Lr and Ld . We now state the following proposition.

Proposition 32. If L, Lr , Ld are languages in REC, then L^(Lr ,Ld )∗ is in REC.

Proof. We give only a few hints for the proof, because it can be carried out using the techniques shown in the proof of Proposition 10. The idea is to assume that the tiling systems for L, Lr , Ld are over disjoint local alphabets and to define a local language M over an alphabet equal to the union of the three together with a new symbol x. Language M contains pictures composed of blocks p′ , pr′ , pd′ and s, where p′ , pr′ and pd′ belong to the local languages for L, Lr and Ld , respectively, and s is any picture filled with the symbol x. Then the set of tiles for L′ = L^(Lr ,Ld )∗ can be defined by taking two “different copies’’ (i.e., over disjoint local alphabets) of the language M and different local languages for Lr and Ld , and defining tiles according to the definition of pictures in L′ .
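Over a one-letter alphabet a picture is determined by its size, so the advanced star of Definition 31 can be simulated on pairs (rows, columns). The sketch below (Python; the representation and all names are our choices, and languages are truncated to finite lists) reproduces the instance used in the application to L = {(n, n^2)}: the corner picture q ∈ Σ∗∗ imposes no constraint on sizes, so only the row/column synchronization with Lr and Ld matters:

```python
def advanced_star_step(current, L_r, L_d):
    """One step of the advanced star on sizes: glue p with some pr
    (same number of rows) on the right and some pd (same number of
    columns) below; the bottom-right corner is filled freely."""
    nxt = set()
    for (n, m) in current:
        for (nr, mr) in L_r:
            if nr != n:          # pr must share p's number of rows
                continue
            for (nd, md) in L_d:
                if md != m:      # pd must share p's number of columns
                    continue
                nxt.add((n + nd, m + mr))
    return nxt

# L = {(n, n^2)} as the advanced star of M = {(1, 1)} with respect to
# Mr = {(n, 2n + 1)} and Md = {(1, n)} (truncated to a finite bound here).
bound = 12
Mr = [(n, 2 * n + 1) for n in range(bound)]
Md = [(1, m) for m in range(bound * bound)]

sizes = {(1, 1)}
seen = set(sizes)
for _ in range(bound - 2):
    sizes = advanced_star_step(sizes, Mr, Md)
    seen |= sizes

print(sorted(seen))                       # only sizes of the form (i, i^2)
assert all(m == n * n for (n, m) in seen)
```

Each step adds 2i + 1 columns and 1 row to the current (i, i^2), exactly as described in the text.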
As an immediate application, consider the language L = {(n, n^2) | n ≥ 0} of Example 28. Then L can be defined as the advanced star of M = {(1, 1)} with respect to Mr = {(n, 2n + 1) | n ≥ 0} and Md = {(1, n) | n ≥ 0} (at the (i + 1)th step of the iteration we “add’’ (2i + 1) columns to the current i^2 ones and 1 row to the current i ones). Using the same principle, namely exchanging the languages Mr and Md , it is easy to define also the rotation of this language, i.e. the language {(n^2, n) | n ≥ 0}. Then also the language {(n^2, n^2) | n ≥ 0} can be defined as the advanced star of M = {(1, 1)} with respect to Nr = {(n^2, 2n + 1) | n ≥ 0} and Nd = {(2n + 1, n^2) | n ≥ 0}, where Nr (Nd ) can be obtained by column-concatenation (row-concatenation) of two copies of the previous languages and 1-row (1-column) pictures. Remark that, even using the above defined advanced star, it seems still not possible to define the language of Example 29 of pictures of size (2^n, 2^n), or the language of pictures of size (n, 2^n) and similar ones. In fact, for this kind of languages (recall that they are all
in REC), a definition would be needed that allows using as Lr and/or Ld the language itself. We give the following definition.

Definition 33. Let L, Ld be two-dimensional languages. The bi-iteration along the columns of L with respect to Ld is defined as

L^(∗,Ld )∗ = ⋃_{i≥0} L^(∗,Ld )i,

where L^(∗,Ld )0 = {λ0,0 }, L^(∗,Ld )1 = L, and L^(∗,Ld )i+1 is the set of pictures p′ obtained by placing p1 ∈ L^(∗,Ld )i top-left, p2 ∈ L^(∗,Ld )i top-right, pd ∈ Ld bottom-left and q ∈ Σ∗∗ bottom-right. Similarly we define the bi-iteration along the rows of L with respect to a language Lr , denoted by L^(Lr ,∗)∗ , where at the (i + 1)th step p′ is obtained by placing p1 ∈ L^(Lr ,∗)i top-left, pr ∈ Lr top-right, p2 ∈ L^(Lr ,∗)i bottom-left and q ∈ Σ∗∗ bottom-right. These notations naturally lead us to define also the bi-iteration along rows and columns, denoted by L^(∗,∗)∗ , where at the (i + 1)th step p′ is obtained by placing p1 , p3 ∈ L^(∗,∗)i top-left and top-right, p2 ∈ L^(∗,∗)i bottom-left and q ∈ Σ∗∗ bottom-right. Using the same techniques as in the proof of Proposition 32, one can prove that the family REC over a one-letter alphabet is closed under all such bi-iteration operations. It is immediate to verify that the language L of pictures of size (n, 2^n) can be obtained from the languages M = {(1, 1)} and Md = {(1, n) | n > 0} as L = M^(∗,Md )∗ . We conclude by observing that the language of Example 29 of pictures of size (2^n, 2^n) can be obtained as a bi-iteration along both rows and columns of the same language M = {(1, 1)}.

5. Towards the general alphabet case

In this paper, we have defined new operations between pictures so that a quite wide class of two-dimensional languages over a one-letter alphabet can be described in terms of regular expressions. All these languages belong to REC, the class of recognizable languages that best generalizes the class of regular string languages to two dimensions. The next step is surely to complete the definitions of some other kinds of “advanced’’ star operations, with the aim of proving a two-dimensional Kleene theorem in this simpler case of a one-letter alphabet. We also emphasize that an important goal of further work is to extend all these results to the general case of two-dimensional languages over any alphabet (i.e. the case with more
than one letter). Observe that the definitions of diagonal concatenation and star are hard to extend to the general case, even using their characterizations in terms of rational relations or in terms of automata with only two moving directions. The main problem is that, if p, q are two pictures over Σ, to define p \❡q we need to specify two pictures r, s such that p \❡q is the block picture with p top-left, r top-right, s bottom-left and q bottom-right. On the other hand, the formalism of the advanced stars appears to be a more reasonable approach to the general case. Recall that, in this case, we always need to specify four pictures (or four languages). We will use subscripts r, d and c with the meaning of “right”, “down” and “corner”, respectively. Then, we can give the following definition, which directly extends Definition 31.

Definition 34. Let L, Lr , Ld , Lc be two-dimensional languages over Σ. The star of L with respect to (Lr , Ld , Lc ) is defined as

L^(Lr ,Ld ,Lc )∗ = ⋃_{i≥0} L^(Lr ,Ld ,Lc )i,

where L^(Lr ,Ld ,Lc )0 = {λ0,0 }, L^(Lr ,Ld ,Lc )1 = L, and L^(Lr ,Ld ,Lc )i+1 is the set of pictures p′ obtained by placing p ∈ L^(Lr ,Ld ,Lc )i top-left, pr ∈ Lr top-right, pd ∈ Ld bottom-left and pc ∈ Lc bottom-right.

Remark that this kind of star operation is not the iteration of a “classical’’ concatenation operation. These operations seem to be able to describe several languages in REC, although the “regular expressions’’ for two-dimensional languages in the general case will turn out to be very complex.

Appendix A.

Proposition 23. Let L be a CRD-regular language. Then there exist
, , , : N → N, , : N × N → N increasing functions and n, m ∈ N such that for any p = (n, m) ∈ L we have: (1) if m > (n) then p ❡q ∗ ⊆ L for q = (n, (n)) with (n) ≠ 0, (2) if n > (m) then p ❡q ∗ ⊆ L for q = ((m), m) with (m) ≠ 0, (3) if n ≥ n, m ≥ m then p \❡q ∗ ⊆ L for some q = (nq , mq ) with nq , mq ≠ 0, nq ≤ (n, m), mq ≤ (n, m).

Proof. First let us see how to choose , , , in all these cases. From Proposition 22, we know that the sets Cn = {a^m | (n, m) ∈ L} and Rm = {a^n | (n, m) ∈ L} are regular and therefore ultimately periodic. So there exist hC , kC , hR , kR ∈ N such that a^j ∈ Cn ⇔ a^{j+kC} ∈ Cn for every j ≥ hC , and a^j ∈ Rm ⇔ a^{j+kR} ∈ Rm for every j ≥ hR . If we did not have to take care of the fact that , , , have to be increasing and that , have to be ≠ 0, it would be sufficient to set (n) = hC , (m) = hR , (n) = kC and
(m) = kR . But, to be sure that (n), (m) = 0 and to assure the increase of the functions, we set (n) = hC + s1 kC , (n) = hR + s2 kR , (n) = kC + s3 kC and (m) = kR + s4 kR , where kC = max{1, kC }, kR = max{1, kR } and s1 , s2 , s3 , s4 0 are the minimal integer such that (n) (n − 1), (n) (n − 1), (n) (n − 1) and (n) (n − 1). Let us now show how to choose n, m, and for a CRD-regular language L. Let r be a CRD-regular expression denoting L. The proof is by induction on the number of operators in r. For the basis, if L = ∅ then the proposition is vacuously true. If L = {0,0 }, then we can set n = 1, m = 1, (n) = 0, (m) = 0. If L = {1,0 } (resp. L = {0,1 }), then we can set n = 2 (resp. n = 1), m = 1 (resp. m = 2), (n) = 0 (resp. (n) = 1), (m) = 1 (resp. (m) = 0). In all these cases we can set (n) = 1, (m) = 1, (n, m) = (n, m) = 1. Assume now that the proposition is true for languages denoted by CRD-regular expression with less than i operators, i 1, and let r have i operators. There are seven cases depending ❡ on the form of r: (1) r = r1 ∪r2 , (2) r = r1 ❡r2 , (3) r = r1 ❡r2 , (4) r = r1 \❡r2 , (5) r = r1∗ , ❡ \❡ (6) r = r1∗ , or (7) r = r1∗ . In any of the seven cases, r1 and r2 denote some language L1 and L2 , respectively, that satisfies the condition. Let 1 , 1 , 1 , 1 , 1 , 1 , n1 , m1 be the functions and the values for L1 and let 2 , 2 , 2 , 2 , 2 , 2 , n2 , m2 be the functions and the values for L2 . Case 1: We have L = L1 ∪ L2 . We set (n, m) = max{ 1 (n, m), 2 (n, m)}, (n, m) = max{ 1 (n, m), 2 (n, m)}, n = max{n1 , n2 }, m = max{m1 , m2 }. Case 2: We have L = L1 ❡L2 . We set: (n, m) = max{ 1 (n, m) 2 (n, m), 1 (m) 2 (n, m), 2 (m) 1 (n, m)},
(n,m) = max{ 1(n,m) 2 (n,m) + 2 (n,m) 1 (n,m), 1 (m) 2 (n,m), 2 (m) 1 (n, m)}, n = max{n1 , n2 , 1 (m1 ), 2 (m2 )}, m = m1 + m2 . Now, let p = (n, m) ∈ L, with n n, m m. Clearly, p = p1 ❡p2 for some p1 = (np1 , mp1 ) = (n, mp1 ) ∈ L1 and p2 = (np2 , mp2 ) = (n, mp2 ) ∈ L2 . We have to consider three different cases: (2a) mp1 m1 and mp2 m2 , (2b) mp1 < m1 , (2c) mp2 < m2 . (2a) Since np1 n1 , mp1 m1 , np2 n2 and mp2 m2 , from the hypothesis on L1 \❡ and L2 , we have that p1 \❡q1∗ ⊆ L1 for some q1 = (nq1 , mq1 ) with nq1 , mq1 = 0, \❡ nq1 1 (n, mp1 ), mq1 1 (n, mp1 ) and that p2 \❡q2∗ ⊆ L2 for some q2 = (nq2 , mq2 ) with nq2 , mq2 = 0, nq2 2 (n, mp2 ), mq2 2 (n, mp2 ). \❡ Now let us set q = (nq1 nq2 , nq1 mq2 + nq2 mq1 ) = (nq , mq ). Then p \❡q ∗ ⊆ L with nq , mq = 0, nq = nq1 nq2 1 (n, mp1 ) 2 (n, mp2 ) 1 (n, m) 2 (n, m) and mq = nq1 mq2 + nq2 mq1 1 (n, m) 2 (n, m) + 2 (n, m) 1 (n, m). (2b) Since mp1 < m1 , then mp2 m2 (recall that mp1 + mp2 = m m = m1 + m2 ) and \❡ therefore p2 \❡q2∗ ⊆ L2 for some q2 = (nq2 , mq2 ) with nq2 , mq2 = 0, nq2 2 (n, mp2 ), ❡ mq2 2 (n, mp2 ). Moreover nq1 = n n 1 (m1 ) > 1 (mp1 ): therefore p1 ❡q1∗ ⊆ L1 for q1 = (nq1 , mq1 ) = (1 (mp1 ), mp1 ). Note that we have nq1 = 0. Let us set q = \❡ (nq1 nq2 , nq1 mq2 ) = (nq , mq ). Then we have p \❡q ∗ ⊆ L with nq , mq = 0, nq = nq1 nq2 1 (mp1 ) 2 (n, mp2 ) 1 (m) 2 (n, m) and mq = nq1 mq2 1 (mp1 ) 2 (n, mp2 ) 1 (m) 2 (n, m).
(2c) It is analogous to the previous case. Case 3: We have L = L1 ❡L2 and the proof is similar to that of the previous case. Case 4: We have L = L1 \❡L2 . We set: (n, m) = max{ 1 (n, m), 2 (n, m), 1 (m), 2 (m)},
(n, m) = max{ 1 (n, m), 2 (n, m), 2 (n), 1 (n)}, n = max{n1 + n2 , 1 (m1 ) + n2 , 2 (m2 ) + n1 }, m = max{m1 + m2 , 2 (n2 ) + m1 , 1 (n1 ) + m2 }. Now, let p = (n, m) ∈ L = L1 \❡L2 , with n n, m m. Clearly, p = p1 \❡p2 for some p1 = (np1 , mp1 ) ∈ L1 and p2 = (np2 , mp2 ) ∈ L2 . We have to consider two different cases 4(a) and (b): with nq , mq = 0 (4a) At least one of the following conditions (1) and (2) is verified
np1 n1 , (1) mp1 m1 .
(2)
np2 n2 , mp2 m2 .
\❡ If condition (1) is verified, then p1 \❡q1∗ ⊆ L1 for some q1 = (nq1 , mq1 ) with nq1 , mq1 = 0, nq1 1 (n, m), mq1 1 (n, m) and it suffices to set q = q1 . If, instead, condition (2) is \❡ verified, then p2 \❡q2∗ ⊆ L2 for some q2 = (nq2 , mq2 ) with nq2 , mq2 = 0, nq2 2 (n, m), mq2 2 (n, m) and it suffices to set q = q2 . (4b) If neither condition (1) nor condition (2) is verified, then, again, we have to consider two different subcases either np1 n1 , mp1 < m1 , np2 < n2 , mp2 m2 or np1 < n1 , mp1 m1 , np2 n2 , mp2 < m2 . We give the details only for the first subcase, since the other one can be handled in a similar way. So, in the first subcase, we have np1 = n−np2 n−np2 > n−n2 1 (m1 )+n2 −n2 = 1 (m1 ) > 1 (mp1 ) i.e., np1 > 1 (mp1 ) and mp2 = m − mp1 m − mp1 > m − m1 2 (n2 ) + m1 − m1 = 2 (n2 ) > 2 (np2 ) i.e., ❡ ❡ mp2 > 2 (np2 ). Therefore, p1 ❡q1∗ ⊆ L1 for q1 = (1 (mp1 ), mp1 ) and p2 ❡q2∗ ⊆ L2 for q2 = (np2 , 2 (np2 )). We set q = (nq , mq ) = (nq1 , mq2 ) = (1 (mp1 ), 2 (np2 )) \❡ and we will have p \❡q ∗ ⊆ L with nq , mq = 0, nq = 1 (mp1 ) 1 (m), mq = 2 (np2 ) 2 (n). m ❡ Case 5: We have L = L∗1 . We set (n, m) = max{ 1 (n, m), m 1 (n, m)1 (m)}, (n, m) m = max{m 1 (n, m) m−1 (n, m)1 (m), 1 (n, m)}, n = max{n1 , 1 (m1 )} and m = m1 . 1 Now, let p = (n, m) ∈ L, with n n, m m. If m = 0, then p ∈ L1 and we can apply the inductive hypothesis. If instead m = 0, then we have p = p1 ❡ · · · ❡pk with pi = (npi , mpi ) = (n, mpi ) ∈ L1 . Let us consider two different subcases 5(a) and (b). (5a) There exists some ™ ∈ {1, . . . , k} such that mpi m1 for every i = 1, . . . , ™ and mpi < m1 for every i = ™ + 1, . . . , k. Therefore, for every i = 1, . . . , ™, there exists qi = \❡ (nqi , mqi ) with nqi , mqi = 0, nqi 1 (npi , mpi ), mqi 1 (npi , mpi ), such that pi \❡qi∗ ⊆ L1 . Note that for i = 1, . . . , ™, we have nqi 1 (npi , mpi ) = 1 (n, mpi ) 1 (n, m), mqi 1 (npi , mpi ) = 1 (n, mpi ) 1 (n, m). Moreover, since for every i = ™ + 1, . . . 
, k, we have mpi < m1 , it follows that 1 (mpi ) < 1 (m1 ) n n = npi . So for every i = ❡ ™ + 1, . . . , k, there exists qi = (nqi , mqi ) = (1 (mqi ), mqi ) such that pi ❡qi∗ ⊆ L1 . We \❡ set q = (nq , mq ) = ( ki=1 nqi , ™i=1 (mqi kj =1,j =i nqj )). Then p \❡q ∗ ⊆ L, where
nq , mq = 0, with nq ™1 (n, m) 1 (mq™+1 ) . . . 1 (mqk ) m 1 (n, m)1 (m) and mq =
mq1 m−1 (n, m) 1 (m) + · · · + mq™ m−1 (n, m)1 (m) 1 (n, m)™ m−1 (n, m)1 (m) 1 1 1 m (n, m) (m) . m 1 (n, m) m−1 1 1 (5b) In this subcase, for every i=1, . . . , k, mpi <m1 . Therefore, as in case (5a), for ❡ every i=1, . . . , k, there exists qi =(nqi , mqi ) = (1 (mqi ), mqi ) such that pi ❡qi∗ ⊆L1 . k m m We set q=(nq , mq )=( i=1 nqi , m). Then nq , mq =0, nq 1 (m) m 1 (n, m)1 (m), m mq = m m 1 (n, m) m−1 (n, m)1 (m). 1 Case 6: This case is analogous to the previous one. \❡ Case 7: We have L = L∗1 . If there exists q = (nq , mq ) ∈ L with nq , mq = 0, we can \❡ set (n, m) = nq , (n, m) = mq , n = m = 0. Then for every p = (n, m) ∈ L = L∗1 we \❡ have p \❡q ∗ ⊆ L. If instead L ⊆ col (resp. L ⊆ row ), then we can set n = 0, m = 1 (resp. n = 1, m = 0) and condition (3) will be vacuously true. Note that in all the cases the choice of the functions , , , , and preserves their increase.
References [1] M. Anselmo, D. Giammarresi, M. Madonia, Regular expressions for two-dimensional languages over oneletter alphabet, in: C.S. Calude, E. Calude, M.J. Dinneen (Eds.), Proc. Development in Language Theory (DLT 04), Lecture Notes in Computer Science, Vol. 3340, Springer, Berlin, 2004, pp. 63–75. [2] J. Berstel, Transductions and Context-free Languages, Teubner, 1979. [3] M. Blum, C. Hewitt, Automata on a two-dimensional tape, IEEE Symp. on Switching and Automata Theory, 1967, pp. 155–160. [4] L. De Prophetis, S. Varricchio, Recognizability of rectangular pictures by Wang systems, J. Automat. Languages Combin. 2 (1997) 269–288. [5] D. Giammarresi, Two-dimensional languages and recognizable functions, in: G. Rozenberg, A. Salomaa (Eds.), Proc. Developments in Language Theory, Finland, 1993, World Scientific Publishing Co., Singapore, 1994. [6] D. Giammarresi, A. Restivo, Two-dimensional finite state recognizability, Fund. Inform. 25 (3,4) (1996) 399–422. [7] D. Giammarresi, A. Restivo, Two-dimensional languages, in: G. Rozenberg, et al. (Eds.), Handbook of Formal Languages, Vol. III, Springer, Berlin, 1997, pp. 215–268. [8] J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, MA, 1979. [9] K. Inoue, A. Nakamura, Some properties of two-dimensional on-line tessellation acceptors, Inform. Sci. 13 (1977) 95–121. [10] K. Inoue, A. Nakamura, Two-dimensional finite automata and unacceptable functions, Internat. J. Comput. Math. Sec. A 7 (1979) 207–213. [11] K. Inoue, I. Takanami, A characterization of recognizable picture languages, in: Proc. Second Internat. Colloq. on Parallel Image Processing, Lecture Notes in Computer Science, Vol. 654, Springer, Berlin, 1993. [12] K. Inoue, I. Takanami, A. Nakamura, A note on two-dimensional finite automata, Inform. Process. Lett. 7 (1) (1978) 49–52. [13] J. Kari, C. Moore, Rectangles and squares recognized by two-dimensional automata, In: J. Karhumaki, H. Maurer, G. 
Paun, G. Rozenberg (Eds.), Theory is Forever, Essays Dedicated to Arto Salomaa on the Occasion of his 70th Birthday, Lecture Notes in Computer Science, Vol. 3113, Springer, Berlin, 2004, pp. 134–144. [14] E.B. Kinber, Three-way automata on rectangular tapes over a one-letter alphabet, Inform. Sci. 35 (1985) 61–77. [15] O. Matz, Regular expressions and context-free grammars for picture languages, Proc. STACS’97, Lecture Notes in Computer Science, Vol. 1200, Springer, Berlin, 1997, pp. 283–294.
M. Anselmo et al. / Theoretical Computer Science 340 (2005) 408 – 431
431
[16] O. Matz, On piecewise testable, starfree, and recognizable picture languages, in: M. Nivat (Ed.), Foundations of Software Science and Computation Structures, Vol. 1378, Springer, Berlin, 1998. [17] M.O. Rabin, D. Scott, Finite automata and their decision problems, IBM J. Res. Dev. 3 (1959) 114–125. [18] D. Simplot, A characterization of recognizable picture languages by tilings by finite sets, Theoret. Comput. Sci. 218 (2) (1999) 297–323. [19] T. Wilke, Star-free picture expressions are strictly weaker than first-order logic, in: Proc. ICALP’97, Lecture Notes in Computer Science, Vol. 1256, Springer, Berlin, 1997, pp. 347–357.
Theoretical Computer Science 340 (2005) 432 – 442 www.elsevier.com/locate/tcs
Parsing with a finite dictionary Julien Clémenta , Jean-Pierre Duvalb,∗ , Giovanna Guaianab , Dominique Perrina , Giuseppina Rindonea a Institut Gaspard-Monge, Université de Marne-la-Vallée, France b LIFAR, Université de Rouen, France
Abstract We address the following issue: given a word w ∈ A∗ and a set of n nonempty words X, how does one determine efficiently whether w ∈ X ∗ or not? We discuss several methods, including an O(r × |w| + |X|) algorithm for this problem, where r ≤ n is the length of a longest suffix chain of X and |X| is the sum of the lengths of the words in X. We also consider the more general problem of providing all the decompositions of w in words of X. © 2005 Elsevier B.V. All rights reserved. Keywords: Finite automata; String matching
1. Introduction

The complexity of algorithms related to finite automata and regular expressions is well-known in general. In this article, we focus on a particular problem, namely the complexity of parsing a regular language of the form Y = X ∗ where X is a finite set of nonempty words. This type of language occurs often in applications, when X is a dictionary and Y is the set of texts obtained by arbitrary concatenations of strings from this dictionary. The time and space complexity can be an important issue in such applications, since the dictionaries used for natural languages can contain up to several million words.

∗ Corresponding author. Tel.: +33 235 146610; fax: +33 235 146763.
E-mail addresses: [email protected] (J. Clément), [email protected] (J.-P. Duval), [email protected] (G. Guaiana), [email protected] (D. Perrin), [email protected] (G. Rindone). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.030
As a consequence of general constructions from automata theory, any regular language can be parsed in time proportional to the product of the size of the regular expression and the length of the input word. This just amounts to simulating a nondeterministic automaton built in a standard way from the regular expression. Using a deterministic automaton produces a linear-time algorithm, but only after a determinization step which may itself be exponential. Our main result here is an algorithm allowing one to parse a regular language of the form X ∗ , with X finite, in time O(r × |w| + |X|), with r the length of a longest suffix chain in X, w the input word and |X| the sum of the lengths of the words in X (Sections 4 and 5). The quantity r, depending only on the set X, is upper bounded by Card(X). This algorithm allows one to get all the decompositions of the input word in words of X. We also discuss some further problems on the automata related to regular expressions of this type (Section 3). The motivation of our study is in the work of Schützenberger on this type of languages. He has shown that although the size of the automaton depends on the length of the words in X, several syntactic parameters depend only on the cardinality of X (see [14]). One of them is linked with the number of interpretations of a word in words of X, and is related to the problem considered here. A similar yet unsolved problem is the complexity of testing the unambiguity of the expression X ∗ , i.e. of testing whether X is a code. The standard algorithm [13] gives a quadratic complexity O(|X|^2 ), where |X| is the total length of the words of X. It was later lowered to O(Card(X) × |X|) by various authors [2,11,6,8]. However, it is not known whether there exists a linear algorithm (see [5]).
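The standard quadratic test for whether X is a code is usually attributed to Sardinas and Patterson. A direct (not complexity-tuned) rendering of that classical procedure is sketched below (Python; function names and the set-based representation are our choices): it iterates sets of "dangling suffixes" and reports a non-code exactly when the empty word appears:

```python
def quotient(A, B):
    """Left quotient A^{-1} B = { v : a v in B for some a in A }."""
    return {b[len(a):] for a in A for b in B if b.startswith(a)}

def is_code(X):
    """Sardinas-Patterson test: X* is unambiguous iff X is a code."""
    X = set(X)
    U = quotient(X, X) - {""}      # dangling suffixes after one overlap
    seen = set()
    while True:
        if "" in U:                # some word of X* has two factorizations
            return False
        fU = frozenset(U)
        if fU in seen or not U:    # no new dangling-suffix set: X is a code
            return True
        seen.add(fU)
        U = quotient(X, U) | quotient(U, X)

# {a, ab, bb} is a suffix code, hence a code; the five-word set used in
# the figures below is not: "ababb" = ab.abb = aba.bb.
assert is_code({"a", "ab", "bb"})
assert not is_code({"aa", "ab", "bb", "aba", "abb"})
```

Termination follows because every iterate U is a set of suffixes of words of X, of which there are finitely many.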
2. Preliminaries and notations

For a more complete description of automata and fundamentals of formal languages, the reader is referred to [7,9,12] and to [16] in particular for a recent overview on recognizable languages in free monoids. Let A be a finite alphabet. We denote by ε the empty word and by A∗ (resp. A+ ) the set of finite words (resp. nonempty finite words) on A. For a word w ∈ A∗ , we denote by |w| the length of w, by w[j ] for 0 ≤ j < |w| the letter of index j in w, and by w[j..k] = w[j ]w[j + 1] · · · w[k] the corresponding factor of w. For any decomposition w = uv with u, v ∈ A∗ we say that u and v are, respectively, a prefix and a suffix of w. The suffix v is said to be proper if v ≠ w. For a finite set X of words on A, we denote by Pref (X) and Suff (X) the sets of prefixes and suffixes of the words of X, respectively, by Card(X) the cardinality of X and by |X| the sum of the lengths of the words of X, that is |X| = Σx∈X |x|.
We denote a nondeterministic finite automaton over the alphabet A by A = (Q, δ, i, T ), where Q is the set of states, i ∈ Q is the initial state, T ⊆ Q is the set of terminal states and δ is the transition function. We write |A| for the number of states of A. We abbreviate by NFA a nondeterministic finite automaton and by DFA a deterministic finite automaton.
3. Using deterministic automata

Before presenting our algorithm, we examine what would be the classical approach to checking whether a word w is in X ∗ . A natural idea is to build an automaton for X ∗ . We consider the following process: (i) build a finite automaton for X, (ii) modify this automaton to accept X ∗ (doing so, we usually get an NFA), and (iii) finally get a DFA by a classical determinization procedure. An optional fourth step could be to minimize the resulting automaton.

Automata for a finite set of words X. First we consider three simple ways of building an automaton for a finite set of words X. (1) The “solar” automaton SX is obtained as follows: we build one automaton per word x ∈ X with |x| + 1 states and merge all the initial states (see Fig. 1). Note that this NFA is a tree with root i and that |SX | = |X| + 1. (2) The tree automaton TX (see Fig. 1): this is a tree which collects words sharing a common prefix. In terms of automata, the set of states corresponds to the set of prefixes and we have TX = (Q = Pref (X), δ, i = ε, T = X) with δ(p, a) = pa if p, pa ∈ Pref (X) and a ∈ A. This DFA has |TX | = Card(Pref (X)) states. (3) The minimal automaton MX (see Fig. 1). Given the set X, a more elaborate method is to build the minimal DFA MX recognizing X. For instance one can apply a minimization algorithm to the tree automaton TX in linear time with respect to |X| [10]. Of course MX is not necessarily a tree.

Automata for the language X ∗ . A straightforward way to build an NFA recognizing the language X ∗ from an automaton A = (Q, δ, i, T ) recognizing X is to add ε-transitions from each final state of T to the initial state i. We denote this automaton by star(A) (see Fig. 2). To save a little more space, we also merge all the terminal states without outgoing
Fig. 1. The solar NFA S_X (left), the tree DFA T_X (middle) and the minimal DFA M_X for X = {aa, ab, bb, aba, abb}.
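As an illustration, the tree DFA T_X can be sketched in a few lines. This is our own dictionary-based rendering, not code from the paper; the function names are ours.

```python
# Sketch: the tree (prefix) DFA T_X for a finite set of words X.
# States are the prefixes of the words of X, the initial state is the
# empty word, terminal states are the words of X themselves, and
# delta(p, a) = pa whenever pa is again a prefix of X.

def tree_automaton(X):
    states = {""} | {x[:i] for x in X for i in range(1, len(x) + 1)}
    delta = {(q[:-1], q[-1]): q for q in states if q}   # delta(p, a) = pa
    return states, delta, "", set(X)

def accepts(automaton, w):
    states, delta, init, finals = automaton
    p = init
    for a in w:
        if (p, a) not in delta:
            return False          # no transition: w is not a prefix of X
        p = delta[(p, a)]
    return p in finals
```

For X = {aa, ab, bb, aba, abb} of Fig. 1, this automaton has Card(Pref(X)) = 8 states.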
Fig. 2. The flower automaton merge(S_X) (left) and the NFA star(T_X) (right) for X = {aa, ab, bb, aba, abb}.
Fig. 3. An NFA for X* with the set X = A^k a + b (k = 5) of Example 1.
transitions with the initial state: this yields an automaton merge(A). Doing so with the solar automaton S_X, we obtain merge(S_X), the classical flower automaton of X* (see Fig. 2).

Applying the classical powerset construction to one of the previous automata accepting X*, we obtain a DFA for X*. Note that the determinization procedure gives the same result whether starting from star(S_X) or from star(T_X), due to their tree-like structures. The same is true with merge instead of star. In general, for an NFA A, the determinization procedure builds a DFA whose number of states is trivially bounded by 2^‖A‖. However, when we consider the particular case of X* with X finite, could an exponential blow-up really happen? The following example shows that the answer is positive.

Example 1. Let us consider X = A^k a + b with k > 0. It is easy to give an NFA for X* with k + 1 states (see Fig. 3). The determinization leads to a DFA with Θ(2^k) states.

Another question is whether we can relate the number of states of a DFA for X* to |X|. Until recently, it was thought that it could not exceed O(|X|^2), a bound which was shown to be reachable in [17], as stated in the following example.
Example 2. For an integer h > 1, take X = {a^(h−1), a^h}. The tree DFA T_X and the minimal DFA M_X are the same and have h + 1 = Θ(|X|) states. The minimal DFA for X* has Θ(|X|^2) states (see [17]).

Shallit showed in [15] with the following example that an exponential blow-up is also possible.

Example 3. Let h ≥ 3 and let X = {b} ∪ {a^i b a^(h−i−1) | 1 ≤ i ≤ h − 2} ∪ {b a^(h−2) b}. The minimal DFA accepting X* has exactly 3(h − 1)2^(h−3) + 2^(h−2) states [15]. Note that the size is exponential, of order Θ(h 2^h), whereas Card(X) = Θ(h) and |X| = Θ(h^2).

The problem of finding a tight upper bound for the number of states of the minimal DFA for X* in terms of the total length |X| is called by this author [15] the noncommutative Frobenius problem. The number of states of the minimal automata obtained for the family of sets used in Example 3 is Θ(h 2^h), where h = Θ(|X|^(1/2)). A priori, the upper bound for a DFA obtained by determinization of an NFA for X* with O(|X|) states is O(2^|X|). Experiments performed on the family of Example 3 show that the DFA obtained by determinization (before minimization) also has Θ(h 2^h) states, and not Θ(2^(h^2)). We do not know in general whether (i) it is possible that the minimal DFA for X* has Θ(2^|X|) states; (ii) it is possible that the DFA obtained by determinization has Θ(2^|X|) states.

Simulating the determinization process. A way to avoid the determinization step is to simulate the determinized automaton while parsing the word w. Given an NFA A accepting the language X* with X a finite set of words, this gives an algorithm of time complexity O(‖A‖ × |w|), and the space required to simulate the determinization process is O(‖A‖). Since the number of states of the NFA can be of order O(|X|), this approach gives a time complexity O(|X| × |w|) in the worst case. As an example of such a situation, we have the set X = {a^k b, a} with k > 0.
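The simulation of the determinized automaton can be sketched as follows. This is our own Python rendering (not the paper's code), using the flower automaton merge(S_X) as the underlying NFA.

```python
# Sketch: membership in X* by simulating the subset construction on the
# flower automaton merge(S_X), one letter at a time.  State 0 is the
# merged initial/terminal state; each word of X contributes a path of
# |x| - 1 fresh states whose last transition loops back to state 0.
from collections import defaultdict

def star_membership_by_simulation(X, w):
    trans = defaultdict(lambda: defaultdict(set))
    fresh = 1
    for x in X:
        cur = 0
        for a in x[:-1]:
            trans[cur][a].add(fresh)
            cur, fresh = fresh, fresh + 1
        trans[cur][x[-1]].add(0)   # reading the last letter closes a factor
    current = {0}                  # current state of the simulated DFA
    for a in w:
        nxt = set()
        for q in current:
            nxt |= trans[q][a]
        current = nxt
    return 0 in current            # w is in X* iff we can end at the root
```

The memory used is one subset of states, of size at most ‖merge(S_X)‖, which matches the space bound of the simulation discussed above.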
4. Using string matching machines

The methods discussed in the previous section do not lead to an optimal algorithm in O(|w|). Indeed, either we use a DFA and we face a computation which can be exponential in |X|, or we simulate the DFA and we obtain an algorithm in O(|X| × |w|). We now consider a different approach which leads to a lower complexity. Another advantage of the proposed approach is that it can solve a more general problem: indeed, we may be interested in obtaining the set of all decompositions of the input word over X. This cannot be achieved using a DFA accepting X* given, for instance, by the methods described in the previous section.
Let X = {x0 , . . . , xn−1 } be a set of n words of A+ . We present in this section an algorithm, using classic pattern matching techniques, which gives all the X-decompositions of w (the decompositions of w as concatenations of words of X). Then we derive a membership test for X∗ in O(Card(X) × |w|) time complexity. In the next section, we shall study a further improvement of this algorithm.
4.1. Decompositions

The following remark is the basis of our algorithm: an X-decomposition of w is always the extension of an X-decomposition of a prefix of w.

We consider the prefix w[0..i] of length i + 1 of w. The word w[0..i] admits an X-decomposition ending with a word x_ℓ if and only if w[0..i] = f x_ℓ for a word f in X*. In other terms, w[0..i] admits an X-decomposition ending with x_ℓ if and only if x_ℓ is a suffix of w[0..i] and w[0..i − |x_ℓ|] ∈ X*. We obtain all the X-decompositions of w[0..i] by examining all the words of X which are suffixes of w[0..i] and which extend a previous X-decomposition. Of course, when w[0..i] = w, we get all the X-decompositions of w.

So the idea of the algorithm is the following: build, for each word x_ℓ ∈ X, a deterministic automaton A_ℓ accepting the language A* x_ℓ, and use an auxiliary array D of size |w| such that D[i] = {ℓ ∈ [0..n − 1] | w[0..i] ∈ X* x_ℓ}. Then testing whether w[0..i] ends with the word x_ℓ is equivalent to checking that the automaton A_ℓ is in a terminal state after reading w[0..i]. Also, testing whether w[0..i − |x_ℓ|] ∈ X* is equivalent to checking that D[i − |x_ℓ|] ≠ ∅.

In the following algorithm, the input word w is read simultaneously by all the n automata, letter by letter, from left to right. We use, for technical convenience, an additional element D[−1] initialized to an arbitrary nonempty set (for instance {∞}), meaning that the prefix ε of w is always in X*. At the end of the scanning of w, provided D[|w| − 1] ≠ ∅, we can process the array D from the end to the beginning and recover all the X-decompositions, for instance with a recursive procedure like PRINTALLDECOMPOSITIONS() (see below).

For each word x_ℓ ∈ X, the automaton A_ℓ considered here is the minimal automaton which recognizes the language A* x_ℓ. This automaton is defined by A_ℓ = (Q_ℓ = Pref(x_ℓ), δ_ℓ, i_ℓ = ε, t_ℓ = x_ℓ), where the transition function δ_ℓ is defined, for p ∈ Pref(x_ℓ) and a ∈ A, by
δ_ℓ(p, a) = the longest suffix of pa which belongs to Pref(x_ℓ).

We use these principles in the following algorithm.
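The automaton A_ℓ can be built directly from this definition of δ_ℓ. The sketch below is our own Python (not from the paper); it computes the longest suffix naively by dropping letters from the left, rather than with the usual failure-function construction.

```python
# Sketch: the minimal DFA A_l recognizing A* x_l for a single word x_l.
# States are the prefixes of x_l and
#   delta(p, a) = the longest suffix of pa that is a prefix of x_l.

def single_word_automaton(x, alphabet):
    prefs = [x[:i] for i in range(len(x) + 1)]
    pref_set = set(prefs)
    delta = {}
    for p in prefs:
        for a in alphabet:
            q = p + a
            while q not in pref_set:   # drop letters from the left
                q = q[1:]              # first hit = longest such suffix
            delta[(p, a)] = q
    return delta, "", x                # transitions, initial, terminal

def ends_with(w, x, alphabet):
    # w[0..i] ends with x iff A_l is in its terminal state after reading w.
    delta, state, t = single_word_automaton(x, alphabet)
    for a in w:
        state = delta[(state, a)]
    return state == t
```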
ISDECOMPOSEDALL(w, X = {x_0, ..., x_{n−1}})
 1  Preprocessing step
 2  for ℓ ← 0 to n − 1 do
 3      A_ℓ ← AUTOMATONFROMWORD(x_ℓ)
 4  Main loop
 5  for ℓ ← 0 to n − 1 do
 6      p_ℓ is the current state of the automaton A_ℓ
 7      p_ℓ ← i_ℓ
 8  D[−1] ← {∞}
 9  for i ← 0 to |w| − 1 do
10      D[i] ← ∅
11      for ℓ ← 0 to n − 1 do
12          p_ℓ ← δ_ℓ(p_ℓ, w[i])
13          if p_ℓ = t_ℓ and D[i − |x_ℓ|] ≠ ∅ then
14              D[i] ← D[i] ∪ {ℓ}
15  return D

The algorithm returns an array of size O(Card(X) × |w|). The preprocessing step which builds the automata requires time O(|X|) and space O(|X| × Card(A)) (or O(|X|) if the automata are represented with the help of a failure function, as is usual in stringology [4,3]).

Note that we do not need to build all the automata A_ℓ in the preprocessing step. We can also choose to construct, in a lazy way, the accessible part of the automata (corresponding, for each automaton A_ℓ, to the prefixes of x_ℓ occurring in w) along the processing of the input word w. For the sake of clarity, we have chosen to distinguish the preprocessing step from the rest. In view of this remark, we could omit the complexity O(|X|) of the preprocessing step in the following proposition.

Proposition 4. The time and space complexity of the algorithm ISDECOMPOSEDALL() is O(Card(X) × |w| + |X|).

Given the array D computed by the procedure ISDECOMPOSEDALL() for a word w, it is quite straightforward to print all the decompositions of w thanks to the following two procedures:

PRINTALLDECOMPOSITIONS(w, X = {x_0, ..., x_{n−1}})
 1  D ← ISDECOMPOSEDALL(w, X)
 2  L ← emptyList
 3  RECPRINTALLDECOMPOSITIONS(D, |w| − 1, L)

RECPRINTALLDECOMPOSITIONS(D, h, L)
 1  if h = −1 then
 2      PRINT(L)
 3  else for j ∈ D[h] do
 4      RECPRINTALLDECOMPOSITIONS(D, h − |x_j|, x_j · L)
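A possible Python transcription of these procedures is sketched below (our own code: the automata A_ℓ are replaced by direct suffix comparisons, which keeps the role of the array D intact but costs a factor |x_ℓ| per test; D is shifted by one so that D[0] stands for the paper's D[−1]).

```python
# Sketch of IsDecomposedAll and the recursive printing procedure.

def is_decomposed_all(w, X):
    D = [set() for _ in range(len(w) + 1)]
    D[0] = {float("inf")}          # the empty prefix is always in X*
    for i in range(1, len(w) + 1):
        for l, x in enumerate(X):
            # x_l ends w[0..i-1] and the remaining prefix is in X*
            if i >= len(x) and w[i - len(x):i] == x and D[i - len(x)]:
                D[i].add(l)
    return D

def rec_print_all(D, X, h, L, out):
    if h == 0:
        out.append(list(L))        # a complete X-decomposition
    else:
        for j in sorted(D[h]):
            rec_print_all(D, X, h - len(X[j]), [X[j]] + L, out)

def all_decompositions(w, X):
    D = is_decomposed_all(w, X)
    out = []
    if D[len(w)]:
        rec_print_all(D, X, len(w), [], out)
    return out
```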
For a word w belonging to X*, the procedure PRINTALLDECOMPOSITIONS() prints every X-decomposition of w in the form x_{i_0} · x_{i_1} · · · x_{i_p}. If we want only one X-decomposition of w, it suffices to store in D[i] only one word x of X corresponding to an X-decomposition of w[0..i] ending with this x. The space required for the array then becomes O(|w|).

4.2. Membership test

When we are interested only in testing the membership of w in X*, we can simply use a Boolean array D, setting D[i] = true if and only if there exists x ∈ X such that w[0..i] ∈ X* x. Moreover, it suffices to use a circular Boolean array D[0..k] with k = max_{x∈X} |x| (instead of |w| + 1), and to compute indexes in this array modulo k + 1 (which means that, for m ∈ Z, one has D[m] = D[r] with 0 ≤ r < k + 1 and r ≡ m (mod k + 1)). This yields the following algorithm.

MEMBERSHIP(w, X = {x_0, ..., x_{n−1}})
 1  Preprocessing step
 2  for ℓ ← 0 to n − 1 do
 3      A_ℓ ← AUTOMATONFROMWORD(x_ℓ)
 4  Main loop
 5  for ℓ ← 0 to n − 1 do
 6      p_ℓ is the current state of the automaton A_ℓ
 7      p_ℓ ← i_ℓ
 8  D[−1] ← true
 9  for i ← 0 to |w| − 1 do
10      D[i] ← false
11      for ℓ ← 0 to n − 1 do
12          p_ℓ ← δ_ℓ(p_ℓ, w[i])
13      ℓ ← 0
14      do  if p_ℓ = t_ℓ and D[i − |x_ℓ|] = true then
15              D[i] ← true
16          ℓ ← ℓ + 1
17      while ℓ < n and D[i] = false
18  return D[|w| − 1]

We can easily modify the algorithm, while preserving the same complexity, by exiting whenever all the elements of the array D from 0 to k are false. In this case, w ∉ X*. The following proposition gives the complexity of the above algorithm.

Proposition 5. The time complexity of the algorithm MEMBERSHIP() is O(Card(X) × |w| + |X|).

The analysis of the space complexity shows that, except for the preprocessing step, the algorithm needs only O(max_{x∈X} |x|) additional space. In particular, the space complexity is independent of the length of the input word.
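The circular-array variant might look as follows in Python (again our own sketch, with direct suffix checks standing in for the automata; indices are taken modulo k + 1, and the entry for the empty prefix plays the role of D[−1]).

```python
# Sketch of the Membership test with a circular Boolean array of size
# k + 1, where k = max |x|.  The entry at position i mod (k+1) records
# whether the prefix of length i of w belongs to X*; old entries are
# overwritten once they can no longer be consulted.

def membership(w, X):
    k = max(len(x) for x in X)
    D = [False] * (k + 1)
    D[0] = True                    # the empty prefix is in X*
    for i in range(1, len(w) + 1):
        D[i % (k + 1)] = False
        for x in X:
            if i >= len(x) and w[i - len(x):i] == x and D[(i - len(x)) % (k + 1)]:
                D[i % (k + 1)] = True
                break              # one witness suffices
    return D[len(w) % (k + 1)]
```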
5. String matching automaton

In the preceding section, we used for each word x_ℓ ∈ X a distinct automaton A_ℓ recognizing A* x_ℓ. To get a more efficient algorithm, we resort in this section to the well-known Aho–Corasick algorithm [1], which builds from a finite set of words X a deterministic complete automaton (not necessarily minimal) A_X recognizing the language A*X. This automaton is the basis of many efficient algorithms on string matching problems and is often called the string matching automaton. It is a generalization of the automaton A_ℓ associated to a single word. Let us briefly recall its construction. We let A_X = (Pref(X), δ, ε, Pref(X) ∩ A*X) be the automaton where the set of states is Pref(X), the initial state is ε, the set of final states is Pref(X) ∩ A*X, and the transition function δ is defined by
δ(p, a) = the longest suffix of pa which belongs to Pref(X).

We associate to each word u ∈ A*, u ≠ ε, the word Border_X(u), or simply Border(u) when there is no ambiguity, defined by

Border(u) = the longest proper suffix of u which belongs to Pref(X).

The automaton A_X can easily be built from the tree T_X (cf. Section 3) of X by a breadth-first exploration using the Border function. Indeed, one has

δ(p, a) = pa                  if pa ∈ Pref(X),
          δ(Border(p), a)     if p ≠ ε and pa ∉ Pref(X),
          ε                   otherwise.

A state p is terminal for A_X if p is a word of X (i.e. p is terminal in the tree T_X of X) or if a proper suffix of p is a word of X. The automaton A_X can be built in time and space complexity O(|X|) if we use the function Border as a failure function (see [4,3] for implementation details). We will say, for simplicity, that a state of the automaton is marked if it corresponds to a word of X, and not marked otherwise.

A major difference induced by the Aho–Corasick automaton is that a terminal state p, marked or not, corresponds to an ordered set Suff(p) ∩ X of suffixes of p. The order considered is the suffix relation: u precedes v when v is a proper suffix of u. We denote by SuffixChain(p) the sequence of words in Suff(p) ∩ X ordered by this relation. To find easily the words of SuffixChain(p), we associate to each terminal state p of A_X the state

SuffixLink(p) = the longest proper suffix of p which belongs to X.

Thus we have
SuffixLink(p) = Border(p)               if Border(p) ∈ X,
                SuffixLink(Border(p))   if Border(p) ∉ X and Border(p) ≠ ε,
                undefined               otherwise.
Since SuffixLink(p) is computed in time O(|p|), the preprocessing can be done in time and space complexity O(|X|), i.e. the complexity of the Aho–Corasick algorithm.
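The whole test can be sketched in Python as follows. This is our own code, not the paper's: Border is computed naively from the set of prefixes instead of by the linear-time breadth-first construction, and suffix_chain enumerates SuffixChain(p) by following Border links.

```python
# Sketch of the Aho-Corasick-based membership test for X*.

def ac_membership(w, X):
    Xs = set(X)
    prefs = {""} | {x[:i] for x in Xs for i in range(1, len(x) + 1)}

    def border(p):                 # longest proper suffix of p in Pref(X)
        for j in range(1, len(p) + 1):
            if p[j:] in prefs:
                return p[j:]
        return ""

    def delta(p, a):               # transition of the automaton A_X
        while True:
            if p + a in prefs:
                return p + a
            if p == "":
                return ""
            p = border(p)

    def suffix_chain(p):           # the words of X that are suffixes of p
        while p:
            if p in Xs:
                yield p
            p = border(p)

    D = [False] * (len(w) + 1)     # D[i]: prefix of length i is in X*
    D[0] = True
    state = ""
    for i, a in enumerate(w, 1):
        state = delta(state, a)
        D[i] = any(D[i - len(x)] for x in suffix_chain(state))
    return D[len(w)]
```

For the set X = {a^2, a^4 b, a^3 ba, a^2 b, ab} and w = a^5 b of Example 7 below, the test follows the suffix chain of a^4 b and answers true.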
Fig. 4. For the set X = {a^2, a^4 b, a^3 ba, a^2 b, ab} of Example 7: tree T_X (left), Aho–Corasick automaton with links Border (middle) and the new links SuffixLink (right) to add to the Aho–Corasick automaton.
To decide whether an input word w belongs to X* or not (and possibly get its X-decompositions), we use the same technique as in the previous section, considering this time the automaton A_X (instead of the n automata A_ℓ). The immediate advantage is that each letter of the word w is read only once (meaning that only one transition is made in the automaton), whereas each letter was read n times before (once per automaton A_ℓ).

Let us suppose that, for the current prefix w[0..i] of w, the automaton A_X ends in a terminal state p. This means that w[0..i] = fp with f ∈ A* and p the longest suffix of w[0..i] in Pref(X) ∩ A*X. Consequently, w[0..i] ∈ X* if and only if w[0..i − |x|] ∈ X* for at least one word x of SuffixChain(p). This is easily checked using the marking of terminal states (whether they correspond exactly to a word of X or not), the function SuffixLink(p) and the array D (which plays exactly the same role as in the previous section). This yields our main result, stated in the following proposition.

Proposition 6. Let X be a finite set of words on A. The membership test of a word w in X* can be done in time O(r × |w| + |X|), where r is the maximal length of the suffix chains in X. The space complexity includes O(|X|) for the preprocessing step (building the Aho–Corasick automaton) and O(max_{x∈X} |x|) for the rest of the algorithm.

If X is a suffix code, the complexity, except for the preprocessing step, becomes O(|w|), which is optimal, whereas the worst case happens when all words are suffixes of one another, giving the same complexity O(Card(X) × |w|) as in the previous section. Note also that, in the particular case where X is a prefix code, it is easy to solve the membership problem for X* in an optimal time O(|w|) after an O(|X|) preprocessing step.

Example 7. Let X = {a^2, a^4 b, a^3 ba, a^2 b, ab}.
For the word w = a^5 b, it is necessary to follow the suffix chain SuffixChain(a^4 b) = (a^4 b, a^2 b, ab), since after parsing w the automaton is in the state corresponding to a^4 b, and the unique X-decomposition is a^5 b = a^2 · a^2 · ab. Fig. 4 shows the tree T_X (left), the automaton A_X with the links representing the
failure function Border (middle) and the links SuffixLink representing the suffix chains (right) to add to the Aho–Corasick automaton.

Acknowledgements

We thank the referee for pointing out to us the reference to Shallit [15] used in Example 3. The style for algorithms is algochl.sty from [3], and the automata are drawn with GasTeX.

References

[1] A.V. Aho, M.J. Corasick, Efficient string matching: an aid to bibliographic search, Commun. ACM 18 (6) (1975) 333–340.
[2] A. Apostolico, R. Giancarlo, Pattern matching implementation of a fast test for unique decipherability, Inform. Process. Lett. 18 (1984) 155–158.
[3] M. Crochemore, C. Hancart, T. Lecroq, Algorithmique du texte, Vuibert, 2001, 347pp.
[4] M. Crochemore, W. Rytter, Jewels of Stringology, World Scientific, Hong Kong, 2002, 310pp.
[5] Z. Galil, Open problems in stringology, in: A. Apostolico, Z. Galil (Eds.), Combinatorial Algorithms on Words, Springer, Berlin, 1985, pp. 1–8.
[6] C.M. Hoffmann, A note on unique decipherability, in: MFCS, Lecture Notes in Computer Science, Vol. 176, Springer, Berlin, New York, 1984, pp. 50–63.
[7] J. Hopcroft, R. Motwani, J. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, MA, 2001.
[8] R. McCloskey, An O(n^2) time algorithm for deciding whether a regular language is a code, J. Comput. Inform. 2 (1) (1996) 79–89. Special issue: Proc. Eighth Internat. Conf. on Computing and Information, ICCI'96.
[9] D. Perrin, Finite automata, in: J. van Leeuwen (Ed.), Handbook of Theoretical Computer Science, Vol. B: Formal Models and Semantics, Elsevier, Amsterdam, 1990, pp. 1–57.
[10] D. Revuz, Minimisation of acyclic deterministic automata in linear time, Theoret. Comput. Sci. 92 (1) (1992) 181–189.
[11] M. Rodeh, A fast test for unique decipherability based on suffix trees, IEEE Trans. Inform. Theory 28 (1982) 648–651.
[12] J. Sakarovitch, Eléments de théorie des automates, Vuibert, 2003.
[13] A. Sardinas, G.
Patterson, A necessary and sufficient condition for the unique decomposition of coded messages, in: IRE Convention Record, Part 8, 1953, pp. 104–108.
[14] M.-P. Schützenberger, A property of finitely generated submonoids of free monoids, in: G. Pollak (Ed.), Algebraic Theory of Semigroups, Proc. Sixth Algebraic Conf., Szeged, 1976, North-Holland, Amsterdam, 1979, pp. 545–576.
[15] J. Shallit, Regular expressions, enumeration and state complexity, invited talk at the Ninth Internat. Conf. on Implementation and Application of Automata (CIAA 2004), Queen's University, Kingston, Ontario, Canada, July 22–24, 2004.
[16] S. Yu, Regular languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Springer, Berlin, New York, 1997, pp. 41–110.
[17] S. Yu, State complexity of regular languages, in: Proc. Descriptional Complexity of Automata, Grammars and Related Structures, 1999, pp. 77–88.
Theoretical Computer Science 340 (2005) 443 – 456 www.elsevier.com/locate/tcs
A topological approach to transductions

Jean-Éric Pin^a,*, Pedro V. Silva^b

^a LIAFA, Université Paris VII and CNRS, Case 7014, 2 Place Jussieu, 75251 Paris Cedex 05, France
^b Centro de Matemática, Faculdade de Ciências, Universidade do Porto, R. Campo Alegre 687, 4169-007 Porto, Portugal
Abstract

This paper is a contribution to the mathematical foundations of the theory of automata. We give a topological characterization of the transductions τ from a monoid M into a monoid N such that, if R is a recognizable subset of N, τ^{-1}(R) is a recognizable subset of M. We impose two conditions on the monoids, which are fulfilled in all cases of practical interest: the monoids must be residually finite and, for every positive integer n, must have only finitely many congruences of index n. Our solution proceeds in two steps. First we show that such a monoid, equipped with the so-called Hall distance, is a metric space whose completion is compact. Next we prove that τ can be lifted to a map τ̂ from M into the set of compact subsets of the completion of N. This latter set, equipped with the Hausdorff metric, is again a compact monoid. Finally, our main result states that τ^{-1} preserves recognizable sets if and only if τ̂ is continuous.
© 2005 Elsevier B.V. All rights reserved.
1. Introduction

This paper is a contribution to the mathematical foundations of automata theory. We are mostly interested in the study of transductions τ from a monoid M into another monoid N such that, for every recognizable subset R of N, τ^{-1}(R) is a recognizable subset of M. We propose to call such transductions continuous, a term introduced in [7] in the case where M is a finitely generated free monoid.
∗ Corresponding author.
E-mail addresses: [email protected] (J.-É. Pin), [email protected] (P.V. Silva). 0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2005.03.029
J.-É. Pin, P.V. Silva / Theoretical Computer Science 340 (2005) 443 – 456
In mathematics, the word "continuous" generally refers to a topology. The aim of this paper is to find appropriate topologies for which our use of the term continuous coincides with its usual topological meaning. This problem was already solved when τ is a mapping from A* into B*. In this case, a result which goes back at least to the 1980s (see [14]) states that τ is continuous in our sense if and only if it is continuous for the profinite topology on A* and B*. We shall not attempt to define here the profinite topology, and the reader is referred to [3,4,21] for more details. This result actually extends to mappings from A* into a residually finite monoid N, thanks to a result of Berstel et al. [7] recalled below (Proposition 2.3).

However, a transduction τ: M → N is not a map from M into N, but a map from M into the set of subsets of N, which calls for a more sophisticated solution, since it does not suffice to find an appropriate topology on N. Our solution proceeds in two steps. We first show, under fairly general assumptions on M and N, which are fulfilled in all cases of practical interest, that M and N can be equipped with a metric, the Hall metric, for which they become metric monoids whose completion (as metric spaces) is compact. Next we prove that τ can be lifted to a map τ̂ from M into the monoid K(N̂) of compact subsets of N̂, the completion of N. The monoid K(N̂), equipped with the Hausdorff metric, is again a compact monoid. Finally, our main result states that τ is continuous in our sense if and only if τ̂ is continuous in the topological sense.

Our paper is organised as follows. Basic results on recognizable sets and transductions are recalled in Section 2. Section 3 is devoted to topology and is divided into several subsections: Section 3.1 is a reminder of basic notions in topology; metric monoids and the Hall metric are introduced in Sections 3.2 and 3.3, respectively. The connections between clopen and recognizable sets are discussed in Section 3.5, and Section 3.6 deals with the monoid of compact subsets of a compact monoid. Our main result on transductions is presented in Section 4. Examples like the transductions (x, n) ↦ x^n and x ↦ x^* are studied in Section 5. The paper ends with a short conclusion.
2. Recognizable languages and transductions

Recall that a subset P of a monoid M is recognizable if there exist a finite monoid F, a monoid morphism φ: M → F and a subset Q of F such that P = φ^{-1}(Q). The set of recognizable subsets of M is denoted by Rec(M). Recognizable subsets are closed under Boolean operations, quotients and inverse morphisms. By Kleene's theorem, a subset of a finitely generated free monoid is recognizable if and only if it is rational. The description of the recognizable subsets of a product of monoids was given by Mezei (see [5, p. 54] for a proof).

Theorem 2.1 (Mezei). Let M_1, ..., M_n be monoids. A subset of M_1 × · · · × M_n is recognizable if and only if it is a finite union of subsets of the form R_1 × · · · × R_n, where R_i ∈ Rec(M_i).

The following result is perhaps less known (see [5, p. 61]).
Proposition 2.2. Let A_1, ..., A_n be finite alphabets. Then Rec(A_1^* × A_2^* × · · · × A_n^*) is closed under concatenation product.

Given two monoids M and N, recall that a transduction τ from M into N is a relation on M and N, which we shall also consider as a map from M into the monoid of subsets of N. If X is a subset of M, we set

τ(X) = ⋃_{x∈X} τ(x).

Observe that "transductions commute with union": if (X_i)_{i∈I} is a family of subsets of M, then

τ(⋃_{i∈I} X_i) = ⋃_{i∈I} τ(X_i).
If τ: M → N is a transduction, then the inverse relation τ^{-1}: N → M is also a transduction, and if P is a subset of N, the following formula holds:

τ^{-1}(P) = {x ∈ M | τ(x) ∩ P ≠ ∅}.

A transduction τ: M → N preserves recognizable sets if, for every set R ∈ Rec(M), τ(R) ∈ Rec(N). It is said to be continuous if τ^{-1} preserves recognizable sets, that is, if for every set R ∈ Rec(N), τ^{-1}(R) ∈ Rec(M). Continuous transductions were characterized in [7] when M is a finitely generated free monoid. Recall that a transduction τ: M → N is rational if it is a rational subset of M × N. According to Berstel et al. [7], a transduction τ: A* → N is residually rational if, for any morphism φ: N → F, where F is a finite monoid, the transduction φ ∘ τ: A* → F is rational. We can now state:

Proposition 2.3 (Berstel et al. [7]). A transduction τ: A* → N is continuous if and only if it is residually rational.

3. Topology

The aim of this section is to give a topological characterization of the transductions τ from a monoid into another monoid such that τ^{-1} preserves recognizable sets. Even if topology is undoubtedly part of the background of the average mathematician, it is probably not a daily concern of the specialists in automata theory to whom this paper is addressed. For those readers whose memories of topology might be somewhat blurry, we start with a brief overview of some key concepts in topology used in this paper.

3.1. Basic notions in topology

A metric d on a set E is a map from E into the set of nonnegative real numbers satisfying the three following conditions, for all (x, y, z) ∈ E^3:
(1) d(x, y) = 0 if and only if x = y,
(2) d(y, x) = d(x, y),
(3) d(x, z) ≤ d(x, y) + d(y, z).
A metric is an ultrametric if (3) is replaced by the stronger condition
(3′) d(x, z) ≤ max{d(x, y), d(y, z)}.
A metric space is a set E together with a metric d on E. Given a positive real number ε and an element x in E, the open ball of center x and radius ε is the set B(x, ε) = {y ∈ E | d(x, y) < ε}.

A function φ from a metric space (E, d) into another metric space (E′, d′) is uniformly continuous if, for every ε > 0, there exists δ > 0 such that, for all (x, x′) ∈ E^2, d(x, x′) < δ implies d′(φ(x), φ(x′)) < ε. It is an isometry if, for all (x, x′) ∈ E^2, d′(φ(x), φ(x′)) = d(x, x′).

A sequence (x_n)_{n≥0} of elements of E converges to a limit x ∈ E if, for every ε > 0, there exists N such that, for all integers n > N, d(x_n, x) < ε. It is a Cauchy sequence if, for every positive real number ε > 0, there is an integer N such that, for all integers p, q ≥ N, d(x_p, x_q) < ε. A metric space E is said to be complete if every Cauchy sequence of elements of E converges to a limit. For any metric space E, one can construct a complete metric space Ê containing E as a dense subspace^1 and satisfying the following universal property: if F is any complete metric space and φ is any uniformly continuous function from E to F, then there exists a unique uniformly continuous function φ̂: Ê → F which extends φ. The space Ê is determined up to isometry by this property, and is called the completion of E.

Metric spaces are a special instance of the more general notion of topological space. A topology on a set E is a set T of subsets of E, called the open sets of the topology, satisfying the following conditions:
(1) ∅ and E are in T,
(2) T is closed under arbitrary union,
(3) T is closed under finite intersection.
The complement of an open set is called a closed set. The closure of a subset X of E, denoted by X̄, is the intersection of the closed sets containing X. A subset of E is dense if its closure is equal to E.
A topological space is a set E together with a topology on E. A map from a topological space into another one is continuous if the inverse image of each open set is an open set. A basis for a topology on E is a collection B of open subsets of E such that every open set is the union of elements of B. The open sets of the topology generated by B are by definition the arbitrary unions of elements of B. In the case of a metric space, the open balls form a basis of the topology.

A topological space (E, T) is Hausdorff if for each u, v ∈ E with u ≠ v, there exist disjoint open sets U and V such that u ∈ U and v ∈ V. A family of open sets (U_i)_{i∈I} is said to cover a topological space (E, T) if E = ⋃_{i∈I} U_i. A topological space (E, T) is said to be compact if it is Hausdorff and if, for each family of open sets covering E, there exists a finite subfamily that still covers E. To conclude this section, we remind the reader of a classical result on compact sets.

^1 See definition below.
Proposition 3.1. Let T and T′ be two topologies on a set E. Suppose that (E, T) is compact and that (E, T′) is Hausdorff. If T′ ⊆ T, then T = T′.

Proof. Consider the identity map ι from (E, T) into (E, T′). It is a continuous map, since T′ ⊆ T. Therefore, if F is closed in (E, T), it is compact, and its continuous image ι(F) in the Hausdorff space (E, T′) is also compact, and hence closed. Thus ι^{-1} is also continuous, whence T = T′.

3.2. Metric monoids

Let M be a monoid. A monoid morphism φ: M → N separates two elements u and v of M if φ(u) ≠ φ(v). By extension, we say that a monoid N separates two elements of M if there exists a morphism φ: M → N which separates them. A monoid is residually finite if any pair of distinct elements of M can be separated by a finite monoid. Residually finite monoids include finite monoids, free monoids, free groups and many others. They are closed under direct products, and thus monoids of the form A_1^* × A_2^* × · · · × A_n^* are also residually finite.

A metric monoid is a monoid equipped with a metric for which its multiplication is uniformly continuous. Finite monoids, equipped with the discrete metric, are examples of metric monoids. More precisely, if M is a finite monoid, the discrete metric d is defined by d(s, t) = 0 if s = t and d(s, t) = 1 otherwise. In the sequel, we shall systematically consider finite monoids as metric monoids. Morphisms between metric monoids are required to be uniformly continuous.

3.3. Hall metric

Any residually finite monoid M can be equipped with the Hall metric d, defined as follows. We first set, for all (u, v) ∈ M^2:

r(u, v) = min{Card(N) | N separates u and v}.

Then we set d(u, v) = 2^{−r(u,v)}, with the usual conventions min ∅ = +∞ and 2^{−∞} = 0. Let us first establish some general properties of d.

Proposition 3.2. In a residually finite monoid M, d is an ultrametric. Furthermore, the relations d(uw, vw) ≤ d(u, v) and d(wu, wv) ≤ d(u, v) hold for every (u, v, w) ∈ M^3.

Proof. It is clear that d(u, v) = d(v, u).
Suppose that d(u, v) = 0. Then u cannot be separated from v by any finite monoid, and since M is residually finite, this shows that u = v. Finally, let (u, v, w) ∈ M^3. First assume that u ≠ w. Since M is residually finite, u and w can be separated by some finite monoid F. Therefore F separates either u and v, or v and w. It follows that min{r(u, v), r(v, w)} ≤ r(u, w), and hence d(u, w) ≤ max{d(u, v), d(v, w)}. This relation clearly also holds if u = w.
The second assertion is trivial: a finite monoid separating uw and vw certainly separates u and v. Therefore d(uw, vw) ≤ d(u, v) and, dually, d(wu, wv) ≤ d(u, v).

The next two propositions state two fundamental properties of the Hall metric.

Proposition 3.3. Multiplication on M is uniformly continuous for the Hall metric. Thus (M, d) is a metric monoid.

Proof. It is a consequence of the following relation

d(uv, u′v′) ≤ max{d(uv, uv′), d(uv′, u′v′)} ≤ max{d(v, v′), d(u, u′)}

which follows from Proposition 3.2.

Proposition 3.4. Let M be a residually finite monoid. Then any morphism from (M, d) onto a finite discrete monoid is uniformly continuous.

Proof. Let φ be a morphism from M onto a finite monoid F. Then, by definition of d, d(u, v) < 2^{−|F|} implies φ(u) = φ(v). Thus φ is uniformly continuous.

The completion of the metric space (M, d), denoted by (M̂, d̂), is called the Hall completion of M. Since multiplication on M is uniformly continuous, it extends, in a unique way, into a multiplication on M̂, which is again uniformly continuous. In particular, M̂ is a metric, complete monoid. Similarly, Proposition 3.4 extends to M̂: any morphism from (M̂, d̂) onto a finite discrete monoid is uniformly continuous.

We now characterize the residually finite monoids M such that M̂ is compact.

Proposition 3.5. Let M be a residually finite monoid. Then M̂ is compact if and only if, for every positive integer n, there are only finitely many congruences of index n on M.
Then r(x, y) > n and thus the monoids of size ≤ n cannot separate x from y. It follows that x ∼ y for each ∼ ∈ C_n and thus x ∼_n y. Therefore ∼_n is a congruence of finite index, whose index is at most |F|. Now each congruence of C_n is coarser than ∼_n and, since there are only finitely many congruences coarser than ∼_n, C_n is finite.

Conversely, assume that, for every positive integer n, there are only finitely many congruences of index ≤ n on M. Given ε > 0, let n be an integer such that 2^{-n} < ε. Since C_n is finite, ∼_n is a congruence of finite index on M. Let F be a finite set of representatives of the classes of ∼_n. If x ∈ F and x ∼_n y, then φ(x) = φ(y) for each morphism φ from M onto a monoid of size ≤ n. Thus r(x, y) > n and so d(x, y) < 2^{-n} < ε. It follows that M is covered by a finite number of open balls of radius ε. Therefore M̂ is compact. □

An important sufficient condition is given in the following corollary.
Corollary 3.6. Let M be a residually finite monoid. If M is finitely generated, then M̂ is compact.

Proof. Let n > 0. There are only finitely many monoids of size n. Since M is finitely generated, there are only finitely many morphisms from M onto a monoid of size n. Now, since any congruence of index n is the kernel of such a morphism, there are only finitely many congruences on M of index n. It follows by Proposition 3.5 that M̂ is compact. □

3.4. Hall-compact monoids

Proposition 3.5 justifies the following terminology. We will say that a monoid M is Hall-compact if it is residually finite and if, for every positive integer n, there are only finitely many congruences of index n on M. Proposition 3.5 can now be rephrased as follows:

"A residually finite monoid M is Hall-compact if and only if M̂ is compact."

and Corollary 3.6 states that

"Every residually finite and finitely generated monoid is Hall-compact."

The class of Hall-compact monoids includes most of the examples used in practice: finitely generated free monoids (resp. groups), finitely generated free commutative monoids (resp. groups), finite monoids, trace monoids, finite products of such monoids, etc. The next proposition shows that the converse to Corollary 3.6 does not hold.

Proposition 3.7. There exists a residually finite, nonfinitely generated monoid M such that M̂ is compact.

Proof. Let P be the set of all prime numbers and let M = ⊕_{p∈P} Z/pZ, where Z/pZ denotes the additive cyclic group of order p. It is clear that M is residually finite. Furthermore, in a finitely generated commutative group, the subgroup consisting of all elements of finite period is finite [12]. It follows that M is not finitely generated. Let n > 0 and let φ: M → N be a morphism from M onto a finite monoid of size n. Since M is a commutative group, N is also a commutative group. For every prime p > n, the order of the image of a generator of Z/pZ must divide p and be ≤ n; hence the image of this generator must be 0.
Consequently, any such morphism is determined by the images of the generators of Z/pZ for p ≤ n, and so there are only finitely many of them. Therefore there are only finitely many congruences on M of index n and so M̂ is compact by Proposition 3.5. □

3.5. Clopen sets versus recognizable sets

Recall that a clopen subset of a topological space is a subset which is both open and closed. A topological space is zero-dimensional if its clopen subsets form a basis for its topology.

Proposition 3.8. Let M be a residually finite monoid. Then (M, d) and (M̂, d̂) are zero-dimensional.
Proof. The open balls of the form

B(x, 2^{-n}) = {y ∈ M | d(x, y) < 2^{-n}},
B(x, 2^{-n}) = {y ∈ M̂ | d̂(x, y) < 2^{-n}},

where x belongs to M (resp. M̂) and n is a positive integer, form a basis of the Hall topology of M (resp. M̂). But these balls are clopen since

{y | d(x, y) < 2^{-n}} = {y | d(x, y) ≤ 2^{-(n+1)}}.

It follows that (M, d) and (M̂, d̂) are zero-dimensional. □
Proposition 3.8 implies that if M is a Hall-compact monoid, then M̂ is profinite (see [1,3,4,21] for the definition of profinite monoids and several equivalent properties), but we will not use this result in this paper.

We now give three results relating clopen sets and recognizable sets. The first one is due to Hunter [9, Lemma 4], the second one summarizes results due to Numakura [13] (see also [17,2]) and the third result is stated in [3] for free profinite monoids. For the convenience of the reader, we present a self-contained proof of the second and the third results.

Recall that the syntactic congruence of a subset P of a monoid M is defined, for all u, v ∈ M, by

u ∼_P v if and only if, for all (x, y) ∈ M², xuy ∈ P ⇔ xvy ∈ P.
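As an aside, this definition can be checked by brute force on small examples. The following sketch (ours, not from the paper; all names are ours) computes the ∼_P-classes of a finite monoid given by its multiplication table, by comparing, for each element u, the set of contexts (x, y) such that xuy ∈ P.

```python
# Illustrative sketch: compute the syntactic congruence classes of a subset P
# of a small finite monoid. Elements are 0..n-1 and mul[a][b] is the product.

def syntactic_classes(mul, P):
    n = len(mul)
    P = set(P)

    def contexts(u):
        # All pairs (x, y) such that x * u * y lies in P.
        return frozenset((x, y) for x in range(n) for y in range(n)
                         if mul[mul[x][u]][y] in P)

    # Two elements are syntactically equivalent iff they share the same contexts.
    classes = {}
    for u in range(n):
        classes.setdefault(contexts(u), []).append(u)
    return list(classes.values())

# Z/4Z under addition, with P the even residues:
mul = [[(a + b) % 4 for b in range(4)] for a in range(4)]
print(syntactic_classes(mul, {0, 2}))   # → [[0, 2], [1, 3]]
```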
The syntactic congruence ∼_P is the coarsest congruence of M which saturates P.

Lemma 3.9 (Hunter's Lemma [9]). In a compact monoid, the syntactic congruence of a clopen set is clopen.

Proposition 3.10. In a compact monoid, every clopen subset is recognizable. If M is a residually finite monoid, then every recognizable subset of M̂ is clopen.

Proof. Let M be a compact monoid, let P be a clopen subset of M and let ∼_P be its syntactic congruence. By Hunter's Lemma, ∼_P is clopen. Thus, for each x ∈ M, there exists an open neighborhood G of x such that G × G ⊆ ∼_P. Therefore G is contained in the ∼_P-class of x. This proves that the ∼_P-classes form an open partition of M. By compactness, this partition is finite, and hence P is recognizable.

Suppose now that M is a residually finite monoid and let P be a recognizable subset of M̂. Let φ: M̂ → F be the syntactic morphism of P. Since P is recognizable, F is finite and, by Proposition 3.4, φ is uniformly continuous. Now P = φ^{-1}(Q) for some subset Q of F. Since F is discrete and finite, Q is a clopen subset of F and hence P is also clopen. □

The last result of this subsection is a clone of a standard result on free profinite monoids (see [3] for instance).
Proposition 3.11. Let M be a Hall-compact monoid, let P be a subset of M and let P̄ be its closure in M̂. The following conditions are equivalent:
(1) P is recognizable,
(2) P = K ∩ M for some clopen subset K of M̂,
(3) P̄ is clopen in M̂ and P = P̄ ∩ M,
(4) P̄ is recognizable in M̂ and P = P̄ ∩ M.

Proof. (1) implies (2). Let φ: M → F be the syntactic morphism of P and let Q = φ(P). Since F is finite, φ is uniformly continuous by Proposition 3.4 and extends to a uniformly continuous morphism φ̂: M̂ → F. Thus K = φ̂^{-1}(Q) is clopen and satisfies K ∩ M = P.

(2) implies (3). Suppose that P = K ∩ M for some clopen subset K of M̂. Then the equality P = P̄ ∩ M follows from the following sequence of inclusions, in which cl denotes closure in M̂ (note that cl(K) = K since K is closed):

P ⊆ P̄ ∩ M = cl(K ∩ M) ∩ M ⊆ cl(K) ∩ M = K ∩ M = P.

Furthermore, since K is open and M is dense in M̂, K ∩ M is dense in K. Thus P̄ = cl(K ∩ M) = cl(K) = K. Thus P̄ is clopen in M̂.

The equivalence of (3) and (4) follows from Proposition 3.10, which shows that, in M̂, the notions of clopen set and of recognizable set are equivalent.

(4) implies (1). Let φ: M̂ → F be the syntactic morphism of P̄ and let Q = φ(P̄). Let ψ be the restriction of φ to M. Then we have P = P̄ ∩ M = φ^{-1}(Q) ∩ M = ψ^{-1}(Q). Thus P is recognizable. □

3.6. The monoid of compact subsets of a compact monoid

Let M be a compact monoid and let K(M) be the monoid of compact subsets of M. The Hausdorff metric on K(M) is defined as follows. For K, K′ ∈ K(M), let
ρ(K, K′) = sup_{x∈K} inf_{x′∈K′} d(x, x′),

h(K, K′) = max(ρ(K, K′), ρ(K′, K)) if K and K′ are nonempty,
h(K, K′) = 0 if K and K′ are empty,
h(K, K′) = 1 otherwise.

The last case occurs when one and only one of K and K′ is empty. By a standard result of topology, K(M), equipped with this metric, is compact. The next result states a property of clopen sets which will be crucial in the proof of our main result.

Proposition 3.12. Let M be a Hall-compact monoid, let C be a clopen subset of M̂ and let κ: K(M̂) → K(M̂) be the map defined by κ(K) = K ∩ C. Then κ is uniformly continuous for the Hausdorff metric.

Proof. Since C is open, every element x ∈ C belongs to some open ball B(x, ε_x) contained in C. Since M̂ is compact, C is also compact and can be covered by a finite number of these open balls, say (B(x_i, ε_i))_{1≤i≤n}. Let ε > 0 and let δ = min{1, ε, ε_1, ..., ε_n}. Suppose that h(K, K′) < δ with K ≠ K′. Then K, K′ ≠ ∅, d(x, K′) < δ for every x ∈ K and d(x′, K) < δ for every x′ ∈ K′.
Suppose that x ∈ K ∩ C. Since d(x, K′) < δ, we have d(x, x′) < δ for some x′ ∈ K′. Furthermore, x ∈ B(x_i, ε_i) for some i ∈ {1, ..., n}. Since d is an ultrametric, the relations d(x, x_i) < ε_i and d(x, x′) < ε_i imply that d(x′, x_i) < ε_i and thus x′ ∈ B(x_i, ε_i). Now, since B(x_i, ε_i) is contained in C, x′ ∈ K′ ∩ C and hence d(x, K′ ∩ C) < δ ≤ ε. By symmetry, d(x′, K ∩ C) < ε for every x′ ∈ K′ ∩ C. Hence h(K ∩ C, K′ ∩ C) < ε and κ is uniformly continuous. □

4. Transductions

Let M and N be Hall-compact monoids and let τ: M → N be a transduction. Then K(N̂), equipped with the Hausdorff metric, is also a compact monoid. Define a map τ̂: M → K(N̂) by setting, for each x ∈ M, τ̂(x) = cl(τ(x)), the closure of τ(x) in N̂.

Theorem 4.1. The transduction τ^{-1} preserves the recognizable sets if and only if τ̂ is uniformly continuous.

Proof. Suppose that τ^{-1} preserves the recognizable sets. Let ε > 0. Since N̂ is compact, it can be covered by a finite number of open balls of radius ε/2, say

N̂ = ∪_{1≤i≤k} B(x_i, ε/2).
Since N̂ is zero-dimensional by Proposition 3.8, its clopen subsets constitute a basis for its topology. Thus every open ball B(x_i, ε/2) is a union of clopen sets and N̂ is a union of clopen sets, each of which is contained in a ball of radius ε/2. By compactness, we may assume that this union is finite. Thus

N̂ = ∪_{1≤j≤n} C_j,

where each C_j is a clopen set contained in, say, B(x_{i_j}, ε/2). It follows now from Proposition 3.11 that C_j ∩ N is a recognizable subset of N. Since τ^{-1} preserves the recognizable sets, the sets L_j = τ^{-1}(C_j ∩ N) are also recognizable. By Proposition 3.4, the syntactic morphism of L_j is uniformly continuous and thus there exists δ_j such that d(u, v) < δ_j implies u ∼_{L_j} v. Taking δ = min{δ_j | 1 ≤ j ≤ n}, we have, for all (u, v) ∈ M²,

d(u, v) < δ ⇒ u ∼_{L_j} v for all j ∈ {1, ..., n}.

We claim that, whenever d(u, v) < δ, we have h(τ̂(u), τ̂(v)) < ε. By definition,

L_j = {x ∈ M | τ(x) ∩ C_j ∩ N ≠ ∅}.

Suppose first that τ(u) = ∅. Then u ∉ ∪_{1≤j≤n} L_j. Since u ∼_{L_j} v for every j, it follows that v ∉ ∪_{1≤j≤n} L_j, so τ(v) ∩ C_j ∩ N = ∅ for 1 ≤ j ≤ n. Since N = ∪_{1≤j≤n}(C_j ∩ N), it follows that τ(v) = ∅. By symmetry, we conclude that τ(u) = ∅ if and only if τ(v) = ∅. Thus we may assume that both τ(u) and τ(v) are nonempty.

Let y ∈ τ(u). Then y ∈ C_j ∩ N for some j ∈ {1, ..., n} and so u ∈ L_j. Since u ∼_{L_j} v, it follows that v ∈ L_j and
hence there exists some z ∈ τ(v) such that z ∈ C_j ∩ N. Since C_j ⊆ B(x_{i_j}, ε/2), we obtain d(x_{i_j}, y) < ε/2 and d(x_{i_j}, z) < ε/2, whence d(y, z) < ε/2 since d is an ultrametric. Thus d(y, τ(v)) < ε/2. Since τ(u) is dense in τ̂(u), it follows that d(x, τ̂(v)) ≤ ε/2 for every x ∈ τ̂(u) and so

ρ(τ̂(u), τ̂(v)) ≤ ε/2 < ε.

By symmetry, ρ(τ̂(v), τ̂(u)) < ε and hence h(τ̂(u), τ̂(v)) < ε, as required.

Next we show that if τ̂ is uniformly continuous, then τ^{-1} preserves the recognizable sets. First, τ̂ can be extended to a uniformly continuous mapping τ̌: M̂ → K(N̂). Let L be a recognizable subset of N. By Proposition 3.11, L = C ∩ N for some clopen subset C of N̂. Let

R = {K ∈ K(N̂) | K ∩ C ≠ ∅}.

We show that R is a clopen subset of K(N̂). Let κ: K(N̂) → K(N̂) be the map defined by κ(K) = K ∩ C. By Proposition 3.12, κ is uniformly continuous and, since R = κ^{-1}({∅}^c) = [κ^{-1}({∅})]^c, it suffices to show that {∅} is a clopen subset of K(N̂). Since B(∅, 1) = {∅}, {∅} is open. Let K ∈ {∅}^c. Since ∅ ∉ B(K, 1), we have B(K, 1) ⊆ {∅}^c and so {∅}^c is also open. Therefore {∅} is clopen and so is R.

Since τ̌ is continuous, τ̌^{-1}(R) is a clopen subset of M̂ and so M ∩ τ̌^{-1}(R) is recognizable by Proposition 3.11. Now

M ∩ τ̌^{-1}(R) = {u ∈ M | τ̌(u) ∈ R} = {u ∈ M | τ̂(u) ∈ R} = {u ∈ M | τ̂(u) ∩ C ≠ ∅}.

Since C is open, we have τ̂(u) ∩ C ≠ ∅ if and only if τ(u) ∩ C ≠ ∅, hence

M ∩ τ̌^{-1}(R) = {u ∈ M | τ(u) ∩ C ≠ ∅} = {u ∈ M | τ(u) ∩ L ≠ ∅} = τ^{-1}(L)

and so τ^{-1}(L) is a recognizable subset of M. Thus τ^{-1} preserves the recognizable sets. □
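To make the Hausdorff metric of Section 3.6 concrete, here is a small computational sketch (ours, not from the paper) for finite subsets, using a toy ultrametric on the integers as a stand-in for the Hall metric; all names are ours.

```python
# Hausdorff metric h on finite subsets, given an underlying metric d,
# following the three-case definition of Section 3.6.

def h(K, Kp, d):
    if not K and not Kp:
        return 0.0            # both empty
    if not K or not Kp:
        return 1.0            # exactly one empty
    # asymmetric distance rho(A, B) = max over A of the distance to B
    delta = lambda A, B: max(min(d(x, y) for y in B) for x in A)
    return max(delta(K, Kp), delta(Kp, K))

# Toy ultrametric on Z: d(x, y) = 2^{-v} where v is the 2-adic valuation
# of x - y. Like the Hall metric, it satisfies the ultrametric inequality.
def d2(x, y):
    if x == y:
        return 0.0
    n, v = abs(x - y), 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return 2.0 ** -v

print(h({0, 4}, {8}, d2))   # → 0.25
```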
5. Examples of continuous transductions

A large number of examples of continuous transductions can be found in the literature [6–8,10,11,15,16,18,20]. We state without proof two elementary results: continuous transductions are closed under composition and include the constant transductions.

Proposition 5.1. Let L ⊆ N and let τ_L: M → N be the transduction defined by τ_L(x) = L. Then τ_L is continuous.
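The continuity of the constant transduction is easy to see directly: the preimage of R is all of M when L meets R and empty otherwise, hence recognizable in both cases. A minimal sketch (ours, with hypothetical finite data) of this observation:

```python
# The constant transduction sends every x in M to the same set L, so the
# preimage of R is M itself when L intersects R, and empty otherwise --
# both trivially recognizable.
def preimage_const(L, R, M):
    return set(M) if set(L) & set(R) else set()

print(sorted(preimage_const({1, 2}, {2, 3}, {'u', 'v'})))   # → ['u', 'v']
```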
Theorem 5.2. The composition of two continuous transductions is a continuous transduction.

Continuous transductions are also closed under product, in the following sense:

Proposition 5.3. Let τ_1: M → N_1 and τ_2: M → N_2 be continuous transductions. Then the transduction τ: M → N_1 × N_2 defined by τ(x) = τ_1(x) × τ_2(x) is continuous.

Proof. Let R ∈ Rec(N_1 × N_2). By Mezei's Theorem, we have R = ∪_{i=1}^{n} K_i × L_i for some K_i ∈ Rec(N_1) and L_i ∈ Rec(N_2). Hence
τ^{-1}(R) = {x ∈ M | τ(x) ∩ R ≠ ∅}
         = {x ∈ M | (τ_1(x) × τ_2(x)) ∩ (∪_{i=1}^{n} K_i × L_i) ≠ ∅}
         = ∪_{i=1}^{n} {x ∈ M | τ_1(x) ∩ K_i ≠ ∅ and τ_2(x) ∩ L_i ≠ ∅}
         = ∪_{i=1}^{n} (τ_1^{-1}(K_i) ∩ τ_2^{-1}(L_i)).
Since τ_1 and τ_2 are continuous, each of the sets τ_1^{-1}(K_i) and τ_2^{-1}(L_i) is recognizable and thus τ^{-1}(R) is recognizable. It follows that τ is continuous. □
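The set computation in the proof above can be replayed on finite toy data. In this sketch (ours; the data and names are hypothetical), transductions are set-valued maps on a finite domain and R consists of a single product block K × L:

```python
# Verify, on toy data, that the preimage of a product block K x L under the
# product transduction equals the intersection of the individual preimages.

def preimage(tau, R):
    # x is in the preimage of R when tau(x) meets R.
    return {x for x, S in tau.items() if S & R}

tau1 = {0: {'a'}, 1: {'a', 'b'}, 2: set()}
tau2 = {0: {'y'}, 1: {'x'}, 2: {'x'}}
# product transduction: tau(x) = tau1(x) x tau2(x)
tau = {x: {(u, v) for u in tau1[x] for v in tau2[x]} for x in tau1}

K, L = {'a'}, {'x'}
R = {(u, v) for u in K for v in L}
lhs = preimage(tau, R)
rhs = preimage(tau1, K) & preimage(tau2, L)
assert lhs == rhs == {1}
```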
Further examples will be presented in a forthcoming paper. We just mention here a simple but nontrivial example. An automata-theoretic proof of this result was given in [19]; we provide here a purely algebraic proof.

Proposition 5.4. The function σ: M × N → M defined by σ(x, n) = x^n is continuous.

Proof. Let R ∈ Rec(M). Then
σ^{-1}(R) = {(x, n) ∈ M × N | x^n ∈ R}.

Let φ: M → F be the syntactic morphism of R and, for each s ∈ F, let P_s = {n ∈ N | s^n ∈ φ(R)}. Then we have

σ^{-1}(R) = {(x, n) ∈ M × N | x^n ∈ R}
         = {(x, n) ∈ M × N | φ(x) = s for some s ∈ F such that s^n ∈ φ(R)}
         = {(x, n) ∈ M × N | x ∈ φ^{-1}(s) for some s ∈ F such that n ∈ P_s}
         = ∪_{s∈F} φ^{-1}(s) × P_s.
Each set φ^{-1}(s) is recognizable by construction, and thus it suffices to show that P_s ∈ Rec(N) for each s ∈ F. Given a finite cyclic monoid generated by a and some element b of this monoid, the set {n ∈ N | a^n = b} is either empty or an arithmetic progression. Applying this fact to the finite cyclic submonoid of F generated by s, and observing that P_s is the finite union of the sets {n ∈ N | s^n = b} for b ∈ φ(R), we conclude that P_s ∈ Rec(N), as required. Thus σ^{-1}(R) ∈ Rec(M × N) and hence σ is continuous. □
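The fact about finite cyclic monoids used above is easy to check numerically. In this sketch (ours, not from the paper), the cyclic monoid generated by a is modelled by multiplication modulo m, and exponents are truncated at an arbitrary bound:

```python
# In a finite cyclic monoid, {n >= 1 | a^n = b} is empty or an arithmetic
# progression. Here the monoid generated by a is taken modulo m.
def exponents(a, b, m, limit=60):
    return [n for n in range(1, limit) if pow(a, n, m) == b]

E = exponents(2, 4, 12)     # powers of 2 mod 12: 2, 4, 8, 4, 8, ...
print(E[:4])                # → [2, 4, 6, 8]
```

Here the progression has first term 2 and common difference 2, reflecting the index and period of the cyclic submonoid generated by 2 modulo 12.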
Corollary 5.5. The transduction π: M → M defined by π(x) = x* is continuous.
Proof. Let τ_N: M → N be the constant transduction defined by τ_N(x) = N. By Proposition 5.1, τ_N is continuous. Since the identity map is trivially continuous, it follows from Proposition 5.3 that the transduction ι: M → M × N defined by ι(x) = {x} × N is continuous. Let σ: M × N → M be defined by σ(x, n) = x^n. By Proposition 5.4, σ is continuous. Since π = σ ∘ ι, it follows from Theorem 5.2 that π is continuous. □

6. Conclusion

We gave topological arguments justifying the name "continuous" for transductions whose inverse preserves the recognizable sets. It remains to be seen whether this approach can be pushed further, using purely topological arguments such as fixpoint theorems, to obtain new results on transductions and recognizable sets.

Acknowledgements

The second author acknowledges support from FCT, through CMUP and the project POCTI/MAT/37670/2001, with funds from the programs POCTI and POSI, supported by national sources and the European Community fund FEDER.

References

[1] J. Almeida, Residually finite congruences and quasi-regular subsets in uniform algebras, Portugal. Math. 46 (3) (1989) 313–328.
[2] J. Almeida, Finite semigroups: an introduction to a unified theory of pseudovarieties, in: G.M.S. Gomes, J.-E. Pin, P. Silva (Eds.), Semigroups, Algorithms, Automata and Languages, World Scientific, Singapore, 2002, pp. 3–64.
[3] J. Almeida, Profinite semigroups and applications, in: Proc. SMS-NATO ASI Structural Theory of Automata, Semigroups and Universal Algebra, University of Montréal, July 2003, Preprint, in press.
[4] J. Almeida, P. Weil, Relatively free profinite monoids: an introduction and examples, in: J. Fountain (Ed.), NATO Advanced Study Institute Semigroups, Formal Languages and Groups, Vol. 466, Kluwer Academic Publishers, Dordrecht, 1995, pp. 73–117.
[5] J. Berstel, Transductions and Context-free Languages, Teubner, Stuttgart, 1979.
[6] J. Berstel, L. Boasson, O. Carton, B. Petazzoni, J.-E. Pin, Operations preserving recognizable languages, in: A. Lingas, B.J. Nilsson (Eds.), Proc. FCT'2003, Lecture Notes in Computer Science, Vol. 2751, Springer, Berlin, 2003, pp. 343–354.
[7] J. Berstel, L. Boasson, O. Carton, B. Petazzoni, J.-E. Pin, Operations preserving recognizable languages, Theoret. Comput. Sci. (2005), in press.
[8] J.H. Conway, Regular Algebra and Finite Machines, Chapman & Hall, London, 1971.
[9] R. Hunter, Certain finitely generated compact zero-dimensional semigroups, J. Austral. Math. Soc. (Ser. A) 44 (1988) 265–270.
[10] S.R. Kosaraju, Correction to "Regularity preserving functions", SIGACT News 6 (3) (1974) 22.
[11] S.R. Kosaraju, Regularity preserving functions, SIGACT News 6 (2) (1974) 16–17.
[12] S. Lang, Algebra, Graduate Texts in Mathematics, Vol. 211, Springer, New York, 2002.
[13] K. Numakura, Theorems on compact totally disconnected semigroups and lattices, Proc. Amer. Math. Soc. 8 (1957) 623–626.
[14] M. Petkovšek, A metric-space view of infinite words, unpublished, personal communication.
[15] J.-E. Pin, J. Sakarovitch, Operations and transductions that preserve rationality, in: Proc. Sixth GI Conf., Lecture Notes in Computer Science, Vol. 145, Springer, Berlin, 1983, pp. 617–628.
[16] J.-E. Pin, J. Sakarovitch, Une application de la représentation matricielle des transductions, Theoret. Comput. Sci. 35 (1985) 271–293.
[17] J.-E. Pin, P. Weil, Uniformities on free semigroups, Internat. J. Algebra Comput. 9 (1999) 431–453.
[18] J.I. Seiferas, R. McNaughton, Regularity-preserving relations, Theoret. Comput. Sci. 2 (1976) 147–154.
[19] P.V. Silva, An application of first order logic to the study of recognizable languages, Internat. J. Algebra Comput. 14 (5/6) (2004) 785–799.
[20] R.E. Stearns, J. Hartmanis, Regularity preserving modifications of regular expressions, Inform. Control 6 (1963) 55–69.
[21] P. Weil, Profinite methods in semigroup theory, Internat. J. Algebra Comput. 12 (2002) 137–178.