Introduction to Words Avoiding Patterns

We can compress the word banana as xyyz, where x =b, y =an, z = a. We say that banana encounters yy. Thus a 'coded' version of yy shows up in banana. On the other hand, no such coded version of yyy appears in banana. We say that banana avoids yyy. One might guess, however, that every long enough word over a finite alphabet would encounter yyy. This guess would be mistaken.

To produce an arbitrarily long string of a's, b's and c's which avoids yy, start with a, and iterate the substitution a-> abc, b-> ac, c-> b. This generates successive words a, abc, abcacb, abcacbabcbac, etc. None of these words can be written as xyyz where y is a non-empty word! Playing a further riff on this theme, take these words over alphabet {a, b, c}, and apply the substitution a-> 0, b-> 01, c-> 011, producing 0, 001011, 001011001101, 001011001101001011010011, etc. None of these words can be written as xyyyz where y is a non-empty word. As each word is a prefix of the next, it makes sense to define an infinite word over {0,1} avoiding yyy, which is the limit of this sequence.

The foregoing construction of an infinite word avoiding yyy was used by Novikov and Adjan in their solution of the Burnside problem for groups. Marston Morse used a similar construction in solving a problem in symbolic dynamics. Related constructions have found applications in logic, formal language theory and other areas. Over the last 40 years, more general notions of words avoiding patterns have evolved, but many basic problems remain unsolved. In this note, several of these problems are brought together.

Some terminology

There are infinitely many words over a fixed finite alphabet which avoid yy. We say that yy is avoidable. In fact, we found infinitely many words avoiding yy over the alphabet S = {a, b, c}, so we say that yy is S-avoidable. Obviously, only the size of S matters here, so we might as well just say that yy is 3-avoidable.

We created words by starting with a and iterating the substitution system a-> abc, b-> ac, c-> b. Such a substitution system is called a D0L system (The term is short for deterministic, zero-sided Lindenmayer system. Lindenmayer systems are typically used to model cell growth.) The standard way of creating an infinite word (or omega-word avoiding a given pattern is in fact to find an appropriate D0L system. To avoid yyy, we took the homomorphic image of our infinite D0L word under the map a -> 0, b-> 01, c-> 011. Pattern yyy is 2-avoidable; since we can avoid it on 2 letters with the homomorphic image of a D0L system, we say that yy is 2-HD0L-avoidable.

While some patterns are avoidable, others are unavoidable; consider xyx. If w is a long enough word over a fixed finite alphabet, then w has a repeated letter, say u. Word w will thus have a subword uvu, some word v, and uvu is a coded version of xyx. A pretty result of Zimin is the following: Let p be a pattern containing n different letters. Then p is avoidable (on some finite alphabet) if Z_n avoids p, where Z_1 = 1, Z_2 = 121, Z_3 = 1213121, Z_4 = 121312141213121, etc. A second characterization of avoidable words is given by Baker et al. in graph theoretic terms. No one knows, however, which words are k-avoidable, or k-HDOL avoidable, where k is arbitrary but fixed. Baker et al. show that `many' avoidable words are 4-avoidable.

To guess whether a pattern p is likely to be k-avoidable, one might run a backtracking program to search for long words over a k letter alphabet avoiding p. For some p and k this process is very slow; a lot of backtracking is necessary. It would be surprising if for some k-avoidable pattern p, backtracking was never necessary. One expects that given a word w avoiding p, it is possible to have a `bad' extension of w, which cannot be further extended. In the case of yy, yyy, yyyy, etc. this has been proved. Bean et al. proved Let r be a fixed natural number and let S be an alphabet. If w is a word over S avoiding yr, then there is a maximal extension of w in S* avoiding yr. To give an example, a word over {a, b, c} avoiding yy is bac. The extension abacaba of bac cannot be extended in either direction without encountering yy. How hard is it to avoid a pattern p? Not only is yy 3-avoidable, there are uncountably many omega-words on {0,1,2} avoiding yy. More is true. We can think of any word over {0,1,2} as giving a real number in base 3, by adding a decimal at the left hand side. Thus 1210 corresponds 0.1210 (base 3) = 16/27. This will also work for omega-words. We can thus think of the set of omega-words over {0,1,2} which avoid yy as a set of real numbers. This set has no isolated points, and is a Cantor set. For any natural number k, and any finite alphabet S, the set of omega-words over S avoiding yk is a Cantor set

The notion of encounters can be broadened. For example, we may regard a word xX as coding yy as long as x and X contain letters with the same multiplicity. Thus 1231 and 2131 or 45 and 54 would be regarded as the same, and 123145213154 would encounter yxyx. If a word doesn't encounter pattern p even in this `abelian' sense, we say that it strongly avoids p. In 1961, Erdös asked whether yy is strongly 4-avoidable -- is there an infinite sequence on 4 symbols in which no two adjacent blocks were permutations of each other? This was not answered in the affirmative until 1992 by Keränen.

Another generalization of avoiding squares yy or cubes yyy is as follows: We try to avoid yr, non-integral r. Thus banana encounters y5/2, since anana = an5/2; we have an twice, and then half of an. For 1 < k < 2, we say that w = xk if we can write w = xyx, and |w| = k|x|. (Here |u| is the length of u. Thus | banana}|=6, for example.) We say that xk is n-avoidable if there are arbitrarily long words over an n letter alphabet, none of which contains a subword of the form xk. We can now define repetitive threshold function RT(n):

RT(n) = infimum {k: xk is n-avoidable}

A conjecture of Dejean from 1970 is that RT(n) = n/(n-1), n > 4. Dejean showed that RT(4) = 7/4, and conjectured that RT(5) = 7/5, which was proved by Pansiot (1984). recently Ollagnier verified the conjecture for n <12.

Results on repetitive thresholds have implications for k-avoidability of general patterns. We can make an analogous definition of fractional powers in an `abelian' sense.

For 1 < k < 2, we say that w = xk in the abelian sense if we can write w = xyX, where x and X contain letters with the same multiplicities, and |w| = k|xy|. We say that xk is n-avoidable in the abelian sense if there are arbitrarily long words over an n letter alphabet, none of which contains a subword of the form xk in the abelian sense. We define the abelian repetitive threshold function ART(n):

ART(n) = infimum {k: xk is k-avoidable in the abelian sense}