We say that a language is expressible (by word equations) if there exists a system of word equations and a variable such that is the projection of the solution set onto . We naturally extend this notion to relations. For example, the equation expresses the regular language since it equals
It is readily seen that word equations can express non-regular and non-context-free languages, e.g. with (the quadratic equation) . Let us explore the expressiveness of word equations.
Powers of a word
Let us show that is expressible by word equations for any word . The key ingredient is the following characterization of commuting words.
) Let be such that and . We have .
) If or is empty, then the claim is trivial. If , then we must have , and hence we are done. Assume (the other case is symmetric). Since , by our assumption, there exists a non-empty word such that . Thus, we have
By cancelling on the left, we obtain . Note that . Thus, by induction, and for some word and some . We are done since and .
A non-empty word is said to be primitive if it cannot be written as for some and .
Boolean operations
Recall that we introduced “expressibility” in terms of systems of word equations. This is mostly for convenience. Indeed, any system of equations over can be rewritten as a single equation, using this trick:
Click for a proof.
The left-to-right implication is trivial. Assume that . We have and hence . For the sake of contradiction, suppose that (the other case is symmetric). We must have . Without loss of generality, we may assume that . Since , there exists such that . By , we have and . In particular, this means that , and hence , which is impossible.
In fact, disjunction and disequality can also be achieved with a single equation1:
As corollary, this means that any Boolean combination of word equations can be written as a single word equation, at the cost of introducing extra variables. Note that for conjunction, no extra variable is needed.
Impossibility to express extended constraints
In the literature, word equations have been extended with constraints such as membership in regular languages; membership in context-free languages; linear constraints on the length of words or on letter counts. A natural question is whether these constraints could be expresssed directly. As we will see, the answer is no.
Regular languages
A code is a non-empty subset such that every word of has a unique decomposition as the concatenation of words from . A code is bifix is for every it is the case that is neither a prefix nor a suffix of . Among other things, the following was shown by Day et al.2:
Note that the right-to-left direction is simple. Indeed, the case of is trivial, and we proved the case of in Proposition 3.
The left-to-right direction allows to show that regular languages are not all expressible. Indeed, as is a bifix code, the regular language is unexpressible for . It is however expressible over with . In fact, any regular unary language is expressible, provided one is allowed to reason with an extra letter.
Click for a proof.
Given two non-empty expressible unary languages and , we can express as follows:
Indeed, since , the last constraint yields or .
Let be regular. It is recognized by a deterministic finite automaton, whose shape is necessarily a straight line of transitions, followed by a cycle of transitions. In other words, is ultimately periodic. So, there exist and with
We can trivially express with . Moreover, we can express with . Thus, we are done by the closure under union.
Context-free languages
Using pumping-like arguments, it has been shown that the following languages cannot be expressed by word equations3:
-
,
-
,
-
.
Language provides another example of a regular language that cannot be expressed. Moreover, provides an example of a context-free language unexpressible by word equations. To the best of my knowledge, it is still unknown whether is context-free or not. Yet, is not unambiguous context-free4.
Length and count constraints
The relation is unexpressible when , as otherwise could be expressed as follows, with two extra variables:
Furthermore, the relation is unexpressible when , as otherwise could be expressed as follows, with three extra variables:
EDT0L languages
L-systems are parallel rewriting systems that were developed by A. Lindenmayer to formalize biological processes such as the growth of plants. As we shall see, L-systems relate to word equations.
Extended deterministic table zero-context L-systems
An EDT0L-system is a tuple where
- is a finite alphabet,
- is an alphabet whose letters are said to be terminal,
- is a finite set of homomorphisms , and
- is a word called the axiom.
In a nutshell, starts from the axiom and rewrites all of its letters simultaneously using some “table” , and repeats this process using possibly different tables, until reaching a word from .
Formally, given and , we write if . We further write if for some . For example, if and , then
The language described by is . Languages described by EDT0L-systems are closed under union, concatenation, Kleene star, homomorphisms and intersection with regular languages. EDT0L languages form a strict subclass of indexed languages, which in turn are strictly included in context-sensitive languages.
Note that the expressiveness of EDT0L-systems remains the same if we allow a regular language to control the derivations. In that setting, the language of is defined by
A simple example
Consider the language which is expressed by the word equation . We define an EDT0L-system that describes . Let be the homomorphisms defined by
For example, we have
The language is described by the EDT0L-system where and .
From word equations to EDT0L
The previous example generalizes naturally to any word equation of the form . In fact, a translation into an EDT0L-system can be done for any system of word equations.
Given a system of word equations over variables and a solution , we define where is a fresh letter. The encoding of the relation expressed by the system is . The following is due to Ferté et al.5 for quadratic equations, and to Ciobanu et al.6 for the general case:
-
See Theorem 6 of Juhani Karhumäki, Filippo Mignosi, Wojciech Plandowski. The expressibility of languages and relations by word equations. Journal of the ACM, vol. 47, no. 3, 2000. ↩︎ ↩︎
-
See Corollary 21 of Joel Day, Vijay Ganesh, Nathan Grewal, Matthew Konefal, Florin Manea. A Closer Look at the Expressive Power of Logics Based on Word Equations. Theory of Computing Systems, vol. 68, 2024. ↩︎
-
H. Petersen. On the language of primitive words. Theoretical Computer Science, vol. 161, no. 1–2, 1996. ↩︎
-
Julien Ferté, Nathalie Marin, Géraud Sénizergues. Word-Mappings of Level 2. Theory of Computing Systems, vol. 54, 2014. ↩︎
-
Laura Ciobanu, Volker Diekert, Murray Elder. Solution Sets for Equations over Free Groups are EDT0L Languages. Proc. 42nd International Colloquium on Automata, Languages, and Programming (ICALP), 2015. ↩︎