Consider the equation $xaba = ay$ where $a$ and $b$ are letters, and $x$ and $y$ are variables over $\{a, b\}^*$. A possible solution to this equation is $[x \mapsto a, y \mapsto aba]$. In fact, the equation has infinitely many solutions:

$$\{[x \mapsto \varepsilon, y \mapsto ba]\} \cup \{[x \mapsto aw, y \mapsto waba] : w \in \{a, b\}^*\}.$$

Other such equations are unsatisfiable, e.g. it is impossible to find a word $x \in \{a, b\}^*$ such that $ax = xb$. As a further example, given a primitive word $w$, it is known that $xw = wx$ iff $x \in w^*$.
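As a quick illustration (my own addition, not part of the original argument), the commutation fact can be checked by brute force for a specific primitive word such as $ab$:

```python
from itertools import product

def commuting_words(w, max_len=8):
    """All words x over {a,b} with |x| <= max_len such that xw == wx."""
    sols = []
    for n in range(max_len + 1):
        for tup in product("ab", repeat=n):
            x = "".join(tup)
            if x + w == w + x:
                sols.append(x)
    return sols

# For the primitive word w = "ab", only the powers of "ab" commute with it.
print(commuting_words("ab"))  # ['', 'ab', 'abab', 'ababab', 'abababab']
```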

Formally, a word equation over alphabet $\Sigma$ and variables $V$ is a pair $(u, v)$ with $u, v \in (\Sigma \cup V)^*$. A solution to $(u, v)$ is a morphism $h \colon (\Sigma \cup V)^* \to \Sigma^*$ such that $h(u) = h(v)$ and $h(\sigma) = \sigma$ for each $\sigma \in \Sigma$.

A system of word equations is a finite collection of word equations, to be satisfied with a common solution. For example, the system $(xaba = ay) \land (xx = ay)$ has a unique solution: $[x \mapsto aba, y \mapsto baaba]$.
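The claimed unique solution can be confirmed by an exhaustive search over short words (a toy check of mine; bounding the lengths by 6 suffices, since the length constraints of the two equations force $|x| = 3$ and $|y| = 5$):

```python
from itertools import product

def words(max_len, alphabet="ab"):
    """Enumerate all words over the alphabet up to the given length."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

# All (x, y) with |x|, |y| <= 6 satisfying (xaba = ay) and (xx = ay).
sols = [(x, y) for x in words(6) for y in words(6)
        if x + "aba" == "a" + y and x + x == "a" + y]
print(sols)  # [('aba', 'baaba')]
```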

Proof.

Clearly, we cannot take $x = \varepsilon$, since then the second equation would equate the empty word with the nonempty word $h(ay)$. Thus, due to the first equation, any solution must be of the form $x = aw$ and $y = waba$. By the second equation, we must have $|awaw| = |awaba|$. Thus, we must have $2|w| + 2 = |w| + 4$ and hence $|w| = 2$. So, the second equation must be of the form $a\sigma\gamma a\sigma\gamma = a\sigma\gamma aba$ with $\sigma, \gamma \in \{a, b\}$. Comparing the two sides letter by letter yields $\sigma = b$ and $\gamma = a$. From this, we conclude that the only solution to the system is $[x \mapsto aba, y \mapsto baaba]$.

Diophantine equations

It was once believed that word equations could help prove the undecidability of Hilbert’s tenth problem. Indeed, the problem of determining whether a system of word equations is satisfiable reduces to the problem of determining whether a system of Diophantine equations has a solution [1]. Recall that the latter asks whether a given multivariate polynomial has a zero over the naturals, e.g. $\exists x, y, z \in \N : xz - 1 - y = 0$.

Word equations satisfiability reduces to Diophantine equations satisfiability.
Proof.

We only prove the case of $\Sigma = \{0, 1\}$. Our goal is to transform a word equation into a system reasoning over natural numbers. We will achieve this by establishing an isomorphism between $\Sigma^*$ and a set of matrices. Before doing so, we provide some intuition by introducing the Stern–Brocot tree. This is an infinite complete binary tree that yields a bijection between $\{0, 1\}^*$ and $\mathbb{Q}_{> 0}$. Its first three levels are as follows:

Let us formalize the tree. Let $\mathrm{SL}(k, \mathbb{D})$ denote the monoid of $k \times k$ matrices with entries from (a commutative semiring) $\mathbb{D}$ and having determinant 1. Let $A \colon \{0, 1\}^* \to \mathrm{SL}(2, \N)$ be the morphism given by

$$A(0) = \begin{pmatrix}1 & 0 \\ 1 & 1\end{pmatrix} \text{ and } A(1) = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix}.$$

For example, $A(\varepsilon)$ is the identity matrix, and

$$A(100) = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix} \begin{pmatrix}1 & 0 \\ 1 & 1\end{pmatrix} \begin{pmatrix}1 & 0 \\ 1 & 1\end{pmatrix} = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix} \begin{pmatrix}1 & 0 \\ 2 & 1\end{pmatrix} = \begin{pmatrix}3 & 1 \\ 2 & 1\end{pmatrix}.$$

Let us interpret $0$ and $1$ respectively as left and right. Given $w \in \{0, 1\}^*$, the vertex of the tree reached from path $w$ is labelled with $A(w) \cdot (1, 1)^\mathsf{T}$, where we see $(a, b)^\mathsf{T}$ as the fraction $a / b$. For example, $A(01) \cdot (1, 1)^\mathsf{T} = (2, 3)^\mathsf{T}$. It is known that each number $x \in \mathbb{Q}_{>0}$ occurs exactly once in the tree.
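The morphism and the labelling are easy to experiment with in code; below is a small sketch (the function names are mine):

```python
from fractions import Fraction

A0 = ((1, 0), (1, 1))  # A(0), a left step
A1 = ((1, 1), (0, 1))  # A(1), a right step

def mat_mul(X, Y):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def A(w):
    """The morphism A : {0,1}* -> SL(2, N)."""
    M = ((1, 0), (0, 1))  # A(eps) is the identity matrix
    for c in w:
        M = mat_mul(M, A0 if c == "0" else A1)
    return M

def label(w):
    """Stern-Brocot label of the vertex at path w: A(w) * (1,1)^T as a fraction."""
    (a, b), (c, d) = A(w)
    return Fraction(a + b, c + d)

print(A("100"))     # ((3, 1), (2, 1)), matching the computation above
print(label("01"))  # 2/3
```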

Let us prove that $A$ is an isomorphism. We denote the rows of a matrix $B$ by $B[0]$ (top) and $B[1]$ (bottom). Note that multiplying a matrix $B$ on the left by $A(i)$ adds $B[i]$ to $B[\neg i]$ and leaves $B[i]$ unchanged. Thus, the first letter of a nonempty word $w$ is $i$ iff $A(w)[i] \leq A(w)[\neg i]$. For example, the bottom row of $A(100)$, given above, is smaller than its top row. This means that the first letter of $100$ is indeed $1$. From this property, we can conclude that $A$ is injective.

Proof.

Let $u, v \in \{0, 1\}^*$ be such that $A(u) = A(v)$. We show that $u = v$ by induction on $|u|$. Note that multiplying a matrix on the left by $A(0)$ or $A(1)$ increases its max-row-sum norm. So, the only word whose image is the identity matrix is the empty word. Thus, suppose $|u| > 0$, and hence $|v| > 0$. Let $u = ix$ and $v = jy$ where $i, j \in \{0, 1\}$ and $x, y \in \{0, 1\}^*$. Let $B = A(u)$. We have $B = A(i) A(x)$ and so $B[i] \leq B[\neg i]$. By $B = A(v) = A(j) A(y)$, we must have $j = i$. Note that $\mathrm{SL}(2, \Z)$ is a group. Thus,

$$A(x) = A(i)^{-1} A(u) = A(i)^{-1} B = A(j)^{-1} B = A(j)^{-1} A(v) = A(y).$$

This means that $A(x) = A(y)$, and so $x = y$ by induction.

It can further be shown that $\mathrm{SL}(2, \N)$ is generated by matrices $A(0)$ and $A(1)$.

Proof.

Let $B \in \mathrm{SL}(2, \N)$. There exist $a, b, c, d \in \N$ such that

$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \text{ and } ad - cb = 1.$$

We show that $B \in \langle A(0), A(1) \rangle$ by induction on $\lVert B \rVert_\infty = \max(a + b, c + d)$. The only matrix with $\lVert B \rVert_\infty = 1$ is the identity, which trivially satisfies the claim. Thus, suppose $\lVert B \rVert_\infty > 1$.

Case $a = c$. We have $a(d - b) = 1$ and hence $a = d - b = 1$. Thus, we are done by induction since

$$B = \begin{pmatrix} 1 & b \\ 1 & b + 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} = A(0) \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}.$$

Case $a > c$. We cannot have $b = c = 0$, as otherwise $B$ would be the identity. Thus, $b + c > 0$. We must have $b \geq d$, as otherwise $ad - cb \geq (c + 1)(b + 1) - cb = b + c + 1 \geq 2$. Thus, we are done by induction since

$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a - c & b - d \\ c & d \end{pmatrix} = A(1) \begin{pmatrix} a - c & b - d \\ c & d \end{pmatrix}.$$

Case $c > a$. Symmetric to the previous case using $A(0)$ instead.

This means that $A$ is surjective, and hence an isomorphism.
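Both proofs are effective: comparing the rows of a matrix reveals the first letter of its preimage, and cancelling the corresponding generator shrinks the matrix. A sketch of the resulting decoding procedure (the function name `decode` is mine):

```python
def decode(B):
    """Recover the unique w with A(w) = B, for B in SL(2, N).

    Mirrors the proofs above: the first letter is 1 iff the bottom row
    is componentwise at most the top row; cancel A(0) or A(1) on the
    left and repeat until the identity remains."""
    (a, b), (c, d) = B
    w = ""
    while (a, b, c, d) != (1, 0, 0, 1):
        if a >= c and b >= d:   # first letter 1: B = A(1) * B'
            w += "1"
            a, b = a - c, b - d
        else:                   # first letter 0: B = A(0) * B'
            w += "0"
            c, d = c - a, d - b
    return w

print(decode(((3, 1), (2, 1))))  # '100'
```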

Let us now describe the reduction with an example. Consider the word equation $xz = 1y$. Since $A$ is an isomorphism, the word equation has a solution iff $A(xz) = A(1y)$ has a solution. The latter is equivalent to

$$\exists x_a, \ldots, z_d \in \N : \bigwedge_{\mathclap{v \in \{x, y, z\}}} (v_a v_d - v_b v_c = 1) \land \begin{pmatrix}x_a & x_b \\ x_c & x_d\end{pmatrix} \begin{pmatrix}z_a & z_b \\ z_c & z_d\end{pmatrix} = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix} \begin{pmatrix}y_a & y_b \\ y_c & y_d\end{pmatrix}.$$

This is a system of Diophantine equations since the last equality can be written as

$$\begin{aligned} (x_a z_a + x_b z_c = y_a + y_c) \land (x_a z_b + x_b z_d = y_b + y_d) \land {} \\ (x_c z_a + x_d z_c = y_c) \land (x_c z_b + x_d z_d = y_d). \end{aligned}$$
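As a sanity check (mine, not in the text), we can plug the concrete solution $[x \mapsto 1, y \mapsto \varepsilon, z \mapsto \varepsilon]$ of $xz = 1y$ into the system:

```python
xa, xb, xc, xd = 1, 1, 0, 1  # A(x) = A(1)
ya, yb, yc, yd = 1, 0, 0, 1  # A(y) = identity
za, zb, zc, zd = 1, 0, 0, 1  # A(z) = identity

# determinant constraints: the unknowns encode matrices in SL(2, N)
assert xa*xd - xb*xc == ya*yd - yb*yc == za*zd - zb*zc == 1

# the four linear equations from A(x) A(z) = A(1) A(y)
assert xa*za + xb*zc == ya + yc
assert xa*zb + xb*zd == yb + yd
assert xc*za + xd*zc == yc
assert xc*zb + xd*zd == yd
print("all constraints satisfied")
```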

It turns out that the above is in vain! Indeed, Diophantine equations satisfiability was shown undecidable by Matiyasevich in 1970 (building on the work of Davis, Putnam and Robinson), but word equations satisfiability was shown decidable by Makanin in 1977.

Determining whether a given system of word equations has a solution is decidable (even if each variable is constrained by a given regular language).

Decidability and complexity

The termination of Makanin’s algorithm is notoriously difficult to prove, and the algorithm has high complexity. Since then, Plandowski [2] proved that the problem belongs to PSPACE, and Jeż [3] improved the upper bound to NSPACE($n$). Moreover, it is easy to show that the problem is NP-hard, even for singleton alphabets.

Word equations satisfiability is NP-complete for $\Sigma = \{a\}$.
Proof.

NP-hardness. We provide a reduction from one-in-three 3-SAT. Recall that we are given a Boolean expression in 3-CNF and must determine whether it is possible to satisfy exactly one literal per clause. We describe the reduction through an example. Consider

$$\varphi = (x \lor y \lor \neg z) \land (\neg x \lor y \lor \neg z).$$

We create two variables $v$ and $\overline{v}$ for each $v \in \{x, y, z\}$. We translate $\varphi$ into this system of word equations:

$$(x\overline{x} = a) \land (y\overline{y} = a) \land (z\overline{z} = a) \land (xy\overline{z} = a) \land (\overline{x}y\overline{z} = a).$$

Here, $a$ and $\varepsilon$ respectively play the role of $\mathit{true}$ and $\mathit{false}$.
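Since every variable ranges over $\{\varepsilon, a\}$, the translated system can be brute-forced directly. The check below (my own; the barred variables $\overline{x}, \overline{y}, \overline{z}$ are spelled `xb`, `yb`, `zb`) reports that this particular $\varphi$ admits no one-in-three assignment, so the system is unsatisfiable:

```python
from itertools import product

# "a" encodes true and "" encodes false; v + vb == "a" forces exactly
# one of v, vb to be true, and each clause equation forces exactly one
# of its three literals to be true.
sols = [(x, y, z)
        for x, xb, y, yb, z, zb in product(["", "a"], repeat=6)
        if x + xb == "a" and y + yb == "a" and z + zb == "a"
        and x + y + zb == "a" and xb + y + zb == "a"]
print(sols)  # []: this instance has no one-in-three assignment
```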

NP membership. Since there is a single letter, only size matters. For example, the system of word equations $(xaxyy = azy) \land (xz = zy)$ has a solution iff

$$\exists x', y', z' \in \N : (2x' + y' = z') \land (x' = y').$$

The latter can be solved in NP via integer linear programming, or more generally via existential Presburger arithmetic.
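For instance, the smallest solutions of the length abstraction above can be found by a naive search (a toy example of mine; a real decision procedure would hand the system to an ILP or Presburger solver):

```python
# Enumerate small solutions of (2x' + y' = z') and (x' = y').
sols = [(x, y, z)
        for x in range(5) for y in range(5) for z in range(15)
        if 2 * x + y == z and x == y]
print(sols[:3])  # [(0, 0, 0), (1, 1, 3), (2, 2, 6)]
```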

It is conjectured that word equations satisfiability is NP-complete for any finite alphabet $\Sigma$.

Further open questions include:

  • If the number of variables is fixed to $k \geq 3$, is satisfiability NP-hard or solvable in polynomial time?

  • Is satisfiability decidable if we allow constraints of the form $\sum_{v \in V} a_v |v| \leq b$ where $a_v, b \in \Z$?

Quadratic equations

Rather than delving further into the decidability and complexity of word equations, let us explore a better-behaved class of word equations. We say that a system of word equations is quadratic if each variable appears at most twice in the system. There is a relatively simple algorithm for solving quadratic systems.

The algorithm

Let us explain the procedure with the equation $E \colon xay = yxb$. We focus on the “head” of the equation, namely “$x = y$”. There are several cases to consider:

  • Case $x = \varepsilon$. We must now solve $ay = yb$.

  • Case $y = \varepsilon$. We must now solve $xa = xb$.

  • Case $|x| \geq |y|$. In a solution of $E$, we must have $x = yu$ for some $u \in \Sigma^*$, and so $yuay = yyub$. The latter simplifies to $uay = yub$. For convenience, we can rename $u$ with $x$. Thus, we must now solve $xay = yxb$, which is exactly the original equation!

  • Case $|y| \geq |x|$. In a solution of $E$, we must have $y = xu$ for some $u \in \Sigma^*$, and hence $xaxu = xuxb$. This simplifies to $axu = uxb$. After variable renaming, we must now solve $axy = yxb$.
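The case analysis above can be turned into a small search procedure. The sketch below (my own encoding: uppercase characters are variables, lowercase characters are letters) explores the equations reachable via these transformations, combining each substitution with the head cancellation it enables so that equations never grow:

```python
from collections import deque

def solvable(u, v):
    """Decide satisfiability of a quadratic word equation u = v.

    Uppercase characters are variables; lowercase characters are letters.
    Explores the (finite) graph of Nielsen transformations by BFS."""
    seen, queue = set(), deque([(u, v)])
    while queue:
        s, t = queue.popleft()
        if (s, t) in seen:
            continue
        seen.add((s, t))
        if s == "" and t == "":
            return True                   # reached eps = eps
        if s == "" or t == "":
            if all(c.isupper() for c in s + t):
                return True               # map the leftover variables to eps
            continue                      # a letter remains: dead end
        if s[0] == t[0]:
            queue.append((s[1:], t[1:]))  # equal heads: cancel them
            continue
        if s[0].islower() and t[0].islower():
            continue                      # distinct letters: contradiction
        for a, b in ((s, t), (t, s)):     # try both orientations of the heads
            x, h = a[0], b[0]
            if x.isupper():
                # case x = eps: erase x everywhere
                queue.append((a.replace(x, ""), b.replace(x, "")))
                # case |x| >= |h|: substitute x -> h x and cancel the new head
                queue.append((x + a[1:].replace(x, h + x),
                              b[1:].replace(x, h + x)))
    return False

print(solvable("Xaba", "aY"))  # True  (xaba = ay is satisfiable)
print(solvable("XaY", "YXb"))  # False (xay = yxb is unsatisfiable)
```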

In all cases, we end up with a quadratic equation. The Nielsen transformations’ algorithm, also known as Levi’s method, constructs a directed graph whose nodes are quadratic equations and whose edges represent such transformations reasoning about the head of both sides [4]. It can be shown that the resulting graph is necessarily finite. Moreover, $E$ is satisfiable iff $E \to^* (\varepsilon = \varepsilon)$ in the graph. For example, our previous equation is unsatisfiable as the only way to get rid of the variables is by reaching the contradiction $a = b$:

Another example

Let us reconsider our very first equation $xaba = ay$. It is satisfiable since its graph is as follows:

The algorithm works

Let us provide a rough sketch explaining why the procedure works. The length of an equation $u = v$ is $|u| + |v|$. The length of a solution $h$ is defined as $|h| = \sum_{v \in V} |h(v)|$. A solution $h$ to an equation is length-minimal if $|h|$ is minimal among all solutions.

Let $E$ be an equation and let $E \to^{\varphi} E'$. The following properties hold:

  • If $E$ is quadratic, then $E'$ is quadratic and $|E| \geq |E'|$;
  • If $E$ is unsatisfiable, then $E'$ is unsatisfiable;
  • If $E$ has a solution $h$ and $\varphi(h)$ holds, then $E'$ has a solution $h'$ with $|h| > |h'|$.
For quadratic equations, the procedure terminates and is correct.
Proof.

Let $E$ be a quadratic word equation. By the first item of Lemma 5, the set of nodes $\{E' : E \to^* E'\}$ must be finite, which yields termination. It remains to prove correctness.

Case $E \to^* (\varepsilon = \varepsilon)$. By definition, the graph has a path $E = E_0 \to E_1 \to \cdots \to E_n = (\varepsilon = \varepsilon)$. Obviously, equation $E_n$ has a solution. By the contrapositive of the second item of Lemma 5, and by induction, this means that each $E_i$ has a solution. Consequently, $E$ has a solution.

Case $E \not\to^* (\varepsilon = \varepsilon)$. We must show that $E$ has no solution. For the sake of contradiction, suppose that $E$ has a length-minimal solution $h$. In node $E$, we compare the head of both sides and take a valid edge $E \to^\varphi E'$, e.g. if $\varphi = (|x| \geq |y|)$ and $|h(x)| \geq |h(y)|$, then we can move to $E'$. Note that at least one such edge exists as the construction of the graph considers all possible cases on the heads. By the third item of Lemma 5, equation $E'$ has a length-minimal solution $h'$ with $|h| > |h'|$. If $|h'| > 0$, then we have not reached the node $(\varepsilon = \varepsilon)$ and can repeat this process. Since the length cannot decrease forever, we must eventually reach the node $(\varepsilon = \varepsilon)$, which is a contradiction.