Consider the equation $xaba = ay$ where $a$ and $b$ are letters, and $x$ and $y$ are variables over $\{a, b\}^*$. A possible solution to this equation is $[x \mapsto a, y \mapsto aba]$. In fact, the equation has infinitely many solutions:

$$\{[x \mapsto \varepsilon, y \mapsto ba]\} \cup \{[x \mapsto aw, y \mapsto waba] : w \in \{a, b\}^*\}.$$

Other such equations are unsatisfiable, e.g. it is impossible to find a word $x \in \{a, b\}^*$ such that $ax = xb$. As a further example, given a primitive word $w$, it is known that $xw = wx$ iff $x \in w^*$.
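As a quick illustration (my own addition, not part of the original argument), the commutation fact can be checked by brute force for a specific primitive word such as $ab$:

```python
from itertools import product

def commuting_words(w, max_len=8):
    """All words x over {a,b} with |x| <= max_len such that xw == wx."""
    sols = []
    for n in range(max_len + 1):
        for tup in product("ab", repeat=n):
            x = "".join(tup)
            if x + w == w + x:
                sols.append(x)
    return sols

# For the primitive word w = "ab", only the powers of "ab" commute with it.
print(commuting_words("ab"))  # ['', 'ab', 'abab', 'ababab', 'abababab']
```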

Formally, a word equation over alphabet $\Sigma$ and variables $V$ is a pair $(u, v)$ with $u, v \in (\Sigma \cup V)^*$. A solution to $(u, v)$ is a morphism $h \colon (\Sigma \cup V)^* \to \Sigma^*$ such that $h(u) = h(v)$ and $h(\sigma) = \sigma$ for each $\sigma \in \Sigma$.

A system of word equations is a finite collection of word equations, to be satisfied with a common solution. For example, the system $(xaba = ay) \land (xx = ay)$ has a unique solution: $[x \mapsto aba, y \mapsto baaba]$.
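The claimed unique solution can be confirmed by an exhaustive search over short words (a toy check of mine; bounding the lengths by 6 suffices, since the length constraints of the two equations force $|x| = 3$ and $|y| = 5$):

```python
from itertools import product

def words(max_len, alphabet="ab"):
    """Enumerate all words over the alphabet up to the given length."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

# All (x, y) with |x|, |y| <= 6 satisfying (xaba = ay) and (xx = ay).
sols = [(x, y) for x in words(6) for y in words(6)
        if x + "aba" == "a" + y and x + x == "a" + y]
print(sols)  # [('aba', 'baaba')]
```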

Proof.

Clearly, we cannot take $x = \varepsilon$, since then the second equation would equate the empty word with the nonempty word $h(ay)$. Thus, due to the first equation, any solution must be of the form $x = aw$ and $y = waba$. By the second equation, we must have $|awaw| = |awaba|$. Thus, we must have $2|w| + 2 = |w| + 4$ and hence $|w| = 2$. So, the second equation must be of the form $a\sigma\gamma a\sigma\gamma = a\sigma\gamma aba$ with $\sigma, \gamma \in \{a, b\}$. Comparing the two sides letter by letter yields $\sigma = b$ and $\gamma = a$. From this, we conclude that the only solution to the system is $[x \mapsto aba, y \mapsto baaba]$.

Diophantine equations

It was once believed that word equations could help prove the undecidability of Hilbert’s tenth problem. Indeed, the problem of determining whether a system of word equations is satisfiable reduces to the problem of determining whether a system of Diophantine equations has a solution [1]. Recall that the latter asks whether a given multivariate polynomial has a zero over the naturals, e.g. $\exists x, y, z \in \N : xz - 1 - y = 0$.

Word equations satisfiability reduces to Diophantine equations satisfiability.
Proof.

We only prove the case of $\Sigma = \{0, 1\}$. Our goal is to transform a word equation into a system reasoning over natural numbers. We will achieve this by establishing an isomorphism between $\Sigma^*$ and a set of matrices. Before doing so, we provide some intuition by introducing the Stern–Brocot tree. This is an infinite complete binary tree that yields a bijection between $\{0, 1\}^*$ and $\mathbb{Q}_{> 0}$. Its first three levels are as follows:

Let us formalize the tree. Let $\mathrm{SL}(k, \mathbb{D})$ denote the monoid of $k \times k$ matrices with entries from (a commutative semiring) $\mathbb{D}$ and having determinant 1. Let $A \colon \{0, 1\}^* \to \mathrm{SL}(2, \N)$ be the morphism given by

$$A(0) = \begin{pmatrix}1 & 0 \\ 1 & 1\end{pmatrix} \text{ and } A(1) = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix}.$$

For example, $A(\varepsilon)$ is the identity matrix, and

$$A(100) = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix} \begin{pmatrix}1 & 0 \\ 1 & 1\end{pmatrix} \begin{pmatrix}1 & 0 \\ 1 & 1\end{pmatrix} = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix} \begin{pmatrix}1 & 0 \\ 2 & 1\end{pmatrix} = \begin{pmatrix}3 & 1 \\ 2 & 1\end{pmatrix}.$$

Let us interpret $0$ and $1$ respectively as left and right. Given $w \in \{0, 1\}^*$, the vertex of the tree reached from path $w$ is labelled with $A(w) \cdot (1, 1)^\mathsf{T}$, where we see $(a, b)^\mathsf{T}$ as the fraction $a / b$. For example, $A(01) \cdot (1, 1)^\mathsf{T} = (2, 3)^\mathsf{T}$. It is known that each number $x \in \mathbb{Q}_{>0}$ occurs exactly once in the tree.
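The morphism and the labelling are easy to experiment with in code; below is a small sketch (the function names are mine):

```python
from fractions import Fraction

A0 = ((1, 0), (1, 1))  # A(0), a left step
A1 = ((1, 1), (0, 1))  # A(1), a right step

def mat_mul(X, Y):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def A(w):
    """The morphism A : {0,1}* -> SL(2, N)."""
    M = ((1, 0), (0, 1))  # A(eps) is the identity matrix
    for c in w:
        M = mat_mul(M, A0 if c == "0" else A1)
    return M

def label(w):
    """Stern-Brocot label of the vertex at path w: A(w) * (1,1)^T as a fraction."""
    (a, b), (c, d) = A(w)
    return Fraction(a + b, c + d)

print(A("100"))     # ((3, 1), (2, 1)), matching the computation above
print(label("01"))  # 2/3
```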

Let us prove that $A$ is an isomorphism. We denote the rows of a matrix $B$ by $B[0]$ (top) and $B[1]$ (bottom). Note that multiplying a matrix $B$ on the left by $A(i)$ adds $B[i]$ to $B[\neg i]$ and leaves $B[i]$ unchanged. Thus, the first letter of a nonempty word $w$ is $i$ iff $A(w)[i] \leq A(w)[\neg i]$. For example, the bottom row of $A(100)$, given above, is smaller than its top row. This means that the first letter of $100$ is indeed $1$. From this property, we can conclude that $A$ is injective.

Proof.

Let $u, v \in \{0, 1\}^*$ be such that $A(u) = A(v)$. We show that $u = v$ by induction on $|u|$. Note that multiplying a matrix on the left by $A(0)$ or $A(1)$ increases its max-row-sum norm. So, the only word whose image is the identity matrix is the empty word. Thus, suppose $|u| > 0$, and hence $|v| > 0$. Let $u = ix$ and $v = jy$ where $i, j \in \{0, 1\}$ and $x, y \in \{0, 1\}^*$. Let $B = A(u)$. We have $B = A(i) A(x)$ and so $B[i] \leq B[\neg i]$. By $B = A(v) = A(j) A(y)$, we must have $j = i$. Note that $\mathrm{SL}(2, \Z)$ is a group. Thus,

$$A(x) = A(i)^{-1} A(u) = A(i)^{-1} B = A(j)^{-1} B = A(j)^{-1} A(v) = A(y).$$

This means that $A(x) = A(y)$, and so $x = y$ by induction.

It can further be shown that $\mathrm{SL}(2, \N)$ is generated by matrices $A(0)$ and $A(1)$.

Proof.

Let $B \in \mathrm{SL}(2, \N)$. There exist $a, b, c, d \in \N$ such that

$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \text{ and } ad - cb = 1.$$

We show that $B \in \langle A(0), A(1) \rangle$ by induction on $\lVert B \rVert_\infty = \max(a + b, c + d)$. The only matrix with $\lVert B \rVert_\infty = 1$ is the identity, which trivially satisfies the claim. Thus, suppose $\lVert B \rVert_\infty > 1$.

Case $a = c$. We have $a(d - b) = 1$ and hence $a = d - b = 1$. Thus, we are done by induction since

$$B = \begin{pmatrix} 1 & b \\ 1 & b + 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} = A(0) \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}.$$

Case $a > c$. We cannot have $b = c = 0$, as otherwise $B$ would be the identity. Thus, $b + c > 0$. We must have $b \geq d$, as otherwise $ad - cb \geq (c + 1)(b + 1) - cb = b + c + 1 \geq 2$. Thus, we are done by induction since

$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a - c & b - d \\ c & d \end{pmatrix} = A(1) \begin{pmatrix} a - c & b - d \\ c & d \end{pmatrix}.$$

Case $c > a$. Symmetric to the previous case using $A(0)$ instead.

This means that $A$ is surjective, and hence an isomorphism.
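Both proofs are effective: comparing the rows of a matrix reveals the first letter of its preimage, and cancelling the corresponding generator shrinks the matrix. A sketch of the resulting decoding procedure (the function name `decode` is mine):

```python
def decode(B):
    """Recover the unique w with A(w) = B, for B in SL(2, N).

    Mirrors the proofs above: the first letter is 1 iff the bottom row
    is componentwise at most the top row; cancel A(0) or A(1) on the
    left and repeat until the identity remains."""
    (a, b), (c, d) = B
    w = ""
    while (a, b, c, d) != (1, 0, 0, 1):
        if a >= c and b >= d:   # first letter 1: B = A(1) * B'
            w += "1"
            a, b = a - c, b - d
        else:                   # first letter 0: B = A(0) * B'
            w += "0"
            c, d = c - a, d - b
    return w

print(decode(((3, 1), (2, 1))))  # '100'
```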

Let us now describe the reduction with an example. Consider the word equation $xz = 1y$. Since $A$ is an isomorphism, the word equation has a solution iff $A(xz) = A(1y)$ has a solution. The latter is equivalent to

$$\exists x_a, \ldots, z_d \in \N : \bigwedge_{\mathclap{v \in \{x, y, z\}}} (v_a v_d - v_b v_c = 1) \land \begin{pmatrix}x_a & x_b \\ x_c & x_d\end{pmatrix} \begin{pmatrix}z_a & z_b \\ z_c & z_d\end{pmatrix} = \begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix} \begin{pmatrix}y_a & y_b \\ y_c & y_d\end{pmatrix}.$$

This is a system of Diophantine equations since the last equality can be written as

$$\begin{aligned} (x_a z_a + x_b z_c = y_a + y_c) \land (x_a z_b + x_b z_d = y_b + y_d) \land {} \\ (x_c z_a + x_d z_c = y_c) \land (x_c z_b + x_d z_d = y_d). \end{aligned}$$
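As a sanity check (mine, not in the text), we can plug the concrete solution $[x \mapsto 1, y \mapsto \varepsilon, z \mapsto \varepsilon]$ of $xz = 1y$ into the system:

```python
xa, xb, xc, xd = 1, 1, 0, 1  # A(x) = A(1)
ya, yb, yc, yd = 1, 0, 0, 1  # A(y) = identity
za, zb, zc, zd = 1, 0, 0, 1  # A(z) = identity

# determinant constraints: the unknowns encode matrices in SL(2, N)
assert xa*xd - xb*xc == ya*yd - yb*yc == za*zd - zb*zc == 1

# the four linear equations from A(x) A(z) = A(1) A(y)
assert xa*za + xb*zc == ya + yc
assert xa*zb + xb*zd == yb + yd
assert xc*za + xd*zc == yc
assert xc*zb + xd*zd == yd
print("all constraints satisfied")
```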

It turns out that the above is in vain! Indeed, Diophantine equations satisfiability was shown undecidable by Matiyasevich in 1970 (building on the work of Davis, Putnam and Robinson), but word equations satisfiability was shown decidable by Makanin in 1977.

Determining whether a given system of word equations has a solution is decidable (even if each variable is constrained by a given regular language).

Decidability and complexity

The termination of Makanin’s algorithm is notoriously difficult to prove, and the algorithm has high complexity. Since then, Plandowski [2] proved that the problem belongs to PSPACE, and Jeż [3] improved the upper bound to NSPACE($n$). Moreover, it is easy to show that the problem is NP-hard, even for singleton alphabets.

Word equations satisfiability is NP-complete for $\Sigma = \{a\}$.
Proof.

NP-hardness. We provide a reduction from one-in-three 3-SAT. Recall that we are given a Boolean expression in 3-CNF and must determine whether it is possible to satisfy exactly one literal per clause. We describe the reduction through an example. Consider

$$\varphi = (x \lor y \lor \neg z) \land (\neg x \lor y \lor \neg z).$$

We create two variables $v$ and $\overline{v}$ for each $v \in \{x, y, z\}$. We translate $\varphi$ into this system of word equations:

$$(x\overline{x} = a) \land (y\overline{y} = a) \land (z\overline{z} = a) \land (xy\overline{z} = a) \land (\overline{x}y\overline{z} = a).$$

Here, $a$ and $\varepsilon$ respectively play the role of $\mathit{true}$ and $\mathit{false}$.
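Since every variable ranges over $\{\varepsilon, a\}$, the translated system can be brute-forced directly. The check below (my own; the barred variables $\overline{x}, \overline{y}, \overline{z}$ are spelled `xb`, `yb`, `zb`) reports that this particular $\varphi$ admits no one-in-three assignment, so the system is unsatisfiable:

```python
from itertools import product

# "a" encodes true and "" encodes false; v + vb == "a" forces exactly
# one of v, vb to be true, and each clause equation forces exactly one
# of its three literals to be true.
sols = [(x, y, z)
        for x, xb, y, yb, z, zb in product(["", "a"], repeat=6)
        if x + xb == "a" and y + yb == "a" and z + zb == "a"
        and x + y + zb == "a" and xb + y + zb == "a"]
print(sols)  # []: this instance has no one-in-three assignment
```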

NP membership. Since there is a single letter, only size matters. For example, the system of word equations $(xaxyy = azy) \land (xz = zy)$ has a solution iff

$$\exists x', y', z' \in \N : (2x' + y' = z') \land (x' = y').$$

The latter can be solved in NP via integer linear programming, or more generally via existential Presburger arithmetic.
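For instance, the smallest solutions of the length abstraction above can be found by a naive search (a toy example of mine; a real decision procedure would hand the system to an ILP or Presburger solver):

```python
# Enumerate small solutions of (2x' + y' = z') and (x' = y').
sols = [(x, y, z)
        for x in range(5) for y in range(5) for z in range(15)
        if 2 * x + y == z and x == y]
print(sols[:3])  # [(0, 0, 0), (1, 1, 3), (2, 2, 6)]
```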

It is conjectured that word equations satisfiability is NP-complete for any finite alphabet $\Sigma$.

Further open questions include:

  • If the number of variables is fixed to $k \geq 3$, is satisfiability NP-hard or solvable in polynomial time?

  • Is satisfiability decidable if we allow constraints of the form $\sum_{v \in V} a_v |v| \leq b$ where $a_v, b \in \Z$?

Quadratic equations

Rather than delving further into the decidability and complexity of word equations, let us explore a better-behaved class of word equations. We say that a system of word equations is quadratic if each variable appears at most twice in the system. There is a relatively simple algorithm for solving quadratic systems.

The algorithm

Let us explain the procedure with the equation $E \colon xay = yxb$. We focus on the “head” of the equation, namely “$x = y$”. There are several cases to consider:

  • Case $x = \varepsilon$. We must now solve $ay = yb$.

  • Case $y = \varepsilon$. We must now solve $xa = xb$.

  • Case $|x| \geq |y|$. In a solution of $E$, we must have $x = yu$ for some $u \in \Sigma^*$, and so $yuay = yyub$. The latter simplifies to $uay = yub$. For convenience, we can rename $u$ with $x$. Thus, we must now solve $xay = yxb$, which is exactly the original equation!

  • Case $|y| \geq |x|$. In a solution of $E$, we must have $y = xu$ for some $u \in \Sigma^*$, and hence $xaxu = xuxb$. This simplifies to $axu = uxb$. After variable renaming, we must now solve $axy = yxb$.
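The case analysis above can be turned into a small search procedure. The sketch below (my own encoding: uppercase characters are variables, lowercase characters are letters) explores the equations reachable via these transformations, combining each substitution with the head cancellation it enables so that equations never grow:

```python
from collections import deque

def solvable(u, v):
    """Decide satisfiability of a quadratic word equation u = v.

    Uppercase characters are variables; lowercase characters are letters.
    Explores the (finite) graph of Nielsen transformations by BFS."""
    seen, queue = set(), deque([(u, v)])
    while queue:
        s, t = queue.popleft()
        if (s, t) in seen:
            continue
        seen.add((s, t))
        if s == "" and t == "":
            return True                   # reached eps = eps
        if s == "" or t == "":
            if all(c.isupper() for c in s + t):
                return True               # map the leftover variables to eps
            continue                      # a letter remains: dead end
        if s[0] == t[0]:
            queue.append((s[1:], t[1:]))  # equal heads: cancel them
            continue
        if s[0].islower() and t[0].islower():
            continue                      # distinct letters: contradiction
        for a, b in ((s, t), (t, s)):     # try both orientations of the heads
            x, h = a[0], b[0]
            if x.isupper():
                # case x = eps: erase x everywhere
                queue.append((a.replace(x, ""), b.replace(x, "")))
                # case |x| >= |h|: substitute x -> h x and cancel the new head
                queue.append((x + a[1:].replace(x, h + x),
                              b[1:].replace(x, h + x)))
    return False

print(solvable("Xaba", "aY"))  # True  (xaba = ay is satisfiable)
print(solvable("XaY", "YXb"))  # False (xay = yxb is unsatisfiable)
```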

In all cases, we end up with a quadratic equation. The Nielsen transformations’ algorithm, also known as Levi’s method, constructs a directed graph whose nodes are quadratic equations and whose edges represent such transformations reasoning about the head of both sides [4]. It can be shown that the resulting graph is necessarily finite. Moreover, $E$ is satisfiable iff $E \to^* (\varepsilon = \varepsilon)$ in the graph. For example, our previous equation is unsatisfiable as the only way to get rid of the variables is by reaching the contradiction $a = b$:

Another example

Let us reconsider our very first equation $xaba = ay$. It is satisfiable since its graph is as follows:

The algorithm works

Let us provide a rough sketch explaining why the procedure works. The length of an equation $u = v$ is $|u| + |v|$. The length of a solution $h$ is defined as $|h| = \sum_{v \in V} |h(v)|$. A solution $h$ to an equation is length-minimal if $|h|$ is minimal among all solutions.

Let $E$ be an equation and let $E \to^{\varphi} E'$. The following properties hold:

  • If $E$ is quadratic, then $E'$ is quadratic and $|E| \geq |E'|$;
  • If $E$ is unsatisfiable, then $E'$ is unsatisfiable;
  • If $E$ has a solution $h$ and $\varphi(h)$ holds, then $E'$ has a solution $h'$ with $|h| > |h'|$.
For quadratic equations, the procedure terminates and is correct.
Proof.

Let $E$ be a quadratic word equation. By the first item of Lemma 5, the set of nodes $\{E' : E \to^* E'\}$ must be finite, which yields termination. It remains to prove correctness.

Case $E \to^* (\varepsilon = \varepsilon)$. By definition, the graph has a path $E = E_0 \to E_1 \to \cdots \to E_n = (\varepsilon = \varepsilon)$. Obviously, equation $E_n$ has a solution. By the contrapositive of the second item of Lemma 5, and by induction, this means that each $E_i$ has a solution. Consequently, $E$ has a solution.

Case $E \not\to^* (\varepsilon = \varepsilon)$. We must show that $E$ has no solution. For the sake of contradiction, suppose that $E$ has a length-minimal solution $h$. In node $E$, we compare the head of both sides and take a valid edge $E \to^\varphi E'$, e.g. if $\varphi = (|x| \geq |y|)$ and $|h(x)| \geq |h(y)|$, then we can move to $E'$. Note that at least one such edge exists as the construction of the graph considers all possible cases on the heads. By the third item of Lemma 5, equation $E'$ has a length-minimal solution $h'$ with $|h| > |h'|$. If $|h'| > 0$, then we have not reached the node $(\varepsilon = \varepsilon)$ and can repeat this process. Since the length cannot decrease forever, we must eventually reach the node $(\varepsilon = \varepsilon)$, which is a contradiction.