Let D\mathbb{D} be a countable set of unordered colors, also called “data”, and let (X,+,0)(X, +, 0) be a commutative monoid. An XX-valued data vector is a mapping DX\mathbb{D} \to X with finite support, i.e. with finitely many colors mapping to nonzero values. For example, consider D={r,g,b,}\mathbb{D} = \{{\color{red}r}, {\color{green}g}, {\color{blue}b}, \ldots\} and X=NX = \N under addition. Here, a data vector is a finite multiset over D\mathbb{D}, e.g. 0\bm{0} denotes the data vector assigning 00 to every dDd \in \mathbb{D}, and { ⁣ ⁣{r,r,b} ⁣ ⁣}\{\!\!\{{\color{red}r}, {\color{red}r}, {\color{blue}b}\}\!\!\} is represented by the data vector v\bm{v} with v(r)=2\bm{v}({\color{red}r}) = 2, v(b)=1\bm{v}({\color{blue}b}) = 1, and v(d)=0\bm{v}(d) = 0 for all dD{r,b}d \in \mathbb{D} \setminus \{{\color{red}r}, {\color{blue}b}\}.

An XX-valued abstract data vector, over a finite set of variables Vars\mathit{Vars}, is a mapping a ⁣:VarsX\bm{a} \colon \mathit{Vars} \to X. Informally, a\bm{a} represents the infinite (but orbit-finite) family of data vectors where the variables are renamed by distinct colors. For example, a={x2,y1}\bm{a} = \{x \mapsto 2, y \mapsto 1\} represents

{ ⁣ ⁣{r,r,g} ⁣ ⁣},{ ⁣ ⁣{r,r,b} ⁣ ⁣},{ ⁣ ⁣{g,g,r} ⁣ ⁣},{ ⁣ ⁣{g,g,b} ⁣ ⁣},{ ⁣ ⁣{b,b,r} ⁣ ⁣},{ ⁣ ⁣{b,b,g} ⁣ ⁣}, \newcommand{\mset}[1]{\{\!\!\{#1\}\!\!\}} \mset{{\color{red}r}, {\color{red}r}, {\color{green}g}}, \mset{{\color{red}r}, {\color{red}r}, {\color{blue}b}}, \mset{{\color{green}g}, {\color{green}g}, {\color{red}r}}, \mset{{\color{green}g}, {\color{green}g}, {\color{blue}b}}, \mset{{\color{blue}b}, {\color{blue}b}, {\color{red}r}}, \mset{{\color{blue}b}, {\color{blue}b}, {\color{green}g}}, \ldots

Given an injection ι ⁣:VarsD\iota \colon \mathit{Vars} \to \mathbb{D}, we write aι\bm{a}_\iota for the data vector v\bm{v} defined by v(d)=a(ι1(d))\bm{v}(d) = \bm{a}(\iota^{-1}(d)) if dd is in the image of ι\iota, and v(d)=0\bm{v}(d) = 0 otherwise. For example, a{xr,yb}={ ⁣ ⁣{r,r,b} ⁣ ⁣}\bm{a}_{\{x \mapsto {\color{red}r}, y \mapsto {\color{blue}b}\}} = \{\!\!\{{\color{red}r}, {\color{red}r}, {\color{blue}b}\}\!\!\}.

Given a set AA of abstract data vectors, the set of data vectors generated by AA, denoted A\langle A \rangle, is defined inductively by v=0\bm{v} = \bm{0} or v=w+aι\bm{v} = \bm{w} + \bm{a}_\iota for some wA\bm{w} \in \langle A \rangle, aA\bm{a} \in A and injection ι ⁣:VarsD\iota \colon \mathit{Vars} \to \mathbb{D}.

For example, consider D={r,g,b,}\mathbb{D} = \{{\color{red}r}, {\color{green}g}, {\color{blue}b}, \ldots\}, Vars={x,y}\mathit{Vars} = \{x, y\} and X=ZX = \Z under addition. Let A={a,b}A = \{\bm{a}, \bm{b}\} where a={x2,y1}\bm{a} = \{x \mapsto 2, y \mapsto 1\} and b={x1,y2}\bm{b} = \{x \mapsto 1, y \mapsto -2\}. We have v={r3,b2}A\bm{v} = \{{\color{red}r} \mapsto 3, {\color{blue}b} \mapsto 2\} \in \langle A \rangle since

v={r2,g1}+{b2,g1}+{r1,g2}=a{rx,gy}+a{bx,gy}+b{rx,gy}. \begin{aligned} \bm{v} &= \{{\color{red}r} \mapsto 2, {\color{green}g} \mapsto 1\} + \{{\color{blue}b} \mapsto 2, {\color{green}g} \mapsto 1\} + \{{\color{red}r} \mapsto 1, {\color{green}g} \mapsto -2\} \\ % &= \bm{a}_{\{{\color{red}r} \mapsto x, {\color{green}g} \mapsto y\}} + \bm{a}_{\{{\color{blue}b} \mapsto x, {\color{green}g} \mapsto y\}} + \bm{b}_{\{{\color{red}r} \mapsto x, {\color{green}g} \mapsto y\}}. \end{aligned}

More visually, this can be depicted as

We are interested in the expressibility problem which asks, given a data vector v\bm{v} and a finite set AA of abstract data vectors, whether vA\bm{v} \in \langle A \rangle. This problem arises, e.g., in the study of unordered data Petri nets. We will see how one can decide the expressibility problem, by revisiting the work of Hofman, Leroux and Totzke1.

Edge chromatic number

To tackle the expressibility problem, we briefly turn to graph theory. An edge coloring of an undirected graph G=(V,E)G = (V, E) is a function f ⁣:ENf \colon E \to \N where f(e)=f(e)f(e) = f(e') implies that ee and ee' do not share any vertex. The size of ff is the size of its image. The edge chromatic number of GG is the size of a minimal edge coloring. For example, the edge chromatic number of this graph is 4:

König’s line coloring theorem states that the edge chromatic number of a bipartite graph is its maximum degree. For example, the above bipartite graph has maximum degree 2. We provide a proof2 of the theorem for the slightly more general case of multigraphs, i.e. where parallel edges are allowed.

The edge chromatic number of a bipartite multigraph equals its maximum degree.
Proof.

Let G=(XY,E)G = (X \cup Y, E) be a bipartite multigraph. We proceed by induction on E|E|. If E=0|E| = 0, then we are trivially done. Assume that E1|E| \geq 1. Let Δ\Delta denote the maximum degree of GG. Let eEe \in E be an arbitrary edge, and let xXx \in X and yYy \in Y be its endpoints. Let G=(XY,E)G' = (X \cup Y, E') be the graph obtained by removing ee from GG. As the maximum degree of GG' is at most Δ\Delta, by induction hypothesis, there is an edge coloring f ⁣:E[1..Δ]f' \colon E' \to [1..\Delta] of GG'.

Since degG(x),degG(y)<Δ\mathrm{deg}_{G'}(x), \mathrm{deg}_{G'}(y) < \Delta, there exist i,j[1..Δ]i, j \in [1..\Delta] such that vertex xx is not incident with an ii-edge in ff', and likewise for yy and jj. If i=ji = j, then we are done by coloring ee with ii. So, assume iji \neq j. Without loss of generality, we assume that xx is incident with a jj-edge in ff', as otherwise we could have set i=ji = j.

We construct a path. Let us start in x0=xx_0 = x and take a j{\color{magenta}j}-edge e1Ee_1 \in E' going to some vertex v1v_1. If e1e_1 is incident with an i{\color{blue}i}-edge e2Ee_2 \in E', then we take it and move to some vertex v2v_2. We continue this process alternating with j{\color{magenta}j}-edges and i{\color{blue}i}-edges. Since ff' is an edge coloring, a vertex vv cannot repeat along this path, as otherwise vv would be incident with two edges of the same color. Thus, we may obtain a maximal simple path PP of the form

v0e1v1e2v2e3  vn, where viX iff i is even. v_0 \xrightarrow{\color{magenta}e_1} v_1 \xrightarrow{\color{blue}e_2} v_2 \xrightarrow{\color{magenta}e_3}\ \cdots\ v_n, \text{ where } v_i \in X \text{ iff } i \text{ is even}.

By assumption on xx, we have n1n \geq 1. Let g ⁣:E[1..Δ]g' \colon E' \to [1..\Delta] be obtained from ff' by swapping i{\color{blue}i} and j{\color{magenta}j} on the edges of PP. Note that gg' is an edge coloring.

Click for a proof.

For the sake of contradiction, suppose that some vertex uu is incident to two edges of the same color kk in gg'. We must have u=vu = v_\ell for some [0..n]\ell \in [0..n] and k{i,j}k \in \{{\color{blue}i}, {\color{magenta}j}\}. We cannot have =0\ell = 0, as v0=xv_0 = x is not incident to an i{\color{blue}i}-edge in ff'. We cannot have 1<n1 \leq \ell < n, as we swapped the occurrences of the two colors. We cannot have =n\ell = n as otherwise we could have extended PP to a longer path.

Recall that yy is not incident to a j{\color{magenta}j}-edge in ff'. Thus, if yy appeared along PP, we would have vn=yv_n = y with nn even, which is impossible as GG is bipartite. Thus, xx is the first vertex of PP, and yy does not appear along PP. Consequently, in the edge coloring gg', neither xx nor yy is incident to a j{\color{magenta}j}-edge.

Let g ⁣:E[1..Δ]g \colon E \to [1..\Delta] be gg' extended with g(e)=jg(e) = {\color{magenta}j}. We are done since gg is an edge coloring of GG.

Note that the proof of Theorem 1 is constructive. For example, the edge coloring of the previous example was obtained as follows:

Perfect matchings on multisets

We now introduce a useful notion of matchings that can be characterized thanks to Theorem 1. Let X\mathcal{X} be a finite collection of finite multisets X1,,Xk ⁣:DNX_1, \ldots, X_k \colon \mathbb{D} \to \N. A transversal of X\mathcal{X} is a tuple (d1,,dk)Dk(d_1, \ldots, d_k) \in \mathbb{D}^k where diXid_i \in X_i and didjd_i \neq d_j for all iji \neq j. A perfect matching of X\mathcal{X} is a finite multiset TT of transversals such that Xi={ ⁣ ⁣{di:(d1,,dk)T} ⁣ ⁣}X_i = \{\!\!\{d_i : (d_1, \ldots, d_k) \in T\}\!\!\} for each i[1..k]i \in [1..k].

For example, consider X1={ ⁣ ⁣{r,r,g} ⁣ ⁣}X_1 = \{\!\!\{{\color{red}r}, {\color{red}r}, {\color{green}g}\}\!\!\}, X2={ ⁣ ⁣{r,g,b} ⁣ ⁣}X_2 = \{\!\!\{{\color{red}r}, {\color{green}g}, {\color{blue}b}\}\!\!\} and X3={ ⁣ ⁣{g,b,b} ⁣ ⁣}X_3 = \{\!\!\{{\color{green}g}, {\color{blue}b}, {\color{blue}b}\}\!\!\}. This is a perfect matching:

T={ ⁣ ⁣{(r,g,b),(r,b,g),(g,r,b)} ⁣ ⁣}. T = \{\!\!\{({\color{red}r}, {\color{green}g}, {\color{blue}b}), ({\color{red}r}, {\color{blue}b}, {\color{green}g}), ({\color{green}g}, {\color{red}r}, {\color{blue}b})\}\!\!\}.

Perfect matchings do not always exist, e.g. for X1={ ⁣ ⁣{r,r} ⁣ ⁣}X_1 = \{\!\!\{{\color{red}r}, {\color{red}r}\}\!\!\} and X2={ ⁣ ⁣{r,b} ⁣ ⁣}X_2 = \{\!\!\{{\color{red}r}, {\color{blue}b}\}\!\!\}.

There is a perfect matching iff X1==Xk=n|X_1| = \cdots = |X_k| = n and dD1ikXi(d)n\bigwedge_{d \in \mathbb{D}} \sum_{1 \leq i \leq k} X_i(d) \leq n.
Proof.

\Rightarrow) Let nn be the number of transversals of the perfect matching. Since each element of XiX_i must belong to exactly one transversal, we must have X1==Xk=n|X_1| = \cdots = |X_k| = n. Let dDd \in \mathbb{D}. Each transversal contains at most one occurrence of dd. As there are nn transversals, this means that X1(d)++Xk(d)nX_1(d) + \ldots + X_k(d) \leq n.

)\Leftarrow) Let U=[1..k]U = [1..k], and let VDV \subseteq \mathbb{D} be the finite subset of colors occurring at least once in some XiX_i. Let G=(UV,E)G = (U \cup V, E) be the bipartite multigraph where there are Xi(d)X_i(d) edges of the form {i,d}\{i, d\}. By assumption, we have deg(u)=n\mathrm{deg}(u) = n for each uUu \in U, and deg(v)n\mathrm{deg}(v) \leq n for each vVv \in V. Thus, the maximum degree of GG is nn, and hence its edge chromatic number is nn by Theorem 1.

Let f ⁣:E[1..n]f \colon E \to [1..n] be an edge coloring of GG. For every i[1..n]i \in [1..n] and j[1..k]j \in [1..k], let ei,je_{i,j} be the unique edge adjacent with vertex jUj \in U such that f(ei,j)=if(e_{i, j}) = i. Let di,jVd_{i, j} \in V be the other vertex adjacent with ei,je_{i, j}. Since ff is an edge coloring, the tuple ti=(di,1,,di,k)t_i = (d_{i, 1}, \ldots, d_{i, k}) is a transversal. Thus, T={ ⁣ ⁣{t1,,tn} ⁣ ⁣}T = \{\!\!\{t_1, \ldots, t_n\}\!\!\} is a perfect matching.

Characterizing expressibility with perfect matchings

Recall that we consider XX-valued data vectors where (X,+,0)(X, +, 0) is a commutative monoid. For every cNc \in \N and xXx \in X, let cxc \cdot x denote x++xx + \ldots + x with cc terms. We obtain the following characterization:

Let a\bm{a} be an abstract data vectors over Vars={x1,,xk}\mathit{Vars} = \{x_1, \ldots, x_k\}. It is the case that va\bm{v} \in \langle \bm{a} \rangle iff there exist finite multisets X1,,Xk ⁣:DNX_1, \ldots, X_k \colon \mathbb{D} \to \N such that

dD(v(d)=i[1..k]Xi(d)a(xi))has-perfect-matching(X1,,Xk). \bigwedge_{d \in \mathbb{D}} \bigg(\bm{v}(d) = \sum_{i \in [1..k]} X_i(d) \cdot \bm{a}(x_i)\bigg) \land \mathrm{has\text{-}perfect\text{-}matching}(X_1, \ldots, X_k).
Proof.

\Rightarrow) By assumption there exist injections ι1,,ιn ⁣:VarsD\iota_1, \ldots, \iota_n \colon \mathit{Vars} \to \mathbb{D} satisfying v=j[1..n]aιj\bm{v} = \sum_{j \in [1..n]} \bm{a}_{\iota_j}. Let Xi ⁣:DNX_i \colon \mathbb{D} \to \N be defined by Xi(d)={ ⁣ ⁣{j[1..n]:ιj(xi)=d} ⁣ ⁣}X_i(d) = |\{\!\!\{j \in [1..n] : \iota_j(x_i) = d\}\!\!\}|. For every dDd \in \mathbb{D}, we have

v(d)=i[1..k]Xi(d)a(xi). \bm{v}(d) = \sum_{i \in [1..k]} X_i(d) \cdot \bm{a}(x_i).

For each i[1..n]i \in [1..n], let ti=(ιi(x1),,ιi(xk))t_i = (\iota_i(x_1), \ldots, \iota_i(x_k)). By injectivity of ιi\iota_i, the tuple tit_i is a transversal. Thus, by definition, { ⁣ ⁣{t1,,tn} ⁣ ⁣}\{\!\!\{t_1, \ldots, t_n\}\!\!\} is a perfect matching of X1,,XkX_1, \ldots, X_k.

\Leftarrow) Let TT be a perfect matching for X1,,XkX_1, \ldots, X_k. For each tTt \in T, let ιt(xi)=t[i]\iota_t(x_i) = t[i]. Since tt is a transversal, ιt\iota_t is an injection. For every dDd \in \mathbb{D}, we have

tTaιt(d)=i[1..k]{ ⁣ ⁣{tT:ιt(xi)=d} ⁣ ⁣}a(xi)=i[1..k]{ ⁣ ⁣{tT:t[i]=d} ⁣ ⁣}a(xi)=i[1..k]Xi(d)a(xi)(by Xi={ ⁣ ⁣{t[i]:tT} ⁣ ⁣}). \begin{aligned} \sum_{t \in T} \bm{a}_{\iota_t}(d) &= \sum_{i \in [1..k]} |\{\!\!\{t \in T : \iota_t(x_i) = d\}\!\!\}| \cdot \bm{a}(x_i) \\ &= \sum_{i \in [1..k]} |\{\!\!\{t \in T : t[i] = d\}\!\!\}| \cdot \bm{a}(x_i) \\ &= \sum_{i \in [1..k]} X_i(d) \cdot \bm{a}(x_i) && (\text{by $X_i = \{\!\!\{t[i] : t \in T\}\!\!\}$}). \end{aligned}

Recall that v(d)=i[1..k]Xi(d)a(xi)\bm{v}(d) = \sum_{i \in [1..k]} X_i(d) \cdot \bm{a}(x_i). Hence, we have v=tTaιt\bm{v} = \sum_{t \in T} \bm{a}_{\iota_t}, and so va\bm{v} \in \langle \bm{a} \rangle.

Deciding expressibility

We now combine our characterizations and bound the number of colors necessary to witness expressibility.

Let A={a1,,am}A = \{\bm{a}_1, \ldots, \bm{a}_m\} be a finite set of abstract data vectors over Vars={x1,,xk}\mathit{Vars} = \{x_1, \ldots, x_k\}. It is the case that vA\bm{v} \in \langle A \rangle iff there exist n1,,nmNn_1, \ldots, n_m \in \N, finite multisets X1,1,,Xm,k ⁣:DNX_{1, 1}, \ldots, X_{m, k} \colon \mathbb{D} \to \N and data vectors v1,,vm\bm{v}_1, \ldots, \bm{v}_m such that the following holds for each i[1..m]i \in [1..m]:

  1. dD(v(d)=i[1..m]j[1..k]Xi,j(d)ai(xj))i[1..m]j[1..k](Xi,j=ni)i[1..m]dD(j[1..k]Xi,j(d)ni),\bigwedge_{d \in \mathbb{D}} \bigg(\bm{v}(d) = \sum_{\substack{i \in [1..m] \\ j \in [1..k]}} X_{i, j}(d) \cdot \bm{a}_i(x_j)\bigg) \land \bigwedge_{\mathclap{\substack{i \in [1..m] \\ j \in [1..k]}}} (|X_{i, j}| = n_i) \land \bigwedge_{\substack{i \in [1..m] \\ d \in \mathbb{D}}} \bigg(\sum_{j \in [1..k]} X_{i, j}(d) \leq n_i\bigg),
  2. supp(X1,1++Xm,k)m(2k1)+supp(v)+1|\mathrm{supp}(X_{1, 1} + \ldots + X_{m, k})| \leq m(2k-1) + |\mathrm{supp}(\bm{v})| + 1.

Proof.

We have vA\bm{v} \in \langle A \rangle iff v=v1++vm\bm{v} = \bm{v}_1 + \ldots + \bm{v}_m for some v1a1,,vmam\bm{v}_1 \in \langle \bm{a}_1 \rangle, \ldots, \bm{v}_m \in \langle \bm{a}_m \rangle. Thus, by Proposition 3 and Lemma 2, we have vA\bm{v} \in \langle A \rangle iff there exist n1,,nmNn_1, \ldots, n_m \in \N and finite multisets X1,1,,Xm,k ⁣:DNX_{1, 1}, \ldots, X_{m, k} \colon \mathbb{D} \to \N satisfying Item 1 of the claim.

It remains to prove Item 2, i.e. that the number of colors can be bounded. Let us pick a solution minimizing the size of S=supp(X1,1++Xm,k)S = \mathrm{supp}(X_{1, 1} + \ldots + X_{m, k}). Let us show that Sm(2k1)+supp(v)+1|S| \leq m(2k-1) + |\mathrm{supp}(\bm{v})| + 1.

We say that dDd \in \mathbb{D} is ii-dominant if Xi,1(d)++Xi,k(d)>ni/2X_{i, 1}(d) + \ldots + X_{i, k}(d) > n_i / 2. Let DiD_i denote the set of ii-dominant colors. We have Di<2k|D_i| < 2k, as otherwise we would obtain this contradiction:

kni=j[1..k]Xi,j=j[1..k]dDXi,j(d)dDij[1..k]Xi,j(d)>dDini/2kni. k n_i = \sum_{j \in [1..k]} |X_{i, j}| = \sum_{j \in [1..k]} \sum_{d \in \mathbb{D}} X_{i, j}(d) \geq \sum_{d \in D_i} \sum_{j \in [1..k]} X_{i, j}(d) > \sum_{d \in D_i} n_i / 2 \geq k n_i.

We say that a color is non-dominant if it is not ii-dominant for any i[1..m]i \in [1..m]. For the sake of contradiction, suppose that Sm(2k1)+supp(v)+2|S| \geq m(2k-1) + |\mathrm{supp}(\bm{v})| + 2. By the pigeonhole principle, there exist distinct non-dominant colors e,eSe, e' \in S not appearing in supp(v)\mathrm{supp}(\bm{v}).

We merge color ee' into color ee by defining

Xi,j(e)=0,Xi,j(e)=Xi,j(e)+Xi,j(e),Xi,j(f)=Xi,j(f)for all f{e,e}. \begin{aligned} X_{i, j}'(e') &= 0, \\ X_{i, j}'(e) &= X_{i, j}(e) + X_{i, j}(e'), \\ X_{i, j}'(f) &= X_{i, j}(f) && \text{for all } f \notin \{e, e'\}. \end{aligned}

Observe that X1,1,,Xm,kX_{1,1}', \ldots, X_{m,k}' satisfy Item 1 of the claim, namely:

  • dDv(d)=i[1..m],j[1..k]Xi,j(d)a(xj)\bigwedge_{d \in \mathbb{D}} \bm{v}(d) = \sum_{i \in [1..m], j \in [1..k]} X_{i, j}'(d) \cdot \bm{a}(x_j) holds since e,esupp(v)e, e' \notin \mathrm{supp}(\bm{v});

  • Xi,j=Xi,j=ni|X_{i, j}'| = |X_{i, j}| = n_i;

  • i[1..m],dDj[1..k]Xi,j(d)ni\bigwedge_{i \in [1..m], d \in \mathbb{D}} \sum_{j \in [1..k]} X_{i, j}(d) \leq n_i holds since ee and ee' are non-dominant.

This contradicts the minimality of SS since S=supp(X1,1++Xm,k)=S{e}S' = \mathrm{supp}(X_{1,1}' + \ldots + X_{m,k}') = S \setminus \{e'\}.

The expressibility problem is in NP for Z\Z^\ell-valued data vectors under addition.
Proof.
Theorem 4 yields a system of linear inequalities for which at most polynomially many colors are needed. Thus, expressibility reduces to integer linear programming feasability.