There was some discussion on my Kolmogorov complexity post ('Kolmogorov Complexity - it's a bit silly') on Hacker News recently.

Pretty much everyone disagreed with me. Oh well, such is life.

My central point in that article was that it matters which language we use. For any given string, there is a language such that the Kolmogorov complexity of the string with respect to that language is zero. This means that Kolmogorov complexity tells us nothing useful about whether a given string is 'random' or not.

The invariance theorem

Some people brought up the invariance theorem in defense of some global notion of Kolmogorov complexity. While the theorem is true, it is of little help in defining a global Kolmogorov complexity - i.e. a complexity that is independent of the language chosen.

Let's walk through an example of how the invariance theorem fails to do anything helpful. I'll use the notation from my previous post.

First, a statement of the invariance theorem (from Wikipedia):

If \( K_1 \) and \( K_2 \) are the complexity functions relative to Turing complete description languages \(L_1\) and \(L_2\), then there is a constant \(c\) – which depends only on the languages \(L_1\) and \(L_2\) chosen – such that $$ \forall s. -c \le K_1(s) - K_2(s) \le c $$

The proof follows from considering an interpreter for \(L_1\) written in \(L_2\) and vice-versa.
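To spell that out slightly (this is my gloss, not part of the quoted statement): if \(I_{12}\) is an interpreter for \(L_1\) written in \(L_2\), then any \(L_1\) program producing \(s\) can be combined with \(I_{12}\) to get an \(L_2\) program producing \(s\), so $$ K_2(s) \le K_1(s) + |I_{12}| $$ and symmetrically $$ K_1(s) \le K_2(s) + |I_{21}| $$ Taking \(c = \max(|I_{12}|, |I_{21}|)\) gives the bound above. The thing to notice is that \(c\) depends only on the pair of languages, not on the string.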

Ok, so that's the invariance theorem. Now let's work through an example of why it is useless. Consider some arbitrary string \(x\), for example: $$ x=ababaaabbabaabaababbbabaababbbaaababbbba $$ Let us suppose the complexity wrt. language \(L_1\) is 20, i.e. $$ K_1(x) = 20 $$ Now consider the language \(L_{silly, x}\), defined as in the previous post. This is the language specifically chosen to make the complexity of \(x\) zero. From the definition of \(L_{silly, x}\), we have $$ K_{silly, x}(x) = 0 $$

So, what does the invariance theorem say about the two languages we have chosen (\(L_1\) and \(L_{silly, x}\))? It says that there is some constant \(c\) such that $$ \forall s. -c \le K_1(s) - K_{silly, x}(s) \le c $$ Since the statement holds for all strings, we can substitute our values \(K_1(x) = 20\) and \(K_{silly, x}(x) = 0\): $$ -c \le 20 - 0 \le c $$ so $$ -c \le 20 \le c $$ so $$ c \ge 20 $$

In other words, for this particular string \(x\), there is some language \(L_{silly, x}\) such that the complexity of \(x\) wrt. that language is zero, and the bound \(c\) for this pair of languages is simply large enough to cover the gap in complexities between the two languages. So the invariance theorem has not stopped the complexity of \(x\) dropping to zero in some other language, even though it is non-zero in \(L_1\).
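To make the \(L_{silly, x}\) construction concrete, here is a minimal sketch in Python of what such an interpreter could look like. This is my own illustration, not the exact construction from the previous post: the names `run_base_language`, `run_silly` and the constant `X` are made up for the example, and the base interpreter is left unimplemented.

```python
# A sketch of the idea behind L_silly,x: an interpreter in which the empty
# program outputs the fixed string x, and every other program is passed
# through to some ordinary base language.

X = "ababaaabbabaabaababbbabaababbbaaababbbba"

def run_base_language(program: str) -> str:
    """Stand-in for an interpreter of an ordinary language L_1 (assumed)."""
    raise NotImplementedError

def run_silly(program: str) -> str:
    """Interpreter for L_silly,x: the empty program outputs x."""
    if program == "":
        return X
    # Everything else behaves exactly as in the base language,
    # so the language is still as expressive as L_1.
    return run_base_language(program)

# The shortest program producing X in L_silly,x is the empty string,
# so K_silly,x(X) = 0, while K_1(X) may be 20 as in the example above.
assert run_silly("") == X
```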

An alternative definition of Kolmogorov complexity

There are alternative definitions of Kolmogorov complexity, for example the one given by Peter Bro Miltersen in his course notes.

These alternative definitions also suffer from the same problem - the complexity depends on the language chosen, and there is no reasonable 'invariant' complexity.

Recapping his definitions, we have programming systems (languages) \(L\), and programs in these languages: \(p \in L\). These are given semantics in the following way: each language \(L\) has two associated algorithms \(\phi\), \(\tau\), so that \(\phi(p)\) for \(p \in L\) outputs the index of a Turing machine \(M_{\phi(p)}\). So \(p\) is given the behavior of the associated Turing machine: $$ p(x) := M_{\phi(p)}(x) $$ A Kolmogorov decoder relative to \(L\) is then defined: $$ d_K(pz) := p(z) $$ This function works by splitting the string \(pz\) into two components: the description of the program, \(p\), and the input to the program, \(z\). For this splitting to be unambiguous, no program string can be a prefix of another.
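As a rough illustration of how \(d_K\) splits \(pz\), and why the prefix-free requirement matters, here is a toy Python sketch. It is not Miltersen's construction: the marker-based program encoding, the `PHI` table and the two toy machines are assumptions made purely for the example.

```python
# A toy Kolmogorov decoder d_K(pz) = p(z). Program strings contain a single
# "2", as their final character, so no program is a prefix of another and the
# split point of pz is unambiguous.

from typing import Callable, Dict

# Stand-in for phi: program string -> behaviour of the associated machine.
PHI: Dict[str, Callable[[str], str]] = {
    "ab2": lambda z: z,        # identity machine
    "ba2": lambda z: z + z,    # duplication machine
}

def d_K(y: str) -> str:
    """Split y = pz at the end of the (prefix-free) program p, then run p on z."""
    cut = y.index("2") + 1     # the program ends at the first "2"
    p, z = y[:cut], y[cut:]
    return PHI[p](z)

assert d_K("ab2" + "xyz") == "xyz"
assert d_K("ba2" + "ab") == "abab"
```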

The Kolmogorov code is then defined: $$ c_K(x) := \text{the shortest y so that } d_K(y) = x $$ Note that in this definition \(y\) is not a program, but a program description concatenated with an input to that program.
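Continuing the toy sketch above (it reuses `d_K` and `PHI` from that snippet), \(c_K\) can be written as a brute-force search over candidate strings in order of length. This is only feasible in the toy system; for a real language the search would have to run programs that may never halt, which is why the real \(c_K\) is uncomputable.

```python
from itertools import product

ALPHABET = "ab2"  # alphabet of the toy system above

def c_K(x: str, max_len: int = 10) -> str:
    """Shortest y with d_K(y) = x in the toy system; K(x) is then len(c_K(x))."""
    for n in range(max_len + 1):
        for y in ("".join(t) for t in product(ALPHABET, repeat=n)):
            try:
                if d_K(y) == x:
                    return y
            except (ValueError, KeyError):
                continue  # y does not split into a known program plus an input
    raise ValueError("no code of length <= max_len found")

# In the toy system the shortest code for "abab" is "ba2" + "ab"
# (the duplication program plus its input), so K("abab") = 5.
assert c_K("abab") == "ba2ab"
assert len(c_K("abab")) == 5
```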

Finally, the Kolmogorov complexity is given as $$ K(x) := |c_K(x)| $$ i.e. the length of the Kolmogorov code.

Using these definitions, we can prove the following theorem:

Theorem: For any finite string \(x\), there is some language \(L_{silly, x}\) such that the Kolmogorov complexity of \(x\) in \(L_{silly, x}\) is 1.

Proof: For the given string \(x\), there is some Turing machine that outputs \(x\) and then halts when given the empty string \(\varepsilon\) as input. Suppose this Turing machine has index \(i\). Define the associated semantics algorithm \(\phi\) for the language such that $$ \phi(p_1) := i $$ where $$ p_1 = ``1" $$ i.e. \(p_1\) is the string \(``1"\). This means that $$ p_1(\varepsilon) = M_{\phi(p_1)}(\varepsilon) = M_i(\varepsilon) = x $$ so $$ d_K(p_1\varepsilon) = p_1(\varepsilon) = x $$

Given that $$ c_K(x) = \text{the shortest y so that } d_K(y) = x $$ we have $$ c_K(x) = p_1\varepsilon = ``1" $$ so $$ K(x) := |c_K(x)| = 1 $$

The reason we get a complexity of one instead of zero is the prefix-free property required when combining the two strings for \(d_K\).
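To see the proof in slightly more concrete terms, here is a Python sketch of the semantics of this \(L_{silly, x}\). The machine table, the index 7 and the constant `X` are all stand-ins I have invented for illustration; in particular the table is not a real enumeration of Turing machines, and only the single program relevant to the proof is shown.

```python
# A sketch of L_silly,x in the Miltersen-style setup. X stands for the fixed
# string x; index 7 is an arbitrary stand-in for the index i of a machine that
# outputs x and halts on every input.

X = "ababaaabbabaabaababbbabaababbbaaababbbba"

MACHINES = {
    7: lambda z: X,  # M_i: ignores its input, outputs x, halts
}

def phi(p: str) -> int:
    """Semantics algorithm for L_silly,x: the program "1" maps to machine index i."""
    if p == "1":
        return 7
    raise ValueError("not a program of L_silly,x")

def d_K_silly(y: str) -> str:
    """Decoder relative to L_silly,x: the one-character program "1" is trivially prefix-free."""
    p, z = y[:1], y[1:]
    return MACHINES[phi(p)](z)

# The code for X is the program "1" followed by the empty input, so
# K_silly,x(X) = len("1") = 1.
assert d_K_silly("1") == X
```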