Assume, for a contradiction, that we have some function optimise(f) that returns the fastest Turing machine that computes the (partial) function f.
The exact definition of 'fastest' doesn't matter.
Now consider the function \( \operatorname{returnOneOnHalt}_f(x) \), which simulates the execution of f on input x. If f halts on x, then \( \operatorname{returnOneOnHalt}_f \) returns 1 and halts. If f doesn't halt on x, then \( \operatorname{returnOneOnHalt}_f \) doesn't halt either.
Now consider what happens when we run optimise() with \( \operatorname{returnOneOnHalt}_f \) as input, i.e. \( \operatorname{optimise}(\operatorname{returnOneOnHalt}_f) \).
For a given function f, if f halts on all inputs, then \( \operatorname{returnOneOnHalt}_f(x) = 1 \) for every input x. Therefore the fastest Turing machine that computes this function is the one that simply writes 1 to the tape and halts.
On the other hand, if f does not halt on all inputs, then there is at least one input on which the optimised \( \operatorname{returnOneOnHalt}_f \) does not halt. This Turing machine cannot be the one that just writes 1 and halts: on that input it executes more than one step, since it never halts at all.
Therefore we have constructed an effective procedure for determining whether f halts on all inputs: just look at the optimised \( \operatorname{returnOneOnHalt}_f \) and check whether it is the 'write 1 to the tape and halt' machine.
However, we know that no such effective procedure can exist: deciding whether a function halts on all inputs is undecidable, as a consequence of the Halting Theorem. Therefore we have a contradiction, and no such function optimise() can exist.
Pseudocode for such a procedure looks like this:
    function makeReturnsOneOnHalt(f)
        return function that simulates f, and when f halts, returns 1
    end

    function doesHaltOnAllInputs(f)
        returnsOneOnHalt_f = makeReturnsOneOnHalt(f)
        optimised_returnsOneOnHalt_f = optimise(returnsOneOnHalt_f)
        return optimised_returnsOneOnHalt_f == returnsOne
    end
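The same reduction can be sketched in Python, with ordinary callables standing in for Turing machines. Note that optimise and the machine-equality test are the idealised, assumed-for-contradiction parts; the stubs and names below are illustrative only and cannot actually be implemented:

    def make_return_one_on_halt(f):
        # Build returnOneOnHalt_f: it halts (returning 1) on exactly
        # the inputs where f halts.
        def return_one_on_halt(x):
            f(x)        # diverges whenever f diverges on x
            return 1    # reached only if f halted on x
        return return_one_on_halt

    def return_one(x):
        # The trivial 'write 1 to the tape and halt' machine.
        return 1

    def optimise(g):
        # Assumed for contradiction; provably cannot exist.
        raise NotImplementedError

    def does_halt_on_all_inputs(f):
        # If optimise existed, this would decide whether f halts on
        # every input, which the Halting Theorem rules out. Comparing
        # machines for equality is part of the idealisation.
        g = make_return_one_on_halt(f)
        return optimise(g) == return_one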
Consequences for superoptimisation
Superoptimisation is a compiler optimisation technique that aims to find the shortest, or fastest, program that computes a given function. In other words, it doesn't just make a program somewhat faster, but actually finds the optimal program.
The term was introduced in the paper 'Superoptimizer: A Look at the Smallest Program' by Henry Massalin in 1987. Interestingly, the paper defines superoptimisation as finding the shortest program, as opposed to the fastest.
If the programs being considered are 'Turing-complete' (or 'Turing-machine equivalent'), then either kind of superoptimisation (optimising for program length or for program speed) becomes impossible (uncomputable).
Finding the shortest program that computes a given function is uncomputable as a consequence of Kolmogorov complexity theory: the length of the shortest program that produces a given output is itself an uncomputable quantity.
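To see why, here is a hedged sketch of the standard argument in Python; shortest_program is the hypothetical helper (assumed to return the shortest program printing a given string), and all the names are invented for this illustration:

    from itertools import count, product

    def shortest_program(s):
        # Hypothetical: would return the shortest program printing s,
        # so len(shortest_program(s)) is the Kolmogorov complexity of
        # s. Assumed for contradiction; provably uncomputable.
        raise NotImplementedError

    def all_strings():
        # Enumerate binary strings in order of increasing length.
        for n in count(0):
            for bits in product("01", repeat=n):
                yield "".join(bits)

    def first_string_with_complexity_above(n):
        # Only finitely many programs have length <= n, so a string
        # of complexity greater than n exists, and this loop would
        # terminate if shortest_program were computable.
        for s in all_strings():
            if len(shortest_program(s)) > n:
                return s

If shortest_program were computable, then for large enough n this fixed-size program plus its input would be a description of length about log(n) plus a constant, i.e. shorter than n, of a string whose shortest description exceeds n, a contradiction.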
As the proof above demonstrates, finding the fastest program that computes a given function is also uncomputable. This means that superoptimisation must be restricted to subsets of all possible functions, such as the total functions.
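For restricted program classes, however, exhaustive search does work. As a toy illustration (the instruction set and specification here are invented for this example, not taken from Massalin's paper), a brute-force superoptimiser over straight-line, loop-free programs can enumerate candidate instruction sequences in order of increasing length and return the first one matching the specification on a set of test inputs:

    from itertools import product

    # A toy instruction set acting on a single accumulator.
    OPS = {
        "neg": lambda a: -a,
        "inc": lambda a: a + 1,
        "dec": lambda a: a - 1,
    }

    def run(program, x):
        # Execute a sequence of instructions on accumulator x.
        for op in program:
            x = OPS[op](x)
        return x

    def superoptimise(spec, tests, max_len=4):
        # Return the shortest instruction sequence that matches spec
        # on all test inputs, searching in order of increasing length.
        for length in range(max_len + 1):
            for program in product(OPS, repeat=length):
                if all(run(program, x) == spec(x) for x in tests):
                    return program
        return None

    # Example: the search finds the length-2 program ('neg', 'dec'),
    # since -x - 1 equals -(x + 1).
    print(superoptimise(lambda x: -(x + 1), tests=range(-5, 6)))

Matching on a finite set of test inputs is only a filter; a real superoptimiser must still verify the winning candidate. The key point is that every straight-line program halts, so the search itself always terminates.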