In this blog post I'm going to write a little bit about permutations, and some connections with information theory and sorting algorithms, including a sketch of a proof of the \(n \log n\) lower bound on comparison-based sorting.

A permutation is a re-ordering of a list. For example, (1, 3, 2), (3, 2, 1) and (2, 1, 3) are all permutations of the list (1, 2, 3).

Let's say you have a list with \(n\) elements in it (i.e. the length of the list is \(n\)). How many different ways of ordering the list (i.e. how many permutations of it) are there?

When picking the first element of your permutation, you have \(n\) elements to choose from. Pick one. Then when picking the second element of your permutation, you have one less to choose from, i.e. \(n-1\) elements. By the time you come to choose the last element of your permutation, you only have one element left. So the total number of different permutations is $$n \times (n - 1) \times (n - 2) \times ... \times 3 \times 2 \times 1 $$ This is the definition of \(n\) factorial: $$n! = n \times (n - 1) \times (n - 2) \times ... \times 3 \times 2 \times 1$$ So there are \(n!\) permutations of a list of \(n\) elements.
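If you want to check this with a computer, here's a tiny Python sketch (using the standard library's `itertools.permutations` and `math.factorial`) that enumerates all orderings of a small list and confirms the count matches \(n!\):

```python
import itertools
import math

n = 4
items = list(range(1, n + 1))  # the list (1, 2, 3, 4)

# Enumerate every possible ordering (permutation) of the list.
perms = list(itertools.permutations(items))

print(len(perms))         # 24
print(math.factorial(n))  # 24, i.e. 4 x 3 x 2 x 1
```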

Let's say we want to store which permutation we have chosen in computer memory. Since there are \(n!\) possibilities, we need \(\log_2(n!)\) bits to store this information (rounded up to a whole number of bits in practice).

(In general, to store a choice that has \(x\) possibilities, you need to use \(\log_2(x)\) bits. For example, with 256 possibilities, you need to use \(\log_2(256) = 8\) bits.)
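Here's a quick sketch of that in Python:

```python
import math

print(math.log2(256))                 # 8.0 -- 256 possibilities fit in one byte
print(math.log2(math.factorial(10)))  # ~21.79 bits for a permutation of 10 elements
```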

One of the most important features of the log function is the following rule: the 'log of a product' is the 'sum of the logs'. In other words, $$\log(a \times b) = \log(a) + \log(b)$$

Let's have a look at the expression \(\log_2(n!)\), and substitute in the definition of \(n!\): $$\log_2(n!) = \log_2(n \times (n-1) \times (n-2) \times ... \times 2 \times 1)$$ Using the rule above to break up the log: $$\log_2(n!) = \\ \log_2(n \times (n-1) \times (n-2) \times ... \times 2 \times 1) = \\ \log_2(n) + \log_2(n-1) + \log_2(n-2) + ... + \log_2(2) + \log_2(1) $$ The last line can be interpreted in an interesting way - it gives the number of bits required when storing each element choice for the permutation separately. For example, \(\log_2(n)\) is the information needed to store the first choice, which was from \(n\) options.

\(\log_2(n-1)\) is the information needed to store the second choice, which was from \(n-1\) options. \( \log_2(1) \) is the information needed to store the last choice, which only has 1 option, i.e. there was no choice for it. Happily, \(\log_2(1) = 0\). No information is needed for this null choice!
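To make this concrete, here is a small Python sketch that records each choice as an index into the options still remaining (this kind of encoding is sometimes called a Lehmer code; the snippet is just an illustration, not an optimised encoding):

```python
import math

def encode_choices(perm):
    """Record each element of the permutation as the index
    of the choice among the options still remaining."""
    remaining = sorted(perm)
    choices = []
    for x in perm:
        i = remaining.index(x)  # which of the remaining options was picked
        choices.append(i)
        remaining.pop(i)
    return choices

print(encode_choices((2, 1, 3)))  # [1, 0, 0] -- the last entry is always 0: the null choice

# Bits needed if each choice is stored separately:
print(sum(math.log2(k) for k in range(1, 4)))  # log2(1) + log2(2) + log2(3) = log2(3!) ~ 2.58
```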

This is one of the main reasons (perhaps the only reason?) why the log function pops up in information theory. When you have two independent choices to make, the number of possibilities is the product of the two numbers of possibilities for each choice, but the information required for both is the sum of the two individual pieces of information. The log function is essentially the only (continuous, increasing) function that turns products into sums in this way.

Sorting algorithms

The task of a sorting algorithm is to take a permutation of some original sorted list, and return the original sorted list.

Consider a comparison-based sorting algorithm, which at each step can compare two elements with the less-than (<) operator. Each comparison can eliminate at most half of the possible permutations. In other words, it can gather at most one bit of information per comparison.

So to gather all the bits of information needed to determine which permutation it is looking at, it needs to make at least \(\log_2(n!)\) comparisons.

It could do the job less efficiently, but \(\log_2(n!)\) is a lower bound on the number of comparisons needed to handle the worst case.

(This is a very hand-wavy 'proof', but I hope you get the general idea)
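If you want to see the bound in action, here's a small Python sketch that counts the comparisons made by the built-in `sorted` (a comparison sort) and puts the count next to \(\lceil \log_2(n!) \rceil\):

```python
import functools
import math
import random

comparisons = 0

def counting_cmp(a, b):
    """A comparison function that counts how many times it is called."""
    global comparisons
    comparisons += 1
    return (a > b) - (a < b)  # -1, 0 or 1, like a classic cmp function

n = 10
data = list(range(n))
random.shuffle(data)

sorted(data, key=functools.cmp_to_key(counting_cmp))

print(comparisons)                              # typically low-to-mid 20s for n = 10
print(math.ceil(math.log2(math.factorial(n))))  # 22, the lower bound
```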

Stirling's approximation

Stirling's approximation tells us that $$\ln(n!) \approx n \ln(n) - n$$ Here \(\ln(x)\) is the natural logarithm function, i.e. \(\log\) with base \(e\). We can convert natural logarithms to log base 2 by dividing by \(\ln(2)\), which is about 0.6931, i.e. $$\log_2(x) = \ln(x) / \ln(2)$$ or $$\log_2(x) \times \ln(2) = \ln(x)$$ Using this rule in the previous equation gives: $$\ln(n!) \approx n \ln(n) - n \\ \log_2(n!) \times \ln(2) \approx n \log_2(n) \times \ln(2) - n \\ \log_2(n!) \approx n \log_2(n) - n / \ln(2) $$ When you are doing asymptotic complexity analysis, constant factors and lower-order terms like the \(n / \ln(2)\) don't matter.

So the optimal asymptotic complexity of a comparison-based sorting algorithm is \(n \log_2(n)\), which is the same asymptotic complexity as \(n \ln(n)\) (the two differ only by a constant factor).

Let's look at some actual data points. For \(n = 10\):

n: 10
n!: 3628800
log_2(n!): 21.791061114716953
n*log_2(n): 33.219280948873624
Stirling's approximation (n*log_2(n) - n/ln(2)): 18.79233053998399

For \(n = 32\):

n: 32
n!: 2.631308369336935e35
log_2(n!): 117.66326365236031
n*log_2(n): 160
Stirling's approximation (n*log_2(n) - n/ln(2)): 113.83375869155317

For \(n = 150\):

n: 150
n!: 5.7133839564458505e262
log_2(n!): 872.8595063470792
n*log_2(n): 1084.322803574382
Stirling's approximation (n*log_2(n) - n/ln(2)): 867.9185474410376
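(For reference, numbers like the ones above can be produced with a short Python script along these lines:)

```python
import math

for n in (10, 32, 150):
    log2_fact = math.log2(math.factorial(n))  # exact n!, then log base 2
    n_log2_n = n * math.log2(n)
    stirling = n_log2_n - n / math.log(2)     # n*log_2(n) - n/ln(2)
    print(n, log2_fact, n_log2_n, stirling)
```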
One interesting thing to note is that \(n \log_2(n)\) is quite a poor estimate of the number of comparisons needed (it is quite a lot too large), at least for these relatively small \(n\) values. The (ceiling'd) \(\log_2(n!)\) values are the true lower bound on the number of comparisons needed in the worst case.

So there you go, some curious things about permutations and sorting. I hope some people found this interesting. If there are any bits of maths that you don't understand, let me know and I will go into more depth. Corrections welcome of course.