Why do we use Huffman coding?

The quantization produced many zero values, which was precisely our intention: it creates redundancies that can be summarized. From Chapter 2.4 we know that values can be grouped if they occur one after the other. For this to be the case in our 8 * 8 matrices, we have to read them in a zigzag order, because then as many as possible of the zeros created during quantization end up in a row.

The values of an 8 * 8 matrix are read off using the zigzag method shown, in order to group the frequently occurring zeros at the end of the matrix.

Now we have written the values down one after the other, i.e. converted the 8 * 8 matrix into a 1 x 64 vector. The basis for creating a so-called Huffman code is a statistical evaluation of the values the matrix contains: it is much more practical to give frequently occurring values a short code and rarely occurring values a long code than to encode every value with the same code length regardless of how often it occurs.

8.2 Huffman coding: stochastic evaluation

Before we apply the Huffman code to our vector, we will demonstrate the system of statistical evaluation by encoding an ordinary word instead of 64 different numerical values, a word made up of some of the 26 letters of the alphabet, which should be a bit clearer. As an example we welcome the popular greeting of our former math teacher: "MOORJEEEN". We want to find a unique code for this word that can be decoded unambiguously. This is an important prerequisite, because we want to dispense with separators between the codes (so as not to occupy memory unnecessarily) and still recognize automatically where a letter begins and ends.

First we determine the relative frequencies of the individual letters: E occurs three times (3/9), O twice (2/9), and M, R, J and N once each (1/9).

Then we sort the letter/probability pairs by probability (from largest to smallest):

Now we combine the two letters with the lowest probabilities and give one the code 0 and the other the code 1:

In the next step, the two elements with the lowest probabilities are again combined and again assigned the digits 0 and 1. The letters J and N each receive a new code by simply prefixing the old one with 1.

We continue in this way until all letters have gone through this process:

The procedure is now complete and yields a Huffman code for each letter, represented as a binary string. Our starting word MOORJEEEN now has the following code: 1000101101110000000111
In practice a Huffman tree is often used, because the process of combining the rarest values and assigning them 0 or 1 can also be represented in the form of a probability tree (known from school stochastics); here, however, the paths specify a code for each element of the result set:

The code for a particular symbol is read off by following the individual paths of the Huffman tree; for "J" the code is 110. The "empty" nodes are not a glitch or an unpleasant side effect, but necessary: they ensure that the individual codes are unique and that no code is a prefix of another (this will be explained in detail later).

The code 1000101101110000000111 can be decoded unambiguously back into the original word MOORJEEEN without the need for separators, and that is exactly the strength of Huffman coding: a long piece of information can be stored in as little space as possible (no space is wasted on separators). We also see that the letters that appear often in the example word have been given short codes, which further minimizes the storage space. If the word MOORJEEEN had been converted into binary code in the usual way (each letter is assigned its position number in the alphabet in binary form, which takes 34 digits, and 8 separators are added because the codes are not prefix-free), 42 digits would have been required instead of the 22 digits of the Huffman code (1000101101110000000111).
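The merging procedure described above can be sketched in a few lines of Python (our own illustration, not part of the original text). Depending on how ties between equal probabilities are broken, the individual code words may differ from those shown above, but any valid Huffman code for this word has a total length of 22 bits:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code by repeatedly merging the two rarest subtrees."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, {symbol: code built so far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # rarest subtree -> prefix 0
        f2, _, c2 = heapq.heappop(heap)   # second rarest  -> prefix 1
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("MOORJEEEN")
encoded = "".join(codes[ch] for ch in "MOORJEEEN")
print(len(encoded))  # 22 -- the same length as the code in the text
```

Note how the rare letters end up with 3-bit codes and the frequent letters E and O with 2-bit codes, regardless of tie-breaking.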

In addition to this coding method there is also arithmetic coding; it differs little from Huffman coding, except that it works with the probabilities as decimals rather than as fractions. However, it is patent-protected, which scares many users off and drives them back to Huffman coding.

8.3 Huffman coding: background

Let us briefly explain why the Huffman code is so ideal, why it is short, and why it needs no separators. To do this we have to go back a little further: Huffman coding belongs to the large field of information theory. At this point we introduce the concept of entropy:

If one considers a random experiment with n possible outcomes, described by a finite probability space with the probability distribution p = (p1, ..., pn), there is uncertainty about the outcome of this experiment. We measure this uncertainty with the number H(p), the entropy of the experiment. The degree of uncertainty can also be made vividly understandable: imagine an experiment with n different possible outcomes. Person A already knows how the experiment turned out; person B is to find out with the help of yes/no questions. The mean number of questions B needs to find the answer is the true entropy.

We arrive at the definition: for a random experiment, the true entropy is defined as the expected number of questions required under an optimal questioning strategy.

We mention again at this point the formula of the so-called ideal entropy, with which we calculate (we already encountered it in Chapter 2.3): H(p) = -(p1 · log2(p1) + ... + pn · log2(pn)).

For those interested, there is a detailed argument for why every Huffman code is optimal. If you do not want to go into it that meticulously, the two examples below clarify the idea:

  1. An experiment whose outcome can be stated before it is carried out has true entropy 0. In general, H(p) ≥ 0 applies.
  2. When tossing a fair coin, i.e. with p = (1/2, 1/2), one asks about the outcome of the experiment: "Is it heads?" That is obviously optimal, so H(p) = 1, because a single question suffices to find out which event occurred.
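To connect this with the example from section 8.2, the ideal entropy of the letter distribution in MOORJEEEN can be computed directly (a numerical sketch of ours, not part of the original text); it bounds the average code length from below, and the 22-bit Huffman code comes close to it:

```python
from math import log2
from collections import Counter

word = "MOORJEEEN"
n = len(word)
probs = [count / n for count in Counter(word).values()]

# Ideal entropy: H(p) = -sum(p_i * log2(p_i))
entropy = -sum(p * log2(p) for p in probs)

# Average code length per letter of the 22-bit Huffman code from section 8.2
avg_len = 22 / n

print(f"H(p) = {entropy:.3f} bits, average Huffman code length = {avg_len:.3f} bits")
```

The entropy here is about 2.419 bits per letter, while the Huffman code uses about 2.444 bits per letter: close to, and never below, the theoretical lower bound.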

We now replace the words "yes" and "no" with the characters 0 and 1. The questioning strategy can then be expressed as a code. A word is a finite sequence of zeros and ones. If we denote a word by w, then |w| denotes the length of the word; for example, the word 0110 has length |0110| = 4. We call the empty sequence of characters the empty word; it has length 0.

For a code to be optimal, it must satisfy two properties: it must have the prefix property and it must be an injective mapping. The prefix property means that no code word may be a prefix of another; in practical terms, no code word may appear as the initial characters of another code word. Injectivity means that each test outcome is assigned its own unique code word. Both properties are guaranteed by the Huffman method, which was designed specifically so that no generated code word can be a prefix of another.
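Both properties are easy to check mechanically. The following sketch (with hypothetical example codes of ours, not taken from the text) tests them:

```python
def is_prefix_free(code_words):
    """True if no code word is the prefix of another.

    After lexicographic sorting, any prefix relation must occur between
    neighbouring words, so checking neighbours suffices.
    """
    words = sorted(code_words)
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

def is_injective(codes):
    """True if no two symbols share the same code word."""
    return len(set(codes.values())) == len(codes)

# A prefix-free code (hypothetical, with the same code lengths as in the text):
good = {"E": "00", "O": "01", "M": "100", "R": "101", "J": "110", "N": "111"}
# Not prefix-free: "0" is a prefix of "01".
bad = ["0", "01", "11"]

print(is_prefix_free(good.values()), is_injective(good), is_prefix_free(bad))
# True True False
```

A code like `bad` cannot be decoded without separators: after reading a 0, the decoder cannot tell whether the symbol is complete or continues.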

As an example, we consider five codes, of which only three are usable. Find out which ones cannot be used for one of the reasons mentioned above!


8.4 Coding patterns and Huffman coding

Back to our quantized matrix, or rather our vector: the value in the upper left corner of the quantized matrix (the first value of our vector) is the DC value; it is encoded separately. Because it is by far the largest value, it is not the value itself that is coded, but the difference from the DC value of the preceding block:

The code consists of two symbols: the first indicates how many bits are needed to store the encoded value (a table is used to determine this number of bits); the second simply holds the value.

In our example the DC value of the previous block is 29. The difference from our block is 2. This value 2 is now brought into "symbol form": 2 bits are required to store the 2, and the symbol itself is 2. This yields the intermediate representation (2)(2).
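This DC step can be sketched as follows (a hypothetical illustration of ours; the function names are not from the text, and the bit count follows the usual JPEG size category, i.e. the number of bits in the binary representation of the magnitude):

```python
def bit_size(value):
    """Number of bits needed to store |value| (the JPEG size category)."""
    return abs(value).bit_length()

def dc_intermediate(prev_dc, curr_dc):
    """Intermediate representation (size)(value) for a DC coefficient."""
    diff = curr_dc - prev_dc
    return (bit_size(diff), diff)

print(dc_intermediate(29, 31))  # (2, 2), as in the example above
```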

For the coding of the AC values a slightly different pattern applies: here, too, two symbols are used, but the first has two components. The first component is the so-called run length of the symbol, i.e. the number of consecutive zeros preceding it since the last non-zero symbol. The second component is again the number of bits needed to store the number. The number itself is recorded in the second symbol.

For our first non-zero number in the zigzag sequence, the DCT coefficient AC3, this means:

  • Symbol 1: run length 1 and number of bits 3
  • Symbol 2: the value itself, -7
  • Finished intermediate representation: <(1; 3), -7>

The quantized DCT coefficient AC70, for example, has the value -3. This value can be coded with 2 bits, and there are 11 zeros before it in the zigzag sequence. Its tuple description is therefore <(11; 2), -3>.

The zeros that follow one another in the final long run are represented with an "end-of-block" symbol (0; 0), meaning: only zeros follow until the end of the block.
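The AC pattern, including the end-of-block symbol, can be sketched the same way (again a hypothetical illustration of ours; a real JPEG coder additionally emits a special (15; 0) symbol for runs longer than 15 zeros, which we omit here):

```python
def bit_size(value):
    """Number of bits needed to store |value| (the JPEG size category)."""
    return abs(value).bit_length()

def ac_intermediate(ac_coeffs):
    """Turn AC coefficients (in zigzag order) into <(run; size), value> symbols."""
    symbols = []
    run = 0  # zeros seen since the last non-zero coefficient
    for v in ac_coeffs:
        if v == 0:
            run += 1
        else:
            symbols.append(((run, bit_size(v)), v))
            run = 0
    if run:  # trailing zeros are collapsed into the end-of-block symbol
        symbols.append(((0, 0), 0))
    return symbols

print(ac_intermediate([0, -7, 0, 0, -3, 0, 0, 0]))
# [((1, 3), -7), ((2, 2), -3), ((0, 0), 0)]
```

The trailing run of zeros, however long, costs only the single (0; 0) symbol, which is where the zigzag ordering pays off.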

All values of the block must be transformed into this kind of intermediate representation so that they can then be encoded. This is done with the Huffman coding described above.

Symbol 1 and symbol 2 are coded separately on the basis of a statistical evaluation. You can carry out this evaluation yourself, as described in the example above, to obtain a unique code for each symbol. With 8 * 8 blocks, however, this is a lot of work that can be saved: there are tables based on empirical values that allow a faster process. (These tables are not always optimal, because they apply generally to all matrices; but since they are prefabricated and cover the values typically to be encoded, a code that is not 100% optimal is an acceptable price for the effort saved.) We have selected a few such tables that make the coding easier for us. If you like, you can perform a statistical evaluation of our vector yourself and create your own code, but it will be a lot of work. Sample coding for our vector ...