# Why do we use Huffman coding?

Quantization produced many zero values, which was exactly our intention: it creates redundancies that can be summarized. From Chapter 2.4 we know that values can be grouped together if they occur one after another. To achieve this in our 8×8 matrices, they have to be read in a zigzag course, because this places as many as possible of the zeros created during quantization in a row. The values of an 8×8 matrix are therefore read off using the zigzag pattern shown, so that the frequently occurring zeros end up grouped at the end of the sequence.
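The zigzag readout described above can be sketched in a few lines. This is a minimal sketch, assuming the standard JPEG zigzag pattern: walk the anti-diagonals of the 8×8 block, alternating direction on each one, so the high-frequency coefficients (mostly zeros after quantization) land at the back of the sequence.

```python
def zigzag(block):
    """Read an 8x8 block (a list of 8 rows) in zigzag order into a flat list of 64 values."""
    n = 8
    result = []
    for s in range(2 * n - 1):                     # s = row + col: one anti-diagonal per value of s
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:                             # even diagonals run bottom-left to top-right
            diagonal.reverse()
        for i, j in diagonal:
            result.append(block[i][j])
    return result
```

Calling `zigzag` on a quantized block yields the 1×64 sequence discussed in the next paragraph; for a typical block, the tail of that sequence is a long run of zeros.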

We have now written the values down one after the other, i.e. converted the 8×8 matrix into a 1×64 vector. The basis for constructing a so-called Huffman code is a stochastic analysis of the values this vector contains: it is far more economical to give frequently occurring values a short code and rarely occurring values a long code than to encode every value with the same code length regardless of how often it occurs.

## 8.2 Huffman coding: stochastic evaluation

Before we apply the Huffman code to our vector, we will demonstrate the principle of stochastic evaluation by encoding an ordinary word instead of 64 numerical values: a word made up of some of the 26 letters of the alphabet, which should be a bit clearer. As an example we take the popular greeting of our former math teacher: "MOORJEEEN". We want to find a unique code for this word that can be decoded without any ambiguity. This is an important prerequisite, because we want to dispense with separators between the codes (so as not to occupy memory unnecessarily) and still recognize automatically where one letter ends and the next begins.

First we determine the relative frequency of each letter. Then we sort the letter-frequency pairs by probability, from largest to smallest. Now we combine the two letters with the lowest probabilities and give one of them the code 0, the other the code 1. In the next step, the two elements with the lowest probabilities are again combined and again assigned the digits 0 and 1; the letters J and N each receive a new code simply by prefixing the old one with 1. We repeat this until every letter has gone through the process. The procedure is then complete and yields a Huffman code for each letter, written as a binary string. Our starting word MOORJEEEN now has the following code: 1000101101110000000111
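The merging steps above can be sketched with a priority queue. This is a minimal sketch using Python's `heapq`; note that tie-breaking between equally probable nodes differs between implementations, so the individual 0/1 assignments may not match the worked example exactly, but the code lengths, and hence the 22-bit total for MOORJEEEN, come out the same.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for the symbols of `text` by repeatedly merging the two rarest subtrees."""
    freq = Counter(text)                                   # absolute frequencies of each symbol
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)                    # the two least probable subtrees...
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()} # ...get 0 resp. 1 prefixed to their codes
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes("MOORJEEEN")
encoded = "".join(codes[ch] for ch in "MOORJEEEN")
```

With these frequencies (E three times, O twice, M, R, J, N once each) the two most frequent letters always end up with 2-bit codes and the rest with 3-bit codes, giving 22 bits in total, just as in the worked example.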
In practice a Huffman tree is often used, since the process of combining the rarest values and assigning them 0 or 1 can also be represented as a probability tree (known from school stochastics); here, however, the paths specify a code for each element of the result set. The code for a particular symbol is read off by following the paths of the Huffman tree; for "J" the code is 110. The "empty" inner nodes are not a glitch or an unpleasant side effect but a necessity: they ensure that the individual codes are unique and that no code is a prefix of another (this will be explained in detail later).

The code 1000101101110000000111 identifies the original word MOORJEEEN unambiguously, without any need for separators. This is exactly the strength of Huffman coding: a long piece of information is stored in as little space as possible (none of it wasted on separators). We also see that the letters that appear most often in the example word received the shortest codes, which minimizes the storage space further. If the word MOORJEEEN had been converted into binary code in the usual way (each letter encoded as its position in the alphabet in binary, plus 8 separators, because such codes are not prefix-free), 42 digits would have been required instead of the 22 digits of the Huffman code (1000101101110000000111).
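Separator-free decoding can be demonstrated directly. The codebook below is one assignment consistent with the worked example (it contains J = 110 and reproduces the 22-bit string given in the text); the decoder simply walks the bitstream left to right and emits a letter as soon as the bits read so far form a complete code, which is unambiguous precisely because no code is a prefix of another.

```python
# One codebook consistent with the worked example above.
CODES = {"M": "100", "O": "01", "R": "101", "J": "110", "E": "00", "N": "111"}

def decode(bits, codes):
    """Decode a separator-free bitstring using a prefix-free codebook."""
    lookup = {code: sym for sym, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:          # a complete code has been read
            out.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bitstream ended in the middle of a code")
    return "".join(out)
```

For example, `decode("1000101101110000000111", CODES)` recovers MOORJEEEN, and concatenating the codes of MOORJEEEN reproduces the 22-bit string from the text.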

In addition to this method of coding there is also arithmetic coding. It differs only slightly from Huffman coding in that it works with the probabilities as decimal numbers rather than as fractions. However, it is patent-protected, which deters many users and drives them back to Huffman coding.