# Luigi Paura » 3.Source Coding

### Source Coding

Summary

• I Shannon Theorem
• Huffman Coding
• Lempel – Ziv Coding

### Coding Source Theorem I Shannon Theorem

The I Shannon Theorem states how much is the compression limit that a code can reach without introducing distorsion (Distorsion-less Coding).
The minimum average number of binary simbols to represent an arbitrary source symbol (average length of the code words) cannot be smaller than the source entropy.

### Coding Source Theorem I Shannon Theorem (next)

Let S be a DMS with source alphabet A={a1,a2,…,aN}

According to Large Number Law for a symbol sequence of length n enough large (n→∞) one has:

Pr (ai)

### Coding Source Theorem I Shannon Theorem (next)

Definition

• A typical sequence is a sequence satisfying the Large Number Law, namely n i = pin For n→∞

### Coding Source Theorem I Shannon Theorem (next)

The typical sequences have the same probabilities equal to 2-nH(X)

### Coding Source Theorem I Shannon Theorem (next)

We will need only  Nτ=2nH(X) codewords, namely, if we utilize binary codes  nH(X) binary symbolsfor any
sequence of n source symbols.

Resource saving!!!

The saving is larger and larger whenever H(X) is smaller and smaller than log N !!!!

### Coding Source Theorem I Shannon Theorem (next)

What happens if the source outpot is a non-typical sequence ?
No codeword has been assigned!!!

An error occurs, namely a distorsion!!!!

For n→∞ such an event will occur with null probability
Such a coding is asymptotically ( n→∞) distortion-less.

### Coding Source Theorem I Shannon Theorem (next)

Statement

A memory-less source with rate H(X) can be coded with arbitrary small distorsion if n→∞ with Coding Rate (bits /source symbol) such that:

R≥H(X)

On the contrary if R<H(X) asymptotically null distorsion.

### Huffman Code

The I Shannon Theorem is an existence theorem: it just assures us that if we need a code with coding rate R≥H(X), it does exist.

• How to single out a code ?

Huffman Algorithm

### Huffman Code (next)

Assumption: we know the probabilities pi of DMS symbols ai (Entropic Coding).
The Huffman code is a code block/variable length.
The codewords have different lengths :

• The more probable source symbols are mapped to the shorter codewords.
• The less probable source symbols are mapped to thelonger codewords.

### Huffman Code (next)

Procedure

1. Sort source outputs in decreasing order of their probabalities.
2. Merge the two least probable source outputs into a single output whose probability is the sum of the corresponding probabilities.
3. If the number of the remaining outputs is 2, then go to the the next step, otherwise go to step 1. Arbitrarily assign “0″ and “1″as codewords for the two remaining outputs.
4. If an output is the result of the merger of two outputs” a preceding step, append the current codeword with a 0 and a 1 to obtain the codeword for the preceding outputs and then repeat.
5. If no output is preceded by another output in a preceding step, then stop.

### Huffman code

If n=2 we have 32=9 source symbols with n probabilities pi=1/9

If we built a Huffman code one has that $R2=3.2\bar{2}2$

### Lempel-Ziv Code

The Huffman coding requiresthe knowledge of the symbol probabilities.
The Lempel-Ziv coding does not requiresuch such knowledge.
The procedure is autolearning: it is a universal coding.
The Lempel-Ziv coding is a variable length /fixed length code.

### Lempel-Ziv Code (next)

It can be shown that for a stationary and ergodig source.

The number of bits for n→∞ is equal to n H(X).

nH(X) < n because H(X) < 1

Problems

• How many phrases can be inserted in the dictionary? How is it possible to avoid the overflow?
• Removing from the dictionary the phrases which did not occur in the past.

### Lempel-Ziv Code (next)

It is widely utilized to compress files for computers, routine under UNIX such as ZIP, ZOO, LZH,ARJ etc.
Compression ratio:

• it depends on the characteristics of the source.
Progetto "Campus Virtuale" dell'Università degli Studi di Napoli Federico II, realizzato con il cofinanziamento dell'Unione europea. Asse V - Società dell'informazione - Obiettivo Operativo 5.1 e-Government ed e-Inclusion