Home

Federica EU

1/25

Luigi Paura » 3.Source Coding

Source Coding

Summary

I Shannon Theorem
Huffman Coding
Lempel – Ziv Coding

Coding Source Theorem I Shannon Theorem

The I Shannon Theorem states how much is the compression limit that a code can reach without introducing distorsion (Distorsion-less Coding).
The minimum average number of binary simbols to represent an arbitrary source symbol (average length of the code words) cannot be smaller than the source entropy.

Coding Source Theorem I Shannon Theorem (next)

Let S be a DMS with source alphabet A={a₁,a₂,…,a_N}

According to Large Number Law for a symbol sequence of length n enough large (n→∞) one has:

Pr (ai)

Coding Source Theorem I Shannon Theorem (next)

Definition

A typical sequence is a sequence satisfying the Large Number Law, namely n _i = p_in For n→∞

Coding Source Theorem I Shannon Theorem (next)

The typical sequences have the same probabilities equal to 2^-nH(X)

Coding Source Theorem I Shannon Theorem (next)

We will need only N_τ=2^nH(X) codewords, namely, if we utilize binary codes nH(X) binary symbolsfor any
sequence of n source symbols.

Resource saving!!!

The saving is larger and larger whenever H(X) is smaller and smaller than log N !!!!

Coding Source Theorem I Shannon Theorem (next)

What happens if the source outpot is a non-typical sequence ?
No codeword has been assigned!!!

An error occurs, namely a distorsion!!!!

For n→∞ such an event will occur with null probability
Such a coding is asymptotically ( n→∞) distortion-less.

Coding Source Theorem I Shannon Theorem (next)

Statement

A memory-less source with rate H(X) can be coded with arbitrary small distorsion if n→∞ with Coding Rate (bits /source symbol) such that:

R≥H(X)

On the contrary if R<H(X) asymptotically null distorsion.

Huffman Code

The I Shannon Theorem is an existence theorem: it just assures us that if we need a code with coding rate R≥H(X), it does exist.

How to single out a code ?

Huffman Algorithm

Huffman Code (next)

Assumption: we know the probabilities p_i of DMS symbols a_i (Entropic Coding).
The Huffman code is a code block/variable length.
The codewords have different lengths :

The more probable source symbols are mapped to the shorter codewords.
The less probable source symbols are mapped to thelonger codewords.

Huffman Code (next)

Procedure

Sort source outputs in decreasing order of their probabalities.
Merge the two least probable source outputs into a single output whose probability is the sum of the corresponding probabilities.
If the number of the remaining outputs is 2, then go to the the next step, otherwise go to step 1. Arbitrarily assign “0″ and “1″as codewords for the two remaining outputs.
If an output is the result of the merger of two outputs” a preceding step, append the current codeword with a 0 and a 1 to obtain the codeword for the preceding outputs and then repeat.
If no output is preceded by another output in a preceding step, then stop.

Huffman Code (next)

Huffman Code : an example

Huffman Code : an example (next)

Huffman code (next)

Huffman code

If n=2 we have 32=9 source symbols with n probabilities p_i=1/9

If we built a Huffman code one has that $R2=3.2\bar{2}2$

Lempel-Ziv Code

The Huffman coding requiresthe knowledge of the symbol probabilities.
The Lempel-Ziv coding does not requiresuch such knowledge.
The procedure is autolearning: it is a universal coding.
The Lempel-Ziv coding is a variable length /fixed length code.

Lempel-Ziv Code (next)

It can be shown that for a stationary and ergodig source.

The number of bits for n→∞ is equal to n H(X).

nH(X) < n because H(X) < 1

Problems

How many phrases can be inserted in the dictionary? How is it possible to avoid the overflow?
Removing from the dictionary the phrases which did not occur in the past.