Strings                    Amount of Information    Appropriate Concept
terabit of 0's             low                      Kolmogorov Complexity
terabit of random bits     high                     Entropy
terabit of bits of pi      low                      Kolmogorov Complexity
terabit of www pages       medium                   Kolmogorov Complexity

ENTROPY

Question: What is the entropy/information of the result of 1 fair coin flip?
Answer: 1 bit

Question: What is the entropy/information of the result of 2 fair coin flips?
Answer: 2 bits

Lemma: If X and Y are independent random events, then entropy H should satisfy H(X, Y) = H(X) + H(Y).

Consider distributions over the alphabet A, C, G, T.

Question: What is a minimal entropy distribution over A, C, G, T, and what is the entropy of this distribution?
Answer: Prob[A] = 1, Prob[C] = Prob[G] = Prob[T] = 0. Entropy = 0 because sending A reveals no new information.

Question: What is a maximal entropy distribution over A, C, G, T, and what is the entropy of this distribution?
Answer: Prob[A] = Prob[C] = Prob[G] = Prob[T] = 1/4. Entropy = 2 because this is equivalent to 2 independent random bits.

Consider the distribution prob(A) = 1/2, prob(C) = 1/4, prob(G) = prob(T) = 1/8. Its entropy is (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 7/4 bits.

Definition: The entropy of a probability distribution X over a set U is
    H(X) = Sum_{x in U} prob(x) lg(1/prob(x)) = -E[lg prob(x)]
Here h(x) = lg(1/prob(x)) is the entropy in bits of the outcome x. Note that prob(x) = 1/2^{h(x)}, so H(X) = E[h(x)] is how many bits you expect to see.

Source Coding Theorem (Shannon 1948): Assume the input is a string S where each element of S is drawn independently according to a distribution X. There is a scheme that can transmit S using only about |S| H(X) bits, and every possible scheme uses at least about |S| H(X) bits.

Proof: Divide S into blocks of size n for some reasonably large n. The probability that a block will be the string value B is Prod_{x in B} prob(x). The expected probability of the block you actually see is Sum_B (Prod_{x in B} prob(x))^2.

Key Insight: It is very likely that you will see a block B whose probability is about 2^{-n H(X)}. So the distribution you see on a particular block is very close to a uniform distribution over about 2^{n H(X)} typical blocks.

Achievability: Mostly you see blocks of size n of roughly equal probability, so use n H(X) bits for each such equally probable block. Use any reasonable encoding on the unlikely blocks.

Optimality: Forget about encoding the unlikely strings. You can't do better than using an equal number of bits for equally probable outcomes.

Venn diagram for H(X, Y), H(X|Y), H(Y|X), I(X;Y).

Defn: H(X, Y) = Sum_{x,y} p(x,y) lg(1/p(x,y))

Defn: H(X|Y) = H(X,Y) - H(Y) = Sum_y p(y) Sum_x p(x|y) lg(1/p(x|y)) = Sum_{x,y} p(x,y) lg(p(y)/p(x,y))

Defn: I(X;Y) = H(X,Y) - H(X|Y) - H(Y|X) = H(X) - H(X|Y) = H(Y) - H(Y|X) = Sum_{x,y} p(x,y) lg(p(x,y)/(p(x) p(y)))

Example: Consider a fair 6-sided die roll.
X = 1 if the outcome is prime and 0 otherwise
Y = 1 if the outcome is odd and 0 otherwise
H(X) = H(Y) = 1, since each of X and Y equals 1 with probability 1/2.
H(X|Y) = P(Y=0) H(X|Y=0) + P(Y=1) H(X|Y=1). Given Y=1 the outcome is in {1,3,5} and X=1 on {3,5}; given Y=0 the outcome is in {2,4,6} and X=1 only on {2}. Either way X is a 1/3-vs-2/3 coin, so H(X|Y) = lg 3 - 2/3, which is about 0.918.
I(X;Y) = H(X) - H(X|Y), which is about 0.082; these numbers are checked numerically in the sketch below.

Shannon's Noisy Channel Theorem: Let X be the random sent signal and Y the random received signal. You can get about max_{probability distributions over X} I(X;Y) bits of information through to the receiver for each symbol sent.
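The quantities above are easy to check numerically. Here is a minimal Python sketch (not part of the original notes; the helper names entropy and dist are my own) that recomputes the 7/4-bit entropy of the A, C, G, T distribution and the die example's H(X), H(X|Y), and I(X;Y):

    from math import log2
    from collections import Counter

    def entropy(probs):
        # H = Sum_x p(x) * lg(1/p(x)); outcomes with p(x) = 0 contribute 0.
        return sum(p * log2(1.0 / p) for p in probs if p > 0)

    def dist(values):
        # Empirical distribution of a list of equally likely observations.
        counts = Counter(values)
        return [c / len(values) for c in counts.values()]

    print(entropy([1/2, 1/4, 1/8, 1/8]))  # 1.75 = 7/4 bits
    print(entropy([1.0, 0, 0, 0]))        # 0.0, the minimal entropy distribution
    print(entropy([1/4] * 4))             # 2.0, the maximal entropy distribution

    # Die example: X = [outcome is prime], Y = [outcome is odd].
    outcomes = [1, 2, 3, 4, 5, 6]
    X = [1 if o in (2, 3, 5) else 0 for o in outcomes]
    Y = [o % 2 for o in outcomes]

    H_X, H_Y = entropy(dist(X)), entropy(dist(Y))
    H_XY = entropy(dist(list(zip(X, Y))))
    print(H_X, H_Y)          # 1.0 and 1.0
    print(H_XY - H_Y)        # H(X|Y), about 0.9183
    print(H_X + H_Y - H_XY)  # I(X;Y), about 0.0817

The last two lines use the identities H(X|Y) = H(X,Y) - H(Y) and I(X;Y) = H(X) + H(Y) - H(X,Y) from the definitions above.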
KOLMOGOROV COMPLEXITY

Intuition: Measures the information of a fixed string, rather than of a distribution over strings/objects as entropy does.

Definition: The Kolmogorov complexity K(x) of a string x is defined to be
    K(x) = min_{strings y and decoders D such that D(y) = x} (length of y + length of D)
         = min_{programs P that write x on an empty input} (length of P)

Theorem: The choice of programming language affects K(x) only by an additive constant.

Theorem: There are incompressible strings, strings where K(x) >= length of x, of every length n.

Proof: There are 2^n strings of length n but only 2^0 + 2^1 + ... + 2^{n-1} = 2^n - 1 programs of length less than n, so by the pigeonhole principle some string of length n has no shorter program.

Theorem: There is no algorithm M that computes K(x).

Proof: To reach a contradiction, assume M exists, and consider the program P_n:

    x = n
    for i = 1 to 2^x do
        if M(i-th string of length x) >= x then
            output this string and halt

Note that by the previous theorem P_n always halts, and it outputs a string with Kolmogorov complexity at least n. But P_n has size only O(log n), since only the constant n varies with n. So for some sufficiently large n we get a contradiction: P_n is a program of length O(log n) that outputs a string of Kolmogorov complexity at least n.
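To make the shape of the contradiction concrete, here is a Python sketch of P_n (my own rendering, not from the notes). M is passed in as a parameter precisely because, by the theorem, no genuine implementation of it can exist:

    from itertools import product

    def P(n, M):
        # Search for a string of length n that M reports is incompressible.
        # M is a hypothetical function computing K(s); the theorem above
        # shows no such M exists, so this is only the proof's construction.
        for bits in product("01", repeat=n):  # the 2^n strings of length n
            s = "".join(bits)
            if M(s) >= n:   # found an incompressible string
                return s    # guaranteed to occur by the pigeonhole theorem

The source of P is a fixed template plus the literal n, so its length is O(log n) bits; returning a string s with K(s) >= n would then exhibit a description of s far shorter than K(s), the contradiction the proof needs.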