Comparative Analysis of Gene Regulatory Regions Using Mutual Information


Mutual information is a quantity from information theory, the mathematical theory of data communication and storage. In its simplest sense, the term information refers to transmitted messages. For example, voice carried by a telephone, images transmitted by a television system, and digital data in computer systems and networks can all be regarded as communicated or stored information. A communication system consists of five components: an information source, a transmitter, the medium (channel), a receiver, and a destination.

Information theory answers two fundamental questions in communication theory:
  1. What is the ultimate limit of data compression? The answer is the entropy (H) of the source.
  2. What is the ultimate rate of reliable communication? The answer is the channel capacity (C).
In the early 1940s it was thought that increasing the rate at which information is transmitted over a communication channel necessarily increases the probability of error. Shannon (1948) surprised the communication theory community by proving that this is not true as long as the communication rate is below the channel capacity, which can be computed from the noise characteristics of the channel. Entropy, the concept at the heart of information theory, quantifies the uncertainty of a random variable. If the entropy of the source is less than the capacity of the channel, then asymptotically error-free communication can be achieved. The entropy of a discrete random variable X with probability mass function p(x) is defined by

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$
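As a minimal illustration of this definition, the Python sketch below computes H(X) in bits from a list of probabilities. The function name `entropy` is hypothetical, and the sketch assumes the usual convention 0 log2 0 = 0, so zero-probability outcomes are skipped.

```python
import math

def entropy(pmf):
    """Shannon entropy H(X) in bits of a pmf given as an iterable of probabilities.

    Outcomes with p(x) = 0 are skipped, following the convention 0 * log2(0) = 0.
    """
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# A fair coin carries 1 bit of uncertainty; a biased coin carries less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # about 0.469
print(entropy([0.25] * 4))   # 2.0: four equally likely outcomes
```

Entropy is largest when all outcomes are equally likely, which is why the biased coin falls below 1 bit.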

The joint entropy H(X,Y) of two random variables X and Y with joint probability mass function p(x,y) is defined by

$$H(X,Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)$$
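The same calculation applies to a joint distribution. This sketch assumes the joint pmf is stored as a dictionary mapping (x, y) pairs to p(x,y); the name `joint_entropy` and the two example distributions are hypothetical.

```python
import math

def joint_entropy(joint):
    """H(X,Y) in bits for a joint pmf given as a dict mapping (x, y) pairs to p(x, y)."""
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)

# Two independent fair bits: every (x, y) pair is equally likely.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Two perfectly correlated fair bits: Y always equals X.
correlated = {(0, 0): 0.5, (1, 1): 0.5}

print(joint_entropy(independent))  # 2.0
print(joint_entropy(correlated))   # 1.0
```

Knowing that the bits are correlated removes one bit of joint uncertainty.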

The conditional entropy H(X|Y) is the entropy of a random variable X given another random variable Y, that is, the uncertainty that remains in X on average once Y is known:

$$H(X|Y) = -\sum_{x,y} p(x,y) \log_2 p(x|y)$$

H(X|Y) = H(X,Y)-H(Y)

Here H(X|Y) and H(X,Y) are the conditional and joint entropies, respectively.
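A small numerical check of the two expressions for conditional entropy: the sketch below computes H(X|Y) directly from p(x|y) = p(x,y)/p(y) and again through H(X,Y) - H(Y). The helper names and the 2-by-2 joint pmf are hypothetical illustrations, not taken from the source.

```python
import math
from collections import defaultdict

def marginal(joint, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1) from a joint pmf dict {(x, y): p}."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def conditional_entropy_direct(joint):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p(x|y) = p(x,y) / p(y)."""
    p_y = marginal(joint, axis=1)
    return -sum(p * math.log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0)

# Hypothetical joint pmf over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

h_direct = conditional_entropy_direct(joint)
h_chain = entropy(joint) - entropy(marginal(joint, axis=1))  # H(X,Y) - H(Y)
print(h_direct, h_chain)  # agree up to floating-point rounding, about 0.875 bits
```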

H(X,Y) = H(Y,X)

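In general, H(X|Y) ≠ H(Y|X); equality holds if and only if H(X) = H(Y), because by the chain rule H(X|Y) - H(Y|X) = [H(X,Y) - H(Y)] - [H(X,Y) - H(X)] = H(X) - H(Y).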
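These properties can be checked on a hypothetical joint pmf in which Y is a deterministic function of X (Y = 1 exactly when X = 2). In this sketch H(X,Y) is unchanged when the pair is reordered, while H(X|Y) = 0.5 bits and H(Y|X) = 0 bits differ, and their difference equals H(X) - H(Y) = 1.5 - 1.0. The helper names are illustrative only.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1) from a joint pmf dict {(x, y): p}."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

def swap(joint):
    """Reorder a joint pmf {(x, y): p} as {(y, x): p}, i.e. the pmf of (Y, X)."""
    return {(y, x): p for (x, y), p in joint.items()}

# Hypothetical joint pmf: X takes three values and Y = 1 exactly when X = 2.
joint = {(0, 0): 0.25, (1, 0): 0.25, (2, 1): 0.5}

h_x = entropy(marginal(joint, 0))  # 1.5 bits
h_y = entropy(marginal(joint, 1))  # 1.0 bit
h_xy = entropy(joint)              # 1.5 bits
h_x_given_y = h_xy - h_y           # H(X|Y) = H(X,Y) - H(Y)
h_y_given_x = h_xy - h_x           # H(Y|X) = H(X,Y) - H(X)

print(h_xy == entropy(swap(joint)))          # True: H(X,Y) = H(Y,X)
print(h_x_given_y, h_y_given_x)              # 0.5 0.0: H(X|Y) != H(Y|X)
print(h_x_given_y - h_y_given_x, h_x - h_y)  # 0.5 0.5: the gap equals H(X) - H(Y)
```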

The relative entropy is a measure of the distance between two distributions. The relative entropy D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. The relative entropy or Kullback-Leibler distance between two probability mass functions p(x) and q(x) is defined as

$$D(p\|q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)}$$ (Cover and Thomas, 1991).
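A minimal Python sketch of the relative entropy, assuming both pmfs are lists over the same ordered outcomes and using base-2 logarithms to match the other formulas. The name `kl_divergence` is hypothetical. It follows the conventions 0 log(0/q) = 0 and p log(p/0) = infinity, and the example shows that swapping p and q changes the value, because relative entropy is not symmetric.

```python
import math

def kl_divergence(p, q):
    """D(p||q) in bits for two pmfs given as equal-length lists over the same outcomes.

    Terms with p(x) = 0 contribute 0; if q(x) = 0 where p(x) > 0 the result is infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))                       # 0.0: zero when the distributions match
print(kl_divergence(p, q), kl_divergence(q, p))  # about 0.737 and 0.531: not symmetric
```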

Relative entropy is always non-negative and is zero if and only if p = q. However, it is not a true distance between distributions, since it is not symmetric and does not satisfy the triangle inequality. The reduction in the uncertainty of X due to knowledge of the random variable Y is called the mutual information. For two random variables X and Y, this reduction is

$$I(X;Y) = H(X) - H(X|Y)$$

$$I(X;Y) = H(Y) - H(Y|X)$$

$$I(X;Y) = H(X) + H(Y) - H(X,Y)$$

$$I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

where p(x,y) is the joint probability mass function and p(x) and p(y) are the marginal probability mass functions.
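To see that the four expressions agree, the sketch below computes I(X;Y) from the summation formula and from the three entropy identities on a hypothetical 2-by-2 joint pmf stored as a dict from (x, y) to p(x,y). The helper names are illustrative, not from any library.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1) from a joint pmf dict {(x, y): p}."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ), in bits."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # hypothetical pmf
h_x = entropy(marginal(joint, 0))
h_y = entropy(marginal(joint, 1))
h_xy = entropy(joint)

i_sum = mutual_information(joint)
i_1 = h_x - (h_xy - h_y)   # H(X) - H(X|Y), using H(X|Y) = H(X,Y) - H(Y)
i_2 = h_y - (h_xy - h_x)   # H(Y) - H(Y|X), using H(Y|X) = H(X,Y) - H(X)
i_3 = h_x + h_y - h_xy     # H(X) + H(Y) - H(X,Y)
print(i_sum, i_1, i_2, i_3)  # all four agree up to rounding, about 0.125 bits
```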

The mutual information I(X;Y) is the relative entropy between the joint distribution p(x,y) and the product distribution p(x)p(y). I(X;Y) measures the dependence between the two random variables: it is symmetric in X and Y, always non-negative, and zero if and only if X and Y are independent.
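This sketch computes I(X;Y) literally as the relative entropy D(p(x,y) || p(x)p(y)) by building the product distribution from the marginals. It assumes a joint pmf stored as a dict from (x, y) to p(x,y); the helper names and both example pmfs are hypothetical. Reordering the pair leaves the value unchanged, and a joint pmf built as an exact product of marginals gives zero.

```python
import math
from collections import defaultdict

def marginal(joint, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1) from a joint pmf dict {(x, y): p}."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

def kl_divergence(p, q):
    """D(p||q) in bits for pmfs given as dicts; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

def product_of_marginals(joint):
    """The product distribution p(x)p(y) built from the marginals of a joint pmf."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

def swap(joint):
    """Reorder a joint pmf {(x, y): p} as {(y, x): p}, i.e. the pmf of (Y, X)."""
    return {(y, x): p for (x, y), p in joint.items()}

def mutual_information(joint):
    """I(X;Y) as the relative entropy between the joint and the product distribution."""
    return kl_divergence(joint, product_of_marginals(joint))

# Hypothetical pmfs: one with dependent variables, one built as an exact product.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
independent = {(x, y): px * py
               for x, px in [(0, 0.5), (1, 0.5)]
               for y, py in [(0, 0.6), (1, 0.4)]}

i_dep = mutual_information(dependent)
i_swapped = mutual_information(swap(dependent))
i_ind = mutual_information(independent)
print(i_dep, i_swapped)  # equal up to rounding: I(X;Y) = I(Y;X)
print(i_ind)             # zero up to rounding: the joint equals the product distribution
```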

I(X;X) = H(X) - H(X|X) = H(X), since H(X|X) = 0.
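A quick numerical check, assuming a hypothetical four-outcome distribution: the joint pmf of the pair (X, X) puts p(x) on the diagonal, so each term of the mutual-information sum reduces to -p(x) log2 p(x) and the total equals H(X). The helper names are illustrative.

```python
import math
from collections import defaultdict

def marginal(joint, axis):
    """Marginal pmf of the first (axis=0) or second (axis=1) variable of a joint pmf dict."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

def mutual_information(joint):
    """I in bits from a joint pmf dict {(x, y): p}, via the summation formula."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

p_x = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical distribution of X
self_joint = {(x, x): p for x, p in p_x.items()}     # joint pmf of the pair (X, X)
print(mutual_information(self_joint), entropy(p_x))  # 1.75 1.75
```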

Thus the mutual information of a random variable with itself is its entropy, which is why entropy is sometimes referred to as self-information.

Mutual information is also used to calculate channel capacity in communications engineering. Channel capacity is the maximum rate at which information can be reliably transmitted over a channel. By the noisy-channel coding theorem, the channel capacity of a given channel is the limiting information transport rate (in units of information per unit time) that can be achieved with vanishingly small error probability. Shannon introduced the notion of channel capacity and provided a mathematical model by which one can compute the maximal amount of information that can be carried by a channel. The capacity of a channel is the maximum of the mutual information between its input and output, where the maximization is over the input distribution.
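The maximization C = max over p(x) of I(X;Y) in general has no closed form for a discrete channel. One standard numerical method is the Blahut-Arimoto algorithm, which the text does not mention and which is swapped in here as an illustration. The Python sketch below assumes a discrete memoryless channel given as a matrix channel[x][y] = p(y|x); the function names and the binary symmetric channel example are hypothetical. For that channel the known closed form is C = 1 - H(0.1), about 0.531 bits, which gives a check on the iteration.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input pmf p_x and channel matrix channel[x][y] = p(y|x)."""
    n_x, n_y = len(channel), len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(n_x)) for y in range(n_y)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(n_x) for y in range(n_y)
               if p_x[x] > 0 and channel[x][y] > 0)

def blahut_arimoto(channel, iterations=200):
    """Approximate C = max over p(x) of I(X;Y) by the Blahut-Arimoto iteration.

    Returns (capacity estimate in bits, the input distribution reached).
    """
    n_x, n_y = len(channel), len(channel[0])
    p_x = [1.0 / n_x] * n_x  # start from the uniform input distribution
    for _ in range(iterations):
        # Output distribution induced by the current input distribution.
        p_y = [sum(p_x[x] * channel[x][y] for x in range(n_x)) for y in range(n_y)]
        # Reweight each input by exp(D(p(y|x) || p(y))), with the divergence in nats.
        weights = [p_x[x] * math.exp(sum(channel[x][y] * math.log(channel[x][y] / p_y[y])
                                         for y in range(n_y) if channel[x][y] > 0))
                   for x in range(n_x)]
        total = sum(weights)
        p_x = [w / total for w in weights]
    return mutual_information(p_x, channel), p_x

# Binary symmetric channel with crossover probability 0.1 (a hypothetical example).
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
capacity, best_input = blahut_arimoto(bsc)
print(capacity, best_input)  # about 0.531 bits, i.e. 1 - H(0.1), with a uniform input
```

For a symmetric channel the uniform input is already optimal, so the iteration leaves it unchanged; for asymmetric channels the reweighting step moves probability toward the inputs whose output distributions are most distinguishable.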