Mark Dow

The connection between thermodynamic entropy and information

"Gain in entropy always means loss of information, and nothing more."  Gilbert N. Lewis
"Information is just known entropy. Entropy is just unknown information."  Michael P. Frank, in  "Physical limits of Computing"
"More bullshit has been written about entropy than about any other physical quantity." David Beeman

    The bottom line is that thermodynamic entropy is best understood not as a property or macroscopic state of matter (like mass, temperature, or pressure), but as a lack of knowledge of the detailed configuration of matter. In particular, thermodynamic entropy is a measure of our lack of information about the microstate of a closed system of matter near equilibrium. To make this concrete, I'll compare two similar simple systems, one of particles and one of bits. Although the concept of entropy in classical thermodynamics was elucidated long before information theory was developed, thermodynamic entropy can be viewed as a straightforward application of information theory to a physical problem.

    There are many other fine discussions of this topic, but few that strip it down to a simple example. A more in-depth, but more technical, discussion of the same topic is at Entropy in thermodynamics and information theory. But that discussion and others reach the same bottom line, with only a variation of language:

"...it should be remembered that Gibbs' statistical mechanical entropy is only one application of information theory to physical systems, relevant when the particular 'message' not yet communicated is the underlying microstate of the physical system."

The 'message' in thermodynamics, the microstate of a physical system, is never communicated. It is inaccessible to observation or transmission. 

Background:
What are particles and bits?
What the heck is "log", and what does it have to do with this topic?


    Consider a perfectly insulated 2-D box of simple particles. The macrostate of an ideal gas can be specified by the total energy E, number of particles N and volume V. There is a large but finite number of possible microstates that are all consistent with this system's single, unchanging macrostate:

[Figure: particle ensemble]

One of Ludwig Boltzmann's leaps of imagination was that the number of possible microstates, Ω, was finite, and in some sense a particle's state is discrete. But it wasn't until quantum mechanics was developed that this was clarified and shown to be strictly true.

Henri Poincaré and others showed that such an ideal particle system would necessarily cycle through all possible microstates, and that each would be visited with equal probability. The same holds for many real physical systems; generally no state or group of states is favored.

    Any one of these microstates is equally likely to be the actual microstate (near equilibrium), and we have no way of knowing which one it is. And we never will. This lack of information is not because we haven't examined the system closely; it reflects the inaccessibility of this information near equilibrium. But we can count how many microstates are possible.

    The thermodynamic entropy, S, for this case is:

S/k = log(Ωp), where Ωp = number of equally probable microstates and k = Boltzmann's constant

Boltzmann's form of this equation is S = k ln(Ωp), where Boltzmann's constant has SI units of J/K (joules per kelvin). Because thermodynamic entropy depends on the energy and temperature of the system, this proportionality constant is convenient when those variables are measured or derived.
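
As a rough numerical illustration of the two conventions, the sketch below converts a microstate count into both the dimensionless entropy S/k and the entropy in SI units. The function names and the example count are assumptions made for illustration and do not come from the discussion above; the natural logarithm is used, so the dimensionless value is in nats.

```python
import math

# Boltzmann's constant in J/K (exact by the 2019 SI definition).
K_BOLTZMANN = 1.380649e-23

def reduced_entropy(omega: int) -> float:
    """Dimensionless entropy S/k = ln(omega) for omega equally probable microstates."""
    return math.log(omega)

def boltzmann_entropy(omega: int) -> float:
    """Entropy S = k ln(omega) in J/K."""
    return K_BOLTZMANN * reduced_entropy(omega)

# Hypothetical example: a tiny system with 10**1000 equally probable microstates.
# math.log accepts arbitrarily large Python ints, so the count does not overflow.
omega = 10**1000
print(reduced_entropy(omega))    # dimensionless, in nats
print(boltzmann_entropy(omega))  # in J/K
```

The two numbers describe the same uncertainty; only the unit convention differs.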
 
An alternative, used here, is to normalize thermodynamic variables such that the proportionality constant is defined as 1. From Entropy in thermodynamics and information theory:

"The presence of Boltzmann's constant k in the thermodynamic definitions is a historical accident, reflecting the conventional units of temperature. It is there to make sure that the statistical definition of thermodynamic entropy matches the classical entropy of Clausius, thermodynamically conjugate to temperature. For a simple compressible system that can only perform volume work, the first law of thermodynamics becomes

 dE = -p dV + T dS

But one can equally well write this equation in terms of what physicists and chemists sometimes call the 'reduced' or dimensionless entropy, σ = S/k, so that

 dE = -p dV + kT dσ

Just as S is conjugate to T, so σ is conjugate to kT (the energy that is characteristic of T on a molecular scale)."

Writing the equation in this way doesn't change thermodynamics, or its expression in information-theoretic terms.

   
    This statistical measure of thermodynamic entropy quantifies the uncertainty about which microstate is occupied. The higher the number of equally probable possibilities, the more uncertainty. Entropy is at its maximum at equilibrium, because the equilibrium macrostate is the one consistent with the largest number of microstates. For example, there are very few ways for all the particles to be clumped in one corner of our insulated box, but many ways for them to be roughly evenly distributed across the box.
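
The counting argument can be made concrete with a coarse-grained toy model. The model is an assumption of this sketch, not the particle system pictured above: split the box into a left and a right half and record only which half each of N distinguishable particles occupies. The number of arrangements with n particles on the left is then the binomial coefficient C(N, n).

```python
import math

def microstates_with_n_left(n_particles: int, n_left: int) -> int:
    """Number of ways to choose which n_left of n_particles sit in the left half."""
    return math.comb(n_particles, n_left)

N = 100  # illustrative particle count

clumped = microstates_with_n_left(N, N)     # all particles in one half
even = microstates_with_n_left(N, N // 2)   # evenly split between the halves

print(clumped)          # exactly one arrangement
print(even)             # roughly 1e29 arrangements
print(math.log(even))   # S/k for the evenly split case, in nats
```

Even at N = 100 the evenly split case outnumbers the clumped case by about 29 orders of magnitude, and the gap grows rapidly with N.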

    Compare this with a set of 2-D 4 x 4 arrays of bits (images in this case, each one a kind of message), each with the same macrostate specified by the number of bits (N = 16, represented by black or white squares). Note that the number of bits, N, is the same in each instance, although all combinations of black and white are in the set. If an acquaintance is to send you an image/message of this form (a 16-bit email, for example), and you have no prior information about which image/message is to be sent, then each of a countable number (2^16 = 65,536) of images/messages is equally probable.

[Figure: ensemble of images]

    The information theory entropy (Shannon entropy), H, for this case is defined as:

H = log(Ωp), where Ωp = number of equally probable microstates

    The entropy H quantifies the uncertainty about what message is to be received. The higher the number of equally probable possibilities, the more entropy. The image/message has maximum entropy before it is received. But after it is received and read, there is no longer any uncertainty: there is only one possible microstate, the image/message itself, so Ωp = 1 and H = 0.

    If a single one of these arrays is received as an image/message, the information, I, contained in the image/message is:

I = -log(1/Ωs) = log(Ωs), where Ωs = number of equally probable microstates consistent with the message macrostate

If the microstates are not equally probable, these formulas for S, H and I need to be modified. They become weighted sums over all possible states, where the probability of each state is the weighting factor. See Entropy in thermodynamics and information theory.
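
For reference, the weighted sum usually meant here is the standard Shannon form H = -Σ p_i log(p_i), which reduces to log(Ω) when all Ω states are equally probable. A minimal sketch follows; the function name and the example probabilities are hypothetical, chosen only for illustration.

```python
import math

def shannon_entropy_bits(probabilities):
    """H = -sum(p * log2(p)) over states with p > 0, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Equally probable states: the sum collapses to log2(number of states).
uniform = [1 / 8] * 8
print(shannon_entropy_bits(uniform))      # 3.0 bits = log2(8)

# Unequal probabilities give less uncertainty than the uniform case.
biased_coin = [0.9, 0.1]
print(shannon_entropy_bits(biased_coin))  # below 1 bit
```

States with zero probability are skipped, following the usual convention that they contribute nothing to the sum.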

    The probability of this particular image/message being sent is 1/Ωs. The larger the number of possibilities, the more uncertainty is resolved, or entropy reduced, when the particular image/message is received. Information is a measure of how much an image/message (an observed microstate) tells us, by comparison with the number of other messages it could have been (those consistent with the image/message's macrostate).

Here's the math. For this 16-bit message with 65,536 possibilities, a single message contains I = -log2(1/65,536) bits = -log2(2^-16) bits = 16 bits. This is how "surprising" the message was, or how much our uncertainty (entropy) was reduced -- it could have been a lot of things, but it was this single message. But this result -- 16 bits of information is contained in the message -- is not surprising for this simple example; we knew we were to be sent 16 bits, and when we received the message we found out what each of the 16 bits was.
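
The same arithmetic can be checked in a few lines. This is only a sketch of the equally probable case described above, and the function name is made up for the example.

```python
import math

def information_bits(n_possible_messages: int) -> float:
    """I = -log2(1/omega) = log2(omega) for one of omega equally probable messages."""
    return -math.log2(1 / n_possible_messages)

n_bits = 16
n_messages = 2 ** n_bits  # number of distinct 4 x 4 one-bit images

print(n_messages)                    # 65536
print(information_bits(n_messages))  # 16.0
```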

    H and I might seem redundant because the formulas are similar. But H does not equal I. Entropy refers to the uncertainty of an unknown message, and information refers to the probability of a known message occurring by chance alone.
More accurately, entropy is a measure of uncertainty due to the unknown part of a message/particle system, and information is a measure of reduction of uncertainty due to the known part of a message/particle system.

Information gained is equal to entropy lost. Information and entropy are two sides of the same probabilistic coin. While a flipped coin is spinning in the air the entropy H is one bit (an unknown heads or tails), and the information I is zero. When it lands and is observed, the entropy H is zero, and the information I is one bit (a known heads or tails).

    S and H (thermodynamic and Shannon entropy) are equivalent, in that S is directly proportional to H, and this is because the same conditions hold for both systems. S is reserved for thermodynamics, but H can be applied to any statistical system. As Shannon and Weaver wrote:

“...the quantity which uniquely meets the natural requirements that one sets up for ‘information’ ... turns out to be exactly that which is known in thermodynamics as entropy.”


    The entropy S is a state function of a thermodynamic system, but it can't be directly measured like pressure and temperature. There is no entropy-meter; entropy must be inferred by varying the state of a system near equilibrium and observing how other thermodynamic variables (pressure, temperature, etc.) respond. This is one reason why the statistical mechanics interpretation of entropy is so important:

"[The] ability to make macroscopic predictions based on microscopic properties is the main asset of statistical mechanics over thermodynamics. Both theories are governed by the second law of thermodynamics through the medium of entropy. However, entropy in thermodynamics can only be known empirically, whereas in statistical mechanics, it is a function of the distribution of the system on its microstates." (from statistical mechanics)

    It might seem like this statistical interpretation of matter can cause matter to be "influenced" by our knowledge, or lack of knowledge, of its microstates. What does information or knowledge about microstates have to do with how a steam engine works? But this train of thought is a result of a misperception of microscopic states in nature. Which microstate a particle system is in is irreducibly (inherently) uncertain, in the same sense that the position and momentum of individual particles are uncertain (Heisenberg's uncertainty principle). All we know about a steam engine is the possible microstates for any given macrostate, and we know nothing of the particular microstate.

    The fact that entropy almost always increases or stays the same (the second law of thermodynamics) is a statistical statement about the uncertainty of a particle system's microstate.

The fact that entropy sometimes can and does decrease is often glossed over in discussions of, and even the statement of, the second law of thermodynamics. The usefulness of the second law (its explanatory power) is due to how rarely entropy measurably decreases for any large number of particles. Even for small macroscopic systems with a small number of possible states (e.g. > 1,000 particles, each with > 10 possible states, and > 10^1,000 total possible states), it is highly improbable (p <<< 1/2) that a measurable decrease of entropy (e.g. a fractional decrease of 1/1,000) will occur in the (current) lifetime of the universe (~10^10 years). Almost is good enough for physics too.
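
For a sense of scale, consider a much cruder fluctuation than the one described above, under simplifying assumptions that are not made in the text: N particles that are each, independently, equally likely to be in either half of a box. The chance of finding all of them in one chosen half at a given instant is 2^-N. The sketch reports that probability as a base-10 exponent, which avoids floating-point underflow for large N.

```python
import math

def log10_prob_all_in_one_half(n_particles: int) -> float:
    """log10 of the probability 2**-N that all N particles are in one chosen half."""
    return -n_particles * math.log10(2)

for n in (10, 100, 1000):
    print(n, log10_prob_all_in_one_half(n))
# For N = 1000 the exponent is about -301, so p is about 1e-301.
```

This is not a calculation of the fractional entropy change quoted above, only an indication of how quickly large fluctuations become improbable as the particle count grows.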

Also see Fluctuation theorem (accessed 10/1/08):

"The fluctuation theorem (FT) is a theorem from statistical mechanics dealing with the relative probability that the entropy of a system which is currently away from thermodynamic equilibrium (maximum entropy) will increase or decrease over a given amount of time. The second law of thermodynamics predicts that the entropy of an isolated system should tend to increase until it reaches equilibrium, but after the discovery of statistical mechanics physicists realized that the second law is only a statistical one, so that there should always be some nonzero probability that the entropy of an isolated system will spontaneously decrease; the fluctuation theorem precisely quantifies this probability."

    James Clerk Maxwell's thought experiment, Maxwell's demon, illustrates the role of observability/uncertainty in the second law. The experiment's resolution, that the demon can't cheat the second law because she can't observe the microstate without altering it, highlights how central observability/uncertainty is in physics.

Other related stuff

    A good diagram illustrating the idea of "physical information" is in M. P. Frank's paper "Physical limits of Computing".

    
    This site provides the freely available current version of the math book Entropy and Information Theory by R.M. Gray in the Adobe portable document format (PDF).
"This book is devoted to the theory of probabilistic information measures and their application to coding theorems for information sources and noisy channels. The eventual goal is a general development of Shannon's mathematical theory of communication, but much of the space is devoted to the tools and methods required to prove the Shannon coding theorems. These tools form an area common to ergodic theory and information theory and comprise several quantitative notions of the information in random variables, random processes, and dynamical systems."

    [To Do:  Show how these ideas can be extended to easily perceived messages, particularly images.]

Consider all 184 x 289 1-bit images, each composed of 53,754 bits. There are 2^53,754 possible images. Each pixel is represented by one bit, black or white. These three are particular images, not arbitrary selections from all possible images:
[Figure: low information image]
Low algorithmic complexity
"simple" (one of very few possible low information images)
High "image available energy" (non-random intensity gradients)
Farthest from equilibrium

[Figure: medium information image]
Medium algorithmic complexity
"complex" (one of a few possible medium information images)
Medium "image available energy" (non-random intensity gradients)
Far from equilibrium

[Figure: high information image]
High algorithmic complexity
"random" (one of many possible nearly random images)
Low "image available energy" (non-random intensity gradients)
Near equilibrium
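
Algorithmic complexity cannot be computed exactly, but the compressed size of an image gives a crude upper-bound proxy for it. The sketch below uses that substitute technique (zlib compression, which is not something the text above relies on) on two synthetic 1-bit images of 53,754 bits each: one uniform and one pseudo-random. The images and all names are illustrative stand-ins, not the three images shown.

```python
import random
import zlib

N_BITS = 53_754  # bits per image, as stated in the text

def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, 8 pixels per byte.

    The final byte holds whatever bits are left over.
    """
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def compressed_size(bits):
    """Size in bytes after zlib compression: a rough proxy for algorithmic complexity."""
    return len(zlib.compress(pack_bits(bits), 9))

rng = random.Random(0)  # fixed seed so the run is repeatable

uniform_image = [0] * N_BITS                                 # "simple": every pixel the same
random_image = [rng.getrandbits(1) for _ in range(N_BITS)]   # "random": coin-flip pixels

print(compressed_size(uniform_image))  # small: the image is highly regular
print(compressed_size(random_image))   # close to the raw size of about 6,720 bytes
```

A general-purpose compressor only bounds the complexity from above, so a structured image that zlib happens to handle poorly would look more "random" here than it really is.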