Mark Dow

The connection between thermodynamic entropy and information

"Gain in entropy always means loss of information, and nothing more."  Gilbert N. Lewis
"Information is just known entropy. Entropy is just unknown information."  Michael P. Frank, in  "Physical limits of Computing"
"More bullshit has been written about entropy than about any other physical quantity." David Beeman

    The bottom line is that thermodynamic entropy is best understood not as a property or macroscopic state of matter (like mass, temperature, or pressure), but as a lack of knowledge of the detailed configuration of matter. In particular, thermodynamic entropy is a measure of our lack of information about the micro-state of a closed system of matter at equilibrium. To make this concrete, I'll compare two similar simple systems, one of particles and one of bits. Although the concept of entropy in classical thermodynamics was elucidated long before information theory was developed, thermodynamic entropy can be viewed as a straight-forward application of information theory to a physical problem.

    There are many other fine discussions of this topic, but few that strip it down to a simple example. I have strayed slightly from some common conventions (e.g.  I use "micro-state" instead of "microstate", for emphasis), and suppress a range of worthwhile elaborations. A more in-depth, but more technical, discussion of the same topic is at Entropy in thermodynamics and information theory. But this article and others have the same bottom line, with only a variation of language:

"....it should be remembered that Gibb's statistical mechanical entropy is only one application of information theory to physical systems, relevant when the particular 'message' not yet communicated is the underlying microstate of the physical system."

A good diagram illustrating this idea of "physical information" is in M. P. Frank's paper "Physical limits of Computing".

Background:
What are particles and bits?
What the heck is "log", and what does it have to do with this topic?


    Consider a perfectly insulated 2-D box of simple particles. The macro-state of an ideal gas can be specified by the total energy E, number of particles N and volume V. There are a large but finite number of possible micro-states that are all consistent with this system's single, and unchanging, macro-state:
[Figure: particle ensemble]
Ludwig Boltzmann's leap of imagination was that the number of possible micro-states, Ω, was finite, and in some sense a particle's state is discrete. But it wasn't until quantum mechanics was developed that this was clarified and shown to be strictly true.

Henri Poincaré and others showed that such a particle system would necessarily cycle through all possible micro-states, and that each would be visited with equal probability.

    Any one of these micro-states is equally likely to be the actual micro-state (near equilibrium) and we have no way of knowing which is the actual micro-state. This lack of information is not because we haven't examined the system closely; it reflects the inaccessibility of this information near equilibrium. But we can count how many micro-states are possible.

    The thermodynamic entropy, S, for this case is:

S = log(Ωp)      ,      Ωp = number of equally probable micro-states
Boltzmann's form of this equation is S = k ln(Ω), where k is Boltzmann's constant. Same equation, different units. I'm using bits (implicitly log2) as units, where Boltzmann's form gives the SI units J K^-1 (or nats if k is set to 1), where 1 nat = 1/ln(2) bits.
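
The unit relationship in the note above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the value of Boltzmann's constant is the exact figure fixed by the 2019 SI definition.

```python
import math

# Boltzmann's constant in J/K (exact by definition in the 2019 SI).
K_B = 1.380649e-23


def bits_to_nats(bits: float) -> float:
    """Convert an entropy in bits (log base 2) to nats (natural log)."""
    return bits * math.log(2)


def nats_to_bits(nats: float) -> float:
    """Convert an entropy in nats to bits; 1 nat = 1/ln(2) bits."""
    return nats / math.log(2)


def bits_to_joules_per_kelvin(bits: float) -> float:
    """Convert an entropy in bits to thermodynamic units, S = k ln(Omega)."""
    return K_B * bits_to_nats(bits)


if __name__ == "__main__":
    print("1 nat in bits:", nats_to_bits(1.0))
    print("1 bit in J/K: ", bits_to_joules_per_kelvin(1.0))
```

One bit corresponds to a very small number of joules per kelvin, which is one reason bits are the more convenient unit for the small examples used here.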
    This statistical measure of thermodynamic entropy quantifies the uncertainty about which micro-state is occupied. The higher the number of equally probable possibilities, the more uncertainty. Near equilibrium the system has a maximum entropy, because there are the most possible micro-states near equilibrium (for example, there are very few possibilities for all the particles clumped in one corner of our insulated box, but many possible ways they can be roughly evenly distributed across the box).
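
To make the clumped-versus-spread comparison concrete, here is a toy count, not a model of a real gas. It assumes (for illustration only) that the only thing distinguishing micro-states is which half of the box each of N distinguishable particles sits in, so the number of micro-states with n particles in the left half is the binomial coefficient C(N, n).

```python
import math


def microstates_with_left_count(n_particles: int, n_left: int) -> int:
    """Number of ways to choose which n_left of n_particles are in the left half."""
    return math.comb(n_particles, n_left)


def entropy_bits(n_microstates: int) -> float:
    """Entropy in bits for a set of equally probable micro-states."""
    return math.log2(n_microstates)


if __name__ == "__main__":
    n = 16  # hypothetical particle count, chosen to match the 16-bit example
    for n_left in (0, 4, 8):
        omega = microstates_with_left_count(n, n_left)
        print(f"{n_left:2d} particles on the left: "
              f"{omega:6d} micro-states, {entropy_bits(omega):.2f} bits")
```

With all particles on one side there is a single arrangement and zero entropy; the even split has the most arrangements and therefore the highest entropy.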

    Compare this with a set of 2-D 4 x 4 arrays of bits (images in this case, each one a kind of message), each with the same macro-state specified by the number of bits (N = 16, represented by black or white squares).  If an acquaintance is to send you an image/message of this form (a 16-bit email, for example), and you have no prior information about which image/message is to be sent, then each of a countable number (65,536) of images/messages is equally probable.
[Figure: ensemble of images]
    The information theory entropy (Shannon entropy), H, for this case is defined as:

H = log(Ωp)      ,      Ωp = number of equally probable micro-states

    The entropy H quantifies the uncertainty about what message is to be received. The higher the number of equally probable possibilities, the more entropy. The image/message has a maximum of entropy before it is received. But after it is received and read, there is no longer any uncertainty; there is only one possible micro-state, the image/message itself; Ωp = 1 and H = 0.
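
As a small check of the numbers in this example, the following sketch computes H before and after the 4 x 4 image is received. The function name is an assumption made for this illustration.

```python
import math


def equiprobable_entropy_bits(n_possibilities: int) -> float:
    """H = log2(Omega) for Omega equally probable possibilities."""
    if n_possibilities < 1:
        raise ValueError("there must be at least one possibility")
    return math.log2(n_possibilities)


n_pixels = 4 * 4                # one bit per pixel
omega_before = 2 ** n_pixels    # every black/white pattern is possible
omega_after = 1                 # once read, only the received image remains

if __name__ == "__main__":
    print("possible images:", omega_before)
    print("H before receipt:", equiprobable_entropy_bits(omega_before), "bits")
    print("H after receipt: ", equiprobable_entropy_bits(omega_after), "bits")
```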

    If a single one of these arrays is received as an image/message, the information, I, contained in the image/message is:

I = -log(1/Ωs) = log(Ωs)      ,      Ωs = number of equally probable micro-states consistent with the message macro-state
If the micro-states are not equally probable, these formulas for S, H and I need to be modified. They become weighted sums over all possible states, where the probability of each state is the weighting factor. See Entropy in thermodynamics and information theory.
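
The weighted sum mentioned in this note is usually written H = -sum(p_i * log2(p_i)). A minimal sketch, assuming the probabilities are supplied as a plain list that sums to 1 (the function name is hypothetical):

```python
import math


def shannon_entropy_bits(probabilities: list[float]) -> float:
    """Shannon entropy in bits: the probability-weighted sum of -log2(p)."""
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # States with zero probability contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    fair_coin = [0.5, 0.5]
    biased_coin = [0.9, 0.1]
    print("fair coin:  ", shannon_entropy_bits(fair_coin), "bits")
    print("biased coin:", shannon_entropy_bits(biased_coin), "bits")
```

For equally probable states the sum collapses to log2 of the number of states, which is the simpler formula used in the rest of this discussion. Unequal probabilities always give a lower entropy than the equiprobable case.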
    The probability of this particular image/message being sent is 1/Ωs. The larger the number of possibilities, the more uncertainty is resolved, or entropy reduced, when the particular image/message is received. Information is a measure of how much an image/message (an observed micro-state) tells us, by comparison with the number of other messages it could have been (those consistent with the image/message's macro-state).
Here's the math. For this 16-bit message with 65,536 possibilities, a single message contains I = -log2(1/65,536) bits = -log2(2^-16) bits = 16 bits. This is the amount the message was "surprising", or how much our uncertainty (entropy) was reduced -- it could have been a lot of things but it was this singular message. But this result -- 16 bits of information is contained in the message -- is not surprising for this simple example; we knew we were to be sent 16 bits and when we received the message we found out what each of the 16 bits was.
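
The same arithmetic can be done bit by bit. The sketch below assumes each of the 16 pixels is independently black or white with probability 1/2 (which is what "no prior information" amounts to here); the particular image is an arbitrary, made-up example.

```python
import math


def information_bits(probability: float) -> float:
    """Information (surprisal) of an outcome with the given probability."""
    return -math.log2(probability)


# A hypothetical received 4 x 4 image: 1 = black, 0 = white.
image = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

n_bits = sum(len(row) for row in image)

# Probability of this exact image if every pixel is an independent fair bit.
p_image = 0.5 ** n_bits

# The information of the whole message equals the sum over its bits.
info_whole = information_bits(p_image)
info_per_bit_sum = sum(information_bits(0.5) for row in image for _ in row)

if __name__ == "__main__":
    print("probability of this image:", p_image)
    print("information, whole message:", info_whole, "bits")
    print("information, summed per bit:", info_per_bit_sum, "bits")
```

Any other 4 x 4 image gives the same result, because under this assumption every image is equally improbable.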

    H and I might seem redundant because the formulas are similar. But H does not equal I. Entropy refers to the uncertainty of an unknown message, and information refers to the probability of a known message occurring by chance alone.
More accurately, entropy is a measure of uncertainty due to the unknown part of a message/particle system, and information is a measure of reduction of uncertainty due to the known part of a message/particle system.
Information gained is equal to entropy lost. Information and entropy are two sides of the same probabilistic coin. While a flipped coin is spinning in the air the entropy H is one bit (an unknown heads or tails), and the information I is zero. When it lands and is observed, the entropy H is zero, and the information I is one bit (a known heads or tails).

    S and H are equivalent, in that S = H of a thermodynamic system. S is reserved for thermodynamics, but H can be applied to any statistical system.

    The entropy S is a state function of a thermodynamic system, but it can't be directly measured like pressure and temperature (see measuring entropy). There is no entropy-meter; entropy must be inferred by varying the state of a system near equilibrium and observing how it responds. This is one reason why the statistical mechanics interpretation of entropy is so important:

"[The] ability to make macroscopic predictions based on microscopic properties is the main asset of statistical mechanics over thermodynamics. Both theories are governed by the second law of thermodynamics through the medium of entropy. However, entropy in thermodynamics can only be known empirically, whereas in statistical mechanics, it is a function of the distribution of the system on its microstates." (from statistical mechanics )

    It might seem like this statistical interpretation of matter can cause matter to be "influenced" by our knowledge, or lack of knowledge, of its micro-states. What does information or knowledge about micro-states have to do with how a steam engine works? But this train of thought is a result of a misperception of microscopic states in nature. Which micro-state a particle system is in is irreducibly (inherently) uncertain, in the same sense that the position and momentum of individual particles are uncertain (Heisenberg's uncertainty principle). The fact that entropy almost always increases or stays the same (the second law of thermodynamics) is a statistical statement about the uncertainty of a particle system's micro-state.
The fact that entropy sometimes can decrease is often glossed over in discussions of, and even the statement of, the second law of thermodynamics. The usefulness of the second law (its explanatory power) is due to how rarely entropy decreases for any large number of particles. For even small macroscopic systems (e.g. > 10,000 particles, >~2^10,000 possible states), it is highly improbable (p <<< 1/2) that a measurable decrease of entropy (e.g. a fractional decrease of 1/10,000) will occur in the (current) lifetime of the universe (~10^10 years). Almost is good enough for physics too.
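
To get a feel for the scale involved, the sketch below works out how large 2^10,000 is, taking that figure for the number of states at face value. It does not compute the probability of a fractional entropy decrease; it only shows, under the equal-probability assumption used throughout, how unlikely any single specified micro-state is. Logarithms are used because the probability itself is too small for a floating-point number.

```python
import math

n_particles = 10_000  # the example system size quoted in the text

# log10 of the number of states, assuming 2 ** n_particles of them.
log10_states = n_particles * math.log10(2)

# Number of decimal digits needed to write out the state count.
decimal_digits = math.floor(log10_states) + 1

# log10 of the probability of one specified micro-state,
# if all micro-states are equally probable.
log10_p_single_state = -log10_states

if __name__ == "__main__":
    print(f"2^{n_particles} has {decimal_digits} decimal digits")
    print(f"probability of one specified micro-state is about "
          f"10^{log10_p_single_state:.1f}")
```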

    James Clerk Maxwell's thought experiment Maxwell's demon is an example of the importance of observability/uncertainty in discussing the second law. The experiment's resolution, that the demon can't cheat the second law because she can't observe the micro-state without altering it, highlights the importance of observability/uncertainty in physics.


To Do: Comparison of non-equilibrium systems moving to equilibrium.

To Do:  Show how these ideas can be extended to easily perceived messages, particularly images.
53,754 bit (184 x 289) images (2^53,754 possible images). Each pixel is represented by one bit, black or white. These three are particular images, not arbitrary selections from all possible 184 x 289 one-bit images:
[Figure: low information image]
Low algorithmic complexity
"simple" (one of very few possible low information images)
High "image available energy" (non-random intensity gradient)
Farthest from equilibrium

[Figure: medium information image]
Medium algorithmic complexity
"complex" (one of a few possible medium information images)
Medium "image available energy"
Far from equilibrium

[Figure: high information image]
High algorithmic complexity
"random" (one of many possible nearly random images)
Low "image available energy"
Near equilibrium
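
Algorithmic complexity cannot be computed exactly, but the size of a losslessly compressed copy gives a rough upper-bound proxy for it. The sketch below is an illustration under that assumption, using Python's standard zlib module on three made-up byte buffers (not the images above): a blank one, a striped one, and a pseudo-random one. The buffer size is arbitrary.

```python
import random
import zlib

N_BYTES = 4096  # hypothetical buffer size, 8 one-bit pixels per byte


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


rng = random.Random(0)  # fixed seed so the run is repeatable

simple = bytes(N_BYTES)                  # all pixels white
striped = b"\x00\xff" * (N_BYTES // 2)   # regular black/white stripes
noisy = bytes(rng.getrandbits(8) for _ in range(N_BYTES))  # pseudo-random

if __name__ == "__main__":
    for name, data in (("simple", simple), ("striped", striped), ("noisy", noisy)):
        print(f"{name:8s} {len(data)} bytes -> {compressed_size(data)} bytes")
```

The regular buffers shrink to a small fraction of their size, while the pseudo-random one barely compresses at all, matching the low-to-high ordering of algorithmic complexity described above.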