Mark
Dow
The connection
between thermodynamic entropy and
information
The bottom line is that thermodynamic entropy
is best understood not as
a property or macroscopic state of matter (like mass, temperature, or
pressure),
but
as a lack of knowledge of the detailed configuration of matter. In
particular, thermodynamic entropy is a measure of our lack of
information about the microstate of a closed system of matter near
equilibrium. To make this
concrete, I'll compare two similar simple systems, one of
particles and one of bits. Although the concept of entropy in classical
thermodynamics
was elucidated long before information
theory was developed, thermodynamic entropy can be viewed as a
straightforward application of information theory to a physical
problem.
There are many other fine discussions of this topic,
but few that strip it down to a simple example. A more in-depth, but
more technical, discussion of the
same topic is at Entropy
in thermodynamics and information
theory. But
this discussion and others have the same bottom line, with only a
variation of language:
"....it should be remembered that
Gibbs' statistical mechanical
entropy is only one application of information theory to physical
systems, relevant when the particular 'message' not yet communicated is
the underlying microstate of the physical system."
The 'message' in thermodynamics, the microstate of a physical
system, is never communicated. It is inaccessible to observation
or transmission.
Consider
a perfectly insulated 2-D box of
simple particles. The macrostate
of an ideal gas can be specified by the total energy E, number of
particles N and volume V. There is a large but finite number of
possible microstates that are all consistent with this system's
single, unchanging macrostate:
One of Ludwig
Boltzmann's leaps of imagination was that
the number of
possible microstates, Ω, was finite, and in some sense a
particle's state
is discrete. But it wasn't until quantum mechanics was developed that
this was clarified and shown to be strictly true.
Henri Poincaré
and others showed that such an ideal particle system
would
necessarily
cycle through all possible microstates, and that each would be visited
with equal probability. The same holds for many real physical systems;
generally no state or group of states is favored.
Any one of
these microstates is equally likely to be the actual microstate (near
equilibrium) and we have no way of knowing which is the actual
microstate. And we never will. This lack of information is
not because we haven't examined the system closely; it reflects the
inaccessibility of this information near equilibrium. But
we can count how many
microstates are possible.
The thermodynamic entropy, S, for this case is:
$$S/k = \log(\Omega_p)$$
where $\Omega_p$ is the number of equally probable microstates and $k$ is
Boltzmann's constant.
Boltzmann's form of this equation is $S = k \ln(\Omega_p)$,
where Boltzmann's constant has SI units of J K$^{-1}$.
Because thermodynamic entropy is dependent on the energy and
temperature of the system, it was convenient to use this
proportionality constant if these variables are measured or derived.
An alternative, used here, is to normalize thermodynamic variables such
that the proportionality constant is defined as 1. From Entropy
in thermodynamics and information
theory:
"The presence of Boltzmann's constant
k
in the thermodynamic definitions is a historical accident, reflecting
the conventional units of temperature. It is there to make sure that
the statistical definition of thermodynamic entropy matches the
classical entropy of Clausius, thermodynamically conjugate to temperature.
For a simple compressible system that can only perform volume work, the
first law of thermodynamics
becomes
$$dE = -p\,dV + T\,dS$$
But one can equally well write this equation in terms of what
physicists and chemists sometimes call the 'reduced' or dimensionless
entropy, σ = S/k, so that
$$dE = -p\,dV + kT\,d\sigma$$
Just as S is conjugate to T, so σ is
conjugate to kT (the energy that is characteristic of T
on a molecular scale)."
Writing the equation in this way doesn't change
thermodynamics, or its expression in information-theoretic terms.
This statistical measure of thermodynamic entropy
quantifies the uncertainty about which microstate is occupied. The
higher the number of equally probable possibilities, the more
uncertainty. Near equilibrium the system has a maximum entropy, because
there are the most possible microstates near equilibrium. For example,
there are very few possibilities for all the particles clumped in one
corner of our insulated box but many possible ways they can be roughly
evenly distributed across the box.
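The clumped-versus-spread comparison can be made concrete with a toy count. This is only a sketch under a simplifying assumption that is not part of the discussion above: each of N distinguishable particles is tracked only by whether it sits in the left or right half of the box. The function name `split_microstates` is made up for the illustration.

```python
from math import comb, log2

def split_microstates(n_particles: int, n_left: int) -> int:
    """Count the ways to pick which n_left of n_particles sit in the left half.

    Toy assumption: particles are distinguishable and a "microstate" records
    only the half of the box each particle occupies.
    """
    return comb(n_particles, n_left)

N = 16
clumped = split_microstates(N, 0)       # every particle in the right half
spread = split_microstates(N, N // 2)   # an even 8 / 8 split

# Entropy in bits is the log of the number of equally probable arrangements.
print("clumped:", clumped, "arrangements, entropy", log2(clumped), "bits")
print("spread: ", spread, "arrangements, entropy", round(log2(spread), 2), "bits")
```

Even with only 16 particles the even split has thousands of arrangements while the fully clumped one has exactly one, and the gap widens rapidly as N grows.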
Compare this with a set of 2-D 4 x 4 arrays of bits
(images in this case, each one a kind of message), each with the same
macro-state specified by the number of bits (N = 16, represented by
black or white squares). Note that the number of bits, N, is the same
in each instance, although all combinations of black and white are in
the set. If an acquaintance is to send you an
image/message of this form (a 16-bit email, for example), and you have
no prior information about which image/message is to be sent, then each
of a countable number
(65,536) of images/messages is equally probable.
The information theory entropy (Shannon entropy), H, for this case is
defined as:
$$H = \log(\Omega_p)$$
where $\Omega_p$ is the number of equally probable microstates.
The entropy
H quantifies the uncertainty
about what message is to be received. The
higher the number of equally probable possibilities, the more entropy.
The image/message has a maximum of entropy before it is
received. But after it is received and read, there is no longer any
uncertainty; there is only one possible microstate, the image/message
itself; $\Omega_p = 1$ and $H = 0$.
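The count of 65,536 and the entropy before and after receipt can be checked by brute-force enumeration. The following is a minimal sketch; the names `SIDE`, `all_images` and `as_grid` are illustrative choices, and base-2 logarithms are assumed so that the result comes out in bits.

```python
from itertools import product
from math import log2

SIDE = 4  # 4 x 4 array of black/white squares

# Every possible image is one assignment of 0 (white) or 1 (black) to 16 squares.
all_images = list(product((0, 1), repeat=SIDE * SIDE))
omega_p = len(all_images)

h_before = log2(omega_p)  # uncertainty before the message arrives
h_after = log2(1)         # only one possibility remains once it has been read

def as_grid(bits):
    """Arrange a flat tuple of 16 bits into 4 rows of 4."""
    return [bits[row * SIDE:(row + 1) * SIDE] for row in range(SIDE)]

print(omega_p, h_before, h_after)
print(as_grid(all_images[-1]))  # the all-black image
```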
If a single one of these arrays is received as an image/message, the
information, I, contained in the image/message is:
$$I = -\log(1/\Omega_s) = \log(\Omega_s)$$
where $\Omega_s$ is the number of equally probable microstates consistent
with the message macrostate.
If the microstates are not equally probable, these formulas
for S, H and I need to be modified. They become
weighted sums over all possible states, where the probability of each
state is the weighting factor. See Entropy
in thermodynamics and information
theory.
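For reference, the weighted-sum form for entropy is the standard Shannon expression, H = -Σ p_i log(p_i), which reduces to log(Ω) when every p_i = 1/Ω. A minimal sketch follows; the function name `shannon_entropy_bits` and the example distributions are made up for illustration.

```python
from math import log2

def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution, in bits."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with zero probability contribute nothing to the sum.
    return -sum(p * log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # four equally probable states
skewed = [0.7, 0.1, 0.1, 0.1]       # one state favored (illustrative numbers)

print(shannon_entropy_bits(uniform))  # equals log2(4) for the uniform case
print(shannon_entropy_bits(skewed))   # lower: less uncertainty
```

The uniform case recovers the simple log(Ω) formula used in this discussion, and any departure from equal probabilities lowers the entropy.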
The probability of this particular image/message
being sent is $1/\Omega_s$. The larger
the number of possibilities, the more uncertainty is resolved, or
entropy reduced, when the particular image/message is received.
Information
is a measure of how much an image/message (an
observed microstate) tells us, by comparison with the number of
other messages it could have been (those consistent with the
image/message's macrostate).
Here's the math. For this 16-bit message
with 65,536
possibilities, a single message contains
$I = -\log_2(1/65{,}536) = -\log_2(2^{-16}) = 16$
bits. This is the amount the message was "surprising", or how much our
uncertainty (entropy) was reduced -- it
could have been a lot of things but it was this singular message. But
this result -- 16 bits of information is contained in the message -- is
not
surprising for this simple example; we knew we were to be sent 16 bits
and when we received the message we found out what each of the 16 bits
was.
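The same arithmetic as a short sketch. The function name `information_bits` is an illustrative choice, and the second case (a receiver who already knows the top row is white) is a hypothetical added only to show how prior knowledge shrinks the set of possibilities.

```python
from math import log2

def information_bits(probability: float) -> float:
    """Information (surprisal) of an outcome with the given probability, in bits."""
    return -log2(probability)

# No prior knowledge: all 2**16 images are equally probable.
omega_s = 2 ** 16
info_full = information_bits(1 / omega_s)

# Hypothetical: the top row is already known to be white, so only the
# remaining 12 squares are uncertain.
omega_known_row = 2 ** 12
info_partial = information_bits(1 / omega_known_row)

print(info_full, info_partial)
```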
H
and I might seem
redundant because the formulas are
similar. But H does not equal I. Entropy refers to the uncertainty
of an unknown
message, and information refers to the probability of a known message
occurring by chance alone.
More accurately, entropy is a measure of
uncertainty due to the unknown part
of a
message/particle system, and information is a measure of
reduction of uncertainty due to the known part of a message/particle
system.
Information gained is equal to entropy
lost. Information and entropy are two sides of the same
probabilistic coin. While a flipped coin is spinning in the air the
entropy H is
one bit (an unknown heads or tails), and the information I is zero.
When it lands and is observed, the entropy H is zero, and the
information I is one bit (a
known heads or tails).
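The coin bookkeeping can be written out directly. This is a minimal sketch assuming a fair coin and base-2 logarithms; the names are illustrative.

```python
from math import log2

def entropy_bits(n_equally_likely: int) -> float:
    """Entropy of a choice among n equally probable outcomes, in bits."""
    return log2(n_equally_likely)

h_spinning = entropy_bits(2)  # in the air: heads or tails, both possible
h_landed = entropy_bits(1)    # observed: a single known outcome

# Information gained by looking equals the entropy lost.
information_gained = h_spinning - h_landed

print(h_spinning, h_landed, information_gained)
```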
S and H (thermodynamic and Shannon
entropy) are
equivalent, in that S is directly proportional to H, and this is because the same
conditions hold for both systems. S
is reserved for thermodynamics,
but H can be applied to
any statistical system. As Shannon and Weaver wrote:
“...the quantity which uniquely
meets the natural requirements that one sets up for
‘information’ ... turns out to be exactly that which is
known in thermodynamics as entropy.”
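The proportionality can be made explicit. With S = k ln(Ω) and H = log2(Ω) for the same set of equally probable states, it follows that S = (k ln 2) H. The sketch below assumes H is measured in bits and uses the 2019 SI value of Boltzmann's constant; the function name is made up for the illustration.

```python
from math import log

K_B = 1.380649e-23  # Boltzmann's constant in J/K (exact by the 2019 SI definition)

def thermodynamic_entropy_from_bits(h_bits: float) -> float:
    """Convert a Shannon entropy in bits to thermodynamic entropy in J/K.

    Uses S = k * ln(omega) and H = log2(omega), so S = k * ln(2) * H.
    """
    return K_B * log(2) * h_bits

print(thermodynamic_entropy_from_bits(1))   # entropy of one unknown bit, in J/K
print(thermodynamic_entropy_from_bits(16))  # the 16-bit message example
```

One bit of missing information corresponds to roughly 1e-23 J/K, which is why information-scale entropies are invisible at everyday thermodynamic scales.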
The entropy S is
a state function
of a thermodynamic system, but it can't be directly measured like
pressure and temperature.
There is no entropy-meter; entropy must be inferred by
varying the state of a system near equilibrium and observing how other
thermodynamic variables (pressure, temperature, etc.) respond.
This is
one reason why the statistical
mechanics interpretation of entropy is so important:
"[The] ability to make macroscopic
predictions based on microscopic properties is the main asset of
statistical mechanics over
thermodynamics. Both theories are governed
by the
second
law of thermodynamics through the medium of
entropy.
However, entropy in thermodynamics
can only be known empirically, whereas in statistical mechanics, it is
a function of the distribution of the system on its microstates." (from
statistical mechanics)
It might seem like this statistical interpretation
of
matter can cause matter to be "influenced" by our knowledge, or lack of
knowledge, of its microstates. What does information or knowledge
about microstates have to do with how a steam engine works? But this
train of thought is a result of a misperception of microscopic states
in nature. Which microstate a particle system is in is irreducibly
(inherently)
uncertain, in the same sense that the position and momentum of individual
particles are uncertain (Heisenberg's
uncertainty principle). All we know about a steam engine is the
possible microstates for any given macrostate, and we know nothing of
the particular microstate.
The fact that entropy almost always increases or
stays the same (the second
law of thermodynamics) is
a statistical statement about the uncertainty of a particle
system's microstate.
The fact that entropy sometimes
can and does
decrease is often glossed over in discussions of, and even
the statement of, the second law of thermodynamics. The usefulness of
the second law (its explanatory power) is due to how frequently
entropy doesn't measurably decrease for any large number of particles.
For even
small macroscopic systems with a small number of possible states (e.g.
more than 1,000 particles, each with more than 10 possible states, and
more than $10^{1{,}000}$
total possible states), it is highly improbable (p <<< 1/2)
that a
measurable decrease of entropy (e.g. a fractional decrease of
1/1,000) will occur in the (current) lifetime of the universe (~$10^{10}$
years). Almost is good enough for physics too.
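A rough feel for these odds comes from a deliberately crude toy model, which is an assumption of this sketch and not the calculation behind the figures above: each particle is independently equally likely to be in either half of the box, and the "fluctuation" is finding all of them in the left half at once. The sampling rate of one look per picosecond is likewise an arbitrary assumption.

```python
from math import log10

SECONDS_PER_YEAR = 3.156e7

def log10_prob_all_in_left_half(n_particles: int) -> float:
    """log10 of the chance that n independent particles are all in the left half.

    Toy assumption: each particle is in either half with probability 1/2.
    """
    return -n_particles * log10(2)

def log10_expected_sightings(n_particles: int,
                             years: float = 1e10,
                             samples_per_second: float = 1e12) -> float:
    """log10 of the expected number of sightings over the given time span.

    Assumes independent looks at the system at an arbitrary, assumed rate.
    """
    log10_samples = log10(years * SECONDS_PER_YEAR * samples_per_second)
    return log10_samples + log10_prob_all_in_left_half(n_particles)

print(log10_prob_all_in_left_half(1000))  # about -301
print(log10_expected_sightings(1000))     # still hopelessly negative
print(log10_expected_sightings(10))       # a 10-particle system does it all the time
```

Working in logarithms avoids numerical underflow and makes the point directly: for 1,000 particles the expected number of sightings over the age of the universe is a number with hundreds of zeros after the decimal point, while for 10 particles such fluctuations are routine.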
Also see Fluctuation
theorem (accessed 10/1/08):
"The fluctuation theorem (FT) is
a theorem from statistical mechanics dealing with the relative
probability that the entropy of a system which is currently away from
thermodynamic equilibrium (maximum entropy) will increase or decrease
over a given amount of time. The second law of thermodynamics
predicts that the entropy of an isolated system should tend to increase
until it reaches equilibrium, but after the discovery of statistical
mechanics physicists realized that the second law is only a statistical
one, so that there should always be some nonzero probability that the
entropy of an isolated system will spontaneously decrease; the
fluctuation theorem precisely quantifies this probability."
James Clerk Maxwell's thought
experiment Maxwell's demon is an example of the
importance of observability/uncertainty in discussing the second law.
The experiment's resolution, that the demon can't cheat the second law because
she
can't observe the microstate without altering it, highlights the
importance of observability/uncertainty in physics.
Other related stuff
A good diagram illustrating the idea of "physical
information" is in M. P. Frank's paper "Physical
Limits of Computing".
This
site provides the freely available current version of the math book
Entropy and Information
Theory
by R.M. Gray
in the Adobe portable document format (PDF).
"This book is devoted to the theory of
probabilistic information measures and their application to coding
theorems for information sources and noisy channels. The eventual goal
is a general development of Shannon's mathematical theory of
communication, but much of the space is devoted to the tools and
methods required to prove the Shannon coding theorems. These tools form
an area common to ergodic theory and information theory and comprise
several quantitative notions of the information in random variables,
random processes, and dynamical systems."
[To Do: Show how these ideas can be extended
to easily perceived
messages, particularly images.]
Consider all 186 x 289 1-bit images,
each composed of 53,754 bits. There are $2^{53{,}754}$
possible images. Each pixel is represented by one bit, black or
white. These three are particular images, not arbitrary selections from
all possible images:

Low algorithmic complexity: "simple" (one of very few possible low information images). High "image available energy" (non-random intensity gradients). Farthest from equilibrium.

Medium algorithmic complexity: "complex" (one of a few possible medium information images). Medium "image available energy" (non-random intensity gradients). Far from equilibrium.

High algorithmic complexity: "random" (one of many possible nearly random images). Low "image available energy" (non-random intensity gradients). Near equilibrium.
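To get a sense of how large the set of 53,754-bit images is, the number of decimal digits in the count can be computed without writing the number out. This is a minimal sketch using only the bit count stated above; it also shows the entropy of an unknown image from this set, assuming all images are equally probable.

```python
from math import floor, log10

n_bits = 53_754  # bits per image, as stated above

# The number of possible images is 2**n_bits. Its decimal digit count is
# floor(n_bits * log10(2)) + 1, which avoids printing the number itself.
decimal_digits = floor(n_bits * log10(2)) + 1

# With every image equally probable, the entropy of an unknown image is
# log2(2**n_bits) = n_bits.
entropy_bits = n_bits

print(decimal_digits, entropy_bits)
```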