Dynamical Systems (5): Ergodicity

These are the notes that I’m trying to write up for the 5th lecture of my doctoral course “Basic principles of dynamical systems” in Toulouse. This lecture is about ergodicity.

I’m behind schedule on many things, so please be patient with me. The proper lecture notes will appear one day. I’m even planning to write a small book based on my lectures.

What is ergodicity?

There is a lot of mathematical and physical literature about ergodic theory. So what is it, anyway?

For mathematicians, ergodicity means the following property:

Definition (grosso modo): A dynamical system is called ergodic if the space average is equal to the time average (for any variable and almost any initial state).

In order to make the above definition more precise, we need a probability measure on the phase space (= the space of all possible states), so let us call our phase space $(X,\mu)$. A dynamical system on $(X,\mu)$ is given by a map $\phi: X \to X$ (if time is continuous and the system is given by a vector field, then $\phi$ is the time-1 flow of that vector field). Denote by $F$ an arbitrary variable, i.e. an integrable function on $(X,\mu)$.

The space average of $F$ is $\int_X F d \mu$

The time average of $F$ with initial state $x \in X$ is $\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} F(\phi^k(x))$ for discrete-time systems

or $\lim_{N \to \infty} \frac{1}{N} \int_0^N F(\phi^t(x))dt$ for continuous-time systems

(in case this limit exists)

The ergodicity property (hypothesis) means that the above two quantities are equal: $\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} F(\phi^k(x)) = \int_X F d \mu$

for any $F$ and almost any $x \in X.$

Remark 1. In the above definition, one assumes that the dynamical system $\phi$ is measure-preserving. (If not, then the property cannot be satisfied; why?)

Remark 2. It’s enough to verify the above equality for the case when $F$ is the characteristic function of an arbitrary subset $A$ of $X$. Then the property means that the average time that the orbit spends in $A$ is equal to the probability measure of $A$.

Remark 3. The above ergodicity property is equivalent to the condition that, for any measurable invariant subset $A$ of $X$, the probability measure of $A$ is equal to 0 or 1, i.e. there are no non-trivial invariant subsets from the probabilistic point of view (Birkhoff’s theorem).
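Before looking at examples, here is a quick numerical sanity check of the "time average = space average" property, and of the indicator-function version from Remark 2, for a simple ergodic system: the irrational rotation $x \mapsto x + \alpha \pmod 1$ of the circle with Lebesgue measure. (A sketch; the choices of $\alpha$, the initial state, and the test function are arbitrary.)

```python
import math

# Irrational rotation x -> x + alpha (mod 1) on the circle with Lebesgue measure.
# The system is ergodic, so Birkhoff (time) averages should approach space averages.
alpha = (math.sqrt(5) - 1) / 2   # golden-ratio rotation number (irrational)
x = 0.1                          # an arbitrary initial state
N = 100_000

time_avg_F = 0.0    # time average of F(x) = cos(2*pi*x); its space average is 0
time_in_A = 0       # time spent in A = [0, 0.3); the measure of A is 0.3
for _ in range(N):
    time_avg_F += math.cos(2 * math.pi * x)
    if x < 0.3:
        time_in_A += 1
    x = (x + alpha) % 1.0

time_avg_F /= N
fraction_in_A = time_in_A / N
print(time_avg_F)     # close to 0   (= space average of cos(2*pi*x))
print(fraction_in_A)  # close to 0.3 (= measure of A)
```

Both quantities converge, and for this rotation the convergence is in fact quite fast (the Birkhoff sums of $\cos(2\pi x)$ stay bounded, so the averages decay like $1/N$).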

Some examples

Example 0. A fast-running fan: one doesn’t see the individual blades, only their distribution in space. Another example: hummingbirds, which can beat their wings 75 times per second (so we can’t see their wings clearly when they are flying). They can hover, fly forward, backward, and even upside down! Tiny midge insects can even beat their wings 1000 times per second!

Example 1. The quasi-periodic flow on the torus ${\mathbb T}^n$ given by the constant vector field $\sum a_i \frac{\partial}{\partial q_i}$ is ergodic with respect to the standard (Haar) measure if and only if the numbers $a_i$ are incommensurable (rationally independent), i.e. there are no invariant sub-tori. 1-dimensional example: the irrational rotation of the circle.

Theorem: Let $G$ be a compact group with Haar measure, and let $a \in G$. Then the rotation map $x \mapsto ax$ on $G$ is ergodic (with respect to the Haar measure) if and only if the set $\{a^n, n \in {\mathbb N}\}$ is dense in $G$. In particular, $G$ must be abelian (and if $G$ is a connected Lie group, it must be a torus).

Example 2. The Bernoulli shift $\{0,1\}^{\mathbb N} \to \{0,1\}^{\mathbb N}$ (with a Bernoulli product measure) is ergodic: in this case the ergodic theorem is the law of large numbers! (A particular example: the doubling map.)
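For the Bernoulli($1/2$) shift, the Birkhoff average of the coordinate function $F(\omega) = \omega_0$ along the shift orbit of a sequence is just the running frequency of 1’s in that sequence, so the ergodic theorem here really is the law of large numbers. A minimal simulation (with a fixed random seed for reproducibility):

```python
import random

# A point of {0,1}^N is an infinite coin-flip sequence; the shift moves the
# reading window one step to the right.  The Birkhoff average of
# F(omega) = omega_0 along the orbit is the frequency of 1's, which the
# strong law of large numbers says converges to 1/2 almost surely.
rng = random.Random(0)            # fixed seed, for reproducibility
N = 100_000
flips = [rng.randint(0, 1) for _ in range(N)]
frequency = sum(flips) / N
print(frequency)                  # close to 1/2 = mu({omega : omega_0 = 1})
```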

Example 3. The continued fraction (Gauss) map $x \mapsto \{1/x\}$ (the fractional part of $1/x$): it preserves the so-called Gauss measure $\mu(E) = \frac{1}{\log 2}\int_E \frac{dx}{1+x}$ on the interval $]0,1[$, and it is ergodic.
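One can check the ergodicity of the Gauss map numerically: the fraction of orbit time spent in $[0, 1/2]$ should approach $\mu([0,1/2]) = \log(3/2)/\log 2 \approx 0.585$. (A sketch only; the orbit lengths, number of starting points, and seed are arbitrary, and floating-point iteration of a chaotic map is of course only a statistical illustration, not a proof.)

```python
import math
import random

# Gauss map T(x) = 1/x mod 1 on (0,1).  It preserves the Gauss measure
# mu(E) = (1/log 2) * int_E dx/(1+x) and is ergodic, so orbit averages of
# indicator functions should match the Gauss measure of the set.
rng = random.Random(1)
hits = 0
total = 0
for _ in range(2000):             # many short orbits from random starting points
    x = rng.random()
    for step in range(120):
        if x < 1e-12:             # guard against numerical degeneracy
            break
        x = 1.0 / x % 1.0         # the Gauss map (fractional part of 1/x)
        if step >= 10:            # discard a short transient
            total += 1
            if x <= 0.5:
                hits += 1

empirical = hits / total
exact = math.log(1.5) / math.log(2)   # mu([0, 1/2]) = log(3/2)/log 2
print(empirical, exact)
```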

Lochs’ Theorem (1964): for almost every number, the first $n$ decimal digits determine approximately $0.97 n$ continued fraction digits (the exact asymptotic ratio is $\frac{6 \ln 2 \ln 10}{\pi^2} \approx 0.9703$); the proof uses the ergodicity of the Gauss map.

Example 4. Arnold’s cat map. Pick a hyperbolic 2×2 integer matrix in $SL(2,{\mathbb Z})$, say $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$; it induces a map of the torus ${\mathbb R}^2/{\mathbb Z}^2$ which preserves the Lebesgue measure and is ergodic.
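A contrast worth seeing numerically: although the cat map (taking the standard choice of matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$) is ergodic for the Lebesgue measure, on the rational points with denominator $n$ it is a permutation of a finite grid, hence periodic. A small sketch computing that period, i.e. the smallest $k$ with $M^k \equiv I \pmod n$:

```python
# Arnold's cat map is induced by M = [[2,1],[1,1]] in SL(2,Z) acting on the
# torus R^2/Z^2.  On the (1/n)-rational points it permutes the n x n grid,
# so iterating it mod n is periodic.

def mat_mult_mod(A, B, n):
    """Multiply two 2x2 integer matrices modulo n."""
    return [[(A[0][0]*B[0][0] + A[0][1]*B[1][0]) % n,
             (A[0][0]*B[0][1] + A[0][1]*B[1][1]) % n],
            [(A[1][0]*B[0][0] + A[1][1]*B[1][0]) % n,
             (A[1][0]*B[0][1] + A[1][1]*B[1][1]) % n]]

def cat_map_period(n):
    """Smallest k >= 1 with M^k congruent to the identity mod n."""
    M = [[2, 1], [1, 1]]
    P = [row[:] for row in M]
    k = 1
    while P != [[1, 0], [0, 1]]:
        P = mat_mult_mod(P, M, n)
        k += 1
    return k

print(cat_map_period(2), cat_map_period(5))   # 3 and 10
```

This is the source of the famous demonstration in which a pixelated picture of a cat, pushed through the map on an $n \times n$ grid, dissolves into apparent noise and then reassembles itself exactly after the period.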

Origin and different meanings of the word “ergodic”

The word ergodic(ity) is not a very intuitive one. It was composed of two words of Greek origin, “ergo-”, which means “work” or “deed”, and “odo-”, which means “way” or “path”, so it is “the way things work”?! It was coined by the physicists Paul and Tatiana Ehrenfest (husband and wife) in 1911 in order to formulate the ergodic hypothesis used in statistical mechanics. This ergodic hypothesis was initially formulated by Boltzmann in 1871, and also by Maxwell, but they didn’t use the name “ergodic hypothesis”. In 1884 Boltzmann introduced a similar German word, “Ergoden”, but gave a somewhat different meaning to it.

The word “ergodic” now appears not only in mathematics and physics, but also in statistics, information theory, etc. It may have different meanings in different contexts. Here are some meanings, taken from various dictionaries:

From “the free dictionary”:

Adj. ergodic – positive recurrent aperiodic state of stochastic systems; tending in probability to a limiting form that is independent of the initial conditions.

From Merriam-Webster:

Definition of ERGODIC

1: of or relating to a process in which every sequence or sizable sample is equally representative of the whole (as in regard to a statistical parameter)
2: involving or relating to the probability that any state will recur; especially : having zero probability that any state will never recur

From an economics dictionary: A stochastic process is ergodic if no sample helps meaningfully to predict values that are very far away in time from that sample. Another way to say that is that the time path of the stochastic process is not sensitive to initial conditions.

From Wikipedia: In physics and thermodynamics, the ergodic hypothesis says that, over long periods of time, the time spent by a particle in some region of the phase space of microstates with the same energy is proportional to the volume of this region, i.e., that all accessible microstates are equiprobable over a long period of time. The ergodic hypothesis is often assumed in statistical analysis.

Why ergodicity?

* Idea coming from thermodynamics / statistical mechanics: a huge number of particles; the micro-state phase space has too many dimensions, but the macro-state space has FEW dimensions.

Projection (quotient) map: micro-state phase space –> macro-state space

Variables on the macro-state space correspond to invariant (adiabatic) functions on the micro-state phase space.

Each preimage (fiber) is an invariant subset on which the system is ergodic.

* Indistinguishability: when the system is ergodic, its states are statistically indistinguishable (one can’t distinguish them using time averages of observables) –> reduced dynamics.

Remark: The ergodic hypothesis (in statistical mechanics) is just that, a hypothesis. It is actually wrong in general, and even when it is true it is very difficult to prove. However, its important consequences are valid, and people need the consequences, not the ergodic hypothesis itself. The ergodic hypothesis is just a convenient way to imagine things? The maximal entropy principle is right. (The set of micro-states of almost-maximal entropy has probability almost 1; this fact can be proved mathematically in models.)

* Practical way to measure things: take the average of some samples. (Like in the law of large numbers, Monte-Carlo method)
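The “average of some samples” idea is the Monte-Carlo method in miniature: a space average is estimated by an average over random samples instead of an orbit. A minimal sketch (the integrand and sample size are arbitrary choices):

```python
import random

# Monte-Carlo integration: estimate the space average
# int_0^1 x^2 dx = 1/3 by averaging x^2 over N uniform random samples.
rng = random.Random(42)           # fixed seed, for reproducibility
N = 100_000
estimate = sum(rng.random() ** 2 for _ in range(N)) / N
print(estimate)                   # close to 1/3
```

By the law of large numbers the error decays like $1/\sqrt{N}$, independently of the dimension of the domain, which is what makes the method useful in high dimensions.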

* Recurrence: Ergodic –> recurrent & passing everywhere

Boltzmann and Maxwell thought that an ergodic system would pass through every possible state, but from the mathematical point of view this is impossible, because the orbit is only 1-dimensional while the state space is multi-dimensional. A weaker version is that a typical orbit passes as close to every state as one wishes.

Poincaré recurrence theorem. For any set $A \subset X$ of positive measure, almost every point $x \in A$ is recurrent w.r.t. $A$, i.e. there is an infinite set of positive integers $k$ such that $\phi^k (x) \in A$. (Here $\phi: X \to X$ is assumed to be measure-preserving.)
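Recurrence is easy to watch in a simple example (a sketch; the rotation number, the set $A$, and the starting point are arbitrary choices): for the circle rotation, a point of $A = [0, 0.05)$ keeps coming back to $A$.

```python
import math

# Poincare recurrence for the circle rotation x -> x + alpha (mod 1):
# almost every point of A = [0, 0.05) returns to A infinitely often.
# We record the first few return times of one orbit.
alpha = math.sqrt(2) - 1      # irrational rotation number
x = 0.01                      # starting point inside A
returns = []
t = 0
while len(returns) < 5 and t < 100_000:
    t += 1
    x = (x + alpha) % 1.0
    if x < 0.05:
        returns.append(t)
print(returns)                # the first five return times to A
```

Since the rotation is in fact ergodic, the returns occur with asymptotic frequency $0.05$, so five of them are found very quickly.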

Curiosity: Furstenberg-Sarkozy theorem. The squares form a Poincaré sequence, i.e. recurrence already happens along the times $k = n^2$.

Birkhoff ergodic theorem

Actually, the above definition of ergodicity is tied to Birkhoff’s ergodic theorem, which says that the “time average = space average” property follows from an a-priori weaker condition:

Definition bis: A measure-preserving system is ergodic if any measurable invariant set has measure equal to 0 or 1.

Birkhoff’s theorem (1931): Definition bis is equivalent to Definition.

Some other ergodic theorems

Maximal ergodic theorem (Yosida-Kakutani 1939): Let $E$ be the set of points $x \in X$ such that $\sup_{n \geq 1} \sum_{k=0}^{n-1} F(\phi^k(x)) > 0$. Then $\int_E F(x) \, d\mu \geq 0$.

(Apparently, this Yosida-Kakutani theorem has been an oral exercise for admission to ENS & X?)

It is not difficult to see that the maximal ergodic theorem implies the Birkhoff ergodic theorem. (Exercise: prove it!)

Birkhoff-Khinchin theorem (probabilistic formulation): With probability 1 we have $\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} F(\phi^k(x)) = {\mathbb E}( F | \mathcal{C})$ where $\mathcal{C}$ is the sigma-algebra of invariant measurable sets.

It is easy to see that Birkhoff-Khinchin is essentially equivalent to Birkhoff.

von Neumann’s mean ergodic theorem (in operator theory). Let $U$ be a unitary operator on a Hilbert space $H$; more generally, an isometric linear operator (that is, a not necessarily surjective linear operator satisfying $\|Ux\| = \|x\|$ for all $x \in H$, or equivalently, satisfying $U^*U = I$ but not necessarily $UU^* = I$). Let $P$ be the orthogonal projection onto $\{\psi \in H \mid U\psi = \psi\} = \ker(I - U)$. Then $\frac{1}{N} \sum_{k=0}^{N-1} U^k$ converges to $P$ in the strong operator topology.
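A finite-dimensional sketch of the theorem (the choice of $U$ is arbitrary): take $U$ unitary on ${\mathbb R}^3$ with a 1-dimensional fixed subspace and a rotation by 1 radian on its orthogonal complement, and watch the Cesàro averages converge to the projection onto the fixed subspace.

```python
import numpy as np

# U is unitary: it fixes the first coordinate axis and rotates the
# orthogonal plane by theta = 1 radian.  The Cesaro averages
# (1/N) sum_{k<N} U^k should converge to the orthogonal projection P
# onto ker(I - U) = the first coordinate axis.
theta = 1.0
U = np.array([[1.0, 0.0,            0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta),  np.cos(theta)]])
P = np.diag([1.0, 0.0, 0.0])   # projection onto the fixed subspace

N = 10_000
avg = np.zeros((3, 3))
Uk = np.eye(3)
for _ in range(N):
    avg += Uk
    Uk = Uk @ U
avg /= N

print(np.linalg.norm(avg - P))   # small, and it tends to 0 as N grows
```

On the rotation block the partial sums $\sum_{k<N} U^k$ stay bounded (a geometric series with ratio $e^{i\theta} \neq 1$), so the averages decay like $1/N$; on the fixed subspace each term is the identity, so the average is exactly $P$ there.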

Relation between von Neumann and Birkhoff: $U$ is the (Koopman) operator $F \mapsto F \circ \phi$ on the space of square-integrable functions, generated by the dynamical system.

Extension to non-measure-preserving systems

The system is non-measure-preserving, but non-singular (i.e. the pull-back measure is equivalent to the original measure, so the Radon-Nikodym derivative exists).

Hurewicz ergodic theorem

Unique ergodicity

means that the system admits a unique invariant probability measure (for which it is then automatically ergodic).

Theorem. Let $T : X \to X$ be a continuous transformation of a compact metric space $X$. Then the following are equivalent:

(i) For every $f \in C(X)$, the sequence $\frac{1}{n} \sum_{j=0}^{n-1} f(T^j x)$ converges uniformly to a constant.

(ii) For every $f \in C(X)$, the sequence $\frac{1}{n} \sum_{j=0}^{n-1} f(T^j x)$ converges pointwise to a constant.

(iii) There exists a $\mu \in M(X,T)$ such that for every $f \in C(X)$ and all $x \in X$, $\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f(y) \, d\mu(y)$.

(iv) $T$ is uniquely ergodic.

Some applications & generalizations?

Perron-Frobenius?

Number theory?

Algebraic geometry?

Riemannian geometry:

Hopf-Green Theorem: $\int_M S_M \, dV \leq 0$, where $M$ is a compact Riemannian manifold without conjugate points and $S_M$ is the normalized scalar curvature of $M$. Equality holds if and only if $M$ is a flat torus.

Oseledets?

Some references

V.I. Arnol’d, V. Avez, “Ergodic problems of classical mechanics”, Benjamin (1968)

Karma Dajani and Sjoerd Dirksen, “A Simple Introduction to Ergodic Theory” (2010)

M. Denker, C. Grillenberger, K. Sigmund, “Ergodic theory on compact spaces”, Springer (1976)

I. Cornfeld, S. Fomin, Ya. G. Sinai, “Ergodic theory” (Appendix 3), Springer-Verlag, New York (1982)

H. Furstenberg, Recurrence in Ergodic Theory and Combinatorial Number Theory.

U. Krengel, “Ergodic theorems”, de Gruyter (1985) pp. 261

G.W. Mackey, “Ergodic theory and its significance for statistical mechanics and probability theory” Adv. in Math., 12 (1974) pp. 178–268

R. Mañé, “Ergodic theory and differentiable dynamics”, Springer (1987)

K. Petersen, “Ergodic theory”, Cambridge Univ. Press (1983)

O. Sarig, Lecture notes on ergodic theory, 2009.

Peter Walters. An introduction to ergodic theory, 1982 (a bit formal?)