Overview
This post analyzes the landmark result of Sergey Bobkov and Mokshay Madiman (Bobkov & Madiman, 2011) on the concentration of information in log-concave distributions. The central finding is that for any log-concave random vector $X$ in $\mathbb{R}^n$ with density $f$, the information functional $\tilde h(X) = -\log f(X)$ concentrates around its mean (the entropy $h(X)$) with sub-exponential tails. This property serves as a geometric substitute for independence, enabling an extension of the Shannon-McMillan-Breiman theorem to non-i.i.d. stochastic processes.
🏷️ Foundational Concepts
To understand the concentration results, we define the “surprise” of a distribution and the geometric constraints that stabilize it.
The Information Functional
For a random vector $X$ in $\mathbb{R}^n$ with density $f$, the Information Content (also called “random entropy”) is:
$$\tilde h(X) = -\log f(X).$$
The density at the observed outcome determines the “surprise” of a realization: the smaller $f(X)$, the larger the surprise. Its average value is the (differential) Shannon entropy, $h(X) = \mathbb{E}\big[-\log f(X)\big]$.
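To make the definition concrete, here is a minimal numerical sketch (my own illustration, not from the paper) for a standard Gaussian vector, where both $\tilde h(X)$ and $h(X)$ are available in closed form:

```python
import numpy as np

# Minimal sketch: information content vs. entropy for X ~ N(0, I_n).
# Here -log f(x) = (n/2) log(2*pi) + |x|^2 / 2 and h(X) = (n/2) log(2*pi*e).
rng = np.random.default_rng(0)
n = 1000
X = rng.standard_normal(n)

info_content = 0.5 * n * np.log(2 * np.pi) + 0.5 * np.sum(X**2)  # realized "surprise"
entropy = 0.5 * n * np.log(2 * np.pi * np.e)                     # its expectation

print(info_content, entropy)  # typically agree to within a few multiples of sqrt(n)
```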
Log-Concave Measures
A distribution is log-concave if its density can be written as $f(x) = e^{-V(x)}$, where $V$ is a convex function. This geometric regularity prevents the mass from spreading too thin or having multiple peaks, forcing the distribution into a “well-behaved” shape.
Prefix Notation
For a stochastic process $(X_1, X_2, \dots)$, we define the prefix (or $n$-dimensional projection) as:
$$X^{(n)} = (X_1, X_2, \dots, X_n).$$
This represents the first $n$ observations of the process, with $f_n$ denoting their joint density.
🏷️ Main Results: Concentration of Information
The primary contribution of the paper is proving that the random variable $\tilde h(X) = -\log f(X)$ stays extremely close to its mean $h(X)$ in high dimensions.
Theorem 1.1: Exponential Tail Concentration
If $X$ has a log-concave density $f$ on $\mathbb{R}^n$, then for all $t \geq 0$:
$$\mathbb{P}\Big(\big|\tilde h(X) - h(X)\big| \geq t\sqrt{n}\Big) \;\leq\; 2\, e^{-ct},$$
where $c > 0$ is a universal constant. The fluctuations of information grow only as $\sqrt{n}$, so the information per coordinate $\tfrac{1}{n}\tilde h(X)$ obeys a law of large numbers.
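A quick Monte Carlo illustration (mine, using i.i.d. Exp(1) coordinates, for which $\tilde h(X) = \sum_i X_i$ and $h(X) = n$) shows the $\sqrt{n}$ scale of the fluctuations:

```python
import numpy as np

# Sketch: for X with i.i.d. Exp(1) coordinates, h~(X) - h(X) = sum(X_i) - n.
# The standard deviation of this difference is exactly sqrt(n), and deviations
# at the t*sqrt(n) scale are exponentially rare, in line with Theorem 1.1.
rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    dev = rng.exponential(size=(10000, n)).sum(axis=1) - n   # h~(X) - h(X)
    print(n, dev.std() / np.sqrt(n),                         # stays near 1
          np.mean(np.abs(dev) >= 3 * np.sqrt(n)))            # small tail probability
```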
Corollary: Universal Variance Bound
For any log-concave random vector $X$ in $\mathbb{R}^n$, the variance of the information content satisfies:
$$\operatorname{Var}\big(\tilde h(X)\big) \;\leq\; C\, n$$
for some universal constant $C$. This confirms that the “random entropy” per coordinate, $\tfrac{1}{n}\tilde h(X)$, stabilizes at the rate $O(1/\sqrt{n})$.
Proof: From Tail Concentration to Variance
The sub-exponential tail directly implies an $O(n)$ variance bound. Let $\xi = \tilde h(X) - h(X)$. Using the tail integration formula:
$$\operatorname{Var}(\xi) = \mathbb{E}\,\xi^2 = \int_0^\infty 2s\,\mathbb{P}\big(|\xi| \geq s\big)\,ds.$$
Setting $s = t\sqrt{n}$ and applying Theorem 1.1 ($\mathbb{P}(|\xi| \geq t\sqrt{n}) \leq 2e^{-ct}$):
$$\mathbb{E}\,\xi^2 \;\leq\; \int_0^\infty 2\,t\sqrt{n}\cdot 2e^{-ct}\,\sqrt{n}\,dt \;=\; 4n \int_0^\infty t\,e^{-ct}\,dt \;=\; \frac{4n}{c^2}.$$
Thus, the “average spread” of information grows at most linearly with dimension.
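As a sanity check (not part of the paper), the last integral can be reproduced symbolically under the assumed tail bound $\mathbb{P}(|\xi| \geq t\sqrt{n}) \leq 2e^{-ct}$:

```python
import sympy as sp

t, n, c = sp.symbols("t n c", positive=True)

# E[xi^2] <= int_0^inf 2*s * P(|xi| >= s) ds with s = t*sqrt(n), ds = sqrt(n)*dt,
# and the assumed tail bound P(|xi| >= t*sqrt(n)) <= 2*exp(-c*t).
bound = sp.integrate(2 * t * sp.sqrt(n) * 2 * sp.exp(-c * t) * sp.sqrt(n), (t, 0, sp.oo))
print(sp.simplify(bound))  # 4*n/c**2
```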
Theorem 1.2: Gaussian Regime
For smaller deviations (roughly $t \lesssim \sqrt{n}$), the decay is even sharper (Gaussian):
$$\mathbb{P}\Big(\big|\tilde h(X) - h(X)\big| \geq t\sqrt{n}\Big) \;\leq\; 2\, e^{-ct^2}.$$
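Empirically the Gaussian regime is easy to see: in the i.i.d. Exp(1) example above, $(\tilde h(X) - h(X))/\sqrt{n}$ is approximately standard normal, so moderate deviations decay like $e^{-t^2/2}$ (my illustration; the theorem's constant is not claimed here):

```python
import numpy as np

# Sketch: empirical tails of (h~(X) - h(X)) / sqrt(n) for i.i.d. Exp(1) coordinates,
# compared against a Gaussian-shaped reference 2*exp(-t^2/2).
rng = np.random.default_rng(2)
n, N = 200, 50000
dev = (rng.exponential(size=(N, n)).sum(axis=1) - n) / np.sqrt(n)
for t in (0.5, 1.0, 1.5, 2.0):
    print(t, np.mean(np.abs(dev) >= t), 2 * np.exp(-t**2 / 2))
```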
🏷️ Main Proof Strategy
The proof is an elegant multi-stage reduction from high dimensions down to 1D geometry, leveraging the multiscale preservation of log-concavity.
The 1D Baseline: Geometric Profile Stability
In 1D, the information is stable because of the concavity of the profile function $I(p) = f\big(F^{-1}(p)\big)$, where $F$ is the CDF and $F^{-1}$ its inverse. For log-concave $f$, $I$ is concave on $(0,1)$. Using the identity, valid for any function $u$:
$$\mathbb{E}\,u\big(f(X)\big) = \int_0^1 u\big(I(p)\big)\,dp.$$
Choosing $u(y) = y^{-\lambda}$ (so that $u(f(X)) = e^{\lambda \tilde h(X)}$) and normalizing $\|f\|_\infty = 1$ (harmless, since $\tilde h(X) - h(X)$ is invariant under rescaling of $X$), concavity forces $I(p) \geq \min\!\big(\tfrac{p}{p_0},\, \tfrac{1-p}{1-p_0}\big)$, where $p_0$ is the point at which $I$ attains its maximum. Since $h(X) \geq 0$ under this normalization, this allows bounding the 1D MGF:
$$\mathbb{E}\, e^{\lambda(\tilde h(X) - h(X))} \;\leq\; \mathbb{E}\, f(X)^{-\lambda} \;=\; \int_0^1 I(p)^{-\lambda}\,dp \;\leq\; \frac{1}{1-\lambda}, \qquad 0 \leq \lambda < 1.$$
For $\lambda = 3/4$, the bound is 4, ensuring a uniform sub-exponential moment bound for all 1D log-concave distributions.
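The bound $\mathbb{E}\, f(X)^{-\lambda} \leq 1/(1-\lambda)$ (with $\|f\|_\infty = 1$) can be checked numerically; this is my own verification, with the one-sided exponential as the extremal case:

```python
import numpy as np
from scipy.integrate import quad

# Sketch: E[f(X)^(-lam)] = int f(x)^(1-lam) dx for densities normalized to peak 1.
lam = 0.75

# Exponential, f(x) = exp(-x) on (0, inf): attains the bound 1/(1-lam) exactly.
exp_val, _ = quad(lambda x: np.exp(-x) ** (1 - lam), 0, np.inf)

# Gaussian with peak 1, f(x) = exp(-pi*x^2): stays strictly below the bound.
gauss_val, _ = quad(lambda x: np.exp(-np.pi * x**2) ** (1 - lam), -np.inf, np.inf)

print(exp_val, gauss_val, "bound:", 1 / (1 - lam))   # ~4.0, ~2.0, bound: 4.0
```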
Reverse Lyapunov Inequalities: The Moment Engine
Standard Lyapunov inequalities state that $t \mapsto \log \mathbb{E}\,|Z|^t$ is convex for any random variable $Z$. Bobkov and Madiman use a deep result by Borell (Borell, 1973) to show that for a log-concave random variable $Z \geq 0$, the normalized moment function
$$t \;\mapsto\; \frac{\mathbb{E}\, Z^t}{\Gamma(t+1)}$$
is log-concave on $t \geq 0$. This is derived by considering volumes of convex bodies in $\mathbb{R}^N$ and taking the limit $N \to \infty$. This “reversed” stability ensures that moments of log-concave variables are tightly constrained, which is the key to bounding the fluctuations of $\tilde h(X)$.
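Here is a small numerical check (my own, not the paper's argument) that the normalized moment function is indeed log-concave for two simple log-concave laws; the exponential is the boundary case where it is constant:

```python
import numpy as np
from scipy.special import gammaln

# Sketch: t -> log( E[Z^t] / Gamma(t+1) ) should be concave for log-concave Z >= 0.
def normalized_log_moment(log_moment, t):
    return log_moment(t) - gammaln(t + 1)

log_moments = {
    "uniform[0,1]": lambda t: -np.log(t + 1),    # E[Z^t] = 1/(t+1)
    "exponential":  lambda t: gammaln(t + 1),    # E[Z^t] = Gamma(t+1)
}

ts = np.linspace(0.1, 10.0, 200)
for name, lm in log_moments.items():
    g = np.array([normalized_log_moment(lm, t) for t in ts])
    # discrete second derivative should be <= 0 (up to rounding) for concavity
    print(name, "max second difference:", np.diff(g, 2).max())
```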
Log-Concavity of Order and the Trigamma Bound
A density of the form $\ell(t)^{\,n-1}\varphi(t)$, for log-concave $\varphi$ and affine positive $\ell$, has log-concavity of order $n$; after an affine change of variables one may take $\ell(t) = t$. The authors prove (Prop 4.1) that for such a random variable $\xi$, the concavity of $t \mapsto \log \mathbb{E}\,\xi^{\,t} - \log\Gamma(n+t)$, supplied by the reverse Lyapunov inequality, implies:
$$\operatorname{Var}\big(\log \xi\big) \;\leq\; \psi'(n),$$
where $\psi'$ is the trigamma function. Since $\psi'(n) \approx 1/n$ for large $n$, higher-order log-concavity forces the random information to become extremely stable.
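The Gamma($n$) distribution, whose density $x^{n-1}e^{-x}/\Gamma(n)$ is log-concave of order $n$, satisfies $\operatorname{Var}(\log \xi) = \psi'(n)$ exactly, consistent with the bound above. A quick check (mine):

```python
import numpy as np
from scipy.special import polygamma

# Sketch: for xi ~ Gamma(n, 1), Var(log xi) equals the trigamma value psi'(n) ~ 1/n.
rng = np.random.default_rng(3)
for n in (2, 10, 100, 1000):
    xi = rng.gamma(shape=n, size=200000)
    print(n, np.var(np.log(xi)), float(polygamma(1, n)), 1.0 / n)
```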
The Localization Engine: Reduction to Weighted Needles
Using the KLS Localization Lemma (Kannan, Lovász & Simonovits, 1995), any integral inequality in $\mathbb{R}^n$ is reduced to 1D “weighted needles” with density of the form $\pi(t) \propto \ell(t)^{\,n-1} e^{-V(t)}$, where $\ell$ is affine and positive on the needle and $V$ is the convex potential of $f = e^{-V}$ restricted to the segment. Since $V(t) = -\log \pi(t) + (n-1)\log \ell(t) + \mathrm{const}$ along the needle, the fluctuation is decomposed as:
$$V(\xi) - \mathbb{E}\,V(\xi) \;=\; \big[\tilde h_\pi(\xi) - h(\pi)\big] \;+\; (n-1)\big[\log \ell(\xi) - \mathbb{E}\,\log \ell(\xi)\big], \qquad \xi \sim \pi,$$
where $\tilde h_\pi(\xi) = -\log \pi(\xi)$ is the needle's own information content and $h(\pi)$ its mean.
- The first term is the 1D log-concave baseline (Step 1), contributing dimension-free $O(1)$ fluctuations.
- The second term involves the affine function $\ell$ weighted by $(n-1)$; since the needle density is log-concave of order $n$ (Step 3), $\operatorname{Var}(\log \ell(\xi)) \leq \psi'(n)$, so its fluctuations are of order $(n-1)\sqrt{\psi'(n)} \approx \sqrt{n}$. Combining these via the convexity of the MGF lifts the needle bounds to the global tail concentration (see the numerical sketch below).
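To see the two scales side by side, here is an illustrative decomposition on a single needle of my own choosing ($\ell(t) = t$, $V(t) = t$, so the needle density is a Gamma($n$) law); this illustrates the mechanism, not the paper's proof:

```python
import numpy as np
from scipy.special import gammaln

# Needle pi(t) ∝ t^(n-1) * exp(-t) (a Gamma(n) law). Decompose the potential
# fluctuation V(xi) - E V(xi), with V(t) = t, as
#   [ -log pi(xi) - h(pi) ]  +  (n-1) * [ log xi - E log xi ].
rng = np.random.default_rng(4)
n = 1000
xi = rng.gamma(shape=n, size=200000)

log_pi = (n - 1) * np.log(xi) - xi - gammaln(n)              # log needle density
baseline = (-log_pi) - np.mean(-log_pi)                      # 1D information term
weighted = (n - 1) * (np.log(xi) - np.mean(np.log(xi)))      # affine-weight term

print("std of baseline term:  ", baseline.std())   # O(1)
print("std of weighted term:  ", weighted.std())   # ~ sqrt(n)
print("std of V(xi) - E V(xi):", xi.std())         # ~ sqrt(n); sum of the two terms
```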
🏷️ Application: The Strong Ergodic Theorem (SMB)
The most profound impact of this concentration is the extension of the Shannon-McMillan-Breiman (SMB) theorem.
Corollary 1.3: Extension of SMB
Let $(X_k)_{k \geq 1}$ be a discrete-time stochastic process whose prefixes $X^{(n)} = (X_1, \dots, X_n)$ have log-concave joint densities $f_n$. If the entropy rate $\bar h = \lim_{n \to \infty} \tfrac{1}{n}\, h\big(X^{(n)}\big)$ exists, then:
$$-\frac{1}{n}\,\log f_n(X_1, \dots, X_n) \;\xrightarrow[n \to \infty]{}\; \bar h.$$
The convergence holds in probability; moreover, Theorem 1.1 applied to each prefix gives the summable tail bound $\mathbb{P}\big(|\tilde h(X^{(n)}) - h(X^{(n)})| \geq \varepsilon n\big) \leq 2e^{-c\varepsilon\sqrt{n}}$, so it holds almost surely by Borel-Cantelli.
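A minimal simulation (my example, assuming a stationary Gaussian AR(1) process, which is non-i.i.d. but has log-concave joint marginals) shows the per-coordinate information settling at the entropy rate $\bar h = \tfrac{1}{2}\log(2\pi e \sigma^2)$:

```python
import numpy as np

# Sketch: X_k = a*X_{k-1} + Z_k with Z_k ~ N(0, sigma^2), started at stationarity.
# -(1/n) log f_n(X_1..X_n) should approach the entropy rate 0.5*log(2*pi*e*sigma^2).
rng = np.random.default_rng(5)
a, sigma = 0.8, 1.0

def info_per_coordinate(n):
    x = np.empty(n)
    x[0] = rng.normal(scale=sigma / np.sqrt(1 - a**2))          # stationary start
    for k in range(1, n):
        x[k] = a * x[k - 1] + rng.normal(scale=sigma)
    # joint log-density factorizes through the Markov structure
    log_f = (-0.5 * np.log(2 * np.pi * sigma**2 / (1 - a**2))
             - x[0]**2 * (1 - a**2) / (2 * sigma**2))
    resid = x[1:] - a * x[:-1]
    log_f += np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))
    return -log_f / n                                            # (1/n) * h~(X^(n))

h_bar = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
for n in (10, 100, 1000, 10000):
    print(n, round(info_per_coordinate(n), 4), "->", round(h_bar, 4))
```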
Why this extension is transformative
- Geometry as a Substitute for Statistics: It replaces the traditional requirements of stationarity and mixing with convexity (log-concavity) of the joint distributions.
- Non-Asymptotic Utility: The concentration provides a universal rate of convergence that mixing-based theorems often lack.
- High-Dimensional Stability: It proves that any system governed by a convex potential forms a “thin shell” of information in high dimensions, making its long-term average surprise perfectly predictable.
🏷️ Related Insights
- Thin Shell Hypothesis: Information concentration is the functional dual to mass concentration. In isotropic convex bodies, mass concentrates near a sphere of radius $\sqrt{n}$; here, surprise concentrates near the entropy $h(X)$.
- Isotropic Position: When $X$ is isotropic (zero mean, identity covariance), the theorem implies the density is roughly constant, close to $e^{-h(X)}$, on the $\sqrt{n}$-sphere where the mass concentrates.
🏷️ See Also
- on sum-product in finite fields via entropy --- Entropy as a measure of structural complexity in discrete settings.
- on eigenvalue estimate of kernel --- The spectral perspective of concentration of measure in log-concave spaces.