Overview
This post analyzes the landmark result of Sergey Bobkov and Mokshay Madiman (Bobkov & Madiman, 2011) on the concentration of information in log-concave distributions. The central finding is that for any log-concave random vector $X$ in $\mathbb{R}^n$ with density $f$, the information functional $\tilde h(X) = -\log f(X)$ concentrates around its mean (the entropy $h(X)$) with sub-exponential tails. This property serves as a geometric substitute for independence, enabling an extension of the Shannon-McMillan-Breiman theorem to non-i.i.d. stochastic processes.
🏷️ Foundational Concepts
To understand the concentration results, we define the “surprise” of a distribution and the geometric constraints that stabilize it.
The Information Functional
For a random vector $X$ in $\mathbb{R}^n$ with density $f$, the Information Content (also called “random entropy”) is:
$$\tilde h(X) = -\log f(X).$$
The density at the observed outcome determines the “surprise” of a realization: the smaller $f(X)$, the larger the surprise. Its average value is the (differential) Shannon entropy, $h(X) = \mathbb{E}\big[-\log f(X)\big]$.
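To make the definition concrete, here is a minimal numerical sketch (my own illustration, not from the paper) for a standard Gaussian vector, where both $\tilde h(X)$ and $h(X)$ are available in closed form:

```python
import numpy as np

# Minimal sketch: information content vs. entropy for X ~ N(0, I_n).
# Here -log f(x) = (n/2) log(2*pi) + |x|^2 / 2 and h(X) = (n/2) log(2*pi*e).
rng = np.random.default_rng(0)
n = 1000
X = rng.standard_normal(n)

info_content = 0.5 * n * np.log(2 * np.pi) + 0.5 * np.sum(X**2)  # realized "surprise"
entropy = 0.5 * n * np.log(2 * np.pi * np.e)                     # its expectation

print(info_content, entropy)  # typically agree to within a few multiples of sqrt(n)
```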
Log-Concave Measures
A distribution is log-concave if its density can be written as $f(x) = e^{-V(x)}$, where $V$ is a convex function. This geometric regularity prevents the mass from spreading too thin or having multiple peaks, forcing the distribution into a “well-behaved” shape.
Prefix Notation
For a stochastic process $(X_1, X_2, \dots)$, we define the prefix (or $n$-dimensional projection) as:
$$X^{(n)} = (X_1, X_2, \dots, X_n).$$
This represents the first $n$ observations of the process, with $f_n$ denoting their joint density.
🏷️ Main Results: Concentration of Information
The primary contribution of the paper is proving that the random variable $\tilde h(X) = -\log f(X)$ stays extremely close to its mean $h(X)$ in high dimensions.
Theorem 1.1: Exponential Tail Concentration
If $X$ has a log-concave density $f$ on $\mathbb{R}^n$, then for all $t \geq 0$:
$$\mathbb{P}\Big(\big|\tilde h(X) - h(X)\big| \geq t\sqrt{n}\Big) \;\leq\; 2\, e^{-ct},$$
where $c > 0$ is a universal constant. The fluctuations of information grow only as $\sqrt{n}$, so the information per coordinate $\tfrac{1}{n}\tilde h(X)$ obeys a law of large numbers.
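A quick Monte Carlo illustration (mine, using i.i.d. Exp(1) coordinates, for which $\tilde h(X) = \sum_i X_i$ and $h(X) = n$) shows the $\sqrt{n}$ scale of the fluctuations:

```python
import numpy as np

# Sketch: for X with i.i.d. Exp(1) coordinates, h~(X) - h(X) = sum(X_i) - n.
# The standard deviation of this difference is exactly sqrt(n), and deviations
# at the t*sqrt(n) scale are exponentially rare, in line with Theorem 1.1.
rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    dev = rng.exponential(size=(10000, n)).sum(axis=1) - n   # h~(X) - h(X)
    print(n, dev.std() / np.sqrt(n),                         # stays near 1
          np.mean(np.abs(dev) >= 3 * np.sqrt(n)))            # small tail probability
```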
Corollary: Universal Variance Bound
For any log-concave random vector $X$ in $\mathbb{R}^n$, the variance of the information content satisfies:
$$\operatorname{Var}\big(\tilde h(X)\big) \;\leq\; C\, n$$
for some universal constant $C$. This confirms that the “random entropy” per coordinate, $\tfrac{1}{n}\tilde h(X)$, stabilizes at the rate $O(1/\sqrt{n})$.
Proof: From Tail Concentration to Variance
The sub-exponential tail directly implies an $O(n)$ variance bound. Let $\xi = \tilde h(X) - h(X)$. Using the tail integration formula:
$$\operatorname{Var}(\xi) = \mathbb{E}\,\xi^2 = \int_0^\infty 2s\,\mathbb{P}\big(|\xi| \geq s\big)\,ds.$$
Setting $s = t\sqrt{n}$ and applying Theorem 1.1 ($\mathbb{P}(|\xi| \geq t\sqrt{n}) \leq 2e^{-ct}$):
$$\mathbb{E}\,\xi^2 \;\leq\; \int_0^\infty 2\,t\sqrt{n}\cdot 2e^{-ct}\,\sqrt{n}\,dt \;=\; 4n \int_0^\infty t\,e^{-ct}\,dt \;=\; \frac{4n}{c^2}.$$
Thus, the “average spread” of information grows at most linearly with dimension.
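As a sanity check (not part of the paper), the last integral can be reproduced symbolically under the assumed tail bound $\mathbb{P}(|\xi| \geq t\sqrt{n}) \leq 2e^{-ct}$:

```python
import sympy as sp

t, n, c = sp.symbols("t n c", positive=True)

# E[xi^2] <= int_0^inf 2*s * P(|xi| >= s) ds with s = t*sqrt(n), ds = sqrt(n)*dt,
# and the assumed tail bound P(|xi| >= t*sqrt(n)) <= 2*exp(-c*t).
bound = sp.integrate(2 * t * sp.sqrt(n) * 2 * sp.exp(-c * t) * sp.sqrt(n), (t, 0, sp.oo))
print(sp.simplify(bound))  # 4*n/c**2
```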
Theorem 1.2: Gaussian Regime
For smaller deviations (roughly $t \lesssim \sqrt{n}$), the decay is even sharper (Gaussian):
$$\mathbb{P}\Big(\big|\tilde h(X) - h(X)\big| \geq t\sqrt{n}\Big) \;\leq\; 2\, e^{-ct^2}.$$
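Empirically the Gaussian regime is easy to see: in the i.i.d. Exp(1) example above, $(\tilde h(X) - h(X))/\sqrt{n}$ is approximately standard normal, so moderate deviations decay like $e^{-t^2/2}$ (my illustration; the theorem's constant is not claimed here):

```python
import numpy as np

# Sketch: empirical tails of (h~(X) - h(X)) / sqrt(n) for i.i.d. Exp(1) coordinates,
# compared against a Gaussian-shaped reference 2*exp(-t^2/2).
rng = np.random.default_rng(2)
n, N = 200, 50000
dev = (rng.exponential(size=(N, n)).sum(axis=1) - n) / np.sqrt(n)
for t in (0.5, 1.0, 1.5, 2.0):
    print(t, np.mean(np.abs(dev) >= t), 2 * np.exp(-t**2 / 2))
```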
🏷️ Main Proof Strategy
The proof is an elegant multi-stage reduction from high dimensions down to 1D geometry, leveraging the multiscale preservation of log-concavity.
The 1D Baseline: Geometric Profile Stability
In 1D, the information is stable because of the concavity of the profile function $I(p) = f\big(F^{-1}(p)\big)$, where $F$ is the CDF and $F^{-1}$ its inverse. For log-concave $f$, $I$ is concave on $(0,1)$. Using the identity, valid for any function $u$:
$$\mathbb{E}\,u\big(f(X)\big) = \int_0^1 u\big(I(p)\big)\,dp.$$
Choosing $u(y) = y^{-\lambda}$ (so that $u(f(X)) = e^{\lambda \tilde h(X)}$) and normalizing $\|f\|_\infty = 1$ (harmless, since $\tilde h(X) - h(X)$ is invariant under rescaling of $X$), concavity forces $I(p) \geq \min\!\big(\tfrac{p}{p_0},\, \tfrac{1-p}{1-p_0}\big)$, where $p_0$ is the point at which $I$ attains its maximum. Since $h(X) \geq 0$ under this normalization, this allows bounding the 1D MGF:
$$\mathbb{E}\, e^{\lambda(\tilde h(X) - h(X))} \;\leq\; \mathbb{E}\, f(X)^{-\lambda} \;=\; \int_0^1 I(p)^{-\lambda}\,dp \;\leq\; \frac{1}{1-\lambda}, \qquad 0 \leq \lambda < 1.$$
For $\lambda = 3/4$, the bound is 4, ensuring a uniform sub-exponential moment bound for all 1D log-concave distributions.
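The bound $\mathbb{E}\, f(X)^{-\lambda} \leq 1/(1-\lambda)$ (with $\|f\|_\infty = 1$) can be checked numerically; this is my own verification, with the one-sided exponential as the extremal case:

```python
import numpy as np
from scipy.integrate import quad

# Sketch: E[f(X)^(-lam)] = int f(x)^(1-lam) dx for densities normalized to peak 1.
lam = 0.75

# Exponential, f(x) = exp(-x) on (0, inf): attains the bound 1/(1-lam) exactly.
exp_val, _ = quad(lambda x: np.exp(-x) ** (1 - lam), 0, np.inf)

# Gaussian with peak 1, f(x) = exp(-pi*x^2): stays strictly below the bound.
gauss_val, _ = quad(lambda x: np.exp(-np.pi * x**2) ** (1 - lam), -np.inf, np.inf)

print(exp_val, gauss_val, "bound:", 1 / (1 - lam))   # ~4.0, ~2.0, bound: 4.0
```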
Reverse Lyapunov Inequalities: The Moment Engine
Standard Lyapunov inequalities state that $t \mapsto \log \mathbb{E}\,|Z|^t$ is convex for any random variable $Z$. Bobkov and Madiman use a deep result by Borell (Borell, 1973) to show that for a log-concave random variable $Z \geq 0$, the normalized moment function
$$t \;\mapsto\; \frac{\mathbb{E}\, Z^t}{\Gamma(t+1)}$$
is log-concave on $t \geq 0$. This is derived by considering volumes of convex bodies in $\mathbb{R}^N$ and taking the limit $N \to \infty$. This “reversed” stability ensures that moments of log-concave variables are tightly constrained, which is the key to bounding the fluctuations of $\tilde h(X)$.
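Here is a small numerical check (my own, not the paper's argument) that the normalized moment function is indeed log-concave for two simple log-concave laws; the exponential is the boundary case where it is constant:

```python
import numpy as np
from scipy.special import gammaln

# Sketch: t -> log( E[Z^t] / Gamma(t+1) ) should be concave for log-concave Z >= 0.
def normalized_log_moment(log_moment, t):
    return log_moment(t) - gammaln(t + 1)

log_moments = {
    "uniform[0,1]": lambda t: -np.log(t + 1),    # E[Z^t] = 1/(t+1)
    "exponential":  lambda t: gammaln(t + 1),    # E[Z^t] = Gamma(t+1)
}

ts = np.linspace(0.1, 10.0, 200)
for name, lm in log_moments.items():
    g = np.array([normalized_log_moment(lm, t) for t in ts])
    # discrete second derivative should be <= 0 (up to rounding) for concavity
    print(name, "max second difference:", np.diff(g, 2).max())
```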
Log-Concavity of Order and the Trigamma Bound
A density of the form $\ell(t)^{\,n-1}\varphi(t)$, for log-concave $\varphi$ and affine positive $\ell$, has log-concavity of order $n$; after an affine change of variables one may take $\ell(t) = t$. The authors prove (Prop 4.1) that for such a random variable $\xi$, the concavity of $t \mapsto \log \mathbb{E}\,\xi^{\,t} - \log\Gamma(n+t)$, supplied by the reverse Lyapunov inequality, implies:
$$\operatorname{Var}\big(\log \xi\big) \;\leq\; \psi'(n),$$
where $\psi'$ is the trigamma function. Since $\psi'(n) \approx 1/n$ for large $n$, higher-order log-concavity forces the random information to become extremely stable.
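The Gamma($n$) distribution, whose density $x^{n-1}e^{-x}/\Gamma(n)$ is log-concave of order $n$, satisfies $\operatorname{Var}(\log \xi) = \psi'(n)$ exactly, consistent with the bound above. A quick check (mine):

```python
import numpy as np
from scipy.special import polygamma

# Sketch: for xi ~ Gamma(n, 1), Var(log xi) equals the trigamma value psi'(n) ~ 1/n.
rng = np.random.default_rng(3)
for n in (2, 10, 100, 1000):
    xi = rng.gamma(shape=n, size=200000)
    print(n, np.var(np.log(xi)), float(polygamma(1, n)), 1.0 / n)
```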
The Localization Engine: Reduction to Weighted Needles
Using the KLS Localization Lemma (Kannan, Lovász & Simonovits, 1995), any integral inequality in $\mathbb{R}^n$ is reduced to 1D “weighted needles” with density of the form $\pi(t) \propto \ell(t)^{\,n-1} e^{-V(t)}$, where $\ell$ is affine and positive on the needle and $V$ is the convex potential of $f = e^{-V}$ restricted to the segment. Since $V(t) = -\log \pi(t) + (n-1)\log \ell(t) + \mathrm{const}$ along the needle, the fluctuation is decomposed as:
$$V(\xi) - \mathbb{E}\,V(\xi) \;=\; \big[\tilde h_\pi(\xi) - h(\pi)\big] \;+\; (n-1)\big[\log \ell(\xi) - \mathbb{E}\,\log \ell(\xi)\big], \qquad \xi \sim \pi,$$
where $\tilde h_\pi(\xi) = -\log \pi(\xi)$ is the needle's own information content and $h(\pi)$ its mean.
- The first term is the 1D log-concave baseline (Step 1), contributing dimension-free $O(1)$ fluctuations.
- The second term involves the affine function $\ell$ weighted by $(n-1)$; since the needle density is log-concave of order $n$ (Step 3), $\operatorname{Var}(\log \ell(\xi)) \leq \psi'(n)$, so its fluctuations are of order $(n-1)\sqrt{\psi'(n)} \approx \sqrt{n}$. Combining these via the convexity of the MGF lifts the needle bounds to the global tail concentration (see the numerical sketch below).
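To see the two scales side by side, here is an illustrative decomposition on a single needle of my own choosing ($\ell(t) = t$, $V(t) = t$, so the needle density is a Gamma($n$) law); this illustrates the mechanism, not the paper's proof:

```python
import numpy as np
from scipy.special import gammaln

# Needle pi(t) ∝ t^(n-1) * exp(-t) (a Gamma(n) law). Decompose the potential
# fluctuation V(xi) - E V(xi), with V(t) = t, as
#   [ -log pi(xi) - h(pi) ]  +  (n-1) * [ log xi - E log xi ].
rng = np.random.default_rng(4)
n = 1000
xi = rng.gamma(shape=n, size=200000)

log_pi = (n - 1) * np.log(xi) - xi - gammaln(n)              # log needle density
baseline = (-log_pi) - np.mean(-log_pi)                      # 1D information term
weighted = (n - 1) * (np.log(xi) - np.mean(np.log(xi)))      # affine-weight term

print("std of baseline term:  ", baseline.std())   # O(1)
print("std of weighted term:  ", weighted.std())   # ~ sqrt(n)
print("std of V(xi) - E V(xi):", xi.std())         # ~ sqrt(n); sum of the two terms
```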
🏷️ Application: The Strong Ergodic Theorem (SMB)
The most profound impact of this concentration is the extension of the Shannon-McMillan-Breiman (SMB) theorem.
Corollary 1.3: Extension of SMB
Let $(X_k)_{k \geq 1}$ be a discrete-time stochastic process whose prefixes $X^{(n)} = (X_1, \dots, X_n)$ have log-concave joint densities $f_n$. If the entropy rate $\bar h = \lim_{n \to \infty} \tfrac{1}{n}\, h\big(X^{(n)}\big)$ exists, then:
$$-\frac{1}{n}\,\log f_n(X_1, \dots, X_n) \;\xrightarrow[n \to \infty]{}\; \bar h.$$
The convergence holds in probability; moreover, Theorem 1.1 applied to each prefix gives the summable tail bound $\mathbb{P}\big(|\tilde h(X^{(n)}) - h(X^{(n)})| \geq \varepsilon n\big) \leq 2e^{-c\varepsilon\sqrt{n}}$, so it holds almost surely by Borel-Cantelli.
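A minimal simulation (my example, assuming a stationary Gaussian AR(1) process, which is non-i.i.d. but has log-concave joint marginals) shows the per-coordinate information settling at the entropy rate $\bar h = \tfrac{1}{2}\log(2\pi e \sigma^2)$:

```python
import numpy as np

# Sketch: X_k = a*X_{k-1} + Z_k with Z_k ~ N(0, sigma^2), started at stationarity.
# -(1/n) log f_n(X_1..X_n) should approach the entropy rate 0.5*log(2*pi*e*sigma^2).
rng = np.random.default_rng(5)
a, sigma = 0.8, 1.0

def info_per_coordinate(n):
    x = np.empty(n)
    x[0] = rng.normal(scale=sigma / np.sqrt(1 - a**2))          # stationary start
    for k in range(1, n):
        x[k] = a * x[k - 1] + rng.normal(scale=sigma)
    # joint log-density factorizes through the Markov structure
    log_f = (-0.5 * np.log(2 * np.pi * sigma**2 / (1 - a**2))
             - x[0]**2 * (1 - a**2) / (2 * sigma**2))
    resid = x[1:] - a * x[:-1]
    log_f += np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))
    return -log_f / n                                            # (1/n) * h~(X^(n))

h_bar = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
for n in (10, 100, 1000, 10000):
    print(n, round(info_per_coordinate(n), 4), "->", round(h_bar, 4))
```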
Why this extension is transformative
- Geometry as a Substitute for Statistics: It replaces the traditional requirements of stationarity and mixing with convexity (log-concavity) of the joint distributions.
- Non-Asymptotic Utility: The concentration provides a universal rate of convergence that mixing-based theorems often lack.
- High-Dimensional Stability: It proves that any system governed by a convex potential forms a “thin shell” of information in high dimensions, making its long-term average surprise perfectly predictable.
🏷️ Related Insights
- Thin Shell Hypothesis: Information concentration is the functional dual to mass concentration. In isotropic convex bodies, mass concentrates near a sphere of radius $\sqrt{n}$; here, surprise concentrates near the entropy $h(X)$.
- Isotropic Position: When $X$ is isotropic (zero mean, identity covariance), the theorem implies the density is roughly constant, close to $e^{-h(X)}$, on the $\sqrt{n}$-sphere where the mass concentrates.
🏷️ See Also
- on sum-product in finite fields via entropy --- Entropy as a measure of structural complexity in discrete settings.
- on eigenvalue estimate of kernel --- The spectral perspective of concentration of measure in log-concave spaces.