Overview
This post explores Dudley’s Theorem, a cornerstone of the theory of stochastic processes. It provides a foundational upper bound on the expected supremum of a sub-Gaussian process by quantifying the “complexity” of the index set using Metric Entropy. The theorem represents the birth of the Chaining method, which bridges geometry and probability to control the global behavior of random fluctuations.
🏷️ Complexity and Metric Entropy
To bound a stochastic process $(X_t)_{t \in T}$, we must measure how “large” the index set $T$ is relative to the process’s fluctuations.
Definitions
- Canonical Metric: For a process $(X_t)_{t \in T}$, we define a natural metric $d(s, t) = \|X_s - X_t\|_{\psi_2}$ (the sub-Gaussian norm of the increment).
- Covering Number ($N(T, d, \varepsilon)$): The minimum number of balls of radius $\varepsilon$ required to cover $T$.
- Metric Entropy: The quantity $\log N(T, d, \varepsilon)$. It measures the “bits of information” needed to resolve the set $T$ at scale $\varepsilon$.
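Covering numbers are rarely computed in closed form, but a greedy net construction gives a constructive upper bound on $N(T, d, \varepsilon)$. The sketch below is illustrative only: the choice of $T$ (a random sample of the unit square), the Euclidean metric, and the scales are all arbitrary.

```python
import numpy as np

def greedy_cover(points, eps):
    """Greedy eps-net: repeatedly pick an uncovered point as a new center
    until every point lies within eps of some center.  The number of
    centers upper-bounds the covering number N(T, d, eps)."""
    centers = []
    uncovered = np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.argmax(uncovered)]          # first uncovered point
        centers.append(c)
        uncovered &= np.linalg.norm(points - c, axis=1) > eps
    return np.array(centers)

# T = a fine random sample of the unit square, d = Euclidean metric.
rng = np.random.default_rng(0)
T = rng.random((4000, 2))
for eps in [0.5, 0.25, 0.125]:
    N = len(greedy_cover(T, eps))
    print(f"eps={eps:5.3f}  N ≈ {N:4d}  log N ≈ {np.log(N):.2f}")
```

As $\varepsilon$ halves, $N$ grows roughly fourfold, consistent with $\log N(\varepsilon) \approx 2 \log(1/\varepsilon)$ for a two-dimensional set.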
🏷️ The Chaining Mechanism
Dudley’s insight (Dudley, 1967) was that a process can be decomposed into a series of approximations at different scales. Instead of looking at a single point, we “chain” together a sequence of increasingly refined points that converge to any $t \in T$.
- Multi-Scale View: At each level $k$, we approximate $T$ with a finite net $T_k$ of size $|T_k| = N(T, d, 2^{-k})$.
- Path Decomposition: Any $X_t$ can be written as a telescoping sum:
$$X_t = X_{\pi_0(t)} + \sum_{k \ge 1} \left( X_{\pi_k(t)} - X_{\pi_{k-1}(t)} \right),$$
where $\pi_k(t)$ is the closest point to $t$ in the $k$-th net $T_k$.
- Control: By bounding the fluctuations of the increments and using a union bound over the nets, we control the total path.
🏷️ Main Theorem and Proof
Dudley’s Entropy Bound
Let $(X_t)_{t \in T}$ be a centered sub-Gaussian process with respect to a metric $d$. Then:
$$\mathbb{E} \sup_{t \in T} X_t \le C \int_0^{\operatorname{diam}(T)} \sqrt{\log N(T, d, \varepsilon)} \, d\varepsilon,$$
where $C > 0$ is a universal constant.
Proof: The Convergence of Chains
The proof formalizes the multi-scale decomposition using a Union Bound over each scale of the chain.
Incremental Error: At scale $2^{-k}$, the distance between $\pi_k(t)$ and $\pi_{k-1}(t)$ is at most $2^{-k} + 2^{-(k-1)} = 3 \cdot 2^{-k}$ (triangle inequality through $t$). For a sub-Gaussian process, the probability of a large increment is $\mathbb{P}\big( |X_{\pi_k(t)} - X_{\pi_{k-1}(t)}| > u \big) \le 2 \exp\big( -c\, u^2 / 2^{-2k} \big)$.
Union Bound: The number of possible pairs $(\pi_{k-1}(t), \pi_k(t))$ is bounded by the size of the nets, $|T_{k-1}| \cdot |T_k| \le N(T, d, 2^{-k})^2$. Taking a union bound, the maximum increment at level $k$ satisfies:
$$\mathbb{E} \max_{t \in T} \big| X_{\pi_k(t)} - X_{\pi_{k-1}(t)} \big| \lesssim 2^{-k} \sqrt{\log N(T, d, 2^{-k})}.$$
Integration: Summing over all levels $k$:
$$\mathbb{E} \sup_{t \in T} X_t \lesssim \sum_{k} 2^{-k} \sqrt{\log N(T, d, 2^{-k})}.$$
This sum is a Riemann sum approximation of the integral $\int_0^{\operatorname{diam}(T)} \sqrt{\log N(T, d, \varepsilon)} \, d\varepsilon$. The convergence of the integral ensures that the process is almost surely bounded and continuous.
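To see the sum-vs-integral comparison concretely, assume the model entropy $N(\varepsilon) = (1/\varepsilon)^d$ (as for a bounded set in $\mathbb{R}^d$; taking $d = 2$ here is an arbitrary illustrative choice). The dyadic sum from the proof and the entropy integral then agree up to a constant factor:

```python
import numpy as np

# Model entropy: N(eps) = (1/eps)^d, so sqrt(log N(eps)) = sqrt(d*log(1/eps)).
d = 2.0                                        # illustrative dimension

# Dyadic sum from the chaining proof: sum_k 2^-k * sqrt(log N(2^-k)).
ks = np.arange(1, 60)
dyadic_sum = np.sum(2.0**-ks * np.sqrt(d * ks * np.log(2)))

# Dudley integral over (0, 1] via a fine midpoint rule.
M = 1_000_000
eps = (np.arange(M) + 0.5) / M                 # midpoints in (0, 1)
integral = np.mean(np.sqrt(d * np.log(1.0 / eps)))

print(f"dyadic sum ≈ {dyadic_sum:.3f},  entropy integral ≈ {integral:.3f}")
```

The two quantities differ only by a bounded factor, which is exactly why the discrete chain sum can be replaced by the integral in the statement of the theorem.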
Lemma: Expectation of the Maximum
The transition from the Union Bound to the expectation bound is a fundamental result in high-dimensional probability.
Derivation: Let $X_1, \dots, X_N$ be centered sub-Gaussian variables with parameter $\sigma^2$.
- Jensen’s Inequality: For any $\lambda > 0$, by the convexity of the exponential function:
$$\exp\left( \lambda \, \mathbb{E} \max_{i \le N} X_i \right) \le \mathbb{E} \exp\left( \lambda \max_{i \le N} X_i \right).$$
- Union Bound on MGF: We bound the maximum by the sum:
$$\mathbb{E} \exp\left( \lambda \max_{i \le N} X_i \right) \le \sum_{i=1}^{N} \mathbb{E} e^{\lambda X_i}.$$
- Sub-Gaussian Assumption: By definition, $\mathbb{E} e^{\lambda X_i} \le e^{\lambda^2 \sigma^2 / 2}$. Thus:
$$\mathbb{E} \exp\left( \lambda \max_{i \le N} X_i \right) \le N e^{\lambda^2 \sigma^2 / 2}.$$
- Logarithmic Bound: Taking the logarithm and dividing by $\lambda$:
$$\mathbb{E} \max_{i \le N} X_i \le \frac{\log N}{\lambda} + \frac{\lambda \sigma^2}{2}.$$
- Optimization: Minimizing with respect to $\lambda$ (setting $\lambda = \sqrt{2 \log N} / \sigma$) yields the sharp bound:
$$\mathbb{E} \max_{i \le N} X_i \le \sigma \sqrt{2 \log N}.$$
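A quick Monte-Carlo sanity check of the lemma (the parameters are illustrative; this is a numerical illustration, not part of the proof):

```python
import numpy as np

# Check E[max_i X_i] <= sigma * sqrt(2 log N) for N i.i.d. N(0, sigma^2).
rng = np.random.default_rng(2)
sigma, trials = 1.0, 20_000
results = {}
for N in [10, 100, 1000]:
    samples = rng.normal(scale=sigma, size=(trials, N))
    emp = samples.max(axis=1).mean()           # Monte-Carlo E[max]
    bound = sigma * np.sqrt(2 * np.log(N))
    results[N] = (emp, bound)
    print(f"N={N:5d}  E[max] ≈ {emp:.3f}   bound = {bound:.3f}")
```

The empirical maximum grows like $\sqrt{\log N}$ and always sits below the bound; the bound is known to be asymptotically tight for i.i.d. Gaussians.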
🏷️ Insights and Refinements
Slepian’s Comparison: Correlation vs. Supremum
A critical refinement of Dudley’s result is Slepian’s Lemma. It addresses the intuition that correlation between variables “shrinks” the expected maximum.
The Principle: Consider two Gaussian processes $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ with identical variances $\mathbb{E} X_t^2 = \mathbb{E} Y_t^2$. If the increments of $X$ are “more spread out” than those of $Y$, i.e., $\mathbb{E}(X_s - X_t)^2 \ge \mathbb{E}(Y_s - Y_t)^2$ for all $s, t \in T$, then:
$$\mathbb{E} \sup_{t \in T} X_t \ge \mathbb{E} \sup_{t \in T} Y_t.$$
Intuition: Imagine $N$ identical standard Gaussian variables. If they are i.i.d., they have the maximum “room” to fluctuate independently, leading to a large maximum ($\mathbb{E} \max_i X_i \approx \sqrt{2 \log N}$). If they are perfectly correlated ($X_1 = X_2 = \cdots = X_N$), they move as a single block, and the maximum is just the expectation of a single variable (zero). Slepian’s Lemma formalizes this: independence is the “worst case” for fluctuations.
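This monotone effect of correlation is easy to simulate. The sketch below uses the equicorrelated model $X_i = \sqrt{\rho}\, Z + \sqrt{1-\rho}\, g_i$, an illustrative choice that gives unit variance and pairwise correlation $\rho$:

```python
import numpy as np

# E[max] of N equicorrelated standard Gaussians shrinks as rho grows.
rng = np.random.default_rng(3)
N, trials = 50, 20_000
maxima = {}
for rho in [0.0, 0.5, 0.9, 1.0]:
    Z = rng.normal(size=(trials, 1))           # shared component
    g = rng.normal(size=(trials, N))           # independent components
    X = np.sqrt(rho) * Z + np.sqrt(1 - rho) * g
    maxima[rho] = X.max(axis=1).mean()
    print(f"rho={rho:.1f}  E[max] ≈ {maxima[rho]:.3f}")
```

At $\rho = 0$ the maximum is near $\sqrt{2 \log N}$; at $\rho = 1$ all coordinates coincide and the expected maximum is (numerically) zero, exactly as Slepian’s comparison predicts.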
Sharpness and Generic Chaining
While Dudley’s bound is powerful, it is not always sharp. The integral $\int_0^{\operatorname{diam}(T)} \sqrt{\log N(T, d, \varepsilon)} \, d\varepsilon$ captures the “average” complexity across scales but can fail for processes with “highly correlated” increments (where the entropy integral overcounts complexity).
- The Improvement: Michel Talagrand [@talagrand2014upper] refined this into Generic Chaining, replacing the entropy integral with the $\gamma_2$ functional.
- The Majorizing Measure Theorem: Talagrand proved that for Gaussian processes, the supremum is perfectly characterized (both upper and lower bounds) by the functional:
$$\gamma_2(T, d) = \inf_{(\mathcal{A}_k)} \sup_{t \in T} \sum_{k \ge 0} 2^{k/2} \operatorname{diam}\big( A_k(t) \big),$$
where $A_k(t)$ is the element of $\mathcal{A}_k$ containing $t$ and $(\mathcal{A}_k)_{k \ge 0}$ is a sequence of nested partitions of $T$. The $\inf$ is taken over all admissible sequences of partitions satisfying the cardinality constraint $|\mathcal{A}_k| \le 2^{2^k}$. Unlike Dudley’s fixed nets, this allows for spatially adapted resolution. It allocates more “budget” to geometrically complex regions of $T$, finding the optimal chaining tree that minimizes the total cost for the worst-case point $t$.
Applications in Learning Theory
In machine learning (as seen in the note on improved Nyström bounds), Dudley’s theorem is used to bound the Rademacher Complexity of function classes.
- If a function class has small metric entropy (e.g., its “effective dimension” is small), then it will generalize well because its random fluctuations (noise) are constrained by the entropy integral.
- This links directly to the spectral decay of kernels; rapid decay in eigenvalues often implies a rapidly shrinking metric entropy (see also [@vershynin2018high]).
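For a finite class, this connection reduces to the maximal-inequality lemma above (a Massart-type bound): $N$ functions taking values in $[-1, 1]$ on $n$ sample points have empirical Rademacher complexity at most $\sqrt{2 \log N / n}$. The simulation below is a toy instance with random vectors standing in for a function class; all parameters are illustrative.

```python
import numpy as np

# Empirical Rademacher complexity of a finite class of N vectors in [-1,1]^n,
# estimated by Monte Carlo over random signs, vs. the sqrt(2 log N / n) bound.
rng = np.random.default_rng(4)
n, N, trials = 200, 30, 5_000
F = rng.uniform(-1, 1, size=(N, n))            # toy "function class"
signs = rng.choice([-1.0, 1.0], size=(trials, n))
rad = (signs @ F.T / n).max(axis=1).mean()     # E_sigma sup_f (1/n) sum sigma_i f(x_i)
bound = np.sqrt(2 * np.log(N) / n)
print(f"Rademacher ≈ {rad:.4f}   bound = {bound:.4f}")
```

Shrinking the class (smaller $N$) or adding data (larger $n$) tightens the bound, which is the quantitative content of “small entropy implies good generalization.”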
📝 Notes
- The upper limit $\operatorname{diam}(T)$ in the integral can often be replaced by the smallest scale $\varepsilon$ at which the covering number becomes 1, since the integrand vanishes once a single ball covers $T$.
- This technique shares the “log-summation” logic found in the note on Moser iteration, where a sequence of geometrically improving bounds is summed to reach a global limit.
🔗 See Also
- on Slepian’s lemma and Gaussian comparison --- Comparison inequalities provide the fundamental lower bounds for metric entropy that establish the sharpness of the chaining upper bound.
- on improved Nyström bounds --- The complexity of the kernel space, which determines the Nyström error, is quantified by the same metric entropy used in Dudley’s bound.
- on eigenvalue estimate of kernel --- Covering numbers and eigenvalues are dual views of set complexity (Carl’s Inequality).
- on sum-product in finite fields via entropy --- The entropy framework in combinatorics uses similar “information bits” logic to bound set expansion.