# Topic 05 — Information Theory in Deep Learning
Information-theoretic quantities such as entropy, mutual information, and the Kullback–Leibler (KL) divergence provide the mathematical bedrock for understanding how neural networks represent, compress, and generate data. This module bridges classical Shannon theory with modern variational inference and information geometry.
Prerequisite Tier: Tier 2 — Intermediate (Probability, Calculus)
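For quick reference, and assuming the standard discrete-variable notation (the lecture fixes its own conventions), the three quantities named above are:

$$
H(X) = -\sum_x p(x)\log p(x), \qquad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x)\log\frac{p(x)}{q(x)}, \qquad
I(X;Z) = D_{\mathrm{KL}}\big(p(x,z) \,\|\, p(x)\,p(z)\big).
$$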
## 📚 Course Modules
- Lecture: Unified Mathematical Foundations - Historical context, the ELBO derivation, the Information Bottleneck, and information geometry.
- Practice: Exercises and Solutions - Theoretical proofs on the KL divergence, MI estimation, and coding tasks for language modeling.
- Project: VAEs and MI Estimation - Empirical construction of rate-distortion curves and probing latents with MINE (a minimal sketch of the MINE bound follows this list).
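As a rough illustration of the kind of estimator used in the project, the sketch below implements the Donsker-Varadhan lower bound that MINE maximizes. The network architecture, sizes, and names (`StatisticsNet`, `mine_lower_bound`) are illustrative assumptions, not the course's reference implementation.

```python
# Minimal MINE sketch (Donsker-Varadhan bound), assuming paired samples (x, z)
# from the joint distribution and a PyTorch environment. Names and sizes here
# are hypothetical choices for illustration.
import math
import torch
import torch.nn as nn

class StatisticsNet(nn.Module):
    """Small MLP T(x, z) used as the statistics network in the DV bound."""
    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))

def mine_lower_bound(T, x, z):
    """I(X;Z) >= E_{p(x,z)}[T(x,z)] - log E_{p(x)p(z)}[exp(T(x,z))]."""
    t_joint = T(x, z).squeeze(-1)                       # scores on joint samples
    z_shuffled = z[torch.randperm(z.size(0))]           # batch shuffle ~ p(x)p(z)
    t_marginal = T(x, z_shuffled).squeeze(-1)           # scores on product samples
    # log-mean-exp over the batch approximates log E_{p(x)p(z)}[exp(T)]
    log_mean_exp = torch.logsumexp(t_marginal, dim=0) - math.log(t_marginal.size(0))
    return t_joint.mean() - log_mean_exp
```

Training amounts to maximizing this bound with respect to the parameters of `T` by gradient ascent on mini-batches of paired `(x, z)` samples; the running maximum of the bound serves as the MI estimate.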
## 📄 Key Research Literature
- Tishby, N., & Zaslavsky, N. (2015): Deep Learning and the Information Bottleneck Principle - A seminal theory on learning and compression.
- Belghazi, M. I., et al. (2018): Mutual Information Neural Estimation - The MINE algorithm.
- Alemi, A. A., et al. (2017): Deep Variational Information Bottleneck - A variational approximation to the IB objective for deep networks.
- Amari, S. (2016): Information Geometry and Its Applications - Foundational text on the geometry of probability manifolds.