# Topic 05 — Information Theory in Deep Learning
Information-theoretic quantities such as entropy, mutual information, and the Kullback–Leibler (KL) divergence provide the mathematical bedrock for understanding how neural networks represent, compress, and generate data. This module bridges classical Shannon theory with modern variational inference and information geometry.
Prerequisite Tier: Tier 2 — Intermediate (Probability, Calculus)
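For quick reference, and assuming the standard discrete-variable notation (the lecture fixes its own conventions), the three quantities named above are:

$$
H(X) = -\sum_x p(x)\log p(x), \qquad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x)\log\frac{p(x)}{q(x)}, \qquad
I(X;Z) = D_{\mathrm{KL}}\big(p(x,z) \,\|\, p(x)\,p(z)\big).
$$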
## 📚 Course Modules
- Lecture: Unified Mathematical Foundations - Historical context, the ELBO derivation, the Information Bottleneck, and information geometry.
- Practice: Exercises and Solutions - Theoretical proofs on the KL divergence, MI estimation, and coding tasks for language modeling.
- Project: VAEs and MI Estimation - Empirical construction of rate-distortion curves and probing latents with MINE (a minimal sketch of the MINE bound follows this list).
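As a rough illustration of the kind of estimator used in the project, the sketch below implements the Donsker-Varadhan lower bound that MINE maximizes. The network architecture, sizes, and names (`StatisticsNet`, `mine_lower_bound`) are illustrative assumptions, not the course's reference implementation.

```python
# Minimal MINE sketch (Donsker-Varadhan bound), assuming paired samples (x, z)
# from the joint distribution and a PyTorch environment. Names and sizes here
# are hypothetical choices for illustration.
import math
import torch
import torch.nn as nn

class StatisticsNet(nn.Module):
    """Small MLP T(x, z) used as the statistics network in the DV bound."""
    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))

def mine_lower_bound(T, x, z):
    """I(X;Z) >= E_{p(x,z)}[T(x,z)] - log E_{p(x)p(z)}[exp(T(x,z))]."""
    t_joint = T(x, z).squeeze(-1)                       # scores on joint samples
    z_shuffled = z[torch.randperm(z.size(0))]           # batch shuffle ~ p(x)p(z)
    t_marginal = T(x, z_shuffled).squeeze(-1)           # scores on product samples
    # log-mean-exp over the batch approximates log E_{p(x)p(z)}[exp(T)]
    log_mean_exp = torch.logsumexp(t_marginal, dim=0) - math.log(t_marginal.size(0))
    return t_joint.mean() - log_mean_exp
```

Training amounts to maximizing this bound with respect to the parameters of `T` by gradient ascent on mini-batches of paired `(x, z)` samples; the running maximum of the bound serves as the MI estimate.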
## 📄 Key Research Literature
- Tishby, N., & Zaslavsky, N. (2015): Deep Learning and the Information Bottleneck Principle - A seminal theory on learning and compression.
- Belghazi, M. I., et al. (2018): Mutual Information Neural Estimation - The MINE algorithm.
- Alemi, A. A., et al. (2017): Deep Variational Information Bottleneck - A variational approximation to the IB objective for deep networks.
- Amari, S. (2016): Information Geometry and Its Applications - Foundational text on the geometry of probability manifolds.