Practice 07: Differential Equations in Deep Learning¶
1. Theoretical Exercises¶
1.1 Picard-Lindelöf and Neural ODEs¶
Problem: A Neural ODE is defined by \(\dot{h} = f(h, t, \theta)\). Suppose \(f(h, t, \theta) = \text{ReLU}(Wh + b)\).
- Is \(f\) globally Lipschitz?
- Does the Picard-Lindelöf theorem guarantee a unique solution for all \(t \in [0, T]\)?
- What happens if we use an activation function like \(x^2\)?
1.2 The Adjoint of a Linear ODE¶
Problem: Consider \(\dot{h} = \Theta h\). Prove that the adjoint state \(a(t)\) satisfies \(\dot{a} = -\Theta^\top a\). Hint: Use the definition \(\dot{a} = -a^\top \frac{\partial f}{\partial h}\).
1.3 Score-based Models and Langevin Dynamics¶
Problem: The Score-based generative model uses the gradient of the log-density \(\nabla_x \log p(x)\).
- Relate the Probability Flow ODE to the Langevin SDE \(dx = \nabla_x \log p(x) dt + \sqrt{2} dW\).
- Why is the score function easier to learn than the density \(p(x)\) itself?
1.4 PINNs and Spectral Bias¶
Problem: Research "Spectral Bias" in neural networks.
- Why do PINNs struggle to solve PDEs with high-frequency components?
- How do "Fourier Features" (Random Fourier Features) help mitigate this?
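A minimal sketch of the Fourier-feature idea (not a full answer to the exercise): mapping the input through random sinusoids lets even a plain linear model fit a high-frequency target that low-order smooth features cannot. The bandwidth, feature count, and the target \(\sin(20x)\) are illustrative assumptions, not values from the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# High-frequency target: this is what spectral bias makes hard to learn.
x = np.linspace(0.0, 1.0, 400)[:, None]
y = np.sin(20.0 * x[:, 0])

# Random Fourier features: random frequencies B, then sin/cos of the projection.
# sigma (bandwidth) and n_feat are assumed hyperparameters.
sigma, n_feat = 10.0, 128
B = rng.normal(0.0, sigma, size=(1, n_feat))
phi = np.concatenate([np.sin(2 * np.pi * x @ B),
                      np.cos(2 * np.pi * x @ B)], axis=1)

# Linear least squares on the features stands in for training a network head.
w, *_ = np.linalg.lstsq(phi, y, rcond=None)
mse_rff = np.mean((phi @ w - y) ** 2)

# Baseline: low-order polynomial features on the raw input (smooth, low-frequency).
poly = np.concatenate([x**k for k in range(4)], axis=1)
w_p, *_ = np.linalg.lstsq(poly, y, rcond=None)
mse_poly = np.mean((poly @ w_p - y) ** 2)

print(mse_rff, mse_poly)  # the RFF fit error is orders of magnitude smaller
```

The random frequencies give the model basis functions that already oscillate at the target's scale, which is exactly what mitigates spectral bias in PINNs.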
1.5 Conservation in Hamiltonian Neural Networks¶
Problem: Prove that for an HNN with \(H_\theta(q, p)\), the total derivative \(\frac{dH_\theta}{dt}\) is zero. Task: Show \(\frac{dH}{dt} = \frac{\partial H}{\partial q} \dot{q} + \frac{\partial H}{\partial p} \dot{p} = 0\) using the HNN equations of motion.
1.6 Divergence and Change of Variables¶
Problem: In the Probability Flow ODE, the change in log-probability is given by the Trace of the Jacobian (Divergence).
- For \(x \in \mathbb{R}^d\), what is the computational complexity of computing \(\text{Tr}(\frac{\partial f}{\partial x})\) exactly?
- Explain the Hutchinson's Trace Estimator trick: \(\text{Tr}(A) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)} [\epsilon^\top A \epsilon]\).
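A quick numerical check of the estimator (the matrix size and sample count are illustrative): in the Probability Flow ODE setting, \(A\) would be the Jacobian \(\partial f / \partial x\) and each \(\epsilon^\top A \epsilon\) would be computed with a single vector-Jacobian product rather than the full matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))

exact = np.trace(A)

# Hutchinson: Tr(A) = E[eps^T A eps] for eps with zero mean and identity covariance.
n = 200_000
eps = rng.standard_normal((n, d))
estimates = np.einsum("ni,ij,nj->n", eps, A, eps)
approx = estimates.mean()

print(exact, approx)  # Monte Carlo error shrinks like O(1/sqrt(n))
```

With a single probe vector per ODE step, this turns the \(O(d)\) Jacobian-trace computation into one backward pass, which is the trick used by FFJORD-style continuous normalizing flows.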
1.7 Stability of Symplectic Integrators¶
Problem: Why is the Forward Euler method (\(q_{n+1} = q_n + \Delta t p_n, p_{n+1} = p_n - \Delta t \nabla V(q_n)\)) not suitable for long-term simulation of a pendulum? Task: Show that the energy \(E = \frac{1}{2}p^2 + \frac{1}{2}q^2\) increases over time for a simple harmonic oscillator under Forward Euler.
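The energy growth can be observed directly; a short sketch (step size and step count are arbitrary choices):

```python
import numpy as np

# Simple harmonic oscillator: q' = p, p' = -q, under Forward Euler.
dt, steps = 0.1, 100
q, p = 1.0, 0.0
energies = []
for _ in range(steps):
    energies.append(0.5 * p**2 + 0.5 * q**2)
    q, p = q + dt * p, p - dt * q  # simultaneous update

# Each step multiplies the energy by exactly (1 + dt^2).
ratios = np.exp(np.diff(np.log(energies)))
print(np.allclose(ratios, 1 + dt**2))  # True
```

Replacing the simultaneous update with a symplectic scheme (e.g. updating \(p\) first and then using the new \(p\) to update \(q\)) keeps the energy bounded, which is the point of the exercise.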
1.8 Neural Operators vs. PINNs¶
Problem: Compare the "Loss Function" of an FNO and a PINN.
- Which one requires labeled data (ground truth solutions)?
- Which one can be used in an "unsupervised" (physics-only) manner?
2. Coding Practice¶
2.1 Solving Lotka-Volterra with Neural ODE¶
Task: Use torchdiffeq to model the predator-prey system:
- Generate synthetic data using a standard ODE solver.
- Train a Neural ODE to learn the dynamics from noisy observations.
- Visualize the learned phase portrait \((x, y)\).
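The first step (synthetic data) might look like the sketch below, using a hand-rolled RK4 integrator so it has no dependencies beyond NumPy; the rate constants, initial condition, step size, and noise level are all illustrative assumptions. The resulting `noisy` array is what the torchdiffeq training loop would consume.

```python
import numpy as np

# Lotka-Volterra vector field; the rate constants are illustrative choices.
alpha, beta, gamma, delta = 1.5, 1.0, 3.0, 1.0

def f(z):
    x, y = z  # prey, predator
    return np.array([alpha * x - beta * x * y, -gamma * y + delta * x * y])

def rk4_step(z, dt):
    # Classical 4th-order Runge-Kutta step.
    k1 = f(z)
    k2 = f(z + 0.5 * dt * k1)
    k3 = f(z + 0.5 * dt * k2)
    k4 = f(z + dt * k3)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, steps = 0.05, 200
traj = [np.array([1.0, 1.0])]
for _ in range(steps):
    traj.append(rk4_step(traj[-1], dt))
traj = np.stack(traj)  # shape (201, 2)

# Noisy observations for the Neural ODE to fit.
rng = np.random.default_rng(0)
noisy = traj + 0.05 * rng.standard_normal(traj.shape)
```

For the training step, the same vector field would be replaced by a small nn.Module, and odeint(func, y0, t) from torchdiffeq would produce the predicted trajectory to compare against `noisy`.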
2.2 HMC: Sampling a Banana Distribution¶
Task: Implement a simple Hamiltonian Monte Carlo (HMC) sampler using the Leapfrog integrator to sample from the "Banana" distribution:
- Plot the trajectory of the particle in phase space.
- Verify that the sampler explores the distribution better than standard Metropolis-Hastings.
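A minimal HMC sketch with a Leapfrog integrator; the banana density used here, \(\log p(x) \propto -\tfrac{1}{2}x_1^2 - \tfrac{1}{2}(x_2 - b x_1^2)^2\), and the parameters \(b\), step size, and trajectory length are assumed choices, since the exercise does not pin them down.

```python
import numpy as np

rng = np.random.default_rng(0)
b = 0.5  # banana curvature (assumed parameterization)

def U(x):  # potential energy = -log p(x) up to a constant
    return 0.5 * x[0] ** 2 + 0.5 * (x[1] - b * x[0] ** 2) ** 2

def grad_U(x):
    g1 = x[0] - 2.0 * b * x[0] * (x[1] - b * x[0] ** 2)
    g2 = x[1] - b * x[0] ** 2
    return np.array([g1, g2])

def leapfrog(q, p, eps, L):
    # Kick-drift-kick scheme: symplectic and time-reversible.
    p = p - 0.5 * eps * grad_U(q)
    for _ in range(L - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

def hmc(n_samples, eps=0.1, L=20):
    q = np.zeros(2)
    samples, accepts = [], 0
    for _ in range(n_samples):
        p0 = rng.standard_normal(2)  # resample momentum each iteration
        q_new, p_new = leapfrog(q.copy(), p0.copy(), eps, L)
        h0 = U(q) + 0.5 * p0 @ p0
        h1 = U(q_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h0 - h1:  # Metropolis correction
            q, accepts = q_new, accepts + 1
        samples.append(q.copy())
    return np.array(samples), accepts / n_samples

samples, acc = hmc(2000)
print(acc)  # acceptance rate is high because Leapfrog nearly conserves H
```

For the comparison with Metropolis-Hastings, a useful diagnostic is the autocorrelation of \(x_1\) across samples: HMC's long deterministic trajectories decorrelate much faster than a random-walk proposal along the curved ridge.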
3. Hints & Solutions¶
3.1 Hints¶
- 1.1: \(h \mapsto \text{ReLU}(Wh + b)\) is globally Lipschitz with constant \(\|W\|\) (ReLU itself is 1-Lipschitz), so Picard-Lindelöf applies even though the derivative is discontinuous.
- 1.5: Just substitute \(\dot{q} = \partial H / \partial p\) and \(\dot{p} = -\partial H / \partial q\).
- 2.1: Define the vector field as an nn.Module. Use odeint(func, y0, t).
3.2 Solutions (Brief)¶
- 1.2: \(\frac{\partial (\Theta h)}{\partial h} = \Theta\). Thus \(\dot{a}^\top = -a^\top \Theta\), i.e. \(\dot{a} = -\Theta^\top a\) for the column vector \(a\).
- 1.7: For \(\dot{q}=p, \dot{p}=-q\), Forward Euler gives \(q_{n+1} = q_n + \Delta t p_n\) and \(p_{n+1} = p_n - \Delta t q_n\). Then \(q_{n+1}^2 + p_{n+1}^2 = (q_n + \Delta t p_n)^2 + (p_n - \Delta t q_n)^2 = (1 + \Delta t^2)(q_n^2 + p_n^2) > q_n^2 + p_n^2\). The energy grows exponentially.