7.3 Physics-Informed Neural Networks and Neural Operators¶
1. Physics-Informed Neural Networks (PINNs)¶
Physics-Informed Neural Networks (PINNs) integrate physical laws, typically expressed as Partial Differential Equations (PDEs), directly into the loss function of a neural network. Instead of relying solely on massive amounts of data, a PINN acts as a mesh-free PDE solver that "learns" the solution by penalizing deviations from the governing equations, boundary conditions, and initial conditions.
Consider a domain \(\Omega \subset \mathbb{R}^d\) and a general non-linear PDE parameterized by \(\lambda\):

\[\mathcal{F}_\lambda[u](x) = 0, \quad x \in \Omega,\]

with boundary conditions \(\mathcal{B}[u](x) = 0\) on \(\partial \Omega\). A PINN uses a neural network \(u_\theta(x)\) to approximate the solution \(u(x)\). The loss function is composed of two primary terms:

\[\mathcal{L}(\theta) = \underbrace{\frac{1}{N_r} \sum_{i=1}^{N_r} \left| \mathcal{F}_\lambda[u_\theta](x_i^r) \right|^2}_{\text{PDE residual loss}} + \underbrace{\frac{1}{N_b} \sum_{j=1}^{N_b} \left| \mathcal{B}[u_\theta](x_j^b) \right|^2}_{\text{boundary loss}}.\]
The derivatives required for \(\mathcal{F}_\lambda\) (e.g., spatial gradients, time derivatives) are computed exactly using Automatic Differentiation (AD).
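As a minimal sketch of this, the first and second derivatives of a network (here untrained and purely illustrative) can be obtained with `torch.autograd.grad` and checked against a finite difference:

```python
import torch

torch.manual_seed(0)
# Toy network standing in for u_theta (untrained; double precision for the check below)
net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 1)).double()

x = torch.linspace(0.0, 1.0, 50, dtype=torch.float64).reshape(-1, 1).requires_grad_(True)
u = net(x)

# First derivative du/dx via AD (exact up to floating point, not an approximation)
u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
# Second derivative d2u/dx2 by differentiating the first derivative again
u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x))[0]

# Sanity check against a central finite difference
h = 1e-5
with torch.no_grad():
    fd = (net(x + h) - net(x - h)) / (2 * h)
print((u_x - fd).abs().max().item())  # tiny: only the FD approximation error remains
```

The same pattern yields \(u_t\), \(u_{xx}\), and other residual terms for any PDE.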
2. Spectral Bias and the Neural Tangent Kernel (NTK)¶
A well-known challenge in training PINNs is their difficulty in learning high-frequency components of the PDE solution. This phenomenon is rigorously explained by the Spectral Bias theorem, derived via the Neural Tangent Kernel (NTK).
2.1 Theorem Statement¶
Theorem (Spectral Bias of Neural Networks)
In the infinite-width limit, the training dynamics of a neural network under gradient descent follow a linear differential equation governed by the integral operator induced by the Neural Tangent Kernel \(\Theta(x, x')\). If the kernel is positive-definite and isotropic, the network learns the eigenfunctions of \(\Theta\) at a rate proportional to their corresponding eigenvalues. Since the eigenvalues of standard networks (like ReLU MLPs) decay polynomially with frequency, high-frequency target components converge far more slowly than low-frequency components.
2.2 Proof¶
Step 1: NTK Formulation
Let \(u_\theta(x)\) be the network output. The continuous-time gradient descent (gradient flow) dynamics for the parameters are:

\[\frac{d\theta}{dt} = -\eta \nabla_\theta \mathcal{L}(\theta).\]
For a simple L2 loss on a target function \(u^*(x)\): \(\mathcal{L} = \frac{1}{2} \int_\Omega (u_\theta(x) - u^*(x))^2 dx\).
Step 2: Function-Space Dynamics
We analyze how the output \(u_\theta(x)\) evolves:

\[\frac{\partial u_\theta(x)}{\partial t} = \nabla_\theta u_\theta(x)^T \frac{d\theta}{dt}.\]

Substitute the parameter dynamics:

\[\frac{\partial u_\theta(x)}{\partial t} = -\eta \int_\Omega \nabla_\theta u_\theta(x)^T \nabla_\theta u_\theta(x') \left( u_\theta(x') - u^*(x') \right) dx'.\]
Define the Neural Tangent Kernel: \(\Theta(x, x') = \nabla_\theta u_\theta(x)^T \nabla_\theta u_\theta(x')\). In the infinite-width limit, \(\Theta(x, x')\) remains strictly constant during training: \(\Theta(x, x') = \Theta_\infty(x, x')\).
Thus, the error residual \(e(x, t) = u_\theta(x, t) - u^*(x)\) evolves as:

\[\frac{\partial e(x, t)}{\partial t} = -\eta \int_\Omega \Theta_\infty(x, x') \, e(x', t) \, dx'.\]
Step 3: Spectral Decomposition
By Mercer's Theorem, since \(\Theta_\infty\) is a continuous symmetric positive-definite kernel, it admits a spectral decomposition:

\[\Theta_\infty(x, x') = \sum_{k=1}^{\infty} \lambda_k \, \phi_k(x) \, \phi_k(x'),\]

where \(\lambda_k > 0\) are the eigenvalues in non-increasing order (\(\lambda_1 \ge \lambda_2 \ge \dots\)), and \(\phi_k(x)\) are the orthonormal eigenfunctions.
Step 4: Convergence Rates
Projecting the error onto the eigenfunctions, \(c_k(t) = \int_\Omega e(x, t) \phi_k(x) \, dx\), we obtain a set of decoupled ODEs:

\[\frac{d c_k(t)}{dt} = -\eta \lambda_k \, c_k(t).\]

The exact solution is:

\[c_k(t) = c_k(0) \, e^{-\eta \lambda_k t}.\]
For shift-invariant kernels on a sphere, the eigenfunctions \(\phi_k(x)\) are spherical harmonics (analogous to Fourier modes), and high-frequency modes correspond to large \(k\). It has been shown that for ReLU networks the eigenvalues decay polynomially: \(\lambda_k \sim k^{-d}\).

Therefore, the exponential decay rate \(\eta \lambda_k\) for high-frequency components (large \(k\)) approaches zero: the time to reduce the error in mode \(k\) by a fixed factor scales as \(1/(\eta \lambda_k) \sim k^d\). The network rapidly fits the low-frequency components but takes polynomially longer (in \(k\)) to fit the high-frequency ones. \(\blacksquare\)
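These rates can be illustrated by solving the decoupled ODEs directly; the sketch below uses arbitrary values \(\eta = 0.1\), \(d = 2\), and the decay law \(\lambda_k = k^{-d}\):

```python
import numpy as np

eta, d = 0.1, 2                    # learning rate and input dimension (illustrative)
k = np.arange(1, 6)                # mode indices
lam = k.astype(float) ** (-d)      # polynomial eigenvalue decay, lambda_k ~ k^{-d}

# Time for mode k's error c_k(t) = c_k(0) exp(-eta lambda_k t) to fall to 1%:
# t_k = ln(100) / (eta * lambda_k)
t_99 = np.log(100) / (eta * lam)
for ki, tk in zip(k, t_99):
    print(f"mode k={ki}: t = {tk:10.1f}")
# The time to fit mode k grows like k^d: high frequencies dominate training time.
```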
3. Universal Approximation for Operators¶
While PINNs learn a single solution \(u(x)\), Neural Operators aim to learn a mapping between infinite-dimensional function spaces. For example, mapping an initial condition \(u(x, 0)\) to a future state \(u(x, T)\) across an entire family of initial conditions.
3.1 Theorem Statement¶
Theorem (Universal Operator Approximation, Chen & Chen 1995)
Let \(X\) be a compact metric space, \(V\) a compact set in \(C(X)\), and \(G\) a continuous nonlinear operator mapping \(V\) to \(C(X)\). For any \(\epsilon > 0\), there exist sensor points \(x_1, \dots, x_m \in X\) and a single-hidden-layer neural operator defined by:

\[\hat{G}[u](x) = \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k \, \sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k \, u(x_j) + \theta_i^k \right) \sigma\!\left( w_k \cdot x + b_k \right)\]
such that \(\sup_{u \in V} \sup_{x \in X} |G[u](x) - \hat{G}[u](x)| < \epsilon\).
3.2 Proof Overview¶
The proof hinges on projecting the infinite-dimensional function \(u \in V\) into a finite-dimensional representation, and then applying the standard Universal Approximation Theorem for vector functions.
- Compactness and Discretization: Since \(V\) is compact in \(C(X)\), by the ArzelĂ -Ascoli theorem, the functions in \(V\) are uniformly equicontinuous. This means we can find a finite set of sensor points \(\{x_1, \dots, x_m\} \subset X\) such that the discrete evaluation vector \(\vec{u} = (u(x_1), \dots, u(x_m))\) captures the necessary information to reconstruct \(u(x)\) up to an \(\epsilon/2\) error.
- Continuous Mapping: Let \(E: V \to \mathbb{R}^m\) be the evaluation operator \(E(u) = \vec{u}\). Because of uniform equicontinuity, there exists a continuous map \(F: \mathbb{R}^m \to C(X)\) such that \(G(u) \approx F(E(u))\).
- Standard Universal Approximation: The mapping \(F\) acts on finite-dimensional vectors \(\vec{u} \in \mathbb{R}^m\) to output a scalar value at any point \(x \in X\). By the classic Cybenko universal approximation theorem, a standard neural network can approximate \(F(\vec{u})(x)\) to arbitrary precision \(\epsilon/2\).
- Conclusion: Combining the discretization bound and the neural network approximation bound yields the full operator approximation error bound of \(\epsilon\). \(\blacksquare\)
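A minimal sketch of this construction in PyTorch, in the branch-trunk form later popularized as DeepONet (the widths, sensor count \(m = 20\), basis size \(p = 32\), and the class name are our illustrative choices):

```python
import torch
import torch.nn as nn

class ShallowOperatorNet(nn.Module):
    """G_hat[u](x) = sum_k branch_k(u(x_1), ..., u(x_m)) * trunk_k(x)."""
    def __init__(self, m_sensors=20, p=32):
        super().__init__()
        # Branch net: acts on the sensor-evaluation vector (u(x_1), ..., u(x_m))
        self.branch = nn.Sequential(nn.Linear(m_sensors, 64), nn.Tanh(),
                                    nn.Linear(64, p))
        # Trunk net: acts on the query location x
        self.trunk = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                                   nn.Linear(64, p))

    def forward(self, u_sensors, x_query):
        # u_sensors: (batch, m), x_query: (n_points, 1)
        b = self.branch(u_sensors)   # (batch, p)
        t = self.trunk(x_query)      # (n_points, p)
        return b @ t.T               # (batch, n_points)

torch.manual_seed(0)
op = ShallowOperatorNet()
u = torch.randn(4, 20)                           # 4 input functions at 20 sensors
x = torch.linspace(0, 1, 50).reshape(-1, 1)      # 50 query locations
out = op(u, x)
print(out.shape)  # torch.Size([4, 50])
```

Training such a model on input/output function pairs is what turns the existence theorem into a practical operator learner.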
4. Fourier Neural Operator (FNO)¶
The Fourier Neural Operator (FNO) is a highly efficient architecture for learning operators. Instead of point-wise discrete evaluations, FNO formulates the operation as an integral operator, evaluated efficiently in the frequency domain using the Fast Fourier Transform (FFT).
4.1 The Convolution Formula¶
An iterative layer in the FNO updates a hidden representation \(v_t(x)\) via a non-linear integral operator:

\[v_{t+1}(x) = \sigma\!\left( W v_t(x) + \int_\Omega \kappa_\phi(x, y) \, v_t(y) \, dy \right).\]

If we restrict the kernel to be stationary, i.e., \(\kappa_\phi(x, y) = \kappa_\phi(x - y)\), the integral becomes a continuous convolution:

\[\int_\Omega \kappa_\phi(x - y) \, v_t(y) \, dy = (\kappa_\phi * v_t)(x).\]

By the Convolution Theorem, the Fourier transform of a convolution is the point-wise product of the Fourier transforms. Let \(\mathcal{F}\) denote the Fourier transform:

\[(\kappa_\phi * v_t)(x) = \mathcal{F}^{-1}\big( \mathcal{F}(\kappa_\phi) \cdot \mathcal{F}(v_t) \big)(x).\]
In the FNO, we parameterize the Fourier transform of the kernel directly as a learnable weight tensor \(R_\phi(\xi) \equiv \mathcal{F}(\kappa_\phi)(\xi)\). We also truncate the high-frequency modes (keeping only the lowest \(K\) modes) to enforce smoothness and resolution invariance.
Because the layer is formulated as a continuous integral evaluated globally in Fourier space, the model is inherently mesh-free and resolution-invariant: the same learned weights \(R_\phi\) can be applied to inputs sampled on any regular grid.
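A quick NumPy sketch of this claim (the weights `R` and the mode count are arbitrary): applying the same truncated spectral multiplication to one band-limited function sampled at 64 and at 128 points yields outputs that agree at the shared grid locations.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                                     # number of retained Fourier modes
R = rng.standard_normal(K) + 1j * rng.standard_normal(K)   # arbitrary spectral weights

def spectral_layer(v, R, K):
    # norm="forward" makes the coefficients independent of the grid size
    v_ft = np.fft.rfft(v, norm="forward")
    out_ft = np.zeros_like(v_ft)
    out_ft[:K] = R * v_ft[:K]             # multiply low modes, truncate the rest
    return np.fft.irfft(out_ft, n=len(v), norm="forward")

f = lambda s: np.sin(2 * np.pi * s) + 0.5 * np.cos(4 * np.pi * s)  # band-limited input
x64, x128 = np.arange(64) / 64, np.arange(128) / 128
y64 = spectral_layer(f(x64), R, K)
y128 = spectral_layer(f(x128), R, K)
print(np.max(np.abs(y64 - y128[::2])))    # near machine precision at shared points
```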
5. Worked Examples¶
5.1 Example: 1D Burgers' Equation PINN Loss¶
Equation: \(\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2} = 0\). Given a point \((x, t)\) and network \(u_\theta(x, t)\).
- We compute \(u_t = \frac{\partial u_\theta}{\partial t}\) and \(u_x = \frac{\partial u_\theta}{\partial x}\), \(u_{xx} = \frac{\partial^2 u_\theta}{\partial x^2}\) using auto-diff.
- The PDE residual is \(f = u_t + u_\theta u_x - \nu u_{xx}\).
- The loss is \(L = \frac{1}{N} \sum_i f_i^2 + \lambda_{ic} \frac{1}{M} \sum_j (u_\theta(x_j^{ic}, 0) - u_{true}(x_j^{ic}))^2\), where \(\lambda_{ic}\) weights the initial-condition penalty (distinct from the PDE parameter \(\lambda\)).
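The residual assembly above can be sketched as follows (the network is untrained and purely illustrative; \(\nu = 0.01/\pi\) is a common benchmark value):

```python
import torch

torch.manual_seed(0)
nu = 0.01 / torch.pi   # viscosity (a common benchmark choice)
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

xt = torch.rand(200, 2, requires_grad=True)   # collocation points (x, t)
u = net(xt)

# First derivatives w.r.t. both inputs in one autograd call
grads = torch.autograd.grad(u, xt, grad_outputs=torch.ones_like(u),
                            create_graph=True)[0]
u_x, u_t = grads[:, 0:1], grads[:, 1:2]
# Second derivative u_xx: differentiate u_x again, keep the x-column
u_xx = torch.autograd.grad(u_x, xt, grad_outputs=torch.ones_like(u_x),
                           create_graph=True)[0][:, 0:1]

residual = u_t + u * u_x - nu * u_xx          # f = u_t + u u_x - nu u_xx
loss_pde = torch.mean(residual ** 2)
print(loss_pde.item())
```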
5.2 Example: Advection Equation Operator Learning¶
Goal: Map \(u(x, 0)\) to \(u(x, T)\) for \(u_t + c u_x = 0\). The analytical operator is simply a shift: \(G[u](x) = u(x - cT)\). In Fourier space, a shift is a phase multiplication: \(\mathcal{F}(u(x-cT))(\xi) = e^{-i 2\pi \xi c T} \mathcal{F}(u)(\xi)\). An FNO can represent this operator exactly by setting its spectral weights to \(R(\xi) = e^{-i 2\pi \xi c T}\); since the operator is linear and diagonal in Fourier space, a single spectral layer with no nonlinearity suffices, up to the mode truncation.
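The phase-multiplication identity can be checked numerically; in the sketch below \(c\), \(T\), and the grid are arbitrary, with \(cT\) chosen as a whole number of grid cells so that `np.roll` gives an exact reference:

```python
import numpy as np

N = 128
x = np.arange(N) / N
u0 = np.exp(-100 * (x - 0.3) ** 2)      # initial condition: a Gaussian bump
c, T = 0.25, 1.0                        # shift c*T = 0.25, i.e. 32 grid cells

xi = np.fft.rfftfreq(N, d=1.0 / N)      # integer frequencies on the unit interval
# Apply the exact spectral weights R(xi) = exp(-2 pi i xi c T)
u_T = np.fft.irfft(np.exp(-2j * np.pi * xi * c * T) * np.fft.rfft(u0), n=N)

u_ref = np.roll(u0, 32)                 # exact circular shift by 32 cells
print(np.max(np.abs(u_T - u_ref)))      # near machine precision
```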
5.3 Example: Eigenvalues of NTK for 1D Domain¶
Consider a 1D domain \([0, 1]\) and an NTK \(\Theta(x, x') = \cos(\pi(x - x'))\). By trigonometric identities, \(\cos(\pi(x - x')) = \cos(\pi x)\cos(\pi x') + \sin(\pi x)\sin(\pi x')\). The eigenfunctions are exactly \(\cos(\pi x)\) and \(\sin(\pi x)\), each with eigenvalue \(\lambda = 1/2\) (since \(\int_0^1 \cos^2(\pi x)\,dx = \int_0^1 \sin^2(\pi x)\,dx = 1/2\), and the cross term vanishes). The network will learn these two components at a uniform rate.
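This can be verified by discretizing the kernel on a grid and eigendecomposing the resulting matrix (the factor \(1/n\) is the quadrature weight that approximates the integral operator):

```python
import numpy as np

n = 500
x = (np.arange(n) + 0.5) / n                       # midpoint grid on [0, 1]
K = np.cos(np.pi * (x[:, None] - x[None, :]))      # kernel matrix Theta(x, x')

# Eigenvalues of K/n approximate the Mercer eigenvalues of the integral operator
eigvals = np.sort(np.linalg.eigvalsh(K / n))[::-1]
print(eigvals[:3])   # two eigenvalues close to 1/2; the rest vanish (rank-2 kernel)
```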
6. Coding Demonstrations¶
6.1 Basic PINN for a 1D ODE¶
We solve \(\frac{du}{dx} + u = 0\), with \(u(0) = 1\). The true solution is \(u(x) = e^{-x}\).
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)

class PINN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 32), nn.Tanh(),
            nn.Linear(32, 32), nn.Tanh(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        return self.net(x)

# Setup
model = PINN()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(1000):
    optimizer.zero_grad()

    # 1. PDE loss on random collocation points in [0, 1]
    x_collocation = torch.rand(100, 1, requires_grad=True)
    u_pred = model(x_collocation)

    # Automatic differentiation for du/dx
    du_dx = torch.autograd.grad(u_pred, x_collocation,
                                grad_outputs=torch.ones_like(u_pred),
                                create_graph=True)[0]
    pde_residual = du_dx + u_pred
    loss_pde = torch.mean(pde_residual**2)

    # 2. Boundary condition loss at x = 0 (reduced to a scalar so backward() works)
    x_bc = torch.tensor([[0.0]])
    u_bc_pred = model(x_bc)
    loss_bc = torch.mean((u_bc_pred - 1.0)**2)

    # Total loss
    loss = loss_pde + loss_bc
    loss.backward()
    optimizer.step()

# Test
x_test = torch.tensor([[1.0]])
u_test = model(x_test).item()
print(f"PINN Prediction at x=1: {u_test:.4f} (True: {0.3679:.4f})")
6.2 Fourier Neural Operator (FNO) 1D Layer¶
Implementing the spectral convolution layer in PyTorch.
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, modes1):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.modes1 = modes1  # Number of Fourier modes to multiply
        self.scale = 1.0 / (in_channels * modes1)
        # Learnable complex weights
        self.weights1 = nn.Parameter(
            self.scale * torch.rand(in_channels, out_channels, self.modes1,
                                    dtype=torch.cfloat)
        )

    def forward(self, x):
        # x shape: (batch, in_channels, grid_size)
        batchsize = x.shape[0]

        # 1. Compute the real FFT along the spatial dimension
        x_ft = torch.fft.rfft(x)

        # 2. Multiply the lowest `modes1` Fourier modes by the learned weights
        out_ft = torch.zeros(batchsize, self.out_channels, x.size(-1)//2 + 1,
                             device=x.device, dtype=torch.cfloat)
        # Complex multiplication: (batch, in, modes) x (in, out, modes) -> (batch, out, modes)
        out_ft[:, :, :self.modes1] = torch.einsum("bix,iox->box",
                                                  x_ft[:, :, :self.modes1],
                                                  self.weights1)

        # 3. Inverse FFT back to the spatial domain
        x = torch.fft.irfft(out_ft, n=x.size(-1))
        return x

# Test the layer
modes = 16
layer = SpectralConv1d(in_channels=1, out_channels=1, modes1=modes)

# Batch of 4, 1 channel, spatial grid of 64
x_input = torch.randn(4, 1, 64)
output = layer(x_input)
print("FNO Input Shape:", x_input.shape)
print("FNO Output Shape:", output.shape)