# Topic 09: Project - Scaling Kernels to Large Data
## 1. Objective
The goal of this project is to implement and benchmark large-scale kernel approximation techniques—Random Fourier Features (RFF) and the Nyström Method—against a standard Deep Learning baseline.
### Dataset
- Forest Cover Type Prediction: Multi-class classification of forest types based on cartographic variables.
- Size: \(\approx 581,000\) samples.
- Task: Classify into one of 7 forest cover types.
## 2. Implementation Steps
### Step 1: Data Preparation
- Download the dataset from Kaggle.
- Normalize all continuous features to zero mean and unit variance.
- Split the data into Training (\(70\%\)), Validation (\(15\%\)), and Test (\(15\%\)).
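The preparation steps above can be sketched as follows. The random matrix here is only a stand-in for the Kaggle CSV (your loading code and feature count will differ); note that the scaler is fit on the training split only, to avoid leaking statistics from validation/test:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for the covertype data: 1,000 samples, 10 continuous features,
# labels in {1, ..., 7} (replace with the real Kaggle CSV).
X = rng.normal(size=(1000, 10))
y = rng.integers(1, 8, size=1000)

# 70% train; then split the remaining 30% evenly into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=0, stratify=y_tmp)

# Fit the scaler on the training split only, then apply it everywhere.
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(s) for s in (X_train, X_val, X_test))
```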
### Step 2: Exact Kernel (Small Scale)
- Take a subset of \(N=5000\) points.
- Train a standard Kernel Ridge Classifier using the exact RBF kernel.
- Record the training time and accuracy.
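One way to realize this step: scikit-learn's `KernelRidge` is a regressor, not a classifier, so a common workaround is to regress onto one-hot labels and take the argmax. A minimal sketch on a small stand-in subset (the `gamma` value and subset size are placeholders — tune them on your data):

```python
import time
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
# Stand-in subset (replace with the 5,000-point covertype subset).
X = rng.normal(size=(500, 10))
y = rng.integers(0, 7, size=500)

# One-hot encode the 7 classes so kernel ridge regression can be used
# as a classifier via argmax over the predicted scores.
Y = np.eye(7)[y]

t0 = time.perf_counter()
model = KernelRidge(kernel="rbf", gamma=0.1, alpha=1.0).fit(X, Y)
train_time = time.perf_counter() - t0

pred = model.predict(X).argmax(axis=1)
acc = (pred == y).mean()
```

Record `train_time` and `acc` — these are the exact-kernel reference numbers the approximations in Steps 3 and 4 will be compared against.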
### Step 3: Random Fourier Features (RFF)
- Implement an RFF feature extractor approximating the RBF kernel:
  - Sample \(\omega \sim \mathcal{N}(0, 2\gamma I)\) and \(b \sim \mathrm{Uniform}(0, 2\pi)\).
  - Compute \(\phi(x) = \sqrt{\frac{2}{m}} \cos(\omega^\top x + b)\).
- Train a Linear SGD Classifier on the features \(\phi(x)\) for \(m \in \{100, 500, 1000, 5000\}\).
- Benchmark the training time and accuracy on the full dataset.
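A hand-rolled RFF sampler following the recipe above might look like this (the helper name `rff_features` and the `gamma` value are illustrative, not prescribed):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def rff_features(X, m, gamma, rng):
    """Map X to m random Fourier features whose inner products
    approximate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    omega = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))  # omega ~ N(0, 2*gamma*I)
    b = rng.uniform(0, 2 * np.pi, size=m)                      # b ~ Uniform(0, 2*pi)
    return np.sqrt(2.0 / m) * np.cos(X @ omega + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))   # stand-in for the full training set
y = rng.integers(0, 7, size=1000)

Phi = rff_features(X, m=500, gamma=0.1, rng=rng)
clf = SGDClassifier(max_iter=20).fit(Phi, y)   # linear model on phi(x)
```

A useful sanity check is that \(\phi(x)^\top \phi(y)\) is close to the exact RBF kernel value; the gap shrinks like \(O(1/\sqrt{m})\), which is exactly the accuracy-vs-\(m\) curve the analysis section asks you to plot.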
### Step 4: Nyström Method
- Select \(m\) landmark points from the training set (use uniform sampling).
- Compute the low-rank approximation \(\hat{K} = K_{nm} K_{mm}^{+} K_{nm}^\top\), where \(K_{nm}\) is the kernel matrix between training points and landmarks and \(K_{mm}\) is the landmark Gram matrix.
- Solve the resulting system by training a linear model on the Nyström feature map (exploiting the low-rank structure rather than forming the full \(\hat{K}\)).
- Benchmark for the same values of \(m\).
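A sketch of the Nyström construction above, using uniform landmark sampling (the helper name `nystrom_features` and the parameter values are illustrative; the eigenvalue clipping threshold is a numerical-stability assumption):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def nystrom_features(X, landmarks, gamma):
    """Nystrom feature map phi(X) = K_nm @ K_mm^{-1/2}, so that
    phi(X) @ phi(X).T = K_nm K_mm^+ K_nm^T approximates the full Gram matrix."""
    K_mm = rbf_kernel(landmarks, landmarks, gamma=gamma)
    K_nm = rbf_kernel(X, landmarks, gamma=gamma)
    # Pseudo-inverse square root via eigendecomposition (K_mm is PSD);
    # clip tiny eigenvalues for numerical stability.
    w, V = np.linalg.eigh(K_mm)
    w = np.clip(w, 1e-12, None)
    K_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return K_nm @ K_inv_sqrt

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                      # stand-in training set
idx = rng.choice(len(X), size=100, replace=False)    # uniform landmark sampling
Phi = nystrom_features(X, X[idx], gamma=0.1)
K_hat = Phi @ Phi.T                                  # low-rank approximation of K
```

A linear classifier trained on `Phi` (as in Step 3) then plays the role of the approximate kernel machine. Note that on the landmark points themselves the approximation is exact, which is a handy correctness check.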
## 3. Analysis & Expected Results
### Performance Metrics
| Model | m | Accuracy | Training Time | Memory |
|---|---|---|---|---|
| Exact (Small) | N/A | | | |
| RFF + Linear | 1000 | | | |
| RFF + Linear | 5000 | | | |
| Nyström | 1000 | | | |
| Nyström | 5000 | | | |
| MLP Baseline | N/A | | | |
### Expected Analysis Section
In your report, address the following:
- Accuracy vs. m: Plot how the accuracy improves as \(m\) increases. Is there a "diminishing returns" point?
- Approximation Quality: Compute the MSE between the exact Gram matrix and the approximations (on the \(5000\) point subset). Which method is more accurate for a fixed \(m\)?
- Efficiency: Discuss the memory trade-off. RFF stores the projection weights in \(O(m \times d)\) space, while Nyström stores the landmarks (\(O(m \times d)\)) plus the \(m \times m\) landmark matrix (\(O(m^2)\)).
- Conclusion: When would you recommend using a kernel approximation over a modern Deep Neural Network (MLP)?
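For the approximation-quality comparison, the Gram-matrix MSE can be computed directly with scikit-learn's built-in approximators. A sketch on a stand-in subset (the `gamma` and `m` values are placeholders; in the project use the \(5000\)-point subset and your tuned \(\gamma\)):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.kernel_approximation import RBFSampler, Nystroem

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))   # stand-in for the 5,000-point subset
gamma, m = 0.1, 200

K = rbf_kernel(X, X, gamma=gamma)          # exact Gram matrix
results = {}
for name, mapper in [
    ("RFF", RBFSampler(gamma=gamma, n_components=m, random_state=0)),
    ("Nystrom", Nystroem(gamma=gamma, n_components=m, random_state=0)),
]:
    Phi = mapper.fit_transform(X)
    # Entry-wise MSE between the exact and approximated Gram matrices.
    results[name] = ((Phi @ Phi.T - K) ** 2).mean()
```

Repeating this for each \(m\) gives the data for the approximation-quality plot.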
## 4. Deliverables
- Source Code: A Python script or Jupyter Notebook (`kernel_bench.py`).
- Project Report: Including the plots and analysis mentioned above.
- Kaggle Submission: (Optional) Submit your RFF-based predictions to the Kaggle leaderboard and report your rank.
## Tips
- Use `sklearn.kernel_approximation` for optimized implementations, but try to write your own RFF sampler first to understand the math.
- For Nyström, the landmark points can be chosen via K-means for better coverage of the data manifold.
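For reference, the optimized pipeline mentioned in the first tip takes only a few lines with scikit-learn. The toy XOR-style target below is illustrative (it is a problem a linear model cannot solve without the kernel features), and the `gamma` value is an assumption:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # XOR-style nonlinear toy target

# RBFSampler is scikit-learn's RFF implementation; chaining it with a
# linear SGD classifier reproduces the Step 3 pipeline in two lines.
clf = make_pipeline(
    RBFSampler(gamma=0.5, n_components=1000, random_state=0),
    SGDClassifier(max_iter=50, random_state=0),
)
clf.fit(X, y)
```

Swapping `RBFSampler` for `Nystroem` in the same pipeline gives the Step 4 variant, which makes side-by-side benchmarking straightforward.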