MIT researchers proved you can delete 90% of a neural network and keep the accuracy. But almost nobody uses it in production.
"The Lottery Ticket Hypothesis" is one of the most elegant ideas in deep learning. Here's what it actually says (and why it matters now):
Train a dense network → prune 80-90% of weights by magnitude → reset the remaining weights to their original random values → retrain from scratch.
The result: a sparse network that performs just as well, sometimes even better.
The catch? It only works with the exact same initialization. Randomly reinitializing the sparse structure causes performance to collapse.
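Here's what that recipe looks like in code - a minimal PyTorch sketch, assuming a toy MLP and a made-up regression task; the `make_batch`/`train` helpers, the single one-shot 90% pruning pass, and the training budget are all illustrative, not the paper's setup:

```python
import copy
import torch
import torch.nn as nn

def make_batch(n=256):
    # toy regression task: predict the sum of 20 random features
    x = torch.randn(n, 20)
    return x, x.sum(dim=1, keepdim=True)

def train(model, masks=None, steps=500):
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(steps):
        x, y = make_batch()
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if masks is not None:  # keep pruned weights pinned at zero
            apply_masks(model, masks)

def apply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
init_state = copy.deepcopy(model.state_dict())  # remember the original random init

train(model)                                    # 1. train the dense network

masks = {}                                      # 2. prune 90% of each weight matrix by magnitude
for name, p in model.named_parameters():
    if "weight" in name:
        threshold = p.abs().flatten().quantile(0.9)
        masks[name] = (p.abs() > threshold).float()

model.load_state_dict(init_state)               # 3. rewind survivors to their original values
apply_masks(model, masks)                       #    ...and zero out the pruned 90%
train(model, masks)                             # 4. retrain the sparse "ticket"
```

Swap step 3 for a fresh random init and, per the paper, the sparse network no longer trains to the same accuracy.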
That made it impractical: you had to fully train the dense model just to find the subnetwork, and finding “winning tickets” typically required iterative pruning - repeating the train-prune-reset cycle 20-30 times.
But that’s starting to change:
→ Uber AI (2019) showed “supermasks” - winning-ticket masks that work without retraining
→ NVIDIA Ampere+ supports 2:4 structured sparsity in hardware
→ PyTorch 2.x added native semi-structured (2:4) sparsity support
Structured pruning wasn’t part of the original paper, but it’s how this theory is reaching production. Combined with quantization and distillation, it can deliver 4-8x faster inference on real hardware!
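To make the 2:4 pattern concrete, here's a hedged sketch that builds the mask by hand in plain PyTorch - keep the two largest-magnitude weights in every group of four. The shapes and the `two_four_mask` helper are my own illustration; the actual speedup needs NVIDIA's sparse kernels (e.g. via TensorRT or PyTorch's semi-structured sparse tensors), which this snippet doesn't touch.

```python
import torch

def two_four_mask(weight: torch.Tensor) -> torch.Tensor:
    """0/1 mask keeping the top-2 magnitudes in every group of 4 weights."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 sparsity needs the inner dim to be a multiple of 4"
    groups = weight.abs().reshape(out_features, in_features // 4, 4)
    top2 = groups.topk(2, dim=-1).indices          # the 2 largest |w| per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, top2, 1.0)
    return mask.reshape(out_features, in_features)

w = torch.randn(128, 256)
w_sparse = w * two_four_mask(w)
print((w_sparse == 0).float().mean())  # ~0.50: exactly half the weights are zero
```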
This matters even more for LLMs:
Recent one-shot pruning work (e.g. SparseGPT) shows models like LLaMA can be pruned to 50% sparsity or more with minimal accuracy loss, backing the idea that a large share of parameters may be redundant from the start.
Sparse models also tend to generalize better. The pruning literature calls this Occam’s Hill: a sweet spot between overfitting and underfitting.
Scaling remains a challenge - efficiently finding winning tickets in billion-parameter models is still compute-heavy - but the foundations are solid, and the tools are catching up.
Source: Stanislav Beliaev
Labels:
News
