Overfitting: When the Model Fits Noise Instead of Signal
Overfitting occurs when a model is calibrated so closely to historical data that it captures the specific random fluctuations of that period rather than the underlying patterns that would persist in new data. The more parameters a model has relative to the amount of data, the higher the overfitting risk. If you test 1,000 variations of a strategy (different parameter combinations, different signals, different exit rules) and select the best performer, the winner is almost certainly the one that got lucky on that historical sample — not the one with the most robust underlying logic.
The multiple comparison problem is the mathematical driver of overfitting: if you test 100 strategies with no predictive power, you expect about 5 to show statistically significant positive returns at the 5% significance threshold by pure chance. If you then pick the best of those 100 strategies, you are selecting the luckiest random performer, not a genuine winner. Harvey, Liu, and Zhu estimated that most published factor research fails to account for the multiple testing problem — a substantial fraction of published 'factors' may be overfitted to the specific historical samples studied.