Why Your Backtest Is Lying to You: Data Biases That Ruin Equity Research

You built a backtest. It returns 18% annualized over the last
decade. The Sharpe is 1.4. The drawdown is acceptable. You're
already mentally apartment-shopping in the city you'll move to
after the strategy makes you wealthy.

It's probably wrong. Not because your code has bugs — backtest
code can be flawless and still produce numbers that have almost
nothing to do with what would have happened if you'd traded the
strategy live. The reasons are subtle and they have names.

Survivorship bias

The most famous one. If your historical universe is "S&P 500
today, backtested over ten years," you've quietly excluded every
company that was in the S&P 500 at some point and got booted out
— usually because it failed.

The companies that died don't show up in the dataset. Your
historical "S&P 500" is actually "the winning subset of the
S&P 500 visible from the future." Returns are biased upward,
sometimes by several percentage points a year for small-cap or
emerging-market strategies.

Fix: use a point-in-time index membership database, or build
one. CRSP and similar paid datasets handle this; free options
require more work and willingness to scrape historical index
constituent lists.
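
If you do end up building your own membership table, the lookup itself is small. Here is a minimal sketch, assuming you have already collected add and removal dates for each ticker. The `Membership` record, the `constituents_as_of` helper, and the tickers are all hypothetical names made up for illustration, not part of CRSP or any vendor's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Membership:
    ticker: str
    added: date
    removed: Optional[date]  # None means the ticker is still in the index


def constituents_as_of(history: list[Membership], as_of: date) -> set[str]:
    """Return the tickers that were index members on `as_of`."""
    return {
        m.ticker
        for m in history
        if m.added <= as_of and (m.removed is None or as_of < m.removed)
    }


# Hypothetical membership records, for illustration only.
HISTORY = [
    Membership("AAA", date(2005, 1, 3), None),
    Membership("BBB", date(2008, 6, 2), date(2016, 3, 1)),  # dropped later
    Membership("CCC", date(2019, 9, 23), None),             # added later
]

universe_2014 = constituents_as_of(HISTORY, date(2014, 12, 31))
universe_2020 = constituents_as_of(HISTORY, date(2020, 12, 31))
```

The 2014 universe includes BBB even though it is gone from today's list, and excludes CCC even though it is there today. That asymmetry is the whole point: each rebalance sees only the names it could have seen at the time.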

Look-ahead bias

You computed the trailing-twelve-month P/E using today's reported
financials. Trouble: those financials weren't available at the
time. Earnings get reported 30–90 days after a quarter ends, and
restatements happen after that.

If you're ranking companies on March 31, 2018, you should be
using financials that were actually filed and disclosed by
March 31, 2018 — not the latest revision visible today.

Fix: use point-in-time fundamentals. EDGAR preserves the
original filing dates, so it can be done, but it requires care.
Many data vendors hand you the latest-available value with no
filing timestamp, and that missing timestamp is the source of
the bias.
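
The rule "only use what had been filed by the ranking date" can be sketched in a few lines. This assumes you have stored a filing date next to every set of numbers; the `Filing` record and `latest_known_filing` helper are hypothetical, and the EPS figures are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Filing:
    ticker: str
    period_end: date  # fiscal period the numbers describe
    filed_on: date    # date the filing became public
    eps: float        # earnings per share reported in that filing


def latest_known_filing(
    filings: list[Filing], ticker: str, as_of: date
) -> Optional[Filing]:
    """Most recent filing for `ticker` that was public on or before `as_of`."""
    visible = [f for f in filings if f.ticker == ticker and f.filed_on <= as_of]
    if not visible:
        return None
    # Prefer the latest fiscal period; break ties with the latest filing date,
    # so a restatement only counts once it has actually been filed.
    return max(visible, key=lambda f: (f.period_end, f.filed_on))


# Hypothetical filings, for illustration only.
FILINGS = [
    Filing("AAA", date(2017, 12, 31), date(2018, 2, 20), 1.10),
    Filing("AAA", date(2018, 3, 31), date(2018, 5, 5), 1.25),
    Filing("AAA", date(2017, 12, 31), date(2018, 8, 1), 0.90),  # restatement
]

known = latest_known_filing(FILINGS, "AAA", date(2018, 3, 31))
```

On March 31, 2018, the only visible record is the Q4 2017 filing with its original EPS. The Q1 2018 numbers and the later restatement both exist in the table, but neither had been filed yet, so the ranking never sees them.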

Universe selection bias

A close cousin of survivorship. If you backtest on "stocks that
have at least 10 years of clean fundamental data," you've
selected for stability — companies that didn't go bankrupt,
didn't merge, didn't get acquired, didn't change reporting
standards. Your universe is structurally biased toward survivors.

Fix: define your universe as a point-in-time set ("Russell
3000 as of the rebalance date") and accept that some constituents
will have noisy or incomplete data.

Slippage and transaction costs

You assumed you traded at the closing price with zero cost. In
reality, large orders move the market, bid-ask spreads exist,
and small-cap stocks can be impossible to fill at the quoted
price.

A strategy that backtests at 15% gross often nets only 8–10%
after slippage and costs. The smaller your average market cap
and the higher your turnover, the wider the gap.

Fix: model slippage explicitly. A reasonable starting point
is 20–50 basis points per round-trip for liquid stocks, 100+ bps
for small caps. Charge it to every trade.
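
To see how quickly costs eat a gross number, here is a rough sketch of a flat per-trade charge. It is a simple additive approximation, not a market-impact model: `net_return` is a hypothetical helper, and the turnover figure in the example is an assumption chosen for illustration.

```python
def net_return(gross_return: float, turnover: float, round_trip_bps: float) -> float:
    """Subtract a flat trading cost from one period's gross return.

    turnover: how many times the portfolio is fully replaced in the period
              (1.0 means the whole book is sold and rebought once).
    round_trip_bps: assumed cost of one buy-plus-sell, in basis points.
    """
    cost = turnover * round_trip_bps / 10_000
    return gross_return - cost


# Assumed example: 15% gross, the whole book replaced monthly (12x a year),
# 50 bps per round-trip.
example = net_return(0.15, turnover=12.0, round_trip_bps=50.0)  # about 0.09
```

Twelve full turnovers at 50 bps each cost about six points a year, which takes 15% gross down to roughly 9% net. Swap in 100 bps for a small-cap book and the same turnover costs twelve points.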

Backtest overfitting

You ran 200 parameter combinations and the best one delivered
22%. Congratulations: you've discovered noise in your dataset,
not signal. By construction, the best of 200 trials on the same
data will look impressive even if the underlying strategy has no
edge.
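
You can watch this happen with nothing but random numbers. The sketch below simulates 200 "strategies" whose daily returns are pure noise with zero true edge, then compares the best Sharpe ratio with the median one. The 1% daily volatility, the ten-year window, and the seed are arbitrary assumptions, and the function names are hypothetical.

```python
import math
import random
import statistics


def annualized_sharpe(daily_returns: list) -> float:
    """Annualized Sharpe ratio, assuming 252 trading days and a zero risk-free rate."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean / math.sqrt(variance) * math.sqrt(252)


def simulate_trials(n_trials: int, n_days: int, seed: int) -> list:
    """Sharpe ratios of `n_trials` strategies whose true edge is exactly zero."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_trials):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpes.append(annualized_sharpe(returns))
    return sharpes


sharpes = simulate_trials(n_trials=200, n_days=252 * 10, seed=7)
best = max(sharpes)
typical = statistics.median(sharpes)
```

The median sits near zero, as it should for strategies with no edge, while the best of the 200 sits well above it. Nothing about that winner is real; it is just the right tail of the noise.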

Fix: walk-forward testing. Develop on one window, test on a
later out-of-sample window you never touched during parameter
selection, then roll both windows forward and repeat. If the
out-of-sample periods underperform substantially, the rule was
likely overfit. Bonferroni-style corrections for multiple
testing apply here too.
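
The mechanics of rolling windows are easy to get subtly wrong, so here is a minimal sketch of the index bookkeeping, plus the plain Bonferroni adjustment. Both helpers are hypothetical, and the window sizes (60 periods of training, 12 of testing) are assumptions picked for illustration.

```python
from __future__ import annotations

from typing import Iterator


def walk_forward_splits(
    n_periods: int, train_size: int, test_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges; each test window starts where its train window ends."""
    start = 0
    while start + train_size + test_size <= n_periods:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


def bonferroni_threshold(alpha: float, n_trials: int) -> float:
    """Per-trial significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_trials


# Assumed example: 120 monthly observations, 5-year train, 1-year test.
splits = list(walk_forward_splits(n_periods=120, train_size=60, test_size=12))
threshold = bonferroni_threshold(alpha=0.05, n_trials=200)
```

Every test window sits strictly after the data used to pick parameters for it, and no test window overlaps another. With 200 trials, the Bonferroni bar for any single result drops from 0.05 to 0.00025, which is a useful reminder of how much evidence the "best" configuration actually needs.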

Calendar and currency cleanup

Subtler than the rest, but important:

Time zones. "Closing price on date X" depends on which exchange and which time zone X refers to. Cross-listed names bite first.
Corporate actions. Stock splits and dividends have to be back-adjusted consistently. Yahoo Finance gives you adjusted prices; sometimes the adjustments are off for small caps or delisted names.
Currency. A non-USD strategy backtest needs FX adjustment at the same point in time as the prices, not at today's rate.
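
The currency point is the easiest of the three to show in code. This is a minimal sketch with invented numbers: a stock whose local price is flat while its currency loses 20% against the dollar. The helper names are hypothetical, and the code assumes an FX rate exists for every price date.

```python
from __future__ import annotations

from datetime import date


def to_usd(local_prices: dict[date, float], usd_per_local: dict[date, float]) -> dict[date, float]:
    """Convert each local-currency price using the FX rate from the same date."""
    # Raises KeyError if a price date has no matching FX rate.
    return {d: price * usd_per_local[d] for d, price in local_prices.items()}


def simple_return(prices: dict[date, float]) -> float:
    """Return from the earliest price to the latest one."""
    ordered = [prices[d] for d in sorted(prices)]
    return ordered[-1] / ordered[0] - 1.0


# Hypothetical numbers, for illustration only.
PRICES_LOCAL = {date(2015, 1, 2): 100.0, date(2020, 1, 2): 100.0}
USD_PER_LOCAL = {date(2015, 1, 2): 1.25, date(2020, 1, 2): 1.00}

usd_return = simple_return(to_usd(PRICES_LOCAL, USD_PER_LOCAL))

# The biased version: every price converted at the most recent rate.
todays_rate = USD_PER_LOCAL[max(USD_PER_LOCAL)]
naive_return = simple_return({d: p * todays_rate for d, p in PRICES_LOCAL.items()})
```

Converted date by date, a dollar investor lost about 20%. Converted at today's rate, the same position looks flat, because the currency move has been erased from the history.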

If your backtest dramatically outperforms the broad market over
a long window, treat that as a red flag, not a celebration.
Persistent strategy edges are uncommon and small. A 5%-per-year
alpha is exceptional. A 15%-per-year alpha is almost always a
data bias.

The honest closing

A backtest is a hypothesis, not a result. The version of you who
will trade the strategy live doesn't have access to next year's
returns — only to today's beliefs. The point of correcting for
biases isn't to make the backtest look worse; it's to make the
backtest tell you something true about what to expect.

If, after correcting for all of the above, your strategy still
shows a clean edge: trade a small amount of real money for six
months and see if the live numbers match. If they don't, you've
learned the cheap version of the lesson everyone else learns the
expensive way.

Originally published at pickuma.com.
