Does backtesting work? / Why doesn't backtesting work? - Genius Mathematics Consultants (2024)

If a trading strategy seems to backtest successfully, why doesn’t it always work in live trading?

It’s widely acknowledged that a strategy that worked in the past may not work in the future. Market conditions change, other participants change their algorithms, adapt to your attempts to pick them off, and so on. This means you need to continually monitor and adjust even profitable strategies.

But there’s something even more problematic about backtesting strategies, which fewer people understand clearly. This is that a profitable backtest does not prove that a strategy “worked”, even in the past. This is because most backtests do not achieve any kind of “statistical significance”.

As everyone knows, it’s trivial to tailor a strategy that works beautifully on any given piece of historical data. It’s easy to contrive a strategy that fits the idiosyncratic features of a particular historical dataset, and then show that it is profitable when backtested. But when no mechanism actually exists relating the signal to future movement, the strategy will fail in live testing.

So how does one tell the difference? How can one show that a backtest is not only profitable, but statistically significant?

Statistical hypothesis testing in trading

If you’ve studied some basic statistics, you’ve probably heard of hypothesis testing.

In hypothesis testing, it’s not enough for a model to fit the data. It’s got to fit the data in a way that is “statistically significant”. This means that it’s unlikely that the model would fit the data to the extent that it does, by chance or for any other reason than that the model really is valid. The only way for the model not to be valid is to invoke an “unlikely coincidence”.

One proposes some hypothesis about the data, and then considers the probability (called the p-value) of seeing agreement between the data and the hypothesis at least as strong as the observed agreement by chance alone, that is, if the hypothesis had no real validity. By convention, if the p-value is less than 5%, the result is considered statistically significant.

It’s worthwhile to place backtesting within this framework of hypothesis testing to help understand what, if anything, we can really conclude from a given backtest.

Coin toss trading

Let’s keep it simple to start with. Let’s suppose we have an algorithm which predicts, at time steps $t_1,\ldots,t_n$, whether the asset will subsequently move up (change $\geq 0$) or down (change $<0$) over some time interval $\Delta T$. We then run a backtest and find that our algorithm was right some fraction $0 \leq x \leq 1$ of the time.

If our algorithm was right more than half of the time during the backtest, what’s the probability that it was right only by chance? If each prediction is treated as a fair coin toss, this is calculated using the binomial distribution. To see some numbers, let’s suppose our algorithm makes 20 predictions ($n=20$) and is right for 12 of them. The probability of getting 12 or more right entirely by chance is about 25%. The probability of getting 14 or more right by chance is about 5.8%, which is approaching statistical significance according to convention, and for 15 or more it falls to about 2.1%. The idea is that it’s “unlikely” that our strategy is right by chance, and therefore the mechanism proposed by the strategy is likely correct. So if our algorithm got 15 or more correct during the backtest, we’re in the money, right? Not so fast.
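
The tail probabilities quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes, as the text does, that each prediction is an independent fair coin toss; `binomial_tail` is a name chosen for this example.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials that each
    succeed with probability p (a one-sided binomial p-value)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 20 predictions, varying numbers of correct calls.
for correct in (12, 14, 15):
    print(f"{correct}/20 correct: p = {binomial_tail(20, correct):.4f}")
```

This prints p-values of about 0.2517, 0.0577 and 0.0207. The same function gives `binomial_tail(20, 20)` $= 0.5^{20} \approx 9.5 \times 10^{-7}$, the figure behind the bitcoin example that follows.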

To take an extreme example, let’s suppose that our piece of historical data was a spectacular bitcoin bull run that went up 20 times in a row. And let’s suppose that our strategy is “Bitcoin only goes up!” Then our calculation above would seem to prove that the strategy works, with a p-value of $0.5^{20} \approx 0.0001\%$! What’s gone wrong here?

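Standard p-value calculations rest on assumptions about the “noise” in the data; when calculating the p-value for a linear regression, for example, one usually assumes that the noise is random, independent and normally distributed. One mistake we have made in the above analysis is assuming that the actual price trajectory is like a fair coin toss: equally likely to go up or down at each step, independently of what came before. But market movements are not independent coin tosses. They can, for example, be highly autocorrelated, and they can go up in a highly non-random way for quite some time before turning around and going down. During such a run, a strategy that simply bets on the prevailing direction will be right far more than half the time without having any genuine predictive power.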
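
One way to drop the fair-coin assumption is to build the null distribution from the data itself. The sketch below is our own illustration (the function names are hypothetical): it shifts the predictions circularly in time against the outcomes and asks how often a misaligned, and therefore uninformative, version scores as well as the real alignment. Circular shifts keep the proportion of up moves and most of the autocorrelation in both series. With $n$ periods there are only $n$ alignments, so the smallest attainable p-value is $1/n$.

```python
def hit_rate(predictions, outcomes):
    """Fraction of periods where the predicted direction matched the outcome."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

def circular_shift_p_value(predictions, outcomes):
    """Share of circular shifts of the predictions that score at least as well
    as the real alignment (the real alignment counts once, so p >= 1/n)."""
    n = len(outcomes)
    actual = hit_rate(predictions, outcomes)
    at_least_as_good = 1  # the unshifted alignment itself
    for shift in range(1, n):
        shifted = predictions[shift:] + predictions[:shift]
        if hit_rate(shifted, outcomes) >= actual:
            at_least_as_good += 1
    return at_least_as_good / n

# A 20-step bull run (+1 = up) and the strategy "Bitcoin only goes up!"
bull_run = [1] * 20
always_up = [1] * 20
print(circular_shift_p_value(always_up, bull_run))  # 1.0: no evidence of skill
```

Every shift of a constant prediction scores perfectly, so the p-value is 1 rather than 0.0001%: the strategy did no better than its own time-shifted copies.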

Secondly, we presumably looked at the data before deciding on the strategy. If you’re allowed to look at the data first, it’s easy to contrive a strategy that exactly matches what the data happened to do. In this case, it’s not “unlikely” that our strategy is profitable by mere coincidence, because we simply chose the strategy that we could see matched the data.

Another thing that can destroy statistical validity is testing multiple models. Suppose we accept any model whose p-value is below 0.05, so that a worthless model has only a 5% chance of appearing correct by chance. But now suppose you test 20 different models. Suddenly it’s quite likely that one of them will backtest successfully by chance alone: if the tests are independent, the probability that at least one passes is $1 - 0.95^{20} \approx 64\%$. This sort of scenario can easily arise when one tests a strategy for many different choices of parameter and chooses the one that works. This is why strategy “optimization” needs to be done carefully.
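
The arithmetic behind this warning is short. The sketch below assumes the tests are independent (tests of closely related parameter settings usually are not, so treat the first figure as a rough guide). It also shows the Bonferroni correction, a standard and deliberately conservative remedy that we add here for illustration; the text itself does not prescribe a particular fix.

```python
def chance_of_false_positive(alpha: float, num_models: int) -> float:
    """Probability that at least one of num_models independent tests of
    worthless strategies passes at significance level alpha."""
    return 1 - (1 - alpha) ** num_models

def bonferroni_threshold(alpha: float, num_models: int) -> float:
    """Per-model p-value threshold that keeps the overall chance of any
    false positive at or below alpha (Bonferroni correction)."""
    return alpha / num_models

print(f"{chance_of_false_positive(0.05, 20):.3f}")  # 0.642
print(f"{bonferroni_threshold(0.05, 20):.4f}")      # 0.0025
```

In other words, after trying 20 parameter settings, a single winner would need a p-value below 0.0025, not 0.05, before it deserves the same confidence.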

So how do you backtest successfully?

In practice, we wouldn’t just be checking whether the asset goes up or down. Instead, we’d likely check, across all pairs of buy and sell decisions, whether the total of sell price minus buy price amounted to a profit greater than that of buy and hold. We would then ask: what is the probability that this apparent fit occurred by chance, and the strategy doesn’t really work? If it seems unlikely that the observed fit could be a coincidence, we may be onto a winner.
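
One way to put a number on that question is a Monte Carlo permutation test. The sketch below rests on assumptions of our own: positions are 1 (long) or 0 (flat) in each period, returns are per-period log returns so that they add, and, rather than comparing directly with buy and hold, the strategy is compared with randomly timed positions that spend the same number of periods in the market. That separates timing skill from simple market exposure. Shuffling ignores autocorrelation in the returns, the same caveat raised above for the coin-toss model. The function names are hypothetical.

```python
import random

def strategy_return(positions, log_returns):
    """Total log return from being long (1) or flat (0) in each period."""
    return sum(pos * r for pos, r in zip(positions, log_returns))

def random_timing_p_value(positions, log_returns, trials=10_000, seed=0):
    """Estimate how often randomly timed positions, with the same number of
    periods in the market, earn at least as much as the strategy."""
    rng = random.Random(seed)
    actual = strategy_return(positions, log_returns)
    shuffled = list(positions)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if strategy_return(shuffled, log_returns) >= actual:
            at_least_as_good += 1
    # Count the real strategy as one of the draws so the estimate is never zero.
    return (at_least_as_good + 1) / (trials + 1)

# Hypothetical data: a strategy that is in the market for exactly the up periods.
log_returns = [0.01, -0.02] * 50
positions = [1, 0] * 50
print(random_timing_p_value(positions, log_returns))
```

A perfectly timed strategy like this one gets a p-value near the minimum of $1/(\text{trials}+1)$, while a strategy that is always in the market gets exactly 1, since every shuffle of it is identical.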

On the other hand, a trader may have some external or pre-existing reason for believing that a strategy could work. In this case, he/she may not require the same degree of statistical significance. This is analogous to Bayesian statistics, where one incorporates a prior belief into the statistical analysis.
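
To make the Bayesian idea concrete, here is a minimal sketch that places a Beta prior on the strategy’s hit rate, a standard conjugate choice we have picked for illustration. A flat prior treats every hit rate as equally plausible; a skeptical prior such as Beta(20, 20) concentrates belief near 50%, so the same backtest moves it less. The function name is hypothetical, and the closed form used here only applies to integer prior parameters.

```python
from math import comb

def prob_hit_rate_above_half(correct, total, prior_a=1, prior_b=1):
    """Posterior probability that the true hit rate exceeds 50%, starting
    from a Beta(prior_a, prior_b) prior on the hit rate (integers only).

    Uses the identity P(Beta(a, b) > 1/2) = P(Binomial(a + b - 1, 1/2) <= a - 1).
    """
    a = prior_a + correct
    b = prior_b + (total - correct)
    m = a + b - 1
    return sum(comb(m, i) for i in range(a)) / 2**m

# 14 correct out of 20, with a flat prior versus a skeptical one centred on 50%.
print(f"flat prior:      {prob_hit_rate_above_half(14, 20):.3f}")
print(f"skeptical prior: {prob_hit_rate_above_half(14, 20, 20, 20):.3f}")
```

With the flat prior the posterior probability that the hit rate beats 50% is about 0.96; the skeptical prior gives a noticeably lower figure. A trader with a genuine pre-existing reason to believe in the strategy would instead encode it as a prior tilted above 50%.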

Now, HFT (high-frequency trading) backtests can often achieve statistical significance much more easily, because of the large amount of data and the large number of buy/sell decisions made in a short space of time. More pedestrian strategies will have a harder time.
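
To see why sample size matters so much, hold the hit rate fixed at 55% (an arbitrary figure chosen for illustration) and let the number of independent predictions grow. This sketch reuses the fair-coin null from the coin-toss section, so it inherits the same caveats about autocorrelation and base rates.

```python
from math import comb

def binomial_tail_half(n, k):
    """P(k or more correct out of n) if every call is a fair coin flip."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# The same 55% hit rate, measured over more and more trades.
for n in (20, 200, 2000):
    k = 55 * n // 100
    print(f"{n:>5} trades, {k:>4} correct: p = {binomial_tail_half(n, k):.2g}")
```

The p-value falls from about 0.41 with 20 trades to well below one in ten thousand with 2000, even though the edge per trade is unchanged.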

So does machine learning work for trading?

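People often ask whether machine learning techniques are effective for developing trading strategies. The answer is: it depends on how they’re applied. When machine learning models are fit to data, the statistics that measure how well they fit (p-values, accuracy scores, backtested returns) are vulnerable to all the issues we’ve discussed. A flexible model can fit the idiosyncratic features of a particular dataset, and trying many model designs or parameter settings amounts to testing many models. Therefore, some care is needed to ensure the models are in fact statistically significant.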
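
A small simulation illustrates the danger. Everything here is an assumption made for illustration: the outcomes and 200 candidate signals are all independent coin flips, so no candidate has any real predictive power, and “training” simply means keeping the candidate with the best in-sample hit rate. Selecting the best of many candidates is the multiple-testing problem described earlier, just automated.

```python
import random

rng = random.Random(42)
N_TRAIN, N_TEST, N_CANDIDATES = 100, 2000, 200

def coin_flips(n):
    """n independent up (+1) / down (-1) moves."""
    return [rng.choice((-1, 1)) for _ in range(n)]

def hit_rate(predictions, outcomes):
    """Fraction of periods where the predicted direction matched the outcome."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

# Outcomes and every candidate signal are pure noise: there is nothing to learn.
train_outcomes, test_outcomes = coin_flips(N_TRAIN), coin_flips(N_TEST)
candidates = [(coin_flips(N_TRAIN), coin_flips(N_TEST)) for _ in range(N_CANDIDATES)]

# "Model selection": keep whichever candidate scored best on the training data.
best_train, best_test = max(candidates, key=lambda c: hit_rate(c[0], train_outcomes))
in_sample = hit_rate(best_train, train_outcomes)
out_of_sample = hit_rate(best_test, test_outcomes)
print(f"in-sample hit rate:     {in_sample:.2f}")
print(f"out-of-sample hit rate: {out_of_sample:.2f}")
```

The selected signal typically looks well above 50% in-sample and falls back to roughly 50% on fresh data, exactly the pattern of a backtest that fails in live trading.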

FAQs

Why does backtesting not work?

The Limitations of Backtesting Trading Strategies

Just because a strategy performed well in the past using historical data doesn't guarantee it will translate to success in the future. Market conditions, investor sentiment, and regulations can change drastically, rendering past performance irrelevant.

What are the disadvantages of backtesting?

Disadvantages of backtesting

Because the outcome of backtesting relies on a simulation, it's subject to biases. Investors can end up tuning the strategy to the data until it produces a desirable result, without realizing they're doing it. To avoid this bias, it's important to design the strategy before looking at the data used to test it.

How accurate is backtesting?

Limited data quality: Backtesting relies on historical data, and the quality and accuracy of the data used can have a significant impact on the results. Data may contain errors, gaps, or other inconsistencies, which can distort the backtest results and lead to inaccurate conclusions about the strategy's performance.
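
As a rough illustration of catching such problems before they distort a backtest, here is a minimal sketch of a few sanity checks on daily closing prices. The function name, the `(date, close)` input format and the four-day gap threshold are all assumptions made for this example, not a standard.

```python
from datetime import date

def find_data_problems(bars, max_gap_days=4):
    """Flag suspicious rows in a list of (date, close_price) pairs.

    Hypothetical checks: non-positive prices, dates out of order or duplicated,
    and gaps longer than max_gap_days (which allows for weekends and short
    holidays in daily data)."""
    problems = []
    for i, (day, price) in enumerate(bars):
        if price <= 0:
            problems.append((day, "non-positive price"))
        if i > 0:
            prev_day = bars[i - 1][0]
            if day <= prev_day:
                problems.append((day, "out of order or duplicated date"))
            elif (day - prev_day).days > max_gap_days:
                problems.append((day, f"gap of {(day - prev_day).days} days"))
    return problems

# Hypothetical data with a duplicated row, a bad price and a missing stretch.
bars = [
    (date(2024, 1, 2), 100.0),
    (date(2024, 1, 3), 101.5),
    (date(2024, 1, 3), 101.5),
    (date(2024, 1, 15), -1.0),
]
for day, problem in find_data_problems(bars):
    print(day, problem)
```

Checks like these only catch obvious errors; subtler issues such as survivorship bias or unadjusted corporate actions need separate handling.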

Can you optimize a trading strategy without backtesting?

What I've tried to demonstrate here is that it is possible to perform numerical optimization of trading strategies without having to backtest the alpha as part of the process. This is critically important, as alphas with realistic Information Coefficients will often show periods of underperformance.

Do professional traders backtest?

Unlike retail traders who dabble with different strategies without knowing whether they work, professional traders only employ strategies that backtesting suggests have an edge in the market, and then execute them in the right way and at the right time.

How much backtesting is enough?

When you are backtesting a strategy on a higher timeframe, you will have to go back 6 to 12 months. Ideally, you want to end up with 30 to 50 trades in your backtest to get a meaningful sample size. Anything below 30 trades does not have enough explanatory power.
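
The 30-trade rule of thumb can be compared with the coin-toss test from the article above. A minimal sketch, assuming independent trades and a fair-coin null; the function name is hypothetical:

```python
from math import comb

def min_correct_for_significance(n: int, alpha: float = 0.05) -> int:
    """Smallest k such that getting k or more of n up/down calls right by
    pure coin-flip luck has probability below alpha (one-sided binomial test).
    Returns n + 1 if even a perfect record would not be significant."""
    tail = 0  # running count of outcomes with at least k correct
    for k in range(n, -1, -1):
        tail += comb(n, k)
        if tail / 2**n >= alpha:
            return k + 1
    return 0  # only reached if alpha > 1

for trades in (20, 30, 50):
    k = min_correct_for_significance(trades)
    print(f"{trades} trades: need {k} correct ({k / trades:.0%})")
```

For 20 trades this gives 15 correct (75%), matching the coin-toss example, and for 30 trades it gives 20 correct (about 67%). A larger sample lowers the hit rate needed, but only if the trades are genuinely independent.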

What is the opposite of backtesting?

Backtesting is the process of running your strategy on historical data, essentially replaying your past strategic work. Forward testing, its opposite, runs your strategy in real time as your charts refresh with new data.

How do you backtest efficiently?

Here are some tips to ensure effective backtesting:

Consider different market scenarios. ...
Aim to keep volatility as low as possible. ...
Backtest using a relevant set of data. ...
Customise backtesting parameters to meet your specific needs to get accurate results. ...
Be careful about over-optimisation.

What is the assumption of backtesting?

A successful backtest shows traders a strategy that has produced positive results historically. While the market never moves in exactly the same way twice, backtesting relies on the assumption that stocks will move in patterns similar to those they followed historically.

Is backtesting valid?

Understanding Backtesting

A well-conducted backtest that yields positive results gives traders some evidence that the strategy is sound and may yield profits when implemented in reality. In contrast, a well-conducted backtest that yields suboptimal results will prompt traders to alter or reject the strategy.

Is backtesting necessary?

Backtesting is important because it allows traders to assess the potential profitability and viability of a trading strategy before implementing it in real-time trading. It helps identify flaws, refine strategies, and make informed decisions based on historical performance.

What is the best platform to backtest trading?

5 Best Stock Backtesting Platforms of 2024

Backtesting Tool	Price	Best For
Trade Ideas	$228/month	Good for those wanting AI insights and intuitive use
FinViz	$39.50/month	Best for traders using stock screening with a focus on price action
QuantConnect	Free	Perfect for quantitative and algorithmic traders

Which trading strategy is most accurate?

Trend trading strategy. In this strategy, a trader uses technical analysis to identify a trend and only enters trades in the direction of that pre-determined trend. Trading with the trend is a famous principle and is often described as one of the most accurate in the markets.

Can you trade without backtesting?

It's important to note that backtesting isn't a guarantee that a strategy will be successful in the current market. Past results are never a fool-proof indicator of future performance. Rather, it's part of doing your due diligence before opening a position.

How long does it take to backtest a trading strategy?

To backtest, a trader will typically need several weeks of historical data for strategies where the trades are short-term in nature. Many years of historical data may be required if testing a long-term strategy.

Why is my trading strategy not working?

Too many variables make your trading strategies stop working

The more rules and parameters you put into your strategy, the more likely you are to curve-fit it. The simpler you make it, the better. A system might be so complex that it has no predictive value, and a slight change in the market might turn the strategy into a loser.

Is EA backtesting accurate?

Depending on the trading platform on which the backtest is performed, the spread may or may not be taken into account. If it is not, the backtest can be badly distorted: the tester uses only the bid price, the ask price, or the average price to calculate performance, ignoring the cost of buying at the ask and selling at the bid.
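
A minimal sketch of why this matters, assuming a constant spread centred on the mid price (real spreads vary over time, and the function name is hypothetical): a long trade buys at the ask and sells at the bid, so each round trip costs one full spread relative to mid prices.

```python
def round_trip_pnl(entry_mid, exit_mid, spread, quantity=1.0):
    """Profit on a long round trip that buys at the ask and sells at the bid,
    assuming a constant spread centred on the mid price (a simplification)."""
    buy_price = entry_mid + spread / 2   # pay the ask on entry
    sell_price = exit_mid - spread / 2   # receive the bid on exit
    return (sell_price - buy_price) * quantity

# A hypothetical trade that gains 0.0003 on mid prices.
for spread in (0.0, 0.0002, 0.0004):
    print(f"spread {spread:.4f}: P&L {round_trip_pnl(1.1000, 1.1003, spread):+.5f}")
```

With no spread the trade looks profitable; a spread of 0.0002 removes two thirds of the gain, and 0.0004 turns it into a loss. For a strategy that trades often, this difference compounds across every trade in the backtest.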

Why does algo trading fail?

Lack of Human Oversight

The lack of human oversight in algorithmic trading, wherein investors rely solely on automated systems, poses a significant risk of unanticipated losses, particularly in the face of extreme market conditions.
