The authors believe these results show that almost all of these strategies have an unavoidable tilt toward value and small caps, and therefore show superior back-tests. Additional biases might be at play, as explained in an NBER working paper released this summer. The paper stated that if you take 3 or 4 random signals with no real predictive power and combine them into a strategy, you will typically find strong back-tested performance. The false outperformance from picking the best 3 out of 10 random signals is as bad as data-mining the best performing signal from approximately 1000 choices. The bias worsens if you then weight the signals by how strong they are, rather than weighting them equally.
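To make the selection effect concrete, here is a minimal simulation sketch. It is not the working paper's method. The function names, the 240-period sample, and the use of Gaussian noise are all illustrative assumptions. Each "signal" is a return stream with a true mean of zero, so any back-tested edge comes purely from choosing the winners after the fact.

```python
import math
import random


def t_stat(returns):
    """t-statistic of the mean return against zero."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var / n)


def best_k_of_n_t_stat(rng, n_signals=10, k=3, n_periods=240):
    """Back-test t-stat of an equal-weight mix of the k best of n noise signals."""
    # Assumption: every signal is pure zero-mean noise with no predictive power.
    signals = [[rng.gauss(0.0, 1.0) for _ in range(n_periods)]
               for _ in range(n_signals)]
    # Selection step: keep the k signals with the highest in-sample t-stat.
    chosen = sorted(signals, key=t_stat, reverse=True)[:k]
    # Combine the chosen signals with equal weights, period by period.
    combined = [sum(period) / k for period in zip(*chosen)]
    return t_stat(combined)


def simulate(n_trials=500, seed=0):
    """Average back-test t-stat with selection (best 3 of 10) and without (3 of 3)."""
    rng = random.Random(seed)
    selected = [best_k_of_n_t_stat(rng, n_signals=10, k=3) for _ in range(n_trials)]
    unselected = [best_k_of_n_t_stat(rng, n_signals=3, k=3) for _ in range(n_trials)]
    return sum(selected) / n_trials, sum(unselected) / n_trials


if __name__ == "__main__":
    with_selection, without_selection = simulate()
    print(f"average t-stat, best 3 of 10: {with_selection:.2f}")
    print(f"average t-stat, 3 of 3 (no selection): {without_selection:.2f}")
```

Under these assumptions, the average t-statistic with selection should come out well above zero, while the unselected combination should hover around zero, even though none of the signals has any real edge.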
These biases are evident in recent research published by Astor, in which we compared a few smart beta indices’ performance before and after they went “live”. We discovered that, on average, alpha decreases while beta (the return from market risk) increases, demonstrating how out-of-sample “smart” beta may have more “dumb” beta than you think.
Does this mean all smart beta strategies are inherently flawed? Not at all! Factors have been used by quants and active money managers for decades and true factors have proven to work in different economic environments, as well as internationally. However, to avoid getting lost in the factor marketplace, here are some rules of thumb:
- Evaluate the Factors: Other than empirical testing, is there a credible reason for this factor premium to exist? Is it highly researched? Has it delivered positive returns over a long history?
- Watch your Back (test): Compare the volatility, beta, and alpha of the historically simulated index before it went live with the ETF’s live performance.
- Check for Data-Mining: This applies especially to composite smart-beta strategies that combine several factors. Go under the hood and evaluate each signal individually. Determine whether each has predictive power on its own or whether it has been data-mined from hundreds of possible options.
- Have Stricter Thresholds: Some studies recommend requiring a high threshold, such as a Sharpe ratio of 3.0 or higher, to adjust for these biases.
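The back-test comparison in the second rule can be sketched in a few lines. This is only an illustrative outline under stated assumptions: the function names `capm_stats` and `compare_backtest_to_live` are hypothetical, the inputs are assumed to be per-period excess returns (over the risk-free rate) for the strategy and for a market benchmark, and alpha and beta come from a simple one-factor regression on the market.

```python
import math


def capm_stats(strategy, market):
    """Per-period alpha, beta, and volatility from a one-factor regression.

    Both inputs are equal-length lists of per-period excess returns (assumed).
    """
    n = len(strategy)
    mean_s = sum(strategy) / n
    mean_m = sum(market) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(strategy, market)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market) / (n - 1)
    var_s = sum((s - mean_s) ** 2 for s in strategy) / (n - 1)
    beta = cov / var_m                 # sensitivity to the market
    alpha = mean_s - beta * mean_m     # return not explained by the market
    return {"alpha": alpha, "beta": beta, "volatility": math.sqrt(var_s)}


def compare_backtest_to_live(bt_strategy, bt_market, live_strategy, live_market):
    """Change in each statistic from the simulated period to the live period."""
    before = capm_stats(bt_strategy, bt_market)
    after = capm_stats(live_strategy, live_market)
    return {name: after[name] - before[name] for name in before}
```

A negative change in `alpha` together with a positive change in `beta` would match the pattern described above, where the live index carries more market risk and less excess return than its simulated history suggested.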