Five Questions: Machine Learning in Investing with Kevin Zatloukal

Artificial intelligence and machine learning are going to change investing in many ways. The ability of computers to think, analyze data, and react more and more like human beings has the potential to change everything from how advisors interact with their clients to how investment strategies are created to the fees that are charged for investment advisory services.

While very few people dispute the potential for these technologies in investing, there is significant disagreement about whether and to what degree they can enhance factor-based strategies and other strategies that invest using company fundamentals.

Despite spending a lot of time studying AI, I am still very new to it, so I am certainly not the person to help provide these answers. We are fortunate to be joined for our latest interview, however, by someone who can. Kevin Zatloukal teaches computer science at the University of Washington. He has a PhD in computer science from MIT and worked at both Microsoft and Google. He is also the second person selected for O’Shaughnessy Asset Management’s new research partner program. His first paper written in collaboration with OSAM, “ML & Investing Part 1: From Linear Regression to Ensembles of Decision Stumps”, offers an excellent look at how machine learning can be applied in the world of investing and I highly recommend you read it if you haven’t already.

Jack: Thank you for taking the time to talk to us.

You have an interesting perspective as someone outside the world of investing with an extensive knowledge of machine learning. One of the things I often wonder about is how machine learning will change the investing world, and which areas it will have the most impact in. Those of us inside the world of investing are prone to have our own biased thoughts on what machine learning may or may not mean, but in many ways we may be too close to the issue to see it clearly. When you look at the impacts that machine learning may have in investing five or ten years out, what areas do you think it will impact the most? What problems in investing do you think it is most suited to solve?

Kevin: Let me first say that I’m not sure that my predictions about this should be trusted any more than anyone else’s. As Yogi Berra said, predictions are hard, especially about the future.

My best argument for the use of machine learning becoming widespread among asset managers is that it is already happening at just about every other company in America. We all want to use the data now available to make better decisions, and machine learning is a key enabler for that, especially when there is too much data for humans to examine. I can’t see any reason to expect asset management companies to be different.

My best argument in the other direction would be to point out that there seems to be a minimum level of accuracy required before human beings are willing to trust machine learning. Once the accuracy is sufficiently high, when the machine makes a prediction that doesn’t make sense to us, I think we instead assume that we are the ones who have made a mistake (that becomes the most likely explanation) rather than the machine being wrong. Users report it feeling like “magic” when that level of accuracy is reached… that the machine has suddenly become intelligent. (Here, machine learning is teaching us new things about human psychology as well!)

While we can achieve high accuracy in predicting the word being spoken or the move most likely to win in a game of Go, it could be that the stock market contains enough inherent randomness that the level of accuracy required for people to trust machine stock picks simply isn’t achievable.

I suppose, if I had to make a prediction about the future ten years from now, I would guess that both things are true: use of machine learning is widespread among asset managers, but the humans picking asset managers still tend to pile money into whoever had the best returns last year.

As for which area of investing machine learning will have the biggest impact in, it’s really hard to say. We don’t usually know how well ML is going to work on any problem until we try. To quote Elements of Statistical Learning: “it is seldom known in advance which procedure will perform best or even well for any given problem” (emphasis added).

Jack: One of the interesting ways you have utilized machine learning is in fantasy football. You showed a simple example of predicting future NFL performance for wide receivers using college performance data and physical attributes in your paper with OSAM, but I am assuming the models you use in the real world are more complex. Fantasy football is an interesting proving ground for this technology because it puts it up against human beings who have the same data. Without giving away any of your secrets, can you talk about how you have applied machine learning to fantasy football and how it has performed relative to your human competitors?

Kevin: The models I use in the “real world” are not much more complicated, if at all. In fact, for wide receivers, I’ve found that a simple model using only two features — draft pick and career market share of team yards from scrimmage (one measure of college production) — works well. I probably lean on that model more than any other.

The main reason why simple models work so well in fantasy football is that a significant fraction of the edge to be found there is behavioral (rather than informational). Sticking to the models and ignoring the noise gives you a big advantage, just as with quantitative approaches to investing. When the draft comes around, your competitors are often over-reacting to all sorts of information, while the simple model (and its strong history of accuracy) tells you that none of that really matters.

Your competitors will pass on those players and let you acquire them for less than the model says they are worth (their intrinsic value). It is basic value investing.

Overall, I would say that results have been surprisingly good. I’ve found myself relying more on models over time rather than less.

However, there is clearly a “paradox of skill” problem in fantasy football just as much as in investing. As I’ve done well, I’ve gravitated toward leagues with more skillful players, many of whom rely heavily on their own models. As the level of competition rises, the results depend more on luck since the competitors are more evenly matched.

Most of my competitors have similar models to the one I described above, so I’ve had to continue to find edges in new areas. Last year, I improved my models for how to price players in auction leagues, and that worked well. We’ll see what I can come up with this year.

Jack: A recent Research Affiliates paper argued that the effectiveness of using machine learning to predict future stock returns will be limited because there isn’t enough data. The paper put it this way: “Today, we have about 55 years of high-quality equity data (or less than 700 monthly observations) for many of the metrics in each of the stocks we may wish to consider. This tiny sample is far too small for most machine learning applications, and impossibly small for advanced approaches such as deep learning.” Do you agree with this assessment?

Kevin: No, I strongly disagree with that assessment.

Let’s go back to the fantasy football example. Instead of 55 years of data, I only have 20 years of data for fantasy football (because the game has changed quite a bit from earlier years). And instead of thousands of companies and 12 months of data from each year, there are only 10–20 receivers drafted each year. As a result, my wide receiver data set has only a few hundred examples in it, whereas we have hundreds of thousands of examples in investing!
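To make the size comparison above concrete, here is a rough back-of-the-envelope sketch. The 55 years, 12 months, and 20 years come from the discussion; the 1,000 companies (a conservative floor for "thousands") and 15 receivers per year (the midpoint of 10–20) are illustrative assumptions, not figures from the interview, and the function names are hypothetical.

```python
# Back-of-the-envelope count of training examples in each data set.
# Inputs marked "assumed" are illustrative, not taken from the interview.

YEARS_EQUITY = 55         # years of high-quality equity data (Research Affiliates quote)
MONTHS_PER_YEAR = 12      # one observation per company per month
COMPANIES = 1_000         # assumed: conservative floor for "thousands of companies"

YEARS_FOOTBALL = 20       # years of usable fantasy football data
RECEIVERS_PER_YEAR = 15   # assumed: midpoint of "10-20 receivers drafted each year"


def equity_examples(years: int, months: int, companies: int) -> int:
    """Company-month observations, assuming every company has a full history."""
    return years * months * companies


def football_examples(years: int, receivers: int) -> int:
    """One example per drafted wide receiver."""
    return years * receivers


monthly_observations = YEARS_EQUITY * MONTHS_PER_YEAR
equity = equity_examples(YEARS_EQUITY, MONTHS_PER_YEAR, COMPANIES)
football = football_examples(YEARS_FOOTBALL, RECEIVERS_PER_YEAR)
ratio = equity // football

print(f"Monthly observations per stock: {monthly_observations}")
print(f"Equity examples:   {equity:,}")
print(f"Football examples: {football:,}")
print(f"Equity data set is about {ratio:,}x larger")
```

Under these assumptions the per-stock count is 660 monthly observations, in line with the quoted "less than 700", while pooling across companies gives a data set in the hundreds of thousands against a few hundred receivers. Real equity panels are unbalanced, since not every company has a full history, so the true count depends on the universe chosen.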

Yet, machine learning approaches work extremely well in fantasy football. In fact, the lack of data was the primary reason that I had to turn to more sophisticated machine learning algorithms in trying to analyze fantasy football in the first place. Classic techniques like linear regression couldn’t cope with the lack of data, but other machine learning approaches could.

Stepping back a bit, I think that trying to separate machine learning methods from the classical approaches used in the finance literature, like linear regression, is really the wrong way to look at this….