Why Data Scientists Can Make Great Algorithmic Traders - Interview with Jason Strimpel

Oct 1

**Disclaimer: Quant Science is not a registered investment adviser under the Investment Advisers Act or a commodity trading advisor under the Commodity Exchange Act. The information provided is for educational and informational purposes only and does not constitute investment, financial, or trading advice.

**Disclosure: This post contains affiliate links. As an affiliate of Quant Science, I may receive a commission if you sign up or make a purchase using my links, at no additional cost to you. All opinions expressed are my own and based on my personal experience.

What if the skills you’ve been honing in data science gave you a head start in taking up algorithmic trading? I was so lucky to have the opportunity to speak with Jason Strimpel, Founder of PyQuant News and Co-founder of Quant Science. He has been in trading and technology for 25 years. He was a hedge fund trader, risk quant, machine learning engineering manager, and GenAI specialist at AWS. He is now the Managing Director of AI and Advanced Analytics at a major consulting company.

Since he’s well-versed in both areas, Jason was the perfect person to ask about the intersection of data science and algorithmic trading. This article is a recap of the conversation. I asked questions, he shared insights, and I’ll sprinkle in a few of my own reflections along the way. All my comments will have my name in front of them. Otherwise, the words are either direct quotes from Jason or a paraphrase.

Jason gave his thoughts on:

Which data science skills actually transfer into trading (hint: it’s not just the technical stuff)
Why building a trading strategy feels a lot like building a machine learning model
The messy reality of financial market data (it’s definitely not your textbook dataset)
Pitfalls like overfitting and survivorship bias that trip up even smart quants
The mindset shift data scientists need to survive the markets
Jason’s go-to tools if you want to dip your toes into trading today

Transferable Skills: From Messy Data to Messy Markets. “What are the immediately transferable skills for a data scientist considering algorithmic trading?”

Jason started with the obvious: the technical side. You already know how to work with tools and data, and for the stock market, that’s a good start. There are libraries for almost everything now, so you’ll be able to find your way to results. The harder question is “how do you interpret those results in the context or domain?” The discipline and rigor we’ve honed as data scientists are very transferable here. He noted that data science leans toward statistical analysis, while quantitative finance is typically associated with stochastic calculus, so the underlying concepts are a little different, but he believes data scientists can pick them up. What matters is already having the analytical framework in your head. The more philosophical part is “how do you make that theory come to life?” Trading is probabilistic: you’re placing bets, so you need to think about outcomes as a distribution.

Trading Strategies: Similar Science, Different Playground. How is building a trading strategy like building machine learning models?

Jason shared that although neither domain always follows the scientific process strictly, the workflow is similar: you form a hypothesis, you collect data, you test your hypothesis, and then you draw conclusions. In trading, the P&L starts to feel like just another diagnostic.

“If I’m losing money, not great, but ok, it’s just another data point telling me something.”

Kristen - My take (so far) is that I’m pumped about the amount of overlap. My current strategy (leveraging code from the Quant Science course) uses XGBoost, so although there is much to learn, the overlap between data science and algorithmic trading is large. It’s the full pipeline, but with huge opportunities to learn, iterate, build, and track. It’s a project that I can really sink my teeth into.

Overfitting: The Trap You’ll Fall Into (and How to Spot It). “What does data cleaning and feature engineering look like for algorithmic trading?”

Jason quipped that academic programs hand you beautifully clean data sets, which is never the case in practice for either data science or stock market data. Stock data has the added complexity of being non-stationary, which breaks many of the assumptions required for traditional time series modeling. He discussed the trade-offs: people normalize prices by converting them to returns, but then you lose all the memory. (Returns tell you how much the price changed from one period to the next, but they discard the price level and the history built into it.) There are also data cleaning problems, like when Facebook changed its name to Meta and its ticker symbol changed in the data. And then there’s the problem of “survivorship bias”.
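To make the "returns lose the memory" point concrete, here is a minimal sketch using only Python's standard library. The price series is synthetic (a random walk with made-up drift and volatility), so it is a stand-in for real market data and not evidence about any actual stock. It compares the lag-1 autocorrelation of log prices with that of log returns.

```python
import math
import random
import statistics


def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation of a sequence."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs[:-1], xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


rng = random.Random(42)

# Hypothetical daily log returns: independent Gaussian noise (assumed values).
log_returns = [rng.gauss(0.0003, 0.01) for _ in range(2000)]

# Log prices are the running sum of log returns, i.e. a random walk.
log_prices = []
level = math.log(100.0)
for r in log_returns:
    level += r
    log_prices.append(level)

price_ac = lag1_autocorr(log_prices)
return_ac = lag1_autocorr(log_returns)

print(f"lag-1 autocorrelation of log prices:  {price_ac:.3f}")
print(f"lag-1 autocorrelation of log returns: {return_ac:.3f}")
```

On this toy series the price levels are highly persistent while the returns carry almost no dependence on the previous day. Swapping in real prices is the more convincing version of the exercise.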

Survivorship bias = drawing conclusions from the visible winners, while overlooking the invisible losers.

For survivorship bias, the example Jason gave was Enron. If you had “looked at the market 20 years ago, you would have included Enron and been wrong”. For those unfamiliar with Enron, it was once one of the largest U.S. companies, celebrated for its innovation, before collapsing in 2001 due to massive accounting fraud. Build your universe only from the companies that exist today, and failures like Enron silently drop out of the data. It’s also possible to conduct a backtest that shows 25% returns, but when you trade it live, your actual performance is only half that.
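A small sketch of how survivorship bias inflates results. The tickers and returns below are entirely hypothetical, chosen only to show the arithmetic: the average return looks very different depending on whether delisted companies stay in the universe.

```python
import statistics

# Hypothetical universe: (ticker, total return over the window, still listed today).
universe = [
    ("AAA", 0.40, True),
    ("BBB", 0.15, True),
    ("CCC", 0.25, True),
    ("DDD", -1.00, False),  # went bankrupt and was delisted
    ("EEE", -0.60, False),  # delisted after a collapse
]

# A data set built from today's listings only sees the survivors.
survivors_only = statistics.fmean([ret for _, ret, listed in universe if listed])

# A point-in-time data set also includes the companies that later disappeared.
full_universe = statistics.fmean([ret for _, ret, _ in universe])

print(f"average return, survivors only: {survivors_only:.1%}")
print(f"average return, full universe:  {full_universe:.1%}")
```

The gap between the two averages is the bias. In practice, avoiding it depends on having point-in-time data that still contains the delisted names.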

Kristen - I’m sure we’ve all put a model out in the wild that performed worse in reality than we expected. I’m sure it’s worse in trading, but I’d count this as something the two disciplines have in common.

Overfitting is another big problem, and it’s especially relevant here because there is so little signal given the nature of the data. Jason said you will “overfit regularly and be fooled into thinking you actually have something that is working.” And sometimes, you can even get lucky, and you think that you’re winning because you had a winner, until you stop being so lucky.

When Your Model Looks Too Good to Be True: “How do you know you’re overfitting?”

“We do things like testing for parameter robustness. So if you change your input parameter by a little bit, and your output swings a lot (meaning your outputs are very sensitive to your inputs), that’s a red flag. I typically perform one-sided t-tests to determine if there’s statistical significance after running a series of simulations. We also use what’s called the information coefficient, and I also avoid the classes of strategies that are more prone to overfitting. In factor-based strategies, there is less opportunity to overfit.”
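As a rough illustration of the one-sided t-test Jason mentions, the sketch below asks whether the mean return across a series of simulation runs is significantly greater than zero. The ten sample values are hypothetical, and this is not Jason's actual procedure. The critical value 1.833 is the standard one-sided 5% cutoff for a t-distribution with 9 degrees of freedom.

```python
import math
import statistics


def t_statistic(sample, null_mean=0.0):
    """t-statistic for testing whether the sample mean exceeds null_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - null_mean) / standard_error


# Hypothetical mean returns from 10 simulation runs of the same strategy.
simulated_returns = [0.012, 0.008, -0.003, 0.015, 0.004,
                     0.010, -0.001, 0.007, 0.011, 0.002]

# One-sided 5% critical value for n - 1 = 9 degrees of freedom.
CRITICAL_VALUE_DF9 = 1.833

t_stat = t_statistic(simulated_returns)
significant = t_stat > CRITICAL_VALUE_DF9

print(f"t-statistic: {t_stat:.2f}, significant at 5% (one-sided): {significant}")
```

A significant result here only says the simulated mean is unlikely to be zero. It says nothing about whether the simulations themselves were overfit, which is why it is paired with the robustness checks in the quote.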

Kristen - In the data science world, I’ve definitely had to do checks for the robustness of results and parameters, but the information coefficient was new to me, and I’ve heard Jason share before about how he uses this one to evaluate backtests. One of my next articles will be a deep dive into understanding and interpreting the information coefficient, so be on the lookout for that! (Maybe consider joining my email list).
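For readers who have not met the information coefficient before, one common definition is the Spearman rank correlation between a model's forecasts and the returns that were actually realized. The sketch below assumes that definition and uses made-up forecasts for six assets. The interview does not say exactly how Jason computes it, so treat this as an illustration only.

```python
import math
import statistics


def ranks(values):
    """Rank values from 1 to n (assumes no ties, which holds for the toy data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def information_coefficient(predicted, realized):
    """Spearman rank correlation between forecasts and realized returns."""
    return pearson(ranks(predicted), ranks(realized))


# Hypothetical one-period forecasts and realized returns for six assets.
predicted = [0.020, 0.010, -0.005, 0.015, -0.010, 0.000]
realized = [0.012, 0.015, -0.002, 0.008, -0.020, 0.001]

ic = information_coefficient(predicted, realized)
print(f"information coefficient: {ic:.3f}")
```

An IC near 1 means the forecasts rank assets in almost the same order as the realized returns, near 0 means no relationship, and negative values mean the ranking is inverted.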

From Loss Functions to Market Chaos: The Mindset Shift. “What type of mindset shift do data scientists need to make when they first get into algorithmic trading?”

There’s beauty in math, statistics, and analyzing data. But financial markets are messy. You will spend time on all of this and then watch it blow up or fail very quickly.
People in general have a lot of ego when it comes to money. The quicker you can separate that, the better.
You have to iterate very, very quickly. With templates and infrastructure in place, you’ll find out faster whether something works.
You have to like the journey, because you will be repeating the same thing over and over.
There’s a mental model of being a little more scrappy, maybe your code isn’t beautiful, maybe you’re less pristine with your analysis.
In ML, you’re often looking at the loss function, maybe thinking about the trade-off between type I and type II error, but with these models, you’re considering so many more metrics. You’ll need to learn a whole new suite of metrics: Sharpe, Calmar, Sortino, etc.

Sharpe Ratio - Measures risk-adjusted return by comparing your returns (above what you’d earn in something essentially “risk-free,” like U.S. Treasury bills) to how volatile your portfolio is. A higher Sharpe ratio means you’re earning more return for each unit of risk you take. A Sharpe > 1 is usually considered “good,” and > 2 “very good.”
Calmar Ratio - Measures annualized return divided by maximum drawdown (the worst peak-to-trough decline), so it focuses on return relative to drawdowns instead of volatility. It’s good for strategies with asymmetric risk, because it directly penalizes strategies with deep drawdowns (even if volatility looks low). Traders who care more about capital preservation often like it better than Sharpe.
Sortino Ratio - A refinement of Sharpe that only considers downside volatility (bad volatility), ignoring upside swings. A higher Sortino means better risk-adjusted returns where “risk” is defined as harmful volatility. It’s often more intuitive than Sharpe because investors generally don’t mind upside surprises.
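The three ratios above can be sketched in a few lines of plain Python. The annualization factor of 252 trading days, the default risk-free rate of zero, and the sample returns are all assumptions for illustration. Libraries differ in details such as how downside deviation is defined, so expect small differences from other tools.

```python
import math
import statistics

TRADING_DAYS = 252  # common annualization convention for daily data


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=TRADING_DAYS):
    """Annualized mean excess return divided by its standard deviation."""
    excess = [r - risk_free for r in returns]
    ratio = statistics.fmean(excess) / statistics.stdev(excess)
    return ratio * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=TRADING_DAYS):
    """Like Sharpe, but only returns below the target count as risk."""
    excess = [r - target for r in returns]
    downside = math.sqrt(statistics.fmean([min(0.0, e) ** 2 for e in excess]))
    if downside == 0:
        return math.inf  # no returns below the target in this sample
    return statistics.fmean(excess) / downside * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Worst peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar_ratio(returns, periods_per_year=TRADING_DAYS):
    """Annualized compound return divided by maximum drawdown."""
    growth = math.prod(1.0 + r for r in returns)
    annualized = growth ** (periods_per_year / len(returns)) - 1.0
    drawdown = max_drawdown(returns)
    return math.inf if drawdown == 0 else annualized / drawdown


# Hypothetical daily returns, far too few to be meaningful in practice.
daily_returns = [0.01, -0.01, 0.02, 0.00]

print(f"Sharpe:  {sharpe_ratio(daily_returns):.2f}")
print(f"Sortino: {sortino_ratio(daily_returns):.2f}")
print(f"Calmar:  {calmar_ratio(daily_returns):.2f}")
```

Note how Sortino comes out higher than Sharpe on this sample: most of the volatility is on the upside, and Sortino ignores it.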

This part of the interview was made into a YouTube clip, which you can check out here.

“The failure I most often see people making is not having a strategy that’s linked to the economic reality. And just thinking that they can brute force optimize some input parameters as generalizable into the future.”

“What tools would you suggest if someone wanted to get started today?”

Pandas or Polars (or dplyr if you’re an R person) for working with the data.
yfinance for data.
OpenBB, also for data, but with a broader range.

Get familiar with the underlying data, “prove to yourself that the distributions are not stationary. Prove to yourselves that normalizing destroys memory.”

Then start working with a backtesting framework like VectorBT. This is a great backtesting engine that will get you started very quickly. Then review some of the performance metrics I’ve mentioned.
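To show what a backtesting engine automates, here is a hand-rolled moving-average crossover backtest in plain Python. It is not VectorBT's API, the prices are a synthetic random walk, and the 10/50 window lengths are arbitrary assumptions. The one detail worth copying is that the position for day t is decided from data available at the close of day t-1, which avoids look-ahead bias.

```python
import random


def moving_average(values, window):
    """Trailing simple moving average; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def backtest_ma_crossover(prices, fast=10, slow=50):
    """Long when the fast MA is above the slow MA, flat otherwise.

    Returns the equity curve, starting from 1.0. Costs and slippage are ignored.
    """
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    equity = [1.0]
    for t in range(1, len(prices)):
        # Use yesterday's averages to decide today's position.
        f, s = fast_ma[t - 1], slow_ma[t - 1]
        in_market = f is not None and s is not None and f > s
        daily_return = prices[t] / prices[t - 1] - 1.0
        growth = (1.0 + daily_return) if in_market else 1.0
        equity.append(equity[-1] * growth)
    return equity


# Hypothetical price path: a seeded random walk standing in for real data.
rng = random.Random(7)
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))

equity = backtest_ma_crossover(prices)
print(f"final equity multiple: {equity[-1]:.3f}")
```

A real framework adds the pieces this sketch leaves out, such as transaction costs, position sizing, and performance reporting, which is why starting with one is faster than building your own.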

Jason also shares all sorts of actionable tips at PyQuant News, and he has a course, “Getting Started With Python for Quant Finance”.

“Will AI make the quant role more accessible to data scientists, or do you think it’s going to make it competitive in a way that is prohibitive?”

I think it’ll separate the big winners, who are already winning, from everyone else. And data scientists just getting started are included in the “everybody else”. The big money managers and institutions will be able to leverage so much data at scale in a way that we can’t. At the same time, there are so many retail investors who are only leveraging price data, and it’s becoming easier for us to engineer interesting features from unstructured data in a way that would have been more difficult a couple of years ago.

Kristen - To paraphrase, my takeaway was that it’s now easier than ever for us to leverage market data, fundamental data, and other alternative data in our strategies. That already gives us an edge against most retail traders, who only look at price. At the same time, we shouldn’t kid ourselves; we won’t be competing head-to-head with the big institutions that use the same data at a massive scale.

Summary:

If there’s one big takeaway from my chat with Jason, it’s this: the jump from data science to quant trading isn’t as huge as it looks, but the markets aren’t going to play nice. The same habits that make you a good data scientist still apply: bring your rigor, test ideas like a scientist, keep an eye out for patterns that won’t hold up in the real world, and don’t get fooled by overfitting. Start simple, move fast, and remember that P&L is just feedback.

Honestly, I was so pumped to get the chance to pick Jason’s brain on this. He’s been at the intersection of data and markets for years, and it was a rare opportunity to learn directly from someone who’s lived both sides.

I’ll be doing a deeper dive soon on the information coefficient, so keep an eye out if that’s up your alley. And if you want to explore further, definitely check out Jason’s articles at PyQuant News and the Quant Science course we talked about.

**The content in this article is for informational and educational purposes only. It is not intended as financial, investment, or trading advice. All strategies and opinions expressed are those of the author and do not constitute recommendations to buy, sell, or hold any financial instruments. Trading and investing involve risk, and you should conduct your own research or consult a qualified financial advisor before making any investment decisions. ** Hypothetical or simulated performance results have inherent limitations and do not represent actual trading. No representation is being made that any account will or is likely to achieve profits or losses similar to those shown. Quant Science is not a registered investment adviser or commodity trading advisor, and nothing herein should be construed as personalized advice. **

Kristen Kehrer
