Can the Yankees Make a Comeback? AI Makes the World Series Call - Decrypt
10/29/2024 18:12
We built a generative AI model to read baseball stats and predict this year's Yankees-Dodgers World Series matchup.
Numbers never tell the whole story in baseball. But they tell enough of it—we think.
Armed with little more than a penchant for AI and a stack of baseball stats, we created a chatbot that would predict the World Series outcome.
On Monday, against the odds, it predicted that the Los Angeles Dodgers would win Game 3 and eventually take the World Series. As we saw last night, the chatbot got the first part right, at least.
Today, it’s saying that not only will the Dodgers win the World Series, but they’ll win tonight’s game and complete the first 4-0 sweep since 2012, when the Detroit Tigers lost to the San Francisco Giants.
That outcome differed, somewhat, from what the betting markets were predicting yesterday and today.
The chatbot we built was basically a GPT loaded with performance data scraped from Baseball Savant and Statmuse. Then we threw in a crash course on sabermetrics, baseball's mathematical backbone, to help it understand what those numbers mean.
Building the model was straightforward (we even created a guide to help you build your own bot on just about anything). We didn’t know which statistics mattered the most, so we basically fed it everything we found: raw data covering team performance and player stats through 2024; recent game data, including play-by-play breakdowns to keep the model current; and a lot of weirder things like exit velocity, pitchers’ arsenals, arm strength, and hits against left- and right-handed pitchers.
Finally, we gave it a framework for analysis—a "chain of thought" process that weighs historical patterns against current probabilities. We tested the model and tweaked the prompt until we were satisfied with the results. And when we were done and about to click the “save” button…
It threw an error. It was probably a temporary server bug that didn't let us save our changes.
No matter: we managed to screenshot some fascinating responses before it recused itself. The model gave the Dodgers a 60% chance of winning last night’s game, and left the Yankees a small chance of winning by one or two runs.
For the final outcome, our bot zeroed in on one stark statistic: Teams that snag the first two games in a best-of-seven series win it all 80% of the time. This looks to be pretty accurate, and besides, you probably knew that from listening to the announcers on last night’s game.
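As a rough sanity check on that 80% figure, here is a minimal sketch that treats every game as an independent coin flip. That is our simplifying assumption, not something the chatbot or the historical record relies on, and the function name `series_win_probability` is made up for illustration.

```python
from math import comb


def series_win_probability(p_win: float, wins_needed: int, opp_wins_needed: int) -> float:
    """Chance a team gets `wins_needed` more wins before its opponent
    gets `opp_wins_needed`, assuming independent games (a simplification)."""
    # The team clinches on a win after having lost k games along the way,
    # where k must stay below the number of wins the opponent still needs.
    return sum(
        comb(wins_needed - 1 + k, k) * p_win**wins_needed * (1 - p_win) ** k
        for k in range(opp_wins_needed)
    )


# A team up 2-0 in a best-of-seven needs 2 more wins; the opponent needs 4.
lead_2_0 = series_win_probability(0.5, wins_needed=2, opp_wins_needed=4)
print(f"{lead_2_0:.2%}")  # 81.25%
```

Even with evenly matched teams, the 2-0 leader comes out ahead about 81% of the time in this toy model, which is in the same neighborhood as the historical figure the bot quoted.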
Our AI digested years of World Series data alongside current-season stats. For the Yankees-Dodgers matchup, the Dodgers showed a slight edge in overall pitching stats, while the Yankees' relief pitchers dominated throughout the playoffs.
The results split from popular prediction markets. Polymarket bettors, for example, gave the Yankees a 56% shot at taking Game 3, seeing three straight losses as unlikely. Meanwhile, our chatbot focused on broader patterns.
The game ultimately ended up with a 4-2 victory for the Dodgers. So our model was correct, as Los Angeles won with two more runs than New York. At least this time, DI (degen intelligence) didn't beat AI.
What about tonight—and the Series?
Things looked discouraging for the Yankees yesterday, but the scene is a lot uglier today. We asked our chatbot how likely they are to recover and win the next four games. If that happens, it would be a historic moment in American baseball.
The chatbot gave the Dodgers a 55% chance of winning tonight and sweeping the series.
"In MLB postseason history, only one team has successfully come back from a 0-3 deficit in a best-of-seven series: the Boston Red Sox in the 2004 American League Championship Series against the New York Yankees," our chatbot said. It gave us a statistical calculation for a four-win streak in similar conditions, giving the Yankees a theoretical probability of around 6% to make history, come from behind, and win the Series.
Things go a lot lower if we throw in elements that influence the teams' performances.
“If we assume a lower win probability per game (due to the Dodgers' strength this season), the probability will decrease accordingly. For example, if we believe the Yankees have a 40% chance per game, the probability of four straight wins would be 2.56%,” our chatbot said. According to our chatbot, the Dodgers are in a much better place—so instead of a 50-50 scenario, it believes a 60-40 chance for the Dodgers to win is more realistic.
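The arithmetic behind those figures is easy to reproduce. The sketch below assumes each game is independent and that the per-game win probability stays fixed, which is how the quoted numbers appear to have been derived; the helper name `streak_probability` is ours, not the chatbot's.

```python
def streak_probability(p_win: float, games: int = 4) -> float:
    """Probability of winning `games` games in a row, assuming each game
    is independent with the same win probability (a simplification)."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    return p_win**games


# Coin-flip games, then the chatbot's 40% per-game scenario.
for p in (0.50, 0.40):
    print(f"{p:.0%} per game -> {streak_probability(p):.2%} for four straight wins")
# 50% per game -> 6.25% for four straight wins
# 40% per game -> 2.56% for four straight wins
```

The 50% case gives 6.25%, which matches the "around 6%" figure, and the 40% case reproduces the 2.56% in the quote.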
The Polymarket dudes are stubborn. For tonight's game, the odds are 58% in favor of the Yankees at the time of writing. This is not what our chatbot thinks. However, it reminded us that some things could play in favor of the Yankees, like the game being played in New York and the sense of urgency that could affect the players' physical performance. This was enough for our bot to slightly increase the Yankees’ odds from 40% to 45%.
That's not enough to call it a fair 50-50, and still not in line with our favorite prediction market.
A 90% chance of losing the World Series is hard to ignore, but in baseball, one loss can rewrite the narrative and undo even the most rigorous predictions. Maybe our chatbot needs more heart, but the Polymarket degens could use a dash of data too.
Edited by Andrew Hayward