MLB Runs Its Massive Statcast Tool In The AWS Cloud

Major League Baseball recognizes that it’s a software-defined world, and over the last several years it has moved quickly to create applications that enhance the fan experience, letting fans watch and interact with MLB properties wherever they are and on whatever device they wish to use. But baseball is also a statistics-driven game, so the league worked with Amazon Web Services to create an application called Statcast that tracks plays at a minute statistical level and collects an extremely granular set of data on every play.

Joe Inzerillo, CTO and EVP of MLB Advanced Media, told the audience at AWS re:Invent today in Las Vegas that Statcast was one of the most exciting projects he’s worked on in his career. He explained that baseball has lots of statistics about pitching and batting, but defense is much harder to measure. The game is so complex and so fast that it’s hard to come up with an accurate way to calculate just how much better or worse one play is than another.

Statcast allows you to break down a play and see things like the speed of the ball as it comes off the bat, the speed of the defensive player as he approaches the ball and the speed of the runner as he runs down the line. This is a tremendous amount of data, especially when you multiply it by thousands of games in a season. In fact, according to Inzerillo, we are talking about 17 TB of data per season. As such, it required a cloud solution like AWS.

Using an Infrastructure as a Service offering like AWS, they can scale up their compute needs during the season, and then scale back down in the offseason when they don’t need as much compute power. That said, the data they collect has to live forever because people want to be able to compare baseball statistics historically. That means the amount of space required to store it increases every season, and Inzerillo says they can use a cloud storage service like Amazon S3 to keep scaling their storage to as much as they need.
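The difference between elastic compute and ever-growing storage can be sketched with a running total. This is only an illustration: it assumes the 17 TB per season figure stays constant and that nothing is ever deleted, and the function name is hypothetical.

```python
from itertools import accumulate

TB_PER_SEASON = 17  # figure quoted in the article; assumed constant every season


def cumulative_storage_tb(seasons: int,
                          tb_per_season: float = TB_PER_SEASON) -> list[float]:
    """Total TB retained after each season when no data is ever deleted."""
    return list(accumulate([tb_per_season] * seasons))


if __name__ == "__main__":
    for season, total in enumerate(cumulative_storage_tb(5), start=1):
        print(f"after season {season}: {total} TB retained")
```

Compute can drop back toward zero every offseason, but the storage total only moves in one direction.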

So what does this all mean in practice? Let’s have a look at a sample play from this year’s MLB playoffs.

[mlbvideo id=”36854223″ width=”400″ height=”224″ /]

Watch how Statcast tracks the speed of the defender, how fast the runner goes down the line, and the fact that the runner dove into the bag instead of running through it. It’s worth noting that every ballplayer is taught from the earliest age never to dive into first base because it slows them down. And as it turns out, on this play it slowed the runner down enough that it may well have been the difference between being safe and being out.

This is a staggering amount of information, and it breaks down the play in ways that simply wouldn’t have been possible without access to cloud resources like these. It also shows that even a sports league that’s over 100 years old can find new ways to use technology to improve the fan experience.