In baseball, we must fix the order of batters. In every half-inning in which our team is on offense, we continue with the next batter in that order, and after the ninth player we start again with player one. Our players have different strengths, reflected in different statistics for outs, walks, singles, doubles, triples, and home runs. We presume that the batting order affects the expected number of points we score. But what is the best order?
Here are our players' stats [in %]:


Think about how you would determine that one ordering is better than another.

We used Seeker to compute a favorable ordering. It returned:
1. Ace (contact hitter)
2. Marcus (all-rounder)
3. Titan (power hitter)
4. Casey (balanced hitter)
5. Buster (power hitter)
6. Jack (average all-rounder)
7. Jax (low batting average)
8. Hank (good but not elite)
9. Tyrone (contact hitter with some speed)
With this lineup, we expect to score an average of 4.707 points in nine innings, which is significantly more than the 4.52 we would expect for an average batting order. Below, we plot the distribution of points we would expect to score in nine innings when choosing a random batting order.

We observe that the spread is from about 4.38 to 4.71 for the vast majority of batting orders. This may appear small, but over the course of 162 games played per season, the difference between a well-optimized ordering and an abysmal one would sum to over 50 more points scored in the season, which is a dramatic difference in performance. So batting orders definitely do matter.
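The season-level figure follows from simple arithmetic on the numbers quoted above. A minimal sketch, where the variable names are ours and the only inputs are the 4.38 and 4.71 averages and the 162-game season from the text:

```python
GAMES_PER_SEASON = 162

# Expected points per game at the two ends of the observed spread (from the text).
weak_order = 4.38
strong_order = 4.71

gap_per_game = strong_order - weak_order
gap_per_season = gap_per_game * GAMES_PER_SEASON

print(f"Gap per game:   {gap_per_game:.2f} points")
print(f"Gap per season: {gap_per_season:.1f} points")
```

A gap of roughly a third of a point per game is what accumulates to more than 50 points over a season.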
The crux of this puzzle is to know how good each ordering is. We have little choice but to simulate nine innings with an ordering and to see how well it fares. But each simulation is stochastic. How many simulations are needed before we have a reasonable assessment of the quality of an ordering?
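To make "simulate nine innings with an ordering" concrete, here is a minimal sketch of such a simulation. It is not Seeker's model: the base-running rule (on a hit, every runner gains as many bases as the batter; on a walk, only forced runners move), the function names, and the player statistics at the bottom are all assumptions made for illustration.

```python
import random

# Outcome order matches the statistics named in the text.
OUTCOMES = ("out", "walk", "single", "double", "triple", "homer")
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, event):
    """Return (new_bases, points) after a walk or a hit.

    bases is a 3-tuple of booleans for first, second, and third base.
    Assumed rule: on a hit every runner gains as many bases as the batter;
    on a walk only forced runners move.
    """
    first, second, third = bases
    if event == "walk":
        points = 1 if (first and second and third) else 0
        return (True, first or second, (first and second) or third), points
    gained = BASES_GAINED[event]
    # Occupied base indices, plus -1 for the batter standing at home plate.
    runners = [i for i, occupied in enumerate(bases) if occupied] + [-1]
    new_bases = [False, False, False]
    points = 0
    for position in runners:
        if position + gained >= 3:
            points += 1  # runner (or batter) reaches home
        else:
            new_bases[position + gained] = True
    return tuple(new_bases), points


def simulate_game(lineup, rng, innings=9):
    """Play `innings` half-innings on offense and return the points scored."""
    points, batter = 0, 0  # the batter index carries over between innings
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            weights = lineup[batter % len(lineup)]
            batter += 1
            event = rng.choices(OUTCOMES, weights=weights)[0]
            if event == "out":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                points += scored
    return points


def estimate_points(lineup, n_games, seed=0):
    """Average points per game over n_games simulated scenarios."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(n_games)) / n_games


# Hypothetical stats in % (out, walk, single, double, triple, home run);
# these are NOT the players from the article.
HYPOTHETICAL_PLAYER = (68, 8, 16, 5, 1, 2)
lineup = [HYPOTHETICAL_PLAYER] * 9
print(estimate_points(lineup, n_games=1000))
```

Each call to estimate_points with a different seed corresponds to evaluating the same ordering against a different sample of scenarios, which is exactly where the sampling noise discussed next comes from.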
Below is the Seeker batting order evaluated against different samples of 10,000 scenarios.

As we can see, the spread runs from 4.57 to 4.81. Mind you, this spread comes solely from which 10,000 scenarios we happen to sample. It is neither the spread within those scenarios nor an effect of different batting orders. Set against the spread we see across different orderings, this resolution is too coarse: we cannot tell whether an apparent improvement means that one ordering is truly better than another or merely that the sampled scenarios happened to favor it.
Let us see how the quality assessment changes if we use a 100,000 scenario resolution.

This is clearly better, but still not enough to assert with confidence that one ordering is better than another, especially when the progress we make by changing the ordering slightly is small. That is why we used a 1 million scenario resolution to optimize the batting order. Seeker found the favorable ordering after half an hour on a 60-core machine. On a typical cloud platform, this would cost about $3.
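Why each tenfold increase in scenarios helps only moderately can be sketched with the usual Monte Carlo rule of thumb: if scenarios are sampled independently, the standard error of the estimated average shrinks with the square root of the number of scenarios. The per-game standard deviation of 3 points used below is a hypothetical value chosen for illustration, not a figure from this article, and the function names are ours.

```python
import math


def standard_error(sigma, n_scenarios):
    """Standard error of a sample mean over n independent scenarios."""
    return sigma / math.sqrt(n_scenarios)


def scenarios_needed(sigma, target_se):
    """Smallest number of scenarios whose standard error is at most target_se."""
    return math.ceil((sigma / target_se) ** 2)


SIGMA = 3.0  # hypothetical standard deviation of points per game (assumption)

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} scenarios -> standard error {standard_error(SIGMA, n):.4f}")
```

Under this assumption, ten times more scenarios narrows the noise by a factor of about 3.2, and a hundred times more narrows it by a factor of 10, which is why telling apart orderings that differ by only a few hundredths of a point calls for very large samples.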

Seeker is unique in its ability to optimize even at seemingly prohibitive stochastic resolutions. In environments where small changes matter and where unlikely scenarios can have a very negative effect, this ability to assess risk properly is your best insurance.
If your business is struggling with brittle plans that do not hold up to the quality that was promised during optimization, please contact us at info@insideopt.com.
