Puzzle: Gambit

Puzzle: Gambit

You have to catch a flight and you are running severely late. Thankfully, the folks at Foresights, the DS City routing company, have designed an app that will get you to the airport on time with optimized likelihood.

Your home is node 0 and the airport is node 12. The edge labels give the currently estimated time for each link. These times are subject to change, of course, as the traffic situation evolves. To catch your flight, you have to reach the airport within 95 minutes. Since you bought the supersaver fare, if you miss your flight, you will have to buy a new ticket for the next flight. This would cost you $600. If you get to the airport within 86 minutes, you can use the long-term parking. But any second later and you have to go for the short-term parking lot, which will cost $80 extra.

The simple question we have for you today: Where do you go first, to node 1 or to node 2? Have fun and post your answer with your reasoning!

For extra credit, how would you design a dynamic routing app? As a typical sequential decision problem, you may want to consider Warren Powell's universal framework.

Consider the direct look-ahead class!

Assuming real-world uncertainty about the accuracy of travel times, the right decision is to move to node 1 first. A stochastic look-ahead policy shows very quickly that the expected costs associated with routes via node one are far less than those via node 2.

How it works

For each node we can travel to next (in our current situation, nodes 1 and 2), we compute the costs when it takes exactly the expected time to move to the respective node and then use, for each stochastic scenario, the cheapest route thereafter. Then, we choose to go to the node that has the lowest expected costs.

It remains to create meaningful scenarios. We use a normal distribution on top of the currently expected arc-traversal times, whereby we increase the standard deviation for arcs we hit later. We start with a 3% standard deviation for the next arc, then increase this by 3% for every 10 minutes of additionally expected travel time.

Why it works

The assumption that we could reliably route ourselves on the scenario-specific cheapest route is obviously wrong. This estimate does not actually give us the correct expected costs. But as a discriminator between alternatives, it works anyway because the same advantage is given to all possible actions. It is therefore possible but not likely that the relative ordering of alternatives would change under a more accurate approximation of the true expected costs for each.

The creation of scenarios is obviously also not overly realistic. Real-world travel times do not deviate from forecasts following a normal distribution. They are far more skewed to the right. However, as long as the futures we plan reasonably cover what can happen, the exact distributions do not actually matter, as experience shows time and time again.

Experiments

We implemented the above strategy using Seeker's stochastic API, as well as a deterministic look-ahead policy where we would route ourselves along the currently expected shortest path. Then, we evaluated the policies in another stochastic Seeker environment where we evolved the expected travel times as follows.

For each arc, we randomly choose a time 5 - 35 minutes in the future. For this timepoint, we sample an arc traversal time from a right-shifted exponential distribution with a mean at the initial estimate and a standard deviation of 25%. We do the same for minute 95. Then, we interpolate the arc traversal times from minute 0 to the random time point and from that time point to minute 95 to obtain estimates for any time point during our simulation. The actual arc traversal times are then chosen from a normal distribution with a mean at the currently expected time and a standard deviation of 15%.

Note how much this stochastic environment differs from the simplistic normally distributed scenario generation used in our look-ahead policy. Both in terms of magnitude and shape, the environment used to evaluate the policy is very different from the assumed stochasticity within the policy.

And yet, when we compare the stochastic look-ahead with the deterministic look-ahead policy, we find:

The stochastic look-ahead policy gets us into long-term parking with 51% probability, and with an 80% chance we make our flight. The deterministic look-ahead, on the other hand, only gets us into long-term parking 42% of the time and brings us to the flight on time with a chance of only 63%.

What is interesting is to look at the probabilities with which the stochastic look-ahead policy moves through the network. As a first node, it always chooses node 1, of course. Then, node four the predominant next.

Despite the fact that, according to the initial estimates, we lose an expected 2 minutes again, we next move to node 7 most often.

And for the last leg, it is a coin toss between nodes 10 and 11.

Overall, we find that the stochastic look-ahead leads us to decisions that trade a little bit of initial performance for a much more favorable position and, implicitly, future performance. The strategy is known in chess as a gambit.

And this is what stochastic optimization is all about. For a small upfront price, we buy insurance that frequently pays off massively later. Just without any nasty insurance company.