You have probably read a dozen articles like this one, correct? We would not have written this one if we did not think they were wrong. Are you ready for a dissenting view?
The process of standing up an OR solution is frequently described as follows (note that in this article we focus on technical management and leave out the company-political context):
1. Work with stakeholders, understand the business context, and define the problem.
2. Gather the data.
3. Develop and implement a MIP model.
4. Iterate on its performance.
5. Deploy your solution and monitor your model in operation.
Following this process, we conservatively estimate that 50% of projects die before reaching stage (4), 70% before reaching stage (5), and 85% die before they see a year in deployment. With this article, we hope to change that.
The Big Illusion
The waterfall model above is based on one crucially wrong assumption, namely that your stakeholders know what is needed. They do not, and you cannot really blame them for it either.
Try to write down a comprehensive definition of a house you would consider buying. Chances are, you are not only incapable of working out the trade-offs between your various different objectives, you will likely even forget half of the metrics that matter (or, if you actually engaged in the example, did you list the distance to the nearest supermarket just now?).
Buying a house is one of the most consequential decisions we make in our lives. And we are the experts; we know everything there is to know about our preferences, needs, and priorities. And yet, we do not approach it by writing down an all-encompassing specification first. Instead, we consider our options first and then decide what is a must-have, what is a nice-to-have, and in what regime of quality we are willing to accept what trade-offs between our objectives (100K more for a school district that is rated 7 instead of 5 is okay, but not for 10 instead of 8).
Many people will actually not buy a house right away but first settle on a location and rent there, just to see if it is actually practical and that they did not overlook anything crucial. That is to say, they first test a proxy—not quite the real thing, but an approximation—to learn what actually matters.
Start With the Basics
And that is why you should not start with an optimization problem definition but a business case. The first thing you need to do is understand what decisions need to be taken, how these decisions are executed, and how they create value or costs for your business, including the risks that they pose.
The middle part of this last sentence is frequently overlooked, so let us say that again: You need to understand how the decisions to be automated are being executed. If you only understand the high-level business value but not the underlying business process, your project will likely suffer the fate of the 50% of projects that make it to stage (5) but do not survive the first year in production.
Next, you need to understand the basis for decision-making. What is the baseline, i.e., how are the decisions being taken so far? Based on what information are the decisions being taken? We write 'information' and not 'data' because not all that goes into the decision may actually be available in the form of data. It is extremely important to understand the difference if you want the solutions your OR model provides not to be dismissed as naive and void of an understanding of the industrial context.
Connect the Data With the Execution
After the initial information-gathering stage, it is already time to build a low-cost end-to-end pipeline that connects all inputs all the way to the execution stage. Right away. This initial solution serves two purposes: first, to see if we can actually fill in all the blanks and connect data with decision execution; and second, to serve as a basis for experimentation and further information gathering. Consider this your cheap rental unit.
All system components should be staged in this phase, but do not worry about them being pretty (linting, long-term maintenance, etc. are not concerns at this stage). If you are planning to use a cloud service or a deployment service like NextMV to run your optimization, set this up in experimental mode and stand up your service end-to-end, including any outside communication.
In its first instantiation, there does not even have to be any optimization model. You can use a plan created by the baseline to see if you can digitally connect to the execution stage, or, if you add a dummy plan maker to your pipeline, you can have the execution stage ignore the operational plan created and just confirm that it could execute it in principle (if it were a feasible plan). Importantly, use this first system to jot down your first set of data requirements and constraints on the operational plans as imposed by the execution stage.
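As an illustration (in Python, with every name hypothetical), such a first pipeline can be as small as a data loader, a dummy plan maker, and an execution stage that only validates plans in dry-run mode:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """An operational plan; here just a list of (task, resource) pairs."""
    assignments: list

def load_inputs():
    # Hypothetical stand-in for the real data connectors.
    return {"tasks": ["t1", "t2"], "resources": ["r1"]}

def dummy_plan_maker(inputs):
    # No optimization yet: naively assign every task to the first resource.
    first = inputs["resources"][0]
    return Plan([(t, first) for t in inputs["tasks"]])

def execution_stage(plan, dry_run=True):
    # In dry-run mode, the execution side only checks that it *could*
    # execute the plan and reports the constraints it imposes.
    issues = [a for a in plan.assignments if a[1] is None]
    return {"accepted": not issues, "issues": issues, "executed": not dry_run}

inputs = load_inputs()
plan = dummy_plan_maker(inputs)
receipt = execution_stage(plan, dry_run=True)
print(receipt)  # {'accepted': True, 'issues': [], 'executed': False}
```

The point is not the code itself but that every stage exists and is digitally connected, so each newly discovered data requirement or execution constraint has an obvious place to land.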
Agile Optimization
Now it is time to build and iterate on the optimization module based on an increasingly realistic problem specification. The objective now is to get an understanding of the value entitlement of the optimization as quickly as possible. The process here is as follows: You optimistically assess the value of making better decisions based on optimization while retiring technical risk as quickly as possible. That means, with every iteration on the optimization model, the value assessment should get more and more realistic. If, at any time in the process, the optimistic entitlement assessment drops below a critical threshold, you can stop the project with as little investment lost as possible.
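The stopping logic can be sketched in a few lines of Python (the estimates and the threshold below are made up for illustration; in practice each iteration replaces optimistic guesses with measured values, so the optimistic estimate can only shrink):

```python
def estimate_entitlement(iteration):
    # Hypothetical optimistic value estimates, tightening per iteration
    # as guessed quantities are replaced by measured ones.
    optimistic_estimates = [1_000_000, 800_000, 450_000, 120_000]
    return optimistic_estimates[iteration]

CRITICAL_THRESHOLD = 200_000  # assumed minimum annual value to proceed

for iteration in range(4):
    entitlement = estimate_entitlement(iteration)
    print(f"iteration {iteration}: optimistic value entitlement = {entitlement}")
    if entitlement < CRITICAL_THRESHOLD:
        print("Stop: the remaining upside no longer justifies the investment.")
        break
```

The discipline this buys you: a project that was never going to pay off dies in iteration three, not in month nine.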
Of course, with traditional mixed-integer programming (MIP) solvers, this is easier said than done. In rare cases, the business problem is quickly formulated as a MIP and then we are done, modulo some late constraints or slight tweaks in the objective as more business requirements reveal themselves. The vast majority of real-world problems will throw curveballs, though. Some important terms in the constraints or the objectives will have non-linear relationships with the decision variables. Some data will be estimated and come with uncertainty. There is usually more than one objective function to optimize, with trade-offs that are non-stationary, i.e., that shift across quality regimes. The list goes on.
Given that we have to expect significant changes to the optimization model definition as we iterate with the business, we cannot spend three months on one iteration and implement an elaborate branch-and-price model or implement our own meta-heuristic solution. If you do that (see the canonical model above), your project will likely end unsuccessfully at stage (3) or (4) after it is discovered that months of development were just invested in a solution to a problem the business does not have.
Modern solvers like InsideOpt Seeker help iterate quickly and learn fast. One advantage of Seeker is its uncanny ability to support experimentation with the problem specification. There are highly expressive modeling concepts available that help accommodate most requirements from the business, including stochastic, non-linear, non-convex, non-differentiable, and multi-objective optimization.
Instead of fearing the next meeting with the business stakeholders and wondering what changes they will want this time that will render much of our work useless, we frequently find that Seeker makes it so easy to change problem formulations that we can even proactively experiment with different specifications to help the stakeholders understand some trade-offs better, for example, between risk and expected costs, profit and service quality, inventory and stockouts, etc. In this way, we can engage in a much deeper discussion about what specification will actually create the most business value, unencumbered by the restrictive modeling capabilities of traditional MIP solvers. It is frequently this exchange that creates the buy-in needed to make a project succeed.
Importantly, do not focus on the speed with which your optimization model works, even if time-to-decision is an important factor in the end. The objective at this stage is to find out as precisely as possible what value an automated decisioning process can potentially deliver for the business while gaining an increasingly better understanding of the requirements and targets. At the same time, any co-development that needs to take place on the system, data, or execution side should take place simultaneously in an agile fashion as well.
While iterating, as soon as the problem requirements appear to be understood well enough, we go into shadow mode. That is to say, test as early as possible if the plans created actually work, either in a sandbox or in practice. Depending on the business use case, this can mean simulating a generated plan and comparing it with the baseline and/or actually executing a few optimized plans in real execution. In this way, with as little risk as possible, we assess if our solutions work or if our problem specification needs further refinement.
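A minimal sketch of shadow-mode evaluation in Python, assuming a toy cost simulator stands in for the real sandbox (all numbers are invented for illustration):

```python
import random

random.seed(7)  # fixed seed so the toy comparison is reproducible

def simulate_cost(plan_quality):
    # Toy simulator: realized daily cost = base cost minus plan quality,
    # plus operational noise.
    return 1000 - plan_quality + random.gauss(0, 10)

def shadow_compare(n_days=30):
    # Run baseline and optimized plans side by side over the same period;
    # only the baseline's decisions are actually executed.
    baseline = [simulate_cost(plan_quality=0) for _ in range(n_days)]
    optimized = [simulate_cost(plan_quality=80) for _ in range(n_days)]
    return sum(baseline) / n_days - sum(optimized) / n_days

saving = shadow_compare()
print(f"estimated daily saving in shadow mode: {saving:.1f}")
```

If the estimated saving in shadow mode falls far short of the entitlement assessment, that is the cue that the problem specification, not the solver, needs further refinement.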
Create the Alpha Solution
After maturing our solution and gaining a sound understanding of the real business value that can be materialized, it is time to build a solution that could, in principle, go into production. At this stage, we still iterate on our model, but no longer to incorporate changing business requirements we were not aware of before but to improve optimization efficiency.
We may experiment with different formulations that implement the same problem specification, which will aid the solver in creating high-value plans faster. For a solver like Seeker, this stage typically includes the first automatic tuning of the solver for the specific model that was created. Note that this feature tailors the optimization search for your specific business problem, which includes the time you can allot for the optimization process. Seeker employs different search patterns depending on whether there are 60 minutes or 60 seconds left for optimization. In this way, the machine determines automatically how best to invest whatever computing time you can afford to spend on the optimization.
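Seeker's internal tuning is its own; purely as a sketch of the budget-aware idea (the strategy names below are hypothetical), a search can switch patterns based on the time remaining:

```python
def choose_strategy(seconds_left):
    # Hypothetical illustration of budget-aware search: with an hour,
    # afford broad exploration plus restarts; with a minute, spend
    # everything polishing the incumbent solution.
    if seconds_left >= 3600:
        return ["diversified exploration", "periodic restarts", "local polish"]
    if seconds_left >= 300:
        return ["focused exploration", "local polish"]
    return ["local polish"]

print(choose_strategy(3600))  # ['diversified exploration', 'periodic restarts', 'local polish']
print(choose_strategy(60))    # ['local polish']
```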
To improve optimization efficiency, you may also consider running your solver in parallel. Seeker offers true distributed optimization that is not limited by the number of cores on your CPU as are thread-parallel solvers. All that is required to harness parallel compute power for Seeker is to add two integers to your model, one that uniquely identifies your problem and the other that identifies the process that collaborates with others on this same problem. In doing so, you not only gain an easy way to harness as much compute power as is needed, but you also make the solution process resistant to hardware crashes. Seeker will continue its search exactly at the point it was stopped, which makes it easy to recover even after severe machine interruptions.
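The exact Seeker calls aside, the underlying idea can be sketched generically in Python: workers share an incumbent keyed by the problem ID, so a restarted worker resumes from the best-known solution rather than from scratch (the in-memory dict below stands in for a shared service or database):

```python
# Shared store keyed by problem id; in production this would be a
# database or the solver vendor's service, not an in-memory dict.
INCUMBENTS = {}

def run_worker(problem_id, worker_id, candidate_costs):
    # Hypothetical sketch: each worker evaluates its own candidates but
    # always starts from the best solution known for this problem, so a
    # restarted worker resumes from the shared incumbent.
    best = INCUMBENTS.get(problem_id, float("inf"))
    for cost in candidate_costs:
        if cost < best:
            best = cost
    INCUMBENTS[problem_id] = best
    return best

run_worker(problem_id=42, worker_id=1, candidate_costs=[900, 750])
run_worker(problem_id=42, worker_id=2, candidate_costs=[800, 700])
# Worker 1 "crashes" and rejoins: it resumes from the shared incumbent 700.
resumed = run_worker(problem_id=42, worker_id=1, candidate_costs=[720])
print(resumed)  # 700
```

Note how the (problem ID, worker ID) pair is the only coordination state a process needs, which is what makes recovery after a machine interruption cheap.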
Note how late in the game the expert optimization work actually takes place. In this way, we avoid building elaborate solution approaches for incomplete or flawed problem specifications. Moreover, once this step is done, we can be very confident that the resulting solution meets expectations and does not dangle in mid-air: dependent on data that cannot be provided, on compute that cannot be connected, or on an execution stage whose requirements we do not meet.
Which means that as soon as we have our optimization model, we can go into production. Directly.
Deploy the Service
Finally, we deploy our solution, first in Beta, always ready to redeploy the original decisioning process if needed, and then for real. If you are using a deployment service like NextMV, you automatically get options to run multiple optimization models to see if an updated model gives better solutions in shadow mode or to deploy a new model sporadically and send its solutions to the execution step to assess its validity in practice. If you stand up the compute yourself, you will want to add monitoring features as well as model update features yourself to facilitate the ongoing support of the optimization service.
Importantly, make your service as robust as possible. Think about computing sub-optimal but feasible fall-back solutions that can be sent to the execution stage should the full optimization run fail for some reason. The person or department in charge of monitoring the service should always have options to keep the operations running.
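A minimal fallback wrapper, in Python with hypothetical names, might look like this:

```python
def optimize(inputs):
    # Stand-in for the full optimization run; it may raise on solver
    # failure, timeout, or bad input data.
    raise TimeoutError("solver exceeded its time budget")

def greedy_fallback(inputs):
    # Cheap, always-feasible heuristic kept as a safety net, e.g. repeat
    # yesterday's plan or assign greedily in input order.
    return {"plan": sorted(inputs), "source": "fallback"}

def plan_service(inputs):
    try:
        return optimize(inputs)
    except Exception as err:
        # Alert the on-call operator, but keep operations running with a
        # sub-optimal yet feasible plan.
        print(f"optimization failed ({err}); using fallback")
        return greedy_fallback(inputs)

result = plan_service([3, 1, 2])
print(result)  # {'plan': [1, 2, 3], 'source': 'fallback'}
```

The execution stage always receives *some* feasible plan; whether the fallback is yesterday's plan, a greedy construction, or a cached solution is a business decision, not a technical one.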
Project Plan
In summary, we suggest that you follow this framework:
- Determine the business case. What decisions must be taken? How are these decisions being executed? How are these decisions made today, based on what information, and how do they create business value or costs?
- Build an initial, optimize-nothing, end-to-end service, from data to plan execution.
- Build and iteratively refine an optimization model in an agile fashion, learning business requirements as you go. Engage with the business deeply to fully understand what is needed and what creates the most value. Begin to test your solutions in execution or simulated execution. In parallel, refine the system, data, and execution components as needed.
- Optimize your optimization. Refine your model to improve time-restricted solution quality. Tune your solver. Parallelize execution.
- Deploy the service and make it robust by adding model monitoring, test deployment, and disruption recovery capabilities.
We wish you good luck with your projects! And of course, if you want help at any stage, we are here for you.