What Hardware Trends Mean for Optimization, Or Why Your Attention Is Being Misguided


Moore's famous “Law” states that the number of transistors per chip doubles roughly every two years. While that pace has slowed to about three years and modern chips are approaching physical limits, the trend still points clearly towards more and cheaper compute cores.

Recently, there have been efforts to accelerate optimization using GPUs. While this works in principle, the benefits are limited to a small niche of extremely large continuous, non-combinatorial problems. For optimization algorithms that do not lend themselves to an in-sync, marching-band-style search, it is at best unclear, and more realistically improbable, that GPUs will have a significant impact on the future of optimization.

Even more remote is the dream of quantum computers having an impact on optimization in the foreseeable future, so we will not go there.

That leaves CPUs. To maintain or even increase production yields, modern CPUs are becoming less monolithic and increasingly follow a chiplet design. And by far the easiest way to increase the number of cores remains a distributed network of machines.

 

A Rack Full of Cheap Machines


Take my favorite desktop right now, the Mac Mini. The base model costs $450 and features 4 performance cores plus 6 efficiency cores. That means that for under $10,000 we can buy a cluster of 22 Mac Minis with 88 performance cores plus 132 efficiency cores, connected via 10 Gbit/s Ethernet and working on 352 GB of RAM plus 5.632 TB of storage.
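For readers who want to check the arithmetic, here is a small back-of-the-envelope sketch. The per-node 16 GB of RAM and 256 GB of storage are assumptions consistent with the aggregate figures above, not specifications taken from the original post.

```python
# Back-of-the-envelope totals for a rack of 22 base-model Mac Minis.
# Per-node RAM and storage (16 GB / 256 GB) are assumptions consistent
# with the aggregate figures quoted in the text.
nodes = 22
price_usd, p_cores, e_cores, ram_gb, ssd_gb = 450, 4, 6, 16, 256

print("cost ($):          ", nodes * price_usd)      # 9,900 -- under $10,000
print("performance cores: ", nodes * p_cores)        # 88
print("efficiency cores:  ", nodes * e_cores)        # 132
print("RAM (GB):          ", nodes * ram_gb)         # 352
print("storage (TB):      ", nodes * ssd_gb / 1000)  # 5.632
```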

This cluster would offer an estimated 20 TeraFLOPS (FP32) of sustained compute power (see https://lnkd.in/e5pMSyur). Until about 2010, a cluster like this would have appeared in the TOP500 list of the world's fastest computers. Now we can buy it for under $10,000.
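Spelled out, the estimate above implies roughly the following per node and per dollar. This is plain division on the figures already quoted, not a benchmark of any kind.

```python
# What ~20 TFLOPS (FP32, sustained) implies per node and per dollar,
# using only the figures quoted above.
cluster_tflops = 20.0
nodes = 22
cluster_cost_usd = nodes * 450                       # $9,900

print(f"per node:   {cluster_tflops / nodes:.2f} TFLOPS")                        # ~0.91
print(f"per dollar: {cluster_cost_usd / (cluster_tflops * 1000):.2f} $/GFLOPS")  # ~0.50
```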

A question for optimization users: How much of that performance improvement has reached you?

 

Legacy Optimizers Do Not Scale


The reason you have not felt this massive increase in compute power per dollar is that legacy optimization solvers do not parallelize well. But parallelization is exactly what is needed: individual machines are not getting much faster anymore; we can just afford many more of them. While you can run a MIP solver on 100 cores in theory, you will likely not see speed-ups of more than about 6x. Ever.
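To make that ceiling concrete, Amdahl's law says that a workload which is only partly parallelizable hits a hard speed-up limit no matter how many cores you add. The sketch below is a model, not a measurement of any particular solver: it assumes a parallel fraction of about 84%, which is what a 6x speed-up on 100 cores would imply.

```python
# Amdahl's-law sketch (a model, not solver measurements): if only ~84% of the
# work parallelizes -- consistent with the ~6x-on-100-cores figure above --
# extra cores quickly stop paying off.
def amdahl_speedup(p, n):
    """Maximum speedup on n cores when fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

p = 0.84  # assumed parallel fraction implied by ~6x on 100 cores
for cores in (1, 4, 16, 100, 1000, 10_000):
    print(f"{cores:>6} cores -> {amdahl_speedup(p, cores):5.2f}x")
# Even with infinitely many cores the speedup is capped at 1/(1-p) = 6.25x.
```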

This is another reason why we built a primal optimization solver at InsideOpt. Seeker is fully distributed and scales well even to thousands of cores. So while the world watches in awe as trillions of dollars are invested in genAI data centers, the real puzzle is why users of optimization software cling to legacy solvers that do not parallelize.
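To illustrate why primal search scales so much better, here is a minimal, generic sketch of a portfolio-style parallel search: independent randomized workers attack the same hypothetical knapsack instance and the best incumbent wins. This illustrates the principle only; it is not Seeker's implementation.

```python
# Generic portfolio-style parallel primal search (illustration only, not
# Seeker's code): each worker runs an independent randomized local search
# on the same problem, and the best incumbent across workers wins.
import random
from concurrent.futures import ProcessPoolExecutor

N_ITEMS, CAPACITY = 60, 1500

def make_instance():
    # Fixed seed so every worker sees the same (hypothetical) knapsack instance.
    rng = random.Random(0)
    values = [rng.randint(1, 100) for _ in range(N_ITEMS)]
    weights = [rng.randint(1, 100) for _ in range(N_ITEMS)]
    return values, weights

def worker(seed, iters=200_000):
    values, weights = make_instance()
    rng = random.Random(seed)            # only the search is seeded per worker
    x = [0] * N_ITEMS                    # current solution as a bit vector
    best_val, cur_w, cur_v = 0, 0, 0
    for _ in range(iters):
        i = rng.randrange(N_ITEMS)       # propose flipping one random item
        delta_w = weights[i] * (1 - 2 * x[i])
        if cur_w + delta_w > CAPACITY:
            continue                     # reject infeasible moves
        delta_v = values[i] * (1 - 2 * x[i])
        # Accept improving moves, and occasionally a worsening one to escape local optima.
        if delta_v >= 0 or rng.random() < 0.01:
            x[i] ^= 1
            cur_w += delta_w
            cur_v += delta_v
            best_val = max(best_val, cur_v)
    return best_val

if __name__ == "__main__":
    # One independent search per core; machines in a cluster could do the same
    # and merely exchange incumbents, so coordination overhead stays tiny.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(worker, range(8)))
    print("best value found across workers:", max(results))
```

Because the workers share almost nothing, adding cores or entire machines increases search throughput nearly linearly, which is exactly the kind of scaling the argument above says legacy solvers cannot deliver.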

Maybe it is time to use tech from this decade?
