Tuesday, October 23, 2012

The Gaussian Hare, the Laplacian Tortoise, and Big Data

Alternative Optimal Solutions
A recurrent theme of this tab is to highlight an important contribution of O.R. decision modeling: alerting us to the presence of alternative optimal solutions (AOS). Prior posts relating to AOS can be found here and here.

Customers unfamiliar or uncomfortable with jagged-edged linear programming (LP) models often seek refuge within the smooth world of classical calculus (a world now known to have been initially crafted by Kerala mathematicians, but I digress). Here, alpine slopes of gracefully changing functions invariably allow you to rapidly ski down to a uniquely best solution. Like the truism "Football is a simple game; 22 men chase a ball for 90 minutes and at the end, the Germans win", an applied math legend is that the Gaussian hare invariably trumps the Laplacian tortoise. Unless, of course, modern-day optimization solvers and decision science come into play and allow Laplace to overcome the various complications posed by non-smoothness. (The accompanying image was linked from http://www.econ.uiuc.edu)
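A tiny illustration of the hare and the tortoise, under the usual reading that "Gaussian" means minimizing squared error and "Laplacian" means minimizing absolute error. The sample data and function names below are made up for this sketch. For a single location estimate, the smooth squared loss has one minimizer (the mean), while the non-smooth absolute loss, which is what an LP would minimize, is flat between the two middle points of an even-sized sample, so every value in that interval is an alternative optimal solution.

```python
from statistics import mean, median_low, median_high

def squared_loss(m, data):
    """Gaussian criterion: sum of squared deviations from m."""
    return sum((x - m) ** 2 for x in data)

def absolute_loss(m, data):
    """Laplacian criterion: sum of absolute deviations from m."""
    return sum(abs(x - m) for x in data)

# Hypothetical sample with an even number of points.
data = [1.0, 2.0, 4.0, 9.0]

# The hare: the mean is the unique minimizer of the squared loss.
gauss = mean(data)

# The tortoise: any m between the two middle points minimizes the absolute loss.
lo, hi = median_low(data), median_high(data)
midpoint = (lo + hi) / 2

print("mean:", gauss, "squared loss:", squared_loss(gauss, data))
for m in (lo, midpoint, hi):
    print("m =", m, "absolute loss:", absolute_loss(m, data))
```

All three values of m in the loop report the same absolute loss, which is the one-dimensional version of an LP quietly admitting more than one best answer.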

Well-specified decision models in a variety of industrial situations admit plenty of good quality answers, so the problem is usually one of having 'too many' rather than too few options (my favorite academic researchers tackle increasingly difficult unsolved problems, whereas my favorite OR practitioners compete on identifying the easiest unsolved problems). A fundamental rule of thumb of practical decision optimization modeling is to advantageously exploit and defuse this often-deadly problem of 'degeneracy' that characterizes practical LP formulations. A reasonably skilled analyst can turn a potential numerical liability of the LP model into a business analytical asset as follows.

AOS often hide in plain sight
The presence of alternative answers forces us to revisit our translation of business rules into an LP and devote some time toward gainfully analyzing the differences in the potential business impact of these seemingly equal solutions. The customer can use this feedback to re-examine and further refine his/her business goals and priorities. This process of iteratively improving the design specification provides valuable insight to customers, and helps set up a richer, more practical, and more robust optimization model. Not that a similar exercise is impossible to accomplish using smooth approximations; it is just that the flexibility afforded by an LP model is often superior, and the tool kits for analyzing LP solutions have gotten amazingly better over the years.
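One way to picture this refinement loop is a toy two-variable LP small enough to solve by listing its corner points. Everything in the sketch below is hypothetical: the constraints, the objective, and the secondary "business preference" are invented for illustration, and a real model would go to an LP solver rather than brute-force vertex enumeration. The primary objective ties at two vertices (and along the whole edge between them), and a secondary criterion supplied by the customer then picks among the ties.

```python
from itertools import combinations

# Hypothetical toy LP: maximize x + y subject to a_x*x + a_y*y <= b.
# Each row is (a_x, a_y, b); the last two rows encode x >= 0 and y >= 0.
CONSTRAINTS = [
    (1.0, 1.0, 4.0),   # shared capacity: x + y <= 4
    (1.0, 0.0, 3.0),   # x <= 3
    (0.0, 1.0, 3.0),   # y <= 3
    (-1.0, 0.0, 0.0),  # x >= 0
    (0.0, -1.0, 0.0),  # y >= 0
]
TOL = 1e-9

def feasible_vertices(constraints):
    """Intersect every pair of constraint boundaries and keep the feasible points."""
    vertices = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < TOL:
            continue  # parallel boundaries never meet
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + TOL for a, b, c in constraints):
            # Adding 0.0 turns any -0.0 into 0.0 so points print cleanly.
            vertices.add((round(x, 9) + 0.0, round(y, 9) + 0.0))
    return sorted(vertices)

def optimal_vertices(vertices, objective):
    """Return every vertex whose objective value ties for the maximum."""
    best = max(objective(v) for v in vertices)
    return [v for v in vertices if abs(objective(v) - best) <= TOL]

def primary(v):
    """Stated business goal: total output x + y."""
    return v[0] + v[1]

def secondary(v):
    """Assumed refinement after customer feedback: prefer more of x."""
    return v[0]

ties = optimal_vertices(feasible_vertices(CONSTRAINTS), primary)
chosen = max(ties, key=secondary)
print("tied optimal vertices:", ties)
print("chosen after secondary criterion:", chosen)
```

The tie is the useful signal here: it says the stated goal does not distinguish between plans that may look quite different to the business, which is the cue to go back and ask which one the customer actually prefers.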

Means and Ends
It just doesn't make sense to crunch through 21st century "Big Data" analytics using bleeding-edge data mining, econometric, and machine learning methods on the one hand, and on the other hand, downgrade to 19th century techniques or random black-box methods to manage the underlying decision optimization problems and produce severely suboptimal and non-robust solutions. Using such shortcuts because "a quick one-iteration improvement is all that is needed" brings along some risky side effects and potentially leaves a big chunk of 'Big Data' value on the table. Do everybody a favor and upgrade to a Laplacian tortoise (e.g. CPLEX) and you will be surprised to see how fast it runs, especially on Big Data.
