
Monday, May 6, 2013

Guesstimation in Optimization and Analytics

Guesstimation
Browsing through the various examples examined in the 2008 book 'Guesstimation', it is apparent that the basic ideas behind this approach of "solving the world's problems on the back of a paper napkin" are useful for analyzing real-life optimization and analytics instances. Whether we are looking at (i) Smart-grid optimization ("how many hours does it take to charge an electric car"), or (ii) Hazmat routing ("what is the risk of dying per unit mile of road") or (iii) environmental problems ("how much farmland will it take to convert all gasoline cars to ethanol"), guesstimating values can be critical to effective model formulation in the absence of precise information. Guesstimation is a 'small data' method that can improve the effectiveness and performance of our 'big data' analytics.

Optimization
Let's look at an 'elastic' linear programming formulation:
LP: Minimize c.x + P.s
    subject to A.x - s <= b
               l <= x <= u, s >= 0
where we've added artificial variables 's' to ensure feasibility of LP for any bound-feasible 'x'. Oftentimes, exact values for b, c, l, u, or portions of A are not available, and the artificial variables are allowed to be active at some high but as-yet-unknown penalty rate 'P'. In industrial applications, we frequently end up guesstimating these quantities.
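
To make the elastic formulation concrete, here is a minimal sketch in Python using scipy.optimize.linprog; the matrix A, the vectors b, c, l, u, and the penalty P are all made-up, guesstimated numbers standing in for real data.

```python
# Minimal sketch of the elastic LP above: minimize c.x + P.s
# subject to A.x - s <= b, l <= x <= u, s >= 0.
# All numbers are illustrative guesstimates, not real data.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0],
              [3.0, 1.0]])                 # guesstimated technology coefficients
b = np.array([10.0, 15.0])                 # guesstimated resource limits
c = np.array([4.0, 3.0])                   # guesstimated unit costs
l, u = np.zeros(2), np.array([6.0, 6.0])   # guesstimated variable bounds
P = 100.0                                  # guesstimated penalty on artificials

m, n = A.shape
# Decision vector z = [x, s]; one artificial variable per elastic row.
obj = np.concatenate([c, P * np.ones(m)])
A_ub = np.hstack([A, -np.eye(m)])          # encodes A.x - s <= b
bounds = [(l[j], u[j]) for j in range(n)] + [(0, None)] * m

res = linprog(obj, A_ub=A_ub, b_ub=b, bounds=bounds, method="highs")
x, s = res.x[:n], res.x[n:]
print("x =", x, "artificials s =", s, "objective =", res.fun)
```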

Example (i) in the previous paragraph gives us a charge rate per car (a component of the 'A' matrix). If 'x' is the number of cars to be charged, 'b' is an upper bound on the time available, and 'c' the cost of electricity for that duration, we can predict the 'ballpark' cost incurred and the charge stored. Similarly, example (ii) gives us the average risk per unit mile as a cost coefficient 'c'. By extending this calculation to distinguish high- and low-risk arcs in a network, we have an initial minimum-risk network path optimization formulation at hand. Finally, example (iii) gives us an unconstrained resource requirement, which allows us to quickly assess the impact of adding a resource constraint.
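
For instance, a rough calculation for example (i) might look like the following sketch; the battery size, charger rate, and electricity price are guesstimated assumptions, not data.

```python
# Back-of-the-envelope guesstimate for example (i); all figures are
# rough assumptions: a typical EV battery (~60 kWh), a home charger
# (~7 kW), and residential electricity (~$0.15 / kWh).
battery_kwh = 60.0        # guesstimated usable battery capacity
charger_kw = 7.0          # guesstimated charging rate (the 'A' coefficient)
price_per_kwh = 0.15      # guesstimated electricity price

hours_per_car = battery_kwh / charger_kw        # ~8.6 h per full charge
cost_per_car = battery_kwh * price_per_kwh      # ~$9 per full charge

time_available_h = 10.0   # 'b': overnight charging window
cars = 100                # 'x': fleet size to be charged
print(f"hours per car ~ {hours_per_car:.1f}, cost per car ~ ${cost_per_car:.2f}")
print(f"energy needed ~ {cars * battery_kwh:.0f} kWh, "
      f"ballpark fleet cost ~ ${cars * cost_per_car:.0f}")
```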

Thou shalt always Guesstimate!
Direct uses of guesstimation in practical optimization include calculating tighter bounds (l, u) for decision variables and the smallest valid values for the big-M numbers that riddle integer programming formulations. If we think of (u - l) as a measure of our ignorance about the as-yet-unknown optimal 'x', guesstimating tight bounds is akin to maximally reducing its "prior variance". While guesstimation in the book relies on feasibility calculations, we can and should incorporate optimality considerations to obtain sharper bounds for such unknown variables. Using solver-default (0, infinity) values is lazy and bad practice. Spending some time to improve bounds also yields insight into the nature of the optimal solution and into which constraints are likely to be binding, and helps catch modeling bugs earlier.
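
As an illustration, here is a hypothetical snippet contrasting a "lazy" big-M with one guesstimated from problem data; the arc capacity and demand figures are assumed values, not drawn from any real instance.

```python
# Hypothetical sketch of guesstimating the smallest valid big-M for a
# fixed-charge constraint x <= M * y (y binary), instead of a huge default.
arc_capacity = 400.0          # guesstimated physical capacity of the arc
downstream_demand = 250.0     # guesstimated total demand the arc could serve

lazy_M = 1e6                                     # the "solver default" style choice
tight_M = min(arc_capacity, downstream_demand)   # smallest value that is still valid

print(f"lazy big-M = {lazy_M:.0f}, guesstimated tight big-M = {tight_M:.0f}")
# The tighter M shrinks the LP relaxation, which usually translates into
# better bounds and faster branch-and-bound convergence.
```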

Example 1: what is the cost of not scheduling a crew for an airline flight?
In airline crew scheduling, the flights in a schedule are partitioned among crew pairings (see this post for a problem description). Every flight must be covered by exactly one pairing. This is a hard constraint, but in practice, we have to temporarily employ its elastic form of the type described above and penalize violations via 'P'. How do we determine an appropriate penalty value? This would be the cost of leaving a flight uncovered. Such flights have to be covered by a reserve crew, which comes at a cost that can be calculated and is typically smaller than some default value we may have otherwise chosen. The dual variable associated with an uncovered flight in an optimal LP solution attains a value close to 'P'. Any new column (x(j)) generated that covers this flight will subsequently have a reduced cost on the order of (c(j) - P), and so on. Good guesstimation of P helps keep reduced costs manageable and accelerates algorithm convergence.
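
The following hypothetical sketch shows how a guesstimated P propagates into the reduced cost of a candidate pairing during column generation; the duals, flight labels, and pairing cost are invented purely for illustration.

```python
# Hypothetical sketch: the guesstimated penalty P shows up in the reduced
# cost of a new crew pairing via the dual of any uncovered flight.
P = 12_000.0                         # guesstimated cost of leaving a flight uncovered
duals = {"F101": 11_800.0,           # uncovered flight: its dual sits close to P
         "F204": 1_500.0,            # covered flights: ordinary dual values
         "F305": 2_100.0}

pairing_cost = 9_000.0               # cost c(j) of a candidate pairing
covers = ["F101", "F204", "F305"]    # flights the candidate pairing covers

reduced_cost = pairing_cost - sum(duals[f] for f in covers)
print(f"reduced cost of candidate pairing ~ {reduced_cost:.0f}")
# With a sensible P, reduced costs stay on the order of (c(j) - P); a wildly
# large P would produce huge negative reduced costs and slow convergence.
```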

Example 2: what is the price elasticity of point-and-shoot digital cameras?
Price elasticity is a dimensionless (negative) quantity that gives us the expected percentage change in sales for a percentage change in price. We can expect this value to be larger in magnitude than that for electricity, grocery, and dairy items, which are essentials that people are less willing to give up when prices rise and are likely to be inelastic (less than a one percent drop in sales for a one percent price increase, say -0.5). It will be smaller in magnitude than that of highly discretionary purchases like fashion apparel or leisure airline tickets (which have elasticities of roughly -4 to -2). The geometric and arithmetic means of the magnitudes 0.5 and 3.0 give us a guesstimated range of roughly (-1.8, -1.2). A data-driven estimate of this value should be compared against this guesstimate as a sanity check. For example, if a regression model on a data sample produces an 'optimized' elasticity of -5 or -0.1, we should double-check our modeling or run the risk of producing wild predictions.
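
A small sanity-check sketch along these lines, using the elasticity magnitudes guesstimated above (the slack allowance and function name are illustrative choices):

```python
# Bracket the camera elasticity between essentials (~-0.5) and highly
# discretionary purchases (~-3.0) via the geometric and arithmetic means
# of the magnitudes, then flag regression estimates that fall far outside.
import math

inelastic_magnitude = 0.5     # essentials: electricity, groceries, dairy
elastic_magnitude = 3.0       # fashion apparel, leisure air travel

geo_mean = math.sqrt(inelastic_magnitude * elastic_magnitude)      # ~1.22
arith_mean = 0.5 * (inelastic_magnitude + elastic_magnitude)       # 1.75
lo, hi = -arith_mean, -geo_mean                                    # ~(-1.8, -1.2)

def sanity_check(estimated_elasticity, slack=0.5):
    """Return True if a data-driven estimate is near the guesstimated range."""
    return lo - slack <= estimated_elasticity <= hi + slack

print(f"guesstimated range: ({lo:.1f}, {hi:.1f})")
print("estimate -1.5 plausible?", sanity_check(-1.5))   # True
print("estimate -5.0 plausible?", sanity_check(-5.0))   # False
```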

Another use of guesstimation is in generating plausible random optimization instances to demonstrate the effectiveness of new algorithms for NP-hard problem classes when only limited real-life data is available. It can help weed out weird and contrived instances that skew the apparent difficulty of the problem and make it seem much harder than it actually is.
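
For example, one might draw instance data from guesstimated, realistic ranges rather than arbitrary ones; the knapsack-style generator below is purely illustrative, and the ranges are assumptions standing in for real-world knowledge.

```python
# Hypothetical sketch: generate a plausible random instance from
# guesstimated parameter ranges rather than arbitrary ones.
import random

def plausible_instance(n_items=50, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(1, 20) for _ in range(n_items)]   # guesstimated kg range
    values = [rng.uniform(5, 100) for _ in range(n_items)]   # guesstimated $ range
    capacity = 0.3 * sum(weights)   # guesstimate: room for ~30% of total weight
    return weights, values, capacity

weights, values, capacity = plausible_instance()
print(f"{len(weights)} items, capacity ~ {capacity:.0f}")
```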

Tuesday, February 23, 2010

O.R. Ball - Defense and Offense

Oftentimes, the difference between the theory and practice of O.R. is like the difference between carefully choreographed brain surgery at Boston General and brain repair performed at the MASH 4077. In the former case, the idea is to come back with a generic breakthrough that can be safely re-used for all surgeries. In the latter case, the aim is to get through the Korean war (albeit on TV) with the fewest casualties among the huge number of wounded who randomly show up.

The textbook approach prides itself on being data-agnostic and on coming up with new "small polynomial" techniques that exploit special structure in a problem class. Data agnosticism is something we practitioners can ill afford, since the proof is in the pudding. If we build the best recipe optimized for no more than 6-7 guests, and we never expect to see more than ten guests, ever, then it is pointless to fuss over a complex recipe generator that optimally serves a thousand guests but can't serve five as quickly and as well. And if there are thousands of such five-guest parties to be served, the former approach wins hands down. Of course, if we were optimally designing a dam or two, things would be a little different, since strict service levels come into play. So it is case dependent, and bringing your skill and art into play to deal with such differences, and putting your money where your model is, happens to be one of the reasons why O.R. practice is never dull.

Textbooks recommend that we exploit problem structure. Industrial optimization exploits problem structure to an extent, but can and should also exploit the structure in the data. The Simplex method, which is making such a robust comeback via Gurobi, is worst-case exponential, yet rarely does a bad job because good implementations thrive on real-world LP data and the numerical properties of computers. In the real world, even strongly NP-hard optimization problems are quite manageable. Do not let textbooks scare you! After all, there were no computers around when architects in South India optimally designed the Brihadeeswara temple a thousand years ago: the design required that the tall temple's shadow be constrained to within its (convex) perimeter at any time of day. They achieved an elegant and 'feasible' design that stands 216 feet high while using incredibly heavy granite stones. And yes, the location of the symbolic 'idol' of the deity coincides with the centroid of the overall structure. This has got to be one great place for spiritual reflection, or for thinking O.R.!

It's a strange dichotomy in the global O.R. community. The universities are focused on 'defense' and abhor seeing any "2^n" kind of numbers anywhere: a systematic O.R. approach that generates lower bounds to protect your answers, but one that is also capable of forcing turnovers, i.e., it can be quickly converted into good-quality solutions with a little imagination. The book on the Traveling Salesman Problem by the stalwarts of our discipline is quite fascinating in this context. The O.R. business community goes with whatever brings home the Dosa, exponential or otherwise, and there is healthy scorn for the 'defense' part. Approximation, local-optimum, and randomized methods can be thought of as part of the 'offense'. Offense gets the glory and can be used to quickly initiate a product, which is perhaps why many a young practitioner is hooked on it. However, it is generally a good idea to consider using both approaches. Over the product life cycle, it is usually the best defense-offense combo that wins the game, all other things being equal.