Showing posts with label benchmark. Show all posts

Sunday, January 19, 2014

The King and the Vampire - 3: The Flaw of Optimizing-On-Average

This is the third episode of the 'King Vikram and the Vetaal' series.
(link: juneesh.files.wordpress.com)

Dark was the night and weird the atmosphere. It rained from time to time. Eerie laughter of ghosts rose above the moaning of jackals. Flashes of lightning revealed fearful faces. But King Vikram did not swerve. He climbed the ancient tree once again and brought the corpse down. With the corpse lying astride on his shoulder, he began crossing the desolate cremation ground. "O wise King, it seems to me that your ministers are sometimes too quick in taking policy decisions based on an average scenario, ignoring the distribution. But it is better for you to know that such decisions invariably result in complaints. Let me cite an instance. Pay attention to my narration. That might bring you some relief as you trudge along," said the Vetaal, which possessed the corpse. 

(pic source link: pryas.wordpress.com)

The retailing vampire went on:
Once there lived a retailer who always optimized his resource-constrained planning problems based on an average planning week. It was a quick and easy heuristic, and he claimed that it worked just fine. One day, an OR practitioner challenged this assumption, quoting points from the well-known 'flaw of averages' book, and said that with a bit more effort, and without increasing the problem size much, one could generate truly optimal merchandising decisions by including all scenarios, using CPLEX. This would also be more robust in practice than optimizing to the average. The retailer replied that while this issue was of theoretical importance, it did not matter much in practice, unless he saw some hard evidence. The practitioner decided to make an empirical point and proceeded to test a number of historical instances, evaluating the recommendations generated by the two approaches over all the scenarios. In 75% of the instances, the practitioner's method yielded better metrics, while in 25% of the cases, the retailer's average method did better. The retailer took one look at the results, said that the experimental setup was erroneous and inconclusive, and remained unconvinced.

So tell me, King Vikram: why did the retailer conclude the test setup was faulty? Was he correct in this assessment? Which method is better in the retailer's context? Answer me if you can. Should you keep mum though you may know the answers, your head would roll off your shoulders!"

King Vikram was silent, then closed his eyes, as if going into a Yogic trance in a moment of intense meditation. He reopened his eyes soon enough along with a smile, and then spoke: "Vetaal, unlike the last time, this one is pretty easy, so let's take the first question first.

1. The retailer felt the experimental setup was flawed because, if the practitioner's method truly yielded optimal solutions as claimed, then it should have done just as well or better in 100% of the instances.

2. However, the retailer was wrong, because his old method optimized against constraints based on an average scenario. There is no guarantee that its decisions will be feasible for the original problem over all scenarios. Therefore, in the instances where the old method did better, its recommendations had to be infeasible, since we cannot, of course, find a feasible solution better than an optimal one.

3. So we now know that the old method generated infeasible solutions at least 25% of the time. If we ignore these cases where we know it was surely infeasible, and look at the remaining 75% of the cases where it may have been feasible, it did worse 100% of the time due to a combination of suboptimality and overly-constrained instances.

In every case, the old method was either infeasible or suboptimal. The average-based heuristic must be discarded.
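King Vikram's argument can be replayed on a toy instance. All demands, capacities, and prices below are invented for illustration: the retailer checks feasibility only against the average shelf capacity, while the practitioner enforces capacity in every scenario.

```python
# Toy sketch of optimizing-on-average vs. all scenarios. Invented numbers.
import statistics

price, cost = 3.0, 1.0
# Equally likely scenarios: (demand, shelf capacity).
scenarios = [(2, 4), (6, 6), (10, 8)]

def expected_profit(q):
    # Revenue on units actually sold, cost on every unit stocked.
    return statistics.mean(min(q, d) * price - q * cost
                           for d, _ in scenarios)

# Retailer's heuristic: feasibility checked against the *average* capacity.
avg_capacity = statistics.mean(c for _, c in scenarios)
q_avg = max((q for q in range(11) if q <= avg_capacity), key=expected_profit)

# Practitioner's method: feasibility enforced in *every* scenario.
q_all = max((q for q in range(11)
             if all(q <= c for _, c in scenarios)), key=expected_profit)

violated = [s for s in scenarios if q_avg > s[1]]
print(q_avg, q_all, violated)   # q_avg breaks capacity in scenario (2, 4)
```

On this instance the average-based stocking level looks more profitable on paper, but it violates capacity in one scenario out of three — exactly the pattern in the vampire's tale.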

No sooner had King Vikram concluded his answer than the vampire, along with the corpse, gave him the slip.


(pic source link: omshivam.files.wordpress.com)

Wednesday, July 31, 2013

86400X speedup?

I once read a research paper claiming that its customized nonlinear solver reduced the computational time for a particular problem class from days to seconds, i.e., something like an 86400X speedup. 
(pic source link: espn.go.com)

Digging a little deeper, it seems the authors did not notice prior work that solved similarly sized instances of a more difficult discrete nonlinear case, using an analogous CPLEX-based approach, in a few seconds to (in the worst case) a few hours. Even a conservative 'from several minutes to a few seconds' mean improvement is impressive (~100X faster). After deleting complicating side-constraints and relaxing integrality restrictions, the resulting continuous relaxation can indeed be solved really quickly compared to the original problem.
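The back-of-envelope arithmetic behind those two numbers is simple; the "ten minutes" and "six seconds" figures below are my own assumed interpretations of "several minutes" and "a few seconds".

```python
# Back-of-envelope speedup arithmetic; specific figures are assumptions.
seconds_per_day = 24 * 60 * 60            # 86400
claimed_speedup = seconds_per_day / 1     # "days to seconds"
conservative = (10 * 60) / 6              # ten minutes down to six seconds
print(seconds_per_day, claimed_speedup, conservative)
```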

Amartya Sen recently confessed to pulling numbers out of thin air to grab people's attention, lending credence to the claims of his detractors. I hope O(claims) does not turn into a total marketing game in the future.

Sunday, July 22, 2012

Ekthetikophobia: The Fear of the Exponential

More specifically, the fear of having to solve an NP-Hard optimization problem to save your job. 

Do you or anyone else in your organization exhibit symptoms of Ekthetikophobia?

Sample Responses by Ekthetikophobes
1. Resign: Capitulation is by far the most common response, especially if the person does not have an ORMS background and is oblivious to the 'science of better'. Throw hands in the air, lose faith in humanity, and slurp spoonfuls of the nearest meta-heuristic algorithmic prescription provided by bees, ants, bacteria, squeaky wheels, mutant turtles, ... any nature cure that uses pseudo-random numbers. This response is best captured by the Hindi proverb "Naach Na Jaane, Aangan Teda", i.e., a bad dancer blames the uneven floor.

2. Controlled Panic. This is pretty typical of a generation of OR practitioners weaned on CPLEX. Like MDs trying to decode a deadly new strain of flu, the overriding urge is to throw money at the problem by ordering the latest versions of the baddest bunch of industrial-strength optimization solvers on the market, with matching supercomputing accessories to run massive benchmarks, and generally scare the pants off their IT managers who work with depression-era budgets.

3. Code. This is relatively more common among programming gurus and follows exactly one rule: if you throw sufficiently well-written code at any problem, it must work. This is so crazy, these guys may be on to something here.

4. Publish. The median response from E.phobic research folks is to put the exhibit on a pedestal for everybody to gaze at. It's akin to a biologist discovering a new species or an astronomer sighting a new planet that shouldn't have been there. Half of these people (surely OR types) will go on to demonstrate the monstrosity of this new problem based on diabolical worst-case instances that even a God who plays dice would not inflict on mankind. The other, more elegant half (theoretical CS E.phobes) propose 'factor of 2' approximations that cover all the difficult instances missed by the Muggles, yet are just as useful practically, before declaring victory. Then over the next two decades: keep shaving off that factor; rinse & repeat; it's a veritable cottage industry.

Exaggerations aside, ranked at the top of the most systematic, resourceful, innovative, and practically useful responses toward 'new' NP-Hard optimization problems must be the approach adopted by Dantzig, Fulkerson, and Johnson (1954) to tackle a 49-city Traveling Salesman Problem using 1950s computing technology and fearless minds that dared lasso the exponential.
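For a sense of what Dantzig, Fulkerson, and Johnson were up against: the number of distinct tours of n cities is (n-1)!/2, so brute-force enumeration was hopeless at n = 49 even before their cutting planes entered the picture.

```python
# The exponential to be lassoed: distinct TSP tours of n cities = (n-1)!/2.
import math

def tour_count(n):
    return math.factorial(n - 1) // 2

for n in (5, 10, 49):
    print(n, tour_count(n))   # 49 cities: a 61-digit number of tours
```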

July 22: minor fix: added missing text.

Monday, March 26, 2012

Gender-Shaping III: Is Amartya Sen's Missing Women Count Exaggerated?


This is the third post in this series on gender-shaping. The previous installment can be found here. Thanks to a Twitter link, I came across a 2010 journal paper, "Missing Women: Age and Disease," by Siwan Anderson (University of British Columbia) and Debraj Ray (New York University), published in the Review of Economic Studies, Vol. 77.

This paper has, among other things, investigated Amartya Sen's '100 Million Missing Women of India' claim that is attributed to systemic discrimination. Anderson and Ray have estimated the number of 'missing women' in India, China and Sub-Saharan Africa by age and cause of death (not done before), while also moving away from the simplistic aggregate sex ratios that were used as baselines in prior work. The authors make the following useful observations:

"Defining missing women by differences in aggregate sex ratios can be misleading, or uninformative (or both). It is misleading because different countries have different fertility and death rates, and (in particular) different age distributions. They will have different disease compositions. They may also have different sex ratios at birth for genetic or environmental reasons that have nothing to do with missing females. The procedure is also uninformative: we cannot tell at what ages the missing women are clustered, or what diseases are responsible. Thus, we cannot begin to ask about the various channels: discrimination, biology, social norms, and so on. Answering these questions is of profound importance. By unpacking missing women by age and disease, our paper takes a limited and preliminary step in this direction."

From an OR perspective, we rely extensively on similar customer segmentation models (in revenue management, for example), and this additional segmentation by age and causal factor appears to be quite important. It yields two main results, as well as a comparative result that may be interesting to a U.S. audience:

1. A large fraction of the missing women in India are not infants (less than 20%) but adults, and their loss is attributable to other factors such as disease and injury, apart from any systemic discrimination. Consequently, any claim that India's 'missing women' are driven exclusively by female infanticide is rejected. On the other hand, the paper finds that 44% of China's missing women are in the prenatal age group. Here is a snapshot of sex ratio by age, taken from the Anderson & Ray paper:




2. The authors make an interesting comparison with the U.S.: "we observe some similarities between age-specific percentages of missing women in the historical United States (ca. 1900) and India or sub-Saharan Africa today".

3. The Sen count (100 million missing women) appears to have been calculated with respect to a specific counterfactual: the overall sex ratio for N. America, the U.S. and Japan. An alternative calculation by Coale (1991) comes up with a more conservative estimate of 60 million. Anderson and Ray perform similar calculations at the segment level (i.e., by age and disease), using more carefully chosen counterfactuals as baselines, and find approximately 20 million missing women in India (aggregated across all age groups), while the corresponding figure for China is 58 million. Furthermore, 'injury' is not an insignificant culprit in India across all age groups, a potentially worrying trend that its government must look into. (The paper alludes to the old bogey of 'dowry deaths' as a probable cause, which may not turn out to be the case; a similarly detailed analysis is required.)
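Why does segmenting by age change the count so much? A toy sketch: every population figure and reference ratio below is invented for illustration, but it shows how a single aggregate counterfactual ratio and age-specific counterfactuals can disagree on the same data.

```python
# Toy sketch: aggregate vs. age-segmented 'missing women' estimates.
# All populations and reference ratios are invented for illustration.

# (age group, observed females, observed males, reference female:male ratio)
data = [
    ("0-14",  48_000, 52_000, 0.95),
    ("15-44", 46_000, 50_000, 1.00),
    ("45+",   30_000, 28_000, 1.08),
]

# Aggregate method: one counterfactual ratio for the whole population.
agg_ratio = 1.00
males = sum(m for _, _, m, _ in data)
females = sum(f for _, f, _, _ in data)
missing_aggregate = agg_ratio * males - females

# Segmented method: an age-specific counterfactual per group.
missing_segmented = sum(r * m - f for _, f, m, r in data)

print(missing_aggregate, missing_segmented)  # the two estimates differ
```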

The findings of this paper also weaken a statement in a previous post on this topic that a skewed overall male:female ratio in a region is a 'scary indicator' of female infanticide being practiced there. My statement ignored the age distribution as well as the cause-of-death dimension. Bad O.R., but I have Amartya Sen for company.

Tuesday, November 29, 2011

American Airlines Chapter 11: An OR Opportunity within a crisis

The news of American Airlines' Chapter 11 filing may sadden some members of the OR community, given the company's pioneering work in revenue management and its broader impact on ORMS practice in the airline industry. However, this crisis is also an opportunity for O.R. The company will be taking a close look at its business model as well as its more tactical operations, and ringing in a chain of meaningful improvements. Speaking from a similar Chapter 11 experience during the last decade, this filing represents an opportunity that OR practitioners must grab. The situation highlights what this blog has talked about before: a well-trained ORMS team with domain expertise is invaluable, and I'm sure AA has retained many of its experts.

Specific example
The renegotiation of the existing collective bargaining agreements (CBAs) with various unions is a great application example that can generate significant 'true dollar' returns. OR can be gainfully used to analyze rule changes that significantly improve efficiency while also improving the quality of work life of stakeholders, producing win-win deals. During 2003-2006, OR folks at a similar legacy US carrier were able to quantify and successfully redefine a set of highly complex work rules that ensured better utilization and compensation without sacrificing quality of work life. What's more, such feedback can help prevent CBA negotiations from stalling and win some trust from both sides of the table. In such situations, management is likely to be more flexible in 'opening up' old rules, allowing the full use of powerful solvers and large-scale optimization models to determine the maximum improvement possible by replacing such legacy constraints with more sensible ones. Every (deterministic) dollar saved during such stressful times is precious and highly appreciated. In short, Chapter 11 is a time when OR folks on the ground can make their presence felt and directly contribute to the bottom line. A friend in need is OR indeed.
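The quantification step above can be sketched in miniature. The work rule, pay figures, and weekly patterns below are all invented; the point is simply that enumerating schedules with and without a legacy rule puts a dollar figure on that rule.

```python
# Toy sketch of pricing a legacy work rule: enumerate weekly on/off
# patterns and compare the best cost with and without the rule.
# The rule and all pay figures are invented for illustration.
from itertools import product

daily_pay = 100
required_days = 5  # days of work needed from one crew member per week

def legal_legacy(pattern):
    # Invented legacy rule: no more than 2 consecutive working days.
    run = 0
    for day in pattern:
        run = run + 1 if day else 0
        if run > 2:
            return False
    return True

def cost(pattern):
    # Pay a 50% premium for weekend days (indices 5, 6) worked.
    return sum(daily_pay * (1.5 if i >= 5 else 1.0)
               for i, day in enumerate(pattern) if day)

patterns = [p for p in product((0, 1), repeat=7) if sum(p) == required_days]
best_relaxed = min(cost(p) for p in patterns)
best_legacy = min(cost(p) for p in patterns if legal_legacy(p))
print(best_relaxed, best_legacy, best_legacy - best_relaxed)
```

Here the legacy rule forces at least one weekend day into every legal pattern, so its 'price' is the weekly premium difference — the kind of number that can move a stalled negotiation.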

update: some typos fixed

Sunday, November 13, 2011

The best solver award in a decision support role goes to ..

Major ORMS conferences typically coincide with new releases of existing commercial optimization software products like CPLEX and Gurobi, as well as the introduction of new ones. In the last few days, we have seen the entry of an exciting new solver product, Sulum Optimization. Dr. Bixby's eagerly awaited updates on the state of the solver universe speak of truly breathtaking improvements. So how important are such off-the-shelf solvers in practice?

A complementary 'state of the union' question worth asking is: how much progress have humans made in analyzing decision problem formulations in practice? Glancing through journal papers from the last three decades and comparing the work with the approaches taken by researchers in the 1940s-1970s, I couldn't help but feel that there has been no improvement, perhaps even a retrogression, along this dimension. The answer is tough to quantify, but we can try to describe reasonable boundary conditions.

1. A terrible formulation is one which cannot be readily analyzed to determine an acceptable solution to the original business problem. This is not the same as 'finding a feasible solution to the mathematical optimization model', although the two questions are related. Yet a whole lot more time and publicity is directed toward the latter question. Apparently quite a few MI-Hard (mission impossibly hard) problems have been discovered out there, especially in the last three decades: instances so mysterious that even an acceptable solution is elusive despite the obviously massive progress in solvers and hardware. What does this tell us? In practice, an acceptable solution is either already available or can be found using OR and business insight. I certainly haven't yet come across such an MI-Hard instance across multiple industries.

2. A viable formulation is one which you can analyze to find an excellent answer to the original problem by partial or complete enumeration, within a reasonable amount of time. In other words, you are doing your job as a practitioner well if your design simply avoids crappy solutions to the original problem, leaving you with the task of scanning just the few remaining good ones. This process is merely a refinement of the 'acceptability analysis' in (1). (Watson would ask, "What is 'constraint programming'?") Not only is the process of being a 'human solver' quite enjoyable, it also goes a long way toward convincing your employer and customers that a competitor cannot simply buy a solver and achieve parity. Plus, it builds street cred for you and for OR. On the other hand, a few journals love to have MI-Hard problems subjected to 'shock and awe' within their pages.
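A minimal sketch of this 'human solver' style, with invented project data: a domain insight (dominance) prunes the obviously bad choices first, leaving a handful of candidates that plain enumeration can scan.

```python
# Toy sketch of prune-then-enumerate: invented project selection data.
from itertools import combinations

# (project, cost, value)
projects = [("A", 4, 10), ("B", 3, 9), ("C", 5, 6), ("D", 2, 7), ("E", 6, 5)]
budget = 9

# Insight: drop any project strictly dominated by another
# (some other project costs no more and is worth no less).
kept = [p for p in projects
        if not any(q[1] <= p[1] and q[2] >= p[2] and q != p
                   for q in projects)]

# Enumerate only the surviving candidates within budget.
best = max((c for r in range(len(kept) + 1)
            for c in combinations(kept, r)
            if sum(p[1] for p in c) <= budget),
           key=lambda c: sum(p[2] for p in c))
chosen = [p[0] for p in best]
print(chosen)   # C and E never need to be examined
```

Dominance eliminates two of the five projects before any enumeration happens; on real instances, this is where business insight earns its keep, shrinking the search to the few good solutions worth scanning.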

Summary
There may be rare exploratory formulations that require the full power of such solvers to generate insight when very little is practically known about the original problem. In general, solvers are useful in practice like health insurance is useful in health care. Some feel that buying the best or most expensive health insurance will somehow translate into an equivalent improvement and stability in our health, forgetting that fundamental choices like exercise, lifestyle, and diet are far more important factors. In fact, if we find that we are using our health insurance a lot, it is all the more important to question those fundamental choices.

'Which is the best solver today?' is an interesting question, but the best decision model you build will be one that most certainly does not depend on the answer.