Dualnoise: April 2013

Sunday, April 21, 2013

Fun with Coins: Optimizing The OR Cafe Price Menu

We've looked at a few coin-optimization models here before. ... We'll look at another one today. We'll try to price the menu at OR cafe. You can look at the detailed problem statement and the associated sample data here. As always, there's no claim that the method described here is a 'best' or even a 'correct' way to manage such a problem. This post serves as a useful starting point for analyzing this fun problem with coins and change.

Basically, the problem we are analyzing in this post here is to change the existing prices of items in the menu by tiny amounts (+/- 2 cents) to maximize the number of (future) customer bills that end with zero or five cents (i.e., $x.05, $x.10, $x.15, ... $x.95) after (13%) tax. Assuming that past transactions (200 of them) are a good indicator of future purchase patterns, we can select a portion of this data (say 60%) to 'train' our pricing model (MIP) on, and then evaluate its performance on the remaining 40% of the transactions.

Model
1. Decision variables: There are five pricing possibilities for the recommended price of an item p(i) = {p0(i)-2, p0(i)-1, p0(i), p0(i)+1, p0(i)+2}, where p0(i) = original price of the item in the menu.

2. Bill(transaction j) b(j) = sum(i) Aij.p(i)*(1+T/100)
where:
a. Aij = number of units of item (i) purchased in historical transaction (j)
b. T = percentage sales tax charged

3.The minimum (l(j)) and maximum (u(j)) values for b can be determined by setting all prices to their lower bounds, and upper bounds respectively. We assume that fractional cents are rounded to the nearest cent. This gives a pre-computable range of feasible b values V(j) = {l(j), l(j)+1, ...., u(j)}.

4. Objective: Maximize the number of times b ends in 0 or 5 (using boolean indicator variable 'z' and 0-1 cost 'c') in the training set by selecting one price from the 5 possibilities for each item

CAFEMIP: Max sum(k,j) c(k,j)z(k,j)
subject to:
ii) sum(i) Aij.p(i)*(1+T/100) = sum(k)V(k, j).z(k, j) for all j
iii) SOS-1 constraints for: prices p(i) that ensure that one of five prices is picked for each item in the menu, and for z(j) to ensure that exactly one bill realization is active for each transaction.

Round-off Issue
The LHS of constraint 4(ii) can contain fractional cents, whereas the RHS only admits discrete values. This situation can be overcome as follows:
b(j) = sum(i) Aij.p(i) + sum(i) Aij.p(i)(T/100) = b'(j) + tax,
where b'(j) is integer. Every realization of b'(j) (=V'(j)) can be mapped to our 0-1 objective function contribution after accounting for the additional tax it generates. Thus, it is sufficient to retain the integer components of 4(ii) and rewrite it free of round-off error as:

sum(i) Aij.p(i) = sum(k)V'(k, j).z(k, j) for all j

CAFEMIP was optimized using CPLEX 12.x and implemented in Java. As a secondary objective, a tiny penalty was added to minimize the deviation of prices from the original value to avoid unnecessary price changes or revenue shifts.

Results
1. optimized over the entire 200 transactions (no hold-out sample).We count the number of times, the cafe needed to round to the nearest 0 or 5 using the prices before and after optimization.

Roundings needed: original price: 135/200
Roundings needed: optimized price: 35/200
While the results on training data are really good, we do not know how well CAFEMIP would do on hidden data, so we re-run CPLEX on a subset of the data.

2. Optimized over the first 120 transactions only, and tested on the last (hidden) 80 transactions.
training data: original cost = 77/120
training data: optimized cost = 20/120

--- CPLEX log ----

Tried aggregator 1 time.
MIP Presolve eliminated 34 rows and 17 columns.
Reduced MIP has 230 rows, 847 columns, and 1824 nonzeros.
Reduced MIP has 835 binaries, 0 generals, 0 SOSs, and 0 indicators.
Probing fixed 68 vars, tightened 0 bounds.
Probing time =    0.01 sec.
Tried aggregator 1 time.
MIP Presolve eliminated 4 rows and 207 columns.
MIP Presolve modified 14 coefficients.
Reduced MIP has 226 rows, 640 columns, and 1666 nonzeros.
Reduced MIP has 629 binaries, 11 generals, 0 SOSs, and 0 indicators.
Presolve time =    0.06 sec.
Probing time =    0.02 sec.
Clique table members: 3071.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 4 threads.
Root relaxation solution time =    0.00 sec.

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap

      0     0       13.3095    48                     13.3095       92         
*     0+    0                           73.0000       13.3095       92   81.77%
      0     0       18.8333    13       73.0000     Cuts: 130      189   74.20%
      0     0       20.0000     2       73.0000      Cuts: 22      220   72.60%
*     0+    0                           20.0000       20.0000      220    0.00%
      0     0        cutoff             20.0000       20.0000      220    0.00%
Elapsed real time =   0.16 sec. (tree size =  0.00 MB, solutions = 2)

Clique cuts applied:  34
Cover cuts applied:  35
Implied bound cuts applied:  4
Mixed integer rounding cuts applied:  7
Zero-half cuts applied:  7
Gomory fractional cuts applied:  28

Root node processing (before b&c):
  Real time             =    0.09
Parallel b&c, 4 threads:
  Real time             =    0.00
  Sync time (average)   =    0.00
  Wait time (average)   =    0.00
                          -------
Total (root+branch&cut) =    0.09 sec.
CPLEX objval = 20.000000000000025

original, new price of XS Coffee is $1.24, $1.24
original, new price of S Coffee is $1.33, $1.33
original, new price of M Coffee is $1.57, $1.55
original, new price of L Coffee is $1.71, $1.73
original, new price of XL Coffee is $1.91, $1.90
original, new price of S Tea is $1.33, $1.32
original, new price of M Tea is $1.52, $1.51
original, new price of L Tea is $1.71, $1.73
original, new price of Donut is $0.95, $0.97
original, new price of Bagel is $1.24, $1.24
original, new price of Muffin is $1.30, $1.28
original, new price of Cookie is $0.75, $0.75

(prices in red indicate a changed recommendation when optimizing over the partial sample instead of the entire history)

These prices were used to recompute the bills in the hidden sample:
hidden data: original cost = 58/80 (~72%)
hidden data: optimized cost = 17/80 (~21%)
Thus, we would have been able to significantly reduce the number of roundings on the hidden sample if we used the CAFEMIP prices instead of the original values. If the customer buying pattern remains the same, we can expect these prices to continue to do a good job.

Sunday, April 14, 2013

Optimally Managing Ladybugs

My daughter loves ladybugs. However, when it comes to cleverness, many have noted how bees and ants (which seem to have a special place in Indian hearts :) "solve" traveling salesman and shortest path instances that matter to them (see this interesting link as well). In contrast, a ladybug's decision analytics appears to be somewhat shaky. They don't seem to have the kind of auto-GPS of bees and ants (at least in the context of this post), and seem to get confused and stray from their optimal path.

(pic source)

Recently, a number of LBs invaded my residence, and I spent a lot of time trying to shoo them away and determine how they got inside in the first place so that I could block their entry. Then, I came across this website. Apparently, ladybugs prefer humidity and warmth and try to enter homes as winter begins. OK. However, a good number also enter the house come spring time, which was puzzling. It seems some of the LBs goof up. Ladybugs roosting under windows outside the house, with a certain probability, enter the house instead of enjoying spring time outside. Once inside, they suffer from dehydration and die (unless they figure out the route to the bathroom or kitchen). If we try to force them out, they get stressed and shed 'yellow blood' that can stain walls. The optimal ladybug solution for this time of the year is to simply open the windows, so the ones who came inside by mistake can leave. It's a win-win for both parties, and it actually worked. On the other hand, during winter, we have to try and provide a cozy, alternative home in backyard and insulate the house well to keep them out.

The "Chaos" of India

Thanks to the person who pointed me to an interesting article in some online magazine called "livemint" : "Why Rahul Gandhi has understood the chaos of India"

As the 2014 General Election in India approaches, and pro-establishment journos crank out essays defending their favorite candidates, the author of the above article attempts something truly brave: resurrect a universally panned speech by a leading politician of the incumbent ruling political party. In particular, this paragraph caught my attention:
"... There are two things that stand out in the speech. The first is Rahul’s recognition of the chaos of India, for which he used the image of the beehive. I think this is wonderful realism, and excellent imagery."

As a keen India observer and commentator pointed out on twitter: "....beehive is nt chaotic"

Is India really chaotic?
To many a visitor from the west, it has appeared so. A vast section of India's English media who have worshiped at the altar of Western Universalism and return to lecture India's unwashed masses, may readily concur.

India, for a long, long time, has functioned as a highly decentralized system - in virtually every aspect of life. It's workforce contains among the highest percentage of entrepreneurs in the world, and traditionally possesses a remarkably decentralized economic model. It's indigenous religions are decentralized and base on empirical self-realization. There is no single holy book or authority (there is an entire reference library, with scope for new books to be written). However, this lack of a central authority does not imply chaos or that India was never a nation. It is more likely that what is perceived as chaos is in fact a cognitive overload on the part of the casual observer - as noted in Rajiv Malhotra's book "Being Different". The best example for this apparent chaos that actually produces incredibly effective results is the Kumbh Mela (the largest human gathering on earth, which has been recurring for thousands of years now) that recently concluded in India. Several observers from Harvard and other big-brand western institutions showed up this time to analyze this amazing event. Here's what Rajiv Malhotra says about the Mela:

"....India's Kumbha Mela amply demonstrates that diversity can be self-organized and not anarchic, even on a very large scale. Held every twelve years, this is the world's largest gathering of people, attracting tens of millions of individuals from all corners of India, from all strata of society, and from all kinds of traditions, ethnicities and languages. Yet there is no central organizing body, no 'event manager' to send out invitations or draw up a schedule, nobody in charge to promote it, no centralized registration system to get admitted. Nobody has official authority or ownership of the event, which is spontaneous and 'belongs' to the public domain. Since time immemorial, numerous groups have put up their own mini-townships and millions go as individuals just to participate in the festivities."

How and Why? Rajiv argues in his introduction to the chapter "Order and Chaos":
"Indians tend to be more relaxed in unpredictable situations than westerners. Indians indeed find it natural to engage in non-linear thinking, juxtaposing opposites and tackling complexities that cannot be reduced to simple concepts or terms. They may be said even to thrive on ambiguity, doubt, uncertainty, multitasking, and in the absence of centralized authority and normative codes. .... In the vast canon of classical writings in Sanskrit, we see many context-sensitive and flexible ways of dealing with chaos and difference. The search here is always for balance and equilibrium with the 'rights' of chaos acknowledged. On the other hand, in the creation stories in Genesis and in the Greek classics, there is a constant zero-sum battle between the two poles in which order must triumph."

Interestingly, the socio-biological systems of beehives and ant colonies, which may appear chaotic to the casual observer has an underlying order. In his brilliant book 'Traffic: why we drive the way we do', Tom Vanderbilt quotes studies that note how vast numbers of ants move without colliding (without a centralized traffic cop ant), which is certainly worthy of emulation by human car drivers. The author notes "looking at the swarm as a whole, one might not see what is driving the movement". The insects have their own Kumbha Mela going, perhaps.

Rahul Gandhi's 'beehive' metaphor may not be unreasonable, but it appears to have been chosen based on an entirely shallow reasoning due to his inability to decipher India's apparent "complexity", which he contrasts with China's 'simplicity'. India can be plausibly compared in certain contexts to a beehive or an ant colony not because of 'chaotic frenzied activity', but because of an certain well-functioning harmonic order underlying the apparent chaos.

The lesson here for those of us wishing to properly study India is to look beyond the obvious, look deeper, pay attention to context, and there are amazing discoveries waiting to be made. When Rahul Gandhi, the 4th edition of the Nehru dynasty and potential contender for the post of prime-ministership in 2014 compares India to a beehive for all the wrong reasons; when sections of the Indian English media jump in to spin this into some 'deep thought', it only betrays a fundamental lack of understanding of the essence of India: pluralism, diversity, and cooperative decentralization, blissfully unaware that at its deepest philosophical levels, dharmic India is bound by what Rajiv Malhotra terms an 'Integral Unity'.

Wednesday, April 3, 2013

Synchronized Scheduling Case Study - Part 3

The case study is introduced here.
The ICAR goal is derived here.

Model
The ICAR problem can be specified as follows:
Find a set of feasible crew pairings and aircraft strings such that:
a. every flight in the LEG schedule is covered exactly once by an aircraft string

b. every flight in the LEG schedule is covered exactly once by a crew pairing

c. any string and pairing is used no more than once (let's denote the non-zero pairings and strings as 'active')

d. the total cost is minimized

e. if an active crew pairing contains a connection t(i→j) in T, and there is also some active aircraft string that also contains t, then we say that the crew turns with the aircraft and is thus synchronized. Downstream systems can ensure that the actual crew assigned to flights i and j remain with the actual aircraft tail number that flies i and then j.

f. Maximally satisfy a reliability measure that is proportional to the tightness and the number of the synchronized connections in T. For the purpose of illustration in this post, we specify that a certain minimum count (m) of the connections in T be synchronized. The aim here is to be smart about picking the right airports and connections to synchronize to boost reliability as well as utilization.

g. The active strings belonging to a sub-fleet should form a Hamiltonian circuit. This is a maintenance requirement that tries to ensure equal wear-and-tear across the fleet. There are other constraints that are also interesting topics in their own right, but we'll ignore them here for brevity.

Resource Synchronization Modeling
Our main focus here is on requirement (f). We provide some sample approaches to handling synchronization in large-scale scheduling:
a) Within the subproblem:
Really tight day-connections t in T can be directly enforced by restricting t to be the only valid connection out of i and into j (i→j) in the flight network. Thus any pairing or string that comes of the subproblem that covers flight i will subsequently cover j. However, we cannot apply this idea to all connections in T since that is likely to be overly restrictive.

b) Within the Master program:
Let p(t) = sum of all pairing variables that contain connection t
q(t) = sum of all string variables that contain connection t
Note that p and q will be no more than 1.0. We can impose the following synchronization constraint:
∑_T p(t)q(t) ≥ m

The bilinear terms in the reliability constraint can be linearized in a straightforward manner. A personal favorite is the reformulation-linearization technique (RLT). z(t) ≡ p(t)q(t) ⇒

∑_T z(t) ≥ m, z(t) ≤ p(t), z(t) ≤ q(t), z(t) ≥ p(t) + q(t) - 1

Thus, by adding O(T) very sparse constraints and continuous variables z, we can model the synchronization component of ICAR without requiring any approximation, and the resulting formulation remains tractable. The RLT can be applied in a similar manner to synchronize three or more resources as well. The keys for this 'frontal assault' to work are:
1. The cardinality of set T be manageable in the context of the application.
2. Synchronizing two resources is easier than juggling three or more. We can see from the synchronization constraint that if the p-variables are fixed, the z-variables can be eliminated, leaving us with just the q-variables. This 'two variable per constraint' structure makes life much easier during column generation. The synchronization duals associated with the connections in T tend to work much more effectively in generating string-pairing pairs that cover a beneficial subset of common turns in T.

Epilogue

The ICAR solutions that were actually implemented (different from the illustrative version presented here) significantly improved upon the existing approach. It reduced business cycle time, greatly increased reliability, and reduced costs. The success of the ideas employed to tackle ICAR spurred the development of new, algorithmic techniques to solve the next-generation integrated crew scheduling problems that went on to save the LEG carrier many million bottom-line dollars annually during some difficult economic times. As far as the competition with LCA was concerned (for which this solution was designed): LEG had incurred enough losses and pulled the plug. Operations Research successful, patient dead.