Sunday, February 27, 2011

Analytics and Cricket - V: Death by Forecast

Is cricket the only sport in the world where a game-changing decision, i.e., an "out", is decided by a forecast? Welcome to the "leg before wicket" (LBW) rule. Here, the umpire (aka the referee in the USA) gives the batsman out if he deems that the ball would have gone on to hit the stumps had the pad not come in the way. A good percentage of outs in cricket occur via the LBW. There is some additional 'fine print' that must also be satisfied for a batsman to be given out, but it is too technical for this tab. Here's a clip of Pakistan's great pacer Waqar Younis getting some LBWs with his fast (90-95 mph+) in-swinging (curve ball) yorkers in the early 1990s.

The LBW rule is unique in that a batsman is given out based on something that did not actually occur, but probably would have occurred. Not surprisingly, LBWs are the most contentious decisions in cricket. Yesterday's sensational world-cup match between India and England in my home town, which ended in a dramatic last-ball tie, included an LBW incident. The 2011 cricket world cup allows the use of the "Hawk-eye" trajectory predictor to help arrive at the best decision. Similar technology is now used in many other sports, such as tennis. In yesterday's instance, a batsman was deemed "not out", overruling Hawk-eye's prediction, because he was struck on his pads more than 2.5 metres in front of the stumps. That distance is a 'magic number' threshold beyond which the Hawk-eye forecast is deemed practically unreliable. I believe that this is just the start of the problem.

Spin bowling. How the heck is Hawk-eye going to predict the amount of spin (turn) off the pitch? In more complex cases, there is drift, dip, as well as turn. Watch the Aussie legend Shane Warne bowl this incredible "ball of the century" some 17 years ago. Would Hawk-eye have been able to accurately predict the path of the ball after it made contact with the surface of the cricket pitch? I somehow doubt it.

Strictly speaking, forecast-based decisions do exist in other sports as well. In basketball, we have goaltending calls, which are probabilistic in the sense that the scoreboard is updated based on a forecast rather than an actual basket. Are there other popular sports where non-trivial game-time decisions involve a forecast of some kind?

In cricket, the forecasting story doesn't stop there. We have forecast-based rules for weather-affected matches that were devised by OR professors in England, something we have already talked about before. These are causal predictive models employed regularly in professional cricket. Yet another reason why cricket is called the 'game of glorious uncertainties', and why the game of cricket is always an applied mathematician's delight.

Given that this is world-cup cricket time, the next post will (probably) center around cricket, and will focus on a very deterministic analytical modeling element. Go India ...

Sunday, February 20, 2011

Dating and wedding logistics: an OR opportunity area?

Here is yet another online dating portal start-up by a bunch of young New Yorkers. However, this one is a bit more interesting from the OR perspective. It specifically targets 'group dating', noting that "meeting someone one-on-one is more awkward than a junior high dance", and also acts as a 'dating logistics' enabler. They also do 'group profile' matching. One gets the feeling that there are bound to be some OR methods applicable here to improve upon this original idea.

However, the most interesting aspect of this startup was that it was not very successful in igniting the NYC scene, but within a few months, their biggest customer segment came from India, much to their surprise. Why? Pulling off one-on-one meeting coups in India generally tends to be even more daunting. Plus, Indians love novelty while also holding on to the time-tested. As we noted a couple of weeks ago, traditional weddings in India make up a huge fraction of the weddings in the world at any point in time (H-hour, the most auspicious time for the final ritual, occurs at night for North Indian weddings, and during the day in the South :). This makes an upwardly mobile and economically liberalized India an irresistible growth market for practically anything new, from stealth jets to instant Mehendi/Henna, and of course, novel online dating services. To back up this claim, we go back to our tried and tested indicator: Indian movies. Yes, we have a new B'wood movie around the dating/wedding logistics theme. In fact, this one is a pretty decent and successful yarn about a pair of entrepreneurs who make it big scoring contracts for the scheduling, synchronizing, and sequencing of the various events in an Indian wedding, which are among the most elaborate and intricate in the world. Satisfying the sheer number of hard and soft constraints here is likely to be challenging. Seems like a great niche O.R. opportunity area.

Saturday, February 12, 2011

Driving on 'E': Mad Max and OR

While doing my commute to work yesterday on the parkways of NY:

9 AM: The fuel indicator hits 'E' and I am still some distance from the destination. After an initial surge of panic, like any good OR person I decided to build a quick inventory model of the quantity of gas (that would be 'petrol' if you are Indian) held in my car. With all those NY drivers racing at breakneck speed (like those Mohawked goons from the Aussie outback in 'Mad Max'), doing real-time inventory optimization while driving safely is not easy.

9:10AM: Consciously slow down inventory depletion rate. Stay within your auto's fuel efficient range around 45-55 mph. Resist the urge to go too slow or too fast. Turn off heating - maybe that would help too.

9:12 AM: Toyota surely must have sound safety stock calculations thrown in to calibrate the fuel meter, so 'E' is more likely to be the 'replenishment point' rather than a truly empty "back order" point. Reasoning helps reduce panic. It's a new job in a new geographical area, but the car is the same old and trusted sedan.

9:15AM: Use GPS to locate nearest replenishment point. The nearest one was just 2 miles away. Good. When I get to that spot, there's just a pile of snow. Data issues with the GPS.

9:20 AM: Do I trust the GPS and search for another gas station, or do I head to the office and postpone my decision to refill until the return trip? Feeling confident that there is enough in reserve, I head straight for the office, staying on the highway where cars are more fuel efficient. I make it, and park the car in the shade.

12 PM: Offline analytics to plot the return trip in the evening. I find this really great site. For an input car make and model, it displays the sample mean and standard deviation of the miles driven after hitting 'E'. I was not even close to riding on the edge. The distribution of gas miles after hitting 'E' shows a mean value of about 45 miles and a standard deviation of 25. Some road warriors appear to have done a hundred or more. There was one who apparently refueled his tank with 18.064 gallons and must have been running on fumes. On the other hand, the minimum value is 2.0 miles, indicating that there was a non-trivial expected cost of being stranded in sub-zero conditions on a highway looking foolish, and that I may yet be stranded at the office.

5 PM: Decision optimization time. Do I bet on the analytical model and drive home to refuel at the gas station that (certainly) existed this morning next to my residence and also sells cheaper gas? Or do I head for the nearest gas station from my current location? If I head home, I have roughly a 2/3 data-driven chance of making it, based on the normal distribution fit to the curve on that site. Despite this comfortable probability of success, I realize that it's relatively easier selling OR models to others. Furthermore, what happens if I'm stuck in a return-commute traffic jam? I decide to leave a little later to avoid peak traffic. However, if I can locate a gas station closest to a point on my shortest path to home, then that is an optimal route. The treacherous GPS will (hopefully) redeem itself and find a good solution to this tiny traveling salesman problem.
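For the curious, here is a minimal sketch of that back-of-the-envelope calculation. It assumes the normal fit with the mean (45 miles) and standard deviation (25 miles) quoted above. The length of the trip home is not stated in this post, so the sketch backs out the trip length at which the chance of making it is exactly 2/3. The function and variable names are mine, purely for illustration.

```python
from statistics import NormalDist

# Figures quoted in the post: miles driven after the gauge hits 'E'.
MEAN_MILES = 45.0
STD_MILES = 25.0

# Assumption: the reserve range is normally distributed (a rough fit only).
reserve = NormalDist(mu=MEAN_MILES, sigma=STD_MILES)


def chance_of_making_it(trip_miles: float) -> float:
    """P(reserve range exceeds the trip length) under the normal fit."""
    return 1.0 - reserve.cdf(trip_miles)


# Trip length at which the chance of making it is exactly 2/3:
# P(range > d) = 2/3 is the same as P(range <= d) = 1/3.
break_even_miles = reserve.inv_cdf(1.0 / 3.0)

# Share of the fitted curve that lies below zero miles, which is
# physically impossible and a reminder that the normal fit is crude.
negative_mass = reserve.cdf(0.0)

print(f"2/3 chance corresponds to a trip of about {break_even_miles:.1f} miles")
print(f"fitted probability of a negative range: {negative_mass:.3f}")
```

Any miles already driven on 'E' earlier in the day should be added to the trip length before calling chance_of_making_it, since the reserve does not reset at the office parking lot.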

5:45 PM: The GPS is wrong. Twice in a day! I expended about 6 miles on this wild goose chase and burnt valuable daylight as well. I am no closer to home, my tank is still on 'E', my night vision is poor, and I'm freezing. I figure my odds have dropped close to coin-toss range. The GPS has been a letdown as far as finding non-fictional gas stations in this highly wooded area that is still new to me. Given the darkness, I think I'm doing the sensible thing by assuming that the conditional probability of hitting a gas station, given that we choose local roads (closer to residential areas), is higher. I ask Cassius to re-route me off the parkways, keeping in mind the drop in fuel efficiency on local roads, which is about 30% for my car.

6:00 PM: Despite driving at low speeds, I find a gas station within 4 miles and shell out $61 for the 'juice'. I find that I had almost a gallon in reserve, enough to have taken me home, safe and warm, if I had driven along the highway at optimal speeds in the first place. If only my night vision were as good as my hindsight. In the end, I was just another data blip that was pretty close to the median on the 'E' curve. Mad Max I was not. That, and the fact that I have neither a spunky dog riding with me nor a sawn-off shotgun.

Saturday, February 5, 2011

Analytics, Astrology, and V-day Objectives

What predictive analytical tools do insurance companies use to manage long-term risk? The usual ones and then this. Using customer data such as month of birth, the Allstate insurance company grouped observations into 12 buckets. To make it more fun, they labeled these buckets by star sign ("Raashee" in Sanskrit) as a V-day joke. Then they tracked the accident levels for each of the groups. Topping this list of troublemakers are Virgos, characterized by Allstate as "worried and shy". Of course, while this was a bit of harmless fun for many people, and Allstate said it was a joke that fell apart, Virgos and Leos have good reason to worry in the current economic climate, since their rates may be relatively higher because of a "pre-existing" condition. In the end, Allstate went into damage-control mode and assured customers that their star sign was never, and will never be, held against them :)

Assuming these results are real, they raise some interesting and entertaining questions. Is there at least some apparent correlation between your date of birth and your future on and off the road? Does the popular observation that a significant proportion of babies are conceived during the downtime in winter, and thus born around August (Leo-Virgo time), have something to do with these results?

When it comes to long-term decisions such as match-making, Raashee and celestial planetary alignments matter to many Indians, regardless of economic and educational levels. The recent Bollywood flick 'What's Your Raashee?' comes to mind.

Indians love weddings, and a significant fraction of the marriages 'arranged' in India (which would amount to a healthy fraction of the total weddings in the world at any moment!) are based on the compatibility of horoscopes that must be determined by an expert astrologer. A 'matching algorithm' is run to determine the compatibility of horoscopes on various attributes. The outcome is an integer and a certain lower threshold must be met for the alliance to be considered worthwhile. The bigger the score, the brighter the predicted future of the proposed marriage. An unattainable upper bound for this score among mortals is a perfect 36/36 (?) which was achieved for the divine pair of Sri Rama (an Avatar of Lord Vishnu, the preserver) and his consort, Mother Sita (the daughter of Goddess Earth), whose perfect union is the basis of one of the two great Indian epics, the Ramayana.
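A minimal sketch of such a threshold-based score is below. The post only says that the outcome is an integer with an upper bound of 36 and some lower threshold. Everything else here is an assumption for illustration: the eight attribute names and their maximum points follow the commonly described eight-attribute ("Ashtakoota") scheme, the cut-off of 18 is a commonly cited value rather than one given in this post, and the function names are hypothetical. It is not a substitute for the expert astrologer.

```python
# Maximum points per attribute. ASSUMPTION: names and weights follow the
# commonly described eight-attribute scheme; the post itself only mentions
# an integer score with an upper bound of 36.
MAX_POINTS = {
    "varna": 1,
    "vashya": 2,
    "tara": 3,
    "yoni": 4,
    "graha_maitri": 5,
    "gana": 6,
    "bhakoot": 7,
    "nadi": 8,
}

# ASSUMPTION: the post does not state the lower threshold.
THRESHOLD = 18


def total_score(points: dict) -> int:
    """Sum the per-attribute points after validating each entry."""
    for name, value in points.items():
        if name not in MAX_POINTS:
            raise KeyError(f"unknown attribute: {name}")
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    # Attributes that are not supplied simply contribute zero points.
    return sum(points.values())


def is_worth_considering(points: dict) -> bool:
    """True when the total meets the assumed lower threshold."""
    return total_score(points) >= THRESHOLD


# The divine upper bound: full marks on every attribute.
perfect_match = dict(MAX_POINTS)
print(total_score(perfect_match), is_worth_considering(perfect_match))
```

The per-attribute scoring itself (how two horoscopes map to points) is the hard part and is deliberately left out of this sketch.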

In today's world, horoscope-matching is a fun and educative exercise for Indian couples ready to take the plunge, while also connecting with many thousands of years of uninterrupted native culture. However, the problem arises when matchmakers begin to take astrological (or analytical) predictions way too seriously, at the expense of every other reasonable consideration such as the 'content of a person's character', as the noble Dr M. L. King Jr. said. Thus it is not surprising that even the most 'secular' of Indian politicians is a fanatical follower of astrology. As members of generation-A (the analytical generation), we would love to think that we are different, but things haven't changed all that much. One only needs to look at the "analytics" employed by online dating sites. The 'Analytic Age' blog had an interesting post relating to this a while ago. When it comes to making strategically useful match-making predictions, today's analytics is not much of an improvement.

Tactical Level
On the other hand, when it comes to tactics, banking on stars to bail us out on V-days and anniversaries is a recipe for 'crash and burn'. In this rare instance, a large variance can actually be good since it is the opposite of 'routine and boring', in keeping with the 'variety seeking' behavior of shoppers observed in descriptive retail analytics. But this risk is at odds with the eventual reward, so we must choose our objective with care. V-days follow a geometric probability distribution, where you have to win every year just to stay in the game. One big meltdown and you are out, regardless of the big wins you had in the past when your mojo peaked. By all accounts, the expectation of tolerance on these days is ruthlessly Markovian, so remember the gambler's ruin and plan accordingly.
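To put rough numbers on the geometric model, here is a small sketch. It assumes each V-day is an independent trial with the same success probability, which is exactly the memoryless ("Markovian") property: past wins buy no credit. The 90% success rate in the example is a made-up figure, and the function names are mine.

```python
def survival_probability(p_success: float, years: int) -> float:
    """Chance of getting through `years` consecutive V-days without a meltdown."""
    return p_success ** years


def expected_good_years(p_success: float) -> float:
    """Expected number of successful V-days before the first meltdown.

    With failure probability q = 1 - p, the count of successes before the
    first failure is geometric with mean p / q.
    """
    return p_success / (1.0 - p_success)


# Hypothetical example: a 90% chance of pulling off any given V-day.
p = 0.9
print(f"P(surviving 10 V-days) = {survival_probability(p, 10):.3f}")
print(f"expected good V-days before the meltdown = {expected_good_years(p):.1f}")
```

Even at a seemingly safe 90% per year, the chance of an unbroken ten-year streak is only about one in three, which is why the objective for any single V-day matters so much.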

Operational Level
Under the assumptions of our tactical model, the default aim on the eve of any given V-day is to minimize maximum regret so we can live to see another V-day. On the other hand, if we want to go for it on 4th down, then maximizing expected value it is. However, the operational plan must be able to fall back to the default objective. This way we can contain second-order effects (collateral damage) while also constraining our primary losses.
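The two objectives can pick different plans, as this toy sketch shows. The plans, outcomes, payoffs, and probabilities are entirely hypothetical numbers chosen for illustration; only the two decision rules (minimize maximum regret, maximize expected value) come from the text above.

```python
# Hypothetical payoff table: PAYOFF[plan][outcome].
PAYOFF = {
    "safe_dinner": {"goes_well": 5, "goes_badly": 2},
    "grand_gesture": {"goes_well": 10, "goes_badly": -10},
}

# Hypothetical outcome probabilities (only needed for expected value).
PROB = {"goes_well": 0.8, "goes_badly": 0.2}


def max_expected_value(payoff: dict, prob: dict) -> str:
    """Plan with the highest probability-weighted payoff."""
    def expected(plan: str) -> float:
        return sum(prob[s] * payoff[plan][s] for s in prob)
    return max(payoff, key=expected)


def min_max_regret(payoff: dict) -> str:
    """Plan whose worst-case regret is smallest.

    Regret in an outcome is the best payoff available in that outcome
    minus the payoff of the plan actually chosen.
    """
    outcomes = list(next(iter(payoff.values())))
    best = {s: max(payoff[plan][s] for plan in payoff) for s in outcomes}

    def worst_regret(plan: str) -> float:
        return max(best[s] - payoff[plan][s] for s in outcomes)

    return min(payoff, key=worst_regret)


print("go for it on 4th down:", max_expected_value(PAYOFF, PROB))
print("live to see another V-day:", min_max_regret(PAYOFF))
```

With these made-up numbers the grand gesture wins on expected value (6.0 versus 4.4), while the safe dinner has the smaller worst-case regret (5 versus 12), so it serves as the fall-back plan.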

This will be a submission toward the February INFORMS blog challenge on 'OR and Love'.