Thursday, December 30, 2010

Analytics and Cricket - IV: The Great Indian Coin Toss

We'll end this year with the humblest of analytical models - the coin toss. It is an important benchmark. After all, if your predictive business model can consistently outperform a coin-toss approach, then that could be a big deal in many practical situations. So what do we make of the Indian cricket captain Mahendra Singh Dhoni's (MS for short) performance with the coin? He's lost 13 of the last 14 trials!

A coin toss can be a big deal in cricket, since a 'win' allows you to decide whether to bat or bowl first. A 'flat' wicket is a great one to bat on first: you make best use of it, and the opposition gets to bat on the same pitch after potential wear and tear. A 'sticky wicket' or overcast conditions on a 'green' pitch means bowling first could be a great option, since batting will be difficult for the first few hours due to the 'swing' and 'seam' movement potentially available to the bowlers.

Die-hard cricket fans like me and players are among the most superstitious in the world due to the long and complex nature of the game. MS gets blamed for "losing" the toss and he's even asked for tips on improving his record :) Useful analytical models are nice to have, but they could go horribly wrong, especially when applied to cricket ... Before the sports fan begins to question his faith in science and even doubt the fundamental idea of Bernoulli trials and the law of large numbers, we note that if MS had lost 14 tosses in a row, that would have been an extreme "achievement", since the probability of that happening is (1/2)^14, or roughly 60 in a million, and that did not happen. Phew! That counts as favorable evidence.

With the India-South Africa cricket series tied at 1-1, and with one Test match to go, we have no choice but to seek solace in the scientific estimate that our fearless captain still has a 50% chance of winning the toss in Cape Town. I know that doesn't sound encouraging. But there's got to be a point in time when nature is going to bring that win-loss average back close to 50%. Will that happen in 2011? Who knows ...

In 2010, MS had several ways of losing 13 of the 14 tosses. More simply, he had 14 ways of winning exactly one toss. Each of those 14 sequences has probability (1/2)^14, and we know that the probability of winning m of n tosses follows the binomial distribution, so we can find out online that the chance of losing exactly 13 out of 14 is still tiny, at 14 x (1/2)^14, or about 0.00085. In other words, the chance of him winning 2 or more tosses in 2010 was greater than 999/1000, and yet that did not happen!
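
For readers who would rather check these numbers than trust an online calculator, here is a minimal sketch in Python. It assumes a fair coin and independent tosses; the function name binom_pmf is just my own label for the standard binomial formula.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k wins in n independent tosses,
    each won with probability p (assumed fair: p = 0.5)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 14  # tosses in the 2010 sample discussed above

p_no_wins = binom_pmf(0, n)   # losing all 14
p_one_win = binom_pmf(1, n)   # losing exactly 13 of 14
p_two_or_more = 1 - p_no_wins - p_one_win

print(f"P(0 wins in {n})   = {p_no_wins:.6f}")
print(f"P(1 win in {n})    = {p_one_win:.6f}")
print(f"P(2+ wins in {n})  = {p_two_or_more:.6f}")
```

The three values come out at roughly 0.00006, 0.00085 and 0.9991, which matches the "60 in a million" and "greater than 999/1000" figures quoted in this post.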

Like most great teams, this current Indian cricket team does not depend much on the outcome of the toss. Put in to bat on a green, bouncy wicket under overcast conditions, they still managed to defeat RSA in 4 days and displayed amazing skill and resilience in the process. Still, it wouldn't hurt to begin the final match between No.1 in the world (India) and No.2 in the world (RSA), starting on Jan 2, 2011, by winning the coin-toss. If MS loses that toss, then the probability of this extended streak over 15 trials would be around 0.0004, i.e., 50% less than the already dismal number he is at today. Surely, that's unlikely, right? Let's see. What is the probability of the sequence that ends with him winning the toss on Jan 2, i.e., the chance that he wins exactly 1 of the first 14, and then wins the 15th? Sadly, that's no different: it is also around 0.0004, because the 15th toss is a fair 50-50 either way. Delving into the past does not help the Indian sports fan, and talking to statisticians would not help since none of them wants to see such a rare streak end :)
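
A brute-force check of that last comparison, again assuming a fair coin and independent tosses: the sketch below simply lists every possible win/loss sequence over 15 tosses and counts the ones that match each story. The variable names are mine, and nothing here is specific to cricket.

```python
from itertools import product

N_PAST = 14  # tosses already played in 2010

# Every possible sequence of 15 tosses: 'W' = win, 'L' = loss.
# With a fair coin, all 2**15 sequences are equally likely.
outcomes = list(product("WL", repeat=N_PAST + 1))
total = len(outcomes)

# Exactly one win in the first 14, then a loss on toss 15.
lose_15th = sum(
    1 for s in outcomes if s[:N_PAST].count("W") == 1 and s[-1] == "L"
)

# Exactly one win in the first 14, then a win on toss 15.
win_15th = sum(
    1 for s in outcomes if s[:N_PAST].count("W") == 1 and s[-1] == "W"
)

print(f"sequences in total:         {total}")
print(f"1 win in 14, then a loss:   {lose_15th}  (p = {lose_15th / total:.6f})")
print(f"1 win in 14, then a win:    {win_15th}  (p = {win_15th / total:.6f})")
```

Both counts are 14 out of 32768, so both stories have the same probability of about 0.0004. Looking at the whole 15-toss history makes either ending look equally improbable, which is why only the next toss on its own is worth thinking about.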

It is better to look forward to the new year, where 2010 is done and dusted. We can say it again: MS has a 50% chance of winning the next toss, and relatively speaking, that looks so much more promising and simpler to comprehend.

Happy New Year and Go India!

Wednesday, December 8, 2010

The shortest path between OR jobs

Driving from my old job in the Burlington, MA area to Elmsford, NY (near my new job location at Yorktown Heights) took less than 3 hours. It seemed like a race-course full of caffeine-high jihadi drivers after all those leisurely strolls through the somnolent country roads of Maine. I got a newer GPS product (yet another Garmin) from an e-tailer. Having worked on retail pricing during the past four years, and this being a pre-Black Friday deal, I almost reflexively asked for a price match and sure enough - there was $60 in savings to be had after pushing against some soft constraints. Since this was Garmin's latest version in the series, it wasn't discounted on BF, so it turned out to be a pretty decent deal in the end.

The GPS product, on its short maiden voyage from MA to NY, decided to take me through no fewer than four interstate highways: I-95, I-90, I-91, and I-84. The route seemed simpler on paper. I quickly realized that newer does not necessarily mean better. The re-routing algorithm is still ancient even though the newer one allegedly takes traffic congestion into account. To avoid extensive re-calculation of the shortest path in real time, the product continues to merely find the quickest way to get back to plan. This is an approach typically used in airline online crew recovery ops (even though fancier global optimization algorithms have been available on paper). In general, this is not a bad idea as long as you don't wander off deep into the reservation. Forcing a recalculation enables you to recover the faster (optimal?) route, and my ETA dropped by about 10 minutes. The newer version has an "EcoRoute" option that allows you to find minimal cost paths, in addition to the standard metrics based on distance and time. Looks like you can also plan a trip having multiple intermediate nodes. That looks like a nice TSP structure. An analysis of these new features makes for an interesting post on another day.
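
To make the "get back to plan" versus "recalculate" distinction concrete, here is a toy sketch in Python. To be clear, this is not Garmin's algorithm, which I have not seen: the road network, node names, and travel times below are all made up, and I am assuming that "back to plan" means heading for the nearest point on the original route and then following the rest of it. Dijkstra's algorithm stands in for whatever shortest-path routine the device really uses.

```python
import heapq


def dijkstra(graph, source):
    """Shortest travel time from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical road network; edge weights are minutes of driving.
graph = {
    "A": {"B": 30, "X": 20},
    "B": {"A": 30, "C": 40, "X": 15},
    "C": {"B": 40, "D": 50},
    "D": {"C": 50, "X": 95},
    "X": {"A": 20, "B": 15, "D": 95},
}
planned_route = ["A", "B", "C", "D"]  # original plan from A to D


def remaining_plan_time(graph, route, node):
    """Time left on the original plan, starting from a node on it."""
    i = route.index(node)
    return sum(graph[a][b] for a, b in zip(route[i:], route[i + 1:]))


def return_to_plan_time(graph, route, position):
    """Drive to the nearest point on the plan, then follow the plan."""
    dist = dijkstra(graph, position)
    rejoin = min((n for n in route if n in dist), key=lambda n: dist[n])
    return dist[rejoin] + remaining_plan_time(graph, route, rejoin)


def recalculated_time(graph, destination, position):
    """Forget the plan and re-solve from the current position."""
    return dijkstra(graph, position)[destination]


# The driver has wandered off the plan and is now at X.
print("back to plan:", return_to_plan_time(graph, planned_route, "X"))
print("recalculated:", recalculated_time(graph, "D", "X"))
```

In this made-up network, rejoining the plan costs 105 minutes while a fresh shortest path costs 95. The gap depends entirely on the network: when the nearest point on the plan also lies on the true shortest path, the two approaches give the same answer, which is why the cheap heuristic is often good enough.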