Monday, May 3, 2010

Analytics and Cricket - III: Forecast Models Gone Wild

The cricket T20 World Cup is going on in the West Indies, and a potentially great match between the West Indies and England was cut short today by rain. Upon resumption after the rain stopped, the chasing team's target was reduced under the D/L rules from 192 @ 9.6 runs per over to just 60 in the 6 overs that were possible, with all ten wickets available. What a farce! As mentioned in an earlier post (as well as in OR/MS Today by the creators, Duckworth and Lewis, the Operations Research (O.R.) guys after whom the method is named), the D/L model is used to reset the run target for the second team. It is a fantastic analytical model that works splendidly for the 50-over format. Why? Because it was adopted about 20 years after 50-over cricket was popularized, so D/L had plenty of varied data with which to calibrate their model and estimate goodness of fit. On the other hand, everybody assumed that the same model would work like a charm for the 20-over format: since D/L works with the percentage of overs remaining, etc., it's just a case of using a different multiplier, right? Wrong.
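For concreteness, here is a minimal sketch of how the standard-edition D/L arithmetic produces a number like that: the revised target is 1 + floor(S x R2/R1), where S is the first team's score and R1, R2 are the two sides' resource percentages. The resource value assumed below for 6 overs with all 10 wickets in hand is an illustrative placeholder, not the official table (and the professional edition actually used in these matches comes from a computer model).

```python
# Sketch of the Standard Edition D/L target revision:
# revised target = 1 + floor(S * R2 / R1), where S is the first team's
# score, R1 its resource percentage, and R2 the chasing team's.
# The resource values passed in below are illustrative placeholders,
# NOT the official D/L table.

def revised_target(first_innings_score: int, r1: float, r2: float) -> int:
    """Revised target when the chasing team has fewer resources (r2 <= r1)."""
    return 1 + int(first_innings_score * r2 / r1)

# England made 191, so the original target was 192. Rain left the
# West Indies 6 overs with all 10 wickets in hand. If (illustratively)
# 6 of 20 overs with 10 wickets is worth about 31% of a full T20
# innings' resources:
print(revised_target(191, r1=100.0, r2=31.0))  # -> 60, today's farcical target
```

The formula itself is innocuous; the whole argument below is about whether the resource percentages fed into it are right for T20.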

Nobody at the International Cricket Council (ICC) bothered to do even a cursory analysis of how this model would perform in T20 games - a typical O.R. case study of blind trust in a black-box solution that works fine under normal conditions but fails when the problem changes even slightly. And thus we recognize a weak spot in this model. It is going to take time to gather the data needed for better calibration, but what do we do until then?

Like any parameter estimation problem in statistics, O.R., and econometrics, this one requires a significant number of good-quality, non-collinear historical observations to work really well. The three years of international T20 cricket so far have been insufficient. Would 7 more years be sufficient? 17 more? While T20 is also cricket (at least when Sachin or Mahela bat), its dynamics are quite different from the 50-over format. Teams are 'all out' or close to all out far less frequently than in the 50-over game, and every cricket fan knows that a wicket in a 50-over game is disproportionately more valuable than a wicket in a 20-over game. Is the loss of two wickets in a T20 game equal in value to the loss of one wicket in a 50-over game? As we begin to think about this, we realize that the risk-reward-resource model used by D/L could be quite different, what with just 120 balls per innings. Or it could be the same in principle, needing just a simple recalibration. Then again, a T20 over is 5% of an innings, compared to 2% in the longer format. Does this huge reduction cause boundary-condition effects that need to be accounted for? Is there a possibility that we never see enough good-quality data in my lifetime to make this same model work reliably for T20 games?

I think it is time to look inside the model and first confirm whether the fundamental assumptions and modeling constructs continue to hold in a T20 situation. With the 3 years of data we have had so far, the model clearly appears to be off the mark. In fact, even in the 50-over game, the model is known to have some bias that favors the team batting second, though that bias is not severe enough to warrant replacement. We should be looking at fundamental modeling extensions if we find intrinsic problems with the D/L model as applied to T20.
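To make the recalibration question concrete: the original Duckworth-Lewis papers model the expected further runs from u overs remaining with an exponential saturation curve, roughly Z(u) = Z0(1 - e^(-bu)), with wickets lost entering through the decay rate. Below is a toy sketch of what re-estimating such a curve from T20 data might look like; the observations and the crude grid-search fit are purely illustrative placeholders, not a real calibration.

```python
# Toy recalibration sketch for a D/L-style run-production curve
# Z(u) = Z0 * (1 - exp(-b * u)), where u is overs remaining and the
# wickets-lost adjustment is omitted for brevity. The "observations"
# are synthetic placeholders; a real recalibration would need a large
# sample of T20 innings and a proper regression with fit diagnostics.
import math

def z(u: float, z0: float, b: float) -> float:
    """Expected further runs from u overs remaining, no wickets lost."""
    return z0 * (1.0 - math.exp(-b * u))

# Synthetic (overs remaining, average further runs) pairs -- placeholders.
observations = [(20, 160), (15, 130), (10, 95), (6, 62), (3, 33)]

# Crude grid search for (Z0, b) minimizing squared error.
best = min(
    ((z0, b) for z0 in range(150, 401, 5) for b in [i / 100 for i in range(1, 31)]),
    key=lambda p: sum((z(u, *p) - runs) ** 2 for u, runs in observations),
)
print("fitted (Z0, b):", best)
```

Even this toy makes the data hunger obvious: with only a few years of T20 internationals, every (overs remaining, wickets lost) cell of such a resource table is thinly populated.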

Cricket is perhaps the most unpredictable of all sports and is called the game of 'glorious uncertainties'. I just saw an international team lose 5 wickets in a single over and yet end up winning the match comfortably. It has also embraced a modern and sophisticated O.R. solution to weather-interrupted matches, but please, let's get our modeling straight. This kind of uncertainty is great for the O.R. person in me, but not at all enjoyable as a cricket fan, and unlike business, cricket is far too serious to be left totally to O.R. types like me.