Dualnoise: February 2012

Saturday, February 25, 2012

Analytics and Cricket - VII : Does DRS have a False Positive issue?

The last post related to cricket was quite a while ago (that the Indian cricket team has been repeatedly thrashed since then is a mere coincidence). This post focuses again on the Decision Review System (DRS), a technology-aided analytical decision-support system to aid cricket umpires. The toolkit includes a set of multiple video cameras, heat-sensing 'hot spot' technology, and ball-tracking devices that record the point of impact, as well as an additional set of predictive algorithms to forecast the counterfactual trajectory of the cricket ball (you can be forecasted 'out' in cricket). Despite the best efforts of cricket's custodians, considerable user unease with the DRS persists. In fact, it has been recently acknowledged that the use of the decision support system has had a significant impact on the game (user response: altering playing styles and inducing more 'OUT' decisions from umpires), something which this tab predicted a year ago. Reasons for discomfort also include the lack of uniformity in its deployment, the incremental dollar cost of the DRS versus incremental returns, and equally importantly from a fan and player perspective, DRS reliability (both real and perceived). This post will focus on the last two issues.

The International Cricket Council (ICC) has focused almost exclusively on improving the technology (e.g., increasing the number of video frames per second). The main argument here is that while an improvement in the unconditional success rate for the DRS may seem impressive, it would be more helpful if statistics were calculated and presented conditional on the corresponding human decisions made. Toward this, let's look at this MBA-ish 2x2 decision matrix (sorry). Its rows are the umpire's decision (correct on top, incorrect below) and its columns are the DRS outcome (correct on the left, incorrect on the right). Strictly speaking, the terms 'correct' and 'incorrect' in the matrix mean 'almost surely correct' and 'almost surely incorrect', respectively.

1. The ICC has a wonderful set of umpires in its 'elite panel' who referee the most important inter-nation test matches (these elite umpires are a scarce resource, and their globe-trotting schedule optimization is yet another operations research problem - perhaps a good topic for part-8 of this series). Prior to the DRS, the umpires achieved a respectable success rate of more than 90%. Consequently, in the situations where the umpire is right, the DRS also getting it right is a relatively uninteresting event. This situation is denoted as the neutral zone (top-left box). Therefore the focus is on the remaining 7-10% of the time when the decisions are contentious.

2. Clearly the case where the umpire is wrong and the DRS is right (as judged by video and predicted-trajectory evidence) is a win-win for the DRS and players. This is the green zone (bottom left) and appears to be the exclusive area of the ICC's focus as far as technological improvements are concerned. However, it is not necessarily desirable to accord top priority to the goal of achieving further improvements in this statistic.

3. The problems arise when the DRS occasionally produces visibly and audibly confounding results. This is represented by the top-right box, the 'high conflict zone'. In some instances, it could be because of technological gaps or operator error (there was a recent example where an umpire whose sole job consisted of watching the TV replay and hitting one of two buttons managed to hit the wrong one). However, in other instances, the predictive component of the DRS that is used to probabilistically judge LBW (leg-before-wicket) 'OUT' decisions appeared to be flawed or incompatible because:

a. The greater the required length (or duration) of the predicted values, the noisier the forecasted trajectory.
b. The smaller the observed portion of the ball trajectory available for 'training' (especially after spinning and bouncing off the cricket pitch), the less reliable the prediction.

The umpire's years of prior refereeing experience, and the other human cognitive powers that help him arrive at a decision, are pitted against hardware and algorithmic prediction prowess. The challenge is to account for the many degrees of freedom of a rotating cricket ball in motion while also taking into account the effect of the cricket pitch and local conditions.

4. There may be rare, irritating cases where, despite best efforts, uncertainty prevails and both the umpire and the DRS manage to get it wrong (bottom-right box).

If the ICC can provide data on the frequency of observations that fall in each of these 4 boxes, we can of course calculate the conditional probability of a correct decision given the DRS response using well-known conditional probability models and compare with the corresponding results for the manual system. For example, how likely is it that the batsman is actually OUT given that the DRS overruled an umpire's original 'NOT OUT' decision? Such analyses help figure out the impact of the false positives and false negatives that make up the conflict-zone observations. In particular, the false-positive rate, i.e. the rate at which a batsman tests positive ('OUT') under the DRS when he is actually NOT OUT, should be minimized given the nature of this sport.
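As a rough illustration of the conditional view argued for above, here is a minimal Python sketch. The class `FourBoxCounts`, the function names, and above all the counts in `HYPOTHETICAL` are invented for illustration; they are not ICC data. It also assumes a binary OUT / NOT OUT call, so that the DRS overrules the umpire exactly in the green zone and the high-conflict zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourBoxCounts:
    """Counts of reviewed decisions in each box of the 2x2 matrix."""
    both_correct: int            # neutral zone (top-left)
    umpire_wrong_drs_right: int  # green zone (bottom-left)
    umpire_right_drs_wrong: int  # high-conflict zone (top-right)
    both_wrong: int              # bottom-right box

    @property
    def total(self) -> int:
        return (self.both_correct + self.umpire_wrong_drs_right
                + self.umpire_right_drs_wrong + self.both_wrong)


def _ratio(numerator: int, denominator: int) -> float:
    if denominator == 0:
        raise ValueError("no observations to condition on")
    return numerator / denominator


def umpire_accuracy(c: FourBoxCounts) -> float:
    """Unconditional success rate of the umpire alone."""
    return _ratio(c.both_correct + c.umpire_right_drs_wrong, c.total)


def drs_accuracy(c: FourBoxCounts) -> float:
    """Unconditional success rate of the DRS."""
    return _ratio(c.both_correct + c.umpire_wrong_drs_right, c.total)


def p_correct_given_overrule(c: FourBoxCounts) -> float:
    """P(DRS is right | DRS overruled the umpire)."""
    overrules = c.umpire_wrong_drs_right + c.umpire_right_drs_wrong
    return _ratio(c.umpire_wrong_drs_right, overrules)


def p_correct_given_upheld(c: FourBoxCounts) -> float:
    """P(DRS is right | DRS agreed with the umpire)."""
    upheld = c.both_correct + c.both_wrong
    return _ratio(c.both_correct, upheld)


# Invented counts, for illustration only (NOT real ICC statistics).
HYPOTHETICAL = FourBoxCounts(
    both_correct=920,
    umpire_wrong_drs_right=50,
    umpire_right_drs_wrong=20,
    both_wrong=10,
)

if __name__ == "__main__":
    print("umpire accuracy:       ", umpire_accuracy(HYPOTHETICAL))
    print("DRS accuracy:          ", drs_accuracy(HYPOTHETICAL))
    print("P(right | overruled):  ", p_correct_given_overrule(HYPOTHETICAL))
    print("P(right | upheld):     ", p_correct_given_upheld(HYPOTHETICAL))
```

With these made-up counts the DRS is right 97% of the time overall, yet only 50 of its 70 overrules (about 71%) are right. That gap is the kind of thing an unconditional success rate hides.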

Recommendations
The biggest stumbling block appears to be the top-right box (high-conflict zone), which erodes user trust every time the DRS wrongly overrules what appears to be a sound cricketing decision by the umpire. As a priority, the ICC should isolate and eliminate those components that increase the occurrence of such situations. The likely candidates for culling will be the trajectory-predictor and existing flawed versions of 'hot spot'. These innovations should be reintroduced at a later stage only after sufficient improvements have been made (while also keeping the resultant cost down) to ensure that the expected failure rates are well under control. Viewed from this perspective, a recent decision by the Indian cricket board to do away with the predictive component of the ball-tracking technology is actually the right one.

@dualnoise on twitter

Sunday, February 12, 2012

Gender Shaping - II

This tab examined the issue of 'gender shaping' last year and we continue the discussion here. This time we analyze simple probability models related to this issue. Imagine a population in a geographical area where parents adopt a policy of 'stop having children after the first boy'. Surprisingly (or maybe not), this practice in itself cannot really 'shape' or affect the stability of the population, as neatly explained by Prof. Thomas C. Schelling in his book 'Micromotives and Macrobehavior': no “stopping rules,” like stopping after the first boy, can affect the ultimate proportions. At the first round, half the babies will be boys. At the second round, only half the families have children, but they will be half boys. The half with only girls will proceed to the third round and again, by the 50–50 hypothesis, half will have boys and half girls. If at each round half are boys and half girls the total—no matter where it stops—will be half boys and half girls. (A corollary is that we know, without adding, how many children will be born. In the end, every family will have one boy; girls will equal boys; and, the average will be two children per family.)
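Schelling's argument can be checked with a quick simulation. The sketch below is only an illustration under stated assumptions: every family follows the 'stop after the first boy' rule without exception, each birth is an independent 50-50 draw (the `p_boy` parameter), and the function name and seed are arbitrary choices made here.

```python
import random


def simulate_stop_after_first_boy(n_families, p_boy=0.5, seed=0):
    """Simulate families that keep having children until the first boy.

    Returns (boys, girls) summed over all families.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    boys = 0
    girls = 0
    for _ in range(n_families):
        # Each birth is a girl with probability 1 - p_boy; stop at the first boy.
        while rng.random() >= p_boy:
            girls += 1
        boys += 1
    return boys, girls


if __name__ == "__main__":
    n = 100_000
    boys, girls = simulate_stop_after_first_boy(n)
    print("share of boys:       ", boys / (boys + girls))
    print("children per family: ", (boys + girls) / n)
```

In a run of this size the share of boys should come out close to one half and the average family close to two children, in line with the corollary quoted above. Changing `p_boy` changes both numbers, but the stopping rule by itself does not.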

Dr. Schelling also mentions: "It has occasionally been proposed that this motivation might explain a slight excess of boys over girls in some populations. Where female infanticide is practiced it is bound to have that result."

Thus when one sees F-M ratios like 89:100 in some pockets of Northern India, it's a scary indicator that a sizable percentage of baby girls have been murdered (the Government of India has had in place a strict ban on sex-determination tests for many years now). Female infanticide is a relatively recent phenomenon in certain sections of society within India's 7000+ year culture, where women were typically accorded an equal (perhaps higher) status compared to men. Russell Ackoff discussed a related issue in his classic book many decades ago.

Although the boy-driven stopping rule does not affect the stability of the population and the resultant average family looks pretty normal, the internal distribution is asymmetric (another example of the flaw of averages?). For example, a boy will either be the only kid or the youngest kid in the family. In the latter case, the parents are 'focused' on producing a boy and then tending to his needs, and are thus more likely to ignore the needs of their girl babies; as the family gets bigger, this situation is likely to get worse on a per-capita basis. These conclusions are largely confirmed in a recent NBER econometric/statistical study that uses data-driven analytical models to answer the question "Are boys and girls treated differently?". Girls brought up to adulthood in such a biased environment may well help perpetuate this vicious cycle in certain parts of India. The U.S. does not appear to suffer from the problem of gender-shaping, although the pro-abortion groups have required some deft arguments to enunciate their stance on the selective gender-based abortion question posed by anti-abortionists. On the other hand, there may be some issues to be overcome with respect to investments in girl children as far as their career choices are concerned, as very briefly touched upon in a prior post.
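The asymmetry in the internal distribution can be made concrete with a small calculation. This is a sketch under the same idealized assumptions as the stopping-rule model (every family stops at the first boy, births are independent 50-50 draws). The function name and the truncation at 200 children are choices made here for illustration. It compares the average family size as experienced by a boy with that experienced by a girl.

```python
def family_size_seen_by_child(max_size=200, p_boy=0.5):
    """Return (avg family size seen by a boy, avg family size seen by a girl).

    Under the stopping rule, a family of k children has k - 1 girls followed
    by exactly one boy, which happens with probability (1 - p_boy)**(k - 1) * p_boy.
    The infinite sum is truncated at max_size; the neglected tail is negligible.
    """
    boy_weighted_size = 0.0   # sum of P(k) * 1 boy * k
    girl_weighted_size = 0.0  # sum of P(k) * (k - 1) girls * k
    girls_per_family = 0.0    # sum of P(k) * (k - 1)
    for k in range(1, max_size + 1):
        p_k = (1.0 - p_boy) ** (k - 1) * p_boy
        boy_weighted_size += p_k * k
        girl_weighted_size += p_k * (k - 1) * k
        girls_per_family += p_k * (k - 1)
    # Every family has exactly one boy, so no normalization is needed for boys.
    return boy_weighted_size, girl_weighted_size / girls_per_family


if __name__ == "__main__":
    seen_by_boy, seen_by_girl = family_size_seen_by_child()
    print("average family size seen by a boy: ", seen_by_boy)
    print("average family size seen by a girl:", seen_by_girl)
```

Under these assumptions the average boy grows up in a family of two children while the average girl grows up in a family of four. The average family looks the same from the outside, but girls are concentrated in the larger families.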