Showing posts with label rahul dravid. Show all posts
Sunday, October 28, 2012
Analytics and Cricket - IX : Book cricket v/s T20 cricket
Introduction
The previous post on cricket in this tab can be found here. A while ago, we worked out how long a game of snakes and ladders is expected to last. Calculating the duration of a book-cricket game appears to be relatively simple. It's a two-player game that used to be popular among kids in India and requires a book (preferably a thick one with strong binding), a pencil, and a sheet of paper for scoring. A player opens a random page, and notes down the last digit of the page number on the left (even-numbered) page.
(image linked from krishcricket.com)
A page value of 'zero' indicates that the player is out, and an '8' indicates a single run (or a no-ball). The remaining possibilities in the sample space {2, 4, 6} are counted as runs scored off that 'ball'. A player keeps opening pages at random until they are out. Here's a sample inning from a simple simulation model of traditional book-cricket:
6, 2, 2, 1, 4, 4, 1, 2, 1, 2, 4, 2, 1, 4, 6, 6, 6, 4, 0
score is 58 off 19 balls
The counting process terminates when the first zero is encountered. Given this game structure, we try to answer two questions: what is the expected duration of a player's inning, and what is the expected team total (i.e., across 10 individual innings)?
Conditional Probability Model
Assume a page is opened at random and the resultant page values are IID (uniform) random variables.
Let p(i) = probability of opening page with value i, where
p(i) = 0 if i is odd, and equals 0.2 otherwise.
D = E[Duration of a single inning]
S = E[Score accumulated over a single inning]
Conditioning on the value of the first page opened, and noting that the counting process resets for non-zero page values:
D = 1*0.2 + (1+D)*4 *0.2
⇒ D = 5.
Next, let us compute F, the E[score in a single delivery]:
F = 0.2*(0+2+4+6+1) = 2.6 runs per ball, which yields a healthy strike rate of 260 per 100 balls
S = FD = 13 runs per batsman, so we can expect a score of 130 runs in a traditional book-cricket team inning that lasts 50 balls on average.
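The two expectations above are easy to sanity-check numerically. The following is a minimal Python sketch of the traditional game under the same IID-uniform page assumption; the function names, the seed, and the trial count are illustrative choices, not part of the original model.

```python
import random

# Runs scored for each last digit of an even-numbered page.
# 0 means the batsman is out; 8 counts as a single (no-ball).
RUNS = {0: 0, 2: 2, 4: 4, 6: 6, 8: 1}
PAGE_DIGITS = list(RUNS)  # each digit assumed equally likely (p = 0.2)


def play_inning(rng):
    """Play one batsman's inning; return (runs, balls)."""
    runs = balls = 0
    while True:
        digit = rng.choice(PAGE_DIGITS)
        balls += 1
        if digit == 0:  # the first zero ends the inning
            return runs, balls
        runs += RUNS[digit]


def estimate(trials, seed=2012):
    """Monte Carlo estimates of (D, S) for a single inning."""
    rng = random.Random(seed)
    total_runs = total_balls = 0
    for _ in range(trials):
        runs, balls = play_inning(rng)
        total_runs += runs
        total_balls += balls
    return total_balls / trials, total_runs / trials


if __name__ == "__main__":
    d, s = estimate(100_000)
    print(f"balls per inning ~ {d:.2f}, runs per inning ~ {s:.2f}")
```

With enough trials the two estimates should settle close to D = 5 and S = 13.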
Introduction of the Free-Hit
The International Cricket Council (ICC) added a free-hit rule to limited overs cricket in 2007. To approximate this rule, we assume that a page ending in '8' results in a no-ball (one run bonus, like before) that also earns a 'free hit' on the next delivery, so the player is not out even if the number of the next page opened ends in a zero. This change will make an innings last slightly longer, and the score a little higher. Here's a sample inning (a long one):
1, 6, 1, 0, 1, 4, 2, 2, 4, 6, 1, 6, 2, 4, 2, 4, 6, 6, 6, 4, 1, 1, 6, 0
score is 76 off 24 balls
Note that the batsman was "dismissed" off the 4th ball but cannot be ruled 'out', because that ball is a free hit as a consequence of the previous delivery being a no-ball. Every ball that immediately follows a no-ball (a '1' in the sequence above) is such a free hit.
D = 1*0.2 + (1+D)*0.2 + (1+D)*0.2 + (1+D)*0.2 + (1+d)*0.2
= 1.0 + 0.6D + 0.2d
where d = E[duration|previous ball was a no-ball]. By conditioning on the current ball:
d = (1 + d)*prob{current ball is a no-ball | previous ball was a no-ball} + (1+D)*prob{current ball is not a no-ball | previous ball was a no-ball}
= (1+d)*0.2 + (1+D) * 0.8
⇒ d = 1.25+D
⇒ D = 1 + 0.6D + 0.2(1.25+D)
⇒ D = 6.25
Under the free-hit rule, a team innings in book-cricket lasts 62.5 balls on average, which is 12.5 page turns more than the traditional format. A neat way to calculate S is based on the fact that the free-hit rule only increases the duration of an inning on average, but cannot alter the strike rate that is based on the IID page values, so S = 6.25 * 2.6 = 16.25. To confirm this, let us derive a value for S the hard way by conditioning on the various outcomes of the first page turn:
S = 0*0.2 + (S+2)*0.2 + (S+4)*0.2 + (S+6)*0.2 + (s+1)*0.2
= 2.6 +0.6S + 0.2s.
where s = E[score|current ball is a no-ball] and can be expressed by the following recurrence equation:
s = (1 + s)*prob{next ball is a no-ball} + (r+S)*prob{next ball is not a no-ball}, where
r = E[score in next ball | next ball is not a no-ball]
= 0.25*(0 + 2 + 4 + 6) = 3
Substituting for r, we can now express s in terms of S:
s = (1+s)*0.2 + (3+S) * 0.8
⇒ s = 3.25 + S
⇒ S = 2.6 + 0.6S + 0.2(3.25+S)
⇒ S = 16.25, as before.
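Both pairs of recurrences, (D, d) and (S, s), are small linear systems, so the algebra can be double-checked with exact arithmetic. This is a sketch only: `solve_2x2` is a hypothetical helper written for this check (plain Cramer's rule), and the coefficients are the recurrences above moved to one side.

```python
from fractions import Fraction as Fr


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


p = Fr(1, 5)  # probability of each page digit

# Duration: D = 1 + 0.6 D + 0.2 d  and  d = 1 + 0.2 d + 0.8 D,
# rewritten as (1 - 3p) D - p d = 1  and  -4p D + (1 - p) d = 1.
D, d = solve_2x2(1 - 3 * p, -p, Fr(1), -4 * p, 1 - p, Fr(1))

# Score: S = 2.6 + 0.6 S + 0.2 s  and  s = 0.2 (1 + s) + 0.8 (3 + S),
# rewritten as (1 - 3p) S - p s = 2.6  and  -4p S + (1 - p) s = 2.6.
S, s = solve_2x2(1 - 3 * p, -p, Fr(13, 5), -4 * p, 1 - p, Fr(13, 5))

print(f"D = {float(D)}, d = {float(d)}, S = {float(S)}, s = {float(s)}")
```

Using `Fraction` instead of floats keeps the results exact, so D = 25/4 and S = 65/4 come out without rounding noise.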
Under the free-hit rule, the average team total in book cricket is 162.5 runs (32.5 runs more than the total achieved in the traditional format). The average strike rate based on legal deliveries, i.e. excluding no-balls, is 162.5 * 100/(0.8*62.5) = 325 per 100 balls. A Java simulation program yielded the following results:
num trials is 10000000
average score per team innings is 162.422832
average balls per team innings is 62.477603
average legal balls per team innings is 49.981687
scoring rate per 100 legal balls is 324.9646855657353
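The Java program itself is not reproduced in this post. As a rough stand-in, here is a minimal Python sketch of what such a simulation might look like under the free-hit rule; the function names, seed, and trial count are assumptions made for illustration, and its output will differ slightly from the figures above because of sampling noise.

```python
import random

# Runs per page digit: 0 is out (unless a free hit), 8 is a one-run no-ball.
RUNS = {0: 0, 2: 2, 4: 4, 6: 6, 8: 1}
PAGE_DIGITS = list(RUNS)


def team_innings(rng, wickets=10):
    """Return (runs, balls, legal_balls) for one team innings with free hits."""
    runs = balls = legal = 0
    for _ in range(wickets):
        free_hit = False
        while True:
            digit = rng.choice(PAGE_DIGITS)
            balls += 1
            if digit != 8:
                legal += 1  # no-balls are not legal deliveries
            if digit == 0 and not free_hit:
                break  # out, next batsman
            runs += RUNS[digit]
            free_hit = digit == 8  # a no-ball makes the next ball a free hit
    return runs, balls, legal


def simulate(trials, seed=7):
    """Average the team-innings statistics over many trials."""
    rng = random.Random(seed)
    totals = [0, 0, 0]
    for _ in range(trials):
        for i, value in enumerate(team_innings(rng)):
            totals[i] += value
    runs, balls, legal = (t / trials for t in totals)
    return {
        "runs": runs,
        "balls": balls,
        "legal_balls": legal,
        "rate_per_100_legal": 100 * runs / legal,
    }


if __name__ == "__main__":
    for name, value in simulate(20_000).items():
        print(f"{name}: {value:.2f}")
```

The averages should land near the analytical values of 162.5 runs, 62.5 balls, and 50 legal balls per team innings.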
Result: In comparison to real-life T20 cricket (~ 120 balls max per team inning), book-cricket is roughly 50% shorter in duration, but the higher batting strike rate usually yields bigger team totals in book cricket. The fact that we can even rationally compare statistics between these two formats says something about the nature of T20 cricket!
The cost of front-foot no-balls and big wides in T20
We can use the simple conditional probability ideas used to analyze book-cricket to estimate the expected cost of bowling a front-foot no-ball and wide balls in real-life T20 matches by replacing the book-cricket probability model with a more realistic one:
Assume p[0] = 0.25, p[1] = 0.45, p[2] = 0.15, p[3] = 0.05, p[4] = 0.05, p[5] ~ 0, p[6] = 0.05, p[7, 8, ...] ~ 0.
E[score in a ball] = 0 + 0.45 + 0.3 + 0.15 + 0.2 + 0.3 = 1.4
(This probability model yields a reasonable strike rate of 140 per 100 balls.)
E[cost | no ball] = 1 + 1.4 + 1.4 = 3.8
Bowling a front-foot no-ball in T20 matches is almost as bad as giving away a boundary (apart from paying the opportunity cost of having almost no chance of getting a wicket due to the no-ball and the subsequent free-hit). Similarly,
E[cost | wide-ball down the leg-side] = (5|wide and four byes)*prob{4 byes} + (1| wide but no byes)*prob{no byes} + 1.4.
Assuming a 50% chance of conceding 4 byes, the expected cost is 4.4. On average, a bowler may be marginally better off bowling a potential boundary ball (e.g., bad length) than risk an overly leg-side line that can result in 5 wides and a re-bowl.
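The arithmetic behind these two costs can be written out as a short Python sketch. The probabilities are the assumed ones from the model above; the function names are illustrative, and the no-ball cost is read here as the one-run penalty plus the expected runs off the no-ball itself plus the expected runs off the free hit.

```python
# Assumed per-ball run distribution from the text (probabilities sum to 1).
P_RUNS = {0: 0.25, 1: 0.45, 2: 0.15, 3: 0.05, 4: 0.05, 6: 0.05}


def expected_runs(pmf):
    """Expected runs off a single ball."""
    return sum(runs * prob for runs, prob in pmf.items())


def no_ball_cost(pmf):
    """One-run penalty + runs off the no-ball + runs off the free hit."""
    return 1 + 2 * expected_runs(pmf)


def leg_side_wide_cost(pmf, p_four_byes=0.5):
    """Five wides with probability p_four_byes, else one wide, plus the re-bowl."""
    return 5 * p_four_byes + 1 * (1 - p_four_byes) + expected_runs(pmf)


if __name__ == "__main__":
    print(f"E[runs per ball]   = {expected_runs(P_RUNS):.2f}")
    print(f"E[cost | no-ball]  = {no_ball_cost(P_RUNS):.2f}")
    print(f"E[cost | leg wide] = {leg_side_wide_cost(P_RUNS):.2f}")
```

Changing `p_four_byes` shows how sensitive the wide-ball cost is to the 50% assumption: the wide costs more than the no-ball whenever the chance of four byes exceeds 35%.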
More sophisticated simulation models based on actual historical data can help analyze more realistic cricketing scenarios and support tactical decision making.
Saturday, February 25, 2012
Analytics and Cricket - VII : Does DRS have a False Positive issue?
The last post related to cricket was quite a while ago (that the Indian cricket team has been repeatedly thrashed since then is a mere coincidence). This post focuses again on the Decision Review System (DRS), a technology-aided analytical decision-support system to aid cricket umpires. The toolkit includes a set of multiple video cameras, heat-sensing 'hot spot' technology, and ball-tracking devices that record the point of impact, as well as an additional set of predictive algorithms to forecast the counterfactual trajectory of the cricket ball (you can be forecasted 'out' in cricket). Despite the best efforts of cricket's custodians, considerable user unease with the DRS persists. In fact, it has been recently acknowledged that the use of the decision support system has had a significant impact on the game (user response: altering playing styles and inducing more 'OUT' decisions from umpires), something which this tab predicted a year ago. Reasons for discomfort also include the lack of uniformity in its deployment, the incremental dollar cost of the DRS versus incremental returns, and equally importantly from a fan and player perspective, DRS reliability (both real and perceived). This post will focus on the last two issues.
The International Cricket Council (ICC) has focused almost exclusively on improving the technology (e.g. increased number of video frames per second, etc). The main argument here is that while an improvement in the unconditional success rate for the DRS may seem impressive, it would be more helpful if statistics are calculated and presented conditional on the corresponding human decisions made. Toward this, let's look at this MBA-ish 2x2 decision matrix (sorry). Strictly speaking, the terms 'correct' and 'incorrect' in the matrix mean 'almost surely correct' and 'almost surely incorrect', respectively.
1. The ICC has a wonderful set of umpires in their 'elite panel' that referee the most important inter-nation test matches (these elite umpires are a scarce resource, and their globe-trotting schedule optimization is yet another operations research problem - perhaps a good topic for part-8 of this series). Prior to the DRS, the umpires achieved a respectable success rate of more than 90%. Consequently in such situations, the DRS getting it right is a relatively uninteresting event. This situation is denoted as the neutral zone (top-left box). Therefore the focus is on the remaining 7-10% of the time when the decisions are contentious.
2. Clearly the case where the umpire is wrong and the DRS is right (as judged by video and predicted-trajectory evidence) is a win-win for the DRS and players. This is the green-zone (bottom left) and appears to be the exclusive area of ICC's focus as far as technological improvements are concerned. However, it is not necessarily desirable to accord top priority to the goal of achieving further improvements in this statistic.
3. The problems arise when the DRS occasionally produces visibly and audibly confounding results. This is represented by the top-right box, the 'high conflict zone'. In some instances, it could be because of technological gaps or operator error (there was a recent example where an umpire whose sole job consisted of watching the TV replay and hitting one of two buttons managed to hit the wrong one). However, in other instances, the predictive component of the DRS that is used to probabilistically judge LBW (leg-before-wicket) 'OUT' decisions appeared to be flawed or incompatible because:
a. The greater the required length (or duration) of the predicted values, the noisier the forecasted trajectory.
b. The smaller the observed portion of the ball trajectory available for 'training' (especially after spinning and bouncing off the cricket pitch), the less reliable the prediction.
The years of prior refereeing experience of the umpire, and the other human cognitive powers that help him arrive at the decision, are pitted against hardware and algorithmic prediction prowess. The challenge is to account for the many degrees of freedom of a rotating cricket ball in motion while also taking into account the effect of the cricket pitch and local conditions.
4. There may be rare, irritating cases where, despite best efforts, uncertainty prevails and both the umpire and DRS manage to get it wrong (bottom-right box).
If the ICC can provide data on the frequency of observations that fall in each of these 4 boxes, we can of course calculate the conditional probability of a correct decision given the DRS response using well-known conditional probability models and compare with the corresponding results for the manual system. For example, how likely is it that the batsman is actually OUT given that the DRS overruled an umpire's original 'NOT OUT' decision? Such analyses help figure out the impact of false positives and false negatives that comprise the conflict zone observations. In particular, the false-positive rate, i.e. the case where a batsman tests positive ('OUT') using a DRS when he is actually NOT OUT, should be minimized given the nature of this sport.
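To make the idea concrete, here is a minimal Python sketch of the conditional calculations, given counts for the four boxes. Everything in it is an assumption for illustration: the class and method names are hypothetical, the counts in the usage example are made-up placeholders (not ICC data), and it assumes binary OUT/NOT OUT decisions, so the DRS overrules the umpire only in the green and high-conflict boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxCounts:
    """Observation counts for the 2x2 umpire-vs-DRS decision matrix."""
    neutral: int        # umpire correct, DRS correct (top-left)
    green: int          # umpire incorrect, DRS correct (bottom-left)
    high_conflict: int  # umpire correct, DRS incorrect (top-right)
    both_wrong: int     # umpire incorrect, DRS incorrect (bottom-right)

    def p_drs_correct_given_umpire_wrong(self):
        """How often the DRS rescues a wrong umpiring decision."""
        return self.green / (self.green + self.both_wrong)

    def p_drs_wrong_given_umpire_right(self):
        """How often the DRS spoils a correct umpiring decision."""
        return self.high_conflict / (self.neutral + self.high_conflict)

    def p_overrule_is_right(self):
        """Probability that a DRS overrule is the correct call.

        With binary decisions, an overrule happens only when exactly one
        of the umpire and the DRS is correct.
        """
        return self.green / (self.green + self.high_conflict)


if __name__ == "__main__":
    # Placeholder counts, invented purely to exercise the code.
    boxes = BoxCounts(neutral=900, green=70, high_conflict=20, both_wrong=10)
    print(f"P(DRS right | umpire wrong) = {boxes.p_drs_correct_given_umpire_wrong():.3f}")
    print(f"P(DRS wrong | umpire right) = {boxes.p_drs_wrong_given_umpire_right():.3f}")
    print(f"P(overrule is right)        = {boxes.p_overrule_is_right():.3f}")
```

Splitting the same counts further by the direction of the overrule (NOT OUT to OUT versus OUT to NOT OUT) would give the false-positive rate discussed above.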
Recommendations
The biggest stumbling block appears to be the top-right box (high-conflict zone) that erodes user trust every time the DRS wrongly overrules what appears to be a sound cricketing decision by the umpire. As a priority, the ICC should isolate and eliminate those components that increase the occurrence of such situations. The likely candidates for culling will be the trajectory-predictor and existing flawed versions of 'hot spot'. These innovations should be reintroduced at a later stage only after sufficient improvements have been made (and while also keeping the resultant cost down) to ensure that the expected failure rates are well under control. Viewed from this perspective, a recent decision by the Indian cricket board to do away with the predictive component of the ball-tracking technology is actually the right one.
@dualnoise on twitter
Saturday, April 18, 2009
IPL cricket second edition - day one
Two of India's greatest batsmen showed their class while the other Indians generally struggled to cope with alien conditions in South Africa, where IPL-II is being staged. The contest is more even due to the nature of the wickets, so the cricket content is going to be more enjoyable than the first edition. Sachin and Dravid anchored successful batting efforts for Mumbai, and my home town, Bangalore, respectively. The interesting moment today was when Dhoni blundered - leaving Murali out of the 11 on a day when Harbhajan bowled really well to Hayden, Kumble took 5-5, and Warne showed his genius. Dravid's classy fifty was marked by his clearly pointing his bat at somebody in the crowd (hopefully asking Hoochman Mallya, the distasteful, pompous owner of his team to just shut up).
Other IPL news involves yet another self-serving vestigial coach, this time in the form of John Buchanan spouting some b.s about not one or two, but 5 captains. One assumes the remaining 6 are vice-captains (Kolkata socialism makes its impact :-). Gavaskar, never one to mince words, pointed it out, and J.B tried to rephrase. It's not that big a deal really. India has played with 5-odd (ex-) captains, while Pak in the 90s played with 6-8 of them. Dhoni routinely lets bowlers set fields.
Monday, April 6, 2009
number 182
No fielder (besides the keeper, of course) had taken more than 181 catches in the history of test cricket until yesterday. Rahul Dravid took a couple yesterday in Wellington, New Zealand to touch 183. All the more impressive considering that Mark Waugh held that record, and he was one outstanding fielder. The unbelievable catch that Waugh held to dismiss VVS Laxman (youtube) during India's run-chase in the Chennai Test of 2001 made me quite sick as an Indian cricket fan.
Hopefully he can shed his hangdog/stonewalling batting method that's crept in since 2006, quite unlike his 'gritty but positive' batting between 2001-06, both in ODIs and tests. Would love to see the real Dravid at least once before he retires...
Courtesy of 'The Hindu', here's a beautiful photo of the impending 182nd brick in the wall !