Sunday, October 28, 2012

Analytics and Cricket - IX : Book cricket v/s T20 cricket


Introduction
This previous post on cricket in this tab can be found here. We discovered how long a game of snakes and ladders is expected to last a while ago. Calculating the duration of a book-cricket game appears to be relatively simpler. It's a two-player game that used to be popular among kids in India and requires a book (preferably a thick one with strong binding), a pencil, and a sheet of paper for scoring. A player opens a random page, and notes down the last digit on the left (even numbered) page.

 (image linked from krishcricket.com)

A page value of 'zero' indicates that the player is out, and an '8' indicates a single run (or a no-ball). The remaining possibilities in the sample space {2, 4, 6} are counted as runs scored off that 'ball'. A player keeps opening pages at random until they are out.  Here's a sample inning from a simple simulation model of traditional book-cricket:
 6, 2, 2, 1, 4, 4, 1, 2, 1, 2, 4, 2, 1, 4, 6, 6, 6, 4, 0
score is 58 off 19 balls
The counting process terminates when the first zero is encountered. Given this game structure, we try to answer two questions: What is the expected duration of a player's inning, and what the expected team total is (i.e., across 10 individual innings).

Conditional Probability Model
Assume a page is opened at random and the resultant page values are IID (uniform) random variables.

Let p(i) = probability of opening page with value i, where
p(i) = 0 if i is odd, and equals 0.2 otherwise.

D = E[Duration of a single inning]

S = E[Score accumulated over a single inning]

Conditioning on the value of the first page opened, and noting that the counting process resets for non-zero page values:
D = 1*0.2 + (1+D)*4 *0.2
D = 5.
Next, let us compute F, the E[score in a single delivery]:
F = 0.2*(0+2+4+6+1) =  2.6 runs per ball, which yields a healthy strike rate of 260 per 100 balls

SFD = 13 runs per batsman, so we can expect a score of 130 runs in a traditional book-cricket team inning that lasts 50 balls on average.

Introduction of the Free-Hit
The International Cricket Conference (ICC) added a free-hit rule to limited overs cricket in 2007. To approximate this rule, we assume that a page ending in '8' results in a no-ball (one run bonus, like before) that also results in a 'free hit' the next delivery, so the player is not out even if the number of the next page opened ends in a zero. This change will make an innings last slightly longer, and the score, a little higher. Here's a sample inning (a long one):
1, 6, 1, 0, 1, 4, 2, 2, 4, 6, 1, 6, 2, 4, 2, 4, 6, 6, 6, 4, 1, 1, 6, 0,
score is 76 off 24 balls
Note that the batsman was "dismissed" of the 4th ball but cannot be ruled 'out' because it is a free-hit as a consequence of the previous delivery being a no-ball. All such free-hit balls are marked in bold above.

D = 1*0.2 + (1+D)*0.2 + (1+D)*0.2 + (1+D)*0.2 + (1+d)*0.2
= 1.0 + 0.6D + 0.2d
where d = E[duration|previous ball was a no-ball]. By conditioning on the current ball:
d = (1 + d)*prob{current ball and previous ball are no-balls} + (1+D)*prob{current ball is not a no ball but previous ball was a no ball)
= (1+d)*0.2 + (1+D) * 0.8
d = 1.25+D

D = 1 + 0.6D + 0.2(1.25+D)
D = 6.25


Under the free-hit rule, a team innings in book-cricket lasts 62.5 balls on average, which is 12.5 page turns more than the traditional format. A neat way to calculate S is based on the fact that the free-hit rule only increases the duration of an inning on average, but cannot alter the strike rate that is based on the IID page values, so S = 6.25 * 2.6 = 16.25. To confirm this, let us derive a value for S the hard way by conditioning on the various outcomes of the first page turn:
S = 0*0.2 + (S+2)*0.2 + (S+4)*0.2 + (S+6)*0.2 + (s+1)*0.2
= 2.6 +0.6S + 0.2s.
where s = E[score|current ball is a no-ball] and can be expressed by the following recurrence equation:
s = (1 + s)*prob{next ball is a no-ball} + (r+S)*prob{next ball is not a no-ball), where
r = E[score in next ball | next ball is not a no-ball]
= 0.25*(0 + 2 + 4 + 6) = 3

Substituting for r, we can now express s in terms of S:

s = (1+s)*0.2 + (3+S) * 0.8
S = 2.6 + 0.6S + 0.2(3.25+S) = 16.25, as before.

Under the free-hit rule, the average team total in book cricket is 162.5 runs (32.5 runs more than the total achieved in the traditional format). The average strike rate based on legal deliveries, i.e. excluding no-balls, is 162.5 * 100/(0.8*62.5) = 325 per 100 balls. A Java simulation program yielded the following results:
num trials is 10000000 
average score per team innings is 162.422832
average balls per team innings is 62.477603
average legal balls per team innings is 49.981687
scoring rate per 100 legal balls is 324.9646855657353

Result: In comparison to real-life T20 cricket (~ 120 balls max per team inning), book-cricket is roughly 50% shorter in duration, but the higher batting strike rate usually yields bigger team totals in book cricket. The fact that we can even rationally compare statistics between these two formats says something about the nature of T20 cricket!

The cost of front-foot no-balls and big wides in T20
We can use the simple conditional probability ideas used to analyze book-cricket to estimate the expected cost of bowling a front-foot no-ball and wide balls in real-life T20 matches by replacing the book-cricket probability model with a more realistic one:

Assume p[0] = 0.25, p[1] = 0.45, p[2] = 0.15, p[3] = 0.05, p[4] = 0.05,  p[5] ~ 0, p[6] = 0.05, p[7, 8, ...] ~ 0.

E[score in a ball] = 0 + 0.45 + 0.3 + 0.15 + 0.2 + 0.3 =1.4

This probability model yields a reasonable strike rate of 140 per 100 balls)

E[cost | no ball] = 1 + 1.4 + 1.4 = 3.8

Bowling a front-foot no-ball in T20 matches is almost as bad as giving away a boundary (apart from paying the opportunity cost of having almost no chance of getting a wicket due to the no-ball and the subsequent free-hit).  Similarly,
E[cost | wide-ball down the leg-side] = (5|wide and four byes)*prob{4 byes} + (1| wide but no byes)*prob{no byes} + 1.4.

Assuming a 50% chance of conceding 4 byes, the expected cost is 4.4. On average, a bowler may be marginally better off bowling a potential boundary ball (e.g., bad length) than risk an overly leg-side line that can result in 5 wides and a re-bowl.

More sophisticated simulation models based on actual historical data can help analyze more realistic cricketing scenarios and support tactical decision making.

5 comments:

  1. Very neat!
    Good refresher on probability and statistics. The recursive relation D = 0.2 + 0.8*(1+D) did not strike me at first, but obvious after I saw it. I still need to understand the free-hit and no ball modifications.

    I found another thing striking. In T20, around 40-60% of balls are dot balls. If we add these to the duration of the innings of book cricket, we pretty much have a typical T20 innings of today! Book cricket can indeed lay claim to be the forerunner to T20 (and even going a step further by dispensing with balls where no action takes place)

    ReplyDelete
  2. On a second reading, you have compared book cricket with T20. With dot balls thrown in, it does get a lot closer.
    Your finding on the costliness of a no-ball and wide down the leg side are also interesting. While I thought the wide was quite expensive, I didn't realize that a no-ball was that expensive. But shouldn't it be even more expensive? The probability of scoring big goes up for a free hit.

    ReplyDelete
    Replies
    1. yes, the strike rate for free-hit deliveries may be higher, despite the over-eagerness of batsmen to cash in that causes them to under-perform (hence left the scoring model as is). perhaps 3.8 is conservative, and the expected cost may cross 4.0, making an ff-nb effectively a boundary ball on average.

      Delete
  3. what about a batsmen scoring 3 runs or 5 runs (if it hits the keepers helmet) ? ?

    ReplyDelete

Note: Only a member of this blog may post a comment.