Thursday, June 27, 2013

Bengal Holocaust: The Analytics of Mahalanobis - 3

Part-1: Storm Chaser. Part-2: Traveling Surveyor.

The Forgotten Holocaust
When the Nazi death camps in Europe were discovered by the Allied forces toward the end of WW2, General Eisenhower ensured that many people, including Allied troops as well as local civilians, got a good look at those concentration camps, and had the evidence documented for posterity. Today, more than six decades later, the fact that the Jewish holocaust actually happened (with a probability of 1.0) is accepted by most of the world. A key step in such situations is to record as much evidence as possible, and to obtain multiple, independent verification of events in a scientific and unbiased manner, while the relevant facts are still fresh and eyewitnesses are alive and willing to go on the record.

While the Jewish population and others in Europe were being cleansed from the world of the fictional Aryan master-race (another byproduct of the discredited Aryan Invasion Theory (AIT), which was initially concocted by a few 19th-century German researchers analyzing Sanskrit texts, and later 'weaponized' by the British Raj for empire building), occupied Bengal in India was dying of starvation and disease. Some historians have attributed the deaths of these 3-6 million innocent Bengalis in the 1940s to the British occupation policy, one that was inspired by the very same AIT that the Nazis, whom the British were fighting, also subscribed to. However, there was neither an Eisenhower nor his resources to allow an exhaustive audio-video-text record of the facts surrounding this tragedy in Bengal. That holocaust claimed millions of innocent Indians 70 years ago, and much of the world knows very little about it.

Let's look at the data.

The Bengal of Mahalanobis, 1943
The British empire's occupation model ensured the conversion of India from a thriving manufacturing and knowledge hub into an agrarian, subsistence economy. India turned into a supplier of slave-wage labor as well as a market for Britain's finished goods. If this wasn't enough, the empire monopolized the mother of all drug-running operations, whose supply chain originated in Bengal. Bengal's native economy had already been badly dented by the time the Second World War arrived.
 
An old map of India (possibly) in the 1940s. Bengal is situated in the eastern part of India.
(pic source link: gypsyscholar.com)

Mahalanobis had just published reports of his pioneering work on optimized random sampling to estimate the Jute crop in Bengal (an effort that ranks among the earliest victories for the field of Operations Research). A reading of his research papers points to a dispassionate collector of data and details, a stubbornly methodical person (his detailed breakdown of the costs incurred during the jute crop sampling optimization project is mind-boggling), and someone who was enthusiastic about, and skilled in, applying analytics to solve real problems.

Mahalanobis was asked by a representative of the occupied people to do something to bring to light the facts surrounding the Bengal holocaust. Thus it transpired that an Operations Researcher / Applied Statistician, and not some military general, was entrusted with the job of recording the facts of the Bengal holocaust that occurred between 1942 and 1944.

An increasing number of historians today (and many blogs) are coming around to the view, long held by many in Bengal, that this crisis was an inevitable outcome of British Raj policy. Going one step further, Madhusree Mukerjee, in her recent book "Churchill's Secret War", has painstakingly compiled evidence and data that implicate the British Raj and the one person she felt was most responsible: Winston Churchill. The evidence is disturbing, and readers can make up their own minds.

(pic source link: http://www.marymartin.com)

The focus of this post is not Churchill, but the estimated scale and dimensions of the tragedy, as brought to light by the systematic findings of Mahalanobis' randomized survey.

The work of Mahalanobis
Mahalanobis applied for, and after a considerable delay obtained, a grant from the Government to conduct his analysis. Time was of the essence, and he again employed his cost-effective yet accurate methods based on random sampling (an overview is provided in part-2 of this series), interviewing families at different locations within Bengal. Approximately 16,000 randomly selected families from 386 villages were surveyed between July 1944 and February 1945. Detailed statistics were collected, including loss of life in the family by age and gender, the mortgaging and/or sale (in part or in full) of farm land, the sale of cattle used to plough the land, and the families' profession and economic status. Bengal was divided into regions, classified by the degree to which they were affected by the famine. The survey design took this into account, and weighted averages were calculated to avoid over- or under-reporting of mortality rates.
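Why does the weighting matter? Here is a minimal sketch, assuming hypothetical strata (the population shares, death counts and sample sizes are made up for illustration and are not from the survey). If badly hit regions are sampled more heavily than their share of the population, a naive pooled rate over-reports mortality, whereas weighting each stratum's sample rate by its population share does not.

```python
# Minimal sketch of a stratified (population-weighted) mortality estimate.
# Assumption: the strata below are hypothetical illustrations, NOT figures
# from the 1944-45 survey.

def stratified_rate(strata):
    """Combine per-stratum sample rates into one population-weighted rate.

    strata: list of (population_share, deaths_in_sample, persons_in_sample);
    the population shares must sum to 1.
    """
    total_share = sum(share for share, _, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(share * deaths / persons for share, deaths, persons in strata)

def naive_rate(strata):
    """Pooled rate that ignores how heavily each stratum was sampled."""
    deaths = sum(d for _, d, _ in strata)
    persons = sum(p for _, _, p in strata)
    return deaths / persons

# Hypothetical strata: the badly hit zone is over-sampled relative to its share.
strata = [
    (0.20, 300, 3000),   # severely affected: 10% sample mortality
    (0.30, 240, 4000),   # moderately affected: 6%
    (0.50, 180, 4500),   # mildly affected: 4%
]
print(round(stratified_rate(strata), 4))  # 0.2*0.10 + 0.3*0.06 + 0.5*0.04 ≈ 0.058
print(round(naive_rate(strata), 4))       # 720 / 11500 ≈ 0.0626
```

In this toy setup the pooled rate overstates mortality by almost half a percentage point purely because of where the interviewers spent their time.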

Official forecasts of food supply and demand were of poor quality ('bad guesswork', as mentioned in the prior post) and unreliable. The survey report noted that Bengal was running a food deficit well before the crisis of 1943: during the seven-year period 1933-39, i.e., even before WW2 started (from which point it could only have gotten worse), net imports averaged 100K tons a year, and reached up to one and a half million tons in individual years. Data also showed that the pre-1943 rate of land sales was rising in a land where the primary occupation was agriculture and 76% of the family-owned farm land was already at or below the subsistence level. A good proportion of the cattle that were sold were not repurchased by native farmers but by outsiders (possibly for slaughter to supply meat to military personnel). All indicators pointed to an already desperate situation that also left Bengal totally vulnerable to any supply-demand shock, which inevitably arrives sooner or later. In this case, the supply shocks arrived in the form of imperial Japanese troops storming Burma (today's Myanmar), and the apparent failure of the rice crop. Demand (and hence price) spikes cannot be ruled out either.

1943 was apocalyptic for Bengal. The Mahalanobis report measured the change in the already terrible economic indicators, as well as the increase in the number of destitute people. These numbers are shocking and point to a total absence of effective government intervention: a 300% increase in economic deterioration and a 1200% increase in the rate of destitution (with young women affected the most) during the famine, even as Britain appeared to stockpile food for itself. Bengal was a victim of depraved indifference of the worst kind.

(picture source link: boydom.com)

Fatalities
The sampling survey attempted to obtain the number of family members who had lost their lives during the food shortage. Mahalanobis estimated a mortality rate of 5.0% for men and 5.6% for women in an estimated 1943 population of around 61 million (derived from the 1941 census). By design, the survey excluded infant and toddler deaths, individuals without families, and entire families that either perished or relocated out of Bengal. Mahalanobis was never provided funds to repeat this sample survey, so the fatalities in 1944 could not be estimated.

The next step was to establish a counterfactual: what would the 'normal' mortality rate have been had the disaster not taken place? He chose 1931 as the baseline year, since data were available for it. That rate was 4.0%. Madhusree's book notes that the normal mortality rate for India then was 2.1%, so Bengal's 1931 rate was already nearly double that number, a stunning statistic. Note that a different counterfactual will give us a different 'missing number' value.
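To see how much the counterfactual matters, here is a back-of-the-envelope sketch using the figures quoted above (61 million people, 5.0% and 5.6% mortality, and a 4.0% or 2.1% baseline). It assumes an even split between men and women and treats all rates as annual. It also ignores the infant deaths, vanished families and 1944 deaths the survey could not capture, so it is a lower-side illustration, not an estimate anyone published.

```python
# Back-of-the-envelope excess-mortality sketch using the figures quoted above.
# Assumptions (not from the survey report): the population is split 50/50
# between men and women, and all rates are annual crude death rates.

POPULATION_1943 = 61_000_000
RATE_MEN, RATE_WOMEN = 0.050, 0.056

def excess_deaths(population, rate_men, rate_women, baseline, share_men=0.5):
    """Deaths above what the counterfactual (baseline) rate would have produced."""
    observed = population * (share_men * rate_men + (1 - share_men) * rate_women)
    expected = population * baseline
    return observed - expected

for label, baseline in [("Bengal 1931 baseline", 0.040), ("India normal rate", 0.021)]:
    print(f"{label}: ~{excess_deaths(POPULATION_1943, RATE_MEN, RATE_WOMEN, baseline):,.0f}")
```

Under these assumptions, the 1931 Bengal baseline gives roughly 0.8 million excess deaths for 1943 alone, while the all-India baseline gives roughly 2 million. The same survey data produce a very different 'missing number' depending on the counterfactual.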

Note: This post does not provide numbers from Amartya Sen's analysis because his numerical results have been shown to be shaky in multiple instances upon further examination (see this old dualnoise post, for example, as well as external references 1 and 2).

Some of the more reasonable numbers quoted by historians today are derived from Mahalanobis' carefully designed and controlled sampling study. For example, Madhusree, after correcting for infant deaths and noting the symmetrical distribution of mortality rates around December 1943, arrived at an estimate of 3 million incremental deaths during 1943-44. The total number of incremental fatalities across the war years is likely to be higher, and comes eerily close to the number of Jews murdered in Europe by the Nazis. Also, if the counterfactual is instead taken as India's normal mortality rate (2.1%), to account for the already abnormal situation in Bengal during those years, this number shoots up further. Birth rates in 1943-44 dropped significantly as well. Based on these calculations, it is plausible that Bengal lost around 10% of its population during the war years.

The random sampling methodology used to estimate the supply of jute, which earned Bengal much revenue, and the food crops that fed it, was eventually reused to estimate the number of Bengalis who died with nothing left to sell or eat. Mahalanobis' carefully chosen words in his final summation read:
"... The famine of 1943 was thus not an accident like an earthquake or a flood, but the culmination of economic changes that were going on even in normal times."

Postscript
A remarkable finding about the millions of Indians who were left to starve to death in Bengal: there was no cannibalism anywhere.

Selected References
1. Several Sankhya journal articles of the 1930s-40s that cover the relevant works done in the ISI including:

'Mortality in Bengal in 1943'

'The Bengal Famine' - reprinted from 'The Asiatic Review', 1946.

'Report on the Bengal Crop Survey, 1944-45'

'The Sample Census of the Area Under Jute in Bengal in 1940'

'An Estimate of the Rural Indebtedness of Bengal', 1934

'Elasticity of Wheat in 1935 India'

'Indian Statistical Institute: Numbers and Beyond, 1931–47'

2. Madhusree Mukerjee, "Churchill's Secret War", pages 266-273

3. http://www.bowbrick.org.uk/Famine%20pages/famine.htm. Also see the 'key documents on the famine' page.

4. The New York Review of Books: "Did Churchill let them starve?", and the resultant exchange with Amartya Sen.

5. Madhusree's interview at harpers.org.

Updated June 29: mildly edited for brevity.

Sunday, June 16, 2013

Traveling Surveyor: Contributions of Mahalanobis to Analytics - 2

Bengal
This post is the second in a series of blog posts based on the work done by P. C. Mahalanobis in the areas of statistics, analytics, and operations research between 1930 and 1960. We move north-east from the location of our prior post on storm-flood forecasting (Orissa) to Bengal: a land of science, wisdom, and dharma that gave the world a Vivekananda, whose thoughts deeply affected Gandhi's contribution to India's freedom struggle; Gandhi in turn shaped the work of Martin Luther King, Jr., and Nelson Mandela, and thus the civil liberties of a significant portion of the world's population.

Much of the discussion here is gleaned from ISI archives, websites, and various papers from Sankhya, ISI's flagship journal. Those interested in a more detailed and accurate analysis of this work are referred directly to ISI's journal material.

Jute
Bengal (including Bangladesh, formerly East Bengal) produces much of the world's jute. India is the largest producer and consumer of Jute today, followed by Bangladesh. Jute is an incredibly useful crop and has been a significant contributor to Bengal's revenue for a long time.  Here are some contemporary pictures of standing Jute crop in Bengal.



(pics source: informedfarmers.com)

Why this work is important
Prior to 1947, when India was still occupied by the British Raj (there's a very relevant reason for bringing this up, but we'll get to that later), forecasting the supply of this valuable and lucrative crop in Bengal was largely a product of bad guesswork. Like most other sectors in India, the agricultural sector is highly decentralized: a myriad of tiny farms grow jute and other crops, and all of them would have to be surveyed to get an exact, enumerated production number. In the 1930s-40s, Mahalanobis came up with an alternative using methods derived from statistics and a field now termed 'Operations Research': a scarce-resource-optimized method for accurate crop forecasting. Today, the Government of India employs sophisticated remote sensing, including a Satellite Survey System, to improve crop forecasts, but the methods developed then are still relevant and valuable. The seminal work of Mahalanobis in developing an optimal sample-based survey is also interesting to read from a practitioner's perspective. The combination of ideas employed in this 1930s work includes data analysis, statistical modeling, a pilot study, scarce resource allocation, and mathematical optimization, and the work ranks among the great achievements in the practice of Operations Research and Analytics.

A map of undivided Bengal, circa 1850 C. E. (source: http://jrahman.files.wordpress.com)

Motivation
Jute and cotton were two of the most important exports out of India after the manufacturing sector was crippled by the British Raj: 24% of the total revenue between 1927-37 was from Jute. Until the 1940s, estimating the total Jute produce in Bengal was largely guesswork, and the ad-hoc estimates provided by the administrative chain of the British Raj produced wildly varying numbers. Like other parts of India, cultivation in Bengal was decentralized and spread over nearly 100 million small farms, which were on average less than half an acre in area, spread over more than 60,000 sq. miles. Jute was grown on a subset of these farms. Furthermore, the cultivation lifecycle of Jute is very short: about two months from planting to harvesting. Consequently, even if the administration was willing to cough up the expenses for an enumeration survey, covering all these farms within 8-9 weeks would be extremely expensive, if not impossible. Add to this the fact that many plots (30%) that cultivated Jute also cultivated other crops in parallel. Thus, while in theory a total enumeration should give near-zero error, in practice, apportioning multi-crop areas to Jute, along with other human-induced errors, would introduce noise. In fact, the report states that the biggest drawback of an enumerative survey was not its prohibitive cost but its unreliability, and this motivated Mahalanobis to develop and implement an alternative approach using random sampling that accomplished the task at a fraction of the cost and time, and at a higher level of accuracy.

Random Sampling
The nearly 100M farms were spread over Bengal in a non-homogeneous manner: some areas were densely cultivated with Jute, some sparsely. The approach was to partition the total area into zones i = 1, ..., n, with zone i having area A_i, such that each zone was internally homogeneous (kinda like the way finite element analysis is used in structural engineering). Within each zone, a number of areas or grids were selected at random and sampled. If a sufficient number of such grids is sampled, the average proportion of area under Jute within zone i (J_i) can be estimated, which allows us to predict the total area under Jute = sum over i of A_i * J_i.
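The zone-wise estimate can be sketched in a few lines. This is a minimal simulation, assuming each zone is represented as a list of grid-cell jute fractions; the zone areas (think sq. miles) and fractions are hypothetical, not Bengal data, and simple random sampling without replacement stands in for whatever selection rule the field teams actually used.

```python
import random

# Sketch of the zone-wise (stratified) area estimate: total ≈ sum_i A_i * J_i,
# where J_i is the mean jute fraction over randomly chosen grid cells in zone i.
# Assumption: zones, areas and cell fractions are simulated and hypothetical.

def estimate_total_jute_area(zones, cells_per_zone, rng):
    """zones: list of (zone_area, list_of_cell_jute_fractions)."""
    total = 0.0
    for area, cells in zones:
        sample = rng.sample(cells, cells_per_zone)  # simple random sample, no replacement
        j_hat = sum(sample) / len(sample)           # estimated jute fraction J_i
        total += area * j_hat
    return total

def true_total(zones):
    """Exact total, available here only because the zones are simulated."""
    return sum(area * sum(cells) / len(cells) for area, cells in zones)

def clipped_gauss(rng, mean, sd):
    """A jute fraction drawn from a normal distribution, clipped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mean, sd)))

rng = random.Random(1940)
# Three hypothetical zones: dense, moderate and sparse jute cultivation.
zones = [
    (5_000, [clipped_gauss(rng, 0.30, 0.05) for _ in range(2_000)]),
    (12_000, [clipped_gauss(rng, 0.12, 0.04) for _ in range(2_000)]),
    (20_000, [clipped_gauss(rng, 0.03, 0.02) for _ in range(2_000)]),
]
est = estimate_total_jute_area(zones, cells_per_zone=100, rng=rng)
print(f"true: {true_total(zones):,.0f}  estimated: {est:,.0f}")
```

Sampling only 300 of the 6,000 simulated cells typically lands within a few percent of the true total, which is the whole economic argument for sampling over enumeration.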

Decision variables
1. The partition of the total area into approximately homogeneous zones
2. The number of random samples within a zone
3. The area of a sample

For simplicity, we assume that the first decision of partitioning the area within Bengal is an external input and thus our focus is on optimizing the remaining two decisions.

Constraints
1. The cost of the whole operation depends on the second and third decision variables. For a given budget, if the area of an individual sample is large, then the number of samples has to be reduced, and the samples would thus be more spread out and farther away from each other.

2. The achievable precision (variance) varies similarly. If the sample area is large, the per-sample variance is smaller, but cost considerations limit the number of such large samples, and this can hurt the overall variance accumulated across the zone. On the other hand, a smaller sample area combined with a larger number of such small samples affects precision in the opposite manner.
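The tension between these two constraints can be sketched for a single zone under loudly hypothetical assumptions. Per-sample variance is assumed to fall with cell area as a power law, similar in form to the empirical "variance laws" used in field-plot work, not a function fitted by Mahalanobis. Each sample is assumed to cost a fixed amount plus an amount proportional to its area. All parameter values are made up.

```python
# Sketch of the sample-size vs sample-area trade-off for ONE zone.
# Hypothetical assumptions, for illustration only:
#   * per-sample variance falls with cell area a as  s2(a) = S0 * a**(-G), 0 < G < 1;
#   * each sample costs C0 (fixed: travel/setup) + C1 * a (surveying time);
#   * the budget buys n(a) = BUDGET / (C0 + C1 * a) samples.
# The variance of the zone mean is then s2(a) / n(a).

S0, G = 1.0, 0.5          # hypothetical variance-law parameters
C0, C1 = 10.0, 2.0        # hypothetical cost per sample: fixed + per unit area
BUDGET = 10_000.0

def variance_of_mean(a):
    n = BUDGET / (C0 + C1 * a)
    return S0 * a ** (-G) / n

# Brute-force scan over candidate cell areas.
areas = [0.1 * k for k in range(1, 501)]
best_area = min(areas, key=variance_of_mean)

# Setting d/da [a**(-G) * (C0 + C1*a)] = 0 gives a* = G*C0 / ((1 - G)*C1).
analytic_area = G * C0 / ((1 - G) * C1)
print(best_area, analytic_area)
```

Very small cells waste the budget on fixed per-visit costs, and very large cells leave too few samples; the optimum sits in between and moves with the ratio of fixed to area-proportional cost.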

Nonlinear Optimization Problem
Given either cost or precision as a hard constraint, select the sampling area and the number of random samples to maximize precision, or minimize cost.

Mahalanobis' approach models the change in variance and cost as continuous functions of the two sets of decision variables. Once these functions are at hand, a local optimum is obtained using a derivative-based Lagrange-multiplier method. Mahalanobis used this approach to tabulate the achievable precision for a range of cost levels.
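Mahalanobis' actual variance and cost functions were empirical and far more detailed. As a stand-in, this sketch uses the textbook optimum-allocation problem for a stratified sample: choose n_i samples in zone i to minimize variance under a linear cost budget. It shows the same Lagrange-multiplier step in closed form. The zone weights, standard deviations and per-sample costs are hypothetical.

```python
import math

# Lagrange-multiplier sketch: choose samples n_i per zone to minimise
#   V(n) = sum_i W_i^2 * S_i^2 / n_i        (variance of the stratified mean)
# subject to the budget
#   sum_i c_i * n_i = C.
# Setting dL/dn_i = -W_i^2 S_i^2 / n_i^2 + lam * c_i = 0 gives
#   n_i = W_i * S_i / sqrt(lam * c_i),
# and substituting into the budget fixes lam.
# W, S and c below are hypothetical.

def optimal_allocation(W, S, c, budget):
    denom = sum(w * s * math.sqrt(ci) for w, s, ci in zip(W, S, c))
    return [budget * w * s / math.sqrt(ci) / denom for w, s, ci in zip(W, S, c)]

def variance(W, S, n):
    return sum(w * w * s * s / ni for w, s, ni in zip(W, S, n))

W = [0.2, 0.3, 0.5]      # share of total area in each zone
S = [0.15, 0.10, 0.05]   # std-dev of jute fraction between cells in each zone
c = [4.0, 2.0, 1.0]      # cost per sample (e.g. harder travel in zone 1)
BUDGET = 1_000.0

n_opt = optimal_allocation(W, S, c, BUDGET)
# Compare with spending the same budget in proportion to zone area.
n_prop = [BUDGET * w / sum(wj * cj for wj, cj in zip(W, c)) for w in W]
print([round(x, 1) for x in n_opt], variance(W, S, n_opt), variance(W, S, n_prop))
```

The n_i come out as continuous numbers and would be rounded in practice. Because V is convex in each n_i and the budget is linear, the stationary point in this simplified form is also the global minimum.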

An exploratory, small-scale (pilot) survey was initially conducted at a small expense as a proof-of-concept and proof-of-technology validation of the methodology before embarking on the full-scale project. This type of approach is now widely adopted in many business analytics projects.

The effect of the decision variables on variance can be calculated using theoretical methods. However, human-induced errors were also common, and Mahalanobis used the idea of interpenetrating half-sample pairs, where two groups independently arrived at Jute area estimates for a given location. There are many important details here that are left out for brevity. The cost calculation is detailed and empirical, depends on the nature of the survey, and, among other things, includes:
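The basic arithmetic behind a pair of interpenetrating half-samples can be sketched as follows, assuming the two teams' estimates are independent and unbiased for the same quantity. The team figures are hypothetical, and this ignores the many refinements in the actual ISI design.

```python
# Sketch of interpenetrating half-samples: two survey teams each cover an
# independent random half of the sample cells in a zone and report their own
# estimate. Agreement between the two is a check on human (field) error.
# Assumption: the two team estimates below are hypothetical numbers.

def combine_half_samples(est_a, est_b):
    """Return the pooled estimate and a standard-error estimate.

    For two independent, unbiased estimates of the same quantity, the pooled
    mean (a + b) / 2 has variance sigma^2 / 2, and (a - b)^2 / 4 is an unbiased
    estimate of that variance (one degree of freedom, so it is rough).
    """
    pooled = (est_a + est_b) / 2
    std_err = abs(est_a - est_b) / 2
    return pooled, std_err

pooled, se = combine_half_samples(0.124, 0.131)  # jute fraction from teams A and B
print(f"pooled = {pooled:.4f}, s.e. ≈ {se:.4f}")
```

A gap between the two teams that is large relative to the expected sampling error is the signal that one of them made field or recording mistakes.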
a. cost of staying and surveying at a given site - this depends on the size of the sampled area and time spent
b. cost of traveling from sample to sample - this depends on the distances between the chosen samples and the sequence of visiting.

Again, we have left out a humongous number of cost calculations. Reading the reports that came out of this work, one is amazed by the time and effort devoted to meticulously tabulating the various costs, going beyond 'ball-park' estimates to produce an accurate cost function. For example, cost calculation (b) depends on the solution to the corresponding traveling salesman problem.
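A toy version of such a cost function can be sketched with the two components (a) and (b) listed above. This is an assumption-laden illustration, not the tabulated ISI cost model. The route length through n random cells in a zone of area A is approximated as K * sqrt(n * A), the standard scaling for random traveling-salesman tours, and K and all unit costs are hypothetical placeholders.

```python
import math

# Sketch of a survey cost function with the two components listed above:
#   (a) on-site cost, proportional to the number of cells n and the cell area a;
#   (b) travel cost, proportional to the length of the route through n random
#       cells in a zone of area A, approximated as K_TOUR * sqrt(n * A).
# K_TOUR and all unit costs are hypothetical placeholders.

K_TOUR = 0.75  # hypothetical tour-length constant

def survey_cost(n, a, zone_area, cost_per_unit_area, cost_per_mile):
    on_site = n * a * cost_per_unit_area
    travel = cost_per_mile * K_TOUR * math.sqrt(n * zone_area)
    return on_site + travel

for n in (50, 100, 200):
    print(n, round(survey_cost(n, a=0.5, zone_area=400.0,
                               cost_per_unit_area=2.0, cost_per_mile=1.5), 1))
```

Because travel grows only like the square root of n, doubling the number of sample cells less than doubles the travel bill. That is exactly the kind of non-linearity the optimization has to account for.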

The TSP
One of the many reports that came out of this project notes:
(source: Sankhya journal, 1940)

This cost calculation is reviewed by Applegate, Bixby, et al. in their book on the TSP, and in Bill Cook's 'In Pursuit of the Traveling Salesman'. The literature reviews in these books mention that researchers later showed that the expected length of the optimal tour was approximately between 0.707 and 1.27 times the square root of the number of samples visited in a unit square, so Mahalanobis' 1930s estimate was a remarkably good choice.
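The square-root behavior is easy to check empirically. This Monte Carlo sketch uses the greedy nearest-neighbor heuristic rather than an exact TSP solver, so its tours are longer than optimal. The printed ratios are therefore upper-side estimates of the optimal-tour constant, not the constant itself.

```python
import math
import random

# Monte Carlo sketch of the sqrt(n) scaling of tour length in a unit square.
# Assumption: the greedy nearest-neighbour heuristic is used, which gives a
# tour LONGER than the optimal one, so the ratios are upper-side estimates.

def nearest_neighbour_tour_length(points):
    unvisited = points[1:]          # copy of all points except the start
    current = points[0]
    length = 0.0
    while unvisited:
        nxt = min(unvisited, key=lambda p: math.dist(current, p))
        length += math.dist(current, nxt)
        unvisited.remove(nxt)
        current = nxt
    return length + math.dist(current, points[0])  # return to the start

rng = random.Random(42)
for n in (100, 400, 900):
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    ratio = nearest_neighbour_tour_length(pts) / math.sqrt(n)
    print(n, round(ratio, 3))
```

The ratio stays roughly flat as n grows, which is what makes a constant-times-sqrt(n) travel-cost model usable in practice.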

Results and Business Impact
The cost- and precision-controlled random sampling approach proved to be revolutionary. It achieved greater precision at a fraction of the cost. Specifically, the margin of error was +/- 2%, and the cost was 1/15 that of an enumeration census performed the same year, which was found to be less accurate than the random sampling approach. Thus the benefit and return-on-investment of this analytical approach were successfully demonstrated in practice; the approach received widespread recognition and was later embraced by the Government of independent India for many nationwide surveys.

Prelude to Part-3: The Bengal Holocaust
Within a couple of years of the successful demonstration and publication of this work, Mahalanobis' Bengal lost between 2 and 6 million people to starvation and disease during 1942-1945, triggered in part, possibly, by a failure of the rice crop. The British Raj, locked in a grim Atlantic battle during WW2, may have suppressed reports and figures. It appears that most of the world, and even a vast majority of Indians, remain unaware to this day of the reality behind this event. How can one obtain a reasonable estimate of the casualties due to this disaster? Who was responsible, and how? A recent book has brought this controversy into the open, and it appears that Mahalanobis (and his statistical sampling method) may have played a critical part in solving this puzzle.

To be continued.

Monday, June 10, 2013

Storm Chaser: Contributions of Mahalanobis to Analytics - 1

Introduction
The recent history of the practice of Analytics and Operations Research in India appears to begin with P. C. Mahalanobis; or at the very least, he is central to this history during the 1930s-1960s time frame.
(source: www.isical.ac.in)

Aside from the well-known Indian Statistical Institute in Kolkata and the distance measure named after him, his legacy includes a rich body of practical analytics work. Examples include the design of a cost-effective and accurate random sampling method to determine the jute crop output in Bengal in the 1930s, predictive analysis of the effects of south-west monsoons in the Indian state of Odisha (Orissa), a post-mortem of the Bengal famine in the 1940s, and his application of Linear Programming models to national planning in the 1950s. As an ORMS practitioner as well as a student of Indian history, I find these works quite useful and instructive, and they will be covered here over the next few weeks, starting with his analysis of monsoon storms in Odisha. This is the second post here associated with this beautiful state of India. A previous post on Odisha analyzed the optimal location of elephants ('jumbo decision variables'), no kidding.

Storm Chaser
Figure 1 below depicts an annotated Google map of the catchment basin of the Mahanadi ('great river') near the east coast of India, and of the river delta where the Mahanadi and other rivers (including the Brahmini and Baitarini) deposit their alluvial silt and empty into the Bay of Bengal.

(Figures 1 and 2: google maps)

Mahalanobis' description of this problem in a 1930s issue of Sankhya, ISI's journal, begins with a general description of the geography and climate of this area that provides the big picture and context for his research, before using weather-related data for a deep-dive analysis. The data indicate that the south-west monsoon (June-September) accounts for around 80% of the annual rainfall in the area, and can cause severe flooding in certain places, resulting in loss of life and property. In particular, the research focuses on the head of the delta ("A": Naraj, near the city of Cuttack), depicted using a zoom-in on the area.

To the south of this area lies the magnificent Chilka lake, the second largest lagoon in the world.
(source: flickr)

Mahalanobis' description really brings a bunch of dry and dull row-and-column data to life by mapping it to visceral reality. You can almost feel the intensity of the monsoons, and see the storm waters rushing by. This makes the subsequent description of the analytical approach that much easier to follow and enjoyable to read, something sorely missing in almost all technical journals today.

Weather data recorded during previous monsoons (between 1874 and 1926) indicate that such storms originate in the Bay of Bengal and move westward over a period of a few days. A table of calculated effective distances between the various locations of interest in the Mahanadi system is given below. Each row is associated with a location that is farther inland from the coastline.


(source: Sankhya journal)
The accumulated run-off water in the catchment basin (51,000 sq. miles) enters the river system, and much of it flows through Naraj before exiting into the Bay. In the absence of any weather satellite data, the objective of the exercise is to analytically determine the time window during which key flood-prone locations near the Bay will be threatened by a big storm that makes landfall, and whether timely warnings are feasible.

Step 1: Storm Velocity (east-to-west)
Existing historical data track the location of the center of past storms. Using this data, Mahalanobis estimated the average speed of a typical storm at 8.5 MPH. Next, he made a neat assumption: the velocity of the center of the storm must be roughly the same as the velocity of the locus of heavy rainfall, which first falls in the delta area and takes about 40 hours to reach the westernmost section (V). He then used rain gauge data recorded at various points in the catchment area to identify the period of peak rainfall at each point, and the temporal lags between these rainfall peaks independently confirmed this estimate. Nice! Mahalanobis was now able to predict the approximate times of peak rainfall at various locations. Figure 3 below shows a snapshot of these results for the Mahanadi catchment area. Note the proximity of the delta to the bay (~50 miles).
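The lag idea can be sketched as a simple regression. If the hour of peak rainfall at each gauge is regressed on the gauge's distance inland, the slope is hours per mile, and its reciprocal is the speed of the rain belt. The distances and peak hours below are hypothetical and noise-free, generated for a belt moving at exactly 8.5 MPH. This is an illustration of the principle, not Mahalanobis' actual computation.

```python
# Sketch of the lag method: regress the hour of peak rainfall at each station
# on the station's distance inland; the slope is hours per mile, so its
# reciprocal is the speed of the rain belt.
# Assumption: distances and peak hours are hypothetical, generated for a belt
# moving at exactly 8.5 MPH.

def least_squares_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

distance_inland_miles = [0, 50, 120, 200, 280, 340]
peak_hour = [d / 8.5 for d in distance_inland_miles]  # hypothetical, noise-free
hours_per_mile = least_squares_slope(distance_inland_miles, peak_hour)
speed_mph = 1 / hours_per_mile
print(round(speed_mph, 2))        # 8.5 for this noise-free data
print(round(340 / speed_mph, 1))  # 40.0 hours for the belt to travel 340 miles
```

With real gauge readings the points scatter around the line, and the fitted slope averages out the noise across all stations.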


(figure source: Sankhya journal)
The next step was to correlate this information with the resulting flow of storm-water run-off back into the Bay of Bengal.

Step 2: Flood Velocity (west-to-east)
Mahalanobis performed a series of calculations to estimate the typical historical velocity of the flood waters (in the absence of any gradient information) by correlating the times and locations of peak rainfall with the peak water levels recorded by a flood gauge at Naraj. Again, a cool use of lags. As before, he employed two independent methods to compute a reliable value, which turned out to be fairly steady at around 4 MPH most of the way, slowing down at the head of the delta (section I), where the land flattens out considerably.
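Putting the two speeds together gives a rough timetable. This sketch uses the ~8.5 MPH storm speed, the ~4 MPH flood speed, and the ~50-mile distance of the delta head from the bay. It assumes straight-line distances and a steady flood speed, and it ignores the slowdown near the delta head. The section distances are hypothetical, and the function name is illustrative.

```python
# Sketch combining the two speeds to estimate when run-off from rain at an
# upstream section would crest at Naraj.
# Assumptions (hypothetical): straight-line distances, a rain belt moving
# inland at 8.5 MPH from the coast, flood water moving back downstream at a
# steady 4 MPH, and made-up section distances.

STORM_MPH = 8.5
FLOOD_MPH = 4.0
NARAJ_INLAND_MILES = 50  # rough distance of the delta head from the bay

def crest_time_at_naraj(section_inland_miles):
    """Hours after the rain belt crosses the coast until run-off from a
    section upstream of Naraj reaches Naraj."""
    if section_inland_miles < NARAJ_INLAND_MILES:
        raise ValueError("section must be upstream of Naraj")
    rain_peak = section_inland_miles / STORM_MPH
    flow_time = (section_inland_miles - NARAJ_INLAND_MILES) / FLOOD_MPH
    return rain_peak + flow_time

for d in (100, 200, 300):
    print(d, round(crest_time_at_naraj(d), 1))
```

Because the storm moves inland roughly twice as fast as the flood water comes back, run-off from distant upstream sections reaches Naraj days after the rain belt has passed. That gap is what makes an advance warning feasible at all.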

(figure source: Sankhya journal)

The picture is now complete. Mahalanobis summarized his findings as follows:

Unfortunately, casualties due to flooding remain a serious problem here to this day (2011 report). See this YouTube video of flood waters at Naraj in 2011, and another one from 2008.




Updated June 10: (pic source: http://censusindia.gov.in)



Wednesday, June 5, 2013

Left-handed Coconut Trees and Other Statistical Tales

Lots more interesting stuff in C. R. Rao's 1989 lecture. I blogged about a historically interesting demand estimation problem from this talk earlier today. I'm sharing other comments below. Italicized words are mine.

On the philosopher's view of knowledge.
...Vivekananda and Einstein maintained that new knowledge can be created only by instinct, reason and inspiration, a process known as abduction, and not by deductive reasoning assuming a given set of premises to be true or by inductive inference from observed data. (“a theory can be proved by an experiment, but no path leads from experiment to theory” - Einstein).

The ancient Hindu scriptures mention perception (pratyaksha), inference (anumana), comparison (upamana) and verbal testimony (sabda) as possible instruments for creation of new knowledge.

From a subsection on the scientist's view of knowledge
"In May 1983, exactly 350 years after Galileo’s confession, Pope John Paul II graciously conceded to a delegation of 200 scientists that the Pope Urban VIII who convicted Galileo might have erred. In July 1984, after examining all the relevant documents, Pope John Paul II exonerated Galileo, saying that the judges who condemned Galileo were wrong."

On Serendipity in scientific discoveries
"Pluto's moon Charon was discovered by US astronomer James Christy in 1978. He was going to discard what he thought was a defective photographic plate of Pluto, when his Star Scan machine broke down. While it was being repaired he had time to study the plate again and discovered others in the archives with the same "defect" (a bulge in the planet's image which was actually a large moon)."
One person's discarded 'outlier' is another person's treasure.

On statistics and scientific research.
... As R.A.Fisher said in a speech delivered at the Indian Statistical Institute in 1952: “Statistical science is the peculiar aspect of human progress which gave 20th century its special character. It is to the statistician the present age turns for what is most essential in all its more important activities”.
Perhaps he meant to say 'Operations Research'. Maybe not.

R.A.Fisher. Emphasizing the need for consulting a statistician before the experiment is conducted, Fisher said:
“You get 10 times more information from a carefully designed experiment. To consult a statistician after the experiment is finished is often to merely ask him to conduct a postmortem examination. He can only say what the experiment died of”.

“Statistics is the technology of finding the invisible and measuring the immeasurable”.

On statistical applications.
"....Galton was able to collect birth order data from 99 of his [gifted, accomplished] subjects, revealing that 48% of them were first born sons or only sons. The percentages of the second and third born were very low..."
Rao mentions he was his parents' eighth child.

...T.A. Davis, a professor at the Indian Statistical Institute, made several studies on coconut trees, which can be classified as left-handed or right-handed depending on the direction of their foliar spiral. By doing experiments he found that spirality is not genetically inherited and that left-handed trees yield 10% more coconuts than right-handed trees, a conclusion of economic importance. A recommendation was made to the Government of the state of Kerala to grow only the “leftists” to increase the production of nuts.
Possible context: Kerala was crazy enough to willfully usher in the world's first democratically elected communist government.

On facts before theory:
“It is a capital mistake to theorize before one has data. Insensibly, one begins to twist facts to suit theories instead of theories to suit facts.”
- Sherlock Holmes

Without good information, you won’t see things as they really are; you will see them as you think they are.
“Aristotle maintained that women have fewer teeth than men; although he was married twice, it never occurred to him to verify his statement by examining his wives’ mouths”.
- Bertrand Russell

On decision making under uncertainty, an important area of ORMS.
“The need for knowing the three R’s, reading, writing and arithmetic, is well understood. These do not take us far unless we acquire the fourth R, reasoning under uncertainty, for taking decisions in real life”.
-C.R.Rao (Statistics and Truth)

Rao finishes with this.
The philosopher Sullivan, when asked whether he believed in astrology, replied:
“I am a Gemini and Gemini do not believe in astrology”.
Thank you.

updated: June 6, typos.

Estimating the Number of Refugees inside the Red Fort, 1947

An interesting historical use of statistical thinking during a humanitarian crisis. The situation is as follows:

Britain decided to up and leave India one fine day, but not before partitioning the land into the Islamic state of Pakistan and Gandhi's pluralistic India. Communal riots broke out in 1946-47 during what was, up to that point, the biggest cross-border exodus of people in human history. In Delhi, the capital city, an unknown number of the Muslims who chose to remain in India (as the majority did) arrived to seek shelter in the iconic Red Fort and also within the smaller area of Humayun's tomb. The rest of the situation is described by C. R. Rao, professor emeritus at Penn State and ranked among the greatest living statisticians, in his 1989 lecture (emphasis and comments in square brackets added by me):

"...The government employed contractors to feed them. The contractors used to submit bills to the government of the purchases they made of different commodities [that were in short supply] like rice, pulses, salt etc. , to feed the refugees. A Secretary to the government of India suspected that the contractors were over quoting the commodities they purchased and he thought of asking a statistician to go inside the Redfort and count the number of refugees. Perhaps there were no statisticians in Delhi at that time, he sent a telegram to the Indian Statistical Institute in Calcutta requesting for a statistician to be sent to Delhi immediately. Two statisticians who had experience in conducting sample surveys went to Delhi by air and on arrival they were taken to the Redfort. 

When they wanted to go inside the Redfort to see how they can count, the guards did not allow them to go inside as they were not members of the same community as the refugees. The problem then was to estimate the number of refugees without going inside the fort or having any  knowledge of the concentration of refugees inside the Fort. The experts had conducted household surveys in Calcutta and had some idea of per capita consumption of different commodities. Here statistical thinking came to their rescue. If R, P, and S are the amounts quoted by the contractors of rice, pulses and salt, and r, p, and s are the per capita consumptions of these commodities as estimated from household surveys, they argued that R/r, P/p and S/s are estimates of the same number of persons. Based on the figures quoted by the contractors, the three estimates based on rice, pulses and salt were, 30,253, 21,122, and 10,891 respectively. They chose the smallest number based on salt. Rice was the most expensive commodity at that time and apparently the contractors were over quoting the amount of rice purchased to make money. There was another protected area, the Humayun Tomb where [a smaller number] some have taken refuge. A volunteer belonging to the same community as the refugees entered the tomb and counted the number of refugees; The statisticians used the same formula there and found that the salt estimate agreed with the number counted. The government secretary was pleased and accepted the salt estimate..."
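The statisticians' reasoning fits in a few lines. Each commodity gives an independent head-count estimate: quantity billed divided by per-capita consumption. Assuming contractors may inflate bills but do not understate them, each estimate is at least the true count (up to error in the consumption rates), so the smallest is the least inflated. The quantities and per-capita figures below are hypothetical, not the ones from 1947.

```python
# Sketch of the Red Fort reasoning: each commodity gives an independent
# head-count estimate (quantity billed) / (per-capita consumption).
# Assumption: bills may be inflated but not understated, so the smallest
# estimate is the least inflated. All figures below are hypothetical.

def headcount_estimates(billed, per_capita):
    return {item: billed[item] / per_capita[item] for item in billed}

def conservative_headcount(billed, per_capita):
    estimates = headcount_estimates(billed, per_capita)
    item = min(estimates, key=estimates.get)
    return item, estimates[item]

# Hypothetical daily bills (kg) and per-capita consumption (kg/person/day).
billed = {"rice": 15_000, "pulses": 2_000, "salt": 250}
per_capita = {"rice": 0.5, "pulses": 0.1, "salt": 0.025}
print(headcount_estimates(billed, per_capita))
print(conservative_headcount(billed, per_capita))
```

The Humayun's Tomb cross-check amounts to running the same calculation where a direct head count is available, and confirming that the chosen commodity's estimate matches it.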
(pic source: wikimedia)

C. R. Rao was one of the early members of the renowned Indian Statistical Institute (ISI) that was founded by the legendary P. C. Mahalanobis, an institute that produces fine Operations Research grads to this day. He's briefly mentioned in an earlier post, and an upcoming post in this space will again be related to his work.

Historical Postscript
1. A story that involves covert sampling to predict the number of pro-India soldiers to provision during the 1857 war of independence in India is briefly covered in 'Operation Red Lotus'.


(pic source: http://whc.unesco.org)
2. The tomb, which was visited by President Barack Obama on his India visit, houses the mutilated remains of a far more relevant person: Dara Shikoh, perhaps the Mughal who most deserved the suffix "great". The Red Fort (and the Taj Mahal) was built by Indian workers on the orders of Dara's father.

(pic source: wikipedia)

Dara was the most learned and compassionate member of a dynasty characterized by excesses that cause resentment to this day. He respected, even celebrated, the pluralism of the dharmic thought system that has defined India since time immemorial. In unbelievably stark contrast stands his brother, the fanatical Aurangzeb, who, unfortunately for India, prevailed in a tense struggle for the throne and had Dara killed most gruesomely on charges of blasphemy, before proceeding to slaughter about 4.6 million Hindus (NY Times, 2011), statistically ranking him at #23 in the all-time list of mass killers (WW2 is at #1).

June 6: edited typos.