Dualnoise: Storm Chaser: Contributions of Mahalanobis to Analytics

Monday, June 10, 2013

Storm Chaser: Contributions of Mahalanobis to Analytics - 1

Introduction
The recent history of the practice of Analytics and Operations Research in India appears to begin with P. C. Mahalanobis, or at the very least, he is central to this history during the 1930s-1960s time frame.

(source: www.isical.ac.in)

Aside from the well-known Indian Statistical Institute in Kolkata and the distance measure named after him, his legacy includes a rich body of practical analytics work. Examples include the design of a cost-effective and accurate random sampling method to determine the jute crop output in Bengal in the 1930s, predictive analysis of the effects of the south-west monsoon in the Indian state of Odisha (Orissa), a post-mortem of the Bengal famine in the 1940s, and his application of Linear Programming models for national planning in the 1950s. As an ORMS practitioner as well as a student of Indian history, I find these works quite useful and instructive, and they will be covered here over the next few weeks, starting with his analysis of monsoon storms in Odisha. This is the second post here associated with this beautiful state of India. A previous post on Odisha analyzed the optimal location of elephants ('jumbo decision variables'), no kidding.

Storm Chaser
Figure 1 below depicts an annotated Google map of the catchment basin of the Mahanadi ('great river') near the east coast of India, and of the river delta where the Mahanadi and other rivers (including the Brahmini and Baitarini) deposit their alluvial silt and empty into the Bay of Bengal.

(Figures 1 and 2: google maps)

Mahalanobis' description of this problem in a 1930s issue of Sankhya, ISI's journal, begins with a general description of the geography and the climate of this area that gives us the big picture and context for his research, before utilizing weather-related data for a deep-dive analysis. Data indicates that the south-west monsoon (June-September) accounts for around 80% of the total rainfall in the year in the bay area, and can cause severe flooding in certain areas, resulting in loss of life and property. In particular, the research focuses on the head of the delta ("A": Naraj, near the city of Cuttack), depicted using a zoom-in on the area.

To the south of this area lies the magnificent Chilka lake, the second largest lagoon in the world.

(source: Flickr)

Mahalanobis' description really brings to life a bunch of dry and dull row-and-column data by mapping it to visceral reality. You can almost feel the intensity of the monsoons, and see the storm waters rushing by. This makes the subsequent description of the analytical approach that much easier to follow and more enjoyable to read - something sorely missing in almost all technical journals today.

Weather data recorded during previous monsoons (between 1874 and 1926) indicate that such storms originate in the Bay of Bengal and move westward over a period of a few days. A table of calculated effective distances between the various locations of interest in the Mahanadi system is given below. Each row is associated with a location that is further inland from the coastline.

(source: Sankhya journal)
The accumulated run-off water in the catchment basin (51,000 sq. miles) enters the river system, and much of it flows through Naraj before exiting into the Bay. In the absence of any weather satellite data, the objective of the exercise is to analytically determine when key flood-prone locations of the Bay area will be threatened by a big storm that makes landfall, and whether timely warnings are feasible.

Step 1: Storm Velocity (east-to-west)
Existing historical data tracks the location of the center of past storms. Using this data, Mahalanobis estimated the average speed of a typical storm at 8.5 MPH. Next, he made a neat assumption: the velocity of the center of the storm must be roughly the same as the velocity of the locus of heavy rainfall, which first falls in the delta area and takes about 40 hours to reach the westernmost section (V). He then used rain gauge data recorded at various points in the catchment area to note the period of peak rainfall at each point, and used the temporal lags between the rainfall peaks at various locations to independently confirm this estimate. Nice! Mahalanobis was now able to predict the approximate times of peak rainfall at various locations. Figure 3 below shows a snapshot of these results for the Mahanadi catchment area. Note the proximity of the delta to the bay (~50 miles).

(figure source: Sankhya journal)
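
For readers who like to see the arithmetic, here is a minimal Python sketch of the idea in Step 1. Only the 8.5 MPH speed and the 40-hour traverse come from the post; the gauge readings, the 85-mile gauge spacing, and all function names are made up for illustration and are not taken from the Sankhya paper.

```python
STORM_SPEED_MPH = 8.5   # average storm speed quoted in the post
TRAVERSE_HOURS = 40     # time for heavy rain to cross the catchment, per the post


def implied_span_miles(speed_mph, hours):
    """Distance covered by the rain front, assuming constant speed."""
    return speed_mph * hours


def peak_rain_arrival_hours(distance_miles, speed_mph=STORM_SPEED_MPH):
    """Hours for the rainfall peak to travel a given distance inland."""
    return distance_miles / speed_mph


def peak_hour(series):
    """Index (hour) of the largest reading in an hourly rainfall series."""
    return max(range(len(series)), key=series.__getitem__)


def speed_from_peak_lag(series_near, series_far, distance_miles):
    """Independent speed estimate from the lag between two gauges' peaks."""
    lag_hours = peak_hour(series_far) - peak_hour(series_near)
    if lag_hours <= 0:
        raise ValueError("far gauge must peak after the near gauge")
    return distance_miles / lag_hours


# Synthetic hourly rainfall at two hypothetical gauges 85 miles apart.
gauge_near = [0, 1, 4, 9, 4, 1] + [0] * 14          # peaks at hour 3
gauge_far = [0] * 10 + [0, 1, 4, 9, 4, 1] + [0] * 4  # peaks at hour 13

print(implied_span_miles(STORM_SPEED_MPH, TRAVERSE_HOURS))  # 340.0 miles
print(speed_from_peak_lag(gauge_near, gauge_far, 85))        # 8.5 MPH
```

With real gauge records, a cross-correlation over the whole series would likely be more robust than comparing single peak hours, but the peak-to-peak lag is the simplest version of the check described above.
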
The next step was to correlate this information with the resultant flow characteristics of storm water run-off back into the Bay of Bengal.

Step 2: Flood Velocity (west-to-east)
Mahalanobis performed a series of calculations to estimate the typical historical velocity of the flood waters (in the absence of any gradient information) by correlating the times and locations of peak rainfall with the peak water level data recorded using a flood gauge at Naraj. Again, a cool use of lags. As before, he employed two independent methods to compute a reliable value, which turned out to be fairly steady at around 4 MPH most of the way, slowing down at the head of the delta (section-I), where the land flattens out considerably.

(figure source: Sankhya journal)
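
To get a feel for how the two velocities combine, here is a rough Python sketch. It is my own back-of-the-envelope illustration, not Mahalanobis' calculation: it assumes constant speeds (ignoring the slowdown at the head of the delta), treats Naraj as roughly 50 miles from the coast based on the approximate figure noted earlier, and uses hypothetical upstream distances.

```python
STORM_SPEED_MPH = 8.5       # storm speed quoted in the post (east-to-west)
FLOOD_SPEED_MPH = 4.0       # flood-water speed quoted in the post (west-to-east)
NARAJ_TO_COAST_MILES = 50   # assumption: rough delta-to-bay distance from the post


def rain_peak_hours(miles_upstream_of_naraj):
    """Hours after landfall until peak rain reaches a point upstream of Naraj."""
    return (NARAJ_TO_COAST_MILES + miles_upstream_of_naraj) / STORM_SPEED_MPH


def flood_travel_hours(miles_upstream_of_naraj):
    """Hours for run-off from that point to flow back down to Naraj."""
    return miles_upstream_of_naraj / FLOOD_SPEED_MPH


def flood_peak_at_naraj_hours(miles_upstream_of_naraj):
    """Hours after landfall until that point's run-off peaks at Naraj."""
    return (rain_peak_hours(miles_upstream_of_naraj)
            + flood_travel_hours(miles_upstream_of_naraj))


# Hypothetical upstream distances (miles), for illustration only.
for miles in (35, 120, 290):
    print(miles,
          round(rain_peak_hours(miles), 1),
          round(flood_travel_hours(miles), 1),
          round(flood_peak_at_naraj_hours(miles), 1))
```

The flood travel time is the interesting number: under these assumptions, once peak rain is observed at a gauge some distance upstream, that is roughly how much warning Naraj gets before the corresponding water arrives.
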

The picture is now complete. Mahalanobis summarized his findings as follows:

Unfortunately, casualties due to flooding remain a serious problem here to this day (2011 report). YouTube videos show the flood waters at Naraj in 2011 and in 2008.