Wednesday, May 21, 2014

Predicting the Indian Elections - A Win for Data Science

The exit polls for the recently concluded Indian elections threw up a spectrum of results. Several cable TV networks ran their own polls, and most of their numbers fell within a seemingly reasonable range. The exception was a public research group called 'Todays Chanakya' (TC), whose numbers were off the charts, predicting a massive win for Narendra Modi. People began to average these polls to come up with an 'expected result', and many of these 'polls of polls' excluded TC's result as an outlier, discarding it as unbelievable.

I spent quite a bit of time looking at the meager information provided on the Todays Chanakya (TC) website before the results were announced. Buzzwords aside, what caught my attention was the meticulous attention they paid to obtaining a representative data sample in every single constituency. Their prior track record in predicting elections in India was simply stunning. In a recent state election too, their prediction was an outlier that turned out to be accurate. This data sampling step is important, especially given the incredibly diverse nature of India's population. Translating projected vote shares into actual seats won in India's 'first past the post' system is an incredibly daunting problem. If your sample is even slightly off, your seat predictions can be way off, regardless of the sophistication of the predictive analytics you employ. Human judgment and domain expertise are critical.
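To get a feel for why a slightly skewed sample can wreck a seat forecast under 'first past the post', here is a minimal Python sketch. Everything in it is an assumption made for illustration: a two-party contest, 543 constituencies whose true vote shares are spread evenly between 40% and 60%, and a uniform 2-point error in the sampled vote share. It is not TC's method, which has not been published in any detail, and the function names are hypothetical.

```python
def toy_constituency_shares(n_seats=543, low=0.40, high=0.60):
    """Hypothetical two-party vote shares, evenly spaced across constituencies."""
    step = (high - low) / (n_seats - 1)
    return [low + step * i for i in range(n_seats)]


def seats_won(shares, threshold=0.5):
    """Count constituencies where the party's share beats the winning threshold."""
    return sum(1 for share in shares if share > threshold)


def seat_swing_from_bias(shares, bias):
    """Extra seats 'predicted' when every sampled share is off by the same bias."""
    biased_shares = [share + bias for share in shares]
    return seats_won(biased_shares) - seats_won(shares)


if __name__ == "__main__":
    true_shares = toy_constituency_shares()
    swing = seat_swing_from_bias(true_shares, bias=0.02)
    print("True seats:", seats_won(true_shares))
    print("Seat error caused by a 2-point sampling bias:", swing)
```

In this toy setup a 2-point error in vote share shifts roughly a tenth of all seats, because many constituencies sit close to the winning threshold. Real contests are multi-cornered, which makes the translation harder still.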

As this useful blog points out, it's not about 'sampling error', but sampling bias. And once we see this, it is not difficult to see why the English TV networks of India, virtually every single one a willing and well-compensated participant in the witch hunt of Narendra Modi since 2002, fail miserably in their predictions, time and again. Their reporting has rarely been fact-driven, and is usually ratings-driven. Few, if any, on their payroll are trained in the rigorous scientific method. Reporters appear to be hired based on ideology, Western-accented English-speaking ability, and political connections rather than merit or technical proficiency. So when, by force of habit, you look for a sample that you like, you will only get the predictions you want viewers to see in your TV shows, which have little to do with reality. The media witch hunt against Modi, like their exit polls, as is now known, was never fact-driven from day one. It was doomed from the start. After this election, few will take their "predictions" seriously again unless they reform.
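The difference between sampling error and sampling bias mentioned above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: two equally sized voter groups with hypothetical support rates of 60% and 30% for a party (so true support is 45%), and a pollster whose sample over-represents the second group. The group names, rates, and the `poll` helper are all invented for the example.

```python
import random

# Hypothetical support for a party within each voter group.
SUPPORT = {"group_1": 0.60, "group_2": 0.30}
# True population mix (overall support = 0.45) versus a skewed sampling mix.
TRUE_MIX = {"group_1": 0.5, "group_2": 0.5}
SKEWED_MIX = {"group_1": 0.1, "group_2": 0.9}


def poll(n, group_mix, group_support, rng):
    """Estimate support from n respondents drawn according to group_mix."""
    groups = list(group_support)
    weights = [group_mix[g] for g in groups]
    supporters = 0
    for group in rng.choices(groups, weights=weights, k=n):
        if rng.random() < group_support[group]:
            supporters += 1
    return supporters / n


if __name__ == "__main__":
    rng = random.Random(2014)
    for n in (500, 5000, 50000):
        fair = poll(n, TRUE_MIX, SUPPORT, rng)
        skewed = poll(n, SKEWED_MIX, SUPPORT, rng)
        print(f"n={n}: representative={fair:.3f}, skewed={skewed:.3f}")
```

With a representative sample, the estimate wanders around the true 45% and tightens as the sample grows; that wandering is sampling error. With the skewed sample, the estimate settles near 33% no matter how many people are polled, which is why a bigger sample does not rescue a biased one.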

TC's predictions were quite accurate. Modi indeed won in a landslide as they predicted, with the incumbent Nehru dynasty (aka the "UPA" coalition), whose corruption almost surely qualifies as a crime against humanity, getting deservedly annihilated. On election day, at around 1:00-2:00 AM EST, while I was following the election trends, the UPA was leading in about a hundred of the 543 seats up for grabs, well above the range of 61-79 seats that TC had predicted for them. However, as the day progressed, it was amazing to see the UPA's leads petering out one by one, as if an invisible rope were magically pulling it back into the predicted range. Statistical destiny. Only two appeared to be convinced about the result before May 16: TC, who adopted a scientific approach to gathering and analyzing data, and Narendra Modi, who made the history in the first place. Both of them dared to be different and put their reputations on the line, and were worthy winners.

This election result, and Modi becoming the Prime Minister of India, have taught many of us a scientific lesson. Data science is about being guided by facts, not emotion, prejudiced opinion, or preferred outcome. Carefully constructed fact-driven methods are less likely to fail. Gujarat's development, both rural and urban, spearheaded by Modi for 12 years, is real, and cannot be falsified. It happened, and it is there to be seen regardless of what the New York Times tells you. I blogged in 2012 that the heavy lifting done in Gujarat may pay rich dividends in the future. The people there lived that development and they knew, and thousands of migrants returned from Gujarat to other states to speak about their experience there. TC's data sample accurately reflected this reality. The media heads sitting in Delhi, London, and New York were high on ideology-meth, low on fact. Few visited the state of Gujarat to make a factual assessment. Some of the open-minded critics who did ended up becoming Modi's strongest supporters. Not surprisingly, his fact-driven campaign won him every single parliamentary seat there. The amazing number of Indians, cutting across religious, class, language, age, gender, and geographical 'barriers', who voted for Modi cannot be brushed aside either. Facts cannot be ignored until time travel becomes practical.

And here's another prediction, an easy one. Modi will probably become India's best and most unifying leader since Mahatma Gandhi, if he isn't already. If, as the Nehru dynasty says, "power is poison", India has surely found its Shiva.

Wednesday, May 14, 2014

Indian elections 2014: Long words, short story

Can long, archaic words be used to maximize overall brevity (and levity)? Take the 2014 Indian general elections that recently concluded. Although the final results will come out on May 16 (the amazing Narendra Modi as Prime Minister), exit polls already give us this clear-enough picture:

India has understood that the phoney "Idea of India" brand of secularism is nothing but antidisestablishmentarianism in disguise, and we are witness to the historical floccinaucinihilipilification of the Nehru-dynasty by the Indian voter.