This post is mildly motivated by this month's INFORMS blog challenge on 'OR and politics'. This tab dabbles in OR-eyed viewpoints of Indian political events from time to time. Past posts on OR and politics can be found here, here and here. The idea for this post arose from real issues in OR practice and business analytics. Yet, there is an interesting element of politics as well.

Consider this hypothetical experiment. We select a list of many thousands of past political leaders from around the world and rate each on multiple attributes that provide insight into their 'level of honesty', derived from fact-driven records of their leadership tenure. On the right hand side, we have a yes/no binary indicator of whether that politician was generally considered "honest" or "dishonest". Our objective is simple: generate the probability that an input politician is honest, given a set of scores for each of his/her performance attributes.

We use a binary logit model (i.e. logistic regression) to do this, calibrating its parameters on the historical data by maximum likelihood estimation. Since we have a fairly large sample size, we get a good model fit and hit all the right notes as far as confidence intervals, etc. But how well will it predict in real life? These are two different stories.

Politicians strongly rated as honest and statesmanlike are a rare species. Indian legend regards King Harishchandra as an exemplar of honesty in public life, which is not surprising given that he never uttered a single lie in his life, and he greatly influenced the first person in the next list. More recently, we have 'Mahatma' Gandhi, Abe Lincoln, and Nelson Mandela. In current times, and in keeping with contemporary mores, a Barack Obama (perhaps), Dr. Abdul Kalam of India, or a Helen Clark of New Zealand; the list of people keeping it on the level is quite short.

It is likely that we will find our predictive analytical model is (far too) good at picking crooked politicians. If 99% of politicians are dishonest, then it is very easy to get a good fit. In fact, a 1-line model that simply returns "crooked politician" is a good one - it is 99% accurate. However, this model is not very interesting. Our focus and curiosity are driven by finding those that fall in that elusive 1%, and a "NO" model fails 100% of the time in this regard.

How well did our statistically calibrated predictive model fit the "YES" instances? Most likely it did a pretty poor job, flagging far fewer good guys than the expected rate. In fact, if you were very careless, your computer program may even treat some of these 'YES' data points as nuisance values or outliers! This situation is kinda like the inverse of the analytical problem of fraud detection (pun unintended). Consequently, if we fed the model, say, 'Honest' Abe Lincoln's attributes, we would be disappointed with the output. Our model moves into the domain of truthiness. On the other hand, a 'monkey model' that randomly generates answers with a mean "YES" rate of 1% may be more useful. Our challenge is to be able to do better than the monkey.
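To make the arithmetic concrete, here is a minimal sketch of the two baseline models. The population of 1,000 politicians with exactly 10 honest ones is invented for illustration, as are the helper names `accuracy` and `recall`. The monkey's numbers are computed as expected values rather than simulated, assuming its guesses are independent of the politician it is shown.

```python
def accuracy(y_true, y_pred):
    """Fraction of all politicians that were classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of the truly honest (label 1) that the model says YES to."""
    preds_for_honest = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_honest) / len(preds_for_honest)

# Hypothetical population: 1,000 politicians, 1% of them honest (label 1).
y_true = [1] * 10 + [0] * 990

# The 1-line "NO" model: everybody is a crooked politician.
always_no = [0] * len(y_true)
no_accuracy = accuracy(y_true, always_no)  # 99% accurate...
no_recall = recall(y_true, always_no)      # ...but finds none of the honest 1%

# The monkey model says YES with probability equal to the base rate,
# independent of the input, so we can write down its expected scores.
base_rate = sum(y_true) / len(y_true)
monkey_expected_accuracy = base_rate ** 2 + (1 - base_rate) ** 2
monkey_expected_recall = base_rate

print(no_accuracy, no_recall)
print(monkey_expected_accuracy, monkey_expected_recall)
```

On accuracy alone the "NO" model looks slightly better than the monkey, which is exactly why accuracy is the wrong yardstick here: only the fraction of honest politicians found (recall) exposes that the "NO" model never finds anyone.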

To do that, we turn to analytical work done in political science. Folks here (and in areas like new drug discovery) often work with predictive math models for rare events, and a literature search in these areas indicates that there are quick (but not obvious) fixes for the plain-vanilla predictive models that we tend to use mechanically in OR projects. In particular, these corrections ensure that the natural imbalance inherent in the training data is accounted for in the right way and by the right amount.
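The post does not say which fix is meant, so the following is only one illustrative example of such a correction: the "prior correction" of the logit intercept discussed by King and Zeng for rare-events data in political science. It assumes the training sample was built by deliberately over-representing the rare "YES" cases, and that the true population rate of "YES" is known or can be estimated. The function names and all numbers below are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def prior_corrected_intercept(intercept, sample_rate, population_rate):
    """Shift a fitted logit intercept to undo oversampling of YES cases.

    sample_rate:     fraction of YES cases in the training sample
    population_rate: fraction of YES cases in the real population
    (Only the intercept changes; the slope coefficients are left alone.)
    """
    odds_inflation = ((1 - population_rate) / population_rate) * (
        sample_rate / (1 - sample_rate)
    )
    return intercept - math.log(odds_inflation)

# Hypothetical setup: the model was trained on a balanced sample
# (50% honest), but only 1% of real politicians are honest.
fitted_intercept = 0.0
corrected = prior_corrected_intercept(
    fitted_intercept, sample_rate=0.5, population_rate=0.01
)

# A politician whose attribute scores contribute nothing to the linear score:
p_before = sigmoid(fitted_intercept)  # what the uncorrected model reports
p_after = sigmoid(corrected)          # falls back to the population base rate

print(p_before, p_after)
```

The point of the sketch is the "right amount" part: the shift is not a tuning knob but is pinned down by the two rates, so a politician with uninformative attributes is scored at the population base rate rather than at the inflated sample rate.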

The lesson, if any, from this experiment is that the basic act of testing predictive models on hold-out or hidden samples must never be bypassed. Fitting well to historical data is necessary for our validation, but certainly not sufficient for a customer's satisfaction. It does NOT imply "useful predictor". Not even if we have a lot of data. Furthermore, when we build a prescriptive analytical layer by embedding our predictive model within an optimization framework to determine the optimal attributes that maximize some objective, the external effects of a bad predictive model become pronounced. Optimization magnifies the silliness of a bad prediction. It literally takes it to an extreme point. In fact, an advantage of having a prescriptive layer is that it can often tell if the underlying predictive layer is playing politics with you.
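One practical wrinkle with hold-out testing on rare events: a purely random hold-out set can end up with almost no "YES" cases, in which case it cannot tell us how well the model finds them. A minimal sketch of a stratified split, which samples each class separately, is given below. The function name `stratified_holdout`, the 20% hold-out fraction, and the toy labels are assumptions made for illustration.

```python
import random

def stratified_holdout(labels, holdout_fraction=0.2, seed=0):
    """Return (train_idx, holdout_idx), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, holdout_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Keep at least one example of every class in the hold-out set.
        n_holdout = max(1, round(len(idx) * holdout_fraction))
        holdout_idx.extend(idx[:n_holdout])
        train_idx.extend(idx[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)

# Hypothetical labels: 10 honest (1) and 990 dishonest (0) politicians.
labels = [1] * 10 + [0] * 990
train_idx, holdout_idx = stratified_holdout(labels)

# The hold-out set preserves the 1% base rate instead of leaving it to chance.
holdout_yes = sum(labels[i] for i in holdout_idx)
print(len(train_idx), len(holdout_idx), holdout_yes)
```

The model would then be calibrated on `train_idx` only and judged on how it treats the "YES" cases in `holdout_idx`, not just on overall accuracy.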

"Generally considered honest" is not the same as, and may not be all that close to, "is/was honest", particularly if we interpret honesty as speaking the truth (or at least what one sincerely believes to be the truth), rather than taking honesty to mean "did not steal (much) while in office". Just yesterday I read an article in Newsweek about Ronald Reagan. I think both friends and political foes considered Reagan to be fundamentally honest, and yet the article points out that he maintained a consistency of message that is difficult to square with the facts, and might in some cases not have been what he actually believed to be true.
