Saturday, November 14, 2020

Reflections on polling aggregation

After the 2019 Australian election, the Poll Bludger posted that "aggregators gonna aggregate." At the time I thought it was a fair call.

More recently, I have been pondering a paper by Xiao-Li Meng: Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election. In a roundabout way, this paper answers the usual question: why did the polls get the 2016 US election wrong?

A key issue explored in this paper is the extent to which growing non-response rates in public opinion polling are degrading probability sampling. In theory, a very small probability sample, randomly selected, can be used to make reliable statements about a trait in the total population.

Non-response can adversely affect the degree to which a sample behaves like a probability sample. This is of particular concern when there is a correlation between non-response and a variable under analysis (such as voting intention); Meng calls this the data defect correlation. Even a very small, non-zero correlation (say 0.005) can dramatically reduce the effective sample size. The problem is exacerbated by the size of the population (something we can safely ignore when the sample is a true probability sample).
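To get a feel for the scale of the problem, the following Python sketch applies Meng's error identity (the data defect correlation, times the square root of (N − n)/n, times the population standard deviation). The sample size, correlation and population figures below are purely illustrative, not taken from the paper.

```python
import math

def estimation_error(rho, n, N, sigma=0.5):
    """Meng's identity: the expected error of the sample mean is the
    data defect correlation (rho) times sqrt((N - n) / n) times the
    population standard deviation (about 0.5 for a 50:50 binary item)."""
    return rho * math.sqrt((N - n) / n) * sigma

# A fixed (and very large) sample of 2 million respondents, and a tiny
# correlation of 0.005 between responding and voting intention: the
# expected error grows as the population being sampled grows.
for N in (10_000_000, 50_000_000, 100_000_000, 230_000_000):
    err = estimation_error(rho=0.005, n=2_000_000, N=N)
    print(f"N = {N:>11,}: error of roughly {100 * err:.1f} percentage points")
```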

Making matters worse, this problem is not fixed by weighting sub-populations within the sample. Weighting only helps when the non-response is uncorrelated with the variables of analytical interest.
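Here is a small, hypothetical simulation of that point. The population, response propensities and education split are all invented for illustration: because non-response is correlated with the vote itself, weighting the sample back to the population's education mix barely moves the estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: two education groups with different
# levels of support for party A.
N = 1_000_000
educ = rng.random(N) < 0.3                  # 30% "high education"
p_vote_a = np.where(educ, 0.60, 0.45)       # support differs by group
vote_a = rng.random(N) < p_vote_a

# Non-response correlated with the vote itself: party A supporters are
# slightly more likely to answer the poll, within both education groups.
p_respond = np.where(vote_a, 0.012, 0.010)
sampled = rng.random(N) < p_respond
print("sample size:", sampled.sum())

print("truth:   ", vote_a.mean().round(3))
print("raw poll:", vote_a[sampled].mean().round(3))

# Weight the sample back to the population's education mix. The bias
# barely moves, because within each education group responders still
# lean towards party A.
weights = np.where(educ[sampled],
                   educ.mean() / educ[sampled].mean(),
                   (1 - educ.mean()) / (1 - educ[sampled].mean()))
weighted = np.average(vote_a[sampled], weights=weights)
print("weighted:", round(float(weighted), 3))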

Meng argues that the combined polls prior to the 2016 election accounted for about one per cent of the US's eligible voting population (i.e. a combined sample of around 2.3 million people). However, given the non-response rate, the bias in that non-response, and the size of the US population, this shrinks to an effective sample size in the order of 400 eligible voters. An effective sample of 400 people has a 95 per cent confidence interval of plus or minus 4.9 percentage points (not the 0.06 percentage points you would expect from a combined sample of 2.3 million voters).
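As a rough check on those figures (the population and correlation values here are my approximations of Meng's argument, not figures lifted from the paper):

```python
import math

N = 230_000_000    # approximate US eligible voting population
n = 2_300_000      # combined pre-election polling sample (~1%)
rho = 0.005        # small correlation between responding and vote choice

# Effective sample size: the simple random sample with the same mean
# squared error as the biased sample of n respondents.
n_eff = n / ((N - n) * rho ** 2)
print(round(n_eff))                             # about 400

def margin_of_error(n, p=0.5):
    """95% margin of error for a proportion near 50%."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(100 * margin_of_error(n_eff), 1))   # about 4.9 points
print(round(100 * margin_of_error(n), 2))       # about 0.06 points
```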

Isakov and Kuriwaki used Meng's theory and a Bayesian model written in Stan to aggregate battleground state polls prior to the 2020 US presidential election, adjusting for the pollster errors observed in 2016. Of necessity, this work made a number of assumptions (at least one of which turned out to be mistaken). Nonetheless, it suggested that the polls in key battleground states were one percentage point too favourable to Biden and, more importantly, that the margin of error was about twice as wide as reported. Isakov and Kuriwaki's model was closer to the final outcome than a naive aggregation of the publicly available polling in battleground states.

According to Isakov and Kuriwaki, Meng's work poses a real challenge for poll aggregators. Systematic polling errors are often correlated across pollsters. If the aggregator does not account for this, an over-confident aggregation can mislead community expectations ahead of an election result, even more so than the individual polls themselves.
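A stylised, back-of-the-envelope illustration of that point (not Isakov and Kuriwaki's actual model): if every poll shares a common systematic error, averaging more polls reduces the sampling noise but does nothing to the shared component, so the naive margin of error can be wildly optimistic. The poll count, sampling error and shared error below are assumptions for illustration only.

```python
import math

k = 20          # number of polls aggregated
sigma = 0.02    # sampling error of each individual poll (2 points)
tau = 0.02      # shared systematic error common to all pollsters (2 points)

naive = sigma / math.sqrt(k)                    # what a naive aggregate reports
actual = math.sqrt(sigma ** 2 / k + tau ** 2)   # error that actually applies

print(f"naive standard error:  {100 * naive:.1f} points")   # about 0.4
print(f"actual standard error: {100 * actual:.1f} points")  # about 2.0
```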

References

Isakov, M., and S. Kuriwaki. 2020. Towards principled unskewing: Viewing 2020 election polls through a corrective lens from 2016. Harvard Data Science Review.

Meng, Xiao-Li. 2018. Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. Annals of Applied Statistics 12 (2): 685–726.

Thursday, November 12, 2020

Report into 2019 polling failure

Yesterday, the Association of Market and Social Research Organisations (AMSRO), the national peak industry body for research, data and insights organisations in Australia, released its report into the polling failure at the 2019 Australian Federal Election.

Unfortunately, the Australian pollsters did not share their raw data with the AMSRO inquiry, so the inquiry was constrained in what it could examine. Nonetheless, it found:

The performance of the national polls in 2019 met the independent criteria of a ‘polling failure’ not just a ‘polling miss’. The polls: (1) significantly — statistically — erred in their estimate of the vote; (2) erred in the same direction and at a similar level; and (3) the source of error was in the polls themselves rather than a result of a last-minute shift among voters.
The Inquiry Panel could not rule out the possibility that the uncommon convergence of the polls in 2019 was due to herding. 
Our conclusion is that the most likely reason why the polls underestimated the first preference vote for the LNP and overestimated it for Labor was because the samples were unrepresentative and inadequately adjusted. The polls were likely to have been skewed towards the more politically engaged and better educated voters with this bias not corrected. As a result, the polls over-represented Labor voters.

While the report was hampered by the limited cooperation from polling companies, it is well worth reading.