After the 2019 Australian election, the Poll Bludger posted that "aggregators gonna aggregate." At the time I thought it was a fair call.

More recently, I have been pondering a paper by Xiao-Li Meng: *Statistical Paradises and Paradoxes in Big Data: Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election*. In a roundabout way, this paper answers the usual question: why did the polls call the 2016 US election wrong?

A key issue explored in this paper is the extent to which the growing non-response rates in public opinion polling are degrading probabilistic sampling. Theoretically, very small probability samples, randomly selected, can be used to make reliable statements about a trait in the total population.

Non-response can reduce the degree to which a sample is a probability sample. This is of particular concern when there is a correlation between non-response and a variable under analysis (such as voting intention). Even a very small, non-zero correlation (say 0.005) can dramatically reduce the effective sample size. The problem is exacerbated by population size (something we can ignore when the sample is a true probability sample).
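To get a feel for how population size enters, here is a minimal sketch. It assumes the approximation I take from Meng's paper, n_eff ≈ f / (1 − f) × 1 / ρ², where f is the fraction of the population sampled and ρ is the correlation between responding and the variable of interest. The function name and the example numbers are mine and purely illustrative.

```python
def effective_sample_size(n: int, population: int, rho: float) -> float:
    """Approximate size of the simple random sample with equivalent error.

    Assumes n_eff ~ f / (1 - f) / rho**2, with f = n / population.
    rho must be non-zero; a true probability sample has rho near zero,
    and this approximation is not meant for that case.
    """
    f = n / population
    return f / (1.0 - f) / rho**2


# Same number of respondents and same tiny correlation, but the
# second population is 100 times larger (illustrative values only).
small_pop = effective_sample_size(10_000, 1_000_000, rho=0.005)
large_pop = effective_sample_size(10_000, 100_000_000, rho=0.005)

print(f"population   1 million: n_eff is about {small_pop:.0f}")
print(f"population 100 million: n_eff is about {large_pop:.0f}")
```

Under these assumptions, the same 10,000 responses are worth far less when they are drawn from the larger population, because the sampled fraction f shrinks while ρ stays the same.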

Making matters worse, this problem is not fixed by weighting sub-populations within the sample. Weighting only works when, within each weighted sub-population, the non-response is not correlated with a variable of analytical interest.

Meng argues that the combined polls prior to the 2016 election accounted for one per cent of the eligible voting population in the US (i.e. a combined sample of around 2.3 million people). However, given the non-response rate, the bias in that non-response, and the size of the US population, this becomes an effective sample size in the order of 400 eligible voters. An effective sample of 400 people has a 95% confidence interval of plus or minus 4.9 percentage points (not the 0.06 percentage points you would expect from a 2.3 million combined voter sample).
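The two margins of error quoted above can be checked with the textbook formula for a proportion. This is a minimal sketch, assuming a simple random sample, the worst-case proportion of 0.5, and the usual 1.96 multiplier for a 95% interval; the function name is my own.

```python
import math


def margin_of_error(n: float, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion, in percentage points.

    Assumes a simple random sample of size n; p = 0.5 is the worst case.
    """
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)


print(round(margin_of_error(400), 1))        # 4.9
print(round(margin_of_error(2_300_000), 2))  # 0.06
```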

Isakov and Kuriwaki used Meng's theory and a Bayesian model written in Stan to aggregate battleground state polls prior to the 2020 US presidential election, adjusting for the pollster errors seen in 2016. Of necessity, this work made a number of assumptions (at least one of which turned out to be mistaken). Nonetheless, it suggested that the polls in key battleground states were one percentage point too favourable to Biden, and more importantly, the margin of error was about twice as wide as reported. Isakov and Kuriwaki's model was closer to the final outcome than a naive aggregation of the publicly available polling in battleground states.

According to Isakov and Kuriwaki, Meng's work suggests a real challenge for poll aggregators. Systemic polling errors are often correlated across pollsters. If this is not accounted for by the aggregator, then over-confident aggregations can mislead community expectations in anticipation of an election result, more so than the individual polls themselves.
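A toy calculation shows why correlated errors matter for an aggregate. This is not Isakov and Kuriwaki's model. It is a sketch that assumes each poll's error is the sum of an independent sampling error and an error shared by every pollster, and the numbers (20 polls, 1.5 and 2.0 percentage points) are made up for illustration.

```python
import math


def aggregate_standard_error(
    n_polls: int, sampling_se: float, shared_se: float = 0.0
) -> float:
    """Standard error of a simple average of polls, in percentage points.

    Assumes independent sampling errors (sampling_se per poll) plus one
    error common to all polls (shared_se). Averaging shrinks the first
    term but leaves the shared term untouched.
    """
    return math.sqrt(shared_se**2 + sampling_se**2 / n_polls)


naive = aggregate_standard_error(n_polls=20, sampling_se=1.5)
honest = aggregate_standard_error(n_polls=20, sampling_se=1.5, shared_se=2.0)

print(f"ignoring shared error:  {naive:.2f} points")
print(f"including shared error: {honest:.2f} points")
```

However many polls are averaged, the aggregate can never be more precise than the shared error, which is why an aggregator that ignores it will look more certain than it should.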

#### References

Isakov, M., & Kuriwaki, S. 2020. *Towards Principled Unskewing: Viewing 2020 Election Polls Through a Corrective Lens from 2016*. Harvard Data Science Review.

Meng, Xiao-Li. 2018. *Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election*. Annals of Applied Statistics 12 (2): 685–726.