Wednesday, May 8, 2019

Why I am troubled by the polls

A couple of days ago I made the following statement:
I must admit that the under-dispersion of the recent polls troubles me a little. If the polls were normally distributed, I would expect to see poll results outside of this one-point spread for each side. Because there is under-dispersion, I have wondered about the likelihood of a polling failure (in either direction). Has the under-dispersion come about randomly (unlikely but not impossible)? Or is it an artefact of some process, such as online polling? Herding? Pollster self-censorship? Or some other process I have not identified?
Since then, we have had two more polls in the same one-point range: 51/52 to 49/48 in Labor's favour. As I count it on the Wikipedia polling site, with a little bit of licence for the 6-7 April poll from Roy Morgan, there are thirteen polls in a row in the same range.

One of the probability exercises you encounter when learning statistics is the question: how likely is it that you flip a coin thirteen times and throw thirteen heads in a row? The maths is not too hard. We start with the probability of one head.

$$P(H1) = \frac{1}{2} $$

The probability of two heads in a row is

$$P(H2) = \frac{1}{2} * \frac{1}{2} = \frac{1}{4}$$

The probability of thirteen heads is

$$P(H13) = \biggl(\frac{1}{2}\biggr)^{13} = 0.0001220703125$$

So the probability of me throwing 13 heads in a row is a little higher than a one in ten thousand chance. Let's call that an improbable, but not an impossible event.
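The coin-flip arithmetic can be checked in a couple of lines. This is a minimal sketch; the helper name `prob_heads_in_a_row` is my own, and it assumes a fair coin and independent flips.

```python
from fractions import Fraction


def prob_heads_in_a_row(n: int) -> Fraction:
    """Probability of n heads in n independent flips of a fair coin."""
    return Fraction(1, 2) ** n


p13 = prob_heads_in_a_row(13)
print(p13, float(p13))  # 1/8192 0.0001220703125
```

Using `Fraction` keeps the answer exact: one chance in 8192, which is indeed a little better than one in ten thousand.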

We can do something similar to see how likely it is for us to have 13 opinion polls in a row within a one-percentage-point range (well, actually two percentage points when you account for rounding). Let's, for the sake of the argument, assume that for the entire period the population-wide voting intention was 48.5 per cent for the Coalition. This is a generous assumption, because it puts the true value in the middle of the 48 to 49 range. Let's also assume that the polls each had a sample of 1000 voters, which implies a standard deviation of 1.58 percentage points.

$$SD = \sqrt{\frac{48.5*51.5}{1000}} = 1.580427157448264$$

From this, we can use Python and the cumulative distribution function of the normal distribution (the .cdf() method in the following code snippet) to calculate the probability of one poll (and thirteen polls in a row) being published as 48 or 49 per cent when the intention to vote Coalition across the whole population is 48.5 per cent. Because pollsters round to whole numbers, a published 48 or 49 covers any raw result between 47.5 and 49.5.

import scipy.stats as ss
import numpy as np

sd = np.sqrt((48.5 * 51.5) / 1000)

pop_vote_intent = 48.5
p_1 = ss.norm(pop_vote_intent, sd).cdf(49.5) - ss.norm(pop_vote_intent, sd).cdf(47.5)
print('probability for one poll: {}'.format(p_1))

p_13 = pow(p_1, 13)
print('probability for thirteen polls in a row: {}'.format(p_13))

Which yields the following results:

probability for one poll: 0.47309677092421326
probability for thirteen polls in a row: 5.947710065619661e-05

Counter-intuitively, if the population-wide voting intention is 48.5 per cent and a pollster randomly samples 1000 voters, then the chance of the pollster publishing a result of 48 or 49 per cent is slightly less than half.
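To see why 48.5 per cent was called a generous assumption earlier, the same calculation can be repeated for a few other population values. This is a rough sketch under the same normal approximation and the same 1000-voter sample size; the function name `prob_48_or_49` and the list of trial values are my own choices for illustration.

```python
import numpy as np
import scipy.stats as ss

SAMPLE_SIZE = 1000  # same sample size assumed in the post


def prob_48_or_49(true_pct: float, sample_size: int = SAMPLE_SIZE) -> float:
    """P(a poll rounds to 48 or 49) under the normal approximation."""
    sd = np.sqrt(true_pct * (100 - true_pct) / sample_size)
    dist = ss.norm(true_pct, sd)
    return float(dist.cdf(49.5) - dist.cdf(47.5))


# trial values are illustrative, not taken from any poll
for true_pct in (47.5, 48.0, 48.5, 49.0, 49.5):
    p_1 = prob_48_or_49(true_pct)
    print(f'{true_pct:.1f}%: one poll {p_1:.4f}, '
          f'thirteen in a row {p_1 ** 13:.2e}')
```

Of the values tried, the single-poll probability is highest at 48.5 and falls away on either side, so any other assumption in this range makes the thirteen-poll streak look even less likely.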

The probability of 13 polls in a row at 48 or 49 per cent is 0.000059. This is actually slightly less likely than throwing 14 heads in a row.
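The comparison with coin flips can be made explicit by converting the streak probability into an equivalent number of heads in a row. This is a small sketch that simply takes the base-2 logarithm of the figure reported above.

```python
import math

p_13 = 5.947710065619661e-05  # thirteen-poll probability reported above

# number of consecutive fair-coin heads with the same probability
equivalent_heads = math.log2(1 / p_13)

print(f'equivalent run of heads: {equivalent_heads:.2f}')
print(f'14 heads in a row:       {0.5 ** 14:.3e}')
print(f'13 polls in a row:       {p_13:.3e}')
```

The equivalent run comes out just over 14, which matches the statement that the streak is slightly less likely than 14 heads in a row.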

I get the same result if I run a simulation 100,000,000 times, where each time I draw a 1000-person sample from a population where 48.5 per cent of that population has a particular voting intention (the code approximates each sample with a draw from the corresponding normal distribution). In this simulation, I have rounded the results to the nearest whole percentage point (because that is what pollsters do).

Again we can see only 47.31 per cent of the samples would yield a population estimate of 48 or 49 per cent. More than a quarter of the poll estimates would be at 50 per cent or higher. More than a quarter of the poll estimates would be at 47 per cent or lower. The code snippet for this simulation follows.

import pandas as pd 
import numpy as np 

p = 48.5
q = 100 - p
sample_size = 1000
sd = np.sqrt((p * q) / sample_size)
n = 100_000_000
dist = (pd.Series(np.random.standard_normal(n)) * sd + p).round().value_counts().sort_index()
dist = dist / dist.sum()
print('Prob at 50% or greater', dist[dist.index >= 50.0].sum())

# - and plot results ...
# NOTE: the original plotting call was lost; a pandas bar chart is assumed here
ax = dist.plot.bar()
ax.set_title('Probability distribution of Samples of '+str(sample_size)+
    '\n from a population where p='+str(p)+'%')
ax.set_xlabel('Rounded Percent')
fig = ax.figure
fig.set_size_inches(8, 4)
fig.text(0.99, 0.01, '',
        ha='right', va='bottom', fontsize='x-small', 
        fontstyle='italic', color='#999999') 
fig.savefig('./Probabilities.png', dpi=125) 
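The simulation above draws from a normal distribution rather than sampling individual voters. As a cross-check, here is a hedged sketch that draws actual binomial samples of 1000 voters instead. The seed, the smaller number of simulated polls, and the round-half-up convention for ties are all my own assumptions, not something taken from how pollsters work.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for repeatability only

p = 0.485              # assumed population-wide Coalition share
sample_size = 1000     # voters per simulated poll
n_polls = 1_000_000    # far fewer than 100,000,000, to keep this quick

# each draw is the number of Coalition supporters in one simulated poll
counts = rng.binomial(sample_size, p, size=n_polls)

# with 1000 voters each count is in tenths of a per cent;
# integer arithmetic rounds to whole per cent (halves round up)
rounded = (counts + 5) // 10

share_48_49 = np.mean((rounded == 48) | (rounded == 49))
share_50_plus = np.mean(rounded >= 50)
share_47_less = np.mean(rounded <= 47)

print(f'Prob at 48 or 49%:    {share_48_49:.4f}')
print(f'Prob at 50% or more:  {share_50_plus:.4f}')
print(f'Prob at 47% or less:  {share_47_less:.4f}')
```

The binomial draws should land close to the normal-approximation figures, with a little under half of the simulated polls at 48 or 49 and more than a quarter in each tail. Small differences are expected because the binomial is discrete.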

As I see it, the latest set of opinion polls is fairly improbable. The polls look under-dispersed compared with what I would expect from the central limit theorem. My grandmother would have bought a lottery ticket if she encountered something this unlikely.

In my mind, this under-dispersion raises a question around the reliability of the current set of opinion polls. The critical question is whether this improbable streak of polls points to something systemic. If this streak is a random improbable event, then there are no problems. However, if this streak of polls is driven by something systemic, there may be a problem.

It also raises the question of transparency. If pollsters are using a panel for their surveys, they should tell us. If pollsters are smoothing their polls, or publishing a rolling average, they should tell us. Whatever they are doing to reduce noise in their published estimates, they should tell us.

I am not sure what is behind the narrow similarity of the most recent polls. I think pure chance is unlikely. I would like to think it is some sound mathematical practice (for example, using a panel, or some data analytics applied to a statistical estimate). But I cannot help wondering whether it reflects herding or pollster self-censorship. Or whether there is some other factor at work. I just don't know. And I remain troubled.

A systemic problem with the polls, depending on what it is, may point to a heightened possibility of an unexpected election result (in either direction).


  1. I haven't studied statistics - ever - so forgive my ignorance, but isn't there a big difference between an outcome that depends entirely on chance (eg, coin flip) and a matter that depends on subjective human intentions (eg, voting)? Accordingly, wouldn't we expect the latter to be a lot less predictable, and therefore not subject to the same assumptions about probability?

  2. Selecting a random sample from the entire population is a matter of chance, provided the sample is truly random (everyone in the population has an equal chance of being picked). From this we can use the central limit theorem to talk about the probability distribution of many randomly selected samples from the same population.

    1. But it isn't a truly random pick. They reject some respondents and/or weight results to better represent society's demographics.

  3. Great analysis and insight, Mark.

    Based on a longer term trend of polls going back to 2016, isn't it more likely that the opposition is going to benefit?

    Early voter turnout in this election is also at an all-time high. I wonder how that will affect the outcome. Speculation, but there seems to be a mood for change and I don't think that bodes well for the current Government.

    1. I have not made an analysis of this, but from what I have read, polling errors typically go both ways. They don't systemically favour one side of politics, nor do they depend on whether a party is in or out of government.

      The increased trend to pre-polling is interesting. I will have a look and post.

    2. Thanks Mark, sounds good to me.

      Will be interesting to see that post.

  4. Thanks, that makes perfect sense. I understand why you’re troubled. All of a sudden the polls don’t appear to be so “random”, or as you suggest perhaps something else is happening that can explain the unlikely scenario.

  5. Mark,

    My memory is that this occurred last election, when the polls barely moved at all.

    The election outcome was consistent with the polls last time.

    Is my aging memory wrong? Or, if it is correct, would this obviate your worries?

    1. 2016 was not as tightly aligned as the most recent polls in 2019 - more in a three point range (49 to 51) - but yes the 2016 campaign polls look under-dispersed. I don't think I did a more detailed analysis at that time.

      Under-dispersion is not proof of a problem. And even if polls are herded, it does not mean they will get it wrong.

      2016 here:

  6. This is a known problem in polls leading up to elections eg see

    It can't be anything to do with polls being online. Most likely pollsters are doing some kind of regularising to the mean in their data processing, deliberately or not.

  7. My suggestion: given the relatively low primary vote for each of Labor and Lib/Nat, at least some of the convergence is due to the assumptions about the flow of preferences. Somewhere between 1/3 and 1/4 of each major party's vote in a published 2PP poll is made up of preferences assumed to have flowed to that party based on previous elections. These assumptions stay steady from poll to poll, and as I understand it all of the non-Greens minor parties are treated as a group when assigning these prefs.

    That would, at least, increase the probability of such tightly grouped polls, as the distribution of Greens and non-Greens preferences is held constant.

    1. This preference issue shouldn't result in convergence, because it's just a fixed proportion of a random variable, which leaves the standard deviation basically unchanged. That is, the proportion of Green and ONP etc. votes that they assume to move to ALP or Lib/Nat is either the same each poll (typically from the last election) or they ask voters what they are going to do. The end result is still a random proportion of 2PP for ALP or Lib/Nat that should have the characteristics the original post says.

  8. This is really great stuff. As someone who would love to get rid of the current government, it makes me nervous as sh*t, but I guess I can't blame you for that, can I?

  9. What do you make of long unchanged runs for a party within a single pollster? E.g. the fact the Greens have held at 9% for 13 Newspolls in a row? (I'm not sure if this is directly related, but it seems at least somewhat in the same vein)

    1. Fair point: what is different now is that every pollster is singing from the same hymnal, and they are too much in tune with each other. It's a miracle.

  10. Bookmakers are obviously doing their own polling as one of them (Sportsbet) on Thursday announced it was already paying out on a Labor win.

    1. Not obviously. It could just be good marketing.

  11. I've been surveyed several times in this election by pollsters. Each time it was on my landline - never on my mobile. I am an oldie. Now in a situation where most younger voters do not have a landline and most apparently are concerned about climate change and intend to vote that way, is this a structural flaw in the polling? The proportion of young people who have registered to vote is the highest on record because of the same sex marriage plebiscite and the climate issues.

    Any insights?