Saturday, April 30, 2022

Are the polls biased?

When you compare the two-party preferred (2pp) election outcome with the cloud of 2pp polls immediately before an election, it looks like the election result, more often than not, is more favourable to the Coalition than the preceding polls. To put it another way, it looks like the polls on average favour Labor. In the following chart, the election result (in the red/orange box) is typically above most of the polls (blue dots) in the five weeks immediately prior to an election.

If we look specifically at the average of all the 2pp polls in the final fortnight before each of the 14 Federal elections since 1983 (the modern polling era), the Coalition outperformed this poll average 12 times. Labor outperformed it twice. The following table shows the difference between the average 2pp poll for the two weeks prior to the election and the final 2pp election result. A negative number indicates the final fortnight polls were on average more favourable to Labor than the election result. A positive number indicates that the polls were more favourable to the Coalition than the election result.

Election Year | Average poll error (percentage points; polls concluded in the final 2 weeks before the election)

We can visualise this tendency to favour Labor in the polls as a probability density function, where the total area under the curve is one. The statistical technique used to construct this curve is known as kernel density estimation (KDE). It is clear that, on average, these polls were collectively a little over one percentage point more favourable to Labor than the final election result.
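The post does not say which kernel or bandwidth was used for the chart. As a minimal sketch, assuming a Gaussian kernel with Silverman's rule-of-thumb bandwidth, a KDE places a small normal "bump" on each election's poll error and averages them; a trapezoidal check then confirms the area under the resulting curve is (numerically) one. The function names are hypothetical, and `errors` stands for the per-election column of the table above.

```python
import math
from statistics import stdev


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth for a Gaussian kernel (needs at least 2 samples)."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density at x."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each observed poll error
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


def area_under(f, lo, hi, steps=10_000):
    """Trapezoidal approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    inner = sum(f(lo + i * width) for i in range(1, steps))
    return width * (0.5 * f(lo) + inner + 0.5 * f(hi))
```

For example, `density = gaussian_kde(errors)` followed by `area_under(density, -20, 20)` should return a value very close to 1.0, which is the "area under the curve is one" property described above.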

The next long series of charts shows each of the elections and the way in which the rows in the above table were constructed. Feel free to skip past these charts if this is not your thing.
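A sketch of how one row of the table could be computed, assuming each poll is recorded as a (fieldwork end date, Coalition 2pp %) pair, the result is the Coalition's 2pp share, and "concluded in the final 2 weeks" means a fieldwork end date in the 14 days before polling day. The function name and data layout are hypothetical; the sign convention matches the table, so a negative value means the polls leaned towards Labor.

```python
from datetime import date, timedelta
from statistics import mean


def average_poll_error(polls, election_day, coalition_2pp_result, window_days=14):
    """Mean Coalition 2pp of polls ending in the final window, minus the result.

    polls: list of (end_date, coalition_2pp) tuples (hypothetical layout).
    Negative => polls were more favourable to Labor than the election result.
    """
    start = election_day - timedelta(days=window_days)
    recent = [share for end, share in polls if start <= end < election_day]
    if not recent:
        raise ValueError("no polls concluded in the final window")
    return mean(recent) - coalition_2pp_result
```

Polls that finished fieldwork before the window are simply ignored, so each row depends only on the final fortnight of polling.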

I have been thinking about whether I can model this bias, and whether I should model it. Using Bayesian techniques, I have found a Student's t-distribution that provides a good algebraic approximation of the probability density function above. So the bias can be modelled easily. For the nerdy, this distribution has a location of -1.255 percentage points (this is the historic pro-Labor bias), a scale of 1.44 percentage points, and 11.22 degrees of freedom.
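The Bayesian fit itself is not shown here. Given only the fitted parameters quoted above, the following sketch evaluates the location-scale Student's t density with the Python standard library and integrates it numerically to read off probabilities, for example the probability, under the fitted distribution, that a final-fortnight poll average leans towards Labor (an error below zero). The function names are illustrative, not part of any library.

```python
import math

LOC = -1.255   # location (percentage points): the historic pro-Labor bias
SCALE = 1.44   # scale (percentage points)
DF = 11.22     # degrees of freedom


def student_t_pdf(x, df=DF, loc=LOC, scale=SCALE):
    """Density of the location-scale Student's t-distribution."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))


def prob_below(threshold, lo=-60.0, steps=20_000):
    """P(poll error < threshold), by trapezoidal integration of the density."""
    width = (threshold - lo) / steps
    inner = sum(student_t_pdf(lo + i * width) for i in range(1, steps))
    return width * (0.5 * student_t_pdf(lo) + inner + 0.5 * student_t_pdf(threshold))
```

`prob_below(0.0)` gives the fitted probability of a pro-Labor miss; with 11.22 degrees of freedom the tails are only a little heavier than a normal distribution with the same location and scale.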

I am more conflicted on whether I should model the pro-Labor bias. I have not found a compelling theory of action for how the historic bias arose. I assume that pollsters would rather get the final election result right (as this reflects well on their business) than favour one side of politics or the other. It has also been reported to me that this bias does not exist in state government polling. I think this historical bias is unlikely to have occurred by chance alone. Nonetheless, if I don't know why it has occurred in the past, I cannot be confident that the driving factors will persist into the future.
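One simple way to put a number on "unlikely to have occurred by chance alone" (not necessarily the reasoning used here) is an exact two-sided sign test on the 12-of-14 split reported above. It assumes the elections are independent and that, with no bias, a miss in either direction is equally likely; it also ignores the size of each miss. The function name is hypothetical.

```python
from math import comb


def sign_test_p_value(successes, trials):
    """Two-sided exact binomial sign test against p = 0.5."""
    k = max(successes, trials - successes)
    # Probability of a split at least this lopsided in one direction
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)
```

`sign_test_p_value(12, 14)` returns 53/4096, roughly 0.013: a split this lopsided would arise by chance about once in 77 tries if the polls were unbiased.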

I will think about this some more. If you have any compelling explanations for the historical bias, let me know in the comments below.

Finally, I want to thank Ethan and Rebecca at who compiled this data. Ethan tells me that he in turn was assisted by William Bowe and Kevin Bonham.

1 comment:

  1. Possible explanation based on gut feeling:

    * Polls lag 1-2 weeks behind the actual state of the electorate.

    * In nearly all Federal elections in this era, the (dominant) tabloid media favour the LNP, particularly in the last 2 weeks of the campaign.

    * Therefore, due to the time lag, most of this ground made up by the LNP is not picked up by the polls released before the day of the election.