Saturday, May 18, 2019

Last poll update

We now have all of the opinion poll results published before today's election. And what a remarkable set of numbers they are: the 16 polls taken since the election was called on April 11 have all fallen in the narrow two-party-preferred range of 51-52 per cent for Labor and 48-49 per cent for the Coalition.

| Date | Firm | L/NP (prim) | ALP (prim) | GRN (prim) | ONP (prim) | OTH (prim) | L/NP (TPP) | ALP (TPP) |
|---|---|---|---|---|---|---|---|---|
| 15-16 May 2019 | Newspoll | 38 | 37 | 9 | 3 | 13 | 48.5 | 51.5 |
| 13-15 May 2019 | Galaxy | 39 | 37 | 9 | 3 | 12 | 49 | 51 |
| 12-15 May 2019 | Ipsos | 39 | 33 | 13 | 4 | 11 | 49 | 51 |
| 10-14 May 2019 | Essential | 38.5 | 36.2 | 9.1 | 6.6 | 9.6 | 48.5 | 51.5 |
| 10-12 May 2019 | Roy Morgan | 38.5 | 35.5 | 10 | 4 | 12 | 48 | 52 |
| 9-11 May 2019 | Newspoll | 39 | 37 | 9 | 4 | 11 | 49 | 51 |
| 2-6 May 2019 | Essential | 38 | 34 | 12 | 7 | 9 | 48 | 52 |
| 4-5 May 2019 | Roy Morgan | 38.5 | 34 | 11 | 4 | 12.5 | 49 | 51 |
| 2-5 May 2019 | Newspoll | 38 | 36 | 9 | 5 | 12 | 49 | 51 |
| 1-4 May 2019 | Ipsos | 36 | 33 | 14 | 5 | 12 | 48 | 52 |
| 25-29 Apr 2019 | Essential | 39 | 37 | 9 | 6 | 9 | 49 | 51 |
| 27-28 Apr 2019 | Roy Morgan | 39.5 | 36 | 9.5 | 2.5 | 12.5 | 49 | 51 |
| 26-28 Apr 2019 | Newspoll | 38 | 37 | 9 | 4 | 12 | 49 | 51 |
| 23-25 Apr 2019 | Galaxy | 37 | 37 | 9 | 4 | 13 | 48 | 52 |
| 20-21 Apr 2019 | Roy Morgan | 39 | 35.5 | 9.5 | 4.5 | 11.5 | 49 | 51 |
| 11-14 Apr 2019 | Newspoll | 39 | 39 | 9 | 4 | 9 | 48 | 52 |


If we assume the sample size for every one of these polls was 2000 electors, and if we assume that the population voting intention was 48.5/51.5 for the entire period, then the chance of every poll falling within one percentage point of that mean (a Coalition result between 47.5 and 49.5) is about 1 in 1661.

In statistics, we typically talk about rejecting the null hypothesis when p < 0.05. In this case p < 0.001, so let's reject the null hypothesis: these numbers are not the raw output of 16 independent, randomly sampled surveys.

While it could be that the samples are not independent (for example, if the pollsters used panels), or that the samples are not sufficiently random and representative, I suspect the numbers have been manipulated in some way. I would like to think this manipulation is some valid and robust numerical process. But without transparency from the pollsters, how can I be sure?

For those interested, the Python code for the above calculation follows.

import scipy.stats as ss
import numpy as np

# assumed population voting intention (Coalition TPP, per cent) and sample size
p = 48.5
q = 100 - p
sample_size = 2000

# standard error (in percentage points) for a simple random sample of this size
sd = np.sqrt((p * q) / sample_size)
print(sd)

# probability that a single poll lands within one point of the mean (47.5 to 49.5)
p_1 = ss.norm(p, sd).cdf(49.5) - ss.norm(p, sd).cdf(47.5)
print('probability for one poll: {}'.format(p_1))

# probability that all 16 independent polls land in that range
p_16 = pow(p_1, 16)
print('probability for sixteen polls in a row: {}'.format(p_16))

My next problem is aggregation. My Bayesian aggregation methodology assumes that polls are normally distributed around the population mean. In practice, it is the outliers (and the inliers towards the edges of the normal range) that move the aggregate. There are no such data points in this series.
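
By way of illustration, here is a stylised sketch of the kind of Gaussian aggregation I have in mind; it is not my production model. It treats the latent voting intention as a slow random walk and each poll as an unbiased, normally distributed observation, then updates the estimate with a simple Kalman filter. The function name, the uniform sample size of 2000, and the prior and drift settings are assumptions made for the example.

# Stylised sketch of Gaussian poll aggregation (not the production model):
# treat the latent voting intention as a slow random walk and each poll as a
# normally distributed observation of it, then update with a Kalman filter.
import numpy as np

def aggregate(polls, sample_size=2000, prior_mean=50.0, prior_var=25.0, walk_var=0.01):
    """Sequentially update a Gaussian belief about the ALP two-party-preferred share."""
    mean, var = prior_mean, prior_var
    for y in polls:
        var += walk_var                           # allow the latent vote to drift between polls
        obs_var = y * (100.0 - y) / sample_size   # sampling variance of one poll (pct points^2)
        gain = var / (var + obs_var)              # weight given to the new poll
        mean += gain * (y - mean)                 # pull the estimate towards the observation
        var *= (1.0 - gain)
    return mean, np.sqrt(var)

# ALP TPP results from the table above, ordered oldest to newest
alp_tpp = [52, 51, 52, 51, 51, 51, 52, 51, 51, 52, 51, 52, 51.5, 51, 51, 51.5]
mean, sd = aggregate(alp_tpp)
print('aggregated ALP TPP: {:.1f} (sd {:.2f})'.format(mean, sd))

With every observation inside a one-point band, the filter is never pulled far in either direction, which is why the absence of outliers and edge-of-range inliers matters.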

Setting this aside, when I run the aggregation on all of the polling data since the last election, I get a final aggregate estimate of 48.4 for the Coalition to 51.6 for Labor. On those numbers, I would expect Labor to win around 80 seats and form government.
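
For context, the usual way to turn an aggregate two-party-preferred estimate into a seat count is a uniform-swing calculation against the post-election pendulum: add the implied national swing to each seat's margin and count the seats that finish on Labor's side. The sketch below shows only the mechanics; the margins are placeholders, not the actual pendulum, and Labor's 2016 two-party-preferred share of roughly 49.6 per cent is used to derive the swing.

# Uniform-swing mechanics only: the margins below are PLACEHOLDERS, not the
# real post-2016 pendulum. Positive margin = ALP-held seat, negative = Coalition-held.
def alp_seats(margins, swing_to_alp):
    """Count seats that fall on Labor's side after a uniform national swing."""
    return sum(1 for m in margins if m + swing_to_alp > 0)

placeholder_margins = [10.2, 6.8, 3.1, 1.4, 0.2, -0.5, -1.1, -2.6, -4.9, -8.3]
swing = 51.6 - 49.6   # aggregate ALP TPP less Labor's (approximate) 2016 TPP
print(alp_seats(placeholder_margins, swing))   # -> 7 of these 10 illustrative seats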



The ensemble of moving averages is broadly consistent.
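
For anyone who wants to reproduce something similar, a rough illustration follows; it is not the code behind the chart above. Smoothing the ALP two-party-preferred series from the table with several window lengths gives the kind of ensemble plotted here.

# Rough illustration only (not the code behind the chart above): an ensemble of
# moving averages over the ALP TPP series from the table, oldest poll first.
import pandas as pd

alp_tpp = pd.Series([52, 51, 52, 51, 51, 51, 52, 51, 51, 52, 51, 52, 51.5, 51, 51, 51.5])

ensemble = pd.DataFrame({
    'ma3': alp_tpp.rolling(3).mean(),   # short window: responsive but noisy
    'ma5': alp_tpp.rolling(5).mean(),
    'ma7': alp_tpp.rolling(7).mean(),   # longer window: smoother, slower
})
print(ensemble.round(2).tail())

With every poll between 51 and 52, all of the smoothed series sit within a few tenths of a point of one another, which is what I mean by broadly consistent.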


Turning to the primary votes, we can see ...








