Date | Firm | L/NP primary | ALP primary | GRN primary | ONP primary | OTH primary | L/NP TPP | ALP TPP |
15-16 May 2019 | Newspoll | 38 | 37 | 9 | 3 | 13 | 48.5 | 51.5 |
13-15 May 2019 | Galaxy | 39 | 37 | 9 | 3 | 12 | 49 | 51 |
12-15 May 2019 | Ipsos | 39 | 33 | 13 | 4 | 11 | 49 | 51 |
10-14 May 2019 | Essential | 38.5 | 36.2 | 9.1 | 6.6 | 9.6 | 48.5 | 51.5 |
10-12 May 2019 | Roy Morgan | 38.5 | 35.5 | 10 | 4 | 12 | 48 | 52 |
9-11 May 2019 | Newspoll | 39 | 37 | 9 | 4 | 11 | 49 | 51 |
2-6 May 2019 | Essential | 38 | 34 | 12 | 7 | 9 | 48 | 52 |
4-5 May 2019 | Roy Morgan | 38.5 | 34 | 11 | 4 | 12.5 | 49 | 51 |
2-5 May 2019 | Newspoll | 38 | 36 | 9 | 5 | 12 | 49 | 51 |
1-4 May 2019 | Ipsos | 36 | 33 | 14 | 5 | 12 | 48 | 52 |
25-29 Apr 2019 | Essential | 39 | 37 | 9 | 6 | 9 | 49 | 51 |
27-28 Apr 2019 | Roy Morgan | 39.5 | 36 | 9.5 | 2.5 | 12.5 | 49 | 51 |
26-28 Apr 2019 | Newspoll | 38 | 37 | 9 | 4 | 12 | 49 | 51 |
23-25 Apr 2019 | Galaxy | 37 | 37 | 9 | 4 | 13 | 48 | 52 |
20-21 Apr 2019 | Roy Morgan | 39 | 35.5 | 9.5 | 4.5 | 11.5 | 49 | 51 |
11-14 Apr 2019 | Newspoll | 39 | 39 | 9 | 4 | 9 | 48 | 52 |
If we assume the sample size for every one of these polls was 2000 electors, and if we assume that the population voting intention was 48.5/51.5 for the entire period, then the chance of all 16 polls landing in the narrow range they occupy (a Coalition two-party-preferred result between 47.5 and 49.5 per cent) is about 1 in 1661.
In statistics, we talk about rejecting the null hypothesis when p < 0.05. In this case p < 0.001. So let's reject the null hypothesis. These numbers are not the raw output from 16 independent, randomly-sampled surveys.
While it could be that the samples are not independent (for example, if the pollsters used panels), or that the samples are not sufficiently random and representative, I suspect the numbers have been manipulated in some way. I would like to think this manipulation is some valid and robust numerical process. But without transparency from the pollsters, how can I be sure?
For those interested, the Python code snippet for the above calculation follows.

import scipy.stats as ss
import numpy as np

# assumed population voting intention (Coalition TPP) and poll sample size
p = 48.5
q = 100 - p
sample_size = 2000

# standard deviation of the sampling distribution for a single poll
sd = np.sqrt((p * q) / sample_size)
print(sd)

# probability that a single poll lands between 47.5 and 49.5 per cent
p_1 = ss.norm(p, sd).cdf(49.5) - ss.norm(p, sd).cdf(47.5)
print('probability for one poll: {}'.format(p_1))

# probability that all sixteen polls land in that range
p_16 = pow(p_1, 16)
print('probability for sixteen polls in a row: {}'.format(p_16))
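Running this snippet gives a per-poll standard deviation of about 1.12 percentage points, a single-poll probability of roughly 0.63, and a sixteen-poll probability of roughly 0.0006, or about 1 in 1661.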
My next problem is aggregation. My Bayesian aggregation methodology depends on the polls being normally distributed around the population mean. In practice, it is the outliers (and the inliers towards the edges of the normal range) that move the aggregate. There are no such data points in this series.
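As a simplified illustration of that point (and not the aggregation model I actually use), the sketch below folds a series of polls into a normal-normal conjugate posterior; the prior and the fixed sample size of 2000 are assumptions for the example.

import numpy as np

def aggregate_tpp(polls, sample_size=2000, prior_mean=50.0, prior_sd=5.0):
    # Toy conjugate normal-normal aggregation of Coalition TPP estimates (per cent),
    # treating each poll as an independent, unbiased sample of the population mean.
    mean, var = prior_mean, prior_sd ** 2
    for p in polls:
        poll_var = p * (100 - p) / sample_size      # sampling variance of one poll
        post_var = 1 / (1 / var + 1 / poll_var)     # precision-weighted update
        mean = post_var * (mean / var + p / poll_var)
        var = post_var
    return mean, np.sqrt(var)

# Coalition TPP column from the table above, oldest poll first
clustered = [48, 49, 48, 49, 49, 49, 48, 49, 49, 48, 49, 48, 48.5, 49, 49, 48.5]
print(aggregate_tpp(clustered))              # tight cluster: posterior mean near the polls' average
print(aggregate_tpp(clustered + [45.0]))     # a single outlier pulls the posterior mean lower

In a precision-weighted update like this, each poll shifts the estimate in proportion to its distance from the current mean, which is why a series with no outliers (or edge-of-range inliers) barely moves it.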
Setting this aside, when I run the aggregation on all of the polling data since the last election, I get a final aggregated poll estimate of 48.4 per cent for the Coalition to 51.6 per cent for Labor (two-party preferred). On those numbers, I would expect Labor to win around 80 seats and form government.
The ensemble of moving averages is broadly consistent with this estimate.
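The chart of those moving averages is not reproduced here, but as a rough sketch of the idea, a few simple trailing averages over the Coalition TPP column in the table land in the same ballpark (the window lengths are arbitrary choices for illustration, not the ensemble behind the chart):

import numpy as np

# Coalition TPP column from the table above, oldest poll first
tpp = np.array([48, 49, 48, 49, 49, 49, 48, 49, 49, 48, 49, 48, 48.5, 49, 49, 48.5])

for window in (3, 5, 7):
    kernel = np.ones(window) / window
    ma = np.convolve(tpp, kernel, mode='valid')   # trailing moving average of length `window`
    print('window {}: latest moving average {:.2f}'.format(window, ma[-1]))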
Turning to the primary votes, we can see ...