One of the foundations of statistics is the notion that if I draw many independent random samples from a population, the means of those samples will be approximately normally distributed around the population mean (represented by the Greek letter mu, \(\mu\)). This is known as the Central Limit Theorem, and the resulting distribution is called the Sampling Distribution of the Sample Mean. As a rule of thumb, the Central Limit Theorem is taken to hold for samples of size 30 or higher.

The span or spread of the distribution of the many sample means around the population mean will depend on the size of those samples, which is usually denoted with a lower-case \(n\). Statisticians measure this spread through the standard deviation (which is usually denoted by the Greek letter sigma \(\sigma\)). With the two-party preferred voting data, the standard deviation is given by the following formula:

$$\sigma = \sqrt{\frac{\text{proportion}_{\text{Coalition}} \times \text{proportion}_{\text{Labor}}}{n}}$$
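As a quick illustration of this formula, the sketch below computes \(\sigma\) for the smallest (707) and largest (3008) samples among the sixteen polls. It works in percentage points rather than proportions, and it assumes a Coalition share of 48.625 per cent (the arithmetic mean of the sixteen polls, used later in this post). The function name is my own, for illustration only.

```python
import math

def poll_standard_deviation(coalition_pct, n):
    """Standard deviation, in percentage points, for a poll of n voters."""
    labor_pct = 100.0 - coalition_pct  # two-party preferred shares sum to 100
    return math.sqrt(coalition_pct * labor_pct / n)

# Assumed population value: 48.625 per cent for the Coalition
sigma_small = poll_standard_deviation(48.625, 707)
sigma_large = poll_standard_deviation(48.625, 3008)

print('n=707:  sigma =', round(sigma_small, 2))
print('n=3008: sigma =', round(sigma_large, 2))
```

The larger sample roughly halves the standard deviation, which is why larger polls should cluster more tightly around the population mean.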

While I have the sample sizes for most of the sixteen polls prior to the 2019 Election, I do not have the sample size for the final YouGov/Galaxy poll. Nor do I have the sample size for the Essential poll on 25–29 Apr 2019. For analytical purposes, I have assumed both surveys were of 1000 people. The sample sizes for the sixteen polls ranged from 707 to 3008. The mean sample size was 1403.

If we take the smallest poll, with a sample of 707 voters, we can use the standard deviation to see how likely it was to have a poll result in the range 48 to 49 for the Coalition. We will need to make an adjustment, as most pollsters round their results to the nearest whole percentage point before publication.

So the question we will ask is this: if we assume the population voting intention for the Coalition was 48.625 per cent (the arithmetic mean of the sixteen polls), what is the probability of a sample of 707 voters producing a result in the range 47.5 to 49.5, which would round to 48 or 49 per cent?

For samples of 707 voters, and assuming the population mean was 48.625, we would only expect to see a poll result of 48 or 49 around 40 per cent of the time. This is the area under the curve from 47.5 to 49.5 on the x-axis, when the entire area under the curve sums to 1 (or 100 per cent).

We can compare this with the expected distribution for the largest sample of 3008 voters. Our adjustment here is slightly different, as the pollster, in this case, rounded to the nearest half a percentage point. So we are interested in the area under the curve from 47.75 to 49.25 per cent.

Because the sample size (\(n\)) is larger, the spread of this distribution is narrower. We would expect almost 60 per cent of the samples to produce a result in the range 48 to 49 if the population mean (\(\mu\)) was 48.625 per cent.
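The two areas under the curve can be checked with a short calculation. This is a minimal sketch that uses `statistics.NormalDist` from the Python standard library in place of the `scipy.stats` calls in my own snippet at the end of this post. The helper name `prob_in_range` is hypothetical, and the population mean of 48.625 is the assumption stated above.

```python
import math
from statistics import NormalDist

def prob_in_range(mu, n, low, high):
    """Probability that a poll of n voters falls between low and high,
    assuming poll results are normally distributed around mu (per cent)."""
    sigma = math.sqrt(mu * (100.0 - mu) / n)
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Smallest poll: results rounded to the nearest whole percentage point
p_small = prob_in_range(48.625, 707, 47.5, 49.5)

# Largest poll: results rounded to the nearest half a percentage point
p_large = prob_in_range(48.625, 3008, 47.75, 49.25)

print(f'n=707:  {p_small:.1%}')
print(f'n=3008: {p_large:.1%}')
```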

We can extend this technique to all sixteen polls. We can find the proportion of all possible samples we would expect to generate a published poll result of 48 or 49. We can then multiply these probabilities together to get the probability that all sixteen polls would be in the range. Using this method, I estimate that there is a one in 49,706 chance that this pattern of poll results would occur randomly (if the polls were independent random samples of the population).
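The sketch below shows one way to carry out this multiplication. The sample sizes, published Labor results and rounding half-widths are copied from the code snippet at the end of this post. As in that snippet, each poll's standard deviation is calculated from its own published result, and the work is done on the Labor side (51 to 52), which mirrors 48 to 49 for the Coalition. I have used the standard library's `NormalDist` here rather than `scipy.stats`.

```python
import math
from statistics import NormalDist

sample_sizes = [3008, 1000, 1842, 1201, 1265, 1644, 1079, 826,
                2003, 1207, 1000, 826, 2136, 1012, 707, 1697]
labor_results = [51.5, 51, 51, 51.5, 52, 51, 52, 51,
                 51, 52, 51, 51, 51, 52, 51, 52]
roundings = [0.25, 0.5, 0.5, 0.25, 0.5, 0.5, 0.5, 0.5,
             0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

mean_labor = sum(labor_results) / len(labor_results)  # assumed population mean
centre = 51.5  # midpoint of the published range 51 to 52 for Labor

probability = 1.0
for n, result, r in zip(sample_sizes, labor_results, roundings):
    sigma = math.sqrt(result * (100.0 - result) / n)
    dist = NormalDist(mean_labor, sigma)
    # widen the range by the rounding half-width this pollster applies
    in_range = dist.cdf(centre + 0.5 + r) - dist.cdf(centre - 0.5 - r)
    probability *= in_range

print(f'Overall probability: {probability:.3g}')
print(f'About one in {1 / probability:,.0f}')
```

Multiplying the individual probabilities is only valid if the polls are independent, which is exactly the assumption being tested.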

### Chi-squared goodness of fit

Another approach is to apply a Chi-squared (\(\chi^2\)) test for goodness of fit to the sixteen polls. We can use this approach because the Central Limit Theorem tells us that the poll results should be normally distributed around the population mean. In this case, the formula for the Chi-squared test is:

$$ \chi^2 = \sum_i{\biggl( \frac{\bar{x_i} - \mu}{\sigma_i} \biggr)}^2 $$

Let's step through this equation. It is nowhere near as scary as it looks. To calculate the Chi-squared statistic, we do the following calculation for each poll, and then add up the results across all of the polls:

- we take the published poll result (\(\bar{x_i}\)) and subtract the population mean (\(\mu\)), which we estimated using the arithmetic mean of all of the polls;
- we then divide that difference by the standard deviation for the poll (\(\sigma_i\)); and
- we square the result (multiply it by itself).

If the polls are normally distributed around the population mean, each of these squared terms should be around one on average. For sixteen polls, we would therefore expect a Chi-squared statistic of around sixteen.
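The calculation can be sketched in a few lines of plain Python. The data are copied from the code snippet at the end of this post (published Labor results, in percentage points). As in that snippet, \(\sigma_i\) is calculated from each poll's own published result and sample size, and \(\mu\) is estimated by the arithmetic mean of the sixteen results.

```python
import math

sample_sizes = [3008, 1000, 1842, 1201, 1265, 1644, 1079, 826,
                2003, 1207, 1000, 826, 2136, 1012, 707, 1697]
labor_results = [51.5, 51, 51, 51.5, 52, 51, 52, 51,
                 51, 52, 51, 51, 51, 52, 51, 52]

mu = sum(labor_results) / len(labor_results)  # estimated population mean

chi_squared = 0.0
for n, x in zip(sample_sizes, labor_results):
    sigma = math.sqrt(x * (100.0 - x) / n)   # standard deviation for this poll
    chi_squared += ((x - mu) / sigma) ** 2   # squared standardised difference

print('Chi-squared statistic:', round(chi_squared, 2))
print('Number of polls:', len(labor_results))
```

A statistic far below the number of polls indicates that the results sit much closer to their mean than random sampling alone would produce.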

If the Chi-squared statistic is much less than 16, the poll results could be under-dispersed. If the Chi-squared statistic is much more than 16, then the poll results could be over-dispersed. For sixteen polls (which have 15 degrees of freedom, because the population mean (\(\mu\)) is constrained by the 16 poll results), we would expect 99 per cent of the Chi-squared statistics to be between 4.6 and 32.8.

The Chi-squared statistic I calculate for the sixteen polls is 1.68. In other words, using this approach, if the polls were truly independent and random samples, there would be a one in 108,282 chance of generating the narrow distribution of poll results we saw prior to the 2019 Federal Election. We can confidently say the polls are under-dispersed.

### Why the difference?

It is interesting to speculate on why there is a difference between these two approaches. While both approaches suggest the poll results were statistically unlikely, the Chi-squared test says they are twice as unlikely as the first approach. I suspect the answer comes from the rounding the pollsters apply to their raw results. This impacts on the normality of the distribution of poll results.

### So what went wrong?

To be honest, it is too early to tell with any certainty. But we are starting to see statements from the pollsters that suggest where some of the problems lie.

A first issue seems to be the increased use of online polls. There are a few issues here:

- Finding a random sample where all Australians have an equal chance of being polled - there have been suggestions that too many educated and politically active people are in the online samples.
- Resampling the same individuals from time to time - meaning the samples are not independent. (This may explain the lack of noise we see in polls in recent years.) If your sample is not representative, and it is used often, then all of your poll results would be skewed.
- An over-reliance on clever analytics and weights to try to make the pool of online respondents look like the broader population. These weights are challenging to keep accurate and reliable over time.

- The use of weighting, where some groups are under-represented in the raw sample frame, can mean that sample errors get magnified.
- Not having quotas and weights for all the factors that align somewhat with cohort political differences can mean polls accidentally fail to sample important constituencies.

Like Kevin Bonham, I am not a fan of the following theories:

- Shy Tory voters - too embarrassed to tell pollsters of their secret intention to vote for the Coalition.
- A late swing after the last poll.

### Code snippet

To be transparent about how I approached this task, the Python code snippet follows.

```python
import pandas as pd
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt

import sys
sys.path.append( '../bin' )
plt.style.use('../bin/markgraph.mplstyle')

# --- Raw data
sample_sizes = (
    pd.Series([3008, 1000, 1842, 1201, 1265, 1644, 1079, 826,
               2003, 1207, 1000, 826, 2136, 1012, 707, 1697]))
measurements = ( # for Labor:
    pd.Series([51.5, 51, 51, 51.5, 52, 51, 52, 51,
               51, 52, 51, 51, 51, 52, 51, 52]))
roundings = (
    pd.Series([0.25, 0.5, 0.5, 0.25, 0.5, 0.5, 0.5, 0.5,
               0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))

# -- population wide voting intention for the Coalition in percentage points
Model_Coalition = 48.5
# -- population wide voting intention for Labor in percentage points
Model_Labor = 100 - Model_Coalition

Mean_Labor = measurements.mean()
Mean_Coalition = 100 - Mean_Labor

# some pre-processing
variances = (measurements * (100-measurements)) / sample_sizes # sigma^2
standard_deviations = pd.Series(np.sqrt(variances)) # sigma

print('Mean measurement: ', Mean_Labor)
print('Measurement counts:\n', measurements.value_counts())
print('Sample size range from/to: ', sample_sizes.min(), sample_sizes.max())
print('Mean sample size: ', sample_sizes.mean())

# --- Using normal distributions
print('-----------------------------------------------------------')
individual_probs = []
for sd, r in zip(standard_deviations.tolist(), roundings):
    individual_probs.append(ss.norm(Mean_Labor, sd).cdf(Model_Labor + 0.5 + r) -
                            ss.norm(Mean_Labor, sd).cdf(Model_Labor - 0.5 - r))

# print individual probabilities for each poll
print('Individual probabilities: ', individual_probs)

# product of all probabilities to calculate combined probability
probability = pd.Series(individual_probs).product()
print('Overall probability: ', probability)
print('1/Probability: ', 1/probability)

# --- Chi Squared measurement error - two tailed test
print('-----------------------------------------------------------')
dof = len(measurements) - 1 ### degrees of freedom
print('Degrees of freedom: ', dof)
X = pow((measurements - Mean_Labor)/standard_deviations, 2).sum()
print('Expected X^2 between: ',
      ss.chi2.ppf(0.005, df=dof), ' and ',
      ss.chi2.ppf(0.995, df=dof))
print('X^2 statistic: ', X)
# lower-tail probability: use the cdf (the pdf is a density, not a probability)
print('Probability: ', ss.chi2.cdf(X , dof))
print('1/Probability: ', 1 / ss.chi2.cdf(X , dof))

# --- some normal plots
print('-----------------------------------------------------------')
low = 47.5
high = 49.5
mu = Mean_Coalition
n = 707
sigma = np.sqrt((Mean_Labor * Mean_Coalition) / n)
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 200)
y = pd.Series(ss.norm.pdf(x, mu, sigma), index=x)

ax = y.plot()
ax.set_title('Distribution of samples: n='+str(n)+', μ='+
             str(mu)+', σ='+str(round(sigma,2)))
ax.axvline(low, color='royalblue')
ax.axvline(high, color='royalblue')
# label the area in each tail and the area between the two lines
ax.text(x=low-0.5, y=0.05,
        s=str(round(ss.norm.cdf(low, loc=mu, scale=sigma)*100.0,1))+'%',
        ha='right', va='center')
ax.text(x=high+0.5, y=0.05,
        s=str(round((1-ss.norm.cdf(high, loc=mu, scale=sigma))*100.0,1))+'%',
        ha='left', va='center')
mid = str( round(( ss.norm.cdf(high, loc=mu, scale=sigma) -
                   ss.norm.cdf(low, loc=mu, scale=sigma) )*100.0, 1) )+'%'
ax.text(x=48.5, y=0.05, s=mid, ha='center', va='center')

ax.set_xlabel('Per cent')
ax.set_ylabel('Probability')

fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.0025, 'marktheballot.blogspot.com.au',
         ha='right', va='bottom', fontsize='x-small',
         fontstyle='italic', color='#999999')
fig.savefig('./Graphs/'+str(n)+'.png', dpi=125)
plt.close()

# same plot for the largest poll, which rounded to the nearest half point
n = 3008
low = 47.75
high = 49.25
sigma = np.sqrt((Mean_Labor * Mean_Coalition) / n)
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 200)
y = pd.Series(ss.norm.pdf(x, mu, sigma), index=x)

ax = y.plot()
ax.set_title('Distribution of samples: n='+str(n)+', μ='+
             str(mu)+', σ='+str(round(sigma,2)))
ax.axvline(low, color='royalblue')
ax.axvline(high, color='royalblue')
ax.text(x=low-0.25, y=0.3,
        s=str(round(ss.norm.cdf(low, loc=mu, scale=sigma)*100.0,1))+'%',
        ha='right', va='center')
ax.text(x=high+0.25, y=0.3,
        s=str(round((1-ss.norm.cdf(high, loc=mu, scale=sigma))*100.0,1))+'%',
        ha='left', va='center')
mid = str( round(( ss.norm.cdf(high, loc=mu, scale=sigma) -
                   ss.norm.cdf(low, loc=mu, scale=sigma) )*100.0, 1) )+'%'
ax.text(x=48.5, y=0.3, s=mid, ha='center', va='center')

ax.set_xlabel('Per cent')
ax.set_ylabel('Probability')

fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.0025, 'marktheballot.blogspot.com.au',
         ha='right', va='bottom', fontsize='x-small',
         fontstyle='italic', color='#999999')
fig.savefig('./Graphs/'+str(n)+'.png', dpi=125)
plt.close()
```