Tuesday, June 2, 2015

Updated Bayesian analysis of opinion polls

There have been two polls in the past fortnight.

  • Morgan Poll: which had a two-party preferred (TPP) estimate of 48 for the Coalition, with preferences distributed by how electors voted at the 2013 Federal Election (-0.5 on the previous Morgan Poll); and
  • Newspoll: which had a TPP estimate of 48 for the Coalition (+1 on the previous Newspoll).

Before going to the Bayesian model, I noticed in this tweet from @PhantomTrend that I was an outlier among those who aggregate polls (my estimate being the most favourable to Labor). Looking at my methodology, I concluded there were two possible sources of difference that might contribute to this outcome.

First, I was using the Morgan series based on preferences distributed by how poll respondents say they will vote. The following charts explore this difference a little further. They show that the self-reported preference allocations have typically been more favourable to Labor (and correspondingly less favourable to the Coalition). I have changed my processing arrangements to use the Morgan TPP series with preferences distributed by how electors voted at the 2013 Federal Election.

Following the Budget, I commented that the Morgan poll's house effect appears to be changing (with a reduced lean away from the Coalition for five polls now). In the moving average in the first chart above (which looks past individual poll noise), we can see the Morgan series aligning with the Bayesian estimate over recent months. I have been concerned for some time that Morgan is over-dispersed (too bouncy) for its reported sample sizes. For modelling purposes, I already reduce the sample size to 1000. More recently, with five polls now sitting closer to the aggregation, I am coming to the view that there may have been a methodology change in the way Morgan processes its raw data collection. This would represent a significant discontinuity, which perhaps should be factored into the model. I am reviewing whether to continue using Morgan in the poll aggregation.
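To see why the modelled sample size matters, here is a minimal sketch of the sampling-error arithmetic. The figures are illustrative only (a reported sample of 3000 is an assumption for the example, not a quote from Morgan); the point is that shrinking the sample size used in the model widens the standard error the model attaches to each poll, so a bouncy series gets less weight.

```python
import math

def tpp_standard_error(p, n):
    """Standard error of a two-party-preferred share under
    simple random sampling: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.48           # Coalition TPP share
reported_n = 3000  # illustrative reported sample size (assumption)
modelled_n = 1000  # the reduced size used in the model

print(tpp_standard_error(p, reported_n))  # about 0.009, i.e. 0.9 points
print(tpp_standard_error(p, modelled_n))  # about 0.016, i.e. 1.6 points
```

In effect, treating a reported 3000-person sample as if it were 1000 people nearly doubles the standard error, which is one crude but simple way to accommodate over-dispersion.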

The second reason my aggregation will differ from others is that I do not use Essential polls in the model. My concern with the Essential poll series is that it appears under-dispersed for the reported sample size (it is not bouncy enough). Looking at the reported weekly sample sizes (typically over 1000) and the combined fortnightly sample sizes (typically around 1800), I have wondered whether the weekly samples include some people from the previous week (or earlier). Anyway (as noted here), because I cannot explain the under-dispersion, I do not use the Essential poll in my aggregation. Please note: I am not suggesting there is anything wrong with the Essential poll series. All I am saying is that I do not understand it fully.
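For readers wondering how one might check for under-dispersion, here is a rough sketch of the idea (not my production code, and the poll numbers below are made up for illustration). If successive polls were independent samples of the given size, the poll-to-poll changes should have a predictable variance; an observed variance well below that is a sign the series is smoother than the claimed sample size allows.

```python
import statistics

def dispersion_ratio(series, n):
    """Ratio of the observed variance of poll-to-poll changes to the
    variance expected from sampling error alone.  The difference of two
    independent polls of size n has variance 2 * p * (1 - p) / n.
    A ratio well below 1 suggests under-dispersion."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    observed = statistics.variance(diffs)
    p = statistics.mean(series)
    expected = 2 * p * (1 - p) / n
    return observed / expected

# Illustrative weekly TPP shares that barely move (made-up data):
polls = [0.48, 0.48, 0.49, 0.48, 0.48, 0.49, 0.48]
print(dispersion_ratio(polls, 1800))  # well below 1 for this series
```

A series like this, bouncing by at most a point between polls, produces a ratio of roughly 0.3, whereas pure sampling noise at n = 1800 would put it near 1.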

In the next chart, we can see that the Essential poll (the red line) has typically been more favourable to the Coalition in comparison with my aggregation (the fat brown line), which ignores it. If I included the Essential poll, the sum-to-zero constraint would move my aggregation towards the Coalition.
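To make the sum-to-zero point concrete: under that constraint the house effects are each pollster's deviation from the latent vote, forced to average zero, so the implied centre of the aggregation sits at the average of the pollsters' long-run levels. A toy example (the pollster averages below are hypothetical, not real figures):

```python
def centred_aggregate(poll_means):
    """With house effects constrained to sum to zero, the implied
    latent vote sits at the mean of the pollsters' long-run averages."""
    return sum(poll_means) / len(poll_means)

# Hypothetical long-run Coalition TPP averages for three pollsters:
without_essential = [47.0, 47.5, 48.0]
# Add a fourth, Coalition-leaning series (standing in for Essential):
with_essential = without_essential + [49.0]

print(centred_aggregate(without_essential))  # 47.5
print(centred_aggregate(with_essential))     # 47.875
```

Adding one Coalition-leaning series pulls the centred aggregate towards the Coalition, which is exactly the mechanism described above.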

In terms of the aggregation, and notwithstanding the changes noted above, the Coalition continues to improve following its February troubles, but it has some way to go before it is in winnable territory.

If we focus on the most recent six months, and run the model in respect of that data alone, we can see the following.

1 comment:

  1. I'll be interested to see if Morgan continues this apparently less house-effected behaviour. Its old pure face-to-face series was generally heavily ALP-skewed but did have moments now and then when it would briefly come back to the pack for just a few polls before resuming its old ways. With this one though it does look like its house effect relative to others has been declining over a longer period of time.