Mark the Ballot: Morgan poll and Bayesian aggregation

Tuesday, May 19, 2015


Morgan was the final poll out of the blocks in the post-Budget tsunami of opinion polls. But before we look at the Morgan poll, I want to reflect a little on house effects (the systematic polling bias of each pollster).

The Bayesian model I use assumes that the individual polling houses do not change their methodology (and consequently their systematic house bias) throughout the period under analysis. The model also assumes this bias is constant: each house, on average, leans to Labor or the Coalition by a fixed number of percentage points.
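To make the assumption concrete, here is a minimal sketch of the kind of measurement model it implies: each published poll is the true two-party preferred share on that day, plus a fixed house effect, plus sampling noise. This is an illustration under stated assumptions, not the model used for this analysis. In the aggregation the daily voting intention is unknown and is estimated together with the house effects, whereas here it is simply supplied. The function name, house names, the 1,000-person sample size and the normal approximation to sampling error are hypothetical choices, and shares are proportions rather than percentages.

```python
import math
import random


def simulate_polls(true_share, house_effects, schedule, sample_size=1000, seed=1):
    """Simulate two-party preferred polls under a constant-house-effect assumption.

    true_share:    dict day -> true Coalition share on that day (proportion, e.g. 0.48)
    house_effects: dict house -> constant bias in proportion units (hypothetical values)
    schedule:      list of (day, house) pairs saying who polled when
    sample_size:   assumed sample size, used only for the sampling error
    """
    rng = random.Random(seed)
    polls = []
    for day, house in schedule:
        # The same bias applies to every poll from a house, whatever the day.
        expected = true_share[day] + house_effects[house]
        # Sampling error approximated by a normal with binomial variance.
        sd = math.sqrt(expected * (1 - expected) / sample_size)
        polls.append((day, house, rng.gauss(expected, sd)))
    return polls
```

Averaging each house's polls against the true series recovers its fixed lean, which is exactly what a constant house effect means. The two problems that follow are ways real polls can break this simple picture.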

I have two problems with these assumptions:

The first assumption is probably not true. It would be more reasonable to assume that polling houses are continually reviewing, and from time to time improving, their statistical practice. Unfortunately, polling houses rarely disclose changes to their methodology, so I cannot readily introduce discontinuities into the Bayesian model when polling practices change.

Second, even if the polling houses did not change their statistical practice, there is no guarantee that the systematic house bias is constant. It might vary, for example, with the vote shares of the parties.
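Both problems can be sketched in a few lines. The first function supposes we did know the dates of methodology changes (which, as noted, we usually do not) and gives each house a separate label, and hence a separate constant house effect, for each methodology regime. The second describes a hypothetical house whose bias depends on the vote share because it pulls its estimates a fraction k of the way toward 50 per cent. The change date, k = 0.1 and the function names are illustrative assumptions, not anything Morgan or any other pollster is known to do.

```python
from bisect import bisect_right
from datetime import date


def regime_label(house, poll_date, method_changes):
    """Problem 1: label a poll by house and methodology regime.

    method_changes maps house -> sorted list of dates on which that house
    (hypothetically) changed methodology. A poll dated on or after the i-th
    change (and before the next) belongs to regime i, so a model could fit
    one constant house effect per label ("Morgan#0", "Morgan#1", ...).
    """
    # bisect_right counts the change dates on or before the poll date.
    regime = bisect_right(method_changes.get(house, []), poll_date)
    return f"{house}#{regime}"


def shrinkage_bias(true_share, k=0.1, centre=0.5):
    """Problem 2: bias of a hypothetical house that pulls its estimates
    a fraction k of the way toward `centre`.

    It reports centre + (1 - k) * (true_share - centre), so its bias
    (reported minus true) is k * (centre - true_share), which changes
    with the vote share instead of being constant.
    """
    reported = centre + (1 - k) * (true_share - centre)
    return reported - true_share


# Illustrative usage with a hypothetical change date.
changes = {"Morgan": [date(2015, 3, 1)]}
print(regime_label("Morgan", date(2015, 1, 10), changes))  # Morgan#0
print(regime_label("Morgan", date(2015, 4, 1), changes))   # Morgan#1
print(round(shrinkage_bias(0.46), 4))  # 0.004
print(round(shrinkage_bias(0.52), 4))  # -0.002
```

For the shrinkage house, a true Coalition share of 46 per cent is overstated by 0.4 points, while 52 per cent is understated by 0.2 points: the same methodology produces different biases at different vote shares.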

I am reflecting on these assumptions because the past four Morgan polls have been a little more favourable to the Coalition than the earlier polls were on average. This may reflect a change of polling practice at Morgan, or it may be nothing more than the random noise associated with opinion polling. I simply do not know. However, it is a trend worth monitoring further.
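One rough way to judge whether such a shift could be noise is to ask how much the average of four polls would move from sampling error alone. The sketch below assumes simple random sampling and a hypothetical sample size (3,000 in the example); weighting and other design effects usually make real polls noisier, so treat the result as a lower bound on the noise. The function name is illustrative.

```python
import math


def noise_in_mean_of_polls(share, sample_size, n_polls):
    """Standard error, in percentage points, of the average of n_polls
    independent polls, from sampling error alone.

    Assumes simple random sampling with the given sample size; real polls
    usually have design effects that make them noisier than this.
    """
    per_poll_sd = math.sqrt(share * (1 - share) / sample_size)
    return 100 * per_poll_sd / math.sqrt(n_polls)


# Hypothetical inputs: Coalition share 48%, sample of 3,000, four polls.
print(round(noise_in_mean_of_polls(0.48, 3000, 4), 2))  # 0.46
```

Under these assumptions the average of four polls wanders by about half a percentage point (one standard error) from sampling alone, so a modest shift across four polls is hard to tell apart from noise.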

One way to reduce the risk that these assumptions distort the Bayesian model is to shorten the period under analysis. A shorter window is less likely to include a methodology change by a polling house, and when a change does happen, it passes through the window quickly. Voting intention over a shorter period is also more likely to sit in a narrower range, and within a narrower range a house bias that varies with vote share is better approximated by a constant. However, you do not get something for nothing: with fewer data points under analysis, the model's estimates are less precise.
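The trade-off can be put in rough numbers with two back-of-the-envelope functions, under simplifying assumptions. The first assumes a house bias that is linear in vote share, k * (0.5 - v), as for the hypothetical shrinkage house, and gives the worst-case error of replacing it with the best single constant over a window in which the share ranges from low to high. The second treats the polls used to estimate a house effect as independent with a common standard deviation. The value of k, the share ranges and the poll counts are illustrative, not estimates.

```python
import math


def constant_fit_error(k, low, high):
    """Worst-case error from approximating a linear bias k * (0.5 - v)
    by a single constant over vote shares v in [low, high].

    The minimax constant is the bias at the midpoint of the range,
    so the worst-case error is half the bias's range: |k| * (high - low) / 2.
    """
    return abs(k) * (high - low) / 2


def house_effect_se(poll_sd, n_polls):
    """Standard error of a constant house effect estimated from n_polls
    polls, treating them as independent with standard deviation poll_sd."""
    return poll_sd / math.sqrt(n_polls)


# Narrow versus wide vote-share range (hypothetical k = 0.1).
print(round(constant_fit_error(0.1, 0.46, 0.48), 4))  # 0.001
print(round(constant_fit_error(0.1, 0.44, 0.50), 4))  # 0.003
# Fewer polls in a short window versus more in a long one.
print(round(house_effect_se(1.0, 12), 3))  # 0.289
print(round(house_effect_se(1.0, 48), 3))  # 0.144
```

The approximation error shrinks in proportion to the width of the vote-share range, while the standard error grows as one over the square root of the number of polls; going from 48 polls to 12 doubles it.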

To help you judge how things stand at the end of the Budget period, I have run the model using the polling data for the past three months, as well as the polling data since the last election. Note the differences between the two runs in the medians and precisions of the relative house effects (noting also that the longer period includes ACNielsen, which ceased polling in the middle of 2014).
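For readers wondering how a median and a precision are read off the model, here is a minimal sketch of summarising the MCMC draws for one house-effect node: the median of the draws and a central credible interval, whose width is an inverse measure of precision (a narrower interval means a more precise estimate). The 95 per cent interval, the linear interpolation between sorted draws and the function names are assumptions for illustration; the charts may use different intervals.

```python
import statistics


def percentile(sorted_samples, q):
    """Linearly interpolated percentile (q in [0, 1]) of pre-sorted samples."""
    pos = q * (len(sorted_samples) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_samples) - 1)
    frac = pos - lo
    return sorted_samples[lo] + frac * (sorted_samples[hi] - sorted_samples[lo])


def summarise_house_effect(samples, mass=0.95):
    """Median, central credible interval and interval width for one node's draws."""
    s = sorted(samples)
    tail = (1 - mass) / 2
    low, high = percentile(s, tail), percentile(s, 1 - tail)
    return statistics.median(s), (low, high), high - low
```

Passing in a house's draws from each run (three months versus since the election) and comparing the medians and interval widths is one way to see the differences mentioned above; a wider interval from the three-month run would reflect the smaller number of polls it uses.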

Of some comfort, both models yield a very similar end-point in terms of the Coalition's two-party preferred vote share (47.8 or 47.9 per cent).

Bayesian model over three months

Just a reminder about the charts above. The shading indicates the proportion of samples drawn by the model. The Markov chain Monte Carlo (MCMC) sampler is run for 100,000 iterations, and in each iteration a sample is drawn for every node in the model. There is a node for each day under analysis and one for the house effect of each polling organisation. In the charts: