Friday, May 29, 2015

Refactored Bayesian model

Today, I updated the technicals page for the 2016 election. This is the page where I explain the Bayesian model I use. The page includes additional commentary on the JAGS model, which explains how the various elements of the model interact.

In the process, I re-examined the model I had been using. I became concerned that the prior I had been using for the house effect was too informative. I had been using a normal prior, centred on zero, with a standard deviation of five percentage points. I have changed this to a uniform prior between -15 and +15 percentage points. I would expect house effects to be in the range of -2 to +2 percentage points, so the 30 point uniform range should be uninformative.
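To get a feel for how different the two priors are over the range where house effects are expected to fall, here is a minimal Python sketch. It is not the JAGS code used for the model, and the function names are illustrative only. (If you write the normal prior in JAGS, note that dnorm takes a precision rather than a standard deviation, so a standard deviation of 5 corresponds to a precision of 1/25 = 0.04.)

```python
import math

def normal_pdf(x, mean=0.0, sd=5.0):
    """Density of the old prior: normal, centred on zero, sd of 5 points."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def uniform_pdf(x, low=-15.0, high=15.0):
    """Density of the new prior: flat between -15 and +15 points."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

# How much does each prior favour a house effect of 0 over one of +2?
normal_ratio = normal_pdf(2.0) / normal_pdf(0.0)
uniform_ratio = uniform_pdf(2.0) / uniform_pdf(0.0)

print(f"normal prior:  density at +2 is {normal_ratio:.3f} of the density at 0")
print(f"uniform prior: density at +2 is {uniform_ratio:.3f} of the density at 0")
```

On these numbers the normal prior is already close to flat across the -2 to +2 range (the ratio is about 0.92), so it would not be surprising if swapping it for the uniform prior made little difference.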

The change has had little impact on the analysis, so my fears were probably unfounded. Nonetheless, I have retained the uniform prior, as it is clearly less informative than the normal prior.

The other change I have made is to the charts: I now include an extra band in the Bayesian output to indicate the middle 99 per cent of samples. Previously, I had only indicated the median sample with a line, and the ranges for the middle 95, 80 and 50 per cent of the samples with progressively darker shading.

Let's look at the outcome. For the three month analysis, the median estimate of voting intention is unchanged. The first chart is the revised chart, the second chart is the earlier analysis (from here).

Turning now to house effects, the first chart is the updated analysis and the second chart is the earlier analysis. The only significant difference is the extra band in the first chart, which indicates the range for the middle 99 per cent of samples.

These changes in the second decimal place for the medians are very small, and should be ignored.

Saturday, May 23, 2015

Coalition TPP by Polling House

The Henderson Moving Average (HMA) has some significant limitations. Technically, it should only be used when the data points come in equally spaced periods of time, and it has no mechanism for dealing with missing data.

In what is not quite kosher analysis, I have applied a HMA to the houses that poll fairly regularly. Essential usually produces a weekly estimate. Morgan and Newspoll typically produce a fortnightly result. And ReachTEL yields a monthly estimate.

To obtain a rough six month moving average across these houses, I have applied a 25-term HMA to Essential, a 13-term HMA to Morgan and Newspoll, and a 7-term HMA to ReachTEL. The results follow. Only Morgan had the slump after the 2014 Budget as the more significant one for the Coalition. The other three pollsters had the late January/early February 2015 slump as the more significant. All agree the Coalition has been improving since the early 2015 slump, but an election-winning position in the polls will require further improvement.
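For readers who want to experiment, the following is a minimal Python sketch of a symmetric Henderson moving average, using the standard closed-form expression for the weights. It is not the code behind the charts. The function names are my own for this illustration, and the end points (where a full window is not available) are simply left empty rather than being filled with asymmetric weights.

```python
def henderson_weights(terms):
    """Symmetric Henderson weights for an odd number of terms (5 or more)."""
    if terms < 5 or terms % 2 == 0:
        raise ValueError("terms must be an odd number of at least 5")
    m = (terms - 1) // 2          # half-width of the window
    n = m + 2
    denom = (8.0 * n * (n**2 - 1) * (4 * n**2 - 1)
             * (4 * n**2 - 9) * (4 * n**2 - 25))
    weights = []
    for j in range(-m, m + 1):
        numer = (315.0 * ((n - 1)**2 - j**2) * (n**2 - j**2)
                 * ((n + 1)**2 - j**2) * (3 * n**2 - 16 - 11 * j**2))
        weights.append(numer / denom)
    return weights

def henderson_smooth(series, terms):
    """Apply the central filter; the first and last m points are left as None."""
    w = henderson_weights(terms)
    m = (terms - 1) // 2
    out = [None] * len(series)
    for i in range(m, len(series) - m):
        window = series[i - m:i + m + 1]
        out[i] = sum(wk * xk for wk, xk in zip(w, window))
    return out

# Term counts from the text: roughly six months of polls for each house.
for house, terms in [("Essential", 25), ("Morgan", 13),
                     ("Newspoll", 13), ("ReachTEL", 7)]:
    w = henderson_weights(terms)
    print(f"{house:10s} {terms:2d} terms, centre weight {w[terms // 2]:.4f}")
```

Note that the filter treats the polls as equally spaced, which is exactly the limitation described above: a missed week or an irregular fortnight is silently treated as if it were on schedule.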

Thursday, May 21, 2015

Data and code for election 2016

I have made most of my data and code base available on Google Drive. Please note, this is my live code base, which I play with quite a bit. So, there will be times when it is broken or in some stage of being edited.

What I have not made available is the Excel spreadsheets into which I initially place my data. These live in the (hidden) raw-data directory. However, the collated data for the Bayesian model lives in the intermediate directory, visible from the above link.

The program that collates and organises the spreadsheets into a single CSV input file for the Bayesian analysis is There are two intermediate input files (at the moment): TPP-3-stage1.csv and TPP-all-stage1.csv. The first of these input files covers the most recent three months. The second of these input files is for all polls since the 2013 election.

The Bayesian model itself lives in the file TPP-step2.R.

The code for producing the plots lives in

The files that begin with the letter 'z' are bash shell scripts.

The most recent set of charts live in the graphs directory. I don't keep historical charts.

There are a handful of helper programs that live in the bin directory.

There are a couple of files that I am working on in respect of a primary votes model. This is still a long way from finished.

If you see an error in my code or data, please drop me a line (comments below or email address in right hand column), and let me know. I can only improve with your help.

Tuesday, May 19, 2015

Morgan poll and Bayesian aggregation

Morgan was the final poll out of the blocks in the post-Budget tsunami of opinion polls. But before we look at the Morgan poll, I want to reflect a little on house effects (the systemic polling bias for each pollster).

The Bayesian model I use includes the assumption that the individual polling houses do not change their methodology (and consequently their systemic house bias) throughout the period under analysis. The model also assumes this bias is constant. Each house, on average, leans to Labor or the Coalition by a fixed number of percentage points.
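Put another way, the constant-bias assumption says that each published poll is the true voting intention, plus a fixed lean for that house, plus sampling noise. The following Python sketch simulates that assumption. It is an illustration only, not the JAGS model: the vote share, the house effect and the sample size are all made-up numbers, and the sampling noise uses a simple normal approximation.

```python
import math
import random

def simulate_poll(true_tpp, house_effect, sample_size, rng):
    """One poll result in percentage points under the constant-bias assumption."""
    expected = true_tpp + house_effect        # what this house reports on average
    p = expected / 100.0
    sampling_sd = 100.0 * math.sqrt(p * (1.0 - p) / sample_size)
    return rng.gauss(expected, sampling_sd)

rng = random.Random(2015)
true_tpp = 48.0       # hypothetical Coalition TPP on the day
house_effect = 1.0    # hypothetical: this house leans one point to the Coalition

# Any single poll is noisy, but the average lean over many polls is the house effect.
polls = [simulate_poll(true_tpp, house_effect, 1000, rng) for _ in range(2000)]
average_lean = sum(polls) / len(polls) - true_tpp
print(f"average lean over many polls: {average_lean:+.2f} points")
```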

I have two problems with these assumptions:

  • The first assumption is probably not true. It would be more reasonable to assume that polling houses are continually reviewing and from time-to-time improving their statistical practice. Unfortunately, it is rare for polling houses to expose their methodology changes to the public. So I cannot readily introduce discontinuities into the Bayesian model when polling practices change.
  • Second, even if the polling houses did not change their statistical practice, there is no guarantee that the systemic house bias is constant. It might vary, for example, depending on the vote share of the parties.

I am reflecting on these assumptions because the past four Morgan polls have been a little more favourable to the Coalition than the earlier polls were on average. This may reflect a change of polling practice at Morgan. And it may be nothing more than the random noise associated with opinion polling. I simply do not know. However, it is a trend worthy of monitoring further.

One way to reduce the potential erroneous impact of the above assumptions in the Bayesian model is to reduce the time period under analysis. A shorter window of analysis is less likely to include methodology changes from polling houses. When changes do happen, they will pass through the window of analysis quickly. The voting intention for the period is also more likely to be in a narrower range. With the voting intention in a narrower range, the non-constant rate biases will be better modeled by a constant. However, you do not get something for nothing. With fewer data points under analysis, the precision of the model is reduced.

To help you judge how things stand at the end of the Budget period, I have run the model using the polling data for the past three months, as well as the polling data since the last election. You should note the differences in the median and precisions of the relative house effects (also noting that the longer period includes ACNielsen, which ceased polling in the middle of 2014).

Of some comfort, both models yield a very similar end-point in terms of the Coalition's two-party preferred vote share (47.8 or 47.9 per cent).

Bayesian model over three months

Just a reminder in respect of the above charts. The shading indicates the proportion of samples in the model. The Markov Chain Monte Carlo model is run 100,000 times. In each iteration, for each node in the model, a sample is drawn. There are nodes in the model for each day under analysis and the house effect for each polling organisation. In the charts:
  • the palest shaded area represents the middle 95 per cent of the samples;
  • the next shaded area represents the middle 80 per cent of the samples;
  • the darkest shaded area represents the middle 50 per cent of the samples;
  • the dark line and white triangle is the median (middle) sample; and
  • the white line on the above charts is a 61-term Henderson moving average.
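
The median and shaded bands in the list above can be computed from the samples for any one node (a day, or a house effect) along the following lines. This is a minimal Python sketch with illustrative function names and made-up samples, not the chart code itself.

```python
import random

def percentile(sorted_samples, q):
    """Linear-interpolation percentile of pre-sorted data, with q in [0, 1]."""
    pos = q * (len(sorted_samples) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_samples) - 1)
    frac = pos - lower
    return sorted_samples[lower] * (1.0 - frac) + sorted_samples[upper] * frac

def central_bands(samples, coverages=(0.95, 0.80, 0.50)):
    """Median plus the (low, high) limits of each central band for one node."""
    ordered = sorted(samples)
    bands = {}
    for c in coverages:
        tail = (1.0 - c) / 2.0    # share of samples cut from each end
        bands[c] = (percentile(ordered, tail), percentile(ordered, 1.0 - tail))
    return percentile(ordered, 0.5), bands

# Made-up samples standing in for the 100,000 draws for a single node.
rng = random.Random(1)
samples = [rng.gauss(47.8, 0.5) for _ in range(100_000)]
median, bands = central_bands(samples)
print(f"median {median:.2f}")
for coverage, (low, high) in bands.items():
    print(f"middle {coverage:.0%}: {low:.2f} to {high:.2f}")
```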

Bayesian model since the last election

Monday, May 18, 2015

Second wave of post-Budget polling

Overnight two more polls were published:

  • In the Fairfax media, the Ipsos poll had the two-party preferred result at 50/50. This is the best poll result for the Coalition since Newspoll of 4-6 April 2014 had the Coalition on 51 per cent. The Ipsos poll was up four points for the Coalition on the April 2015 Ipsos poll.
  • In the Australian, Newspoll had the Coalition on 47 per cent, two-party preferred, which was down one percentage point on the previous Newspoll at the start of May.

Plugging these numbers into the Bayesian model yields:

The latest Ipsos poll is substantially adjusted in the Bayesian model.

Where there is remarkable agreement is in respect of the Prime Minister's recovery in the attitudinal polling.

In contrast, happiness with the Leader of the Opposition is largely unchanged.

Saturday, May 16, 2015

First batch of post budget polling

It is the weekend after the Federal Budget and the blizzard of post-Budget polling has started. There is a venerable tradition in Australian politics of wall-to-wall polling in the week that follows the Budget.

Mark Graph's first law of polling analysis is that most Budgets are meaningless events in the lives of most punters. Budgets have minimal (if any) impact on week-to-week movements in the national voting intention (particularly at the time of the Budget). Where they do have impact, measures that hurt are more likely to have an impact than measures that help. Furthermore, if a Budget measure does have an impact on voting intention, that impact is typically manifest closer to the time of implementation (when it actually hurts) than at the time of announcement.

Mark's second law: breathless reporting, where every micro-movement in the polls is over analysed, follows the Budget polling. In the melee, noise and signal become confused: one media outlet will proclaim the resurrection of the government and another will declare its Armageddon - all in the same week. The reporting of opinion polls in the week following the Budget is often little more than an exercise in intellectual masturbation for the titillation of those living inside the beltway.

Now that I have got that off my chest, let's look at the polls:
  • ReachTEL has the national two-party preferred voting intention at 47-53 in Labor's favour. This is a movement of one percentage point in the government's favour since 23 April.
  • Galaxy has the national two-party preferred voting intention at 48-52 in Labor's favour. This is a five percentage point movement in the government's favour since February 2015.

Before moving to the charts, I need to acknowledge a few things. I have added some early 2014 data that I had missed, including the ACNielsen polling data before it ceased political polling. Because I use a sum-to-zero constraint in my model, adding a polling house shifts the zero point against which every house effect is measured, so these charts are not directly comparable with the earlier charts.
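To illustrate why an extra house affects comparability under a sum-to-zero constraint, here is a small Python sketch. The house leanings are entirely hypothetical numbers chosen for the example, and the function is a simple re-centring rather than the JAGS model itself.

```python
def sum_to_zero(raw_effects):
    """Re-centre house effects so that they sum to zero across houses."""
    mean = sum(raw_effects.values()) / len(raw_effects)
    return {house: effect - mean for house, effect in raw_effects.items()}

# Hypothetical leanings relative to the (unknown) true vote share.
without_nielsen = {"Essential": -0.5, "Morgan": -1.0,
                   "Newspoll": 0.0, "ReachTEL": 0.5}
with_nielsen = dict(without_nielsen, ACNielsen=1.5)

before = sum_to_zero(without_nielsen)
after = sum_to_zero(with_nielsen)

# Every existing house moves by the same amount once the new house is added.
for house in without_nielsen:
    print(f"{house:10s} {before[house]:+.2f} -> {after[house]:+.2f}")
```

In this example no house has changed its behaviour, yet every reported house effect moves by the same amount. Because the model anchors voting intention to the average of the houses, the estimated level of voting intention shifts correspondingly in the opposite direction.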

In short, the government's fortunes continue to improve, but at a very slow pace. The government is still some distance from looking competitive were an election called at this time. All-in-all, there is not much to see in the post-Budget opinion polling.


I shifted the graph production process from ggplot in R to matplotlib in python. Of necessity, this saw some analytical tasks move from R to python. In the process, it became clear that the LOWESS (localised regression) package in python statsmodels was not as robust as the conceptually similar LOESS package under R. For comparison, the first two of the remaining charts were produced in R. The subsequent charts were produced in python. I still run the Bayesian model in R/JAGS. However, I now export the model output data to python for chart production.


On the annotated charts, the last result was rounded to zero decimal places. This has been corrected to round to one decimal place.

Sunday, May 10, 2015

Comparison of Attitudinal Polling

Four of the polling houses have asked questions on the performance of the Prime Minister and the Leader of the Opposition. ACNielsen and Ipsos asked if you approve/disapprove of their performance. Newspoll asked whether you are satisfied or dissatisfied with the way they are doing their job. And ReachTEL gave you a five-point Likert scale on which to grade their performance ranging across very good, good, satisfactory, poor and very poor.

While the questions and methods differ a little, each is essentially asking respondents if they are happy or unhappy with the performance of the Prime Minister and Leader of the Opposition. While the levels of happiness may differ between polling houses, the timing of the ups-and-downs is remarkably consistent between the polling houses. The levels of unhappiness are more closely aligned.

On the happiness and unhappiness maps, the Prime Minister has been on a roller coaster. Newspoll particularly has happiness on the improve at the moment for the Prime Minister. All of the polls report that unhappiness has declined since the Prince Philip knighthood and leadership spill motion (late January/early February 2015).

It will be interesting to see whether this latest movement in happiness will translate into a more substantial improvement in two-party preferred voting intention for the government.

For the Leader of the Opposition, these attitudinal maps have been relatively flat for many months. Nonetheless, there is a hint that things might be declining just a little in the last three months.


A couple of extra charts:

Tuesday, May 5, 2015

Not a lot to report

Two new polls this week: Morgan at 46.5 per cent for the Coalition (down 0.5 percentage points) and Newspoll (perhaps the last Newspoll as we know it) at 48 per cent for the Coalition (down 1 percentage point).

When I feed the latest data to the Bayesian aggregator, we can see a very slow improvement over the past nine weeks (less than one tenth of one percentage point per week - an improvement of 0.72 percentage points over the nine weeks).