Saturday, December 1, 2012

House effects: a first look at the 2010 election

Following the 2004 and 2007 Federal Elections, Professor Simon Jackman published analyses of the "house effects" of Australia's polling houses. Unfortunately, I could not find a similar analysis for the 2010 election, so in this post I offer a preliminary exploration of the issue.

For Jackman, house effects are the systematic biases that affect each pollster's published estimate of the population voting intention. He assumed that each published estimate from a polling house typically diverged from the real population voting intention by a constant number of percentage points (on average and over time).

To estimate these house effects, Jackman developed a two-part model in which each part informed the other. Rather than outline a formal Bayesian description of Jackman's approach (with a pile of Greek letters and twiddles), I will talk through it.

The first part of Jackman's approach was a temporal model. It assumed that on any particular day the real population voting intention was much the same as it had been on the previous day. Without an anchor, the model could shift every house effect up and the daily voting intention down by the same amount without changing the fit. So that the individual house effects could be identified, the model was anchored to the actual outcome on election day (for the 2010 election, the anchor is Labor's two-party-preferred vote of 50.12 per cent).
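
To make the temporal model concrete, here is a minimal Python sketch that simulates a daily random walk of the kind the model assumes, pinned to the election-day result. The function name `simulate_anchored_walk` and the day-to-day standard deviation of 0.0025 are illustrative assumptions (the value sits inside the 0 to 0.01 range allowed by the prior in the JAGS code below), not estimates from the model run. Because the daily steps are symmetric Gaussian noise, the path can be built backwards from the anchor.

```python
import random
from datetime import date


def simulate_anchored_walk(period, anchor, sigma_walk, seed=None):
    """Simulate a daily Gaussian random walk of `period` days whose final
    value (election day) is fixed at `anchor`.

    The steps are symmetric, so the path is built backwards from the anchor.
    """
    rng = random.Random(seed)
    walk = [0.0] * period
    walk[-1] = anchor
    for t in range(period - 2, -1, -1):
        # each day differs from the following day by a small random step
        walk[t] = walk[t + 1] + rng.gauss(0.0, sigma_walk)
    return walk


# 5 July to 21 August 2010, inclusive
PERIOD = (date(2010, 8, 21) - date(2010, 7, 5)).days + 1

# hypothetical day-to-day standard deviation (illustrative only)
path = simulate_anchored_walk(PERIOD, anchor=0.5012, sigma_walk=0.0025, seed=1)
print(PERIOD, round(path[-1], 4))  # prints: 48 0.5012
```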

In the second part, the observational model, the voting intention published by a polling house for a particular date is assumed to combine the actual population voting intention on that date, the house effect, and random sampling error (the source of the poll's margin of error).
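
The observational model can be sketched the same way. The hypothetical `simulate_poll` helper below adds a house effect and sampling noise to the day's true voting intention. The binomial formula for the sampling variance is my assumption about how poll precision is approximated, and the house effect and sample size in the example are invented for illustration. The `sample_precision` helper returns the reciprocal of that variance, which is the form JAGS's `dnorm` expects as its second argument.

```python
import math
import random


def sampling_variance(p, n):
    """Binomial approximation to the variance of a poll proportion."""
    return p * (1.0 - p) / n


def sample_precision(p, n):
    """Precision (1 / variance): the second argument of JAGS's dnorm."""
    return 1.0 / sampling_variance(p, n)


def simulate_poll(true_tpp, house_effect, n, rng):
    """Published estimate = true voting intention + house effect + sampling error."""
    sd = math.sqrt(sampling_variance(true_tpp, n))
    return true_tpp + house_effect + rng.gauss(0.0, sd)


rng = random.Random(7)
# hypothetical house: 1,100 respondents per poll, +1 point towards Labor
polls = [simulate_poll(0.50, 0.01, 1100, rng) for _ in range(2000)]
print(round(sum(polls) / len(polls), 3))  # close to 0.51
```

Averaged over many simulated polls, the sampling noise washes out and the house effect remains, which is why repeated polls from one house carry information about its bias.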

Using a Markov chain Monte Carlo (MCMC) technique, Jackman drew samples from the joint posterior distribution of the day-to-day pathway in the temporal model and the house effects, given the poll data in the observational model.
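
MCMC can feel like a black box, so here is a deliberately simplified Python sketch of one MCMC method, random-walk Metropolis, applied to a single house effect. Unlike Jackman's model, it treats the daily voting intention as known, so each poll's residual (published value minus true intention) informs only the house effect. The function names, step size and residuals are hypothetical; the Normal(0, 0.1) prior matches the house-effect prior in the JAGS code below. JAGS itself uses its own samplers to update the walk and the house effects jointly.

```python
import math
import random


def log_posterior(h, residuals, sds, prior_sd=0.1):
    """Log posterior (up to a constant) of a house effect h.

    residuals[i] = published poll minus the (assumed known) true intention;
    each residual ~ Normal(h, sds[i]); prior h ~ Normal(0, prior_sd).
    """
    lp = -0.5 * (h / prior_sd) ** 2
    for r, s in zip(residuals, sds):
        lp -= 0.5 * ((r - h) / s) ** 2
    return lp


def metropolis(residuals, sds, n_iter=20000, step=0.005, seed=0):
    """Random-walk Metropolis sampler for a single house effect."""
    rng = random.Random(seed)
    h = 0.0
    lp = log_posterior(h, residuals, sds)
    draws = []
    for _ in range(n_iter):
        proposal = h + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, residuals, sds)
        # accept with probability min(1, posterior ratio)
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            h, lp = proposal, lp_new
        draws.append(h)
    return draws


# hypothetical data: 20 polls, each 1 point above the true intention
residuals = [0.01] * 20
sds = [0.015] * 20
draws = metropolis(residuals, sds)[2000:]  # discard burn-in
print(round(sum(draws) / len(draws), 3))
```

With a normal prior and normal likelihood this toy posterior has a closed form, which makes it easy to check that the sampler is behaving; the full model has no such shortcut, which is why MCMC is used.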

I have replicated Professor Jackman's approach for the 2010 election, with a small modification to account for the different approaches to rounding taken by each polling house. I have used the two-party-preferred (TPP) estimates published by the houses. Like Jackman, I used the mid-point date, rounded down, for polls conducted over several days. Because each weekly Essential report typically pools two weekly polls covering a fortnight, each weekly poll appears in two consecutive reports; I ensured that each weekly poll appeared only once in the model's input data. Typically, this meant excluding every second Essential report.
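
Two of these data-preparation steps are easy to get wrong, so here is a minimal Python sketch of the down-rounded mid-point date and the de-duplication of overlapping Essential reports. It assumes each report is a hypothetical `(start, end, tpp)` tuple sorted by start date, and drops any report whose fieldwork overlaps the last one kept; this is one way to keep each weekly poll once, not necessarily the exact rule used here. The `rounding_half_width` helper shows how a house's rounding step maps to the `houseRounding` bound in the JAGS code, assuming values are expressed as proportions.

```python
from datetime import date, timedelta


def midpoint_date(start, end):
    """Mid-point of a fieldwork period, rounded down to a whole day."""
    return start + timedelta(days=(end - start).days // 2)


def drop_overlapping(reports):
    """Keep reports whose fieldwork does not overlap the last report kept.

    reports: list of (start, end, tpp) tuples sorted by start date.
    """
    kept = []
    for start, end, tpp in reports:
        if not kept or start > kept[-1][1]:
            kept.append((start, end, tpp))
    return kept


def rounding_half_width(step):
    """A value rounded to the nearest `step` lies within +/- step/2 of the raw value."""
    return step / 2.0


# hypothetical Essential-style reports, each pooling two weeks of fieldwork
reports = [
    (date(2010, 7, 2), date(2010, 7, 15), 0.52),
    (date(2010, 7, 9), date(2010, 7, 22), 0.53),
    (date(2010, 7, 16), date(2010, 7, 29), 0.52),
]
for start, end, tpp in drop_overlapping(reports):
    print(midpoint_date(start, end), tpp)
# prints:
# 2010-07-08 0.52
# 2010-07-22 0.52
```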

Unfortunately, I do not have polling data for Galaxy in the lead-up to the 2010 Election (if someone wants to send it to me I would be very grateful). Also, I don't have the sample sizes for all of the polls, so where they were missing I have used reasonable guesses. Consequently, this analysis must be considered incomplete. Nonetheless, my initial results follow.

The first chart shows the temporal model's estimate of the hidden voting intention for each day between 5 July 2010 and the election on 21 August 2010. The red line is the median estimate from 100,000 simulations. The darkest zone represents the 50% credibility zone. The outer edges of the second-darkest zone extend the 50% zone to show the 80% credibility zone, and the outer edges of the lightest zone similarly show the 95% credibility zone.
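
Each day's red line and shaded zones summarise that day's simulated values. A minimal Python sketch of such a summary, assuming the draws for one day are available as a plain list (the function names are hypothetical): the median, plus central zones that leave equal probability in each tail. Linear interpolation between ordered draws is one common quantile convention; plotting tools may use slightly different ones.

```python
import math


def quantile(sorted_vals, q):
    """Quantile q (0..1) of pre-sorted values, with linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def summarise(draws, levels=(0.50, 0.80, 0.95)):
    """Median and central credibility zones for one day's MCMC draws."""
    s = sorted(draws)
    zones = {}
    for level in levels:
        tail = (1.0 - level) / 2.0  # probability left in each tail
        zones[level] = (quantile(s, tail), quantile(s, 1.0 - tail))
    return quantile(s, 0.5), zones


# hypothetical draws for one day
median, zones = summarise([x / 100 for x in range(101)])
print(median, zones[0.95])
```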


The second chart shows the estimated house effects for five of Australia's polling houses going into the 2010 Federal election. The shading has the same meaning as in the previous chart. A negative value indicates that the house effect favours the Coalition; a positive value indicates that it favours Labor.
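
Because the JAGS model adds each house effect to the day's voting intention, posterior draws for a house effect can be read in two simple ways: the share of draws above zero estimates the probability that the house leans towards Labor, and subtracting the effect from a published figure gives a rough house-adjusted estimate. A minimal Python sketch; the function names, draws and published figure are hypothetical, and adjusting by a single point estimate (such as the posterior median) ignores the uncertainty in the effect.

```python
def prob_favours_labor(effect_draws):
    """Share of posterior draws above zero (positive = leans to Labor)."""
    return sum(1 for d in effect_draws if d > 0) / len(effect_draws)


def adjust_poll(published_tpp, house_effect):
    """Remove a house effect: since mu = walk + houseEffect, walk is roughly y - houseEffect."""
    return published_tpp - house_effect


# hypothetical posterior draws for one house's effect
draws = [-0.004, 0.006, 0.011, 0.008, 0.002, -0.001, 0.009, 0.005]
print(prob_favours_labor(draws))            # 0.75
print(round(adjust_poll(0.52, 0.005), 3))   # 0.515
```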


The JAGS code for this analysis follows.
    model {
        ## -- observational model
        for(i in 1:length(y)) { # for each poll result ...
            # error introduced by the house rounding its published figure
            roundingEffect[i] ~ dunif(-houseRounding[i], houseRounding[i])
            # expected value = daily voting intention + house effect + rounding error
            mu[i] <- houseEffect[house[i]] + walk[day[i]] + roundingEffect[i]
            # sampling error; samplePrecision[i] (1/variance) is supplied as data
            y[i] ~ dnorm(mu[i], samplePrecision[i])
        }
        
        ## -- temporal model
        ## nothing in this block fixes the election-day anchor; to anchor the
        ##   walk, walk[period] must be supplied as data (0.5012)
        for(i in 2:period) { # for each day under analysis ...
            walk[i] ~ dnorm(walk[i-1], walkPrecision) # random walk (AR(1) with coefficient 1)
        }

        ## -- priors
        sigmaWalk ~ dunif(0, 0.01)          ## uniform prior on std. dev.
        walkPrecision <- pow(sigmaWalk, -2) ##   for the day-to-day random walk
        walk[1] ~ dunif(0.4, 0.6)           ## initialisation of the daily walk

        for(i in 1:5) {                     ## vague normal priors (sd 0.1) for house effects
            houseEffect[i] ~ dnorm(0, pow(0.1, -2))
        }
    }
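
The model expects its inputs as data. Here is a hypothetical Python sketch of assembling them; the poll tuple layout, the house numbering and the `build_jags_data` name are illustrative assumptions, as is computing the precision from the published value with the binomial formula. One way to express the election-day anchor is to pass `walk` as partially observed data: missing (shown here as `None`, to be written as `NA` in the JAGS data) for every day except election day, which is fixed at 0.5012. The JAGS code itself does not show how the anchor was supplied.

```python
from datetime import date


def build_jags_data(polls, first_day, election_day, anchor, rounding_by_house):
    """Assemble the data for the JAGS model.

    polls: list of (house_index, mid_date, tpp, sample_size), house_index in 1..5
    rounding_by_house: dict house_index -> half-width of that house's rounding
    """
    period = (election_day - first_day).days + 1
    walk = [None] * period   # unknown daily voting intention (NA in JAGS)
    walk[-1] = anchor        # election-day anchor, observed
    return {
        "y": [tpp for _, _, tpp, _ in polls],
        "house": [h for h, _, _, _ in polls],
        # JAGS indexes from 1, so day 1 is first_day
        "day": [(d - first_day).days + 1 for _, d, _, _ in polls],
        # binomial approximation using the published value (an assumption)
        "samplePrecision": [n / (tpp * (1.0 - tpp)) for _, _, tpp, n in polls],
        "houseRounding": [rounding_by_house[h] for h, _, _, _ in polls],
        "period": period,
        "walk": walk,
    }


# hypothetical polls from two houses
polls = [(1, date(2010, 7, 10), 0.52, 1100), (2, date(2010, 8, 20), 0.51, 1400)]
data = build_jags_data(polls, date(2010, 7, 5), date(2010, 8, 21), 0.5012,
                       {1: 0.005, 2: 0.005})
print(data["period"], data["day"])  # prints: 48 [6, 47]
```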
