Saturday, July 27, 2013

How much was Kevin Rudd worth?

I was a little surprised when I saw Simon Jackman suggest that Kevin Rudd had moved the two-party preferred voting intention by seven percentage points in Labor's favour. It was not consistent with my own analysis, and only one pollster (Morgan) has data that supports a seven-point movement. Data from all the remaining pollsters suggest the "Rudd Effect" was less than seven percentage points.

Now don't get me wrong, I have enormous respect for Professor Jackman. I purchased and read his 600-page text, Bayesian Analysis for the Social Sciences. It is a tour de force on Bayesian statistics, and I cannot recommend it enough. His understanding and knowledge in this area far surpass my own. Unashamedly, I have used Jackman's approach as the basis for my own aggregation efforts.

However, I suspect he has not noticed that the data since Kevin Rudd's second ascension violate a number of the linear assumptions implicit in his model. In particular, some of the house effects before and after Rudd's return are radically different. I blogged on this under the heading "When models fail us". As I noted there, when the underpinning assumptions are violated, the model produces incorrect results.
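To make the problem concrete, here is a minimal R sketch with made-up numbers (not estimates from any poll). It considers a single pollster whose house effect changes at the leadership change. If the model assumes that house effect is constant, the whole before/after difference in that pollster's series gets attributed to the jump. In a full aggregation the distortion is diluted by the other pollsters, but its direction is the same.

```r
# Hypothetical illustration only: all values are made up, not estimated.
# True Labor two-party preferred share either side of the discontinuity.
true_before <- 0.45
true_jump   <- 0.05
true_after  <- true_before + true_jump

# A pollster whose house effect changes when the leader changes.
house_before <- 0.01
house_after  <- 0.03

# What that pollster reports on average in each period.
reported_before <- true_before + house_before
reported_after  <- true_after + house_after

# A model with one constant house effect per pollster can only attribute
# the whole before/after difference in this series to the jump.
naive_jump <- reported_after - reported_before

# The bias equals the change in the house effect (here 0.02, i.e. 2 points).
bias <- naive_jump - true_jump
cat(sprintf("naive jump %.3f, true jump %.3f, bias %.3f\n",
            naive_jump, true_jump, bias))
# prints: naive jump 0.070, true jump 0.050, bias 0.020
```

Treating the before and after periods as separate series, as below, gives each period its own house effect, so a shift like this no longer leaks into the jump estimate.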

Revisiting the discontinuity model I initially used following Rudd's restoration, I have treated the Morgan, Galaxy and Essential data before and after the restoration as separate series. I have also centred the aggregation on the assumption that the house effects for Newspoll and Nielsen sum to zero (this may turn out to be problematic, but it is sufficient for the moment). Notwithstanding some remaining doubts, I think this approach overcomes many of the problems my earlier discontinuity model had. I will cut to the results before reviewing the R and JAGS code.
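Why a centring assumption is needed at all: house effects are only identified up to a common constant. Adding the same amount to every house effect and subtracting it from the underlying vote share leaves every poll's expected value unchanged, so the data alone cannot pin down the level. The R sketch below (made-up house effects, hypothetical series names) shows how the Newspoll plus Nielsen constraint chooses that constant.

```r
# Hypothetical house effects measured against an arbitrary baseline
# (made-up numbers for illustration only).
raw <- c(Newspoll = 0.010, Nielsen = 0.020, Galaxy = 0.015,
         'Morgan multi PR' = 0.030, 'Morgan multi AR' = 0.005)
level_raw <- 0.48   # hypothetical underlying vote share on some day

# Shifting every house effect by a constant and the level by the opposite
# amount leaves expected poll readings unchanged. The constraint picks the
# constant so that the Newspoll and Nielsen house effects sum to zero.
shift <- mean(raw[c('Newspoll', 'Nielsen')])
centred <- raw - shift
level_centred <- level_raw + shift

# Expected poll readings are identical under both parameterisations.
expected_raw <- level_raw + raw
expected_centred <- level_centred + centred
print(round(centred, 3))
```

The choice of constraint does not change the fit; it only decides which pollsters are treated as unbiased on average, which is why it "may turn out to be problematic" if Newspoll and Nielsen share a common bias.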

The key finding is that Kevin was worth 5.6 percentage points in Labor's two-party preferred vote share.

Turning to the house effects, we can see some of the variability in the pre-Rudd (PR) and after-Rudd (AR) values.

The revised model follows. The first code block contains the R code for managing the Morgan sample size and for separating the relevant polls into pre-Rudd (PR) and after-Rudd (AR) series. The second code block has the JAGS code. (As an aside, I have been playing with Stan lately, and might make the switch down the track.)

# fudge sample size for Morgan multi - adjustment for observed over-dispersion
# ('data' is an assumed name for the poll data frame; the original name was lost)
data[data[, 'House'] == 'Morgan multi', 'Sample'] <- 1000

# treat before and after for Morgan, Galaxy and Essential as different series
# ('discontinuity' is the date of Rudd's restoration, defined elsewhere)
data$House <- paste(as.character(data$House),
    ifelse(as.character(data$House) %in% c('Essential', 'Morgan multi', 'Galaxy'),
        ifelse(data[, 'Date'] >= as.Date(discontinuity), ' AR', ' PR'), ''),
    sep = '')

# make Newspoll House number one in the factor ...
l <- levels(factor(data$House))
n <- which(l == 'Newspoll')
l[n] <- l[1]
l[1] <- 'Newspoll'
data$House <- factor(data$House, levels = l)
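A note on the Morgan sample-size fudge. The JAGS model takes a samplePrecision for each poll, but the code that computes it is not shown. Assuming it uses the usual binomial approximation (variance p(1 - p)/n, so precision n/(p(1 - p))), capping the sample size inflates that pollster's standard error and so reduces its weight. The helper name and the poll values in this R sketch are hypothetical.

```r
# Hypothetical helper (an assumption, not from the original script):
# precision of a poll proportion p from sample size n, binomial approximation.
sample_precision <- function(p, n) {
    n / (p * (1 - p))
}

# Hypothetical Morgan multi-mode poll: 50% Labor, reported sample of 3000.
p <- 0.50
reported_n <- 3000
capped_n <- 1000     # the fudged sample size used for Morgan multi

se_reported <- 1 / sqrt(sample_precision(p, reported_n))
se_capped   <- 1 / sqrt(sample_precision(p, capped_n))

# Capping n at 1000 inflates the standard error by sqrt(3000 / 1000),
# so Morgan's polls get less weight in the aggregation.
cat(sprintf("SE reported: %.4f, SE capped: %.4f, ratio: %.3f\n",
            se_reported, se_capped, se_capped / se_reported))
```

Under this assumption the cap is a blunt way of allowing for over-dispersion: it says Morgan's polls scatter more than their reported sample sizes would imply.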

model {
    ## Based on Simon Jackman's original model 
    ## -- observational model
    for(poll in 1:NUMPOLLS) { 
        y[poll] ~ dnorm(walk[day[poll]] + houseEffect[house[poll]], samplePrecision[poll]) 
    }

    ## -- temporal model
    for(i in 2:PERIOD) { # for each day under analysis ...
        day2DayAdj[i] <- ifelse(i==DISCOUNTINUITYDAY, walk[i-1]+discontinuityValue, walk[i-1])
        walk[i] ~ dnorm(day2DayAdj[i], walkPrecision)
    }
    sigmaWalk ~ dunif(0, 0.01)            ## uniform prior on std. dev.  
    walkPrecision <- pow(sigmaWalk, -2)   ##   for the day-to-day random walk
    walk[1] ~ dunif(0.01, 0.99)           ## uninformative prior
    discontinuityValue ~ dunif(-0.2, 0.2) ## uninformative prior

    ## -- constraint on house effects 
    for(i in 2:HOUSECOUNT) { ## vague normal priors for house effects
        houseEffect[i] ~ dnorm(0, pow(0.1, -2))
    }
    #houseEffect[NEWSPOLL] <- -sum(houseEffect[2:HOUSECOUNT])  ## all sum to zero
    houseEffect[NEWSPOLL] <- -houseEffect[NIELSEN]   ## Newspoll and Nielsen sum to zero
    #houseEffect[NEWSPOLL] <- 0 ## centred on Newspoll as zero
}
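The JAGS model is easier to read alongside a forward simulation of the process it assumes. The R sketch below generates fake polls from a random walk with one jump, house effects under the Newspoll/Nielsen constraint, and sampling noise. All numbers are hypothetical, and computing samplePrecision as n/(p(1 - p)) is my assumption. Note that JAGS's dnorm takes a precision, while R's rnorm takes a standard deviation.

```r
# Forward simulation of the data-generating process the JAGS model assumes.
# All values are hypothetical; variable names mirror the JAGS code.
set.seed(2013)

PERIOD <- 120                 # days under analysis
DISCOUNTINUITYDAY <- 90       # day of the leadership change
sigmaWalk <- 0.002            # day-to-day std. dev. of the random walk
discontinuityValue <- 0.05    # one-off jump on the discontinuity day

# Temporal model: a random walk with a single jump.
walk <- numeric(PERIOD)
walk[1] <- 0.45
for (i in 2:PERIOD) {
    mu <- if (i == DISCOUNTINUITYDAY) walk[i - 1] + discontinuityValue else walk[i - 1]
    walk[i] <- rnorm(1, mu, sigmaWalk)
}

# House effects: house 1 is Newspoll, house 2 is Nielsen; Newspoll is set
# so that the two sum to zero, as in the JAGS constraint.
NEWSPOLL <- 1
NIELSEN <- 2
houseEffect <- c(NA, 0.01, -0.005, 0.02)
houseEffect[NEWSPOLL] <- -houseEffect[NIELSEN]

# Observational model: each poll is the walk plus its house effect plus
# sampling noise with (assumed) precision n / (p (1 - p)).
NUMPOLLS <- 40
day <- sort(sample(1:PERIOD, NUMPOLLS, replace = TRUE))
house <- sample(seq_along(houseEffect), NUMPOLLS, replace = TRUE)
sampleSize <- rep(1000, NUMPOLLS)
mu_poll <- walk[day] + houseEffect[house]
samplePrecision <- sampleSize / (mu_poll * (1 - mu_poll))
y <- rnorm(NUMPOLLS, mu_poll, 1 / sqrt(samplePrecision))  # sd = 1/sqrt(precision)

# The data list a JAGS run would take (for example via rjags::jags.model).
jags_data <- list(y = y, day = day, house = house,
                  samplePrecision = samplePrecision,
                  NUMPOLLS = NUMPOLLS, PERIOD = PERIOD,
                  HOUSECOUNT = length(houseEffect),
                  DISCOUNTINUITYDAY = DISCOUNTINUITYDAY,
                  NEWSPOLL = NEWSPOLL, NIELSEN = NIELSEN)
```

Fitting the model to data simulated this way, and checking that discontinuityValue comes back near 0.05, is one way to test it before trusting it on real polls.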


  1. I spoke briefly to Simon about the issue of different biases in the Gillard and Rudd eras and he was already aware of it.

    I agree with your estimate of the Rudd restoration effect - my model output is 5.6% with a standard deviation of 0.8%.

  2. I have played quite a bit with this. When the data does not conform to the model it is being fit to, all sorts of things can happen.

    From the same JAGS model, I can generate a wide array of estimates just by including or excluding pollsters and by increasing or decreasing the pre-Rudd model time span.
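As a rough check on how the seven-point figure sits against the estimate quoted in the first comment (5.6 points, standard deviation 0.8), here is an R sketch that treats the posterior as normal. The normal approximation is an assumption; the real posterior need not be symmetric.

```r
# Normal approximation to the posterior for the Rudd effect, using the
# reported mean and standard deviation (percentage points).
# Treating the posterior as normal is an assumption.
effect_mean <- 5.6
effect_sd   <- 0.8

interval_95 <- effect_mean + qnorm(c(0.025, 0.975)) * effect_sd
prob_at_least_7 <- pnorm(7, mean = effect_mean, sd = effect_sd, lower.tail = FALSE)

cat(sprintf("95%% interval: %.1f to %.1f points\n", interval_95[1], interval_95[2]))
cat(sprintf("P(effect >= 7 points): %.3f\n", prob_at_least_7))
```

Under this approximation the interval runs from about 4.0 to 7.2 points, so seven points is not ruled out, but it sits near the upper edge, with roughly a 4 per cent chance of an effect that large.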