Now don't get me wrong, I have enormous respect for Professor Jackman. I purchased and read his 600-page text, Bayesian Analysis for the Social Sciences. It is a tour de force on Bayesian statistics, and I cannot recommend it enough. His understanding and knowledge in this area far surpass my own. Unashamedly, I have used Jackman's approach as the basis for my own aggregation efforts.
However, I suspect he has not noticed that the data since the second ascension of Kevin Rudd violates a number of the linear assumptions implicit in his model. In particular, some of the house effects before and after Kevin's return are radically different. I blogged on this under the rubric: When models fail us. As I noted previously, the violation of these underpinning assumptions results in the model producing incorrect results.
Revisiting the discontinuity model I initially used following Rudd's restoration, I have treated the Morgan, Galaxy and Essential data before and after the restoration as different series. I have also centred the aggregation on the assumption that the house effects for Newspoll and Nielsen sum to zero (this may turn out to be problematic, but it is sufficient for the moment). Notwithstanding some remaining doubts, I think this approach overcomes many of the problems my earlier discontinuity model had. I will cut to the results before reviewing the R and JAGS code.
The key finding is that Kevin was worth 5.6 percentage points in Labor's two-party-preferred vote share.
Turning to the house effects, we can see some of the variability in the pre-Rudd (PR) and after-Rudd (AR) values.
The revised model follows. In the first code block is the R code for managing the Morgan sample size and for separating the relevant polls into pre-Rudd (PR) and after-Rudd (AR) series. The second code block has the JAGS code. (As an aside, I have been playing with Stan lately, and might make a switch down the track).
# fudge sample size for Morgan multi - adjustment for observed over-dispersion
output.data[output.data[, 'House'] == 'Morgan multi', 'Sample'] <- 1000

# treat before and after for Morgan, Galaxy and Essential as different series
output.data$House <- paste(as.character(output.data$House),
    ifelse(as.character(output.data$House) %in% c('Essential', 'Morgan multi', 'Galaxy'),
        ifelse(output.data[, 'Date'] >= as.Date(discontinuity), ' AR', ' PR'), ''),
    sep='')

l <- levels(factor(output.data$House))
n <- which(l == 'Newspoll')
l[n] <- l[1]
l[1] <- 'Newspoll' # Newspoll is House number one in the factor ...
output.data$House <- factor(output.data$House, levels=l)
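As a bridge between the two blocks, here is a rough sketch of the sort of data list the JAGS model expects. It is illustrative only: the column name ALP for the two-party-preferred proportion and the variable startDate are placeholders, not necessarily what my working code uses.

# illustrative sketch only: assemble the data list passed to JAGS
# 'ALP' (two-party-preferred proportion) and 'startDate' are placeholder names
startDate <- min(output.data$Date)
endDate <- max(output.data$Date)

data.list <- list(
    NUMPOLLS = nrow(output.data),
    PERIOD = as.numeric(endDate - startDate) + 1,
    DISCONTINUITYDAY = as.numeric(as.Date(discontinuity) - startDate) + 1,
    HOUSECOUNT = length(levels(output.data$House)),
    NEWSPOLL = which(levels(output.data$House) == 'Newspoll'),
    NIELSEN = which(levels(output.data$House) == 'Nielsen'),
    y = output.data$ALP,
    day = as.numeric(output.data$Date - startDate) + 1,
    house = as.numeric(output.data$House),
    samplePrecision = 1 / (output.data$ALP * (1 - output.data$ALP) / output.data$Sample)
)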
model {
    ## Based on Simon Jackman's original model
    ## -- observational model
    for(poll in 1:NUMPOLLS) {
        y[poll] ~ dnorm(walk[day[poll]] + houseEffect[house[poll]], samplePrecision[poll])
    }

    ## -- temporal model
    for(i in 2:PERIOD) { # for each day under analysis ...
        day2DayAdj[i] <- ifelse(i==DISCONTINUITYDAY, walk[i-1]+discontinuityValue, walk[i-1])
        walk[i] ~ dnorm(day2DayAdj[i], walkPrecision)
    }
    sigmaWalk ~ dunif(0, 0.01)            ## uniform prior on std. dev.
    walkPrecision <- pow(sigmaWalk, -2)   ## for the day-to-day random walk
    walk[1] ~ dunif(0.01, 0.99)           ## uninformative prior
    discontinuityValue ~ dunif(-0.2, 0.2) ## uninformative prior

    ## -- sum-to-zero constraint on house effects
    for(i in 2:HOUSECOUNT) { ## vague normal priors for house effects
        houseEffect[i] ~ dnorm(0, pow(0.1, -2))
    }
    #houseEffect[NEWSPOLL] <- -sum(houseEffect[2:HOUSECOUNT]) ## all sum to zero
    houseEffect[NEWSPOLL] <- -houseEffect[NIELSEN] ## Newspoll and Nielsen sum to zero
    #houseEffect[NEWSPOLL] <- 0 ## centred on Newspoll as zero
}
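To indicate how the headline numbers are extracted, a sketch only: it assumes the model text is saved as discontinuity.jags and that the data list was built along the lines sketched above.

# sketch only: run the model with rjags and summarise the results
library(rjags)

jags <- jags.model('discontinuity.jags', data=data.list, n.chains=4, n.adapt=1000)
update(jags, 10000) # burn-in
samples <- coda.samples(jags,
    variable.names=c('discontinuityValue', 'houseEffect', 'walk'),
    n.iter=50000, thin=10)
m <- as.matrix(samples)

# the Rudd restoration effect, in percentage points
round(c(mean=mean(m[, 'discontinuityValue']), sd=sd(m[, 'discontinuityValue'])) * 100, 1)

# house effects (percentage points), in factor-level order
round(colMeans(m[, grep('^houseEffect', colnames(m))]) * 100, 1)

The posterior mean of discontinuityValue, multiplied by 100, is where the 5.6 percentage point figure above comes from.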
I spoke briefly to Simon about the issue of different biases in the Gillard and Rudd eras and he was already aware of it.
I agree with your estimate of the Rudd restoration effect - my model output is 5.6% with a standard deviation of 0.8%.
I have played quite a bit with this. When the data does not conform to the model it is being fit to, all sorts of things can happen.
From the same JAGS model, I can generate a wide array of estimates just by including or excluding pollsters and by increasing or decreasing the pre-Rudd time span in the model.
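To make that concrete, here is a sketch of the kind of sensitivity check I mean. It is illustrative only: run.aggregation is a hypothetical wrapper around the data-list and rjags steps above, and only the non-anchoring pollsters are dropped so the Newspoll/Nielsen constraint still holds.

# sketch only: refit with one non-anchoring pollster excluded at a time
# run.aggregation() is a hypothetical wrapper returning the MCMC samples
pollsters <- c('Essential', 'Morgan multi', 'Galaxy')
estimates <- sapply(pollsters, function(drop) {
    subset.data <- droplevels(output.data[!grepl(drop, output.data$House, fixed=TRUE), ])
    fit <- run.aggregation(subset.data)
    mean(as.matrix(fit)[, 'discontinuityValue']) * 100
})
round(estimates, 1)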