This is a short methodology discussion on how I generate a possible House of Representatives outcome from a national two-party preferred (TPP) poll estimate. This is not the most sophisticated approach possible (and it will not be the ultimate way I generate these predictions as I refine my models). But, it is what I do at the moment.
The first thing I did was to secure an estimate of the TPP vote for each seat in the 2010 federal election adjusted for boundary changes and redistributions since the 2010 election. I have shamelessly used
Antony Green's pendulum for this purpose.
Secondly, building on some of the assumptions Antony made, I thought about how I should handle the treatment of independents (which sit a little outside of the mechanics of a TPP estimate). My current approach is based on the following assumptions (which are not dissimilar to
Poliquant's approach):
- Bob Katter will win Kennedy
- Andrew Wilkie will win Denison (Labor polling in the Oz 26/06/12, ReachTEL 29/6/12)
- Adam Bandt will lose Melbourne. [Note: this assumption is a little speculative. It rests on the Liberals changing their preference strategy from their 2010 approach (which they have said they will do). This would see Liberal preferences flow to Labor:Greens at 67:33 in 2013 rather than the 20:80 flow in 2010. At the 2010 election Labor won 38.1 and the Greens 36.2 per cent of the primary vote. I am not aware of any subsequent polling in the Federal seat of Melbourne; but the Greens lost the 2012 by-election in the related State seat of Melbourne (where Liberals preferenced Labor ahead of Greens)].
- Rob Oakshott will lose Lyne (Newspoll 24/10/11; ReachTEL 25/8/2011, 20/6/2012)
- Tony Windsor will lose New England (Newspoll 24/10/11; ReachTEL 19/6/2012)
- Peter Slipper's seat of Fisher will be a normal Coalition/Labor contest next election
- Craig Thomson's seat of Dobell will be a normal Coalition/Labor contest next election
- Tony Crook will re-contest O'Connor for the Coalition in a normal Coalition/Labor contest
I made some tweak's to the pendulum to give effect to these assumptions.
Next I made a quick estimate of the number of seats by calculating the swing from the previous election and summing the probabilities of a win for each of the 150 seats if that swing was applied. The R-code for this function follows. As you can see, it is a short piece of code. The heavy lifting is done by the sum(pnorm(...)) functions in the middle of this code.
seatCountFromTPPbyProbabilitySum <- function(pendulumFile='./files/AntonyGreenTPP.csv',
pendulum, LaborTPP) {
ALP.Outcome.2010 <- 50.12
swing <- LaborTPP - ALP.Outcome.2010
if(missing(pendulum)) {
pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE)
pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP)
}
# Note: sd in next line comes from analysis of federal elections since 1996 ...
ALP = round( sum( pnorm(pendulum$ALP_TPP + swing, mean=50, sd=3.27459) ) )
pc <- pendulum[pendulum$OTHER == 'OTHER', ]
OTHER = round( sum( pnorm(100 - pc$ALP_TPP - swing, mean=50, sd=3.27459) ) )
COALITION = 150 - ALP - OTHER # Just to ensure it all adds to 150.
# return a data frame - makes it easier to ggplot later
results <- data.frame(Party='Other', Seats=OTHER)
results <- rbind(results, data.frame(Party='Coalition', Seats=COALITION))
results <- rbind(results, data.frame(Party=factor('Labor'), Seats=ALP))
return(results)
}
Update: I have updated the model to better manage how I treat Denison.
seatCountFromTPPbyProbabilitySum <- function(pendulumFile='./files/AntonyGreenTPP.csv',
pendulum, LaborTPP) {
ALP.Outcome.2010 <- 50.12
swing <- LaborTPP - ALP.Outcome.2010
if(missing(pendulum)) {
pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE)
pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP)
}
# Note: sd in next few lines comes from analysis of federal elections since 1996 ...
pc <- pendulum[pendulum$OTHER == 'OTHER', ]
other.raw <- sum( pnorm(100 - pc$ALP_TPP - swing, mean=50, sd=3.27459) )
OTHER <- round( other.raw )
carry <- other.raw - OTHER
# this approach typically favours Labor (probably the right way to go)
ALP <- round( carry + sum( pnorm(pendulum$ALP_TPP + swing, mean=50, sd=3.27459) ) )
COALITION <- 150 - ALP - OTHER # Just to ensure it all adds to 150.
# return a data frame - makes it easier to ggplot later
results <- data.frame(Party='Other', Seats=OTHER)
results <- rbind(results, data.frame(Party='Coalition', Seats=COALITION))
results <- rbind(results, data.frame(Party=factor('Labor'), Seats=ALP))
return(results)
}
From this function we can plot a likely election outcome for a given a swing.
To get a more nuanced understanding of a potential election outcome, I undertake a simple Monte Carlo simulation (typically with 100,000 iterations). This is not a Bayesian MCMC approach. It's just a plain old fashioned MC simulation. The R-code for this procedure is more substantial.
storeResult <- function(N, pendulum, individualSeats=FALSE) {
# Use of R's lexical scoping
# entry sanity checks ...
stopifnot(is.numeric(N))
stopifnot(is.data.frame(pendulum))
stopifnot(N > 0)
seatCount <- nrow(pendulum)
stopifnot(seatCount > 0)
# sanity checking variables
count <- 0
finalised <- FALSE
# where I store the house wins ...
ALP <- rep(0, length=seatCount)
COALITION <- rep(0, length=seatCount)
OTHER <- rep(0, length=seatCount)
CUM_ALP <- rep(0, length=seatCount)
CUM_COALITION <- rep(0, length=seatCount)
# where I keep the seat-by-seat wins
seats <- data.frame(seat=pendulum$SEAT, state=pendulum$STATE, Labor=ALP,
Coalition=COALITION, Other=OTHER)
rememberSim <- function(simResult) {
# - sanity checker
stopifnot(!finalised)
stopifnot(count < N)
count <<- count + 1
stopifnot(length(simResult) == seatCount)
# - overall result
a <- table(simResult)
ALP[ a[names(a)=='ALP'] ] <<- ALP[ a[names(a)=='ALP'] ] + 1
COALITION[ a[names(a)=='COALITION'] ] <<-
COALITION[ a[names(a)=='COALITION'] ] + 1
OTHER[ a[names(a)=='OTHER'] ] <<- OTHER[ a[names(a)=='OTHER'] ] + 1
# - seat by seat result
if(individualSeats) {
seats$Labor <<- ifelse(simResult == 'ALP', seats$Labor + 1,
seats$Labor)
seats$Coalition <<- ifelse(simResult == 'COALITION',
seats$Coalition + 1, seats$Coalition)
seats$Other <<- ifelse(simResult == 'OTHER', seats$Other + 1,
seats$Other)
}
}
finalise <- function() {
# sanity checker
stopifnot(!finalised)
stopifnot(count == N)
ALP <<- ALP / N
COALITION <<- COALITION / N
OTHER <<- OTHER / N
if(individualSeats) {
seats$Labor <<- seats$Labor / N
seats$Coalition <<- seats$Coalition / N
seats$Other <<- seats$Other / N
}
for(i in 1:seatCount) {
CUM_ALP[i] <<- 1 - sum(ALP[1:i])
CUM_COALITION[i] <<- 1 - sum(COALITION[1:i])
}
finalised <<- TRUE
}
results <- function() {
stopifnot(finalised)
data.frame(seatsWon=1:nrow(pendulum), Labor=ALP, Coalition=COALITION,
Other=OTHER)
}
cumResults <- function() {
stopifnot(finalised)
data.frame(seatsWon=1:nrow(pendulum), Labor=CUM_ALP, Coalition=CUM_COALITION)
}
winProbabilities <- function() {
stopifnot(finalised)
win <- (floor(seatCount/2) + 1):seatCount
list(Labor = sum(ALP[win]), Coalition = sum(COALITION[win]))
}
seatResults <- function() {
stopifnot(finalised)
stopifnot(individualSeats)
seats
}
list(rememberSim=rememberSim, finalise=finalise, results=results, cumResults=cumResults,
seatResults=seatResults, winProbabilities=winProbabilities)
}
## -- similate one Federal election
simulateNationaLResult <- function(pendulum, swing) {
rawPrediction <- pendulum$ALP_TPP + swing
probabilisticPrediction <- rawPrediction + rnorm(nrow(pendulum), mean=0, sd=3.27459)
ifelse(probabilisticPrediction >= 50, 'ALP', pendulum$OTHER)
}
## -- run N simulations of one Federal election outcome
simulateOneOutcome <- function(N=100000, pendulumFile='./files/AntonyGreenTPP.csv',
pendulum, LaborTPP, individualSeats=FALSE) {
ALP.Outcome.2010 <- 50.12
swing <- LaborTPP - ALP.Outcome.2010
if(missing(pendulum)) {
pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE)
pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP)
}
r <- storeResult(N, pendulum, individualSeats)
for(i in 1:N) r$rememberSim ( simulateNationaLResult(pendulum, swing) )
r$finalise()
invisible(r)
}
From this simulation, there are a few plots I can make:
I am currently working on a state-level frame for converting a series of state TPP estimates to a national outcome for the House of Representatives.