I was wondering whether the difference between my TPP aggregation built from TPP polling and the one built from primary vote polling was an artifact of the preference flows that the pollsters apply to their own primary vote estimates.

This is a question for a simple multiple regression against the formula:

TPP_estimate = coalition_pv + α × green_pv + β × other_pv

In English, the Coalition's two-party preferred vote-share estimate comprises the Coalition primary vote, plus a proportion of the Greens' primary vote (α), and a proportion of the other parties' primary vote (β). In this equation, α and β are both values between 0 and 1 (on the continuum from no flow of preferences through to 100% flow). I decided to fit this regression using a simple Bayesian model, as follows.

model {
    ## -- preference flows
    for(poll in 1:NUMPOLLS) { # for each poll result - rows
        yhat[poll] <- pv_coalition[poll] +
            (alpha * pv_greens[poll]) +
            (beta * pv_other[poll])
        y[poll] ~ dnorm(yhat[poll], tau)
    }
    ## priors
    alpha ~ dunif(0.0, 1.0)
    beta ~ dunif(0.0, 1.0)
    sigma ~ dunif(0.001, 0.1)
    tau <- pow(sigma, -2)
}
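
For readers without JAGS to hand, the same model can be approximated on a grid. What follows is a minimal Python sketch, not the code behind the charts. It assumes vote shares are expressed as proportions, it runs on simulated polls (`simulate_polls` and every number passed to it are made up for illustration), and it keeps the same uniform priors on alpha, beta and sigma.

```python
import math
import random


def simulate_polls(n, alpha, beta, sigma, seed=1):
    """Generate hypothetical polls as (coalition, greens, other, tpp) proportions."""
    rng = random.Random(seed)
    polls = []
    for _ in range(n):
        coalition = rng.uniform(0.38, 0.46)
        greens = rng.uniform(0.07, 0.15)
        other = rng.uniform(0.06, 0.18)
        tpp = coalition + alpha * greens + beta * other + rng.gauss(0.0, sigma)
        polls.append((coalition, greens, other, tpp))
    return polls


def grid_posterior_means(polls, steps=100, sigma_points=50):
    """Posterior means of alpha and beta under uniform priors, by grid approximation."""
    n = len(polls)
    grid = [i / steps for i in range(steps + 1)]  # alpha and beta on [0, 1]
    sigmas = [0.001 + i * (0.1 - 0.001) / (sigma_points - 1)
              for i in range(sigma_points)]       # sigma on [0.001, 0.1]
    log_weight = {}
    for a in grid:
        for b in grid:
            sse = sum((y - (c + a * g + b * o)) ** 2 for c, g, o, y in polls)
            # Normal log-likelihood at each sigma, then marginalise sigma (log-sum-exp).
            terms = [-n * math.log(s) - sse / (2.0 * s * s) for s in sigmas]
            peak = max(terms)
            log_weight[(a, b)] = peak + math.log(sum(math.exp(t - peak) for t in terms))
    top = max(log_weight.values())
    weight = {k: math.exp(v - top) for k, v in log_weight.items()}
    total = sum(weight.values())
    alpha_mean = sum(a * w for (a, _), w in weight.items()) / total
    beta_mean = sum(b * w for (_, b), w in weight.items()) / total
    return alpha_mean, beta_mean


# Simulated data with known flows, to check the sketch recovers them.
polls = simulate_polls(60, alpha=0.17, beta=0.53, sigma=0.003)
alpha_hat, beta_hat = grid_posterior_means(polls)
print(f"alpha ~ {alpha_hat:.2f}, beta ~ {beta_hat:.2f}")
```

A grid is only practical here because there are just three parameters; a sampler such as JAGS scales better and returns full posterior draws.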

I undertook the analysis for each polling house, using their polling data since the last Federal election, with the following results.

On both charts, I have marked with a vertical gray line the preference flow I use in my models (0.1697 for the Greens and 0.533 for other parties). I used Antony Green's earlier work to set my preference flows within my models.
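
To make those two flows concrete, here is a small Python sketch of the TPP formula with 0.1697 and 0.533 as defaults. The function name and the example poll are hypothetical; vote shares are assumed to be proportions.

```python
def implied_coalition_tpp(coalition_pv, greens_pv, other_pv,
                          greens_flow=0.1697, other_flow=0.533):
    """Coalition TPP implied by primary votes (all values as proportions)."""
    return coalition_pv + greens_flow * greens_pv + other_flow * other_pv


# Hypothetical poll: Coalition 40%, Greens 11%, others 12%.
tpp = implied_coalition_tpp(0.40, 0.11, 0.12)
print(round(tpp, 4))
```

With these made-up primaries the implied Coalition TPP comes out at roughly 48.3%.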

The gray line falls within the 95% credibility interval for each of the polling houses. Therefore, I cannot argue that any of the polling houses are using preference flows different from the ones I am using. If the pollsters are using different preference flows, this test did not demonstrate that.
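
The check described above amounts to asking whether a reference value sits between the 2.5th and 97.5th percentiles of the posterior draws. A minimal Python sketch follows; the draws are a made-up, evenly spaced stand-in for sampler output, and the function names are my own.

```python
import statistics


def credible_interval_95(samples):
    """Central 95% interval from posterior samples (2.5th and 97.5th percentiles)."""
    cuts = statistics.quantiles(samples, n=40)  # 39 cut points, 2.5% apart
    return cuts[0], cuts[-1]


def within_interval(samples, reference):
    """True if the reference value lies inside the central 95% interval."""
    low, high = credible_interval_95(samples)
    return low <= reference <= high


# Hypothetical posterior draws for the Greens flow, spread evenly over 0.10 to 0.30.
draws = [0.10 + 0.0002 * i for i in range(1001)]
print(within_interval(draws, 0.1697))
```

Note what the test can and cannot show: a reference value inside the interval is consistent with the pollster using that flow, but it does not prove the flows are identical.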

### Update

I have re-run this analysis with uninformative, rather than weakly informative, priors. The charts and model have been updated. The width of the credibility intervals for Nielsen and Ipsos is unsurprising, as they have only 7 and 6 observations respectively.

Very interesting exercise. I understand that some pollsters apply preference distributions from the last election to their results on a state by state basis rather than nationally. If, for instance, a higher share of the Greens' national vote comes from Victoria than in 2013, then the Green preference flow to the Coalition will be slightly lower by such methods. However the differences created by this are so small I wouldn't expect a regression to catch them on a pollster by pollster basis.
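
The size of that state-by-state effect can be illustrated with a vote-weighted average of regional flows. This is only a sketch in Python: the two regions, their flows, the Greens vote shares and the 11% national Greens vote are all hypothetical numbers chosen to show the arithmetic, not actual election data.

```python
def national_flow(greens_vote_share_by_region, flow_by_region):
    """Vote-weighted average of regional Greens-to-Coalition preference flows."""
    total = sum(greens_vote_share_by_region.values())
    weighted = sum(share * flow_by_region[region]
                   for region, share in greens_vote_share_by_region.items())
    return weighted / total


# All numbers below are hypothetical, chosen only to show the size of the effect.
flows = {"VIC": 0.15, "REST": 0.18}
last_election = national_flow({"VIC": 0.25, "REST": 0.75}, flows)
current_polls = national_flow({"VIC": 0.30, "REST": 0.70}, flows)
shift_in_tpp = 0.11 * (current_polls - last_election)  # Greens on 11% nationally
print(round(last_election, 4), round(current_polls, 4))
```

In this made-up case the national flow moves from 0.1725 to 0.171, which shifts the Coalition TPP by under two hundredths of a percentage point.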

My equation uses a small fourth term which is a constant term for each parliamentary term (currently 0.14%) to account for the influence of three-cornered contests in reducing the Coalition 2PP. It would be possible to get fancy with this concept using the state-by-state distributions of the Coalition votes and the size of the Nationals vote, but for very little gain in accuracy.

From last August on I've been using the 2PPs implied by the published primaries to modify the pollster's published 2PP (mainly to avoid throwing away information useful for estimating what the 2PP was before the pollster rounded it). In this time I've ended up adjusting Morgan's published 2PP in the Coalition's favour 15 times, in Labor's favour 2 times and not adjusting it at all 3 times.
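
One possible reading of that adjustment, sketched in Python: treat a published 2PP as rounded to the nearest whole point, so the pre-rounding figure lies within half a point of it, and move toward the implied 2PP without leaving that range. The exact rule used is not stated above, so the function, its `half_width` parameter and the example figures are all assumptions for illustration.

```python
def adjust_published_tpp(published_tpp, implied_tpp, half_width=0.5):
    """Move a rounded published 2PP toward the implied 2PP, within the rounding range."""
    low, high = published_tpp - half_width, published_tpp + half_width
    return max(low, min(high, implied_tpp))


# Hypothetical examples, in percentage points.
print(adjust_published_tpp(48, 48.6))  # implied value above the rounding range
print(adjust_published_tpp(48, 47.8))  # implied value inside the range
print(adjust_published_tpp(48, 48.0))  # implied value equals the published one
```

Under this reading the adjustment can never exceed half a point, so the published figure still anchors the estimate.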