- The first model estimates the compositional voting proportions for all but one party as centred logits (where a logit is the logarithm of the odds, and the centring is chosen so that each logit sits close to 0 on the logit scale). The temporal model is a Gaussian autoregressive process for each of these centred logits, similar to the autoregressive process used in the two-party-preferred (TPP) vote share model. To get meaningful primary poll estimates, the n_parties-1 centred logits are converted to voting share proportions for all the parties, with the final party's share taken as the remainder (a short sketch of this conversion follows this list). This model takes around 500 seconds (a little over 8 minutes) to produce an aggregate poll.
- The second model is a Stan version of the Dirichlet process model I developed for the last Federal election. This model derives the party vote share on a particular day through a Dirichlet distribution over the previous day's vote share multiplied by a transmission-strength value. The larger the transmission strength, the less change from one day to the next (the second sketch below simulates this step). This model takes about 1800 seconds (30 minutes) to produce an aggregate poll, which is a long time.
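To make the first model's construction concrete, here is a minimal Python sketch (not the production code) of the conversion from centred logits to vote shares: each of the first n_parties-1 centred logits is passed through the inverse logit and shifted by its centreing factor, and the last party's share is whatever remains so that the shares form a simplex. The example values are made up.

import numpy as np

def shares_from_centred_logits(centred_logits, centreing_factors):
    # Mirrors the transformed-parameters block of the centred-logit Stan model:
    # share_p = inv_logit(logit_p) + centreing_factor_p, with the last party
    # taking the remainder so the shares sum to one.
    inv_logit = 1.0 / (1.0 + np.exp(-np.asarray(centred_logits)))
    leading = inv_logit + np.asarray(centreing_factors)
    return np.append(leading, 1.0 - leading.sum())

# logits near zero map to shares near 0.5 plus the centreing factor
print(shares_from_centred_logits([0.0, -0.05, 0.02], [-0.11, -0.14, -0.40]))

And a quick way to see what the second model's transmission strength does is to simulate the day-to-day Dirichlet step outside Stan. In this sketch (the strength values are illustrative only), tomorrow's vote share is drawn from a Dirichlet distribution whose concentration parameter is today's share multiplied by the transmission strength; a larger strength keeps the series closer to yesterday's value.

import numpy as np

rng = np.random.default_rng(7)
today = np.array([0.39, 0.36, 0.10, 0.15])   # Coalition, Labor, Greens, Others

for strength in (500, 5_000, 50_000):
    # one Dirichlet step of the temporal model: concentration = today's share * strength
    tomorrow = rng.dirichlet(today * strength)
    print(f"strength={strength:>6,}  largest one-day move = {abs(tomorrow - today).max():.4f}")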
Both models use the multinomial distribution to fit the estimated hidden voting intention to the polling data. In both models this hidden vote share is expressed as an n_parties simplex: all values on the simplex lie between 0 and 1, and collectively they sum to 1. The data from each opinion poll are four integer counts treated as a multinomial observation, and these counts sum to the pseudo sample size of 2000. The four party groups are the Coalition, Labor, the Greens and others.
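For example, a published poll's primary percentages might be rescaled to the four-integer multinomial observation along the following lines (a sketch only; the actual data preparation is not shown in this post).

import numpy as np

def poll_to_multinomial(percentages, pseudo_sample_size=2000):
    # Rescale reported primary percentages (Coalition, Labor, Greens, Others)
    # to integer counts that sum exactly to the pseudo sample size.
    shares = np.asarray(percentages, dtype=float)
    shares = shares / shares.sum()                # force the shares to sum to one
    counts = np.floor(shares * pseudo_sample_size).astype(int)
    counts[np.argmax(counts)] += pseudo_sample_size - counts.sum()   # absorb any rounding shortfall
    return counts

print(poll_to_multinomial([39, 36, 10, 15]))      # [780 720 200 300]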
Both models treat house effects in the same way. The house effects for each polling house sum to zero across the four parties. The house effects for each party also sum to zero across the (currently) five houses.
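The two-direction sum-to-zero constraint can also be sketched outside Stan: only an (n_houses-1) by (n_parties-1) block of house effects is freely estimated, and the last row and column are filled in so that every row (house) and every column (party) sums to zero. The numbers below are invented for illustration.

import numpy as np

def complete_house_effects(free_block):
    # Expand an (n_houses-1, n_parties-1) matrix of free house effects so that
    # each house's effects sum to zero across parties and each party's effects
    # sum to zero across houses, as in both Stan models.
    free_block = np.asarray(free_block, dtype=float)
    full = np.zeros((free_block.shape[0] + 1, free_block.shape[1] + 1))
    full[:-1, :-1] = free_block
    full[-1, :-1] = -free_block.sum(axis=0)   # last house balances each party column
    full[:, -1] = -full[:, :-1].sum(axis=1)   # last party balances each house row
    return full

he = complete_house_effects([[ 0.010, -0.005,  0.002],
                             [-0.004,  0.006, -0.001],
                             [ 0.003, -0.002,  0.000],
                             [ 0.002,  0.001, -0.003]])   # 4 of the 5 houses are free
print(he.sum(axis=0), he.sum(axis=1))                     # both are (near) zero vectors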
Both models estimate the extent to which the voting patterns on any day are similar to the voting patterns on the previous day.
Let's look at the output from each model for each party grouping. The high, low and end-point median samples are annotated. You can see the models produce similar output. The median estimate from the Dirichlet model is smoother, and its sample distribution is wider.
We can also compare the house effects from each model. Again these are similar.
We can also compare the TPP estimates from the models using previous election preference flows.
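The TPP calculation in the generated-quantities block of each model is simply a weighted sum of the primary vote shares, using each party's historical preference flow to one of the major parties. A small sketch, with invented flow values rather than the actual election figures:

import numpy as np

def tpp_estimate(primary_shares, preference_flows):
    # Two-party-preferred share implied by the primary shares; preference_flows[p]
    # is the fraction of party p's vote flowing to the chosen major party
    # (1.0 for that party itself, 0.0 for the other major party).
    return float(np.dot(primary_shares, preference_flows))

primaries = [0.39, 0.36, 0.10, 0.15]          # Coalition, Labor, Greens, Others
flows_to_coalition = [1.0, 0.0, 0.18, 0.49]   # invented for illustration
print(f"Coalition TPP estimate: {tpp_estimate(primaries, flows_to_coalition):.3f}")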
We can compare these TPP estimates with the estimate from the TPP Aggregation model.
What cannot be compared directly - because the models are so different - is the way each model ensures that voting intention on one day is much like that on the previous day. In the centred logit model we have a model-estimated standard deviation (walkSigma) for the day-to-day step. In the Dirichlet model, we have a model-estimated transmissionFactor, the inverse of which gives the transmission strength.
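As a rough rule of thumb (and only while the centred logits stay near zero), a daily standard deviation on the logit scale can be translated into vote-share points by multiplying by the slope of the inverse logit at 0.5, which is 0.25. The sketch below shows that arithmetic; it is an approximation, not something either model estimates directly.

# translate a logit-scale daily standard deviation into approximate vote-share points,
# using the slope of the inverse logit at 0.5 (which is 0.25); approximation only
walk_sigma_logit = 0.02                        # the prior scale used in the centred-logit model
daily_sd_share = 0.25 * walk_sigma_logit
print(f"roughly {daily_sd_share * 100:.2f} percentage points per day")   # ~0.50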
My code for these models is still very much in development. It has not been vectorised. And in places it is just plain ugly. I will spend some time tidying the code, before I add it to my page on the models.
// STAN: Primary Vote Intention Model using Centred Logits

data {
    // data size
    int<lower=1> n_polls;
    int<lower=1> n_days;
    int<lower=1> n_houses;
    int<lower=1> n_parties;
    int<lower=1> pseudoSampleSize;

    // Centreing factors
    real centreing_factors[n_parties-1];

    // poll data
    int<lower=1,upper=pseudoSampleSize> y[n_polls, n_parties]; // poll data multinomials
    int<lower=1,upper=n_houses> house[n_polls];                // polling house
    int<lower=1,upper=n_days> poll_day[n_polls];               // day on which polling occurred

    // TPP preference flows
    row_vector<lower=0,upper=1>[n_parties] preference_flows_2010;
    row_vector<lower=0,upper=1>[n_parties] preference_flows_2013;
    row_vector<lower=0,upper=1>[n_parties] preference_flows_2016;
}

parameters {
    real<lower=0> walkSigma;
    row_vector[n_days] centredLogits[n_parties-1];
    matrix[n_houses-1, n_parties-1] houseAdjustment;
}

transformed parameters {
    matrix[n_parties, n_days] hidden_voting_intention;
    vector<lower=-0.2,upper=0.2>[n_parties] tHouseAdjustment[n_houses];
    row_vector[n_days] tmp;

    // house effects - two-direction sum to zero constraints
    for (h in 1:(n_houses-1))
        for (p in 1:(n_parties-1))
            tHouseAdjustment[h][p] = houseAdjustment[h][p];
    for (p in 1:(n_parties-1))
        tHouseAdjustment[n_houses][p] = -sum(col(houseAdjustment, p));
    for (h in 1:n_houses) {
        tHouseAdjustment[h][n_parties] = 0; // get rid of the NAN
        tHouseAdjustment[h][n_parties] = -sum(tHouseAdjustment[h]);
    }

    // convert centred logits to a simplex of hidden voting intentions
    tmp = rep_row_vector(0, n_days);
    for (p in 1:(n_parties-1)) {
        hidden_voting_intention[p] = inv_logit(centredLogits[p]) + centreing_factors[p];
        tmp = tmp + hidden_voting_intention[p];
    }
    hidden_voting_intention[n_parties] = 1.0 - tmp;
}

model {
    matrix[n_parties, n_polls] hvi_on_poll_day;

    // -- house effects model
    for (p in 1:(n_houses-1))
        houseAdjustment[p] ~ normal(0, 0.015);

    // -- temporal model - all done on the centred logit scale
    // Note: 0.02 near the centre --> roughly std dev of half a per cent
    walkSigma ~ normal(0, 0.02); // half normal prior - note: on logit scale
    for (p in 1:(n_parties-1)) {
        centredLogits[p][1] ~ normal(0, 0.15); // centred starting point 50% +/- 5%
        centredLogits[p][2:n_days] ~ normal(centredLogits[p][1:(n_days-1)], walkSigma);
    }

    // -- observed data model
    for (p in 1:n_parties)
        hvi_on_poll_day[p] = hidden_voting_intention[p][poll_day];
    for (poll in 1:n_polls)
        // note matrix transpose in the next statement ...
        y[poll] ~ multinomial(to_vector(hvi_on_poll_day'[poll]) + tHouseAdjustment[house[poll]]);
}

generated quantities {
    // aggregated TPP estimates based on past preference flows
    vector [n_days] tpp2010;
    vector [n_days] tpp2013;
    vector [n_days] tpp2016;

    for (d in 1:n_days) {
        // note matrix transpose in next three lines
        tpp2010[d] = sum(hidden_voting_intention'[d] .* preference_flows_2010);
        tpp2013[d] = sum(hidden_voting_intention'[d] .* preference_flows_2013);
        tpp2016[d] = sum(hidden_voting_intention'[d] .* preference_flows_2016);
    }
}
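For completeness, a model like this might be compiled and run from Python along the following lines. This is only a sketch: the file name, the toy poll data, the centreing factors and the preference-flow values are all invented for illustration, and the model above uses the older Stan array syntax, so a PyStan 2 install is assumed.

import numpy as np
import pystan   # PyStan 2.x assumed; the model uses the older Stan array syntax

pseudo_n = 2000
polls = np.array([[0.39, 0.36, 0.10, 0.15],    # two made-up polls:
                  [0.38, 0.37, 0.11, 0.14]])   # Coalition, Labor, Greens, Others
y = np.round(polls * pseudo_n).astype(int)     # each row sums to the pseudo sample size

data = dict(
    n_polls=2, n_days=10, n_houses=2, n_parties=4,
    pseudoSampleSize=pseudo_n,
    centreing_factors=[-0.11, -0.14, -0.40],        # chosen so the centred logits sit near zero
    y=y, house=[1, 2], poll_day=[3, 8],
    preference_flows_2010=[1.0, 0.0, 0.20, 0.42],   # illustrative flows to the Coalition
    preference_flows_2013=[1.0, 0.0, 0.17, 0.47],
    preference_flows_2016=[1.0, 0.0, 0.18, 0.49],
)

sm = pystan.StanModel(file='primary_centred_logits.stan')   # hypothetical file name
fit = sm.sampling(data=data, iter=2000, chains=4)
print(fit)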
// STAN: Primary Vote Intention Model using a Dirichlet process

data {
    // data size
    int<lower=1> n_polls;
    int<lower=1> n_days;
    int<lower=1> n_houses;
    int<lower=1> n_parties;

    // key variables
    int<lower=1> sampleSize; // maximum sample size for y

    // give a rough idea of a starting point ...
    simplex[n_parties] startingPoint;    // rough guess at series starting point
    int<lower=1> startingPointCertainty; // strength of guess - small number is vague

    // poll data
    int<lower=0,upper=sampleSize> y[n_polls, n_parties]; // a multinomial
    int<lower=1,upper=n_houses> house[n_polls];          // polling house
    int<lower=1,upper=n_days> poll_day[n_polls];         // day on which polling occurred

    // TPP preference flows
    vector<lower=0,upper=1>[n_parties] preference_flows_2010;
    vector<lower=0,upper=1>[n_parties] preference_flows_2013;
    vector<lower=0,upper=1>[n_parties] preference_flows_2016;
}

parameters {
    simplex[n_parties] hidden_voting_intention[n_days];
    matrix<lower=-0.06,upper=0.06>[n_houses-1, n_parties-1] houseAdjustment;
    real<lower=0> transmissionFactor;
}

transformed parameters {
    vector<lower=-0.2,upper=0.2>[n_parties] tHouseAdjustment[n_houses];
    real<lower=1> transmissionStrength; // AR(1) strength: higher is stronger

    // calculate transmissionStrength
    transmissionStrength = 1/transmissionFactor;

    // make the house effects sum to zero in two directions
    for (h in 1:(n_houses-1))
        for (p in 1:(n_parties-1))
            tHouseAdjustment[h][p] = houseAdjustment[h][p];
    for (p in 1:(n_parties-1))
        tHouseAdjustment[n_houses][p] = -sum(col(houseAdjustment, p));
    for (h in 1:n_houses) {
        tHouseAdjustment[h][n_parties] = 0; // get rid of the NAN
        tHouseAdjustment[h][n_parties] = -sum(tHouseAdjustment[h]);
    }
}

model {
    // -- house effects model
    for (p in 1:(n_houses-1))
        houseAdjustment[p] ~ normal(0, 0.01);

    // -- temporal model
    transmissionFactor ~ normal(0, 0.005); // a half normal prior
    hidden_voting_intention[1] ~ dirichlet(startingPoint * startingPointCertainty);
    for (d in 2:n_days)
        hidden_voting_intention[d] ~ dirichlet(hidden_voting_intention[d-1] * transmissionStrength);

    // -- observed data model
    for (poll in 1:n_polls)
        y[poll] ~ multinomial(hidden_voting_intention[poll_day[poll]] + tHouseAdjustment[house[poll]]);
}

generated quantities {
    // aggregated TPP estimates based on past preference flows
    vector [n_days] tpp2010;
    vector [n_days] tpp2013;
    vector [n_days] tpp2016;

    for (d in 1:n_days) {
        tpp2010[d] = sum(hidden_voting_intention[d] .* preference_flows_2010);
        tpp2013[d] = sum(hidden_voting_intention[d] .* preference_flows_2013);
        tpp2016[d] = sum(hidden_voting_intention[d] .* preference_flows_2016);
    }
}