## Saturday, March 10, 2018

### Two aggregation models for primary vote share

I have developed two different Stan models for aggregating the primary vote opinion polls.
• The first model estimates the compositional voting proportions for all but one party as centered logits (where logits are logarithms of the odds ratio, which are assumed to have a mean centred close to 0 on the logit scale). The temporal model is a Gaussian autoregressive process for each of these centred logits which is similar to the autoregressive process used in the two party preferred (TPP) vote share model. To get meaningful primary poll estimates, the n_parties-1 centred logits are converted to voting share proportions for all the parties. This model takes around 500 seconds (9 minutes) to produce an aggregate poll.
• The second model is a Stan version of the Dirichlet process model I developed for the last Federal election. This model derives the party vote share on a particular day through a Dirichlet distribution over the previous day's vote share multiplied by a transmission-strength value. The larger the transmission strength value, the the less change from one day to the next. This model takes about 1800 seconds (30 minutes) to produce an aggregate poll, which is a long time.

Both models use the multinomial distribution to fit the estimated hidden voting intention to the polling data. This hidden vote share in both models is expressed as an n_parties simplex; where all values on the simplex are between 0 and 1, and collectively they sum to 1. The data from each opinion poll is four integers in a multinomial, where all values in this multinomial sum to the pseudo-sample-size of 2000. The four party groups are: the Coalition, Labor, Greens and others.

Both models treat house effects in the same way. The house effects for each polling house sum to zero across the four parties. The house effects for each party also sum to zero across the (currently) five houses.

Both models derive the best estimate for the extent to which the voting patterns on any day are similar to the voting patterns on the previous day.

Let's look at the output from each model for each party grouping. The high, low and end point median samples are annotated.You can see the models produce similar output. The median estimate from the Dirichlet model is smoother, and the sample distribution is wider.

We can also compare the house effects from each model. Again these are similar.

We can also compare the TPP estimates from the models using previous election preference flows.

We can compare these TPP estimates with the estimate from the TPP Aggregation model.

What cannot be compared - because the models are so different - is the degree to which both the models ensure the voting intention on one day is much like the next. In the centred logit model we have a model estimated standard deviation (walkSigma) from one day to the next. In the Dirichlet model, we have a model estimated transmissionFactor, the inverse of which provides the transmission strength.

My code for these models is still very much in development. It has not been vectorised. And in places it is just plain ugly. I will spend some time tidying the code, before I add it to my page on the models.

// STAN: Primary Vote Intention Model using Centred Logits

data {
// data size
int<lower=1> n_polls;
int<lower=1> n_days;
int<lower=1> n_houses;
int<lower=1> n_parties;
int<lower=1> pseudoSampleSize;

// Centreing factors
real centreing_factors[n_parties-1];

// poll data
int<lower=1,upper=pseudoSampleSize> y[n_polls, n_parties]; // poll data multinomials
int<lower=1,upper=n_houses> house[n_polls]; // polling house
int<lower=1,upper=n_days> poll_day[n_polls]; // day on which polling occurred

// TPP preference flows
row_vector<lower=0,upper=1>[n_parties] preference_flows_2010;
row_vector<lower=0,upper=1>[n_parties] preference_flows_2013;
row_vector<lower=0,upper=1>[n_parties] preference_flows_2016;
}

parameters {
real<lower=0> walkSigma;
row_vector[n_days] centredLogits[n_parties-1];
}

transformed parameters {
matrix[n_parties, n_days] hidden_voting_intention;
row_vector[n_days] tmp;

// house effects - two-direction sum to zero constraints
for (h in 1:(n_houses-1))
for(p in 1:(n_parties-1))
for(p in 1:(n_parties-1))
for(h in 1:n_houses) {
tHouseAdjustment[h][n_parties] = 0; // get rid of the NAN
}

// convert centred logits to a simplex of hidden voting intentions
tmp = rep_row_vector(0, n_days);
for (p in 1:(n_parties-1)) {
hidden_voting_intention[p] = inv_logit(centredLogits[p]) +
centreing_factors[p];
tmp = tmp + hidden_voting_intention[p];
}
hidden_voting_intention[n_parties] = 1.0 - tmp;
}

model{
matrix[n_parties, n_polls] hvi_on_poll_day;

// -- house effects model
for( p in 1:(n_houses-1) )

// -- temporal model - all done on the centred logit scale
// Note: 0.02 near the centre --> roughly std dev of half a per cent
walkSigma ~ normal(0, 0.02); // half normal prior - note: on logit scale;
for(p in 1:(n_parties-1)) {
centredLogits[p][1] ~ normal(0, 0.15); // centred starting point 50% +/- 5%
centredLogits[p][2:n_days] ~ normal(centredLogits[p][1:(n_days-1)], walkSigma);
}

// -- observed data model
for(p in 1:n_parties)
hvi_on_poll_day[p] = hidden_voting_intention[p][poll_day];
for(poll in 1:n_polls)
// note matrix transpose in the next statement ...
y[poll] ~ multinomial(to_vector(hvi_on_poll_day'[poll]) +
}

generated quantities {
// aggregated TPP estimates based on past preference flows
vector [n_days] tpp2010;
vector [n_days] tpp2013;
vector [n_days] tpp2016;

for (d in 1:n_days){
// note matrix transpose in next three lines
tpp2010[d] = sum(hidden_voting_intention'[d] .* preference_flows_2010);
tpp2013[d] = sum(hidden_voting_intention'[d] .* preference_flows_2013);
tpp2016[d] = sum(hidden_voting_intention'[d] .* preference_flows_2016);
}
}

// STAN: Primary Vote Intention Model using a Dirichlet process

data {
// data size
int<lower=1> n_polls;
int<lower=1> n_days;
int<lower=1> n_houses;
int<lower=1> n_parties;

// key variables
int<lower=1> sampleSize; // maximum sample size for y

// give a rough idea of a staring point ...
simplex[n_parties] startingPoint; // rough guess at series starting point
int<lower=1> startingPointCertainty; // strength of guess - small number is vague

// poll data
int<lower=0,upper=sampleSize> y[n_polls, n_parties]; // a multinomial
int<lower=1,upper=n_houses> house[n_polls]; // polling house
int<lower=1,upper=n_days> poll_day[n_polls]; // day polling occured

// TPP preference flows
vector<lower=0,upper=1>[n_parties] preference_flows_2010;
vector<lower=0,upper=1>[n_parties] preference_flows_2013;
vector<lower=0,upper=1>[n_parties] preference_flows_2016;
}

parameters {
simplex[n_parties] hidden_voting_intention[n_days];
real<lower=0> transmissionFactor;
}

transformed parameters {
real<lower=1> transmissionStrength; // AR(1) strength: higher is stronger

// calculate transmissionStrength
transmissionStrength = 1/transmissionFactor;

// make the house effects sum to zero in two directions
for (h in 1:(n_houses-1))
for(p in 1:(n_parties-1))
for(p in 1:(n_parties-1))
for(h in 1:n_houses) {
tHouseAdjustment[h][n_parties] = 0; // get rid of the NAN
}
}

model{
// -- house effects model
for( p in 1:(n_houses-1) )

// -- temporal model
transmissionFactor ~ normal(0, 0.005); // a half normal prior
hidden_voting_intention[1] ~ dirichlet(startingPoint * startingPointCertainty);
for (d in 2:n_days)
hidden_voting_intention[d] ~ dirichlet(hidden_voting_intention[d-1] *
transmissionStrength);

// -- observed data model
for(poll in 1:n_polls)
y[poll] ~ multinomial(hidden_voting_intention[poll_day[poll]] +
}

generated quantities {
// aggregated TPP estimates based on past preference flows
vector [n_days] tpp2010;
vector [n_days] tpp2013;
vector [n_days] tpp2016;

for (d in 1:n_days){
tpp2010[d] = sum(hidden_voting_intention[d] .* preference_flows_2010);
tpp2013[d] = sum(hidden_voting_intention[d] .* preference_flows_2013);
tpp2016[d] = sum(hidden_voting_intention[d] .* preference_flows_2016);
}
}