Sunday, August 26, 2018

Model updates are coming

The change of party leader necessitates the introduction of a discontinuity in the model associated with the transition from Prime Minister Turnbull to Prime Minister Morrison. Because the current model is written in Stan, the process is similar but subtly different from when I did similar things for Rudd to Gillard, Gillard to Rudd, and Abbot to Turnbull; all back in the day when I was modelling in JAGS.

I have also taken the opportunity to repair an element of the model that has nagged at me. Because the polling data is relatively sparse compared with the unit of temporal analysis (days), the derived standard deviation on the day-to-day changes in the model can be under-estimated. Rather than have the model estimate this factor (typically with a median estimate at 0.0007 of the two-party preferred (TPP) vote-share), I have specified a standard deviation on day-to-day changes in the vote share of 0.002. For reference, 100 per cent of the vote-share is represented in the model with the number one.

At the conclusion of the Turnbull government, the TPP model follows. This model has a specified standard deviation on the day-to-day change in vote share set to 0.002.



We can compare this with the moving averages below.



It is arguable that the last poll in the series was an outlier: Ipsos at 45 per cent for the Coalition over the period 15-18 August. Most recent polls had the Coalition on 49 per cent. As the final poll in the series, it has affected the Bayesian analysis more than it has the moving averages. If we re-run the analysis without that final poll, the Turnbull trajectory for the past 9 months has been strong, and a win at the next election for the Coalition did not look inconceivable, even should the trend have continued at a slower rate from now. Obviously, the current polls still had Labor ahead, but with the Coalition recovering lost ground over much of the year to date.




Turning to the primary vote model, I have made a similar adjustment to the standard deviation on day-to-day changes in voting intention. This time from a model derived 0.003 to an imposed 0.009. The unit of analysis for this is less intuitive, being on the centered logit scale, which is highly non-linear. The results follow.





The implied TPP results follow ... they don't have the same drop associated with the most recent Ipsos poll.



I am working on the revised models for when new poll results under the Morrison leadership are released. My updated code for the two models follows. This code provides for a discontinuity- but it is still in a development and test cycle - so not yet finalised.

// STAN: Two-Party Preferred (TPP) Vote Intention Model 
//     - Updated to allow for a discontinuity event. 

data {
    // data size
    int n_polls;
    int n_days;
    int n_houses;
    
    // assumed standard deviation for all polls
    real pseudoSampleSigma;
    
    // we are only going to normalise house effects from first n houses
    int n_core_set;
    
    // poll data
    vector[n_polls] y; // TPP vote share
    int house[n_polls];
    int day[n_polls];
    
    // day of discontinuity event
    int discontinuity;
}

transformed data {
    // Specify sigma in this block if you do not 
    // want the model to derive a value for 
    // the standard deviation on day-to-day
    // changes in the value of hidden_vote_share
    // NOTE: Also requres changes in parameters and 
    // model blocks below.
    
    real sigma = 0.0015;
    
    // Technical note: smaller values of sigma produce 
    // smoother (but less responsive) lines of analysis. 
    // Technical note: because the poll data is relatively 
    // sparse compared with the temporal unit of analysis 
    // (days), the model derived sigma will be understated. 
    // Technical note: from July 16 to July 18 the median 
    // model derived sigma was 0.0007
}

parameters {
    vector[n_days] hidden_vote_share; 
    vector[n_houses] pHouseEffects;
    
    // speicfy sigma here for a model derived value
    //real sigma; // SD of day-to-day change 
}

transformed parameters {
    vector[n_houses] houseEffect;
    
    // house effects sum to zero over the first n_core_set houses
    // this allows you to specify a "trusted" set of pollsters
    
    houseEffect[1:n_core_set] = pHouseEffects[1:n_core_set] - 
        mean(pHouseEffects[1:n_core_set]);
    if(n_core_set < n_houses)
        houseEffect[(n_core_set+1):n_houses] = 
            pHouseEffects[(n_core_set+1):n_houses];
}

model {
    // -- temporal model [this is the hidden state-space model]
    
    // - comment out the next line if sigma is not model derived
    // sigma ~ cauchy(0, 0.0025); // half cauchy prior
    
    // - hidden daily voting intention 
    // NOTE: the priors for the temporal model of hidden voting 
    // intention are weakly informative and therefore should be 
    // selected with some reference to the subsequent data
    hidden_vote_share[1] ~ normal(0.49, 0.0333); // PRIOR
    // update for discontinuity follows
    //hidden_vote_share[2:n_days] ~ 
    //    normal(hidden_vote_share[1:(n_days-1)], sigma);
    hidden_vote_share[discontinuity] ~ normal(0.44, 0.0333); // PRIOR
    hidden_vote_share[2:(discontinuity-1)] ~ 
        normal(hidden_vote_share[1:(discontinuity-2)], sigma);
    hidden_vote_share[(discontinuity+1):n_days] ~ 
        normal(hidden_vote_share[discontinuity:(n_days-1)], sigma);
    
    // -- house effects model
    
    pHouseEffects ~ normal(0, 0.025); // up to +/- 5 percentage points 

    // -- observed data / measurement model
    
    y ~ normal(houseEffect[house] + 
        hidden_vote_share[day], pseudoSampleSigma);
}

// STAN: Primary Vote Intention Model using Centred Logits
//     - Updated to handle the Turnbull to Morrison discontinuity
//     - Updated to use a specified standard deviation on 
//       day-to-day voting intentions.

data {
    // data size
    int<lower=1> n_polls;
    int<lower=1> n_days;
    int<lower=1> n_houses;
    int<lower=1> n_parties;
    int<lower=1> pseudoSampleSize;
    
    // Centreing factors 
    real centreing_factors[n_parties]; // Updated 
    
    // poll data - provided in three variables
    // y in the next line is effectively an array of multinomials
    int<lower=1,upper=pseudoSampleSize> y[n_polls, n_parties]; 
    int<lower=1,upper=n_houses> house[n_polls]; // polling house
    int<lower=1,upper=n_days> poll_day[n_polls]; // day poll taken
    
    // TPP preference flows from previous elections
    row_vector<lower=0,upper=1>[n_parties] preference_flows_2010;
    row_vector<lower=0,upper=1>[n_parties] preference_flows_2013;
    row_vector<lower=0,upper=1>[n_parties] preference_flows_2016;
    
    // day of discontinuity event (from Turnbull to Morrison)
    int<lower=1,upper=n_days> discontinuity;
}

transformed data {
    // NOTE: Specify sigma in this block if you 
    // do not want the model to derive a value 
    // for the standard deviation on day-to-day
    // changes in the value of hidden_vote_share
    // NOTE: Also requires changes in parameters and 
    // model blocks below.
    
    real sigma = 0.009; // Note: logit scale near the origin
    
    // Note: median model derived from Jul-16 to Ju-18 was 
    // around 0.003. 
}
 
parameters {
    // NOTE: because the vote-share from four parties
    // sums to one, we only need to model three as
    // centred logits. We can derive the four party
    // simplex from the three modeled logits
    row_vector[n_days] centredLogits[n_parties-1];
    matrix[n_houses-1, n_parties-1] houseAdjustment;

    // comment next line if model is not finding sigma
    //real<lower=0> sigma; 
}

transformed parameters {
    matrix[n_parties, n_days] hidden_voting_intention; // simplex
    vector<lower=-0.2,upper=0.2>[n_parties] tHouseAdjustment[n_houses];
    row_vector[n_days] tmp; // tmp var used to find the redundant party
    
    // -- house effects - two-direction sum to zero constraints
    for (h in 1:(n_houses-1)) {
        for(p in 1:(n_parties-1)) {
            tHouseAdjustment[h][p] = houseAdjustment[h][p];
        }
    }
    for(p in 1:(n_parties-1)) {
        tHouseAdjustment[n_houses][p] = -sum(col(houseAdjustment, p));
    }
    for(h in 1:n_houses) {
        tHouseAdjustment[h][n_parties] = 0; // get rid of the NAN
        tHouseAdjustment[h][n_parties] = -sum(tHouseAdjustment[h]);
    }
    
    // -- convert centred logits to a simplex of hidden voting intentions
    tmp = rep_row_vector(0, n_days);
    for (p in 1:(n_parties-1)) {
        hidden_voting_intention[p] = inv_logit(centredLogits[p]) + 
            centreing_factors[p];
        tmp = tmp + hidden_voting_intention[p];
    }
    hidden_voting_intention[n_parties] = 1.0 - tmp; 
}

model{
    matrix[n_parties, n_polls] hvi_on_poll_day;

    // -- house effects model
    
    for( h in 1:(n_houses-1) ) {
        houseAdjustment[h] ~ normal(0, 0.015); 
    }
    
    // -- temporal model - [AKA the hidden state-space model]
 
    // - day-to-day standard deviation
    // Note: 0.02 near the centre --> roughly std dev of half a per cent 
    // Comment out the next line if you do not want the model to find sigma
    //sigma ~ normal(0, 0.02); // half normal prior - note: on logit scale;

    // - AR(1) temporal model with a discontinuity
    for(p in 1:(n_parties-1)) { 
        // centred starting point 50% +/- 5% = zero on the logit scale
        centredLogits[p][1] ~ normal(0, 0.15); 
        //centredLogits[p][2:n_days] ~ 
        //  normal(centredLogits[p][1:(n_days-1)], sigma);
        
        // - update for discontinuity
        centredLogits[p][discontinuity] ~ normal(0, 0.15);
        centredLogits[p][2:(discontinuity-1)] ~ 
            normal(centredLogits[p][1:(discontinuity-2)], sigma);
        centredLogits[p][(discontinuity+1):n_days] ~ 
            normal(centredLogits[p][discontinuity:(n_days-1)], sigma);
    }
    
    // -- observed data model [AKA the measurement model]
    
    for(p in 1:n_parties) {
        hvi_on_poll_day[p] = hidden_voting_intention[p][poll_day];
    }
    for(poll in 1:n_polls) {
        // note matrix transpose in the next statement ...
        y[poll] ~ multinomial(to_vector(hvi_on_poll_day'[poll]) + 
            tHouseAdjustment[house[poll]]);
    }
}

generated quantities {
    // aggregated TPP estimates based on past preference flows
    vector [n_days] tpp2010;
    vector [n_days] tpp2013;
    vector [n_days] tpp2016;

    for (d in 1:n_days){
        // note matrix transpose in next three lines
        tpp2010[d] = sum(hidden_voting_intention'[d] .* 
            preference_flows_2010);
        tpp2013[d] = sum(hidden_voting_intention'[d] .* 
            preference_flows_2013);
        tpp2016[d] = sum(hidden_voting_intention'[d] .* 
            preference_flows_2016);
    }
}

No comments:

Post a Comment