Friday, December 6, 2019

Aggregated attitudinal polling

At this point in the election cycle, only Newspoll is publishing primary vote share and two-party preferred population estimates, so there is nothing to aggregate across polling houses when it comes to voting intention. However, both Essential and Newspoll are publishing attitudinal polling, so I decided to build a Dirichlet-multinomial process model to see what trends have emerged in the attitudinal polling since the 2019 election.
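The generative idea behind a Dirichlet-multinomial process can be sketched in a few lines of Python (this is an illustration only, not the Stan model below; the proportions and constants here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(7)

# A latent simplex: the "true" population proportions across categories
theta = np.array([0.45, 0.27, 0.28])  # made-up values for illustration

# Tomorrow's latent state is a Dirichlet draw centred on today's state;
# a large concentration parameter allows only small day-to-day movement
transmission_strength = 50_000
theta_next = rng.dirichlet(theta * transmission_strength)

# An observed poll is then a multinomial draw from the latent simplex
pseudo_sample_size = 1000
poll = rng.multinomial(pseudo_sample_size, theta_next)

print(theta_next)        # stays close to theta
print(poll, poll.sum())  # category counts summing to 1000
```

The multinomial step captures sampling noise in each poll, while the Dirichlet step lets the underlying proportions drift slowly from day to day.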

We will look at the output from the model first, before turning to the model itself.

Let's begin with the preferred prime minister polling. We see a small dip in the proportion of the population preferring the Prime Minister over the period (from 45.4 to 44.9 per cent). The Opposition Leader has improved a little over the period (from 26 to 29 per cent), but he is much less preferred than the Prime Minister. The "undecideds" have declined a little (from 29 to 26 per cent).

The median lines from the above charts can be combined in a single chart as follows.

The model allows us to compare house effects in preferred Prime Minister polling. Those polled by Essential are more likely to express a preference on their preferred prime minister compared with those polled by Newspoll.

The next set of charts is about satisfaction with the Prime Minister's performance. Satisfaction with the Prime Minister has declined from 48 to 45 per cent. Dissatisfaction has increased from 37 to 44 per cent.

Satisfaction with the Opposition Leader has improved from 37 to 38 per cent. Dissatisfaction has increased from 30 to 36 per cent. Undecideds have decreased from 32 to 25 per cent.

In summary, both leaders have seen a decline in net satisfaction. On this metric, the Prime Minister has fallen further than the Opposition Leader. The Opposition Leader ends the year with a higher net satisfaction rating compared with the Prime Minister.
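The arithmetic behind this summary, taking net satisfaction as satisfied minus dissatisfied and using the percentages quoted above:

```python
def net_satisfaction(satisfied, dissatisfied):
    """Net satisfaction in percentage points."""
    return satisfied - dissatisfied

# Prime Minister: satisfaction 48 -> 45, dissatisfaction 37 -> 44
pm_start, pm_end = net_satisfaction(48, 37), net_satisfaction(45, 44)
# Opposition Leader: satisfaction 37 -> 38, dissatisfaction 30 -> 36
ol_start, ol_end = net_satisfaction(37, 30), net_satisfaction(38, 36)

print(pm_start, pm_end)  # 11 1 : a fall of 10 points
print(ol_start, ol_end)  # 7 2  : a fall of 5 points
```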

The model that produced the above charts is as follows.

// STAN: Simplex Time Series Model
//  using a Dirichlet-multinomial process

data {
    // data size
    int<lower=1> n_polls;
    int<lower=1> n_days;
    int<lower=1> n_houses;
    int<lower=1> n_categories;
    // key variables
    int<lower=1> pseudoSampleSize; // maximum sample size for y
    real<lower=1> transmissionStrength;
    // give a rough idea of a starting point ...
    simplex[n_categories] startingPoint; // rough guess at series starting point
    int<lower=1> startingPointCertainty; // strength of guess - small number is vague
    // poll data
    int<lower=0,upper=pseudoSampleSize> y[n_polls, n_categories]; // a multinomial
    int<lower=1,upper=n_houses> house[n_polls]; // polling house
    int<lower=1,upper=n_days> poll_day[n_polls]; // day polling occurred
}

parameters {
    simplex[n_categories] hidden_voting_intention[n_days];
    matrix[n_houses, n_categories] houseAdjustment;
}

transformed parameters {
    matrix[n_houses, n_categories] aHouseAdjustment;
    matrix[n_houses, n_categories] tHouseAdjustment;
    for(p in 1:n_categories) // house effects sum to zero across houses
        aHouseAdjustment[,p] = houseAdjustment[,p] - mean(houseAdjustment[,p]);
    for(h in 1:n_houses) // house effects sum to zero across categories
        tHouseAdjustment[h,] = aHouseAdjustment[h,] - mean(aHouseAdjustment[h,]);
}

model {
    // -- house effects model
    for(h in 1:n_houses)
        houseAdjustment[h] ~ normal(0, 0.05);
    // -- temporal model
    hidden_voting_intention[1] ~ dirichlet(startingPoint * startingPointCertainty);
    for (day in 2:n_days)
        hidden_voting_intention[day] ~
            dirichlet(hidden_voting_intention[day-1] * transmissionStrength);
    // -- observed data model
    for(poll in 1:n_polls)
        y[poll] ~ multinomial(hidden_voting_intention[poll_day[poll]] +
            tHouseAdjustment[house[poll]]');
}

The model assumes that house effects sum to zero (both across polling houses and across the simplex categories). I set the startingPointCertainty variable to 10. The prior on startingPoint is 0.333 for each category. The day-to-day transmissionStrength is set to 50,000 (attitudes today are much the same as yesterday). The pseudoSampleSize is set to 1000.
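The double centring in the transformed parameters block can be checked outside Stan. A numpy sketch (the raw adjustment values here are arbitrary, made up for the example) confirms that after centring columns across houses and then rows across categories, both sets of sums are zero:

```python
import numpy as np

# made-up raw houseAdjustment values: 3 houses x 3 categories
raw = np.array([[ 0.03, -0.01, -0.02],
                [-0.02,  0.04,  0.01],
                [ 0.01, -0.03,  0.02]])

# First centre each category's column across houses ...
a = raw - raw.mean(axis=0, keepdims=True)
# ... then centre each house's row across categories
t = a - a.mean(axis=1, keepdims=True)

print(t.sum(axis=0))  # ~0 for every category (across houses)
print(t.sum(axis=1))  # ~0 for every house (across categories)
```

The row centring does not undo the column centring: subtracting row means from an already column-centred matrix leaves the column sums at zero, so both constraints hold simultaneously.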

As usual, the data for this analysis has been sourced from Wikipedia.