Thursday, July 30, 2015

Essential

Following the 2013 election, polls from Essential jumped to the right politically (jumped upwards in the chart below). I wondered whether this was a correction for Essential's position prior to the 2013 election: most polling houses had over-estimated the Greens and under-estimated the Coalition in the primaries. Anyway, the last six months have seen Essential move back in line with my TPP trend estimate.


Wednesday, July 29, 2015

Is flatline the new normal?

This fortnight's Morgan poll reversed the jump to the right we saw in the previous Morgan poll. The result was 46.5 to 53.5 in Labor's favour, with preferences distributed by how electors voted at the 2013 election. This was a 2.5 percentage point movement in Labor's favour over the previous poll.

Without smoothing, it is pretty much a flat line. This week's aggregate is 47.6 per cent. Last week it was 47.7 per cent.



I am not seeing much movement since May this year. If we smooth the aggregation with a Henderson moving average, a flat-line since May is clearly evident.


The best hope for the Coalition comes from my anchored primary vote model. This model has the Coalition on 49.2 per cent (which might translate into a small Coalition win). But even this model has flat-lined.


This model assumes that the polling houses have done nothing to correct the Green over-estimate and the Coalition under-estimate from the last election. I would be surprised if the polling houses had not considered and corrected their collective under-performance on the 2013 primary vote estimates.




Tuesday, July 21, 2015

Aggregated poll update

There have been two published polls since the last aggregation:

  • Today's Australian has the second new Newspoll at 47-53 in Labor's favour. This is a one-point move in Labor's direction over the previous fortnight.
  • Last week's Morgan poll came in at 49-51 in Labor's favour. This was a 2.5-point move in the Coalition's direction over the previous fortnight.

In the following aggregations, because Morgan exhibits a higher variance than its reported sample sizes would suggest, I have arbitrarily down-weighted the sample size in the aggregation to 1000 respondents.
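To give a feel for what this down-weighting does, here is a minimal sketch of how a poll's sample size might translate into the standard error fed to the aggregation; the binomial formula is a standard assumption on my part, and the numbers are purely illustrative.

    import math

    def poll_standard_error(p, n):
        """Binomial standard error for a two-party preferred share p from n respondents."""
        return math.sqrt(p * (1 - p) / n)

    # last week's Morgan poll had the Coalition on 0.49 of the two-party preferred vote
    print(poll_standard_error(0.49, 3000))  # ~0.009, using a typical reported Morgan sample
    print(poll_standard_error(0.49, 1000))  # ~0.016, using the down-weighted 1000 respondents

The wider standard error from the smaller notional sample makes each Morgan observation less influential in the aggregation.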

The unanchored aggregation has the Coalition on 47.7 per cent of the two-party preferred vote (52.3 to 47.7 in Labor's favour). In trajectory terms: the government recovered ground quickly after its late-January/early-February troubles, and enjoyed a rare small post-Budget bump in May. Since then, its two-party preferred share has not changed much. If an election were held now, the government would lose.


At this point, it is customary to remind my readers that the unanchored model assumes the polling houses are collectively unbiased (as a consequence, their house effects sum to zero), and that the average poll result is therefore a good indicator of population voting intention. In practice, this is an oversimplification.

You will note that the error bars are much wider at the end of this fortnight's chart (above). The model has not yet got enough data from the new Newspoll (Npoll2) to locate its systemic house effect. Until we have a good number of new Newspoll results, the new Newspoll will be less influential in locating the aggregated poll result. We can see the wide range of house effects being applied to the new Newspoll in the next chart (below).


Looking at the primary votes, we see some unusual results. Compared with recent trends, the latest Newspoll was kind to Labor and unkind to the Greens. The rise of the Green vote over recent months appears to have come to an end.





Note: in all of the primary vote aggregations, I model the data on a week-to-week basis, rather than the day-to-day basis I use for the two-party preferred vote shares. I plot the aggregation for the first day of each modeled week. As a result, there are data points after what looks like the end of the aggregation plot. Be assured, these data points were included in the analysis.
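For readers following along at home, here is one minimal way to bucket polls into weeks (a sketch only; the anchor date is my assumption, not necessarily the one my scripts use).

    from datetime import date

    def week_index(poll_mid_date, start=date(2013, 9, 7)):
        """1-based week number for a poll, counted from an assumed anchor date;
        the weekly aggregate is then plotted on the first day of that week."""
        return (poll_mid_date - start).days // 7 + 1

    # example: a poll whose collection period is centred on 15 July 2015
    print(week_index(date(2015, 7, 15)))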

Finally, let's look at the aggregated attitudinal polling where, in spite of today's headlines, not much has changed. Please note, this is also modeled on a week-to-week basis.





Saturday, July 18, 2015

Sealing wax, sample sizes and things

The latest Morgan poll for the period July 4/5 and 11/12, 2015 had the Coalition on 49 per cent and Labor on 51 per cent. It was based on an Australia-wide cross-section of 3110 Australian electors. As past data suggested a systemic house bias from Morgan towards Labor of around one percentage point, most aggregators would have treated this data point as something like a 50-50 result.

This Morgan poll has thrown some poll aggregators. Some have moved a percentage point or more in the Coalition's favour in a week (in what I thought was a fairly average week in politics, in so far as it was likely to influence voting behaviour). The only other poll out this week was Essential, which was unchanged on last week's rolling fortnight result (48-52 in the Coalition's favour).

On 6 July, the Phantom Trend had the Coalition on 47.71. Come 13 July, that was 49.36 (an improvement for the Coalition of 1.65 percentage points).

On 9 July, the Poll Bludger had the Coalition on 47.8. Come 16 July, the Poll Bludger had the Coalition on 49.0 per cent (an improvement of 1.2 percentage points in a week).

More subdued is Kevin Bonham, who has moved from 47.5 per cent to 48.1 per cent (a movement of 0.6 percentage points).

While I typically only update my aggregation on a fortnightly basis this far out from an election, my own movement over the week was from 47.4 to 48.0 (also a movement of 0.6 percentage points). But note, I apply an adjustment to the Morgan poll sample size in my aggregation. I usually treat Morgan polls as if they came from a sample of 1000 voters. If I turn this feature off, my aggregate poll result would have been 48.7 (a movement of 1.3 percentage points).

Anyway, all of this got me wondering about how I treat poll variance in my models. Variance is a measure of how much a statistic is likely to bounce around.

I decided to check whether the poll bounciness from each polling house was consistent with the typically reported sample sizes for the polling house. If I treat the smoothed aggregation as the population voting intention at any point in time, I can compare the performance of systemic-bias-adjusted poll results against this voting intention. I can then derive a notional sample size, based on the bounciness of the poll results, for each of the polling houses.

Algebraically, there are two relevant formulas for statistical variance (you will need JavaScript activated for MathJax to render these formulas in your web browser):
\[Variance=\frac{1}{n}\sum(x-\mu)^2\]
\[Variance=\frac{pq}{N}\]

Where \(n\) is the number of polls for the polling house, \(x\) is a systemic-bias-adjusted poll result, \(\mu\) is the population voting intention at the time (proxied by a 91-term Henderson-smoothed poll aggregation), \(p\) is the two-party preferred Coalition proportion (averaged for the period from the pollster's systemic-bias-adjusted results), \(q\) is the two-party preferred Labor proportion (where \(q=1-p\)), and \(N\) is the sample size. We can calculate the variance from the first equation, then rearrange the second equation to solve for \(N\):
\[N=\frac{pq}{Variance}\]
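A minimal sketch of this calculation in Python (assuming the bias-adjusted poll results and the matching smoothed-aggregate values are already lined up as arrays; the function name is mine):

    import numpy as np

    def implied_sample_size(adjusted_polls, smoothed_aggregate):
        """Notional sample size implied by how much a house's bias-adjusted polls
        bounce around the smoothed aggregate (treated as population voting intention)."""
        x = np.asarray(adjusted_polls)        # bias-adjusted Coalition TPP shares (0-1)
        mu = np.asarray(smoothed_aggregate)   # smoothed aggregate on the same days (0-1)
        variance = np.mean((x - mu) ** 2)     # first formula above
        p = x.mean()                          # average Coalition proportion for the house
        q = 1.0 - p                           # average Labor proportion
        return p * q / variance               # N = pq / Variance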

The results, using the polling data since the last election, follow. Arguably, all houses other than Morgan and Newspoll have too few polls to analyse sensibly. The analysis suggests that the Morgan poll is a touch more bouncy (it has a higher variance) than a typical Morgan sample of around 2900 respondents would suggest. It looks like Nielsen had an unlucky run of outlier polls (not surprising given we are only talking about a small number of polls). This run affected Nielsen's result; but with only seven polls, the analysis should be treated with some caution.

House        # polls (n)   Implied Sample Size (N)   Typically Reported Sample Size
Galaxy       9             1275.94                   1400
Ipsos        8             1283.48                   1400
Morgan       47            2416.09                   2900
Newspoll     34            1235.65                   1150
Newspoll2    1             -                         1600
Nielsen      7             668.01                    1400
ReachTEL     21            3642.97                   3400

By the way, so that Morgan was not disadvantaged in this exercise, I turned off the 1000-respondent sample size adjustment I usually apply to Morgan. The smoothed poll aggregate was derived using the actual Morgan sample sizes as reported.

A chart of the 91-day smoothed aggregate follows. I used a smoothed aggregate so that individually anomalous poll results would not overly sway the calculation of population voting intention on a day.
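For the curious, a Henderson moving average can be coded in a few lines. The sketch below uses the standard closed-form weights from the seasonal-adjustment literature and simply leaves the ends of the series unsmoothed; my own scripts may handle the end-points differently.

    import numpy as np

    def henderson_weights(terms):
        """Symmetric Henderson moving-average weights for an odd number of terms."""
        m = (terms - 1) // 2
        n = m + 2
        j = np.arange(-m, m + 1)
        numer = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2) *
                 ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
        denom = 8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1) * (4 * n ** 2 - 9) * (4 * n ** 2 - 25)
        return numer / denom

    def henderson_smooth(series, terms=91):
        """Centred Henderson moving average; the first and last (terms-1)/2 points are NaN."""
        w = henderson_weights(terms)
        m = (terms - 1) // 2
        series = np.asarray(series, dtype=float)
        out = np.full(len(series), np.nan)
        for i in range(m, len(series) - m):
            out[i] = np.dot(series[i - m:i + m + 1], w)
        return out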


The other question that interested me was whether these poll results were normally distributed, or whether they had "fat" tails. To assess this, I performed a kernel density estimation. For this I used the Python statsmodels package.
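I will spare you my plotting boilerplate, but a minimal sketch of a kernel density estimate with statsmodels (the residuals input and the function name are placeholders of mine) looks something like this:

    import numpy as np
    from statsmodels.nonparametric.kde import KDEUnivariate

    def kde_curve(residuals):
        """Kernel density estimate of poll residuals (bias-adjusted poll result
        minus the smoothed aggregate); returns (support, density) for plotting."""
        kde = KDEUnivariate(np.asarray(residuals, dtype=float))
        kde.fit()                     # Gaussian kernel with an automatic bandwidth
        return kde.support, kde.density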

In the following charts, using the derived variance above for the normal curve, and limiting our analysis to the two polling houses where we have enough data, it would appear that our poll results are close to normally distributed. But I cannot rule out the possibility of fatter tails. In plain English, it would appear that pollsters are perhaps a touch more likely to throw an outlier poll than statistical theory would suggest.



Kernel density estimates are designed so that the area under the curve is one. This allows the probability of a particular outcome to be read from the chart. In the above charts, the x-axis is expressed as proportions rather than percentage points; for example, 2 percentage points appears as 0.02 (just in case you were wondering about the relative scales of the x and y axes).

I can combine the poll results from all houses as standardised z-scores, using the variance calculated above. In this combined chart of all poll results, the "fatter tails" are more evident. The z-scores are the differences between the bias-adjusted poll results and the smoothed aggregate, divided by the standard deviation. The standard deviation (\(\sigma\)) is the square root of the variance (\(\sigma^2\)).
\[\text{Standard Deviation} = \sqrt{Variance}; \quad \sigma = \sqrt{\sigma^2}\]
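A minimal sketch of that standardisation, using the implied sample sizes from the table above (the function name and inputs are mine):

    import numpy as np

    def poll_z_scores(adjusted_polls, smoothed_aggregate, implied_n):
        """Standardise a house's bias-adjusted polls against the smoothed aggregate,
        using the house's implied sample size to set the standard deviation."""
        x = np.asarray(adjusted_polls)
        mu = np.asarray(smoothed_aggregate)
        p = x.mean()
        sigma = np.sqrt(p * (1.0 - p) / implied_n)   # sigma = sqrt(pq / N)
        return (x - mu) / sigma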

Because we are combining a number of small sample sizes, there is a reasonable likelihood this analysis is problematic.


The fatter the tail, the more likely that a polling aggregate will be sent off course from time to time. Perhaps this week was one of those times. As is so often the case, we will need to wait and see.

Update

I have been asked for quantile-quantile plots (affectionately known as QQ plots) to further explore the extent to which we have a normal distribution in the distribution of bias-adjusted poll results.
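One way to produce such plots is with statsmodels' qqplot; a minimal sketch (applied to the standardised residuals from the earlier sketch) would be along these lines:

    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    def qq_plot(z_scores):
        """QQ plot of standardised poll residuals against a standard normal;
        points peeling away from the 45-degree line at either end suggest fat tails."""
        fig = sm.qqplot(z_scores, line='45')
        plt.show()
        return fig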




Sunday, July 12, 2015

Should I junk JAGS? Is Stan the man?

There are a number of analytical tools that enable statisticians to solve Bayesian hierarchical models.

For some time I have been using JAGS, which uses Gibbs sampling in its MCMC algorithm.

Because it is so bloody cold outside, I thought I would give Stan a try. Stan uses Hamiltonian Monte Carlo sampling in its MCMC algorithm. I am using Stan version 2.6.3.0 with the interface to Stan from Python's pystan.

Proponents of Stan claim that it has replaced JAGS as the state-of-the-art, black-box MCMC method. Their criticism of JAGS is that Gibbs sampling fails to converge when posterior correlations are high. They argue that CPU time and memory usage under Stan scale much better with model complexity. As models become larger, JAGS chokes.

The first challenge I faced was getting pystan to work. In the end I deleted my entire Python set-up and reinstalled Anaconda 2.3.0 for OSX from Continuum Analytics. From there it was a quick step at the shell prompt:

conda install pystan

Using Stan required me to refactor the model a little. There was the obvious: Stan uses semi-colons to end statements. Some of the functions have been renamed (for example, JAGS' dnorm() becomes Stan's normal()). Some of the functions take different arguments (dnorm() is passed a precision value, while normal() is passed the standard deviation). Comments in Stan begin with a double-slash, whereas in JAGS they begin with a hash (although Stan will accept hashed comments as well). Stan also required me to divide the model into a number of different code blocks.

A bigger challenge was how to code the sum-to-zero constraint I place on house effects. In JAGS, I encode it as follows:

    for (i in 2:n_houses) {
        houseEffect[i] ~ dnorm(0, pow(0.1, -2))
    }
    houseEffect[1] <- -sum( houseEffect[2:n_houses] )

In Stan, I needed a different approach using transformed parameters. But I did not need the for-loop, as Stan allows vector-arithmetic in its statements.

    pHouseEffects ~ normal(0, 0.1); // weakly informative parameter
    houseEffect <- pHouseEffects - mean(pHouseEffects); // sum to zero transformed

I also ran into all sorts of trouble with uniform distributions generating run-time errors. I am not sure whether this was a result of my poor model specification, something else, or something I should just ignore. In the end, I replaced the problematic uniform distributions with weakly informative (fairly dispersed) normal distributions. Anyway, after a few hours I had a working Stan model for aggregating two-party preferred poll results.

The results for the hidden voting intention generated with Stan are quite similar to those with JAGS. In the next two charts, we have the Stan model first and then the JAGS model. The JAGS model was changed to mirror Stan as much as possible (i.e. I replaced the same uniform distributions in JAGS that I had replaced in Stan).




We also have similar results for the relative house effects. Again, the Stan result precedes the JAGS result in the following charts. But those of you with a close eye will note that these results differ slightly from the earlier results, which came from the JAGS model that used a uniform distribution as a prior for house effects. There is a useful lesson here on the importance of model specification. Also, it is far too easy to read too much into results that differ by a few tenths of a percentage point. The reversal of Newspoll and ReachTEL in these charts comes down to hundredths of a percentage point (which is bugger all).



So what does the Stan model look like?

data {
    // data size
    int<lower=1> n_polls;
    int<lower=1> n_span;
    int<lower=1> n_houses;

    // poll data
    real<lower=0,upper=1> y[n_polls];
    real<lower=0> sampleSigma[n_polls];
    int<lower=1> house[n_polls];
    int<lower=1> day[n_polls];
}
parameters {
    real<lower=0,upper=1> hidden_voting_intention[n_span];
    vector[n_houses] pHouseEffects;
    real<lower=0,upper=0.01> sigma;
}
transformed parameters {
    vector[n_houses] houseEffect;
    houseEffect <- pHouseEffects - mean(pHouseEffects); // sum to zero
}
model{
    // -- house effects model
    pHouseEffects ~ normal(0, 0.1); // weakly informative

    // -- temporal model
    sigma ~ uniform(0, 0.01);
    hidden_voting_intention[1] ~ normal(0.5, 0.1);
    for(i in 2:n_span) {
        hidden_voting_intention[i] ~ normal(hidden_voting_intention[i-1], sigma);
    }

    // -- observational model
    for(poll in 1:n_polls) {
        y[poll] ~ normal(houseEffect[house[poll]] + hidden_voting_intention[day[poll]], sampleSigma[poll]);
    }
}
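And, for completeness, a minimal sketch of driving this model from Python with pystan 2.x; the file name, the fake data and the sampling settings below are placeholders for illustration, not my actual run.

    import numpy as np
    import pystan

    # read the Stan program above (assumed to be saved as tpp_model.stan)
    with open('tpp_model.stan') as f:
        model_code = f.read()

    # some fake data, shaped the way the Stan data block expects
    rng = np.random.RandomState(1)
    n_polls, n_span, n_houses = 40, 120, 4
    stan_data = {
        'n_polls': n_polls, 'n_span': n_span, 'n_houses': n_houses,
        'y': 0.48 + rng.normal(0, 0.015, n_polls),        # observed TPP shares
        'sampleSigma': np.full(n_polls, 0.0158),          # roughly 1000-respondent polls
        'house': rng.randint(1, n_houses + 1, n_polls),   # house index for each poll
        'day': rng.randint(1, n_span + 1, n_polls),       # day index for each poll
    }

    model = pystan.StanModel(model_code=model_code)       # compile
    fit = model.sampling(data=stan_data, iter=2000, chains=4)
    print(fit)   # posterior summaries for hidden_voting_intention, houseEffect, sigma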

The revised JAGS code is as follows. The code I have changed for this post is commented out.

model {
    ## developed from Simon Jackman's hidden Markov model
    ## - note: poll results are analysed as a value between 0.0 and 1.0

    ## -- observational model
    for(poll in 1:n_polls) { # for each observed poll result ...
        yhat[poll] <- houseEffect[house[poll]] + hidden_voting_intention[day[poll]]
        y[poll] ~ dnorm(yhat[poll], samplePrecision[poll]) # distribution
    }

    ## -- temporal model
    for(i in 2:n_span) { # for each day under analysis, except the first ...
        # today's national TPP voting intention looks much like yesterday's
        hidden_voting_intention[i] ~ dnorm(hidden_voting_intention[i-1], walkPrecision)
    }
    # day 1 estimate of TPP between 20% and 80% - a weakly informative prior
    #hidden_voting_intention[1] ~ dunif(0.2, 0.8)
    hidden_voting_intention[1] ~ dnorm(0.5, pow(0.1, -2))
    # day-to-day change in TPP has a standard deviation between 0 and 1
    # percentage points
    sigmaWalk ~ dunif(0, 0.01)
    walkPrecision <- pow(sigmaWalk, -2)

    ## -- house effects model
    #for(i in 2:n_houses) { # for each polling house, except the first ...
    #    # assume house effect is somewhere in the range -15 to +15 percentage points.
    #    houseEffect[i] ~ dunif(-0.15, 0.15)
    #}
    for (i in 2:n_houses) {
        houseEffect[i] ~ dnorm(0, pow(0.1, -2))
    }
    # sum to zero constraint applied to the first polling house ...
    houseEffect[1] <- -sum( houseEffect[2:n_houses] )
}

As for the question: will I now junk JAGS and move to Stan? I will need to think about that a whole lot more before I make any changes. I love the progress feedback that Stan gives as it processes the samples. I think Stan might be marginally faster. But model specification in Stan is far more fiddly.

Update

I have now timed them. Stan is slower.

Monday, July 6, 2015

Poll update

Today we are blessed with two new polls:

  • Newspoll in the Australian (in its new incarnation combining automated phone and internet polling) had the Coalition on 48 per cent. [I have marked this series as Newspoll2 in the charts].
  • Ipsos in the Fairfax media had the Coalition on 47 per cent; which is unchanged from the previous Ipsos poll.

The poll aggregate continues to flatline, with perhaps a slight decline for the Coalition.



At this point, it is worth noting that I assume that collectively the polling houses are not biased. I treat the systemic bias across all polling houses as summing to zero.

Newspoll 2.0 is a completely different beast to its predecessor, with a different methodology and likely a different systemic bias. Consequently, I have added it to the aggregation as a new polling house. It will take a few polling cycles for this to settle down. Only after a few cycles will we have a good reading on the relative bias of the new Newspoll. In the interim, the aggregate will move a little as a consequence of adding a new polling house.

Looking at the primary votes: The Greens continue to grow. Labor and the Coalition are in decline. [Please note, with the following charts, the aggregation is done on a weekly basis, and graphed for the first day in the week. This will be a little out of step with the poll results, which are plotted according to the middle day of their collection period.]





Both leaders continue to sink in the attitudinal polling.