Monday, November 12, 2012

How are election swings distributed?

Using the seat-by-seat swings for each federal election since 1993, I wanted to see how they were distributed around the national swing in the two-party-preferred vote (each swing being measured against the previous election). I was conscious that a lot of financial data is not normally distributed (fat tails are common in financial distributions), so I wanted to check my use of a normal probability function in simulating election results.

First, let's look at the distribution of swings for each of the federal elections. The y-axis in the first graph (and the x-axis in the subsequent graphs) is in percentage points.

Of note in the first plot: it is not unusual in an election to have one or two seats that buck the national swing by around plus or minus ten percentage points. In 1996, the outliers were swings of +17 and -14 percentage points relative to the national swing.

These distributions can be compared with the normal curve (dashed in the next plot).

Combining the seven elections, the next plot shows that the overall distribution of seat swings appears normally distributed around the national swing. The normal curve (red dashed line) is superimposed on the probability density function of seat-by-seat swings relative to the national swing for the seven elections. Both curves have a standard deviation of 3.27459.
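
As a rough numerical companion to the visual comparison, the sketch below subtracts the national swing from each seat swing and compares the share of seats within one and two standard deviations against what a normal curve would give. The seat swings and national swing here are made-up illustrative numbers, not AEC data, and the function names are my own; this is one simple check, not the method used to produce the plots.

```python
import statistics

def swing_deviations(seat_swings, national_swing):
    """Subtract the national swing from each seat's swing (percentage points)."""
    return [s - national_swing for s in seat_swings]

def share_within(deviations, k):
    """Fraction of deviations within k population standard deviations of their mean."""
    mu = statistics.fmean(deviations)
    sigma = statistics.pstdev(deviations)
    return sum(abs(d - mu) <= k * sigma for d in deviations) / len(deviations)

# Hypothetical seat swings for one election (illustrative only, not AEC data).
seat_swings = [2.1, -0.4, 5.3, 1.8, 3.0, -2.2, 4.4, 0.9, 2.7, 1.5]
national_swing = 1.9  # assumed national swing, percentage points

devs = swing_deviations(seat_swings, national_swing)
normal = statistics.NormalDist()  # standard normal, used only for the benchmark shares

for k in (1, 2):
    expected = normal.cdf(k) - normal.cdf(-k)
    observed = share_within(devs, k)
    print(f"within {k} sd: observed {observed:.2f}, normal {expected:.2f}")
```

With real data, the pooled deviations from all seven elections would replace the hypothetical list. Large gaps between the observed and normal shares, particularly beyond two standard deviations, would point to fat tails.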

This means I can use a normal distribution in Monte Carlo simulations of the 2013 election result.
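
To make the idea concrete, here is a minimal sketch of one way such a simulation could be set up. Each seat's swing is drawn as the national swing plus a normal deviation with the standard deviation found above. The seat margins, the national swing, the function name and the run count are all hypothetical; this illustrates the approach and is not the simulation code I actually use.

```python
import random

SEAT_SWING_SD = 3.27459  # standard deviation reported above, in percentage points

def simulate_seats(margins, national_swing, runs=10_000, seed=1):
    """Return the government's seat count in each simulated election.

    margins: government two-party-preferred margin in each seat, in
             percentage points (negative for seats it does not hold).
    national_swing: assumed national swing to the government
                    (negative means a swing against it).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        won = 0
        for margin in margins:
            # Seat swing = national swing + normally distributed local deviation.
            seat_swing = national_swing + rng.gauss(0.0, SEAT_SWING_SD)
            if margin + seat_swing > 0:
                won += 1
        results.append(won)
    return results

# Hypothetical margins (illustrative only, not the real electoral pendulum).
margins = [-8.0, -3.5, -1.2, 0.4, 1.1, 2.6, 4.9, 7.3, 12.0]
results = simulate_seats(margins, national_swing=-2.0)
print("mean seats won:", sum(results) / len(results))
```

This sketch draws each seat's deviation independently, which is a simplifying assumption; it ignores any correlation between seats in the same state or region.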

The data for this analysis came from the Australian Electoral Commission website.

1. Doesn't the central limit theorem apply to your averaging?

Also, you plot PDF curves for the swings, but they must be discrete data.

To me it looks more like a parameter estimation problem. The PDFs you have plotted seem to be bimodal and hence you might want to see if they can be decomposed well that way with some high overlap of the probability function. It's then a matter of finding out how those parameters are determined, most probably by inputs using other data (polling maybe, not sure).

2. Austin

The CLT applies to the means of repeated samples - these means are normally distributed around the population mean. I was interested in the distribution of the actual population of seat swings after subtracting the national swing. (This differenced data was going to have a mean that was close to zero by definition). There was no guarantee this distribution would be normally distributed.

While statisticians are not always consistent in their language, the term "probability mass function" is typically used with discrete data and the term "probability density function" with continuous data. The difference between the swing in a seat and the swing in the nation is continuous.

None of the PDFs were bimodal; however, I appreciate the overlap graph might not make that clear. Some of the individual election seat swing PDFs were a little left skewed and others a little right skewed. Some were leptokurtic, others platykurtic. Combined, they look normally distributed.
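
For anyone wanting to put numbers on skew and kurtosis, a minimal sketch follows. It uses the population moment definitions (skewness as the mean cubed z-score, excess kurtosis as the mean fourth-power z-score minus 3). The two small "elections" are invented, mirror-image data sets chosen to show how opposite skews can cancel when pooled; they are not the actual seat swings.

```python
import statistics

def skewness(xs):
    """Population skewness: negative = left skewed, positive = right skewed."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return statistics.fmean([((x - mu) / sigma) ** 3 for x in xs])

def excess_kurtosis(xs):
    """Population kurtosis minus 3: positive = leptokurtic, negative = platykurtic."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return statistics.fmean([((x - mu) / sigma) ** 4 for x in xs]) - 3.0

# Hypothetical deviations from the national swing (illustrative only).
elections = {
    "A": [-6.0, -2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 4.0],   # long left tail
    "B": [-4.0, -2.0, -1.5, -1.0, -0.5, 0.0, 1.0, 2.0, 6.0],  # long right tail
}
pooled = [d for devs in elections.values() for d in devs]

for name, devs in {**elections, "pooled": pooled}.items():
    print(f"{name}: skew {skewness(devs):+.2f}, excess kurtosis {excess_kurtosis(devs):+.2f}")
```

A normal distribution has skewness 0 and excess kurtosis 0, so values near zero for the pooled data would be consistent with what the combined plot suggests.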

3. Are you saying that the deviation from a normal distribution for the actual distribution of swings for each election can be explained by sampling errors? Given that the "sample" for each election is the entire population, something seems strange here to me.

Perhaps there is some implicit averaging so that the central limit theorem does imply that the distribution of swings will approach normal. Could one make the argument that the percentage is an average?

It also seems strange to me that the width of the distribution of swings would be the same from election to election. I think your plots show that there is some variation in the width. Though it does seem that the variation in the width is rather small, which I also find odd.

4. Austin, classical statistics does not provide a compelling narrative when looking at populations rather than samples. In this case I looked at the population of all seat swings minus national swings in seven elections. I was not sampling anything. As I understand it, the CLT is a theory about samples not a theory about populations. There is no theory that says that every attribute in every population must be normally distributed. For me it was useful to confirm that seat swings are normally distributed around the national swing.

Measuring in percentage points can be problematic if the range is large. In my case the vast majority were in single digits, so this should not be much of a problem.

5. One could argue that each seat is a sample of sorts (of the national vote, that is). I am guessing that the swings are correlated and hence wouldn't be indicative of a random sample of the nation. I guess I'm interested in why the swings have the shape and width that they do.

6. 