With Essential coming in at 47-53 in Labor's favour, it is time to update the aggregation. Both Essential and Newspoll are pegging a two percentage point recovery for the Coalition over the past four weeks. I have tweaked the TPP model to allow for more rapid changes in public opinion over the period immediately following the recent disruption event.
Turning to the primary vote model ...
Wednesday, September 26, 2018
Monday, September 24, 2018
Aggregated poll update
With the publication of the latest Newspoll (46 to 54 in Labor's favour), it is time to update my poll aggregation. I currently have the Coalition with a 45.8 per cent share of the two-party preferred (TPP) vote, if an election was held at the moment.
The moving averages ... which take time to settle after a disruption event.
The primary votes ...
The data for this analysis was sourced from Wikipedia.
Saturday, September 22, 2018
Redistributions
One of the myths of data science is the myth of scientific accuracy. The myth is that when a data scientist applies the correct mathematical procedures, they get a mathematically correct answer to seven decimal places. In reality, data scientists make many choices about the analysis they bring and the data they work with. Gelman and Loken (2013) brought this concept to life for the statistics community with their paper on the garden of forking paths.
The way in which we curate data, munge it and prepare it for analysis, and the analytical frames we use are all choices. Unconsciously those choices can frame our results. And they can blind us to the results we might have produced if we had taken a different approach. In short, they can mislead us, even when we set out to undertake ethical and unbiased analysis.
I found myself reflecting on the garden of forking paths as I considered how best to apply the 2016 polling outcome data to the electoral boundary changes that occurred in 2017 and 2018. I wanted to find the best way to allocate to the new seat boundaries the ordinary votes that were geographically linked (via longitude and latitude) to polling locations in 2016. I also had to allocate the non-tagged votes (some ordinary votes, but also absentee votes, provisional votes, postal votes and declaration pre-poll votes).
My initial plan was to allocate the 2016 two-party preferred (TPP) votes to new electorates based on the geo-coded place where the vote was cast. I would then allocate the remaining votes for each electorate in proportion with the geo-coded votes by party for that electorate. Once all the votes from 2016 had been allocated to the new electorate boundaries, I could calculate a TPP margin for each seat.
However, this turned out to be problematic. For example, a number of votes in the Victorian seat of Goldstein (which was unchanged by the Victorian redistribution) were coded as being cast outside of the electorate. But as the seat was unaffected by the redistribution, those votes should remain attributed to Goldstein, and should not be counted in another electorate.
The interesting analytical question became which data was most reflective of where people live (at a reasonably fine level of granularity), and which data was only reflective of the voter's electorate without any granularity about where in the electorate they might live. For example, if someone from the outer suburbs of Melbourne votes in the city on polling day, can I draw any useful information about where they live in their electorate? My conclusion was that I could not.
As a consequence, to the extent I could apply this rule, where people voted at a polling booth outside of their electorate, and we have geo-coding for that polling place, I chose to ignore that geo-coding. I also decided to ignore the geo-coding for pre-poll votes. Because there are so few pre-poll voting centres, I assumed the votes cast at these centres were not necessarily informative about where in an electorate the pre-poll voters come from. Conversely, I assumed the location data from the many polling booths to be informative, if the person was casting an ordinary vote on polling day in their electorate.
I was picky, because I wanted to be comfortable that as boundaries are redrawn, I was meaningfully attributing votes from the parts of an electorate that might be redistributed differently.
For the votes that I could only attribute to an electorate, I allocated them in the same proportion as the ordinary TPP votes by party were allocated. If 70 per cent of the Labor vote stayed in an electorate and 30 per cent went to another electorate, that is how I treated the Labor vote from the electorate where I did not have geographic information or where I deemed the geographic information I had to be insufficiently informative. I treated the flow of Coalition and Labor votes separately.
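The allocation rule in the previous paragraph can be sketched in a few lines of pandas. This is a minimal illustration only: the seat names and vote counts are invented, and it assumes the geo-coded ordinary votes have already been assigned to new electorates.

```python
import pandas as pd

# Hypothetical geo-coded ordinary TPP votes from ONE old electorate,
# tallied by the new electorate each polling place now falls in.
geo_votes = pd.DataFrame(
    {"Labor": [7000, 3000], "Coalition": [4000, 6000]},
    index=["Seat A (new)", "Seat B (new)"],
)

# Hypothetical votes known only at the old-electorate level
# (postal, absentee, provisional, pre-poll and so on).
untagged = pd.Series({"Labor": 2000, "Coalition": 3000})

# Share of each party's geo-coded vote flowing to each new seat.
# The flows are calculated separately for each party (column).
flow = geo_votes / geo_votes.sum()

# Allocate the untagged votes using the same party-specific flows.
allocated = geo_votes + flow * untagged

# Coalition TPP share (per cent) in each new seat after allocation.
coalition_tpp = 100 * allocated["Coalition"] / allocated.sum(axis=1)
print(allocated)
print(coalition_tpp.round(2))
```

Here 70 per cent of the geo-coded Labor vote sits in Seat A, so 70 per cent of the untagged Labor vote goes there too, while the Coalition vote follows its own 40/60 split.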
My initial results are close to, but also different from the work of others. My walk in the garden of forking paths was different to theirs. The usual caveats for preliminary analysis apply: Be warned, this work could include errors that I have not identified. While I have calculated results to two decimal places, this suggests a level of accuracy that belies the many choices I have made in munging and analysing the data. Also note: I could not find map data for the 2016 electorate boundaries in the shapefile format for NT and Tasmania, so my treatment of these states is less certain than the others.
And here are my Coalition 2016 TPP estimates for each seat compared with those from Antony Green and William Bowe.
Seat | Mark Graph | Antony Green | William Bowe |
Bean (ACT) | 40.9 | 41.1 | 41.1 |
Canberra (ACT) | 36.54 | 36.8 | 36.8 |
Fenner (ACT) | 38.97 | 38.4 | 38.2 |
Banks (NSW) | 51.44 | 51.4 | 51.4 |
Barton (NSW) | 41.7 | 41.7 | 41.7 |
Bennelong (NSW) | 59.72 | 59.7 | 59.7 |
Berowra (NSW) | 66.45 | 66.4 | 66.5 |
Blaxland (NSW) | 30.52 | 30.5 | 30.5 |
Bradfield (NSW) | 71.04 | 71 | 71 |
Calare (NSW) | 61.81 | 61.8 | 61.8 |
Chifley (NSW) | 30.81 | 30.8 | 30.8 |
Cook (NSW) | 65.39 | 65.4 | 65.4 |
Cowper (NSW) | 62.58 | 62.6 | 62.6 |
Cunningham (NSW) | 36.68 | 36.7 | 36.7 |
Dobell (NSW) | 45.19 | 45.2 | 45.2 |
Eden-Monaro (NSW) | 47.07 | 47.1 | 47.1 |
Farrer (NSW) | 70.53 | 70.5 | 70.5 |
Fowler (NSW) | 32.51 | 32.5 | 32.5 |
Gilmore (NSW) | 50.73 | 50.7 | 50.7 |
Grayndler (NSW) | 27.64 | 27.6 | 27.6 |
Greenway (NSW) | 43.69 | 43.7 | 43.7 |
Hughes (NSW) | 59.33 | 59.3 | 59.3 |
Hume (NSW) | 60.18 | 60.2 | 60.2 |
Hunter (NSW) | 37.54 | 37.5 | 37.5 |
Kingsford Smith (NSW) | 41.43 | 41.4 | 41.4 |
Lindsay (NSW) | 48.89 | 48.9 | 48.9 |
Lyne (NSW) | 61.63 | 61.6 | 61.6 |
Macarthur (NSW) | 41.67 | 41.7 | 41.7 |
Mackellar (NSW) | 65.74 | 65.7 | 65.7 |
Macquarie (NSW) | 47.81 | 47.8 | 47.8 |
McMahon (NSW) | 37.89 | 37.9 | 37.9 |
Mitchell (NSW) | 67.82 | 67.8 | 67.8 |
New England (NSW) | 66.42 | 66.4 | 66.4 |
Newcastle (NSW) | 36.16 | 36.2 | 36.2 |
North Sydney (NSW) | 63.61 | 63.6 | 63.6 |
Page (NSW) | 52.3 | 52.3 | 52.3 |
Parkes (NSW) | 65.1 | 65.1 | 65.1 |
Parramatta (NSW) | 42.33 | 42.3 | 42.3 |
Paterson (NSW) | 39.26 | 39.3 | 39.3 |
Reid (NSW) | 54.69 | 54.7 | 54.7 |
Richmond (NSW) | 46.04 | 46 | 46 |
Riverina (NSW) | 66.44 | 66.4 | 66.4 |
Robertson (NSW) | 51.14 | 51.1 | 51.1 |
Shortland (NSW) | 40.06 | 40.1 | 40.1 |
Sydney (NSW) | 34.69 | 34.7 | 34.7 |
Warringah (NSW) | 61.09 | 61.1 | 61.1 |
Watson (NSW) | 32.42 | 32.4 | 32.4 |
Wentworth (NSW) | 67.75 | 67.7 | 67.8 |
Werriwa (NSW) | 41.8 | 41.8 | 41.8 |
Whitlam (NSW) | 36.28 | 36.3 | 36.3 |
Lingiari (NT) | 41.58 | 41.9 | 41.8 |
Solomon (NT) | 44 | 43.9 | 43.9 |
Blair (Qld) | 41.81 | 42 | 41.8 |
Bonner (Qld) | 53.39 | 53.4 | 53.4 |
Bowman (Qld) | 57.07 | 57.1 | 57.1 |
Brisbane (Qld) | 55.85 | 56 | 56.1 |
Capricornia (Qld) | 50.63 | 50.6 | 50.6 |
Dawson (Qld) | 53.37 | 53.3 | 53.4 |
Dickson (Qld) | 51.6 | 52 | 52 |
Fadden (Qld) | 61.05 | 61.2 | 61.3 |
Fairfax (Qld) | 60.78 | 61 | 60.8 |
Fisher (Qld) | 59.24 | 59.2 | 59.3 |
Flynn (Qld) | 51.04 | 51 | 51 |
Forde (Qld) | 50.63 | 50.6 | 50.6 |
Griffith (Qld) | 48.93 | 48.6 | 48.7 |
Groom (Qld) | 65.31 | 65.3 | 65.3 |
Herbert (Qld) | 49.98 | 49.98 | 50 |
Hinkler (Qld) | 58.42 | 58.4 | 58.4 |
Kennedy (Qld) | | | |
Leichhardt (Qld) | 53.89 | 54 | 53.9 |
Lilley (Qld) | 44.68 | 44.2 | 44.2 |
Longman (Qld) | 49.21 | 49.2 | 49.2 |
Maranoa (Qld) | 67.54 | 67.5 | 67.5 |
McPherson (Qld) | 61.64 | 61.6 | 61.6 |
Moncrieff (Qld) | 64.94 | 64.5 | 64.7 |
Moreton (Qld) | 45.58 | 46 | 45.9 |
Oxley (Qld) | 40.92 | 40.9 | 40.9 |
Petrie (Qld) | 51.65 | 51.6 | 51.7 |
Rankin (Qld) | 38.7 | 38.7 | 38.7 |
Ryan (Qld) | 59.21 | 58.8 | 59.1 |
Wide Bay (Qld) | 58.14 | 58.3 | 58.2 |
Wright (Qld) | 59.62 | 59.6 | 59.6 |
Adelaide (SA) | 40.5 | 41 | 41.1 |
Barker (SA) | 64.13 | 64.3 | 64.1 |
Boothby (SA) | 52.74 | 52.8 | 52.9 |
Grey (SA) | 58.37 | 58.5 | 58.1 |
Hindmarsh (SA) | 42.64 | 41.8 | 41.8 |
Kingston (SA) | 36.21 | 36.5 | 36.4 |
Makin (SA) | 39.04 | 39.1 | 39.2 |
Mayo (SA) | | | |
Spence (SA) | 31.79 | 32.1 | 32.2 |
Sturt (SA) | 55.78 | 55.8 | 55.7 |
Bass (Tas) | 44.57 | 44.7 | 44.5 |
Braddon (Tas) | 48.43 | 48.5 | 48.4 |
Clark (Tas) | | | |
Franklin (Tas) | 39.3 | 39.3 | 39.3 |
Lyons (Tas) | 46.12 | 46 | 46.9 |
Aston (Vic) | 57.73 | 57.6 | 57.4 |
Ballarat (Vic) | 42.55 | 42.6 | 42.6 |
Bendigo (Vic) | 45.98 | 46.1 | 46.1 |
Bruce (Vic) | 33.5 | 34.3 | 34.5 |
Calwell (Vic) | 27.88 | 29.9 | 29.7 |
Casey (Vic) | 54.03 | 54.5 | 54.3 |
Chisholm (Vic) | 53.54 | 53.4 | 53.4 |
Cooper (Vic) | 28.25 | 28 | 27.9 |
Corangamite (Vic) | 50.61 | 50.03 | 50 |
Corio (Vic) | 40.84 | 41.7 | 41.5 |
Deakin (Vic) | 56.11 | 56.3 | 56.6 |
Dunkley (Vic) | 48.79 | 48.7 | 48.7 |
Flinders (Vic) | 56.64 | 57.2 | 57.1 |
Fraser (Vic) | 29.85 | 29.4 | 29.5 |
Gellibrand (Vic) | 35.68 | 35.3 | 35.3 |
Gippsland (Vic) | 68.13 | 68.2 | 68.1 |
Goldstein (Vic) | 62.68 | 62.7 | 62.7 |
Gorton (Vic) | 32.59 | 31.7 | 31.7 |
Higgins (Vic) | 60.64 | 60.2 | 60.3 |
Holt (Vic) | 40.16 | 40.1 | 40.2 |
Hotham (Vic) | 44.89 | 45.8 | 45.8 |
Indi (Vic) | | | |
Isaacs (Vic) | 48.69 | 47.7 | 47.8 |
Jagajaga (Vic) | 46.03 | 45 | 45 |
Kooyong (Vic) | 63.14 | 62.8 | 62.9 |
La Trobe (Vic) | 52.68 | 53.5 | 52.4 |
Lalor (Vic) | 34.6 | 35.6 | 35.6 |
Macnamara (Vic) | 48.41 | 48.7 | 48.6 |
Mallee (Vic) | 69.42 | 69.8 | 69.6 |
Maribyrnong (Vic) | 40.18 | 40.6 | 40.6 |
McEwen (Vic) | 44.03 | 44.7 | 44.6 |
Melbourne (Vic) | | | |
Menzies (Vic) | 58.2 | 57.9 | 57.9 |
Monash (Vic) | 58.02 | 57.6 | 57.8 |
Nicholls (Vic) | 72.49 | 72.3 | 72.4 |
Scullin (Vic) | 27.26 | 29.6 | 29.7 |
Wannon (Vic) | 59.59 | 59.3 | 59.3 |
Wills (Vic) | 28.07 | 28.2 | 28.3 |
Brand (WA) | 38.57 | 38.6 | 38.6 |
Burt (WA) | 42.89 | 42.9 | 42.9 |
Canning (WA) | 56.79 | 56.8 | 56.8 |
Cowan (WA) | 49.32 | 49.3 | 49.3 |
Curtin (WA) | 70.7 | 70.7 | 70.7 |
Durack (WA) | 61.06 | 61.1 | 61.1 |
Forrest (WA) | 62.56 | 62.6 | 62.6 |
Fremantle (WA) | 42.48 | 42.5 | 42.5 |
Hasluck (WA) | 52.05 | 52.1 | 52.1 |
Moore (WA) | 61.02 | 61 | 61 |
O'Connor (WA) | 65.04 | 65 | 65 |
Pearce (WA) | 53.63 | 53.6 | 53.6 |
Perth (WA) | 46.67 | 46.7 | 46.7 |
Stirling (WA) | 56.12 | 56.1 | 56.1 |
Swan (WA) | 53.59 | 53.6 | 53.6 |
Tangney (WA) | 61.07 | 61.1 | 61.1 |
Monday, September 17, 2018
Ipsos 47 to 53 in Labor's favour
The Ipsos monthly poll has been released. It estimates Labor would receive 53 per cent of the two-party preferred (TPP) vote if an election was held now. Popping these latest numbers into the aggregation, we get an aggregate estimate of 54.2 to 45.8 per cent in Labor's favour.
Turning to the moving averages, which do not cope well with disruption events, we can see the short-run averages are coming in close to the Bayesian model. The longer-run averages will need more time to come into line.
The Ipsos primary vote numbers were a little unusual, with 35 per cent of the primary vote going outside of the Coalition and Labor parties. Also unusual was Labor's low primary vote share in this poll.
Extrapolating a TPP from the primary vote aggregations yields the following.
Wednesday, September 12, 2018
Monday, September 10, 2018
Fourth Morrison poll - Second Newspoll
Newspoll is out at the start of another parliamentary sitting fortnight. It's the same headline message as the previous Newspoll. Labor is on 56 per cent of the two-party preferred vote, well ahead of the Coalition on 44 per cent. With these numbers, there is not a lot of subtlety: Labor would win with a landslide election result if an election was held at the moment.
I would not read too much into the slight downwards slant of the Morrison period to the right of the chart after the discontinuity. The first day of this period has three polls informing its position, the last day has just one poll. The slant may disappear as more polls come in.
The moving average models are coming around. They will over-shoot the Bayesian model before coming into line. They are not designed for the discontinuity we have seen.
Turning to the primary votes aggregation, we see a similar picture.
Sunday, September 2, 2018
Monte Carlo simulation of elections
Between elections, the Australian Electoral Commission (AEC) redraws the electoral boundaries to ensure each seat has a similar number of voters. Now that this redistribution process has been completed, I can use the new seats to model election outcomes.
The first thing I needed was the recalculated margins for each seat. For this data I used Wikipedia. Antony Green has also undertaken these calculations. For the seats that had not been redistributed, we have original polling outcome data from the AEC. This base, expressed as margins, looks something like this.
With this base, I have built a Monte Carlo simulation. In a Monte Carlo simulation we sample from probability distributions many thousands of times to identify the range of possible outcomes. These are then analysed to identify the probabilities for different events occurring. The model needs to consider those factors that can see the results vary.
The biggest source of uncertainty I need to manage is polling uncertainty. It is not unusual for an aggregated opinion poll to be plus or minus two percentage points from the final election outcome. In the Monte Carlo model I have assumed that the actual election outcome will be normally distributed around the poll estimate with a standard deviation of one percentage point.
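As a quick check on that assumption, a normal distribution with a standard deviation of one percentage point places roughly 95 per cent of outcomes within two points of the poll estimate. The sketch below simulates this; the seed and the variable names are illustrative choices, not part of the model code.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only

poll_estimate = 0.45  # aggregated Coalition TPP used in this post
polling_sd = 0.01     # one percentage point, as assumed above

# Simulated "actual" national outcomes scattered around the poll estimate
outcomes = rng.normal(loc=poll_estimate, scale=polling_sd, size=100_000)

# Fraction of simulated outcomes within two percentage points of the poll
within_two_points = np.mean(np.abs(outcomes - poll_estimate) <= 0.02)
print(f"Within two points of the poll estimate: {within_two_points:.1%}")
```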
Another source of uncertainty is the way in which the swing in the individual seats is distributed around the national swing and the way in which this swing varies state-by-state. Historically, individual seat swings have been close to normally distributed around the national swing with a standard deviation of 3 percentage points. They have also been close to normally distributed around state swings with a standard deviation of 2.5 percentage points.
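For a single seat, the two normal noise terms can be combined analytically, which gives a rough cross-check on the simulation. The sketch below assumes the polling error and the seat-level noise are independent, and it treats the state swing as a fixed number rather than a random draw, so it is a simplification of the simulation and not a substitute for it. The function name is hypothetical.

```python
from math import erf, sqrt

def coalition_win_probability(margin, swing, polling_sd=0.01, seat_sd=0.025):
    """Approximate chance the Coalition holds a seat.

    margin: Coalition TPP less 0.5 at the last election (e.g. 0.05 for 55-45)
    swing:  assumed swing to the Coalition (negative is a swing to Labor)
    The two noise terms are assumed independent, so their variances add.
    """
    combined_sd = sqrt(polling_sd ** 2 + seat_sd ** 2)
    z = (margin + swing) / combined_sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z

# A seat on a five point margin facing a five point swing is a coin toss
print(coalition_win_probability(margin=0.05, swing=-0.05))
# A seat on a ten point margin facing the same swing is much safer
print(round(coalition_win_probability(margin=0.10, swing=-0.05), 3))
```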
For this model, I have used state swings, based on the most recent state-by-state Newspoll (which pre-dates the Morrison ascendancy), and then adjusted for a change in the aggregate two-party preferred (TPP) since the Morrison ascendancy. I draw random numbers from a Dirichlet distribution to achieve this adjustment. The state swings since the last election used by the model can be seen in the following kernel density estimate plot. The chart is of the state swings to the Coalition in percentage points since the 2016 election. The largest swings against the government at the moment appear to be in WA and Queensland.
There are two key factors that I have not modelled. The first is the sophomore effect - a bump that first-term members of Parliament get when running for re-election. The second factor is the retirement effect - a decline in the party vote in a seat following the retirement of a long-standing member for that party. Labor has a large number of first-term parliamentarians, and is likely to benefit from the sophomore effect at the next election, notwithstanding it also has a number of retirees.
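The reason a Dirichlet draw suits this adjustment is that its components always sum to one. If each component is read as a state's share of the national Coalition vote, the enrolment-weighted average of the resulting state TPPs is pinned to the national TPP. The sketch below demonstrates that property using the enrolment and state TPP figures from the code at the end of this post; it uses NumPy's newer Generator interface and keeps the alpha values as floats, both of which are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

states = ['NSW', 'Vic', 'Qld', 'WA', 'SA', 'Tas', 'ACT', 'NT']
voters = np.array([5211182, 4094212, 3203789, 1615900,
                   1200395, 381409, 290654, 138581])
tpp_est_now = np.array([0.4963, 0.4597, 0.5000, 0.4996,
                        0.5103, 0.4164, 0.3787, 0.4194])
coalition_tpp_now = 0.45
alpha_scale = 10_000_000  # larger values give tighter draws

# Concentration parameters proportional to each state's Coalition vote
alpha = tpp_est_now * voters * alpha_scale / voters.sum()

# One draw: each state's share of the national Coalition vote (sums to 1)
share = rng.dirichlet(alpha)

# Convert the shares back to a Coalition TPP for each state
state_tpp = share * voters.sum() * coalition_tpp_now / voters

# The enrolment-weighted average recovers the national figure
national = (state_tpp * voters).sum() / voters.sum()
for state, tpp in zip(states, state_tpp):
    print(f"{state}: {tpp:.4f}")
print(f"National: {national:.4f}")
```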
A further (and perhaps more critical) factor I have not modeled is the outcome in seats currently held by other parties. For this analysis I have simply assumed those seats will continue to be held by other parties.
In the current climate, with an estimated aggregate TPP of 45 per cent for the Coalition, the model predicts a substantial victory for Labor were an election held now. Based on a run of 100,000 simulations, the model predicts Labor is most likely to win 95 seats, and the Coalition 51 seats.
While this is the most likely outcome, there is a cluster of possible outcomes for both parties. But there is little doubt, if an election were held now, a significant Labor majority would be the outcome.
Turning to the individual seat outcomes, these are charted below. In this chart, the seats where we have the Coalition at zero or 100 per cent probability are not sorted.
And finally, my rough and ready code for this exercise. Usual caveats apply: this is a work in progress.
# PYTHON: Monte-Carlo simulation of election outcomes
# -- NOTE: a number of data sources need to be updated
#          in this code before it is run.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import sys
sys.path.append( '../bin' )
plt.style.use('../bin/markgraph.mplstyle')

# --- version information
print('Python version: {}'.format(sys.version))

# --- Seat data
# Seat data sourced from
# https://en.wikipedia.org/wiki/Pre-election_pendulum_for_the_next_Australian_federal_election
workbook = pd.ExcelFile('./Data/Seats.xlsx')
df = workbook.parse('seats')
df.index = df.Seat

Coalition_TPP_2016 = 0.5036
# ===> UPDATE HERE <===
Coalition_TPP_now = 0.4500 # TO DO - source Coalition_TPP_now directly from TPP aggregation
Swing_to_Coalition = Coalition_TPP_now - Coalition_TPP_2016

others = df[df['LNP TPP'].isnull()]
df = df[df['LNP TPP'].notnull()]
base = ((df['LNP TPP'] / 100.0) - 0.5) # NOTE: base < 0 is Labor; base > 0 is Coalition

# --- State Data - note: TPP from the Coalition's perspective.
states = ['NSW', 'Vic', 'Qld', 'WA', 'SA', 'Tas', 'ACT', 'NT']
# voters from https://www.aec.gov.au/Enrolling_to_vote/Enrolment_stats/national/index.htm
# ===> UPDATE HERE <===
voters = [5211182, 4094212, 3203789, 1615900, 1200395, 381409, 290654, 138581]
voters = pd.Series(voters, index=states)
# State 2016 TPP source: https://results.aec.gov.au/20499/Website/HouseTppByState-20499.htm
tpp_2016 = [0.5053, 0.4817, 0.5410, 0.5466, 0.4773, 0.4264, 0.3887, 0.4294]
# latest TPP estimate draws on
# https://www.theaustralian.com.au/national-affairs/turnbull-axed-as-coalition-closed-the-gap-on-labor/news-story/487dd05cd4dc95693bd6c55b44bfbe88
# ===> UPDATE HERE <===
tpp_est_now = [0.4963, 0.4597, 0.5000, 0.4996, 0.5103, 0.4164, 0.3787, 0.4194]

# the multinomial vector for drawing the Dirichlet random numbers of state swings
alpha_scale = 10000000
state_alpha = (tpp_est_now * voters * alpha_scale / voters.sum()).astype(int)

# --- let's simulate ...
Monte_Carlo_N = 100000
# next line - preallocate space to speed up calculations
simulations = pd.DataFrame(np.zeros((len(base),Monte_Carlo_N)))
simulations.index = df.index
state_swings = pd.DataFrame(np.zeros((len(states),Monte_Carlo_N)))
state_swings.index = states

print('Commencing ', str(Monte_Carlo_N), ' simulation run ...')
for i in range(Monte_Carlo_N) :

    # -- progress indication
    if i % (Monte_Carlo_N // 20) == 0 :
        print(i)

    # -- polling uncertainty - polls often out by up to +/- two percentage points
    pollingUncertainty = np.random.standard_normal(1) * 0.01 # = standard deviation
    #pollingUncertainty = 0.0

    # -- variable swing by state - use a dirichlet random to manage this element to ensure
    #    total Coalition vote is the same as the Coalition TPP for all eligible voters
    #    NOTE: Comment out this section to use national swings rather than state swings
    #    NOTE: drawing random numbers from the Dirichlet distribution is slow
    state_dirichlet = np.random.dirichlet(state_alpha) # proportion of Coalition vote in each state
    state_Coalition_tpp = state_dirichlet * (voters.sum() * Coalition_TPP_now) / voters
    state_swing_to_coalition = state_Coalition_tpp - tpp_2016
    state_swings[i] = state_swing_to_coalition # we will plot this
    Swing_to_Coalition = df.State.map(state_swing_to_coalition)

    # -- TO DO - retirement effect

    # -- TO DO - sophomore effect

    # -- variable swing seat-by-seat - normally distributed noise around 0
    # -- use a standard deviation of 0.03 for national swings
    # -- use a standard deviation of 0.025 for state swings
    # -- https://marktheballot.blogspot.com/2016/11/how-are-seat-swings-distributed-around.html
    # -- https://marktheballot.blogspot.com/2012/11/state-swings.html
    seatDistributedAroundSwing = np.random.standard_normal(len(base)) * 0.025 # = standard deviation

    # -- bring it all together ...
    simulations[i] = base + Swing_to_Coalition + pollingUncertainty + seatDistributedAroundSwing

print('Finished simulation ... analysing data ...')
sumCoalition = simulations[simulations >= 0].count()
sumLabor = len(base) - sumCoalition
simSummary = pd.concat({'Coalition': sumCoalition.value_counts(),
    'Labor': sumLabor.value_counts()}, axis=1)
min_value = simSummary.index.min()
max_value = simSummary.index.max()+1
simSummary = pd.DataFrame(simSummary[['Labor', 'Coalition']],
    index=range(min_value, max_value))
simSummary = simSummary / simSummary.sum()
simSummary = simSummary.sort_index()

# -- seat count distributional plot
print('About to plot ...')
ax = simSummary.plot.bar()
ax.set_title('Election Outcome Probabilities for Coalition TPP: ' +
    str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Seats Won')
ax.set_ylabel('Probability')
ticks = np.arange(min_value, max_value, 5)
ax.set(xticks=[x - ticks[0] for x in ticks], xticklabels=ticks)

fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
    ha='right', va='bottom', fontsize='x-small',
    fontstyle='italic', color='#999999')
fig.savefig('./Graphs/SeatCountProbabilities.png', dpi=125)
plt.close()

# -- most likely outcome plot
ml_coalition = simSummary[simSummary['Coalition'] ==
    simSummary['Coalition'].max()].index[0]
ml_labor = len(base) - ml_coalition
ml_other = len(others)
ml_outcome = pd.Series(data=[ml_labor, ml_coalition, ml_other],
    index=['Labor', 'Coalition', 'Other'])
ax = ml_outcome.plot.barh()
ax.set_title('Most likely Election Outcome for Coalition TPP: ' +
    str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Number of Seats Won by Party')
ax.set_ylabel('')
for i in ax.patches:
    ax.text(x=1, y=i.get_y()+.16, s=str(i.get_width()), color='white')

fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
    ha='right', va='bottom', fontsize='x-small',
    fontstyle='italic', color='#999999')
fig.savefig('./Graphs/SeatLikelyOutcome.png', dpi=125)
plt.close()

# -- plot state swings to the Coalition - as a KDE
state_swings = state_swings * 100 # convert to percentage points
ax = state_swings.T.plot.kde()
ax.set_title('State Swing Kernel Density Estimates for Coalition TPP: ' +
    str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Swing to the Coalition in Percentage Points')
ax.set_ylabel('Density')

fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
    ha='right', va='bottom', fontsize='x-small',
    fontstyle='italic', color='#999999')
fig.savefig('./Graphs/Seat-StateSwingKDE.png', dpi=125)
plt.close()

# -- plot base
base = base * 100 # convert to percentage points
base.sort_values(inplace=True)
ax = base.plot.barh(color='royalblue')
ax.set_title('2016 Coalition Margins Starting Point')
ax.set_xlabel('Percentage Points (Labor is <0; Coalition is >0)')
ax.set_ylabel('Seat')

fig = ax.figure
fig.set_size_inches(8, 30)
fig.tight_layout(pad=1)
fig.text(0.99, 0.005, 'marktheballot.blogspot.com.au',
    ha='right', va='bottom', fontsize='x-small',
    fontstyle='italic', color='#999999')
fig.savefig('./Graphs/Seat-baseMargins.png', dpi=125)
plt.close()

# -- plot individual seat outcomes
sumSeatCoalition = simulations[simulations >= 0].count(axis=1)
sumSeatCoalition = sumSeatCoalition / Monte_Carlo_N
sumSeatLabor = 1.0 - sumSeatCoalition
seatSummary = pd.DataFrame(data={'Coalition': sumSeatCoalition,
    'Labor': sumSeatLabor})
seatSummary = seatSummary[['Labor', 'Coalition']] # correct order for colours

seatSummary.sort_index(inplace=True)
ax = seatSummary.plot.barh(stacked=True, legend=False)
ax.set_title('Seat Win Probabilities for Coalition TPP: ' +
    str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Probability')
ax.set_ylabel('')

fig = ax.figure
fig.set_size_inches(8, 30)
fig.tight_layout(pad=1)
fig.text(0.99, 0.005, 'marktheballot.blogspot.com.au',
    ha='right', va='bottom', fontsize='x-small',
    fontstyle='italic', color='#999999')
fig.savefig('./Graphs/SeatWinProbabilitiesNameOrder.png', dpi=125)
plt.close()

seatSummary.sort_values(by='Coalition', inplace=True)
ax = seatSummary.plot.barh(stacked=True, legend=False)
ax.set_title('Seat Win Probabilities for Coalition TPP: ' +
    str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Probability')
ax.set_ylabel('')

fig = ax.figure
fig.set_size_inches(8, 30)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
    ha='right', va='bottom', fontsize='x-small',
    fontstyle='italic', color='#999999')
fig.savefig('./Graphs/SeatWinProbabilitiesOutcomeOrder.png', dpi=125)
plt.close()