The first thing I needed was the recalculated margins for each seat. For this data I used Wikipedia. Antony Green has also undertaken these calculations. For the seats that had not been redistributed, we have original polling outcome data from the AEC. This base, expressed as margins, looks something like this.
With this base, I have built a Monte Carlo simulation. In a Monte Carlo simulation we sample from probability distributions many thousands of times to identify the range of possible outcomes. These are then analysed to identify the probabilities for different events occurring. The model needs to consider those factors that can see the results vary.
The biggest source of uncertainty I need to manage is polling uncertainty. It is not unusual for an aggregated opinion poll to be plus or minus two percentage points from the final election outcome. In the Monte Carlo model I have assumed that the actual election outcome will be normally distributed around the poll estimate with a standard deviation of one percentage point.
Another source of uncertainty is the way in which the swing in the individual seats is distributed around the national swing and the way in which this swing varies state-by-state. Historically, individual seat swings have been close to normally distributed around the the national swing with a standard deviation of 3 percentage points. They have also been close to normally distributed around state swings with a standard deviation of 2.5 percentage points.
For this model, I have used state swings, based on the most recent state-by-state Newspoll (which pre-dates the Morrison ascendancy), and then adjusted for a change in the aggregate two-party preferred (TPP) since the Morrison ascendancy. I draw random numbers from a Dirichlet distribution to achieve this adjustment. The state swings since the last election the model used can be seen in the following kernel density estimate plot. The chart is of the state swings to the Coalition in percentage points since the 2016 election. The largest swings against the government at the moment appear to be in WA and Queensland.
There are two key factors that I have not modeled. The first is the sophomore effect - a bump that first term members of Parliament get when running for re-election. The second factor is the retirement effect - a decline in the party vote in a seat following the retirement of long standing member for that party. Labor has a large number of first-term parliamentarians, and is likely to benefit from the sophomore effect at the next election, not withstanding it also has a number of retirees.
A further (and perhaps more critical) factor I have not modeled is the outcome in seats currently held by other parties. For this analysis I have simply assumed those seats will continue to be held by other parties.
In the current climate, with an estimated aggregate TPP of 45 per cent for the Coalition. The model predicts a substantial victory for Labor were an election held now. Based on a simulation run of 100,000, the model predicts Labor is most likely to win 95 seats, and the Coalition 51 seats.
While this is the most likely outcome, there are a cluster of possible outcomes for both parties. But there is little doubt, if an election were held now, a significant Labor majority would be the outcome.
Turning to the individual seat outcomes, these are charted below. In this chart, the seats where we have the Coalition at zero or 100 per cent probability are not sorted.
And finally, my rough and ready code for this exercise. Usual caveats apply: this is a work in progress.
# PYTHON: Monte-Carlo simulation of election outcomes # -- NOTE: a number of data sources need to be updated # in this code before it is run. import pandas as pd import numpy as np import matplotlib.pyplot as plt import sys sys.path.append( '../bin' ) plt.style.use('../bin/markgraph.mplstyle') # --- version information print('Python version: {}'.format(sys.version)) # --- Seat data # Seat data sourced from # https://en.wikipedia.org/wiki/Pre-election_pendulum_for_the_next_Australian_federal_election workbook = pd.ExcelFile('./Data/Seats.xlsx') df = workbook.parse('seats') df.index = df.Seat Coalition_TPP_2016 = 0.5036 # ===> UPDATE HERE <=== Coalition_TPP_now = 0.4500 # TO DO - source Coalition_TPP_now directly from TPP aggregation Swing_to_Coalition = Coalition_TPP_now - Coalition_TPP_2016 others = df[df['LNP TPP'].isnull()] df = df[df['LNP TPP'].notnull()] base = ((df['LNP TPP'] / 100.0) - 0.5) # NOTE: base < 0 is Labor; base > 0 is Coalition # --- State Data - note: TPP from the Coalition's perspective. states = ['NSW', 'Vic', 'Qld', 'WA', 'SA', 'Tas', 'ACT', 'NT'] # voters from https://www.aec.gov.au/Enrolling_to_vote/Enrolment_stats/national/index.htm # ===> UPDATE HERE <=== voters = [5211182, 4094212, 3203789, 1615900, 1200395, 381409, 290654, 138581] voters = pd.Series(voters, index=states) # State 2016 TPP source: https://results.aec.gov.au/20499/Website/HouseTppByState-20499.htm tpp_2016 = [0.5053, 0.4817, 0.5410, 0.5466, 0.4773, 0.4264, 0.3887, 0.4294] # latest TPP estimate draws on # https://www.theaustralian.com.au/national-affairs/turnbull-axed-as-coalition-closed-the-gap-on-labor/news-story/487dd05cd4dc95693bd6c55b44bfbe88 # ===> UPDATE HERE <=== tpp_est_now= [0.4963, 0.4597, 0.5000, 0.4996, 0.5103, 0.4164, 0.3787, 0.4194] # the multinomial vector for drawing the Dirichlet random numbers of state swings alpha_scale = 10000000 state_alpha = (tpp_est_now * voters * alpha_scale / voters.sum()).astype(int) # --- let's simulate ... Monte_Carlo_N = 100000 # next line - preallocate space to speed up calculations simulations = pd.DataFrame(np.zeros((len(base),Monte_Carlo_N))) simulations.index = df.index state_swings = pd.DataFrame(np.zeros((len(states),Monte_Carlo_N))) state_swings.index = states print('Commencing ', str(Monte_Carlo_N), ' simulation run ...') for i in range(Monte_Carlo_N) : # -- progress indication if i % (Monte_Carlo_N // 20) == 0 : print(i) # -- polling uncertainty - polls often out by up +/- two percentage points pollingUncertainty = np.random.standard_normal(1) * 0.01 # = standard deviation #pollingUncertainty = 0.0 # -- variable swing by state - use a dirichlet random to manage this element to ensure # total Coalition vote is the same as the Coalition TPP for all eligible voters # NOTE: Comment out this section to use national swings rather than state swings # NOTE: drawing random numbers from the Dirichlet distribution is slow state_dirichlet = np.random.dirichlet(state_alpha) # proportion of Coalition cote in each state state_Coalition_tpp = state_dirichlet * (voters.sum() * Coalition_TPP_now) / voters state_swing_to_coalition = state_Coalition_tpp - tpp_2016 state_swings[i] = state_swing_to_coalition # we will plot this Swing_to_Coalition = df.State.map(state_swing_to_coalition) # -- TO DO - retirement effect # -- TO DO - sophomore effect # -- variable swing seat-by-seat - normally distributed noise around 0 # -- use a standard deviation of 0.03 for national swings # -- use a standard deviation of 0.025 for state swings # -- https://marktheballot.blogspot.com/2016/11/how-are-seat-swings-distributed-around.html # -- https://marktheballot.blogspot.com/2012/11/state-swings.html seatDistributedAroundSwing = np.random.standard_normal(len(base)) * 0.025 # = standard deviation # -- bring it all together ... simulations[i] = base + Swing_to_Coalition + pollingUncertainty + seatDistributedAroundSwing print('Finished simulation ... analysing data ...') sumCoalition = simulations[simulations >= 0].count() sumLabor = len(base) - sumCoalition simSummary = pd.concat({'Coalition': sumCoalition.value_counts(), 'Labor': sumLabor.value_counts()}, axis=1) min_value = simSummary.index.min() max_value = simSummary.index.max()+1 simSummary = pd.DataFrame(simSummary[['Labor', 'Coalition']], index=range(min_value, max_value)) simSummary = simSummary / simSummary.sum() simSummary = simSummary.sort_index() # -- seat count distributional plot print('About to plot ...') ax = simSummary.plot.bar() ax.set_title('Election Outcome Probabilities for Coalition TPP: ' + str(Coalition_TPP_now * 100.0)) ax.set_xlabel('Seats Won') ax.set_ylabel('Probability') ticks = np.arange(min_value, max_value, 5) ax.set(xticks=[x - ticks[0] for x in ticks], xticklabels=ticks) fig = ax.figure fig.set_size_inches(8, 4) fig.tight_layout(pad=1) fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom', fontsize='x-small', fontstyle='italic', color='#999999') fig.savefig('./Graphs/SeatCountProbabilities.png', dpi=125) plt.close() # -- most likely outcome plot ml_coalition = simSummary[simSummary['Coalition'] == simSummary['Coalition'].max()].index[0] ml_labor = len(base) - ml_coalition ml_other = len(others) ml_outcome = pd.Series(data=[ml_labor, ml_coalition, ml_other], index=['Labor', 'Coalition', 'Other']) ax = ml_outcome.plot.barh() ax.set_title('Most likely Election Outcome for Coalition TPP: ' + str(Coalition_TPP_now * 100.0)) ax.set_xlabel('Number of Seats Won by Party') ax.set_ylabel('') for i in ax.patches: ax.text(x=1, y=i.get_y()+.16, s=str(i.get_width()), color='white') fig = ax.figure fig.set_size_inches(8, 4) fig.tight_layout(pad=1) fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom', fontsize='x-small', fontstyle='italic', color='#999999') fig.savefig('./Graphs/SeatLikelyOutcome.png', dpi=125) plt.close() # -- plot state swings to the Coalition - as a KDE state_swings = state_swings * 100 # covert to percentage points ax = state_swings.T.plot.kde() ax.set_title('State Swing Kernel Density Estimates for Coalition TPP: ' + str(Coalition_TPP_now * 100.0)) ax.set_xlabel('Swing to the Coalition in Percentage Points') ax.set_ylabel('Density') fig = ax.figure fig.set_size_inches(8, 4) fig.tight_layout(pad=1) fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom', fontsize='x-small', fontstyle='italic', color='#999999') fig.savefig('./Graphs/Seat-StateSwingKDE.png', dpi=125) plt.close() # -- plot base base = base * 100 # covert to percentage points base.sort_values(inplace=True) ax = base.plot.barh(color='royalblue') ax.set_title('2016 Coalition Margins Starting Point') ax.set_xlabel('Percentage Points (Labor is <0; Coalition is >0)') ax.set_ylabel('Seat') fig = ax.figure fig.set_size_inches(8, 30) fig.tight_layout(pad=1) fig.text(0.99, 0.005, 'marktheballot.blogspot.com.au', ha='right', va='bottom', fontsize='x-small', fontstyle='italic', color='#999999') fig.savefig('./Graphs/Seat-baseMargins.png', dpi=125) plt.close() # -- plot individual seat outcomes sumSeatCoalition = simulations[simulations >= 0].count(axis=1) sumSeatCoalition = sumSeatCoalition / Monte_Carlo_N sumSeatLabor = 1.0 - sumSeatCoalition seatSummary = pd.DataFrame(data={'Coalition': sumSeatCoalition, 'Labor': sumSeatLabor}) seatSummary = seatSummary[['Labor', 'Coalition']] # correct order for colours seatSummary.sort_index(inplace=True) ax = seatSummary.plot.barh(stacked=True, legend=False) ax.set_title('Seat Win Probabilities for Coalition TPP: ' + str(Coalition_TPP_now * 100.0)) ax.set_xlabel('Probability') ax.set_ylabel('') fig = ax.figure fig.set_size_inches(8, 30) fig.tight_layout(pad=1) fig.text(0.99, 0.005, 'marktheballot.blogspot.com.au', ha='right', va='bottom', fontsize='x-small', fontstyle='italic', color='#999999') fig.savefig('./Graphs/SeatWinProbabilitiesNameOrder.png', dpi=125) plt.close() seatSummary.sort_values(by='Coalition', inplace=True) ax = seatSummary.plot.barh(stacked=True, legend=False) ax.set_title('Seat Win Probabilities for Coalition TPP: ' + str(Coalition_TPP_now * 100.0)) ax.set_xlabel('Probability') ax.set_ylabel('') fig = ax.figure fig.set_size_inches(8, 30) fig.tight_layout(pad=1) fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom', fontsize='x-small', fontstyle='italic', color='#999999') fig.savefig('./Graphs/SeatWinProbabilitiesOutcomeOrder.png', dpi=125) plt.close()
Why is Mallee on the list twice...?
ReplyDeleteThanks for pointing out the glitch. The newly named seat of Nicholls (currently known as the seat of Murray) was listed as Mallee in my source data from the Wikipedia page. Now fixed.
ReplyDelete