Sunday, September 2, 2018

Monte Carlo simulation of elections

Between elections, the Australian Election Commission (AEC) redraws the electoral boundaries to ensure each seat has a similar number of voters. Now that this redistribution process has been completed, I can use the new seats to model election outcomes.

The first thing I needed was the recalculated margins for each seat. For this data I used Wikipedia. Antony Green has also undertaken these calculations. For the seats that had not been redistributed, we have original polling outcome data from the AEC. This base, expressed as margins, looks something like this.



With this base, I have built a Monte Carlo simulation. In a Monte Carlo simulation we sample from probability distributions many thousands of times to identify the range of possible outcomes. These are then analysed to identify the probabilities for different events occurring. The model needs to consider those factors that can see the results vary.

The biggest source of uncertainty I need to manage is polling uncertainty. It is not unusual for an aggregated opinion poll to be plus or minus two percentage points from the final election outcome. In the Monte Carlo model I have assumed that the actual election outcome will be normally distributed around the poll estimate with a standard deviation of one percentage point.

Another source of uncertainty is the way in which the swing in the individual seats is distributed around the national swing and the way in which this swing varies state-by-state. Historically, individual seat swings have been close to normally distributed around the the national swing with a standard deviation of 3 percentage points. They have also been close to normally distributed around state swings with a standard deviation of 2.5 percentage points.

For this model, I have used state swings, based on the most recent state-by-state Newspoll (which pre-dates the Morrison ascendancy), and then adjusted for a change in the aggregate two-party preferred (TPP) since the Morrison ascendancy. I draw random numbers from a Dirichlet distribution to achieve this adjustment. The state swings since the last election the model used can be seen in the following kernel density estimate plot. The chart is of the state swings to the Coalition in percentage points since the 2016 election. The largest swings against the government at the moment appear to be in WA and Queensland.
 


There are two key factors that I have not modeled. The first is the sophomore effect - a bump that first term members of Parliament get when running for re-election. The second factor is the retirement effect - a decline in the party vote in a seat following the retirement of long standing member for that party. Labor has a large number of first-term parliamentarians, and is likely to benefit from the sophomore effect at the next election, not withstanding it also has a number of retirees.

A further (and perhaps more critical) factor I have not modeled is the outcome in seats currently held by other parties. For this analysis I have simply assumed those seats will continue to be held by other parties.

In the current climate, with an estimated aggregate TPP of 45 per cent for the Coalition. The model predicts a substantial victory for Labor were an election held now. Based on a simulation run of 100,000, the model predicts Labor is most likely to win 95 seats, and the Coalition 51 seats.


While this is the most likely outcome, there are a cluster of possible outcomes for both parties. But there is little doubt, if an election were held now, a significant Labor majority would be the outcome.


Turning to the individual seat outcomes, these are charted below. In this chart, the seats where we have the Coalition at zero or 100 per cent probability are not sorted.



And finally, my rough and ready code for this exercise. Usual caveats apply: this is a work in progress.
# PYTHON: Monte-Carlo simulation of election outcomes
# -- NOTE: a number of data sources need to be updated 
#          in this code before it is run.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append( '../bin' )
plt.style.use('../bin/markgraph.mplstyle')


# --- version information
print('Python version: {}'.format(sys.version))


# --- Seat data
# Seat data sourced from 
# https://en.wikipedia.org/wiki/Pre-election_pendulum_for_the_next_Australian_federal_election
workbook = pd.ExcelFile('./Data/Seats.xlsx')
df = workbook.parse('seats')
df.index = df.Seat

Coalition_TPP_2016 = 0.5036
# ===> UPDATE HERE <===
Coalition_TPP_now  = 0.4500 
# TO DO - source Coalition_TPP_now directly from TPP aggregation
Swing_to_Coalition = Coalition_TPP_now - Coalition_TPP_2016

others = df[df['LNP TPP'].isnull()]
df = df[df['LNP TPP'].notnull()]
base = ((df['LNP TPP'] / 100.0) - 0.5) 
# NOTE: base < 0 is Labor; base > 0 is Coalition 


# --- State Data - note: TPP from the Coalition's perspective. 
states =     ['NSW',   'Vic',   'Qld',   'WA',    'SA',    'Tas',   'ACT',   'NT']

# voters from https://www.aec.gov.au/Enrolling_to_vote/Enrolment_stats/national/index.htm
# ===> UPDATE HERE <===
voters =     [5211182, 4094212, 3203789, 1615900, 1200395,  381409,  290654, 138581]
voters = pd.Series(voters, index=states)

# State 2016 TPP source: https://results.aec.gov.au/20499/Website/HouseTppByState-20499.htm
tpp_2016 =   [0.5053,  0.4817,  0.5410,  0.5466,  0.4773,  0.4264,  0.3887,  0.4294]

# latest TPP estimate draws on 
# https://www.theaustralian.com.au/national-affairs/turnbull-axed-as-coalition-closed-the-gap-on-labor/news-story/487dd05cd4dc95693bd6c55b44bfbe88
# ===> UPDATE HERE <===
tpp_est_now= [0.4963,  0.4597,  0.5000,  0.4996,  0.5103, 0.4164,   0.3787,  0.4194]

# the multinomial vector for drawing the Dirichlet random numbers of state swings
alpha_scale = 10000000 
state_alpha = (tpp_est_now * voters * alpha_scale / voters.sum()).astype(int) 


# --- let's simulate ...
Monte_Carlo_N = 100000
# next line - preallocate space to speed up calculations
simulations = pd.DataFrame(np.zeros((len(base),Monte_Carlo_N)))
simulations.index = df.index
state_swings = pd.DataFrame(np.zeros((len(states),Monte_Carlo_N)))
state_swings.index = states

print('Commencing ', str(Monte_Carlo_N), ' simulation run ...')
for i in range(Monte_Carlo_N) :

    # -- progress indication
    if i % (Monte_Carlo_N // 20) == 0 :
        print(i)

    # -- polling uncertainty - polls often out by up +/- two percentage points
    pollingUncertainty = np.random.standard_normal(1) * 0.01 # = standard deviation
    #pollingUncertainty = 0.0
    
    # -- variable swing by state - use a dirichlet random to manage this element to ensure
    #    total Coalition vote is the same as the Coalition TPP for all eligible voters
    # NOTE: Comment out this section to use national swings rather than state swings
    # NOTE: drawing random numbers from the Dirichlet distribution is slow
    state_dirichlet = np.random.dirichlet(state_alpha) # proportion of Coalition cote in each state
    state_Coalition_tpp = state_dirichlet * (voters.sum() * Coalition_TPP_now) / voters
    state_swing_to_coalition = state_Coalition_tpp - tpp_2016
    state_swings[i] = state_swing_to_coalition # we will plot this
    Swing_to_Coalition = df.State.map(state_swing_to_coalition)
    
    # -- TO DO - retirement effect
    
    # -- TO DO - sophomore effect
    
    # -- variable swing seat-by-seat - normally distributed noise around 0
    # -- use a standard deviation of 0.03 for national swings
    # -- use a standard deviation of 0.025 for state swings
    # -- https://marktheballot.blogspot.com/2016/11/how-are-seat-swings-distributed-around.html
    # -- https://marktheballot.blogspot.com/2012/11/state-swings.html
    seatDistributedAroundSwing = np.random.standard_normal(len(base)) * 0.025 # = standard deviation
    
    # -- bring it all together ...
    simulations[i] = base + Swing_to_Coalition + pollingUncertainty + seatDistributedAroundSwing
    

print('Finished simulation ... analysing data ...')
sumCoalition = simulations[simulations >= 0].count()
sumLabor = len(base) - sumCoalition
simSummary = pd.concat({'Coalition': sumCoalition.value_counts(), 
    'Labor': sumLabor.value_counts()}, axis=1)
min_value = simSummary.index.min()
max_value = simSummary.index.max()+1
simSummary = pd.DataFrame(simSummary[['Labor', 'Coalition']], 
    index=range(min_value, max_value))
simSummary = simSummary / simSummary.sum() 
simSummary = simSummary.sort_index()


# -- seat count distributional plot
print('About to plot ...')
ax = simSummary.plot.bar()
ax.set_title('Election Outcome Probabilities for Coalition TPP: ' 
    + str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Seats Won')
ax.set_ylabel('Probability') 
ticks = np.arange(min_value, max_value, 5)
ax.set(xticks=[x - ticks[0] for x in ticks], xticklabels=ticks)

fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
        ha='right', va='bottom', fontsize='x-small', 
        fontstyle='italic', color='#999999') 
fig.savefig('./Graphs/SeatCountProbabilities.png', dpi=125) 
plt.close()


# -- most likely outcome plot
ml_coalition = simSummary[simSummary['Coalition'] == 
    simSummary['Coalition'].max()].index[0] 
ml_labor = len(base) - ml_coalition
ml_other = len(others)
ml_outcome = pd.Series(data=[ml_labor, ml_coalition, ml_other], 
    index=['Labor', 'Coalition', 'Other'])
    
ax = ml_outcome.plot.barh()
ax.set_title('Most likely Election Outcome for Coalition TPP: ' 
    + str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Number of Seats Won by Party')
ax.set_ylabel('') 

for i in ax.patches:
    ax.text(x=1, y=i.get_y()+.16, s=str(i.get_width()), color='white')
    
fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
        ha='right', va='bottom', fontsize='x-small', 
        fontstyle='italic', color='#999999') 
fig.savefig('./Graphs/SeatLikelyOutcome.png', dpi=125) 
plt.close()


# -- plot state swings to the Coalition - as a KDE
state_swings = state_swings * 100 # covert to percentage points
ax = state_swings.T.plot.kde()
ax.set_title('State Swing Kernel Density Estimates for Coalition TPP: ' 
    + str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Swing to the Coalition in Percentage Points')
ax.set_ylabel('Density') 

fig = ax.figure
fig.set_size_inches(8, 4)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
        ha='right', va='bottom', fontsize='x-small', 
        fontstyle='italic', color='#999999') 
fig.savefig('./Graphs/Seat-StateSwingKDE.png', dpi=125) 
plt.close()


# -- plot base
base = base * 100 # covert to percentage points
base.sort_values(inplace=True)
ax = base.plot.barh(color='royalblue')
ax.set_title('2016 Coalition Margins Starting Point')
ax.set_xlabel('Percentage Points (Labor is <0; Coalition is >0)')
ax.set_ylabel('Seat') 

fig = ax.figure
fig.set_size_inches(8, 30)
fig.tight_layout(pad=1)
fig.text(0.99, 0.005, 'marktheballot.blogspot.com.au',
        ha='right', va='bottom', fontsize='x-small', 
        fontstyle='italic', color='#999999') 
fig.savefig('./Graphs/Seat-baseMargins.png', dpi=125) 
plt.close()


# -- plot individual seat outcomes
sumSeatCoalition = simulations[simulations >= 0].count(axis=1) 
sumSeatCoalition = sumSeatCoalition / Monte_Carlo_N
sumSeatLabor = 1.0 - sumSeatCoalition
seatSummary = pd.DataFrame(data={'Coalition': sumSeatCoalition, 'Labor': sumSeatLabor})
seatSummary = seatSummary[['Labor', 'Coalition']] # correct order for colours
seatSummary.sort_index(inplace=True)

ax = seatSummary.plot.barh(stacked=True, legend=False)
ax.set_title('Seat Win Probabilities for Coalition TPP: ' 
    + str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Probability')
ax.set_ylabel('') 

fig = ax.figure
fig.set_size_inches(8, 30)
fig.tight_layout(pad=1)
fig.text(0.99, 0.005, 'marktheballot.blogspot.com.au',
        ha='right', va='bottom', fontsize='x-small', 
        fontstyle='italic', color='#999999') 
fig.savefig('./Graphs/SeatWinProbabilitiesNameOrder.png', dpi=125) 
plt.close()

seatSummary.sort_values(by='Coalition', inplace=True)
ax = seatSummary.plot.barh(stacked=True, legend=False)
ax.set_title('Seat Win Probabilities for Coalition TPP: ' 
    + str(Coalition_TPP_now * 100.0))
ax.set_xlabel('Probability')
ax.set_ylabel('') 

fig = ax.figure
fig.set_size_inches(8, 30)
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au',
        ha='right', va='bottom', fontsize='x-small', 
        fontstyle='italic', color='#999999') 
fig.savefig('./Graphs/SeatWinProbabilitiesOutcomeOrder.png', dpi=125) 
plt.close()

2 comments:

  1. Why is Mallee on the list twice...?

    ReplyDelete
  2. Thanks for pointing out the glitch. The newly named seat of Nicholls (currently known as the seat of Murray) was listed as Mallee in my source data from the Wikipedia page. Now fixed.

    ReplyDelete