## Saturday, September 22, 2018

### Redistributions

One of the myths of data science is the myth of scientific accuracy. The myth is that when a data scientist applies the correct mathematical procedures, they get a mathematically correct answer to seven decimal places. In reality, data scientists make many choices, both in the analysis they bring and in the data they work with. Gelman and Loken (2013) brought this idea to life for the statistics community with their paper on the garden of forking paths.

The way in which we curate data, munge it and prepare it for analysis, and the analytical frames we use, are all choices. Unconsciously, those choices can frame our results. And they can blind us to the results we might have produced if we had taken a different approach. In short, they can mislead us, even when we set out to undertake ethical and unbiased analysis.

I found myself reflecting on the garden of forking paths as I considered how best to apply the 2016 polling outcome data to the electoral boundary changes that occurred in 2017 and 2018. I wanted to find the best way to allocate to the new seat boundaries the ordinary votes that were geographically linked (via longitude and latitude) to 2016 polling locations. I also had to allocate the votes that were not geo-tagged (some ordinary votes, but also absentee votes, provisional votes, postal votes and declaration pre-poll votes).

My initial plan was to allocate the 2016 two-party preferred (TPP) votes to new electorates based on the geo-coded place where each vote was cast. I would then allocate each electorate's remaining votes in proportion to that electorate's geo-coded votes, by party. Once all the votes from 2016 had been allocated to the new electorate boundaries, I could calculate a TPP margin for each seat.
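The geo-coded step of that plan can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code behind the estimates below: it assumes each booth record carries a longitude, a latitude and Coalition/Labor TPP counts, and that each new seat is a single simple polygon. The function names and toy data are hypothetical, and real work would read the boundary shapefiles with a GIS library rather than hand-rolled ray casting. The margin is taken as the distance of the Coalition TPP from 50 per cent.

```python
from collections import defaultdict


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex joins the first.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def allocate_booths(booths, new_seats):
    """Sum each booth's two-party votes into the new seat that contains it."""
    totals = defaultdict(lambda: {"coalition": 0, "labor": 0})
    for booth in booths:
        for seat, polygon in new_seats.items():
            if point_in_polygon(booth["lon"], booth["lat"], polygon):
                totals[seat]["coalition"] += booth["coalition"]
                totals[seat]["labor"] += booth["labor"]
                break  # seats do not overlap, so stop at the first match
    return dict(totals)


def coalition_tpp(votes):
    """Coalition share of the two-party-preferred vote, in per cent."""
    return 100 * votes["coalition"] / (votes["coalition"] + votes["labor"])


# Hypothetical toy data: two square "seats" side by side and three booths.
new_seats = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
booths = [
    {"lon": 0.5, "lat": 0.5, "coalition": 600, "labor": 400},
    {"lon": 1.5, "lat": 0.5, "coalition": 300, "labor": 700},
    {"lon": 1.2, "lat": 0.8, "coalition": 500, "labor": 500},
]

totals = allocate_booths(booths, new_seats)
for seat, votes in totals.items():
    tpp = coalition_tpp(votes)
    print(f"{seat}: Coalition TPP {tpp:.2f}%, margin {abs(tpp - 50):.2f}")
# West: Coalition TPP 60.00%, margin 10.00
# East: Coalition TPP 40.00%, margin 10.00
```

Nothing here yet handles the untagged votes or the question of which geo-codes to trust; those are the harder choices.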

However, this turned out to be problematic. For example, a number of votes in the Victorian seat of Goldstein (which was unchanged by the Victorian redistribution) were coded as being cast outside the electorate. But because the seat was unaffected by the redistribution, those votes should remain attributed to Goldstein, and should not be counted in another electorate.

The interesting analytical question became which data were most reflective of where people live (at a reasonably fine level of granularity), and which data reflected only the voter's electorate, without any granularity about where in the electorate they might live. For example, if someone from the outer suburbs of Melbourne votes in the city on polling day, can I draw any useful information about where in their electorate they live? My conclusion was that I could not.

As a consequence, to the extent I could apply this rule, where people voted at a polling booth outside their electorate, and I had geo-coding for that polling place, I chose to ignore that geo-coding. I also decided to ignore the geo-coding for pre-poll votes. Because there are so few pre-poll voting centres, I assumed the votes cast at these centres were not necessarily informative about where in an electorate the pre-poll voters came from. Conversely, I assumed the location data from the many polling booths to be informative, if the person was casting an ordinary vote on polling day in their own electorate.
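The trust rule above can be expressed as a small predicate. This is a hedged sketch: the `BoothResult` record, its field names and the vote-type labels (`"ordinary"`, `"prepoll"`, `"postal"`) are hypothetical stand-ins, not the AEC's actual column names, and the booth records in the usage lines are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoothResult:
    division: str              # 2016 division whose voters cast these votes
    vote_type: str             # hypothetical labels: "ordinary", "prepoll", "postal", ...
    located_in: Optional[str]  # 2016 division the booth's coordinates fall in, if geo-coded


def geo_code_is_informative(result: BoothResult) -> bool:
    """Trust the booth location only for ordinary polling-day votes cast at a
    geo-coded booth inside the voters' own 2016 division."""
    if result.located_in is None:
        return False  # no coordinates at all
    if result.vote_type != "ordinary":
        return False  # few pre-poll centres, so location says little about residence
    # A booth outside the voters' own division (e.g. a city booth) is ignored.
    return result.located_in == result.division


# Hypothetical records for illustration only.
print(geo_code_is_informative(BoothResult("Goldstein", "ordinary", "Goldstein")))  # True
print(geo_code_is_informative(BoothResult("Goldstein", "ordinary", "Melbourne")))  # False
print(geo_code_is_informative(BoothResult("Goldstein", "prepoll", "Goldstein")))   # False
```

Votes that fail this test fall back to being attributable only to their 2016 division, which is how the Goldstein votes coded outside the electorate stay with Goldstein.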

I was picky because I wanted to be confident that, as boundaries were redrawn, I was meaningfully attributing votes from the parts of an electorate that might be redistributed differently.

I allocated the votes that I could attribute only to an electorate in the same proportions as that electorate's geo-coded ordinary TPP votes, by party. If 70 per cent of the Labor vote stayed in an electorate and 30 per cent went to another electorate, that is how I treated the Labor vote from the electorate where I did not have geographic information, or where I deemed the geographic information I had to be insufficiently informative. I treated the flows of Coalition and Labor votes separately.
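The proportional allocation for a single 2016 division might look like the sketch below. It assumes the geo-coded votes have already been split by party and new seat (for example with a step like the booth allocation sketched earlier); the function name, seat names and vote counts are hypothetical. It also assumes each party has at least some informative geo-coded votes in the division, otherwise the shares are undefined.

```python
from collections import defaultdict


def allocate_untagged(untagged, geo_votes_by_party):
    """Spread one old division's untagged votes across new seats.

    untagged: party -> number of votes with no usable location.
    geo_votes_by_party: party -> {new_seat: informative geo-coded votes}.
    Each party's untagged votes follow that party's own geo-coded flows.
    """
    allocated = defaultdict(dict)
    for party, votes in untagged.items():
        geo = geo_votes_by_party[party]
        total = sum(geo.values())  # assumed non-zero
        for seat, seat_votes in geo.items():
            # e.g. 70% of the party's geo-coded vote -> 70% of its untagged vote
            allocated[seat][party] = votes * seat_votes / total
    return dict(allocated)


# Hypothetical division: 70/30 Labor flow, 40/60 Coalition flow.
geo_votes_by_party = {
    "labor": {"Seat A": 7000, "Seat B": 3000},
    "coalition": {"Seat A": 4000, "Seat B": 6000},
}
untagged = {"labor": 2000, "coalition": 1500}

result = allocate_untagged(untagged, geo_votes_by_party)
print(result["Seat A"])  # {'labor': 1400.0, 'coalition': 600.0}
print(result["Seat B"])  # {'labor': 600.0, 'coalition': 900.0}
```

Keeping the parties separate matters: a single pooled share would send the Coalition's untagged votes along Labor's geography, and vice versa.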

My initial results are close to, but also different from, the work of others. My walk in the garden of forking paths was different from theirs. The usual caveats for preliminary analysis apply: be warned, this work could include errors that I have not identified. While I have calculated results to two decimal places, this suggests a level of accuracy that belies the many choices I have made in munging and analysing the data. Also note: I could not find shapefile map data for the 2016 electorate boundaries for the NT and Tasmania, so my treatment of these jurisdictions is less certain than the others.

And here are my Coalition 2016 TPP estimates (per cent) for each seat on the new boundaries, compared with those from Antony Green and William Bowe.

| Seat | Mark Graph (%) | Antony Green (%) | William Bowe (%) |
|---|---|---|---|
| Bean (ACT) | 40.9 | 41.1 | 41.1 |
| Canberra (ACT) | 36.54 | 36.8 | 36.8 |
| Fenner (ACT) | 38.97 | 38.4 | 38.2 |
| Banks (NSW) | 51.44 | 51.4 | 51.4 |
| Barton (NSW) | 41.7 | 41.7 | 41.7 |
| Bennelong (NSW) | 59.72 | 59.7 | 59.7 |
| Berowra (NSW) | 66.45 | 66.4 | 66.5 |
| Blaxland (NSW) | 30.52 | 30.5 | 30.5 |
| Bradfield (NSW) | 71.04 | 71 | 71 |
| Calare (NSW) | 61.81 | 61.8 | 61.8 |
| Chifley (NSW) | 30.81 | 30.8 | 30.8 |
| Cook (NSW) | 65.39 | 65.4 | 65.4 |
| Cowper (NSW) | 62.58 | 62.6 | 62.6 |
| Cunningham (NSW) | 36.68 | 36.7 | 36.7 |
| Dobell (NSW) | 45.19 | 45.2 | 45.2 |
| Eden-Monaro (NSW) | 47.07 | 47.1 | 47.1 |
| Farrer (NSW) | 70.53 | 70.5 | 70.5 |
| Fowler (NSW) | 32.51 | 32.5 | 32.5 |
| Gilmore (NSW) | 50.73 | 50.7 | 50.7 |
| Grayndler (NSW) | 27.64 | 27.6 | 27.6 |
| Greenway (NSW) | 43.69 | 43.7 | 43.7 |
| Hughes (NSW) | 59.33 | 59.3 | 59.3 |
| Hume (NSW) | 60.18 | 60.2 | 60.2 |
| Hunter (NSW) | 37.54 | 37.5 | 37.5 |
| Kingsford Smith (NSW) | 41.43 | 41.4 | 41.4 |
| Lindsay (NSW) | 48.89 | 48.9 | 48.9 |
| Lyne (NSW) | 61.63 | 61.6 | 61.6 |
| Macarthur (NSW) | 41.67 | 41.7 | 41.7 |
| Mackellar (NSW) | 65.74 | 65.7 | 65.7 |
| Macquarie (NSW) | 47.81 | 47.8 | 47.8 |
| McMahon (NSW) | 37.89 | 37.9 | 37.9 |
| Mitchell (NSW) | 67.82 | 67.8 | 67.8 |
| New England (NSW) | 66.42 | 66.4 | 66.4 |
| Newcastle (NSW) | 36.16 | 36.2 | 36.2 |
| North Sydney (NSW) | 63.61 | 63.6 | 63.6 |
| Page (NSW) | 52.3 | 52.3 | 52.3 |
| Parkes (NSW) | 65.1 | 65.1 | 65.1 |
| Parramatta (NSW) | 42.33 | 42.3 | 42.3 |
| Paterson (NSW) | 39.26 | 39.3 | 39.3 |
| Reid (NSW) | 54.69 | 54.7 | 54.7 |
| Richmond (NSW) | 46.04 | 46 | 46 |
| Riverina (NSW) | 66.44 | 66.4 | 66.4 |
| Robertson (NSW) | 51.14 | 51.1 | 51.1 |
| Shortland (NSW) | 40.06 | 40.1 | 40.1 |
| Sydney (NSW) | 34.69 | 34.7 | 34.7 |
| Warringah (NSW) | 61.09 | 61.1 | 61.1 |
| Watson (NSW) | 32.42 | 32.4 | 32.4 |
| Wentworth (NSW) | 67.75 | 67.7 | 67.8 |
| Werriwa (NSW) | 41.8 | 41.8 | 41.8 |
| Whitlam (NSW) | 36.28 | 36.3 | 36.3 |
| Lingiari (NT) | 41.58 | 41.9 | 41.8 |
| Solomon (NT) | 44 | 43.9 | 43.9 |
| Blair (Qld) | 41.81 | 42 | 41.8 |
| Bonner (Qld) | 53.39 | 53.4 | 53.4 |
| Bowman (Qld) | 57.07 | 57.1 | 57.1 |
| Brisbane (Qld) | 55.85 | 56 | 56.1 |
| Capricornia (Qld) | 50.63 | 50.6 | 50.6 |
| Dawson (Qld) | 53.37 | 53.3 | 53.4 |
| Dickson (Qld) | 51.6 | 52 | 52 |
| Fadden (Qld) | 61.05 | 61.2 | 61.3 |
| Fairfax (Qld) | 60.78 | 61 | 60.8 |
| Fisher (Qld) | 59.24 | 59.2 | 59.3 |
| Flynn (Qld) | 51.04 | 51 | 51 |
| Forde (Qld) | 50.63 | 50.6 | 50.6 |
| Griffith (Qld) | 48.93 | 48.6 | 48.7 |
| Groom (Qld) | 65.31 | 65.3 | 65.3 |
| Herbert (Qld) | 49.98 | 49.98 | 50 |
| Hinkler (Qld) | 58.42 | 58.4 | 58.4 |
| Kennedy (Qld) | | | |
| Leichhardt (Qld) | 53.89 | 54 | 53.9 |
| Lilley (Qld) | 44.68 | 44.2 | 44.2 |
| Longman (Qld) | 49.21 | 49.2 | 49.2 |
| Maranoa (Qld) | 67.54 | 67.5 | 67.5 |
| McPherson (Qld) | 61.64 | 61.6 | 61.6 |
| Moncrieff (Qld) | 64.94 | 64.5 | 64.7 |
| Moreton (Qld) | 45.58 | 46 | 45.9 |
| Oxley (Qld) | 40.92 | 40.9 | 40.9 |
| Petrie (Qld) | 51.65 | 51.6 | 51.7 |
| Rankin (Qld) | 38.7 | 38.7 | 38.7 |
| Ryan (Qld) | 59.21 | 58.8 | 59.1 |
| Wide Bay (Qld) | 58.14 | 58.3 | 58.2 |
| Wright (Qld) | 59.62 | 59.6 | 59.6 |
| Adelaide (SA) | 40.5 | 41 | 41.1 |
| Barker (SA) | 64.13 | 64.3 | 64.1 |
| Boothby (SA) | 52.74 | 52.8 | 52.9 |
| Grey (SA) | 58.37 | 58.5 | 58.1 |
| Hindmarsh (SA) | 42.64 | 41.8 | 41.8 |
| Kingston (SA) | 36.21 | 36.5 | 36.4 |
| Makin (SA) | 39.04 | 39.1 | 39.2 |
| Mayo (SA) | | | |
| Spence (SA) | 31.79 | 32.1 | 32.2 |
| Sturt (SA) | 55.78 | 55.8 | 55.7 |
| Bass (Tas) | 44.57 | 44.7 | 44.5 |
| Braddon (Tas) | 48.43 | 48.5 | 48.4 |
| Clark (Tas) | | | |
| Franklin (Tas) | 39.3 | 39.3 | 39.3 |
| Lyons (Tas) | 46.12 | 46 | 46.9 |
| Aston (Vic) | 57.73 | 57.6 | 57.4 |
| Ballarat (Vic) | 42.55 | 42.6 | 42.6 |
| Bendigo (Vic) | 45.98 | 46.1 | 46.1 |
| Bruce (Vic) | 33.5 | 34.3 | 34.5 |
| Calwell (Vic) | 27.88 | 29.9 | 29.7 |
| Casey (Vic) | 54.03 | 54.5 | 54.3 |
| Chisholm (Vic) | 53.54 | 53.4 | 53.4 |
| Cooper (Vic) | 28.25 | 28 | 27.9 |
| Corangamite (Vic) | 50.61 | 50.03 | 50 |
| Corio (Vic) | 40.84 | 41.7 | 41.5 |
| Deakin (Vic) | 56.11 | 56.3 | 56.6 |
| Dunkley (Vic) | 48.79 | 48.7 | 48.7 |
| Flinders (Vic) | 56.64 | 57.2 | 57.1 |
| Fraser (Vic) | 29.85 | 29.4 | 29.5 |
| Gellibrand (Vic) | 35.68 | 35.3 | 35.3 |
| Gippsland (Vic) | 68.13 | 68.2 | 68.1 |
| Goldstein (Vic) | 62.68 | 62.7 | 62.7 |
| Gorton (Vic) | 32.59 | 31.7 | 31.7 |
| Higgins (Vic) | 60.64 | 60.2 | 60.3 |
| Holt (Vic) | 40.16 | 40.1 | 40.2 |
| Hotham (Vic) | 44.89 | 45.8 | 45.8 |
| Indi (Vic) | | | |
| Isaacs (Vic) | 48.69 | 47.7 | 47.8 |
| Jagajaga (Vic) | 46.03 | 45 | 45 |
| Kooyong (Vic) | 63.14 | 62.8 | 62.9 |
| La Trobe (Vic) | 52.68 | 53.5 | 52.4 |
| Lalor (Vic) | 34.6 | 35.6 | 35.6 |
| Macnamara (Vic) | 48.41 | 48.7 | 48.6 |
| Mallee (Vic) | 69.42 | 69.8 | 69.6 |
| Maribyrnong (Vic) | 40.18 | 40.6 | 40.6 |
| McEwen (Vic) | 44.03 | 44.7 | 44.6 |
| Melbourne (Vic) | | | |
| Menzies (Vic) | 58.2 | 57.9 | 57.9 |
| Monash (Vic) | 58.02 | 57.6 | 57.8 |
| Nicholls (Vic) | 72.49 | 72.3 | 72.4 |
| Scullin (Vic) | 27.26 | 29.6 | 29.7 |
| Wannon (Vic) | 59.59 | 59.3 | 59.3 |
| Wills (Vic) | 28.07 | 28.2 | 28.3 |
| Brand (WA) | 38.57 | 38.6 | 38.6 |
| Burt (WA) | 42.89 | 42.9 | 42.9 |
| Canning (WA) | 56.79 | 56.8 | 56.8 |
| Cowan (WA) | 49.32 | 49.3 | 49.3 |
| Curtin (WA) | 70.7 | 70.7 | 70.7 |
| Durack (WA) | 61.06 | 61.1 | 61.1 |
| Forrest (WA) | 62.56 | 62.6 | 62.6 |
| Fremantle (WA) | 42.48 | 42.5 | 42.5 |
| Hasluck (WA) | 52.05 | 52.1 | 52.1 |
| Moore (WA) | 61.02 | 61 | 61 |
| O'Connor (WA) | 65.04 | 65 | 65 |
| Pearce (WA) | 53.63 | 53.6 | 53.6 |
| Perth (WA) | 46.67 | 46.7 | 46.7 |
| Stirling (WA) | 56.12 | 56.1 | 56.1 |
| Swan (WA) | 53.59 | 53.6 | 53.6 |
| Tangney (WA) | 61.07 | 61.1 | 61.1 |