Monday, October 15, 2018

Polling update

Today we had an Ipsos poll (45-55 in Labor's favour) and a Newspoll (47-53) with vastly different interpretations. For Newspoll the story was one of steady Coalition improvement (47 is better than Morrison's debut at 44). For Ipsos it was one of no benefit from the recent leadership change. The last Turnbull poll under Ipsos was also 45-55.

In the Bayesian model, I have allowed for a discontinuity in public opinion on 23 August, and for a period of higher than normal volatility in day-to-day voting sentiment from 24 August to 1 October. The results are as follows.



Enough time has passed since the Coalition leadership change for the moving average models to start to come back into alignment with the Bayesian model. Notwithstanding the Coalition bounce following the immediate polling collapse in reaction to the leadership change in August 2018, Coalition voting sentiment is as low now as it was at Turnbull's worst period in the polls in late 2017.


My primary vote model has decided to stop working. Actually, I upgraded to the latest versions of Stan and pystan, and I need to tweak the model to get it working again.

Wednesday, September 26, 2018

Updated aggregation

With Essential coming in at 47-53 in Labor's favour, it is time to update the aggregation. Both Essential and Newspoll are pegging a two percentage point recovery for the Coalition over the past four weeks. I have tweaked the TPP model to allow for more rapid changes in public opinion over the period immediately following the recent disruption event.




Turning to the primary vote model ...






Monday, September 24, 2018

Aggregated poll update

With the publication of the latest Newspoll (46 to 54 in Labor's favour), it is time to update my poll aggregation. I currently have the Coalition with a 45.8 per cent share of the two-party preferred (TPP) vote, if an election was held at the moment.



The moving averages ... which take time to settle after a disruption event.


The primary votes ...






The data for this analysis was sourced from Wikipedia.

Saturday, September 22, 2018

Redistributions

One of the myths of data science is the myth of scientific accuracy. The myth is that when a data scientist applies the correct mathematical procedures, they get a mathematically correct answer to seven decimal places. In reality, data scientists make many choices about the analysis they undertake and the data they work with. Gelman and Loken (2013) brought this concept to life for the statistics community with their paper on the garden of forking paths.

The way in which we curate data, munge it and prepare it for analysis, and the analytical frames we use are all choices. Unconsciously those choices can frame our results. And they can blind us to the results we might have produced if we had taken a different approach. In short, they can mislead us, even when we set out to undertake ethical and unbiased analysis.

I found myself reflecting on the garden of forking paths as I considered how best to apply the 2016 polling outcome data to the electoral boundary changes that occurred in 2017 and 2018. I wanted to find the best way to allocate to the new seat boundaries the ordinary votes that were geographically linked (via longitude and latitude) to polling locations in 2016. I also had to allocate the non-tagged votes (some ordinary votes, but also absentee votes, provisional votes, postal votes and declaration pre-poll votes).

My initial plan was to allocate the 2016 two-party preferred (TPP) votes to new electorates based on the geo-coded place where the vote was cast. I would then allocate the remaining votes for each electorate in proportion with the geo-coded votes by party for that electorate. Once all the votes from 2016 had been allocated to the new electorate boundaries, I could calculate a TPP margin for each seat.

However, this turned out to be problematic. For example, a number of votes in the Victorian seat of Goldstein (which was unchanged by the Victorian redistribution) were coded as being cast outside of the electorate. But as the seat was unaffected by the redistribution, those votes should remain attributed to Goldstein, and should not be counted in another electorate.

The interesting analytical question became which data was most reflective of where people live (at a reasonably fine level of granularity), and which data was only reflective of the voter's electorate, without any granularity about where in the electorate they might live. For example, if someone from the outer suburbs of Melbourne votes in the city on polling day, can I draw any useful information about where they live in their electorate? My conclusion was that I could not.

As a consequence, to the extent I could apply this rule, where people voted at a polling booth outside of their electorate, and we have geo-coding for that polling place, I chose to ignore that geo-coding. I also decided to ignore the geo-coding for pre-poll votes. Because there are so few pre-poll voting centres, I assumed the votes cast at these centres were not necessarily informative about where in an electorate the pre-poll voters come from. Conversely, I assumed the location data from the many polling booths to be informative, if the person was casting an ordinary vote on polling day in their electorate.

I was picky, because I wanted to be comfortable that as boundaries are redrawn, I was meaningfully attributing votes from the parts of an electorate that might be redistributed differently.
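
The rule above can be sketched as a small filter. This is a minimal illustration only, not the code behind this analysis: the `VoteBatch` class, its field names and the vote-type labels are hypothetical and do not correspond to AEC data fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoteBatch:
    """A batch of 2016 votes (hypothetical structure, for illustration only)."""
    vote_type: str                 # e.g. "ordinary", "prepoll", "postal", "absent"
    enrolled_division: str         # the electorate the votes count towards
    booth_division: Optional[str]  # electorate containing the polling place, if geo-coded


def keep_geocode(batch: VoteBatch) -> bool:
    """Return True only when the polling place is treated as informative
    about where in the electorate the voters live."""
    if batch.vote_type != "ordinary":
        return False  # pre-poll, postal, absentee etc.: geo-coding ignored
    if batch.booth_division is None:
        return False  # no geo-coded polling place at all
    # ordinary votes cast outside the home electorate: geo-coding ignored
    return batch.booth_division == batch.enrolled_division


batches = [
    VoteBatch("ordinary", "Goldstein", "Goldstein"),
    VoteBatch("ordinary", "Goldstein", "Melbourne"),
    VoteBatch("prepoll", "Goldstein", "Goldstein"),
    VoteBatch("postal", "Goldstein", None),
]
decisions = [keep_geocode(b) for b in batches]
```

Under these assumptions only the first batch keeps its geo-coding; the other three are attributed to Goldstein as a whole.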

For the votes that I could only attribute to an electorate, I allocated them in the same proportion as the ordinary TPP votes by party were allocated. If 70 per cent of the Labor vote stayed in an electorate and 30 per cent went to another electorate, that is how I treated the Labor vote from the electorate where I did not have geographic information or where I deemed the geographic information I had to be insufficiently informative. I treated the flow of Coalition and Labor votes separately.
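
A minimal sketch of that proportional allocation follows. The function name, the seat labels and the vote counts are all made up for illustration; only the 70/30 split for Labor comes from the example above.

```python
def allocate_untagged(geocoded_by_destination, untagged_votes):
    """Split one party's untagged votes across destination electorates
    in the same proportions as that party's geo-coded ordinary votes."""
    total = sum(geocoded_by_destination.values())
    if total == 0:
        raise ValueError("no geo-coded votes to take proportions from")
    return {
        seat: untagged_votes * votes / total
        for seat, votes in geocoded_by_destination.items()
    }


# Hypothetical counts: 70 per cent of the geo-coded Labor vote stays in
# "Old Seat" and 30 per cent moves to "Neighbour".
labor_geocoded = {"Old Seat": 7000, "Neighbour": 3000}
labor_untagged = allocate_untagged(labor_geocoded, 2000)

# The Coalition flow is treated separately, with its own proportions.
coalition_geocoded = {"Old Seat": 6000, "Neighbour": 4000}
coalition_untagged = allocate_untagged(coalition_geocoded, 1500)
```

With these made-up numbers, 1,400 of the 2,000 untagged Labor votes stay in the old seat and 600 move, while the untagged Coalition votes split 900 to 600.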

My initial results are close to, but also different from, the work of others. My walk in the garden of forking paths was different to theirs. The usual caveats for preliminary analysis apply: Be warned, this work could include errors that I have not identified. While I have calculated results to two decimal places, this suggests a level of accuracy that belies the many choices I have made in munging and analysing the data. Also note: I could not find map data for the 2016 electorate boundaries in the shapefile format for NT and Tasmania, so my treatment of these states is less certain than the others.

And here are my Coalition 2016 TPP estimates for each seat compared with those from Antony Green and William Bowe.


Seat Mark Graph Antony Green William Bowe
Bean (ACT) 40.9 41.1 41.1
Canberra (ACT) 36.54 36.8 36.8
Fenner (ACT) 38.97 38.4 38.2
Banks (NSW) 51.44 51.4 51.4
Barton (NSW) 41.7 41.7 41.7
Bennelong (NSW) 59.72 59.7 59.7
Berowra (NSW) 66.45 66.4 66.5
Blaxland (NSW) 30.52 30.5 30.5
Bradfield (NSW) 71.04 71 71
Calare (NSW) 61.81 61.8 61.8
Chifley (NSW) 30.81 30.8 30.8
Cook (NSW) 65.39 65.4 65.4
Cowper (NSW) 62.58 62.6 62.6
Cunningham (NSW) 36.68 36.7 36.7
Dobell (NSW) 45.19 45.2 45.2
Eden-Monaro (NSW) 47.07 47.1 47.1
Farrer (NSW) 70.53 70.5 70.5
Fowler (NSW) 32.51 32.5 32.5
Gilmore (NSW) 50.73 50.7 50.7
Grayndler (NSW) 27.64 27.6 27.6
Greenway (NSW) 43.69 43.7 43.7
Hughes (NSW) 59.33 59.3 59.3
Hume (NSW) 60.18 60.2 60.2
Hunter (NSW) 37.54 37.5 37.5
Kingsford Smith (NSW) 41.43 41.4 41.4
Lindsay (NSW) 48.89 48.9 48.9
Lyne (NSW) 61.63 61.6 61.6
Macarthur (NSW) 41.67 41.7 41.7
Mackellar (NSW) 65.74 65.7 65.7
Macquarie (NSW) 47.81 47.8 47.8
McMahon (NSW) 37.89 37.9 37.9
Mitchell (NSW) 67.82 67.8 67.8
New England (NSW) 66.42 66.4 66.4
Newcastle (NSW) 36.16 36.2 36.2
North Sydney (NSW) 63.61 63.6 63.6
Page (NSW) 52.3 52.3 52.3
Parkes (NSW) 65.1 65.1 65.1
Parramatta (NSW) 42.33 42.3 42.3
Paterson (NSW) 39.26 39.3 39.3
Reid (NSW) 54.69 54.7 54.7
Richmond (NSW) 46.04 46 46
Riverina (NSW) 66.44 66.4 66.4
Robertson (NSW) 51.14 51.1 51.1
Shortland (NSW) 40.06 40.1 40.1
Sydney (NSW) 34.69 34.7 34.7
Warringah (NSW) 61.09 61.1 61.1
Watson (NSW) 32.42 32.4 32.4
Wentworth (NSW) 67.75 67.7 67.8
Werriwa (NSW) 41.8 41.8 41.8
Whitlam (NSW) 36.28 36.3 36.3
Lingiari (NT) 41.58 41.9 41.8
Solomon (NT) 44 43.9 43.9
Blair (Qld) 41.81 42 41.8
Bonner (Qld) 53.39 53.4 53.4
Bowman (Qld) 57.07 57.1 57.1
Brisbane (Qld) 55.85 56 56.1
Capricornia (Qld) 50.63 50.6 50.6
Dawson (Qld) 53.37 53.3 53.4
Dickson (Qld) 51.6 52 52
Fadden (Qld) 61.05 61.2 61.3
Fairfax (Qld) 60.78 61 60.8
Fisher (Qld) 59.24 59.2 59.3
Flynn (Qld) 51.04 51 51
Forde (Qld) 50.63 50.6 50.6
Griffith (Qld) 48.93 48.6 48.7
Groom (Qld) 65.31 65.3 65.3
Herbert (Qld) 49.98 49.98 50
Hinkler (Qld) 58.42 58.4 58.4
Kennedy (Qld)
Leichhardt (Qld) 53.89 54 53.9
Lilley (Qld) 44.68 44.2 44.2
Longman (Qld) 49.21 49.2 49.2
Maranoa (Qld) 67.54 67.5 67.5
McPherson (Qld) 61.64 61.6 61.6
Moncrieff (Qld) 64.94 64.5 64.7
Moreton (Qld) 45.58 46 45.9
Oxley (Qld) 40.92 40.9 40.9
Petrie (Qld) 51.65 51.6 51.7
Rankin (Qld) 38.7 38.7 38.7
Ryan (Qld) 59.21 58.8 59.1
Wide Bay (Qld) 58.14 58.3 58.2
Wright (Qld) 59.62 59.6 59.6
Adelaide (SA) 40.5 41 41.1
Barker (SA) 64.13 64.3 64.1
Boothby (SA) 52.74 52.8 52.9
Grey (SA) 58.37 58.5 58.1
Hindmarsh (SA) 42.64 41.8 41.8
Kingston (SA) 36.21 36.5 36.4
Makin (SA) 39.04 39.1 39.2
Mayo (SA)
Spence (SA) 31.79 32.1 32.2
Sturt (SA) 55.78 55.8 55.7
Bass (Tas) 44.57 44.7 44.5
Braddon (Tas) 48.43 48.5 48.4
Clark (Tas)
Franklin (Tas) 39.3 39.3 39.3
Lyons (Tas) 46.12 46 46.9
Aston (Vic) 57.73 57.6 57.4
Ballarat (Vic) 42.55 42.6 42.6
Bendigo (Vic) 45.98 46.1 46.1
Bruce (Vic) 33.5 34.3 34.5
Calwell (Vic) 27.88 29.9 29.7
Casey (Vic) 54.03 54.5 54.3
Chisholm (Vic) 53.54 53.4 53.4
Cooper (Vic) 28.25 28 27.9
Corangamite (Vic) 50.61 50.03 50
Corio (Vic) 40.84 41.7 41.5
Deakin (Vic) 56.11 56.3 56.6
Dunkley (Vic) 48.79 48.7 48.7
Flinders (Vic) 56.64 57.2 57.1
Fraser (Vic) 29.85 29.4 29.5
Gellibrand (Vic) 35.68 35.3 35.3
Gippsland (Vic) 68.13 68.2 68.1
Goldstein (Vic) 62.68 62.7 62.7
Gorton (Vic) 32.59 31.7 31.7
Higgins (Vic) 60.64 60.2 60.3
Holt (Vic) 40.16 40.1 40.2
Hotham (Vic) 44.89 45.8 45.8
Indi (Vic)
Isaacs (Vic) 48.69 47.7 47.8
Jagajaga (Vic) 46.03 45 45
Kooyong (Vic) 63.14 62.8 62.9
La Trobe (Vic) 52.68 53.5 52.4
Lalor (Vic) 34.6 35.6 35.6
Macnamara (Vic) 48.41 48.7 48.6
Mallee (Vic) 69.42 69.8 69.6
Maribyrnong (Vic) 40.18 40.6 40.6
McEwen (Vic) 44.03 44.7 44.6
Melbourne (Vic)
Menzies (Vic) 58.2 57.9 57.9
Monash (Vic) 58.02 57.6 57.8
Nicholls (Vic) 72.49 72.3 72.4
Scullin (Vic) 27.26 29.6 29.7
Wannon (Vic) 59.59 59.3 59.3
Wills (Vic) 28.07 28.2 28.3
Brand (WA) 38.57 38.6 38.6
Burt (WA) 42.89 42.9 42.9
Canning (WA) 56.79 56.8 56.8
Cowan (WA) 49.32 49.3 49.3
Curtin (WA) 70.7 70.7 70.7
Durack (WA) 61.06 61.1 61.1
Forrest (WA) 62.56 62.6 62.6
Fremantle (WA) 42.48 42.5 42.5
Hasluck (WA) 52.05 52.1 52.1
Moore (WA) 61.02 61 61
O'Connor (WA) 65.04 65 65
Pearce (WA) 53.63 53.6 53.6
Perth (WA) 46.67 46.7 46.7
Stirling (WA) 56.12 56.1 56.1
Swan (WA) 53.59 53.6 53.6
Tangney (WA) 61.07 61.1 61.1
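
One way to see where the three sets of estimates diverge most is to rank seats by the gap between my figure and the average of the other two. The sketch below does this for five rows copied from the table above; the choice of rows and the gap measure are illustrative assumptions, not part of the original analysis.

```python
# (seat, Mark Graph, Antony Green, William Bowe): Coalition 2016 TPP, per cent
rows = [
    ("Banks (NSW)", 51.44, 51.4, 51.4),
    ("Calwell (Vic)", 27.88, 29.9, 29.7),
    ("Corangamite (Vic)", 50.61, 50.03, 50.0),
    ("Goldstein (Vic)", 62.68, 62.7, 62.7),
    ("Scullin (Vic)", 27.26, 29.6, 29.7),
]


def gap_from_others(mine, green, bowe):
    """My estimate minus the mean of the other two estimates."""
    return mine - (green + bowe) / 2


gaps = {seat: gap_from_others(m, g, b) for seat, m, g, b in rows}

# Seats ordered from largest to smallest absolute disagreement
ranked = sorted(gaps, key=lambda seat: abs(gaps[seat]), reverse=True)

for seat in ranked:
    print(f"{seat:20s} {gaps[seat]:+.2f}")
```

Among these five rows, the two redistributed Victorian seats of Scullin and Calwell show gaps of around two percentage points, while Goldstein (unchanged by the redistribution) and Banks differ only by rounding.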

Monday, September 17, 2018

Ipsos 47 to 53 in Labor's favour

The Ipsos monthly poll has been released. It estimates Labor would receive 53 per cent of the two-party preferred (TPP) vote if an election was held now. Popping these latest numbers into the aggregation, we get an aggregate estimate of 54.2 to 45.8 per cent in Labor's favour.



Turning to the moving averages, which do not cope well with disruption events, we can see the short-run averages are coming in close to the Bayesian model. The longer-run averages will need more time to come into line.
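
To illustrate why a longer window takes longer to come into line after a discontinuity, here is a minimal sketch using a plain trailing mean on a made-up step series. The series values, the window lengths and the half-point tolerance are assumptions for illustration; the moving averages in my charts are not necessarily this simple.

```python
def trailing_mean(series, window):
    """Trailing moving average; uses fewer points at the start of the series."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def periods_to_settle(smoothed, break_at, target, tol=0.5):
    """Number of periods after the break until the smoothed series is within tol of target."""
    for i in range(break_at, len(smoothed)):
        if abs(smoothed[i] - target) <= tol:
            return i - break_at
    return None


# Hypothetical series: steady at 49, then an abrupt drop to 44 at period 20
series = [49.0] * 20 + [44.0] * 20

short_run = trailing_mean(series, 5)
long_run = trailing_mean(series, 15)

short_lag = periods_to_settle(short_run, 20, 44.0)
long_lag = periods_to_settle(long_run, 20, 44.0)
```

In this toy example the 5-period average is back within half a point of the new level 4 periods after the break, while the 15-period average needs 13.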


The Ipsos primary vote numbers were a little unusual, with 35 per cent of the primary vote going outside of the Coalition and Labor parties. Also unusual was Labor's low primary vote share in this poll.





Extrapolating a TPP from the primary vote aggregations yields the following.



Monday, September 10, 2018

Fourth Morrison poll - Second Newspoll

Newspoll is out at the start of another parliamentary sitting fortnight. It's the same headline message as the previous Newspoll. Labor is on 56 per cent of the two-party preferred vote, well ahead of the Coalition on 44 per cent. With these numbers, there is not a lot of subtlety: Labor would win in a landslide if an election was held at the moment.

I would not read too much into the slight downwards slant of the Morrison period to the right of the chart after the discontinuity. The first day of this period has three polls informing its position, the last day has just one poll. The slant may disappear as more polls come in.



The moving average models are coming around. They will over-shoot the Bayesian model before coming into line. They are not designed for the discontinuity we have seen.


Turning to the primary votes aggregation, we see a similar picture.