Friday, January 24, 2025

Code clean-up: Bayesian Aggregation models

I recently spent some time cleaning up the Python code I use to produce the Bayesian Aggregation charts. When I originally wrote the code some two-plus years ago, it was all in the global namespace. For short analytical notebooks, that's generally not a problem. But for larger notebooks (such as this one), it can hide name clashes, which can result in subtle errors.
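
To illustrate the kind of name clash described above, here is a minimal sketch in Python. The variable and function names are hypothetical and do not come from the notebook; the point is only to show how a name reused at the top level can silently change a value that later code relies on, and how function scope avoids that.

```python
# Hypothetical illustration; none of these names come from the notebook.

# --- Notebook style: everything lives in the global namespace ---
sigma = 0.5  # intended as a prior scale for one model

# ... many cells later, an unrelated loop reuses the same name ...
for sigma in (1.0, 2.0, 4.0):
    pass  # e.g. trying out candidate smoothing bandwidths

# Any later code that reads `sigma` now sees the last loop value, not 0.5.
global_result = sigma


# --- Encapsulated style: each piece of work has its own local scope ---
def try_bandwidths() -> list[float]:
    """Loop over candidate bandwidths without touching module globals."""
    tried = []
    for sigma in (1.0, 2.0, 4.0):  # local to this function
        tried.append(sigma)
    return tried


def prior_scale() -> float:
    """Return the prior scale from a local variable."""
    sigma = 0.5  # local; unaffected by loops elsewhere
    return sigma
```

In the first half, the loop overwrites the earlier assignment without any warning. In the second half, the same loop runs inside a function, so `prior_scale()` still returns the intended value.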

About six months ago, one of my models - the Gaussian Process model - just randomly stopped working. The error message was obscure (something about the linker not working). It was not immediately obvious what the problem was. And I did not want to spend the time diagnosing the actual problem. 

In the past week, I decided to bite the bullet and encapsulate the Python code into functions, so that the code was mostly not in the global namespace. I also decided I would make the Python code lint-compliant (using black, ruff and pylint) and type-safe (using mypy). I have now completed that process (well, almost: I still have one function with too many local variables).

The unexpected benefit: the Gaussian Process model works again (even though I did not fiddle with that bit of the code). The lesson learnt (again): doing too much work in the global namespace can result in subtle and hard-to-detect errors. The other lesson learnt: keep the code clean, and regularly check it with linting and type-checking tools such as pylint and mypy.

If you want to see the rewritten notebook, it is here. The main supporting functions (including the Bayesian models) are here.

Anyways, the most recent endpoints for the three models I run are as follows (the values in the table are percentages). 

Model                                                    2pp vote ALP  2pp vote L/NP  Primary vote ALP  Primary vote GRN  Primary vote L/NP  Primary vote Other
Gaussian Random Walk - Normal likelihood - fixed priors         48.82          51.16             30.67             12.40              39.39               17.53
Gaussian Process - Normal likelihood - fixed priors             49.18          50.82             31.02             12.46              39.10               17.41
150-day local regression                                        48.42          51.58             30.18             12.65              39.81               16.52

My preferred model remains the Gaussian Random Walk. As I have noted before, the Gaussian Process model does not perform well when the polling information is relatively rare (as it was in 2022 and into 2023). The Gaussian Process model also tends to revert back to the mean at the ends of the series. The local regression model can be overly influenced by the last few data points on the right-hand side.

The latest charts follow.






























Monday, January 20, 2025

Polling update mid January

January is normally a quiet month for polling in Australia, and 2025 is no exception. Nonetheless, there have been a couple of polls since Christmas 2024. These polls suggest little movement since Christmas in two-party preferred terms. You will note that I now indicate, on the first house-effects chart below, those polling houses which I have constrained so that their house effects sum to zero. I exclude pollsters with fewer than 5 polls, and the prior methodologies of pollsters which have changed methodologies.
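
The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the model code: the pollster labels, the helper functions and the post-hoc centring step are all assumptions made for the example. In a Bayesian model the sum-to-zero constraint would more likely be imposed inside the model itself.

```python
from collections import Counter

MIN_POLLS = 5  # threshold from the post: houses with fewer polls are excluded


def constrained_houses(
    poll_houses: list[str],
    superseded: frozenset[str] = frozenset(),
) -> list[str]:
    """Return the houses whose effects are constrained to sum to zero.

    poll_houses has one entry per poll; superseded holds (hypothetical)
    labels for a pollster's earlier, since-changed methodology.
    """
    counts = Counter(poll_houses)
    return sorted(
        house
        for house, n in counts.items()
        if n >= MIN_POLLS and house not in superseded
    )


def centre_effects(raw: dict[str, float], constrained: list[str]) -> dict[str, float]:
    """Shift every raw effect so the constrained houses sum to zero."""
    shift = sum(raw[h] for h in constrained) / len(constrained)
    return {house: effect - shift for house, effect in raw.items()}


# Made-up example data: four houses, one using a superseded methodology.
polls = ["A"] * 6 + ["B"] * 5 + ["C"] * 2 + ["D (old)"] * 7
houses = constrained_houses(polls, superseded=frozenset({"D (old)"}))
effects = centre_effects({"A": 1.0, "B": -0.5, "C": 2.0, "D (old)": 0.3}, houses)
```

Here only houses A and B meet both conditions, so their centred effects sum to zero, while C and the superseded methodology are shifted by the same amount but left outside the constraint.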









In terms of the Primary Vote, Labor has picked up a tick in the most recent polls.





Conversely, the Greens have declined a touch.





The Coalition has improved a touch.





And the Other parties have declined a touch.





The betting market has not moved in the past month.





Update: The polling charts were updated to correct some glitches in the earlier calculations. The code I use to produce these charts is publicly available on my GitHub page. The data for these charts is sourced from the Wikipedia page on Opinion Polling for the 2025 Australian Federal Election.