I recently spent some time cleaning up the Python code I use to produce the Bayesian Aggregation charts. When I originally wrote the code some two-plus years ago, it was all in the global namespace. For short analytical notebooks, that's generally not a problem. But for larger notebooks (such as this one) it can hide name clashes, which can result in subtle errors.
About six months ago, one of my models - the Gaussian Process model - just randomly stopped working. The error message was obscure (something about the linker not working). It was not immediately obvious what the problem was. And I did not want to spend the time diagnosing the actual problem.
In the past week, I decided to bite the bullet and encapsulate the Python code into functions, so that the code was mostly not in the global namespace. I also decided I would make the Python code lint compliant (using black, ruff and pylint) and type safe (using mypy). I have now completed that process (well, almost: I still have one function with too many local variables).
The unexpected benefit: the Gaussian Process model works again (even though I did not fiddle with that bit of the code). The lesson learnt (again): doing too much work in the global namespace can result in subtle and hard-to-detect errors. The other lesson learnt: keep the code clean, and regularly check it with static analysis tools such as mypy and pylint.
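To illustrate the kind of name clash involved, here is a minimal sketch. It is not code from the notebook: the names `scale`, `votes`, `to_percent` and `prior_width` are hypothetical, and the "cells" are just comments standing in for notebook cells.

```python
# Hypothetical illustration; none of these names come from the actual notebook.

# --- Global-namespace style: every name is shared by every cell ---
scale = 100.0                 # "cell 1": factor to convert proportions to percentages
votes = [0.4882, 0.5116]

scale = 0.5                   # "cell 40": an unrelated prior width reuses the name

# Re-running an earlier cell now silently picks up the wrong `scale`.
clobbered = [v * scale for v in votes]


# --- Encapsulated style: each function owns its local names ---
def to_percent(proportions: list[float]) -> list[float]:
    """Convert proportions to percentages using a local scale factor."""
    scale = 100.0             # local: invisible to the rest of the notebook
    return [p * scale for p in proportions]


def prior_width() -> float:
    """Return a prior width; its `scale` cannot clash with to_percent's."""
    scale = 0.5               # local: a different variable despite the same name
    return scale


safe = to_percent(votes)
```

In the first half, no error is raised; the numbers are simply wrong, which is what makes this class of bug hard to spot. In the second half, each `scale` lives only inside its own function.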
If you want to see the rewritten notebook, it is here. The main supporting functions (including the Bayesian models) are here.
Anyways, the most recent endpoints for the three models I run are as follows (the values in the table are percentages).
| Model | 2pp vote ALP | 2pp vote L/NP | Primary vote ALP | Primary vote GRN | Primary vote L/NP | Primary vote Other |
|---|---|---|---|---|---|---|
| Gaussian Random Walk - Normal likelihood - fixed priors | 48.82 | 51.16 | 30.67 | 12.40 | 39.39 | 17.53 | 
| Gaussian Process - Normal likelihood - fixed priors | 49.18 | 50.82 | 31.02 | 12.46 | 39.10 | 17.41 | 
| 150-day local regression | 48.42 | 51.58 | 30.18 | 12.65 | 39.81 | 16.52 | 
My preferred model remains the Gaussian Random Walk. As I have noted before, the Gaussian Process model does not perform well when the polling information is relatively rare (as it was in 2022 and into 2023). The Gaussian Process model also tends to revert back to the mean at the ends of the series. The local regression model can be overly influenced by the last few data points on the right-hand side.
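The end-point sensitivity of local regression can be shown with a small sketch. This is not the notebook's implementation: it assumes a tricube-weighted local linear fit, a 150-day span, and synthetic, perfectly flat polling data, and the function name `local_linear_at` is hypothetical.

```python
import numpy as np


def local_linear_at(x0: float, x: np.ndarray, y: np.ndarray, span: float) -> float:
    """Tricube-weighted local linear fit, evaluated at x0.

    Only points closer than `span` days to x0 receive non-zero weight.
    """
    dist = np.abs(x - x0) / span
    weights = np.where(dist < 1.0, (1.0 - dist**3) ** 3, 0.0)
    # Weighted least squares for y ~ a + b * (x - x0); the intercept a is the fit at x0.
    design = np.column_stack([np.ones_like(x), x - x0])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return float(coef[0])


days = np.arange(0.0, 300.0, 10.0)   # synthetic poll dates: one poll every 10 days
polls = np.full(days.shape, 50.0)    # synthetic, perfectly flat polling at 50 per cent

shocked = polls.copy()
shocked[-1] += 2.0                   # the most recent poll comes in 2 points higher

end, mid = days[-1], days[15]        # right-hand end of the series vs. an interior day

# How far does one outlying final poll move the fitted value at each location?
end_shift = (local_linear_at(end, days, shocked, 150.0)
             - local_linear_at(end, days, polls, 150.0))
mid_shift = (local_linear_at(mid, days, shocked, 150.0)
             - local_linear_at(mid, days, polls, 150.0))

print(f"shift at the end of the series: {end_shift:.3f}")
print(f"shift in the middle of the series: {mid_shift:.3f}")
```

At the right-hand end the window is one-sided and the final poll carries the highest weight, so a single outlying poll moves the end-point estimate far more than it moves an estimate in the middle of the series.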
The latest charts follow.