Bayesian Aggregation

This page provides some background on the Bayesian poll aggregation model I use to track trends ahead of the 2028 Australian Federal election.


The Gaussian Random Walk Model

This is a linear model with four important assumptions. The model assumes:

  • poll results are noisy observations of an unobserved underlying whole-of-population voting intention
  • this voting intention on any particular day only changes by a small random amount from the day before (this is the 'random walk' assumption)
  • the methodologies deployed by each pollster have their own stable, built-in house effect when compared with the other pollsters, and
  • the median of all house effects is zero

Each of these assumptions needs some explanation.

For the first assumption, the model uses a Student-t likelihood for the distribution of polls around the underlying voting intention, which makes it robust to outliers. The heavy tails of the Student-t distribution automatically down-weight anomalous poll results, so the model can estimate a tighter observation noise for the bulk of polls while accommodating occasional outliers through the degrees-of-freedom parameter (the Greek letter nu, ν). When ν is small (<10), the model is actively down-weighting outlier polls. When ν is moderate (10-30), the data is reasonably clean but the model provides insurance against occasional anomalies. When ν is large (>30), the distribution of poll results is essentially Normal.
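One way to see the down-weighting is the weight a Student-t likelihood implicitly gives each observation when it is written as a scale mixture of Normals: w = (ν + 1) / (ν + z²), where z is the poll's residual divided by the observation scale. The sketch below is illustrative only. The function name is hypothetical, and the aggregation model does not compute these weights explicitly.

```python
def student_t_weight(z: float, nu: float) -> float:
    """Implicit weight of an observation under a Student-t likelihood.

    z  : residual divided by the observation scale
    nu : degrees of freedom
    """
    return (nu + 1.0) / (nu + z * z)


# Weight given to a poll sitting three scale units away from the trend
for nu in (4, 20, 100):
    print(f"nu={nu:>3}: weight = {student_t_weight(3.0, nu):.2f}")
```

With ν = 4 such a poll gets a weight of about 0.38, while with ν = 100 the weight is about 0.93, which is close to how a Normal likelihood would treat it.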

The model estimates polling noise from the data rather than using reported sample sizes. The estimated observation noise implies effective sample sizes of around 1,000-1,500 people - smaller than reported samples but consistent with design effects from quota sampling and weighting adjustments. 
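The link between observation noise and effective sample size can be sketched with the simple-random-sampling formula: a share p measured on n respondents has a standard deviation of sqrt(p(1 - p) / n), so n = p(1 - p) / σ². The snippet below assumes a share near 50 per cent and treats the observation scale as a standard deviation, which is only approximately true under a Student-t likelihood. The σ values are illustrative, not estimates from the model.

```python
def effective_sample_size(sigma_pp: float, share_pct: float = 50.0) -> float:
    """Sample size implied by observation noise under simple random sampling.

    sigma_pp  : observation noise in percentage points
    share_pct : vote share in per cent (assumed to be near 50)
    """
    p = share_pct / 100.0
    sigma = sigma_pp / 100.0
    return p * (1.0 - p) / sigma ** 2


for sigma_pp in (1.3, 1.45, 1.58):  # illustrative noise levels
    n = effective_sample_size(sigma_pp)
    print(f"sigma = {sigma_pp:.2f} pp -> n_eff of about {n:,.0f}")
```

On these assumptions, observation noise of roughly 1.3 to 1.6 percentage points corresponds to effective samples of roughly 1,500 down to 1,000 people.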

The second assumption - the daily transition - is hard-coded: each day's voting intention is drawn from a Normal distribution centred on the previous day's value, with a standard deviation of 0.1 percentage points. This is a modelling choice. It can be contested, but it appears reasonable given the polling movements to date. If there were a sharp sudden movement in polling - for example, associated with a change of political leader - then we might need to introduce a discontinuity into the time-series.
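To get a feel for what a 0.1 percentage point daily step allows: because the daily steps are independent, the standard deviation of the total change over n days is 0.1 × sqrt(n). The sketch below checks that figure against a small simulation. The function names and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

SIGMA_WALK = 0.1  # percentage points per day, as fixed in the model


def walk_sd(days: int, sigma: float = SIGMA_WALK) -> float:
    """Analytic standard deviation of the cumulative change over `days` days."""
    return sigma * math.sqrt(days)


def simulate_change(days: int, rng: random.Random, sigma: float = SIGMA_WALK) -> float:
    """Total change after `days` independent Normal daily steps."""
    return sum(rng.gauss(0.0, sigma) for _ in range(days))


rng = random.Random(0)
changes = [simulate_change(30, rng) for _ in range(5000)]
print(f"analytic sd over 30 days:  {walk_sd(30):.2f}")
print(f"simulated sd over 30 days: {statistics.stdev(changes):.2f}")
```

The analytic value is about 0.55 percentage points over 30 days and exactly 1.0 over 100 days, so the random walk on its own rarely produces a move of more than about two points in three months. This is why a sudden shift may need an explicit discontinuity.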

Embedded in the third assumption is that pollsters do not change their methodology. In practice, and given enough time, this is very unlikely to hold. Where it becomes clear that a pollster has changed its methodology, either through a public statement or because the polling dynamics have visibly changed, a separate polling series for that pollster is started from the date of the methodological change.

Because the model is very sensitive to this third assumption, we run a number of tests on the polling residuals for each pollster with more than five polls. These tests help identify where there may be undisclosed changes in polling methodology:

  • Residuals outside ±3σ: individual polls deviating more than expected
  • Heteroskedasticity test: poll-to-poll variance changing over time
  • Recent outliers count: clustering of unusual results in latest polls
  • T-test for mean shift: systematic shift between earlier and later polls
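The four checks above could be sketched roughly as follows. This is not the code used for this site: the function names, the split into an earlier and a later half, the 2σ threshold for "recent" outliers and the made-up residuals are all assumptions for illustration.

```python
import math
import statistics


def outlier_flags(residuals, sigma, k=3.0):
    """True for each residual further than k * sigma from zero."""
    return [abs(r) > k * sigma for r in residuals]


def recent_outlier_count(residuals, sigma, window=5, k=2.0):
    """Number of unusual residuals among the latest `window` polls."""
    return sum(abs(r) > k * sigma for r in residuals[-window:])


def variance_ratio(residuals):
    """Later-half variance over earlier-half variance (crude heteroskedasticity check)."""
    half = len(residuals) // 2
    return statistics.variance(residuals[half:]) / statistics.variance(residuals[:half])


def welch_t(residuals):
    """Welch t-statistic for a mean shift between earlier and later polls."""
    half = len(residuals) // 2
    early, late = residuals[:half], residuals[half:]
    se = math.sqrt(statistics.variance(early) / len(early)
                   + statistics.variance(late) / len(late))
    return (statistics.mean(late) - statistics.mean(early)) / se


# Made-up residuals (poll minus trend minus house effect), oldest first
residuals = [0.2, -0.4, 0.1, -0.3, 0.5, 1.6, 1.9, 1.4, 2.1, 1.7]
sigma = 1.0  # assumed observation scale

print("any 3-sigma outliers:", any(outlier_flags(residuals, sigma)))
print("recent outlier count:", recent_outlier_count(residuals, sigma))
print("variance ratio:      ", round(variance_ratio(residuals), 2))
print("Welch t-statistic:   ", round(welch_t(residuals), 2))
```

In this made-up series no single poll breaches ±3σ, yet the t-statistic for a mean shift is large. That is the pattern an undisclosed methodology change would tend to produce, and it is why the tests are run together.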

The final assumption - that the median house effect is zero - is somewhat ad hoc. It anchors the aggregate voting intention to the middle of the pack: it assumes that half of pollsters lean toward Labor, half toward the Coalition, and the midpoint is treated as "truth." I chose the median over the mean because the mean can be pulled around by one or two extreme pollsters; the median is naturally robust to outliers.

Historically, Australian polls have leaned toward Labor more often than not. The Coalition has typically outperformed the polling average - most infamously in 2019, when polls pointed to a Labor win and the Coalition was returned. The 2025 election was a dramatic reversal: pollsters significantly underestimated the scale of Labor's victory, with most predicting a narrow win or even a minority government rather than the Labor landslide that eventuated. The lesson: this model tells you where the polling industry's collective estimate sits, but that estimate can be wrong in either direction.

Nonetheless, you can use the house effect plots to make your own adjustment to the time-series up or down. If you have reason to believe a particular pollster is likely to be more accurate than the others, you can mentally adjust the time-series by the amount of that pollster's median house effect. For example, if you believe that Newspoll is the gold standard in Australian polling, and the median house effect for Newspoll is (say) plus one percentage point for a particular series, you can add one percentage point to the voting intention time-series line to get an aggregation anchored to Newspoll instead of to the median pollster.


Model Specification

Now that we have explained the model in prose, let's do the same thing in the more formal language of mathematics. Your browser needs to support MathJax to see these formulas. Most desktop browsers do. Browsers on the iPhone and iPad may not.

Voting Intention

The true underlying voting intention evolves as a Gaussian random walk:

\[ \mu_0 \sim \mathcal{N}\left(\bar{y}_{1:10}, \, 5^2\right) \]

\[ \mu_t = \mu_{t-1} + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}\left(0, \sigma_{\text{walk}}^2\right) \]

where \(\bar{y}_{1:10}\) is the mean of the first 10 polls (used to initialise the random walk), and \(\sigma_{\text{walk}} = 0.1\) percentage points per day is fixed.

House Effects

House effects are modelled with a median-to-zero constraint, which is robust to outlier pollsters:

\[ h_p^{\text{raw}} \sim \mathcal{N}(0, \sigma_h^2) \]

\[ h_p = h_p^{\text{raw}} - \text{median}(\mathbf{h}^{\text{raw}}) \]

where \(\sigma_h = 5\). This ensures that the median house effect is exactly zero, while allowing individual pollsters to deviate. Unlike a sum-to-zero (mean-to-zero) constraint, the median is not pulled around as much by one or two extreme pollsters.
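The centring step is easy to illustrate outside the model. The sketch below uses made-up raw house effects for five hypothetical pollsters, one of them extreme, and compares the shift implied by the median with the shift a mean-to-zero constraint would apply.

```python
import statistics


def centre_on_median(raw):
    """Subtract the median raw house effect from every pollster's effect."""
    m = statistics.median(raw.values())
    return {pollster: h - m for pollster, h in raw.items()}


# Made-up raw house effects in percentage points; Pollster E is an outlier
raw = {
    "Pollster A": -0.8,
    "Pollster B": -0.2,
    "Pollster C": 0.3,
    "Pollster D": 0.6,
    "Pollster E": 4.0,
}

centred = centre_on_median(raw)
print("median shift:", statistics.median(raw.values()))
print("mean shift:  ", round(statistics.mean(raw.values()), 2))
for pollster, h in centred.items():
    print(f"{pollster}: {h:+.1f}")
```

Here the median shift is 0.3 while the mean shift is 0.78, because the mean is dragged toward Pollster E. With an even number of pollsters the median is the average of the two middle values, so no single pollster need sit at exactly zero.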

Observation Model (Student-t Likelihood)

\[ y_i \sim \text{Student-t}\left(\nu, \; \mu_{t_i} + h_{p_i}, \; \sigma_{\text{obs}}\right) \]

where:

  • \(y_i\) is the \(i\)-th poll result (zero-centred)
  • \(t_i\) is the day on which poll \(i\) was conducted
  • \(p_i\) is the pollster index for poll \(i\)
  • \(\mu_{t_i}\) is the latent voting intention on day \(t_i\)
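  • \(h_{p_i}\) is the house effect for pollster \(p_i\)
  • \(\nu\) is the degrees-of-freedom parameter of the Student-t distribution
  • \(\sigma_{\text{obs}}\) is the scale of the observation noise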
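Read forwards, the specification describes how polls are generated, and a small simulation can make that concrete. This is a sketch, not the model's source code: it does no inference, and the pollster names, house effects, ν, observation scale and seed are made-up inputs. A Student-t draw is built as a Normal draw divided by the square root of a scaled chi-square draw.

```python
import math
import random


def student_t_draw(rng, nu, loc, scale):
    """One draw from Student-t(nu, loc, scale) as loc + scale * Z / sqrt(chi2 / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return loc + scale * z / math.sqrt(chi2 / nu)


def simulate_polls(n_days, house_effects, nu, sigma_obs,
                   sigma_walk=0.1, mu0=0.0, polls_per_pollster=10, seed=1):
    """Simulate a latent random walk and noisy polls around it."""
    rng = random.Random(seed)
    mu = [mu0]
    for _ in range(1, n_days):
        mu.append(mu[-1] + rng.gauss(0.0, sigma_walk))  # daily random-walk step
    polls = []
    for pollster, h in house_effects.items():
        for _ in range(polls_per_pollster):
            t = rng.randrange(n_days)  # day on which the poll was conducted
            y = student_t_draw(rng, nu, mu[t] + h, sigma_obs)
            polls.append((t, pollster, y))
    return mu, sorted(polls)


# Hypothetical pollsters whose house effects have a median of zero
mu, polls = simulate_polls(
    n_days=100,
    house_effects={"Pollster A": 1.0, "Pollster B": 0.0, "Pollster C": -1.0},
    nu=5,
    sigma_obs=1.5,
)
for t, pollster, y in polls[:5]:
    print(f"day {t:>2}  {pollster}  poll={y:+.2f}  latent={mu[t]:+.2f}")
```

Fitting the model amounts to running this process in reverse: given only the polls, recover the latent series, the house effects, ν and the observation scale.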

Priors

\[ \sigma_{\text{obs}} \sim \text{HalfNormal}(5) \]

\[ \nu - 1 \sim \text{Gamma}(2, \, 0.1) \]
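The prior on ν is a Gamma(2, 0.1) distribution shifted up by one, so ν is always greater than 1. To see how much prior weight falls in each of the three ν regimes described earlier, the sketch below uses the closed-form CDF of a Gamma distribution with shape 2. It assumes the second parameter is a rate, not a scale, which gives a prior mean of 1 + 2/0.1 = 21. The specification above does not state the parameterisation, so treat that as an assumption.

```python
import math


def prior_prob_nu_below(x: float, rate: float = 0.1) -> float:
    """P(nu < x) when nu - 1 ~ Gamma(shape=2, rate=rate).

    Uses the shape-2 Gamma CDF: 1 - exp(-rate * u) * (1 + rate * u).
    """
    if x <= 1.0:
        return 0.0
    z = rate * (x - 1.0)
    return 1.0 - math.exp(-z) * (1.0 + z)


p_small = prior_prob_nu_below(10)
p_moderate = prior_prob_nu_below(30) - p_small
p_large = 1.0 - prior_prob_nu_below(30)
print(f"P(nu < 10)       = {p_small:.2f}")
print(f"P(10 <= nu < 30) = {p_moderate:.2f}")
print(f"P(nu >= 30)      = {p_large:.2f}")
```

Under that assumption the prior puts roughly 23 per cent of its mass on ν below 10, 56 per cent between 10 and 30, and 21 per cent above 30, so it does not strongly favour either heavy tails or near-Normal behaviour.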


Source code

The code for this model can be found on my GitHub site.