Friday, January 30, 2015

January 2015 Update

Updated

  • 5 February 2015 - added maxima, minima and endpoint statistics to the charts

Saturday, January 10, 2015

Polling accuracy

Leigh and Wolfers observed in 2006 that "the 'margin of error' reported by pollsters substantially over-states the precision of poll-based forecasts. Furthermore, the time-series volatility of the polls (relative to the betting markets) suggest that poll movements are often noise rather than signal" (p326). They went on to suggest, "for forecasting purposes the pollsters' published margins of error should at least be doubled" (p334).

Leigh and Wolfers are not alone. Walsh, Dolfin and DiNardo wrote in 2009, "Our chief argument is that pre-election presidential polling is an activity more akin to forecasting next year's GDP or the winner of a sporting match than to scientific probability sampling" (p316).

In this post I will examine these claims a little further. We start with the theory of scientific probability sampling.

Polling theory

Opinion polls tell us how the nation might have voted if an election had been held at the time of the poll. To achieve this magic, opinion polls depend on the central limit theorem. According to this theorem, the arithmetic means of a sufficiently large number of random samples drawn from the entire population will be normally distributed around the population mean (regardless of the distribution within the population).

We can use computer-generated pseudo-random numbers to simulate the taking of many samples from a population, and we can plot the distribution of arithmetic means for those samples. A Python code snippet to this effect follows.

# --- initial
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# --- parameters
sample_size = 400      # we will work with multiples of this
num_samples = 200000   # number of samples to be drawn in each simulation
threshold = 0.5        # proportion of the population that "vote" in a particular way

# --- model
fig = plt.figure(figsize=(8,4))
ax = fig.add_subplot(111)

for i in [1,2,3,4,6,8]:

    # get sample size for this simulation
    s = i * sample_size

    # draw num_samples, each of size s
    m = np.random.random((s, num_samples))

    # for each sample, get the proportion of cases (as a percentage) below the
    # threshold - this is the simulated vote share for that sample
    m = np.where(m < threshold, 1.0, 0.0).sum(axis=0) / (s * 1.0) * 100.0

    # perform the kernel density estimation
    kde = sm.nonparametric.KDEUnivariate(m)
    kde.fit()

    # plot
    ax.plot(kde.support, kde.density, lw=1.5,
        label='Sample size: {0:}   SD: {1:.2f} TSD: {2:.2f}'.format(s, np.std(m),
            100.0 * np.sqrt(threshold * (1 - threshold) / s)))

ax.legend(loc='best', fontsize=10)
ax.grid(True)
ax.set_ylabel(r'Density', fontsize=12)
ax.set_xlabel(r'Mean Vote Percentage for Party', fontsize=12)
fig.suptitle('The Central Limit Theorem: Probability Densities for Different Sample Sizes')
fig.tight_layout(pad=1)
fig.savefig('./graphs/model0', dpi=125)

In each simulation we draw 200,000 samples from our imaginary population. In the first simulation, each sample was 400 cases in size. In the subsequent simulations the sample sizes were 800, 1200, 1600, 2400 and finally 3200 cases. For each simulation, we assume that half the individuals in the population vote for one party and half vote for the other party. We can plot these simulations as a set of probability densities for each sample size, where the area under each curve is one unit in size. I have also reported the standard deviation (in percentage points) from the simulation (SD) against the theoretical standard deviation you would expect (TSD) for a particular sample size and vote share.
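The TSD figure reported in the chart legend is simply the theoretical standard deviation of a sample proportion, expressed in percentage points: 100 times the square root of p(1-p)/n. As a rough check on the legend, the following sketch recomputes it for the sample sizes used above (it reuses the 50-50 vote share from the simulation; nothing new is assumed).

# --- theoretical standard deviations for the sample sizes simulated above
import numpy as np

threshold = 0.5    # assumed population vote share, as above
for s in [400, 800, 1200, 1600, 2400, 3200]:
    tsd = 100.0 * np.sqrt(threshold * (1.0 - threshold) / s)
    print('Sample size: {0:>4}   TSD: {1:.2f} percentage points'.format(s, tsd))

With 200,000 samples per simulation, the simulated SDs in the legend should sit very close to these theoretical values.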


As the sample gets larger, the width of the bell curve narrows. The mean of a larger, randomly selected sample is more likely to be close to the population mean than the mean of a smaller sample. And so, with a sample of 1200, we can assume that there is a 95% probability that the mean of the population is within plus or minus 1.96 standard deviations (ie. plus or minus 2.8 percentage points) of the mean of our sample. This is the oft-cited "margin of error", which derives from sampling error (the error that occurs from observing a sample, rather than observing the entire population).
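To make that arithmetic explicit, the 95% margin of error is the two-sided 95% critical value from the normal distribution (about 1.96) multiplied by the theoretical standard deviation. The short sketch below reproduces the plus or minus 2.8 percentage point figure for a sample of 1200; it uses scipy (which the snippet above does not) purely to show where the 1.96 comes from.

# --- 95% margin of error for a sample of 1200 and a 50-50 vote share
import numpy as np
from scipy import stats

p, n = 0.5, 1200                               # vote share and sample size from the text
z = stats.norm.ppf(0.975)                      # two-sided 95% critical value (about 1.96)
moe = z * 100.0 * np.sqrt(p * (1.0 - p) / n)   # margin of error in percentage points
print('95% margin of error: +/- {0:.2f} percentage points'.format(moe))   # about +/- 2.83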

So far so good. 

Polling practice

But sampling error is not the only problem with which opinion polls must contend. The impression of precision from a margin of error is (at least in part) misleading, as it "does not include an estimate of the effect of the many sources of non-sampling error" (Miller 2002, p225).

Terhanian (2008) notes that telephone polling "sometimes require telephone calls to more than 25,000 different numbers to complete 1,000 15-minute interviews over a five-day period (at least in the US)". Face-to-face polling typically excludes those living in high-rise apartments and gated communities, as well as those people who are intensely private. Terhanian argues that inadequate coverage and low response rates are the most likely culprits when polls produce wildly inaccurate results. The reason for this inaccuracy is that the sampling frame or approach adopted does not randomly select people from the entire population; segments of the population are excluded.

Other issues that affect poll outcomes include question design and the order in which questions are asked (McDermott and Frankovic 2003), both of which can shift poll results markedly, as well as house effects: the tendency for a pollster's methodology to produce results that lean to one side of politics or the other (Jackman 2005; Curtice and Sparrow 1997).

Manski (1990) observed that while some people hold firm opinions, others do not. For some people, their voting preference is soft: what they say they would do and what they actually do differ. Manski's (2000) solution to this problem was to encourage pollsters to ask people about the firmness of their voting intention. In related research, Hoek and Gendall (1997) found that strategies to reduce the proportion of undecided responses in a poll may actually reduce poll accuracy.

A final point worth noting is that opinion polls report an historical fact: how people said they would have voted on the date they were polled. Typically, the major opinion polls do not seek to forecast how people will vote at the next election (Walsh, Dolfin and DiNardo 2009, p317). Notwithstanding this limitation, opinion polls are often reported in the media as if they predicted how people will vote at the next election (based on what they said last weekend when they were polled). In this context, I should note another Wolfers and Leigh (2002) finding:
Not surprisingly, the election-eve polls appear to be the most accurate, although polls taken one month prior to the election also have substantial predictive power. Polls taken more than a month before the election fare substantially worse, suggesting that the act of calling the election leads voters to clarify their voting intentions. Those taken three months prior to the election do not perform much better than those taken a year prior. By contrast, polls taken two years before the election, or immediately following the preceding election, have a very poor record. Indeed, we cannot reject a null hypothesis that they have no explanatory power at all... These results suggest that there is little reason to conduct polls in the year following an election.

Conclusion

The central limit theorem allows us to take a relatively small but randomly selected sample and make statements about the whole population. These statements have a mathematically quantified reliability, which is known as the margin of error.

Nonetheless, the margins of error that are often reported with opinion polls overstate the accuracy of those polls. The reported margin of error refers to only one of the many sources of error that affect poll accuracy. While the other sources of error are rarely as clearly identified and quantified as sampling error, their impact on poll accuracy is no less real.

There are further complications when you want to take opinion polls and predict voter behaviour at the next election. Only polls taken immediately prior to an election are truly effective for this purpose.

All-in-all, it is not hard to see why Leigh and Wolfers (2006) said, "for forecasting purposes the pollsters' published margins of error should at least be doubled" (p334).

Bibliography

John Curtice and Nick Sparrow (1997), "How accurate are traditional quota opinion polls?", Journal of the Market Research Society, Jul 1997, 39:3, pp433-448.

Janet Hoek and Philip Gendall (1997), "Factors Affecting Political Poll Accuracy: An Analysis of Undecided Respondents", Marketing Bulletin, 1997, 8, pp1-14.

Simon Jackman (2005), "Pooling the polls over an election campaign", Australian Journal of Political Science, 40:4, pp499-51.

Andrew Leigh and Justin Wolfers (2006), "Competing Approaches to Forecasting Elections: Economic Models, Opinion Polling and Prediction Markets", Economic Record, September 2006, Vol. 82, No. 258, pp325-340.

Monika L McDermott and Kathleen A Frankovic (2003), "Horserace Polling and Survey Method Effects: An Analysis of the 2000 Campaign", The Public Opinion Quarterly, Vol. 67, No. 2 (Summer, 2003), pp244-26.

Charles F Manski (1990), "The Use of Intentions Data to Predict Behavior: A Best-Case Analysis", Journal of the American Statistical Association, Vol 85, No 412, pp934-40.

Charles F Manski (2000), "Why Polls are Fickle", Op-Ed article, The New York Times, 16 October 2000.

Peter V Miller (2002), "The Authority and Limitations of Polls", in Jeff Manza, Fay Lomax Cook and Benjamin J Page (eds) (2002), Navigating Public Opinion: Polls, Policy and the Future of American Democracy, Oxford University Press, New York.

George Terhanian (2008), "Changing Times, Changing Modes: The Future of Public Opinion Polling?", Journal of Elections, Public Opinion and Parties, Vol. 18, No. 4, pp331–342, November 2008.

Elias Walsh, Sarah Dolfin and John DiNardo (2009), "Lies, Damn Lies and Pre-Election Polling", American Economic Review: Papers & Proceedings 2009, 99:2, pp316–322.

Justin Wolfers and Andrew Leigh (2002), "Three Tools for Forecasting Federal Elections: Lessons from 2001", Australian Journal of Political Science, Vol. 37, No. 2, pp223–240.