BY SPENCER KIMBALL
During the 2016 presidential election, as in most elections of the past century, voters and pundits looked to media presentations of polling results for guidance on the probable outcome of the race for president. Unfortunately for the credibility of the media, headlines like “Poll: Clinton leads Trump by 7 points in Michigan” (Lim, 2016) presented polling data as precise figures and, in most cases, failed to provide the context viewers needed to understand the variability inherent in the poll numbers being presented. One major criterion for a poll’s validity and reliability that the daily media often ignore is sampling error, also known as the margin of error, which applies to every poll (Converse & Traugott, 1986).
This “margin of sampling error” arises because a poll relies on a sample of the population rather than including everyone, as a census would. Sampling automatically creates statistical variability in the results, so a poll should always be presented and read as a range of scores, not as an exact figure. The way results are commonly presented to the public can lead to misunderstanding of their meaning. If the difference between two candidates is within the poll’s margin of error (which depends on sample size), there is no statistically significant difference between them. If that difference is larger than (or outside of) the poll’s margin of error, then the poll can conclude with 95% confidence that the leading candidate will receive more votes than the other, barring something unexpected between the poll and the actual election (NCPP, 2004; AAPOR, n.d.; Mercer, 2016).
Other factors could allow for additional precision in presenting polling results. The size of the sample responding to the survey is the most important factor for political polling, because with two candidates it is very likely that they will poll close to “50-50,” and it is rare that one has to consider the possibility of a landslide. As with all surveys, the selection of respondents should be random, the questions should not push one candidate over another, and the time between the poll and the election (and events during that time) creates the possibility of all kinds of shifts in the actual vote. Still, the largest statistical problem in evaluating the results of a poll is the margin of error. There is really no excuse to ignore it, since doing so almost certainly leads to a misunderstanding of the poll’s results.
A margin of error can be calculated from a variety of formulas, some of which are quite complicated because they account for such matters as the size of each candidate’s share, but as a general rule, increasing the sample size decreases the poll’s margin of error (MOE). And this is important. For example, a sample of n=1,000 has an MOE of +/- 3.1 percentage points, but if the sample size is only n=350, the MOE becomes +/- 5.2 percentage points. Listed below are margin of error calculations for a number of common sample sizes (Raosoft, Inc., 2004):
Sample Size     Margin of Error
n=2,200         +/- 2.1 percentage points
n=1,000         +/- 3.1 percentage points
n=500           +/- 4.4 percentage points
n=350           +/- 5.2 percentage points
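These figures follow from the standard textbook formula, MOE = z × √(p(1−p)/n), at 95% confidence (z ≈ 1.96). A minimal Python sketch (the function name is illustrative; p = 0.5 is the conservative assumption for a near 50-50 race):

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error in percentage points for a simple random sample of
    size n at 95% confidence (z ~ 1.96), using the conservative p = 0.5."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (2200, 1000, 500, 350):
    print(f"n={n}: +/- {margin_of_error(n):.1f} percentage points")
```

Running this reproduces the values above: larger samples shrink the margin of error, but only in proportion to the square root of n, which is why doubling a sample does not halve the MOE.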
A simple way for any member of the press or the public to determine whether the result of a poll is within or outside the margin of error is based on this formula: take the absolute difference between the two candidates’ poll results, expressed as percentages of the sample. If that difference is greater than the margin of error (MOE) doubled, the poll is outside the generally accepted margin of error. If it is less than or equal to the MOE doubled, it is within the poll’s margin of error (Kimball, 2017; Mitofsky, 1998). Of course, the use of this formula assumes that the sample has been randomly selected.
Using a 2016 Suffolk poll as an example: the 44% recorded for Clinton minus the 37% for Trump gives a net difference of 7 percentage points. The margin of error for a sample of this size, +/- 4.4 percentage points, is multiplied by two, which equals 8.8. Since 7 is less than 8.8, the result is within the poll’s margin of error, and thus interpreters of the poll could not conclude (with 95% confidence) that Clinton would win the election if it were held immediately.
Another way of reaching (and reporting) this same conclusion is to take the 44% for Clinton and add and subtract 4.4 percentage points, giving Clinton a range of scores as high as 48.4% and as low as 39.6%. With this method, we would be confident (at the 95% level, or odds of 19 to 1) that Trump’s 37% could be as high as 41.4% and as low as 32.6%. Because Clinton’s low of 39.6% and Trump’s high of 41.4% overlap, the conclusion is that Clinton’s lead was within the poll’s margin of sampling error. (If, as in the earlier example, the sample size had been 1,000 and the poll’s margin of error 3 percentage points, then Clinton’s score could be as low as 41% and Trump’s could be as high as 40%. That would place the poll’s results outside the margin of error, and the poll could conclude with 95% confidence that Clinton was ahead … yet that is not what this poll can conclude.)
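The two checks are equivalent, and both fit in a few lines of Python. In this sketch, the sample size of n=500 is an assumption inferred from the +/- 4.4 point margin of error, not a figure stated in the poll write-up here:

```python
import math

def margin_of_error(n, z=1.96):
    """Margin of error in percentage points at 95% confidence, assuming p = 0.5."""
    return 100 * z * math.sqrt(0.25 / n)

# Suffolk 2016 poll: Clinton 44%, Trump 37% (n=500 assumed from the +/- 4.4 MOE).
clinton, trump, n = 44.0, 37.0, 500
moe = margin_of_error(n)           # about 4.4 percentage points
lead = abs(clinton - trump)        # 7 percentage points

# Method 1: compare the lead against double the margin of error.
within = lead <= 2 * moe           # 7 <= 8.8 -> within the margin of error

# Method 2: check whether the two candidates' ranges overlap.
overlap = (clinton - moe) <= (trump + moe)   # 39.6 <= 41.4 -> same conclusion

print(within, overlap)
```

Both methods agree because comparing the lead to double the MOE is just an algebraic rearrangement of the range-overlap test.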
If a pollster lowered the confidence level from the industry standard of 95%, that would also lower the poll’s margin of error. In the Suffolk example, if the pollsters lowered their confidence level to 85%, the margin of error would be 3.2 percentage points, and the poll results would now be outside the margin of error. In exchange for this lower margin of error, however, the statistical reliability of the poll drops: the chance that sampling error alone could produce the observed result rises from 1 out of 20 cases (95% confidence) to 3 out of 20 cases (85% confidence). Almost all pollsters prefer the higher reliability of 95%, as does the American Association for Public Opinion Research (AAPOR), which usually sets the industry standard.
Another issue in 2016 was news outlets cherry-picking polls based on seemingly subjective criteria. For example, a Florida Atlantic University poll in August 2016 had Trump leading Clinton 43% to 41% in Florida. A news organization dismissed the poll because a cross-tab within it showed Clinton leading among Hispanics 50% to 40%, and the reporter felt this margin was too small and should have been closer to a 60% to 30% split. The reporter’s criterion for not reporting the results was his personal feeling that “the numbers just look a little wrong” (Dawsey et al., 2016). A proper analysis would have accounted for the margin of error for the subset of Hispanics (n=206; +/- 7 percentage point margin of error) and presented the cross-tab as a range of scores. In this example, Clinton could be as high as 57% and Trump as low as 33%, in line with what the reporter thought the preference should be. In this case, did the reporter’s lack of polling literacy and of fundamental principles, such as the central limit theorem, limit the dissemination of accurate—valid and reliable—information?
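The subgroup calculation is the same arithmetic applied to the smaller cross-tab sample. A short sketch (the figures are those reported for the FAU Hispanic cross-tab):

```python
import math

def margin_of_error(n, z=1.96):
    """Margin of error in percentage points at 95% confidence, assuming p = 0.5."""
    return 100 * z * math.sqrt(0.25 / n)

# Hispanic cross-tab in the FAU poll: Clinton 50%, Trump 40%, n = 206.
moe = margin_of_error(206)   # about 6.8, commonly rounded to +/- 7 points
print(f"Clinton: {50 - moe:.1f}% to {50 + moe:.1f}%")
print(f"Trump:   {40 - moe:.1f}% to {40 + moe:.1f}%")
```

Because subgroup samples are small, their margins of error are large, which is exactly why a single cross-tab number should never be read as a precise figure.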
Regardless of the reasons why some polls are reported, or how they are reported, one take-home lesson from 2016 remains clear. Both media organizations and pollsters must do a better job of explaining the results and other uncertainties of a poll, calculating the respective margins of error, and spelling out the implications of these figures, in order to create a more informed electorate and to set better expectations come Election Day.
In conclusion, a simple way to think of polls is to build on the classic idiom: “Close only counts in horseshoes, hand grenades and polls.” And even in these activities, there are ranges and uncertainties.
AAPOR. (n.d.). Margin of Sampling Error/Credibility Interval. Retrieved May 17, 2017, from http://www.aapor.org/Education-Resources/Election-Polling-Resources/Margin-of-Sampling-Error-Credibility-Interval.aspx
Converse, P. E., & Traugott, M. W. (1986). Assessing the Accuracy of Polls and Surveys. Science, 234(4780), 1094-1098. doi:10.1126/science.234.4780.1094
Dawsey, J., Stern, K., Shafer, J., & Messina, J. (2016, August 24). Trump in Tampa as Clinton FL ad bashes his clothing line – DCCC’s hack attack fallout – Prosecutor Angela Corey under fire (again) – Fixing Orange County’s bear problem. Retrieved November 19, 2017, from https://www.politico.com/tipsheets/florida-playbook/2016/08/trump-in-tampa-as-clinton-fl-ad-bashes-his-clothing-line-dcccs-hack-attack-fallout-prosecutor-angela-corey-under-fire-again-fixing-orange-countys-bear-problem-216032
Kimball, S. (2017). 2016 Presidential Statewide Polling—A Substandard Performance: A Proposal and Application for Evaluating Preelection Poll Accuracy. American Behavioral Scientist, 000276421773562. doi:10.1177/0002764217735622
Lim, N. (2016, August 26). Poll: Clinton leads Trump by 7 points in Michigan. Retrieved November 19, 2017, from http://www.cnn.com/2016/08/25/politics/michigan-suffolk-university-poll-hillary-clinton-leads-donald-trump/index.html
Mercer, A. (2016, September 08). 5 key things to know about the margin of error in election polls. Retrieved January 16, 2017, from http://www.pewresearch.org/fact-tank/2016/09/08/understanding-the-margin-of-error-in-election-polls/
Mitofsky, W. J. (1998). Review: Was 1996 a Worse Year for Polls Than 1948? Public Opinion Quarterly,62(2), 230. doi:10.1086/297842
National Council on Public Polls (NCPP). (2004). The 2004 Election Polls. Retrieved May 24, 2017, from http://www.ncpp.org/drupal57/files/2004%20Election%20Polls%20Review.pdf
Raosoft, Inc. (2004). Sample size calculator. Retrieved from http://www.raosoft.com/samplesize.html