I am a professor of mathematics, so my ears perk up when I hear someone say that polls seem inaccurate.
The public understandably focuses on polling results and how much these results seem to vary. Take two presidential approval polls from March 21. Polling firm Rasmussen Reports reported that 50 percent of Americans approve of President Donald Trump’s performance, while, that same day, Gallup stated that only 37 percent do. In late February, the website FiveThirtyEight listed 18 other presidential approval polls in which Trump’s approval ratings ranged from 39 percent to 55 percent.
Some of these pollsters queried likely voters, some registered voters and others adults, regardless of their voting status. Almost half of the polls relied on phone calls, nearly as many relied on online polling, and a few used a mix of the two. Further complicating matters, it’s not entirely clear how calling cellphones or landlines affects a poll’s results.
Each of these choices has a consequence, and the range of results attests to the degree to which these choices can influence results.
Polling is what mathematicians might call a “black art,” a tongue-in-cheek way of saying it does not have the precision of pure mathematics. This perspective offers some insight into why polls appear divided, contradictory, or even flat-out wrong – such as those in the recent presidential election.
In my view, the popular sense that polls are inaccurate stems not from poor polling practices, but from assumptions that both pollsters and the public make. For polls to be more useful to consumers, we need to understand their limitations. The practice of polling and how results are communicated could be improved to build better trust with consumers.
Like many of you, I watched TV on the evening of Nov. 8 in increasing disbelief. I had closely followed FiveThirtyEight’s projections throughout the election season. The site used hundreds of state presidential preference polls to model the election’s outcome. Its poll-based projections have a stellar track record: Across the 2008 and 2012 presidential elections, FiveThirtyEight correctly forecast the victor in every state but one, as well as Washington, D.C.
While FiveThirtyEight’s final projections assigned a 71 to 72 percent probability to Hillary Clinton’s victory, it wasn’t as bullish on her chances as other poll-based models. The New York Times model gave Clinton an 85 percent chance of winning. The Princeton Election Consortium put Clinton’s probability of victory at greater than 99 percent.
Trump’s “surprise” victory led many to wonder how the polls and the models that use them got things so wrong.
At the national level, however, the polls did get it right. The final average of national polls at RealClearPolitics had Hillary Clinton ahead by 3.2 percentage points. Clinton won the popular vote by roughly 2.1 percentage points, well within the margin of error.
The presidential election is not decided by the national popular vote, but state by state through the Electoral College. If the polls did err, it was in a handful of electorally important states. The majority of the poll-based models listed on The New York Times site, including those of FiveThirtyEight and the Princeton Election Consortium, projected that Clinton would win the pivotal states of Florida, Michigan, North Carolina, Pennsylvania and Wisconsin. Most polls in these states put Clinton ahead as well. If Clinton lost two, or even three, of these states, she could still win.
When the results came in, many of us reacted with shock. Had we attended more closely to the implications of the margin of error, we would perhaps not have been so surprised.
The margin of error
Every poll has a margin of error. The margin of error means that the true number is not necessarily the reported result, but is likely to lie within a given range around it, typically at a 95 percent confidence level.
Pollsters include a margin of error because they are polling a tiny sample of the voting public. While pollsters do an excellent job of making sure their sample is representative of the voting public, it is rarely a perfect mirror, so there is inevitably error.
In other words, true support for a candidate could fall anywhere within a given range of the poll’s results.
For example, the Democratic polling firm Public Policy Polling sampled 957 likely Michigan voters over two days in November, placing Clinton in the lead over Trump, 46 percent to 41 percent. The poll listed a 3.2 percent margin of error.
So, rather than a simple total, the polls provide a range of possible outcomes. The margin of error implied Clinton’s support level was between 42.8 and 49.2 percent – that is, 46 percent plus or minus the margin of error. Trump’s, likewise, lay between 37.8 and 44.2 percent.
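As a rough check on where a figure like 3.2 percent comes from, the sketch below applies the textbook formula for a 95 percent margin of error on a proportion. It assumes a simple random sample and the worst case of 50 percent support. The function name margin_of_error and its defaults are my own illustrative assumptions, not Public Policy Polling's actual method, which may include weighting adjustments.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case
    and z=1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Figures quoted in the article for the Michigan poll
sample_size = 957
clinton, trump = 46.0, 41.0
reported_moe = 3.2  # percentage points, as listed by the poll

computed_moe = margin_of_error(sample_size) * 100
print(f"Computed margin of error: {computed_moe:.1f} points")

# Each candidate's range is the reported figure plus or minus the margin
clinton_range = (clinton - reported_moe, clinton + reported_moe)
trump_range = (trump - reported_moe, trump + reported_moe)
print(f"Clinton: {clinton_range[0]:.1f} to {clinton_range[1]:.1f} percent")
print(f"Trump:   {trump_range[0]:.1f} to {trump_range[1]:.1f} percent")
```

Under these assumptions the computed margin rounds to the same 3.2 points the poll listed, which suggests the published figure is close to what sample size alone would imply.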
It is entirely possible that both candidates’ true support lies in the overlap of their respective ranges, between 42.8 and 44.2 percent. It is here that scenarios exist where Trump is ahead in Michigan. Most November polls in Florida, Michigan, North Carolina, Pennsylvania and Wisconsin had Clinton ahead, but, in almost every case, the final results fell within the polls’ margins of error.
It is quite natural to see a headline saying that Clinton leads in a poll and conclude that she is indeed ahead. But a correct interpretation of that result can include the possibility that she may not be. Being a savvy reader of polls requires knowing about polling’s inherent limitations.
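To make the overlap concrete, here is a minimal sketch using only the Michigan figures quoted above. It also estimates the margin of error on the lead itself, using the standard formula for the difference between two shares from the same sample. That second step assumes a simple random sample and is my own illustration (the variable names are hypothetical), not something the poll reported.

```python
import math

# Figures quoted in the article for the Michigan poll
n = 957
clinton, trump = 0.46, 0.41
reported_moe = 0.032

# Overlap of the two candidates' ranges
overlap_low = max(clinton - reported_moe, trump - reported_moe)
overlap_high = min(clinton + reported_moe, trump + reported_moe)
print(f"Overlap: {overlap_low * 100:.1f} to {overlap_high * 100:.1f} percent")

# Margin of error on the lead (Clinton minus Trump), assuming both
# shares come from one simple random sample of size n
lead = clinton - trump
lead_se = math.sqrt((clinton + trump - lead ** 2) / n)
lead_moe = 1.96 * lead_se
print(f"Lead: {lead * 100:.1f} points, plus or minus {lead_moe * 100:.1f}")

# If the lower end of the lead's range is below zero, a Trump lead
# is consistent with the poll
trump_lead_possible = (lead - lead_moe) < 0
print("Trump lead consistent with poll:", trump_lead_possible)
```

Under these assumptions the uncertainty on the lead is noticeably larger than the 3.2 points attached to each candidate's number, which is why a five-point lead is less secure than it looks.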
Polling is limited because pollsters make assumptions, including assumptions about likely voters and demographics. Out of necessity, these assumptions are based on voting patterns from past elections.
Pollsters need to project with a great deal of precision the final voting percentage of each of the subpopulations that compose the electorate. Since polling occurs before an election, it is no easy task to predict, for example, how many white working-class men will vote. Likewise, it is extremely difficult to know the degree to which the prospect of electing a black president drew African-Americans to the polls in 2008 and 2012. Pollsters have to make assumptions about these kinds of things, and each assumption introduces potential error.
The different assumptions polls make about their samples help explain the broad range of results we saw in the Trump approval ratings.
They also may help explain why, during the election, Trump outperformed the polls in battleground states. His support was high among white working-class voters, who evidently came to the polls in greater numbers than expected. Clinton hoped that black voters would turn out at rates close to those of 2008 or 2012, which did not happen. Trump’s margin of victory in the pivotal states of Michigan, Pennsylvania and Wisconsin was roughly 77,000 votes out of 15 million cast.
Very slight changes in demographic assumptions could have accounted for these 77,000 votes and resulted in polls that put Trump ahead.
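A quick piece of arithmetic shows how small that margin is relative to the votes cast. The sketch uses only the rounded figures quoted above and compares the result with the 3.2-point margin of error from the Michigan poll discussed earlier.

```python
# Rounded figures quoted in the article
margin_votes = 77_000
total_votes = 15_000_000

share_percent = margin_votes / total_votes * 100
print(f"Combined margin: {share_percent:.2f} percent of votes cast")

# For comparison: the margin of error listed by the Michigan poll
poll_moe_percent = 3.2
print("Smaller than the poll's margin of error:", share_percent < poll_moe_percent)
```

The combined margin works out to about half of one percent, far below what a single poll of roughly 1,000 people can resolve.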
Hedging your bets
There are ways to hedge against error. Baseball teams like the Chicago Cubs and the Boston Red Sox mix sophisticated analytics with an “eye test”: that is, the input of old baseball hands who rely on observation and feel rather than pure numbers.
In much the same way, pollsters and modelers could try mixing in human elements. For example, to find out more about personality traits that might affect electability, the Cook Political Report incorporates personal interviews with candidates into its projections of House races. Pollsters could also try to gauge the enthusiasm of a presidential candidate’s supporters by measuring social media activity or public signs.
Another way to improve a poll’s accuracy would be to offer multiple demographic models. For example, Public Policy Polling could have used three different models for its Michigan poll, each based on different demographic assumptions. One might assume black turnout as being the same as the previous presidential election; the second could assume a slightly greater turnout; and the third a smaller one. While these kinds of results might resist easy reduction to a headline, they would provide a richer range of possibilities and perhaps fewer surprises.
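The sketch below shows what reporting three demographic models might look like in miniature. Every number in it is hypothetical and chosen only for illustration: the two groups, their support levels and their shares of the electorate are my assumptions, not figures from Public Policy Polling or any real poll, and the function name topline is likewise made up.

```python
def topline(support_by_group, electorate_share):
    """Overall support for a candidate, weighting each group's support
    by that group's assumed share of the electorate."""
    total_share = sum(electorate_share.values())
    weighted = sum(
        support_by_group[group] * electorate_share[group]
        for group in support_by_group
    )
    return weighted / total_share

# HYPOTHETICAL support for one candidate within each group
support = {"group_1": 0.90, "group_2": 0.40}

# HYPOTHETICAL turnout scenarios: each gives the groups' electorate shares
scenarios = {
    "same turnout as last election": {"group_1": 0.14, "group_2": 0.86},
    "slightly greater turnout": {"group_1": 0.16, "group_2": 0.84},
    "slightly smaller turnout": {"group_1": 0.12, "group_2": 0.88},
}

for name, shares in scenarios.items():
    print(f"{name}: {topline(support, shares) * 100:.1f} percent")
```

With these made-up inputs, shifting one group's share of the electorate by two points moves the headline number by a full point, the kind of sensitivity a single reported figure hides.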
Some poll-based models hedge against error by considering other factors, such as their own demographic analyses, incumbent approval ratings and economic indicators. Stating their results as a probability also serves to highlight the uncertainty involved. But they are still based on polls.
An apt analogy is another way to hedge. On the morning of the election, The New York Times observed that Clinton’s chances of winning were roughly the same as the chances that a professional kicker will make a 37-yard field goal.
But even the best kickers sometimes miss.