For pollsters, the British general election of 2017 goes down as another uncomfortable experience. While, on average, the final polls of those companies that belong to the British Polling Council were spot on in their estimate of Conservative support across Great Britain as a whole (44%), they underestimated Labour support by as much as five points.
As a result, the final polls on average put the Conservatives eight points ahead – implying that the party would secure at least a modest overall majority. In the event, it was ahead by just three points, a lead too small for it to retain its majority at all.
Still, at least it was not the mistake that pollsters usually make. At recent elections, including in 2015, the polls have typically underestimated Conservative support and overestimated Labour’s strength. This time around it was Labour support that was underestimated – and by a bigger margin than ever before.
Ghosts of 2015
Meanwhile, also in contrast to 2015, not every company made the same mistake. One BPC member, Survation, actually slightly underestimated the Conservative lead, putting it at just one point – in part because its poll underestimated the Conservative tally by between two and three points.
In addition, the US company SurveyMonkey put the Tory lead at only four points (while again slightly underestimating the Conservative share). The same was true of a novel exercise conducted by YouGov, which used big data to forecast the outcome seat by seat rather than just the countrywide share of the vote.
So, why might the polls have suddenly moved from overestimating Labour to, for the most part if not always, underestimating it?
The past two years have seen the polling companies undertake substantial methodological changes, as they have tried to learn the lessons of 2015. They have focused particularly on estimating demographic differences in turnout more accurately. If a party is relatively strong within a demographic group whose members tend to turn out in low numbers, that party’s vote share can all too easily be overestimated if the poll overestimates the level of turnout in that group.
This became a particular concern in 2015 as Labour grew in popularity among younger people. They are usually less likely to vote, but in 2015 the polls overestimated their likelihood of doing so. To address this problem, some companies have made efforts to recruit less politically engaged people into their samples, while others have used information about the relationship between demographic background and turnout in 2015 in an attempt to model the likely pattern of turnout in 2017.
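The arithmetic behind this turnout problem can be shown with a toy calculation. The sketch below is purely illustrative – the two groups, their sizes, their levels of Labour support and their turnout rates are all invented numbers, not figures from any real poll:

```python
# Toy illustration of how overestimating a low-turnout group's propensity
# to vote inflates a party's projected share. Two hypothetical groups:
# younger voters (more pro-Labour, lower turnout) and older voters.
# All figures are invented for illustration.

groups = {
    # name: (share of electorate, Labour support within group, true turnout)
    "younger": (0.3, 0.60, 0.45),
    "older":   (0.7, 0.30, 0.75),
}

def labour_share(turnout_assumption):
    """Labour's share of votes cast, given assumed turnout rates per group."""
    votes_cast = sum(size * turnout_assumption[g]
                     for g, (size, _, _) in groups.items())
    labour_votes = sum(size * lab * turnout_assumption[g]
                       for g, (size, lab, _) in groups.items())
    return labour_votes / votes_cast

true_turnout = {g: t for g, (_, _, t) in groups.items()}
# A poll that wrongly assumes younger people vote as often as older people:
assumed_turnout = {"younger": 0.75, "older": 0.75}

print(f"Labour share with true turnout:    {labour_share(true_turnout):.1%}")
print(f"Labour share with assumed turnout: {labour_share(assumed_turnout):.1%}")
```

In this invented example, assuming equal turnout across the two groups inflates Labour's projected share by around three points – exactly the kind of error the post-2015 turnout adjustments were designed to prevent.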
Still, a question that inevitably arises is whether these and various other changes that were made by the pollsters ended up overcompensating for the mistakes that were apparent in 2015. Perhaps adjusting the data made the projections less, rather than more, reflective of the public mood?
One way to assess whether that might be the case is to look at the polls' underlying unweighted numbers – that is, simply the total number of respondents who said they were going to vote Conservative, Labour, Liberal Democrat and so on.
One striking feature of the final polls in 2015 was that, on average, across all the companies, these unweighted numbers pointed to exactly the same outcome as the headline figures that were reported after the data had been weighted to look more demographically and politically representative.
It appeared that the principal problem was that the polls had simply interviewed too many Labour voters – an imbalance that the pollsters’ various adjustments failed overall to correct. This is one reason why an independent inquiry into the performance of the polls in 2015 concluded that the principal reason why the polls overestimated Labour’s strength relative to that of the Conservatives was because their underlying samples were biased towards Labour.
An adjustment too far?
A look at the unweighted data in the polls this time around suggests, however, that the underestimation of Labour’s strength was not simply occasioned by poor samples. Looking at the detailed tables produced by the pollsters, it appears that, on average, the total Labour tally in the raw unweighted data was just one point less than that for the Conservatives. In short, the polls’ unweighted data were actually slightly underestimating the Tory lead over Labour (in line with the historical record of the polls), not substantially overestimating it as in the figures that were headlined.
Now, no one would suggest that polling companies should be reporting their raw unweighted data as their headline figure. Polls are often clearly demographically unrepresentative, for example typically containing too few younger people and too few people in working-class occupations. Indeed, in at least one final poll the underlying design oversampled voters living in some parts of Britain and thus could not possibly be regarded as representative without some downweighting of those living in the areas that had been oversampled.
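A minimal sketch of the kind of downweighting involved: each respondent is weighted by the ratio of their group's share of the population to its share of the sample, so that oversampled groups count for less. The two regions and all the proportions below are hypothetical, chosen only to show the arithmetic:

```python
# Hypothetical sketch of weighting a poll back to representativeness.
# Region A is oversampled relative to its true population share, so its
# respondents are downweighted. All figures are invented for illustration.

from collections import Counter

# Respondents recorded as (region, vote intention).
sample = [("A", "Con")] * 30 + [("A", "Lab")] * 30 + \
         [("B", "Con")] * 25 + [("B", "Lab")] * 15

population_share = {"A": 0.4, "B": 0.6}   # assumed true regional split

n = len(sample)
sample_counts = Counter(region for region, _ in sample)
# weight = population share / sample share for each region
weight = {r: population_share[r] / (sample_counts[r] / n)
          for r in population_share}

weighted_votes = Counter()
for region, vote in sample:
    weighted_votes[vote] += weight[region]

total = sum(weighted_votes.values())
for party in sorted(weighted_votes):
    print(f"{party}: {weighted_votes[party] / total:.1%}")
```

The same logic extends to age, class and (after 2015) modelled turnout probabilities – which is why, as the article argues, the cumulative effect of all these adjustments can end up moving the headline figures a long way from the raw tallies.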
However, the sharp contrast in most polls between the small Tory lead in the unweighted polling data and the lead in the reported headline figure suggests that the companies’ principal problem this time was not that their samples were unrepresentative. Indeed, a number of companies had tried to improve the quality of their samples. Rather, it seems the attempts made to adjust the data often proved to be a step too far.
The British Polling Council has decided not to hold an inquiry into the performance of the polls this year. In the end, at least some polls more or less got it right, and so not all of the industry was in error in the way that it had been in 2015. Instead, each company has been asked to review its own work and report on what it thinks went wrong (and right) with its polls. These reports will be presented at a conference before Christmas. But it looks as though a good place for most companies to start will be with the merits of their various adjustments.