Game-playing of the REF makes it an incomplete census


Research assessment is only partly reliable as an indicator of the real quality of the work going on in higher education. It has a dual character. On one hand it is rooted in material facts and objective methods. Strong research quality and quantity should be and are rewarded in the UK Research Excellence Framework (REF), the results of which have just been published.

But the outcome is also shaped by the universities that select and fashion data for competitive purposes and the subject area panels that define research judged to be outstanding on a global scale.

Total research activity can never be fully captured in performance data. Some things, such as citations in top journals, are easier to measure than others, such as the long-term impacts of research on policy and professional practice. Experienced players are best at gaming the system in their own interest.

A very strong overall REF performance signifies a large concentration of outstanding work. It is an unambiguous plus. All the same, precise league table positions in the REF, indicator by indicator, should be taken with a grain of salt.

Measuring ‘impact’

In the REF, the indicators for “impact”, which are new to the 2014 assessment, are the least objectively grounded and most vulnerable to manipulation. This is because of the intrinsic difficulty of measuring the changes to society, economy and policy induced by new knowledge, especially in the long term, and because of the kind of crafted “impact-related” data that is collected during the REF assessment process. A sophisticated industry has already emerged to manufacture examples of the relevant “evidence” of impact.

At best, this gets everyone thinking about real connections with the users of research, which is one – though only one – of the starting points when producing the impact documentation. At worst, it leads to data that bears as much relation to reality as Soviet-era statements of output by Russian factories in response to government targets.

Inevitably, the universities most experienced and adept at managing their response to performance measures of all kinds will perform especially well in demonstrating proof of impact. There is also a “halo” effect, of the kind that affects all measures contaminated by prior reputation.

The REF indicators that are the most meaningful are those related to “output” quality, such as the grade-point average (GPA) of each university, and the proportion of researchers ranked as “world-leading”. These are grounded in considered judgements of real research work, by panels with significant expertise.

Is it getting better and better?

Yet the value of the REF's output indicators, which include publication numbers, as measures of comparative quality is subject to two caveats.

First, between the previous Research Assessment Exercise (RAE) in 2008 and the 2014 REF there has been a notable inflation of the proportion of UK research outputs judged to be “world-leading” (rated four-star) and “internationally excellent” (rated three-star).

In 2008, just 14% of research outputs were judged to be four-star and 37% were judged to be three-star, a total of 51% in the top two categories. In 2014, the proportion of the work judged to be outstanding had somehow jumped to 72% – 22% was judged to be four-star and another 50% judged to be three-star. This phenomenal improvement happened at a time when resources in UK higher education were constrained by historical standards.
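The size of that jump can be checked with a few lines of arithmetic. The sketch below uses only the four percentages quoted above; the variable names are illustrative, and the distinction it draws between percentage-point change and relative change is added here for clarity rather than taken from the REF results themselves.

```python
# Shares of outputs by star rating, in percent, as quoted in the text.
rae_2008 = {"four_star": 14, "three_star": 37}
ref_2014 = {"four_star": 22, "three_star": 50}

# Combined share of work in the top two categories.
top_2008 = sum(rae_2008.values())
top_2014 = sum(ref_2014.values())

# Change in percentage points, and change relative to the 2008 share.
point_change = top_2014 - top_2008
relative_change = point_change / top_2008 * 100
four_star_relative = (
    (ref_2014["four_star"] - rae_2008["four_star"]) / rae_2008["four_star"] * 100
)

print(f"top two categories: {top_2008}% -> {top_2014}% (+{point_change} points)")
print(f"relative growth in top-two share: {relative_change:.0f}%")
print(f"relative growth in four-star share: {four_star_relative:.0f}%")
```

On these figures the top-two share rose by 21 percentage points, which is roughly a 41% relative increase, while the four-star share alone grew by about 57% relative to 2008.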

While genuine improvement no doubt has occurred in at least some fields, the scale and speed of this improvement beggars belief. It reflects a combination of factors that generate boosterism. Higher education institutions have a vested interest in maximising their apparent quality. Subject area panels have a vested interest in maximising the world-class character of their fields. And UK higher education and its institutions are competing with other nations, especially the United States, for research rankings, doctoral students and offshore income.

The inflation of three- and four-star research is a worrying sign of a system in danger of becoming too complacent about its own self-defined excellence. This is not the way to drive long-term improvement in UK research. Less hubris and more hard-nosed Chinese-style realism would produce better outcomes.

It would be better to rely less on self-regulation, enhance the role of external international assessors in judgements about what constitutes “world-leading” research and spotlight areas where improvement is most needed, rather than focusing attention solely on the areas where research is very strong.

The selectivity game

The second caveat is that universities can readily game the assessment of output quality, by being highly selective about whose work they include in the assessment. Including only the best researchers pushes up the average GPA and the proportion of research ranked as four-star. Those institutions that do this pay a financial price, in that their apparent volume of research is reduced – and their subsequent funding will fall. Nevertheless, it is good for reputation. That has many long-term spin-offs, including financial benefits.
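A toy calculation makes the trade-off concrete. The sketch below is a deliberate simplification, not the REF's actual method: it assumes each researcher carries a single star rating (the real exercise rates outputs and builds quality profiles), it treats GPA as the plain mean of those ratings, and it uses GPA multiplied by staff entered as a rough proxy for volume. The department data and the `submit_top` helper are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    staff_entered: int
    gpa: float     # mean star rating of the entered staff (0 to 4)
    volume: float  # gpa * staff_entered, an assumed proxy for research volume


def submit_top(star_ratings: list[float], share: float) -> Submission:
    """Enter only the top `share` of staff, ranked by star rating."""
    ranked = sorted(star_ratings, reverse=True)
    n = max(1, round(len(ranked) * share))  # always enter at least one person
    entered = ranked[:n]
    gpa = sum(entered) / n
    return Submission(staff_entered=n, gpa=gpa, volume=gpa * n)


# Hypothetical department of ten researchers, one rating each (invented data).
department = [4, 4, 4, 3, 3, 3, 3, 2, 2, 1]

inclusive = submit_top(department, share=1.0)
selective = submit_top(department, share=0.5)

for label, s in (("inclusive", inclusive), ("selective", selective)):
    print(f"{label}: staff={s.staff_entered} gpa={s.gpa:.2f} volume={s.volume:.1f}")
```

In this invented case, entering everyone gives a GPA of 2.90 and a volume of 29.0, while entering only the top half lifts the GPA to 3.60 and cuts the volume to 18.0. The same department looks markedly better on average quality and markedly smaller on volume, with no change in the research actually being done.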

While some universities have chosen to approach the REF on an inclusive basis, others have pursued highly tailored entries designed to maximise average output quality and impact. Just one example: Cardiff sharply reduced its number of full-time equivalent staff, from 1,030 in the 2008 RAE to only 738 in the 2014 REF, according to analysis by Times Higher Education. This lifted Cardiff’s quality rating, the grade-point average of its outputs, to sixth in the country, though in terms of the volume of high-quality research it appeared to fall from 15th in the UK to 18th in the Times Higher Education’s ranking.

As universities do not have to enter all their eligible staff for the REF, the data is an incomplete census of research activity and does not compare like with like. In each field of research, the measures of performance compare universities that enter 80%-100% of their staff in that field with universities that enter only 10%-20% of their eligible staff, rendering any comparison of average quality meaningless. This undermines the validity of the REF as a league table of system performance, though everyone treats it that way.

The trend to greater selectivity manifest in some, but not all, higher education institutions is no doubt one of the factors that has inflated the incidence of four-star and three-star rated research.

This is an extract of an article published on the IOE London blog.
