Imagine some of the key evidence for promotions at work being anonymous responses from coworkers who just received a bad performance evaluation from you. Something similar happens in higher education, with teachers rated by students grateful for good grades or disgruntled by low grades. That’s a bitter pill to swallow for some academics.
Evidence tells us students take their feedback personally. Jurors’ decision-making is similarly affected by their emotional state. People make worse decisions when they are uncertain or stressed, which are two common states for students.
So how unreliable are student evaluations? And what can we do about it? Our work indicates there is still much to be done in this space, but we can set some rules to make it easier.
Not all surveys are equal
Australia’s national Student Experience Survey is considered “the pulse” on student satisfaction rather than a device to enable teacher growth, and its data are easily skewed by circumstances at the time. Unsurprisingly, during 2020, universities that already had an online presence saw the smallest decline in student experience scores.
So the question becomes: did the quality of learning crash in Group of Eight universities, which had the greatest declines in student experience? Unlikely. Instead, students’ ratings reflected their difficulties engaging with new forms of teaching and learning, plus the inertia of COVID-19 lockdowns.
Maybe they should have given students chocolate?
The reality is these surveys do not tell us how students learn, but instead how students perceive their learning. Yet students aren’t experts at what learning is. And when students don’t receive effective training in evaluation, it’s hardly a surprise that teacher gender, race and attractiveness change scores. As the popular allegory puts it:
“Everyone is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.”
Instead, let’s ask students to share the most enjoyable content, the most rewarding educational technologies, and where improvement was needed. Include ethics and feedback training for bonus credit.
Making survey tools that work
Psychometrics is the science of measuring psychological attributes such as attitudes, abilities and perceptions. Interestingly, many academics have specialist knowledge in developing surveys that are designed to be valid and reliable. But it’s unclear if universities use them as a resource to develop their surveys, with some academics wondering if they should. The 2021 Employer Satisfaction Survey Methodological Report, for example, does not use the words validity or reliability even once across its 140 pages.
A survey is valid when its questions actually measure what we think they are measuring. Using a stopwatch to measure time is easy. When we try to measure how people feel about intangible concepts, it’s harder.
The national Student Experience Survey asks students whether they have developed a sense of belonging to their institution. Yet the evidence on belonging indicates it is typically developed through interpersonal relationships, not through institutions such as universities.
A survey is reliable when its questions generate consistent results over time and across different participants. It’s analogous to baking a cake and assuming the scales will always accurately measure 40 grams of butter.
In contrast, the Student Experience Survey asks whether students have developed their critical thinking skills during their course. How accurately can a person with low critical thinking skills answer this question?
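One common way to check consistency over time is a test-retest correlation: ask the same students the same question twice and see how closely the two sets of answers agree. The sketch below is a minimal illustration in Python, not the method used by any national survey. The ratings are invented, and the names pearson_r, first_sitting and second_sitting are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Invented ratings (1 to 5) from the same six students, two weeks apart.
first_sitting = [4, 5, 3, 4, 2, 5]
second_sitting = [4, 4, 3, 5, 2, 5]

# Values near 1 suggest the question behaves consistently over time;
# values near 0 suggest the answers are mostly noise.
retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {retest_r:.2f}")
```

A high correlation only shows the question is answered consistently. It says nothing about validity: a question can be reliably measuring the wrong thing.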
5 rules for surveys to help teachers improve
There are ways that surveys can be used for good: to actually help teachers be better educators and improve student learning. But it requires a reset.
Here are five rules institutions could consider when developing their surveys.
1. Find psychometric specialists to create quality tools
We go to dentists to have our teeth fixed. The same rule applies here. Find individuals who can take the theory of scale development (producing reliable and valid measures to assess an attribute of interest) and apply it to the practice of learning and teaching.
2. Change when the survey is done
Lots of evaluations are done before, during and after a program. In higher education, they are completed only after the class has ended.
Evaluating at multiple points would help identify whether learners make progress during the class. It would also help control for cohort differences (one year’s students may, for example, be stronger than another’s).
For student experience, contrasting how the same student rates different classes each semester may serve as a stable measure to see which classes need review.
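The within-student contrast described above can be sketched in a few lines of Python. The idea is to subtract each student's own average rating from each of their ratings, so a habitually generous or harsh rater does not distort the comparison between classes. This is an illustrative sketch under simple assumptions: the students, class codes and scores are invented, and class_scores_relative_to_student is a hypothetical name.

```python
from collections import defaultdict


def class_scores_relative_to_student(ratings):
    """Average, per class, of each rating minus that student's own mean.

    ratings: list of (student, class_name, score) tuples.
    """
    # Step 1: work out each student's personal average across all classes.
    by_student = defaultdict(list)
    for student, _, score in ratings:
        by_student[student].append(score)
    student_mean = {s: sum(v) / len(v) for s, v in by_student.items()}

    # Step 2: express every rating relative to that personal average.
    by_class = defaultdict(list)
    for student, class_name, score in ratings:
        by_class[class_name].append(score - student_mean[student])

    # Step 3: average the relative ratings for each class.
    return {c: sum(v) / len(v) for c, v in by_class.items()}


# Invented data: "ana" rates everything higher than "ben" does,
# but both rank the three classes in the same order.
ratings = [
    ("ana", "BIO101", 5), ("ana", "CHE101", 4), ("ana", "MAT101", 3),
    ("ben", "BIO101", 3), ("ben", "CHE101", 2), ("ben", "MAT101", 1),
]
relative = class_scores_relative_to_student(ratings)
for class_name, score in sorted(relative.items()):
    print(f"{class_name}: {score:+.1f}")
```

Classes with a clearly negative relative score are the ones the same students rate below their own norm, which makes them candidates for review.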
3. Use more than just numbers
4. Control for bias
It’s not always possible to eliminate bias and emotion. We can seek to understand them and use the measures as a case-by-case conversation about improving teaching. Developing reliable and valid tools will help, but if the aim is for these to help teachers improve, then we need to focus on that, not cross-institutional comparisons.
Better yet, let’s actively recognise teachers’ professional growth, call decline into question, and report on averages.
We can also train students to be better evaluators.
5. Create a growth community
Teaching quality surveys do not necessarily increase teaching quality, but they can.
The surveys offer an opportunity to raise awareness of differences. If students rate seven items at 90% but one is 84%, this should prompt research into the reasons. It could be a great opportunity to create more meaningful content; it could also just be an outlier.
Use these findings as publishing opportunities to share what was learned.
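The item comparison in rule 5 (seven items at 90% and one at 84%) can be automated as a first pass. The Python sketch below flags any item that sits more than a chosen number of percentage points away from the median item score. The 5-point threshold is an arbitrary assumption for illustration, the item labels are invented, and flag_items is a hypothetical name. A flag is a prompt to look closer, not proof of a problem.

```python
from statistics import median


def flag_items(item_scores, threshold=5.0):
    """Return items whose score differs from the median item score
    by more than `threshold` percentage points, with the size of the gap."""
    mid = median(item_scores.values())
    return {
        item: score - mid
        for item, score in item_scores.items()
        if abs(score - mid) > threshold
    }


# The example from the text: seven items at 90% and one at 84%.
item_scores = {f"Q{i}": 90.0 for i in range(1, 8)}
item_scores["Q8"] = 84.0

flagged = flag_items(item_scores)
for item, gap in flagged.items():
    print(f"{item} is {gap:+.1f} points from the median item")
```

The median is used rather than the mean so that the unusual item does not pull the reference point toward itself.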
Correction: This article has been updated to remove an erroneous attribution of the allegory. It is often attributed to Albert Einstein, but there is no evidence that he said it.