Let us consider the following scenario.
You have enrolled in a Massive Open Online Course (MOOC) offered by a world-renowned university.
After four weeks of solid work you have completed your first assignment and you sit down to upload the essay. Within a second of the essay being sent for grading, your result appears, declaring your essay to be a less than stellar effort.
The essay might not even have been seen by a human. Instead, it may have been graded entirely by a computer system comparing your essay to sample essays in a database.
But should automated grading systems be used for essays? And how good are these systems anyway?
The MOOC phenomenon
The hype surrounding MOOCs reached fever pitch last year. MOOCs were initially brought to us by prestigious American universities – offering the same content that students paid for, to anyone for free.
Australian schools and universities have been using automated grading systems for multiple-choice and true-false tests for many years.
But the MOOC provider edX has moved one step further, using artificial intelligence technology to grade essays – a controversial step given this approach is yet to be widely accepted.
EdX’s president, Anant Agarwal, told the New York Times earlier this month that “instant grading software would be a useful pedagogical tool, enabling students to take tests and write essays over and over and improve the quality of their answers.” Agarwal said the use of artificial intelligence technologies to grade essays had “distinct advantages over the traditional classroom system, where students often wait days or weeks for grades.”
Automated grading systems that assess written test answers have been around since the 1960s, the era of mainframe computers.
The New York Times reports that four US states (Louisiana, North Dakota, Utah and West Virginia) now use automated essay grading systems in secondary schools, and in some situations the software is used as a backup that provides a check on the human assessors.
Automated essay grading relies on the system being trained with a set of example essays that have been hand-scored. The system learns from these examples and their scores, then applies what it has learned to other students’ essays, analysing indicators such as phrases, keywords, and sentence and paragraph construction.
Automated essay grading systems can be fine-tuned by getting humans to grade a subset of the submitted essays. But this limits the ability of the automated grading system to provide instantaneous results and feedback.
The artificial intelligence technology can then step in to make the process more sophisticated, using knowledge gained from human marked essays.
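To make the idea concrete, here is a minimal sketch of that kind of approach. It is not how edX or any other vendor's software works: the keyword list, the four indicators and the nearest-neighbour comparison are all assumptions chosen for illustration. Each essay is reduced to a handful of crude numbers, and a new essay receives the average score of the hand-scored essays it most resembles.

```python
import math
import re

# Hypothetical keyword list for one essay prompt. A real system would derive
# such indicators from the hand-scored training essays.
KEYWORDS = {"evidence", "argument", "therefore", "because"}


def extract_features(essay):
    """Reduce an essay to a few crude numeric indicators."""
    words = re.findall(r"[a-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    keyword_hits = sum(1 for w in words if w in KEYWORDS)
    return [
        float(len(words)),                    # essay length in words
        len(words) / max(len(sentences), 1),  # average sentence length
        float(len(paragraphs)),               # number of paragraphs
        float(keyword_hits),                  # occurrences of prompt keywords
    ]


def predict_score(essay, graded, k=2):
    """Average the human scores of the k most similar hand-scored essays.

    `graded` is a list of (essay_text, human_score) pairs.
    """
    target = extract_features(essay)
    by_distance = sorted(
        graded,
        key=lambda item: math.dist(target, extract_features(item[0])),
    )
    nearest = by_distance[:k]
    return sum(score for _, score in nearest) / len(nearest)
```

Nothing in this sketch reads the essay for meaning. It only counts surface features, so an essay that happens to match the length and vocabulary of a well-scored example would be scored highly whatever it actually argues.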
Can a computer really grade an essay?
There has not been general acceptance of the use of artificial intelligence technologies within automated grading systems. And recent moves by online education organisations to use artificial intelligence technologies for high-stakes testing have caused concern among academics.
This concern culminated in an online petition against machine scoring of essays, launched earlier this year by a group of concerned academics and research staff. Over 3,600 signatures have been collected so far, including from high-profile intellectuals like Noam Chomsky.
A statement on the petition website argues that “computers cannot ‘read’. They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organisation, clarity, and veracity, among others.”
Les Perelman, a researcher at MIT, is highly critical of automated grading systems and has published a critique of the state of the art in the use of artificial intelligence technology for automated essay grading.
Perelman states that “comparing the performance of human graders matching each other to the machines matching the resolved score still gives some indication that the human raters may be significantly more reliable than machines.”
The Educational Testing Service (ETS) uses its e-Rater software in conjunction with human assessors to grade the Graduate Record Examinations (GRE) and the Test of English as a Foreign Language (TOEFL). For practice tests, the software grades without human intervention.
Both these tests are high stakes – the former decides entrance to US graduate schools and the latter the fate of non-English speakers wishing to study at American universities.
In the rush to adopt MOOCs, Australian universities may skip important debates on what forms of assessment are acceptable and how to ensure educational outcomes are valid.
Central to the value of MOOCs as a pedagogical tool is the method used to assess course participants.
Artificial intelligence technologies have been advancing rapidly, but have they reached the point where automated grading systems can replace teaching academics?
Please tell us what you think – do we need real live humans to grade essays or do you believe computers can do the job just as well? Leave your thoughts in the comments below.