Computer thinks you’re dumb: automated essay grading in the world of MOOCs

An essay you submit in an online course might not be graded by humans but by computers instead. Keyboard image from www.shutterstock.com

Let us consider the following scenario.

You have enrolled in a Massive Open Online Course (MOOC) offered by a world-renowned university.

After four weeks of solid work you have completed your first assignment and you sit down to upload the essay. Within a second of the essay being sent for grading, your result appears, declaring your essay a less-than-stellar effort.

But the essay might never have been seen by a human; instead, it may have been graded entirely by a computer system comparing your essay to sample essays in a database.

EdX, a non-profit MOOC provider founded by Harvard University and the Massachusetts Institute of Technology, introduced automated essay grading capability in a software upgrade earlier this year.

But should automated grading systems be used for essays? And how good are these systems anyway?

The MOOC phenomenon

The hype surrounding MOOCs reached fever pitch last year. MOOCs were initially brought to us by prestigious American universities, offering to anyone, for free, the same content that fee-paying students receive.

Australian universities soon jumped on board and homegrown MOOC platforms quickly followed.

Australian schools and universities have been using automated grading systems for multiple-choice and true-false tests for many years.

But edX has moved a step further, using artificial intelligence technology to grade essays – a controversial move, given this approach is yet to gain broad acceptance.

EdX’s president, Anant Agarwal, told the New York Times earlier this month that “instant grading software would be a useful pedagogical tool, enabling students to take tests and write essays over and over and improve the quality of their answers.” Agarwal said the use of artificial intelligence technologies to grade essays had “distinct advantages over the traditional classroom system, where students often wait days or weeks for grades.”

Robot graders

Automated grading systems that assess written test answers have been around since the 1960s, in the early days of mainframe computing.

The New York Times reports that four US states (Louisiana, North Dakota, Utah and West Virginia) now use automated essay grading systems in secondary schools, and in some situations the software is used as a backup, providing a check on the human assessors.

Automated essay grading relies upon the system being trained with a set of example essays that have been hand-scored. The system learns from these example essays and their scores, analysing indicators such as phrases, keywords, and sentence and paragraph construction, and then applies what it has learned to other students' essays.

Automated essay grading systems can be fine-tuned by getting humans to grade a sub-set of the submitted essays. But this limits the ability of the automated grading system to provide instantaneous results and feedback.

The artificial intelligence technology can then step in to make the process more sophisticated, using knowledge gained from the human-marked essays.
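
To make that training-and-scoring loop concrete, below is a minimal sketch of the general technique in Python. The features (essay length, vocabulary richness, average sentence length) and the hand-scored training data are invented for illustration, and this is not edX's or ETS's actual software; real systems use far richer indicators and many thousands of example essays.

```python
# A toy sketch of the training-and-scoring loop described above.
# Everything here (features, example essays, scores) is invented;
# it illustrates the general technique, not any vendor's software.
import re

import numpy as np


def features(essay):
    """Surface indicators of the kind such systems analyse."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_words = max(len(words), 1)
    return [
        float(n_words),                             # essay length
        len({w.lower() for w in words}) / n_words,  # vocabulary richness
        n_words / max(len(sentences), 1),           # average sentence length
    ]


# Hand-scored example essays (hypothetical training data).
train_essays = [
    "Short and repetitive. Short and repetitive.",
    "An adequate essay with some structure but a limited vocabulary range.",
    "A developed argument with varied vocabulary, supporting evidence and "
    "a clear conclusion drawn logically from the stated premises.",
]
train_scores = np.array([2.0, 3.0, 5.0])

# "Learn" from the hand-scored examples by fitting
# score ~ w . features + b with least squares.
X = np.array([features(e) + [1.0] for e in train_essays])
coeffs, *_ = np.linalg.lstsq(X, train_scores, rcond=None)


def grade(essay):
    """Instant grading: no human ever reads the essay."""
    return float(np.array(features(essay) + [1.0]) @ coeffs)


print(round(grade("A brand new essay that no human marker has seen."), 2))
```

The sketch also hints at the weakness critics describe below: a model built only on surface indicators will happily award high marks to fluent nonsense.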

Can a computer really grade an essay?

There has not been general acceptance of the use of artificial intelligence technologies within automated grading systems. And recent moves by online education organisations to use artificial intelligence technologies for high-stakes testing have caused concern among academics.

This concern culminated in an online petition against machine scoring of essays, launched earlier this year by a group of concerned academics and research staff. Over 3,600 signatures have been collected so far, including from high-profile intellectuals such as Noam Chomsky.

A statement on the petition website argues that “computers cannot ‘read’. They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organisation, clarity, and veracity, among others.”

Les Perelman, a researcher at MIT, is highly critical of automated grading systems and has provided a critique of the state of the art in the use of artificial intelligence technology in automated essay grading systems.

Perelman states that “comparing the performance of human graders matching each other to the machines matching the resolved score still gives some indication that the human raters may be significantly more reliable than machines.”

In June 2012, Perelman submitted a nonsense essay to the US Educational Testing Service’s (ETS) automated grading system, called e-Rater, and received the highest possible grade.

ETS uses the e-Rater software in conjunction with human assessors to grade the Graduate Record Examinations (GRE) and the Test of English as a Foreign Language (TOEFL); for practice tests, e-Rater is used without human intervention.

Both these tests are high stakes – the former decides entrance to US graduate schools and the latter the fate of non-English speakers wishing to study at American universities.

Testing times

In the rush to adopt MOOCs, Australian universities may skip important debates on what forms of assessment are acceptable and how to ensure educational outcomes are valid.

Central to the value of MOOCs as a pedagogical tool is the method used to assess course participants.

Artificial intelligence technologies have been advancing rapidly but have they reached the point where automated grading systems can replace teaching academics?

Please tell us what you think – do we need real live humans to grade essays or do you believe computers can do the job just as well? Leave your thoughts in the comments below.


30 Comments

  1. Kevin Orrman-Rossiter

    Research Partnerships Officer at University of Melbourne

    Insightful article. Thank you.

    I am puzzled by the statement about speed of feedback and the ability for students to "improve":
    "EdX’s president, Anant Agarwal told the New York Times earlier this month that “instant grading software would be a useful pedagogical tool, enabling students to take tests and write essays over and over and improve the quality of their answers.” Agarwal said the use of artificial intelligence technologies to grade essays had “distinct advantages over the traditional…

  2. Craig Savage

    Professor of Theoretical Physics at Australian National University

    Computers can't beat chess grandmasters - but Deep Blue did.
    They can't drive cars in traffic - but the Google cars do.
    They can't understand natural language, with all its ambiguities - but Watson does.

    Computers don't mark essays as well as people, or don't give as insightful feedback? Perhaps, but improvements are likely to be rapid now that the end is in sight. Machine learning is a game-changing technology. Have you noticed how good Google Search has got at knowing what you want?

    I think the underlying debate is between techno-optimists and those who believe there is something special about blood and guts.

    1. Michael Wilbur-Ham (MWH)

      Writer (ex telecommunications engineer)

      In reply to Craig Savage

      This isn't a debate about the future; it is about what is happening now.

      Read the nonsense essay which scored top marks and have a good laugh (or cry).

    2. Mark A Gregory

      Senior Lecturer in Electrical and Computer Engineering at RMIT University

      In reply to Craig Savage

      Hi Craig,

      but can computers be inspirational or have that spark of insight?

      Computers are good at crunching the numbers - Deep Blue - but can they match humans in their ability to use a language as complex as English? What about the slang, nuances, oxymorons - no, I'm not talking about students :)

      Or are we using a technology that is not mature?

    3. Mark A Gregory

      Senior Lecturer in Electrical and Computer Engineering at RMIT University

      In reply to Craig Savage

      Hi Craig,

      I'm not sure that most people would agree. I'm yet to come across a teacher or academic without skills, knowledge or the human spark that computers will find difficult to emulate.

      I think that there is a need for national debate on the use of AI so the facts can be put on the table.

      regards,
      Mark Gregory

    4. Gary Myers

      logged in via LinkedIn

      In reply to Michael Wilbur-Ham (MWH)

      The nonsense essay will probably be examined to identify why it is nonsense and the results fed back into the next iteration of the rater.

      Also, as more articles and comments fill the internet, there'll be an increased demand for automated assessment tools like these. There's a lot of money to be made getting these right, and in a couple of years there'll be one on this website rating us on the quality of our comments.

  3. Sean Lamb

    Science Denier

    I think there is something called MOOCophobia in the academic community.
    MOOCs do a narrow range of things very well, some things adequately and some things not at all. Don't expect to see MOOC trained doctors, nurses, allied health, engineers, lawyers, experimental scientists, linguists, historians or musicians any time soon.
    What they do well is a range of areas within computing sciences, data handling, statistics, mathematics, first-year and heavily theoretical aspects of sciences and people…

    1. Mark A Gregory

      Senior Lecturer in Electrical and Computer Engineering at RMIT University

      In reply to Sean Lamb

      Hi Sean,

      The panic you mention could simply be concern by academics that MOOCs evolve daily without adequate review. Should MOOCs be fully analysed and tested?

      The prank essay is one example of how MOOC AI might be tested, but surely there needs to be a more rigorous study?

      regards,
      Mark Gregory

    2. Sean Lamb

      Science Denier

      In reply to Mark A Gregory

      Every new technology has someone who says some idiotic things during its initial deployment.
      Clearly Anant Agarwal is one of those people saying idiotic things. But MOOCs are developing mainly in a conversation between the providers and the students - the reason why machine marking of essays won't take off is that the students will say it is rubbish.

    3. Jess Harding

      logged in via email @gmail.com

      In reply to Sean Lamb

      I've found MOOC courses to be quite effective, despite the coincidence that none (0) of the MOOC courses I've taken have been in any of the realms you mention. Interesting.

      I don't believe anybody expects fully MOOC-trained professionals. The technology and its ongoing development, learning through doing, have been under way for less than two years. That doesn't mean that it isn't and won't be transformative, nor does it imply there is little value in MOOCs.

      While the pedagogy is still being developed, results to date show measurably increased learning when students use their own time for the MOOC portion of class, then use class time not for instruction but for meaningful discussion with the professor. That certainly transforms the role of professor / instructor from rote lecturer.

    4. Sean Lamb

      Science Denier

      In reply to Jess Harding

      It would possibly help if you actually specified which MOOCs you had taken; it might have facilitated the discussion.
      Generally MOOC assessment works well when it has yes/no, multiple-choice, success/fail tests; peer marking works less well and is certainly less rigorous.
      Of course you can do a course in chemistry and not go near a lab, you can do a course in music theory and not go near an instrument, you can listen to a series of lectures on a historical subject and not write a properly referenced essay critiquing sources and interpretations. All of these might be interesting, enjoyable and rewarding - but they are falling below a proper rigorous university standard.

    5. Jess Harding

      logged in via email @gmail.com

      In reply to Sean Lamb

      I don't believe the list of MOOCs I've taken is relevant to the discussion. We're not evaluating any specific course, school or professor.

      I do not believe it is fair or accurate to categorize them as "falling below a proper university standard". The course standards and quality do vary, just as they do in 'real' university courses, in my experience.

      The courses have been offered by highly regarded professors at highly regarded universities in USA and UK. I see no reason they should…

  4. Pat Moore

    gardener

    Well, HAL in 2001: A Space Odyssey wasn't very nice when he came into his own, was he? That predatory, ominous voice (like a cat with a mouse) perhaps holds a presentiment, indicative of our fate at being subject to judgement by machines? Machines which are the material result of human logical analysis taken to its Nth degree....which becomes rather Kafkaesque in its implications. Or that old Frankensteinian nightmare comes to mind.... of the mechanical creations of man assuming a life of their own…

  5. Cat Mack

    logged in via Facebook

    Can't tell you how ambivalent I am about this.

    On the one hand, I don't for a minute believe that programs are capable of marking for meaning..... On the other, only a lowly tutor can know the true horror of thousands of essays to mark - most of which have no meaning. On the other (I have three hands) what would academics do without one of their main sources of humor - student essays.

    P.S. I'm not sure that Perelman's essay wasn't worth that mark. Sure made me laugh.

    1. Pat Moore

      gardener

      In reply to Cat Mack

      Funny, Cat! Your two observations concerning lack of 'meaning' coincided: that 'most' of the student essays (sadly) 'have no meaning' and that 'programs are (not) capable of marking for meaning'. That leads to a double tick for meaninglessness on the debit side of the ledger, doesn't it? Meaning that your students would be lost without your meaningful labours of Psyche!

      Programs reading barcodes at the degree checkout or real humanized, "led out" education?

  6. jose correa loureiro junior

    jailguard

    Well, there are good reasons to believe that this would be the right thing to do, since a machine at least makes no apologies to anyone. It's a good point, since the real value of a grade is always controversial. For now it would be enough to leave the thick of the work to the machine, with a human doing a rapid revision, until the point where the human sees the computer exceeding them. Which is probably not far off at all!

  7. Denis Tolkach

    PhD student in Tourism at Victoria University

    The MOOC I took that had an essay in it was peer marked. Each student submits an essay, and each student is then asked to mark three other essays. Each student receives the average of the three peer marks he got as the final mark.
    I prefer such an approach to automated computer marking. It allows students to think critically about the work of other students and simultaneously reflect on their own writing. The use of three students as opposed to just one reduces human error (even though it does not eliminate it).

    1. Denis Tolkach

      PhD student in Tourism at Victoria University

      In reply to Craig Savage

      Maybe they should try. Do you mean that the feedback and marks from students could be fed into the AI database for the learning process? Afterwards the AI produces its mark. But it would still require some time for the AI to learn what is expected from the essay, wouldn't it?

    2. Craig Savage

      Professor of Theoretical Physics at Australian National University

      In reply to Denis Tolkach

      AI is a catch-all for many things. I think the focus here is on "machine learning". If you believe the student peers are marking effectively, then machine learning algorithms can improve their marking by learning from the student peers' marking.

    3. Sean Lamb

      Science Denier

      In reply to Craig Savage

      Probably not, because AI is only judging sentence structure and language complexity. At present it can't differentiate between the answers to "Why is University education so expensive?"
      a. Universities carry an excessive load of administrative staff.
      b. Teaching assistants all belong to a global secret society and are outrageously highly remunerated.
      c. My goldfish is riding a bicycle around campus, but his balance is rather precarious and this is resulting in a number of tort actions against the university under OH and S legislation

    4. Jess Harding

      logged in via email @gmail.com

      In reply to Denis Tolkach

      I've taken several MOOCs which include peer-reviewed essays. This is an interesting learning process, both in producing your own paper and in peer-review of others'.
      It would be interesting to see a computer grade included, even if only for review of language and structure, alongside the peer reviews. I'm sure this would be a meaningful learning exercise for the students themselves, for faculty, for the MOOC hosting firm or university, and for the computer-grading software developers.

      Given recent progress, I suspect the e-grading software itself could probably learn a lot through the process - perhaps learning to improve itself faster than the humans in the process.

  8. Luke Freeman

    ABC

    I don't think we're there yet, but I don't doubt that we could get there eventually. Natural language processing on steroids really.

  9. lyndal breen

    logged in via Twitter

    I've had rotten marks and contradictory comments from real live human markers too. I'm currently enrolled in a MOOC which relies on students to grade each other and give feedback - I'm finding it quite fair.

    1. Craig Savage

      Professor of Theoretical Physics at Australian National University

      In reply to lyndal breen

      This important point sometimes gets lost in discussions about AIs versus people: we tend to have an unrealistic view about how good people really are.

      One of the lessons of AI has been that we're not as good as we think we are at chess or driving cars; perhaps marking and giving feedback are next?

  10. Tim Kottek

    logged in via Facebook

    Some time back, a daughter enrolled in a subject identical to one my partner had completed a decade earlier. This 101 subject had the same essay topics!
    That led to my fantasy business "Rent an Essay", where after a while a student would be bounced for plagiarism when the essay was purposefully directed at a staff member who had marked it previously.

    The student then, in defence, presents the range of assessments that the essay had received from the faculty over a span of time: do you want to bounce me? That is your right, and I will publish my defence! What will your academic rigour look like?
