
NAPLAN doesn’t stand up to international tests


How do NAPLAN tests compare?

A new parliamentary report on the National Assessment Program – Literacy and Numeracy (NAPLAN) finally takes a long, hard look at the calibre of these controversial tests.

As part of the committee process, the Australian Curriculum, Assessment and Reporting Authority (ACARA), the body responsible for NAPLAN, recommended a number of reforms including online delivery, linking NAPLAN to the National Curriculum, reducing the time gap between testing and results, and introducing flexible delivery of the tests.

But missing from this list is perhaps the most important change: improving the test itself. NAPLAN needs to reflect a higher standard, especially when we compare it internationally.

How do we compare?

One simple way to see how NAPLAN stacks up globally is to compare results from the NAPLAN assessment to another international literacy test. Comparisons between NAPLAN and another reading test for Year 4 students (known as PIRLS) show that NAPLAN may not have the depth – or the benchmarks – to assess Australian schoolchildren in a global context.

For starters, the proportion of students reaching the minimum standard differs. Results published by the Australian Council for Educational Research (ACER) showed that more than one-quarter of the Australian Year 4 students who participated in PIRLS failed to meet the minimum international benchmark.

In contrast, the most recent round of NAPLAN results showed only 9% of Year 5 students did not meet the minimum national standard, as the figure below shows. (The grey indicates students who are failing to meet the benchmark standard in literacy.)

How is the same cohort faring? NAPLAN data from 2012; PIRLS data from 2011 Author/ACER/ACARA

On the international PIRLS test, Australia’s average score was similar to those of Bulgaria, New Zealand, Slovenia, Austria, Lithuania and Poland, and significantly lower than the average scores of 21 other countries, including the United States, England, Canada, Hong Kong and Singapore.

Comparing PIRLS results across countries. ACER (see p.16 for a complete list of international results)

The same cohort of children took the two tests. So what is going on here?

There are three possible explanations: the NAPLAN standard is too low, PIRLS texts are more difficult, or PIRLS items are more challenging. All three, it seems, are true.

NAPLAN misses the international mark

Dr Sue Thomson from the Australian Council for Educational Research (ACER), the body that oversees PIRLS, said recently that “almost everybody agrees that the NAPLAN standards are too low”. Even in sections where the two assessments are somewhat similar – for example, making inferences about fiction texts – students who are meeting the NAPLAN minimum are falling short of the international benchmark.

A spokesperson for ACARA said that it will “take account of international standards” when aligning the national assessment program to the national curriculum. It hopes the curriculum – mid-way through its roll-out – will also lead to “improved results”.

But teachers believe the longer PIRLS text turns students off. One Year 4 teacher explained: “they’re not used to reading such long periods of text”. While the PIRLS reader has only two stories that are nearly 800 words long, NAPLAN has seven stories, all less than 200 words.

There are two types of passages: informational and fiction. Readability statistics show PIRLS informational texts are harder to read.

Interestingly, PIRLS fiction texts are slightly easier than the NAPLAN literacy texts. This, too, brings a sobering point: even with easier texts, fewer Australian students are meeting the international benchmark once the word count goes beyond five paragraphs.

Although the NAPLAN tests are about ten items longer, the additional questions are easier recall questions. PIRLS asks more higher-order questions about its fiction texts and has more open-ended items.

Data from publicly available PIRLS (2011) and NAPLAN (2012) reading tests

Getting the results to principals

At an education department National Principals' Conference in early March, only ten of the 120 principals in the audience knew about the PIRLS results. This information needs to reach principals, teachers, parents and students for meaningful progress to occur.

Currently, international reading assessments have longer texts, harder non-fiction, deeper questions and higher benchmarks. When the Senate committee returns to the inquiry after the election, let’s hope that the content, nature and standards of NAPLAN play a more central role in its recommendations.

If Australian students are to be held to a high international standard, NAPLAN needs to improve to become a world-class test.

Join the conversation

11 Comments

  1. Frank Baarda


    I haven't read the report, nor thought deeply about the article.
    I live in a place where the vernacular language spoken is Warlpiri.
    Aboriginal children in remote communities fail miserably at NAPLAN tests.
    Often in these discussions the furphy of immigrant children whose mother tongue isn't English is invoked. I was such a migrant child. When I first went to school in Australia I was the only Dutch born child in the class. I was immersed in English and it wasn't long before I was pegging level…

  2. Nick Connolly

    Test Developer

    Thank you for this thought-provoking article. However I'm not sure about your assumptions in the first graph comparing NAPLAN and PIRLS benchmarks. "Below Low" in PIRLS would be those students who fail to reach the basic standard in the "Low" benchmark - this is comparable in terms of description of skills with students below National Minimum Standards in NAPLAN. On that basis the relative percentages "below benchmark" are very similar in NAPLAN and PIRLS.

    1. Greg North

      Retired Engineer

      In reply to Nick Connolly

      You may have misinterpreted it, Nick; the way I read it, the below-low band and the low band together are the percentage that are not meeting the basic standard or benchmark level.

    2. Nick Connolly

      Test Developer

      In reply to Greg North

      I don't think I have misinterpreted, but rather I think the graph shows a conceptual error - a bit like saying Fahrenheit has lower standards than Celsius by trying to line up the zeroes on two thermometers.
      The PIRLS benchmarks are a set of standards. The NAPLAN bands are another set of standards - from which come the National Minimum Standards defined as: "Students who are below the National Minimum Standard have not achieved the learning outcomes expected for their year level. They are at risk…

    3. David Thompson

      Marketing Research

      In reply to Nick Connolly

      Nick, that's a good distinction between the two tests. PIRLS is much more 'achievement' loaded than NAPLAN. NAPLAN is much more concerned with getting it right in identifying kids who are falling behind and need particular assistance. NAPLAN was never designed to be an IQ test capable of sorting the top 5% from the top 1%.

  3. Greg North

    Retired Engineer

    There do seem to be a few things that stand out as to why there are variations.
    First off, if PIRLS has longer articles to be read and more evaluation focus, it is bound to be a much more stringent testing regime, there likely being a much greater commitment to be expected.
    That this is not occurring is illustrated by
    " But teachers believe the longer PIRLS text turns students off. One Year 4 teacher explained: “they’re not used to reading such long periods of text”. "

    So if teacher…

  4. Jim KABLE


    I am nonplussed as to why folk think that tests improve scores - especially when NAPLAN has become such big business - check out your local news-agency for racks of NAPLAN test books - or the offerings of after-school study centres - all narrowing educational offering in fact. Take away the tests altogether. At best they are a distorted-by-the-existence-of-the-test-itself snapshot of where a child may be (perhaps) in relation to all others - but then cannot be used diagnostically as claimed (by its…

    1. Kerry Hempenstall

      Retired at RMIT University

      In reply to Jim KABLE

      “almost everybody agrees that the NAPLAN standards are too low”.

      The standards are supposed to be a snapshot of typical achievement. But it is not clear how typical achievement is defined. On what basis is a score of, say, 16/40 questions correct on a NAPLAN test considered typical achievement? Against which external criterion? This lack of information makes interpretation of results difficult. Another way of reporting the same results is to provide the figures for students who are at or above…

    2. Peter Farrell

      teaching-principal at a small rural school

      In reply to Jim KABLE

      I have to agree with you, Jim, that the diagnostic claims for NAPLAN are difficult to justify with respect to what we learn about students. There are other tests, usually on-line that are more responsive to student ability. The adaptive on-line questions increase in difficulty as the student is more successful, or get easier in response to incorrect answers. I am thinking about Victoria's on-demand tests here. I find the reports generated by these types of tests a useful way to identify what the…

    3. Greg North

      Retired Engineer

      In reply to Peter Farrell

      Jim, Kerry and you Peter, all being in teaching, seem to have a pretty good handle on specifics, as I would expect.
      Aside from Jim's thoughts on NAPLAN being big business, I can recall many claims, when it was being introduced and since, that there would be a lot of teaching to the NAPLAN tests. Looking at your own experience with:
      " Last year our data (from various measures) showed many of my kids were reading very well indeed and it was a teaching challenge to ensure…

    4. Peter Farrell

      teaching-principal at a small rural school

      In reply to Greg North

      You make a couple of points:

      NAPLAN was not my only measure, there were others, like for example, the Scholastic Lexile Test (an adaptive test), which I left out of my discussion for the sake of clarity. Measuring the effect size in reading using the data from these other tests, also indicated that the 'long novel' approach, as developed by me, had not worked for my students. It wasn't just NAPLAN data which led me to this conclusion.

      The other point:

      What you say about how we have…
