NAPLAN has caused much controversy this year, as has become customary. With the test in its tenth year, the New South Wales government called for it to be scrapped and there were calls for a review after a report found no change in results in a decade. In June a review was finally ordered into the use of NAPLAN information.
The most recent controversy – the delay in releasing NAPLAN results – is in part about whether scores from paper and online tests can be statistically compared.
The data collected this year will be comparable, both between paper and online tests and with tests from previous years, because all results will be placed on a common scale.
What’s the issue?
This year’s preliminary release of NAPLAN data, which was due out on August 8, has been delayed. We don’t yet know when it will be released.
State education department heads questioned whether the paper and online tests were too different to be statistically comparable. The Victorian minister for education, James Merlino, criticised the Australian Curriculum, Assessment and Reporting Authority (ACARA) over its management of the online test.
What is comparability?
When considering comparability, it’s important to understand that the word has different meanings. In a measurement sense (the way it’s used with NAPLAN), it means we compare the achievement of the students on a “common mathematics scale”. This does not mean the two tests are “the same”: doing the test online is different from doing it with paper and pencil. But both forms provide evidence of what the students know and can do in numeracy and literacy.
This type of comparability happens regularly. For example, if you want to compare Australian dollars and Chinese yuan, you make them comparable by putting them onto a common scale (one AUD = five yuan). You can then compare them in terms of “an amount of money”, but they are not the same.
Similarly, when we construct an Australian Tertiary Admission Rank (ATAR), we convert scores in different subjects to make them comparable on a common scale, add the scores up, then calculate an ATAR. This makes it possible to compare them in terms of the general ability that characterises the combined Higher School Certificate (HSC) score. The subjects are not the same, but they are comparable on a common scale.
In the same way, scores can be compared when the NAPLAN tests have been done in paper-and-pencil format and online. Comparing the results across years when we move from paper-and-pencil NAPLAN tests to NAPLAN online is much the same.
In this case, ACARA has carried out significant research to examine how the way the test is administered affects the results. This has shown there is little, if any, major impact in terms of the purpose of NAPLAN.
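To make the idea of a common scale concrete, here is a minimal sketch of one simple technique, linear (mean and spread) equating, which re-expresses scores from one test form on the scale of another. It is an illustration only and is not ACARA’s method: the function name `linear_equate` and the scores are invented, and the sketch assumes the two groups of students are of similar ability. Real assessment programs generally rely on more sophisticated statistical models.

```python
from statistics import mean, pstdev

def linear_equate(scores, reference):
    """Re-express `scores` on the scale of `reference` by matching
    the mean and standard deviation of the two score sets."""
    m_s, sd_s = mean(scores), pstdev(scores)
    m_r, sd_r = mean(reference), pstdev(reference)
    slope = sd_r / sd_s  # how much one unit of `scores` is worth on the reference scale
    return [m_r + slope * (x - m_s) for x in scores]

# Hypothetical raw scores, invented purely for illustration.
paper_raw = [12, 15, 18, 21, 24]
online_raw = [30, 36, 42, 48, 54]

# Online results expressed on the paper test's scale.
online_on_paper_scale = linear_equate(online_raw, paper_raw)
print([round(x, 2) for x in online_on_paper_scale])
```

As with the currency example, the two sets of scores are not “the same”, but once they sit on one scale they can be compared directly.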
Storm in a teacup
NAPLAN isn’t the only test ever to move from paper-and-pencil to online. The Program for International Student Assessment (PISA) and numerous other high-stakes international assessments have recently moved online. The OECD website explicitly states:
Student performance is comparable between the computer-based and paper-based tests within PISA 2015 and also between PISA 2015 and previous paper-based cycles.
There is no doubt the controversy over NAPLAN comparability is a storm in a teacup. Students will all have attempted good-quality NAPLAN tests and done their best. The results will give them an indication of how they’re going on this occasion.
When the results do come out, my educated guess is that teachers will find their students have done pretty much as expected, based on all the other information teachers have about student achievement from their classroom-based assessments. NAPLAN provides one more piece of evidence, from a different perspective, that contributes to the overall picture of the student.
The real issue is misuse of data
The real issue underpinning the controversy is the misuse of NAPLAN data. It was never intended that NAPLAN data would be used for fine-grained comparison of students.
The MySchool website has contributed to the misuse of NAPLAN data. For example, the scores from the site are being used to make comparisons irrespective of the “error bands” that need to be taken into account when making comparisons. People are ascribing a level of precision to the results that was never intended when the tests were developed. The test was never designed to be high-stakes and the results should not be used as such.
When people challenge the “validity” of the NAPLAN test, they should be challenging the validity of the use of the results. NAPLAN has a high degree of validity, but we need to understand it better and use the results in a more judicious and defensible manner. The correct use of NAPLAN data is a major issue and it needs to be addressed as a matter of priority.
This article has been updated since publication to clarify the author’s relevant affiliations.