If you’ve been listening to education policy chatter recently, you'll have heard rumblings of a comparative judgement revolution.

Most large-scale assessments in England – for example, GCSEs – evaluate student performances by giving the work a mark, which is then assigned to a grade. Marks are typically chosen using “absolute” judgement, in which the performance is judged against a mark scheme, grade descriptor, or similar. 

This has been the dominant approach to marking for decades but, behind the scenes, comparative judgement (CJ) has been quietly gathering steam. With CJ, pieces of work are evaluated against each other, rather than against a rubric. With enough judges making enough comparisons, you can place the performances in a rank order with a high degree of reliability. By including performances that exemplify work at the grade boundaries, you can also assign grades.
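The statistics behind this are worth a quick look. CJ rankings are usually estimated from the pairwise outcomes with a model such as Bradley–Terry, which assigns each script a "quality" score so that stronger scripts are predicted to win more comparisons. The sketch below is purely illustrative (the scripts and judgements are made up, and this is not any exam board's actual system), using the classic iterative fitting procedure:

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """Estimate script quality scores from pairwise judgements.

    comparisons: list of (winner, loser) pairs.
    Uses the classic iterative (Zermelo/MM) update: each script's
    strength is re-estimated from its wins and the strengths of the
    scripts it was compared against.
    """
    wins = defaultdict(int)                         # comparisons each script won
    meetings = defaultdict(lambda: defaultdict(int))  # how often each pair met
    for winner, loser in comparisons:
        wins[winner] += 1
        meetings[winner][loser] += 1
        meetings[loser][winner] += 1

    scripts = list(meetings)
    strength = {s: 1.0 for s in scripts}            # start everyone equal
    for _ in range(iters):
        updated = {}
        for s in scripts:
            denom = sum(n / (strength[s] + strength[o])
                        for o, n in meetings[s].items())
            updated[s] = wins[s] / denom if denom else strength[s]
        total = sum(updated.values())               # normalise so scale is stable
        strength = {s: v * len(scripts) / total for s, v in updated.items()}
    return strength

# Hypothetical judging data: A beats B twice, A beats C, B beats C twice.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C")]
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)  # best script first
```

With enough judgements per script, the fitted scores settle into a stable rank order, which is what makes the reliability claims for CJ possible in the first place.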

For those of us considering the future of GCSEs and A-levels, new methods of marking are always interesting. Advocates of CJ make some exciting arguments: they suggest that it can be used for a wide range of materials, that it is more reliable than conventional assessment, and that relative judgements are easier and more valid for human brains to make. So, could CJ replace traditional exam marking? 

Practically speaking, probably not. 

Let’s do some rough calculations using AQA GCSE English Language. In July 2019, over 500,000 students took this qualification, producing over 1 million exam scripts. We currently recruit around 4,500 examiners for this qualification and, given a standard allocation of 300 scripts and a marking rate of four scripts an hour, each examiner works around 15 hours a week during the five-week marking period.
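That baseline is easy to verify. A quick back-of-envelope check, using the figures above and assuming a five-week marking period:

```python
# Current marking workload per examiner (figures from the article;
# the five-week marking period is an assumption stated in the text).
scripts_per_examiner = 300          # standard allocation
scripts_per_hour = 4                # typical marking rate
weeks_in_marking_period = 5

total_hours = scripts_per_examiner / scripts_per_hour   # 75 hours in total
hours_per_week = total_hours / weeks_in_marking_period  # 15 hours a week
```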

However, a switch to marking with CJ would mean a huge increase in workload for current examiners, many of whom also have teaching commitments or mark for multiple qualifications. To achieve the reliability that makes CJ worthwhile, we would need around 20 judgements per script – 20 million judgements, or 10 million decisions between pairs. Judges can make around eight decisions an hour for this type of script, resulting in 1.25 million hours’ work over the five-week marking period. Without recruiting more examiners, each examiner would need to work a 55-hour week. 

Alternatively, to preserve a 15-hour week, we would need to increase the number of examiners to 17,000 for this qualification alone. This additional examiner capacity would come at a cost for exam boards and centres. 
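Both scenarios fall out of the same arithmetic. A sketch of the calculation, using the article's figures (1 million scripts, 20 judgements per script, eight decisions an hour, 4,500 examiners, a five-week marking period):

```python
# Back-of-envelope CJ workload, using the figures quoted in the article.
scripts = 1_000_000
judgements_per_script = 20
decisions_per_hour = 8
examiners = 4_500
weeks = 5

# Each pairwise decision supplies a judgement to two scripts at once.
decisions = scripts * judgements_per_script // 2    # 10 million decisions
total_hours = decisions / decisions_per_hour        # 1.25 million hours

# Scenario 1: keep 4,500 examiners -> hours per examiner per week (~55).
hours_per_examiner_week = total_hours / examiners / weeks

# Scenario 2: cap the week at 15 hours -> examiners needed (~17,000).
examiners_needed = total_hours / (15 * weeks)
```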

Without some radical changes, then, CJ is too time-consuming and expensive to use for marking. The real calculations would, of course, be more complicated. The assumptions we have made could be adjusted, and some costs could be offset – for example, it may not be necessary to write a mark scheme if CJ were adopted.

The move away from mark schemes is another factor to consider. Traditional assessments use some kind of rubric to set out the criteria that performances are evaluated against and how marks are mapped to those criteria. This approach has disadvantages. The rubric must be written by somebody, and that person will have a particular perspective, which will define “what good looks like” for everybody else. It can also be hard to capture in words the essence of a good performance. In CJ, this challenge is avoided, as performances are evaluated against each other holistically. And as many judges are involved, a range of perspectives can be captured.

However, rubrics record what the assessment is trying to do and how – even if imperfectly. They represent an attempt at transparency and allow test takers to hold us to account. It’s less clear how this can be done with CJ. As a student’s mark would depend on how their work compares to that of other students, rather than the absolute merit of the work, it would be very difficult to answer the question “on what basis has this student been awarded their mark?” This would present challenges for students wishing to appeal their mark, and they may need to show that the whole judgement exercise was flawed. Using CJ would eliminate the expense of running an appeals process, but at the cost of effectively removing students’ right to question the justice of their marks. 

CJ has its advantages, but from the perspective of practicality and expectations around accountability and transparency, it’s unlikely to replace conventional marking for national examinations. There are other ways that it could be useful, however. As well as being a handy research tool, CJ has potential in the classroom, with students judging each other’s work to stimulate reflective conversations about their own performance.

But whatever the future of CJ, we will be watching – and exploring – with interest.