If you’ve been listening to education policy chatter recently, you'll have heard rumblings of a comparative judgement revolution.

Most large-scale assessments in England – for example, GCSEs – evaluate student performances by giving the work a mark, which is then assigned to a grade. Marks are typically chosen using “absolute” judgement, in which the performance is judged against a mark scheme, grade descriptor, or similar. 

This has been the dominant approach to marking for decades but, behind the scenes, comparative judgement (CJ) has been quietly gathering steam. With CJ, pieces of work are evaluated against each other, rather than against a rubric. With enough judges making enough comparisons, you can place the performances in a rank order with a high degree of reliability. By including performances that exemplify work at the grade boundaries, you can also assign grades.
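The statistics behind this are worth a quick look. CJ rankings are usually estimated from the pairwise outcomes with a model such as Bradley–Terry, which assigns each script a "quality" score so that stronger scripts are predicted to win more comparisons. The sketch below is purely illustrative (the scripts and judgements are made up, and this is not any exam board's actual system), using the classic iterative fitting procedure:

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """Estimate script quality scores from pairwise judgements.

    comparisons: list of (winner, loser) pairs.
    Uses the classic iterative (Zermelo/MM) update: each script's
    strength is re-estimated from its wins and the strengths of the
    scripts it was compared against.
    """
    wins = defaultdict(int)                         # comparisons each script won
    meetings = defaultdict(lambda: defaultdict(int))  # how often each pair met
    for winner, loser in comparisons:
        wins[winner] += 1
        meetings[winner][loser] += 1
        meetings[loser][winner] += 1

    scripts = list(meetings)
    strength = {s: 1.0 for s in scripts}            # start everyone equal
    for _ in range(iters):
        updated = {}
        for s in scripts:
            denom = sum(n / (strength[s] + strength[o])
                        for o, n in meetings[s].items())
            updated[s] = wins[s] / denom if denom else strength[s]
        total = sum(updated.values())               # normalise so scale is stable
        strength = {s: v * len(scripts) / total for s, v in updated.items()}
    return strength

# Hypothetical judging data: A beats B twice, A beats C, B beats C twice.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C")]
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)  # best script first
```

With enough judgements per script, the fitted scores settle into a stable rank order, which is what makes the reliability claims for CJ possible in the first place.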

For those of us considering the future of GCSEs and A-levels, new methods of marking are always interesting. Advocates of CJ make some exciting arguments: they suggest that it can be used for a wide range of materials, that it is more reliable than conventional assessment, and that relative judgements are easier and more valid for human brains to make. So, could CJ replace traditional exam marking? 

Practically speaking, probably not. 

Let’s do some rough calculations using AQA GCSE English Language. In July 2019, over 500,000 students took this qualification, producing over 1 million exam scripts. We currently recruit around 4,500 examiners for this qualification and, given a standard allocation of 300 scripts and a marking rate of four scripts an hour, each examiner works around 15 hours a week during the five-week marking period.
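That baseline is easy to verify. A quick back-of-envelope check, using the figures above and assuming a five-week marking period:

```python
# Current marking workload per examiner (figures from the article;
# the five-week marking period is an assumption stated in the text).
scripts_per_examiner = 300          # standard allocation
scripts_per_hour = 4                # typical marking rate
weeks_in_marking_period = 5

total_hours = scripts_per_examiner / scripts_per_hour   # 75 hours in total
hours_per_week = total_hours / weeks_in_marking_period  # 15 hours a week
```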

However, a switch to marking with CJ would mean a huge increase in workload for current examiners, many of whom also have teaching commitments or mark for multiple qualifications. To achieve the reliability that makes CJ worthwhile, we would need around 20 judgements per script – 20 million judgements, or 10 million decisions between pairs. Judges can make around eight decisions an hour for this type of script, resulting in 1.25 million hours’ work over the five-week marking period. Without recruiting more examiners, each examiner would need to work a 55-hour week. 

Alternatively, to preserve a 15-hour week, we would need to increase the number of examiners to 17,000 for this qualification alone. This additional examiner capacity would come at a cost for exam boards and centres. 
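Both scenarios fall out of the same arithmetic. A sketch of the calculation, using the article's figures (1 million scripts, 20 judgements per script, eight decisions an hour, 4,500 examiners, a five-week marking period):

```python
# Back-of-envelope CJ workload, using the figures quoted in the article.
scripts = 1_000_000
judgements_per_script = 20
decisions_per_hour = 8
examiners = 4_500
weeks = 5

# Each pairwise decision supplies a judgement to two scripts at once.
decisions = scripts * judgements_per_script // 2    # 10 million decisions
total_hours = decisions / decisions_per_hour        # 1.25 million hours

# Scenario 1: keep 4,500 examiners -> hours per examiner per week (~55).
hours_per_examiner_week = total_hours / examiners / weeks

# Scenario 2: cap the week at 15 hours -> examiners needed (~17,000).
examiners_needed = total_hours / (15 * weeks)
```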

Without some radical changes, then, CJ is too time-consuming and expensive to use for marking. The real calculations would, of course, be more complicated. The assumptions we have made could be adjusted, and some costs could be offset – for example, it may not be necessary to write a mark scheme if CJ were adopted.

The move away from mark schemes is another factor to consider. Traditional assessments use some kind of rubric to set out the criteria that performances are evaluated against and how marks are mapped to those criteria. This approach has disadvantages. The rubric must be written by somebody, and that person will have a particular perspective, which will define “what good looks like” for everybody else. It can also be hard to capture in words the essence of a good performance. In CJ, this challenge is avoided, as performances are evaluated against each other holistically. And as many judges are involved, a range of perspectives can be captured.

However, rubrics record what the assessment is trying to do and how – even if imperfectly. They represent an attempt at transparency and allow test takers to hold us to account. It’s less clear how this can be done with CJ. As a student’s mark would depend on how their work compares to that of other students, rather than the absolute merit of the work, it would be very difficult to answer the question “on what basis has this student been awarded their mark?” This would present challenges for students wishing to appeal their mark, and they may need to show that the whole judgement exercise was flawed. Using CJ would eliminate the expense of running an appeals process, but at the cost of effectively removing students’ right to question the justice of their marks. 

CJ has its advantages, but from the perspective of practicality and expectations around accountability and transparency, it’s unlikely to replace conventional marking for national examinations. There are other ways that it could be useful, however. As well as being a handy research tool, CJ has potential in the classroom, with students judging each other’s work to stimulate reflective conversations about their own performance.

But whatever the future of CJ, we will be watching – and exploring – with interest.