Children doing exams inside a classroom, 1940 (Wikipedia)
Every summer results day gives rise to a raft of headlines about the number of students who have to appeal because of inaccurate marking - though strangely enough, no-one ever appeals because they think their paper has been marked too highly.
Despite the headlines, the likelihood of a marker getting it totally wrong is pretty low. This is because markers do not know what the grade boundaries will be for each paper - their job is simply to apply the mark scheme as accurately as they can. So even if they are generous on principle, the effect will simply be to raise the grade boundaries. Besides, examiner training emphasises the need to be fair to all candidates, rather than sympathetic to a few.
As a former marker myself, and someone who has researched the topic, I know how examiner training, and the system used to check marking accuracy throughout the marking period, are designed to remove as much variability as possible. There is, of course, still an element of subjectivity in all marking - and more in some subjects than in others.
When it comes to marking test papers, it stands to reason that the more straightforward the question, the more straightforward the marking - and questions requiring short answers are naturally the easiest to mark reliably. With these types of questions, markers can compare the candidate’s answer with the correct one and simply check for a match or mismatch.
But as answers get longer, it becomes more difficult to anticipate every possible response - meaning there is of course more scope for variability in markers’ judgements of the same script.
To be or not to be
English is notoriously difficult to mark - and “creative writing” pieces are especially subjective. Computer marking systems that are considered to be completely reliable are known for giving poor marks to what are widely considered great pieces of writing by the likes of Winston Churchill or George Orwell - because no matter what rules you lay down for making judgements, there are always exceptions.
Besides, what makes good writing is a subjective judgement. If it wasn’t, the same book would win all the literary prizes in any given year. What can be specified in writing, however, is spelling, punctuation and grammar - so grade criteria will include things like “using a variety of punctuation”, and “varying sentence lengths for effect”.
But of course, interest and imagination in a written piece are not captured in spelling and grammar. And how a reader reacts to the creative elements - such as word choice, ideas, or literary language - will depend on that individual and their life experience.
Essays are also difficult to mark - they are long, and markers must bear in mind the mark scheme, and model marked essays. This makes it a tough mental activity. Add in the time pressure and it becomes harder still, although exam boards are careful to remind examiners to mark at an “appropriate rate”. That is, slowly enough that they aren’t making mistakes, but fast enough to meet demanding deadlines.
Ofqual, the exams regulator in England, acknowledged in its review of marking reliability in 2014 that “there would always be an element of subjectivity in some marking”. The difficulty for appeals is distinguishing between actual errors and acceptable subjectivity.
But despite this element of subjectivity, it would be difficult for one examiner to get it wrong enough to change the whole outcome. This is not only because grade boundaries are unknown to the marker, but also because qualifications are often made up of more than one examined paper. At the very least, each paper will have different markers - and in many examinations now, papers are scanned and questions separated so that several markers will have input into one paper.
This should, in theory, increase the reliability overall because it reduces the chance of any paper being marked by an overgenerous, or overly mean, marker. It also means that examiners cannot give the benefit of the doubt where a candidate is on a boundary - because they don’t know how the candidate has done on the rest of the paper.
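To see why splitting a paper between markers should help, here is a rough back-of-the-envelope sketch in Python. It rests on a simplifying assumption that is mine, not the exam boards’: each marker has a fixed personal tendency to be generous or harsh, which shifts every question they mark by the same amount, and different markers’ tendencies are independent. The function name and parameters are purely illustrative.

```python
import math


def total_offset_spread(num_questions, per_question_spread, split_across_markers):
    """Typical size (standard deviation) of the total mark shift caused by
    marker generosity or harshness, under the toy model described above.

    per_question_spread: how much a randomly chosen marker typically shifts
    the mark on a single question (an assumed figure, not real data).
    """
    if split_across_markers:
        # Independent markers: their shifts partly cancel out, so the total
        # spread grows only with the square root of the number of questions.
        return math.sqrt(num_questions) * per_question_spread
    # One marker for the whole paper: the same shift is repeated on every
    # question, so the total spread grows in direct proportion.
    return num_questions * per_question_spread


one_marker = total_offset_spread(9, 1.0, split_across_markers=False)
many_markers = total_offset_spread(9, 1.0, split_across_markers=True)
print(one_marker, many_markers)  # 9.0 3.0
```

On these assumptions, a nine-question paper marked by nine different people is exposed to a third of the marker-driven variation it would face in the hands of a single examiner.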
Additionally, the mark scheme which markers use in the examining process only refers to marks, not grades. Grade boundaries are set separately every year after the marking is complete, using statistical information alongside the proportion of each grade awarded the previous year. And this is informed by the expert judgement of senior examiners.
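A toy illustration in Python may help show why setting boundaries after marking cancels out across-the-board generosity. This is a deliberately crude sketch, not the real awarding process: it assumes whole-number marks, invents ten candidates’ scores, and fixes the top grade at a target share of candidates, ignoring the statistical predictions and senior examiners’ judgement that feed into real boundaries. The function name is hypothetical.

```python
def boundary_for_share(marks, target_share):
    """Lowest mark such that no more than target_share of candidates
    score at or above it (toy model, whole-number marks assumed)."""
    allowed = int(len(marks) * target_share)
    if allowed == 0:
        return max(marks) + 1  # nobody may receive the top grade
    ranked = sorted(marks, reverse=True)
    boundary = ranked[allowed - 1]
    # If candidates tied on the boundary mark would push the share over
    # the target, raise the boundary by one mark to exclude them.
    if allowed < len(ranked) and ranked[allowed] == boundary:
        boundary += 1
    return boundary


# Invented marks for ten candidates, with a top-grade share of 20%.
marks = [38, 41, 45, 47, 52, 55, 58, 61, 66, 72]
generous = [m + 3 for m in marks]  # every script marked 3 marks too kindly

print(boundary_for_share(marks, 0.2))     # 66
print(boundary_for_share(generous, 0.2))  # 69
```

The boundary rises by exactly the three marks that generosity added, so the same two candidates receive the top grade either way.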
Off the mark
And yet, despite all of these efforts to avoid bias and remove as much subjectivity as possible, there is still a tension in the examination system in England between the “criteria-referenced mark schemes” - which imply that anyone can get an A, providing they work hard enough and meet the criteria - and the actual way in which grading functions.
Grading is not quite “norm-referenced” - where grades are allocated on a bell-curve, as is the usual practice in the US - but the number of As is limited by the proportion of students who are predicted to get that grade based on previous results. The number of As given in previous years is also taken into account - with a small margin for increased standards if necessary.
This leaves a mismatch between the expectations of students and parents, and the way the system actually works. But while it may sound harsh, we need qualifications to function at least partly as a sorting mechanism for entry to university or to workplaces. If everyone could achieve an A grade, qualifications simply wouldn’t fulfil this function - meaning these marks wouldn’t be worth the paper they’re written on.
Velda Elliott, Associate Professor of English and Literacy Education, University of Oxford
This article was originally published on The Conversation.