Resolution models in scoring assessment
As I am currently teaching French courses at the university, I am, as one would expect, faced with the task of grading students' assessments. These can take various forms, such as written essays, project presentations in front of peers, written reflective accounts of learning outcomes, recordings of interviews, and so on.
I have the great pleasure of working with experienced, well-established teachers, which means that I learn a lot from them. When it comes to grading, however, I wondered how my scores would compare to theirs. Am I fair in marking students' work, or am I too generous or too strict? I began by drafting criteria describing how each of the students' tasks should be scored. These criteria were then discussed and refined with the coordinator and the lecturers involved in the different modules. An example of the criteria for one task is displayed below.

Following the scoring grids makes it possible to apply a consistent grading method across all students and tasks, so that variations due to tiredness or subjectivity can be minimized. Furthermore, recording the students' presentations makes a second rating possible, which is fairer to students. For cases where two raters disagree, Penny and Johnson (2011) classify the procedures used to resolve disagreements in writing assessment into five general models:
- The rater mean model consists of averaging the two raters' scores when these are discrepant but adjacent.
- The parity model brings in a third rater, who may, but need not, be more expert than the other two. In this model, all three ratings are considered equally valid, so the average of the three scores is computed to give the learner his or her final grade.
- The tertium quid model also brings in a third rater, whose rating is used to side with one of the two original ones. The final score is not the average of all three: it may be one of the original ratings, or the average of the third rating and the original rating closest to it.
- The expert model requires a third person, an adjudicator, who must be a more expert rater than the other two. This expert has the final say, and his or her rating replaces the two original ones.
- In the discussion model, the two raters review their discrepant ratings together and try to agree on the final grade.
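Four of these models boil down to simple arithmetic (the discussion model is a conversation rather than a formula), so they can be sketched in a few lines of Python. This is only a minimal illustration, not Penny and Johnson's own procedure: the function names are hypothetical, scores are assumed to be numbers on a common scale, and for the tertium quid model I assume the variant that averages the third rating with the closest original rating, breaking ties in favour of the first rater.

```python
def rater_mean(r1: float, r2: float) -> float:
    """Rater mean model: average the two original ratings."""
    return (r1 + r2) / 2


def parity(r1: float, r2: float, r3: float) -> float:
    """Parity model: all three ratings count equally, so average them."""
    return (r1 + r2 + r3) / 3


def tertium_quid(r1: float, r2: float, r3: float) -> float:
    """Tertium quid model (one assumed variant): average the third rating
    with whichever original rating is closest to it.
    Ties go to r1; this tie-break is an assumption, not from the source."""
    closest = r1 if abs(r1 - r3) <= abs(r2 - r3) else r2
    return (closest + r3) / 2


def expert(r1: float, r2: float, expert_rating: float) -> float:
    """Expert model: the adjudicator's rating replaces both originals,
    so r1 and r2 are deliberately ignored."""
    return expert_rating


if __name__ == "__main__":
    # Hypothetical scores on a 20-point scale: two discrepant raters
    # plus a third rating (or the expert's rating).
    first, second, third = 12, 15, 14
    print(rater_mean(first, second))                  # 13.5
    print(round(parity(first, second, third), 2))     # 13.67
    print(tertium_quid(first, second, third))         # 14.5
    print(expert(first, second, third))               # 14
```

With the same three scores, the final grade ranges from 13.5 to 14.5 depending on the model chosen, which shows why the choice of resolution procedure matters for students.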
So far I have tested the discussion model with one of my module coordinators, who can certainly be described as an experienced rater. We had very few discrepancies, and these were resolved without difficulty. The exchange of views allowed us to calibrate our ways of grading, which led to better alignment of scores across students in the same class. It seems, however, that the most accurate ways of resolving rater disagreement are the parity and expert models (Penny & Johnson, 2011, p. 233).
Reference
Penny, J. A., & Johnson, R. L. (2011). The accuracy of performance task scores after resolution of rater disagreement: A Monte Carlo study. Assessing Writing, 16(4), 221-236. https://doi.org/10.1016/j.asw.2011.06.001


