Saturday, 1 June 2013

The ancient of day's marking



During my career as an academic member of staff at the University of Birmingham, I probably marked thousands of essays, from both undergraduates and postgraduates. The most miserable experience was marking hundreds of medical student examination papers at a time. Like most markers, I became a sort of machine. I would devise a checklist of suitable responses to each question and count how many of them each student used. I could thereby mark each mini-essay in two or three minutes. Much more enjoyable were the essays written by undergraduate medical students for their elective project on learning disability and health, which I still teach and mark. Students choose their own topic within this field and always produce long essays that are stimulating and, in some cases, worthy of publication.

The problem with marking is not just the number of scripts to mark, but also the requirement to assign a numerical score to a long essay. Things were different when my colleague Dr Beryl Smith and I set up our postgraduate master's course in intellectual (‘learning’) disabilities in 1992. Students completed eight assignments (each answered by a long essay of between 1500 and 3000 words) and a dissertation of 15,000 words. Students were encouraged to write about their area of special clinical interest (such as challenging behaviour, epilepsy and so on). We decided that only four grades were required to mark each assignment to an acceptable standard of reliability. We gave a B if the student met the specified requirement for the assignment, and a C if they answered the question but did not argue their case well or failed to draw on sufficient evidence. We gave an A where the answer was of a high standard and would be publishable in a professional journal. Finally, the failing D grade was given where the student did not meet the requirements of the essay assignment. To make this scheme work, we needed to make sure the requirements of the assignment were stated clearly, and we included details of our marking scheme. This system was reliable because there were only three grade boundaries (A/B, B/C and C/D) to decide. We double-marked all assignments and usually came to rapid agreement on the grading for each essay.

All this, of course, would look like mollycoddling to the sort of academic who believes that the only aims of examinations are to catch students out and identify a would-be elite. Beryl and I took a different approach: we believed that the purpose of our course, and hence of the marking of assignments, was to help students learn to become more reflective and effective practitioners. This would, indirectly, be our contribution to improving the lives of people with intellectual disabilities. The purpose of the assignments was not just to decide whether each student’s work was of an acceptable standard, but also to help us measure their progress and see what areas required some individual attention. As important as the grade, therefore, were the detailed comments we wrote on each essay, identifying how the student could improve what they had written, areas of strength they could develop, and areas of weakness they could concentrate on improving.

Beryl eventually retired and the University introduced new regulations that stipulated that all marking should be numerical. Instead of our four grades with clear descriptions, marked to a high degree of reliability, we had to use percentages. What nobody could tell me was what they were a percentage of. If students are set a large number of questions to answer (for instance in a maths exam), then it is possible to calculate the percentage they answered correctly. This has meaning, of course, only if each question is deemed to be of equal difficulty and all questions can be marked as either ‘correct’ or ‘incorrect’. But to give a ‘percentage’ for an essay suggests that this is the degree to which it approximates to some perfect essay. No such perfect essay exists. Indeed, it seems rare in universities for any essay, however good, to be given more than 85 ‘percent’. At the lowest end of the scale, I have once or twice given a mark of 35 ‘percent’, and I think that almost all marks for essays fall somewhere between these two extremes.

Since it is not possible to define what marks are a percentage of, the marks are not actually percentages at all: they are ‘pseudo-percentages’. They are an example of the belief that scores and numbers are preferable to description, even when they are used to supposedly measure things that are inherently non-numerical. Academic psychologists are probably the most prone to this disorder, ‘measuring’ such diverse concepts as intelligence, affection, extroversion and so on with numerical scales calculated by summing answers to sets of questions or assigning weights to responses to various ingenious types of numeric scales. Sooner or later, people come to believe that because it is possible to assign a numerical score, there must be a thing corresponding to it. Thus many people believe there is an entity called ‘intelligence’ which you can either have a lot of or a little of. This is despite the common-sense observation that people often have a very uneven pattern of mental skills, being, for instance, brilliant at thinking through maths problems but incompetent at remembering times and dates.

The urge to assign scores to people also says something about the people assigning the scores. No competent clinical psychologist would believe that an individual patient could be summarised by a few numerical test scores. Instead, each patient is seen as unique, perhaps having some familiar categories of problems, but still assessed and treated as an individual. Clinical psychologists, and academics like Beryl and me, can afford to treat people as individuals because we are assessing so few of them. Once organisations become involved in processing large numbers of people, they see people as numbers. This has happened to many universities. There may be hundreds of students in a single year of an undergraduate course, knowing little of the academics who teach it and having few chances to exchange ideas with them. Indeed, some universities try hard to prevent any such exchange, placing their academic staff in research centres and laboratory blocks behind locked doors, inaccessible to mere undergraduate students. They have become people-processing institutions which, like business corporations, are judged not by how they improve the lives of ordinary citizens but by how much income they generate. Cash has thus become the supreme number which measures all. It is the Modern of Days, replacing the Ancient of Days in William Blake’s painting.
