The London Olympics were a festival for statisticians. As an example, let’s look at the men’s 100 metres. Usain Bolt won this race with a time of 9.63 seconds (a new Olympic record). But all the entrants had very similar times apart from Asafa Powell, who pulled up with a groin injury and completed the race in 11.99 seconds. Leaving him aside, the difference between Usain Bolt and the last-but-one runner was only 0.35 seconds. This tiny sliver of time amounts to only 3.6% of Bolt’s winning time. So a statistical summary of the race (using nonparametric statistics because of the skewed distribution) would say that the median race time was 9.84 seconds, with an interquartile range of 9.76-9.97 seconds. Or, in everyday language, they all (but one) ran very fast and there was little variation in the times they took.
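As a rough illustration, here is how that summary might be computed (a minimal sketch: the finishing times are taken from the published results of the final, and the exact quartile endpoints depend on which quantile convention your software uses).

```python
import statistics

# Finishing times (seconds) of the 2012 men's 100 m final, all eight runners,
# including Powell's 11.99, which is what skews the distribution.
times = [9.63, 9.75, 9.79, 9.80, 9.88, 9.94, 9.98, 11.99]

median = statistics.median(times)             # 9.84
q1, _, q3 = statistics.quantiles(times, n=4)  # quartiles ('exclusive' method)

print(f"median = {median:.2f} s")
print(f"interquartile range = {q1:.2f}-{q3:.2f} s")
```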
Of course, this misses the point. The 100 metres final is a race, and what matters in a race is the ranking of the runners - who comes first (and, to a lesser degree, second and third). But just as it makes no sense to ignore ranking in races, it is equally pointless to analyse all areas of human activity as if they were races. This does not stop people doing it, and rankings are now common not just for things that are simple to measure (such as the time taken to run a race), but also for institutions and the complex range of activities they perform. This means that such rankings have to be derived from multiple ‘scores’ based on a range of unreliable data about something that can never be reliably measured.
As an example, let’s take university research and teaching. Recent years have seen a growing collection of international ‘league tables’ of universities. These use a wide range of data, including statistics about numbers of staff and students, numbers of publications in academic journals, income from research grants, and rankings by panels of academics. This data is then scored, weighted and combined to produce an overall score, which can then be ranked. Individual universities can then congratulate themselves that they have ascended from the 79th best university in the world to the 78th best, while those that have descended a place or two can fret, threaten their academic staff and sack their vice chancellor.
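To make the mechanics concrete, here is a hedged sketch of how such a composite score is typically assembled: normalised indicators multiplied by essentially arbitrary weights, summed, and ranked. The institutions, indicators and weights below are invented for illustration and do not reflect any real league table’s methodology.

```python
# Invented indicator scores (already normalised to 0-100) and invented weights.
indicators = {
    "Univ X": {"reputation": 72, "citations": 85, "staff_student": 60},
    "Univ Y": {"reputation": 80, "citations": 70, "staff_student": 65},
    "Univ Z": {"reputation": 65, "citations": 90, "staff_student": 75},
}
weights = {"reputation": 0.4, "citations": 0.4, "staff_student": 0.2}

def composite(scores):
    """Weighted sum of a university's indicator scores."""
    return sum(weights[name] * value for name, value in scores.items())

# Rank by composite score, highest first.
ranked = sorted(indicators, key=lambda u: composite(indicators[u]), reverse=True)
for place, univ in enumerate(ranked, start=1):
    print(place, univ, round(composite(indicators[univ]), 1))
```

Nudge the weights a little and the order of this toy table can change, which is part of what makes a single ranking so fragile.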
Yet higher education systems differ greatly between countries, and universities themselves are usually a diverse ragbag of big and small research groups, teaching teams, departments, and schools. This makes a single league table a dubious affair, even if it is based on reliable data. But that of course is not the case. The university in which I worked (by most standards a well-managed institution) struggled to find out what its academic staff were doing with their time, or the quality of their achievements. In other universities, the data on which international league tables are based may be little more than a work of mystery and imagination.
But the problem lies not so much in the dubious quality of the data as in the very act of ranking. Even when the results are derived from a single survey in one country, they are often analysed in a misleading way. As an example, look at the National Student Survey in the UK, which is taken very seriously by the UK higher education sector. In its most recent form, this comprises 22 statements about different aspects of the student’s university and course, each rated on a five-point Likert scale from ‘definitely agree’ (scored 5) to ‘definitely disagree’ (scored 1). A conventional way of comparing universities and courses would therefore be to take the mean score on each scale, and this is how the results are analysed in papers like the Guardian. So, by institution, the results for the overall satisfaction question vary from a maximum of 4.5 (the Open University) down to 3.5 (the University of the Arts, London).
This is all seen as being a bit too technical for prospective students, and so the Unistats website reports only the responses to a single statement in the Survey: “Overall, I am satisfied with the quality of the course”. It then adds together the number of students with scores of 5 (‘definitely agree’) and 4 (‘mostly agree’) to produce the percentage of ‘satisfied’ students. This is quite a common survey procedure (I admit to having done it myself), but it is flawed. Someone who ‘mostly agrees’ may still have important reservations about their course.
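For concreteness, here is a small sketch of the two summaries described above, applied to a handful of invented responses (not real NSS data): the mean score used in newspaper tables, and the Unistats-style ‘percentage satisfied’ that lumps scores of 4 and 5 together.

```python
# Invented responses to the 'overall satisfaction' item,
# scored 5 = definitely agree ... 1 = definitely disagree.
responses = [5, 4, 4, 5, 3, 4, 2, 5, 4, 4]

# Newspaper-style summary: the mean score on the five-point scale.
mean_score = sum(responses) / len(responses)

# Unistats-style summary: treat 4 ('mostly agree') and 5 ('definitely agree')
# as 'satisfied' and report the percentage.
pct_satisfied = 100 * sum(r >= 4 for r in responses) / len(responses)

print(f"mean score: {mean_score:.1f} out of 5")            # 4.0
print(f"satisfied (scored 4 or 5): {pct_satisfied:.0f}%")  # 80%
```

Note that the 80% figure treats a ‘mostly agree’ with reservations exactly the same as an unqualified ‘definitely agree’.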
If you actually analyse satisfaction scores, you find that the great majority of universities fall within a narrow range, with some outliers. As an example, ‘satisfaction’ among students taking degrees in medicine at English universities ranges from 99% at Oxford to 69% at Manchester. The median satisfaction is 89%, with the interquartile range between 84% and 94%. So half of all medical schools fall within only ten percentage points around the middle of the range. This makes ranking pointless, because a small change in percentage satisfaction from one year to the next could send an individual medical school several places up or down the rankings, yet would amount to little more than the usual fluctuations common to surveys of this kind.
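A small sketch makes this fragility concrete. The satisfaction figures below are invented, chosen only to sit in a narrow band like the one described; they are not the real medical-school results.

```python
# Invented 'percentage satisfied' figures for seven hypothetical schools,
# bunched closely together as in the real data.
satisfaction = {
    "School A": 91, "School B": 90, "School C": 88, "School D": 88,
    "School E": 88, "School F": 87, "School G": 86,
}

def rank(scores, school):
    """1-based rank of `school`, highest percentage first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(school) + 1

print(rank(satisfaction, "School F"))   # 6th

# A two-point year-on-year wobble, well within ordinary survey noise...
satisfaction["School F"] += 2
print(rank(satisfaction, "School F"))   # ...and it jumps to 3rd
```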
A far more useful step is to look at the outliers. What is so special about Oxford (99%) and Leeds (97%)? Alternatively, are there problems at Manchester (69%) and Liverpool (70%)? Before we get too excited about the latter two universities, note that they have levels of satisfaction that most politicians and people in the media can only dream of. However, to see if there are particular problems, we need to look in more depth at the full range of results. We could also see if there is anything distinctive in the way they teach. Actually, we do know that both universities have been very committed to problem-based learning (PBL). This is a way of teaching medicine that replaces conventional lecture-based teaching with a system in which small groups of students are set a series of written case descriptions. The students then work as a group to investigate the scientific basis for the presenting problem and the evidence for the most effective treatment.
Research on PBL in medicine is (in common with a lot of research in education) inconclusive. But medical students are very bright and highly motivated, and would probably triumph if their education amounted to little more than setting them a weekly question to answer and presenting them with a pile of textbooks to read. Come to think of it, this more or less describes how PBL operates.