2024
Core Maths: Who takes it, what do they take it with, and does it improve performance in other subjects?
Gill, T. (2024). Core Maths: who takes it, what do they take it with, and does it improve performance in other subjects? Research Matters: A Cambridge University Press & Assessment publication, 38, 48-65. https://doi.org/10.17863/CAM.111628
Core Maths qualifications were introduced into the post-16 curriculum in England in 2014 to help students develop their quantitative and problem-solving skills. Taking the qualification should also give students confidence in understanding the mathematical content in other courses taken at the same time.
In this article, we explore whether Core Maths is fulfilling its aims. In particular:
• Does Core Maths provide students with a benefit (in terms of attainment) in other, quantitative, Key Stage 5 subjects (e.g., A Level Psychology, BTEC Engineering)?
We also investigate some aspects of the uptake of Core Maths:
• What are the background characteristics of Core Maths students (e.g., gender, prior attainment, ethnicity)?
• Which other qualifications (e.g., A Levels, BTECs, Cambridge Technicals) and subjects are students most likely to take alongside Core Maths?
The main finding was that students taking Core Maths had a slightly higher probability (than those not taking Core Maths) of achieving good grades in some subjects taken concurrently. Uptake of Core Maths remains relatively low, so there is certainly scope for greater numbers of students to take advantage of the potential benefits of studying the qualification.
Research Matters 38: Autumn 2024
- Foreword Tim Oates
- Editorial Victoria Crisp
- Troubleshooting in emergency education settings: What types of strategies did schools employ during the COVID-19 pandemic and what can they tell us about schools’ adaptability, values and crisis-readiness? Filio Constantinou
- How long should a high stakes test be? Tom Benton
- Core Maths: Who takes it, what do they take it with, and does it improve performance in other subjects? Tim Gill
- Does typing or handwriting exam responses make any difference? Evidence from the literature Santi Lestari
- Comparing music recordings using Pairwise Comparative Judgement: Exploring the judge experience Lucy Chambers, Emma Walland and Jo Ireland
- Research NewsLisa Bowett
2023
An analysis of the relationship between Secondary Checkpoint and IGCSE results
Gill, T. (2023). An analysis of the relationship between Secondary Checkpoint and IGCSE results. Research Matters: A Cambridge University Press & Assessment publication, 36, 59-74. https://doi.org/10.17863/CAM.101745
Secondary Checkpoint assessments are taken by students at the end of the Cambridge Lower Secondary programme (aged 14) in countries around the world. Many students continue with Cambridge after this and take IGCSE exams two years later.
Given that there is a high level of coherence between the curricula in the two stages, performance in Secondary Checkpoint should be a good indicator of performance at IGCSE.
In this article, I investigate whether there is evidence to support this contention, by calculating correlations between Checkpoint scores and IGCSE grades, across a range of subjects. I also look at whether students in schools offering the Cambridge Lower Secondary programme go on to perform better at IGCSE than students in schools not offering the programme.
Research Matters 36: Autumn 2023
- Foreword Tim Oates
- Editorial Tom Bramley
- The prevalence and relevance of Natural History assessments in the school curriculum, 1858–2000: a study of the Assessment Archives Gillian Cooke
- The impact of GCSE maths reform on progression to mathematics post-16 Carmen Vidal Rodeiro, Joanna Williamson
- An example of redeveloping checklists to support assessors who check draft exam papers for errors Sylvia Vitello, Victoria Crisp, Jo Ireland
- An analysis of the relationship between Secondary Checkpoint and IGCSE results Tim Gill
- Synchronous hybrid teaching: how easy is it for schools to implement? Filio Constantinou
- Research NewsLisa Bowett
2022
Research Matters 33: Spring 2022
- Foreword Tim Oates
- Editorial Tom Bramley
- A summary of OCR’s pilots of the use of Comparative Judgement in setting grade boundaries Tom Benton, Tim Gill, Sarah Hughes, Tony Leech
- How do judges in Comparative Judgement exercises make their judgements? Tony Leech, Lucy Chambers
- Judges' views on pairwise Comparative Judgement and Rank Ordering as alternatives to analytical essay marking Emma Walland
- The concurrent validity of Comparative Judgement outcomes compared with marks Tim Gill
- How are standard-maintaining activities based on Comparative Judgement affected by mismarking in the script evidence? Joanna Williamson
- Moderation of non-exam assessments: is Comparative Judgement a practical alternative? Carmen Vidal Rodeiro, Lucy Chambers
- Research News Lisa Bowett
The concurrent validity of comparative judgement outcomes compared with marks
Gill, T. (2022). The concurrent validity of comparative judgement outcomes compared with marks. Research Matters: A Cambridge University Press & Assessment publication, 33, 68–79.
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards.
Results from previous CJ studies have demonstrated that the method appears to be valid and reliable in many contexts. However, it is not entirely clear whether CJ works as well as it does because of the physical and judgemental processes involved (i.e., placing two scripts next to each other and deciding which is better based on an intuitive, holistic, and relative judgement), or because CJ exercises capture a lot of individual paired comparison decisions quickly. This article adds to the research on this question by re-analysing data from previous CJ studies and comparing the concurrent validity of the outcomes of individual CJ paired comparisons with the concurrent validity of outcomes based on the original marks given to scripts.
The results show that for 16 out of the 20 data sets analysed, mark-based decisions had higher concurrent validity than CJ-based decisions. Two possible reasons for this finding are: CJ decisions reward different skills to marks; or individual CJ decisions are of lower quality than individual decisions based on marks. Either way, the implication is that the CJ method works because many individual paired comparison decisions are captured quickly, rather than because of the physical and psychological processes involved in making holistic judgements.
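The paired comparison decisions described above are typically converted into a single quality scale using the Bradley-Terry model. As an illustrative sketch (not the article's actual analysis code), the classic MM (Zermelo) iteration can estimate a strength parameter for each script from a table of pairwise wins:

```python
def bradley_terry(wins, n_items, iters=200):
    """Estimate Bradley-Terry strength parameters from paired comparisons.

    wins[(i, j)] = number of times script i was judged better than script j.
    Uses the standard MM (Zermelo) iteration; strengths are normalised so
    they sum to n_items.
    """
    p = [1.0] * n_items
    for _ in range(iters):
        new = []
        for i in range(n_items):
            total_wins = sum(w for (a, _), w in wins.items() if a == i)
            denom = 0.0
            for (a, b), w in wins.items():
                if a == i:
                    denom += w / (p[i] + p[b])   # comparisons i won
                elif b == i:
                    denom += w / (p[i] + p[a])   # comparisons i lost
            new.append(total_wins / denom if denom > 0 else p[i])
        scale = sum(new)
        p = [v * n_items / scale for v in new]   # fix the arbitrary scale
    return p

# Three scripts: 0 usually beats 1 and 2; 1 usually beats 2.
wins = {(0, 1): 5, (1, 0): 1, (0, 2): 6, (1, 2): 4, (2, 1): 2}
strengths = bradley_terry(wins, 3)
```

The fitted strengths order the scripts, and including scripts from two sessions on the same scale is what allows the method to carry a standard across sessions.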
A summary of OCR’s pilots of the use of Comparative Judgement in setting grade boundaries
Benton, T., Gill, T., Hughes, S., & Leech, T. (2022). A summary of OCR’s pilots of the use of Comparative Judgement in setting grade boundaries. Research Matters: A Cambridge University Press & Assessment publication, 33, 10–30.
The rationale for the use of comparative judgement (CJ) to help set grade boundaries is to provide a way of using expert judgement to identify and uphold certain minimum standards of performance rather than relying purely on statistical approaches such as comparable outcomes. This article summarises the results of recent trials of using CJ for this purpose in terms of how much difference it might have made to the positions of grade boundaries, the reported precision of estimates and the amount of time that was required from expert judges.
The results show that estimated grade boundaries from a CJ approach tend to be fairly close to those that were set (using other forms of evidence) in practice. However, occasionally, CJ results displayed small but significant differences from existing boundary locations. This implies that adopting a CJ approach to awarding would have a noticeable impact on awarding decisions, but not such a large one as to be implausible. This article also demonstrates that implementing CJ using simplified methods (described by Benton, Cunningham et al., 2020) achieves the same precision as alternative CJ approaches, but in less time. On average, each CJ exercise required roughly 30 judge-hours across all judges.
2019
Research Matters 28: Autumn 2019
- Foreword Tim Oates, CBE
- Editorial Tom Bramley
- Which is better: one experienced marker or many inexperienced markers? Tom Benton
- "Learning progressions": A historical and theoretical discussion Tom Gallacher, Martin Johnson
- The impact of A Level subject choice and students' background characteristics on Higher Education participation Carmen Vidal Rodeiro
- Studying English and Mathematics at Level 2 post-16: issues and challenges Jo Ireland
- Methods used by teachers to predict final A Level grades for their students Tim Gill
- Research News David Beauchamp
Methods used by teachers to predict final A Level grades for their students
Gill, T. (2019). Methods used by teachers to predict final A Level grades for their students. Research Matters: A Cambridge Assessment publication, 28, 33-42.
This research used a survey to investigate how Chemistry, English Literature and Psychology teachers go about the process of estimating their students’ A level grades. There are a variety of different sources of information available to help teachers, including statistically based predictions (e.g., ALIS), performance in previous assessments (e.g., GCSEs) or in-class assessments, and their own judgements of students’ motivation, interest and resilience. Teachers were also asked to provide grade estimates for their current A level students and these were then compared with actual grades to provide an indication of accuracy. Follow-up interviews were undertaken to elicit more detail about the process of making estimates, and to ask teachers about specific students who either under- or over-performed compared with their estimate.
The results will be discussed in the context of recent reforms to A levels (e.g., de-coupling of AS levels), which are likely to have had an impact on how teachers make their estimates and how accurate they are.
2018
How have students and schools performed on the Progress 8 performance measure?
Gill, T. (2018). How have students and schools performed on the Progress 8 performance measure? Research Matters: A Cambridge Assessment publication, 26, 28-36.
The new league table measures (Attainment 8 and Progress 8) are based on performance in a student’s best eight subjects at GCSE (or equivalent). One criticism of the previous measures was that they penalised schools with a low-attaining intake. As Progress 8 is a value-added measure, it already accounts for the prior attainment of the student and should in theory no longer penalise these schools. The purpose of this research was to delve deeper into the relationship between Progress 8 scores and various student and school level factors. In particular, multilevel regression modelling was undertaken to infer which factors were most important in determining scores at student level. The results showed that various groups of students were predicted to achieve higher Progress 8 scores, including girls, less deprived students, students without SEN and students in schools with a higher-performing intake. At the school level, higher Progress 8 scores were found amongst schools with higher-performing intakes. This suggests that one of the main aims of the new measures (levelling the playing field) has not been completely achieved.
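The value-added logic behind Progress 8 can be illustrated in a few lines: each pupil's Attainment 8 score is compared with the average score of pupils nationally who had the same prior attainment. This is a deliberately simplified toy version (the real measure uses fine-grained Key Stage 2 prior-attainment groups and published national averages):

```python
from collections import defaultdict
from statistics import mean

def progress_scores(pupils):
    """Toy Progress 8-style value-added scores.

    pupils: list of (prior_band, attainment8) tuples. A pupil's score is
    their Attainment 8 score minus the average Attainment 8 of all pupils
    in the same prior-attainment band.
    """
    by_band = defaultdict(list)
    for band, a8 in pupils:
        by_band[band].append(a8)
    expected = {band: mean(scores) for band, scores in by_band.items()}
    return [a8 - expected[band] for band, a8 in pupils]

# Two prior-attainment bands; within each band, one pupil is above and
# one below the band average, so value-added scores sum to zero.
scores = progress_scores([(1, 40), (1, 50), (2, 60), (2, 70)])  # [-5.0, 5.0, -5.0, 5.0]
```

A school's Progress 8 score is then (roughly) the mean of its pupils' scores, which is why the measure is zero-sum nationally but can still correlate with intake characteristics, as the article finds.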
2017
Higher education choices of secondary school graduates with a Science, Technology, Engineering or Mathematics (STEM) background.
Gill, T., Vidal Rodeiro, C.L. and Zanini, N. (2017). Higher education choices of secondary school graduates with a Science, Technology, Engineering or Mathematics (STEM) background. Journal of Further and Higher Education, 42(7), 998-1014.
An analysis of the effect of taking the EPQ on performance in other Level 3 qualifications
Gill, T. (2017). An analysis of the effect of taking the EPQ on performance in other Level 3 qualifications. Research Matters: A Cambridge Assessment publication, 23, 27-34.
The Extended Project Qualification (EPQ) is a stand-alone qualification taken by sixth form students. It involves undertaking a substantial project, where the outcome can range from writing a dissertation or report to putting on a performance. It is possible that some of the skills learnt by students whilst undertaking their project (e.g., independent research, problem-solving) could help them in other qualifications taken at the same time. Two separate investigations were undertaken: firstly, the performance of individual students was analysed, using a multilevel regression model to compare EPQ and non-EPQ students. The results showed that there was a small but statistically significant effect, with those taking EPQ achieving better results on average in their A levels. The second investigation analysed performance at school level, using a regression to model the effect of increasing the percentage of students in a school taking EPQ. The results showed a significant and positive effect of increasing the percentage of students taking EPQ. However, the effect was very small.
2016
Assessing the equivalencies of the UCAS tariff for different qualifications
Gill, T. (2016). Assessing the equivalencies of the UCAS tariff for different qualifications. Research Matters: A Cambridge Assessment publication, 21, 16-23.
In the United Kingdom (UK) the Universities and Colleges Admissions Service (UCAS) provides the application process for most universities. The UCAS tariff points system is used by universities to help them select students for their courses. Each grade in eligible qualifications is allocated a points score, which can then be summed in order to provide an overall points score for each student. The allocation of points is such that, in theory, students with the same overall points score gained from different qualifications can be considered to be of equivalent ability or potential. The purpose of this article is to test whether this assumption works in practice, by calculating empirical equivalencies of the UCAS tariff for different qualifications.
2015
Students’ choices in Higher Education
Gill, T., Vidal Rodeiro, C.L. and Zanini, N. (2015). Paper presented at the British Educational Research Association (BERA) conference, Belfast, 15-17 September 2015.
Using generalised boosting models to evaluate the UCAS tariff
Gill, T. (2015). Using generalised boosting models to evaluate the UCAS tariff. Research Matters: A Cambridge Assessment publication, 20, 2-6.
The Universities and Colleges Admissions Service (UCAS) is a UK-based organisation providing the application process for almost all British universities. The UCAS tariff points system is used by universities to help select students for entry to their courses. Each grade in a qualification has a certain number of UCAS points allocated to it, which are then summed to provide an overall tariff points score for each student. The assumption made is that two students with the same UCAS tariff gained from different qualifications are of the same ability, or have the same potential to achieve at university. This article uses a statistical technique known as generalised boosting models (GBMs) to evaluate the use of the UCAS tariff as a predictor of degree outcome.
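As a rough sketch of the boosting idea (not the article's actual model, which used generalised boosting machinery over many predictors), gradient boosting for squared error repeatedly fits a weak learner, here a one-split decision stump, to the current residuals. The example below predicts a hypothetical degree outcome from a single tariff-points feature; all data values are invented:

```python
def fit_stump(x, resid):
    """Find the single split on a 1-D feature that minimises squared error."""
    best = None
    for split in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, resid) if xi <= split]
        right = [r for xi, r in zip(x, resid) if xi > split]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    return best[1:]  # (split, left_mean, right_mean)

def gbm_fit(x, y, n_trees=100, lr=0.1):
    """Gradient boosting with stumps: each tree fits the residuals so far."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        split, lm, rm = fit_stump(x, resid)
        stumps.append((split, lm, rm))
        pred = [p + lr * (lm if xi <= split else rm)
                for xi, p in zip(x, pred)]
    return base, lr, stumps

def gbm_predict(model, xi):
    base, lr, stumps = model
    return base + sum(lr * (lm if xi <= s else rm) for s, lm, rm in stumps)

# Invented data: tariff points vs. a numeric degree-outcome scale.
x = [80, 120, 200, 280, 360]
y = [1.0, 2.0, 2.0, 3.0, 4.0]
model = gbm_fit(x, y)
```

The appeal of GBMs in this context is that they capture non-linear relationships between tariff points and degree outcome without the analyst having to specify a functional form.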
The moderation of coursework and controlled assessment: A summary
Gill, T. (2015). The moderation of coursework and controlled assessment: A summary. Research Matters: A Cambridge Assessment publication, 19, 26-31.
To ensure consistency and accuracy of marking, awarding bodies carry out moderation of GCSE and A level internally assessed work (e.g., coursework or controlled assessment). Training and instructions are provided by the awarding body to the internal assessors in each centre, including training in task-setting, marking and internal standardisation. Awarding bodies are required to modify centres’ marks where necessary to bring judgements into line with the required standard.
Samples of (internally standardised) candidates’ work are taken. A moderator re-marks the sampled work and, if the difference between the centre’s and moderator’s marks exceeds a set tolerance, the marks are adjusted. Where adjustment is necessary, its magnitude is determined by a regression analysis based on the relationship between the marks given by the centre and those of the moderator in the sample.
This article summarises the processes undertaken by the Oxford, Cambridge and RSA (OCR) exam board to moderate and, if necessary, adjust the marks of centre-marked coursework and controlled assessments. Some brief data analysis is also presented to give an idea of the extent of moderation and how much difference it makes to candidates’ marks.
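The regression-based adjustment can be sketched as follows. This is purely illustrative: the tolerance rule, the use of simple least squares, and the rounding are invented for the example and do not reflect OCR's actual moderation rules:

```python
def moderation_adjustment(centre_sample, moderator_sample, tolerance=2.0):
    """Return a function mapping centre marks onto the moderator's standard.

    If the centre's sample marks agree with the moderator's to within an
    average of `tolerance` marks, the centre's marks stand (identity).
    Otherwise fit moderator_mark = a + b * centre_mark by least squares and
    regress every centre mark through that line.
    """
    n = len(centre_sample)
    mean_diff = sum(abs(c - m)
                    for c, m in zip(centre_sample, moderator_sample)) / n
    if mean_diff <= tolerance:
        return lambda mark: mark  # within tolerance: no adjustment
    mx = sum(centre_sample) / n
    my = sum(moderator_sample) / n
    b = (sum((c - mx) * (m - my)
             for c, m in zip(centre_sample, moderator_sample))
         / sum((c - mx) ** 2 for c in centre_sample))
    a = my - b * mx
    return lambda mark: round(a + b * mark)

# A centre that marks consistently 5 marks too generously: every mark
# in the centre is pulled down by 5 via the fitted line.
adjust = moderation_adjustment([10, 20, 30, 40], [5, 15, 25, 35])
```

Because the adjustment is a fitted line rather than a flat deduction, it can correct a centre that is lenient at the top of the mark range but accurate at the bottom, or vice versa.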
A level History: Which factors motivate teachers’ unit and topic choices?
Child, S., Darlington, E. and Gill, T. (2015). A level History: Which factors motivate teachers’ unit and topic choices? Research Matters: A Cambridge Assessment publication, 19, 2-6.
The flexibility inherent in A level History qualifications means that teachers have to negotiate competing factors that may influence topic, unit or qualification choices. The present article aimed to use questionnaire data derived from heads of History departments to analyse the motivations underpinning the unit and topic choices for an A level History course. A second aim was to analyse whether the Heads of Department from different school types had different influences underlying their choices.
The two most common motivating factors underlying teachers’ choices of units and topics were found to be teacher expertise and perceived student engagement. Fisher’s Exact analyses revealed that these motivations were deemed significantly more important by state school teachers, compared to independent school teachers, in guiding their topic selections (both p < .05). There were also statistically significant differences between school types in terms of how their Heads of Department rated the importance of the curriculum support offered via different resources.
These findings are discussed with reference to the recent qualifications reform in the UK, and the role of the teacher in determining topic choice and delivery of history.
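For reference, Fisher's exact test on a 2x2 table (e.g., school type by whether a motivation was rated important) sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table. A minimal stdlib sketch of the two-sided test:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Fixes the row and column totals and sums the probabilities of every
    table that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x):  # P(top-left cell = x) under fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs + 1e-12)

# A perfectly balanced table gives p = 1; a maximally skewed one a tiny p.
p_balanced = fisher_exact_2x2(2, 2, 2, 2)
p_skewed = fisher_exact_2x2(8, 0, 0, 8)
```

The exact test is preferred over chi-squared when cell counts are small, which is common in survey breakdowns by school type.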
2014
An investigation of the effect of early entry on overall GCSE performance, using a propensity score matching method
Gill, T. (2014). An investigation of the effect of early entry on overall GCSE performance, using a propensity score matching method. Research Matters: A Cambridge Assessment publication, 18, 28-36.
Previous research (Gill, 2013) has shown that certain groups of students performed worse than expected in some GCSE subjects when they were taken early (i.e., in Year 10).
However, one possible reason for taking a GCSE exam early is to ‘get it out of the way’ to enable increased focus on other subjects in Year 11. This study used a propensity score matching method to investigate whether students entering early for GCSEs performed better or worse across all their GCSEs (or equivalents) than those who did not enter for any GCSEs early. In terms of overall GCSE performance, there did not seem to be any advantage in early entry after accounting for differences in the characteristics of the students in the two groups. However, when looking at all qualifications (including non-GCSEs), early entry students did perform better than those not taking any GCSEs early, to a statistically significant degree. Furthermore, early entry students were more likely to pass the five A* to C threshold measure.
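The matching step of propensity score matching can be sketched as greedy 1:1 nearest-neighbour matching without replacement. This assumes propensity scores (the estimated probability of early entry given background characteristics) have already been produced, e.g., by a logistic regression; the scores and outcomes below are invented:

```python
def nn_match(treated, controls):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    treated, controls: lists of (id, propensity_score) pairs. Each treated
    unit is paired with the closest remaining control (no replacement).
    Returns a list of (treated_id, control_id) pairs.
    """
    pool = dict(controls)
    pairs = []
    for tid, ts in sorted(treated, key=lambda p: p[1]):
        if not pool:
            break
        cid = min(pool, key=lambda c: abs(pool[c] - ts))
        pairs.append((tid, cid))
        del pool[cid]
    return pairs

def att(pairs, outcomes):
    """Average effect on the treated: mean outcome gap over matched pairs."""
    return sum(outcomes[t] - outcomes[c] for t, c in pairs) / len(pairs)

# Hypothetical early-entry (treated) and non-early-entry (control) students.
treated = [("t1", 0.8), ("t2", 0.3)]
controls = [("c1", 0.75), ("c2", 0.35), ("c3", 0.5)]
pairs = nn_match(treated, controls)
effect = att(pairs, {"t1": 10, "t2": 8, "c1": 9, "c2": 8})
```

Matching on the propensity score aims to compare early-entry students only with non-early-entry students who were similarly likely to enter early, so that any remaining outcome gap is more plausibly attributable to early entry itself.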
Students’ views and experiences of A level module re-sits
Gill, T. and Suto, I. (2014). Students' views and experiences of A level module re-sits. Research Matters: A Cambridge Assessment publication, 18, 10-18.
In this study we obtained over 1,300 A level students’ views and experiences of re-sits in Psychology and Mathematics, prior to a reduction in re-sit opportunities taking effect nationally. The aim in collecting the data was to gain an understanding of what the likely effects of a system of reduced re-sits would be on students and their teachers. We focused on two popular but contrasting A level subjects: Psychology and Mathematics.
We found that one of the students’ most common reasons for re-sitting could be seen as a valid means of getting a higher grade. However, most students who responded to the questionnaire gave multiple reasons for re-sitting a module. In each subject, a majority thought that re-sits had both made them work harder and increased their knowledge of the subject. These views indicate that module examinations not only provide summative assessment but are also used for formative assessment purposes.
An analysis of the unit and topic choices made in an OCR A level History course
Child, S., Darlington, E. and Gill, T. (2014). An analysis of the unit and topic choices made in an OCR A level History course. Research Matters: A Cambridge Assessment publication, 18, 2-9.
This study aimed to explore how schools that offer A level History use the options available to them, in terms of unit and topic choices. Specifically, this study aimed to determine which units and topics were most commonly taught. It was intended that this data would help establish how optionality within A level History is used, and whether it meets the desired purpose of exposing students to a broad range of historical periods and topics. Data was collated using a survey of 90 heads of history departments, and from an analysis of topic choices within one A level History unit. Comparisons were made between different school types (state vs independent), and schools with different levels of performance. Approximately 60% of centres sampled taught either a combination of F961B and F964B or F962B and F963B; the two unit combinations which permit Modern History to be studied exclusively. In terms of topic choice, it was found that schools seek to teach in depth within a historical era, rather than breadth over different historical periods. The findings are discussed in relation to the ongoing reforms to A level History.
2012
Cambridge Assessment Statistics Reports: Recent highlights
Emery, J., Gill, T., Grayson, R. and Vidal Rodeiro, C. L. (2012). Cambridge Assessment Statistics Reports: Recent highlights. Research Matters: A Cambridge Assessment publication, 14, 45-50.
The Research Division publishes a number of Statistics Reports each year based on the latest national examinations data. These are statistical summaries of various aspects of the English examination system, covering topics such as subject provision and uptake, popular subject combinations, trends over time in the uptake of particular subjects and the examination attainment of different groups of candidates. The National Pupil Database (NPD) is the source of most of these reports. This is a very large longitudinal database, owned by the Department for Education, which tracks the examination attainment of all pupils within schools in England from their early years up to Key Stage 5 (A level or equivalent). Another database, the Pupil Level Annual School Census (PLASC), can be requested and matched to the NPD. This contains background information on candidates such as deprivation indicators, language, ethnicity and special educational needs. Other sources of data used to produce the Statistics Reports include the Inter-Awarding Body Statistics produced by the Joint Council for Qualifications (JCQ). This article highlights some of the most recent Statistics Reports, published between 2010 and 2011.
2011
Assessment instruments over time
Elliott, G., Curcin, M., Johnson, N., Bramley, T., Ireland, J., Gill, T. and Black, B. (2011). Assessment instruments over time. Research Matters: A Cambridge University Press & Assessment publication, A selection of articles, 2-4. First published in Research Matters, Issue 7, January 2009.
As Cambridge Assessment celebrated its 150th anniversary in 2008, members of the Evaluation and Psychometrics Team looked back at question papers over the years. Details of the question papers and examples of questions were used to illustrate the development of seven subjects: Mathematics, Physics, Geography, Art, French, Cookery and English Literature. Two clear themes emerged from the work across most subjects: an increasing emphasis on real-world contexts in more recent years, and an increasing choice of topic areas and question/component options available to candidates.
Does doing a critical thinking A level confer any advantage for candidates in their performance on other A levels?
Black, B. and Gill, T. (2011). Does doing a critical thinking A level confer any advantage for candidates in their performance on other A levels? Research Matters: A Cambridge Assessment publication, 11, 20-24.
Critical Thinking AS level was introduced in schools in 2001. Much research shows that the teaching of Critical Thinking (CT) improves critical thinking skills. However, there is less research on whether CT skills can be profitably transferred to other subject domains. The purpose of this research was to investigate whether taking an AS level in CT improves performance in other A levels. Using national examination data from 2005 and 2006, we compared the performance of CT students with those not taking CT. The results of both a basic comparison of mean A level performance and a regression analysis (which accounts for prior attainment) showed a significant and positive effect of taking CT, compared with not doing so. According to the regression results, the difference was equivalent to about one tenth of a grade per A level, although the difference was greater for those achieving a higher CT grade.
2010
Must examiners meet in order to standardise their marking? An experiment with new and experienced examiners of GCE AS Psychology
Raikes, N., Fidler, J. and Gill, T. (2010). Must examiners meet in order to standardise their marking? An experiment with new and experienced examiners of GCE AS Psychology. Research Matters: A Cambridge Assessment publication, 10, 21-27.
When high-stakes examinations are marked by a panel of examiners, the examiners must be standardised so that candidates are not advantaged or disadvantaged according to which examiner marks their work.
It is common practice for Awarding Bodies’ standardisation processes to include a “Standardisation” or “Co-ordination” meeting, where all examiners meet to be briefed by the Principal Examiner and to discuss the application of the mark scheme in relation to specific examples of candidates’ work. Research into the effectiveness of standardisation meetings has cast doubt on their usefulness, however, at least for experienced examiners.
In the present study we addressed the following research questions:
1. What is the effect on marking accuracy of including a face-to-face meeting as part of an examiner standardisation process?
2. How does the effect on marking accuracy of a face-to-face meeting vary with the type of question being marked (short-answer or essay) and the level of experience of the examiners?
3. To what extent do examiners carry forward standardisation on one set of questions to a different but very similar set of questions?
2009
How effective is fast and automated feedback to examiners in tackling the size of marking errors?
Sykes, E., Novakovic, N., Greatorex, J., Bell, J., Nadas, R. and Gill, T. (2009). How effective is fast and automated feedback to examiners in tackling the size of marking errors? Research Matters: A Cambridge Assessment publication, 8, 8-15.
Reliability is important in national assessment systems. Therefore, there is a good deal of research about examiners’ marking reliability. However, some questions remain unanswered due to the changing context of e-marking, particularly the opportunity for fast and automated feedback to examiners on their marking. Some of these questions are:
• will iterative feedback result in greater marking accuracy than only one feedback session?
• will encouraging examiners to be consistent (rather than more accurate) result in greater marking accuracy?
• will encouraging examiners to be more accurate (rather than more consistent) result in greater marking accuracy?
Thirty-three examiners were matched into four experimental groups based on the severity of their marking. All examiners marked the same 100 candidate responses, in the same short timescale. Group 1 received one session of feedback about their accuracy. Group 2 received three iterative sessions of feedback about the accuracy of their marking. Group 3 received one session of feedback about their consistency. Group 4 received three iterative sessions of feedback about the consistency of their marking. Absolute differences between examiners’ marking and a reference mark were analysed using a general linear model. The results of the present analysis pointed towards the answer to all the research questions being “no”. The results presented in this article are not intended to be used to evaluate current marking practices. Rather, the article is intended to contribute to answering the research questions, and developing an evidence base for the principles that should be used to design and improve marking practices.
Assessment instruments over time
Elliott, G., Curcin, M., Bramley, T., Ireland, J., Gill, T. and Black, B. (2009). Assessment instruments over time. Research Matters: A Cambridge Assessment publication, 7, 23-25.
As Cambridge Assessment celebrated its 150th anniversary in 2008, members of the Evaluation and Psychometrics Team looked back at question papers over the years. Details of the question papers and examples of questions were used to illustrate the development of seven subjects: Mathematics, Physics, Geography, Art, French, Cookery and English Literature. Two clear themes emerged from the work across most subjects: an increasing emphasis on real-world contexts in more recent years, and an increasing choice of topic areas and question/component options available to candidates.
2008
Assessment Instruments over Time
Elliott, G., Black, B., Ireland, J., Gill, T., Bramley, T., Johnson, N. and Curcin, M. (2008). International Association for Educational Assessment (IAEA) Conference, Cambridge.
Using simulated data to model the effect of inter-marker correlation on classification consistency
Gill, T. and Bramley, T. (2008). Using simulated data to model the effect of inter-marker correlation on classification consistency. Research Matters: A Cambridge Assessment publication, 5, 29-36.
The marking of exam papers is never going to be 100% reliable unless all exams consist entirely of multiple-choice or other completely objective questions. Different opinions on the quality of the work or different interpretations of the mark schemes create the potential for candidates to receive a different mark depending on which examiner marks their paper. Of more concern is the potential for candidates to receive a different grade from a different examiner. The purpose of this study was to use simulated data to estimate the extent to which examinees might get a different grade under: i) different levels of correlation between markers; and ii) different grade bandwidths.
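The simulation logic can be sketched as follows: both markers' marks share a common true-score component, the inter-marker correlation is controlled through the size of the independent error variance, and classification consistency is the proportion of simulated candidates placed in the same grade band by both markers. The mark distribution (mean 50, SD 15) is invented for the illustration:

```python
import random

def classification_consistency(n=5000, marker_corr=0.9, bandwidth=10, seed=1):
    """Simulate two markers and return the proportion awarded the same grade.

    Each mark = true score + independent marker error. With Var(true) = sd^2
    and error SD = sd * sqrt(1/marker_corr - 1), the correlation between the
    two markers' marks equals marker_corr. Grades are consecutive bands of
    `bandwidth` marks.
    """
    rng = random.Random(seed)
    sd = 15.0
    sd_err = sd * (1.0 / marker_corr - 1.0) ** 0.5
    same = 0
    for _ in range(n):
        true = rng.gauss(50, sd)
        m1 = true + rng.gauss(0, sd_err)
        m2 = true + rng.gauss(0, sd_err)
        if int(m1 // bandwidth) == int(m2 // bandwidth):
            same += 1
    return same / n

# Wider grade bands and higher inter-marker correlation both raise the
# chance that two markers put a candidate in the same grade.
narrow = classification_consistency(bandwidth=5)
wide = classification_consistency(bandwidth=20)
```

This captures the study's two levers directly: holding the marker correlation fixed while widening the bands, or holding the bands fixed while making markers agree more closely, and reading off the effect on grade-level agreement.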