Journal article
ChatGPT performance on multiple choice question examinations in higher education. A pragmatic scoping review
Assessment and Evaluation in Higher Education, Volume: 49, Issue: 6, Pages: 781-798
Swansea University Authors: Phil Newton, Maria Xiromeriti
PDF | Version of Record
© 2024 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License
DOI (Published version): 10.1080/02602938.2023.2299059
Abstract
Media coverage suggests that ChatGPT can pass examinations based on multiple choice questions (MCQs), including those used to qualify doctors, lawyers, scientists, etc. This poses a potential risk to the integrity of those examinations. We reviewed current research evidence regarding the performance of ChatGPT on MCQ-based examinations in higher education, along with recommendations for how educators might address challenges and benefits arising from these data. 53 studies were included, covering 114 question sets and totalling 49,014 MCQs. Free versions of ChatGPT based upon GPT-3/3.5 performed better than random guessing but failed most examinations, performing significantly worse than the average human student. GPT-4 passed most examinations with a performance that was on a par with human subjects. These findings indicate that all summative MCQ-based assessments should be conducted under secure conditions with restricted access to ChatGPT and similar tools, particularly those examinations which assess foundational knowledge.
Published in: Assessment and Evaluation in Higher Education
ISSN: 0260-2938, 1469-297X
Published: Informa UK Limited, 2024
URI: https://cronfa.swan.ac.uk/Record/cronfa65458
Keywords: Artificial intelligence; academic integrity; cheating; evidence-based education; MCQs; pragmatism
College: Faculty of Medicine, Health and Life Sciences
Funders: Swansea University
Issue: 6
Start Page: 781
End Page: 798