Journal article
ChatGPT performance on multiple choice question examinations in higher education. A pragmatic scoping review
Assessment and Evaluation in Higher Education, Volume 49, Issue 6, Pages 781-798
Swansea University Authors: Phil Newton (ORCID: 0000-0002-5272-7979), Maria Xiromeriti
PDF | Version of Record
© 2024 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License
DOI (Published version): 10.1080/02602938.2023.2299059
Abstract
Media coverage suggests that ChatGPT can pass examinations based on multiple choice questions (MCQs), including those used to qualify doctors, lawyers, scientists etc. This poses a potential risk to the integrity of those examinations. We reviewed current research evidence regarding the performance of ChatGPT on MCQ-based examinations in higher education, along with recommendations for how educators might address challenges and benefits arising from these data. 53 studies were included, covering 114 question sets, totalling 49014 MCQs. Free versions of ChatGPT based upon GPT-3/3.5 performed better than random guessing but failed most examinations, performing significantly worse than the average human student. GPT-4 passed most examinations with a performance that was on a par with human subjects. These findings indicate that all summative MCQ-based assessments should be conducted under secure conditions with restricted access to ChatGPT and similar tools, particularly those examinations which assess foundational knowledge.
Published in: Assessment and Evaluation in Higher Education
ISSN: 0260-2938 (print), 1469-297X (online)
Publisher: Informa UK Limited
Published: 17 January 2024
Keywords: Artificial intelligence; academic integrity; cheating; evidence-based education; MCQs; pragmatism
URI: https://cronfa.swan.ac.uk/Record/cronfa65458