A Bayesian active learning approach to comparative judgement within education assessment

Gray, Andrew; Rahat, Alma; Crick, Tom; Lindsay, Stephen

doi:10.1016/j.caeai.2024.100245

Journal article

Andrew Gray, Alma Rahat, Tom Crick, Stephen Lindsay

Computers and Education: Artificial Intelligence, Volume: 6, Start page: 100245

Swansea University Authors: Andrew Gray, Alma Rahat, Tom Crick

PDF | Version of Record

This is an open access article under the CC BY 4.0 Creative Commons Attribution license.

DOI (Published version): 10.1016/j.caeai.2024.100245

Abstract

Assessment is a crucial part of education. Traditional marking is a source of inconsistencies and unconscious bias, placing a high cognitive load on the assessors. One approach to address these issues is comparative judgement (CJ). In CJ, the assessor is presented with a pair of items of work, and as...

Published in:	Computers and Education: Artificial Intelligence
ISSN:	2666-920X 2666-920X
Published:	Elsevier BV 2024
URI:	https://cronfa.swan.ac.uk/Record/cronfa66575

first_indexed	2024-06-03T12:20:34Z
last_indexed	2024-11-25T14:18:26Z
id	cronfa66575
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2024-07-04T12:09:57.4524603</datestamp><bib-version>v2</bib-version><id>66575</id><entry>2024-06-03</entry><title>A Bayesian active learning approach to comparative judgement within education assessment</title><swanseaauthors><author><sid>bc3b702690562033af78adc3e4c7ef9e</sid><firstname>Andrew</firstname><surname>Gray</surname><name>Andrew Gray</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>6206f027aca1e3a5ff6b8cd224248bc2</sid><ORCID>0000-0002-5023-1371</ORCID><firstname>Alma</firstname><surname>Rahat</surname><name>Alma Rahat</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>200c66ef0fc55391f736f6e926fb4b99</sid><ORCID>0000-0001-5196-9389</ORCID><firstname>Tom</firstname><surname>Crick</surname><name>Tom Crick</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2024-06-03</date><deptcode>MACS</deptcode><abstract>Assessment is a crucial part of education. Traditional marking is a source of inconsistencies and unconscious bias, placing a high cognitive load on the assessors. One approach to address these issues is comparative judgement (CJ). In CJ, the assessor is presented with a pair of items of work, and asked to select the better one. Following a series of comparisons, a rank for any item may be derived using a ranking model, for example, the Bradley-Terry model, based on the pairwise comparisons. While CJ is considered to be a reliable method for conducting marking, there are concerns surrounding its transparency, and the ideal number of pairwise comparisons to generate a reliable estimation of the rank order is not known. Additionally, there have been attempts to generate a method of selecting pairs that should be compared next in an informative manner, but some existing methods are known to have created their own bias within results inflating the reliability metric used within the process. As a consequence, a random selection approach is usually deployed. In this paper, we propose a novel Bayesian approach to CJ (which we call BCJ) for determining the ranks of a range of items under scrutiny alongside a new way to select the pairs to present to the marker(s) using active learning, addressing the key shortcomings of traditional CJ. Furthermore, we demonstrate how the entire approach may provide transparency by providing the user insights into how it is making its decisions and, at the same time, being more efficient. Results from our synthetic experiments confirm that the proposed BCJ combined with entropy-driven active learning pair-selection method is superior (i.e. always equal to or significantly better) than other alternatives, for example, the traditional CJ method with differing selection methods such as uniformly random, or the popular no repeating pairs where pairs are selected in a round-robin fashion. We also find that the more comparisons that are conducted, the more accurate BCJ becomes, which solves the issue the current method has of the model deteriorating if too many comparisons are performed. As our approach can generate the complete predicted rank distribution for an item, we also show how this can be utilised in probabilistically devising a predicted grade, guided by the choice of the assessor. Finally, we demonstrate our approach on a real dataset on assessing GCSE (UK school-level) essays, highlighting the advantages of BCJ over CJ.</abstract><type>Journal Article</type><journal>Computers and Education: Artificial Intelligence</journal><volume>6</volume><journalNumber/><paginationStart>100245</paginationStart><paginationEnd/><publisher>Elsevier BV</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>2666-920X</issnPrint><issnElectronic>2666-920X</issnElectronic><keywords>Comparative judgement, bayesian learning, active learning, machine learning, assessment, Bradley-Terry model (BTM)</keywords><publishedDay>1</publishedDay><publishedMonth>6</publishedMonth><publishedYear>2024</publishedYear><publishedDate>2024-06-01</publishedDate><doi>10.1016/j.caeai.2024.100245</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>External research funder(s) paid the OA fee (includes OA grants disbursed by the Library)</apcterm><funders>EP/S021892/1</funders><projectreference/><lastEdited>2024-07-04T12:09:57.4524603</lastEdited><Created>2024-06-03T13:12:27.2709693</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - 
Mathematics</level></path><authors><author><firstname>Andrew</firstname><surname>Gray</surname><order>1</order></author><author><firstname>Alma</firstname><surname>Rahat</surname><orcid>0000-0002-5023-1371</orcid><order>2</order></author><author><firstname>Tom</firstname><surname>Crick</surname><orcid>0000-0001-5196-9389</orcid><order>3</order></author><author><firstname>Stephen</firstname><surname>Lindsay</surname><order>4</order></author></authors><documents><document><filename>66575__30687__36f3c05d31884d85ae23541298662abb.pdf</filename><originalFilename>66575.vor.pdf</originalFilename><uploaded>2024-06-19T16:30:55.2092777</uploaded><type>Output</type><contentLength>1422518</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>This is an open access article under the CC BY 4.0 Creative Commons Attribution license.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>http://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title	A Bayesian active learning approach to comparative judgement within education assessment
author_id_str_mv	bc3b702690562033af78adc3e4c7ef9e 6206f027aca1e3a5ff6b8cd224248bc2 200c66ef0fc55391f736f6e926fb4b99
author_id_fullname_str_mv	bc3b702690562033af78adc3e4c7ef9e_*_Andrew Gray 6206f027aca1e3a5ff6b8cd224248bc2__Alma Rahat 200c66ef0fc55391f736f6e926fb4b99_**_Tom Crick
author	Andrew Gray Alma Rahat Tom Crick
author2	Andrew Gray Alma Rahat Tom Crick Stephen Lindsay
format	Journal article
container_title	Computers and Education: Artificial Intelligence
container_volume	6
container_start_page	100245
publishDate	2024
institution	Swansea University
issn	2666-920X 2666-920X
doi_str_mv	10.1016/j.caeai.2024.100245
publisher	Elsevier BV
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Mathematics and Computer Science - Mathematics{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Mathematics
document_store_str	1
active_str	0
description	Assessment is a crucial part of education. Traditional marking is a source of inconsistencies and unconscious bias, placing a high cognitive load on the assessors. One approach to address these issues is comparative judgement (CJ). In CJ, the assessor is presented with a pair of items of work, and asked to select the better one. Following a series of comparisons, a rank for any item may be derived using a ranking model, for example, the Bradley-Terry model, based on the pairwise comparisons. While CJ is considered to be a reliable method for conducting marking, there are concerns surrounding its transparency, and the ideal number of pairwise comparisons to generate a reliable estimation of the rank order is not known. Additionally, there have been attempts to generate a method of selecting pairs that should be compared next in an informative manner, but some existing methods are known to have created their own bias within results inflating the reliability metric used within the process. As a consequence, a random selection approach is usually deployed. In this paper, we propose a novel Bayesian approach to CJ (which we call BCJ) for determining the ranks of a range of items under scrutiny alongside a new way to select the pairs to present to the marker(s) using active learning, addressing the key shortcomings of traditional CJ. Furthermore, we demonstrate how the entire approach may provide transparency by providing the user insights into how it is making its decisions and, at the same time, being more efficient. Results from our synthetic experiments confirm that the proposed BCJ combined with entropy-driven active learning pair-selection method is superior (i.e. always equal to or significantly better) than other alternatives, for example, the traditional CJ method with differing selection methods such as uniformly random, or the popular no repeating pairs where pairs are selected in a round-robin fashion. We also find that the more comparisons that are conducted, the more accurate BCJ becomes, which solves the issue the current method has of the model deteriorating if too many comparisons are performed. As our approach can generate the complete predicted rank distribution for an item, we also show how this can be utilised in probabilistically devising a predicted grade, guided by the choice of the assessor. Finally, we demonstrate our approach on a real dataset on assessing GCSE (UK school-level) essays, highlighting the advantages of BCJ over CJ.
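The entropy-driven pair selection mentioned in the description can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each pair of items is given an independent Beta(1, 1) prior over "the first item is preferred", each judgement adds one to the matching Beta parameter, and the next pair shown to the assessor is the one whose Beta posterior has the largest differential entropy. The names PairwisePosterior, beta_entropy and digamma_int are hypothetical.

```python
import math
from itertools import combinations

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def beta_entropy(a: int, b: int) -> float:
    """Differential entropy of Beta(a, b) for positive integer a and b."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (log_beta
            - (a - 1) * digamma_int(a)
            - (b - 1) * digamma_int(b)
            + (a + b - 2) * digamma_int(a + b))


class PairwisePosterior:
    """Assumed model: one Beta posterior per unordered pair of items."""

    def __init__(self, n_items: int) -> None:
        # counts[(i, j)] = [a, b] with i < j; a counts wins for i, b wins for j.
        # Starting at [1, 1] corresponds to a uniform Beta(1, 1) prior.
        self.counts = {pair: [1, 1] for pair in combinations(range(n_items), 2)}

    def record(self, winner: int, loser: int) -> None:
        """Update the pair's posterior with one assessor judgement."""
        i, j = min(winner, loser), max(winner, loser)
        self.counts[(i, j)][0 if winner == i else 1] += 1

    def next_pair(self) -> tuple:
        """Return the pair whose posterior is currently most uncertain."""
        return max(self.counts, key=lambda p: beta_entropy(*self.counts[p]))
```

Under these assumptions, pairs that have never been compared have the highest entropy and are presented first; after that, pairs with conflicting judgements are preferred over pairs with consistent ones. Ties are broken by dictionary order, which is an arbitrary choice in this sketch.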
published_date	2024-06-01T05:43:13Z
