From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design

Walton, Sean; Evans, Ben; Rahat, Alma; Stovold, James; Vincalek, Jakub

doi:10.1145/3773292

Journal article

ACM Transactions on Interactive Intelligent Systems, Volume: 16, Issue: 1, Pages: 1 - 31

Swansea University Authors: Sean Walton, Ben Evans, Alma Rahat, Jakub Vincalek

PDF | Accepted Manuscript

Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).

DOI (Published version): 10.1145/3773292

Abstract

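As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human–AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human–AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two-dimensional representation of a car. Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP–Elites, and a random control. Results indicate that exposure to galleries generated by MAP–Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus solely on behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human–AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human–AI systems holistically and considering intelligent systems as a core part of the user experience, not simply a back-end tool.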

Published in:	ACM Transactions on Interactive Intelligent Systems
ISSN:	2160-6455 2160-6463
Published:	Association for Computing Machinery (ACM) 2026
URI:	https://cronfa.swan.ac.uk/Record/cronfa70750

first_indexed	2025-10-22T07:33:29Z
last_indexed	2026-02-20T04:21:18Z
id	cronfa70750
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2026-02-18T15:41:40.7986942</datestamp><bib-version>v2</bib-version><id>70750</id><entry>2025-10-22</entry><title>From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design</title><swanseaauthors><author><sid>0ec10d5e3ed3720a2d578417a894cf49</sid><ORCID>0000-0002-6451-265X</ORCID><firstname>Sean</firstname><surname>Walton</surname><name>Sean Walton</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>3d273fecc8121fe6b53b8fe5281b9c97</sid><ORCID>0000-0003-3662-9583</ORCID><firstname>Ben</firstname><surname>Evans</surname><name>Ben Evans</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>6206f027aca1e3a5ff6b8cd224248bc2</sid><ORCID/><firstname>Alma</firstname><surname>Rahat</surname><name>Alma Rahat</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>b62486f17abbb685a2012b729dc70376</sid><firstname>Jakub</firstname><surname>Vincalek</surname><name>Jakub Vincalek</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-10-22</date><deptcode>MACS</deptcode><abstract>As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human--AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human--AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two dimensional representation of a car. 
Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP--Elites, and a random control. Results indicate that exposure to galleries generated by MAP--Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus on solely behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human--AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human--AI systems holistically and considering intelligent systems as a core part of the user experience---not simply a back end tool.</abstract><type>Journal Article</type><journal>ACM Transactions on Interactive Intelligent Systems</journal><volume>16</volume><journalNumber>1</journalNumber><paginationStart>1</paginationStart><paginationEnd>31</paginationEnd><publisher>Association for Computing Machinery (ACM)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>2160-6455</issnPrint><issnElectronic>2160-6463</issnElectronic><keywords>Mixed initiative, MAP–Elites, Human–AI Collaboration, Optimisation, Design, User Evaluation</keywords><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2026</publishedYear><publishedDate>2026-03-01</publishedDate><doi>10.1145/3773292</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>SU Library paid the OA fee (TA Institutional Deal)</apcterm><funders>This work was supported by EPSRC under Grant Engineering and Physical Sciences Research Council EPSRC: 
EP/S021892/1.</funders><projectreference/><lastEdited>2026-02-18T15:41:40.7986942</lastEdited><Created>2025-10-22T08:27:48.8561375</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Sean</firstname><surname>Walton</surname><orcid>0000-0002-6451-265X</orcid><order>1</order></author><author><firstname>Ben</firstname><surname>Evans</surname><orcid>0000-0003-3662-9583</orcid><order>2</order></author><author><firstname>Alma</firstname><surname>Rahat</surname><orcid/><order>3</order></author><author><firstname>James</firstname><surname>Stovold</surname><orcid>0000-0002-0708-2630</orcid><order>4</order></author><author><firstname>Jakub</firstname><surname>Vincalek</surname><order>5</order></author></authors><documents><document><filename>70750__35434__499b1d12f3d346e78fcb0edc255ed70e.pdf</filename><originalFilename>59815814_File000001_1461343780.pdf</originalFilename><uploaded>2025-10-22T08:33:07.0724641</uploaded><type>Output</type><contentLength>10818635</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/deed.en</licence></document></documents><OutputDurs/></rfc1807>
title	From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design
author_id_str_mv	0ec10d5e3ed3720a2d578417a894cf49 3d273fecc8121fe6b53b8fe5281b9c97 6206f027aca1e3a5ff6b8cd224248bc2 b62486f17abbb685a2012b729dc70376
author_id_fullname_str_mv	0ec10d5e3ed3720a2d578417a894cf49_*_Sean Walton 3d273fecc8121fe6b53b8fe5281b9c97_*_Ben Evans 6206f027aca1e3a5ff6b8cd224248bc2_*_Alma Rahat b62486f17abbb685a2012b729dc70376_*_Jakub Vincalek
author	Sean Walton Ben Evans Alma Rahat Jakub Vincalek
author2	Sean Walton Ben Evans Alma Rahat James Stovold Jakub Vincalek
format	Journal article
container_title	ACM Transactions on Interactive Intelligent Systems
container_volume	16
container_issue	1
container_start_page	1
publishDate	2026
institution	Swansea University
issn	2160-6455 2160-6463
doi_str_mv	10.1145/3773292
publisher	Association for Computing Machinery (ACM)
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str	1
active_str	0
published_date	2026-03-01T05:27:37Z

Similar Items