Journal article
From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design
ACM Transactions on Interactive Intelligent Systems, Volume: 16, Issue: 1, Pages: 1 - 31
Swansea University Authors: Sean Walton, Ben Evans, Alma Rahat, Jakub Vincalek
PDF | Accepted Manuscript
Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).
DOI (Published version): 10.1145/3773292
Abstract
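As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human–AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human–AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two-dimensional representation of a car. Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP–Elites, and a random control. Results indicate that exposure to galleries generated by MAP–Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus solely on behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human–AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human–AI systems holistically and considering intelligent systems as a core part of the user experience, not simply a back-end tool.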
| Published in: | ACM Transactions on Interactive Intelligent Systems |
|---|---|
| ISSN: | 2160-6455 2160-6463 |
| Published: | Association for Computing Machinery (ACM), 2026 |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa70750 |

| first_indexed | 2025-10-22T07:33:29Z |
|---|---|
| last_indexed | 2026-02-20T04:21:18Z |
| id | cronfa70750 |
| recordtype | SURis |
| fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2026-02-18T15:41:40.7986942</datestamp><bib-version>v2</bib-version><id>70750</id><entry>2025-10-22</entry><title>From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design</title><swanseaauthors><author><sid>0ec10d5e3ed3720a2d578417a894cf49</sid><ORCID>0000-0002-6451-265X</ORCID><firstname>Sean</firstname><surname>Walton</surname><name>Sean Walton</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>3d273fecc8121fe6b53b8fe5281b9c97</sid><ORCID>0000-0003-3662-9583</ORCID><firstname>Ben</firstname><surname>Evans</surname><name>Ben Evans</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>6206f027aca1e3a5ff6b8cd224248bc2</sid><firstname>Alma</firstname><surname>Rahat</surname><name>Alma Rahat</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>b62486f17abbb685a2012b729dc70376</sid><firstname>Jakub</firstname><surname>Vincalek</surname><name>Jakub Vincalek</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-10-22</date><deptcode>MACS</deptcode><abstract>As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human--AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human--AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two dimensional representation of a car. 
Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP--Elites, and a random control. Results indicate that exposure to galleries generated by MAP--Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus on solely behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human--AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human--AI systems holistically and considering intelligent systems as a core part of the user experience---not simply a back end tool.</abstract><type>Journal Article</type><journal>ACM Transactions on Interactive Intelligent Systems</journal><volume>16</volume><journalNumber>1</journalNumber><paginationStart>1</paginationStart><paginationEnd>31</paginationEnd><publisher>Association for Computing Machinery (ACM)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>2160-6455</issnPrint><issnElectronic>2160-6463</issnElectronic><keywords>Mixed initiative, MAP–Elites, Human–AI Collaboration, Optimisation, Design, User Evaluation</keywords><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2026</publishedYear><publishedDate>2026-03-01</publishedDate><doi>10.1145/3773292</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>SU Library paid the OA fee (TA Institutional Deal)</apcterm><funders>This work was supported by EPSRC under Grant Engineering and Physical Sciences Research Council EPSRC: 
EP/S021892/1.</funders><projectreference/><lastEdited>2026-02-18T15:41:40.7986942</lastEdited><Created>2025-10-22T08:27:48.8561375</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Sean</firstname><surname>Walton</surname><orcid>0000-0002-6451-265X</orcid><order>1</order></author><author><firstname>Ben</firstname><surname>Evans</surname><orcid>0000-0003-3662-9583</orcid><order>2</order></author><author><firstname>Alma</firstname><surname>Rahat</surname><order>3</order></author><author><firstname>James</firstname><surname>Stovold</surname><orcid>0000-0002-0708-2630</orcid><order>4</order></author><author><firstname>Jakub</firstname><surname>Vincalek</surname><order>5</order></author></authors><documents><document><filename>70750__35434__499b1d12f3d346e78fcb0edc255ed70e.pdf</filename><originalFilename>59815814_File000001_1461343780.pdf</originalFilename><uploaded>2025-10-22T08:33:07.0724641</uploaded><type>Output</type><contentLength>10818635</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/deed.en</licence></document></documents><OutputDurs/></rfc1807> |
| title | From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design |
| author_id_str_mv | 0ec10d5e3ed3720a2d578417a894cf49 3d273fecc8121fe6b53b8fe5281b9c97 6206f027aca1e3a5ff6b8cd224248bc2 b62486f17abbb685a2012b729dc70376 |
| author_id_fullname_str_mv | 0ec10d5e3ed3720a2d578417a894cf49_***_Sean Walton 3d273fecc8121fe6b53b8fe5281b9c97_***_Ben Evans 6206f027aca1e3a5ff6b8cd224248bc2_***_Alma Rahat b62486f17abbb685a2012b729dc70376_***_Jakub Vincalek |
| author | Sean Walton, Ben Evans, Alma Rahat, Jakub Vincalek |
| author2 | Sean Walton, Ben Evans, Alma Rahat, James Stovold, Jakub Vincalek |
| format | Journal article |
| container_title | ACM Transactions on Interactive Intelligent Systems |
| container_volume | 16 |
| container_issue | 1 |
| container_start_page | 1 |
| publishDate | 2026 |
| institution | Swansea University |
| issn | 2160-6455 2160-6463 |
| doi_str_mv | 10.1145/3773292 |
| publisher | Association for Computing Machinery (ACM) |
| college_str | Faculty of Science and Engineering |
| hierarchytype | |
| hierarchy_top_id | facultyofscienceandengineering |
| hierarchy_top_title | Faculty of Science and Engineering |
| hierarchy_parent_id | facultyofscienceandengineering |
| hierarchy_parent_title | Faculty of Science and Engineering |
| department_str | School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science |
| document_store_str | 1 |
| active_str | 0 |
| published_date | 2026-03-01T06:45:29Z |

