
Journal article

Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England

Cameron Razieh, Bethan Powell, Rosemary Drummond, Isobel L. Ward, Jasper Morgan, Myer Glickman, Chris White, Francesco Zaccardi, Jonathan Hope, Veena Raleigh, Ashley Akbari, Nazrul Islam, Thomas Yates, Lisa Murphy, Bilal A. Mateen, Kamlesh Khunti, Vahe Nafilyan

PLOS Medicine, Volume: 22, Issue: 2, Start page: e1004507

Swansea University Author: Ashley Akbari

  • pmed.1004507.pdf

    PDF | Version of Record

    © 2025 Razieh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License.


Abstract

Background: Electronic health records (EHRs) are increasingly used to investigate health inequalities across ethnic groups. While there are some studies showing that the recording of ethnicity in EHR is imperfect, there is no robust evidence on the accuracy between the ethnicity information recorded...


Published in: PLOS Medicine
ISSN: 1549-1277 1549-1676
Published: Public Library of Science (PLoS) 2025

URI: https://cronfa.swan.ac.uk/Record/cronfa68993
first_indexed 2025-02-28T12:55:22Z
last_indexed 2025-03-01T05:38:20Z
id cronfa68993
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2025-02-28T12:58:26.4184980</datestamp><bib-version>v2</bib-version><id>68993</id><entry>2025-02-28</entry><title>Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England</title><swanseaauthors><author><sid>aa1b025ec0243f708bb5eb0a93d6fb52</sid><ORCID>0000-0003-0814-0801</ORCID><firstname>Ashley</firstname><surname>Akbari</surname><name>Ashley Akbari</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-02-28</date><deptcode>MEDS</deptcode><abstract>Background: Electronic health records (EHRs) are increasingly used to investigate health inequalities across ethnic groups. While there are some studies showing that the recording of ethnicity in EHR is imperfect, there is no robust evidence on the accuracy between the ethnicity information recorded in various real-world sources and census data. Methods and findings: We linked primary and secondary care NHS England data sources with Census 2021 data and compared individual-level agreement of ethnicity recording in General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), Hospital Episode Statistics (HES), Ethnic Category Information Asset (ECIA), and Talking Therapies for anxiety and depression (TT) with ethnicity reported in the census. Census ethnicity is self-reported and, therefore, regarded as the most reliable population-level source of ethnicity recording. We further assessed the impact of multiple approaches to assigning a person an ethnic category. The number of people that could be linked to census from ECIA, GDPPR, HES, and TT were 47.4m, 43.5m, 47.8m, and 6.3m, respectively. Across all 4 data sources, the White British category had the highest level of agreement with census (&#x2265;96%), followed by the Bangladeshi category (&#x2265;93%). 
Levels of agreement for Pakistani, Indian, and Chinese categories were &#x2265;87%, &#x2265;83%, and &#x2265;80% across all sources. Agreement was lower for Mixed (&#x2264;75%) and Other (&#x2264;71%) categories across all data sources. The categories with the lowest agreement were Gypsy or Irish Traveller (&#x2264;6%), Other Black (&#x2264;19%), and Any Other Ethnic Group (&#x2264;25%) categories. Conclusions: Certain ethnic categories across all data sources have high discordance with census ethnic categories. These differences may lead to biased estimates of differences in health outcomes between ethnic groups, a critical data point used when making health policy and planning decisions.</abstract><type>Journal Article</type><journal>PLOS Medicine</journal><volume>22</volume><journalNumber>2</journalNumber><paginationStart>e1004507</paginationStart><paginationEnd/><publisher>Public Library of Science (PLoS)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>1549-1277</issnPrint><issnElectronic>1549-1676</issnElectronic><keywords/><publishedDay>26</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-02-26</publishedDate><doi>10.1371/journal.pmed.1004507</doi><url/><notes/><college>COLLEGE NANME</college><department>Medical School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MEDS</DepartmentCode><institution>Swansea University</institution><apcterm>Another institution paid the OA fee</apcterm><funders>This work was commissioned (via non-competitive tender) by the Data for Science and Health team at the Wellcome Trust to the Office for National Statistics. CR, FZ, TY and KK are supported by the National Institute for Health Research (NIHR) Applied Research Collaboration East Midlands (ARC-EM) and the NIHR Leicester Biomedical Research Centre (BRC). 
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funders><projectreference/><lastEdited>2025-02-28T12:58:26.4184980</lastEdited><Created>2025-02-28T12:50:22.5460615</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>Cameron</firstname><surname>Razieh</surname><orcid>0000-0003-3597-2945</orcid><order>1</order></author><author><firstname>Bethan</firstname><surname>Powell</surname><orcid>0009-0001-3051-6142</orcid><order>2</order></author><author><firstname>Rosemary</firstname><surname>Drummond</surname><order>3</order></author><author><firstname>Isobel L.</firstname><surname>Ward</surname><order>4</order></author><author><firstname>Jasper</firstname><surname>Morgan</surname><order>5</order></author><author><firstname>Myer</firstname><surname>Glickman</surname><orcid>0000-0001-6067-3811</orcid><order>6</order></author><author><firstname>Chris</firstname><surname>White</surname><order>7</order></author><author><firstname>Francesco</firstname><surname>Zaccardi</surname><order>8</order></author><author><firstname>Jonathan</firstname><surname>Hope</surname><order>9</order></author><author><firstname>Veena</firstname><surname>Raleigh</surname><order>10</order></author><author><firstname>Ashley</firstname><surname>Akbari</surname><orcid>0000-0003-0814-0801</orcid><order>11</order></author><author><firstname>Nazrul</firstname><surname>Islam</surname><orcid>0000-0003-3982-4325</orcid><order>12</order></author><author><firstname>Thomas</firstname><surname>Yates</surname><order>13</order></author><author><firstname>Lisa</firstname><surname>Murphy</surname><order>14</order></author><author><firstname>Bilal 
A.</firstname><surname>Mateen</surname><order>15</order></author><author><firstname>Kamlesh</firstname><surname>Khunti</surname><order>16</order></author><author><firstname>Vahe</firstname><surname>Nafilyan</surname><order>17</order></author></authors><documents><document><filename>68993__33711__f22f9b7ca6254bb1aa2f1caed152f6a5.pdf</filename><originalFilename>pmed.1004507.pdf</originalFilename><uploaded>2025-02-28T12:50:22.5459811</uploaded><type>Output</type><contentLength>1360941</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>&#xA9; 2025 Razieh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>http://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England
author_id_str_mv aa1b025ec0243f708bb5eb0a93d6fb52
author_id_fullname_str_mv aa1b025ec0243f708bb5eb0a93d6fb52_***_Ashley Akbari
author Ashley Akbari
author2 Cameron Razieh
Bethan Powell
Rosemary Drummond
Isobel L. Ward
Jasper Morgan
Myer Glickman
Chris White
Francesco Zaccardi
Jonathan Hope
Veena Raleigh
Ashley Akbari
Nazrul Islam
Thomas Yates
Lisa Murphy
Bilal A. Mateen
Kamlesh Khunti
Vahe Nafilyan
format Journal article
container_title PLOS Medicine
container_volume 22
container_issue 2
container_start_page e1004507
publishDate 2025
institution Swansea University
issn 1549-1277
1549-1676
doi_str_mv 10.1371/journal.pmed.1004507
publisher Public Library of Science (PLoS)
college_str Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id facultyofmedicinehealthandlifesciences
hierarchy_top_title Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id facultyofmedicinehealthandlifesciences
hierarchy_parent_title Faculty of Medicine, Health and Life Sciences
department_str Swansea University Medical School - Health Data Science | Faculty of Medicine, Health and Life Sciences
document_store_str 1
active_str 0
description Background: Electronic health records (EHRs) are increasingly used to investigate health inequalities across ethnic groups. While there are some studies showing that the recording of ethnicity in EHR is imperfect, there is no robust evidence on the accuracy between the ethnicity information recorded in various real-world sources and census data. Methods and findings: We linked primary and secondary care NHS England data sources with Census 2021 data and compared individual-level agreement of ethnicity recording in General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), Hospital Episode Statistics (HES), Ethnic Category Information Asset (ECIA), and Talking Therapies for anxiety and depression (TT) with ethnicity reported in the census. Census ethnicity is self-reported and, therefore, regarded as the most reliable population-level source of ethnicity recording. We further assessed the impact of multiple approaches to assigning a person an ethnic category. The number of people that could be linked to census from ECIA, GDPPR, HES, and TT were 47.4m, 43.5m, 47.8m, and 6.3m, respectively. Across all 4 data sources, the White British category had the highest level of agreement with census (≥96%), followed by the Bangladeshi category (≥93%). Levels of agreement for Pakistani, Indian, and Chinese categories were ≥87%, ≥83%, and ≥80% across all sources. Agreement was lower for Mixed (≤75%) and Other (≤71%) categories across all data sources. The categories with the lowest agreement were Gypsy or Irish Traveller (≤6%), Other Black (≤19%), and Any Other Ethnic Group (≤25%) categories. Conclusions: Certain ethnic categories across all data sources have high discordance with census ethnic categories. These differences may lead to biased estimates of differences in health outcomes between ethnic groups, a critical data point used when making health policy and planning decisions.
published_date 2025-02-26T12:21:23Z