Journal article
Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England
PLOS Medicine, Volume: 22, Issue: 2, Start page: e1004507
Swansea University Author:
Ashley Akbari
PDF | Version of Record
© 2025 Razieh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License.
DOI (Published version): 10.1371/journal.pmed.1004507
Abstract
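Background: Electronic health records (EHRs) are increasingly used to investigate health inequalities across ethnic groups. While there are some studies showing that the recording of ethnicity in EHR is imperfect, there is no robust evidence on the accuracy between the ethnicity information recorded in various real-world sources and census data.

Methods and findings: We linked primary and secondary care NHS England data sources with Census 2021 data and compared individual-level agreement of ethnicity recording in General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), Hospital Episode Statistics (HES), Ethnic Category Information Asset (ECIA), and Talking Therapies for anxiety and depression (TT) with ethnicity reported in the census. Census ethnicity is self-reported and, therefore, regarded as the most reliable population-level source of ethnicity recording. We further assessed the impact of multiple approaches to assigning a person an ethnic category. The number of people that could be linked to census from ECIA, GDPPR, HES, and TT were 47.4m, 43.5m, 47.8m, and 6.3m, respectively. Across all 4 data sources, the White British category had the highest level of agreement with census (≥96%), followed by the Bangladeshi category (≥93%). Levels of agreement for Pakistani, Indian, and Chinese categories were ≥87%, ≥83%, and ≥80% across all sources. Agreement was lower for Mixed (≤75%) and Other (≤71%) categories across all data sources. The categories with the lowest agreement were Gypsy or Irish Traveller (≤6%), Other Black (≤19%), and Any Other Ethnic Group (≤25%) categories.

Conclusions: Certain ethnic categories across all data sources have high discordance with census ethnic categories. These differences may lead to biased estimates of differences in health outcomes between ethnic groups, a critical data point used when making health policy and planning decisions.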
| Published in: | PLOS Medicine |
|---|---|
| ISSN: | 1549-1277 1549-1676 |
| Published: | Public Library of Science (PLoS), 2025 |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa68993 |
| first_indexed |
2025-02-28T12:55:22Z |
|---|---|
| last_indexed |
2025-03-01T05:38:20Z |
| id |
cronfa68993 |
| recordtype |
SURis |
| fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2025-02-28T12:58:26.4184980</datestamp><bib-version>v2</bib-version><id>68993</id><entry>2025-02-28</entry><title>Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England</title><swanseaauthors><author><sid>aa1b025ec0243f708bb5eb0a93d6fb52</sid><ORCID>0000-0003-0814-0801</ORCID><firstname>Ashley</firstname><surname>Akbari</surname><name>Ashley Akbari</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-02-28</date><deptcode>MEDS</deptcode><abstract>Background: Electronic health records (EHRs) are increasingly used to investigate health inequalities across ethnic groups. While there are some studies showing that the recording of ethnicity in EHR is imperfect, there is no robust evidence on the accuracy between the ethnicity information recorded in various real-world sources and census data. Methods and findings: We linked primary and secondary care NHS England data sources with Census 2021 data and compared individual-level agreement of ethnicity recording in General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), Hospital Episode Statistics (HES), Ethnic Category Information Asset (ECIA), and Talking Therapies for anxiety and depression (TT) with ethnicity reported in the census. Census ethnicity is self-reported and, therefore, regarded as the most reliable population-level source of ethnicity recording. We further assessed the impact of multiple approaches to assigning a person an ethnic category. The number of people that could be linked to census from ECIA, GDPPR, HES, and TT were 47.4m, 43.5m, 47.8m, and 6.3m, respectively. Across all 4 data sources, the White British category had the highest level of agreement with census (≥96%), followed by the Bangladeshi category (≥93%). 
Levels of agreement for Pakistani, Indian, and Chinese categories were ≥87%, ≥83%, and ≥80% across all sources. Agreement was lower for Mixed (≤75%) and Other (≤71%) categories across all data sources. The categories with the lowest agreement were Gypsy or Irish Traveller (≤6%), Other Black (≤19%), and Any Other Ethnic Group (≤25%) categories. Conclusions: Certain ethnic categories across all data sources have high discordance with census ethnic categories. These differences may lead to biased estimates of differences in health outcomes between ethnic groups, a critical data point used when making health policy and planning decisions.</abstract><type>Journal Article</type><journal>PLOS Medicine</journal><volume>22</volume><journalNumber>2</journalNumber><paginationStart>e1004507</paginationStart><paginationEnd/><publisher>Public Library of Science (PLoS)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>1549-1277</issnPrint><issnElectronic>1549-1676</issnElectronic><keywords/><publishedDay>26</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-02-26</publishedDate><doi>10.1371/journal.pmed.1004507</doi><url/><notes/><college>COLLEGE NANME</college><department>Medical School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MEDS</DepartmentCode><institution>Swansea University</institution><apcterm>Another institution paid the OA fee</apcterm><funders>This work was commissioned (via non-competitive tender) by the Data for Science and Health team at the Wellcome Trust to the Office for National Statistics. CR, FZ, TY and KK are supported by the National Institute for Health Research (NIHR) Applied Research Collaboration East Midlands (ARC-EM) and the NIHR Leicester Biomedical Research Centre (BRC). 
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funders><projectreference/><lastEdited>2025-02-28T12:58:26.4184980</lastEdited><Created>2025-02-28T12:50:22.5460615</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>Cameron</firstname><surname>Razieh</surname><orcid>0000-0003-3597-2945</orcid><order>1</order></author><author><firstname>Bethan</firstname><surname>Powell</surname><orcid>0009-0001-3051-6142</orcid><order>2</order></author><author><firstname>Rosemary</firstname><surname>Drummond</surname><order>3</order></author><author><firstname>Isobel L.</firstname><surname>Ward</surname><order>4</order></author><author><firstname>Jasper</firstname><surname>Morgan</surname><order>5</order></author><author><firstname>Myer</firstname><surname>Glickman</surname><orcid>0000-0001-6067-3811</orcid><order>6</order></author><author><firstname>Chris</firstname><surname>White</surname><order>7</order></author><author><firstname>Francesco</firstname><surname>Zaccardi</surname><order>8</order></author><author><firstname>Jonathan</firstname><surname>Hope</surname><order>9</order></author><author><firstname>Veena</firstname><surname>Raleigh</surname><order>10</order></author><author><firstname>Ashley</firstname><surname>Akbari</surname><orcid>0000-0003-0814-0801</orcid><order>11</order></author><author><firstname>Nazrul</firstname><surname>Islam</surname><orcid>0000-0003-3982-4325</orcid><order>12</order></author><author><firstname>Thomas</firstname><surname>Yates</surname><order>13</order></author><author><firstname>Lisa</firstname><surname>Murphy</surname><order>14</order></author><author><firstname>Bilal 
A.</firstname><surname>Mateen</surname><order>15</order></author><author><firstname>Kamlesh</firstname><surname>Khunti</surname><order>16</order></author><author><firstname>Vahe</firstname><surname>Nafilyan</surname><order>17</order></author></authors><documents><document><filename>68993__33711__f22f9b7ca6254bb1aa2f1caed152f6a5.pdf</filename><originalFilename>pmed.1004507.pdf</originalFilename><uploaded>2025-02-28T12:50:22.5459811</uploaded><type>Output</type><contentLength>1360941</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>© 2025 Razieh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>http://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807> |
| title |
Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England |
| author_id_str_mv |
aa1b025ec0243f708bb5eb0a93d6fb52 |
| author_id_fullname_str_mv |
aa1b025ec0243f708bb5eb0a93d6fb52_***_Ashley Akbari |
| author |
Ashley Akbari |
| author2 |
Cameron Razieh Bethan Powell Rosemary Drummond Isobel L. Ward Jasper Morgan Myer Glickman Chris White Francesco Zaccardi Jonathan Hope Veena Raleigh Ashley Akbari Nazrul Islam Thomas Yates Lisa Murphy Bilal A. Mateen Kamlesh Khunti Vahe Nafilyan |
| format |
Journal article |
| container_title |
PLOS Medicine |
| container_volume |
22 |
| container_issue |
2 |
| container_start_page |
e1004507 |
| publishDate |
2025 |
| institution |
Swansea University |
| issn |
1549-1277 1549-1676 |
| doi_str_mv |
10.1371/journal.pmed.1004507 |
| publisher |
Public Library of Science (PLoS) |
| college_str |
Faculty of Medicine, Health and Life Sciences |
| hierarchytype |
|
| hierarchy_top_id |
facultyofmedicinehealthandlifesciences |
| hierarchy_top_title |
Faculty of Medicine, Health and Life Sciences |
| hierarchy_parent_id |
facultyofmedicinehealthandlifesciences |
| hierarchy_parent_title |
Faculty of Medicine, Health and Life Sciences |
| department_str |
Swansea University Medical School - Health Data Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Health Data Science |
| document_store_str |
1 |
| active_str |
0 |
| published_date |
2025-02-26T12:21:23Z |
| _version_ |
1850852071378518016 |
| score |
11.08895 |

