Journal article

Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity

Marta Pineda-Moncusí, Freya Allery, Antonella Delmestri, Thomas Bolton, John Nolan, Johan H. Thygesen, Alex Handy, Amitava Banerjee, Spiros Denaxas, Christopher Tomlinson, Alastair K. Denniston, Cathie Sudlow, Ashley Akbari, Angela Wood, Gary S. Collins, Irene Petersen, Laura C. Coates, Kamlesh Khunti, Daniel Prieto-Alhambra, Sara Khalid, (on behalf of the CVD-COVID-UK/COVID-IMPACT Consortium)

Scientific Data, Volume: 11, Issue: 1

Swansea University Author: Ashley Akbari

  • 65674.pdf

    PDF | Version of Record

    This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

    Download (2.07MB)

Abstract

Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completen...

Full description

Published in: Scientific Data
ISSN: 2052-4463
Published: Springer Science and Business Media LLC 2024

URI: https://cronfa.swan.ac.uk/Record/cronfa65674
first_indexed 2024-03-05T10:24:05Z
last_indexed 2024-11-25T14:16:36Z
id cronfa65674
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2024-03-26T12:01:29.8824949</datestamp><bib-version>v2</bib-version><id>65674</id><entry>2024-02-23</entry><title>Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity</title><swanseaauthors><author><sid>aa1b025ec0243f708bb5eb0a93d6fb52</sid><ORCID>0000-0003-0814-0801</ORCID><firstname>Ashley</firstname><surname>Akbari</surname><name>Ashley Akbari</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2024-02-23</date><deptcode>MEDS</deptcode><abstract>Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completeness, consistency, and granularity and found one in ten individuals do not have ethnicity information recorded in primary care. By linking to hospital records, ethnicity data were completed for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups including and beyond &#x201C;White&#x201D;, &#x201C;Black&#x201D;, &#x201C;Asian&#x201D;, &#x201C;Mixed&#x201D; and &#x201C;Other, and found them to be distributed in proportions similar to the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings. 
Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important to address disparities and influence policy recommendations that can translate into better, fairer health for all.</abstract><type>Journal Article</type><journal>Scientific Data</journal><volume>11</volume><journalNumber>1</journalNumber><paginationStart/><paginationEnd/><publisher>Springer Science and Business Media LLC</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2052-4463</issnElectronic><keywords/><publishedDay>22</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2024</publishedYear><publishedDate>2024-02-22</publishedDate><doi>10.1038/s41597-024-02958-1</doi><url/><notes>Data availability:The data used in this study are available in NHS England&#x2019;s Secure Data Environment (SDE) service for England (https://digital.nhs.uk/services/secure-data-environment-service). The CVD-COVID-UK/COVID-IMPACT programme led by the BHF Data Science Centre (https://bhfdatasciencecentre.org/) received approval to access data in NHS England&#x2019;s SDE service for England from the Independent Group Advising on the Release of Data (IGARD) (https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/independent-group-advising-on-the-release-of-data) via an application made in the Data Access Request Service (DARS) Online system (ref. DARS-NIC-381078-Y9C5K) (https://digital.nhs.uk/services/data-access-request-service-dars/dars-products-and-services). The CVD-COVID-UK/COVID-IMPACT Approvals &amp; Oversight Board (https://bhfdatasciencecentre.org/areas/cvd-covid-uk-covid-impact/) subsequently granted approval to this project to access the data within NHS England&#x2019;s SDE service for England. The de-identified data used in this study were made available to accredited researchers. 
Those wishing to gain access to the data should contact bhfdsc@hdruk.ac.uk in the first instance.</notes><college>COLLEGE NANME</college><department>Medical School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MEDS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders>The British Heart Foundation Data Science Centre (grant No SP/19/3/34678, awarded to Health Data Research (HDR) UK), funded co-development (with NHS England) of the Secure Data Environment service for England, provision of linked datasets, data access, user software licences, computational usage, and data management and wrangling support, with additional contributions from the HDR UK Data and Connectivity component of the UK Government Chief Scientific Adviser&#x2019;s National Core Studies programme to coordinate national COVID-19 priority research. Consortium partner organisations funded the time of contributing data analysts, biostatisticians, epidemiologists, and clinicians. The authors acknowledge English language editing by Dr Jennifer A de Beyer and Amelia M Doran, Centre for Statistics in Medicine, University of Oxford. This work was carried out with the support of the BHF Data Science Centre led by HDR UK (BHF Grant no. SP/19/3/34678). This study made use of de-identified data held in NHS England&#x2019;s Secure Data Environment service for England and made available via the BHF Data Science Centre&#x2019;s CVD-COVID-UK/COVID-IMPACT consortium. This work used data provided by patients and collected by the NHS as part of their care and support. We would like to acknowledge all data providers who make health relevant data available for research. This research is part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant ref MC_PC_20058). 
This work was also supported by The Alan Turing Institute via &#x2018;Towards Turing 2.0&#x2019; EPSRC Grant Funding. The funders had no role in the study design, data collection, data analysis, data interpretation, or report writing.</funders><projectreference/><lastEdited>2024-03-26T12:01:29.8824949</lastEdited><Created>2024-02-23T11:31:44.4872770</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>Marta</firstname><surname>Pineda-Moncus&#xED;</surname><orcid>0000-0003-0567-0137</orcid><order>1</order></author><author><firstname>Freya</firstname><surname>Allery</surname><order>2</order></author><author><firstname>Antonella</firstname><surname>Delmestri</surname><orcid>0000-0003-0388-3403</orcid><order>3</order></author><author><firstname>Thomas</firstname><surname>Bolton</surname><order>4</order></author><author><firstname>John</firstname><surname>Nolan</surname><order>5</order></author><author><firstname>Johan H.</firstname><surname>Thygesen</surname><orcid>0000-0002-7479-3459</orcid><order>6</order></author><author><firstname>Alex</firstname><surname>Handy</surname><order>7</order></author><author><firstname>Amitava</firstname><surname>Banerjee</surname><order>8</order></author><author><firstname>Spiros</firstname><surname>Denaxas</surname><order>9</order></author><author><firstname>Christopher</firstname><surname>Tomlinson</surname><orcid>0000-0002-0903-5395</orcid><order>10</order></author><author><firstname>Alastair K.</firstname><surname>Denniston</surname><order>11</order></author><author><firstname>Cathie</firstname><surname>Sudlow</surname><order>12</order></author><author><firstname>Ashley</firstname><surname>Akbari</surname><orcid>0000-0003-0814-0801</orcid><order>13</order></author><author><firstname>Angela</firstname><surname>Wood</surname><order>14</order></author><author><firstname>Gary 
S.</firstname><surname>Collins</surname><orcid>0000-0002-2772-2316</orcid><order>15</order></author><author><firstname>Irene</firstname><surname>Petersen</surname><order>16</order></author><author><firstname>Laura C.</firstname><surname>Coates</surname><order>17</order></author><author><firstname>Kamlesh</firstname><surname>Khunti</surname><order>18</order></author><author><firstname>Daniel</firstname><surname>Prieto-sAlhambra</surname><order>19</order></author><author><firstname>Sara</firstname><surname>Khalid</surname><order>20</order></author><author><firstname>(on behalf of the CVD-COVID-UK/COVID-IMPACT</firstname><surname>Consortium)</surname><order>21</order></author></authors><documents><document><filename>65674__29631__a3b5ed1bf38341f5be96a94a6cc557b7.pdf</filename><originalFilename>65674.pdf</originalFilename><uploaded>2024-03-05T10:23:04.2807052</uploaded><type>Output</type><contentLength>2173444</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</documentNotes><copyrightCorrect>false</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity
author_id_str_mv aa1b025ec0243f708bb5eb0a93d6fb52
author_id_fullname_str_mv aa1b025ec0243f708bb5eb0a93d6fb52_***_Ashley Akbari
author Ashley Akbari
author2 Marta Pineda-Moncusí
Freya Allery
Antonella Delmestri
Thomas Bolton
John Nolan
Johan H. Thygesen
Alex Handy
Amitava Banerjee
Spiros Denaxas
Christopher Tomlinson
Alastair K. Denniston
Cathie Sudlow
Ashley Akbari
Angela Wood
Gary S. Collins
Irene Petersen
Laura C. Coates
Kamlesh Khunti
Daniel Prieto-Alhambra
Sara Khalid
(on behalf of the CVD-COVID-UK/COVID-IMPACT Consortium)
format Journal article
container_title Scientific Data
container_volume 11
container_issue 1
publishDate 2024
institution Swansea University
issn 2052-4463
doi_str_mv 10.1038/s41597-024-02958-1
publisher Springer Science and Business Media LLC
college_str Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id facultyofmedicinehealthandlifesciences
hierarchy_top_title Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id facultyofmedicinehealthandlifesciences
hierarchy_parent_title Faculty of Medicine, Health and Life Sciences
department_str Swansea University Medical School - Health Data Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Health Data Science
document_store_str 1
active_str 0
description Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completeness, consistency, and granularity and found one in ten individuals do not have ethnicity information recorded in primary care. By linking to hospital records, ethnicity data were completed for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups including and beyond “White”, “Black”, “Asian”, “Mixed” and “Other”, and found them to be distributed in proportions similar to the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings. Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important to address disparities and influence policy recommendations that can translate into better, fairer health for all.
published_date 2024-02-22T14:33:00Z
_version_ 1821959919737241600
score 11.048149