Journal article

Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity

Marta Pineda-Moncusí, Freya Allery, Antonella Delmestri, Thomas Bolton, John Nolan, Johan H. Thygesen, Alex Handy, Amitava Banerjee, Spiros Denaxas, Christopher Tomlinson, Alastair K. Denniston, Cathie Sudlow, Ashley Akbari, Angela Wood, Gary S. Collins, Irene Petersen, Laura C. Coates, Kamlesh Khunti, Daniel Prieto-Alhambra, Sara Khalid (on behalf of the CVD-COVID-UK/COVID-IMPACT Consortium)

Scientific Data, Volume: 11, Issue: 1

Swansea University Author: Ashley Akbari

  • 65674.pdf

    PDF | Version of Record

    This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Published in: Scientific Data
ISSN: 2052-4463
Published: Springer Science and Business Media LLC 2024

URI: https://cronfa.swan.ac.uk/Record/cronfa65674
Abstract: Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completeness, consistency, and granularity and found one in ten individuals do not have ethnicity information recorded in primary care. By linking to hospital records, ethnicity data were completed for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups including and beyond “White”, “Black”, “Asian”, “Mixed” and “Other”, and found them to be distributed in proportions similar to the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings. Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important to address disparities and influence policy recommendations that can translate into better, fairer health for all.
Item Description: Data availability: The data used in this study are available in NHS England’s Secure Data Environment (SDE) service for England (https://digital.nhs.uk/services/secure-data-environment-service). The CVD-COVID-UK/COVID-IMPACT programme led by the BHF Data Science Centre (https://bhfdatasciencecentre.org/) received approval to access data in NHS England’s SDE service for England from the Independent Group Advising on the Release of Data (IGARD) (https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/independent-group-advising-on-the-release-of-data) via an application made in the Data Access Request Service (DARS) Online system (ref. DARS-NIC-381078-Y9C5K) (https://digital.nhs.uk/services/data-access-request-service-dars/dars-products-and-services). The CVD-COVID-UK/COVID-IMPACT Approvals & Oversight Board (https://bhfdatasciencecentre.org/areas/cvd-covid-uk-covid-impact/) subsequently granted approval to this project to access the data within NHS England’s SDE service for England. The de-identified data used in this study were made available to accredited researchers. Those wishing to gain access to the data should contact bhfdsc@hdruk.ac.uk in the first instance.
College: Faculty of Medicine, Health and Life Sciences
Funders: The British Heart Foundation Data Science Centre (grant No SP/19/3/34678, awarded to Health Data Research (HDR) UK), funded co-development (with NHS England) of the Secure Data Environment service for England, provision of linked datasets, data access, user software licences, computational usage, and data management and wrangling support, with additional contributions from the HDR UK Data and Connectivity component of the UK Government Chief Scientific Adviser’s National Core Studies programme to coordinate national COVID-19 priority research. Consortium partner organisations funded the time of contributing data analysts, biostatisticians, epidemiologists, and clinicians. The authors acknowledge English language editing by Dr Jennifer A de Beyer and Amelia M Doran, Centre for Statistics in Medicine, University of Oxford. This work was carried out with the support of the BHF Data Science Centre led by HDR UK (BHF Grant no. SP/19/3/34678). This study made use of de-identified data held in NHS England’s Secure Data Environment service for England and made available via the BHF Data Science Centre’s CVD-COVID-UK/COVID-IMPACT consortium. This work used data provided by patients and collected by the NHS as part of their care and support. We would like to acknowledge all data providers who make health relevant data available for research. This research is part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant ref MC_PC_20058). This work was also supported by The Alan Turing Institute via ‘Towards Turing 2.0’ EPSRC Grant Funding. The funders had no role in the study design, data collection, data analysis, data interpretation, or report writing.