Journal article

Administrative data linkage to Census 2021 in Wales, UK: A cross-sectional study examining completeness and representativeness for population analytics

Jane Lyons, Rhodri Johnson, Mike Edwards, Samantha Turner, Rich Fry, Lucy Griffiths, Ronan Lyons

International Journal of Population Data Science, Volume: 10, Issue: 1

Swansea University Authors: Jane Lyons, Rhodri Johnson, Mike Edwards, Samantha Turner, Rich Fry, Lucy Griffiths, Ronan Lyons

  • 70741.VOR.pdf

    PDF | Version of Record

    2025 © The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Introduction: Measuring population representativeness is an important methodological step in public health and epidemiological studies. Objectives: To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymis...

Published in: International Journal of Population Data Science
ISSN: 2399-4908
Published: Swansea University 2025

URI: https://cronfa.swan.ac.uk/Record/cronfa70741
first_indexed 2025-10-21T09:33:52Z
last_indexed 2025-12-16T05:27:19Z
id cronfa70741
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2025-12-15T15:08:47.5172789</datestamp><bib-version>v2</bib-version><id>70741</id><entry>2025-10-21</entry><title>Administrative data linkage to Census 2021 in Wales, UK: A cross-sectional study examining completeness and representativeness for population analytics</title><swanseaauthors><author><sid>1b74fa5125a88451c52c45bcf20e0b47</sid><ORCID/><firstname>Jane</firstname><surname>Lyons</surname><name>Jane Lyons</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>5f97fd65ef8cf66db750f645f115454c</sid><ORCID>0000-0001-9636-0753</ORCID><firstname>Rhodri</firstname><surname>Johnson</surname><name>Rhodri Johnson</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>684864a1ce01c3d774e83ed55e41770e</sid><ORCID>0000-0003-3367-969X</ORCID><firstname>Mike</firstname><surname>Edwards</surname><name>Mike Edwards</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>51236cb22cd896545c87e4c15fda17af</sid><ORCID>0000-0001-5293-3871</ORCID><firstname>Samantha</firstname><surname>Turner</surname><name>Samantha Turner</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>d499b898d447b62c81b2c122598870e0</sid><ORCID>0000-0002-7968-6679</ORCID><firstname>Rich</firstname><surname>Fry</surname><name>Rich Fry</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>e35ea6ea4b429e812ef204b048131d93</sid><ORCID>0000-0001-9230-624X</ORCID><firstname>Lucy</firstname><surname>Griffiths</surname><name>Lucy Griffiths</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>83efcf2a9dfcf8b55586999d3d152ac6</sid><firstname>Ronan</firstname><surname>Lyons</surname><name>Ronan Lyons</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-10-21</date><deptcode>MEDS</deptcode><abstract>Introduction: Measuring 
population representativeness is an important methodological step in public health and epidemiological studies. Objectives: To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymised Information Linkage (SAIL) Databank for research on the population of Wales, UK. To understand the characteristics of individuals linked and not linked and which subgroups of the population are disproportionately represented in data linkage population-wide studies. Methods: An observational, population-wide cross-sectional comparison study, utilising administrative demographic data and decennial survey data held in SAIL. Two data sources, the WDSD and Census 2021, were used to create and compare two cohorts of the resident population of Wales, UK, on 21st March 2021. The two cohorts were linked to understand how many individuals from Census 2021 can be successfully linked within SAIL, in WDSD and not in Census 2021, and found across both sources. Logistic regression models analysed the variation in the linkability of the survey data within SAIL by various demographic and household characteristics. Results: The central analytical cohort contained 2,440,191 individuals present in both data sources. WDSD contained 3,090,976 individuals with 2,965,196 individuals in Census data. 
With a positively classed outcome indicating non-linkage from WDS to Census the characteristics associated with the highest odds of individuals being registered in WDS but not linked to Census (in SAIL) are male (aOR = 1.28 [95%CI 1.28,1.32]), 75+ years of age (aOR = 1.27 [95%CI 1.25,1.29]), of Asian ethnicity (aOR = 1.27 [95%CI 1.24,1.30]), a more recent migrant (arriving to UK after 2000) (aOR = 1.30 [95%CI 1.28,1.32]), a member of the LGBTQ+ community (aOR = 1.29 [95%CI 1.25,1.29]) or not disclosing LGBTQ+ status (aOR = 1.41 [95%CI 1.39,1.43]), being separated, divorced or widowed (aOR = 1.28 [95%CI 1.27,1.29]), or living in rental accommodation (aOR = 1.47 [95%CI 1.45,1.48]). Conclusions: Results show that certain personal characteristics and sub-groups of the population of Wales are disproportionately represented when combining population estimates and utilising Census data in data linkage population-wide studies in SAIL.</abstract><type>Journal Article</type><journal>International Journal of Population Data Science</journal><volume>10</volume><journalNumber>1</journalNumber><paginationStart/><paginationEnd/><publisher>Swansea University</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2399-4908</issnElectronic><keywords>data linkage; census representativeness; administrative data</keywords><publishedDay>26</publishedDay><publishedMonth>11</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-11-26</publishedDate><doi>10.23889/ijpds.v10i1.2994</doi><url/><notes/><college>COLLEGE NANME</college><department>Medical School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MEDS</DepartmentCode><institution>Swansea University</institution><apcterm>External research funder(s) paid the OA fee (includes OA grants disbursed by the Library)</apcterm><funders>This work is supported by Administrative Data Research (ADR) Wales (Grant ref: ES/W012227/1), part of the ADR UK investment, uniting research 
expertise from Swansea University Medical School and WISERD (Wales Institute of Social and Economic Research and Data) at Cardiff University with analysts from Welsh Government. ADR UK is funded by the Economic and Social Research Council (ESRC), part of UK Research and Innovation.</funders><projectreference/><lastEdited>2025-12-15T15:08:47.5172789</lastEdited><Created>2025-10-21T10:29:54.5191013</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>Jane</firstname><surname>Lyons</surname><orcid/><order>1</order></author><author><firstname>Rhodri</firstname><surname>Johnson</surname><orcid>0000-0001-9636-0753</orcid><order>2</order></author><author><firstname>Mike</firstname><surname>Edwards</surname><orcid>0000-0003-3367-969X</orcid><order>3</order></author><author><firstname>Samantha</firstname><surname>Turner</surname><orcid>0000-0001-5293-3871</orcid><order>4</order></author><author><firstname>Rich</firstname><surname>Fry</surname><orcid>0000-0002-7968-6679</orcid><order>5</order></author><author><firstname>Lucy</firstname><surname>Griffiths</surname><orcid>0000-0001-9230-624X</orcid><order>6</order></author><author><firstname>Ronan</firstname><surname>Lyons</surname><order>7</order></author></authors><documents><document><filename>70741__35826__776764260b184ec5b0a842954a738178.pdf</filename><originalFilename>70741.VOR.pdf</originalFilename><uploaded>2025-12-15T15:05:57.6945131</uploaded><type>Output</type><contentLength>753771</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>2025 &#xA9; The Authors. 
This work is licensed under a Creative Commons Attribution 4.0 International License.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title Administrative data linkage to Census 2021 in Wales, UK: A cross-sectional study examining completeness and representativeness for population analytics
author_id_str_mv 1b74fa5125a88451c52c45bcf20e0b47
5f97fd65ef8cf66db750f645f115454c
684864a1ce01c3d774e83ed55e41770e
51236cb22cd896545c87e4c15fda17af
d499b898d447b62c81b2c122598870e0
e35ea6ea4b429e812ef204b048131d93
83efcf2a9dfcf8b55586999d3d152ac6
author_id_fullname_str_mv 1b74fa5125a88451c52c45bcf20e0b47_***_Jane Lyons
5f97fd65ef8cf66db750f645f115454c_***_Rhodri Johnson
684864a1ce01c3d774e83ed55e41770e_***_Mike Edwards
51236cb22cd896545c87e4c15fda17af_***_Samantha Turner
d499b898d447b62c81b2c122598870e0_***_Rich Fry
e35ea6ea4b429e812ef204b048131d93_***_Lucy Griffiths
83efcf2a9dfcf8b55586999d3d152ac6_***_Ronan Lyons
author Jane Lyons
Rhodri Johnson
Mike Edwards
Samantha Turner
Rich Fry
Lucy Griffiths
Ronan Lyons
format Journal article
container_title International Journal of Population Data Science
container_volume 10
container_issue 1
publishDate 2025
institution Swansea University
issn 2399-4908
doi_str_mv 10.23889/ijpds.v10i1.2994
publisher Swansea University
college_str Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id facultyofmedicinehealthandlifesciences
hierarchy_top_title Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id facultyofmedicinehealthandlifesciences
hierarchy_parent_title Faculty of Medicine, Health and Life Sciences
department_str Swansea University Medical School - Health Data Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Health Data Science
document_store_str 1
active_str 0
description Introduction: Measuring population representativeness is an important methodological step in public health and epidemiological studies. Objectives: To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymised Information Linkage (SAIL) Databank for research on the population of Wales, UK. To understand the characteristics of individuals linked and not linked and which subgroups of the population are disproportionately represented in data linkage population-wide studies. Methods: An observational, population-wide cross-sectional comparison study, utilising administrative demographic data and decennial survey data held in SAIL. Two data sources, the WDSD and Census 2021, were used to create and compare two cohorts of the resident population of Wales, UK, on 21st March 2021. The two cohorts were linked to understand how many individuals from Census 2021 can be successfully linked within SAIL, in WDSD and not in Census 2021, and found across both sources. Logistic regression models analysed the variation in the linkability of the survey data within SAIL by various demographic and household characteristics. Results: The central analytical cohort contained 2,440,191 individuals present in both data sources. WDSD contained 3,090,976 individuals with 2,965,196 individuals in Census data. 
With a positively classed outcome indicating non-linkage from WDS to Census the characteristics associated with the highest odds of individuals being registered in WDS but not linked to Census (in SAIL) are male (aOR = 1.28 [95%CI 1.28,1.32]), 75+ years of age (aOR = 1.27 [95%CI 1.25,1.29]), of Asian ethnicity (aOR = 1.27 [95%CI 1.24,1.30]), a more recent migrant (arriving to UK after 2000) (aOR = 1.30 [95%CI 1.28,1.32]), a member of the LGBTQ+ community (aOR = 1.29 [95%CI 1.25,1.29]) or not disclosing LGBTQ+ status (aOR = 1.41 [95%CI 1.39,1.43]), being separated, divorced or widowed (aOR = 1.28 [95%CI 1.27,1.29]), or living in rental accommodation (aOR = 1.47 [95%CI 1.45,1.48]). Conclusions: Results show that certain personal characteristics and sub-groups of the population of Wales are disproportionately represented when combining population estimates and utilising Census data in data linkage population-wide studies in SAIL.
published_date 2025-11-26T05:27:19Z
_version_ 1851641393312694272
score 11.090009