Journal article
Administrative data linkage to Census 2021 in Wales, UK: A cross-sectional study examining completeness and representativeness for population analytics
International Journal of Population Data Science, Volume: 10, Issue: 1
Swansea University Authors: Jane Lyons, Rhodri Johnson, Mike Edwards, Samantha Turner, Rich Fry, Lucy Griffiths, Ronan Lyons
PDF | Version of Record
2025 © The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI (Published version): 10.23889/ijpds.v10i1.2994
Abstract
Introduction: Measuring population representativeness is an important methodological step in public health and epidemiological studies. Objectives: To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymis...
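The Results section of the full abstract, reproduced in the record fields below, reports three head counts: 3,090,976 individuals in WDSD, 2,965,196 in Census 2021, and 2,440,191 present in both. A minimal sketch of the overlap arithmetic those counts imply is given here. It assumes that "present in both" can be subtracted directly from each source total, which the abstract does not state explicitly; the class and property names are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageCounts:
    """Head counts quoted in the abstract (names are hypothetical)."""

    wdsd_total: int    # individuals in the WDSD cohort
    census_total: int  # individuals in the Census 2021 cohort
    in_both: int       # individuals present in both sources

    @property
    def wdsd_only(self) -> int:
        # Assumption: everyone in WDSD but not in the overlap is unlinked.
        return self.wdsd_total - self.in_both

    @property
    def census_only(self) -> int:
        # Assumption: everyone in Census but not in the overlap is unlinked.
        return self.census_total - self.in_both

    @property
    def share_of_wdsd_linked(self) -> float:
        return self.in_both / self.wdsd_total

    @property
    def share_of_census_linked(self) -> float:
        return self.in_both / self.census_total


counts = LinkageCounts(
    wdsd_total=3_090_976,
    census_total=2_965_196,
    in_both=2_440_191,
)

print(f"In WDSD only:          {counts.wdsd_only:,}")
print(f"In Census only:        {counts.census_only:,}")
print(f"Share of WDSD linked:   {counts.share_of_wdsd_linked:.1%}")
print(f"Share of Census linked: {counts.share_of_census_linked:.1%}")
```

Under that assumption, roughly four in five individuals in either source appear in the other. The published paper may apply exclusions that make its own unlinked counts differ from these simple differences.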
| Published in: | International Journal of Population Data Science |
|---|---|
| ISSN: | 2399-4908 |
| Published: | Swansea University, 2025 |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa70741 |
| first_indexed | 2025-10-21T09:33:52Z |
|---|---|
| last_indexed | 2025-12-16T05:27:19Z |
| id | cronfa70741 |
| recordtype | SURis |
| fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2025-12-15T15:08:47.5172789</datestamp><bib-version>v2</bib-version><id>70741</id><entry>2025-10-21</entry><title>Administrative data linkage to Census 2021 in Wales, UK: A cross-sectional study examining completeness and representativeness for population analytics</title><swanseaauthors><author><sid>1b74fa5125a88451c52c45bcf20e0b47</sid><ORCID/><firstname>Jane</firstname><surname>Lyons</surname><name>Jane Lyons</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>5f97fd65ef8cf66db750f645f115454c</sid><ORCID>0000-0001-9636-0753</ORCID><firstname>Rhodri</firstname><surname>Johnson</surname><name>Rhodri Johnson</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>684864a1ce01c3d774e83ed55e41770e</sid><ORCID>0000-0003-3367-969X</ORCID><firstname>Mike</firstname><surname>Edwards</surname><name>Mike Edwards</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>51236cb22cd896545c87e4c15fda17af</sid><ORCID>0000-0001-5293-3871</ORCID><firstname>Samantha</firstname><surname>Turner</surname><name>Samantha Turner</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>d499b898d447b62c81b2c122598870e0</sid><ORCID>0000-0002-7968-6679</ORCID><firstname>Rich</firstname><surname>Fry</surname><name>Rich Fry</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>e35ea6ea4b429e812ef204b048131d93</sid><ORCID>0000-0001-9230-624X</ORCID><firstname>Lucy</firstname><surname>Griffiths</surname><name>Lucy Griffiths</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>83efcf2a9dfcf8b55586999d3d152ac6</sid><firstname>Ronan</firstname><surname>Lyons</surname><name>Ronan Lyons</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-10-21</date><deptcode>MEDS</deptcode><abstract>Introduction: Measuring population 
representativeness is an important methodological step in public health and epidemiological studies. Objectives: To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymised Information Linkage (SAIL) Databank for research on the population of Wales, UK. To understand the characteristics of individuals linked and not linked and which subgroups of the population are disproportionately represented in data linkage population-wide studies. Methods: An observational, population-wide cross-sectional comparison study, utilising administrative demographic data and decennial survey data held in SAIL. Two data sources, the WDSD and Census 2021, were used to create and compare two cohorts of the resident population of Wales, UK, on 21st March 2021. The two cohorts were linked to understand how many individuals from Census 2021 can be successfully linked within SAIL, in WDSD and not in Census 2021, and found across both sources. Logistic regression models analysed the variation in the linkability of the survey data within SAIL by various demographic and household characteristics. Results: The central analytical cohort contained 2,440,191 individuals present in both data sources. WDSD contained 3,090,976 individuals with 2,965,196 individuals in Census data. 
With a positively classed outcome indicating non-linkage from WDS to Census the characteristics associated with the highest odds of individuals being registered in WDS but not linked to Census (in SAIL) are male (aOR = 1.28 [95%CI 1.28,1.32]), 75+ years of age (aOR = 1.27 [95%CI 1.25,1.29]), of Asian ethnicity (aOR = 1.27 [95%CI 1.24,1.30]), a more recent migrant (arriving to UK after 2000) (aOR = 1.30 [95%CI 1.28,1.32]), a member of the LGBTQ+ community (aOR = 1.29 [95%CI 1.25,1.29]) or not disclosing LGBTQ+ status (aOR = 1.41 [95%CI 1.39,1.43]), being separated, divorced or widowed (aOR = 1.28 [95%CI 1.27,1.29]), or living in rental accommodation (aOR = 1.47 [95%CI 1.45,1.48]). Conclusions: Results show that certain personal characteristics and sub-groups of the population of Wales are disproportionately represented when combining population estimates and utilising Census data in data linkage population-wide studies in SAIL.</abstract><type>Journal Article</type><journal>International Journal of Population Data Science</journal><volume>10</volume><journalNumber>1</journalNumber><paginationStart/><paginationEnd/><publisher>Swansea University</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2399-4908</issnElectronic><keywords>data linkage; census representativeness; administrative data</keywords><publishedDay>26</publishedDay><publishedMonth>11</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-11-26</publishedDate><doi>10.23889/ijpds.v10i1.2994</doi><url/><notes/><college>COLLEGE NANME</college><department>Medical School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MEDS</DepartmentCode><institution>Swansea University</institution><apcterm>External research funder(s) paid the OA fee (includes OA grants disbursed by the Library)</apcterm><funders>This work is supported by Administrative Data Research (ADR) Wales (Grant ref: ES/W012227/1), part of the ADR UK investment, uniting research 
expertise from Swansea University Medical School and WISERD (Wales Institute of Social and Economic Research and Data) at Cardiff University with analysts from Welsh Government. ADR UK is funded by the Economic and Social Research Council (ESRC), part of UK Research and Innovation.</funders><projectreference/><lastEdited>2025-12-15T15:08:47.5172789</lastEdited><Created>2025-10-21T10:29:54.5191013</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>Jane</firstname><surname>Lyons</surname><orcid/><order>1</order></author><author><firstname>Rhodri</firstname><surname>Johnson</surname><orcid>0000-0001-9636-0753</orcid><order>2</order></author><author><firstname>Mike</firstname><surname>Edwards</surname><orcid>0000-0003-3367-969X</orcid><order>3</order></author><author><firstname>Samantha</firstname><surname>Turner</surname><orcid>0000-0001-5293-3871</orcid><order>4</order></author><author><firstname>Rich</firstname><surname>Fry</surname><orcid>0000-0002-7968-6679</orcid><order>5</order></author><author><firstname>Lucy</firstname><surname>Griffiths</surname><orcid>0000-0001-9230-624X</orcid><order>6</order></author><author><firstname>Ronan</firstname><surname>Lyons</surname><order>7</order></author></authors><documents><document><filename>70741__35826__776764260b184ec5b0a842954a738178.pdf</filename><originalFilename>70741.VOR.pdf</originalFilename><uploaded>2025-12-15T15:05:57.6945131</uploaded><type>Output</type><contentLength>753771</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>2025 © The Authors. 
This work is licensed under a Creative Commons Attribution 4.0 International License.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807> |
| title | Administrative data linkage to Census 2021 in Wales, UK: A cross-sectional study examining completeness and representativeness for population analytics |
| author_id_str_mv | 1b74fa5125a88451c52c45bcf20e0b47 5f97fd65ef8cf66db750f645f115454c 684864a1ce01c3d774e83ed55e41770e 51236cb22cd896545c87e4c15fda17af d499b898d447b62c81b2c122598870e0 e35ea6ea4b429e812ef204b048131d93 83efcf2a9dfcf8b55586999d3d152ac6 |
| author_id_fullname_str_mv | 1b74fa5125a88451c52c45bcf20e0b47_***_Jane Lyons 5f97fd65ef8cf66db750f645f115454c_***_Rhodri Johnson 684864a1ce01c3d774e83ed55e41770e_***_Mike Edwards 51236cb22cd896545c87e4c15fda17af_***_Samantha Turner d499b898d447b62c81b2c122598870e0_***_Rich Fry e35ea6ea4b429e812ef204b048131d93_***_Lucy Griffiths 83efcf2a9dfcf8b55586999d3d152ac6_***_Ronan Lyons |
| author | Jane Lyons Rhodri Johnson Mike Edwards Samantha Turner Rich Fry Lucy Griffiths Ronan Lyons |
| author2 | Jane Lyons Rhodri Johnson Mike Edwards Samantha Turner Rich Fry Lucy Griffiths Ronan Lyons |
| format | Journal article |
| container_title | International Journal of Population Data Science |
| container_volume | 10 |
| container_issue | 1 |
| publishDate | 2025 |
| institution | Swansea University |
| issn | 2399-4908 |
| doi_str_mv | 10.23889/ijpds.v10i1.2994 |
| publisher | Swansea University |
| college_str | Faculty of Medicine, Health and Life Sciences |
| hierarchytype | |
| hierarchy_top_id | facultyofmedicinehealthandlifesciences |
| hierarchy_top_title | Faculty of Medicine, Health and Life Sciences |
| hierarchy_parent_id | facultyofmedicinehealthandlifesciences |
| hierarchy_parent_title | Faculty of Medicine, Health and Life Sciences |
| department_str | Swansea University Medical School - Health Data Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Health Data Science |
| document_store_str | 1 |
| active_str | 0 |
| description |
Introduction: Measuring population representativeness is an important methodological step in public health and epidemiological studies. Objectives: To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymised Information Linkage (SAIL) Databank for research on the population of Wales, UK. To understand the characteristics of individuals linked and not linked and which subgroups of the population are disproportionately represented in data linkage population-wide studies. Methods: An observational, population-wide cross-sectional comparison study, utilising administrative demographic data and decennial survey data held in SAIL. Two data sources, the WDSD and Census 2021, were used to create and compare two cohorts of the resident population of Wales, UK, on 21st March 2021. The two cohorts were linked to understand how many individuals from Census 2021 can be successfully linked within SAIL, in WDSD and not in Census 2021, and found across both sources. Logistic regression models analysed the variation in the linkability of the survey data within SAIL by various demographic and household characteristics. Results: The central analytical cohort contained 2,440,191 individuals present in both data sources. WDSD contained 3,090,976 individuals with 2,965,196 individuals in Census data. 
With a positively classed outcome indicating non-linkage from WDS to Census the characteristics associated with the highest odds of individuals being registered in WDS but not linked to Census (in SAIL) are male (aOR = 1.28 [95%CI 1.28,1.32]), 75+ years of age (aOR = 1.27 [95%CI 1.25,1.29]), of Asian ethnicity (aOR = 1.27 [95%CI 1.24,1.30]), a more recent migrant (arriving to UK after 2000) (aOR = 1.30 [95%CI 1.28,1.32]), a member of the LGBTQ+ community (aOR = 1.29 [95%CI 1.25,1.29]) or not disclosing LGBTQ+ status (aOR = 1.41 [95%CI 1.39,1.43]), being separated, divorced or widowed (aOR = 1.28 [95%CI 1.27,1.29]), or living in rental accommodation (aOR = 1.47 [95%CI 1.45,1.48]). Conclusions: Results show that certain personal characteristics and sub-groups of the population of Wales are disproportionately represented when combining population estimates and utilising Census data in data linkage population-wide studies in SAIL. |
| published_date | 2025-11-26T05:27:19Z |
| _version_ | 1851641393312694272 |
| score | 11.090009 |

