E-Thesis

Natural language processing (NLP) for clinical information extraction and healthcare research / BEATA FONFERKO-SHADRACH

Swansea University Author: BEATA FONFERKO-SHADRACH

  • 2023_Fonferko-Shadrach_B.final.65061.pdf

    PDF | E-Thesis – open access

    Copyright: The Author, Beata Fonferko-Shadrach, 2023. Distributed under the terms of a Creative Commons Attribution Non Commercial 4.0 License (CC BY-NC 4.0).

    Download (7.28MB)

DOI (Published version): 10.23889/SUthesis.65061

Abstract

Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that...

Published: Swansea, Wales, UK 2023
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
Supervisor: Halcox, Julian and Pickrell, William Owen
URI: https://cronfa.swan.ac.uk/Record/cronfa65061
first_indexed 2023-11-21T15:12:28Z
last_indexed 2024-11-25T14:15:17Z
id cronfa65061
recordtype RisThesis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2023-11-21T15:24:54.3701230</datestamp><bib-version>v2</bib-version><id>65061</id><entry>2023-11-21</entry><title>Natural language processing (NLP) for clinical information extraction and healthcare research</title><swanseaauthors><author><sid>2d17cc2bf75b0aa1d7ebead5778f8d88</sid><firstname>BEATA</firstname><surname>FONFERKO-SHADRACH</surname><name>BEATA FONFERKO-SHADRACH</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-11-21</date><abstract>Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. 
No significant differences in the genetic burden of rare and potentially damaging variants were observed between the individuals with vs without unscheduled admissions, and between individuals on monotherapy vs polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics set the way for wider research.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea, Wales, UK</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Clinical natural language processing, information extraction, application validation, genetic data linkage, routinely collected health data</keywords><publishedDay>28</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-09-28</publishedDate><doi>10.23889/SUthesis.65061</doi><url/><notes>A selection of content is redacted or is partially redacted from this thesis to protect sensitive and personal information.</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Halcox, Julian. 
and Pickrell, William Owen.</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><degreesponsorsfunders>Staff funding</degreesponsorsfunders><apcterm/><funders>Swansea University Staff Funding</funders><projectreference/><lastEdited>2023-11-21T15:24:54.3701230</lastEdited><Created>2023-11-21T15:07:00.1202590</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>BEATA</firstname><surname>FONFERKO-SHADRACH</surname><order>1</order></author></authors><documents><document><filename>65061__29074__d5c6a7abc1604c8a8778d9d4ae853541.pdf</filename><originalFilename>2023_Fonferko-Shadrach_B.final.65061.pdf</originalFilename><uploaded>2023-11-21T15:20:53.1884437</uploaded><type>Output</type><contentLength>7629185</contentLength><contentType>application/pdf</contentType><version>E-Thesis &#x2013; open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The Author, Beata Fonferko-Shadrach, 2023. Distributed under the terms of a Creative Commons Attribution Non Commercial 4.0 License (CC BY-NC 4.0).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by-nc/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title Natural language processing (NLP) for clinical information extraction and healthcare research
author_id_str_mv 2d17cc2bf75b0aa1d7ebead5778f8d88
author_id_fullname_str_mv 2d17cc2bf75b0aa1d7ebead5778f8d88_***_BEATA FONFERKO-SHADRACH
author BEATA FONFERKO-SHADRACH
author2 BEATA FONFERKO-SHADRACH
format E-Thesis
publishDate 2023
institution Swansea University
doi_str_mv 10.23889/SUthesis.65061
college_str Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id facultyofmedicinehealthandlifesciences
hierarchy_top_title Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id facultyofmedicinehealthandlifesciences
hierarchy_parent_title Faculty of Medicine, Health and Life Sciences
department_str Faculty of Medicine, Health and Life Sciences / Swansea University Medical School - Health Data Science
document_store_str 1
active_str 0
description Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of the complex interactions between aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) to create detailed disease-specific datasets from the free text of clinic letters, enriching the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed, and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. No significant differences in the genetic burden of rare and potentially damaging variants were observed between individuals with and without unscheduled admissions, or between individuals on monotherapy and those on polytherapy. No significant difference in genetic burden was observed between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents the successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. The approach taken in the development of ExECT v2, and the research linking NLP outputs, routinely collected health care data, and genetics, pave the way for wider research.
published_date 2023-09-28T14:30:53Z