
E-Thesis

Using Novel Data Types for Big Data Research in Epilepsy: Patient Records, Clinic Letters and Genetic Mutation / Arron S. Lacey

DOI (Published version): 10.23889/Suthesis.48905


Published: 2019
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
URI: https://cronfa.swan.ac.uk/Record/cronfa48905
Abstract: Introduction: The aims of this thesis were to explore novel data types in healthcare that could enhance epidemiology studies in epilepsy and to develop novel methods of analysing routinely collected linked healthcare data, unstructured free text in hospital clinic letters and genetic variation. Method: The SAIL Databank was used to source linked healthcare data for people with epilepsy across Wales to study the effects of epilepsy and social deprivation, the coding of epilepsy in GP records and the educational attainment of children born to mothers with epilepsy. Hospital clinic letters from Morriston Hospital in Swansea were analysed using Natural Language Processing techniques to extract rich clinical data not typically recorded as part of routinely collected data. An automated pipeline was developed to predict the pathogenicity of Single Nucleotide Polymorphisms (SNPs) to prioritize potential disease-causing genetic variation in epilepsy for further in vitro analysis. Results: Incidence and prevalence of epilepsy were found to be strongly correlated with increased social deprivation. However, a 10-year retrospective follow-up study found no increase in deprivation following a diagnosis of epilepsy, pointing to deprivation contributing to social causation of epilepsy rather than epilepsy causing social drift. An algorithm was developed to accurately source epilepsy patients from GP records. Sodium valproate was found to reduce educational attainment in 7-year-olds by 12%. A Natural Language Processing pipeline was developed to extract rich epilepsy information from clinic letters. A pipeline was created to predict the pathogenicity of epilepsy SNPs that performed better than commonly used software. Conclusion: This thesis presents novel studies in epilepsy using population-level healthcare data, unstructured clinic letters and genetic variation. New methods were developed that have the potential to be applied to other disease areas and used to link different data types into routinely collected healthcare records to enhance further research.
Keywords: Big Data, Natural Language Processing, Genomics, Epilepsy
College: Faculty of Medicine, Health and Life Sciences