E-Thesis

Raman Spectroscopy with Machine Learning in the Assessment of a FIT-Positive Bowel Screening Population: Assessing the Feasibility of Detecting Colorectal Cancer and Adenomas Using Human Serum Samples / DREW MAGOWAN

Swansea University Author: DREW MAGOWAN

DOI (Published version): 10.23889/SUthesis.70267

Published: Swansea, Wales, UK 2025
Institution: Swansea University
Degree level: Doctoral
Degree name: M.D.
Supervisor: Owen, Rhiannon ; Dunstan, Peter
URI: https://cronfa.swan.ac.uk/Record/cronfa70267
Abstract: This thesis describes Raman spectroscopy combined with machine learning models for the non-invasive diagnosis of colorectal cancer and colorectal adenomas in a bowel screening population who have tested positive using a standard faecal immunochemical test. The aims were to review relevant current literature on blood-based biomarkers for colorectal cancer and colorectal adenomas, and to describe study methods and results including population characteristics, Raman spectral comparative analysis and machine learning model diagnostic classification outcomes. A literature review identified a growing field of diagnostic tests with acceptable sensitivity and specificity, comparable or superior to faecal-based testing. However, studies demonstrated a broad range of heterogeneous tests, techniques and reporting quality, which made objective comparison and selection of the best candidates difficult. For this reason, a narrative literature review was preferred to a systematic review and meta-analysis. Supervised and unsupervised analyses were undertaken on pre-processed Raman spectral data from 400 serum samples using principal component analysis, random forest ranked feature importance and Mann-Whitney U testing of mean spectra. These analyses were chosen to reduce data dimensionality, to highlight spectral patterns and to test asymmetrically distributed data for statistically significant differences between spectra. Spectral variance was low; however, multiple wavenumber regions of interest were identified and cross-referenced with known Raman peak assignments to identify potential underlying biomolecules involved in group differentiation. Biomolecule classes of interest included fatty acids, carbohydrates, amino acids, nucleotides and other molecules including lipids.
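As an illustration of the Mann-Whitney U testing mentioned above, the following is a minimal sketch in standard-library Python. It is not the thesis's actual pipeline: the intensities are synthetic, the wavenumbers and group sizes are hypothetical, and the p-value uses the normal approximation without tie or continuity correction, which is an assumption made here for brevity.

```python
import math
import random

def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no corrections)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, p

# Synthetic intensities at a few hypothetical wavenumbers (cm^-1);
# only 1450 is given a real difference between the two groups.
rng = random.Random(0)
wavenumbers = [1000, 1200, 1450, 1650]
shift = {1450: 1.5}
group_a = {w: [rng.gauss(0.0, 1.0) for _ in range(40)] for w in wavenumbers}
group_b = {w: [rng.gauss(shift.get(w, 0.0), 1.0) for _ in range(40)]
           for w in wavenumbers}

results = {w: mann_whitney_u(group_a[w], group_b[w]) for w in wavenumbers}
for w, (u, p) in results.items():
    print(f"{w} cm^-1: U = {u:.1f}, p = {p:.4g}")
```

When the test is repeated across many wavenumbers, as it would be for a full spectrum, some adjustment for multiple comparisons is usually worth considering before treating any single region as significant.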
Machine learning models including random forest, extreme gradient boosting, logistic regression (with and without elastic net regularisation) and support vector machine were trained using pre-processed Raman spectral data for each set of diagnostic groups. These models were chosen due to their proven classification ability in other studies involving biological samples. Diagnostic classification area under the curve (AUC) ranged from 0.348 (95% CI 0.260 to 0.436) to 0.583 (95% CI 0.424 to 0.694). These results likely arose from low classification power resulting from low spectral variance between groups, a high number of training variables, inadequate sample size, biologically complex samples, a lack of significantly advanced cancers and the dilutional effect of a large colorectal adenoma population. There remains potential clinical utility for Raman spectroscopy as an adjunct to (or replacement for) faecal tests in colorectal cancer screening. However, the current AUC results do not support its use at present. A much larger sample will be required to allow a fuller understanding of machine learning model classification ability and a more informed discussion regarding its use in the screening pathway.
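To make the AUC figures and their 95% confidence intervals concrete, the sketch below computes an AUC and a percentile bootstrap interval in standard-library Python. The abstract does not say how the intervals were derived, so the bootstrap is an assumption, and the labels and scores are synthetic: scores are drawn independently of the labels, so the AUC should sit near 0.5, the value expected from a classifier with no discriminative power.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # a resample must contain both classes
            stats.append(auc(lab, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic example: 40 negatives, 40 positives, uninformative scores.
rng = random.Random(1)
labels = [0] * 40 + [1] * 40
scores = [rng.random() for _ in labels]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

An interval that spans 0.5 indicates that the classifier cannot be distinguished from chance at that sample size.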
Item Description: ORCiD identifier: https://orcid.org/0000-0002-5086-2720
Keywords: Colorectal cancer, Raman spectroscopy, colorectal adenoma, colorectal polyp, liquid biopsy, machine learning, screening
College: Faculty of Medicine, Health and Life Sciences