
E-Thesis

Data mining patterns of receptor-drug interactions across vast biological and chemical space / James Witts

Swansea University Author: James Witts

DOI (Published version): 10.23889/SUthesis.65941

Published: Swansea, Wales, UK 2020
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D.
Supervisor: Mullins, Jonathan G.; Iago, Heledd F.
URI: https://cronfa.swan.ac.uk/Record/cronfa65941
Abstract: The aim of the project was to determine how machine learning tools can assist in drug discovery and in silico screening. Because development costs and attrition rates for candidate compounds can be high, a method that uses machine learning and high-performance computing tools to predict a compound's likelihood of approval or its therapeutic promise could be of great benefit to medical researchers and to the biotech, chemical and pharmaceutical industries.
The first phase of the project was to determine whether knowledge of the recorded in vitro protein interactions of particular compounds (listed in DrugBank and ToxCast) would be sufficient to designate a candidate compound as having a good (i.e., approved drug) or a bad (i.e., toxic) profile. The learning models assessed showed promise in correctly designating candidate compounds from a small number of proteins used for pharmacological profiling, reaching over 90% overall profiling prediction accuracy in the best case. However, the vast majority of interactions between compounds and proteins are unknown, so a predictive approach is needed.
The second phase of the project was to provide a method for predicting these hitherto unknown interactions by inferring protein-compound interaction pairs through clustering techniques. Several clustering methods based on protein and compound similarity measures were investigated. When tested on blind in vitro interactions, they detected approximately half of the interactions on average, highlighting their potential to strengthen the predictive profiling models.
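The abstract does not state which similarity measures or clustering algorithms were used. The sketch below is a minimal illustration under stated assumptions and does not reproduce the thesis's method. It assumes that each compound is represented as a set of "on" fingerprint bits, compares compounds by Tanimoto similarity, and groups them by threshold-based single-linkage clustering. These are swapped-in, commonly used choices. Each compound then inherits the known protein targets of its cluster-mates as candidate interaction pairs. All compound identifiers, protein identifiers, bit sets and the threshold value are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_compounds(fps, threshold):
    """Single-linkage clustering: any pair with similarity >= threshold is linked,
    and clusters are the connected components (found with union-find)."""
    parent = {c: c for c in fps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(fps, 2):
        if tanimoto(fps[a], fps[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for c in fps:
        groups.setdefault(find(c), set()).add(c)
    # Deterministic output: sorted members, clusters sorted by their members
    return sorted(sorted(members) for members in groups.values())


def predict_pairs(clusters, known):
    """Propose (compound, protein) pairs: each compound inherits the known targets
    of its cluster-mates, excluding pairs that are already recorded."""
    predicted = set()
    for members in clusters:
        targets = set().union(*(known.get(c, set()) for c in members))
        for c in members:
            for p in targets - known.get(c, set()):
                predicted.add((c, p))
    return predicted


if __name__ == "__main__":
    # Hypothetical fingerprints and known interactions, for illustration only
    fps = {
        "cmpd_A": frozenset({1, 2, 3, 4}),
        "cmpd_B": frozenset({1, 2, 3, 5}),
        "cmpd_C": frozenset({10, 11, 12}),
    }
    known = {"cmpd_A": {"P1"}, "cmpd_B": {"P2"}, "cmpd_C": {"P3"}}

    clusters = cluster_compounds(fps, threshold=0.5)
    print(clusters)  # [['cmpd_A', 'cmpd_B'], ['cmpd_C']]
    print(sorted(predict_pairs(clusters, known)))  # [('cmpd_A', 'P2'), ('cmpd_B', 'P1')]
```

Here cmpd_A and cmpd_B share 3 of 5 distinct bits (Tanimoto 0.6), so they fall into one cluster and each is proposed as a binder of the other's known target. The same propagation idea could be applied on the protein side, with proteins grouped by sequence or structural similarity. A blind evaluation like the one the abstract describes would hold out known pairs and count how many of them reappear among the predictions.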
The third and final phase of the project focused on the development of TargetPredict (http://proteins.swan.ac.uk/cheminf/), a compound and protein target prediction interface that incorporates the data and methodologies presented throughout the project into a single centralised resource. It aims to provide the sector, for the first time, with a unified tool, avoiding the burden of current approaches that either require the use of multiple, often incompatible, websites or require extensive coding experience.
Keywords: Data mining, clustering, classification, similarity
College: Faculty of Medicine, Health and Life Sciences