Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine

ASOMUGHA, HENRY




Swansea University Author: HENRY ASOMUGHA

PDF | E-Thesis – open access

Copyright: The author, Henry Asomugha, 2023.

Abstract

This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. This database was able to output gene and variant information of all the known human protein drug targets, while also retrieving...


Published:	Swansea 2023
Institution:	Swansea University
Degree level:	Master of Research
Degree name:	MSc by Research
Supervisor:	Mullins, Jonathan ; Ferla, Salvatore
URI:	https://cronfa.swan.ac.uk/Record/cronfa62661

first_indexed	2023-02-14T12:12:43Z
last_indexed	2023-02-15T04:17:20Z
id	cronfa62661
recordtype	RisThesis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2023-02-14T12:21:03.9803698</datestamp><bib-version>v2</bib-version><id>62661</id><entry>2023-02-14</entry><title>Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine</title><swanseaauthors><author><sid>dd1bef9cd6bf4529ae2ad12cf3dd2773</sid><firstname>HENRY</firstname><surname>ASOMUGHA</surname><name>HENRY ASOMUGHA</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-02-14</date><abstract>This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. This database was able to output gene and variant information of all the known human protein drug targets, while also retrieving information on compounds, disorders and drugs associated with each protein. This database was created by writing code, in Linux, that was able to web-scrape multiple databases for information and store it in the pipeline database. The databases that were scraped were UniProt, ClinVar, PubChem, chEMBL, guide to pharmacology, MedGen and the therapeutic target database. These databases were selected as they provided the largest and most reliable data, that could be web-scraped, for each section they were scraped for. Once coded, the full pharmacology set (a set of 720 pharmacologically relevant proteins, whose pharmacological mechanism is known) was added to the pipeline, meaning all their information was downloaded and stored in the pipeline. This bioinformatics pipeline proved to be very effective as an investigative tool for identifying new avenues for personalised medicine as it was able to retrieve and integrate all the requested information on proteins, variants, diseases, and compounds when called upon. 
In the proof-of-concept study, the database was used to gather key information that allowed for an investigation into the effect of pathogenic variants on drug binding in proteins. This investigation was conducted by simulating the binding of a protein’s wild type to two of its known drug ligands. 10 benign variants and 10 pathogenic variants of the protein were also bound to 2 drug ligands associated with the protein; their relative binding energies were collected. This allowed for comparisons to be made between the effect of pathogenic variants and benign variants on a protein’s binding ability. Analysis from the docking simulations showed that in 3 of the 5 proteins studied (60%), more pathogenic mutations returned a binding energy with at least a 15% deviation from the wildtype binding energy than benign mutations. These results suggest that the binding interactions of a protein could be affected by polymorphic variation, especially pathogenic variation, although in this case study the difference between the two groups did not show statistical significance.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Precision Medicine, Bioinformatics</keywords><publishedDay>10</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-02-10</publishedDate><doi/><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Mullins, Jonathan ; Ferla, Salvatore</supervisor><degreelevel>Master of Research</degreelevel><degreename>MSc by Research</degreename><apcterm/><funders/><projectreference/><lastEdited>2023-02-14T12:21:03.9803698</lastEdited><Created>2023-02-14T11:52:52.3606059</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level 
id="2">Swansea University Medical School - Biomedical Science</level></path><authors><author><firstname>HENRY</firstname><surname>ASOMUGHA</surname><order>1</order></author></authors><documents><document><filename>62661__26585__ff3b0f4446544fd193017cfeeb36ca1c.pdf</filename><originalFilename>Asomugha_Henry_MSc_Research_Thesis_Final_Redacted_Signature.pdf</originalFilename><uploaded>2023-02-14T12:17:01.4829239</uploaded><type>Output</type><contentLength>1235859</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The author, Henry Asomugha, 2023.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title	Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine
author_id_str_mv	dd1bef9cd6bf4529ae2ad12cf3dd2773
author_id_fullname_str_mv	dd1bef9cd6bf4529ae2ad12cf3dd2773_***_HENRY ASOMUGHA
author	HENRY ASOMUGHA
format	E-Thesis
publishDate	2023
institution	Swansea University
college_str	Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id	facultyofmedicinehealthandlifesciences
hierarchy_top_title	Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id	facultyofmedicinehealthandlifesciences
hierarchy_parent_title	Faculty of Medicine, Health and Life Sciences
department_str	Swansea University Medical School - Biomedical Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Biomedical Science
document_store_str	1
active_str	0
description	This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. The database outputs gene and variant information for all known human protein drug targets and also retrieves information on the compounds, disorders and drugs associated with each protein. It was built by writing code, on Linux, that web-scrapes multiple databases and stores the results in the pipeline database. The scraped databases were UniProt, ClinVar, PubChem, ChEMBL, the Guide to Pharmacology, MedGen and the Therapeutic Target Database. These were selected because, among the sources that could be web-scraped, they provided the largest and most reliable data for the sections they were used for. Once the code was written, the full pharmacology set (720 pharmacologically relevant proteins whose pharmacological mechanism is known) was added to the pipeline, meaning all of their information was downloaded and stored. The pipeline proved effective as an investigative tool for identifying new avenues for personalised medicine, as it retrieved and integrated all the requested information on proteins, variants, diseases and compounds when called upon.
In the proof-of-concept study, the database was used to gather key information for an investigation into the effect of pathogenic variants on drug binding in proteins. The investigation simulated the binding of each protein's wild type to two of its known drug ligands. Ten benign and ten pathogenic variants of the protein were also docked to two drug ligands associated with the protein, and their relative binding energies were collected. This allowed the effects of pathogenic and benign variants on a protein's binding ability to be compared. Analysis of the docking simulations showed that in 3 of the 5 proteins studied (60%), more pathogenic than benign mutations returned a binding energy that deviated by at least 15% from the wild-type binding energy. These results suggest that a protein's binding interactions could be affected by polymorphic variation, especially pathogenic variation, although in this case study the difference between the two groups was not statistically significant.
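The 15% deviation criterion in the description can be illustrated with a minimal Python sketch. It assumes that "deviation" means the absolute difference from the wild-type binding energy divided by the magnitude of the wild-type energy; the record does not state the exact formula. The function name `count_deviating` and all energy values below are hypothetical and are not taken from the thesis.

```python
def count_deviating(wild_type_energy, variant_energies, threshold=0.15):
    """Count variants whose binding energy differs from the wild type
    by at least `threshold`, measured relative to the wild-type magnitude.

    Assumption: relative absolute difference is the deviation measure.
    """
    if wild_type_energy == 0:
        raise ValueError("wild-type binding energy must be non-zero")
    reference = abs(wild_type_energy)
    return sum(
        1
        for energy in variant_energies
        if abs(energy - wild_type_energy) / reference >= threshold
    )


# Hypothetical docking scores in kcal/mol (illustrative only, not thesis data).
# The study used ten variants per group; five are shown here for brevity.
wild_type = -8.0
benign = [-8.1, -7.9, -7.5, -8.3, -6.7]
pathogenic = [-6.5, -9.4, -8.0, -6.0, -7.8]

benign_hits = count_deviating(wild_type, benign)
pathogenic_hits = count_deviating(wild_type, pathogenic)

# A protein counts towards the "3 of 5" style tally when more pathogenic
# than benign variants cross the threshold.
more_pathogenic_deviate = pathogenic_hits > benign_hits
print(benign_hits, pathogenic_hits, more_pathogenic_deviate)
```

Repeating this comparison per protein and ligand gives the kind of tally reported in the description; whether the difference between groups is statistically significant needs a separate test, which this sketch does not attempt.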
published_date	2023-02-10T08:58:40Z


Similar Items