
E-Thesis

Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine / HENRY ASOMUGHA

Swansea University Author: HENRY ASOMUGHA

  • Asomugha_Henry_MSc_Research_Thesis_Final_Redacted_Signature.pdf

    PDF | E-Thesis – open access

    Copyright: The author, Henry Asomugha, 2023.

    Download (1.18MB)

Abstract

This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. This database was able to output gene and variant information of all the known human protein drug targets, while also retrieving...


Published: Swansea 2023
Institution: Swansea University
Degree level: Master of Research
Degree name: MSc by Research
Supervisor: Mullins, Jonathan; Ferla, Salvatore
URI: https://cronfa.swan.ac.uk/Record/cronfa62661
first_indexed 2023-02-14T12:12:43Z
last_indexed 2023-02-15T04:17:20Z
id cronfa62661
recordtype RisThesis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2023-02-14T12:21:03.9803698</datestamp><bib-version>v2</bib-version><id>62661</id><entry>2023-02-14</entry><title>Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine</title><swanseaauthors><author><sid>dd1bef9cd6bf4529ae2ad12cf3dd2773</sid><firstname>HENRY</firstname><surname>ASOMUGHA</surname><name>HENRY ASOMUGHA</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-02-14</date><abstract>This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. It outputs gene and variant information for all known human protein drug targets and retrieves information on the compounds, disorders and drugs associated with each protein. The database was built by writing code, in Linux, that web-scrapes multiple databases and stores the results in the pipeline database. The databases scraped were UniProt, ClinVar, PubChem, ChEMBL, the Guide to Pharmacology, MedGen and the Therapeutic Target Database. These were selected because, for the type of information each was scraped for, they provided the largest and most reliable data that could be web-scraped. Once the code was written, the full pharmacology set (720 pharmacologically relevant proteins whose pharmacological mechanism is known) was added to the pipeline, meaning that all of their information was downloaded and stored. The pipeline proved effective as an investigative tool for identifying new avenues for personalised medicine, as it retrieved and integrated all the requested information on proteins, variants, diseases and compounds when called upon.
In the proof-of-concept study, the database was used to gather the key information needed to investigate the effect of pathogenic variants on drug binding in proteins. For each protein, the binding of the wild type to two of its known drug ligands was simulated. Ten benign variants and ten pathogenic variants of the protein were also docked to two drug ligands associated with the protein, and their relative binding energies were collected. This allowed the effects of pathogenic and benign variants on a protein&#x2019;s binding ability to be compared. Analysis of the docking simulations showed that in 3 of the 5 proteins studied (60%), more pathogenic than benign mutations returned a binding energy deviating by at least 15% from the wild-type binding energy. These results suggest that the binding interactions of a protein could be affected by polymorphic variation, especially pathogenic variation, although in this case study the difference between the two groups was not statistically significant.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Precision Medicine, Bioinformatics</keywords><publishedDay>10</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-02-10</publishedDate><doi/><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Mullins, Jonathan ; Ferla, Salvatore</supervisor><degreelevel>Master of Research</degreelevel><degreename>MSc by Research</degreename><apcterm/><funders/><projectreference/><lastEdited>2023-02-14T12:21:03.9803698</lastEdited><Created>2023-02-14T11:52:52.3606059</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Biomedical Science</level></path><authors><author><firstname>HENRY</firstname><surname>ASOMUGHA</surname><order>1</order></author></authors><documents><document><filename>62661__26585__ff3b0f4446544fd193017cfeeb36ca1c.pdf</filename><originalFilename>Asomugha_Henry_MSc_Research_Thesis_Final_Redacted_Signature.pdf</originalFilename><uploaded>2023-02-14T12:17:01.4829239</uploaded><type>Output</type><contentLength>1235859</contentLength><contentType>application/pdf</contentType><version>E-Thesis &#x2013; open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The author, Henry Asomugha, 2023.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine
author_id_str_mv dd1bef9cd6bf4529ae2ad12cf3dd2773
author_id_fullname_str_mv dd1bef9cd6bf4529ae2ad12cf3dd2773_***_HENRY ASOMUGHA
author HENRY ASOMUGHA
author2 HENRY ASOMUGHA
format E-Thesis
publishDate 2023
institution Swansea University
college_str Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id facultyofmedicinehealthandlifesciences
hierarchy_top_title Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id facultyofmedicinehealthandlifesciences
hierarchy_parent_title Faculty of Medicine, Health and Life Sciences
department_str Swansea University Medical School - Biomedical Science, Faculty of Medicine, Health and Life Sciences
document_store_str 1
active_str 0
description This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. It outputs gene and variant information for all known human protein drug targets and retrieves information on the compounds, disorders and drugs associated with each protein. The database was built by writing code, in Linux, that web-scrapes multiple databases and stores the results in the pipeline database. The databases scraped were UniProt, ClinVar, PubChem, ChEMBL, the Guide to Pharmacology, MedGen and the Therapeutic Target Database. These were selected because, for the type of information each was scraped for, they provided the largest and most reliable data that could be web-scraped. Once the code was written, the full pharmacology set (720 pharmacologically relevant proteins whose pharmacological mechanism is known) was added to the pipeline, meaning that all of their information was downloaded and stored. The pipeline proved effective as an investigative tool for identifying new avenues for personalised medicine, as it retrieved and integrated all the requested information on proteins, variants, diseases and compounds when called upon. In the proof-of-concept study, the database was used to gather the key information needed to investigate the effect of pathogenic variants on drug binding in proteins. For each protein, the binding of the wild type to two of its known drug ligands was simulated. Ten benign variants and ten pathogenic variants of the protein were also docked to two drug ligands associated with the protein, and their relative binding energies were collected. This allowed the effects of pathogenic and benign variants on a protein’s binding ability to be compared. Analysis of the docking simulations showed that in 3 of the 5 proteins studied (60%), more pathogenic than benign mutations returned a binding energy deviating by at least 15% from the wild-type binding energy. These results suggest that the binding interactions of a protein could be affected by polymorphic variation, especially pathogenic variation, although in this case study the difference between the two groups was not statistically significant.
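The 15% criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's code: the function names are hypothetical, the binding energies are invented for illustration only, and the deviation is assumed here to be the absolute difference from the wild-type energy divided by the magnitude of the wild-type energy, which the abstract does not spell out.

```python
# Illustrative sketch only: names, values and the exact deviation formula
# are assumptions, not taken from the thesis.

def relative_deviation(variant_energy: float, wildtype_energy: float) -> float:
    """Absolute difference from the wild-type energy, as a fraction of its magnitude."""
    return abs(variant_energy - wildtype_energy) / abs(wildtype_energy)


def count_deviating(variant_energies, wildtype_energy, threshold=0.15):
    """Count variants whose binding energy deviates by at least `threshold`."""
    return sum(
        1
        for energy in variant_energies
        if relative_deviation(energy, wildtype_energy) >= threshold
    )


# Hypothetical docking energies (kcal/mol) for one protein and one ligand.
wildtype = -8.0
pathogenic = [-6.5, -8.1, -9.5, -7.9]
benign = [-7.8, -8.2, -6.9, -8.0]

n_pathogenic = count_deviating(pathogenic, wildtype)
n_benign = count_deviating(benign, wildtype)

# A protein is counted towards the "3 of 5" style tally when more
# pathogenic than benign variants cross the threshold.
print(n_pathogenic, n_benign, n_pathogenic > n_benign)
```

Repeating this comparison per protein and counting how often the pathogenic group crosses the threshold more often than the benign group gives a tally of the kind reported above; it says nothing about statistical significance, which would need a separate test.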
published_date 2023-02-10T04:22:27Z