E-Thesis
Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine / HENRY ASOMUGHA
Swansea University Author: HENRY ASOMUGHA
Abstract
This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. This database was able to output gene and variant information of all the known human protein drug targets, while also retrieving...
Published: | Swansea 2023 |
---|---|
Institution: | Swansea University |
Degree level: | Master of Research |
Degree name: | MSc by Research |
Supervisor: | Mullins, Jonathan ; Ferla, Salvatore |
URI: | https://cronfa.swan.ac.uk/Record/cronfa62661 |
first_indexed | 2023-02-14T12:12:43Z |
---|---|
last_indexed | 2023-02-15T04:17:20Z |
id | cronfa62661 |
recordtype | RisThesis |
fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2023-02-14T12:21:03.9803698</datestamp><bib-version>v2</bib-version><id>62661</id><entry>2023-02-14</entry><title>Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine</title><swanseaauthors><author><sid>dd1bef9cd6bf4529ae2ad12cf3dd2773</sid><firstname>HENRY</firstname><surname>ASOMUGHA</surname><name>HENRY ASOMUGHA</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-02-14</date><abstract>This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. This database was able to output gene and variant information of all the known human protein drug targets, while also retrieving information on compounds, disorders and drugs associated with each protein. This database was created by writing code, in Linux, that was able to web-scrape multiple databases for information and store it in the pipeline database. The databases that were scraped were UniProt, ClinVar, PubChem, chEMBL, guide to pharmacology, MedGen and the therapeutic target database. These databases were selected as they provided the largest and most reliable data, that could be web-scraped, for each section they were scraped for. Once coded, the full pharmacology set (a set of 720 pharmacologically relevant proteins, whose pharmacological mechanism is known) was added to the pipeline, meaning all their information was downloaded and stored in the pipeline. This bioinformatics pipeline proved to be very effective as an investigative tool for identifying new avenues for personalised medicine as it was able to retrieve and integrate all the requested information on proteins, variants, diseases, and compounds when called upon. 
In the proof-of-concept study, the database was used to gather key information that allowed for an investigation into the effect of pathogenic variants on drug binding in proteins. This investigation was conducted by simulating the binding of a protein’s wild type to two of its known drug ligands. 10 benign variants and 10 pathogenic variants of the protein were also bound to 2 drug ligands associated with the protein; their relative binding energies were collected. This allowed for comparisons to be made between the effect of pathogenic variants and benign variants on a protein’s binding ability. Analysis from the docking simulations showed that in 3 of the 5 proteins studied (60%), more pathogenic mutations returned a binding energy with at least a 15% deviation from the wildtype binding energy than benign mutations. These results suggest that the binding interactions of a protein could be affected by polymorphic variation, especially pathogenic variation, although in this case study the difference between the two groups did not show statistical significance.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Precision Medicine, Bioinformatics</keywords><publishedDay>10</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-02-10</publishedDate><doi/><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Mullins, Jonathan ; Ferla, Salvatore</supervisor><degreelevel>Master of Research</degreelevel><degreename>MSc by Research</degreename><apcterm/><funders/><projectreference/><lastEdited>2023-02-14T12:21:03.9803698</lastEdited><Created>2023-02-14T11:52:52.3606059</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level 
id="2">Swansea University Medical School - Biomedical Science</level></path><authors><author><firstname>HENRY</firstname><surname>ASOMUGHA</surname><order>1</order></author></authors><documents><document><filename>62661__26585__ff3b0f4446544fd193017cfeeb36ca1c.pdf</filename><originalFilename>Asomugha_Henry_MSc_Research_Thesis_Final_Redacted_Signature.pdf</originalFilename><uploaded>2023-02-14T12:17:01.4829239</uploaded><type>Output</type><contentLength>1235859</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The author, Henry Asomugha, 2023.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
title | Development of a combined database / in silico pipeline for the investigation of novel approaches for precision medicine |
author_id_str_mv | dd1bef9cd6bf4529ae2ad12cf3dd2773 |
author_id_fullname_str_mv | dd1bef9cd6bf4529ae2ad12cf3dd2773_***_HENRY ASOMUGHA |
author | HENRY ASOMUGHA |
author2 | HENRY ASOMUGHA |
format | E-Thesis |
publishDate | 2023 |
institution | Swansea University |
college_str | Faculty of Medicine, Health and Life Sciences |
hierarchytype | |
hierarchy_top_id | facultyofmedicinehealthandlifesciences |
hierarchy_top_title | Faculty of Medicine, Health and Life Sciences |
hierarchy_parent_id | facultyofmedicinehealthandlifesciences |
hierarchy_parent_title | Faculty of Medicine, Health and Life Sciences |
department_str | Swansea University Medical School - Biomedical Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Biomedical Science |
document_store_str | 1 |
active_str | 0 |
description |
This project investigates how integrated in silico approaches can be developed to advance precision medicine. A novel bioinformatics database was created and implemented. This database was able to output gene and variant information for all known human protein drug targets, while also retrieving information on the compounds, disorders and drugs associated with each protein. The database was built by writing code, in Linux, that web-scraped multiple databases for information and stored it in the pipeline database. The databases scraped were UniProt, ClinVar, PubChem, ChEMBL, the Guide to Pharmacology, MedGen and the Therapeutic Target Database. These were selected because, for the section each was scraped for, they provided the largest and most reliable data that could be web-scraped. Once the code was written, the full pharmacology set (a set of 720 pharmacologically relevant proteins whose pharmacological mechanism is known) was added to the pipeline, meaning all of their information was downloaded and stored in the pipeline. The pipeline proved effective as an investigative tool for identifying new avenues for personalised medicine, as it was able to retrieve and integrate all the requested information on proteins, variants, diseases and compounds when called upon.
In the proof-of-concept study, the database was used to gather key information for an investigation into the effect of pathogenic variants on drug binding in proteins. The investigation simulated the binding of a protein’s wild type to two of its known drug ligands. Ten benign variants and ten pathogenic variants of the protein were also bound to two drug ligands associated with the protein, and their relative binding energies were collected. This allowed the effects of pathogenic and benign variants on a protein’s binding ability to be compared. Analysis of the docking simulations showed that in 3 of the 5 proteins studied (60%), more pathogenic mutations than benign mutations returned a binding energy deviating by at least 15% from the wild-type binding energy. These results suggest that the binding interactions of a protein could be affected by polymorphic variation, especially pathogenic variation, although in this case study the difference between the two groups was not statistically significant. |
published_date | 2023-02-10T04:22:27Z |
_version_ | 1763754479605252096 |
score | 10.970258 |
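
The abstract's comparison criterion (a binding energy deviating by at least 15% from the wild type) can be illustrated with a minimal Python sketch. The record does not say how the thesis computed the deviation, so the relative-difference formula here is an assumption, and the function names and energy values are hypothetical, not data from the thesis.

```python
def relative_deviation(variant_energy, wild_type_energy):
    """Fractional deviation of a variant's binding energy from the wild type.

    Assumption: deviation is measured relative to the magnitude of the
    wild-type energy; the thesis record does not state the exact formula.
    """
    return abs(variant_energy - wild_type_energy) / abs(wild_type_energy)


def count_deviating(variant_energies, wild_type_energy, threshold=0.15):
    """Count variants whose deviation is at least `threshold` (15% by default)."""
    return sum(
        relative_deviation(energy, wild_type_energy) >= threshold
        for energy in variant_energies
    )


# Hypothetical docking energies (illustrative only, not thesis results).
wild_type = -8.0
benign = [-8.1, -7.9, -8.4, -7.0]
pathogenic = [-6.5, -9.5, -8.0, -6.0]

benign_hits = count_deviating(benign, wild_type)
pathogenic_hits = count_deviating(pathogenic, wild_type)
print(f"benign: {benign_hits}, pathogenic: {pathogenic_hits}")
```

Repeating this count per protein and comparing the two totals mirrors the kind of tally the abstract reports. A count alone does not establish statistical significance.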