Journal article

A Reference Pan-Genome Approach to Comparative Bacterial Genomics: Identification of Novel Epidemiological Markers in Pathogenic Campylobacter

Guillaume Méric, Koji Yahara, Leonardos Mageiros, Ben Pascoe, Martin C. J Maiden, Keith A Jolley, Samuel K Sheppard

PLoS ONE, Volume: 9, Issue: 3, Start page: e92798

Swansea University Author: Ben Pascoe

Full text not available from this repository: check for access using links below.

DOI (Published version): 10.1371/journal.pone.0092798

Abstract

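The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating it to disease epidemiology. Existing approaches typically focus on either homologous sequence variation in genes that are shared by all isolates, or non-homologous sequence variation - focusing on genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates a 7-isolate reference pan-genome, of 3,933 loci, was defined. A core genome of 1,035 genes was ubiquitous in the sample accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci with gene presence defined as a BLAST match of ≥70% identity over ≥50% of the locus length - aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.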
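The presence rule quoted in this record's abstract (a BLAST match of ≥70% identity over ≥50% of the locus length) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Hit` record, the function names, and the reading of "present only in" one species as "found in at least one genome of that species and in none of the other" are assumptions made for illustration only. The sketch also assumes the BLAST hits have already been parsed into simple records.

```python
from dataclasses import dataclass

MIN_IDENTITY = 70.0  # percent identity threshold, as stated in the abstract
MIN_COVERAGE = 0.50  # fraction of the locus length, as stated in the abstract


@dataclass(frozen=True)
class Hit:
    """One best BLAST hit of a reference locus in a genome (hypothetical record)."""
    genome: str
    locus: str
    identity: float    # percent identity of the alignment
    align_length: int  # aligned length on the reference locus
    locus_length: int  # full length of the reference locus


def is_present(hit: Hit) -> bool:
    """Apply the identity and coverage thresholds to a single hit."""
    coverage = hit.align_length / hit.locus_length
    return hit.identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


def presence_matrix(hits):
    """Map each locus to the set of genomes in which it counts as present."""
    present = {}
    for hit in hits:
        if is_present(hit):
            present.setdefault(hit.locus, set()).add(hit.genome)
    return present


def core_loci(present, genomes):
    """Loci found in every genome of the sample (the core genome)."""
    all_genomes = set(genomes)
    return {locus for locus, found in present.items() if found >= all_genomes}


def species_specific(present, species_of):
    """Group loci that occur in genomes of exactly one species (assumed reading)."""
    result = {}
    for locus, found in present.items():
        species = {species_of[genome] for genome in found}
        if len(species) == 1:
            result.setdefault(species.pop(), set()).add(locus)
    return result
```

Under these assumptions, loci returned by `presence_matrix` but not by `core_loci` would correspond to the accessory genome, and `species_specific` gives candidate species markers of the kind the abstract reports for C. coli and C. jejuni.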

Published in: PLoS ONE
Published: 2014
Online Access: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092798
URI: https://cronfa.swan.ac.uk/Record/cronfa16973
first_indexed 2014-03-04T02:30:09Z
last_indexed 2018-02-09T04:50:09Z
id cronfa16973
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2014-03-28T00:56:08.3033927</datestamp><bib-version>v2</bib-version><id>16973</id><entry>2014-01-20</entry><title>A Reference Pan-Genome Approach to Comparative Bacterial Genomics: Identification of Novel Epidemiological Markers in Pathogenic Campylobacter</title><swanseaauthors><author><sid>4660c0eb7e6bfd796cd749ae713ea558</sid><ORCID>0000-0001-6376-5121</ORCID><firstname>Ben</firstname><surname>Pascoe</surname><name>Ben Pascoe</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2014-01-20</date><deptcode>PMSC</deptcode><abstract>The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating it to disease epidemiology. Existing approaches typically focus on either homologous sequence variation in genes that are shared by all isolates, or non-homologous sequence variation - focusing on genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates a 7-isolate reference pan-genome, of 3,933 loci, was defined. A core genome of 1,035 genes was ubiquitous in the sample accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. 
A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci with gene presence defined as a BLAST match of &#x2265;70% identity over &#x2265;50% of the locus length - aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.</abstract><type>Journal Article</type><journal>PLoS ONE</journal><volume>9</volume><journalNumber>3</journalNumber><paginationStart>e92798</paginationStart><publisher/><keywords>Campylobacter; coli; genomes; epidemiology; pathogenic bacteria</keywords><publishedDay>31</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2014</publishedYear><publishedDate>2014-12-31</publishedDate><doi>10.1371/journal.pone.0092798</doi><url>http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092798</url><notes/><college>COLLEGE NANME</college><department>Medicine</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>PMSC</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2014-03-28T00:56:08.3033927</lastEdited><Created>2014-01-20T15:34:09.2718275</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - 
Medicine</level></path><authors><author><firstname>Guillaume</firstname><surname>M&#xE9;ric</surname><order>1</order></author><author><firstname>Koji</firstname><surname>Yahara</surname><order>2</order></author><author><firstname>Leonardos</firstname><surname>Mageiros</surname><order>3</order></author><author><firstname>Ben</firstname><surname>Pascoe</surname><orcid>0000-0001-6376-5121</orcid><order>4</order></author><author><firstname>Martin C. J</firstname><surname>Maiden</surname><order>5</order></author><author><firstname>Keith A</firstname><surname>Jolley</surname><order>6</order></author><author><firstname>Samuel K</firstname><surname>Sheppard</surname><order>7</order></author></authors><documents/><OutputDurs/></rfc1807>
title A Reference Pan-Genome Approach to Comparative Bacterial Genomics: Identification of Novel Epidemiological Markers in Pathogenic Campylobacter
author_id_str_mv 4660c0eb7e6bfd796cd749ae713ea558
author_id_fullname_str_mv 4660c0eb7e6bfd796cd749ae713ea558_***_Ben Pascoe
author Ben Pascoe
author2 Guillaume Méric
Koji Yahara
Leonardos Mageiros
Ben Pascoe
Martin C. J Maiden
Keith A Jolley
Samuel K Sheppard
format Journal article
container_title PLoS ONE
container_volume 9
container_issue 3
container_start_page e92798
publishDate 2014
institution Swansea University
doi_str_mv 10.1371/journal.pone.0092798
college_str Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id facultyofmedicinehealthandlifesciences
hierarchy_top_title Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id facultyofmedicinehealthandlifesciences
hierarchy_parent_title Faculty of Medicine, Health and Life Sciences
department_str Swansea University Medical School - Medicine{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Medicine
url http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092798
document_store_str 0
active_str 0
description The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating it to disease epidemiology. Existing approaches typically focus on either homologous sequence variation in genes that are shared by all isolates, or non-homologous sequence variation - focusing on genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates a 7-isolate reference pan-genome, of 3,933 loci, was defined. A core genome of 1,035 genes was ubiquitous in the sample accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci with gene presence defined as a BLAST match of ≥70% identity over ≥50% of the locus length - aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.
published_date 2014-12-31T03:19:29Z
_version_ 1763750518176350208
score 11.036553