Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy

Published on Sep 18, 2019 in bioRxiv
DOI: 10.1101/771964
Donovan H. Parks (UQ: University of Queensland), Maria Chuvochina (UQ: University of Queensland), + 3 authors, Philip Hugenholtz (UQ: University of Queensland)
We recently introduced the Genome Taxonomy Database (GTDB), a phylogenetically consistent, genome-based taxonomy providing rank-normalized classifications for nearly 150,000 genomes from domain to genus. However, nearly 40% of the genomes used to infer the GTDB reference tree lack a species name, reflecting the large number of genomes in public repositories without complete taxonomic assignments. Here we address this limitation by proposing 24,706 species clusters that encompass all publicly available bacterial and archaeal genomes when commonly accepted average nucleotide identity (ANI) criteria are used to circumscribe species. In contrast to previous ANI studies, we selected a single representative genome to serve as the nomenclatural type for circumscribing each species, using type strains where available. We complemented the 8,792 species clusters that have validly or effectively published names with 15,914 de novo species clusters in order to assign placeholder names to the growing number of genomes from uncultivated species. This provides the first complete domain-to-species taxonomic framework, which will improve communication of scientific results.
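The abstract does not spell out the clustering procedure, so the following is only a hypothetical sketch of representative-based species clustering. It assumes precomputed pairwise ANI and alignment-fraction (AF) values, a species boundary of 95% ANI with an AF of at least 0.65, and a simple greedy pass that prefers type strains and then higher-quality genomes when choosing representatives. The function names, thresholds, and quality score are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumed species boundary (hypothetical values, not taken from the abstract).
ANI_THRESHOLD = 95.0  # percent identity
AF_THRESHOLD = 0.65   # minimum alignment fraction

# Symmetric lookup: {genome_a, genome_b} -> (ANI in %, alignment fraction)
PairStats = Dict[FrozenSet[str], Tuple[float, float]]


def same_species(a: str, b: str, stats: PairStats) -> bool:
    """True if a and b meet both the ANI and AF thresholds."""
    ani, af = stats.get(frozenset((a, b)), (0.0, 0.0))
    return ani >= ANI_THRESHOLD and af >= AF_THRESHOLD


def cluster_species(genomes: List[str], type_strains: Set[str],
                    quality: Dict[str, float],
                    stats: PairStats) -> Dict[str, List[str]]:
    """Greedy clustering; returns {representative: [member genomes]}."""
    # Type strains are considered first, then genomes in order of decreasing quality.
    order = sorted(genomes, key=lambda g: (g not in type_strains, -quality[g]))
    reps: List[str] = []
    for g in order:
        # A genome becomes a new representative only if no existing
        # representative falls within the species boundary.
        if not any(same_species(g, r, stats) for r in reps):
            reps.append(g)

    clusters: Dict[str, List[str]] = {r: [r] for r in reps}
    for g in genomes:
        if g in clusters:
            continue
        # Every non-representative matched at least one representative above,
        # so assign it to the qualifying representative with the highest ANI.
        candidates = [r for r in reps if same_species(g, r, stats)]
        best = max(candidates, key=lambda r: stats[frozenset((g, r))][0])
        clusters[best].append(g)
    return clusters


if __name__ == "__main__":
    genomes = ["A", "B", "C", "D"]
    type_strains = {"B"}
    quality = {"A": 99.0, "B": 90.0, "C": 95.0, "D": 98.0}
    stats = {
        frozenset(("A", "B")): (97.5, 0.80),
        frozenset(("C", "D")): (96.0, 0.70),
        frozenset(("A", "C")): (80.0, 0.30),
    }
    print(cluster_species(genomes, type_strains, quality, stats))
    # prints {'B': ['B', 'A'], 'D': ['D', 'C']}
```

In this toy example, type strain B becomes the representative of its cluster even though genome A has a higher quality score. This mirrors the abstract's preference for type strains as nomenclatural types. A greedy pass is one simple way to pick representatives, and real pipelines may order or refine them differently.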