RaGOO: fast and accurate reference-guided scaffolding of draft genomes

Published on Oct 28, 2019 in Genome Biology (IF: 13.583)
DOI: 10.1186/S13059-019-1829-6
Michael Alonge (Johns Hopkins University), estimated H-index: 10
Sebastian Soyk (CSHL: Cold Spring Harbor Laboratory), estimated H-index: 12
+ 5 authors
Michael C. Schatz (Johns Hopkins University), estimated H-index: 73
Abstract
We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open source at https://github.com/malonge/RaGOO .
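The abstract describes ordering and orienting contigs against a reference using Minimap2 alignments. The following is a minimal Python sketch of that general idea, not RaGOO's actual implementation. It assumes the alignments have already been parsed (for example, from Minimap2's PAF output, whose tenth column gives the number of residue matches); the `Alignment` record, the function names, the majority-vote rules, and the 100-N default gap are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """One contig-to-reference hit (hypothetical record, e.g. parsed from a PAF line)."""
    contig: str     # query (contig) name
    chrom: str      # reference chromosome name
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference
    strand: str     # '+' or '-'
    matches: int    # matched bases supporting this hit


def order_and_orient(alignments: List[Alignment]) -> Dict[str, List[Tuple[str, str]]]:
    """Assign, orient, and order contigs; returns {chrom: [(contig, strand), ...]}."""
    by_contig: Dict[str, List[Alignment]] = defaultdict(list)
    for aln in alignments:
        by_contig[aln.contig].append(aln)

    placements: Dict[str, List[Tuple[int, str, str]]] = defaultdict(list)
    for contig, alns in by_contig.items():
        # Grouping (assumed rule): chromosome with the most matched bases wins.
        support: Dict[str, int] = defaultdict(int)
        for a in alns:
            support[a.chrom] += a.matches
        chrom = max(support, key=support.get)
        hits = [a for a in alns if a.chrom == chrom]

        # Orientation (assumed rule): strand with more matched bases wins; ties -> '+'.
        fwd = sum(a.matches for a in hits if a.strand == "+")
        rev = sum(a.matches for a in hits if a.strand == "-")
        strand = "+" if fwd >= rev else "-"

        # Location (assumed rule): reference start of the longest hit on that chromosome.
        anchor = max(hits, key=lambda a: a.matches).ref_start
        placements[chrom].append((anchor, contig, strand))

    # Sort each chromosome's contigs by anchor position, then drop the anchor.
    return {c: [(ctg, s) for _, ctg, s in sorted(p)] for c, p in placements.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def build_pseudomolecules(layout, contig_seqs, gap: int = 100):
    """Join placed contigs per chromosome, separated by runs of `gap` Ns (assumed padding)."""
    pseudo = {}
    for chrom, placed in layout.items():
        parts = []
        for contig, strand in placed:
            seq = contig_seqs[contig]
            # Reverse-complement contigs placed on the reverse strand.
            parts.append(seq if strand == "+" else seq.translate(_COMPLEMENT)[::-1])
        pseudo[chrom] = ("N" * gap).join(parts)
    return pseudo


if __name__ == "__main__":
    alns = [
        Alignment("ctg1", "chr1", 5000, 9000, "+", 3900),
        Alignment("ctg2", "chr1", 100, 4000, "-", 3800),
        Alignment("ctg2", "chr2", 200, 500, "+", 250),  # weaker off-target hit
        Alignment("ctg3", "chr2", 1000, 3000, "+", 1900),
    ]
    layout = order_and_orient(alns)
    print(layout)  # {'chr1': [('ctg2', '-'), ('ctg1', '+')], 'chr2': [('ctg3', '+')]}
    seqs = {"ctg1": "ACGT", "ctg2": "AACC", "ctg3": "GGG"}
    print(build_pseudomolecules(layout, seqs, gap=3))  # {'chr1': 'GGTTNNNACGT', 'chr2': 'GGG'}
```

Weighting each decision by matched bases is one simple way to let a long, consistent alignment outweigh short spurious hits such as repeats; a real tool would also need confidence thresholds and handling for unplaced contigs, which this sketch omits.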
References (65)
#1 Michael Alonge, H-index: 10
#2 Wouter De Coster, H-index: 1
Last author: Michael C. Schatz, H-index: 73
4 authors in total
#1 Jay Ghurye (UMD: University of Maryland, College Park), H-index: 15
#2 Arang Rhie (NIH: National Institutes of Health), H-index: 20
Last author: Sergey Koren (NIH: National Institutes of Health), H-index: 56
8 authors in total
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain higher than alternate scaffolding technologies. We present a novel open-source Hi-C ...
#1 Aaron M. Wenger (Pacific Biosciences), H-index: 12
#2 Paul Peluso (Pacific Biosciences), H-index: 24
Last author: Michael W. Hunkapiller (Pacific Biosciences), H-index: 52
28 authors in total
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall r...
#1 Fabien Dutreux (Université Paris-Saclay), H-index: 1
#2 Corinne Da Silva (Université Paris-Saclay), H-index: 47
Last author: Jean-Marc Aury, H-index: 56
14 authors in total
De novo assembly and annotation of three Leptosphaeria genomes using Oxford Nanopore MinION sequencing
#1 Mikhail Kolmogorov (UCSD: University of California, San Diego), H-index: 13
#2 Joel Armstrong (UCSC: University of California, Santa Cruz), H-index: 23
Last author: Son Pham, H-index: 15
12 authors in total
Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. By taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout 2 infers the evolutionary relationships between t...
#1 Heng Li (Broad Institute), H-index: 62
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ~100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms. Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database...
#1 Fritz J. Sedlazeck (BCM: Baylor College of Medicine), H-index: 33
#2 Hayan Lee (Stanford University), H-index: 13
Last author: Michael C. Schatz (Johns Hopkins University), H-index: 73
4 authors in total
Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overc...
#1 Todd P. Michael (JCVI: J. Craig Venter Institute), H-index: 51
#2 Florian Jupe (Monsanto), H-index: 16
Last author: Joseph R. Ecker (Salk Institute for Biological Studies), H-index: 156
9 authors in total
The handheld Oxford Nanopore MinION sequencer generates ultra-long reads with minimal cost and time requirements, which makes sequencing genomes at the bench feasible. Here, we sequence the gold standard Arabidopsis thaliana genome (KBS-Mac-74 accession) on the bench with the MinION sequencer, and assemble the genome using typical consumer computing hardware (4 Cores, 16 Gb RAM) into chromosome arms (62 contigs with an N50 length of 12.3 Mb). We validate the contiguity and quality of the assembl...
#1 Tong Geon Lee (UF: University of Florida), H-index: 9
#2 Reza Shekasteband (UF: University of Florida), H-index: 4
Last author: Samuel F. Hutton (UF: University of Florida), H-index: 17
5 authors in total
#1 Miten Jain (UCSC: University of California, Santa Cruz), H-index: 22
#2 Sergey Koren (NIH: National Institutes of Health), H-index: 56
Last author: Matthew Loose (University of Nottingham), H-index: 26
26 authors in total
We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ~30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ~3 Mb). Next, we developed a protocol to generate ultra-long reads (N...
Cited By (154)
#1 Benjamin Jaegle (Austrian Academy of Sciences), H-index: 1
#2 Luz Mayela Soto-Jiménez (Austrian Academy of Sciences), H-index: 3
Last author: Magnus Nordborg (Austrian Academy of Sciences), H-index: 74
5 authors in total
Background: It is becoming apparent that genomes harbor massive amounts of structural variation, and that this variation has largely gone undetected for technical reasons. In addition to being inherently interesting, structural variation can cause artifacts when short-read sequencing data are mapped to a reference genome. In particular, spurious SNPs (that do not show Mendelian segregation) may result from mapping of reads to duplicated regions. Recalling SNP using the raw reads of the 1001...
#1 Matthew Naish (University of Cambridge), H-index: 2
#2 Michael Alonge (Johns Hopkins University), H-index: 10
Last author: Pallas C. Kuo (University of Cambridge), H-index: 4
24 authors in total
Centromeres attach chromosomes to spindle microtubules during cell division and, despite this conserved role, show paradoxically rapid evolution and are typified by complex repeats. We used long-re...
#1 Thomas Wöhner (JKI: Julius Kühn-Institut), H-index: 10
#2 Ofere Francis Emeriewen (JKI: Julius Kühn-Institut), H-index: 8
Last author: Janne Lempe (JKI: Julius Kühn-Institut)
17 authors in total
Cherries are among the most popular fruits among consumers and are grown for industrial processing or fresh consumption. The cultivation and breeding of cherries faces new challenges in the future, not least due to climate change. Cultivation is becoming increasingly difficult due to changing climatic conditions, diseases and pests. Therefore, the market demands new varieties with high fruit quality and adaptation to locally changing conditions. Breeding for tree fruit is a long-term task, thoug...
#1 Alexander S Leonard (ETH Zurich), H-index: 1
#2 Danang Crysnanto (ETH Zurich), H-index: 5
Last author: Hubert Pausch (ETH Zurich), H-index: 22
12 authors in total
Advantages of pangenomes over linear reference assemblies for genome research have recently been established. However, potential effects of sequence platform and assembly approach, or of combining assemblies created by different approaches, on pangenome construction have not been investigated. Ten haplotype-resolved assemblies of three bovine trios representing increasing levels of heterozygosity were generated that each demonstrate a substantial improvement in contiguity and accuracy over the c...
#1 Hilary S. Ireland (Plant & Food Research), H-index: 10
#2 Chen Wu (Plant & Food Research)
Last author: David Chagné (Plant & Food Research), H-index: 45
10 authors in total
The Rosaceae family has striking phenotypic diversity and high syntenic conservation. Gillenia trifoliata is sister species to the Maleae tribe of apple and ~1000 other species. Gillenia has many putative ancestral features, such as herb/sub-shrub habit, dry fruit-bearing and nine base chromosomes. This coalescence of ancestral characters in a phylogenetically important species, positions Gillenia as a ‘rosetta stone’ for translational science within Rosaceae. We present genomic and phenological...
#1 Luc Cornet, H-index: 8
Last author: Pierre Becker, H-index: 13
7 authors in total
The medically relevant Trichophyton rubrum species complex has a variety of phenotypic presentations but shows relatively little genetic differences. Conventional barcodes, such as the internal transcribed spacer (ITS) region or the beta-tubulin gene, are not able to completely resolve the relationships between these closely related taxa. T. rubrum, T. soudanense and T. violaceum are currently accepted as separate species. However, the status of certain variants, including the T. rubrum morphoty...
#1 Glaucia Del-Rio (LSU: Louisiana State University), H-index: 5
#2 Marco Antonio Rego (LSU: Louisiana State University), H-index: 8
Last author: Robb T. Brumfield (LSU: Louisiana State University), H-index: 56
7 authors in total
Secondary contact between species often results in the formation of a hybrid zone, with the eventual fates of the hybridizing species dependent on evolutionary and ecological forces. We examine this process in the Amazon Basin by conducting the first genomic and phenotypic characterization of the hybrid zone formed after secondary contact between two obligate army-ant-followers: the White-breasted Antbird (Rhegmatorhina hoffmannsi) and the Harlequin Antbird (Rhegmatorhina berlepschi). We found a...
The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionary related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classificati...
#1 Hatim Almutairi (Lancaster University), H-index: 1
#2 Michael D. Urbaniak (Lancaster University), H-index: 19
Last author: Derek Gatherer (Lancaster University), H-index: 33
7 authors in total
Porcisia hertigi is a parasitic kinetoplastid first isolated from porcupines (Coendou rothschildi) in central Panama in 1965. We present the complete genome sequence of P. hertigi, isolate C119, strain LV43, sequenced using combined short- and long-read technologies. This complete genome sequence will contribute to our knowledge of the parasitic genus Porcisia.
#1 Marie Buysse, H-index: 7
Last author: Olivier Duron, H-index: 32
13 authors in total
Many animals are dependent on microbial partners that provide essential nutrients lacking from their diet. Ticks, whose diet consists exclusively on vertebrate blood, rely on maternally inherited bacterial symbionts to supply B vitamins. While previously studied tick species consistently harbor a single lineage of those nutritional symbionts, we evidence here that the invasive tick Hyalomma marginatum harbors a unique dual-partner nutritional system between an ancestral symbiont, Francisella, an...