DENOVOSEQ - DENOVO ASSEMBLY


De novo assembly is the process of assembling short nucleotide sequences into longer ones without using a reference genome. It is the approach most commonly used in studies that characterize a genome or transcriptome for the first time, when no reference is available. For this task, DeNovoSeq provides interfaces to the most common de novo assemblers for building high-quality transcriptomes and genomes, together with additional tools, such as gap filling and scaffolding, that improve the quality and accuracy of genome assemblies.

3.1 - INPUT CONFIGURATION FILE

DeNovoSeq requires an input configuration file to guide the assembly process. You can use an existing configuration file or create a new one as follows.

To create a new Input configuration file with DeNovoSeq go to:

           [ De novo assembly → Input data configuration ]

Figure 13: Animated GIF of using a Configuration file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

Note: Every de novo assembly tool needs a configuration file. Complete this step before running any of the following tools.

3.2 - ASSEMBLY

The current version of DeNovoSeq implements interfaces to six de novo assemblers: two for transcriptomes (Oases and SOAPdenovo-Trans) and four for genomes (Velvet, SOAPdenovo2, Canu and SPAdes). All of them are based on de Bruijn graph algorithms except Canu, which uses an overlap-based approach designed for long reads.
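
As a rough illustration of the de Bruijn graph idea behind most of these assemblers, the following toy sketch splits reads into k-mers, links each (k-1)-mer prefix to its suffix, and walks the unbranched path to spell a contig. The function names, the reads and the value of k are made up for this example. It ignores sequencing errors, reverse complements and incoming branches, so it does not reflect how any of the assemblers above is actually implemented.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a de Bruijn graph as {(k-1)-mer prefix: [(k-1)-mer suffixes]}."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unbranched(graph, start):
    """Extend a contig from `start` while each node has one distinct successor."""
    contig = start
    node = start
    visited = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch: stop extending
        node = successors.pop()
        if node in visited:
            break  # cycle: stop to avoid looping forever
        visited.add(node)
        contig += node[-1]
    return contig


# Toy, error-free, overlapping reads (hypothetical data).
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)
```

Real assemblers add error correction, coverage-based filtering and repeat resolution on top of this basic structure, which is why the choice of k-mer size and coverage cutoffs in the configuration matters.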

- Transcriptomes: Oases

Oases (Schulz et al., 2012) is a de novo transcriptome assembler built on the core of the Velvet genome assembler. It aims to resolve transcripts from short-read sequencing technologies, such as Illumina, SOLiD or 454, in the absence of any genomic assembly. See the Oases manual at https://www.ebi.ac.uk/~zerbino/oases/OasesManual.pdf for more information.

To run Oases with DeNovoSeq go to:

           [ De novo assembly → Assembly → Transcriptomes → Oases ]

Figure 14: Animated GIF of using an Oases file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

- Transcriptomes: SOAPdenovo-Trans

SOAPdenovo-Trans (Luo et al., 2012) is a de novo transcriptome assembler based on the SOAPdenovo framework, adapted to handle alternative splicing and differing expression levels among transcripts. See the SOAPdenovo-Trans manual at https://github.com/aquaskyline/SOAPdenovo-Trans for more information.

To run SOAPdenovo-Trans with DeNovoSeq go to:

           [ De novo assembly → Assembly → Transcriptomes → SOAPdenovo-Trans ]

Figure 15: Animated GIF of using a SOAP De Novo Trans file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

- Genomes: Velvet

Velvet (Zerbino and Birney, 2008) is a de novo genome assembler designed for short-read sequencing technologies, such as Solexa or 454. Velvet takes short-read sequences and produces high-quality contigs. See the Velvet manual at https://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf for more information.

To run Velvet with DeNovoSeq go to:

           [ De novo assembly → Assembly → Genomes → Velvet ]

Figure 16: Animated GIF of using a Velvet file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

- Genomes: SOAPdenovo2

SOAPdenovo2 (Luo et al., 2012) is a short-read assembler designed for Illumina GA short reads. It aims to reduce memory consumption in graph construction, resolve more repeat regions in contig assembly, increase coverage and length in scaffold construction, and improve gap closing. See the SOAPdenovo2 manual at https://github.com/aquaskyline/SOAPdenovo2 for more information.

To run SOAPdenovo with DeNovoSeq go to:

           [ De novo assembly → Assembly → Genomes → SOAPdenovo ]

Figure 17: Animated GIF of using a SOAP De Novo file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

- Genomes: CANU

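Canu (Koren et al., 2017) is a fork of the Celera Assembler designed for high-noise single-molecule sequencing, such as the PacBio RSII or Oxford Nanopore MinION. Canu is a hierarchical assembly pipeline that runs in four steps: detecting overlaps in high-noise sequences using MHAP, generating corrected sequence consensus, trimming the corrected sequences, and assembling the trimmed corrected sequences.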

See http://canu.readthedocs.io/en/latest/tutorial.html for more information.

To run Canu with DeNovoSeq go to:

           [ De novo assembly → Assembly → Genomes → CANU ]

Figure 18: Animated GIF of using a CANU file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

- Genomes: SPAdes

SPAdes (Nurk et al., 2013) is an assembler specifically recommended for reconstructing bacterial genomes (both single-cell MDA and standard isolates), fungal genomes and other small genomes. SPAdes supports paired-end reads, mate-pairs and unpaired reads. See the SPAdes manual at http://cab.spbu.ru/software/spades/ for more information.

To run SPAdes with DeNovoSeq go to:

           [ De novo assembly → Assembly → Genomes → SPAdes ]

Figure 19: Animated GIF of using a SPAdes file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).


3.3 - GAP FILLING

- Gap filling: GapCloser

Because of low sequence coverage or repetitive elements, assemblies reconstructed de novo often contain sequence gaps, represented as stretches of uncharacterized nucleotides (N). Some of these gaps can be closed by re-processing latent information in the raw reads. GapCloser (Luo et al., 2012) closes gaps that emerge during scaffolding by SOAPdenovo or another assembler, using the abundant pair relationships of short reads. See the GapCloser manual for more information.
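
To get a feel for what such gaps look like in practice, the short sketch below scans a sequence for runs of N and reports where they are and how long they are. It is only an inspection helper written for this example (the function name and the sample sequence are hypothetical). It does not close gaps and is not part of GapCloser or DeNovoSeq.

```python
import re


def find_gaps(sequence, min_len=1):
    """Return (start, end, length) for each run of N/n; `end` is exclusive."""
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_len:
            gaps.append((match.start(), match.end(), length))
    return gaps


# Made-up scaffold sequence containing two gaps.
scaffold = "ACGTNNNNNGGCTANNTTGA"
for start, end, length in find_gaps(scaffold):
    print(f"gap at {start}-{end} ({length} bp)")
```

Running the same check before and after gap filling gives a quick way to see how many gaps were closed or shortened.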

To run GapCloser with DeNovoSeq go to:

           [ De novo assembly → Gap filling → GapCloser ]

Figure 20: Animated GIF of using a GapCloser file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

3.4 - SCAFFOLDING

The result of a de novo assembly is a fragmented set of genomic sequences (contigs) that can be re-ordered, edited and joined, using paired-end information, into larger sequence blocks called scaffolds. DeNovoSeq implements interfaces for two alternative scaffolders: BESST and OPERA.
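
The final joining step can be pictured with the minimal sketch below. It assumes the order, orientation and estimated gap sizes of the contigs are already known, and simply concatenates them with runs of N in between. The contig names, sequences, layout format and the choice to place at least one N between contigs are all assumptions made for this example. Inferring the layout from read pairs is the hard part that scaffolders such as BESST and OPERA solve, and it is not shown here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def build_scaffold(contigs, layout):
    """Join contigs into one scaffold sequence.

    contigs: dict mapping contig name -> sequence.
    layout:  list of (name, orientation, gap_after) tuples, where orientation
             is "+" or "-" and gap_after is the estimated gap (in bp) to the
             next contig; it is ignored for the last entry.
    """
    pieces = []
    for index, (name, orientation, gap_after) in enumerate(layout):
        seq = contigs[name].upper()
        if orientation == "-":
            seq = reverse_complement(seq)
        pieces.append(seq)
        if index < len(layout) - 1:
            # Assumption: keep at least one N so the contig boundary stays visible.
            pieces.append("N" * max(gap_after, 1))
    return "".join(pieces)


# Hypothetical contigs and layout.
contigs = {"c1": "ACGTAC", "c2": "TTGCA", "c3": "GGA"}
layout = [("c1", "+", 3), ("c2", "-", 2), ("c3", "+", 0)]
scaffold = build_scaffold(contigs, layout)
print(scaffold)
```

The N stretches introduced here are the same kind of gaps that a gap-filling tool later tries to close.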

- Scaffolding: BESST

BESST (Sahlin et al., 2014) is a software package for scaffolding genomic assemblies. BESST includes several modules that build a “contig graph” from the available assembly information, derive scaffolds from this graph and estimate accurate gap sizes. See the BESST manual at https://github.com/ksahlin/BESST for more information.

To run BESST with DeNovoSeq go to:

           [ De novo assembly → Scaffolding → BESST ]

Figure 21: Animated GIF of using a BESST file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).

- Scaffolding: OPERA-LG long reads

OPERA (Gao et al., 2011) is a scaffolder based on an exact algorithm that minimizes the discordance between the scaffolds and the information provided by paired-end, mate-pair or long reads. OPERA uses this read information to order and orient the intermediate contigs or scaffolds produced in a genome assembly project. See the OPERA manual at https://sourceforge.net/projects/operasf/files/OPERA-LG%20version%202.0.6/ for more information.

To run OPERA with DeNovoSeq go to:

           [ Scaffolding → Opera → OPERA-LG long reads ]

Figure 22: Animated GIF of using an OPERA-LG file application.

You will get a confirmation message that the job has been launched. Otherwise, check all options again. If an input field is invalid or missing you will get an error icon beside the field (hover the mouse over it to see the error message).


Biotechvana


Valencia Lab
Parc Cientific Universitat de Valencia
Carrer del Catedràtic Agustín Escardino, 9. 46980 Paterna (Valencia) Spain
Madrid Lab
Parque Científico de Madrid
Campus de Cantoblanco
Calle Faraday 7, 28049 Madrid Spain
Contact us
Phone: +34 960 06 74 93
Email: biotechvana@biotechvana.com

Biotechvana © 2015