Difference between revisions of "Canu"

Revision as of 14:37, 11 March 2017

Introduction

Canu is a de novo genome assembler for long-read technologies, mainly PacBio and Oxford Nanopore (MinION).

It comes from the Maryland Bioinformatics Laboratory and is based on the Celera Assembler, which was made open source in 2014 and whose code base is no longer maintained.

Example Usage

The following example uses Nick Loman's E. coli data file, which can be obtained via:

curl -L -o oxford.fasta http://nanopore.s3.climb.ac.uk/MAP006-PCR-1_2D_pass.fasta

As the "2D" in the file name indicates, this is a 2D data set. The curl command saves it as oxford.fasta.
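
Before starting a long assembly it can be worth checking that the download is complete. The following is a minimal sketch, not part of Canu: `fasta_stats` is a hypothetical helper name, and it assumes a plain, uncompressed FASTA file. It counts the reads and the total number of bases.

```shell
# Print the number of reads and the total number of bases in a FASTA file.
# Sequences wrapped over several lines are handled by summing every
# non-header line.
fasta_stats() {
    awk '/^>/ { reads++; next }
         { bases += length($0) }
         END { printf "reads=%d bases=%d\n", reads, bases }' "$1"
}

# Only run if the file downloaded in the step above is present.
if [ -f oxford.fasta ]; then
    fasta_stats oxford.fasta
fi
```

Dividing the reported base count by the expected genome size gives a rough idea of the coverage available to the assembler.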

The recommended way to run canu for this is:

canu -p ecoli -d ecoli-oxford genomeSize=4.8m -nanopore-raw oxford.fasta

Explanation:

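  • -p, a prefix used to name most of the output files (here, ecoli)
  • -d, the directory in which the assembly is computed; canu creates it (here, ecoli-oxford)
  • genomeSize, a best guess of the genome size, used mostly to compute coverage (here, 4.8m, i.e. 4.8 million bases)
  • -nanopore-raw, the input reads, labelled by the technology that generated them (here, raw Oxford Nanopore reads in oxford.fasta)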
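
The genomeSize value in the command above uses a suffix notation: according to the help text, '4.7m' is the same as '4700k' and '4700000', and the value is used mostly to compute coverage. The sketch below illustrates that arithmetic. It is not Canu's own parser; the function names are hypothetical, and the total of 240000000 bases is a made-up figure for illustration only.

```shell
# Convert a genomeSize value such as 4.7m, 4700k or 4700000 to a base count.
# Illustration only: this is not how Canu itself parses the option.
genome_size_to_bases() {
    awk -v s="$1" 'BEGIN {
        mult = 1
        suffix = substr(s, length(s), 1)
        if (suffix == "g")      mult = 1000000000
        else if (suffix == "m") mult = 1000000
        else if (suffix == "k") mult = 1000
        # Drop the suffix letter, if there was one, before multiplying.
        if (mult > 1) s = substr(s, 1, length(s) - 1)
        printf "%.0f\n", s * mult
    }'
}

# Estimate coverage as total read bases divided by genome size.
estimate_coverage() {
    awk -v bases="$1" -v genome="$(genome_size_to_bases "$2")" \
        'BEGIN { printf "%.1fx\n", bases / genome }'
}

genome_size_to_bases 4.7m     # prints 4700000
genome_size_to_bases 4700k    # prints 4700000
estimate_coverage 240000000 4.8m   # prints 50.0x (hypothetical base total)
```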


Help File Output

usage: canu [-correct | -trim | -assemble | -trim-assemble] \
            [-s <assembly-specifications-file>] \
             -p <assembly-prefix> \
             -d <assembly-directory> \
             genomeSize=<number>[g|m|k] \
            [other-options] \
            [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] *fastq

  By default, all three stages (correct, trim, assemble) are computed.
  To compute only a single stage, use:
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the (created) -d <assembly-directory>, with most
  files named using the -p <assembly-prefix>.

  The genome size is your best guess of the genome size of what is being assembled.
  It is used mostly to compute coverage in reads.  Fractional values are allowed: '4.7m'
  is the same as '4700k' and '4700000'

  A full list of options can be printed with '-options'.  All options
  can be supplied in an optional sepc file.

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed
  with gz, bz2 or xz.  Reads are specified by the technology they were
  generated with:
    -pacbio-raw         <files>
    -pacbio-corrected   <files>
    -nanopore-raw       <files>
    -nanopore-corrected <files>

Complete documentation at http://canu.readthedocs.org/en/latest/
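
The help text above states that reads may be FASTA or FASTQ, uncompressed or compressed with gz, bz2 or xz. When inspecting such inputs by hand, a small wrapper that picks the decompressor from the file extension can be convenient. This is a sketch only: `read_cat` and `count_reads` are hypothetical helper names, not Canu commands, and the FASTQ count assumes the common four-lines-per-record layout.

```shell
# Stream a read file to stdout, decompressing according to its extension.
read_cat() {
    case "$1" in
        *.gz)  gzip -dc "$1" ;;
        *.bz2) bzip2 -dc "$1" ;;
        *.xz)  xz -dc "$1" ;;
        *)     cat "$1" ;;
    esac
}

# Count reads: FASTQ is assumed to use four lines per record,
# FASTA is counted by its '>' header lines.
count_reads() {
    case "$1" in
        *.fastq|*.fastq.*|*.fq|*.fq.*)
            read_cat "$1" | awk 'END { print NR / 4 }' ;;
        *)
            read_cat "$1" | grep -c '^>' ;;
    esac
}

# Example: count_reads oxford.fasta
```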