Poretools

Revision as of 17:34, 9 February 2017 by Rf

Introduction

Python tools for analysing fast5 files, which are output by the MinION sequencing system.

Usage

Example

poretools fastq 5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5

Explanation:

  • poretools is the main tool command
  • fastq is the subcommand; it determines what output is produced. The input is expected to be a list of fast5 filenames or a directory
  • the rest of the command is simply the fast5 filename. It is very long, probably because a good deal of run metadata (here, for example, ch39 and read230) appears to be written into the name
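
The command structure described above can also be driven from a script. The following is a minimal sketch, assuming poretools is installed and on the PATH and that, as with the fastq example, the result is printed to standard output. The helper names build_command and run_to_file and the directory name fast5_dir/ are hypothetical and are not part of poretools.

```python
import subprocess


def build_command(subcommand, inputs):
    """Return the argument list for: poretools <subcommand> <inputs...>.

    inputs is a list of fast5 filenames and/or directories.
    """
    if not inputs:
        raise ValueError("at least one fast5 file or directory is required")
    return ["poretools", subcommand] + list(inputs)


def run_to_file(subcommand, inputs, out_path):
    """Run poretools and save whatever it prints to standard output.

    Assumption: poretools is on the PATH and writes its result to stdout.
    """
    with open(out_path, "w") as out:
        subprocess.run(build_command(subcommand, inputs), stdout=out, check=True)


if __name__ == "__main__":
    # Placeholder directory name, used only to show the shape of the command.
    print(" ".join(build_command("fastq", ["fast5_dir/"])))
```

Building the argument list separately from running it makes it possible to inspect the command before anything is executed.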

Subcommand listing

  • fastq, converts fast5 to fastq format (the usual short read format with basecall quality values)
  • fasta, converts fast5 to fasta format (featuring only the detected basecalls)
  • combine, creates a tar archive from a group of fast5 files.
  • yield_plot, plots the number of base pairs read over time (clearly important for the life of the flowcell).
  • squiggle, graphs the signal recorded by the pore as the DNA passed through it. This signal is stored in the fast5 file.
  • winner, gives the longest read
  • stats, statistics on the number of bases with respect to the reads, including size of the "winner" mentioned above
  • hist, histogram of read sizes
  • nucdist, gives the nucleotide composition (%ATCGN) of a set of fast5 files
  • qualdist, gives the quality distribution of a set of fast5 files
  • qualpos, similar to the FastQC package, gives a box-whisker plot of quality against base position.
  • tabular, gives the raw details of each read: its size, name, sequence and quality.
  • events, reads out the event information stored in the fast5 file.
  • times, reports the time information in the fast5 file
  • occupancy, gives a graph of pore performance based on information in a set of fast5 files.
  • index, tabulates all file location info and metadata
  • metadata, extracts metadata from a read.
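
To make a few of the subcommands above more concrete, the sketch below computes comparable summaries from FASTQ text such as that produced by the fastq subcommand: the longest read (compare winner), the read count and total bases (compare stats), and the nucleotide composition (compare nucdist). This is an illustrative reimplementation and not poretools code. The function names and the two sample records are invented for the example, and four-line FASTQ records are assumed.

```python
from collections import Counter


def parse_fastq(text):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the name


def summarise(records):
    """Return read count, total bases, longest read and base composition."""
    records = list(records)
    if not records:
        raise ValueError("no reads to summarise")
    composition = Counter()
    for _name, seq, _qual in records:
        composition.update(seq.upper())
    longest = max(records, key=lambda rec: len(rec[1]))
    return {
        "reads": len(records),
        "total_bases": sum(len(seq) for _name, seq, _qual in records),
        "longest_read": longest[0],
        "longest_length": len(longest[1]),
        "composition": dict(composition),
    }


# Two made-up records, for illustration only.
SAMPLE = """@read1
ACGTN
+
!!!!!
@read2
ACGTACGT
+
IIIIIIII
"""

if __name__ == "__main__":
    print(summarise(parse_fastq(SAMPLE)))
```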

Links

Examples of using this package can be found here