Difference between revisions of "Poretools"


Revision as of 12:54, 13 February 2017

Introduction

Python tools for analysing fast5 files, which are output by the MinION sequencing system.

Poretools exists as a Python module from which scripts can be written, and also as a "poretools" system executable which accepts the subcommands listed below.
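The executable can also be driven from a Python script. The following is a minimal sketch, not part of poretools itself: the helper names build_poretools_command and run_poretools are hypothetical, and it assumes that the "poretools" executable is on the PATH and that the chosen subcommand writes its result to standard output.

```python
import shutil
import subprocess


def build_poretools_command(subcommand, inputs):
    """Return the argument list for one poretools invocation.

    `inputs` is a list of fast5 filenames or directories.
    """
    if not inputs:
        raise ValueError("at least one fast5 file or directory is required")
    return ["poretools", subcommand] + list(inputs)


def run_poretools(subcommand, inputs):
    """Run poretools and return whatever it printed to standard output.

    Assumption: the subcommand reports its result on standard output.
    Raises RuntimeError if the executable cannot be found on PATH.
    """
    if shutil.which("poretools") is None:
        raise RuntimeError("poretools executable not found on PATH")
    result = subprocess.run(
        build_poretools_command(subcommand, inputs),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For instance, run_poretools("fastq", ["example.fast5"]) would run "poretools fastq example.fast5", where example.fast5 is a placeholder filename.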

Usage

Example

poretools fastq 5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5

Explanation:

  • poretools is the main tool command
  • fastq is the subcommand, which determines the kind of output produced. The input is expected to be a list of fast5 filenames or a directory
  • the rest of the command is the fast5 filename. It is very long, apparently because the name itself carries run metadata, such as the date (20160816), the channel (ch39) and the read number (read230)
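Since the filename carries metadata, some of it can be recovered without opening the file. The sketch below is only an illustration: the function parse_fast5_name is hypothetical, and the "_ch<channel>_read<read>_strand.fast5" layout is an assumption inferred from the single example filename above, so other runs may name their files differently.

```python
import re

# Assumed layout, inferred from the example filename on this page:
#   ..._ch<channel>_read<read>_strand.fast5
_NAME_PATTERN = re.compile(
    r"_ch(?P<channel>\d+)_read(?P<read>\d+)_strand\.fast5$"
)


def parse_fast5_name(filename):
    """Return the channel and read number encoded in a fast5 filename.

    Returns None when the filename does not follow the assumed layout.
    """
    match = _NAME_PATTERN.search(filename)
    if match is None:
        return None
    return {
        "channel": int(match.group("channel")),
        "read": int(match.group("read")),
    }


example = (
    "5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_"
    "GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5"
)
print(parse_fast5_name(example))  # {'channel': 39, 'read': 230}
```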

Subcommand listing

  • fastq, converts fast5 to fastq format (the usual short read format with basecall quality values)
  • fasta, converts fast5 to fasta format (featuring only the detected basecalls)
  • combine, creates a tar archive from a group of fast5 files.
  • yield_plot, plots the number of base pairs read over time (important for judging the life of the flowcell).
  • squiggle, graphs the signal recorded by the pore as the DNA passed through it. This signal is stored in the fast5 file.
  • winner, gives the longest read
  • stats, gives statistics on the number of bases across the reads, including the size of the "winner" mentioned above
  • hist, histogram of read sizes
  • nucdist, gives the nucleotide composition (%ATCGN) of a set of fast5 files
  • qualdist, gives the quality distribution of a set of fast5 files
  • qualpos, similar to the FastQC package, gives a box-and-whisker plot of the quality seen at each base position.
  • tabular, gives the raw details of each read: its size, name, sequence and quality.
  • events, reads out the event information stored in the fast5 file.
  • times, reads out the time information stored in the fast5 file
  • occupancy, gives a graph of pore performance based on information in a set of fast5 files.
  • index, tabulates all file location info and metadata
  • metadata, extracts metadata from a read.
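To make the meaning of some of these summaries concrete, the sketch below computes the same kind of figures (longest read, total bases, nucleotide composition) from FASTQ text such as the fastq subcommand produces. It is not how poretools implements winner, stats or nucdist; the functions read_fastq and summarise are hypothetical, and the sketch assumes plain four-line FASTQ records.

```python
from collections import Counter


def read_fastq(lines):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, _plus, quality = lines[i:i + 4]
        yield header[1:], sequence, quality  # drop the leading "@"


def summarise(records):
    """Return read count, total bases, longest read and % composition."""
    records = list(records)
    if not records:
        raise ValueError("no reads to summarise")
    longest = max(records, key=lambda record: len(record[1]))
    counts = Counter("".join(seq for _, seq, _ in records).upper())
    total = sum(counts.values())
    return {
        "reads": len(records),
        "total_bases": total,
        "longest_name": longest[0],
        "longest_length": len(longest[1]),
        # Percentage of each symbol, in the spirit of nucdist's %ATCGN
        "composition": {b: 100.0 * counts[b] / total for b in "ATCGN"},
    }


# Two toy reads, invented purely for illustration
toy_fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "AACCGGTTNN", "+", "IIIIIIIIII",
]
summary = summarise(read_fastq(toy_fastq))
print(summary["longest_name"], summary["longest_length"])  # r2 10
```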

Links

Examples of using this package can be found here