Difference between revisions of "Poretools"
Revision as of 11:54, 13 February 2017
Introduction
Python tools for analysing fast5 files, which are output by the MinION sequencing system.
Poretools exists as a Python module from which scripts can be written, and also as a "poretools" system executable which accepts the subcommands listed below.
Usage
Example
poretools fastq 5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5
Explanation:
- poretools is the main tool command
- fastq is the subcommand, which mainly determines the output the user gets. The input is expected to be a list of fast5 filenames or a directory
- the rest of the command is simply the fast5 filename. It is very long, probably reflecting the amount of metadata that fast5 files carry
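Some of that metadata can be read straight from the filename. The following is a minimal Python sketch, assuming that the "ch39" and "read230" parts of the example name are the channel and read numbers; the helper name parse_fast5_name is hypothetical and is not part of poretools.

```python
import re

def parse_fast5_name(filename):
    """Extract channel and read numbers from a fast5 filename.

    Assumes the name contains "_ch<N>_read<M>_", as in the example above.
    Returns None when the pattern is absent.
    """
    match = re.search(r"_ch(\d+)_read(\d+)_", filename)
    if match is None:
        return None
    return {"channel": int(match.group(1)), "read": int(match.group(2))}

name = ("5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_"
        "GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5")
print(parse_fast5_name(name))  # {'channel': 39, 'read': 230}
```

The other underscore-separated fields are left alone here because their meaning is not stated on this page.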
Subcommand listing
- fastq, converts fast5 to fastq format (the usual short read format with basecall quality values)
- fasta, converts fast5 to fasta format (featuring only the detected basecalls)
- combine, creates a tar file from a group of fast5 files.
- yield_plot, plots the number of base pairs read over time (important for judging the life of the flowcell).
- squiggle, graphs the signal recorded by the pore as the DNA passed through it. This signal is stored in the fast5 file.
- winner, gives the longest read
- stats, statistics on the number of bases with respect to the reads, including size of the "winner" mentioned above
- hist, histogram of read sizes
- nucdist, gives the nucleotide composition (% A, T, C, G, N) of a set of fast5 files
- qualdist, gives the quality distribution of a set of fast5 files
- qualpos, similar to the FastQC package, gives a box-and-whisker plot of quality by base position.
- tabular, gives the raw details of each read: its size, name, sequence and quality.
- events, reads out the event information stored in the fast5 file.
- times, reports the time information in the fast5 file
- occupancy, graphs pore performance based on information in a set of fast5 files.
- index, tabulates all file location info and metadata
- metadata, extracts metadata from a read.
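Since every subcommand follows the same "poretools SUBCOMMAND INPUT" pattern shown in the example, the executable can be driven from a script. The sketch below uses only the Python standard library; build_command and run_poretools are hypothetical helper names, and it assumes that each subcommand takes the input paths as positional arguments. Some subcommands may need extra options, which this sketch does not handle.

```python
import shutil
import subprocess

# Subcommand names copied from the listing above.
SUBCOMMANDS = {
    "fastq", "fasta", "combine", "yield_plot", "squiggle", "winner",
    "stats", "hist", "nucdist", "qualdist", "qualpos", "tabular",
    "events", "times", "occupancy", "index", "metadata",
}

def build_command(subcommand, inputs):
    """Return the argument list for `poretools <subcommand> <inputs...>`."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError("unknown poretools subcommand: %s" % subcommand)
    if not inputs:
        raise ValueError("at least one fast5 file or directory is required")
    return ["poretools", subcommand] + list(inputs)

def run_poretools(subcommand, inputs):
    """Run poretools and return whatever it writes to standard output."""
    if shutil.which("poretools") is None:
        raise RuntimeError("poretools executable not found on PATH")
    result = subprocess.run(
        build_command(subcommand, inputs),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        check=True,               # raise if poretools exits with an error
    )
    return result.stdout
```

For instance, run_poretools("fastq", ["my_reads/"]) would return the FASTQ text for a directory of fast5 files, where my_reads/ is a placeholder path.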
Links
Examples of using this package can be found here