Poretools
Latest revision as of 23:10, 23 March 2017
Introduction
Poretools is a set of Python tools for analysing fast5 files, which are output by the MinION sequencing system.
It exists as a Python module from which scripts can be built, and also as a "poretools" system executable that accepts the subcommands listed below.
Usage
Example
poretools fastq 5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5
Explanation:
- poretools is the main tool command
- fastq is the subcommand, and it mostly defines the output that the user requires. The input is expected to be a list of fast5 filenames or a directory.
- the rest of the command is just the fast5 filename. It is very long, probably because it carries metadata about the run (the example appears to include a date, a channel number and a read number).
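As a rough illustration of the metadata carried in such a filename, the sketch below pulls out three fields with a regular expression. The field layout is an assumption inferred only from the example filename above (an eight-digit date, then `ch<N>`, `read<N>` and `_strand.fast5` at the end); other runs may name files differently, and `parse_fast5_name` is a hypothetical helper, not part of poretools.

```python
import re

# Assumed layout, inferred only from the example filename above:
#   ..._<YYYYMMDD>_..._ch<channel>_read<read>_strand.fast5
FAST5_NAME = re.compile(
    r"_(?P<date>\d{8})_.*_ch(?P<channel>\d+)_read(?P<read>\d+)_strand\.fast5$"
)


def parse_fast5_name(filename):
    """Return date, channel and read number from a fast5 filename, or None."""
    match = FAST5_NAME.search(filename)
    if match is None:
        return None  # filename does not follow the assumed layout
    return {
        "date": match.group("date"),
        "channel": int(match.group("channel")),
        "read": int(match.group("read")),
    }


name = ("5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_"
        "GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5")
print(parse_fast5_name(name))
```

The function returns `None` instead of raising an error, so a directory containing differently named files can still be scanned.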
This command also works on a directory of fast5 files. More importantly for MinION data, it can exclude reads under a certain length. A command for doing so is as follows:
poretools fastq --min-length 5000 test_data/
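When the conversion is part of a larger script, the same command line can be assembled and run from Python. The following is a minimal sketch that uses only the subcommand and option shown above. It assumes that poretools writes the FASTQ records to standard output. `build_fastq_command` and `fastq_to_file` are hypothetical helper names, and the output path is whatever the caller chooses.

```python
import shutil
import subprocess


def build_fastq_command(source, min_length=None):
    """Build the argument list for 'poretools fastq' on a file or directory."""
    cmd = ["poretools", "fastq"]
    if min_length is not None:
        cmd += ["--min-length", str(min_length)]
    cmd.append(source)
    return cmd


def fastq_to_file(source, out_path, min_length=None):
    """Run poretools and save its standard output to out_path."""
    if shutil.which("poretools") is None:
        raise RuntimeError("poretools executable not found on PATH")
    with open(out_path, "w") as handle:
        subprocess.run(build_fastq_command(source, min_length),
                       stdout=handle, check=True)


# Show the command that would be run for the example above.
print(" ".join(build_fastq_command("test_data/", min_length=5000)))
```

Passing the arguments as a list instead of a single string avoids shell quoting problems with the very long fast5 filenames.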
Subcommand listing
- fastq, converts fast5 to fastq format (the usual short read format with basecall quality values)
- fasta, converts fast5 to fasta format (featuring only the detected basecalls)
- combine, bundles a group of fast5 files into a single tar file.
- yield_plot, plots the number of base pairs read over time (important for judging the life of the flowcell).
- squiggle, graphs the signal recorded by the pore as the DNA passed through it; this signal is stored in the fast5 file.
- winner, gives the longest read
- stats, summary statistics on the number of reads and their sizes in bases, including the size of the "winner" mentioned above
- hist, histogram of read sizes
- nucdist, will give nucleotide composition (%ATCGN) of a set of fast5 files
- qualdist, gives the quality distribution of a set of fast5 files
- qualpos, similar to the FastQC package, gives a box-and-whisker plot of the quality seen at each base position.
- tabular, gives the raw details of each read: its size, name, sequence and quality.
- events, reads out the event information stored in the fast5 file.
- times, time information in the fast5 file
- occupancy, gives graph of pore performance based on information in a set of fast5 files.
- index, tabulates all file location info and metadata
- metadata, extracts metadata from a read.
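To make a few of these summaries concrete, the sketch below computes the kind of figures that stats, winner and nucdist are described as reporting, working from FASTQ text such as that produced by the fastq subcommand. It is only an illustration under that assumption, not the poretools implementation. The function names and the two example reads are made up.

```python
from collections import Counter


def read_fastq(lines):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def summarise(records):
    """Count reads and bases, find the longest read, and tally nucleotides."""
    lengths = {}
    composition = Counter()
    for name, seq, _qual in records:
        lengths[name] = len(seq)
        composition.update(seq.upper())
    total = sum(lengths.values())
    winner = max(lengths, key=lengths.get) if lengths else None
    percent = ({base: 100.0 * n / total for base, n in composition.items()}
               if total else {})
    return {"reads": len(lengths), "total_bases": total,
            "winner": winner, "percent": percent}


# Two made-up reads, for illustration only.
example = """@read1
ACGT
+
!!!!
@read2
AACCGGTTNN
+
!!!!!!!!!!
""".splitlines()

summary = summarise(read_fastq(example))
print(summary["reads"], summary["total_bases"], summary["winner"])
```

For real data, `read_fastq` can be given an open file handle on the FASTQ output instead of the example lines.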
Links
Examples of using this package can be found here