Latest revision as of 00:10, 24 March 2017

Introduction

Python tools for analysing fast5 files, which are output by the MinION sequencing system.

It exists as a Python module from which scripts can be built, and also as a "poretools" system executable which accepts the subcommands listed below.
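One way to drive the executable from a script is to call it as a subprocess, which avoids depending on the module's internal API (not described on this page). The following is only a sketch, assuming poretools is installed and on the PATH; the helper names build_poretools_command and run_poretools are hypothetical and not part of poretools.

```python
import shutil
import subprocess


def build_poretools_command(subcommand, inputs, extra_args=()):
    """Assemble an argument list such as ['poretools', 'fastq', 'test_data/']."""
    # Options are placed before the inputs, as in the examples on this page.
    return ["poretools", subcommand, *extra_args, *inputs]


def run_poretools(subcommand, inputs, extra_args=()):
    """Run the poretools executable and return whatever it writes to stdout."""
    if shutil.which("poretools") is None:
        raise RuntimeError("the poretools executable was not found on PATH")
    command = build_poretools_command(subcommand, inputs, extra_args)
    # check=True raises CalledProcessError if poretools exits with an error.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, run_poretools("fastq", ["test_data/"]) would return the FASTQ text as a string, provided the directory exists and holds fast5 files.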

Usage

Example

poretools fastq 5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5

Explanation:

poretools is the main tool command
fastq is the subcommand, which mainly determines the output the user gets. The input is expected to be a list of fast5 filenames or a directory
the rest of the command is simply the fast5 filename. It is very long, probably reflecting the extensive metadata associated with fast5 files
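The long filename appears to carry some of that metadata itself: the example contains the tokens ch39 and read230, which look like a channel number and a read number. A minimal sketch of pulling them out, assuming that interpretation and that naming pattern hold (neither is confirmed here); the function name channel_and_read is hypothetical.

```python
import re

# Assumed pattern: the tokens "ch<N>" and "read<N>" seen in the example filename.
CHANNEL_READ = re.compile(r"_ch(\d+)_read(\d+)_")


def channel_and_read(filename):
    """Return (channel, read) as integers, or None if the tokens are absent."""
    match = CHANNEL_READ.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


name = ("5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_"
        "GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5")
print(channel_and_read(name))  # (39, 230)
```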

This command also works on a directory of fast5 files. More importantly for the MinION, it can exclude reads under a certain length. A command for doing so is as follows:

poretools fastq --min-length 5000 test_data/
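To illustrate what a minimum-length filter does, here is a rough stand-alone sketch that applies the same idea to FASTQ text, such as the output of the fastq subcommand. It is not poretools code: the function names are hypothetical, and whether poretools treats the threshold as inclusive is an assumption here.

```python
def parse_fastq(text):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        # Line i + 2 is the "+" separator and is skipped.
        yield lines[i], lines[i + 1], lines[i + 3]


def filter_min_length(records, min_length):
    """Keep records whose sequence has at least min_length bases (assumed inclusive)."""
    return [rec for rec in records if len(rec[1]) >= min_length]


example = "@read1\nACGT\n+\n!!!!\n@read2\nACGTACGTAC\n+\n!!!!!!!!!!\n"
kept = filter_min_length(parse_fastq(example), 5)
print([header for header, _, _ in kept])  # ['@read2']
```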

Subcommand listing

fastq, converts fast5 to fastq format (the usual short read format with basecall quality values)
fasta, converts fast5 to fasta format (featuring only the detected basecalls)
combine, creates a tar archive from a group of fast5 files.
yield_plot, plots the number of base pairs read over time (important for tracking the life of the flowcell).
squiggle, graphs the signal recorded by the pore as the DNA passed through it; this signal is stored in the fast5 file.
winner, reports the longest read
stats, gives statistics on base counts across the reads, including the size of the "winner" mentioned above
hist, histogram of read sizes
nucdist, gives the nucleotide composition (%ATCGN) of a set of fast5 files
qualdist, gives the quality distribution of a set of fast5 files
qualpos, similar to the FastQC package, gives a box-and-whisker plot of quality by base position.
tabular, gives the raw details of each read: its size, name, sequence and quality.
events, reads out the event information stored in the fast5 file.
times, reports the time information stored in the fast5 file
occupancy, graphs pore performance based on information in a set of fast5 files.
index, tabulates all file location information and metadata
metadata, extracts metadata from a read.
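As a rough illustration of what winner and nucdist report, the sketch below computes the longest read and the %ATCGN composition from a list of already extracted sequences. It does not read fast5 files or call poretools; the function names and the example reads are made up for the purpose.

```python
from collections import Counter


def longest_read(sequences):
    """Return the longest sequence, analogous to what 'winner' reports."""
    return max(sequences, key=len)


def nucleotide_percentages(sequences):
    """Return the percentage of each of A, T, C, G, N across all sequences."""
    counts = Counter("".join(sequences).upper())
    total = sum(counts[base] for base in "ATCGN")
    if total == 0:
        # No recognised bases: report zeros rather than dividing by zero.
        return {base: 0.0 for base in "ATCGN"}
    return {base: 100.0 * counts[base] / total for base in "ATCGN"}


reads = ["ACGT", "AACCGGTTNN"]  # made-up example reads
print(len(longest_read(reads)))  # 10
print(nucleotide_percentages(reads))
```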

Links

Examples of using this package can be found here

Difference between revisions of "Poretools"
