= Introduction

poretools is a set of Python tools for analysing fast5 files, the output format of the MinION sequencing system.

= Usage

== Example

poretools fastq 5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5

Explanation:

poretools is the main command.
fastq is the subcommand; it determines the output the user receives. The input is expected to be a list of fast5 filenames or a directory.
The rest of the command is the fast5 filename. It is long, probably because the name itself carries run metadata, such as the channel (ch39) and read number (read230).
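
Because the filename appears to carry metadata, a short Python sketch can show how a couple of those fields might be pulled out. The pattern below is an assumption inferred only from the example filename above (a `_ch<N>_read<N>_strand.fast5` suffix); `parse_fast5_name` is a hypothetical helper, not part of poretools, and other runs may name files differently.

```python
import re

# Assumed layout, inferred from the single example filename in this document:
# "..._ch<channel>_read<read>_strand.fast5"
FAST5_NAME = re.compile(r"_ch(?P<channel>\d+)_read(?P<read>\d+)_strand\.fast5$")


def parse_fast5_name(filename):
    """Return (channel, read) as ints, or None if the name does not match."""
    match = FAST5_NAME.search(filename)
    if match is None:
        return None
    return int(match.group("channel")), int(match.group("read"))


name = (
    "5CG6210Y8Z_20160816_FNFAB28012_MN15120_sequencing_run_"
    "GroupB_1D_Ecoli_tune_85746_ch39_read230_strand.fast5"
)
print(parse_fast5_name(name))  # (39, 230)
```

Returning None for non-matching names keeps the helper safe to run over a directory that mixes naming schemes.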

The available subcommands are:

fastq, converts fast5 to fastq format (the usual short-read format, with basecall quality values).
fasta, converts fast5 to fasta format (the detected basecalls only).
combine, creates a tar file from a group of fast5 files.
yield_plot, plots the number of base pairs read over time (useful for tracking the life of the flowcell).
squiggle, plots the signal recorded by the pore as the DNA passed through it. This signal is stored in the fast5 file.
winner, reports the longest read.
stats, reports statistics on the number of bases and reads, including the size of the "winner" mentioned above.
hist, plots a histogram of read sizes.
nucdist, reports the nucleotide composition (% A, T, C, G, N) of a set of fast5 files.
qualdist, reports the quality distribution of a set of fast5 files.
qualpos, draws a box-and-whisker plot of quality by base position, similar to FastQC.
tabular, lists the raw details of each read: its size, name, sequence and quality.
events, extracts the event information stored in the fast5 file.
times, extracts the time information stored in the fast5 file.
occupancy, plots pore performance based on information in a set of fast5 files.
index, tabulates all file location information and metadata.
metadata, extracts metadata from a read.
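
To make a few of these summaries concrete, the sketch below computes the kind of result that winner and nucdist describe, working on fastq text such as the fastq subcommand produces. It is an illustration in plain Python, not the poretools implementation; the function names and the two example reads are invented for this sketch, and it assumes simple four-line fastq records.

```python
from collections import Counter


def read_fastq(text):
    """Yield (name, sequence, quality) from four-line fastq records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading "@" from the name


def longest_read(records):
    """Return the record with the longest sequence (the 'winner')."""
    return max(records, key=lambda rec: len(rec[1]))


def nucleotide_percentages(records):
    """Return the percentage of A, C, G, T and N across all sequences."""
    counts = Counter()
    for _name, seq, _qual in records:
        counts.update(seq.upper())
    total = sum(counts[base] for base in "ACGTN")
    if total == 0:
        return {base: 0.0 for base in "ACGTN"}
    return {base: 100.0 * counts[base] / total for base in "ACGTN"}


# Invented example data, for illustration only.
example = """@read1
ACGT
+
!!!!
@read2
ACGTNNAA
+
!!!!!!!!
"""

records = list(read_fastq(example))
print(longest_read(records)[0])
print(nucleotide_percentages(records))
```

Reading records in fixed groups of four lines avoids misreading a quality line that happens to begin with "@".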

Examples of using this package can be found here