Mash

From wiki
Revision as of 11:21, 8 March 2017 by Rf

Introduction

MinHash is a general dimensionality-reduction technique. Mash uses it to reduce large sequences and sequence sets to small, representative sketches, from which global mutation distances (Mash distances) can be rapidly estimated.
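To illustrate the idea, here is a minimal Python sketch of a bottom-s MinHash sketch and the Mash distance. It is not Mash's implementation. Assumptions: a 64-bit BLAKE2b hash stands in for the MurmurHash3 that Mash uses; k-mers are canonicalised as the lexicographically smaller of the k-mer and its reverse complement; the defaults k=21 and s=1000 mirror Mash's defaults. The Jaccard index j is estimated from the s smallest hashes of the union of two sketches and converted with the Mash formula D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield canonical k-mers: the smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, rc)


def hash64(kmer: str) -> int:
    # Stand-in 64-bit hash; Mash itself uses MurmurHash3.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq: str, k: int = 21, s: int = 1000) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes, sorted."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int = 1000) -> float:
    """Estimate the Jaccard index from two bottom-s sketches."""
    union_sketch = sorted(set(a) | set(b))[:s]
    if not union_sketch:
        raise ValueError("both sketches are empty (sequences shorter than k?)")
    shared = set(a) & set(b)
    # Fraction of the union's s smallest hashes that occur in both sketches.
    return sum(1 for h in union_sketch if h in shared) / len(union_sketch)


def mash_distance(j: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); 1.0 by convention when j == 0."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Because only s hashes per genome are kept, comparing two sketches costs O(s) regardless of genome size, which is what makes the distance estimate fast.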

Other aspects

  • describes itself as an alignment-free method

Usage

Typical analysis

Mash is run on genomes, usually de novo assemblies produced by tools such as Velvet or SPAdes.
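A typical run sketches the assembled genomes with `mash sketch` (which writes `<prefix>.msh`) and then compares queries against that sketch with `mash dist`. The Python helpers below are a minimal sketch that builds those commands and parses the tab-separated `mash dist` output (reference ID, query ID, Mash distance, p-value, shared hashes such as `456/1000`). The function names and any file names you pass in are hypothetical; `-k 21` and `-s 1000` are Mash's defaults, written out explicitly here.

```python
import subprocess


def mash_sketch_cmd(genomes, out_prefix, k=21, s=1000):
    """Build a `mash sketch` command; Mash writes the sketch to <out_prefix>.msh."""
    return ["mash", "sketch", "-k", str(k), "-s", str(s), "-o", out_prefix, *genomes]


def mash_dist_cmd(reference_msh, queries):
    """Build a `mash dist` command comparing queries against a reference sketch."""
    return ["mash", "dist", reference_msh, *queries]


def parse_mash_dist(text):
    """Parse `mash dist` output into (ref, query, distance, p_value, shared, sketch_size)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        ref, query, dist, p_value, shared = line.split("\t")
        num, den = shared.split("/")
        rows.append((ref, query, float(dist), float(p_value), int(num), int(den)))
    return rows


def run_mash_dist(reference_msh, queries):
    """Run `mash dist` (requires mash on PATH) and return the parsed rows."""
    result = subprocess.run(mash_dist_cmd(reference_msh, queries),
                            capture_output=True, text=True, check=True)
    return parse_mash_dist(result.stdout)
```

For example, `subprocess.run(mash_sketch_cmd(["sample1.fasta", "sample2.fasta"], "ref"), check=True)` followed by `run_mash_dist("ref.msh", ["sample3.fasta"])` would give one row per reference/query pair.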

Parallel usage on Gridengine

Here we walk through running Mash on a set of samples, using the DRMAA library to launch Gridengine job arrays.
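One way to launch such an array from Python is the drmaa-python bindings, which need Gridengine's DRMAA library to be available. The sketch below assumes one array task per sample and a hypothetical per-task worker script (the `script_path` argument) that receives the file listing as its argument; the helper names are illustrative only. `runBulkJobs` submits tasks numbered `first..last`, and Gridengine exposes each task's index to the job as `SGE_TASK_ID`.

```python
import os


def job_array_settings(script_path, listing_file, n_samples):
    """Collect job-template attributes for a Gridengine array with one task per sample."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return {
        "remoteCommand": script_path,  # hypothetical worker script run by every task
        "args": [listing_file],        # each task re-reads the listing and picks its pair
        "workingDirectory": os.getcwd(),
        "task_range": (1, n_samples, 1),  # first index, last index, step
    }


def submit_array(settings):
    """Submit the array via DRMAA, wait for all tasks, and return the task job IDs."""
    import drmaa  # imported lazily: only needed on a node with Gridengine's libdrmaa

    with drmaa.Session() as session:
        jt = session.createJobTemplate()
        jt.remoteCommand = settings["remoteCommand"]
        jt.args = settings["args"]
        jt.workingDirectory = settings["workingDirectory"]
        jt.joinFiles = True  # merge stdout and stderr
        first, last, step = settings["task_range"]
        job_ids = session.runBulkJobs(jt, first, last, step)
        # Block until every task finishes; True disposes of the job info afterwards.
        session.synchronize(job_ids, drmaa.Session.TIMEOUT_WAIT_FOREVER, True)
        session.deleteJobTemplate(jt)
    return job_ids
```

Keeping the template attributes in a plain dictionary lets the array layout be checked without a cluster; only `submit_array` touches DRMAA.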

The scripts take as their argument a file listing the samples' read files. Each sample is assumed to have two paired-end FASTQ files, and these are assumed to appear in order in the listing: each consecutive pair of lines represents one sample.
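Under that layout, each array task can recover its own sample by grouping the listing into consecutive pairs and indexing with the 1-based `SGE_TASK_ID`. A minimal sketch, with hypothetical helper names, that also rejects a listing with an odd number of lines:

```python
import os


def pairs_from_lines(lines):
    """Group consecutive non-blank lines into (read1, read2) tuples, one per sample."""
    files = [ln.strip() for ln in lines if ln.strip()]
    if len(files) % 2:
        raise ValueError(f"odd number of read files in listing ({len(files)})")
    return [(files[i], files[i + 1]) for i in range(0, len(files), 2)]


def read_pairs(listing_path):
    """Read the file listing from disk and pair it up."""
    with open(listing_path) as fh:
        return pairs_from_lines(fh)


def pair_for_task(pairs, task_id=None):
    """Return this task's sample; Gridengine sets SGE_TASK_ID (1-based) for array tasks."""
    if task_id is None:
        task_id = int(os.environ["SGE_TASK_ID"])
    if not 1 <= task_id <= len(pairs):
        raise IndexError(f"task {task_id} outside 1..{len(pairs)}")
    return pairs[task_id - 1]
```

The number of pairs returned by `read_pairs` is also the number of array tasks to request.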

De novo assembly

We use SPAdes with the --meta option here, as we are dealing with metagenomes.
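Per sample, the assembly step could look like the sketch below, which calls `spades.py --meta` (equivalent to `metaspades.py`) on one read pair, with `-1`/`-2` for the paired reads, `-o` for the output directory and `-t` for threads. The helper names and the default thread count are assumptions; the returned `scaffolds.fasta` in the output directory is the kind of assembly that would then be passed to `mash sketch`.

```python
import os
import subprocess


def spades_meta_cmd(read1, read2, out_dir, threads=4):
    """Build a metagenomic SPAdes command for one paired-end sample."""
    return ["spades.py", "--meta", "-1", read1, "-2", read2,
            "-o", out_dir, "-t", str(threads)]


def assemble(read1, read2, out_dir, threads=4):
    """Run SPAdes (requires spades.py on PATH) and return the scaffolds path."""
    subprocess.run(spades_meta_cmd(read1, read2, out_dir, threads), check=True)
    return os.path.join(out_dir, "scaffolds.fasta")
```

Note that metaSPAdes accepts a single paired-end library, which matches the two-FASTQ-files-per-sample layout assumed above.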


Links