Difference between revisions of "Pyrad"

 
Latest revision as of 17:39, 20 April 2016

Introduction

Software for RADseq

module load pyrad

will load the necessary software, mainly muscle and vsearch.

Guides

The go-to tutorial for this is at:

(Note that this tutorial assumes the executable is named pyRAD, whereas here the executable name contains only lower-case letters: pyrad.)

Highlights

  • With the pyrad module loaded, the "pyrad" executable is immediately available on the command line.
  • One of the "-p", "-d", "-D" and "-n" options is required.
  • "pyrad -n" generates a "params.txt" file which contains settings for the run. A large part of the analysis can be configured by editing this file.
  • If using a Gridengine job script, two parameters in "params.txt" are important: no. 7 "N processors" and no. 37 "vsearch max threads". Multiply these together and supply the result to the "-pe multi" option in the Gridengine job script. This value sets the size of the parallel environment, i.e. the total number of threads/cores that pyrad will use.
  • The first stage is de-multiplexing your short read dataset using the barcodes, a process which generates new fastq files according to the barcodes. This is done via:
pyrad -p params.txt -s 1
  • The second stage is editing the raw reads for quality, after which they are converted to fasta. This is done via:
pyrad -p params.txt -s 2
  • The third stage is de-replicating (collapsing identical reads into a single representative sequence) and clustering the short reads. This is done via:
pyrad -p params.txt -s 3
  • As the commands above show, the "-s" option selects the stage, and "params.txt" must always be supplied via "-p".
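
As a small worked example of the "-pe multi" arithmetic mentioned in the Gridengine highlight, the following shell sketch multiplies the two parameters and prints the matching job script directive. The values 4 and 2 are hypothetical and the variable names are only illustrative; take the real values from your own "params.txt".

```shell
#!/bin/sh
# Hypothetical example values; read the real ones from your own params.txt.
n_processors=4       # parameter no. 7, "N processors"
vsearch_threads=2    # parameter no. 37, "vsearch max threads"

# Total slots to request = processors x vsearch threads per processor.
pe_slots=$(( n_processors * vsearch_threads ))

# Print the directive line to place in the Gridengine job script.
echo "#\$ -pe multi ${pe_slots}"
```

With these example values the job script would request 8 slots.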

The rest of the tutorial describes stages 4 to 7 of the procedure.
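
To illustrate the "-s" pattern for the three stages covered above, here is a minimal sketch that runs them in order and stops at the first failure. The DRY_RUN variable and the run_stage helper are assumptions made for this example, not part of pyrad; by default the script only prints the commands it would run.

```shell
#!/bin/sh
params="params.txt"

# DRY_RUN is a hypothetical switch for this sketch:
# 1 (default) prints each command, 0 actually runs pyrad.
DRY_RUN="${DRY_RUN:-1}"

run_stage() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "pyrad -p $params -s $1"
    else
        pyrad -p "$params" -s "$1"
    fi
}

# Stages 1 to 3: de-multiplexing, quality editing, de-replication and clustering.
for stage in 1 2 3; do
    run_stage "$stage" || { echo "stage $stage failed" >&2; exit 1; }
done
```

Running it with DRY_RUN=0 inside a job script would execute the three stages one after another, each against the same "params.txt".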