Latest revision as of 17:39, 20 April 2016

Introduction

Software for RADseq

module load pyrad

will load the necessary software, mainly muscle and vsearch.

The go-to tutorial for this is at:

(Note that this tutorial assumes the executable is called pyRAD, whereas the executable name here contains only lower-case letters: pyrad.)

With the pyrad module loaded, the "pyrad" executable is immediately available on the command line.
One of the "-p", "-d", "-D" and "-n" options is required.
"pyrad -n" generates a "params.txt" file which contains settings for the run. A large part of the analysis can be configured by editing this file.
If using a Gridengine job script, two parameters in "params.txt" are important: no. 7 "N processors" and no. 37 "vsearch max threads". Multiply these together and supply the result to the "-pe multi" option in the Gridengine job script. This option requests the parallel environment, i.e. the total number of threads/cores that pyrad will use.
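The multiplication can be scripted rather than done by hand. The following is only a sketch: the helper names "param_value" and "total_slots" are made up for illustration, and it assumes that each line of "params.txt" has the layout "value ## number. description" and that both parameters have been filled in.

```shell
# Hypothetical helper: print the value of numbered parameter $2 from file $1.
# Assumes lines of the form "<value>   ## <number>. <description>".
param_value() {
    awk -v n="$2" -F'##' '
        # Match the parameter number right after the "##" separator.
        $2 ~ "^[ \t]*" n "\\." { gsub(/[ \t]/, "", $1); print $1; exit }
    ' "$1"
}

# Hypothetical helper: slots to request = parameter 7 x parameter 37.
total_slots() {
    procs=$(param_value "$1" 7)
    threads=$(param_value "$1" 37)
    echo $(( procs * threads ))
}
```

With 2 processors and 6 vsearch threads this gives 12, so the job would request "-pe multi 12", either in the job script itself or on the command line, e.g. qsub -pe multi "$(total_slots params.txt)" followed by the job script name.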
The first stage is de-multiplexing your short read dataset using the barcodes, a process which generates new fastq files according to the barcodes. This is done via:

pyrad -p params.txt -s 1
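To make the idea of de-multiplexing concrete, here is a toy sketch of the principle. It is not how pyrad does it: it assumes a two-column barcodes file (sample name, then barcode), requires an exact barcode match at the start of each read, and the function name "demux_toy" and the ".fastq" output naming are invented for this example.

```shell
# Toy demultiplexer. Arguments: barcodes file, FASTQ file, output directory.
demux_toy() {
    awk -v outdir="$3" '
        # First file: remember which sample each barcode belongs to.
        FILENAME == ARGV[1] { if (NF >= 2) sample[$2] = $1; next }
        # Second file: collect one 4-line FASTQ record at a time.
        { rec[FNR % 4] = $0 }
        FNR % 4 == 0 {
            # rec[1]=header, rec[2]=sequence, rec[3]="+" line, rec[0]=qualities
            for (bc in sample) {
                if (index(rec[2], bc) == 1) {
                    n = length(bc)
                    out = outdir "/" sample[bc] ".fastq"
                    # Append the record with the barcode trimmed off.
                    print rec[1] >> out
                    print substr(rec[2], n + 1) >> out
                    print rec[3] >> out
                    print substr(rec[0], n + 1) >> out
                    close(out)
                    break
                }
            }
        }
    ' "$1" "$2"
}
```

Reads whose sequence does not start with any listed barcode are simply dropped in this sketch.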

The second stage edits the raw reads for quality, whereupon they are converted to fasta. This is done via:

pyrad -p params.txt -s 2
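For readers unfamiliar with the two formats, the fastq-to-fasta conversion itself can be illustrated with a toy one-liner. This only shows the change of format: it does no quality editing at all, it assumes strict 4-line fastq records, and the function name "fastq_to_fasta" is invented here.

```shell
# Toy conversion: keep the header (with "@" swapped for ">") and the sequence,
# and discard the "+" line and the quality string.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$@"
}

# Example with a single read supplied on standard input:
printf '@read1\nACGTTGCA\n+\nIIIIIIII\n' | fastq_to_fasta
# prints:
# >read1
# ACGTTGCA
```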

The third stage de-replicates the short reads (i.e. collapses identical reads into a single representative sequence) and then clusters them. This is done via:

pyrad -p params.txt -s 3
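De-replication here means collapsing identical reads into one representative sequence while keeping count of how many copies were seen. A toy illustration with standard tools, not the method the pyrad software itself uses (the function name "dereplicate" is invented for this example):

```shell
# Toy de-replication: one sequence per input line; output is each unique
# sequence followed by its copy number, most abundant first.
dereplicate() {
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $2 "\t" $1 }'
}

printf '%s\n' ACGT ACGT TTGA ACGT TTGA GGCC | dereplicate
# prints (tab-separated):
# ACGT    3
# TTGA    2
# GGCC    1
```

Clustering then only has to compare the unique sequences, which is much cheaper than comparing every read.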

The pattern is that the "-s" option selects the stage, and "params.txt" must always be supplied via "-p".
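Since every stage uses the same command form, the stages can be run one after another from a small wrapper. This is a minimal sketch; the function name "run_stages" is made up, and it assumes that pyrad returns a non-zero exit status when a stage fails.

```shell
# Run the given pyrad stages in order, stopping at the first failure.
# Usage: run_stages params.txt 1 2 3
run_stages() {
    params=$1
    shift
    for stage in "$@"; do
        echo "Running pyrad stage $stage"
        pyrad -p "$params" -s "$stage" || {
            echo "pyrad stage $stage failed" >&2
            return 1
        }
    done
}
```

Stopping on failure matters because each stage works on the output of the one before it.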

The rest of the tutorial describes stages 4 to 7 of the procedure.
