Stacks is a widely used RADseq analysis pipeline by Julian Catchen.
The pipeline has five stages:
- Reads are demultiplexed and cleaned with process_radtags.
- Loci are built with ustacks (de novo, no reference genome) or pstacks (with a reference genome).
- The catalog of loci is created with cstacks.
- Samples are matched against the catalog with sstacks.
- Either the populations or the genotypes program is run, depending on the input dataset.
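The stage order above can be sketched as a small shell helper. This is only an illustration of the sequence: the function name stacks_stage_order and its two arguments (mode and final stage) are made up for this page and are not part of Stacks.

```shell
#!/bin/sh
# Print the Stacks programs in the order they are run.
# Arguments (both hypothetical, for illustration only):
#   $1  "denovo" (no reference) or "reference"
#   $2  final stage, "populations" (default) or "genotypes"
stacks_stage_order() {
    mode=$1
    final=${2:-populations}
    case "$mode" in
        denovo)    builder=ustacks ;;
        reference) builder=pstacks ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    case "$final" in
        populations|genotypes) ;;
        *) echo "unknown final stage: $final" >&2; return 1 ;;
    esac
    printf '%s\n' process_radtags "$builder" cstacks sstacks "$final"
}

# List the stages for a de novo run ending in populations.
stacks_stage_order denovo
```

Calling it with "reference genotypes" instead lists pstacks as the second stage and genotypes as the last.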
process_radtags -p ./raw/ -b ./barcodes/barcodes_lane3 -e sbfI -o ./samples/ -r -c -q
- -p, -b and -e are inputs: the directory with the FASTQ reads, the barcode file, and the name of the restriction enzyme used.
- -r, -c and -q clean the data: -r rescues barcodes and restriction enzyme cut sites, -c removes reads with an uncalled base, and -q discards reads with low quality scores.
- -o is the directory in which to write the output.
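A minimal sketch of wrapping the process_radtags call above so that the inputs are checked before a long run starts. The paths and flags are the ones from the example; the DRY_RUN variable and the function names are assumptions added for illustration. With DRY_RUN=1 (the default here) the command is only printed.

```shell
#!/bin/sh
# Paths and enzyme taken from the example command above.
RAW_DIR=./raw/
BARCODES=./barcodes/barcodes_lane3
ENZYME=sbfI
OUT_DIR=./samples/
# DRY_RUN is a hypothetical helper variable: 1 prints the command, 0 runs it.
DRY_RUN=${DRY_RUN:-1}

# Print the command line exactly as it would be run.
radtags_cmd() {
    echo "process_radtags -p $RAW_DIR -b $BARCODES -e $ENZYME -o $OUT_DIR -r -c -q"
}

run_radtags() {
    if [ "$DRY_RUN" = 1 ]; then
        radtags_cmd
        return 0
    fi
    # Fail early if an input is missing.
    [ -d "$RAW_DIR" ] || { echo "missing reads directory: $RAW_DIR" >&2; return 1; }
    [ -f "$BARCODES" ] || { echo "missing barcode file: $BARCODES" >&2; return 1; }
    # Create the output directory if it is not there yet.
    mkdir -p "$OUT_DIR"
    process_radtags -p "$RAW_DIR" -b "$BARCODES" -e "$ENZYME" -o "$OUT_DIR" -r -c -q
}

run_radtags
```

Setting DRY_RUN=0 in the environment runs the real command once both inputs are found.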
populations -b 224 -P ./dupstacks/SRW224 -M ./popmap/popmap_trial -m 20 -r 0.75 -W ./dupstacks/SRW224/SRW224VarLoci.txt --write_random_snp --structure --vcf --genepop
- -b is the batch ID of the catalog to export.
- -P is the input path, the directory containing the output of the previous Stacks commands.
- -M is the input population map.
- -W is an input file listing whitelisted markers.
- -m and -r are filter settings: the minimum stack depth required for an individual at a locus, and the minimum proportion of individuals in a population that must have the locus, respectively.
- --write_random_snp restricts the analysis to one randomly chosen SNP at each locus that has several SNPs.
- --structure, --vcf and --genepop each select an output format, so this run writes three outputs: Structure, VCF and GenePop.
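The population map passed with -M is, to the best of our knowledge, a plain tab-separated file with one sample per line: sample name, then population. The sketch below writes a small map and checks its layout before it is handed to populations. The sample and population names are invented, and check_popmap is a hypothetical helper, not a Stacks tool.

```shell
#!/bin/sh
# Write a small population map to a temporary file.
# Sample and population names are hypothetical, for illustration only.
POPMAP=$(mktemp)
printf '%s\t%s\n' \
    sample_01 north \
    sample_02 north \
    sample_03 south > "$POPMAP"

# Succeed only if every line has exactly two tab-separated fields.
check_popmap() {
    awk -F '\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' "$1"
}

if check_popmap "$POPMAP"; then
    echo "population map OK: $(wc -l < "$POPMAP" | tr -d ' ') samples"
else
    echo "population map is malformed" >&2
fi
```

A map written with spaces instead of tabs fails this check, which is an easy mistake to make when editing the file by hand.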
Setting up the database on MySQL
- Create the database; you must have the appropriate privileges for this. If you do, the command will be something like:
echo "CREATE DATABASE <mynewdbname>" | mysql -h <name_of_server_running_mysql> -u <myusername> -p
- Ensure $STACKSROOTDIR/share/stacks/sql/my.cnf is set up properly.
- Ensure the user has modification privileges on the database.
- Load the stacks.sql schema into the database. It is in $STACKSROOTDIR/share/stacks/sql/stacks.sql, i.e.
mysql <mynewdbname> -h <name_of_server_running_mysql> -u <myusername> -p < $STACKSROOTDIR/share/stacks/sql/stacks.sql
- To start over, a database can be deleted. Take care with this step so as not to delete the wrong database. The SQL keyword for deletion is "DROP", and the command would be run like this:
echo "DROP DATABASE <mynewdbname>" | mysql -h <name_of_server_running_mysql> -u <myusername> -p
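Since dropping the wrong database is hard to undo, one option is to build the DROP statement through a small guard that requires the database name to be given twice. This is a sketch only: drop_db_sql is a hypothetical helper and stacks_trial is a made-up database name. The function only prints the SQL; nothing is sent to a server.

```shell
#!/bin/sh
# Print a DROP statement only if the name and its confirmation match.
#   $1  database name
#   $2  the same name again, as confirmation
drop_db_sql() {
    db=$1
    confirm=$2
    if [ -z "$db" ] || [ "$db" != "$confirm" ]; then
        echo "refusing to drop: confirmation does not match" >&2
        return 1
    fi
    echo "DROP DATABASE $db"
}

# Hypothetical database name; this only prints the statement.
drop_db_sql stacks_trial stacks_trial
```

The printed statement can then be piped to mysql with the same -h, -u and -p options as in the commands above.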
Installation and compilation (admins only)
Stacks uses the usual configure && make && make install routine, but has several configure options. The main one is bam: despite the name, this refers to samtools rather than bamtools, although the option names say bam and not samtools, and the documentation is not very clear on this. In any case, the following is a model of a workable compile line.
./configure --prefix=/usr/local/Modules/modulefiles/tools/stacks/1.41 --enable-bam --enable-sparsehash --with-bam-include-path=/usr/local/Modules/modulefiles/tools/samtools/0.1.19b/include --with-bam-lib-path=/usr/local/Modules/modulefiles/tools/samtools/0.1.19b/lib --with-sparsehash-include-path=/usr/local/Modules/modulefiles/tools/sparsehash/gitv0_4cb9240/
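Because a mistyped include or lib path only shows up partway through the build, it can help to check the directories before running configure. A minimal sketch, assuming the samtools location from the compile line above; check_dirs is a hypothetical helper.

```shell
#!/bin/sh
# Report each missing directory on stderr; return non-zero if any is missing.
check_dirs() {
    missing=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing directory: $d" >&2
            missing=1
        fi
    done
    return "$missing"
}

# samtools location as used in the compile line above.
SAMTOOLS=/usr/local/Modules/modulefiles/tools/samtools/0.1.19b
if check_dirs "$SAMTOOLS/include" "$SAMTOOLS/lib"; then
    echo "samtools paths found, safe to run ./configure"
else
    echo "fix the paths before running ./configure"
fi
```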
The Stacks installation itself then needs two further modifications, which tie it to a particular running MySQL server. By the nature of these two modifications, it appears that one Stacks installation can use only one MySQL server. The files to be modified are:
The settings in these files referring to the mysql server should be modified appropriately.
Installing Stacks 2.0b7 and 2.0b9 was painful. This configure line from Ramon, which points the build at GCC 4.9.3, worked:
./configure CXX=/usr/local/Modules/modulefiles/tools/gcc/4.9.3/bin/g++ CPP='/usr/local/Modules/modulefiles/tools/gcc/4.9.3/bin/g++ -E' --prefix=whatever