Difference between revisions of "Stacks"

Latest revision as of 10:03, 4 April 2018

Stacks is a widely used RADseq analysis pipeline written by Julian Catchen.

Stages

Overall

There are 5 stages:

  • Reads are demultiplexed and cleaned by process_radtags.
  • Loci are built with ustacks (de novo, i.e. without a reference) or pstacks (with a reference).
  • The catalog of loci is created with cstacks.
  • Each sample is matched against the catalog with sstacks.
  • Either the populations or the genotypes program is then run, depending on the input dataset (populations for population samples, genotypes for a genetic cross).
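
The order of the stages above can be sketched as a small dry-run script. This is only an illustration for a de novo analysis with Stacks 1.x style options: the sample names, the batch ID, the ./stacks output directory and the .fq file names are assumptions, not values taken from this page. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of the five stages (de novo case).
# SAMPLES, BATCH and OUT are hypothetical values chosen for illustration.
SAMPLES="sample_01 sample_02"
BATCH=1
OUT=./stacks

# Print the command instead of running it; set DRY_RUN=0 to really execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pipeline() {
    # 1. Demultiplex and clean the raw reads.
    run process_radtags -p ./raw/ -b ./barcodes/barcodes_lane3 -e sbfI -o ./samples/ -r -c -q

    # 2. Build loci for each sample (pstacks would replace ustacks with a reference).
    id=1
    for s in $SAMPLES; do
        run ustacks -t fastq -f "./samples/$s.fq" -o "$OUT" -i "$id"
        id=$((id + 1))
    done

    # 3. Build the catalog from all samples (one -s option per sample).
    set --
    for s in $SAMPLES; do
        set -- "$@" -s "$OUT/$s"
    done
    run cstacks -b "$BATCH" -o "$OUT" "$@"

    # 4. Match every sample against the catalog.
    for s in $SAMPLES; do
        run sstacks -b "$BATCH" -c "$OUT/batch_$BATCH" -s "$OUT/$s" -o "$OUT"
    done

    # 5. Compute population-level output.
    run populations -b "$BATCH" -P "$OUT" -M ./popmap/popmap_trial
}

pipeline
```

Printing the commands first makes it easy to check paths and IDs before committing compute time to a real run.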

Typical workflow

process_radtags -p ./raw/ -b ./barcodes/barcodes_lane3 -e sbfI -o ./samples/ -r -c -q

Explanation:

  • -p, -b and -e are inputs: the directory with the fastq reads, the barcode file and the name of the restriction enzyme used, respectively.
  • -r, -c and -q clean the data: -r rescues (corrects) barcodes and restriction enzyme cut sites, -c removes reads with an uncalled base, and -q discards reads with low quality scores.
  • -o is the directory in which to write the output.
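
Since the barcode file given to -b drives the demultiplexing, it can be worth checking it before a long run. The sketch below assumes a barcode file with one barcode per line, optionally followed by a tab and a sample name; the function name check_barcodes and the example barcodes are made up for illustration.

```shell
#!/bin/sh
# check_barcodes FILE
# Reports barcodes that contain characters other than A, C, G, T and
# barcodes that occur more than once. Returns non-zero if any problem is found.
# Assumes one barcode per line, optionally followed by <TAB>sample name.
check_barcodes() {
    awk -F '\t' '
        $1 !~ /^[ACGT]+$/ { printf "line %d: bad barcode \"%s\"\n", NR, $1; bad = 1 }
        seen[$1]++        { printf "line %d: duplicate barcode %s\n", NR, $1; bad = 1 }
        END               { exit bad + 0 }
    ' "$1"
}

# Example with a temporary, hypothetical barcode file.
tmp=$(mktemp)
printf 'AACCA\tsample_01\nCATAT\tsample_02\n' > "$tmp"
check_barcodes "$tmp" && echo "barcode file looks consistent"
rm -f "$tmp"
```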

Example command-lines

populations -b 224 -P ./dupstacks/SRW224 -M ./popmap/popmap_trial -m 20 -r 0.75 -W ./dupstacks/SRW224/SRW224VarLoci.txt --write_random_snp --structure --vcf --genepop

Explanation:

  • -b: the batch ID of the catalog to be examined.
  • -P: an input path, the directory containing the output of the previous Stacks commands.
  • -M: an input, the population map.
  • -W: an input, a file of whitelisted markers.
  • -m, -r: filter settings, respectively the minimum stack depth required for an individual at a locus and the minimum proportion of individuals in a population that must have a locus (here 0.75, i.e. 75%).
  • --write_random_snp: where a locus has several SNPs, only one randomly chosen SNP is analysed.
  • --structure --vcf --genepop: these specify output formats, so this command produces three outputs, in structure, vcf and genepop format.
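
Because -r is applied per population, it helps to know how many samples each population in the map contains. The sketch below assumes the population map is a two-column, tab-separated file (sample name, population); the function name popmap_summary and the rounding-up of the required count are illustrative assumptions, not taken from the Stacks source.

```shell
#!/bin/sh
# popmap_summary FILE R
# For each population in FILE prints: name, number of samples, and the
# smallest number of samples whose proportion reaches R (R * n rounded up).
# Returns non-zero if a line is not of the form sample<TAB>population.
popmap_summary() {
    awk -F '\t' -v r="$2" '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected sample<TAB>population\n", NR
            bad = 1
            next
        }
        { n[$2]++ }
        END {
            if (bad) exit 1
            for (p in n) {
                need = int(r * n[p])
                if (need < r * n[p]) need++
                printf "%s\tn=%d\tmin_present=%d\n", p, n[p], need
            }
        }
    ' "$1"
}

# Example with a temporary, hypothetical population map.
tmp=$(mktemp)
printf 's1\tpopA\ns2\tpopA\ns3\tpopA\ns4\tpopB\n' > "$tmp"
popmap_summary "$tmp" 0.75
rm -f "$tmp"
```

A population with very few samples makes a high -r value hard to satisfy, so a summary like this can explain why loci are being dropped.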

Setting up the database on MySQL

  • Create the database. You need the appropriate privileges for this; if you have them, the command will be something like:
echo "CREATE DATABASE <mynewdbname>" | mysql -h <name_of_server_running_mysql> -u <myusername> -p
  • Ensure $STACKSROOTDIR/share/stacks/sql/mysql.cnf is set up properly.
  • Ensure the user has modification privileges on the database.
  • Load the stacks.sql schema into the database. It is in $STACKSROOTDIR/share/stacks/sql/stacks.sql, i.e.
mysql <mydbname> -h <name_of_server_running_mysql> -u <myusername> -p < $STACKSROOTDIR/share/stacks/sql/stacks.sql
  • If necessary, one can also start over by deleting a database. Take care with this step so as not to delete the wrong database. The SQL keyword for deletion is "DROP", and the command would be run like this:
echo "DROP DATABASE <mynewdbname>" | mysql -h <name_of_server_running_mysql> -u <myusername> -p
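
As a guard against dropping the wrong database, the DROP statement can be generated only after the name has been retyped. This is a minimal sketch; the function name confirm_drop is hypothetical and the function only prints the SQL, leaving the mysql call to the user.

```shell
#!/bin/sh
# confirm_drop DBNAME
# Asks for the database name to be retyped on standard input and prints the
# DROP statement only if both names match. Returns non-zero otherwise.
confirm_drop() {
    db=$1
    printf 'Retype the name of the database to drop (%s): ' "$db" >&2
    read -r answer
    if [ "$answer" != "$db" ]; then
        echo "Names differ, nothing will be dropped." >&2
        return 1
    fi
    echo "DROP DATABASE $db"
}

# Intended use (server and user names are placeholders to fill in):
#   sql=$(confirm_drop mynewdbname) && echo "$sql" | mysql -h SERVER -u USER -p
```

Capturing the statement in a variable first means mysql is only started when the confirmation succeeded.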

Installation and compilation (admins only)

Stacks uses the usual configure && make && make install routine, but has several configure options. The main one is bam: despite the name, this refers to samtools rather than bamtools, although the option and directory names say bam and not samtools, and the documentation is not very clear on this. In any case, the following is a model of a workable configure line.

./configure --prefix=/usr/local/Modules/modulefiles/tools/stacks/1.41 --enable-bam --enable-sparsehash --with-bam-include-path=/usr/local/Modules/modulefiles/tools/samtools/0.1.19b/include --with-bam-lib-path=/usr/local/Modules/modulefiles/tools/samtools/0.1.19b/lib --with-sparsehash-include-path=/usr/local/Modules/modulefiles/tools/sparsehash/gitv0_4cb9240/

The Stacks installation itself then needs two further modifications, which tie it to a particular running MySQL server. By the nature of these two modifications, it appears that one Stacks installation can use only one MySQL server. The files to be modified are:

  • <stack_root_dir>/share/stacks/php/constants.php
  • <stack_root_dir>/share/stacks/sql/mysql.cnf

The settings in these files that refer to the MySQL server should be changed to match the server being used.
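
For the .cnf file, a minimal sketch of writing a MySQL client option file is given below. It uses the standard MySQL option-file syntax (a [client] group with host, user and port); whether the Stacks mysql.cnf needs further keys is not confirmed here, and the function name, host and user values are placeholders. A password= line could be added in the same way, which is why the file is created readable by its owner only.

```shell
#!/bin/sh
# write_client_cnf FILE HOST USER
# Writes a minimal MySQL client option file. The umask is restricted in a
# subshell so that a newly created file is readable by its owner only.
write_client_cnf() {
    (
        umask 077
        printf '[client]\nhost=%s\nuser=%s\nport=3306\n' "$2" "$3" > "$1"
    )
}

# Example writing to a temporary file with placeholder values.
tmp=$(mktemp)
write_client_cnf "$tmp" dbserver.example.org stacksuser
cat "$tmp"
rm -f "$tmp"
```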

Installing Stacks 2.0b7 and 2.0b9 was troublesome. The following configure line from Ramon, which points configure at the GCC 4.9.3 compiler explicitly, worked:

 ./configure CXX=/usr/local/Modules/modulefiles/tools/gcc/4.9.3/bin/g++  CPP='/usr/local/Modules/modulefiles/tools/gcc/4.9.3/bin/g++ -E' --prefix=whatever