Thor

Revision as of 09:18, 6 June 2017 by Rf (talk | contribs)

Introduction

Thor is one of the newer differential peak callers, developed by Costalab in Aachen.

It has a low-traffic Google Groups page, but a Google Groups page nonetheless, at:

Notes:

  • Thor follows on from the ODIN tool, which it largely supersedes.
  • For normalization it can use the TMM method widely used in RNA-Seq (edgeR), or a BED-format list of housekeeping genes.
  • There is no verbose option to see what might be going wrong, though the developers are working on it.
  • There is no parallelism in this tool, so it can run quite slowly. For a big sample set, reserve as much as two days.
  • Neither Thor nor ODIN takes account of paired reads in the BAM file. The recommendation from the developers is to filter discordant pairs from the BAM file.
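
The developers' recommendation on paired reads can be followed with samtools before the BAM files go into the config. The sketch below rests on an assumption about what "filter discordant pairs" means in practice: it keeps only reads that the aligner flagged as properly paired (SAM flag 0x2), using `samtools view -f 2`. The function names and file names are made up for illustration, and the command is only printed here, not run.

```python
import subprocess

PROPER_PAIR = 0x2  # SAM flag bit: read mapped in a proper pair


def proper_pair_cmd(in_bam, out_bam):
    """Build a samtools command that keeps only properly paired reads."""
    return ["samtools", "view", "-b", "-f", str(PROPER_PAIR), "-o", out_bam, in_bam]


def filter_bam(in_bam, out_bam):
    """Run the filter; needs samtools on the PATH."""
    subprocess.run(proper_pair_cmd(in_bam, out_bam), check=True)


# Hypothetical file names: show the command that would be run.
print(" ".join(proper_pair_cmd("sample_srtd.bam", "sample_srtd_pp.bam")))
```

Whether a pair counts as "proper" is decided by the aligner, so it is worth checking the aligner's settings before relying on this flag.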

Housekeeping genes

The rationale for this approach is that housekeeping genes have a stable expression pattern, and so can be used to normalise all the other genes, which don't.
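
As a rough illustration of the idea, and not Thor's actual algorithm, the sketch below derives one scaling factor per sample so that the summed signal over the housekeeping genes is the same in every sample. The count matrix is the one Thor printed in the second run further down this page. The function name and the choice of row sums as the reference are assumptions.

```python
def housekeeping_factors(counts):
    """Return one scaling factor per sample (row) of a samples x genes matrix.

    Each factor rescales a sample so that its total signal over the
    housekeeping genes equals the mean total across all samples.
    """
    totals = [sum(row) for row in counts]
    target = sum(totals) / len(totals)
    return [target / t for t in totals]


# Housekeeping gene matrix from the second Thor run (columns genes, rows samples).
counts = [
    [5655.0, 3364.0, 4323.0, 4224.0],
    [5569.0, 2860.0, 4550.0, 4301.0],
    [6860.0, 3415.0, 4752.0, 4436.0],
    [6176.0, 3574.0, 5364.0, 4361.0],
]

# Print each sample's factor next to its rescaled counts.
for factor, row in zip(housekeeping_factors(counts), counts):
    print(round(factor, 3), [round(factor * c) for c in row])
```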

Usage

A first run

The first few runs are usually error-prone. Thor is no exception:

rgt-THOR first.config --report --no-correction --output-dir ./thorfirst -n THOR_DCfirst

Explanation:

  • first.config: the configuration file, see below.
  • --output-dir: the program will fail to run if this directory already exists.
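
Since rgt-THOR refuses to start when the output directory already exists, a small wrapper can pick an unused directory name before launching. This is only a sketch: the helper names and the numbered-suffix scheme are assumptions, and the command is assembled from the flags used above and printed, not run.

```python
import os


def fresh_output_dir(base):
    """Return base if it does not exist yet, else base_1, base_2, ..."""
    candidate, n = base, 0
    while os.path.exists(candidate):
        n += 1
        candidate = "{}_{}".format(base, n)
    return candidate


def thor_cmd(config, out_base, name):
    """Assemble the rgt-THOR call shown above with an unused output directory."""
    return ["rgt-THOR", config, "--report", "--no-correction",
            "--output-dir", fresh_output_dir(out_base), "-n", name]


print(" ".join(thor_cmd("first.config", "./thorfirst", "THOR_DCfirst")))
```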

The contents of first.config are as follows:

#rep1
WTIP_S5_L001/WTIP_S5_L001_srtd.bam
WTIP_S5_L002/WTIP_S5_L002_srtd.bam
#rep2
USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam
USL1IP_S7_L002/USL1IP_S7_L002_srtd.bam
#genome
/storage/home/users/as363/w303_genome/w303.fa
#chrom_sizes
/storage/home/users/as363/w303_genome/w303_.sizes
#inputs1
WTINP_S4_L001/WTINP_S4_L001_srtd.bam
WTINP_S4_L002/WTINP_S4_L002_srtd.bam
#inputs2
ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam
ULS1INP_S6_L002/ULS1INP_S6_L002_srtd.bam
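
A quick check of the config before a long run can save time. The sketch below parses the `#section` layout shown above and reports missing files and mismatched replicate/input counts. It is an assumption, based only on this example, that each inputs section should list as many files as its rep section. The function names are hypothetical.

```python
import os


def parse_thor_config(text):
    """Split a Thor config into {section_name: [entries]}."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank or whitespace-only lines
        if line.startswith("#"):
            current = line[1:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def check_config(sections):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for name, paths in sections.items():
        for path in paths:
            if not os.path.exists(path):
                problems.append("{}: missing file {}".format(name, path))
    for rep, inp in (("rep1", "inputs1"), ("rep2", "inputs2")):
        if inp in sections and len(sections[inp]) != len(sections.get(rep, [])):
            problems.append("{} and {} differ in length".format(rep, inp))
    return problems


# Shortened excerpt of the config above, with one input deliberately left out.
example = """#rep1
WTIP_S5_L001/WTIP_S5_L001_srtd.bam
WTIP_S5_L002/WTIP_S5_L002_srtd.bam
#inputs1
WTINP_S4_L001/WTINP_S4_L001_srtd.bam
"""

for problem in check_config(parse_thor_config(example)):
    print(problem)
```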
   

Output from this run

Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
[fai_load] build FASTA index.
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use global TMM approach 
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use global TMM approach 
Compute HMM's training set
No differential peaks detected
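
With no verbose option, the console output is the only diagnostic available, so it is worth capturing and scanning it. The sketch below looks for the two messages seen in the run above. The helper name is hypothetical, and the message list is limited to what appears on this page; it is not a complete list of Thor's messages.

```python
WARNING_MARKERS = (
    "TMM normalization not successfully performed",
    "No differential peaks detected",
)


def scan_thor_log(text):
    """Count how often each known warning message occurs in captured output."""
    counts = {marker: 0 for marker in WARNING_MARKERS}
    for line in text.splitlines():
        for marker in WARNING_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


# Excerpt of the output above, used here in place of a captured log file.
log_excerpt = """Use global TMM approach
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
Compute HMM's training set
No differential peaks detected
"""

for marker, n in scan_thor_log(log_excerpt).items():
    print(n, marker)
```

The text to scan can be captured by redirecting the output of the rgt-THOR call to a file, for example with `2>&1 | tee thor.log` appended to the command.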

A second run, with four housekeeping genes

rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec

Output from this run

Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use housekeeping gene approach
-Housekeeping gene matrix (columns-genes, rows-samples)
[[ 5655.  3364.  4323.  4224.]
 [ 5569.  2860.  4550.  4301.]
 [ 6860.  3415.  4752.  4436.]
 [ 6176.  3574.  5364.  4361.]]

-gene (column) wise evaluation
cdc19 0.00021061043602
act1 0.000309042887452
tdh3 0.000258788890782
fba1 0.000205695296173

-sample (row) wise evaluation
WTIP_S5_L001_srtd 0.000177121339107
WTIP_S5_L002_srtd 0.000458559742218
USL1IP_S7_L001_srtd 0.000271335438277
USL1IP_S7_L002_srtd 0.000422582780881

Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use housekeeping gene approach
-Housekeeping gene matrix (columns-genes, rows-samples)
[[ 5655.  3364.  4323.  4224.]
 [ 5569.  2860.  4550.  4301.]
 [ 6860.  3415.  4752.  4436.]
 [ 6176.  3574.  5364.  4361.]]

-gene (column) wise evaluation
cdc19 0.00021061043602
act1 0.000309042887452
tdh3 0.000258788890782
fba1 0.000205695296173

-sample (row) wise evaluation
WTIP_S5_L001_srtd 0.000177121339107
WTIP_S5_L002_srtd 0.000458559742218
USL1IP_S7_L001_srtd 0.000271335438277
USL1IP_S7_L002_srtd 0.000422582780881

Compute HMM's training set
No differential peaks detected