Difference between revisions of "Thor"

From wiki
Jump to: navigation, search
Line 16: Line 16:
 
The rationale for this approach is that they are genes that have a stable expression pattern, and so can be used to normalise all the others that don't.
 
The rationale for this approach is that they are genes that have a stable expression pattern, and so can be used to normalise all the others that don't.
  
= Usage =
+
 
 +
= Test Data Usage =
 +
 
 +
Thor has two tests, a basic one using an B-cell lymphoma study and another advanced one.
 +
 
 +
== Basic run: B-cell lymphoma ==
 +
 
 +
As usual, this ia an undemanding dataset, but is useful for ascertaining where Thir works properly. The test data is available as a tar-file download
 +
and the launch of the program is the easiest possible:
 +
 
 +
rgt-THOR THOR.config
 +
 
 +
The config file contains the following:
 +
 
 +
#rep1
 +
FL5_H3K27ac.100k.bam
 +
FL8_H3K27ac.100k.bam
 +
#rep2
 +
CC4_H3K27ac.100k.bam
 +
CC5_H3K27ac.100k.bam
 +
#chrom_sizes
 +
hg19.chrom.sizes
 +
 
 +
The output looks like this:
 +
 
 +
Warning: Do not compute GC-content, as there is no input file
 +
Warning: Do not compute GC-content, as there is no genome file
 +
Call DPs on whole genome.
 +
Computing read extension sizes for ChIP-seq profiles
 +
Use global TMM approach
 +
Compute HMM's training set
 +
Train HMM
 +
Compute HMM's posterior probabilities and Viterbi path to call differential peaks
 +
- taking into account chr1
 +
- taking into account chr10
 +
- taking into account chr11
 +
- taking into account chr12
 +
- taking into account chr13
 +
- taking into account chr14
 +
- taking into account chr15
 +
- taking into account chr16
 +
- taking into account chr17
 +
- taking into account chr18
 +
- taking into account chr19
 +
- taking into account chr2
 +
- taking into account chr20
 +
- taking into account chr21
 +
- taking into account chr22
 +
- taking into account chr3
 +
- taking into account chr4
 +
- taking into account chr5
 +
- taking into account chr6
 +
- taking into account chr7
 +
- taking into account chr8
 +
- taking into account chr9
 +
- taking into account chrM
 +
- taking into account chrX
 +
- taking into account chrY
 +
 
 +
= Real Data Usage =
  
 
== a first run ==
 
== a first run ==

Revision as of 09:59, 6 June 2017

Introduction

This is one of the newer Differential Peak Callers from Costalab in Aachen.

It has a low traffic google groups page, but a google groups page nonetheless at:

Notes:

  • Thor follows on the ODIN tool which is largely supersedes.
  • For normalization it can use the TMM method widelused in RNA-Seq (EdgeR) or a bed-format list of housekeeping genes.
  • There is no verbose options to see what might be going wrong, though the developers are working on it.
  • There is no parallelism in this tool, it can run quite slowly. For a big sample set resercve as much as two days.
  • Neither Thor nor Odin take account of paired reads in the BAM file. Recommendation from devs is to filter discordant pairs from the BAM file.

Housekeeping genes

The rationale for this approach is that they are genes that have a stable expression pattern, and so can be used to normalise all the others that don't.


Test Data Usage

Thor has two tests, a basic one using an B-cell lymphoma study and another advanced one.

Basic run: B-cell lymphoma

As usual, this ia an undemanding dataset, but is useful for ascertaining where Thir works properly. The test data is available as a tar-file download and the launch of the program is the easiest possible:

rgt-THOR THOR.config

The config file contains the following:

#rep1
FL5_H3K27ac.100k.bam
FL8_H3K27ac.100k.bam
#rep2
CC4_H3K27ac.100k.bam
CC5_H3K27ac.100k.bam
#chrom_sizes
hg19.chrom.sizes

The output looks like this:

Warning: Do not compute GC-content, as there is no input file
Warning: Do not compute GC-content, as there is no genome file
Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Use global TMM approach 
Compute HMM's training set
Train HMM
Compute HMM's posterior probabilities and Viterbi path to call differential peaks
- taking into account chr1
- taking into account chr10
- taking into account chr11
- taking into account chr12
- taking into account chr13
- taking into account chr14
- taking into account chr15
- taking into account chr16
- taking into account chr17
- taking into account chr18
- taking into account chr19
- taking into account chr2
- taking into account chr20
- taking into account chr21
- taking into account chr22
- taking into account chr3
- taking into account chr4
- taking into account chr5
- taking into account chr6
- taking into account chr7
- taking into account chr8
- taking into account chr9
- taking into account chrM
- taking into account chrX
- taking into account chrY

Real Data Usage

a first run

The first few runs are usually error-prone. Thor is no exception:

rgt-THOR first.config --report --no-correction --output-dir ./thorfirst -n THOR_DCfirst

Explanation:

  • first.config, this is a configuration file, see below
  • --output-dir, program will fail to run if this already exists.

The contents of the first.config is as follows:

#rep1
WTIP_S5_L001/WTIP_S5_L001_srtd.bam
WTIP_S5_L002/WTIP_S5_L002_srtd.bam
#rep2
USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam
USL1IP_S7_L002/USL1IP_S7_L002_srtd.bam
#genome
/storage/home/users/as363/w303_genome/w303.fa
#chrom_sizes
/storage/home/users/as363/w303_genome/w303_.sizes
#inputs1
WTINP_S4_L001/WTINP_S4_L001_srtd.bam
WTINP_S4_L002/WTINP_S4_L002_srtd.bam
#inputs2
ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam
ULS1INP_S6_L002/ULS1INP_S6_L002_srtd.bam
   

output from this run

Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
[fai_load] build FASTA index.
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use global TMM approach 
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use global TMM approach 
Compute HMM's training set
No differential peaks detected

A second run, with four housekeeping genes

rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec
Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use housekeeping gene approach
-Housekeeping gene matrix (columns-genes, rows-samples)
[[ 5655.  3364.  4323.  4224.]
 [ 5569.  2860.  4550.  4301.]
 [ 6860.  3415.  4752.  4436.]
 [ 6176.  3574.  5364.  4361.]]

-gene (column) wise evaluation
cdc19 0.00021061043602
act1 0.000309042887452
tdh3 0.000258788890782
fba1 0.000205695296173

-sample (row) wise evaluation
WTIP_S5_L001_srtd 0.000177121339107
WTIP_S5_L002_srtd 0.000458559742218
USL1IP_S7_L001_srtd 0.000271335438277
USL1IP_S7_L002_srtd 0.000422582780881

Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use housekeeping gene approach
-Housekeeping gene matrix (columns-genes, rows-samples)
[[ 5655.  3364.  4323.  4224.]
 [ 5569.  2860.  4550.  4301.]
 [ 6860.  3415.  4752.  4436.]
 [ 6176.  3574.  5364.  4361.]]

-gene (column) wise evaluation
cdc19 0.00021061043602
act1 0.000309042887452
tdh3 0.000258788890782
fba1 0.000205695296173

-sample (row) wise evaluation
WTIP_S5_L001_srtd 0.000177121339107
WTIP_S5_L002_srtd 0.000458559742218
USL1IP_S7_L001_srtd 0.000271335438277
USL1IP_S7_L002_srtd 0.000422582780881

Compute HMM's training set
No differential peaks detected

Links