Difference between revisions of "Thor"

From wiki
Jump to: navigation, search
 
(2 intermediate revisions by the same user not shown)
Line 135: Line 135:
  
 
  rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec
 
  rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec
 +
 +
Conifg file didn't change on this one. Out is as follows:
  
 
  Call DPs on whole genome.
 
  Call DPs on whole genome.
Line 190: Line 192:
 
  Compute HMM's training set
 
  Compute HMM's training set
 
  No differential peaks detected
 
  No differential peaks detected
 +
 +
== A third run giving errors ==
 +
 +
With the fastq's concatenated and following note from Ivan Costa that TMM normalize actually should work, the following was tried:
 +
 +
$ rgt-THOR one.config --report --no-correction --output-dir ./thorsec -n THOR_DCsec
 +
Call DPs on whole genome.
 +
Computing read extension sizes for ChIP-seq profiles
 +
Compute GC-content
 +
[fai_load] build FASTA index.
 +
Compute factors
 +
Normalize input of Signal 0, Rep 0 with factor 0.844
 +
Normalize input of Signal 1, Rep 0 with factor 0.806
 +
Use global TMM approach
 +
Traceback (most recent call last):
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module>
 +
    load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')()
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main
 +
    chrom_sizes, dims, inputs, tracker)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM
 +
    report=options.report, poisson=options.poisson)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 197, in _fit_mean_var_distr
 +
    _plot_func(plot_data, outputdir)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 108, in _plot_func
 +
    maxs.append(max(tmp[tmp < np.percentile(tmp, 90)]))
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4116, in percentile
 +
    interpolation=interpolation)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3858, in _ureduce
 +
    r = func(a, **kwargs)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4233, in _percentile
 +
    x1 = take(ap, indices_below, axis=axis) * weights_below
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 134, in take
 +
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
 +
    return getattr(obj, method)(*args, **kwds)
 +
IndexError: cannot do a non-empty take from an empty axes.
 +
 +
This may have had something to do with a numpy update that was done between the second and third runs.
 +
 +
I have tried further and this is what I get now
 +
 +
[ramon@marvin SeqAH_Thor2]$ rgt-THOR one.config --report --no-correction --output-dir ./thorfirst -n THOR_DCone
 +
Call DPs on whole genome.
 +
Computing read extension sizes for ChIP-seq profiles
 +
Compute GC-content
 +
[fai_load] build FASTA index.
 +
Compute factors
 +
Normalize input of Signal 0, Rep 0 with factor 0.918
 +
Normalize input of Signal 1, Rep 0 with factor 0.85
 +
Use global TMM approach
 +
TMM normalization not successfully performed, do not normalize data
 +
TMM normalization not successfully performed, do not normalize data
 +
Compute GC-content
 +
Compute factors
 +
Normalize input of Signal 0, Rep 0 with factor 0.918
 +
Normalize input of Signal 1, Rep 0 with factor 0.85
 +
Use global TMM approach
 +
TMM normalization not successfully performed, do not normalize data
 +
Traceback (most recent call last):
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module>
 +
    load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')()
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main
 +
    chrom_sizes, dims, inputs, tracker)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM
 +
    report=options.report, poisson=options.poisson)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 172, in _fit_mean_var_distr
 +
    data_rep = _get_data_rep(overall_coverage, name, debug, sample_size)
 +
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 143, in _get_data_rep
 +
    r = np.random.randint(cov.shape[1], size=sample_size)
 +
  File "mtrand.pyx", line 977, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:16117)
 +
ValueError: low >= high
 +
  
 
= Links =
 
= Links =
 
* [http://www.regulatory-genomics.org/thor-2/basic-intrstruction Thor's home page]
 
* [http://www.regulatory-genomics.org/thor-2/basic-intrstruction Thor's home page]

Latest revision as of 21:47, 28 June 2017

Introduction

This is one of the newer Differential Peak Callers from Costalab in Aachen.

It has a low traffic google groups page, but a google groups page nonetheless at:

Notes:

  • Thor follows on the ODIN tool which is largely supersedes.
  • For normalization it can use the TMM method widelused in RNA-Seq (EdgeR) or a bed-format list of housekeeping genes.
  • There is no verbose options to see what might be going wrong, though the developers are working on it.
  • There is no parallelism in this tool, it can run quite slowly. For a big sample set resercve as much as two days.
  • Neither Thor nor Odin take account of paired reads in the BAM file. Recommendation from devs is to filter discordant pairs from the BAM file.

Housekeeping genes

The rationale for this approach is that they are genes that have a stable expression pattern, and so can be used to normalise all the others that don't.


Test Data Usage

Thor has two tests, a basic one using an B-cell lymphoma study and another advanced one.

Basic run: B-cell lymphoma

As usual, this ia an undemanding dataset, but is useful for ascertaining where Thir works properly. The test data is available as a tar-file download and the launch of the program is the easiest possible:

rgt-THOR THOR.config

The config file contains the following:

#rep1
FL5_H3K27ac.100k.bam
FL8_H3K27ac.100k.bam
#rep2
CC4_H3K27ac.100k.bam
CC5_H3K27ac.100k.bam
#chrom_sizes
hg19.chrom.sizes

The output looks like this:

Warning: Do not compute GC-content, as there is no input file
Warning: Do not compute GC-content, as there is no genome file
Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Use global TMM approach 
Compute HMM's training set
Train HMM
Compute HMM's posterior probabilities and Viterbi path to call differential peaks
- taking into account chr1
- taking into account chr10
- taking into account chr11
- taking into account chr12
- taking into account chr13
- taking into account chr14
- taking into account chr15
- taking into account chr16
- taking into account chr17
- taking into account chr18
- taking into account chr19
- taking into account chr2
- taking into account chr20
- taking into account chr21
- taking into account chr22
- taking into account chr3
- taking into account chr4
- taking into account chr5
- taking into account chr6
- taking into account chr7
- taking into account chr8
- taking into account chr9
- taking into account chrM
- taking into account chrX
- taking into account chrY

Real Data Usage

a first run

The first few runs are usually error-prone. Thor is no exception:

rgt-THOR first.config --report --no-correction --output-dir ./thorfirst -n THOR_DCfirst

Explanation:

  • first.config, this is a configuration file, see below
  • --output-dir, program will fail to run if this already exists.

The contents of the first.config is as follows:

#rep1
WTIP_S5_L001/WTIP_S5_L001_srtd.bam
WTIP_S5_L002/WTIP_S5_L002_srtd.bam
#rep2
USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam
USL1IP_S7_L002/USL1IP_S7_L002_srtd.bam
#genome
/storage/home/users/as363/w303_genome/w303.fa
#chrom_sizes
/storage/home/users/as363/w303_genome/w303_.sizes
#inputs1
WTINP_S4_L001/WTINP_S4_L001_srtd.bam
WTINP_S4_L002/WTINP_S4_L002_srtd.bam
#inputs2
ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam
ULS1INP_S6_L002/ULS1INP_S6_L002_srtd.bam
   

output from this run

Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
[fai_load] build FASTA index.
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use global TMM approach 
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use global TMM approach 
Compute HMM's training set
No differential peaks detected

A second run, with four housekeeping genes

rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec

Conifg file didn't change on this one. Out is as follows:

Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use housekeeping gene approach
-Housekeeping gene matrix (columns-genes, rows-samples)
[[ 5655.  3364.  4323.  4224.]
 [ 5569.  2860.  4550.  4301.]
 [ 6860.  3415.  4752.  4436.]
 [ 6176.  3574.  5364.  4361.]]

-gene (column) wise evaluation
cdc19 0.00021061043602
act1 0.000309042887452
tdh3 0.000258788890782
fba1 0.000205695296173

-sample (row) wise evaluation
WTIP_S5_L001_srtd 0.000177121339107
WTIP_S5_L002_srtd 0.000458559742218
USL1IP_S7_L001_srtd 0.000271335438277
USL1IP_S7_L002_srtd 0.000422582780881

Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.644
Normalize input of Signal 0, Rep 1 with factor 0.645
Normalize input of Signal 1, Rep 0 with factor 0.647
Normalize input of Signal 1, Rep 1 with factor 0.647
Use housekeeping gene approach
-Housekeeping gene matrix (columns-genes, rows-samples)
[[ 5655.  3364.  4323.  4224.]
 [ 5569.  2860.  4550.  4301.]
 [ 6860.  3415.  4752.  4436.]
 [ 6176.  3574.  5364.  4361.]]

-gene (column) wise evaluation
cdc19 0.00021061043602
act1 0.000309042887452
tdh3 0.000258788890782
fba1 0.000205695296173

-sample (row) wise evaluation
WTIP_S5_L001_srtd 0.000177121339107
WTIP_S5_L002_srtd 0.000458559742218
USL1IP_S7_L001_srtd 0.000271335438277
USL1IP_S7_L002_srtd 0.000422582780881

Compute HMM's training set
No differential peaks detected

A third run giving errors

With the fastq's concatenated and following note from Ivan Costa that TMM normalize actually should work, the following was tried:

$ rgt-THOR one.config --report --no-correction --output-dir ./thorsec -n THOR_DCsec
Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
[fai_load] build FASTA index.
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.844
Normalize input of Signal 1, Rep 0 with factor 0.806
Use global TMM approach 
Traceback (most recent call last):
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module>
    load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')()
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main
    chrom_sizes, dims, inputs, tracker)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM
    report=options.report, poisson=options.poisson)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 197, in _fit_mean_var_distr
    _plot_func(plot_data, outputdir)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 108, in _plot_func
    maxs.append(max(tmp[tmp < np.percentile(tmp, 90)]))
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4116, in percentile
    interpolation=interpolation)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3858, in _ureduce
    r = func(a, **kwargs)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4233, in _percentile
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 134, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

This may have had something to do with a numpy update that was done between the second and third runs.

I have tried further and this is what I get now

[ramon@marvin SeqAH_Thor2]$ rgt-THOR one.config --report --no-correction --output-dir ./thorfirst -n THOR_DCone
Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
[fai_load] build FASTA index.
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.918
Normalize input of Signal 1, Rep 0 with factor 0.85
Use global TMM approach
TMM normalization not successfully performed, do not normalize data
TMM normalization not successfully performed, do not normalize data
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.918
Normalize input of Signal 1, Rep 0 with factor 0.85
Use global TMM approach
TMM normalization not successfully performed, do not normalize data
Traceback (most recent call last):
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module>
    load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')()
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main
    chrom_sizes, dims, inputs, tracker)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM
    report=options.report, poisson=options.poisson)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 172, in _fit_mean_var_distr
    data_rep = _get_data_rep(overall_coverage, name, debug, sample_size)
  File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 143, in _get_data_rep
    r = np.random.randint(cov.shape[1], size=sample_size)
  File "mtrand.pyx", line 977, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:16117)
ValueError: low >= high


Links