Difference between revisions of "Thor"
(3 intermediate revisions by the same user not shown) | |||
Line 16: | Line 16: | ||
The rationale for this approach is that they are genes that have a stable expression pattern, and so can be used to normalise all the others that don't. | The rationale for this approach is that they are genes that have a stable expression pattern, and so can be used to normalise all the others that don't. | ||
− | = Usage = | + | |
+ | = Test Data Usage = | ||
+ | |||
+ | Thor has two tests, a basic one using an B-cell lymphoma study and another advanced one. | ||
+ | |||
+ | == Basic run: B-cell lymphoma == | ||
+ | |||
+ | As usual, this ia an undemanding dataset, but is useful for ascertaining where Thir works properly. The test data is available as a tar-file download | ||
+ | and the launch of the program is the easiest possible: | ||
+ | |||
+ | rgt-THOR THOR.config | ||
+ | |||
+ | The config file contains the following: | ||
+ | |||
+ | #rep1 | ||
+ | FL5_H3K27ac.100k.bam | ||
+ | FL8_H3K27ac.100k.bam | ||
+ | #rep2 | ||
+ | CC4_H3K27ac.100k.bam | ||
+ | CC5_H3K27ac.100k.bam | ||
+ | #chrom_sizes | ||
+ | hg19.chrom.sizes | ||
+ | |||
+ | The output looks like this: | ||
+ | |||
+ | Warning: Do not compute GC-content, as there is no input file | ||
+ | Warning: Do not compute GC-content, as there is no genome file | ||
+ | Call DPs on whole genome. | ||
+ | Computing read extension sizes for ChIP-seq profiles | ||
+ | Use global TMM approach | ||
+ | Compute HMM's training set | ||
+ | Train HMM | ||
+ | Compute HMM's posterior probabilities and Viterbi path to call differential peaks | ||
+ | - taking into account chr1 | ||
+ | - taking into account chr10 | ||
+ | - taking into account chr11 | ||
+ | - taking into account chr12 | ||
+ | - taking into account chr13 | ||
+ | - taking into account chr14 | ||
+ | - taking into account chr15 | ||
+ | - taking into account chr16 | ||
+ | - taking into account chr17 | ||
+ | - taking into account chr18 | ||
+ | - taking into account chr19 | ||
+ | - taking into account chr2 | ||
+ | - taking into account chr20 | ||
+ | - taking into account chr21 | ||
+ | - taking into account chr22 | ||
+ | - taking into account chr3 | ||
+ | - taking into account chr4 | ||
+ | - taking into account chr5 | ||
+ | - taking into account chr6 | ||
+ | - taking into account chr7 | ||
+ | - taking into account chr8 | ||
+ | - taking into account chr9 | ||
+ | - taking into account chrM | ||
+ | - taking into account chrX | ||
+ | - taking into account chrY | ||
+ | |||
+ | = Real Data Usage = | ||
== a first run == | == a first run == | ||
Line 76: | Line 135: | ||
rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec | rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec | ||
+ | |||
+ | Conifg file didn't change on this one. Out is as follows: | ||
Call DPs on whole genome. | Call DPs on whole genome. | ||
Line 131: | Line 192: | ||
Compute HMM's training set | Compute HMM's training set | ||
No differential peaks detected | No differential peaks detected | ||
+ | |||
+ | == A third run giving errors == | ||
+ | |||
+ | With the fastq's concatenated and following note from Ivan Costa that TMM normalize actually should work, the following was tried: | ||
+ | |||
+ | $ rgt-THOR one.config --report --no-correction --output-dir ./thorsec -n THOR_DCsec | ||
+ | Call DPs on whole genome. | ||
+ | Computing read extension sizes for ChIP-seq profiles | ||
+ | Compute GC-content | ||
+ | [fai_load] build FASTA index. | ||
+ | Compute factors | ||
+ | Normalize input of Signal 0, Rep 0 with factor 0.844 | ||
+ | Normalize input of Signal 1, Rep 0 with factor 0.806 | ||
+ | Use global TMM approach | ||
+ | Traceback (most recent call last): | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module> | ||
+ | load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')() | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main | ||
+ | chrom_sizes, dims, inputs, tracker) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM | ||
+ | report=options.report, poisson=options.poisson) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 197, in _fit_mean_var_distr | ||
+ | _plot_func(plot_data, outputdir) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 108, in _plot_func | ||
+ | maxs.append(max(tmp[tmp < np.percentile(tmp, 90)])) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4116, in percentile | ||
+ | interpolation=interpolation) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3858, in _ureduce | ||
+ | r = func(a, **kwargs) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4233, in _percentile | ||
+ | x1 = take(ap, indices_below, axis=axis) * weights_below | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 134, in take | ||
+ | return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc | ||
+ | return getattr(obj, method)(*args, **kwds) | ||
+ | IndexError: cannot do a non-empty take from an empty axes. | ||
+ | |||
+ | This may have had something to do with a numpy update that was done between the second and third runs. | ||
+ | |||
+ | I have tried further and this is what I get now | ||
+ | |||
+ | [ramon@marvin SeqAH_Thor2]$ rgt-THOR one.config --report --no-correction --output-dir ./thorfirst -n THOR_DCone | ||
+ | Call DPs on whole genome. | ||
+ | Computing read extension sizes for ChIP-seq profiles | ||
+ | Compute GC-content | ||
+ | [fai_load] build FASTA index. | ||
+ | Compute factors | ||
+ | Normalize input of Signal 0, Rep 0 with factor 0.918 | ||
+ | Normalize input of Signal 1, Rep 0 with factor 0.85 | ||
+ | Use global TMM approach | ||
+ | TMM normalization not successfully performed, do not normalize data | ||
+ | TMM normalization not successfully performed, do not normalize data | ||
+ | Compute GC-content | ||
+ | Compute factors | ||
+ | Normalize input of Signal 0, Rep 0 with factor 0.918 | ||
+ | Normalize input of Signal 1, Rep 0 with factor 0.85 | ||
+ | Use global TMM approach | ||
+ | TMM normalization not successfully performed, do not normalize data | ||
+ | Traceback (most recent call last): | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module> | ||
+ | load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')() | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main | ||
+ | chrom_sizes, dims, inputs, tracker) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM | ||
+ | report=options.report, poisson=options.poisson) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 172, in _fit_mean_var_distr | ||
+ | data_rep = _get_data_rep(overall_coverage, name, debug, sample_size) | ||
+ | File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 143, in _get_data_rep | ||
+ | r = np.random.randint(cov.shape[1], size=sample_size) | ||
+ | File "mtrand.pyx", line 977, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:16117) | ||
+ | ValueError: low >= high | ||
+ | |||
= Links = | = Links = | ||
* [http://www.regulatory-genomics.org/thor-2/basic-intrstruction Thor's home page] | * [http://www.regulatory-genomics.org/thor-2/basic-intrstruction Thor's home page] |
Latest revision as of 21:47, 28 June 2017
Contents
Introduction
This is one of the newer Differential Peak Callers from Costalab in Aachen.
It has a low traffic google groups page, but a google groups page nonetheless at:
Notes:
- Thor follows on the ODIN tool which is largely supersedes.
- For normalization it can use the TMM method widelused in RNA-Seq (EdgeR) or a bed-format list of housekeeping genes.
- There is no verbose options to see what might be going wrong, though the developers are working on it.
- There is no parallelism in this tool, it can run quite slowly. For a big sample set resercve as much as two days.
- Neither Thor nor Odin take account of paired reads in the BAM file. Recommendation from devs is to filter discordant pairs from the BAM file.
Housekeeping genes
The rationale for this approach is that they are genes that have a stable expression pattern, and so can be used to normalise all the others that don't.
Test Data Usage
Thor has two tests, a basic one using an B-cell lymphoma study and another advanced one.
Basic run: B-cell lymphoma
As usual, this ia an undemanding dataset, but is useful for ascertaining where Thir works properly. The test data is available as a tar-file download and the launch of the program is the easiest possible:
rgt-THOR THOR.config
The config file contains the following:
#rep1 FL5_H3K27ac.100k.bam FL8_H3K27ac.100k.bam #rep2 CC4_H3K27ac.100k.bam CC5_H3K27ac.100k.bam #chrom_sizes hg19.chrom.sizes
The output looks like this:
Warning: Do not compute GC-content, as there is no input file Warning: Do not compute GC-content, as there is no genome file Call DPs on whole genome. Computing read extension sizes for ChIP-seq profiles Use global TMM approach Compute HMM's training set Train HMM Compute HMM's posterior probabilities and Viterbi path to call differential peaks - taking into account chr1 - taking into account chr10 - taking into account chr11 - taking into account chr12 - taking into account chr13 - taking into account chr14 - taking into account chr15 - taking into account chr16 - taking into account chr17 - taking into account chr18 - taking into account chr19 - taking into account chr2 - taking into account chr20 - taking into account chr21 - taking into account chr22 - taking into account chr3 - taking into account chr4 - taking into account chr5 - taking into account chr6 - taking into account chr7 - taking into account chr8 - taking into account chr9 - taking into account chrM - taking into account chrX - taking into account chrY
Real Data Usage
a first run
The first few runs are usually error-prone. Thor is no exception:
rgt-THOR first.config --report --no-correction --output-dir ./thorfirst -n THOR_DCfirst
Explanation:
- first.config, this is a configuration file, see below
- --output-dir, program will fail to run if this already exists.
The contents of the first.config
is as follows:
#rep1 WTIP_S5_L001/WTIP_S5_L001_srtd.bam WTIP_S5_L002/WTIP_S5_L002_srtd.bam #rep2 USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam USL1IP_S7_L002/USL1IP_S7_L002_srtd.bam #genome /storage/home/users/as363/w303_genome/w303.fa #chrom_sizes /storage/home/users/as363/w303_genome/w303_.sizes #inputs1 WTINP_S4_L001/WTINP_S4_L001_srtd.bam WTINP_S4_L002/WTINP_S4_L002_srtd.bam #inputs2 ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam ULS1INP_S6_L002/ULS1INP_S6_L002_srtd.bam
output from this run
Call DPs on whole genome. Computing read extension sizes for ChIP-seq profiles Compute GC-content [fai_load] build FASTA index. Compute factors Normalize input of Signal 0, Rep 0 with factor 0.644 Normalize input of Signal 0, Rep 1 with factor 0.645 Normalize input of Signal 1, Rep 0 with factor 0.647 Normalize input of Signal 1, Rep 1 with factor 0.647 Use global TMM approach TMM normalization not successfully performed, do not normalize data TMM normalization not successfully performed, do not normalize data TMM normalization not successfully performed, do not normalize data TMM normalization not successfully performed, do not normalize data Compute GC-content Compute factors Normalize input of Signal 0, Rep 0 with factor 0.644 Normalize input of Signal 0, Rep 1 with factor 0.645 Normalize input of Signal 1, Rep 0 with factor 0.647 Normalize input of Signal 1, Rep 1 with factor 0.647 Use global TMM approach Compute HMM's training set No differential peaks detected
A second run, with four housekeeping genes
rgt-THOR first.config --report --housekeeping-genes fourhk_yeast.bed --no-correction --output-dir ./thorsec -n THOR_DCsec
Conifg file didn't change on this one. Out is as follows:
Call DPs on whole genome. Computing read extension sizes for ChIP-seq profiles Compute GC-content Compute factors Normalize input of Signal 0, Rep 0 with factor 0.644 Normalize input of Signal 0, Rep 1 with factor 0.645 Normalize input of Signal 1, Rep 0 with factor 0.647 Normalize input of Signal 1, Rep 1 with factor 0.647 Use housekeeping gene approach -Housekeeping gene matrix (columns-genes, rows-samples) [[ 5655. 3364. 4323. 4224.] [ 5569. 2860. 4550. 4301.] [ 6860. 3415. 4752. 4436.] [ 6176. 3574. 5364. 4361.]] -gene (column) wise evaluation cdc19 0.00021061043602 act1 0.000309042887452 tdh3 0.000258788890782 fba1 0.000205695296173 -sample (row) wise evaluation WTIP_S5_L001_srtd 0.000177121339107 WTIP_S5_L002_srtd 0.000458559742218 USL1IP_S7_L001_srtd 0.000271335438277 USL1IP_S7_L002_srtd 0.000422582780881 Compute GC-content Compute factors Normalize input of Signal 0, Rep 0 with factor 0.644 Normalize input of Signal 0, Rep 1 with factor 0.645 Normalize input of Signal 1, Rep 0 with factor 0.647 Normalize input of Signal 1, Rep 1 with factor 0.647 Use housekeeping gene approach -Housekeeping gene matrix (columns-genes, rows-samples) [[ 5655. 3364. 4323. 4224.] [ 5569. 2860. 4550. 4301.] [ 6860. 3415. 4752. 4436.] [ 6176. 3574. 5364. 4361.]] -gene (column) wise evaluation cdc19 0.00021061043602 act1 0.000309042887452 tdh3 0.000258788890782 fba1 0.000205695296173 -sample (row) wise evaluation WTIP_S5_L001_srtd 0.000177121339107 WTIP_S5_L002_srtd 0.000458559742218 USL1IP_S7_L001_srtd 0.000271335438277 USL1IP_S7_L002_srtd 0.000422582780881 Compute HMM's training set No differential peaks detected
A third run giving errors
With the fastq's concatenated and following note from Ivan Costa that TMM normalize actually should work, the following was tried:
$ rgt-THOR one.config --report --no-correction --output-dir ./thorsec -n THOR_DCsec Call DPs on whole genome. Computing read extension sizes for ChIP-seq profiles Compute GC-content [fai_load] build FASTA index. Compute factors Normalize input of Signal 0, Rep 0 with factor 0.844 Normalize input of Signal 1, Rep 0 with factor 0.806 Use global TMM approach Traceback (most recent call last): File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module> load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')() File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main chrom_sizes, dims, inputs, tracker) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM report=options.report, poisson=options.poisson) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 197, in _fit_mean_var_distr _plot_func(plot_data, outputdir) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 108, in _plot_func maxs.append(max(tmp[tmp < np.percentile(tmp, 90)])) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4116, in percentile interpolation=interpolation) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3858, in _ureduce r = func(a, **kwargs) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4233, in _percentile x1 = take(ap, indices_below, axis=axis) * weights_below File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 134, in take return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc return getattr(obj, method)(*args, **kwds) IndexError: cannot do a non-empty take from an empty axes.
This may have had something to do with a numpy update that was done between the second and third runs.
I have tried further and this is what I get now
[ramon@marvin SeqAH_Thor2]$ rgt-THOR one.config --report --no-correction --output-dir ./thorfirst -n THOR_DCone Call DPs on whole genome. Computing read extension sizes for ChIP-seq profiles Compute GC-content [fai_load] build FASTA index. Compute factors Normalize input of Signal 0, Rep 0 with factor 0.918 Normalize input of Signal 1, Rep 0 with factor 0.85 Use global TMM approach TMM normalization not successfully performed, do not normalize data TMM normalization not successfully performed, do not normalize data Compute GC-content Compute factors Normalize input of Signal 0, Rep 0 with factor 0.918 Normalize input of Signal 1, Rep 0 with factor 0.85 Use global TMM approach TMM normalization not successfully performed, do not normalize data Traceback (most recent call last): File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/rgt-THOR", line 11, in <module> load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')() File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main chrom_sizes, dims, inputs, tracker) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 81, in train_HMM report=options.report, poisson=options.poisson) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 172, in _fit_mean_var_distr data_rep = _get_data_rep(overall_coverage, name, debug, sample_size) File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/rgt/THOR/dpc_help.py", line 143, in _get_data_rep r = np.random.randint(cov.shape[1], size=sample_size) File "mtrand.pyx", line 977, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:16117) ValueError: low >= high