Difference between revisions of "Macs2"

From wiki
Jump to: navigation, search
Line 15: Line 15:
 
* '''-g''', this is the genome size, which you can calculate by coutning all the bases int he reference FASTA
 
* '''-g''', this is the genome size, which you can calculate by coutning all the bases int he reference FASTA
 
* '''-q''', is a quality values ... have seen 0.1 to 0.01 being used.
 
* '''-q''', is a quality values ... have seen 0.1 to 0.01 being used.
* ''-f BAMPE''', allows to specify we are dealing with paired-end reads.
+
* '''-f BAMPE''', allows to specify we are dealing with paired-end reads.
  
 
=== output from this command ===
 
=== output from this command ===
 +
 +
INFO  @ Mon, 05 Jun 2017 17:03:06:
 +
# Command line: callpeak -t USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam -c ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam -f BAMPE -g 11.1e6 -q 0.01
 +
# ARGUMENTS LIST:
 +
# name = NA
 +
# format = BAMPE
 +
# ChIP-seq file = ['USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam']
 +
# control file = ['ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam']
 +
# effective genome size = 1.11e+07
 +
# band width = 300
 +
# model fold = [5, 50]
 +
# qvalue cutoff = 1.00e-02
 +
# Larger dataset will be scaled towards smaller dataset.
 +
# Range for calculating regional lambda is: 1000 bps and 10000 bps
 +
# Broad region calling is off
 +
# Paired-End mode is on
 +
 
 +
INFO  @ Mon, 05 Jun 2017 17:03:06: #1 read fragment files...
 +
INFO  @ Mon, 05 Jun 2017 17:03:06: #1 read treatment fragments...
 +
INFO  @ Mon, 05 Jun 2017 17:03:19:  1000000
 +
INFO  @ Mon, 05 Jun 2017 17:03:32:  2000000
 +
INFO  @ Mon, 05 Jun 2017 17:03:34: #1.2 read input fragments...
 +
INFO  @ Mon, 05 Jun 2017 17:03:47:  1000000
 +
INFO  @ Mon, 05 Jun 2017 17:04:00:  2000000
 +
INFO  @ Mon, 05 Jun 2017 17:04:13:  3000000
 +
INFO  @ Mon, 05 Jun 2017 17:04:26:  4000000
 +
INFO  @ Mon, 05 Jun 2017 17:04:39:  5000000
 +
INFO  @ Mon, 05 Jun 2017 17:04:52:  6000000
 +
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 mean fragment size is determined as 195 bp from treatment
 +
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 note: mean fragment size in control is 197 bp -- value ignored
 +
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 fragment size = 195
 +
INFO  @ Mon, 05 Jun 2017 17:05:01: #1  total fragments in treatment: 2013032
 +
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 user defined the maximum fragments...
 +
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 filter out redundant fragments by allowing at most 1 identical fragment(s)
 +
INFO  @ Mon, 05 Jun 2017 17:05:10: #1  fragments after filtering in treatment: 1779579
 +
INFO  @ Mon, 05 Jun 2017 17:05:10: #1  Redundant rate of treatment: 0.12
 +
INFO  @ Mon, 05 Jun 2017 17:05:10: #1  total fragments in control: 6333991
 +
INFO  @ Mon, 05 Jun 2017 17:05:10: #1 user defined the maximum fragments...
 +
INFO  @ Mon, 05 Jun 2017 17:05:10: #1 filter out redundant fragments by allowing at most 1 identical fragment(s)
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #1  fragments after filtering in control: 5902227
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #1  Redundant rate of control: 0.07
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #1 finished!
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #2 Build Peak Model...
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #2 Skipped...
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #2 Use 195 as fragment length
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #3 Call peaks...
 +
INFO  @ Mon, 05 Jun 2017 17:05:37: #3 Pre-compute pvalue-qvalue table...
 +
INFO  @ Mon, 05 Jun 2017 17:06:20: #3 Call peaks for each chromosome...
 +
INFO  @ Mon, 05 Jun 2017 17:06:26: #4 Write output xls file... NA_peaks.xls
 +
INFO  @ Mon, 05 Jun 2017 17:06:26: #4 Write peak in narrowPeak format file... NA_peaks.narrowPeak
 +
INFO  @ Mon, 05 Jun 2017 17:06:26: #4 Write summits bed file... NA_summits.bed
 +
INFO  @ Mon, 05 Jun 2017 17:06:26: Done!
 +
 +
=== unsuccessful output ===
 +
 +
For comparison purposes, it may be useful to have this:
  
 
  INFO  @ Fri, 02 Jun 2017 18:27:43:  
 
  INFO  @ Fri, 02 Jun 2017 18:27:43:  

Revision as of 16:12, 5 June 2017

Introduction

One of the main programs used for peak-calling on ChIP-Seq alignments.

Usage

Here are some example usages. The module must be loaded via module load macs2.

base minimum options

 macs2 callpeak -t USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam -c ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam -f BAMPE -g 11.1e6 -q 0.01

Explanation:

  • note how the IP files start with USL instead of ULS, this is just a specific dataset nomenclature error.
  • -g, this is the genome size, which you can calculate by coutning all the bases int he reference FASTA
  • -q, is a quality values ... have seen 0.1 to 0.01 being used.
  • -f BAMPE, allows to specify we are dealing with paired-end reads.

output from this command

INFO  @ Mon, 05 Jun 2017 17:03:06: 
# Command line: callpeak -t USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam -c ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam -f BAMPE -g 11.1e6 -q 0.01
# ARGUMENTS LIST:
# name = NA
# format = BAMPE
# ChIP-seq file = ['USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam']
# control file = ['ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam']
# effective genome size = 1.11e+07
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 1.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is on
 
INFO  @ Mon, 05 Jun 2017 17:03:06: #1 read fragment files... 
INFO  @ Mon, 05 Jun 2017 17:03:06: #1 read treatment fragments... 
INFO  @ Mon, 05 Jun 2017 17:03:19:  1000000 
INFO  @ Mon, 05 Jun 2017 17:03:32:  2000000 
INFO  @ Mon, 05 Jun 2017 17:03:34: #1.2 read input fragments... 
INFO  @ Mon, 05 Jun 2017 17:03:47:  1000000 
INFO  @ Mon, 05 Jun 2017 17:04:00:  2000000 
INFO  @ Mon, 05 Jun 2017 17:04:13:  3000000 
INFO  @ Mon, 05 Jun 2017 17:04:26:  4000000 
INFO  @ Mon, 05 Jun 2017 17:04:39:  5000000 
INFO  @ Mon, 05 Jun 2017 17:04:52:  6000000 
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 mean fragment size is determined as 195 bp from treatment 
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 note: mean fragment size in control is 197 bp -- value ignored 
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 fragment size = 195 
INFO  @ Mon, 05 Jun 2017 17:05:01: #1  total fragments in treatment: 2013032 
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 user defined the maximum fragments... 
INFO  @ Mon, 05 Jun 2017 17:05:01: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Mon, 05 Jun 2017 17:05:10: #1  fragments after filtering in treatment: 1779579 
INFO  @ Mon, 05 Jun 2017 17:05:10: #1  Redundant rate of treatment: 0.12 
INFO  @ Mon, 05 Jun 2017 17:05:10: #1  total fragments in control: 6333991 
INFO  @ Mon, 05 Jun 2017 17:05:10: #1 user defined the maximum fragments... 
INFO  @ Mon, 05 Jun 2017 17:05:10: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Mon, 05 Jun 2017 17:05:37: #1  fragments after filtering in control: 5902227 
INFO  @ Mon, 05 Jun 2017 17:05:37: #1  Redundant rate of control: 0.07 
INFO  @ Mon, 05 Jun 2017 17:05:37: #1 finished! 
INFO  @ Mon, 05 Jun 2017 17:05:37: #2 Build Peak Model... 
INFO  @ Mon, 05 Jun 2017 17:05:37: #2 Skipped... 
INFO  @ Mon, 05 Jun 2017 17:05:37: #2 Use 195 as fragment length 
INFO  @ Mon, 05 Jun 2017 17:05:37: #3 Call peaks... 
INFO  @ Mon, 05 Jun 2017 17:05:37: #3 Pre-compute pvalue-qvalue table... 
INFO  @ Mon, 05 Jun 2017 17:06:20: #3 Call peaks for each chromosome... 
INFO  @ Mon, 05 Jun 2017 17:06:26: #4 Write output xls file... NA_peaks.xls 
INFO  @ Mon, 05 Jun 2017 17:06:26: #4 Write peak in narrowPeak format file... NA_peaks.narrowPeak 
INFO  @ Mon, 05 Jun 2017 17:06:26: #4 Write summits bed file... NA_summits.bed 
INFO  @ Mon, 05 Jun 2017 17:06:26: Done! 

unsuccessful output

For comparison purposes, it may be useful to have this:

INFO  @ Fri, 02 Jun 2017 18:27:43: 
# Command line: callpeak -t ULS1IP_S7_L001/ULS1IP_S7_L001_srtd.bam -c ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam -g 11.1e6 -q0.01
# ARGUMENTS LIST:
# name = NA
# format = AUTO
# ChIP-seq file = ['ULS1IP_S7_L001/ULS1IP_S7_L001_srtd.bam']
# control file = ['ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam']
# effective genome size = 1.11e+07
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 1.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is off
  
INFO  @ Fri, 02 Jun 2017 18:27:43: #1 read tag files... 
INFO  @ Fri, 02 Jun 2017 18:27:43: #1 read treatment tags... 
Traceback (most recent call last):
 File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/macs2", line 617, in <module>
   main()
 File "/usr/local/Modules/modulefiles/tools/python/2.7.6/bin/macs2", line 57, in main
   run( args )
 File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/MACS2/callpeak_cmd.py", line 73, in run
   else:       (treat, control) = load_tag_files_options  (options)
 File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/site-packages/MACS2/callpeak_cmd.py", line 395, in load_tag_files_options
   tp = options.parser(options.tfile[0], buffer_size=options.buffer_size)
 File "MACS2/IO/Parser.pyx", line 58, in MACS2.IO.Parser.guess_parser (MACS2/IO/Parser.c:2786)
 File "MACS2/IO/Parser.pyx", line 81, in MACS2.IO.Parser.guess_parser (MACS2/IO/Parser.c:2413)
 File "MACS2/IO/Parser.pyx", line 774, in MACS2.IO.Parser.BAMParser.__init__ (MACS2/IO/Parser.c:11146)
 File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/gzip.py", line 34, in open
   return GzipFile(filename, mode, compresslevel)
 File "/usr/local/Modules/modulefiles/tools/python/2.7.6/lib/python2.7/gzip.py", line 94, in __init__
   fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: 'ULS1IP_S7_L001/ULS1IP_S7_L001_srtd.bam'
[ramon@marvin SeqAH_proj0]$ ls -l U
ULS1INP_S6_L001/ ULS1INP_S6_L002/ ULS1INP_S6_L003/ ULS1INP_S6_L004/ USL1IP_S7_L001/  USL1IP_S7_L002/  USL1IP_S7_L003/  USL1IP_S7_L004/  
[ramon@marvin SeqAH_proj0]$ macs2 callpeak -t USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam -c ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam -g 11.1e6 -q0.01                             
INFO  @ Fri, 02 Jun 2017 18:29:17: 
# Command line: callpeak -t USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam -c ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam -g 11.1e6 -q0.01
# ARGUMENTS LIST:
# name = NA
# format = AUTO
# ChIP-seq file = ['USL1IP_S7_L001/USL1IP_S7_L001_srtd.bam']
# control file = ['ULS1INP_S6_L001/ULS1INP_S6_L001_srtd.bam']
# effective genome size = 1.11e+07
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 1.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is off
 
INFO  @ Fri, 02 Jun 2017 18:29:17: #1 read tag files... 
INFO  @ Fri, 02 Jun 2017 18:29:17: #1 read treatment tags... 
INFO  @ Fri, 02 Jun 2017 18:29:17: Detected format is: BAM 
INFO  @ Fri, 02 Jun 2017 18:29:17: * Input file is gzipped. 
INFO  @ Fri, 02 Jun 2017 18:29:23:  1000000 
INFO  @ Fri, 02 Jun 2017 18:29:29:  2000000 
INFO  @ Fri, 02 Jun 2017 18:29:35:  3000000 
INFO  @ Fri, 02 Jun 2017 18:29:41:  4000000 
INFO  @ Fri, 02 Jun 2017 18:29:42: #1.2 read input tags... 
INFO  @ Fri, 02 Jun 2017 18:29:42: Detected format is: BAM 
INFO  @ Fri, 02 Jun 2017 18:29:42: * Input file is gzipped. 
INFO  @ Fri, 02 Jun 2017 18:29:48:  1000000 
INFO  @ Fri, 02 Jun 2017 18:29:55:  2000000 
INFO  @ Fri, 02 Jun 2017 18:30:01:  3000000 
INFO  @ Fri, 02 Jun 2017 18:30:07:  4000000 
INFO  @ Fri, 02 Jun 2017 18:30:13:  5000000 
INFO  @ Fri, 02 Jun 2017 18:30:19:  6000000 
INFO  @ Fri, 02 Jun 2017 18:30:25:  7000000 
INFO  @ Fri, 02 Jun 2017 18:30:32:  8000000 
INFO  @ Fri, 02 Jun 2017 18:30:38:  9000000 
INFO  @ Fri, 02 Jun 2017 18:30:44:  10000000 
INFO  @ Fri, 02 Jun 2017 18:30:50:  11000000 
INFO  @ Fri, 02 Jun 2017 18:30:56:  12000000 
INFO  @ Fri, 02 Jun 2017 18:31:02:  13000000 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1 tag size is determined as 59 bps 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1 tag size = 59 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1  total tags in treatment: 2013032 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1 user defined the maximum tags... 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1  tags after filtering in treatment: 1208158 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1  Redundant rate of treatment: 0.40 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1  total tags in control: 6333991 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1 user defined the maximum tags... 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1  tags after filtering in control: 4408839 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1  Redundant rate of control: 0.30 
INFO  @ Fri, 02 Jun 2017 18:31:02: #1 finished! 
INFO  @ Fri, 02 Jun 2017 18:31:02: #2 Build Peak Model... 
INFO  @ Fri, 02 Jun 2017 18:31:02: #2 looking for paired plus/minus strand peaks... 
INFO  @ Fri, 02 Jun 2017 18:31:02: #2 number of paired peaks: 61 
WARNING @ Fri, 02 Jun 2017 18:31:02: Too few paired peaks (61) so I can not build the model! Broader your MFOLD range parameter may erase this error. If it still can't build 
the model, we suggest to use --nomodel and --extsize 147 or other fixed number instead. 
WARNING @ Fri, 02 Jun 2017 18:31:02: Process for pairing-model is terminated!


Help file

Here is the programs help file obtained with the command

macs2 -h
macs2 -- Model-based Analysis for ChIP-Sequencing

positional arguments:
  {callpeak,bdgpeakcall,bdgbroadcall,bdgcmp,bdgopt,cmbreps,bdgdiff,filterdup,predictd,pileup,randsample,refinepeak}
    callpeak            Main MACS2 Function: Call peaks from alignment
                        results.
    bdgpeakcall         Call peaks from bedGraph output. Note: All regions on
                        the same chromosome in the bedGraph file should be
                        continuous so only bedGraph files from MACS2 are
                        accpetable.
    bdgbroadcall        Call broad peaks from bedGraph output. Note: All
                        regions on the same chromosome in the bedGraph file
                        should be continuous so only bedGraph files from MACS2
                        are accpetable.
    bdgcmp              Deduct noise by comparing two signal tracks in
                        bedGraph. Note: All regions on the same chromosome in
                        the bedGraph file should be continuous so only
                        bedGraph files from MACS2 are accpetable.
    bdgopt              Operations on score column of bedGraph file. Note: All
                        regions on the same chromosome in the bedGraph file
                        should be continuous so only bedGraph files from MACS2
                        are accpetable.
    cmbreps             Combine BEDGraphs of scores from replicates. Note: All
                        regions on the same chromosome in the bedGraph file
                        should be continuous so only bedGraph files from MACS2
                        are accpetable.
    bdgdiff             Differential peak detection based on paired four
                        bedgraph files. Note: All regions on the same
                        chromosome in the bedGraph file should be continuous
                        so only bedGraph files from MACS2 are accpetable.
    filterdup           Remove duplicate reads at the same position, then
                        convert acceptable format to BED format.
    predictd            Predict d or fragment size from alignment results.
                        *Will NOT filter duplicates*
    pileup              Pileup aligned reads with a given extension size
                        (fragment size or d in MACS language). Note there will
                        be no step for duplicate reads filtering or sequencing
                        depth scaling, so you may need to do certain pre/post-
                        processing.
    randsample          Randomly sample number/percentage of total reads.
    refinepeak          (Experimental) Take raw reads alignment, refine peak
                        summits and give scores measuring balance of
                        waston/crick tags. Inspired by SPP.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit