Difference between revisions of "ChIP-Seq Top2 in Yeast"
m (Protected "ChIP-Seq Top2 in Yeast" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite))) |
|||
(20 intermediate revisions by one other user not shown) | |||
Line 5: | Line 5: | ||
* [[ChIP-Seq Top2 peak-calling]] | * [[ChIP-Seq Top2 peak-calling]] | ||
* [[ChIP-Seq Top2 peak-calling E2]] | * [[ChIP-Seq Top2 peak-calling E2]] | ||
+ | |||
+ | = Naming conventions = | ||
+ | |||
+ | This convention was only settled on halfway through the project, so not all datasets follow it: | ||
+ | |||
+ | PROTEIN IMMUNOPRECIPITATED_CONDITION_MUTATIONS_REPLICATE NUMBER_SAMPLE TYPE | ||
+ | |||
+ | With | ||
+ | PROTEIN IMMUNOPRECIPITATED: Uls1 or Top2 ChIP | ||
+ | CONDITION: YPD (untreated) or ACF (acriflavine treated) | ||
+ | MUTATIONS: WT (Wild type) or ULS1 (uls1 deleted) | ||
+ | REPLICATE NUMBER: R1 or R2 | ||
+ | SAMPLE TYPE: INP (input) or IP (immunoprecipitation) or broadpeaks28 (peak files and logLR) | ||
= Sample quality = | = Sample quality = | ||
− | + | In the previous analyses above, we removed duplicates, as the FastQC quality reports flagged duplication as an issue. | |
+ | |||
+ | However, this removal had very little effect. In any case, MACS2 handles duplicates itself, so samples were only trimmed for adapters and quality. | ||
+ | |||
+ | = Alignment to the S288C Reference = | ||
− | = Bam quality = | + | == Bam quality, no filtering == |
− | == First Experiment == | + | === First Experiment === |
* [http://stab.st-andrews.ac.uk/top2/bamquals0/ULSIP_E1_cad1tro_srtd_bamqc.html ULS1_IP_E1] | * [http://stab.st-andrews.ac.uk/top2/bamquals0/ULSIP_E1_cad1tro_srtd_bamqc.html ULS1_IP_E1] | ||
* [http://stab.st-andrews.ac.uk/top2/bamquals0/ULSINP_E1_cad1tro_srtd_bamqc.html ULS1_INP_E1] | * [http://stab.st-andrews.ac.uk/top2/bamquals0/ULSINP_E1_cad1tro_srtd_bamqc.html ULS1_INP_E1] | ||
Line 18: | Line 35: | ||
* [http://stab.st-andrews.ac.uk/top2/bamquals0/WTINP_E1_cad1tro_srtd_bamqc.html WT_INP_E1] | * [http://stab.st-andrews.ac.uk/top2/bamquals0/WTINP_E1_cad1tro_srtd_bamqc.html WT_INP_E1] | ||
− | == Second Experiment == | + | === Second Experiment === |
− | * [http://stab.st-andrews.ac.uk/top2/bamquals0/ | + | * [http://stab.st-andrews.ac.uk/top2/bamquals0/ULS1_IP_cad1_srtd_bamqc.html ULS1_IP_E2] |
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals0/ULS1_INP_cad1_srtd_bamqc.html ULS1_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals0/WT_T2_IP_cad1_srtd_bamqc.html WT_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals0/WT_T2_INP_cad1_srtd_bamqc.html WT_T2_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals0/ULS1_T2_IP_cad1_srtd_bamqc.html ULS1_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals0/ULS1_T2_INP_cad1_srtd_bamqc.html ULS1_T2_INP_E2] | ||
+ | |||
+ | = Alignment to the new W303 Reference = | ||
+ | |||
+ | == Bam quality, no filtering == | ||
+ | |||
+ | === First Experiment === | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/ULSIP_E1_cad1tro_srtd_bamqc.html ULS1_IP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/ULSINP_E1_cad1tro_srtd_bamqc.html ULS1_INP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/WTIP_E1_cad1tro_srtd_bamqc.html WT_IP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/WTINP_E1_cad1tro_srtd_bamqc.html WT_INP_E1] | ||
+ | |||
+ | === Second Experiment === | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/ULS1_IP_cad1_srtd_bamqc.html ULS1_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/ULS1_INP_cad1_srtd_bamqc.html ULS1_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/WT_T2_IP_cad1_srtd_bamqc.html WT_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/WT_T2_INP_cad1_srtd_bamqc.html WT_T2_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/ULS1_T2_IP_cad1_srtd_bamqc.html ULS1_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamquals2/ULS1_T2_INP_cad1_srtd_bamqc.html ULS1_T2_INP_E2] | ||
+ | |||
+ | == Bam quality, filtering applied == | ||
+ | |||
+ | The bam files were filtered both for mapping quality and low integrity reads in the following manner: | ||
+ | |||
+ | samtools view -b -F 1820 -q 48 <INBAMFILE> -o <OUTBAMFILE> -@ <NUMTHREADS> | ||
+ | |||
+ | === First Experiment === | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/ULSIP_E1_srtd_filt_bamqc.html ULS1_IP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/ULSINP_E1_srtd_filt_bamqc.html ULS1_INP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/WTIP_E1_srtd_filt_bamqc.html WT_IP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/WTINP_E1_srtd_filt_bamqc.html WT_INP_E1] | ||
+ | |||
+ | === Second Experiment === | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/ULS1_IP_srtd_filt_bamqc.html ULS1_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/ULS1_INP_srtd_filt_bamqc.html ULS1_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/WT_T2_IP_srtd_filt_bamqc.html WT_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/WT_T2_INP_srtd_filt_bamqc.html WT_T2_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/ULS1_T2_IP_srtd_filt_bamqc.html ULS1_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals0/ULS1_T2_INP_srtd_filt_bamqc.html ULS1_T2_INP_E2] | ||
+ | |||
+ | == Bam quality, 2nd filtering applied == | ||
+ | |||
+ | This time bam files were filtered to MAPQ 28: | ||
+ | |||
+ | samtools view -b -F 1820 -q 28 <INBAMFILE> -o <OUTBAMFILE> -@ <NUMTHREADS> | ||
+ | |||
+ | === First Experiment === | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/ULSIP_E1_srtd_filt3_bamqc.html ULS1_IP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/ULSINP_E1_srtd_filt3_bamqc.html ULS1_INP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/WTIP_E1_srtd_filt3_bamqc.html WT_IP_E1] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/WTINP_E1_srtd_filt3_bamqc.html WT_INP_E1] | ||
+ | |||
+ | === Second Experiment === | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/ULS1_IP_srtd_filt3_bamqc.html ULS1_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/ULS1_INP_srtd_filt3_bamqc.html ULS1_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/WT_T2_IP_srtd_filt3_bamqc.html WT_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/WT_T2_INP_srtd_filt3_bamqc.html WT_T2_INP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/ULS1_T2_IP_srtd_filt3_bamqc.html ULS1_T2_IP_E2] | ||
+ | * [http://stab.st-andrews.ac.uk/top2/bamfiltquals3/ULS1_T2_INP_srtd_filt3_bamqc.html ULS1_T2_INP_E2] | ||
+ | |||
+ | = The MACS/2 program = | ||
+ | |||
+ | The main way of running this program is as follows: | ||
+ | |||
+ | macs2 callpeak -B -t $BAM1 -c $BAM2 -f BAMPE --outdir $IPOUTDIR -g 12.07e6 -q 0.01 | ||
+ | |||
+ | * $BAM1 is the name of the immunoprecipitation alignment file, termed the '''treatment''' and therefore given with the '''-t''' option. | ||
+ | * $BAM2 is the name of the normal input alignment file, termed the '''control''' and therefore given with the '''-c''' option. | ||
+ | * BAMPE is just a setting informing the program that these are paired-end alignment files. | ||
+ | * the output directory (--outdir) is set to the sample's IP directory rather than its INP directory. | ||
+ | * the size of the genome is then given for the -g option. | ||
+ | |||
+ | MACS automatically produces five files as output: | ||
+ | * NA_peaks.narrowPeak, a BED6+4 file; two of the four extra columns hold the -log10 p-value and the -log10 q-value (false discovery rate). It is the principal peak-calling file. | ||
+ | * NA_summits.bed, a summary of the previous file, recording the summit (position of maximum signal) of each called peak. | ||
+ | * NA_peaks.xls | ||
+ | * NA_treat_pileup.bdg, a high-resolution 4-column bedgraph file of the basic signal (floating point). Contiguous loci with the same signal are grouped together. It holds the pileup for the treatment (IP) sample. | ||
+ | * NA_control_lambda.bdg, in the same format as the pileup file but derived from the control (INP). Lambda here is the expected signal at each locus under the local Poisson background model that MACS estimates from the control. | ||
+ | ('''NA''' is the default prefix, another can be chosen) | ||
+ | |||
+ | = Categorisation as peak = | ||
+ | |||
+ | It is not totally clear how MACS decides that a peak should be called. This is apparent when the signal appears to rise and fall, yet MACS does not call a peak. | ||
+ | |||
+ | == Signal script == | ||
+ | |||
+ | The source for this is from the MACS developer and is detailed [https://gist.github.com/taoliu/2469050 here]. | ||
+ | |||
+ | = Differential Peak-calling = | ||
+ | |||
+ | MACS2 added differential peak calling to its set of tools somewhat later. The idea is to take two conditions and compare their peaks. | ||
+ | |||
+ | The pipeline to follow is probably the one MACS2 recommends at [https://github.com/taoliu/MACS/wiki/Call-differential-binding-events Differential Binding Events] |
Latest revision as of 09:16, 4 June 2018
Introduction
This wiki page combines the two ChIP-Seq sequencing experiments carried out on yeast samples during June / July 2017.
Naming conventions
This convention was only settled on halfway through the project, so not all datasets follow it:
PROTEIN IMMUNOPRECIPITATED_CONDITION_MUTATIONS_REPLICATE NUMBER_SAMPLE TYPE
With
- PROTEIN IMMUNOPRECIPITATED: Uls1 or Top2 ChIP
- CONDITION: YPD (untreated) or ACF (acriflavine treated)
- MUTATIONS: WT (wild type) or ULS1 (uls1 deleted)
- REPLICATE NUMBER: R1 or R2
- SAMPLE TYPE: INP (input), IP (immunoprecipitation) or broadpeaks28 (peak files and logLR)
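As a rough sketch, a dataset name that follows the convention can be split into its five fields in plain shell, which also gives a quick way to spot the older names that do not conform. The helper name parse_dataset_name and the example name Top2_YPD_WT_R1_IP are illustrative assumptions, not names taken from the actual data.

```shell
# Hypothetical helper: split PROTEIN_CONDITION_MUTATIONS_REPLICATE_SAMPLETYPE
# into its five fields, or fail if the name does not have exactly five.
parse_dataset_name() {
    # Work in a subshell so the caller's IFS and positional
    # parameters are left untouched.
    (
        name=$1
        IFS=_
        set -f              # no filename globbing during the split
        set -- $name        # split the name on underscores
        if [ "$#" -ne 5 ]; then
            echo "not in convention: $name" >&2
            exit 1
        fi
        printf 'protein=%s condition=%s mutations=%s replicate=%s type=%s\n' \
            "$1" "$2" "$3" "$4" "$5"
    )
}

# Example name, made up for illustration.
parse_dataset_name Top2_YPD_WT_R1_IP
```

Names from the first experiment such as ULSIP_E1 make the function return a non-zero status, so it can be used as a filter in a loop over file names.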
Sample quality
In the previous analyses above, we removed duplicates, as the FastQC quality reports flagged duplication as an issue.
However, this removal had very little effect. In any case, MACS2 handles duplicates itself, so samples were only trimmed for adapters and quality.
Alignment to the S288C Reference
Bam quality, no filtering
First Experiment
Second Experiment
Alignment to the new W303 Reference
Bam quality, no filtering
First Experiment
Second Experiment
Bam quality, filtering applied
The bam files were filtered both for mapping quality and low integrity reads in the following manner:
samtools view -b -F 1820 -q 48 <INBAMFILE> -o <OUTBAMFILE> -@ <NUMTHREADS>
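To see which reads the -F mask excludes, the value can be broken down into the flag bits defined in the SAM specification. The following is a minimal sketch; the helper name decode_sam_flag is an assumption made for illustration.

```shell
# Hypothetical helper: list the SAM flag bits contained in a -F mask.
# Bit values and meanings are those of the SAM specification.
decode_sam_flag() {
    mask=$1
    for entry in \
        "1:read paired" \
        "2:read mapped in proper pair" \
        "4:read unmapped" \
        "8:mate unmapped" \
        "16:read reverse strand" \
        "32:mate reverse strand" \
        "64:first in pair" \
        "128:second in pair" \
        "256:not primary alignment" \
        "512:read fails platform/vendor quality checks" \
        "1024:read is PCR or optical duplicate" \
        "2048:supplementary alignment"
    do
        bit=${entry%%:*}     # number before the colon
        desc=${entry#*:}     # text after the colon
        # Print the bit only if it is set in the mask.
        if [ $(( mask & bit )) -ne 0 ]; then
            printf '%s\t%s\n' "$bit" "$desc"
        fi
    done
}

decode_sam_flag 1820
```

For 1820 this lists bits 4, 8, 16, 256, 512 and 1024. Bit 16 marks reads aligned to the reverse strand, so this mask drops those reads as well; a mask of 1804 contains the same bits without 16. Which of the two was intended is not stated here.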
First Experiment
Second Experiment
Bam quality, 2nd filtering applied
This time bam files were filtered to MAPQ 28:
samtools view -b -F 1820 -q 28 <INBAMFILE> -o <OUTBAMFILE> -@ <NUMTHREADS>
First Experiment
Second Experiment
The MACS/2 program
The main way of running this program is as follows:
macs2 callpeak -B -t $BAM1 -c $BAM2 -f BAMPE --outdir $IPOUTDIR -g 12.07e6 -q 0.01
- $BAM1 is the name of the immunoprecipitation alignment file, termed the treatment and therefore given with the -t option.
- $BAM2 is the name of the normal input alignment file, termed the control and therefore given with the -c option.
- BAMPE is just a setting informing the program that these are paired-end alignment files.
- the output directory (--outdir) is set to the sample's IP directory rather than its INP directory.
- the size of the genome is then given for the -g option.
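One way to sanity-check the value passed to -g is to sum the sequence lengths in the reference's FASTA index (the .fai file written by samtools faidx, whose second column is the sequence length). This is only a sketch: the helper name genome_size is hypothetical, the demo index below contains made-up lengths, and the total sequence length is an upper bound on the mappable genome size rather than the same thing.

```shell
# Hypothetical helper: total sequence length from a FASTA index (.fai).
genome_size() {
    # Column 2 of each .fai line is the length of that sequence in bases.
    awk -F '\t' '{ total += $2 } END { printf "%d\n", total }' "$1"
}

# Demo with a tiny made-up index (name, length, offset, linebases, linewidth).
demo_fai=$(mktemp)
printf 'chrA\t1000\t6\t60\t61\nchrB\t2500\t1030\t60\t61\n' > "$demo_fai"
genome_size "$demo_fai"
rm -f "$demo_fai"
```

Run against the real reference index, the result can be compared with the 12.07e6 used in the callpeak command.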
MACS automatically produces five files as output:
- NA_peaks.narrowPeak, a BED6+4 file; two of the four extra columns hold the -log10 p-value and the -log10 q-value (false discovery rate). It is the principal peak-calling file.
- NA_summits.bed, a summary of the previous file, recording the summit (position of maximum signal) of each called peak.
- NA_peaks.xls
- NA_treat_pileup.bdg, a high-resolution 4-column bedgraph file of the basic signal (floating point). Contiguous loci with the same signal are grouped together. It holds the pileup for the treatment (IP) sample.
- NA_control_lambda.bdg, in the same format as the pileup file but derived from the control (INP). Lambda here is the expected signal at each locus under the local Poisson background model that MACS estimates from the control.
(NA is the default prefix, another can be chosen)
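The narrowPeak columns can be used directly from the shell. The sketch below keeps peaks whose -log10 q-value (column 9) reaches a chosen threshold and reports the summit position, computed as the peak start (column 2) plus the summit offset (column 10). The helper name strong_peaks, the threshold and the demo records are assumptions for illustration, not results from these experiments.

```shell
# narrowPeak columns: 1 chrom, 2 start, 3 end, 4 name, 5 score, 6 strand,
# 7 signalValue, 8 -log10(p-value), 9 -log10(q-value), 10 summit offset.
# Hypothetical helper: strong_peaks FILE [MIN_NEG_LOG10_Q]
strong_peaks() {
    awk -F '\t' -v minq="${2:-2}" '
        BEGIN { OFS = "\t" }
        # Keep peaks at or above the threshold; print chrom, summit, name, q.
        $9 >= minq { print $1, $2 + $10, $4, $9 }
    ' "$1"
}

# Demo with two made-up peak records.
demo_np=$(mktemp)
printf 'chrI\t100\t400\tpeak_1\t250\t.\t8.5\t12.3\t10.1\t150\n' >  "$demo_np"
printf 'chrI\t900\t1100\tpeak_2\t40\t.\t2.4\t4.0\t3.2\t90\n'   >> "$demo_np"
strong_peaks "$demo_np" 5
rm -f "$demo_np"
```

A threshold of 2 corresponds to q = 0.01, the cutoff already applied by the callpeak command above, so only larger values subset the file further.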
Categorisation as peak
It is not totally clear how MACS decides that a peak should be called. This is apparent when the signal appears to rise and fall, yet MACS does not call a peak.
Signal script
The source for this script is from the MACS developer and is detailed at https://gist.github.com/taoliu/2469050.
Differential Peak-calling
MACS2 added differential peak calling to its set of tools somewhat later. The idea is to take two conditions and compare their peaks.
The pipeline to follow is probably the one MACS2 recommends, Differential Binding Events (https://github.com/taoliu/MACS/wiki/Call-differential-binding-events).
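In that recommended pipeline the comparison itself is done by macs2 bdgdiff, which takes the treat_pileup and control_lambda bedgraph files of both conditions plus a sequencing depth for each. The sketch below only prints the command so it can be inspected before running. The helper name bdgdiff_cmd, the prefixes YPD and ACF (which assume callpeak was run with -n YPD and -n ACF) and the depth values are all placeholders; the real depths would have to be read from the callpeak output of each condition.

```shell
# Hypothetical helper: print (not run) a macs2 bdgdiff command.
# Usage: bdgdiff_cmd PREFIX1 DEPTH1 PREFIX2 DEPTH2
bdgdiff_cmd() {
    c1=$1 d1=$2 c2=$3 d2=$4
    # File names follow the PREFIX_treat_pileup.bdg and
    # PREFIX_control_lambda.bdg pattern written by callpeak -B.
    printf 'macs2 bdgdiff --t1 %s_treat_pileup.bdg --c1 %s_control_lambda.bdg --t2 %s_treat_pileup.bdg --c2 %s_control_lambda.bdg --d1 %s --d2 %s --o-prefix diff_%s_vs_%s\n' \
        "$c1" "$c1" "$c2" "$c2" "$d1" "$d2" "$c1" "$c2"
}

# Placeholder prefixes and depths, for illustration only.
bdgdiff_cmd YPD 1000000 ACF 1200000
```

Once the printed command looks right, it can be piped to sh to run it.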