Output Files
Phase 1 of the ChIP-seq pipeline¶
Successfully completing Phase 1 of the pipeline on the ChIP-seq tutorial demo data creates the following file/folder structure:
Rawdata files¶
Symlinks to the raw fastq files:
<working_dir>
│
│
├── CTCF_ChIP_macrophage_p20_1.R1.fastq.gz -> /data/CCBR_Pipeliner/testdata/chipseq//SRR3081748_1.fastq.gz
├── CTCF_ChIP_macrophage_p20_2.R1.fastq.gz -> /data/CCBR_Pipeliner/testdata/chipseq//SRR3081749_1.fastq.gz
├── ...
├── ...
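To confirm that each symlink points at an existing raw file, a check along these lines can help. It is only a sketch: the function names are made up for illustration, and it assumes the links sit directly in the working directory and end in .fastq.gz, as in the listing above.

```python
import os


def resolve_fastq_links(working_dir):
    """Map each *.fastq.gz symlink in working_dir to the target it stores."""
    links = {}
    for name in sorted(os.listdir(working_dir)):
        path = os.path.join(working_dir, name)
        if name.endswith(".fastq.gz") and os.path.islink(path):
            # os.readlink returns the stored target without following it,
            # so dangling links are listed as well.
            links[name] = os.readlink(path)
    return links


def broken_fastq_links(working_dir):
    """Return the names of *.fastq.gz symlinks whose target is missing."""
    # os.path.exists follows the link and is False for a dangling one.
    return [
        name
        for name in resolve_fastq_links(working_dir)
        if not os.path.exists(os.path.join(working_dir, name))
    ]
```

Calling broken_fastq_links on your own working directory should return an empty list when every raw file is reachable.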
Sequencing quality and contamination assessments:¶
FastQC¶
The rawfastQC and fastQC folders contain sequencing quality assessment results from FastQC for the raw and preprocessed fastq files, respectively. Each fastq file has a .zip archive and a .html report. These results are also summarized across samples in the multiqc_report.html file.
<working_dir>
│
│
├── rawfastQC
│ ├── CTCF_ChIP_macrophage_p20_1.R1_fastqc.html
│ ├── CTCF_ChIP_macrophage_p20_1.R1_fastqc.zip
│ ├── ...
│ ├── ...
├── fastQC
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_fastqc.html
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_fastqc.zip
│ ├── ...
│ ├── ...
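Since every fastq file should come with both a .html report and a .zip archive, a quick completeness check is easy to script. The sketch below assumes the _fastqc.html / _fastqc.zip naming shown in the listing; the function name and the shortened example list are illustrative only.

```python
from collections import defaultdict


def incomplete_fastqc_reports(filenames):
    """Return prefixes that lack either the .html report or the .zip archive."""
    seen = defaultdict(set)
    for name in filenames:
        for ext in (".html", ".zip"):
            suffix = "_fastqc" + ext
            if name.endswith(suffix):
                seen[name[: -len(suffix)]].add(ext)
    return sorted(p for p, exts in seen.items() if exts != {".html", ".zip"})


# Illustrative listing; in practice pass os.listdir("rawfastQC") or similar.
files = [
    "CTCF_ChIP_macrophage_p20_1.R1_fastqc.html",
    "CTCF_ChIP_macrophage_p20_1.R1_fastqc.zip",
    "CTCF_ChIP_macrophage_p20_2.R1_fastqc.html",  # .zip deliberately left out
]
print(incomplete_fastqc_reports(files))
```

With the example list, only the second sample is reported because its .zip archive is absent.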
FastQScreen¶
1 million reads from each sample are screened against the following databases using FastQ Screen:
- Human
- Mouse
- Bacteria
- Fungi
- Viruses
- UniVec
- Ribosomal sequences
The results are saved in the FQscreen and FQscreen2 folders. You can expect 6 files per sample (3 in each folder), as shown in the example below:
<working_dir>
│
│
├── FQscreen
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_screen.html
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_screen.png
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_screen.txt
│ ├── ...
│ ├── ...
├── FQscreen2
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_screen.html
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_screen.png
│ ├── CTCF_ChIP_macrophage_p20_1.R1.trim_screen.txt
│ ├── ...
│ ├── ...
Kraken/Krona¶
Kraken uses a k-mer based approach to assign each read to a reference in a database of all known bacterial species. The assignments are reported per sample in the *.taxa.txt files and can be explored interactively in the HTML file generated by Krona.
<working_dir>
│
│
├── kraken
│ ├── CTCF_ChIP_macrophage_p20_1.trim.fastq.kraken_bacteria.krona.html
│ ├── CTCF_ChIP_macrophage_p20_1.trim.fastq.kraken_bacteria.taxa.txt
│ ├── ...
│ ├── ...
ChIP quality and replicate concordance assessments¶
Deeptools outputs¶
The deeptools folder has the following set of files for each group (a group includes all of its replicates, including input samples):
- fingerprint plot related files:
  - fingerprint.metrics.Q5DD.tsv: fingerprint plot stats
  - fingerprint.Q5DD.pdf: fingerprint plot with Q5DD data
  - fingerprint.sorted.pdf: fingerprint plot with unfiltered data
- heatmap and profile plots focused on various genomic loci:
  - metagene: all genes in the genome normalized by gene length
  - prot.metagene: same as above but focused on protein-coding genes only
  - TSS: around transcription start sites of all genes
  - prot.TSS: same as above but focused on protein-coding genes only
<working_dir>
│
│
├── deeptools
│ ├── Macrophage_p20.fingerprint.metrics.Q5DD.tsv
│ ├── Macrophage_p20.fingerprint.Q5DD.pdf
│ ├── Macrophage_p20.fingerprint.sorted.pdf
│ ├── Macrophage_p20.metagene_heatmap.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.metagene_heatmap.sorted.RPGC.pdf
│ ├── Macrophage_p20.metagene_profile.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.metagene_profile.sorted.RPGC.pdf
│ ├── Macrophage_p20.prot.metagene_heatmap.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.prot.metagene_heatmap.sorted.RPGC.pdf
│ ├── Macrophage_p20.prot.metagene_profile.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.prot.metagene_profile.sorted.RPGC.pdf
│ ├── Macrophage_p20.prot.TSS_heatmap.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.prot.TSS_heatmap.sorted.RPGC.pdf
│ ├── Macrophage_p20.prot.TSS_profile.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.prot.TSS_profile.sorted.RPGC.pdf
│ ├── Macrophage_p20.TSS_heatmap.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.TSS_heatmap.sorted.RPGC.pdf
│ ├── Macrophage_p20.TSS_profile.Q5DD.RPGC.pdf
│ ├── Macrophage_p20.TSS_profile.sorted.RPGC.pdf
│ ├── ...
│ ├── ...
The deeptools folder contains a sorted_fingerprint subfolder with fingerprint stats for the unfiltered bam files.
<working_dir>
│
│
├── deeptools
│ ├── sorted_fingerprint
│ │ ├── Macrophage_p20.fingerprint.metrics.sorted.tsv
│ │ ├── Macrophage_p3.fingerprint.metrics.sorted.tsv
│ │ └── MEF_p20.fingerprint.metrics.sorted.tsv
The deeptools folder contains principal component analysis plots with all samples.
<working_dir>
│
│
├── deeptools
│ ├── pca.Q5DD.RPGC.pdf
│ ├── pca.sorted.RPGC.pdf
The deeptools folder also contains Spearman correlation plots (heatmaps and scatterplots) with all samples.
<working_dir>
│
│
├── deeptools
│ ├── spearman_heatmap.Q5DD.RPGC_mqc.png
│ ├── spearman_heatmap.Q5DD.RPGC.pdf
│ ├── spearman_heatmap.sorted.RPGC.pdf
│ ├── spearman_scatterplot.Q5DD.RPGC.pdf
│ └── spearman_scatterplot.sorted.RPGC.pdf
ChIP-seq relevant QC¶
The QC folder contains additional ChIP-seq relevant QC metrics. For example:
- Preseq is used to assess library complexity.
- NGSQC is used to compare ChIP quality between replicates/samples. These results are also summarized per group.
QCTable.txt aggregates all the QC metrics into a single tab-delimited file, which can be easily read into Microsoft Excel.
<working_dir>
│
│
├── QC
│ ├── CTCF_ChIP_macrophage_p20_1.ccurve
│ ├── CTCF_ChIP_macrophage_p20_1.preseq.dat
│ ├── CTCF_ChIP_macrophage_p20_1.preseq.log
│ ├── ...
│ ├── ...
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.NGSQC_report.txt
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.NGSQC.txt
│ ├── ...
│ ├── ...
│ ├── Macrophage_p20.NGSQC.Q5DD.png
│ ├── ...
│ ├── ...
│ └── QCTable.txt
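Because QCTable.txt is tab-delimited, it can also be loaded outside Excel. The following is a minimal sketch that assumes the first row holds the column names (the source does not list them, so none are hard-coded); the function name is hypothetical.

```python
import csv


def parse_qc_table(lines):
    """Parse tab-delimited lines with a header row into a list of dicts.

    `lines` can be an open file handle or any iterable of text lines.
    """
    return list(csv.DictReader(lines, delimiter="\t"))


# Usage on a real run (the path is a placeholder for your own <working_dir>):
# with open("QC/QCTable.txt", newline="") as handle:
#     rows = parse_qc_table(handle)
# print(list(rows[0]))  # column names are whatever the file provides
```

Each row comes back as a dictionary keyed by the header, with all values kept as strings.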
Multiqc report¶
Along with some other housekeeping files, the Reports folder contains multiqc_report.html, which graphically aggregates all QC assessments into one report.
<working_dir>
│
│
├── Reports
│ ├── ...
│ ├── ...
│ ├── multiqc_data
│ │ ├── multiqc_bcbio_metrics.txt
│ │ ├── multiqc_chip-specific_qc_metrics.txt
│ │ ├── multiqc_data.json
│ │ ├── multiqc_fastqc.txt
│ │ ├── multiqc_fastq_screen.txt
│ │ ├── multiqc_general_stats.txt
│ │ ├── multiqc.log
│ │ ├── multiqc_samtools_flagstat.txt
│ │ ├── multiqc_samtools_idxstats.txt
│ │ ├── multiqc_sources.txt
│ │ ├── seqbuster_isomirs.txt
│ │ └── seqbuster_mirs.txt
│ ├── multiqc_report.html
│ ├── ...
│ ├── ...
Alignment and Visualization files:¶
Bams¶
The bam folder has the following files for each sample:
- .bam: binary version of the alignment file. There are 3 versions of alignment files:
  - sorted.bam: all alignments sorted by coordinates
  - Q5.bam: sorted alignments filtered to exclude alignments with MAPQ<5
  - Q5DD.bam: deduplicated version of the Q5.bam file
- Each of the .bam files may also have the following secondary files:
  - .bai: index for the .bam file
  - .flagstat: file containing the alignment statistics
  - .idxstat: file containing the number of reads aligning per chromosome
  - .ppqt: cross-correlation statistics
  - .pdf: cross-correlation plots
  - .tagAlign.gz: alignments in bed format for downstream peak calling by MACS2
<working_dir>
│
│
├── bam
│ ├── CTCF_ChIP_macrophage_p20_1.Q5.bam.bai
│ ├── CTCF_ChIP_macrophage_p20_1.Q5.bam.flagstat
│ ├── CTCF_ChIP_macrophage_p20_1.Q5.bam.idxstat
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.bam
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.bam.bai
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.bam.flagstat
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.bam.idxstat
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.pdf
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.ppqt
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.tagAlign.gz
│ ├── CTCF_ChIP_macrophage_p20_1.sorted.bam
│ ├── CTCF_ChIP_macrophage_p20_1.sorted.bam.bai
│ ├── CTCF_ChIP_macrophage_p20_1.sorted.bam.flagstat
│ ├── CTCF_ChIP_macrophage_p20_1.sorted.bam.idxstat
│ ├── CTCF_ChIP_macrophage_p20_1.sorted.pdf
│ ├── CTCF_ChIP_macrophage_p20_1.sorted.ppqt
│ ├── ...
│ ├── ...
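To verify that a sample has the full set of alignment outputs, the suffixes from the listing above can be turned into a checklist. This is a sketch under the assumption that every sample produces exactly the files shown for CTCF_ChIP_macrophage_p20_1; the names BAM_SUFFIXES and missing_bam_outputs are illustrative.

```python
# Suffixes copied from the example listing for a single sample.
BAM_SUFFIXES = [
    "sorted.bam", "sorted.bam.bai", "sorted.bam.flagstat", "sorted.bam.idxstat",
    "sorted.pdf", "sorted.ppqt",
    "Q5.bam.bai", "Q5.bam.flagstat", "Q5.bam.idxstat",
    "Q5DD.bam", "Q5DD.bam.bai", "Q5DD.bam.flagstat", "Q5DD.bam.idxstat",
    "Q5DD.pdf", "Q5DD.ppqt", "Q5DD.tagAlign.gz",
]


def missing_bam_outputs(sample, present):
    """Return the expected file names for `sample` that are not in `present`."""
    present = set(present)
    expected = [f"{sample}.{suffix}" for suffix in BAM_SUFFIXES]
    return [name for name in expected if name not in present]


# Usage on a real run (placeholder path):
# import os
# print(missing_bam_outputs("CTCF_ChIP_macrophage_p20_1", os.listdir("bam")))
```

An empty result means every file in the checklist was found for that sample.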
Bigwigs¶
The bigwig folder contains alignments in bigwig format, which is smaller than the bam format and convenient for quick visualization in genome browsers such as IGV. Each sample has 3 bigwig files:
- .sorted.RPGC.bw: created by normalizing the corresponding .sorted.bam file to 1X genome coverage
- .Q5DD.RPGC.bw: created by normalizing the corresponding .Q5DD.bam file to 1X genome coverage
- .Q5DD.RPGC.inputnorm.bw: created by subtracting the normalized input bigwig file from the normalized ChIP bigwig
<working_dir>
│
│
├── bigwig
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.RPGC.bw
│ ├── CTCF_ChIP_macrophage_p20_1.Q5DD.RPGC.inputnorm.bw
│ ├── CTCF_ChIP_macrophage_p20_1.sorted.RPGC.bw
│ ├── ...
│ ├── ...
Other important files¶
Some of the other important files in the working_dir are:
- cluster.json: outlines the resources requested for each Snakemake rule
- Snakefile: contains all the Snakemake rules
- HPC_usage_table.txt: tabular report detailing how each Snakemake rule utilized the HPC cluster
- run.json: JSON file saving all GUI selections
- peakcall.tab: tab-delimited file defining which samples are ChIP samples and which input sample corresponds to each
<working_dir>
│
│
├── cluster.json
├── Snakefile
├── HPC_usage_table.txt
├── run.json
├── peakcall.tab
├── ...
├── ...
All other folders and files in the working_dir are for housekeeping and are required for successful execution of the CCBR_Pipeliner.
Phase 2 of the ChIP-seq pipeline¶
Successfully completing Phase 2 of the pipeline on the ChIP-seq tutorial demo data creates the following file/folder structure in addition to the files created in Phase 1:
Peak calls:¶
GEM¶
The gem folder has a subfolder for each sample. Each subfolder should contain a file ending with GEM_events.narrowPeak, which holds the peak calls from GEM in narrowPeak format.
<working_dir>
│
│
├── gem
│ ├── CTCF_ChIP_macrophage_p20_1
│ │ ├── CTCF_ChIP_macrophage_p20_1.GEM_events.narrowPeak
│ │ ├── CTCF_ChIP_macrophage_p20_1.GEM_events.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1.GPS_events.narrowPeak
│ │ ├── CTCF_ChIP_macrophage_p20_1.GPS_events.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_outputs
│ │ │ ├── ...
│ │ │ ├── ...
│ ├── CTCF_ChIP_macrophage_p20_2
│ ├── ...
│ ├── ...
MACS2 for broad peaks¶
The macsBroad folder has a subfolder for each sample. Each subfolder should contain a file ending with .broadPeak, which holds the peak calls from MACS2 in broadPeak format. The peaks are also available in Excel format.
<working_dir>
│
│
├── macsBroad
│ ├── CTCF_ChIP_macrophage_p20_1
│ │ ├── CTCF_ChIP_macrophage_p20_1_peaks.broadPeak
│ │ ├── CTCF_ChIP_macrophage_p20_1_peaks.gappedPeak
│ │ └── CTCF_ChIP_macrophage_p20_1_peaks.xls
│ ├── CTCF_ChIP_macrophage_p20_2
│ │ ├── CTCF_ChIP_macrophage_p20_2_peaks.broadPeak
│ │ ├── CTCF_ChIP_macrophage_p20_2_peaks.gappedPeak
│ │ └── CTCF_ChIP_macrophage_p20_2_peaks.xls
│ ├── CTCF_ChIP_macrophage_p3_1
│ │ ├── CTCF_ChIP_macrophage_p3_1_peaks.broadPeak
│ │ ├── CTCF_ChIP_macrophage_p3_1_peaks.gappedPeak
│ │ └── CTCF_ChIP_macrophage_p3_1_peaks.xls
│ ├── CTCF_ChIP_macrophage_p3_2
│ │ ├── CTCF_ChIP_macrophage_p3_2_peaks.broadPeak
│ │ ├── CTCF_ChIP_macrophage_p3_2_peaks.gappedPeak
│ │ └── CTCF_ChIP_macrophage_p3_2_peaks.xls
│ ├── CTCF_ChIP_MEF_p20_1
│ │ ├── CTCF_ChIP_MEF_p20_1_peaks.broadPeak
│ │ ├── CTCF_ChIP_MEF_p20_1_peaks.gappedPeak
│ │ └── CTCF_ChIP_MEF_p20_1_peaks.xls
│ └── CTCF_ChIP_MEF_p20_2
│ ├── CTCF_ChIP_MEF_p20_2_peaks.broadPeak
│ ├── CTCF_ChIP_MEF_p20_2_peaks.gappedPeak
│ └── CTCF_ChIP_MEF_p20_2_peaks.xls
MACS2 for narrow peaks¶
The macsNarrow folder has a subfolder for each sample. Each subfolder should contain a file ending with .narrowPeak, which holds the peak calls from MACS2 in narrowPeak format. The peaks are also available in Excel format, along with peak summits in bed format.
<working_dir>
│
│
├── macsNarrow
│ ├── CTCF_ChIP_macrophage_p20_1
│ │ ├── CTCF_ChIP_macrophage_p20_1_peaks.narrowPeak
│ │ ├── CTCF_ChIP_macrophage_p20_1_peaks.xls
│ │ └── CTCF_ChIP_macrophage_p20_1_summits.bed
│ ├── CTCF_ChIP_macrophage_p20_2
│ │ ├── CTCF_ChIP_macrophage_p20_2_peaks.narrowPeak
│ │ ├── CTCF_ChIP_macrophage_p20_2_peaks.xls
│ │ └── CTCF_ChIP_macrophage_p20_2_summits.bed
│ ├── CTCF_ChIP_macrophage_p3_1
│ │ ├── CTCF_ChIP_macrophage_p3_1_peaks.narrowPeak
│ │ ├── CTCF_ChIP_macrophage_p3_1_peaks.xls
│ │ └── CTCF_ChIP_macrophage_p3_1_summits.bed
│ ├── CTCF_ChIP_macrophage_p3_2
│ │ ├── CTCF_ChIP_macrophage_p3_2_peaks.narrowPeak
│ │ ├── CTCF_ChIP_macrophage_p3_2_peaks.xls
│ │ └── CTCF_ChIP_macrophage_p3_2_summits.bed
│ ├── CTCF_ChIP_MEF_p20_1
│ │ ├── CTCF_ChIP_MEF_p20_1_peaks.narrowPeak
│ │ ├── CTCF_ChIP_MEF_p20_1_peaks.xls
│ │ └── CTCF_ChIP_MEF_p20_1_summits.bed
│ └── CTCF_ChIP_MEF_p20_2
│ ├── CTCF_ChIP_MEF_p20_2_peaks.narrowPeak
│ ├── CTCF_ChIP_MEF_p20_2_peaks.xls
│ └── CTCF_ChIP_MEF_p20_2_summits.bed
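For downstream scripting, a narrowPeak file can be read line by line. The sketch below follows the standard ENCODE narrowPeak layout (10 tab-separated columns); the class and function names are illustrative, and the example line uses made-up values rather than real output from the demo data.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based start
    end: int             # end, not included in the peak
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float       # -log10 scale, -1 if not assigned
    q_value: float       # -log10 scale, -1 if not assigned
    summit_offset: int   # summit position relative to start, -1 if not called


def parse_narrowpeak_line(line):
    """Convert one tab-separated narrowPeak line into a NarrowPeak record."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 columns, found {len(f)}")
    return NarrowPeak(
        f[0], int(f[1]), int(f[2]), f[3], int(float(f[4])), f[5],
        float(f[6]), float(f[7]), float(f[8]), int(f[9]),
    )


# Made-up example line, for illustration only.
peak = parse_narrowpeak_line("chr1\t100\t300\tpeak_1\t250\t.\t12.5\t30.1\t27.4\t80\n")
print(peak.chrom, peak.end - peak.start, peak.start + peak.summit_offset)
```

The last expression gives the absolute summit coordinate, which should correspond to the position reported in the matching _summits.bed file.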
Sicer¶
The sicer folder has a subfolder for each sample, which contains the peak calls from SICER in bed, broadPeak, and tabular formats.
<working_dir>
│
│
├── sicer
│ ├── CTCF_ChIP_macrophage_p20_1
│ │ ├── CTCF_ChIP_macrophage_p20_1_broadpeaks.bed
│ │ ├── CTCF_ChIP_macrophage_p20_1_broadpeaks.txt
│ │ └── CTCF_ChIP_macrophage_p20_1_sicer.broadPeak
│ ├── CTCF_ChIP_macrophage_p20_2
│ │ ├── CTCF_ChIP_macrophage_p20_2_broadpeaks.bed
│ │ ├── CTCF_ChIP_macrophage_p20_2_broadpeaks.txt
│ │ └── CTCF_ChIP_macrophage_p20_2_sicer.broadPeak
│ ├── CTCF_ChIP_macrophage_p3_1
│ │ ├── CTCF_ChIP_macrophage_p3_1_broadpeaks.bed
│ │ ├── CTCF_ChIP_macrophage_p3_1_broadpeaks.txt
│ │ └── CTCF_ChIP_macrophage_p3_1_sicer.broadPeak
│ ├── CTCF_ChIP_macrophage_p3_2
│ │ ├── CTCF_ChIP_macrophage_p3_2_broadpeaks.bed
│ │ ├── CTCF_ChIP_macrophage_p3_2_broadpeaks.txt
│ │ └── CTCF_ChIP_macrophage_p3_2_sicer.broadPeak
│ ├── CTCF_ChIP_MEF_p20_1
│ │ ├── CTCF_ChIP_MEF_p20_1_broadpeaks.bed
│ │ ├── CTCF_ChIP_MEF_p20_1_broadpeaks.txt
│ │ └── CTCF_ChIP_MEF_p20_1_sicer.broadPeak
│ └── CTCF_ChIP_MEF_p20_2
│ ├── CTCF_ChIP_MEF_p20_2_broadpeaks.bed
│ ├── CTCF_ChIP_MEF_p20_2_broadpeaks.txt
│ └── CTCF_ChIP_MEF_p20_2_sicer.broadPeak
Peaks-based QC¶
FRiP/Jaccard¶
FRiP and Jaccard barplots, scatterplots, and tables are saved in the PeakQC folder for each of the peak callers:
- gem
- macs
  - broad
  - narrow
- sicer
<working_dir>
│
│
├── PeakQC
│ ├── gem.FRiP_barplot.png
│ ├── gem.FRiP_scatterplot.png
│ ├── gem.FRiP_table.txt
│ ├── gem.jaccard_heatmap.pdf
│ ├── gem.jaccard_PCA.pdf
│ ├── gem_jaccard.txt
│ ├── macsBroad.FRiP_barplot.png
│ ├── macsBroad.FRiP_scatterplot.png
│ ├── macsBroad.FRiP_table.txt
│ ├── macsBroad.jaccard_heatmap.pdf
│ ├── macsBroad.jaccard_PCA.pdf
│ ├── macsBroad_jaccard.txt
│ ├── macsNarrow.FRiP_barplot.png
│ ├── macsNarrow.FRiP_scatterplot.png
│ ├── macsNarrow.FRiP_table.txt
│ ├── macsNarrow.jaccard_heatmap.pdf
│ ├── macsNarrow.jaccard_PCA.pdf
│ ├── macsNarrow_jaccard.txt
│ ├── sicer.FRiP_barplot.png
│ ├── sicer.FRiP_scatterplot.png
│ ├── sicer.FRiP_table.txt
│ ├── sicer.jaccard_heatmap.pdf
│ ├── sicer.jaccard_PCA.pdf
│ └── sicer_jaccard.txt
The above files allow inter-sample comparisons for one peak caller at a time. To compare across peak callers, use the following files in the PeakQC folder:
<working_dir>
│
│
├── PeakQC
│ ├── jaccard_heatmap.pdf
│ ├── jaccard_PCA.pdf
│ └── jaccard.txt
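For orientation, FRiP is the fraction of reads that fall inside called peaks, and the Jaccard index of two peak sets is the length of their intersection divided by the length of their union. The sketch below computes both on toy data. It is not the pipeline's implementation (the source does not say which tool produces these values); intervals are half-open (start, end) pairs on a single chromosome, and all names are illustrative.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals):
    """Total number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def jaccard(peaks_a, peaks_b):
    """Intersection length over union length (0.0 if both sets are empty)."""
    union = covered_length(list(peaks_a) + list(peaks_b))
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    intersection = covered_length(peaks_a) + covered_length(peaks_b) - union
    return intersection / union if union else 0.0


def frip(reads_in_peaks, total_reads):
    """Fraction of reads in peaks."""
    return reads_in_peaks / total_reads


# Toy example: 5 shared bases out of 15 covered bases in total.
print(jaccard([(0, 10)], [(5, 15)]))
print(frip(25, 100))
```

A Jaccard value near 1 indicates that two peak sets cover almost the same bases, while a value near 0 indicates little overlap.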
Replicate concordance with IDR¶
The IDR folder contains the outputs generated by comparing replicate peaks using idr for the following peak callers:
- sicer
- macs2
  - broad
  - narrow
Please note that IDR is not run for GEM.
<working_dir>
│
│
├── IDR
│ ├── macsBroad
│ │ ├── Macrophage_p20
│ │ │ ├── CTCF_ChIP_macrophage_p20_1_vs_CTCF_ChIP_macrophage_p20_2.idrValue.txt
│ │ │ └── CTCF_ChIP_macrophage_p20_1_vs_CTCF_ChIP_macrophage_p20_2.idrValue.txt.png
│ │ ├── Macrophage_p3
│ │ │ ├── CTCF_ChIP_macrophage_p3_1_vs_CTCF_ChIP_macrophage_p3_2.idrValue.txt
│ │ │ └── CTCF_ChIP_macrophage_p3_1_vs_CTCF_ChIP_macrophage_p3_2.idrValue.txt.png
│ │ └── MEF_p20
│ │ ├── CTCF_ChIP_MEF_p20_1_vs_CTCF_ChIP_MEF_p20_2.idrValue.txt
│ │ └── CTCF_ChIP_MEF_p20_1_vs_CTCF_ChIP_MEF_p20_2.idrValue.txt.png
│ ├── macsNarrow
│ │ ├── Macrophage_p20
│ │ │ ├── CTCF_ChIP_macrophage_p20_1_vs_CTCF_ChIP_macrophage_p20_2.idrValue.txt
│ │ │ └── CTCF_ChIP_macrophage_p20_1_vs_CTCF_ChIP_macrophage_p20_2.idrValue.txt.png
│ │ ├── Macrophage_p3
│ │ │ ├── CTCF_ChIP_macrophage_p3_1_vs_CTCF_ChIP_macrophage_p3_2.idrValue.txt
│ │ │ └── CTCF_ChIP_macrophage_p3_1_vs_CTCF_ChIP_macrophage_p3_2.idrValue.txt.png
│ │ └── MEF_p20
│ │ ├── CTCF_ChIP_MEF_p20_1_vs_CTCF_ChIP_MEF_p20_2.idrValue.txt
│ │ └── CTCF_ChIP_MEF_p20_1_vs_CTCF_ChIP_MEF_p20_2.idrValue.txt.png
│ └── sicer
│ ├── Macrophage_p20
│ │ ├── CTCF_ChIP_macrophage_p20_1_vs_CTCF_ChIP_macrophage_p20_2.idrValue.txt
│ │ └── CTCF_ChIP_macrophage_p20_1_vs_CTCF_ChIP_macrophage_p20_2.idrValue.txt.png
│ ├── Macrophage_p3
│ │ ├── CTCF_ChIP_macrophage_p3_1_vs_CTCF_ChIP_macrophage_p3_2.idrValue.txt
│ │ └── CTCF_ChIP_macrophage_p3_1_vs_CTCF_ChIP_macrophage_p3_2.idrValue.txt.png
│ └── MEF_p20
│ ├── CTCF_ChIP_MEF_p20_1_vs_CTCF_ChIP_MEF_p20_2.idrValue.txt
│ └── CTCF_ChIP_MEF_p20_1_vs_CTCF_ChIP_MEF_p20_2.idrValue.txt.png
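The folder layout above encodes the peak caller, the group, and the replicate pair in each path, so these can be recovered when collecting results. This is a sketch that assumes the IDR/<caller>/<group>/<rep1>_vs_<rep2>.idrValue.txt pattern seen in the listing; the function name is hypothetical.

```python
from pathlib import PurePosixPath

IDR_SUFFIX = ".idrValue.txt"


def describe_idr_output(relative_path):
    """Split 'IDR/<caller>/<group>/<rep1>_vs_<rep2>.idrValue.txt' into parts."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4 or parts[0] != "IDR":
        raise ValueError(f"unexpected IDR path: {relative_path}")
    _, caller, group, filename = parts
    # The .png plots end in .idrValue.txt.png and are rejected here.
    if not filename.endswith(IDR_SUFFIX) or "_vs_" not in filename:
        raise ValueError(f"not an idrValue table: {filename}")
    rep1, rep2 = filename[: -len(IDR_SUFFIX)].split("_vs_", 1)
    return {"caller": caller, "group": group, "replicates": (rep1, rep2)}


info = describe_idr_output(
    "IDR/macsNarrow/MEF_p20/CTCF_ChIP_MEF_p20_1_vs_CTCF_ChIP_MEF_p20_2.idrValue.txt"
)
print(info)
```

Splitting on the first "_vs_" works as long as sample names do not themselves contain that string.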
Peak annotations with UROPA¶
UROPA annotations are provided while prioritizing the following genomic features:
- all genes
- protein-coding genes
- transcription start sites of all genes
- transcription start sites of protein-coding genes
Files are saved for all four peak callers (gem/macsNarrow/macsBroad/sicer) in individual subfolders for all called peaks. Similarly, the DiffBind results are also annotated with UROPA in the DiffBind subfolder if any contrasts are provided at the time of running Phase 2 of the pipeline.
<working_dir>
│
│
├── UROPA_annotations
│ ├── gem
│ │ ├── CTCF_ChIP_macrophage_p20_1.gem.genes.json
│ │ ├── CTCF_ChIP_macrophage_p20_1.gem.prot.json
│ │ ├── CTCF_ChIP_macrophage_p20_1.gem.TSSgenes.json
│ │ ├── CTCF_ChIP_macrophage_p20_1.gem.TSSprot.json
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_genes_allhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_genes_finalhits.bed
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_genes_finalhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_genes_summary.pdf
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_prot_allhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_prot_finalhits.bed
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_prot_finalhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_prot_summary.pdf
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSgenes_allhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSgenes_finalhits.bed
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSgenes_finalhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSgenes_summary.pdf
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSprot_allhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSprot_finalhits.bed
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSprot_finalhits.txt
│ │ ├── CTCF_ChIP_macrophage_p20_1_gem_uropa_TSSprot_summary.pdf
│ │ ├── ...
│ │ ├── ...
│ ├── macsNarrow
│ ├── macsBroad
│ ├── sicer
│ ├── DiffBind
Motif analysis with HOMER¶
De novo and known motif enrichment analysis is performed using HOMER with the entire genome as background. The HOMER_motifs folder contains a subfolder for each of the four peak callers (gem/macsBroad/macsNarrow/sicer). Each of these subfolders in turn contains a subfolder per sample peak call, as shown below:
<working_dir>
│
│
├── HOMER_motifs
│ ├── macsBroad
│ │ ├── CTCF_ChIP_macrophage_p20_1_macsBroad_GW
│ │ │ ├── homerMotifs.all.motifs
│ │ │ ├── homerMotifs.motifs10
│ │ │ ├── homerMotifs.motifs12
│ │ │ ├── homerMotifs.motifs8
│ │ │ ├── homerResults
│ │ │ ├── ...
│ │ │ ├── ...
│ │ │ ├── homerResults.html
│ │ │ ├── knownResults
│ │ │ ├── knownResults.html
│ │ │ ├── knownResults.txt
│ │ │ │ ├── ...
│ │ │ │ ├── ...
│ │ ├── CTCF_ChIP_macrophage_p20_2_macsBroad_GW
│ │ ├── CTCF_ChIP_macrophage_p3_1_macsBroad_GW
│ │ ├── CTCF_ChIP_macrophage_p3_2_macsBroad_GW
│ │ ├── ...
│ │ ├── ...
│ ├── macsNarrow
│ │ ├── CTCF_ChIP_macrophage_p20_1_macsNarrow_GW
│ │ ├── CTCF_ChIP_macrophage_p20_2_macsNarrow_GW
│ │ ├── CTCF_ChIP_macrophage_p3_1_macsNarrow_GW
│ │ ├── CTCF_ChIP_macrophage_p3_2_macsNarrow_GW
│ │ ├── ...
│ │ ├── ...
│ ├── sicer
│ │ ├── CTCF_ChIP_macrophage_p20_1_sicer_GW
│ │ ├── CTCF_ChIP_macrophage_p20_2_sicer_GW
│ │ ├── CTCF_ChIP_macrophage_p3_1_sicer_GW
│ │ ├── CTCF_ChIP_macrophage_p3_2_sicer_GW
│ │ ├── ...
│ │ ├── ...
DiffBind results (Optional)¶
DiffBind is run for all contrasts provided in the contrast.tab file using:
- DESeq2
- EdgeR
Each contrast's results are saved in a separate subfolder along with a combined HTML report, as shown below:
<working_dir>
│
│
├── DiffBind
│ ├── Macrophage_p3_vs_Macrophage_p20-gem
│ │ ├── DiffBind_pipeliner.Rmd
│ │ ├── Macrophage_p3_vs_Macrophage_p20-gem_Diffbind_Deseq2.bed
│ │ ├── Macrophage_p3_vs_Macrophage_p20-gem_Diffbind_Deseq2.txt
│ │ ├── Macrophage_p3_vs_Macrophage_p20-gem_Diffbind_EdgeR.bed
│ │ ├── Macrophage_p3_vs_Macrophage_p20-gem_Diffbind_EdgeR.txt
│ │ ├── Macrophage_p3_vs_Macrophage_p20-gem_Diffbind.html
│ │ └── Macrophage_p3_vs_Macrophage_p20-gem_Diffbind_prep.csv
│ ├── Macrophage_p3_vs_Macrophage_p20-macsNarrow
│ │ ├── ...
│ │ ├── ...
│ ├── Macrophage_p3_vs_Macrophage_p20-macsBroad
│ │ ├── ...
│ │ ├── ...
│ ├── Macrophage_p3_vs_Macrophage_p20-sicer
│ │ ├── ...
│ │ ├── ...
│ ├── ...
│ ├── ...
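Each DiffBind subfolder name combines the two groups of a contrast with the peak caller, and the file names inside repeat that name. A minimal sketch of how this could be unpacked follows; it assumes the <groupA>_vs_<groupB>-<peakcaller> pattern from the listing, that the peak caller name contains no hyphen, and that other contrasts produce the same file set as the gem example. The function names are illustrative.

```python
def parse_contrast_folder(name):
    """Split '<groupA>_vs_<groupB>-<peakcaller>' into its three parts."""
    contrast, sep, caller = name.rpartition("-")
    if not sep or "_vs_" not in contrast:
        raise ValueError(f"unexpected DiffBind folder name: {name}")
    group_a, group_b = contrast.split("_vs_", 1)
    return group_a, group_b, caller


def expected_diffbind_files(name):
    """File names listed above for one contrast subfolder."""
    return [
        "DiffBind_pipeliner.Rmd",
        f"{name}_Diffbind_Deseq2.bed",
        f"{name}_Diffbind_Deseq2.txt",
        f"{name}_Diffbind_EdgeR.bed",
        f"{name}_Diffbind_EdgeR.txt",
        f"{name}_Diffbind.html",
        f"{name}_Diffbind_prep.csv",
    ]


folder = "Macrophage_p3_vs_Macrophage_p20-macsNarrow"
print(parse_contrast_folder(folder))
print(expected_diffbind_files(folder)[-2])
```

The expected file list can be compared against a directory listing to spot contrasts that did not finish.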
Other important files¶
Some of the other important files in the working_dir are:
- contrast.tab: if contrasts are supplied for running DiffBind, they are saved in this tab-delimited file
- HPC_usage_table.txt: tabular report detailing how each Snakemake rule utilized the HPC cluster. The older version of this file from the Phase 1 execution is also retained by renaming it.
<working_dir>
│
│
├── ...
├── ...
├── contrast.tab
├── HPC_usage_table.txt
├── HPC_usage_table.txt.2020_06_22_05_14_24
├── ...
├── ...
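The renamed Phase 1 usage table in the listing carries a suffix that looks like a timestamp. The sketch below reads it as year_month_day_hour_minute_second; that interpretation is an assumption based on the single example shown, not something the source states, and the function name is hypothetical.

```python
from datetime import datetime

PREFIX = "HPC_usage_table.txt."


def archived_usage_timestamp(filename):
    """Return the datetime encoded in an archived usage table name, else None."""
    if not filename.startswith(PREFIX):
        return None
    try:
        # Assumed layout: YYYY_MM_DD_HH_MM_SS
        return datetime.strptime(filename[len(PREFIX):], "%Y_%m_%d_%H_%M_%S")
    except ValueError:
        return None


print(archived_usage_timestamp("HPC_usage_table.txt.2020_06_22_05_14_24"))
```

Sorting archived tables by this value would order them by when each one was set aside, if the assumed layout holds.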
All other folders and files in the working_dir are for housekeeping and are required for successful execution of the CCBR_Pipeliner.