CHAMPAGNE Workflow Overview¶

Workflow Diagram

CHAMPAGNE is a comprehensive ChIP-seq analysis pipeline that integrates multiple preprocessing, quality control, peak calling, annotation, and downstream analysis steps. Below is a detailed breakdown of each workflow stage.

Input Validation [INPUT_CHECK]¶

Validates the sample sheet and contrast sheet to ensure the samples and contrasts names are defined correctly.

Adapter Trimming [CUTADAPT]¶

Removes sequencing adapters and low-quality bases from raw reads.

Input Pooling [POOL_INPUTS]¶

Concatenates trimmed reads from multiple input files belonging to the same sample.

Genome Preparation [PREPARE_GENOME]¶

Prepares genome information for a custom genome when given a fasta & gtf file.

Generates reference genome files and indices
Creates chromosome sizes, effective genome size, blacklist regions, and other reference data
Outputs annotation files for downstream use

Blacklist Filtering [FILTER_BLACKLIST]¶

Removes reads mapping to known problematic genomic regions (blacklisted regions).

Genome Alignment [ALIGN_GENOME]¶

Aligns reads to reference genome using BWA
Filters alignments by quality score
Sorts BAM files for downstream processing
Generates flagstat metrics at alignment and filtering stages

Deduplication [DEDUPLICATE]¶

Removes PCR duplicates using MACS2 filterdup (single-end) or Picard MarkDuplicates (paired-end)
Converts BAM files to tag-align format for MACS2/SICER peak calling
Maintains BAM/BAMPE format for GEM peak calling
Generates flagstat metrics for quality assessment

Quality Metrics [PHANTOM_PEAKS]¶

Calculates fragment length estimates and quality metrics using PhantomPeaks
Assesses library complexity and cross-correlation

Quality Control [QC]¶

Runs FastQC on raw and trimmed reads for sequence quality assessment
Runs FASTQ_SCREEN for contamination detection
Estimates library complexity using PRESEQ
Compiles alignment and deduplication statistics
Generates comprehensive QC report for MultiQC

Spike-in Normalization [ALIGN_SPIKEIN] (Optional)¶

Aligns reads to spike-in genome (e.g. Drosophila) for exogenous normalization and calculates normalization scaling factors using either the Guenther or deLorenzi method. See the spike-in normalization page for more details.

Genome Coverage Visualization [DEEPTOOLS]¶

Generates genome-wide coverage bigWig files
Creates input-normalized bigWig files (background subtraction)
Produces correlation matrices and heatmaps
Calculates fingerprint plots for quality assessment

Peak Calling [CALL_PEAKS]¶

Identifies enriched regions using multiple peak-calling algorithms:
- MACS2 (broad and narrow)
- GEM
- SICER
Calculates fraction of reads in peaks (FRiP) for quality assessment
Computes Jaccard index for peak overlap between samples
Analyzes peak width distributions
Outputs peak files in BED format

Consensus Peak Calling (Optional)¶

Union Method [CONSENSUS_UNION]¶

Merges peaks across replicates using union approach
Retains all peaks detected in any replicate
Annotates and performs motif analysis on consensus peaks

Corces Method [CONSENSUS_CORCES]¶

Merges peaks using the Corces algorithm (based on peak summits)
More stringent consensus approach
Annotates and performs motif analysis on consensus peaks

Peak Annotation [ANNOTATE]¶

Uses ChIPseeker to assign genomic features to peaks and identify nearest genes

Motif Discovery [MOTIFS]¶

Generates motif predictions and enrichment statistics and identifies transcription factors likely binding at detected peaks.

HOMER performs reference and de novo motif discovery in peak sequences
MEME-AME performs known motif enrichment analysis against genomic motif databases

Differential Analysis [DIFF] (Optional)¶

Performs differential binding analysis when contrasts are provided using one of two methods:

DiffBind, used when each sample has at least 2 replicates
MAnorm, used when any sample has only 1 replicate

Quality Report Aggregation [MULTIQC]¶

Compiles all quality metrics and statistics into an interactive HTML report summarizing all pipeline results