Expected Outputs¶

Upon successful completion, CARLISLE generates a comprehensive directory structure under WORKDIR/results. Each subdirectory contains outputs corresponding to specific stages of the workflow — from raw alignment statistics to annotated peak results and quality control summaries.

Directory Overview¶

alignment_stats/ – Contains detailed alignment reports for each sample, including mapping efficiency, read depth, and spike-in alignment metrics.
bam/ – Stores sorted and indexed BAM files for all samples. This directory also includes per-sample and spike-in alignment statistics useful for downstream normalization and QC.
bedgraph/ – Includes BEDGRAPH coverage tracks summarizing read density across the genome. These files serve as intermediates for visualization and peak-calling validation.
bigwig/ – Contains BigWig files generated via bamCoverage --scaleFactor, where the scale factor reflects the spike-in read count or library size depending on norm_method. These are the same normalization values applied to the bedgraph tracks, ensuring genome browser tracks and deepTools heatmaps/profiles are directly comparable. Suitable for visualization in UCSC Genome Browser or IGV.

ℹ️ Note: There are no separately RPGC-normalized bigwigs. All bigwigs — including those consumed by deepTools heatmap and profile rules — come from this single bamCoverage step.

fragments/ – Stores fragment length distributions and deduplicated fragment data (particularly important for CUT&RUN and CUT&Tag experiments). Useful for assessing fragment size enrichment and MNase digestion efficiency.
peaks/ – The core results directory containing called peaks, differential comparisons, and annotations.
all.peaks.with_control.txt / all.peaks.without_control.txt – Aggregate peak count tables written at the top of peaks/ (not inside a threshold subdirectory). Filename encodes the run_without_controls setting; older versions used all.peaks.txt.
Peak_counts.with_control.xlsx / Peak_counts.without_control.xlsx – Excel peak count summaries, also at the top of peaks/. Older versions used Peak counts.xlsx with a space — update any downstream scripts accordingly.
Subdirectories are organized by quality thresholds (e.g., 0.05, 0.01), representing the significance cutoffs applied during peak calling.
Each quality threshold directory includes:
- contrasts/ – Contains results of differential binding analyses defined in the contrast manifest, including:
- Differential enrichment results from DESeq2 (AUC-based and fragment-based)
- 3-column BED files for up-regulated peaks in each group (up_group1.bed, up_group2.bed)
- homer_deg/ – HOMER motif enrichment analysis for differentially enriched peaks (when run_motif_enrichment_deg_peaks: true)
- ame_deg/ – AME (Analysis of Motif Enrichment) results for differentially enriched peaks using HOCOMOCO v14 CORE motifs (when run_motif_enrichment_deg_peaks: true)
- <peak_caller>/ – Subdirectories for each peak caller (e.g., macs2, seacr, gopeaks). Each includes raw peak calls and annotated results.
- peak_output/ – Raw peak calls organized by control mode:
  - individual/ – Peaks called using individual replicate controls (present for all analyses)
  - pooled/ – Peaks called using merged high-depth controls (present when pool_controls: true)
- annotation/ – Contains enriched feature and pathway analyses, organized by control mode:
  - go_enrichment/ – Results from ChIP-Enrich gene set enrichment, generated when run_go_enrichment: true is enabled.
  - homer/ – Output from HOMER motif discovery and annotation.
  - rose/ – Output from ROSE super-enhancer analysis, generated when run_rose: true is specified.
deeptools/ – Genome-wide sample correlation and PCA outputs, generated from 10 kb bin read counts across all samples. Two sets of files are produced:
all.{dupstatus}.* – Includes all samples (treatments + IgG controls). Files:
- all.{dupstatus}.readCounts.npz – Compressed read count matrix (deepTools format)
- all.{dupstatus}.PearsonCorr.tab – Pairwise Pearson correlation matrix
- all.{dupstatus}.Pearson_heatmap.png – Hierarchically clustered Pearson correlation heatmap
- all.{dupstatus}.PCA.tab – PCA coordinates table
- all.{dupstatus}.PearsonPCA.png – Formatted PCA plot with sample labels and explained variance
treatments_only.{dupstatus}.* – Includes treatment samples only (IgG controls excluded). Generated only when control samples are present (run_without_controls: false). Same file set as above with treatments_only prefix. This plot reveals treatment-to-treatment differences that are often masked when IgG controls dominate a principal component.

ℹ️ Why two PCA plots? IgG controls have very low signal and distinctive read distributions. In the all-samples PCA, IgG samples often drive separation along PC2, making it hard to see differences between treatment conditions. The treatments-only PCA removes this effect and is typically more informative for assessing replicate consistency and condition separation.

ℹ️ {dupstatus} is either dedup (after linear deduplication) or no_dedup (without deduplication), matching the dupstatus config parameter.

qc/ – Centralized quality control directory containing:
fastqc_raw/ – Per-sample FastQC reports on raw reads
fqscreen_raw/ – FastQ Screen contamination reports
multiqc_report.html – Aggregated MultiQC report covering FastQC, alignment, and duplication statistics across all samples
spikein/ – Spike-in normalization QC plots and tables (present when norm_method: spikein), including per-sample spike-in alignment rates and scaling factors

Example Directory Layout¶

Below is an example of the CARLISLE output structure for a typical CUT&RUN experiment:

results/
├── alignment_stats/
├── bam/
│   └── pooled_controls/           # Merged control BAMs (when pool_controls: true)
├── bedgraph/
│   └── pooled_controls/           # Pooled control bedgraphs (when pool_controls: true)
├── bigwig/
├── deeptools/
│   ├── all.dedup.PearsonPCA.png           # PCA plot: all samples (dedup)
│   ├── all.dedup.Pearson_heatmap.png      # Correlation heatmap: all samples
│   ├── treatments_only.dedup.PearsonPCA.png    # PCA plot: treatments only (when controls present)
│   └── treatments_only.dedup.Pearson_heatmap.png
├── fragments/
│   └── pooled_controls/           # Pooled control fragments (when pool_controls: true)
├── peaks/
│   ├── all.peaks.with_control.txt              # aggregate peak counts (with controls)
│   ├── all.peaks.without_control.txt           # aggregate peak counts (control-free runs)
│   ├── Peak_counts.with_control.xlsx           # Excel peak counts (with controls)
│   ├── Peak_counts.without_control.xlsx        # Excel peak counts (control-free runs)
│   ├── 0.05/
│   │   ├── contrasts/
│   │   │   ├── homer_deg/                          # DEG motif enrichment (HOMER)
│   │   │   ├── ame_deg/                            # DEG motif enrichment (AME/HOCOMOCO)
│   │   │   ├── up_group1.bed                       # 3-column BED for group1 up-regulated peaks
│   │   │   └── up_group2.bed                       # 3-column BED for group2 up-regulated peaks
│   │   ├── gopeaks/
│   │   │   ├── peak_output/
│   │   │   │   ├── individual/    # Peaks with individual controls
│   │   │   │   └── pooled/        # Peaks with pooled controls
│   │   │   └── annotation/
│   │   │       ├── individual/
│   │   │       │   ├── go_enrichment/
│   │   │       │   ├── homer/
│   │   │       │   └── rose/
│   │   │       └── pooled/
│   │   │           ├── go_enrichment/
│   │   │           ├── homer/
│   │   │           └── rose/
│   │   ├── macs2/
│   │   │   ├── peak_output/
│   │   │   │   ├── individual/
│   │   │   │   └── pooled/
│   │   │   └── annotation/
│   │   │       ├── individual/
│   │   │       │   ├── go_enrichment/
│   │   │       │   ├── homer/
│   │   │       │   └── rose/
│   │   │       └── pooled/
│   │   │           ├── go_enrichment/
│   │   │           ├── homer/
│   │   │           └── rose/
│   │   └── seacr/
│   │       ├── peak_output/
│   │       │   ├── individual/
│   │       │   └── pooled/
│   │       └── annotation/
│   │           ├── individual/
│   │           │   ├── go_enrichment/
│   │           │   ├── homer/
│   │           │   └── rose/
│   │           └── pooled/
│   │               ├── go_enrichment/
│   │               ├── homer/
│   │               └── rose/
│   └── 0.01/
│       └── ...
└── qc/
    ├── fastqc_raw/
    ├── fqscreen_raw/
    ├── multiqc_report.html
    └── spikein/                   # Spike-in QC plots/tables (when norm_method: spikein)

🧭 Tip: The structure is intentionally hierarchical, enabling automated report generation and simplifying downstream integration with visualization tools and statistical frameworks.

Interpreting Results¶

QC: What to Check First¶

Alignment stats / MultiQC (alignment_stats/, qc/multiqc_report.html):

Genome alignment rate should be >70% for typical CUT&RUN. Rates <50% suggest adapter contamination, wrong genome, or library prep failure.
Spike-in alignment rate should be consistent across replicates within a condition. High CV in spike-in rates makes spike-in normalization unreliable; fall back to norm_method: library.
Duplication rates for CUT&RUN/CUT&Tag are typically lower than ChIP-seq (~10–40%) due to MNase-based cleavage. Very low duplication (<5%) may indicate insufficient library complexity. Very high duplication (>60%) suggests over-amplification or low-complexity input.

Fragment length distribution (fragments/):

For histone marks (H3K4me3, H3K27me3, etc.), expect a nucleosomal ladder: a dominant sub-nucleosomal peak (~150–200 bp) for mono-nucleosome protection and smaller peaks at ~300–400 bp and ~500–600 bp. CUT&RUN enriches for this pattern more sharply than ChIP-seq.
For transcription factors, expect a dominant sub-nucleosomal peak at ~100–150 bp (accessible DNA footprint). A flat or bimodal distribution without a sub-nucleosomal peak indicates poor antibody specificity or failed experiment.

Spike-in QC (qc/spikein/):

Examine the scaling factors table. Scaling factors should be inversely proportional to genome-aligned read depth — high-yield samples get smaller scale factors. If scaling factors vary >10-fold within a condition, review whether spike-in input was consistent across samples.

Peak Calls: Choosing and Filtering¶

Peak caller selection:

Caller	Best for	Output type	Notes
GoPeaks (narrow)	TFs, sharp histone marks (H3K4me3)	narrowPeak	Recommended for most CUT&RUN; explicitly optimized for low-background assays
GoPeaks (broad)	Broad histone marks (H3K27me3, H3K9me3)	broadPeak	—
MACS2 (narrow)	TFs, sharp marks	narrowPeak	Use when comparing to ChIP-seq datasets; widely adopted
MACS2 (broad)	Broad marks	broadPeak	DESeq2 differential analysis often fails on broadPeak due to excessive peak counts
SEACR (stringent/relaxed)	CUT&RUN-specific	bedGraph-derived	Uses AUC signal thresholding; no model fitting; less sensitive but very specific

Quality thresholds: Peaks are called at each threshold in quality_thresholds independently. For discovery analyses, start with q=0.05. Tighten to q=0.01 for high-confidence sets used in motif enrichment or validation. Do not use a single threshold for all downstream analyses — cross-threshold consistency is itself a signal of peak robustness.

Individual vs. pooled controls:

Individual-mode peaks are more sensitive to replicate-specific variation but can include replicate-specific artifacts.
Pooled-mode peaks benefit from increased control depth and lower noise floors, but may miss low-occupancy binding events present only in one replicate.
Use pooled-mode peaks as the primary call set when replicates are well-correlated (Pearson r > 0.9 in deepTools); fall back to individual if replicates diverge.

Differential Enrichment: DESeq2 Results¶

DESeq2 tests for differential AUC (area under the coverage curve) or fragment counts at consensus peak loci between two conditions. Results in contrasts/ include:

*_AUC_DESeq2_results.tsv — AUC-based: integrates signal across the peak; better for broad marks or when peak boundaries shift between conditions.
*_fragments_DESeq2_results.tsv — Fragment count-based: more analogous to RNA-seq count models; better for narrow marks with stable peak boundaries.

Key columns to interpret:

Column	Interpretation
`log2FoldChange`	Direction and magnitude; filtered at `contrasts_lfc_cutoff` (default 0.59 ≈ 1.5-fold)
`padj`	BH-adjusted p-value; filtered at `contrasts_fdr_cutoff` (default 0.05)
`baseMean`	Mean normalized signal across all samples; very low baseMean peaks are often noise

Peaks with baseMean < 5 that pass FDR are suspect — flag them before reporting. DESeq2 can return significant p-values for very low-count features under low dispersion estimates.

Up-regulated BED files (up_group1.bed, up_group2.bed): 3-column BED of peaks with significantly higher enrichment in each group. Use these directly as input for downstream tools (GREAT, HOMER, AME, bedtools intersect with other genomic features).

Motif Enrichment: HOMER and AME¶

HOMER (homer_deg/, homer/): Performs de novo motif discovery and known motif enrichment against the JASPAR/HOMER database. The top de novo motifs for DEG peaks point to TFs whose binding sites are preferentially accessible or occupied under that condition.

The knownResults.html file ranks enrichment of HOMER's curated motif library. Focus on motifs with p < 1e-5 and % of target sequences > 10%.
De novo motifs that match known TF families validate antibody specificity and experimental biology; unexpected motifs suggest co-occupancy or indirect recruitment.

AME (ame_deg/): Tests enrichment of HOCOMOCO v14 CORE TF motifs (human/mouse) in up-regulated peak sequences vs. a shuffled background. HOCOMOCO v14 covers ~900 TF binding models with high positional weight matrix precision.

ame.tsv columns: motif_ID, adj_p-value, E-value, TP (true positives), FP (false positives). Use adj_p-value < 0.05 and TP / (TP + FP) > 0.1 as a conservative filter.
Cross-reference HOMER and AME hits: motifs appearing in both analyses under the same condition are high-confidence candidates for further validation (e.g., ChIP-seq, luciferase assay).

GO / Pathway Enrichment: ChIP-Enrich¶

go_enrichment/ contains ChIP-Enrich results, which correct for gene locus length when testing peak-to-gene assignment. This avoids the systematic bias in standard Fisher-based enrichment where long genes accumulate more peaks by chance.

chipenrich method: assigns peaks to nearest TSS within a configurable window and tests genesets. Best for narrow marks (TFs, H3K4me3) where peaks map to specific promoter/enhancer-gene links.
polyenrich method: counts peaks per gene locus rather than presence/absence; better for broad marks.
Key output file: *_enrichment_results.tsv. Columns FDR and Effect are most informative. Positive Effect = geneset enriched relative to background gene set size.

Super-Enhancers: ROSE ¶

rose/ identifies super-enhancers by stitching H3K27ac (or other active mark) peaks within stitch_distance (default 12,500 bp), then applying an inflection-point cutoff on the ranked signal curve. The inflection point separates typical enhancers from super-enhancers.

*_SuperEnhancers.bed: the final super-enhancer calls. Overlap with known oncogenes or lineage TFs to nominate driver regulatory elements.
*_Enhancers_withSuper.bed: all stitched enhancers ranked by signal; use this to inspect the inflection point and evaluate whether the cutoff is biologically reasonable.
ROSE is only meaningful for active chromatin marks (H3K27ac, H3K4me1). Running it on H3K27me3 or TF datasets is not informative.