Pipeline Tutorial¶

Welcome to the CARLISLE Pipeline Tutorial! This guide walks you through running the CARLISLE pipeline using the provided test dataset on the NIH Biowulf HPC environment.

Getting Started¶

Before beginning, review the Getting Started Guide for installation, environment setup, and dependency loading instructions.

Step 1. Set Your Working Directory¶

Navigate to your project directory on Biowulf:

cd /path/to/your/project/directory

Step 2. Initialize the Pipeline¶

Load the CARLISLE module and initialize your working directory:

module load ccbrpipeliner
carlisle --runmode=init --workdir=/path/to/output/dir

This command copies the required configuration, manifest, and Snakefiles into your chosen output directory (WORKDIR). Initialization must be done before any other CARLISLE operation.

Submitting the Test Data¶

The test dataset provided with CARLISLE enables you to validate the installation and confirm correct execution. The test includes minimal FASTQ files, configurations, and manifests.

Step 3. Run the Test Command¶

Execute the built-in test run to validate pipeline functionality:

carlisle --runmode=runtest --workdir=/path/to/output/dir

This command prepares the test data, performs a dry-run to validate workflow dependencies, and then submits the pipeline to the Biowulf SLURM cluster.

Expected Output¶

During a successful test run, you should see a job summary similar to the one below, detailing the number of tasks executed per Snakemake rule:

Job stats:
job                              count    min threads    max threads
-----------------------------  -------  -------------  -------------
DESeq                                  24              1              1
align                                   9             56             56
alignstats                              9              2              2
all                                     1              1              1
bam2bg                                  9             32             32
create_contrast_data_files             24              1              1
create_contrast_peakcaller_files       12              1              1
create_reference                        1             32             32
create_replicate_sample_table           1              1              1
diffbb                                 24              1              1
filter                                 18              2              2
findMotif                              96              6              6
gather_alignstats                       1              1              1
go_enrichment                          12              1              1
gopeaks_broad                          16              2              2
gopeaks_narrow                         16              2              2
macs2_broad                            16              2              2
macs2_narrow                           16              2              2
make_counts_matrix                     24              1              1
multiqc                                 2              1              1
qc_fastqc                               9              1              1
rose                                   96              2              2
seacr_relaxed                          16              2              2
seacr_stringent                        16              2              2
spikein_assessment                      1              1              1
trim                                    9             56             56
total                                 478              1             56

💡 Tip: This job summary confirms successful rule execution, resource allocation, and workflow orchestration.

Running the Test in Control-Free Mode¶

To validate the control-free code path, you can run the test dataset without controls by editing config/config.yaml in your working directory after init:

run_without_controls: true
quality_thresholds: "0.01"

You also need a simplified sample manifest that omits control entries. Edit config/samples.tsv so that all rows have isControl: N and leave controlName and controlReplicateNumber blank:

sampleName	replicateNumber	isControl	controlName	controlReplicateNumber	path_to_R1	path_to_R2
53_H3K4me3	1	N			`<path>/53_H3K4me3_1.R1.fastq.gz`	`<path>/53_H3K4me3_1.R2.fastq.gz`
53_H3K4me3	2	N			`<path>/53_H3K4me3_2.R1.fastq.gz`	`<path>/53_H3K4me3_2.R2.fastq.gz`

Then dryrun to confirm the DAG resolves cleanly:

carlisle --runmode=dryrun --workdir=/path/to/output/dir

In control-free mode the job count will be lower (no pooled-controls rules, no control-paired peak calls), but all three peak callers (MACS2, SEACR, GoPeaks) will run against each treatment replicate using their no-control execution paths.