Pipeline Tutorial¶
Welcome to the CARLISLE Pipeline Tutorial! This guide walks you through running the CARLISLE pipeline using the provided test dataset on the NIH Biowulf HPC environment.
Getting Started¶
Before beginning, review the Getting Started Guide for installation, environment setup, and dependency loading instructions.
Step 1. Set Your Working Directory¶
Navigate to your project directory on Biowulf:
cd /path/to/your/project/directory
Step 2. Initialize the Pipeline¶
Load the ccbrpipeliner module, which makes the carlisle command available, and initialize your working directory:
module load ccbrpipeliner
carlisle --runmode=init --workdir=/path/to/output/dir
This command copies the required configuration, manifest, and Snakefiles into your chosen output directory (WORKDIR). Initialization must be done before any other CARLISLE operation.
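As a quick sanity check after initialization, a small shell sketch along these lines could confirm that the files this tutorial edits later are present in WORKDIR. The helper name check_carlisle_init is hypothetical and not part of CARLISLE, and the two paths checked (config/config.yaml and config/samples.tsv) are simply the ones referenced later in this tutorial; the full set of files copied by init may be larger.

```shell
# Hypothetical helper (not part of CARLISLE): verify that an initialized
# working directory contains the config and manifest files that this
# tutorial edits in later steps.
check_carlisle_init() {
    _workdir="$1"
    _missing=0
    for _f in config/config.yaml config/samples.tsv; do
        if [ ! -f "$_workdir/$_f" ]; then
            echo "missing: $_workdir/$_f" >&2
            _missing=1
        fi
    done
    # Returns 0 when both files exist, 1 otherwise.
    return "$_missing"
}

# Example usage (the path is a placeholder):
# check_carlisle_init /path/to/output/dir && echo "init looks complete"
```

The function only inspects the filesystem, so it is safe to run repeatedly before moving on to the test run.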
Submitting the Test Data¶
The test dataset provided with CARLISLE enables you to validate the installation and confirm correct execution. The test includes minimal FASTQ files, configurations, and manifests.
Step 3. Run the Test Command¶
Execute the built-in test run to validate pipeline functionality:
carlisle --runmode=runtest --workdir=/path/to/output/dir
This command prepares the test data, performs a dry-run to validate workflow dependencies, and then submits the pipeline to the Biowulf SLURM cluster.
Expected Output¶
During a successful test run, you should see a job summary similar to the one below, listing the number of jobs and the thread allocation for each Snakemake rule:
Job stats:
job                                 count    min threads    max threads
--------------------------------  -------  -------------  -------------
DESeq                                  24              1              1
align                                   9             56             56
alignstats                              9              2              2
all                                     1              1              1
bam2bg                                  9             32             32
create_contrast_data_files             24              1              1
create_contrast_peakcaller_files       12              1              1
create_reference                        1             32             32
create_replicate_sample_table           1              1              1
diffbb                                 24              1              1
filter                                 18              2              2
findMotif                              96              6              6
gather_alignstats                       1              1              1
go_enrichment                          12              1              1
gopeaks_broad                          16              2              2
gopeaks_narrow                         16              2              2
macs2_broad                            16              2              2
macs2_narrow                           16              2              2
make_counts_matrix                     24              1              1
multiqc                                 2              1              1
qc_fastqc                               9              1              1
rose                                   96              2              2
seacr_relaxed                          16              2              2
seacr_stringent                        16              2              2
spikein_assessment                      1              1              1
trim                                    9             56             56
total                                 478              1             56

💡 Tip: This job summary confirms successful rule execution, resource allocation, and workflow orchestration.
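If you save the job summary to a file, a short awk-based sketch like the one below can confirm that the per-rule counts add up to the reported total (478 in the example above). The helper name check_job_total is hypothetical, and the sketch assumes the whitespace-separated layout shown above, with the rule name in the first column and the job count in the second.

```shell
# Hypothetical helper: sum the per-rule "count" column of a Snakemake
# job-stats table and compare it with the value on the "total" row.
# Reads the file(s) given as arguments, or standard input if none.
check_job_total() {
    awk '
        # Skip anything whose second column is not a plain integer
        # (the "Job stats:" line, the header, the dashed rule, blanks).
        $2 !~ /^[0-9]+$/ { next }
        # Remember the reported total rather than adding it to the sum.
        $1 == "total"    { reported = $2; next }
                         { sum += $2 }
        END {
            printf "sum=%d reported=%d\n", sum, reported
            if (sum != reported) exit 1
        }
    ' "$@"
}

# Example usage (the file name is a placeholder):
# check_job_total job_stats.txt
```

A non-zero exit status means the rows and the total disagree, which usually points to a truncated or partially copied summary rather than a pipeline problem.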
Running the Test in Control-Free Mode¶
To validate the control-free code path, you can run the test dataset without controls by editing config/config.yaml in your working directory after init:
run_without_controls: true
quality_thresholds: "0.01"
You also need a simplified sample manifest that omits control entries. Edit config/samples.tsv so that all rows have isControl: N and leave controlName and controlReplicateNumber blank:
| sampleName | replicateNumber | isControl | controlName | controlReplicateNumber | path_to_R1 | path_to_R2 |
|---|---|---|---|---|---|---|
| 53_H3K4me3 | 1 | N | | | <path>/53_H3K4me3_1.R1.fastq.gz | <path>/53_H3K4me3_1.R2.fastq.gz |
| 53_H3K4me3 | 2 | N | | | <path>/53_H3K4me3_2.R1.fastq.gz | <path>/53_H3K4me3_2.R2.fastq.gz |
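The manifest rules above can be checked mechanically. The following is a minimal sketch, assuming that config/samples.tsv is tab-separated, has a single header row, and uses the column order shown in the table (isControl third, controlName fourth, controlReplicateNumber fifth). The helper name check_control_free_manifest is hypothetical.

```shell
# Hypothetical helper: confirm that every data row of a tab-separated
# manifest has isControl = N and empty control columns.
check_control_free_manifest() {
    awk -F '\t' '
        NR == 1 { next }   # skip the header row
        NF == 0 { next }   # skip blank lines
        # Column 3 = isControl, 4 = controlName, 5 = controlReplicateNumber
        $3 != "N" || $4 != "" || $5 != "" {
            printf "line %d: not control-free (%s)\n", NR, $1
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example usage (the path is a placeholder):
# check_control_free_manifest /path/to/output/dir/config/samples.tsv
```

Each offending row is reported with its line number and sample name, and the function returns a non-zero status if any row still references a control.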
Then perform a dry run to confirm that the DAG resolves cleanly:
carlisle --runmode=dryrun --workdir=/path/to/output/dir
In control-free mode the job count will be lower (no pooled-controls rules, no control-paired peak calls), but all three peak callers (MACS2, SEACR, GoPeaks) will run against each treatment replicate using their no-control execution paths.
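To check that all three peak callers are still scheduled, you could scan a saved copy of the dry-run job summary for their rules. The sketch below assumes that control-free mode keeps the rule-name prefixes seen in the standard test output (macs2_, seacr_, gopeaks_), which this tutorial does not confirm. The helper name peak_callers_present is hypothetical.

```shell
# Hypothetical helper: report whether a job-stats table contains at least
# one rule for each peak caller. The rule-name prefixes are an assumption
# taken from the standard test run output.
peak_callers_present() {
    awk '
        $1 ~ /^macs2_/   { seen["macs2"] = 1 }
        $1 ~ /^seacr_/   { seen["seacr"] = 1 }
        $1 ~ /^gopeaks_/ { seen["gopeaks"] = 1 }
        END {
            n = split("macs2 seacr gopeaks", callers, " ")
            for (i = 1; i <= n; i++) {
                if (!(callers[i] in seen)) {
                    printf "no %s rules found\n", callers[i]
                    missing = 1
                }
            }
            exit missing + 0
        }
    ' "$@"
}

# Example usage (the file name is a placeholder for a saved job summary):
# peak_callers_present dryrun_job_stats.txt
```

A missing caller here is only a prompt to inspect the dry-run output more closely, since rule names may differ between CARLISLE versions.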