Anatomy of a modular toolchain for reproducible bioinformatics workflows
Kelly Sovacool, PhD
Bioinformatics Software Engineer
CCR Collaborative Bioinformatics Resource
May 14, 2026
Enabling Reproducibility
For a computational biology study to be reproducible, other people must be able to run the analysis code themselves.
Enabling Reproducibility: Bare Minimum
- Share the code publicly
- Document how to run the code
Beyond Minimal Reproducibility
Scalability, Reusability, & Sustainability
- Ensure the code scales for large datasets
- Reuse components of your code, and ensure others can reuse it too
- Adopt maintenance practices to sustain the project for the long term
CCBR Pipelines
On the Biowulf HPC
$ module load ccbrpipeliner
PIPELINES:
  ASPEN      v1.1   ATAC-seq          https://ccbr.github.io/ASPEN/1.1
  CARLISLE   v2.7   CUT&RUN           https://ccbr.github.io/CARLISLE/2.7
  CHAMPAGNE  v0.5   ChIP-seq          https://ccbr.github.io/CHAMPAGNE/0.5
  CHARLIE    v0.12  circRNAs          https://ccbr.github.io/CHARLIE/0.12
  CRISPIN    v1.2   CRISPR            https://ccbr.github.io/CRISPIN/1.2
  ESCAPE     v1.2   EV-seq            https://ccbr.github.io/ESCAPE/1.2
  LOGAN      v0.3   whole genome seq  https://ccbr.github.io/LOGAN/0.3
  RENEE      v2.7   bulk RNA-seq      https://ccbr.github.io/RENEE/2.7
  SINCLAIR   v0.3   scRNA-seq         https://ccbr.github.io/SINCLAIR/0.3
  XAVIER     v3.2   whole exome-seq   https://ccbr.github.io/XAVIER/3.2
TOOLS:
  spacesavers2  v0.14  https://ccbr.github.io/spacesavers2/
  permfix       v0.6   https://github.com/ccbr/permfix/
  ccbr_tools    v0.4   https://ccbr.github.io/Tools/
Workflow template for new pipelines
The Command-line Interface
Nextflow native interface
nextflow run ./path/to/CCBR/CHAMPAGNE \
-profile biowulf,slurm,singularity \
--input /data/$USER/chipseq-project/data/samplesheet.csv \
-resume
CCBR custom interface
champagne run \
--input /data/$USER/chipseq-project/data/samplesheet.csv
Same interface, different pipeline
sinclair run \
--input /data/$USER/scrna-project/data/samplesheet.csv
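One way to picture how the two interfaces relate: a thin wrapper fills in the pipeline location, the profiles, and -resume, so the user only supplies --input. The sketch below is an illustration under assumptions, not the CCBR implementation. The function name build_nextflow_cmd is hypothetical, and the profiles and -resume flag are simply copied from the native Nextflow example above; the real CLI may set different defaults.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the Nextflow command that a "<pipeline> run"
# wrapper might expand to. Nothing is executed; the command is only printed.
build_nextflow_cmd() {
    local pipeline_dir="$1"   # path to the pipeline source (assumed argument)
    local input="$2"          # samplesheet path the user passes via --input
    # Profiles and -resume are assumed defaults taken from the native example.
    printf 'nextflow run %s -profile %s --input %s -resume\n' \
        "$pipeline_dir" "biowulf,slurm,singularity" "$input"
}

# The same wrapper serves two different pipelines.
build_nextflow_cmd ./path/to/CCBR/CHAMPAGNE "/data/$USER/chipseq-project/data/samplesheet.csv"
build_nextflow_cmd ./path/to/CCBR/SINCLAIR "/data/$USER/scrna-project/data/samplesheet.csv"
```

Keeping the defaults in one place is what lets every pipeline expose the same short command while the full Nextflow invocation stays available for users who need it.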
The Command-line Interface
Python Package
Nextflow modules & subworkflows
Modular design enables code re-use
Docker containers
Dependency management: solved
Contribution Process
Sustainable development practices for high-quality code
Contribution Process: Pull Requests
Contribution Process
Code review checklist
Continuous Integration
- Integrate source code changes frequently
- Ensure the main branch is always in a working state
GitHub Actions runs CI checks on every change
Deploy Documentation with GitHub Pages
Continuous Integration
Reusable GitHub Actions
Anatomy of our workflow template
Discussion
- What barriers to reproducibility do you face in your development & analysis projects?
- How can you reduce repetitive code in your own projects?
- Would your team benefit from developing your own reusable packages and template repositories?
- Are you taking advantage of GitHub’s features like Actions and Pages?
- Is your contribution process helping maintain code quality?
- Are there any tools or processes you would add to this stack?