Anatomy of a modular toolchain for reproducible bioinformatics workflows

Kelly Sovacool, PhD

Bioinformatics Software Engineer

CCR Collaborative Bioinformatics Resource

May 14, 2026

Reproducible Bioinformatics

  • Reproducibility is the ability to repeat an analysis with the same methods on the same data and get the same result
  • Replicability is the ability to repeat an analysis with the same methods on different data and get the same result


Methods             Same data          Different data
Same methods        Reproducibility    Replicability
Different methods   Robustness         Generalizability
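
The four terms can also be read as a lookup on two yes/no questions: were the methods the same, and was the data the same? A minimal sketch in Python follows; the names `TERMS` and `classify_repeat` are hypothetical and chosen only for illustration.

```python
# Keys are (same_methods, same_data); values are the four terms defined above.
TERMS = {
    (True, True): "Reproducibility",
    (True, False): "Replicability",
    (False, True): "Robustness",
    (False, False): "Generalizability",
}


def classify_repeat(same_methods: bool, same_data: bool) -> str:
    """Return the term for a repeated analysis (hypothetical helper)."""
    return TERMS[(same_methods, same_data)]


if __name__ == "__main__":
    # Same methods applied to different data.
    print(classify_repeat(same_methods=True, same_data=False))
```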

Enabling Reproducibility

For a computational biology study to be reproducible, other people must be able to run the analysis code themselves.

Enabling Reproducibility: Bare Minimum

  • Share the code publicly
  • Document how to run the code

Beyond Minimal Reproducibility

Scalability, Reusability, & Sustainability

  • Ensure the code scales for large datasets
  • Reuse components of your code, and ensure others can reuse it too
  • Adopt maintenance practices to sustain the project for the long term

CCBR Pipelines

On the Biowulf HPC

$ module load ccbrpipeliner

PIPELINES:
ASPEN       v1.1    ATAC-seq            https://ccbr.github.io/ASPEN/1.1
CARLISLE    v2.7    CUT&RUN             https://ccbr.github.io/CARLISLE/2.7
CHAMPAGNE   v0.5    ChIP-seq            https://ccbr.github.io/CHAMPAGNE/0.5
CHARLIE     v0.12   circRNAs            https://ccbr.github.io/CHARLIE/0.12
CRISPIN     v1.2    CRISPR              https://ccbr.github.io/CRISPIN/1.2
ESCAPE      v1.2    EV-seq              https://ccbr.github.io/ESCAPE/1.2
LOGAN       v0.3    whole genome seq    https://ccbr.github.io/LOGAN/0.3
RENEE       v2.7    bulk RNA-seq        https://ccbr.github.io/RENEE/2.7
SINCLAIR    v0.3    scRNA-seq           https://ccbr.github.io/SINCLAIR/0.3
XAVIER      v3.2    whole exome-seq     https://ccbr.github.io/XAVIER/3.2

TOOLS:
spacesavers2    v0.14           https://ccbr.github.io/spacesavers2/
permfix         v0.6            https://github.com/ccbr/permfix/
ccbr_tools      v0.4            https://ccbr.github.io/Tools/

How can we develop bioinformatics workflows that are scalable, reusable, and sustainable to maximize reproducibility?

Toolchain for Reproducible Bioinformatics

The Bioinformatics workflow

Example workflow diagram from CHAMPAGNE: ChIP-seq pipeline

Nextflow for scalable orchestration

Workflow template for new pipelines

The Command-line Interface

Nextflow native interface

nextflow run ./path/to/CCBR/CHAMPAGNE \
  -profile biowulf,slurm,singularity \
  --input /data/$USER/chipseq-project/data/samplesheet.csv \
  -resume

CCBR custom interface

champagne run \
  --input /data/$USER/chipseq-project/data/samplesheet.csv

Same interface, different pipeline

sinclair run \
  --input /data/$USER/scrna-project/data/samplesheet.csv
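
One way a shared Python package could expand the short custom command into the full Nextflow invocation is sketched below. This is an illustration under stated assumptions, not the CCBR implementation: the function name `build_nextflow_command`, its parameters, and the default profile list are hypothetical. The only Nextflow options used are the ones already shown in the native-interface example above (`-profile`, `-resume`, and the pipeline's `--input` parameter).

```python
import shlex


def build_nextflow_command(
    pipeline_dir,
    samplesheet,
    profiles=("biowulf", "slurm", "singularity"),  # assumed defaults, taken from the example above
    resume=True,
):
    """Assemble the argument list for `nextflow run` (hypothetical helper)."""
    cmd = [
        "nextflow",
        "run",
        str(pipeline_dir),
        "-profile",
        ",".join(profiles),
        "--input",
        str(samplesheet),
    ]
    if resume:
        # Ask Nextflow to reuse cached results from a previous run.
        cmd.append("-resume")
    return cmd


if __name__ == "__main__":
    cmd = build_nextflow_command(
        "./path/to/CCBR/CHAMPAGNE",
        "/data/user/chipseq-project/data/samplesheet.csv",  # placeholder path
    )
    # Print the command as a shell-safe string instead of running it.
    print(shlex.join(cmd))
```

Because the pipeline location is an argument, the same helper could serve any pipeline. That is the point of the shared interface: each pipeline's wrapper only has to supply its own path.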

Python Package

Nextflow modules & subworkflows

Modular design enables code re-use

Docker containers

Dependency management: solved

Contribution Process

Sustainable development practices for high-quality code

Contribution Process: Pull Requests

Code review checklist

Continuous Integration

  • Integrate source code changes frequently
  • Ensure the main branch is always in a working state

GitHub Actions runs CI checks on every change

Deploy Documentation with GitHub Pages

Reusable GitHub Actions

Anatomy of our workflow template

Scalable, Reusable, Sustainable, Reproducible Bioinformatics

Resources

Discussion

  • What barriers to reproducibility do you face in your development & analysis projects?
  • How can you reduce repetitive code in your own projects?
  • Would your team benefit from developing your own reusable packages and template repositories?
  • Are you taking advantage of GitHub’s features like Actions and Pages?
  • Is your contribution process helping maintain code quality?
  • Are there any tools or processes you would add to this stack?