LOGAN 🔬

whoLe genOme-sequencinG Analysis pipeliNe

Call germline and somatic variants, CNVs, and SVs, and annotate variants!

Overview

Welcome to LOGAN! Before getting started, we highly recommend reading through LOGAN's documentation.

LOGAN is a comprehensive whole-genome sequencing pipeline that follows the Broad Institute's set of best practices. It relies on technologies like Singularity¹ to maintain the highest level of reproducibility. The pipeline consists of a series of data-processing and quality-control steps orchestrated by Nextflow², a flexible and scalable workflow management system, which submits jobs to a cluster or cloud provider.

In particular, we recommend reading the usage section of each available subcommand.

For more information about issues or troubleshooting a problem, please check out our FAQ before opening an issue on GitHub.

The original pipeline and code were forked from the CCBR Exome-seek pipeline and OpenOmics.

Dependencies

Requires: singularity>=3.5 nextflow>=22.10.2

Singularity must be installed on the target system. Nextflow orchestrates the execution of each step in the pipeline. To guarantee the highest level of reproducibility, each step relies on versioned images from DockerHub. Nextflow uses Singularity to pull these images onto the local filesystem prior to job execution, so Nextflow and Singularity are the only two dependencies.
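Before launching LOGAN outside Biowulf, it can help to confirm that both dependencies are on PATH and meet the minimum versions listed above. The script below is a minimal, hypothetical sketch (version_ge and check_tool are not part of LOGAN). It assumes a sort that supports -V (GNU coreutils or recent BSD), a grep that supports -o, and that `singularity --version` and `nextflow -v` print a version number somewhere in their output.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of LOGAN): confirm that Singularity
# and Nextflow are on PATH and meet the minimum versions listed above.

# version_ge A B: succeeds when version A >= version B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN_VERSION VERSION_COMMAND...
check_tool() {
  name=$1
  min=$2
  shift 2
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "MISSING: $name (need >= $min)"
    return 1
  fi
  # Take the first x.y or x.y.z number printed by the version command.
  found=$("$@" 2>&1 | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)
  if version_ge "$found" "$min"; then
    echo "OK: $name $found (need >= $min)"
  else
    echo "TOO OLD: $name ${found:-unknown} (need >= $min)"
    return 1
  fi
}

failed=0
check_tool singularity 3.5 singularity --version || failed=1
check_tool nextflow 22.10.2 nextflow -v || failed=1
if [ "$failed" -ne 0 ]; then
  echo "Install or load the missing or outdated tools before running LOGAN."
fi
```

Version sorting is used instead of plain string comparison so that, for example, 22.10.2 correctly ranks above 22.4.0.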

Set up

LOGAN is installed on the Biowulf HPC. For installation in other execution environments, refer to the docs.

Biowulf

LOGAN is available on Biowulf in the ccbrpipeliner module. You'll first need to start an interactive session, then load the module:

# start an interactive node
sinteractive --mem=2g --cpus-per-task=2 --gres=lscratch:200

# load the ccbrpipeliner module
module load ccbrpipeliner

Usage

Input Files

LOGAN supports the following inputs:

Paired-end FASTQ files

--fastq_input - A glob can be used to include all FASTQ files, e.g. --fastq_input "*R{1,2}.fastq.gz". Globs must be quoted so the shell passes the pattern through unexpanded.

Pre-aligned BAM files with BAI indices

--bam_input - A glob can be used to include all BAM files, e.g. --bam_input "*.bam". Globs must be quoted.

A sheet that lists each sample name with its FASTQ or BAM file locations

--fastq_file_input - A headerless, tab-delimited sheet with the sample name, R1, and R2 file locations

Example

Sample1_TUMOR  /path/to/the/fastq/folder/Sample1_TUMOR.R1.fq.gz /path/to/the/fastq/folder/Sample1_TUMOR.R2.fq.gz
Sample1_NORMAL  /path/to/the/fastq/folder/Sample1_NORMAL.R1.fq.gz /path/to/the/fastq/folder/Sample1_NORMAL.R2.fq.gz
Sample2_TUMOR  /path/to/the/fastq/folder/Sample2_TUMOR.R1.fq.gz /path/to/the/fastq/folder/Sample2_TUMOR.R2.fq.gz
Sample2_NORMAL  /path/to/the/fastq/folder/Sample2_NORMAL.R1.fq.gz /path/to/the/fastq/folder/Sample2_NORMAL.R2.fq.gz

--bam_file_input - A headerless, tab-delimited sheet with the sample name, BAM, and BAM index (BAI) file locations

Example

Sample1_TUMOR   /path/to/the/BAM/folder/Sample1_TUMOR.bam  /path/to/the/BAM/folder/Sample1_TUMOR.bam.bai
Sample1_NORMAL  /path/to/the/BAM/folder/Sample1_NORMAL.bam  /path/to/the/BAM/folder/Sample1_NORMAL.bam.bai
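Writing these sheets by hand is error-prone (spaces instead of tabs, missing mates). The helpers below are a minimal, hypothetical sketch (make_fastq_sheet and make_bam_sheet are not LOGAN commands) that scan a folder and print sheets in the layouts shown above. They assume FASTQ files are named <sample>.R1.fq.gz / <sample>.R2.fq.gz as in the example (other suffixes can be passed as arguments) and that each BAM index is named <file>.bam.bai.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LOGAN) that print headerless,
# tab-delimited input sheets in the layouts shown above.

# make_fastq_sheet DIR [R1_SUFFIX] [R2_SUFFIX]
# Prints "sample<TAB>R1<TAB>R2" for each R1 file that has a matching R2 file.
make_fastq_sheet() {
  r1_suffix=${2:-.R1.fq.gz}
  r2_suffix=${3:-.R2.fq.gz}
  dir=$(cd "$1" && pwd) || return 1   # absolute path to the folder
  for r1 in "$dir"/*"$r1_suffix"; do
    [ -e "$r1" ] || continue          # glob matched nothing
    sample=$(basename "$r1" "$r1_suffix")
    r2="$dir/$sample$r2_suffix"
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      echo "WARNING: no R2 file for $sample" >&2
    fi
  done
}

# make_bam_sheet DIR
# Prints "sample<TAB>BAM<TAB>BAI", expecting each index to be named <file>.bam.bai.
make_bam_sheet() {
  dir=$(cd "$1" && pwd) || return 1
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue
    sample=$(basename "$bam" .bam)
    if [ -e "$bam.bai" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$bam" "$bam.bai"
    else
      echo "WARNING: no index $bam.bai for $sample" >&2
    fi
  done
}
```

For example, make_fastq_sheet /path/to/the/fastq/folder > logan_fastq_input.txt writes a sheet suitable for --fastq_file_input; samples with a missing mate or index are reported on stderr and left out.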

Genome

--genome - Selects the reference genome. hg38, hg19, and mm10 are supported.
Example: --genome hg38 to run with the hg38 genome

hg38 has two options:

--genome hg38 - Based on GRCh38.d1.vd1.fa, which is consistent with the TCGA/GDC processing pipelines

--genome hg38_sf - Based on Homo_sapiens_assembly38.fasta, which is derived from the Broad Institute/NCI Sequencing Facility

The biggest difference between the two is that GRCh38.d1.vd1.fa contains only the GCA_000001405.15_GRCh38_no_alt_analysis_set, sequence decoys (GenBank Accession GCA_000786075), and virus sequences, whereas Homo_sapiens_assembly38.fasta includes HLA-specific contigs, which may not be compatible with certain downstream tools.

Operating Modes

1. Paired Tumor/Normal Mode

Required for Paired Tumor/Normal Mode

--sample_sheet - In paired mode, a sample sheet must be provided with the basenames of the tumor and normal samples. The sheet must be tab-separated, with a header row of Tumor and Normal.

Example

Tumor  Normal
Sample1_TUMOR  Sample1_NORMAL
Sample2_TUMOR  Sample2_NORMAL
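When sample names follow the <id>_TUMOR / <id>_NORMAL pattern used in the input-sheet examples, the paired sample sheet can be derived from the input sheet instead of typed by hand. The function below is a minimal, hypothetical sketch (make_pair_sheet is not a LOGAN command) that assumes exactly that naming convention and a tab-delimited input sheet with the sample name in column 1. Tumors without a matching normal are reported on stderr and skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of LOGAN): derive a Tumor/Normal sample sheet
# from an input sheet whose first column holds names like Sample1_TUMOR and
# Sample1_NORMAL. Unpaired tumors are reported on stderr and skipped.
make_pair_sheet() {
  awk -F'\t' '
    $1 ~ /_TUMOR$/  { id = substr($1, 1, length($1) - 6); tumor[id] = $1; order[++n] = id }
    $1 ~ /_NORMAL$/ { id = substr($1, 1, length($1) - 7); normal[id] = $1 }
    END {
      print "Tumor\tNormal"    # header row required in paired mode
      for (i = 1; i <= n; i++) {
        id = order[i]
        if (id in normal) print tumor[id] "\t" normal[id]
        else print "WARNING: no normal for " tumor[id] > "/dev/stderr"
      }
    }
  ' "$1"
}
```

For example, make_pair_sheet logan_fastq_input.txt > samplesheet.txt. Tumors are written in the order they appear in the input sheet.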

2. Tumor-Only Mode

No sample sheet is required; all samples are used to call variants.

Calling Mode

The following flags enable SNV (germline and/or somatic), SV, and/or CNV calling; they can be combined to run several modes at once.

--vc or --snv - Enables somatic SNV calling using mutect2, vardict, varscan, octopus, deepsomatic, strelka (TN only), MUSE (TN only), and lofreq (TN only)

--gl or --germline - Enables germline calling using DeepVariant

--sv or --structural - Enables somatic SV calling using Manta, GRIDSS, and SvABA

--cnv or --copynumber - Enables somatic CNV calling using FREEC, Sequenza, ASCAT, CNVKit, and Purple (hg19/hg38 only)

Optional Arguments

--callers - Comma-separated list of SNV callers to run; by default, all callers are used. Example: --callers mutect2,octopus

--cnvcallers - Comma-separated list of CNV callers to run; by default, all callers are used. Example: --cnvcallers purple

--svcallers - Comma-separated list of SV callers to run; by default, all callers are used. Example: --svcallers gridss

--ffpe - Adds additional filtering for FFPE by detecting strand orientation bias using SOBDetector.

--exome - When using exome data, this flag limits calling to the intervals in the target BED file, reducing run time and applying exome-sequencing-specific parameters. An intervals file is required.

--indelrealign - Enables indel realignment using the GATK pipeline when running alignment steps. May be helpful for certain callers (VarScan, VarDict) that do not have local haplotype reassembly.
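Because --exome depends on an intervals file, a quick sanity check of that BED file can catch problems before a long run. The function below is a minimal, hypothetical sketch (check_bed is not part of LOGAN). It assumes a standard tab-separated BED layout with chrom, start, and end in the first three columns, skips track, browser, comment, and blank lines, and does not check that chromosome names match the chosen --genome.

```shell
#!/bin/sh
# Hypothetical sanity check (not part of LOGAN) for an exome target BED file:
# every data line needs chrom, start, end (tab-separated, 0-based half-open)
# with non-negative integer coordinates and start < end.
check_bed() {
  awk -F'\t' '
    /^(#|track|browser)/ || NF == 0 { next }    # skip headers and blank lines
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "line %d: invalid interval: %s\n", NR, $0 > "/dev/stderr"
      bad++
      next
    }
    { ok++ }
    END {
      printf "%d valid interval(s), %d invalid\n", ok, bad
      exit (bad > 0)
    }
  ' "$1"
}
```

The function exits non-zero when any interval is invalid, so it can gate a launch script with check_bed targets.bed && logan run ... (where targets.bed is a placeholder name).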

Running LOGAN

Example of tumor/normal calling mode

# Step 0: Set up

sinteractive --mem=8g -N 1 -n 4
module load ccbrpipeliner # v8 

# set up directories

DATADIR="/path/to/fastq/folder"

# Step 1: Initialization

logan init \
--output logan_output

# Step 2: Stub run

logan run  \
--output logan_output \
--mode local -profile ci_stub \
--genome hg38 \
--sample_sheet samplesheet.txt \
--fastq_file_input logan_fastq_input.txt \
-stub --vc --sv --cnv

# Step 3: Full run
logan run  \
--output logan_output \
--mode slurm -profile slurm \
--genome hg38 \
--sample_sheet samplesheet.txt \
--fastq_file_input logan_fastq_input.txt \
--vc --sv --cnv

# NOTE: to use a glob instead of a FASTQ input sheet

logan run  \
--output logan_output \
--mode local -profile ci_stub \
--genome hg38 \
--sample_sheet samplesheet.txt \
--fastq_input "$DATADIR/*R{1,2}.fastq.gz" \
-stub --vc --sv --cnv

logan run  \
--output logan_output \
--mode slurm -profile slurm \
--genome hg38 \
--sample_sheet samplesheet.txt \
--fastq_input "$DATADIR/*R{1,2}.fastq.gz" \
--vc --sv --cnv
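Before Step 3, it can be worth confirming that every file listed in logan_fastq_input.txt exists and that every sample named in samplesheet.txt appears in it, since either mistake would only surface after jobs are submitted. The function below is a minimal, hypothetical sketch (preflight is not a LOGAN command); it assumes the headerless input sheet and the tab-separated Tumor/Normal sample sheet described above.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of LOGAN) to run before Step 3.
# preflight INPUT_SHEET SAMPLE_SHEET
#   INPUT_SHEET:  headerless, tab-delimited; column 1 = sample, columns 2+ = file paths
#   SAMPLE_SHEET: tab-delimited with a "Tumor<TAB>Normal" header row
preflight() {
  input_sheet=$1
  sample_sheet=$2

  # Report every file path (R1/R2 or BAM/BAI) that does not exist.
  missing=$(cut -f2- "$input_sheet" | tr '\t' '\n' | while IFS= read -r path; do
    if [ -n "$path" ] && [ ! -e "$path" ]; then
      printf 'MISSING FILE: %s\n' "$path"
    fi
  done)

  # Report every Tumor/Normal name (header skipped) absent from column 1.
  unknown=$(tail -n +2 "$sample_sheet" | tr '\t' '\n' | while IFS= read -r sample; do
    if [ -n "$sample" ] && ! cut -f1 "$input_sheet" | grep -qxF "$sample"; then
      printf 'UNKNOWN SAMPLE: %s\n' "$sample"
    fi
  done)

  if [ -n "$missing$unknown" ]; then
    [ -n "$missing" ] && printf '%s\n' "$missing"
    [ -n "$unknown" ] && printf '%s\n' "$unknown"
    return 1
  fi
  echo "Inputs look consistent."
}
```

For example, preflight logan_fastq_input.txt samplesheet.txt && echo "ready for Step 3". A sample sheet that uses spaces instead of tabs shows up here as an unknown sample, which is a useful early warning.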

Example of tumor-only calling mode

# preview the logan jobs that will run
logan run --mode local -profile ci_stub --genome hg38 --outdir logan_output --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -preview --vc --sv --cnv
# run a stub (dry run) of the logan jobs
logan run --mode local -profile ci_stub --genome hg38 --outdir logan_output --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -stub --vc --sv --cnv
# launch a logan run on slurm with the test dataset
logan run --mode slurm -profile slurm --genome hg38 --outdir logan_output --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 --vc --sv --cnv

Pipeline Tools and Overview


Contribute

This site is a living document, created for and by members like you. LOGAN is maintained by the members of CCBR and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our repository.

References

This repo was originally generated from the CCBR Nextflow Template.

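1. Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.
2. Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology 35(4): 316-319.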