3. Running the Pipeline
Setting Biowulf Interactive Session
Before running SINCLAIR, open a terminal and log in to Biowulf with your NIH credentials. Then start an interactive session and navigate to your project or working directory. A guide to using Biowulf is available at https://hpc.nih.gov/docs/userguide.html.
# login to Biowulf
ssh your_username@biowulf.nih.gov
# create interactive session with desired job specifications
sinteractive --mem=<ram> --cpus-per-task=<cores> --time=<wall_time> --gres=lscratch:<local_scratch_space>
# load ccbrpipeliner containing SINCLAIR and other analytical tools
module load ccbrpipeliner
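As an illustration of filling in the placeholders above, the sketch below assembles a `sinteractive` call from shell variables and prints it for review instead of executing it. All resource values are hypothetical and should be adjusted to the dataset and allocation at hand; on Biowulf the `lscratch` value is given in GB.

```shell
# Hypothetical resource values (assumptions, not recommendations from the pipeline authors).
ram="64g"              # memory for the session
cores=8                # CPUs per task
wall_time="08:00:00"   # wall-clock limit (HH:MM:SS)
lscratch_gb=100        # local scratch space in GB

# Assemble the command from the template above and print it (dry run only).
sinteractive_cmd="sinteractive --mem=${ram} --cpus-per-task=${cores} --time=${wall_time} --gres=lscratch:${lscratch_gb}"
echo "${sinteractive_cmd}"
```

Printing the command first makes it easy to check the requested resources before the session is actually submitted.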
Running the SINCLAIR command
As of ccbrpipeliner version 8, SINCLAIR can be run with the following commands:
# initialize the pipeline (only needs to be done once)
sinclair init --output <output_dir>
# run the pipeline
sinclair run --output <output_dir> [OPTIONS]
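A minimal sketch of the two-step sequence, assuming a hypothetical output directory named `sinclair_out`. The point it illustrates is that `init` and `run` are given the same `--output` directory. The commands are only printed here, not executed.

```shell
# Hypothetical output directory name; replace with your own project path.
output_dir="sinclair_out"

# Both steps point at the same directory: init prepares it once, run uses it afterwards.
init_cmd="sinclair init --output ${output_dir}"
run_cmd="sinclair run --output ${output_dir} --mode slurm"

echo "${init_cmd}"
echo "${run_cmd}"
```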
Options can be set on the command line, and pipeline parameters can also be set in the params.yml file.
The most commonly used options are described below.
Default values are marked with an asterisk (*).
General CLI arguments
- `--help` Prints the help statement
- `--output` The pipeline output directory (same as the nextflow `launchDir`)
- `--mode` Determines if the workflow runs on the current system or is submitted as a slurm job
  - `local`\*
  - `slurm`
- `--forceall` Forces all steps of the workflow to be run
Nextflow arguments
Note that nextflow arguments are prefixed with a single hyphen rather than a double hyphen.
- `-params-file assets/params.yml` Specify the pipeline parameters in a YAML file
- `-profile` Uses pre-defined profiles to determine particular run configurations
  - `test` Applies samples and manifests for the test dataset run
- `-preview` Preview the pipeline without executing it
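The single-hyphen versus double-hyphen distinction can be seen in a combined call. The sketch below builds a hypothetical preview of the test profile (the output directory name `sinclair_out` is an assumption) and labels each option by its prefix. The command is printed, not executed.

```shell
# Hypothetical command combining a sinclair CLI argument with two nextflow arguments.
preview_cmd="sinclair run --output sinclair_out -profile test -preview"
echo "${preview_cmd}"

# Classify each word by its prefix; the "--*" pattern must be checked before "-*".
nextflow_args=0
for arg in ${preview_cmd}; do
  case "${arg}" in
    --*) echo "double hyphen (sinclair CLI argument or pipeline parameter): ${arg}" ;;
    -*)  echo "single hyphen (nextflow argument): ${arg}"
         nextflow_args=$((nextflow_args + 1)) ;;
  esac
done
```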
Pipeline parameters
These are parameters used within the nextflow workflow. They can be passed in via the command line or set in the params.yml file. View the full list of pipeline parameters here.
Input and output parameters
- `--input` The input manifest `.csv` file
  - `./assets/input_manifest_cellranger.csv`\*
  - `./assets/input_manifest.csv`
  - `other/user-defined/manifest.csv`
- `--contrast` The contrast manifest `.csv` file
  - `./assets/contrasts.csv`\*
- `--outdir` The nextflow results directory inside the pipeline output directory. Can be manually set
  - `./output`\*
- `--species` Species and reference genome used for alignment and (optionally) cell type annotation
  - `hg19`\*
  - `hg38`
  - `mm10`
- `--run_cellranger` Whether to run CellRanger for alignment. Also indicates which input manifest file to parse
  - `true`
  - `false`
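Because `--run_cellranger` also determines which manifest is parsed, the two options are usually chosen together. The sketch below pairs them under the assumption, inferred only from the file names, that `input_manifest_cellranger.csv` goes with `--run_cellranger true` and `input_manifest.csv` with `false`. The output directory name is hypothetical and the command is only printed.

```shell
# Set to "false" to start from pre-aligned .h5 files instead of running CellRanger.
run_cellranger="true"

# Assumed pairing of manifest and mode, based on the manifest file names.
if [ "${run_cellranger}" = "true" ]; then
  input_manifest="./assets/input_manifest_cellranger.csv"
else
  input_manifest="./assets/input_manifest.csv"
fi

# Hypothetical output directory; species chosen from the documented options.
manifest_cmd="sinclair run --output sinclair_out --run_cellranger ${run_cellranger} --input ${input_manifest} --species mm10"
echo "${manifest_cmd}"
```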
Seurat parameters
The following parameters control the downstream Seurat analysis.
- `vars_to_regress` Variables that should be regressed out during analysis to eliminate potential noise and signals from technical differences across samples
  - `percent.mt` Percentage of reads mapped to mitochondrial genes. High values may indicate dead or stressed cells whose mitochondrial transcripts become overrepresented due to cytoplasmic degradation
  - `nFeature_RNA` Number of detected features
  - `S.Score` S-phase cell cycle score
  - `G2M.Score` G2/M-phase cell cycle score
  - `nCount_RNA` Total RNA molecule count per cell
- `qc_filtering` Filtering method
  - `miqc`\* Uses the MiQC parameters
  - `manual` Uses the
- `nCount_RNA_max` Maximum number of reads allowed per cell. Cells exceeding the threshold are removed
  - 50000\*
- `nCount_RNA_min` Minimum number of reads allowed per cell. Cells below the threshold are removed
  - 1000\*
- `nFeatures_RNA_max` Maximum number of features (e.g. genes) allowed per cell
  - 5000\*
- `nFeature_RNA_min` Minimum number of features (e.g. genes) allowed per cell
  - 200\*
- `percent_mt_max` Maximum mitochondrial percentage allowed per cell
  - 10\*
- `percent_mt_min` Minimum mitochondrial percentage allowed per cell
  - 0\*
- `run_doublet_finder` Boolean for running the DoubletFinder tool (default T)
- `seurat_resolution` Comma-separated string of resolutions to use when finding unsupervised clusters
  - "0.1,0.2,0.3,0.5,0.6,0.8,1"\*
- `npcs` Number of principal components calculated and used downstream in neighbor identification, dimensionality reduction (e.g. UMAP/t-SNE), and unsupervised clustering
  - 50\*
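Rather than passing many Seurat parameters on the command line, they can be collected in a parameters file. The sketch below writes a small YAML fragment to a temporary location and prints a matching run command. The flat key layout follows the usual nextflow convention that a `--name` parameter corresponds to a top-level `name:` key; the actual layout of the pipeline's params.yml is an assumption here, as are the chosen values and the output directory name.

```shell
# Write a hypothetical params file to a temporary directory.
params_dir="$(mktemp -d)"
params_file="${params_dir}/params.yml"

cat > "${params_file}" <<'EOF'
# Hypothetical fragment; key layout assumed to be flat, values are illustrative.
qc_filtering: "manual"
nCount_RNA_max: 50000
nCount_RNA_min: 1000
percent_mt_max: 10
seurat_resolution: "0.2,0.4,0.6"
npcs: 50
EOF

# Print the file and the command that would use it (nothing is executed).
cat "${params_file}"
echo "sinclair run --output sinclair_out -params-file ${params_file}"
```

Keeping thresholds in a file makes a run easier to reproduce than a long command line, since the file can be stored alongside the results.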
Examples
This run will submit jobs through the slurm workload manager, perform CellRanger alignment to the mm10 mouse genome, and cluster the cells at the specified resolutions:
sinclair run --mode slurm --run_cellranger true --species mm10 --seurat_resolution 0.2,0.4,0.6,0.8,1
This run will operate locally, start from pre-aligned .h5 files generated by CellRanger, use the human hg38 genome for downstream cell type annotation, and force all steps to run from the beginning:
sinclair run --mode local --run_cellranger false --species hg38 --forceall
Specify pipeline parameters in the params.yml file and show a preview of the pipeline run (without actually running it):
sinclair run -params-file assets/params.yml -preview