🚀 ASPEN Outputs

📂 Workdir

The working directory, supplied via `-w` when running ASPEN in `init`, `dryrun`, and `run` modes, contains the following files and folders:

WORKDIR
├── cluster.json
├── config.yaml
├── contrasts.tsv
├── dryrun_git_commit.txt
├── dryrun.log
├── fastqs
├── logs
├── results
├── run_git_commit.txt
├── runinfo.yaml
├── runslurm_snakemake_report.html
├── sampleinfo.txt
├── samples.tsv
├── scripts
├── slurm-XXXXXXX.out
├── snakemake.log
├── snakemake.log.jobby
├── snakemake.log.jobby.short
├── snakemake.stats
├── submit_script.sbatch
└── tools.yaml

Here are more details about these files:

File	File Type	Mode (`-m`) When This File is Created/Overwritten	Description
`cluster.json`	JSON	init	Defines cluster resources per Snakemake rule; this file can be edited to override the default compute resource allocations for each rule
`config.yaml`	YAML	init; can be edited later	Configurable parameters for this specific run
`contrasts.tsv`	TSV	Added by the user after init	List of contrasts to run, one per line; has no header
`dryrun_git_commit.txt`	TXT	dryrun	The git commit hash of the version of ASPEN used at dryrun
`dryrun.log`	TXT	dryrun	Log from `-m=dryrun`
`fastqs`	FOLDER	dryrun	Folder containing symlinks to raw data
`logs`	FOLDER	dryrun	Folder containing all logs including Slurm `.out` and `.err` files. Also contains older timestamped `runinfo.yaml` and `snakemake.stats` files.
`results`	FOLDER	Created at dryrun but populated during run	Main outputs folder
`runinfo.yaml`	YAML	After completion of run	Metadata about the run executor, etc.
`runslurm_snakemake_report.html`	HTML	After completion of run	HTML report including DAG and resource utilization
`sampleinfo.txt`	TXT	dryrun, run	Tab-delimited mappings between `replicateNames` and `sampleNames`
`samples.tsv`	TSV	init; can be edited later	Tab-delimited manifest with `replicateName`, `sampleName`, `path_to_R1_fastq`, `path_to_R2_fastq`. This file has a header.
`scripts`	FOLDER	init	Folder keeps local copy of scripts called by various rules
`run_git_commit.txt`	TXT	run	The git commit hash of the version of ASPEN used at run
`slurm-XXXXXXX.out`	TXT	run	Slurm `.out` file for the master job
`snakemake.log`	TXT	run	Snakemake `.log` file for the master job; older copies timestamped and moved into `logs` folder
`snakemake.stats`	JSON	run	Per-rule runtime statistics
`submit_script.sbatch`	TXT	run	Slurm script to kickstart the main Snakemake job
`tools.yaml`	YAML	run	YAML containing the version of tools used in the pipeline (obsolete; was used to load specific module versions prior to moving over to Docker/Singularity containers)
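
As an illustration of the `samples.tsv` layout described in the table above, the following minimal sketch groups replicates by sample. It assumes the header uses exactly the column names listed (`replicateName`, `sampleName`, `path_to_R1_fastq`, `path_to_R2_fastq`); the function name `replicates_by_sample` and the FASTQ paths are hypothetical and not part of ASPEN.

```python
import csv
import io
from collections import defaultdict


def replicates_by_sample(handle):
    """Group replicate names by sample name from a samples.tsv-style manifest.

    Assumes a tab-delimited file whose header contains the columns
    replicateName and sampleName (as described for samples.tsv).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["sampleName"]].append(row["replicateName"])
    return dict(groups)


# Hypothetical manifest content, for illustration only.
example = (
    "replicateName\tsampleName\tpath_to_R1_fastq\tpath_to_R2_fastq\n"
    "sample1_replicate1\tsample1\t/data/s1r1_R1.fastq.gz\t/data/s1r1_R2.fastq.gz\n"
    "sample1_replicate2\tsample1\t/data/s1r2_R1.fastq.gz\t/data/s1r2_R2.fastq.gz\n"
    "sample2_replicate1\tsample2\t/data/s2r1_R1.fastq.gz\t/data/s2r1_R2.fastq.gz\n"
)

groups = replicates_by_sample(io.StringIO(example))
print(groups)
```

To run this against a real manifest, open `WORKDIR/samples.tsv` with `open(path, newline="")` and pass the handle to the function instead of the in-memory example.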

📊 `results` folder

The results directory contains the actual output files. Below are the folders that you may find within it.

WORKDIR
├── results
    ├── alignment
    │   ├── dedupBam
    │   ├── filteredBam
    │   ├── qsortedBam
    │   └── tagAlign
    ├── peaks
    │   ├── genrich
    │   │   ├── DiffATAC
    │   │   │   ├── reads
    │   │   │   └── tn5sites
    │   │   └── fixed_width
    │   └── macs2
    │       ├── DiffATAC
    │       │   ├── reads
    │       │   └── tn5sites
    │       └── fixed_width
    ├── QC
    │   ├── fastqc
    │   ├── fld
    │   ├── FQscreen
    │   ├── frip
    │   ├── multiqc_data
    │   ├── peak_annotation
    │   ├── preseq
    │   └── tss
    ├── spikein
    │   ├── <sample_1>
    │   ├── <sample_2>
    │   ├── <sample_3>
    │   │ ...
    │   └── <sample_n>
    ├── tmp
    │   ├── BL
    │   ├── genrichReads
    │   └── trim
    └── visualization
        ├── reads_bam
        ├── reads_bed
        ├── reads_bigwig
        ├── tn5sites_bam
        └── tn5sites_bigwig

Content details:

Folder	SubFolder	Description
alignment	qsortedBam	- Query sorted Bowtie2 alignments in BAM format. - Excludes unmapped and platform/vendor quality failing reads. - Used for Genrich peak calling.
alignment	filteredBam	- Filtered BAM files after excluding non-primary, supplementary, and MAPQ <=5 alignments. - Used for counting reads/Tn5 nicks. - Derived from `qsortedBam`.
alignment	dedupBam	- Deduplicated filtered BAM files. - PCR or optical duplicates marked with PicardTools and excluded. - Can be used downstream with CCBR_TOBIAS pipeline. - Derived from `filteredBam`.
alignment	tagAlign	- `tagAlign.gz` files used for MACS2 peak calling. - Derived from `dedupBam`.
peaks	genrich & macs2	- Genrich/MACS2 peak calls (raw, consensus, fixed-width). - Contains ROI files with Diff-ATAC results if `contrasts.tsv` is provided. - Diff-ATAC results are calculated with DESeq2 using both read counts and Tn5 nicking sites in ROIs.
QC	various	- Flagstats. - Dupmetrics. - Read counts. - FLD stats. - Fqscreen. - FRiP. - ChIPseeker results. - TSS enrichments. - Preseq. - HOMER/AME motif enrichments. - MultiQC.
QC	peak_annotation	- Detailed peak annotations, described under "Peak Annotation folder" below.
spikein	1 folder per sample	- Per sample spike-in counts. - Overall scaling factors table.
tmp	various	- Can be deleted. - Blacklist index. - Intermediate FASTQs. - Genrich output reads.
visualization	reads_bam	- Tn5 nick adjusted reads in BAM format. - Derived from `filteredBam`.
visualization	reads_bed	- Tn5 nick adjusted reads in BED format. - Derived from `reads_bam`. - Can be used by ChromVar.
visualization	reads_bigwig	- Tn5 nick adjusted reads in BIGWIG format. - Scaled using spike-in scaling factors if present. - Derived from `reads_bam`.
visualization	tn5sites_bam	- Tn5 nicking sites in BAM format. - Derived from `filteredBam`.
visualization	tn5sites_bigwig	- Tn5 nicking sites in BIGWIG format. - Scaled using spike-in scaling factors if present. - Derived from `tn5sites_bam`.

Note

BAM files from dedupBam can be used for downstream footprinting analysis with the CCBR_TOBIAS pipeline.

Note

bamCompare from deeptools can be run to compare BAMs from dedupBam for comprehensive BAM comparisons.

Note

BAM files from dedupBam can also be converted to BED format and processed with chromVAR to identify variability in motif accessibility across samples and assess differentially active transcription factors from the JASPAR database.

Peak Annotation folder

This folder will contain ChIPseeker results for:

individual replicate *.narrowPeak files
*.consensus.bed files
*.fixed_width.consensus.narrowPeak files

The QC folder contains the multiqc_report.html file which provides a comprehensive summary of the quality control metrics across all samples, including read quality, duplication rates, and other relevant statistics. This report aggregates results from various QC tools such as FastQC, FastqScreen, FLD, TSS enrichment, Peak Annotations, and others, presenting them in an easy-to-read format with interactive plots and tables. It helps in quickly identifying any issues with the sequencing data and ensures that the data quality is sufficient for downstream analysis.

File	Description
`*.narrowPeak.annotated.gz`	peak calls annotated using ChIPseeker, gzipped
`*.narrowPeak.annotated.distribution`	Annotation bins: - 3'UTR: No. of peaks in the 3' untranslated region. - 5'UTR: No. of peaks in the 5' untranslated region. - Distal Intergenic: No. of peaks in distal intergenic regions. - Downstream (<1kb): No. of peaks annotated downstream within 1kb. - Downstream (1-2kb): No. of peaks annotated downstream between 1-2kb. - Downstream (2-3kb): No. of peaks annotated downstream between 2-3kb. - Promoter (<=1kb): No. of peaks in promoters within 1kb. - Promoter (1-2kb): No. of peaks in promoters between 1-2kb. - Exon: No. of peaks in exonic regions.
`*.narrowPeak.annotated_summary`	Additional statistics for each of the above bins, such as: - medianWidth - medianpValue - medianqValue
`*.narrowPeak.genelist`	ensemblID and gene symbols of genes with peaks in their promoter regions (including 5' UTR)
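
To make a statistic such as `medianWidth` concrete, here is a minimal sketch that computes the median peak width directly from narrowPeak-formatted text. It relies only on the standard narrowPeak layout (tab-delimited, with chromosome, start, and end in the first three columns). Whether the ASPEN summary files derive their values in exactly this way is an assumption, and the function name `median_peak_width` and the example peaks are hypothetical.

```python
import io
import statistics


def median_peak_width(handle):
    """Return the median of (end - start) over all peaks in a narrowPeak file."""
    widths = []
    for line in handle:
        # Skip blank lines and optional track/comment lines.
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        widths.append(end - start)
    return statistics.median(widths)


# Hypothetical peaks, for illustration only (widths 200, 500, and 300 bp).
example = (
    "chr1\t100\t300\tpeak_1\t50\t.\t5.0\t10.0\t8.0\t100\n"
    "chr1\t1000\t1500\tpeak_2\t80\t.\t7.5\t15.0\t12.0\t250\n"
    "chr2\t400\t700\tpeak_3\t60\t.\t6.0\t11.0\t9.0\t150\n"
)

width = median_peak_width(io.StringIO(example))
print(width)
```

This gives an overall median across all peaks; a per-bin median like the one reported in the summary file would additionally require the bin assignment from the annotated peak file.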

MACS2 output folder

For a typical two-sample analysis with two replicates each, this folder should look like this:

WORKDIR
├── results
    ├── peaks
        └── macs2
            ├── sample1
            │   ├── sample1_replicate1.macs2.narrowPeak
            │   ├── sample1_replicate1.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample1_replicate1.macs2.unfiltered.narrowPeak
            │   ├── sample1_replicate2.macs2.narrowPeak
            │   ├── sample1_replicate2.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample1_replicate2.macs2.unfiltered.narrowPeak
            │   ├── sample1.macs2.consensus.bed
            │   ├── sample1.macs2.consensus.bed_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample1.macs2.pooled.narrowPeak
            │   ├── sample1.macs2.pooled_summits.bed
            │   └── sample1.macs2.pooled.unfiltered.narrowPeak
            ├── sample1.consensus.macs2.peakfiles
            ├── sample1.replicate.macs2.peakfiles
            ├── DiffATAC
            │   ├── reads
            │   │   ├── all_diff_atacs.html
            │   │   ├── all_diff_atacs.tsv
            │   │   ├── degs.done
            │   │   ├── sample2_vs_sample1.html
            │   │   └── sample2_vs_sample1.tsv
            │   └── tn5sites
            │       ├── all_diff_atacs.html
            │       ├── all_diff_atacs.tsv
            │       ├── degs.done
            │       ├── sample2_vs_sample1.html
            │       └── sample2_vs_sample1.tsv
            ├── sample2
            │   ├── sample2_replicate1.macs2.narrowPeak
            │   ├── sample2_replicate1.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample2_replicate1.macs2.unfiltered.narrowPeak
            │   ├── sample2_replicate2.macs2.narrowPeak
            │   ├── sample2_replicate2.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample2_replicate2.macs2.unfiltered.narrowPeak
            │   ├── sample2.macs2.consensus.bed
            │   ├── sample2.macs2.consensus.bed_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample2.macs2.pooled.narrowPeak
            │   ├── sample2.macs2.pooled_summits.bed
            │   └── sample2.macs2.pooled.unfiltered.narrowPeak
            ├── sample2.consensus.macs2.peakfiles
            ├── sample2.replicate.macs2.peakfiles
            └── fixed_width
                ├── sample1_replicate1.macs2.fixed_width.narrowPeak
                ├── sample1_replicate2.macs2.fixed_width.narrowPeak
                ├── sample1.fixed_width.consensus.narrowPeak
                ├── sample1.renormalized.fixed_width.consensus.narrowPeak
                ├── sample1.renormalized.fixed_width.consensus.narrowPeak.annotated.gz
                ├── counts
                │   ├── ROI.macs2.reads_counts.tsv
                │   └── ROI.macs2.tn5sites_counts.tsv
                ├── sample2_replicate1.macs2.fixed_width.narrowPeak
                ├── sample2_replicate2.macs2.fixed_width.narrowPeak
                ├── sample2.fixed_width.consensus.narrowPeak
                ├── sample2.renormalized.fixed_width.consensus.narrowPeak
                ├── sample2.renormalized.fixed_width.consensus.narrowPeak.annotated.gz
                ├── ROI.macs2.bed
                ├── ROI.macs2.bed.annotated.gz
                ├── ROI.macs2.bed.annotated.gz.gz
                ├── ROI.macs2.bed.annotation_distribution
                ├── ROI.macs2.bed.annotation_summary
                ├── ROI.macs2.bed.genelist
                ├── ROI.macs2.gtf
                ├── ROI.macs2.narrowPeak
                ├── ROI.macs2.renormalized.narrowPeak
                └── Rplots.pdf

Some of the key output files are:

File	Description
`*.macs2.narrowPeak`	peak calls from MACS2, filtered by q-value, for each replicate of each sample
`*.macs2.unfiltered.narrowPeak`	peak calls from MACS2 (unfiltered) for each replicate of each sample
`*.narrowPeak_motif_enrichment/ame_results.txt`	motif enrichment results from AME tool from MEME suite using HOCOMOCO v11 database
`*.narrowPeak_motif_enrichment/knownResults.txt`	motif enrichment results using HOMER with HOCOMOCO v11 database
`*.macs2.consensus.bed`	consensus peak calls across the replicates of each sample. Note: consensus bed annotations are located in `QC/peak_annotation`
`DiffATAC/reads`	folder containing differential open chromatin results: - computed using read counts in MACS2 regions of interest (ROIs) - `all_diff_atacs.html` HTML report aggregated across all contrasts from `contrasts.tsv` - `all_diff_atacs.tsv` DESeq2 results in TSV format aggregated across all contrasts from `contrasts.tsv` - one HTML and one TSV file per contrast in `contrasts.tsv`
`DiffATAC/tn5sites`	folder containing differential open chromatin results: - computed using Tn5 nicking site counts in MACS2 regions of interest (ROIs) - `all_diff_atacs.html` HTML report aggregated across all contrasts from `contrasts.tsv` - `all_diff_atacs.tsv` DESeq2 results in TSV format aggregated across all contrasts from `contrasts.tsv` - one HTML and one TSV file per contrast in `contrasts.tsv`
`fixed_width`	`fixed_width` can be set in `config.yaml` to create peaks of a user-defined fixed width (default 500 bp). This folder contains: - individual replicate `.fixed_width.narrowPeak` files - `.renormalized.fixed_width.consensus.narrowPeak` per sample; the Corces et al. method is used for consensus calling; these files are used to generate MACS2 regions of interest (ROI) peaks, which in turn are used to generate a read or Tn5 site counts matrix for DESeq2 - ROI-related files: `ROI.macs2.bed`, `ROI.macs2.bed.annotated.gz`, `ROI.macs2.bed.annotation_summary`, `ROI.macs2.bed.annotation_distribution`
`fixed_width/counts/ROI.macs2.reads_counts.tsv`	read counts in MACS2 ROIs using featureCounts
`fixed_width/counts/ROI.reads_scaled_counts.tsv`	`ROI.macs2.reads_counts.tsv` scaled using spike-in scaling factors
`fixed_width/counts/ROI.macs2.tn5sites_counts.tsv`	Tn5 nicking site counts in MACS2 ROIs using featureCounts
`fixed_width/counts/ROI.tn5sites_scaled_counts.tsv`	`ROI.macs2.tn5sites_counts.tsv` scaled using spike-in scaling factors
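
Since each contrast produces a `<groupA>_vs_<groupB>.tsv` file in the `DiffATAC` folders, the contrasts that were actually run can be recovered from the file names. The sketch below assumes that naming pattern (as seen in `sample2_vs_sample1.tsv`) and that sample names do not themselves contain `_vs_`; the helper name `list_contrasts` is hypothetical, and the demo builds a throwaway directory rather than reading real results.

```python
import tempfile
from pathlib import Path


def list_contrasts(diffatac_dir):
    """Return (first_group, second_group) pairs parsed from *_vs_*.tsv file names."""
    contrasts = []
    for tsv in sorted(Path(diffatac_dir).glob("*_vs_*.tsv")):
        # Split the file stem on the first "_vs_" separator.
        first, _, second = tsv.stem.partition("_vs_")
        contrasts.append((first, second))
    return contrasts


# Demo on a temporary directory mimicking DiffATAC/reads (empty placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "all_diff_atacs.html",
        "all_diff_atacs.tsv",
        "degs.done",
        "sample2_vs_sample1.html",
        "sample2_vs_sample1.tsv",
    ):
        (Path(tmp) / name).touch()
    found = list_contrasts(tmp)

print(found)
```

The aggregated `all_diff_atacs.tsv` is deliberately excluded by the glob pattern, so only per-contrast files are reported.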

Genrich output folder

For a typical two-sample analysis with two replicates each, this folder should look very similar to the MACS2 output structure described above.

`logs` folder

This directory contains all .err and .out log files generated by SLURM for jobs submitted via Snakemake. Each file follows a consistent naming convention:

<SLURM_JOB_ID of master/head job>.<SLURM_JOB_ID of child job>.<Snakemake Rule Name>.<wildcard1_name=wildcard1_value,wildcard2_name=wildcard2_value>.<out or err>

This structure is particularly useful for troubleshooting and debugging, especially when the SLURM job IDs of failed jobs are known. By examining the corresponding .err or .out files, users can efficiently identify the source of errors within specific Snakemake rules and wildcards.
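
The naming convention above can be split into its parts programmatically, for example to find every log belonging to a given rule or child job. The following is a minimal sketch, assuming job IDs are purely numeric, rule names contain no dots, and wildcard values contain no commas. The function name `parse_log_name` and the rule and wildcard names in the example are hypothetical.

```python
import re

# <master job id>.<child job id>.<rule>.<wildcards>.<out|err>
LOG_NAME = re.compile(
    r"^(?P<master_id>\d+)\.(?P<child_id>\d+)\.(?P<rule>[^.]+)\."
    r"(?P<wildcards>.*)\.(?P<stream>out|err)$"
)


def parse_log_name(filename):
    """Split a log file name into its components, or return None if it does not match."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    wildcards = {}
    # Wildcards are comma-separated name=value pairs; the field may be empty.
    for pair in filter(None, info["wildcards"].split(",")):
        key, _, value = pair.partition("=")
        wildcards[key] = value
    info["wildcards"] = wildcards
    return info


# Hypothetical file name, for illustration only.
parsed = parse_log_name("1234567.1234570.macs2.replicate=sample1_replicate1.err")
print(parsed)
```

Applying `parse_log_name` to every entry in the `logs` folder and filtering on `child_id` or `rule` narrows a failed job down to its `.err` and `.out` files.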

DISCLAIMER: This folder hierarchy differs significantly from that of v1.0.6 and is subject to change in subsequent versions.
