🚀 ASPEN Outputs

📂 Workdir

The workdir which is supplied as -w while running aspen init, dryrun and run commands will contain the following files:

WORKDIR
├── cluster.json
├── config.yaml
├── contrasts.tsv
├── dryrun_git_commit.txt
├── dryrun.log
├── fastqs
├── logs
├── results
├── run_git_commit.txt
├── runinfo.yaml
├── runslurm_snakemake_report.html
├── sampleinfo.txt
├── samples.tsv
├── scripts
├── slurm-XXXXXXX.out
├── snakemake.log
├── snakemake.log.jobby
├── snakemake.log.jobby.short
├── snakemake.stats
├── submit_script.sbatch
└── tools.yaml

Here are more details about these files:

File File Type Mode (-m) When This File is Created/Overwritten Description
cluster.json JSON init Defines cluster resources per snakemake rule; this file can be edited to override default computate resource allocations per snakemake rule
config.yaml YAML init; can be edited later Configurable parameters for this specific run
contrasts.tsv TSV Needs to be added in after init List of contrasts to run, one per line; has no header
dryrun_git_commit.txt TXT dryrun The git commit hash of the version of ASPEN used at dryrun
dryrun.log TXT dryrun Log from -m=dryrun
fastqs FOLDER dryrun Folder containing symlinks to raw data
logs FOLDER dryrun Folder containing all logs including Slurm .out and .err files. Also contains older timestamped runinfo.yaml and snakemake.stats files.
results FOLDER Created at dryrun but populated during run Main outputs folder
runinfo.yaml YAML After completion of run Metadata about the run executor, etc.
runslurm_snakemake_report.html HTML After completion of run HTML report including DAG and resource utilization
sampleinfo.txt TXT dryrun, run Tab-delimited mappings between replicateNames and sampleNames
samples.tsv TSV init; can be edited later Tab-delimited manifest with replicateName, sampleName, path_to_R1_fastq, path_to_R2_fastq. This file has a header.
scripts FOLDER init Folder keeps local copy of scripts called by various rules
run_git_commit.txt TXT run The git commit hash of the version of ASPEN used at run
slurm-XXXXXXX.out TXT run Slurm .out file for the master job
snakemake.log TXT run Snakemake .log file for the master job; older copies timestamped and moved into logs folder
snakemake.stats JSON run per rule runtime stats
submit_script.sbatch TXT run Slurm script to kickstart the main Snakemake job
tools.yaml YAML run YAML containing the version of tools used in the pipeline (obsolete; was used to load specific module versions prior to moving over to Docker/Singularity containers)

📊 results folder

The results directory contains the actual output files. Below are the folders that you may find within it.

WORKDIR
├── results
    ├── alignment
    │   ├── dedupBam
    │   ├── filteredBam
    │   ├── qsortedBam
    │   └── tagAlign
    ├── peaks
    │   ├── genrich
    │   │   ├── DiffATAC
    │   │   │   ├── reads
    │   │   │   └── tn5sites
    │   │   └── fixed_width
    │   └── macs2
    │       ├── DiffATAC
    │       │   ├── reads
    │       │   └── tn5sites
    │       └── fixed_width
    ├── QC
    │   ├── fastqc
    │   ├── fld
    │   ├── FQscreen
    │   ├── frip
    │   ├── multiqc_data
    │   ├── peak_annotation
    │   ├── preseq
    │   └── tss
    ├── spikein
    │   ├── <sample_1>
    │   ├── <sample_2>
    │   ├── <sample_3>
    │   │ ...
    │   └── <sample_n>
    ├── tmp
    │   ├── BL
    │   ├── genrichReads
    │   └── trim
    └── visualization
        ├── reads_bam
        ├── reads_bed
        ├── reads_bigwig
        ├── tn5sites_bam
        └── tn5sites_bigwig

Content details:

Folder SubFolder Description
alignment qsortedBam - Query sorted Bowtie2 alignments in BAM format.
- Excludes unmapped and platform/vendor quality failing reads.
- Used for Genrich peak calling.
alignment filteredBam - Filtered BAM files after excluding non-primary, supplementary, and MAPQ <=5 alignments.
- Used for counting reads/tn5 nicks.
- Derived from qsortedBam.
alignment dedupBam - Deduplicated filtered BAM files.
- PCR or optical duplicates marked with PicardTools and excluded.
- Can be used downstream with CCBR_TOBIAS pipeline.
- Derived from filteredBam.
alignment tagAlign - tagAlign.gz files used for MACS2 peak calling.
- Derived from dedupBam.
peaks genrich & macs - Genrich/MACS2 peak calls (raw, consensus, fixed-width).
- Contains ROI files with Diff-ATAC results if contrasts.tsv is provided.
- Calculated with DESeq2 using both read counts and tn5 nicking sites in ROI.
QC various - Flagstats.
- Dupmetrics.
- Read counts.
- Motif enrichments.
- FLD stats.
- Fqscreen.
- FRiP.
- ChIPSeeker results.
- TSS enrichments.
- Preseq.
- Homer/AME motif enrichments.
- MultiQC.
QC peak_annotation detailed peak annotations described below
spikein 1 folder per sample - Per sample spike-in counts.
- Overall scaling factors table.
tmp various - Can be deleted.
- Blacklist index.
- Intermediate FASTQs.
- Genrich output reads.
visualization reads_bam - Tn5 nick adjusted reads in BAM format.
- Derived from filteredBam.
visualization reads_bed - Tn5 nick adjusted reads in BED format.
- Derived from reads_bam.
- Can be used by ChromVar.
visualization reads_bigwig - Tn5 nick adjusted reads in BIGWIG format.
- Scaled using spike-in scaling factors if present.
- Derived from reads_bam.
visualization tn5sites_bam - Tn5 nicking sites in BAM format.
- Derived from filteredBam.
visualization tn5sites_bigwig - Tn5 nicking sites in BIGWIG format.
- Scaled using spike-in scaling factors if present.
- Derived from tn5sites_bam.

Note

BAM files from dedupBam can be used for downstream footprinting analysis using CCBR_TOBIAS pipeline

Note

bamCompare from deeptools can be run to compare BAMs from dedupBam for comprehensive BAM comparisons.

Note

BAM files from dedupBam can also be converted to BED format and processed with chromVAR to identify variability in motif accessibility across samples and assess differentially active transcription factors from the JASPAR database.

Peak Annotation folder

This folder will contain ChIPseeker results for:

  • individual replicate *.narrowPeak files
  • *.consensus.bed files
  • *.fixed_width.consensus.narrowPeak files

The QC folder contains the multiqc_report.html file which provides a comprehensive summary of the quality control metrics across all samples, including read quality, duplication rates, and other relevant statistics. This report aggregates results from various QC tools such as FastQC, FastqScreen, FLD, TSS enrichment, Peak Annotations, and others, presenting them in an easy-to-read format with interactive plots and tables. It helps in quickly identifying any issues with the sequencing data and ensures that the data quality is sufficient for downstream analysis.

File Description
*.narrowPeak.annotated.gz peak calls annotated using ChIPseeker, gzipped
*.narrowPeak.annotated.distribution annotation bins :
- 3'UTR: No. of peaks in the 3' untranslated region.
- 5'UTR: No. of peaks in the 5' untranslated region.
- Distal Intergenic: No. of peaks in distal intergenic regions.
- Downstream (<1kb): No. of peaks annotated downstream within 1kb.
- Downstream (1-2kb): No. of peaks annotated downstream between 1-2kb.
- Downstream (2-3kb): No. of peaks annotated downstream between 2-3kb.
- Promoter (<=1kb): No. of peaks in promoters within 1kb.
- Promoter (1-2kb): No. of peaks in promoters between 1-2kb.
- Exon: No. of peaks in exonic regions.
*.narrowPeak.annotated_summary More stats on each of the above bins .. like:
- medianWidth
- medianpValue
- medianqValue
*.narrowPeak.genelist ensemblID and gene symbols of genes with peaks in their promoter regions (including 5' UTR)

MACS2 output folder

For a typical 2 sample analysis with 2 replicates each this folder should look like this:

WORKDIR
├── results
    ├── peaks
        └── macs2
            ├── sample1
            │   ├── sample1_replicate1.macs2.narrowPeak
            │   ├── sample1_replicate1.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample1_replicate1.macs2.unfiltered.narrowPeak
            │   ├── sample1_replicate2.macs2.narrowPeak
            │   ├── sample1_replicate2.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample1_replicate2.macs2.unfiltered.narrowPeak
            │   ├── sample1.macs2.consensus.bed
            │   ├── sample1.macs2.consensus.bed_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample1.macs2.pooled.narrowPeak
            │   ├── sample1.macs2.pooled_summits.bed
            │   └── sample1.macs2.pooled.unfiltered.narrowPeak
            ├── sample1.consensus.macs2.peakfiles
            ├── sample1.replicate.macs2.peakfiles
            ├── DiffATAC
            │   ├── reads
            │   │   ├── all_diff_atacs.html
            │   │   ├── all_diff_atacs.tsv
            │   │   ├── degs.done
            │   │   ├── sample2_vs_sample1.html
            │   │   └── sample2_vs_sample1.tsv
            │   └── tn5sites
            │       ├── all_diff_atacs.html
            │       ├── all_diff_atacs.tsv
            │       ├── degs.done
            │       ├── sample2_vs_sample1.html
            │       └── sample2_vs_sample1.tsv
            ├── sample2
            │   ├── sample2_replicate1.macs2.narrowPeak
            │   ├── sample2_replicate1.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample2_replicate1.macs2.unfiltered.narrowPeak
            │   ├── sample2_replicate2.macs2.narrowPeak
            │   ├── sample2_replicate2.macs2.narrowPeak_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample2_replicate2.macs2.unfiltered.narrowPeak
            │   ├── sample2.macs2.consensus.bed
            │   ├── sample2.macs2.consensus.bed_motif_enrichment
            │   │   ├── ame_results.txt
            │   │   ├── background.fa
            │   │   ├── knownResults
            │   │   ├── knownResults.html
            │   │   ├── knownResults.txt
            │   │   ├── motifFindingParameters.txt
            │   │   ├── seq.autonorm.tsv
            │   │   └── target.fa
            │   ├── sample2.macs2.pooled.narrowPeak
            │   ├── sample2.macs2.pooled_summits.bed
            │   └── sample2.macs2.pooled.unfiltered.narrowPeak
            ├── sample2.consensus.macs2.peakfiles
            ├── sample2.replicate.macs2.peakfiles
            └── fixed_width
                ├── sample1_replicate1.macs2.fixed_width.narrowPeak
                ├── sample1_replicate2.macs2.fixed_width.narrowPeak
                ├── sample1.fixed_width.consensus.narrowPeak
                ├── sample1.renormalized.fixed_width.consensus.narrowPeak
                ├── sample1.renormalized.fixed_width.consensus.narrowPeak.annotated.gz
                ├── counts
                │   ├── ROI.macs2.reads_counts.tsv
                │   └── ROI.macs2.tn5sites_counts.tsv
                ├── sample2_replicate1.macs2.fixed_width.narrowPeak
                ├── sample2_replicate2.macs2.fixed_width.narrowPeak
                ├── sample2.fixed_width.consensus.narrowPeak
                ├── sample2.renormalized.fixed_width.consensus.narrowPeak
                ├── sample2.renormalized.fixed_width.consensus.narrowPeak.annotated.gz
                ├── ROI.macs2.bed
                ├── ROI.macs2.bed.annotated.gz
                ├── ROI.macs2.bed.annotated.gz.gz
                ├── ROI.macs2.bed.annotation_distribution
                ├── ROI.macs2.bed.annotation_summary
                ├── ROI.macs2.bed.genelist
                ├── ROI.macs2.gtf
                ├── ROI.macs2.narrowPeak
                ├── ROI.macs2.renormalized.narrowPeak
                └── Rplots.pdf

Some of the key output files are:

File Description
*.macs2.narrowPeak peak calls from MACS2 filtered by q-value for each samples each replicate
*.macs2.unfiltered.narrowPeak peak calls from MACS2 (unfiltered) for each samples each replicate
*.narrowPeak_motif_enrichment/ame_results.txt motif enrichment results from AME tool from MEME suite using HOCOMOCO v11 database
*.narrowPeak_motif_enrichment/knownResults.txt motif enrichment results using HOMER with HOCOMOCO v11 database
*.macs2.consensus.bed consensus peak call between multiple replicates of each sample. Note: consensus bed annotations are located in QC/peak_annotations
DiffATAC/reads folder containing differential open chromatin results:
- computated using read counts in MACS2 regions of interest (ROIs)
- all_diff_atacs.html HTML report aggregated across all contrasts from contrasts.tsv
- all_diff_atacs.tsv DESeq2 results in TSV format aggregated across all contrasts from contrasts.tsv
- HTML and TSV file each per contrast in contrasts.tsv
DiffATAC/tn5sites folder containing differential open chromatin results:
- computated using Tn5 nicking site counts in MACS2 regions of interest (ROIs)
- all_diff_atacs.html HTML report aggregated across all contrasts from contrasts.tsv
- all_diff_atacs.tsv DESeq2 results in TSV format aggregated across all contrasts from contrasts.tsv
- HTML and TSV file each per contrast in contrasts.tsv
fixed_width fixed_width can be set in config.yaml to create peaks of a user defined fixed width (default 500bp). This folder contains:
- individual replicate *.fixed_width.narrowPeak files
- *.renormalized.fixed_width.consensus.narrowPeak per sample; Corces et. al. method is used for consensus calling; used to generate MACS2 regions of interest (ROI) peaks which are used to generate a reads or Tn5 sites counts matrix for DESeq2
- ROI related files: ROI.macs2.bed, ROI.macs2.bed.annotated.gz, ROI.macs2.annotation_summary, ROI.macs2.annotation_distribution
fixed_width/counts/ROI.macs2.read_counts.tsv read counts in MACS2 ROIs using featureCounts
fixed_width/counts/ROI.reads_scaled_counts.tsv ROI.macs2.read_counts.tsv scaled using spike-in scaling factors
fixed_width/counts/ROI.tn5sites_counts.tsv Tn5 nicking site counts in MACS2 ROIs using featureCounts
fixed_width/counts/ROI.tn5sites_scaled_counts.tsv ROI.macs2.tn5sites_counts.tsv scaled using spike-in scaling factors

Genrich output folder

For a typical 2 sample analysis with 2 replicates each this folder should look like very similar to the MACS2 output structure described above.

logs folder

This directory contains all .err and .out log files generated by SLURM for jobs submitted via Snakemake. Each file follows a consistent naming convention:

<SLURM_JOB_ID of master/head job>.<SLURM_JOB_ID of child job>.<Snakemake Rule Name>.<wildcard1_name=wildcard1_value,wildcard2_name=wildcard2_value>.<out or err>

This structure is particularly useful for troubleshooting and debugging, especially when the SLURM job IDs of failed jobs are known. By examining the corresponding .err or .out files, users can efficiently identify the source of errors within specific Snakemake rules and wildcards.

DISCLAIMER: This folder hierarchy is significantly different than v1.0.6 and is subject to change with subsequent versions.