Resources

1. Reference genomes

On Biowulf, RENEE comes bundled with the pre-built GENCODE¹ reference genomes listed in the table below.

As of RENEE v2.6.0, all hg19 and hg38 indices were built using the NCI Genomic Data Commons (GDC) reference FASTA, which contains the primary genome from ENCODE plus virus and decoy sequences. The hg38 FASTA files were downloaded from the GDC with virus and decoy sequences already added, while these sequences were manually added to the hg19 FASTA from ENCODE. See details here: https://github.com/CCBR/build-renee-refs

Genome	Species	Annotation Version	Notes
hg38_48	Homo sapiens (human)	Gencode Release 48	GRCh38, Annotation Release date: 05/2025
hg38_45	Homo sapiens (human)	Gencode Release 45	GRCh38, Annotation Release date: 03/2023
hg38_41	Homo sapiens (human)	Gencode Release 41	GRCh38, Annotation Release date: 07/2022
hg38_38	Homo sapiens (human)	Gencode Release 38	GRCh38, Annotation Release date: 05/2021
hg38_36	Homo sapiens (human)	Gencode Release 36	GRCh38, Annotation Release date: 05/2020
hg38_34	Homo sapiens (human)	Gencode Release 34	GRCh38, Annotation Release date: 04/2020
hg38_30	Homo sapiens (human)	Gencode Release 30	GRCh38, Annotation Release date: 11/2018
hg19_36	Homo sapiens (human)	Gencode Release 36-lift-37	GRCh37
hg19_19	Homo sapiens (human)	Gencode Release 19	GRCh37, Annotation Release date: 07/2013
mm39_M37	Mus musculus (mouse)	Gencode Release M37	GRCm39, Annotation Release date: 05/2025
mm39_M36	Mus musculus (mouse)	Gencode Release M36	GRCm39, Annotation Release date: 10/2024
mm10_M25	Mus musculus (mouse)	Gencode Release M25	GRCm38, Annotation Release date: 04/2020
mm10_M23	Mus musculus (mouse)	Gencode Release M23	GRCm38, Annotation Release date: 09/2019
mm10_M21	Mus musculus (mouse)	Gencode Release M21	GRCm38, Annotation Release date: 04/2019
mCalJac1_2021	Callithrix jacchus (white-tufted-ear marmoset)	Genome assembly mCalJa1.2.pat.X	Annotation release date: 04/2021
mmul10_108	Macaca mulatta (rhesus macaque)	Ensembl 108: fasta; gtf	Annotation release date: 09/2022

You can run `renee run --help` to view the most up-to-date list of genome annotations available in your installation of RENEE.
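The genome labels in the table appear to follow an `<assembly>_<annotation version>` pattern. As a minimal sketch under that assumption, the Python snippet below picks the newest annotation listed for each assembly. The helper names (`split_label`, `version_key`, `newest_per_assembly`) are hypothetical and not part of RENEE, and the list is copied from this page rather than queried from an installation.

```python
import re

# Genome labels copied from the table on this page (not queried from RENEE).
GENOMES = [
    "hg38_48", "hg38_45", "hg38_41", "hg38_38", "hg38_36", "hg38_34", "hg38_30",
    "hg19_36", "hg19_19",
    "mm39_M37", "mm39_M36",
    "mm10_M25", "mm10_M23", "mm10_M21",
    "mCalJac1_2021", "mmul10_108",
]


def split_label(label):
    """Split a label such as 'hg38_48' into ('hg38', '48').

    Assumes the last underscore separates assembly from annotation version.
    """
    assembly, version = label.rsplit("_", 1)
    return assembly, version


def version_key(version):
    """Return the numeric part of a version such as '48' or 'M37'."""
    match = re.search(r"\d+", version)
    if match is None:
        raise ValueError(f"no numeric component in version {version!r}")
    return int(match.group())


def newest_per_assembly(labels):
    """Map each assembly to its label with the highest annotation number."""
    newest = {}
    for label in labels:
        assembly, version = split_label(label)
        current = newest.get(assembly)
        if current is None or version_key(version) > version_key(split_label(current)[1]):
            newest[assembly] = label
    return newest


if __name__ == "__main__":
    for assembly, label in newest_per_assembly(GENOMES).items():
        print(f"{assembly}: {label}")
```

The resulting label is what would be passed to the `--genome` option of `renee run`. The output of `renee run --help` remains the authoritative list for a given installation.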

Note: Newer annotation versions may be added upon request, and some may already be available. Please contact Vishal Koparde for details.

Building new reference genomes is also easy!

If you do not have access to Biowulf, or you are looking for a reference genome and/or annotation that is not currently available, you can build it with RENEE's `build` sub-command. Given a genomic FASTA file (`ref.fa`) and a GTF file (`genes.gtf`), `renee build` will create all of the reference files required to run the RENEE pipeline. Once the build pipeline completes, you can supply the newly generated `reference.json` to the `--genome` option of `renee run`. For more information, please see the help pages for the `run` and `build` sub-commands.
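As a rough illustration of the build-then-run hand-off, the Python sketch below assembles a `renee build` command line without executing it. The option names `--ref-fa`, `--ref-gtf`, `--ref-name`, `--gtf-ver` and `--output` are recalled from the build help page and should be treated as assumptions to confirm with `renee build --help`. The file names, the `mm39`/`M26` values, the output directory, and the helper functions themselves are hypothetical.

```python
import shlex


def build_command(ref_fa, ref_gtf, ref_name, gtf_ver, output):
    """Return the argument list for a `renee build` call (nothing is executed).

    Option names are assumptions; verify them with `renee build --help`.
    """
    return [
        "renee", "build",
        "--ref-fa", str(ref_fa),      # genomic FASTA file
        "--ref-gtf", str(ref_gtf),    # gene annotation in GTF format
        "--ref-name", ref_name,       # short name for the reference
        "--gtf-ver", gtf_ver,         # annotation version label
        "--output", str(output),      # directory for the built reference files
    ]


def genome_option(reference_json):
    """Return the `--genome` option pair that `renee run` accepts."""
    return ["--genome", str(reference_json)]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    cmd = build_command("ref.fa", "genes.gtf", "mm39", "M26", "refs/mm39_M26")
    print(shlex.join(cmd))
    print(shlex.join(genome_option("refs/mm39_M26/reference.json")))
```

Printing the command rather than running it keeps the sketch safe to try on a machine without RENEE installed. The location and exact name of the generated JSON file depend on the build output, so the path in the last line is only a placeholder.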

2. Tools and versions

Raw data > Adapter Trimming > Alignment > Quantification (genes and isoforms, gene-fusions)

Tool	Version	Docker	Notes
FastQC²	0.11.9	nciccbr/ccbr_fastqc_0.11.9	Quality-control step to assess sequencing quality, run before and after adapter trimming
Cutadapt³	1.18	nciccbr/ccbr_cutadapt_1.18	Data processing step to remove adapter sequences and perform quality trimming
Kraken⁴	2.1.1	nciccbr/ccbr_kraken_v2.1.1	Quality-control step to assess microbial taxonomic composition
KronaTools⁵	2.7.1	nciccbr/ccbr_kraken_v2.1.1	Quality-control step to visualize kraken output
FastQ Screen⁶	0.13.0	nciccbr/ccbr_fastq_screen_0.13.0	Quality-control step to assess contamination; additional dependencies: `bowtie2/2.3.4`, `perl/5.24.3`
STAR⁷	2.7.6a	nciccbr/ccbr_arriba_2.0.0	Data processing step to align reads against reference genome (using its two-pass mode)
bbtools⁸	38.87	nciccbr/ccbr_bbtools_38.87	Quality-control step to calculate the insert size of assembled read pairs with `bbmerge`
QualiMap⁹	2.2.1	nciccbr/ccbr_qualimap	Quality-control step to assess various alignment metrics
Picard¹⁰	2.18.20	nciccbr/ccbr_picard	Quality-control step to run `MarkDuplicates`, `CollectRnaSeqMetrics` and `AddOrReplaceReadGroups`
Preseq¹¹	2.0.3	nciccbr/ccbr_preseq	Quality-control step to estimate library complexity
SAMtools¹²	1.7	nciccbr/ccbr_arriba_2.0.0	Quality-control step to run `flagstat` to calculate alignment statistics
bam2strandedbw	custom	nciccbr/ccbr_bam2strandedbw	Summarization step to convert STAR aligned PE bam file into forward and reverse strand bigwigs suitable for a genomic track viewer like IGV
RSeQC¹³	4.0.0	nciccbr/ccbr_rseqc_4.0.0	Quality-control step to infer stranded-ness and read distributions over specific genomic features
RSEM¹⁴	1.3.3	nciccbr/ccbr_rsem_1.3.3	Data processing step to quantify gene and isoform counts
Arriba¹⁵	2.0.0	nciccbr/ccbr_arriba_2.0.0	Data processing step to quantify gene-fusions
RNA Report	custom	nciccbr/ccbr_rna	Summarization step to identify outliers and assess technical sources of variation
MultiQC¹⁶	1.12	skchronicles/multiqc	Reporting step to aggregate sample statistics and quality-control information across all samples

3. Acknowledgements

3.1 Biowulf

If you used NIH's Biowulf cluster to run RENEE, please do not forget to provide an acknowledgement!

The continued growth and support of NIH's Biowulf cluster is dependent upon its demonstrable value to the NIH Intramural Research Program. If you publish research that involved significant use of Biowulf, please cite the cluster.

Suggested citation text:

This work utilized the computational resources of the NIH HPC Biowulf cluster. (http://hpc.nih.gov)

4. References

1. Harrow, J., et al. (2012). "GENCODE: the reference human genome annotation for The ENCODE Project." Genome Res 22(9): 1760-1774.
2. Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data.
3. Martin, M. (2011). "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet 17(1): 10-12.
4. Wood, D. E. and S. L. Salzberg (2014). "Kraken: ultrafast metagenomic sequence classification using exact alignments." Genome Biol 15(3): R46.
5. Ondov, B. D., et al. (2011). "Interactive metagenomic visualization in a Web browser." BMC Bioinformatics 12(1): 385.
6. Wingett, S. and S. Andrews (2018). "FastQ Screen: A tool for multi-genome mapping and quality control." F1000Research 7(2): 1338.
7. Dobin, A., et al. (2013). "STAR: ultrafast universal RNA-seq aligner." Bioinformatics 29(1): 15-21.
8. Bushnell, B., J. Rood and E. Singer (2017). "BBMerge - Accurate paired shotgun read merging via overlap." PLoS One 12(10): e0185056.
9. Okonechnikov, K., et al. (2015). "Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data." Bioinformatics 32(2): 292-294.
10. The Picard toolkit. https://broadinstitute.github.io/picard/.
11. Daley, T. and A. D. Smith (2013). "Predicting the molecular complexity of sequencing libraries." Nat Methods 10(4): 325-327.
12. Li, H., et al. (2009). "The Sequence Alignment/Map format and SAMtools." Bioinformatics 25(16): 2078-2079.
13. Wang, L., et al. (2012). "RSeQC: quality control of RNA-seq experiments." Bioinformatics 28(16): 2184-2185.
14. Li, B. and C. N. Dewey (2011). "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome." BMC Bioinformatics 12: 323.
15. Uhrig, S., et al. (2021). "Accurate and efficient detection of gene fusions from RNA sequencing data." Genome Res 31(3): 448-460.
16. Ewels, P., et al. (2016). "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32(19): 3047-3048.