Skip to content

Resources

Quality-control pipeline

QC assessment tools

Tool Version Notes
FastQC1 0.11.5 Assess sequencing quality, run before and after adapter trimming
Kraken2 1.1 Assess microbial taxonomic composition
KronaTools3 2.7 Visualize kraken output
FastQ Screen4 0.9.3 Assess contamination; additional dependencies: bowtie2/2.3.4, perl/5.24.3
Preseq5 2.0.3 Estimate library complexity
NGSQC6 Infers a set of global QC indicators to asses data quality
MultiQC7 1.7 Aggregate sample statistics and quality-control information across all samples

Data processing tools

Tool Version Notes
Cutadapt8 1.18 Remove adapter sequences and perform quality trimming
BWA9 mem 0.7.17 Read alignment, first to identify reads aligning to blacklisted regions and later for the remainder of the genome
Picard10 2.17.11 Run SamToFastq (for blacklist read removal) and MarkDuplicates (to remove PCR duplicates in PE data)
SAMtools11 1.6 Remove reads with mapQ less than 6. Also run flagstat and idxstats to calculate alignment statistics.
MACS12 2.1.1 Run filterdup on SE data (--keep-dup=”auto”) to remove PCR duplicates
Bedtools13 2.27.1 Run intersect and bedtobam to convert .tag.Align.gz to .bam for use with Deeptools (specific to SE data)
ppqt14 2.0 Also known as phantompeakqualtools, used to calculate estimated fragment length (used for bigwig and peak calling for SE data). Also produces QC metrics: NSC and RSC.
deepTools15 3.0.1 Used for bigwig creation and multiple QC metrics. Use bamcoverage to create RPGC-normalized data: --binSize 25 --smoothLength 75 --normalizeUsing RPGC. For PE data, add --centerReads. For Control SE, add -e 200. For ChIP SE, add -e [estimated fragment length]. For control subtraction (inputnorm), use bigwigCompare: --binSize 25 --operation ‘subtract’. Run multiBigWigSummary, plotCorrelation, plotPCA, plotFingerprint, computeMatrix, plotHeatmap, and plotProfile for QC plots.

Peak calling and differential binding pipeline

Peak calling and differential peak calling tools

Tool Version Notes
MACS16 2.1.1
Sicer17 1.1
GEM18 3.0
MANorm19 1.1.4
DiffBind20,21 2.10.0

Annotations, motifs, and QC assesment tools

Tool Version Notes
Uropa22 4.0.2
Homer23 4.10.1
IDR24 2.0.3
Jaccard
FRiP

References

1. FastQC: Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
2. Kraken: Wood, D. E. and S. L. Salzberg (2014). "Kraken: ultrafast metagenomic sequence classification using exact alignments." Genome Biol 15(3): R46. http://ccb.jhu.edu/software/kraken/
3. Krona: Ondov, B. D., et al. (2011). "Interactive metagenomic visualization in a Web browser." BMC Bioinformatics 12(1): 385. https://github.com/marbl/Krona/wiki
4. FastQ Screen: Wingett, S. and S. Andrews (2018). "FastQ Screen: A tool for multi-genome mapping and quality control." F1000Research 7(2): 1338. https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/
5. Preseq: Daley, T. and A.D. Smith (2013). Predicting the molecular complexity of sequencing libraries. Nat Methods 10(4): 325-7. http://smithlabresearch.org/software/preseq/
6. NGSQC: Mendoza-Parra M., et al. (2013). A quality control system for profiles obtained by ChIP sequencing. Nucleic Acids Research 41(21,): e196.
7. MultiQC: Ewels, P., et al. (2016). "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32(19): 3047-3048. https://multiqc.info/docs/
8. Cutadapt: Martin, M. (2011). "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet 17(1): 10-12. https://cutadapt.readthedocs.io/en/stable/
9. BWA: Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25: 1754-60. http://bio-bwa.sourceforge.net/bwa.shtml
10. Picard: The Picard toolkit. https://broadinstitute.github.io/picard/
11. SAMtools: Li, H., et al. (2009). "The Sequence Alignment/Map format and SAMtools." Bioinformatics 25(16): 2078-2079. http://www.htslib.org/doc/samtools.html
12. MACS: Zhang, Y., et al. (2008). Model-based Analysis of ChIP-Seq (MACS). Genome Biol 9: R137. https://github.com/macs3-project/MACS
13. Bedtools: Quinlan, A.R. (2014). BEDTools: The Swiss‐Army Tool for Genome Feature Analysis. Current Protocols in Bioinformatics, 47: 11.12.1-11.12.34. https://bedtools.readthedocs.io/en/latest/index.html
14. ppqt: Landt S.G., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 22(9): 1813-31. https://github.com/kundajelab/phantompeakqualtools
15. deepTools: Ramírez, F., et al. (2016). deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis, Nucleic Acids Research, 44(W1), W160-W165.
16. MACS: Zhang Y., et al. (2008). Model-based Analysis of ChIP-Seq (MACS). Genome Biol 9(9): R137
17. Sicer: Xu S., et al. (2014). Spatial Clustering for Identification of ChIP-Enriched Regions (SICER) to Map Regions of Histone Methylation Patterns in Embryonic Stem Cells. Methods Mol Biol 1150: 97–111.
18. GEM: Guo Y., Mahony S., and D. K. Gifford. (2012). High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol 8(8): e1002638.
19. MANorm: Shao, Z., et al. (2012). MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets. Genome Biology 13: R16. https://manorm.readthedocs.io/en/latest/index.html
20. DiffBind: Ross-Innes C.S., et al. (2012). Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481: 389–393.
21. DiffBind: Stark R. and G. Brown. (2011). DiffBind: differential binding analysis of ChIP-Seq peak data. http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf.
22. Uropa: Kondili M., et al. (2017). UROPA: a tool for Universal RObust Peak Annotation. Scientific Reports 7: 2593.
23. Homer: Heinz S., et al. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38(4): 576–589.
24. IDR: Li Q., et al. (2011). Measuring reproducibility of high-throughput experiments. Ann Appl Stat 5(3): 1752-1779.


Last update: 2022-11-04