Remove adapter sequences and perform quality trimming
BWA9 mem
0.7.17
Read alignment, first to identify reads aligning to blacklisted regions and later for the remainder of the genome
Picard10
2.17.11
Run SamToFastq (for blacklist read removal) and MarkDuplicates (to remove PCR duplicates in PE data)
SAMtools11
1.6
Remove reads with MAPQ below 6. Also run flagstat and idxstats to calculate alignment statistics.
MACS12
2.1.1
Run filterdup on SE data (--keep-dup="auto") to remove PCR duplicates
Bedtools13
2.27.1
Run intersect and bedtobam to convert .tag.Align.gz to .bam for use with deepTools (specific to SE data)
ppqt14
2.0
Also known as phantompeakqualtools. Used to calculate the estimated fragment length (needed for bigWig creation and peak calling for SE data). Also produces the QC metrics NSC and RSC.
deepTools15
3.0.1
Used for bigWig creation and multiple QC metrics. Use bamCoverage to create RPGC-normalized data: --binSize 25 --smoothLength 75 --normalizeUsing RPGC. For PE data, add --centerReads. For control SE data, add -e 200. For ChIP SE data, add -e [estimated fragment length]. For control subtraction (inputnorm), use bigwigCompare: --binSize 25 --operation 'subtract'. Run multiBigWigSummary, plotCorrelation, plotPCA, plotFingerprint, computeMatrix, plotHeatmap, and plotProfile for QC plots.
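The deepTools options listed above depend on whether the data are PE or SE and, for SE, on whether the sample is a control or a ChIP. The following is a minimal sketch of that selection logic, not the pipeline's actual code. The function names, file names, and the effective genome size value are hypothetical. The --effectiveGenomeSize option is not mentioned in the table; it is included here on the assumption that bamCoverage needs it whenever RPGC normalization is requested.

```python
from typing import List, Optional


def bamcoverage_cmd(bam: str, out_bw: str, effective_genome_size: int,
                    paired_end: bool, is_control: bool = False,
                    fragment_length: Optional[int] = None) -> List[str]:
    """Build a bamCoverage argument list using the options from the table."""
    cmd = [
        "bamCoverage", "-b", bam, "-o", out_bw,
        "--binSize", "25",
        "--smoothLength", "75",
        "--normalizeUsing", "RPGC",
        # Assumption: not listed in the table, but RPGC needs a genome size.
        "--effectiveGenomeSize", str(effective_genome_size),
    ]
    if paired_end:
        # PE data: center reads on the fragment midpoint.
        cmd.append("--centerReads")
    elif is_control:
        # SE control: extend reads to a fixed 200 bp.
        cmd += ["-e", "200"]
    else:
        # SE ChIP: extend reads to the estimated fragment length (from ppqt).
        if fragment_length is None:
            raise ValueError("SE ChIP samples need an estimated fragment length")
        cmd += ["-e", str(fragment_length)]
    return cmd


def bigwigcompare_cmd(chip_bw: str, control_bw: str, out_bw: str) -> List[str]:
    """Build a bigwigCompare argument list for control subtraction (inputnorm)."""
    return [
        "bigwigCompare", "-b1", chip_bw, "-b2", control_bw,
        "--binSize", "25", "--operation", "subtract",
        "-o", out_bw,
    ]


if __name__ == "__main__":
    # Hypothetical file names and genome size, for illustration only.
    genome_size = 2_700_000_000
    print(" ".join(bamcoverage_cmd("chip.bam", "chip.bw", genome_size,
                                   paired_end=False, fragment_length=180)))
    print(" ".join(bigwigcompare_cmd("chip.bw", "input.bw", "chip.inputnorm.bw")))
```

Returning argument lists rather than shell strings lets the commands be passed directly to subprocess.run without quoting problems.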
1. FastQC: Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
2. Kraken: Wood, D. E. and S. L. Salzberg (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15(3): R46. http://ccb.jhu.edu/software/kraken/
3. Krona: Ondov, B. D., et al. (2011). Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12(1): 385. https://github.com/marbl/Krona/wiki
4. FastQ Screen: Wingett, S. and S. Andrews (2018). FastQ Screen: A tool for multi-genome mapping and quality control. F1000Research 7(2): 1338. https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/
5. Preseq: Daley, T. and A.D. Smith (2013). Predicting the molecular complexity of sequencing libraries. Nat Methods 10(4): 325-7. http://smithlabresearch.org/software/preseq/
6. NGSQC: Mendoza-Parra M., et al. (2013). A quality control system for profiles obtained by ChIP sequencing. Nucleic Acids Research 41(21): e196.
7. MultiQC: Ewels, P., et al. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047-3048. https://multiqc.info/docs/
8. Cutadapt: Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet 17(1): 10-12. https://cutadapt.readthedocs.io/en/stable/
9. BWA: Li H. and Durbin R. (2009). Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25: 1754-60. http://bio-bwa.sourceforge.net/bwa.shtml
10. Picard: The Picard toolkit. https://broadinstitute.github.io/picard/
11. SAMtools: Li, H., et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16): 2078-2079. http://www.htslib.org/doc/samtools.html
12. MACS: Zhang, Y., et al. (2008). Model-based Analysis of ChIP-Seq (MACS). Genome Biol 9: R137. https://github.com/macs3-project/MACS
13. Bedtools: Quinlan, A.R. (2014). BEDTools: The Swiss‐Army Tool for Genome Feature Analysis. Current Protocols in Bioinformatics, 47: 11.12.1-11.12.34. https://bedtools.readthedocs.io/en/latest/index.html
14. ppqt: Landt S.G., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 22(9): 1813-31. https://github.com/kundajelab/phantompeakqualtools
15. deepTools: Ramírez, F., et al. (2016). deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis. Nucleic Acids Research 44(W1): W160-W165.
16. MACS: Zhang Y., et al. (2008). Model-based Analysis of ChIP-Seq (MACS). Genome Biol 9(9): R137.
17. Sicer: Xu S., et al. (2014). Spatial Clustering for Identification of ChIP-Enriched Regions (SICER) to Map Regions of Histone Methylation Patterns in Embryonic Stem Cells. Methods Mol Biol 1150: 97–111.
18. GEM: Guo Y., Mahony S., and D. K. Gifford. (2012). High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol 8(8): e1002638.
19. MANorm: Shao, Z., et al. (2012). MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets. Genome Biology 13: R16. https://manorm.readthedocs.io/en/latest/index.html
20. DiffBind: Ross-Innes C.S., et al. (2012). Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481: 389–393.
21. DiffBind: Stark R. and G. Brown. (2011). DiffBind: differential binding analysis of ChIP-Seq peak data. http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf
22. Uropa: Kondili M., et al. (2017). UROPA: a tool for Universal RObust Peak Annotation. Scientific Reports 7: 2593.
23. Homer: Heinz S., et al. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38(4): 576–589.
24. IDR: Li Q., et al. (2011). Measuring reproducibility of high-throughput experiments. Ann Appl Stat 5(3): 1752-1779.