Xia X. Comparative Genomics; 2013. https://doi.org/10.1007/978-3-642-37146-2.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990; 215(3):403–10. https://doi.org/10.1016/S0022-2836(05)80360-2.
Lipman DJ, Pearson WR. Rapid and sensitive protein similarity searches. Science. 1985; 227:1435–41. https://doi.org/10.1126/science.2983426.
Wheeler WC, Gladstein DS. MALIGN: A multiple sequence alignment program. J Hered. 1994; 85. https://doi.org/10.1093/oxfordjournals.jhered.a111492.
Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000; 16:276–7. https://doi.org/10.1016/S0168-9525(00)02024-2.
Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002; 12:656–64. https://doi.org/10.1101/gr.229202.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012; 9(4):357–9. https://doi.org/10.1038/nmeth.1923.
Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009; 25(14):1754–60. https://doi.org/10.1093/bioinformatics/btp324.
Li H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics. 2016; 32(14):2103–10. https://doi.org/10.1093/bioinformatics/btw152.
Myers G. Efficient local alignment discovery amongst noisy long reads. In: Brown D, Morgenstern B, editors. Algorithms in Bioinformatics. Berlin, Heidelberg: Springer: 2014. p. 52–67.
Turakhia Y, Bejerano G, Dally WJ. Darwin: A genomics co-processor provides up to 15,000x acceleration on long read assembly. SIGPLAN Not. 2018; 53(2):199–213. https://doi.org/10.1145/3296957.3173193.
Li H. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078–9.
Picard toolkit. Broad Institute, GitHub repository. 2019. http://broadinstitute.github.io/picard/. Accessed 11 Apr 2019.
Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015; 31(12):2032–4. https://doi.org/10.1093/bioinformatics/btv098.
Faust GG, Hall IM. Samblaster: fast duplicate marking and structural variant read extraction. Bioinformatics. 2014; 30(17):2503–5. https://doi.org/10.1093/bioinformatics/btu314.
Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012; 22(3):568–76. https://doi.org/10.1101/gr.129684.111.
Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, Johnson J, Dougherty B, Barrett JC, Dry JR. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 2016; 44(11):108. https://doi.org/10.1093/nar/gkw227.
Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, Gabriel S, Meyerson M, Lander ES, Getz G. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013; 31:213.
Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. 2012. http://arxiv.org/abs/arXiv:1207.3907. Accessed 11 Apr 2019.
Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 2011; 39(19):132–132. https://doi.org/10.1093/nar/gkr599.
Wilm A, Aw PPK, Bertrand D, Yeo GHT, Ong SH, Wong CH, Khor CC, Petric R, Hibberd ML, Nagarajan N. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 2012; 40(22):11189–201. https://doi.org/10.1093/nar/gks918.
Dunn T, Berry G, Emig-Agius D, Jiang Y, Lei S, Iyer A, Udar N, Chuang H-Y, Hegarty J, Dickover M, Klotzle B, Robbins J, Bibikova M, Peeters M, Strömberg M. Pisces: an accurate and versatile variant caller for somatic and germline next-generation sequencing data. Bioinformatics. 2018; 35(9):1579–81. https://doi.org/10.1093/bioinformatics/bty849.
Kim S, Scheffler K, Halpern AL, Bekritsky MA, Noh E, Källberg M, Chen X, Kim Y, Beyter D, Krusche P, Saunders CT. Strelka2: fast and accurate calling of germline and somatic variants. Nat Methods. 2018; 15(8):591–4. https://doi.org/10.1038/s41592-018-0051-x.
Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT, Gross SS, Dorfman L, McLean CY, DePristo MA. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018; 36:983.
Diao Y, Roy A, Bloom T. Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis. In: CIDR: 2015.
Wong H-P, Raoux S, Kim S, Liang J, Reifenberg JP, Rajendran B, Asheghi M, Goodson KE. Phase change memory. Proc IEEE. 2010; 98(12):2201–27. https://doi.org/10.1109/JPROC.2010.2070050.
Burr G, Breitwisch MJ, Franceschini M, Garetto D, Gopalakrishnan K, Jackson B, Kurdi B, Lam C, Lastras LA, Padilla A, Rajendran B, Raoux S, Shenoy RS. Phase change memory technology. J Vac Sci Technol B Microelectron Nanometer Struct Process Meas Phenom Off J Am Vac Soc. 2010; 28. https://doi.org/10.1116/1.3301579.
Condit J, Nightingale EB, Frost C, Ipek E, Lee B, Burger D, Coetzee D. Better I/O through byte-addressable, persistent memory. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. SOSP ’09. New York: ACM: 2009. p. 133–46. https://doi.org/10.1145/1629575.1629589.
Broad Institute. Genome Analysis Toolkit. 2010. https://software.broadinstitute.org/gatk/. Accessed 11 Apr 2019.
The SAM/BAM Format Specification Working Group. Sequence Alignment/Map Format Specification. 2010. https://samtools.github.io/hts-specs/SAMv1.pdf. Accessed 11 Apr 2019.
Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Commun ACM. 2008; 51(1):107–13. https://doi.org/10.1145/1327452.1327492.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010. https://doi.org/10.1101/gr.107524.110.
Broad Institute. GATK Best Practices Workflows. 2010. https://github.com/gatk-workflows. Accessed 11 Apr 2019.
Broad Institute. GATK Variant Calling Pipelines. https://software.broadinstitute.org/gatk/best-practices/.
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004; 431(7011):931–45. https://doi.org/10.1038/nature03001.
Gurdasani D, Sandhu MS, Porter T, Pollard MO, Mentzer AJ. Long reads: their purpose and place. Hum Mol Genet. 2018; 27(R2):234–41. https://doi.org/10.1093/hmg/ddy177.
Apache. Apache Arrow: A Cross-language Development Platform for In-memory Data. 2019. https://arrow.apache.org/. Accessed 29 Dec 2019.
Peltenburg J, van Straten J, Brobbel M, Hofstee HP, Al-Ars Z. Supporting columnar in-memory formats on FPGA: The hardware design of Fletcher for Apache Arrow. In: Hochberger C, Nelson B, Koch A, Woods R, Diniz P, editors. Applied Reconfigurable Computing. Cham: Springer: 2019. p. 32–47.
Apache. Plasma In-Memory Object Store. 2019. https://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/. Accessed 29 Dec 2019.
Ahmad T, Peltenburg J, Ahmed N, Al-Ars Z. ArrowSAM: In-memory genomics data processing through Apache Arrow framework. 2019. https://doi.org/10.1101/741843.
Herzeel C, Costanza P, Decap D, Fostier J, Verachtert W. elPrep 4: A multithreaded framework for sequence analysis. PLOS ONE. 2019; 14(2):0209523. https://doi.org/10.1371/journal.pone.0209523.
Illumina. Illumina Cambridge Ltd. 2012. http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA12878/sequence_read/. Accessed 24 May 2019.
Apache. Apache Spark: Lightning-fast Unified Analytics Engine. 2019. https://spark.apache.org/. Accessed 2 Apr 2019.
Mushtaq H, Liu F, Costa C, Liu G, Hofstee P, Al-Ars Z. SparkGA: A Spark framework for cost effective, fast and accurate DNA analysis at scale. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM-BCB ’17. New York: ACM: 2017. p. 148–57. https://doi.org/10.1145/3107411.3107438.
Massie M, Nothaft F, Hartl C, Kozanitis C, Schumacher A, Joseph AD, Patterson DA. ADAM: Genomics formats and processing patterns for cloud scale computing. Technical report, UCB/EECS-2013-207, EECS Department, University of California, Berkeley. 2013.
Wang S, Yang W, Zhang X, Yu R. Performance evaluation of IMP: A rapid secondary analysis pipeline for NGS data. 2018. p. 1170–6. https://doi.org/10.1109/BIBM.2018.8621573.
Freed DN, Aldana R, Weber JA, Edwards JS. The Sentieon Genomics Tools - a fast and accurate solution to variant calling from next-generation sequence data. 2017. https://doi.org/10.1101/115717.
Herzeel C, Costanza P, Decap D, Fostier J, Reumers J. elPrep: High-performance preparation of sequence alignment/map files for variant calling. PLOS ONE. 2015; 10(7):0132868. https://doi.org/10.1371/journal.pone.0132868.
Becker M, Chabbi M, Warnat-Herresthal S, Klee K, Schulte-Schrepping J, Biernat P, Guenther P, Bassler K, Craig R, Schultze H, Singhal S, Ulas T, Schultze JL. Memory-driven computing accelerates genomic data processing. 2019. https://doi.org/10.1101/519579.
Apache Foundation. Python library for Apache Arrow. 2019. https://pypi.org/project/pyarrow/. Accessed 29 Dec 2019.
Ren S, Bertels K, Al-Ars Z. Efficient Acceleration of the Pair-HMMs Forward Algorithm for GATK HaplotypeCaller on Graphics Processing Units. Evol Bioinforma. 2018; 14. https://doi.org/10.1177/1176934318760543.
Houtgast EJ, Sima VM, Bertels K, Al-Ars Z. Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths. Comput Biol Chem. 2018; 75:54–64.