Table 4 External tools used in the GenomeQC web-application and standalone application. Note that the LTR retriever package is included in the standalone application only

From: GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations

External tools

Short Description

BUSCO v3.0.2

Dependencies:

NCBI BLAST+ v2.28.0

Augustus v3.2.1 [37]

HMMER v3.1b2 [38]

The BUSCO package assesses gene space completeness using a set of conserved orthologous genes. To assess a genome assembly, BUSCO first identifies candidate regions by tblastn searches against the consensus sequences and then constructs gene models in those regions with the AUGUSTUS de novo gene predictor. HMMER then matches these gene predictions against the BUSCO lineage profiles and classifies each BUSCO as complete and single copy (C&S), duplicated (D), fragmented (F) or missing (M).
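The four categories are usually reported as percentages of the lineage set. The following is a minimal sketch of that bookkeeping only, not BUSCO code: the function name `busco_summary`, the single-letter status codes and the example counts are assumptions made for illustration, and the output string merely imitates BUSCO's one-line summary notation.

```python
from collections import Counter


def busco_summary(statuses):
    """Summarise per-BUSCO statuses as percentages.

    `statuses` holds one code per BUSCO in the lineage set:
    "S" (complete, single copy), "D" (duplicated),
    "F" (fragmented) or "M" (missing).
    """
    n = len(statuses)
    if n == 0:
        raise ValueError("no BUSCO statuses given")
    counts = Counter(statuses)
    unknown = set(counts) - {"S", "D", "F", "M"}
    if unknown:
        raise ValueError(f"unexpected status codes: {sorted(unknown)}")
    pct = {k: 100.0 * counts.get(k, 0) / n for k in ("S", "D", "F", "M")}
    # Complete BUSCOs are the single-copy and duplicated ones together.
    complete = pct["S"] + pct["D"]
    return (
        f"C:{complete:.1f}%[S:{pct['S']:.1f}%,D:{pct['D']:.1f}%],"
        f"F:{pct['F']:.1f}%,M:{pct['M']:.1f}%,n:{n}"
    )


if __name__ == "__main__":
    # Made-up example: 100 BUSCOs in the lineage set.
    example = ["S"] * 90 + ["D"] * 4 + ["F"] * 3 + ["M"] * 3
    print(busco_summary(example))
```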

Gffread 0.9.12 [39]

Gffread is a Cufflinks utility that extracts transcript sequences from a genome FASTA file and an annotation GFF file. (http://ccb.jhu.edu/software/stringtie/gff.shtml)
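A typical way to run this extraction is with gffread's `-g` (genome FASTA) and `-w` (output FASTA of spliced transcript sequences) options. The sketch below wraps that call in Python; the helper names and file names are hypothetical, and how GenomeQC itself invokes gffread is not stated in this table.

```python
import subprocess


def gffread_command(genome_fasta, annotation_gff, out_fasta):
    """Build a gffread command that writes transcript sequences to out_fasta."""
    # -w: write a FASTA file with the spliced exons of each transcript
    # -g: genome FASTA file the annotation coordinates refer to
    return ["gffread", "-w", out_fasta, "-g", genome_fasta, annotation_gff]


def extract_transcripts(genome_fasta, annotation_gff, out_fasta):
    """Run gffread; requires the gffread binary to be on PATH."""
    subprocess.run(
        gffread_command(genome_fasta, annotation_gff, out_fasta),
        check=True,  # raise CalledProcessError if gffread fails
    )


if __name__ == "__main__":
    # Hypothetical file names, shown without executing gffread.
    print(" ".join(gffread_command("genome.fa", "annotation.gff", "transcripts.fa")))
```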

NCBI UniVec Database

Database of vector sequences, adaptors, linkers and primer sequences used in DNA cloning

Taxify module, BtIO.py, BtLog.py (Blobtools v1.1)

This script adds NCBI TaxIDs to the BLAST hits obtained by searching the input contig/scaffold sequences against the UniVec database
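Conceptually, this step appends a taxonomy identifier to each BLAST hit. The sketch below illustrates the idea on tab-separated BLAST output (query ID in the first column, subject ID in the second). It is not the Blobtools implementation: the function `add_taxid`, the subject-to-TaxID mapping, the sequence identifiers and the TaxID value are all made up for illustration.

```python
def add_taxid(blast_lines, taxid_by_subject, default_taxid=None):
    """Append a TaxID column to tab-separated BLAST hits.

    blast_lines: iterable of tabular BLAST lines (query, subject, ...).
    taxid_by_subject: dict mapping subject sequence ID to a TaxID.
    default_taxid: TaxID used when a subject is not in the mapping.
    """
    annotated = []
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        taxid = taxid_by_subject.get(fields[1], default_taxid)
        fields.append("NA" if taxid is None else str(taxid))
        annotated.append("\t".join(fields))
    return annotated


if __name__ == "__main__":
    # Made-up hits and a placeholder TaxID, for illustration only.
    hits = [
        "contig_1\tvector_A\t99.1\t150",
        "contig_7\tvector_B\t97.4\t88",
    ]
    mapping = {"vector_A": 12345}
    for row in add_taxid(hits, mapping):
        print(row)
```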

LTR retriever v2.8.2

Dependencies:

NCBI BLAST+ 2.9.0

RepeatMasker 4.0.9 [40]

HMMER 3.2.1

CDHIT 4.8.1 [41]

LTRFINDER parallel [42]

LTRharvest 1.5.10 [43]

The LTR retriever package is used to calculate the LTR Assembly Index (LAI) [23] of the input genome assembly. LTRharvest and LTRFINDER are first used to obtain retrotransposon candidates. LTR retriever then filters out false positives and generates high-confidence intact LTR retrotransposons (LTR-RTs) from the candidate sequences. RepeatMasker is used for whole-genome LTR annotation to identify all possible LTR-RTs present in the genome. Finally, LAI is calculated as the percentage of the total LTR sequence length in the assembled genome that is contributed by intact LTR-RTs.
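The final ratio can be written out as a short calculation. This is a minimal sketch, assuming the raw LAI definition of intact LTR retrotransposon length divided by total LTR sequence length, multiplied by 100. The function name `raw_lai` and the example lengths are hypothetical, and any further adjustment or windowing applied by the LTR retriever package is not shown.

```python
def raw_lai(intact_ltr_bp, total_ltr_bp):
    """Raw LAI: intact LTR-RT length as a percentage of total LTR length.

    intact_ltr_bp: summed length (bp) of intact LTR retrotransposons.
    total_ltr_bp: summed length (bp) of all annotated LTR sequence.
    """
    if total_ltr_bp <= 0:
        raise ValueError("total LTR length must be positive")
    if not 0 <= intact_ltr_bp <= total_ltr_bp:
        raise ValueError("intact length must lie between 0 and the total")
    return 100.0 * intact_ltr_bp / total_ltr_bp


if __name__ == "__main__":
    # Made-up lengths: 1.5 Mb of intact LTR-RTs out of 10 Mb of LTR sequence.
    print(raw_lai(1_500_000, 10_000_000))
```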