Table 2 Summary of assessment of the de novo and integrated transcriptome assemblies

From: A high-quality annotated transcriptome of swine peripheral blood

| Transcriptome assembly | Type of assessment | Purpose | Reference data | Software | Results |
| --- | --- | --- | --- | --- | --- |
| The de novo transcriptome assembly | RNA-seq read representation of the assembly | To determine representation of RNA-seq reads | Normalized RNA-seq reads | Trinity [43, 44] | 66.2% of normalized RNA-seq reads could be mapped back to the de novo assembly |
| | Representation of full-length assembled protein-coding transcripts | To assess the number of full-length PTs | All protein sequences in the Swiss-Prot database | BLASTX [83] | 22,831 (nearly) full-length PTs covered more than 80% of the full length of 10,097 protein sequences in the Swiss-Prot database |
| | Representation of full-length assembled transcripts | To assess the number of full-length PTs | NCBI pig RefSeq mRNAs | DC-megaBLAST [83] | 16,010 (nearly) full-length PTs covered more than 80% of the full length of 9228 pig RefSeq mRNAs |
| | Origin of assembled transcripts | To assess whether the assembled PTs were of porcine genomic origin | Pig reference genomes: SSC10.2 and USMARCv1.0 | GMAP [49] | 94.2% and 99.4% of the PTs could be mapped to SSC10.2 and USMARCv1.0, respectively |
| | Similarity-based assessment | To annotate the assembled PTs with known sequences of significant similarity | Sequences in the NCBI NT and NR databases | DC-megaBLAST and BLASTX [83] | 69.42% and 21.9% of the PTs shared significant similarities to sequences in the NCBI NT and NR databases, respectively |
| The integrated transcriptome assembly | Similarity-based assessment | To annotate the assembled PTs with known sequences of significant similarity | Sequences in the NCBI NT and NR databases | DC-megaBLAST and BLASTX [83] | ~90% and 63% of the PTs shared significant similarities to sequences in the NCBI NT and NR databases, respectively |
| | Correctness of exon-intron splicing junctions of PTs | To validate the exon-intron splicing junctions of PTs | Porcine IsoSeq full-length cDNA read data from the liver, spleen and thymus; SSC10.2 transcripts; and NCBI RefSeq mRNAs | Bedtools [48] and custom Perl scripts | 15,303 PTs and 106,483 IsoSeq sequences had the same exon-intron junctions; 63,845 uniquely mapping, spliced PTs shared at least one intron or exon with 390,943 IsoSeq reads; 4155 and 6641 PTs shared the same exon-intron junctions as 4010 SSC10.2 annotated transcripts and 6418 RefSeq mRNA sequences, respectively; 54,402 and 60,180 PTs shared at least one intron or one exon with 18,437 SSC10.2 transcripts and 33,870 RefSeq mRNA sequences, respectively |
| | Completeness of 5′ termini of PTs | To validate the completeness of 5′ termini of PTs | FANTOM5 CAGE data for human and mouse, and porcine macrophage CAGE data | CAGEr [55], Bedtools [48] and custom Perl scripts | Completeness of the 5′ termini of 37,569 PTs was verified by 43,845 proximal promoters determined from CAGE data |
| | Length extension of existing transcripts | To determine to what extent the assembled PTs improved over the existing porcine annotation | SSC10.2 transcripts and NCBI pig RefSeq mRNAs | Bedtools [48] and custom Perl scripts | 12,262 PTs had both longer 5′ and 3′ termini than the maximally overlapping SSC10.2 transcripts; 9764 PTs had only longer 3′ termini; and 14,650 PTs had only longer 5′ termini |
| | Novelty of PTs | To determine novel PTs | SSC10.2 transcripts and NCBI pig RefSeq mRNAs | Bedtools [48] and custom Perl scripts | 41,838 spliced PTs did not overlap any spliced, uniquely mapping SSC10.2 transcript, and 35,738 spliced PTs did not overlap any spliced, uniquely mapping pig RefSeq mRNA sequence; these were potential novel transcripts relative to the respective reference set |
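
The "more than 80% of the full length" criterion used in the full-length rows of the table can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes BLAST tabular output with a hypothetical column order of `qseqid sseqid sstart send slen`, approximates coverage from a single hit's subject coordinates, and uses an invented function name (`full_length_counts`) and made-up example rows.

```python
import csv
import io


def full_length_counts(tabular_text, min_cov=0.80):
    """Count transcripts and reference sequences joined by a hit that
    covers more than min_cov of the reference (subject) length.

    Assumed column order (hypothetical): qseqid, sseqid, sstart, send, slen.
    """
    transcripts, references = set(), set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, sstart, send, slen = row[:5]
        # abs() handles minus-strand nucleotide hits, where sstart > send
        aligned = abs(int(send) - int(sstart)) + 1
        if aligned / int(slen) > min_cov:
            transcripts.add(qseqid)
            references.add(sseqid)
    return len(transcripts), len(references)


# Made-up rows for illustration only; not data from the paper
example = "\n".join([
    "PT1\tP001\t1\t95\t100",
    "PT2\tP001\t10\t99\t100",
    "PT3\tP002\t1\t40\t100",
    "PT4\tP003\t200\t21\t200",
])
n_transcripts, n_references = full_length_counts(example)
print(n_transcripts, n_references)
```

Several transcripts can cover the same reference sequence, which is why the table reports more full-length PTs than covered reference sequences. A fuller analysis would merge multiple hits per transcript-reference pair before computing coverage, which this sketch does not attempt.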