From: A high-quality annotated transcriptome of swine peripheral blood
Transcriptome assembly | Type of assessment | Purpose | Reference data | Software | Results |
---|---|---|---|---|---|
The de novo transcriptome assembly | RNA-seq read representation of the assembly | To determine representation of RNA-seq reads | Normalized RNA-seq reads | | 66.2% of normalized RNA-seq reads could be mapped back to the de novo assembly |
The de novo transcriptome assembly | Representation of full-length assembled protein-coding transcripts | To assess the number of full-length PTs | All protein sequences in the Swiss-Prot database | BLASTX [83] | 22,831 (nearly) full-length PTs covered more than 80% of the full length of 10,097 protein sequences in the Swiss-Prot database |
The de novo transcriptome assembly | Representation of full-length assembled transcripts | To assess the number of full-length PTs | NCBI pig RefSeq mRNAs | DC-megaBLAST [83] | 16,010 (nearly) full-length PTs covered more than 80% of the full length of 9228 pig RefSeq mRNAs |
The de novo transcriptome assembly | Origin of assembled transcripts | To assess whether the assembled PTs were of porcine genomic origin | Pig reference genomes: SSC10.2 and USMARCv1.0 | GMAP [49] | 94.2% and 99.4% of the PTs could be mapped to SSC10.2 and USMARCv1.0, respectively |
The de novo transcriptome assembly | Similarity-based assessment | To annotate the assembled PTs with known sequences of significant similarity | Sequences in the NCBI NT and NR databases | DC-megaBLAST and BLASTX [83] | 69.42% and 21.9% of the PTs shared significant similarities to sequences in the NCBI NT and NR databases, respectively |
The integrated transcriptome assembly | Similarity-based assessment | To annotate the assembled PTs with known sequences of significant similarity | Sequences in the NCBI NT and NR databases | DC-megaBLAST and BLASTX [83] | ~90% and 63% of the PTs shared significant similarities to sequences in the NCBI NT and NR databases, respectively |
The integrated transcriptome assembly | Correctness of exon-intron splicing junctions of PTs | To validate the exon-intron splicing junctions of PTs | Porcine IsoSeq full-length cDNA read data from the liver, spleen and thymus, SSC10.2 transcripts and NCBI RefSeq mRNAs | Bedtools [48] and custom Perl scripts | 15,303 PTs and 106,483 IsoSeq sequences had the same exon-intron junctions, and 63,845 uniquely mapping, spliced PTs shared at least one intron or exon with 390,943 IsoSeq reads; 4155 and 6641 PTs shared the same exon-intron junctions as 4010 SSC10.2 annotated transcripts and 6418 RefSeq mRNA sequences, respectively; 54,402 and 60,180 PTs shared at least one intron or one exon with 18,437 SSC10.2 transcripts and 33,870 RefSeq mRNA sequences, respectively |
The integrated transcriptome assembly | Completeness of 5′ termini of PTs | To validate the completeness of 5′ termini of PTs | FANTOM5 CAGE data for human and mouse, and porcine macrophage CAGE data | | Completeness of the 5′ termini of 37,569 PTs was verified by 43,845 proximal promoters determined by CAGE data |
The integrated transcriptome assembly | Length extension of existing transcripts | To determine to what extent the assembled PTs improved over the existing porcine annotation | SSC10.2 transcripts and NCBI pig RefSeq mRNAs | Bedtools [48] and custom Perl scripts | 12,262 PTs had both longer 5′ and 3′ termini than the maximally overlapping SSC10.2 transcripts; 9764 PTs had only longer 3′ termini; and 14,650 PTs had only longer 5′ termini |
The integrated transcriptome assembly | Novelty of PTs | To determine novel PTs | SSC10.2 transcripts and NCBI pig RefSeq mRNAs | Bedtools [48] and custom Perl scripts | 41,838 and 35,738 spliced PTs did not overlap any spliced, uniquely mapping SSC10.2 transcript or any spliced, uniquely mapping pig RefSeq mRNA sequence, respectively, and were therefore potential novel transcripts relative to the corresponding reference set |
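
The "more than 80% of the full length" criterion used in the full-length representation rows can be illustrated with a short sketch. This is not the authors' pipeline. It assumes BLAST hits exported in tabular form with the columns `qseqid sseqid sstart send slen`, and the function names `merge_length` and `full_length_hits` are hypothetical. For each transcript and reference pair, the aligned segments on the reference are merged, and the pair counts as (nearly) full-length when the merged span exceeds 80% of the reference length.

```python
from collections import defaultdict


def merge_length(intervals):
    """Total length covered by 1-based, inclusive intervals after merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block, if any, and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def full_length_hits(rows, min_cov=0.8):
    """Return (transcripts, references) where a transcript covers > min_cov of a reference.

    rows: iterable of (qseqid, sseqid, sstart, send, slen) tuples (assumed layout).
    """
    hsps = defaultdict(list)
    ref_len = {}
    for qseqid, sseqid, sstart, send, slen in rows:
        lo, hi = sorted((int(sstart), int(send)))  # minus-strand hits have sstart > send
        hsps[(qseqid, sseqid)].append((lo, hi))
        ref_len[sseqid] = int(slen)

    transcripts, references = set(), set()
    for (qseqid, sseqid), intervals in hsps.items():
        if merge_length(intervals) / ref_len[sseqid] > min_cov:
            transcripts.add(qseqid)
            references.add(sseqid)
    return transcripts, references


# Toy data: PT1 covers 90% of P1 via two overlapping segments, PT2 covers only 30%.
example_rows = [
    ("PT1", "P1", 1, 50, 100),
    ("PT1", "P1", 40, 90, 100),
    ("PT2", "P1", 1, 30, 100),
    ("PT3", "P2", 100, 10, 100),
]
example_transcripts, example_references = full_length_hits(example_rows)
```

The two returned sets correspond to the two counts reported in those rows: the number of PTs that reach the threshold and the number of reference sequences they cover. The two numbers can differ because several PTs may cover the same reference sequence.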