Table 1 Summarised EnTAP annotation statistics

From: Defence transcriptome assembly and pathogenesis related gene family analysis in Pinus tecunumanii (low elevation)

                                                            Pinus tecunumanii (LE)              Pinus patula
Assembly Statistics
  Assembly                                                  PnteEviGene       Pnte_v1.0         PiptEviGene       Pipt_v2.0
  Total Sequences                                           91,552            28,621            325,974           52,735
  Total Transcriptome Length (Mb)                           100.95            50.01             259.02            72.15
  Average Sequence Length (nt)                              1103              1747              794               1368
  N50 (nt)                                                  1551              2296              870               1897
  Longest Sequence (nt)                                     16,886            16,886            16,570            16,570
  Shortest Sequence (nt)                                    351               351               351               351
  %GC                                                       48%               44%               50%               46%
Sequence filtering (a)
  Sequences with FPKM > 1                                   77,563 (84.72%)                     203,996 (62.58%)
  Sequences with GeneMarkS-T predicted CDS                  74,556 (81.44%)                     194,568 (59.69%)
  Total proteins after clustering to 90% identity           70,748 (77.28%)                     167,961 (51.53%)
Annotation (b)
  Predicted protein frame
    Complete                                                32,193 (45.50%)   18,971 (66.27%)   61,134 (36.40%)   26,835 (50.89%)
    Internal                                                18,279 (25.84%)   3347 (11.70%)     57,027 (33.95%)   9602 (18.21%)
    3′-partial                                              6686 (9.45%)      1895 (6.63%)      21,848 (13.01%)   4354 (8.26%)
    5′-partial                                              17,398 (24.59%)   4408 (15.40%)     54,559 (32.48%)   11,944 (22.65%)
  Similarity Search Annotation
    Sequences with informative BLASTp alignments            19,296 (27.27%)   15,192 (53.09%)   38,714 (23.05%)   27,328 (51.82%)
    Sequences with uninformative BLASTp alignments          15,437 (21.82%)   6131 (21.41%)     34,101 (20.30%)   9261 (17.56%)
    Sequences without BLASTp alignments                     36,015 (50.91%)   7300 (25.50%)     95,146 (56.65%)   16,147 (30.62%)
  Functional Annotation
    Sequences with family assignment                        55,627 (78.63%)   28,484 (99.52%)   128,952 (76.77%)  52,166 (98.92%)
    Sequences with at least one GO term                     31,640 (44.72%)   16,197 (56.60%)   77,063 (45.88%)   33,712 (63.93%)
    Sequences with at least one pathway (KEGG) assignment   17,303 (24.46%)   8004 (27.98%)     47,383 (28.21%)   21,094 (40.00%)
  Annotation Summary
    Unannotated Sequences                                   14,568 (20.59%)   0 (0.00%)         36,698 (21.85%)   0 (0.00%)
    Total sequences annotated                               56,180 (79.41%)   28,621 (100.00%)  131,263 (78.15%)  52,735 (100.00%)
    Non-pine origin sequences                               27,550 (38.94%)   0 (0.00%)         78,527 (46.75%)   0 (0.00%)
(a) Percentages relative to EviGene assemblies
(b) Percentages relative to clustered GeneMarkS-T assemblies for EviGene columns and relative to total sequences for Pnte_v1.0 and Pipt_v2.0
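
The two footnotes name different denominators, which is easy to miss when reading the percentages. As a minimal sketch (not the authors' code), the arithmetic for the PnteEviGene column can be reproduced as follows. The helper name `pct` and the constant names are hypothetical; the counts are copied from the table, and it is assumed that "clustered GeneMarkS-T assemblies" refers to the "Total proteins after clustering to 90% identity" row.

```python
def pct(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator, rounded to two decimals."""
    return round(100 * count / denominator, 2)


# Denominators from the PnteEviGene column (hypothetical constant names).
EVIGENE_TOTAL = 91_552        # "Total Sequences": denominator for footnote a
CLUSTERED_PROTEINS = 70_748   # "Total proteins after clustering": assumed denominator for footnote b

# Footnote a: sequence filtering rows are relative to the EviGene assembly.
fpkm_pct = pct(77_563, EVIGENE_TOTAL)

# Footnote b: annotation rows are relative to the clustered protein set.
complete_pct = pct(32_193, CLUSTERED_PROTEINS)
unannotated_pct = pct(14_568, CLUSTERED_PROTEINS)

print(f"{fpkm_pct:.2f}% {complete_pct:.2f}% {unannotated_pct:.2f}%")
# prints: 84.72% 45.50% 20.59%
```

For the Pnte_v1.0 and Pipt_v2.0 columns, the same helper would take the "Total Sequences" value of that column as the denominator, per footnote b.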