Table 3 Comparison of gene family content

From: Comparative genomic analysis of human infective Trypanosoma cruzi lineages with the bat-restricted subspecies T. cruzi marinkellei

| Gene family (a) | T. c. marinkellei: size in assembly, bp (b) | T. c. marinkellei: % short reads (c) | T. c. cruzi Sylvio X10: size in assembly, bp (b) | T. c. cruzi Sylvio X10: % short reads (c) | SE (d) |
|---|---|---|---|---|---|
| DGF | 2,129,983 (6.22 %) | 3.433 | 1,265,650 (3.28 %) | 1.324 | Tcm |
| TS | 2,109,163 (6.16 %) | 6.291 | 2,953,602 (7.65 %) | 6.298 | Tcc X10 |
| MASP | 540,360 (1.58 %) | 1.317 | 727,537 (1.88 %) | 1.434 | Tcc X10 |
| RHS | 521,665 (1.52 %) | 2.234 | 1,314,589 (3.41 %) | 2.915 | Tcc X10 |
| GP63 | 452,732 (1.32 %) | 1.229 | 514,422 (1.33 %) | 0.898 | Tcm |
| TcMUC mucin | 273,890 (0.80 %) | 0.557 | 334,544 (0.87 %) | 0.515 | Tcc X10 |
| ABC | 37,490 (0.11 %) | 0.124 | 42,072 (0.11 %) | 0.162 | Tcc X10 |
| RBP | 25,946 (0.08 %) | 0.080 | 26,732 (0.07 %) | 0.074 | Tcc X10 |
  a. Gene family abbreviations: DGF = Dispersed Gene Family, TS = trans-sialidase, MASP = Mucin-associated surface protein, GP63 = Surface protease, RHS = Retrotransposon Hot Spot protein, ABC = ABC Transporter, RBP = RNA Binding Protein.
  b. The combined number of base pairs assigned to this gene family in the assembly. Sequences were identified using RepeatMasker and a repeat library of coding sequences from the Tcc CLBR genome. These numbers include partial coding sequences. The number in parentheses is the percentage of total assembly size.
  c. The percentage of short reads that mapped to these features.
  d. SE = Significantly Enriched. Indicates which genome, T. c. marinkellei (Tcm) or T. c. cruzi Sylvio X10 (Tcc X10), contained significantly more of this gene family. Significance was determined from an empirical distribution of read depth differences between homologous regions of Tcm and Tcc X10, corrected for genome size. The empirical distribution was used to calculate a p-value.
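
The enrichment test described in the last footnote can be illustrated with a minimal Python sketch. It is only one possible reading of that footnote, not the procedure the authors actually used: the function names `normalized_difference` and `empirical_p_value` are hypothetical, dividing each depth by its genome-wide mean depth is an assumed stand-in for the genome-size correction, the two-sided test with an add-one correction is an assumption, and all numbers are invented for illustration.

```python
from typing import Sequence


def normalized_difference(depth_tcm: float, depth_tcc: float,
                          mean_tcm: float, mean_tcc: float) -> float:
    """Read depth difference after scaling each genome by its own mean depth.

    Assumption: scaling by the genome-wide mean depth stands in for the
    genome-size correction mentioned in the footnote.
    A positive value means relatively deeper coverage in Tcm.
    """
    return depth_tcm / mean_tcm - depth_tcc / mean_tcc


def empirical_p_value(observed: float, null_differences: Sequence[float]) -> float:
    """Two-sided empirical p-value of `observed` against a null sample.

    Counts null differences whose magnitude is at least that of the observed
    difference. The +1 terms keep the estimate from being exactly zero.
    """
    if not null_differences:
        raise ValueError("null_differences must not be empty")
    extreme = sum(1 for d in null_differences if abs(d) >= abs(observed))
    return (extreme + 1) / (len(null_differences) + 1)


# Invented values for illustration only; none are taken from the study.
# `null` plays the role of depth differences measured at homologous regions.
null = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
diff = normalized_difference(depth_tcm=90.0, depth_tcc=20.0,
                             mean_tcm=30.0, mean_tcc=20.0)
print(diff, empirical_p_value(diff, null))
```

In practice the null sample would hold many thousands of homologous-region differences, and a gene family would be called enriched in one genome when its difference falls far enough into the corresponding tail.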