The complete mitochondrial genomes for three Toxocara species of human and animal health significance

Background: Studying mitochondrial (mt) genomics has important implications for various fundamental areas, including mt biochemistry, physiology and molecular biology. In addition, mt genome sequences have provided useful markers for investigating population genetic structures, systematics and phylogenetics of organisms. Toxocara canis, Toxocara cati and Toxocara malaysiensis cause significant health problems in animals and humans. Despite their importance in human and animal health, no information on the mt genome of any Toxocara species is available. Results: The complete mt genomes are 14,322 bp for T. canis, 14,029 bp for T. cati and 14,266 bp for T. malaysiensis. These circular genomes are among the largest reported to date for secernentean nematodes, mainly because of an increased length of the AT-rich region. The mt genomes of the three Toxocara species each encode 12 proteins, two ribosomal RNAs and 22 transfer RNAs, but lack the ATP synthase subunit 8 gene, which is consistent with all other nematodes studied to date with the exception of Trichinella spiralis. All genes are transcribed in the same direction, and the genomes have a nucleotide composition high in A and T but low in G and C. The A+T contents of the complete genomes are 68.57% for T. canis, 69.95% for T. cati and 68.86% for T. malaysiensis; the value for T. canis is the lowest among all nematodes studied to date. The AT bias has a significant effect on both the codon usage pattern and the amino acid composition of the proteins.
The mt genome structures of the three Toxocara species, including genes and non-coding regions, are in the same order as in Ascaris suum and Anisakis simplex, and differ from those of Ancylostoma duodenale, Necator americanus and Caenorhabditis elegans only in the location of the AT-rich region, whereas there are substantial differences compared with Onchocerca volvulus, Dirofilaria immitis and Strongyloides stercoralis. Phylogenetic analyses based on concatenated amino acid sequences of the 12 protein-coding genes revealed that the newly described species T. malaysiensis is more closely related to T. cati than to T. canis, consistent with a previous study that used sequences of the nuclear internal transcribed spacers as genetic markers. Conclusion: The present study determined the complete mt genome sequences of three roundworms of human and animal health significance. These data provide mtDNA evidence for the validity of T. malaysiensis and a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.


Background
Mitochondria are sub-cellular organelles involved in oxidative phosphorylation, supplying energy to the cell. They play important roles in cellular metabolism, cell survival and apoptosis. Within these organelles, most metazoan species possess a compact, circular mitochondrial (mt) genome 14 to 20 kb in size [1]. The metazoan mt genome usually contains genes encoding 12-13 protein subunits of the enzymes involved in oxidative phosphorylation, 22 transfer RNAs and two ribosomal RNAs. Genes contain no introns, and intergenic spacer regions are absent or short [1,2]. Studying mt genomes has important implications for various fundamental areas, including mt biochemistry, physiology and molecular biology. In addition, mt genome sequences have provided useful markers for investigating population genetic structures, systematics and phylogenetics of organisms because of their maternal inheritance, higher mutation rates than nuclear genes and relatively conserved genome structures [3][4][5].
Although Nematoda is the second largest animal phylum, only just over 40 complete mitochondrial DNA sequences of nematode species have been deposited in GenBank™ to date [6][7][8][9][10][11][12][13][14][15]. In the order Ascaridida, the mt genomes of only two species have been sequenced [6,15]. This lack of mt genomic data for parasitic nematodes in this order is a major limitation for population genetic and phylogenetic studies of ascaridid pathogens, including species of the genus Toxocara.
Toxocara canis, Toxocara cati and Toxocara malaysiensis are common ascaridoid nematodes of dogs and cats, causing significant health problems. Infection of dogs with T. canis is common, with prevalences ranging from 5.5% to 64.7% [16][17][18]. Infection of cats with T. cati also occurs worldwide, with infection rates ranging from 25.2% to 66.2% [19][20][21]. More importantly, T. canis and T. cati are of public health significance because their larvae can invade humans and cause diseases such as ocular larva migrans (OLM), visceral larva migrans (VLM), eosinophilic meningoencephalitis (EME) and/or covert toxocariasis (CT) [22][23][24]. Despite their importance in human and animal health, no information is available on the mt genome of any Toxocara species.
The objectives of the present study were to fill some of these knowledge gaps by determining the structure, organization and sequence of the complete mt genomes of T. canis, T. cati and T. malaysiensis, which are of human and animal health significance, and to provide mtDNA evidence for the recently described species T. malaysiensis [25,26]. In addition, features of the mt genomes of the three ascaridoid nematodes, such as gene arrangement, structure, nucleotide composition, translation initiation and termination codons, and codon usage patterns, were compared with those of the other nematodes of the same order Ascaridida, namely Ascaris suum and Anisakis simplex. The phylogenetic relationships among these ascaridoid nematodes were also investigated using the amino acid sequences of the protein-coding genes.

Results and Discussion
General features of the mt genomes of the three Toxocara species
The complete mt genomes of T. canis, T. cati and T. malaysiensis are 14,322 bp, 14,029 bp and 14,266 bp in length, respectively. These sequences have been deposited in GenBank™ under the accession numbers AM411108 (T. canis), AM411622 (T. cati) and AM412316 (T. malaysiensis). All three mt genomes contain 12 protein-coding genes (cox1-3, nad1-6, nad4L, atp6 and cytb), 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene (Table 1). This gene content is typical of nematode mt genomes, with the exception of Trichinella spiralis, which encodes atp8. All genes are transcribed in the same direction, as in the other secernentean nematodes sequenced to date, in contrast to T. spiralis and Xiphinema americanus [8,13].
The mt gene arrangement of T. canis, T. cati and T. malaysiensis is the same as that of A. suum and A. simplex, and almost identical to those of the strongylid nematodes Ancylostoma duodenale, Necator americanus and Cooperia oncophora and the rhabditid nematode Caenorhabditis elegans, except for the relative position of the AT-rich region and the number of non-coding regions (NCRs). In the mt genomes of the three Toxocara species, the AT-rich region is located between trnS2 and trnN, whereas it is positioned between trnA and trnP in A. duodenale, N. americanus, Co. oncophora and C. elegans. Apart from the AT-rich region, the only NCR found in Toxocara spp. lies between nad4 and cox1, as in the aforementioned species. The second NCR, located between nad3 and nad5 in the hookworms A. duodenale and N. americanus, was not found in Toxocara spp. [9]. The genome structures of these Toxocara species differ substantially from those of Onchocerca volvulus, Strongyloides stercoralis, Dirofilaria immitis, X. americanus and T. spiralis in the location of the AT-rich region and of some tRNA and protein-coding genes. Comparison of the gene arrangements of the three Toxocara species with those of the other nine representative secernentean nematodes suggests that T. canis, T. cati and T. malaysiensis are more closely related to A. suum and A. simplex than to C. elegans, A. duodenale, N. americanus, Co. oncophora, O. volvulus, D. immitis and S. stercoralis.
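The comparison above amounts to asking whether two circular gene orders coincide once a single element (here the AT-rich region) is removed. The Python sketch below illustrates that check under stated assumptions: gene orders are lists of gene names, all genes lie on one strand (as in these secernentean genomes), so only rotations and not reversals need to be considered, and the gene lists are hypothetical, heavily simplified toy orders rather than the real genome maps.

```python
def rotate_to(order, start):
    """Rotate a circular gene order so that it begins with `start`."""
    i = order.index(start)
    return order[i:] + order[:i]


def same_except(order_a, order_b, element):
    """True if two circular, single-strand gene orders match once `element` is dropped."""
    a = [g for g in order_a if g != element]
    b = [g for g in order_b if g != element]
    if sorted(a) != sorted(b):
        return False  # different gene content, not just a rearrangement
    if not a:
        return True
    # Circular genome: compare after rotating both lists to a common start gene.
    return rotate_to(a, a[0]) == rotate_to(b, a[0])


# Hypothetical, heavily simplified gene orders (NOT the real genome maps):
# the AT-rich region ("AT") lies between trnS2 and trnN in the first
# and between trnA and trnP in the second.
toxocara_like = ["trnA", "trnP", "trnS2", "AT", "trnN"]
other_like = ["trnA", "AT", "trnP", "trnS2", "trnN"]

print(toxocara_like == other_like)                   # False
print(same_except(toxocara_like, other_like, "AT"))  # True
```

Because the orders are treated as circular, the same check works regardless of which gene a published map happens to list first.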
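The nucleotide compositions of the entire mtDNA sequences of T. canis, T. cati and T. malaysiensis are biased toward A and T, with A+T contents of 68.57%, 69.95% and 68.86%, respectively, and correspondingly low G and C contents; for T. malaysiensis, for example, G and C account for 22.0% and 9.1% of the genome (Table 2).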
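Base-composition figures such as these are simple percentages of each nucleotide over the total sequence length. A minimal Python sketch, assuming only unambiguous A/C/G/T bases are counted (other symbols are ignored); the input is a made-up toy sequence, not Toxocara mtDNA. Because only the four bases are counted, G+C is always 100 minus the A+T value.

```python
from collections import Counter


def base_composition(seq):
    """Percent A, T, G and C (plus A+T) over the unambiguous bases of a DNA sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ATGC")
    comp = {base: 100.0 * counts[base] / total for base in "ATGC"}
    comp["A+T"] = comp["A"] + comp["T"]
    return comp


toy = "ATTATAGTTTAGGATTCAAT"  # made-up sequence for illustration only
print(base_composition(toy))  # {'A': 35.0, 'T': 45.0, 'G': 15.0, 'C': 5.0, 'A+T': 80.0}
```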

Protein-coding genes and codon usage patterns
The boundaries of the protein-coding genes in the mt genomes of T. canis, T. cati and T. malaysiensis were determined by aligning their sequences with that of A. suum and by identifying translation initiation and termination codons by comparison with those of A. suum. In each of the three Toxocara species, the lengths of cox3, nad1, nad2, nad3 and nad4 are the same as in A. suum, whereas atp6, nad4L, nad5 and nad6 are shorter and cox2 and cytb are longer (Table 1). The length of cox1 in T. canis and T. cati is the same as in A. suum, but cox1 is longer in T. malaysiensis (Table 3).
The inferred nucleotide and amino acid sequences of each of the 12 mt proteins of T. canis, T. cati and T. malaysiensis were compared with those of A. suum and A. simplex. Nucleotide and amino acid sequence identities range from 71% to 90% and from 72.2% to 95.8%, respectively (Table 3). Based on identity, cox1 is the most conserved protein-coding gene and cytb the least conserved. For all 12 proteins, amino acid sequence identities are higher among the three Toxocara species than between any Toxocara species and A. suum or A. simplex. The nucleotide and amino acid sequence identities among the five ascaridoid species are also higher than those among C. elegans, A. duodenale, N. americanus and Co. oncophora (data not shown). These findings reinforce the conclusion that the three Toxocara species are genetically more closely related to A. suum and A. simplex than to C. elegans, A. duodenale, N. americanus and Co. oncophora.
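Percent identity values of this kind come from pairwise alignments. The sketch below shows one common convention, assuming identity is the number of identical residues divided by the number of aligned positions where neither sequence has a gap; the study does not state which denominator was used, so this choice, the function name and the toy fragments are all illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two equal-length aligned sequences.

    Assumption: columns where either sequence has a gap are excluded from
    the denominator; other conventions (e.g. dividing by alignment length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y
    return 100.0 * identical / compared


# Toy aligned protein fragments (hypothetical, not real cox1 or cytb data):
# 5 identical residues out of 7 gap-free columns.
print(round(percent_identity("MKLV-FGA", "MKIVAFGS"), 1))  # 71.4
```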
The predicted initiation and termination codons of the protein-coding genes of the three Toxocara species were compared with those of the ascaridoids A. suum and A. simplex and with selected species representing other nematode orders, including the human hookworm A. duodenale, the filarioid O. volvulus and the rhabditid S. stercoralis. The most common start codon in the three Toxocara species is TTG (four of 12 protein-coding genes), followed by ATT (three genes in T. canis and T. malaysiensis, two in T. cati) and ATA (two genes in T. canis and T. cati, one in T. malaysiensis); ATG, GTG, GTT and GTA are also used as initiation codons. GTG, which initiates cytb in T. canis and T. malaysiensis and cox3 in T. malaysiensis, is not used as a start codon in the other nematodes compared. GTA, which initiates cox2 in T. malaysiensis, is also used as the start codon of nad4L in D. immitis (data not shown).
Seven of the 12 protein-coding genes were predicted to end with a complete TAG or TAA termination codon. The remaining genes were inferred to end with an abbreviated stop codon, TA or T. In the three Toxocara species, the 3' end of most of these genes is immediately adjacent to a downstream tRNA gene (Table 1), consistent with the arrangement in the mt genomes of A. suum and A. simplex [6,15], but in contrast to C. elegans, in which both nad1 and nad3 end in T or TA and are followed directly by the putative ATT initiation codon of the downstream protein-coding gene. In Toxocara, nad6, which ends with the abbreviated stop codon TA, is followed directly by the putative ATT initiation codon of the downstream gene nad4L, as in C. elegans.
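Whether a gene ends with a complete or an abbreviated stop codon can be read from the length of its reading frame modulo three. A minimal Python sketch, assuming the invertebrate mitochondrial code used by nematodes (NCBI translation table 5, in which TAA and TAG are stops and TGA encodes Trp) and assuming, as is generally accepted for metazoan mt transcripts, that a terminal T or TA is completed to TAA by polyadenylation. The helper name and toy reading frames are hypothetical.

```python
# Stop codons of the invertebrate mitochondrial code (NCBI translation table 5);
# TGA is read as Trp in this code, so it is not a stop.
FULL_STOPS = {"TAA", "TAG"}


def classify_stop(cds):
    """Classify how an open reading frame ends.

    ('complete', codon)   : last codon is TAA or TAG
    ('abbreviated', tail) : frame ends with an incomplete 'TA' or 'T', assumed
                            to be completed to TAA by polyadenylation
    ('none', '')          : neither pattern found
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return ("complete", last) if last in FULL_STOPS else ("none", "")
    tail = cds[-remainder:]
    if tail in ("TA", "T"):
        return ("abbreviated", tail)
    return ("none", "")


# Toy reading frames (hypothetical sequences).
for orf in ["TTGAAATTTTAA", "ATTGGGCCCTA", "GTGAAAT", "ATGTGA"]:
    print(orf, classify_stop(orf))
```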
In general, nucleotides are not randomly distributed in metazoan mt genomes, and such nucleotide bias is often associated with unequal usage of synonymous codons. The nucleotide composition of nematode mt genomes is biased toward A and T. The A+T content of the protein-coding genes ranges from 63.2% to 74.9% across the three Toxocara species (Table 2). This AT bias affects both the codon usage pattern and the amino acid composition of the proteins. In the three species examined, all 64 possible codons were used; the most frequently used codon was TTT (Phe) and the least used was CGC (Arg). Nucleotide usage at the third codon position of the Toxocara mt protein-coding genes reflects the overall nucleotide composition of the mtDNA: T is the most frequently used and C the least frequently used at this position. Codons ending in G are more frequent than codons ending in A, as in A. suum, but in contrast to C. elegans, A. duodenale and N. americanus.
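The codon usage and third-position statistics described above reduce to counting codons in frame. A minimal Python sketch, assuming each input string is an in-frame coding sequence starting at its initiation codon; an incomplete trailing codon (an abbreviated stop such as TA) is dropped, while complete stop codons are counted. The toy genes are invented for illustration.

```python
from collections import Counter


def codon_usage(cds_list):
    """Count codons across in-frame coding sequences.

    A trailing incomplete codon (e.g. an abbreviated stop 'TA') is dropped;
    complete stop codons are counted like any other codon.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def third_position_freqs(codon_counts):
    """Percentage of A, T, G and C at the third codon position."""
    third = Counter()
    for codon, n in codon_counts.items():
        third[codon[2]] += n
    total = sum(third.values())
    return {base: 100.0 * third[base] / total for base in "ATGC"}


# Hypothetical toy genes, not Toxocara sequences.
genes = ["TTGTTTTTTATTGGGTAA", "ATATTTTTGCGCTA"]
usage = codon_usage(genes)
print(usage.most_common(1))         # [('TTT', 3)]
print(third_position_freqs(usage))  # {'A': 20.0, 'T': 40.0, 'G': 30.0, 'C': 10.0}
```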
The protein-coding genes of the three Toxocara genomes are biased toward amino acids encoded by T-, A- and G-rich codons (data not shown). The AT-rich codons encode Phe, Ile, Met, Tyr, Asn and Lys, and the GC-rich codons encode Pro, Ala, Arg and Gly. T-rich codons (with ≥2 Ts in a triplet) comprise Phe (13.4% TTT and 0.6% TTC), Leu (9.0% TTG, 3.8% TTA and 1.5% CTT), Ile (5.3% ATT), Val (6.2% GTT), Tyr (4.1% TAT), Ser (4.0% TCT) and Cys (1.5% TGT), and account for approximately half (49.6%) of the total amino acid composition. A- and G-rich codons (with ≥2 As and ≥2 Gs, respectively) represent 9.7% and 13.3% of the total amino acid composition, respectively (data not shown). In contrast, the proportion of C-rich codons (with ≥2 Cs) is much lower (4.0%). This bias against C is even more evident when only the third codon positions of the fourfold and twofold degenerate codon families are considered. Among synonymous codons within the AT-rich group, the frequency always decreased when the third position was a C; for instance, the relative frequencies of the Phe codons TTT and TTC are 13.4% and 0.6%, respectively. This result suggests that the third codon positions mostly reflect a mutational bias against C. Greater translational efficiency has also been considered a potential cause of codon usage bias [27].
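Assigning codons to T-, A-, G- or C-rich classes (≥2 copies of one nucleotide in the triplet) and summing their frequencies can be sketched as follows; the function names are hypothetical. Fed the ten T-rich codon frequencies quoted above, the sketch gives about 49.4%, close to the reported 49.6%; the small difference presumably reflects rounding of the individual values.

```python
def rich_class(codon, min_count=2):
    """Nucleotide that occurs at least `min_count` times in the codon, or None.

    With min_count=2 a triplet can qualify for at most one nucleotide,
    so each codon falls into at most one class (e.g. TTA is T-rich, ACG none).
    """
    codon = codon.upper()
    for base in "TAGC":
        if codon.count(base) >= min_count:
            return base
    return None


def class_shares(codon_freqs):
    """Sum codon frequencies (%) into T-, A-, G- and C-rich classes."""
    shares = {base: 0.0 for base in "TAGC"}
    for codon, pct in codon_freqs.items():
        base = rich_class(codon)
        if base is not None:
            shares[base] += pct
    return shares


# The ten T-rich codon frequencies (%) quoted in the text above.
t_rich = {"TTT": 13.4, "TTC": 0.6, "TTG": 9.0, "TTA": 3.8, "CTT": 1.5,
          "ATT": 5.3, "GTT": 6.2, "TAT": 4.1, "TCT": 4.0, "TGT": 1.5}
print(round(class_shares(t_rich)["T"], 1))
```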

Transfer RNA genes
Twenty-two tRNA genes (52 to 63 bases in length; see Table 1) were identified in the mt genomes of the three Toxocara species. Their putative secondary structures (not shown) are similar to those of other nematode mtDNAs [6,7,9-11,15], with the exception of T. spiralis [8], and differ from the conventional cloverleaf structures found in other metazoan mtDNA molecules. Common features of the predicted secondary structures of the 22 tRNA genes of T. canis, T. cati and T. malaysiensis include a 7 bp aminoacyl stem, a 4 bp DHU stem with a 4-8 nt loop, a 5 bp anticodon stem with a 7 nt loop (in which a T always precedes the anticodon and an A or G always follows it), and a TV-replacement loop of 6-12 nt, with some exceptions, in accordance with other nematodes. In particular, trnS1 (AGN) lacks the DHU arm.
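The anticodon-loop rule above (T before the anticodon, A or G after it) can be expressed as a simple check. A sketch, assuming the 7 nt loop is supplied 5' to 3' with the anticodon at loop positions 3-5, which corresponds to the conventional tRNA positions 32-38 with the anticodon at 34-36; the function name and example loops are hypothetical.

```python
PURINES = {"A", "G"}


def check_anticodon_loop(loop):
    """Return the anticodon if a 7 nt loop fits the pattern, else None.

    Assumes `loop` is written 5'->3' with the anticodon at positions 3-5
    (1-based); the base before the anticodon must be T and the base after
    it must be a purine (A or G).
    """
    loop = loop.upper()
    if len(loop) != 7:
        return None
    before, anticodon, after = loop[1], loop[2:5], loop[5]
    if before == "T" and after in PURINES:
        return anticodon
    return None


# Hypothetical loops: the first fits the pattern, the others do not.
for loop in ["CTTGCAA", "CTTGCCC", "CATGCAA"]:
    print(loop, check_anticodon_loop(loop))
```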

Ribosomal RNA genes
The rrnS and rrnL genes of the three roundworm species were identified by sequence comparison with those of A. suum. The rrnS gene is located between trnE and trnS (UCN), and the rrnL gene between trnH and nad3. The two rRNA genes are separated from each other by the protein-coding genes nad3, nad5, nad6 and nad4L.

Non-coding regions
As in A. suum and A. simplex, the longest non-coding region (the AT-rich region) in the three Toxocara mt genomes is located between trnS2 and trnN. It is 985 bp (T. canis), 711 bp (T. cati) and 936 bp (T. malaysiensis) long, with A+T contents of 79.4%, 81.3% and 78.4%, respectively, which are markedly lower than those of the comparable NCRs of the nematodes studied to date (Table 2). The repeated sequence motifs (CR1-CR6) present in the C. elegans AT-rich region [6] are not found in Toxocara spp. However, the AT-rich region of the Toxocara mt genomes contains many AT dinucleotide repeats, the longest of which spans 34 bp. Similar AT dinucleotide repeats have been found in A. suum [6]. The function of these AT repeats is currently unknown [6,7]. Although nothing is yet known about the replication process(es) of the mtDNA of parasitic nematodes, the high A+T content and the predicted structure of the AT-rich non-coding region suggest an involvement in the initiation of replication [28].
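Locating the longest AT dinucleotide repeat is a simple pattern search. A minimal Python sketch using the standard re module, assuming a repeat is an uninterrupted run of whole AT (or TA) units; the function name and toy sequence are hypothetical, and the study does not describe how its repeats were identified.

```python
import re


def longest_at_repeat(seq):
    """Length (bp) of the longest uninterrupted AT dinucleotide repeat.

    Both phases are scanned ('ATAT...' and 'TATA...') and only whole
    dinucleotide units are counted, so a trailing unpaired base is ignored.
    """
    seq = seq.upper()
    best = 0
    for pattern in (r"(?:AT)+", r"(?:TA)+"):
        for match in re.finditer(pattern, seq):
            best = max(best, len(match.group()))
    return best


# Toy sequence (hypothetical): an 8 bp ATATATAT run and a 6 bp TATATA run.
print(longest_at_repeat("GCATATATATCGTATATAGG"))  # 8
```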
In the three roundworm species, the second longest non-coding region is located between cox1 and nad4, as in the mt genome of A. suum [6]. It is 111 bp (T. canis), 116 bp (T. cati) and 112 bp (T. malaysiensis) long, with A+T contents of 86.2%, 74.1% and 74.1%, respectively, and is shorter than that of A. suum (117 bp). In all three Toxocara species, this non-coding region could form a hairpin (stem-loop) structure (AATTTTTAAAAATT).
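The motif AATTTTTAAAAATT is its own reverse complement, which is what allows its two ends to pair into a stem. A sketch that tests for this, assuming strict Watson-Crick pairing (no G-U pairs or mismatches) and a minimum loop of 3 nt; these thresholds and the helper names are illustrative assumptions, not a secondary-structure prediction method. For the motif it finds a 5 bp stem (AATTT paired with AAATT) closing a TTAA loop.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_stem(seq, min_loop=3):
    """Largest k such that the first k bases pair with the last k bases.

    Strict Watson-Crick pairing, no mismatches, and at least `min_loop`
    unpaired bases left in the loop. Returns (stem_length, loop_sequence).
    """
    seq = seq.upper()
    for k in range((len(seq) - min_loop) // 2, 0, -1):
        if seq[:k] == reverse_complement(seq[-k:]):
            return k, seq[k:len(seq) - k]
    return 0, seq


motif = "AATTTTTAAAAATT"
print(reverse_complement(motif) == motif)  # True
print(longest_stem(motif))                 # (5, 'TTAA')
```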

Phylogenetic analyses
The final alignment of the amino acid sequences of the 12 proteins for the six taxa (T. canis, T. cati, T. malaysiensis, A. suum, A. simplex and O. volvulus) comprised 3516 characters (2079 variable, 339 parsimony-informative). In all three phylogenetic analyses, the three Toxocara species clustered together (Fig. 1). T. malaysiensis, the recently described Toxocara species from cats [26], was inferred to be the sister species of T. cati with high bootstrap support. This result is consistent with that of a previous study [25] that used sequences of the internal transcribed spacers of nuclear ribosomal DNA, and thus provides mtDNA evidence for the validity of T. malaysiensis as an ascaridoid of cats: T. malaysiensis is more closely related to T. cati, the common ascaridoid of cats, than to T. canis, the common ascaridoid of canids.
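Counts of variable and parsimony-informative characters follow standard definitions: a column is variable if it holds more than one state, and parsimony-informative if at least two states each occur in at least two taxa. A minimal Python sketch, assuming gaps and unknown residues are treated as missing data; the taxon labels echo the study but the six-residue sequences are invented.

```python
from collections import Counter


def site_stats(alignment, missing="-?X"):
    """Return (length, variable, parsimony_informative) for an aligned dict."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    variable = informative = 0
    for column in zip(*seqs):
        # Treat gaps/unknowns as missing data (an assumption).
        states = Counter(c for c in column if c not in missing)
        if len(states) > 1:
            variable += 1
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1
    return length, variable, informative


# Invented toy alignment; the last column is variable but not informative.
aln = {
    "T_canis": "MKLVAF",
    "T_cati": "MKLIAF",
    "T_malaysiensis": "MKLIAF",
    "A_suum": "MRLVSF",
    "A_simplex": "MRIVSF",
    "O_volvulus": "MTIVGY",
}
print(site_stats(aln))  # (6, 5, 4)
```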
The Toxocara species were resolved as more closely related to A. suum than to A. simplex, with moderate support, consistent with the results of previous morphological and molecular studies [29,30]. However, the relationship between A. suum and A. simplex was poorly resolved in the MP and ML analyses (Fig. 1).

Conclusion
Toxocara species are parasites of considerable socio-economic importance because of their significant impact on human health. The mt genomes determined here for the three roundworms T. canis, T. cati and T. malaysiensis add mtDNA data for the order Ascaridida, which includes a broad range of parasites of major socio-economic importance. These complete mt genome sequences for three Toxocara species of human and animal health significance provide a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.

Methods
Parasites and DNA extraction
The ascaridoid nematodes used in the present study were collected in Zhanjiang (T. canis), Changsha (T. cati) and Guangzhou (T. malaysiensis), China. Adult nematodes of the three Toxocara species were obtained from the intestines of dogs and/or cats, washed in physiological saline, identified to species primarily on the basis of morphological characters, fixed in 70% (v/v) ethanol and stored at -20°C until use. Total DNA was isolated from individual nematodes by sodium dodecyl sulphate/proteinase K treatment, followed by spin-column purification (Wizard Clean-Up, Promega). The identity of each individual nematode was verified by species-specific PCR amplification, using the sequences of the first and/or second internal transcribed spacers (ITS-1 and/or ITS-2) of ribosomal DNA (rDNA) as species-specific genetic markers [31,32].

Long-PCR amplification and sequencing
Using the primer sets 5F/40R (~9 kb region) and 39F/42R (~6 kb region) [33], the entire mt genome of each Toxocara species was amplified by long-PCR in two overlapping fragments from approximately 20 ng of total genomic DNA purified from an individual nematode. The primers were designed to mt sequences found to be relatively conserved between A. suum and C. elegans. Primer 5F (forward, 5'-TATGAGCGTCATTTATTGGG-3') and its complementary primer 42R (reverse, 5'-CCCAATAAATGACGCTCATA-3') were designed to the nad1 gene, while primers 39F (forward, 5'-TAAATGGCAGTCATTAGCGTGA-3') and 40R (reverse, 5'-GAATTAAACTAATATCACGT-3') were designed to the rrnL gene [33]. The long-PCR cycling conditions were an initial denaturation at 92°C for 2 min; 10 cycles of 92°C for 10 s (denaturation), 50°C for 30 s (annealing) and 60°C (6 kb fragment) or 68°C (9 kb fragment) for 10 min (extension); 20 cycles of 92°C for 10 s, 50°C for 30 s and 60°C (6 kb) or 68°C (9 kb) for 10 min, with the extension lengthened by 10 s in each successive cycle; and a final extension at 60°C or 68°C for 7 min. Each PCR reaction yielded a single band detected in a 0.8% (w/v) agarose gel upon ethidium bromide staining (not shown). PCR products were sent to TaKaRa Company (Dalian, China) for sequencing using a primer-walking strategy, and the following primers were used for sequencing the complete genomes of the three Toxocara species: MW1F
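Because 42R is described as complementary to 5F, the two printed sequences (with line-break hyphens removed) should be exact reverse complements of each other. A quick Python check, assuming unambiguous A/C/G/T bases only; the helper name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


primer_5F = "TATGAGCGTCATTTATTGGG"   # forward, nad1
primer_42R = "CCCAATAAATGACGCTCATA"  # reverse, nad1

print(reverse_complement(primer_5F) == primer_42R)  # True
```

The same check fails for 39F and 40R, which is expected: they are a forward/reverse pair within rrnL rather than complements of each other.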

Sequence analyses
Sequences were assembled manually and aligned against the complete mt genome sequence of A. suum (GenBank™ accession number NC_001327) using the program Clustal X to identify gene boundaries. The open reading frames and codon usage profiles of the protein-coding genes were analyzed using the program MacVector 4.1.4 (Kodak, version 4.0). Translation initiation and termination codons were identified by comparison with those reported previously for A. suum. The amino acid sequences inferred for the mt genes of the three ascaridoids were aligned with those of A. simplex (GenBank™ accession number AY994157) and A. suum using Clustal X. Amino acid identity (%) was calculated for homologous genes from pairwise alignments. Codon usage was examined using the relationships between the nucleotide composition of codon families and amino acid occurrence, partitioning the codons into AT-rich codons (those that are AT-rich at the first two codon positions), GC-rich codons (those that are GC-rich at the first two codon positions) and unbiased codons. Putative secondary structures of the 22 tRNA genes were identified using tRNAscan-SE [34], or by visually recognizing potential secondary structures and anticodon sequences after aligning the sequences with those of A. simplex and A. suum.
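The codon partition described above depends only on the first two codon positions. A minimal Python sketch (function and dictionary names are hypothetical) that assigns a codon to the AT-rich, GC-rich or unbiased group, and confirms that the four GC-rich first-two-position pairs correspond to the Pro, Ala, Arg and Gly codon families mentioned in the Results:

```python
def codon_group(codon):
    """Partition a codon by the composition of its first two positions.

    'AT-rich' if both bases are A/T, 'GC-rich' if both are G/C,
    otherwise 'unbiased'. The third position is ignored.
    """
    at = sum(base in "AT" for base in codon.upper()[:2])
    if at == 2:
        return "AT-rich"
    if at == 0:
        return "GC-rich"
    return "unbiased"


# Fourfold-degenerate families whose first two positions are both G/C
# (the same in the standard and invertebrate mitochondrial codes).
GC_FAMILIES = {"CC": "Pro", "GC": "Ala", "CG": "Arg", "GG": "Gly"}

gc_pairs = [a + b for a in "GC" for b in "GC"]
print(sorted(GC_FAMILIES[p] for p in gc_pairs))  # ['Ala', 'Arg', 'Gly', 'Pro']
for codon in ["TTT", "ATA", "CCA", "TCT"]:
    print(codon, codon_group(codon))
```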

Phylogenetic analyses
Phylogenetic analyses were performed on the amino acid sequences of the 12 protein-coding genes, using the five ascaridoid species (T. canis, T. cati, T. malaysiensis, A. suum and A. simplex) as the ingroup and one filarioid species, O. volvulus (GenBank™ accession number AF015193), as the outgroup. The amino acid sequences of each gene were aligned individually using Clustal X with default settings and then concatenated into a single alignment for phylogenetic analysis. Standard unweighted maximum parsimony (MP) analysis was performed in PAUP* 4.0b10 [35] using heuristic searches with tree-bisection-reconnection branch swapping and 1000 random-addition sequence replicates, with 10 trees held at each step. The Dayhoff matrix model was used for the neighbour-joining (NJ) analysis, implemented in MEGA 3.1 [36], and for the maximum likelihood (ML) analysis, implemented in PhyML 2.1 [37]. Branch support was estimated by bootstrap analysis with 1000 replicates for the NJ and MP trees and 100 replicates for the ML tree.
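The concatenation step, and the nonparametric bootstrap that underlies the branch-support values, can be sketched in a few lines of Python. This only illustrates the principle: PAUP*, MEGA and PhyML perform the resampling internally, and the function names and toy alignments here are hypothetical.

```python
import random


def concatenate(gene_alignments):
    """Join per-gene alignments (dicts taxon -> aligned sequence) into one supermatrix.

    Every taxon must be present in every gene alignment, and sequences
    within a gene alignment must have equal length.
    """
    taxa = sorted(gene_alignments[0])
    supermatrix = {t: "" for t in taxa}
    for aln in gene_alignments:
        if sorted(aln) != taxa:
            raise ValueError("taxon sets differ between genes")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within a gene alignment differ in length")
        for t in taxa:
            supermatrix[t] += aln[t]
    return supermatrix


def bootstrap_replicate(matrix, rng):
    """One nonparametric bootstrap pseudo-replicate: resample columns with replacement."""
    taxa = list(matrix)
    length = len(matrix[taxa[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    # The same resampled columns are applied to every taxon.
    return {t: "".join(matrix[t][i] for i in cols) for t in taxa}


# Toy per-gene alignments for two hypothetical taxa.
gene1 = {"A": "MK", "B": "MR"}
gene2 = {"A": "LVF", "B": "IVF"}
sm = concatenate([gene1, gene2])
print(sm)  # {'A': 'MKLVF', 'B': 'MRIVF'}
print(bootstrap_replicate(sm, random.Random(1)))
```

Each pseudo-replicate would then be analysed like the original alignment, and the bootstrap value of a branch is the percentage of replicate trees that contain it.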