Frequent tRNA gene translocation towards the boundaries with control regions contributes to the highly dynamic mitochondrial genome organization of the parasitic lice of mammals

Background The typical single-chromosome mitochondrial (mt) genome of animals has fragmented into multiple minichromosomes in the lineage Mitodivisia, which contains most of the parasitic lice of eutherian mammals. These parasitic lice differ from each other even among congeneric species in mt karyotype, i.e. the number of minichromosomes, and the gene content and gene order in each minichromosome, which is in stark contrast to the extremely conserved single-chromosome mt genomes across most animal lineages. How fragmented mt genomes evolved is still poorly understood. We use Polyplax sucking lice as a model to investigate how tRNA gene translocation shapes the dynamic mt karyotypes. Results We sequenced the full mt genome of the Asian grey shrew louse, Polyplax reclinata. We then inferred the ancestral mt karyotype for Polyplax lice and compared it with the mt karyotypes of the three Polyplax species sequenced to date. We found that tRNA genes were entirely responsible for mt karyotype variation among these three species of Polyplax lice. Furthermore, tRNA gene translocation observed in Polyplax lice was only between different types of minichromosomes and towards the boundaries with the control region. A similar pattern of tRNA gene translocation can also been seen in other sucking lice with fragmented mt genomes. Conclusions We conclude that inter-minichromosomal tRNA gene translocation orientated towards the boundaries with the control region is a major contributing factor to the highly dynamic mitochondrial genome organization in the parasitic lice of mammals. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07859-w.

Why is the fragmented mt genome organization so variable in these parasitic lice? A few comparative studies to date have suggested four models or mechanisms. First, Shao et al., (2012) [2] showed that point mutation at the third anti-codon position or homologous recombination between tRNA genes accounted for the swap of identity and location between trnL 1 and trnL 2 and between trnR and trnG in the human lice. Dong et al., (2014a) [4] confirmed that this mechanism accounted for the swap of identity and location between trnL 1 and trnL 2 in the spiny rat louse, Polyplax spinulosa. Second, Shao et al., (2012) [2] showed that one minichromosome could split into two via gene degeneration followed by deletion, based on analysis of pseudo gene sequences of the human pubic louse. Third, Song et al., (2014) [6] showed that inter-minichromosomal recombination accounted for gene translocation in the horse louse, Haematopinus asini, and mt karyotype variation among Haematopinus lice. Fourth, Shao et al., (2017) [9] showed that both split and merger of minichromosomes had occurred and contributed to the variation in mt karyotypes in sucking lice.
In the present study, we used Polyplax sucking lice, which parasitize rodents and shrews, as a model to investigate the role of tRNA gene translocation in shaping the dynamic mt karyotypes. We show that frequent directional tRNA gene translocation between different types of minichromosomes is a major contributor to the highly dynamic mt genome organization of the parasitic lice of mammals.

Materials and methods
Louse collection, DNA extraction, mitochondrial genome amplification and sequencing Specimens of Polyplax reclinata (family Polyplacidae) were collected in Yunnan Province, China, from the Asian grey shrew, Crocidura attenuata (family Soricidae). Polyplax reclinata was identified according to Chin (1999) [24]; voucher specimens (# 364) were kept at the Institute of Pathogens and Vectors, Dali University (Additional file 1). Sucking lice collected on the body surface of Crocidura attenuata were preserved in 95 % ethanol at − 20°C prior to DNA extraction. Capture of shrews was approved by health authorities in Yunnan Province, China. Animal capture protocols and procedures were approved by the animal ethics committees at Dali University. Genomic DNA was extracted from individual lice with DNeasy Blood & Tissue kit (QIAGEN). Two pairs of primers, 12SA-12SB [25] and 16SF-Lx16SR [9], were used to amplify fragments of mt genes rrnS (375 bp) and rrnL (360 bp) (Additional file 2). These primers target conserved gene sequence motifs among arthropods. The two gene fragments were sequenced directly using Sanger method at the Thermo Fisher Scientific Genome Sequencing Facility (Guangzhou). Two pairs of specific primers for P. reclinata, 12S364F-12S364R and 16S364F-16S364R, were designed from rrnS and rrnL sequences obtained (Additional file 2). The two specific primers in each pair go outwards with 15 and 21 bp in between, respectively. PCRs with these specific primers amplified the near fulllength rrnS and rrnL minichromosomes of P. reclinata (Additional file 3A). The amplicons from rrnS and rrnL minichromosomes, 2.3 and 2.6 kb in size respectively, were sequenced using Sanger method to obtain the sequences of non-coding regions. Another pair of primers specific to P. reclinata, 364F-364R (Additional file 2), was designed from conserved sequences of the noncoding regions that flank the coding regions of the rrnS and rrnL minichromosomes. The PCR with the primer pair 364F-364R produced a mixture of amplicons ranging from 1.1 to 1.8 kb in size, expected from the coding regions of the entire set of mt minichromosomes of P. reclinata (Additional file 3A). These amplicons were sequenced from both ends (i.e. paired-end) with Illumina Miseq platform: insert size 400 bp, read length 250 bp. The PCR strategy used in this study was developed from our observations in previous studies that each mt minichromosome has a distinct coding region but also a well-conserved non-coding region [1][2][3][4][5][6]9].
Takara Ex Taq was used in the initial short PCRs with the following cycling conditions: 94°C for 1 min; 35 cycles of 98°C for 10 s, 45°C for 30 s, 72°C for 1 min; and a final extension of 72°C for 2 min. Takara LA Taq was used in the long PCRs with the cycling conditions: 94°C for 1 min; 35 cycles of 98°C for 10 s, 55-65°C (depending on primers) for 40 s, 68°C for 4 min; and a final extension of 72°C for 8 min. Negative controls were executed with each PCR experiment. PCR amplicons were checked by agarose gel (1 %) electrophoresis; the sizes of amplicons were estimated by comparing with DNA markers. PCR amplicons were purified with Wizard SV Gel/PCR clean-up system (Promega). Sanger sequencing was done at the Thermo Fisher Scientific Genome Sequencing Facility (Guangzhou). Illumina sequencing was done at the Majorbio Genome Sequencing Facility (Shanghai).

Assembly of sequence reads, gene identification and minichromosome verification
Illumina sequence reads (250 bp each) were assembled into contigs with Geneious 11.1.5 [26]. The assembly parameters were minimum overlap 150 bp and minimum identity 98 %. tRNA genes were identified using tRNAscan-SE [27] and ARWEN [28]. Protein-coding genes and rRNA genes were identified by BLAST searches of GenBank [29,30]. Sequences were aligned with Clustal X [31]. The size and circular organization of each mt minichromosome of P. reclinata were verified by PCR using outbound specific primers designed from the coding region of each minichromosome (Additional file 2). The forward primer and reverse primer in each pair were next to each other with a small gap in between. PCRs with these primers amplified each circular minichromosome in full or near full length (Additional file 3B); these amplicons were also sequenced with Illumina Miseq platform to obtain the full-length sequence of the non-coding region of each minichromosome. PCR set-up, cycling conditions, agarose gel electrophoresis and size measurement were the same as described above. Negative controls were run for each PCR test. The annotated mt genome sequence of P. reclinata was deposited in GenBank (accession numbers MW291451-MW291461).

Phylogenetic analyses
We retrieved the mt genome sequences of all of the 14 species of sucking lice available in GenBank and the elephant louse (Table 1) and combined these sequences with that of P. reclinata generated in the present study. The elephant louse was used as the outgroup because it was in the suborder Rhynchophthirina, which was most closely related to sucking lice, the suborder Anoplura [10]. The sequences of eight mt protein-coding genes (atp6, atp8, cox1, cox2, cox3, cob, nad4L, nad6) and two rRNA genes (rrnS and rrnL) of these lice were used in phylogenetic analysis. Other mt protein-coding genes (nad1, nad2, nad3, nad4, and nad5) were excluded from our analysis because their sequences were not available for all of the species above. Protein-coding gene sequences were aligned based on amino acid sequences using the MAFFT algorithm implemented in Transla-torX online platform [32]. rRNA genes were aligned using the MAFFT v7.0 online server with G-INS-i strategy [33]. Individual gene alignments were concatenated after removing poorly aligned sites using GBlocks v0.91b [34]. A concatenated alignment, PCGRNA matrix, was used in subsequent analyses; this matrix combines the sequences of the eight protein-coding genes and the two rRNA genes (4,631 bp in total). The matrix was analyzed using maximum likelihood (ML) and Bayesian methods with MEGA-X [35] and 2) Bayesian inference method (BI) with MrBayes 3.2.6 [36], respectively. For ML analysis, the number of bootstrap replicates was 500; the substitution model selected was Tamura-Nei and the heuristic method selected was Nearest-Neighbor-Interchange (NNI). For Bayesian analyses, four independent Markov chains were run for 25,000 MCMC generations, sampling a tree every 10 generations. Bayesian analysis was run until the average standard deviation of split frequencies was below 0.001. The initial 10,000 trees of each MCMC run were discarded as burn-in. ML trees were directly generated in MEGA-X and the Bayesian trees were drawn with Figtree v1.4.3.

Inference of the ancestral mitochondrial karyotype of Polyplax lice
We used a parsimony method described in Shao et al., (2017) [9] to infer the ancestral mt karyotype of Polyplax lice based on information from the complete mt genomes of three Polyplax species: P. reclinata (present study), P. asiatica [4] and P. spinulosa [4]. We mapped characters of mt karyotypes on the phylogenetic tree of sucking lice inferred from mt genome sequences to identify shared characters. We inferred an mt minichromosome character to be ancestral to Polyplax: (1) if it was present in at least one Polyplax species and also in  (2) if it was present in all of the three Polyplax species. We considered tRNA genes separately from protein-coding and ribosome RNA genes because protein-coding and rRNA genes were much more stable than tRNA genes [9] (Figs. 1 and 2). For each inferred ancestral minichromosome character of Polyplax lice, we counted the changes needed to explain the observed mt karyotypes of Polyplax lice. If two or more minichromosome characters conflicted with one another, the character with the least required changes was inferred to be the ancestral. If two or more characters are equally parsimonious, all characters are inferred to be equally likely to be ancestral (Figs. 1 and 2).

Results and discussion
Mitochondrial genome of the Asian grey shrew louse,

Polyplax reclinata
We obtained the complete mt genome of P. reclinata by assembling the 53,915,414 clean Illumina sequence reads (paired-end, 250 bp each) generated from the minichromosome amplicons ( Table 2). All of the 37 mt genes  Note: a underlined nad1 is opposite to other genes in the orientation of transcription typical of bilateral animals were identified in P. reclinata; these genes were on 11 circular minichromosomes (Fig.  3). The mt minichromosomes of P. reclinata range from 2,332 to 2,869 bp in size (Table 2); each minichromosome has a coding region and a non-coding region ( Fig.  3; Table 2). The coding region of each minichromosome has 2-6 genes and ranges from 847 bp for atp8-atp6 minichromosome to 1,809 bp for H-nad5-F minichromosome (minichromosomes named after the genes they contain). Seven of the 11 minichromosomes of P. reclinata contain a single protein-coding or rRNA gene each; the other four minichromosomes contain two protein-coding genes each. There are 1-4 tRNA genes in each minichromosome except atp8-atp6 minichromosome, which has no tRNA genes (Fig. 3). Each of the 37 mt genes in P. reclinata is found only in one minichromosome; all genes have the same orientation of transcription relative to the non-coding region except for nad1, which has an opposite orientation to other genes ( Fig. 3).
We sequenced the non-coding regions of all of the 11 minichromosomes of P. reclinata in full length. The non-coding regions range from 1,060 to 1,578 bp (Table 2); the size variation is due to the additional sequences in the middle of the non-coding regions of M-L 1 -rrnL-V, S 1 -S 2 -rrnS-C, K-nad4, atp8-atp6, Q-nad2-N and E-cob-I minichromosomes (Additional file 4). If the additional sequences are excluded, the noncoding regions of all 11 minichromosomes have high sequence similarity to each other. Overall, the noncoding regions have high C and G content in one end but high A and T content in the other end. Indeed, a CG-rich motif (54 bp, 59 % C and G) was found immediately downstream from the 3'-end of the coding regions and an AT-rich motif (113 bp, 68 % A and T) was found upstream from the 5'-end of the coding regions in all of the minichromosomes (Additional file 4).

Mitochondrial karyotype variation among Polyplax species
Polyplax reclinata differs in mt karyotype from the two other Polyplax species reported previously [4], although these three species all have 11 minichromosomes.
Polyplax reclinata shares nine minichromosomes with P. spinulosa (louse of the Asian house rat) [4] but differs in the distribution of trnL 1 and trnL 2 between the other two minichromosomes (Table 3). In P. spinulosa, trnL 1 is downstream from cox1 and trnL 2 is upstream from rrnL [4]; in P. reclinata, however, it is the opposite (Fig. 3; Table 3). Polyplax reclinata shares only four minichromosomes with P. asiatica (louse of the greater bandicoot rat) [4] and differs in the distribution of six tRNA genes among the other seven minichromosomes: trnA, trnF, trnP, trnS 1 , trnS 2 and trnT (Table 3). If tRNA genes are excluded, the three Polyplax species are identical to one another in the distribution of protein-coding genes and rRNA genes among the 11 minichromosomes (Table 3; Fig. 3).

Phylogeny of sucking lice inferred from mt genome sequences
To assist the inference of the ancestral mt karyotype of Polyplax lice, we inferred the phylogeny of sucking lice using mt genome sequences of 15 species of sucking lice of seven genera including Polyplax reclinata, which was sequenced in the present study. The elephant louse, which was in the suborder Rhynchophthirina, was used as the outgroup (Table 1). Rhynchophthirina is the sister  suborder to the sucking lice (suborder Anoplura) [10]. We obtained two trees from PCGRNA matrix (4,631 bp of protein-coding and rRNA gene sequences) with ML and Bayesian methods. These two trees have similar topologies and consistently support the monophyly of the genus Polyplax (Fig. 4): bootstrap support value (BSV) = 100 %, posterior probability (PP) = 1. Within the Polyplax, P. reclinata is more closely related to P. spinulosa than to P. asiatica (BSV = 100 %, PP = 1), despite the fact that the latter two species are found mostly on rodents whereas P. reclinata is found on shrews [37,38]. The other four genera, Hoplopleura, Pediculus, Pedicinus and Haematopinus, also had multiple species from each genus included in our analysis; each of these genera was monophyletic with strong support (BSV = 100 %, PP = 1) (Fig. 4).  (Fig. 4). Each of the genera included in our analysis also represents a family ( Table 1)

Inferred ancestral mitochondrial karyotype of Polyplax lice
We inferred the ancestral mt karyotype of Polyplax lice based on the data available for the three Polyplax species and the other 12 species of sucking lice ( Table 1; Fig. 5). We mapped mt karyotype characters on the phylogeny inferred from the mt genome sequences of these lice to identify shared characters ( Figs. 1 and 2). The inferred ancestral mt karyotype of Polyplax lice by parsimony consists of 11 minichromosomes; each minichromosome has a coding region with

Mitochondrial tRNA gene translocation in Polyplax lice
When compared with the inferred ancestral mt karyotype of Polyplax lice (Fig. 5), all three Polyplax species sequenced to date retained 11 minichromosomes and the same distribution pattern of protein-coding and rRNA genes among the minichromosomes (Table 3). Therefore, no split nor merger of minichromosomes have occurred in any of these Polyplax species since their most recent common ancestor (MRCA) (Fig. 4). However, these three Polyplax species are different from one another and from their MRCA in the position of tRNA genes. In total, eight tRNA genes have changed their positions in the three Polyplax lice since they diverged from their MRCA: trnA, trnF, trnL 1 , trnL 2 , trnP, trnS 1 , trnS 2 and trnT (Table 3). trnA and trnT were likely translocated (trnT also inverted) in the MRCA of P. reclinata and P. spinulosa (Fig. 4) and thus are shared by these two species (Table 3). A pseudo trnT can be found in P. reclinata (Fig. 3) and P. spinulosa [4], indicating the translocation of trnT occurred relatively recently. trnL 1 and trnL 2 swapped their positions in P. spinulosa [4], most likely caused by point mutations at the third anti-codon position (or by homologous recombination between trnL 1 and trnL 2 )a mechanism suggested by previous studies [2,9]. trnF and trnP were translocated in P. asiatica (Table 3); as a cluster, trnS 1 -trnS 2 was translocated either in P. asiatica or equally likely in the MRCA of P. reclinata and P. spinulosa (Table 3; Fig. 5).
tRNA gene translocation is frequent and directional towards the boundaries with control region in parasitic lice with fragmented mitochondrial genomes We noted in the Polyplax lice above that mt tRNA gene translocation was closely associated with the control region, i.e. the non-coding region (NCR). With no exception, the destination of all of the translocated tRNA genes (i.e. trnA, trnF, trnS 1 -trnS 2 , trnP and trnT excluding trnL 1 and trnL 2 which swapped positions in P. spinulosa as discussed above) were either immediately next to the NCR or in the case of trnP of P. asiatica, a short distance (189 bp) inside the NCR (Table 3) [4], indicating a possible role of NCR in tRNA gene translocation. The observed probability of tRNA gene translocation to be adjacent to NCR in these Polyplax species is significantly higher (p < 0.00001) than the expected probability (Table 4). We investigated further if this was a broader pattern for parasitic lice with fragmented mt genomes by comparing the mt karyotypes: (1) between the MRCA of sucking lice [9] and the MRCA of Polyplax (Fig. 5); and (2) between the MRCA of sucking lice and each of the 12 non-Polyplax sucking louse species that have been sequenced to date [9,12]. The MRCA of Polyplax species retained the same distribution pattern of protein-coding and rRNA genes and the same number of minichromosomes as the MRCA of sucking lice. However, five tRNA genes have translocated in the MRCA of Polyplax relative to the MRCA of sucking lice: all of these tRNA genes were translocated immediately next to the NCR despite alternative intergenic locations available for translocation ( Table 5). The observed probability of tRNA gene translocation to be adjacent to NCR in the MRCA of Polyplax species is significantly higher (p < 0.00001) than the expected probability (Table 6). When compared with the MRCA of sucking lice, 19 of the 22 tRNA genes in total have translocated in the 12 species of non-Polyplax sucking lice: 32 out of the 54 (i.e. 59 %) translocation events resulted in tRNA genes moved to be adjacent to NCR despite alternative intergenic locations available for translocation ( Table 7). The observed probability of tRNA gene translocation to be adjacent to NCR in these non-Polyplax species is 13.6 % higher than the expected probability but this difference is not statistically significant (p = 0.116709) ( Table 8). Most likely tRNA gene translocation to be adjacent to NCR has been under-captured in non-Polyplax species (Table 7) due to the fact that any newly translocated tRNA gene adjacent to a NCR would inevitably push the previously translocated tRNA gene away from the NCR. After all, there can only be one gene at any time adjacent to each end of the NCR. Among the three comparisons above, the third comparison has the longest evolutionary time frame. The longer the evolutionary time is, the more likely tRNA gene translocation information to be adjacent to NCR will be under-captured.
Clearly, the boundaries between coding regions and NCRs are hot spots for tRNA gene translocation in parasitic lice that have fragmented mt genomes. Of the models and mechanisms proposed in previous studies (introduced above in the Background) [2,4,6,9,10], only the inter-minichromosomal recombination model proposed by Song et al., [6] based on the mt karyotype variation among Haematopinus lice can explain tRNA gene translocation between mt minichromosomes. However, it cannot explain why translocated tRNA genes tend to be in the boundaries between coding regions and NCRs. As DNA double-strand breaks (DSBs) are apparently necessary for gene translocation between mt minichromosomes, the boundary regions may have more frequent DSBs than other regions of the mt minichromosomes, possibly due to the specific AT-rich and GCrich sequence elements in the boundary regions; these sequence elements are conserved among all mt minichromosomes of each species of parasitic louse (Additional file 4). Apparently, DSBs do not occur randomly. Tubbs et al. showed that in mouse and human cells, DSBs occur mostly in AT-rich regions, in particular at

Yes Yes
Notes: Refer to Table 3 for tRNA gene translocation in Polyplax species. "Expected probability of translocation to be adjacent to NCR" = "total sites adjacent to NCR" / "total sites adjacent to NCR + total intergenic sites". "Expected probability of translocation to an intergenic region" = "total intergenic sites" / "total sites adjacent to NCR + total intergenic sites". "Observed probability of translocation to be adjacent to NCR" = "translocation adjacent to NCR" / "translocation adjacent to NCR + translocation to intergenic sites". "Observed probability of translocation to an intergenic region" = "translocation to intergenic sites" / "translocation adjacent to NCR + translocation to intergenic sites". T-Test was done at: https://www.socscistatistics.com/tests/studentttest/default2.aspx homopolymeric (dA/dT) tracts [40]. Sinai et al. showed that integration of AT-dinucleotide rich sequences into human chromosomes created recurrent gaps and breaks at the integration sites, due to a significantly increased tendency to fold into branched secondary structures at these sites [41]. Baudat et al. showed that a histone methyl transferase, PRDM9, is a major determinant of meiotic recombination hotspots in humans and mice, and furthermore, the human PRDM9 binding sites contain a 13-mer GC-rich motif and are DSB hotspots [42]. Sundararajan et al. showed that DSB hotspots have a strong tendency to occur in genes with high GC content in maize [43]. He et al. investigated the DNA sequence of DSB hotspots and identified a 20-bp-long GC-rich degenerate DNA sequence motif (named Maize Hotspot Sequence) present in about 72 % of genic hotspots in maize [44]. In general, noncanonical DNA structures such as hairpins and Gquadruplexes render a genome prone to damage including DSBs [45]. We counted the number of translocated tRNA genes in sucking lice at each end of the mt non-coding regions (Tables 3, 5 and 7) the distribution is rather even (20 vs. 20) between the two ends where the AT-rich and GC-rich motifs were found respectively. Our observed pattern of mt tRNA gene translocation in sucking lice aligns well with the evidence presented in these experimental studies on DSB hotspots. Future studies on DSBs and DNA repair in parasitic lice may help understand further the mechanisms of mt tRNA gene translocation observed in sucking lice in the present study.
In conclusion, we sequenced the fragmented mt genome of the Asian grey shrew louse, Polyplax reclinata, which comprises 11 circular minichromosomes. We also inferred the ancestral mt karyotype of Polyplax lice based on the data available to date. We found that tRNA genes are entirely responsible for mt karyotype variation among the three species of Polyplax lice sequenced to date. Furthermore, tRNA gene translocation in these Polyplax lice is much more frequent than that of protein-coding and rRNA genes; all tRNA gene translocation observed occurs between different types of minichromosomes and is directional towards the boundaries with the control region. A similar pattern of tRNA gene translocation is also conserved in other sucking lice with fragmented mt genomes.
Our results show that interminichromosomal tRNA gene translocation orientated

D-A-M-F-nad6
Pedicinus obtusus [12] Yes No Yes Yes Pediculus capitis [2] Pediculus humanus [2] V M-L 1 -rrnL-V G-nad3-V-W-S 2 Pthirus pubis [2] Yes No G-nad3-W-D-A-V Pedicinus badii [12] Yes Yes nad2-Y-cox2-V Pedicinus obtusus [12] Yes Yes G-nad4L-V Pediculus schaeffi [7] Yes Yes Pediculus capitis [2] Pediculus humanus [2] Notes: Translocation events shared by congeneric species are grouped together and counted as single events. Genes underlined have opposite orientation of transcription to other genes towards the two ends of the control region is a major contributing factor to the highly dynamic mitochondrial genome organization in the parasitic lice of mammals. trnM or M: tRNA gene for methionine; trnN or N: tRNA gene for asparagine; trnP or P: tRNA gene for proline; trnQ or Q: tRNA gene for glutamine; trnR or R: tRNA gene for arginine; trnS 1 or S 1 : tRNA gene for serine (anticodon NCU); trnS 2 or S 2 : tRNA gene for serine (anticodon NGA); trnT or T: tRNA gene for threonine; trnV or V: tRNA gene for valine; trnW or W: tRNA gene for tryptophan; trnY or Y: tRNA gene for tyrosine; U: Uracil
Additional file 2: The primers used to amplify the mitochondrial genes, minichromosomes and coding regions of the Asian grey shrew louse, Polyplax reclinata.

No No
Notes: Refer to Table 7 for tRNA gene translocation in non-Polyplax sucking lice. Refer to Table 4 notes for probability calculation and T-Test Additional file 4: Alignment of the non-coding region (NCR) sequences of 11 mitochondrial minichromosomes of the Asian grey shrew louse, Polyplax reclinata. The primer pair, 364 F and 364R, were used to amplify the coding regions of the 11 minichromosomes (see also Additional file 3). Asterisk symbol "*" indicates conserved nucleotides; hyphen "-" indicates absent nucleotides.