LTR retroelements in the genome of Daphnia pulex
© Rho et al. 2010
Received: 29 June 2009
Accepted: 9 July 2010
Published: 9 July 2010
Long terminal repeat (LTR) retroelements represent a successful group of transposable elements (TEs) that have played an important role in shaping the structure of many eukaryotic genomes. Here, we present a genome-wide analysis of LTR retroelements in Daphnia pulex, a cyclical parthenogen and the first crustacean for which the whole genomic sequence is available. In addition, we analyze transcriptional data and perform transposon display assays of lab-reared lineages and natural isolates to identify potential influences on TE mobility and differences in LTR retroelement loads among individuals reproducing with and without sex.
We conducted a comprehensive de novo search for LTR retroelements and identified 333 intact LTR retroelements representing 142 families in the D. pulex genome. While nearly half of the identified LTR retroelements belong to the gypsy group, we also found copia (95), BEL/Pao (66) and DIRS (19) retroelements. Phylogenetic analysis of reverse transcriptase sequences showed that LTR retroelements in the D. pulex genome form many lineages distinct from known families, suggesting that the majority are novel. Our investigation of transcriptional activity of LTR retroelements using tiling array data obtained from three different experimental conditions found that 71 LTR retroelements are actively transcribed. Transposon display assays of mutation-accumulation lines showed evidence for putative somatic insertions for two DIRS retroelement families. Losses of presumably heterozygous insertions were observed in lineages in which selfing occurred, but never in asexuals, highlighting the potential impact of reproductive mode on TE abundance and distribution over time. The same two families were also assayed across natural isolates (both cyclical parthenogens and obligate asexuals) and there were more retroelements in populations capable of reproducing sexually for one of the two families assayed.
Given the importance of LTR retroelement activity in the evolution of other genomes, this comprehensive survey provides insight into the potential impact of LTR retroelements on the genome of D. pulex, a cyclically parthenogenetic microcrustacean that has served as an ecological model for over a century.
Transposable elements (TEs) have been found in most eukaryotic genomes and often constitute a significant portion of the genome (e.g., 80% of the maize, 45% of the human, and 5.3% of the fruit fly genome [3, 4] are composed of TEs). Because they can transpose from one location to another within the genome or across genomes, the identification of TEs and analysis of their dynamics are important for a better understanding of the structure and evolution of both genomes and TEs themselves [5, 6]. Based on the mechanism of transposition, TEs are categorized into two major classes. The elements in class I (retroelements) are transposed through reverse transcription of an RNA intermediate, whereas the elements in class II (DNA transposons) are transposed through a cut-and-paste transposition mechanism. LTR retroelements, one type of class I retroelement, are characterized by long terminal repeats (LTRs) at their 5' and 3' ends, and encode genes required for their retrotransposition (e.g., gag and pol). In several species, LTR retroelements have amplified to high levels, resulting in major modifications of the host genome (e.g., in rice [7, 8]).
Many computational methods have been developed to identify LTR retroelements in whole genome sequences. De novo approaches search for putative pairs of LTRs in the genome [10, 11]. The identified LTRs can then be combined with other important sequence features, including target site duplications (TSDs) and conserved protein domains, to identify intact LTR retroelements. Once the intact LTR retroelements are found, homology-based searching (e.g., using RepeatMasker with a library of intact LTR retroelement sequences) can be used to identify additional fragmented elements and solo LTRs in the genome.
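One of the structural checks mentioned above, the presence of a target site duplication, can be sketched as follows. This is a minimal illustration only: the function name `find_tsd`, the 0-based end-exclusive coordinates, and the 4-6 bp length range are assumptions made for the example, not the settings of any published tool.

```python
from typing import Optional


def find_tsd(genome: str, start: int, end: int,
             min_len: int = 4, max_len: int = 6) -> Optional[str]:
    """Return the longest exact TSD flanking genome[start:end], or None.

    start/end are 0-based, end-exclusive coordinates of a candidate element.
    The length range is a hypothetical default chosen for illustration.
    """
    # Try the longest duplication first so the longest match is reported.
    for k in range(max_len, min_len - 1, -1):
        if start - k < 0 or end + k > len(genome):
            continue  # not enough flanking sequence for this length
        left = genome[start - k:start]   # k bases immediately upstream
        right = genome[end:end + k]      # k bases immediately downstream
        if left == right:
            return left
    return None
```

In a toy sequence where the 5 bp motif `GATTC` sits directly on both sides of the candidate element, `find_tsd` reports that motif; when the two flanks differ it returns `None`. Real pipelines usually tolerate mismatches, which this exact-match sketch does not.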
Although newly-sequenced genomes may contain many TEs, it is often unclear what proportion of the identified elements remains active in the population. Recent advances in tiling array technology provide opportunities for measuring gene transcription levels at a genome-wide scale, which can also be used to detect the activity of the TEs that are identified in silico. Even though transcription of TEs is not sufficient to cause their transposition, it is a necessary first step for mobilization of retroelements. In addition, recent work suggests transposable elements may upregulate expression of host genes or, more generally, that TEs may function as part of genome-wide regulatory networks. Because transcription patterns of TEs are known to vary under different environmental conditions and/or at different developmental stages, analysis of transcription profiles is the first step towards understanding what factors might induce mobilization of TEs in the host genome.
Transposon display can be used to compare differences in TE load among individuals or populations over time or from different regions. One of the features of the host genomic environment that has been proposed to significantly impact TE mobility and distribution is the frequency of recombination [14, 15]. Because D. pulex is a cyclical parthenogen, it is possible to assess the role of recombination in TE proliferation in this species without many of the confounding variables that have plagued past comparisons (e.g., species differences). This is because natural populations of D. pulex are known to lose the ability to reproduce sexually (thereby becoming obligate asexuals) and sexual reproduction can be suppressed or promoted by manipulating laboratory conditions. Thus, it is possible to use this system to look more closely at the short- and long-term impact of recombination on TE abundance by combining laboratory and field comparisons.
The analysis of D. pulex presented in this paper represents the first such data for a freshwater aquatic arthropod and cyclical parthenogen and provides an opportunity to better understand the dynamics of TEs via comparison with other well-studied systems. LTR retroelements have been shown to exert a strong impact on the genome of other organisms (see  for a recent review) and may be capable of similar mobility and influence in this species as well.
Summary of LTR retroelements in D. pulex.
Columns: # of elements | Avg. length of LTR in bp (min - max) | Avg. length of elements in bp (min - max)
Ranges (min - max): (193 - 1735), (3349 - 12536), (172 - 602), (4064 - 8184), (88 - 170), (134 - 938), (4026 - 12862)
In order to understand how the LTR retroelements in the D. pulex genome differ from those in other genomes, we applied MGEScan-LTR to four additional genomes: Anopheles gambiae, Bombyx mori, Drosophila melanogaster, and Oryza sativa. Although these genomes have been analyzed in previous studies [3, 18, 20, 21], we searched for the intact LTR retroelements following the same procedure used for D. pulex (Additional file 1 Table S2). The elements that we identified using our pipeline largely overlap with previously described elements for each species. Small differences might be due to differences between the versions of the genomic sequences and/or the criteria used in these analyses.
The D. pulex genome has fewer BEL elements than the insect genomes for which data exist (D. melanogaster and A. gambiae), which have more BEL elements than copia elements (Figure 2). A total of 66 intact BEL retroelements were identified and clustered into 26 families, which correspond to 20% of all intact LTR retroelements found in this genome. The BEL/Pao retroelements are known to have four major lineages: Cer, Pao/Ninja, Bel/Moose, and Suzu [25–29]. Six BEL families identified in the D. pulex genome were close to the Cer retroelements from C. elegans in the neighbor-joining tree (bootstrap value of 87, Figure 1). The other 20 BEL families in the D. pulex genome were close to the Pao/Ninja lineage.
DIRS retroelements typically contain inverted repeats instead of direct repeats, and these repeats are typically much shorter than classic LTRs [30, 31]. Hence, we modified the MGEScan-LTR program accordingly to search for proximal inverted repeats and ORFs encoding proteins such as RT and tyrosine recombinase (YR). A total of 19 intact DIRS retroelements (from 16 families) were identified in the D. pulex genome, which correspond to 6% of all elements identified in this genome. Given that no DIRS element has been identified in any previously surveyed arthropod genome except Tribolium castaneum, D. pulex has the largest number of DIRS elements among the arthropods surveyed so far.
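The structural difference between inverted and direct terminal repeats can be made concrete with a small sketch. It measures how far the two ends of a sequence form an exact inverted repeat, i.e. how many bases at the 5' end match the reverse complement of the 3' end. The function name, the exact-match criterion, and the `max_len` default are assumptions for illustration; this is not the modified MGEScan-LTR procedure, whose details are not given here.

```python
# Watson-Crick pairing used to compare the two termini.
_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminal_ir_length(seq: str, max_len: int = 200) -> int:
    """Length of the exact terminal inverted repeat of seq (0 if none).

    Position k from the 5' end must pair with position k from the 3' end,
    which is equivalent to seq[:n] == reverse_complement(seq[-n:]).
    """
    limit = min(max_len, len(seq) // 2)  # the two arms must not overlap
    k = 0
    while k < limit and seq[k].upper() == _PAIR.get(seq[-1 - k].upper()):
        k += 1
    return k
```

For a toy sequence built as `"TGTACG" + "AAAAAAAAAA" + "CGTACA"` the two ends are reverse complements of each other over 6 bases, whereas a direct repeat such as `"TGTACG" + "AAAAAAAAAA" + "TGTACG"` yields 0.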
LTR retroelements overlapping with transcriptionally active regions.
Columns: of TARs (bp) | of LTRs (%)
Among the three experiments under different conditions, Dpul_G5 and Dpul_G7 showed transcriptional activity across all six different conditions. On the other hand, 20 elements were expressed in only one of the conditions. The expression pattern of these LTR retroelements is shown for each condition (Figure 3). The elements showed higher overall transcriptional activity in the dataset of adults, including female and male (Figure 3a and 3b) than in the other two data sets (mature stage-specific and 4th instar juvenile). In the kairomone-exposed condition, more elements were transcribed than in the control set (Figures 3e and 3f).
In order to assess the role of reproductive mode in retroelement distribution and abundance among sexually- and asexually-reproducing isolates, we developed a transposon display assay for two families of DIRS elements identified in the D. pulex genome. We chose DIRS elements because they exhibited intact open reading frames (which are thought to be a prerequisite for potential activity) and were low in copy number (perhaps making them less likely targets for silencing and readily quantifiable using transposon display; see methods for details). We surveyed mutation-accumulation (MA) lines of D. pulex to determine whether there was any detectable activity and whether patterns differed among lines where sex was promoted or prohibited. In addition, we compared TE loads for these two families of retroelements among natural populations in which sex occurs annually (cyclical parthenogens) and in which it does not occur (obligate asexuals).
Rate of loss (per element per generation) and putative somatic gains (per element) observed in two families of transposable elements across mutation-accumulation lines of D. pulex in which sex was promoted or prohibited (means, SE, t-statistic [t] and probability values [P] reported).
Columns: No. of scored sites | Putative somatic gains
Mean number of occupied sites (± SE) for two families of retroelements assayed across natural populations of D. pulex.
Columns: Total no. of occupied sites (across all isolates) | Mean no. of occupied sites per isolate
Values: 1.95 (± 0.2), 2.09 (± 0.2), 0 - 6; 1.41 (± 0.2), 0.48 (± 0.1), 0 - 5
In this study, we have identified 333 intact LTR retroelements in the D. pulex genome, which were clustered into 142 families. Using the library of identified intact elements, 3774 LTR retroelements were found with RepeatMasker. These retroelements constitute 7.9% of the D. pulex genome, which is much higher than in D. melanogaster (2.6% of the 120 Mb genome) and lower than in B. mori (11.8% of the 427 Mb genome). These levels are all, however, much lower than those found in plants, which are known to typically have a much higher proportion of LTR retroelements in their genomes (e.g., 17% in O. sativa). In addition to quantifying the LTR retroelement content, our survey showed that the families of LTR retroelements in D. pulex are more divergent than previous whole genome analyses have shown. For example, while only 26 copia elements were identified in D. melanogaster, in D. pulex there are 95 copia elements (Additional file 1 Table S1; Figure 2). In all invertebrate genomes surveyed in this study, the number of copia families is very low (Additional file 1 Table S2), which is also consistent with previous studies [3, 21]. Our study also confirmed the presence of 19 DIRS elements in the D. pulex genome, a number much higher than in any other invertebrate genome sequenced so far. Only a few DIRS elements have been found in T. castaneum, Dictyostelium discoideum, and some fish (e.g., Danio rerio), but none have been identified in the model organisms D. melanogaster, A. gambiae, and O. sativa.
Since transcription of the LTR retroelements is the first step required for their transposition, genome-wide screening of transcriptional data was used to determine what proportion of the LTR retroelements might be active. Tiling arrays use unbiased probes, in contrast to cDNA microarrays, which are designed to target gene expression alone, thus providing a general picture of expression patterns under various conditions. Overall, the transcription of more than 20% (71 out of 333) of the intact LTR retroelements was detected in the D. pulex genome. For the purpose of comparison, we retrieved the expression pattern for 136 intact non-LTR retroelements that were identified in the D. pulex genome, and found that only eight (~6%) elements showed transcriptional activity and one of them had particularly long TARs (1138 bp). Additionally, we collected tiling array data for D. melanogaster at different developmental stages from the ENCODE website (Additional file 1 Table S5) and matched the TARs with the annotated LTR retroelements. In total, 25 (out of 412) intact elements from 12 families matched TARs, including 3 BEL, 1 copia, and 21 gypsy elements. Four elements from the roo and rover families, which have previously been shown to transpose [33, 34], also showed transcriptional activity here (TAR length > 500 bp). The LTR retroelements in D. pulex exhibit higher transcriptional activity (in terms of the number and diversity of the elements) than those in D. melanogaster, even though fewer intact LTR retroelements have been identified in the D. pulex genome than in the D. melanogaster genome.
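Matching TARs to annotated elements is, at its core, an interval-overlap problem. The sketch below shows one simple way to do it, followed by the proportions implied by the counts reported above. The coordinate convention (0-based, end-exclusive), the assumption that TARs on a scaffold do not overlap one another, the `min_bp` threshold, and the element names are all hypothetical choices for illustration; the criterion actually used in the study is not restated here.

```python
def overlap_bp(element, tars):
    """Total bases of one (start, end) element covered by (start, end) TARs.

    Assumes the TARs do not overlap each other, so bases are not double counted.
    """
    start, end = element
    return sum(max(0, min(end, t_end) - max(start, t_start))
               for t_start, t_end in tars)


def active_elements(elements, tars, min_bp=1):
    """Map element name -> overlapping bp, for elements with >= min_bp overlap."""
    result = {}
    for name, coords in elements.items():
        bp = overlap_bp(coords, tars)
        if bp >= min_bp:
            result[name] = bp
    return result


# Proportions implied by the counts reported in the text (active, total).
reported = {
    "D. pulex LTR": (71, 333),
    "D. pulex non-LTR": (8, 136),
    "D. melanogaster LTR": (25, 412),
}
fractions = {label: active / total for label, (active, total) in reported.items()}
```

Rounded to three decimals, `fractions` gives 0.213, 0.059, and 0.061 for the three data sets, which is the basis for the comparison drawn in the paragraph above.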
Several elements in plant genomes are known to be able to transpose under specific conditions (e.g., high temperature [35, 36]). Our study shows that the kairomone-exposed Daphnia show higher TE transcription levels than controls. Notably, under the same condition, the protein-coding genes of Daphnia also showed an overall higher transcription level, implying that global transcription activity is induced under the kairomone-exposed condition. On the other hand, the transcription level of LTR retroelements is not significantly different in the experiments comparing female vs. male and metal exposure. Although our analysis shows general trends in transcriptional activity, further experiments are required to investigate the activity of individual LTR retroelement families.
Although no germline gains were observed in the mutation-accumulation lines, evidence for putative somatic gains was observed in both DIRS families assayed, providing additional evidence that there may be active retroelements in the D. pulex genome. The higher rate of putative somatic gains observed in lines in which sex occurred for the Dpul_D15 family is the opposite of the trend observed in DNA transposon families (Schaack et al. accepted). In addition to gains, lineages undergoing sex exhibited frequent losses for one family assayed, presumably because this family included heterozygous copies (presence-absence) at the beginning of the experiment, which subsequently were lost 25% of the time via independent assortment of chromosomes during sex (which in this case was selfing). This difference highlights the importance of reproductive mode for the accumulation of mutation loads in the genome. Sexually-reproducing organisms can purge deleterious mutations (such as TE insertions) during recombination. Asexuals cannot purge TE insertions (other than via mitotic recombination at heterozygous loci). As asexuals accumulate new mutations over time (Muller's ratchet), it is thought that their fitness will decline and eventually they will go extinct.
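The 25% loss expectation follows directly from Mendelian segregation: each gamete of a heterozygous (presence/absence) parent carries the insertion with probability 1/2, so a selfed offspring receives two empty alleles with probability 1/2 × 1/2. The short sketch below enumerates the gamete combinations explicitly. The allele symbols `+` and `-` and the function name are illustrative assumptions, and the model ignores selection, mitotic recombination, and new insertions.

```python
from fractions import Fraction
from itertools import product


def selfing_offspring(parent=("+", "-")):
    """Genotype distribution among offspring of a selfed diploid parent.

    '+' marks an allele carrying the insertion, '-' an empty site.
    Each parental allele enters a gamete with probability 1/2.
    """
    dist = {}
    for a, b in product(parent, repeat=2):  # four equally likely gamete pairs
        genotype = "".join(sorted((a, b)))
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist


offspring = selfing_offspring()
loss_probability = offspring["--"]  # offspring lacking the insertion entirely
```

Under strictly clonal reproduction the offspring inherit the parental genotype unchanged, so the corresponding loss probability in this simple model is zero, which matches the absence of losses in the asexual lines.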
Although the results of the transposon display assay support the idea that TEs may build up in asexual lineages over time, the data from the natural isolates indicate that, in nature, sexual isolates build up higher TE loads than asexuals, at least in one of the two families assayed (Dpul_D5). This result corroborates previous studies in D. pulex on the DNA transposon Pokey assayed among natural populations [39, 40]. The increased number of TEs in sexuals could be explained in a number of ways. First, despite the increased efficiency of selection in sexual lineages, sex is a good way for new TE copies to spread among lineages in a population (whereas a new insertion in an asexual lineage is, effectively, at a genetic dead end). It is also possible that TE copies in recombining genomic backgrounds are better able to evade host suppression mechanisms because there is a higher chance of meiotic recombination among TE copies and therefore of the production of novel genotypes undetectable by co-evolved suppression mechanisms. Alternatively, recombination events among retroelements belonging to the same family may render individual copies inactive, leading to a build-up over time of inactive copies in sexual lineages, which is less likely in asexuals. Lastly, obligate asexuals that are able to persist in nature may represent isolates that evolved from especially low-load sexual lineages, thereby minimizing the so-called "lethal hangover" from their sexual ancestors.
We have performed a genome-wide analysis of the LTR retroelement content of the D. pulex genome, the first aquatic microcrustacean and cyclical parthenogen for which such an analysis has been performed. We identified 333 intact LTR retroelements in the D. pulex genome and categorized them into the BEL/Pao, copia, DIRS, and gypsy groups. As in insects such as D. melanogaster and A. gambiae, the major group of retroelements in the Daphnia genome is gypsy, which includes almost half of the intact retroelements identified in this study. Notably, a substantial number of intact copia retroelements were identified as well. In addition, the D. pulex genome has been found to house the most DIRS elements among the arthropod genomes sequenced to date.
Transcriptional activity of intact LTR retroelements was surveyed by using tiling array data across the whole genome sequence. A total of 71 LTR retroelements showed expression signals, among which 12 elements contain long TARs. Transposon display assays of two intact DIRS retroelements were also performed and provide evidence for possible activity in mutation-accumulation lines of D. pulex. Patterns of TE load and polymorphism in natural populations indicate that sexually-reproducing isolates have heavier TE loads and higher insertion site polymorphism among isolates for one family. Consistent with previously identified DIRS elements in fish and other animals, the Daphnia DIRS elements assayed here exhibit different structures of IRs and protein domains (e.g., the YRs), compared with the elements from the other three groups. Further investigation of population-level differences for other families identified in this survey will help pinpoint which families of LTR retroelements remain active in the D. pulex genome and the extent to which they may influence genome evolution in this species.
The genomic sequences of A. gambiae, B. mori, D. melanogaster, D. pulex, and O. sativa were obtained from public databases. The genomic sequences of B. mori (SW_scaffold_ge2k), D. pulex (release 1, jgi060905), and O. sativa (Build 4) were downloaded from VectorBase http://www.vectorbase.org, silkDB http://silkworm.genomics.org.cn, wFleaBase http://wFleaBase.org, JGI Genome Portal http://www.jgi.doe.gov/Daphnia/ and IRGSP http://rgp.dna.affrc.go.jp, respectively. The genomic sequences of A. gambiae (anoGam1) and D. melanogaster (dm3) were downloaded from the UCSC Genome Bioinformatics site http://genome.ucsc.edu.
The RT sequences used in the phylogenetic analysis were obtained from the NCBI web site: BEL12 (CAJ14165), BEL (U23420), copia (X04456), GATE (CAA09069), Cer1 (U15406), Gulliver (AF243513), Mag (X17219), gypsy (X03734), TED (M32662), Yoyo (U60529), Zam (AJ000387), Tom (Z24451), Tv1 (AF056940), mdg1 (X59545), 412 (CAA27750), CsRn1 (AAK07487), Kabuki (BAA92689), Woot (U09586), Osvaldo (AJ133521), Blastopia (CAA81643), mdg3 (T13798), Cyclops (AB007466), Maggy (D18348), Ninja (AB043239), Pao (L09635), Sushi (AF030881), Suzu (AAN15112), 1731 (X07656), Hopscotch (T02087), Fourf (AAK73108).
We applied an automatic computational tool to find intact LTR retroelements in the whole genome sequences listed above. For this study, the method was improved to locate the TSDs and flanking ends of LTRs. Since not all intact LTR retroelements have these features, we modified the program to be flexible by making this information optional. For example, although the majority of LTR flanking regions are the di-nucleotides TG/CA, the well-known family DM297 in the D. melanogaster genome has the di-nucleotides AG/CT. In the next step, the identified LTR retroelements were clustered into families based on the sequence similarity of LTRs between elements (sequence similarity > 80% for clustering elements in a family). Finally, the classified families were verified by using multiple sequence alignment of LTRs and IRs.
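The family-clustering step can be illustrated with a small sketch that groups elements whenever their LTRs exceed the 80% similarity threshold. Only the threshold comes from the text. The single-linkage grouping (implemented with a union-find structure), the function names, and the toy `identity` measure, which simply compares equal-length strings position by position in place of a real alignment, are assumptions made for the example.

```python
def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions (stand-in for alignment)."""
    if not a or not b:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def cluster_families(ltrs, similarity=identity, threshold=0.8):
    """Single-linkage clustering of element names by LTR similarity."""
    names = list(ltrs)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative (path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(ltrs[a], ltrs[b]) > threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())
```

With three hypothetical elements whose LTRs are `ACGTACGTAC`, `ACGTACGTAA`, and `TTTTTTTTTT`, the first two (90% identical) fall into one family and the third forms its own. Single linkage can chain distantly related elements together through intermediates, which is one reason a verification step by multiple sequence alignment, as described above, is useful.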
The element name consists of four parts: genome name, family name, scaffold name (release 1 from wFleaBase), and the ID in each scaffold. For example, the element Dpul_G2_147_2 corresponds to the second element in scaffold 147, which is in the family G2 (G for gypsy elements, C for copia elements, B for BEL elements, and D for DIRS elements) in the D. pulex genome.
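The naming scheme is regular enough to parse mechanically. The sketch below splits a name into its four parts and maps the family prefix to its group. The regular expression assumes that scaffold names and element IDs are purely numeric, as in the example above; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple

# Group letters as defined in the naming convention.
GROUPS = {"G": "gypsy", "C": "copia", "B": "BEL", "D": "DIRS"}

# Assumed pattern: genome_family_scaffold_index, with numeric scaffold and index.
_NAME = re.compile(
    r"^(?P<genome>[A-Za-z]+)_(?P<family>[GCBD]\d+)_(?P<scaffold>\d+)_(?P<index>\d+)$"
)


class ElementName(NamedTuple):
    genome: str
    group: str
    family: str
    scaffold: int
    index: int


def parse_element_name(name: str) -> ElementName:
    """Split an element name such as 'Dpul_G2_147_2' into its components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized element name: {name!r}")
    family = match["family"]
    return ElementName(
        genome=match["genome"],
        group=GROUPS[family[0]],
        family=family,
        scaffold=int(match["scaffold"]),
        index=int(match["index"]),
    )
```

Applied to `Dpul_G2_147_2`, the parser yields genome `Dpul`, group `gypsy`, family `G2`, scaffold 147, and index 2. Family-level names without scaffold information, such as `Dpul_D15`, are deliberately rejected by this pattern.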
For phylogenetic analysis, representative RT sequences were obtained from NCBI (see Materials and Methods section and Additional file 1 Table S6). Multiple sequence alignments of RT amino acid sequences were performed with default parameters by using CLUSTALW. Phylogenetic trees were generated by using the neighbor-joining method with Poisson correction and 1000 bootstrap replicates in MEGA.
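For readers unfamiliar with the Poisson correction, the sketch below computes it for a pair of aligned amino acid sequences: the proportion of differing sites p is converted to a distance d = -ln(1 - p), which accounts for multiple substitutions at the same site. The function names and the handling of gaps (sites with a gap in either sequence are skipped) are assumptions for illustration and do not reproduce MEGA's implementation or options.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # fully divergent: the correction is undefined
    return -math.log(1.0 - p)
```

For the toy alignment `MKVLA-GT` versus `MKILAAGS`, seven sites are comparable and two differ, so p = 2/7 and d = -ln(5/7), slightly larger than p itself. A matrix of such pairwise distances is the input to neighbor-joining tree construction.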
Primer sequences for transposon display of D. pulex retroelements.
We would like to thank Dr. John Colbourne and Dr. Jeong-Hyeon Choi for helpful discussion and for allowing us to access tiling array data. We thank Dr. Ellen Pritham for reading the manuscript and for helpful discussion. This work is supported by the METACyt Initiative at Indiana University, funded by Lilly Endowment, Inc. It is also supported by NSF DDIG (DEB-0608254) to SS and ML, NIH training grant fellowship to SS, and NIH fellowship F32GM083550 to XG. The sequencing and portions of the analyses were performed at the DOE Joint Genome Institute under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Los Alamos National Laboratory under Contract No. W-7405-ENG-36 and in collaboration with the Daphnia Genomics Consortium (DGC) http://daphnia.cgb.indiana.edu. Additional analyses were performed by wFleaBase, developed at the Genome Informatics Lab of Indiana University with support to Don Gilbert from the National Science Foundation and the National Institutes of Health. Coordination infrastructure for the DGC is provided by the Center for Genomics and Bioinformatics at Indiana University, which is supported in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. Our work benefits from, and contributes to, the Daphnia Genomics Consortium.
LTR: Long terminal repeat
TSD: Target site duplication
ORF: Open reading frame
TAR: Transcriptionally active region
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.