Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome
© González and Deyholos; licensee BioMed Central Ltd. 2012
Received: 21 July 2012
Accepted: 15 November 2012
Published: 21 November 2012
Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression.
Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to estimate TE coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis revealed TE-rich regions, gene-rich regions, and regions with similar gene and TE densities. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, TE superfamilies differed in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia elements showed more diversity, more recent insertions and more conserved domains than Gypsy elements, demonstrating their importance in genome evolution.
The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated.
Keywords: Transposable elements; Flax; Genome evolution; LTR elements; Gene expression
Transposable elements (TEs) influence the evolution, structure, amplification, gene creation, mutation and transcriptional regulation of genes and genomes [1–6]. They are also useful as genetic markers in basic and applied science [7, 8]. TEs occupy a substantial fraction of sequenced plant genomes, ranging from over 14% in Arabidopsis to more than 80% in maize. Because of their nature and characteristic patterns of insertion, TEs may influence large portions of the genome. One study found that one-sixth of all rice genes had some kind of association with TEs. Some TE insertions occur within or near genes, thereby disrupting normal gene expression. Such insertions may influence phenotypic characteristics, as with petal color in gentians or the disruption of vitamin E synthesis in sunflower. However, due to gene redundancy or to insertion in regions of the genome that do not affect gene expression, the majority of TE insertions do not have detectable effects on morphology or physiology. For example, neither the insertion of a Stowaway element in an intron of the manganese superoxide dismutase gene, nor the insertion of retrotransposon Vine-1 in one member of the alcohol dehydrogenase multigene family affected plant growth and development. Nevertheless, TEs can influence the evolution of plant gene families, as exemplified by disease resistance genes in several plants. Insertions can also result in the capture of gene fragments by TEs, or the adoption of parts of TEs by genes. Some of the clearest examples of gene capture by TEs involve Pack-MULEs. In rice, over 3000 of these gene-carrying transposon-derived elements were found in 440 Mb of sequence, and the acquisition of multiple gene fragments from multiple loci may result in the creation of new genes.
Genes such as FAR1 and FHY3 (involved in the phytochrome signalling pathway) have a conserved transposase-derived region, whose DNA binding and regulatory capacities have been adopted for transcriptional control of downstream genes [21, 22]. As was first shown by McClintock in the early experiments that uncovered the Ac/Ds TE system in maize [23–26], some types of stress can activate TEs, which can in turn modify gene expression. TE expression triggered by stress has been reported for several elements, including Tnt1 [27, 28] and Tto1 [29, 30] in tobacco, Tos17 in rice [31, 32], and BARE-1 in barley. However, relatively few active TEs have been identified, and several expression studies indicate that transcription and transposition are rare for most elements. While some studies have focused on the expression of individual elements, more recent approaches have compared genome-wide expression data of TEs. These kinds of studies have been used to identify TE cassettes in expressed genes in coffee species and Arabidopsis, and the activity of different TE families in maize and sugarcane.
Flax (L. usitatissimum) is one of over 270 species within the family Linaceae, and is a member of the order Malpighiales along with three other species with published whole genome sequences: poplar (Populus trichocarpa), cassava (Manihot esculenta), and castor (Ricinus communis). Flax is a predominantly self-pollinating annual crop grown in temperate regions. Distinct varieties of flax are cultivated for either seed (i.e. linseed) or bast fibers. We recently reported a whole genome shotgun (WGS) assembly of a linseed variety, CDC Bethune. The assembly contains 302 Mb of the estimated 373 Mb nuclear genome, in scaffolds with N50 = 694 kb. Flax is considered a diploid (2n = 2x = 30), although our genome analysis pointed to a recent whole genome duplication 5–9 Mya. Flax appears to have originated from its wild relative, L. bienne, with cultivation and domestication probably starting in the Mesopotamian valleys between 8,000 and 10,000 years ago.
Flax has been studied for decades as a model of genome plasticity [42–45]. In the variety Stormont Cirrus, individuals exposed to certain stresses can produce first-generation progeny that show stable changes in several traits, including differences of up to 15% in nuclear DNA content. Highly repetitive, tandemly arrayed elements (e.g. 5S rDNA) are among the major contributors to this variation in DNA content. A novel, non-TE, low-copy insertion sequence (LIS-1) is also associated with these changes [42, 46]. It should be noted that most elite flax varieties, including CDC Bethune, which is the subject of the WGS assembly, do not exhibit this rapid change in genome size. Nevertheless, the study of flax and its repetitive sequences remains of special relevance to understanding genome evolution in general.
We previously reported the preliminary identification of TEs as part of the description of the flax WGS assembly. The assembly contained 23.06% TEs as defined by sequence coverage. While the calculated proportion of the genome covered by TEs in flax is slightly lower than in other plant species with small genomes, much variation exists in TE content in plants. Only a small proportion of the TEs described in the flax genome could be identified through alignment to previously characterized elements from other species. Instead, most of the TEs were identified only by de novo prediction methods. Here we extend this previous report to present a detailed characterization of the main superfamilies of TEs in flax and to explore their potential influence on genome evolution and gene expression.
TEs in the flax genome
Table 1. Annotation of TE superfamilies in the flax WGS assembly, determined using a filtered consolidated library of de novo repeats from PILER, RepeatScout, LTR_finder and LTR_STRUC together with the Viridiplantae division of Repbase. Columns: number of matches; elements (%); sequence occupied (bp); sequence (% of TEs); sequence (% of genome).
Putative expression and abundance of main families of TEs
Table 2. Putative expression of de novo identified families of TEs. TE families consist of sequences sharing ≥80% similarity among all members. Columns: total number of TE families; number of families hitting at least one EST with a minimum EST coverage of 70%; proportion of TE families with putative expression.
Relationship of TEs to genes
Insertions of full LTR elements and evolution
Table 3. Frequency of recent LTR element insertions in proximity to predicted genes. Columns: distance from the closest gene; number of LTR elements and proportion of LTR elements, each reported for two scaffold sets, one of which comprises scaffolds over 1 Mb.
TEs in the flax genome
The flax genome is estimated to be 373 Mb in length, and we have reported here and previously that at least 23% of its sequence is made up of TEs. We expect the actual TE coverage of the complete genome to be higher than the 23% we reported, for the following reasons: (i) unclassified repeated sequences found by our de novo approach could constitute new or highly divergent types of TEs, but these were not used for masking; (ii) numerous LTR elements with unknown or non-TE internal domains were not included in the masking, and we did not use specific algorithms to identify possible TEs that lacked recognized internal domains; (iii) the WGS assembly may be missing some regions that are rich in repeated sequences. If the complete genome sequence could be analyzed, including regions missing from the WGS assembly, we expect not only that the proportion of TEs would increase, but also that the relative abundance of the main superfamilies could change, since Gypsy elements are enriched in heterochromatic regions [51–57], which are usually more difficult to assemble. Nevertheless, our estimate of genome coverage by TEs is comparable to what has been found for other sequenced plant genomes slightly larger than flax (e.g. Oryza sativa, 35% TEs; Lotus japonicus, 30.8%; Medicago truncatula, 38%); indeed, it has been proposed that, in angiosperms, approximately one third of the genome is made up of TEs, which is in general agreement with our estimate for flax. Although TE content may be more closely related to genome size variation in plants with larger genomes, there is a general trend for genome size to correlate positively with the abundance and expansion of TEs. While there are exceptions to this rule, flax, with its smaller genome, has a much lower percentage of TEs than larger genomes such as maize, with over 85% TEs. We found that LTR elements (especially Copia) dominated the population of TEs in the flax genome (Table 1).
LTR retrotransposon abundance has been described in numerous plant species, including some of the fully sequenced species closely related to flax. In castor bean (Ricinus communis), LTR elements account for about one third of the length covered by all repeats, while DNA TEs constitute less than 2%; in poplar (Populus trichocarpa), LTR elements constitute around 17% of the bases of all repeats (including low-complexity repeats), and DNA TE content is close to 5%. Although the proportion of sequence covered by LTR elements in flax is larger than in castor or poplar, the predominance of LTR elements is typical of many plant genomes [see supplementary table 7 in the cited study]. However, in most characterized genomes it is the Gypsy group that outnumbers the Copia group (Additional file 10). Ty3-gypsy elements are dominant in Brachypodium distachyon, Oryza sativa, Zea mays, Sorghum bicolor, Carica papaya, Arabidopsis thaliana [LTR element coverage obtained from 61], Fragaria vesca, Malus domestica, Glycine max, Phaseolus vulgaris (data obtained from Phytozome, http://www.phytozome.net/), Populus trichocarpa [LTR element coverage obtained from 61] and Ricinus communis. Only Linum usitatissimum (this study), Vitis vinifera, Theobroma cacao and Cucumis sativus seem to have higher coverage by the Copia superfamily, although for the last two genomes only the number of elements, and not the coverage in bp, was reported in the referenced papers, so they could not be included in Additional file 10. The prevalence of a superfamily may be related to amplification events of specific groups of TEs and to the activity of such elements, which may depend on just a few active copies of the family. An interesting example is the genus Gossypium, in which one of the species with the smallest genomes had a high density of Copia elements.
Gossypium species with larger genomes had an increased copy number of Gypsy elements, most of which represented just one subgroup of the Gypsy sequences. Such amplification can be lineage-specific and therefore result in changes in genome size [71–73]. In flax we found that Copia elements were abundant, diverse, and that some members were recently active (see below), which would explain their greater current influence. LINEs and Mutator elements were the most abundant TEs after the LTR retrotransposons (Table 1). Although these two types of elements seem to be, or to have been, fairly active, their lower abundance compared with LTR elements can be explained at least in part by their transposition mechanisms. For example, non-LTR retrotransposition generally creates truncated copies of the elements, which substantially reduces their coverage in a genome; additionally, plant LINEs are very diverse and heterogeneous due to the error-prone reverse transcriptase they encode and the accumulation of mutations over long evolutionary periods, which limits their identification. In the case of Mutator elements, cut-and-paste transposition does not increase copy number as much as retrotransposition does. Additionally, non-autonomous gene-carrying Mutators or MULEs (Mu-like elements) can be difficult to identify by traditional bioinformatics approaches, and seem to be widely divergent. Thus, in flax, identification of such elements may also be influenced by high mutation rates and transposition mechanisms, resulting in lower percentages of identified Mutator and LINE elements.
Putative expression and abundance of main families of TEs
Besides being abundant, LTR elements were also diverse in the flax WGS assembly. The number of families (Table 2) was probably overestimated since, as a result of the masking process, some of the fragments we found may in fact be different segments of a single element. Nevertheless, there was a general correlation between superfamily genome coverage and the number of families found. Alignment of TEs to EST databases showed that just a small proportion of flax TEs might be active, most of which were Copia LTR elements (Table 2). Our results agree with a survey of over 200,000 sugarcane ESTs, in which Copia elements had more matching ESTs than Gypsy retrotransposons, although in sugarcane (but not flax), DNA transposons also seemed to be fairly active. In plants, TE activity depends on regulatory factors including stress-driven transcriptional regulation and epigenetic silencing, which allow activation of just a few elements under specific environmental and developmental circumstances [12, 37]. For example, in maize, where more than 80% of the genome is made up of TEs, a survey of over 2 million ESTs showed that only 1.5% of them matched TEs, and most of the families with putative activity were LTR retroelements. Thus for flax, as for most plant species studied, the activity of TEs seems relatively low, and may increase to detectable levels only in response to stress. Additionally, it has been shown that in certain families of TEs the percentage of polyadenylated transcripts is low. Because most EST libraries are built from polyadenylated RNA, this may artificially limit the proportion of expressed TEs that can be detected by alignment to ESTs. We also found that, across all TEs, few families had high copy numbers throughout the genome, and most families within each superfamily had fewer than 10 copies (Figure 1), in agreement with findings in soybean, where 78% of LTR families are present at copy numbers below 10.
While low copy number could be related to low transposition rates, mechanisms such as high mutation rates, recombination and nested insertions create rapid variability among TEs, resulting in divergence and therefore a low number of similar sequences. Since we did not find a correlation between copy numbers and putative expression (related ESTs), it is more likely that mechanisms of divergence, rather than transposition of low-copy-number families, account for the trend we found. This lack of correlation is in agreement with previous findings in maize and contradicts an earlier view that low-copy-number elements are the ones predominantly expressed.
Relationship of TEs to genes
The location of TEs in the flax genome was not completely random. Some scaffolds evidently came from genomic regions rich in TEs (especially retrotransposons, which constituted the bulk of flax TEs; Figure 3) and were highly depleted in genes. Conversely, other scaffolds were rich in genes but depleted in TEs, and still others had similar coverage of TEs and genes. A global negative correlation between TE coverage and gene coverage agrees with a model in which purifying selection acts against TEs in coding regions to avoid detrimental effects on genome function; this model was clearly presented for Arabidopsis. In sequenced genomes such as those of Sorghum bicolor or Brachypodium distachyon, where the distribution of TEs has been mapped to chromosomes, the bulk of retrotransposons seem to be clustered around the centromeres, while fewer are close to gene-rich regions, probably due to rapid elimination by host control and selection mechanisms. When each of the 71 largest scaffolds was analyzed as an individual unit, there was no evidence for an overall pattern in which TEs occurred inside genes more than expected by chance (Additional file 4). However, certain superfamilies were more likely to do so than the rest of the TEs (Figure 5). Several DNA TE superfamilies and L1 elements fell inside and close to genes more often than expected, while Gypsy elements were always underrepresented in and around genes, and Copia retrotransposons were significantly overrepresented only in the first 1 kb flanking genes. In Arabidopsis, an analysis of chimeric genes/TEs showed significant differences for Copia, En-Spm, Gypsy and Helitrons. While in both flax and Arabidopsis there was an overrepresentation of En-Spm and an underrepresentation of Gypsy TEs, Copia elements were overrepresented inside genes in Arabidopsis but not in flax, and Helitrons were underrepresented in Arabidopsis while this superfamily and hATs were significantly overrepresented in flax.
The overrepresentation of class II TEs in flax genes is consistent with reviews describing the close association of genes with these elements, including the domestication of transposon proteins into genes [4, 83]. For example, TEs like En-Spm/CACTA are closely associated with genes in the Triticeae, and they may even capture gene fragments as they move and recombine in the genome. In the case of Helitrons, extensive gene capture and shuffling mediated by these elements has been reported [85–88]. For hATs, gene shuffling has been reported in maize, and experiments with rice have shown that nDart1 and its relatives belonging to the hAT superfamily tend to fall within or very close to genes. Meanwhile, although Mutator elements were not overrepresented inside genes, they were abundant in the 1 kb of DNA flanking them. The close relationship of Mutator elements with genes allows TE-mediated gene movement, as has been shown for Mutator-like elements (MULEs) in rice and Arabidopsis, and relates to the fixation of TE enzymes like transposase, which is part of the FHY3 and FAR1 genes involved in phytochrome A signalling [21, 22]. Putative homologs of these two transposase-bearing genes were also found in flax (result not shown). The underrepresentation of Gypsy elements in gene coding regions of flax could be related to their tendency to cluster close to centromeric regions. This has been shown in grass species and in plants like sunflower [51, 55, 56], and has more recently been proposed for plants like poplar and Arabidopsis. It has been speculated that this insertion bias may be related to a specific domain in the integrase protein [91–93], and such differences in integrase proteins may also be related to the differing distributions of Gypsy and Copia elements. Finally, Copia elements were overrepresented in the 1 kb of sequence that flanked genes (Figure 5).
In Arabidopsis, the random pattern of Copia insertion allows them to insert close to coding regions, although over time the elements are subjected to negative selection. A similar pattern could hold for flax, since we found that many recently inserted Copia TEs were close to genes (Table 3). This insertion pattern might have important implications, since TEs close to genes can become positive regulators of gene expression via their cis-acting elements (in LTRs) or may become targets for epigenetic silencing, which would affect the adjacent gene regions [94–96]. To test whether there was a general pattern of regulation of genes by TEs, we matched available ESTs to the predicted genes of the flax genome (Figure 4). The negative correlation we found between TE coverage and gene expression suggests that genes in TE-rich regions could be affected by nearby insertions. It is likely that genes in close proximity to TEs are affected negatively, because these regions are most often targeted for heterochromatization (and silencing) [96–98], and TE insertion can also disrupt genes.
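Grouping recently inserted LTR elements by their distance to the nearest predicted gene, as in Table 3, amounts to a nearest-interval search. The sketch below is illustrative only: it handles a single scaffold with half-open intervals, uses a linear scan rather than an indexed search, and its distance classes (overlapping or abutting, up to 1 kb, 1 to 5 kb, beyond 5 kb) are hypothetical, since the class boundaries used for Table 3 are not stated in this section.

```python
from collections import Counter, defaultdict


def distance_to_nearest_gene(te, genes):
    """Gap in bp between a TE and the closest gene on the same scaffold.

    te and genes are half-open (start, end) intervals. Returns 0 if the TE
    overlaps a gene (an abutting TE also yields 0), None if there are no genes.
    """
    te_start, te_end = te
    best = None
    for g_start, g_end in genes:
        if te_start < g_end and g_start < te_end:
            return 0  # TE lies inside or overlaps the gene
        gap = g_start - te_end if g_start >= te_end else te_start - g_end
        if best is None or gap < best:
            best = gap
    return best


def distance_class(d):
    """Hypothetical distance classes, not necessarily those of Table 3."""
    if d == 0:
        return "overlapping or abutting"
    if d <= 1_000:
        return "<=1 kb"
    if d <= 5_000:
        return "1-5 kb"
    return ">5 kb"


def tabulate(elements, genes):
    """elements: (superfamily, (start, end)) for recently inserted LTR elements.

    Returns {superfamily: {distance class: proportion of its elements}}.
    """
    counts = defaultdict(Counter)
    for superfamily, interval in elements:
        d = distance_to_nearest_gene(interval, genes)
        if d is not None:
            counts[superfamily][distance_class(d)] += 1
    return {
        sf: {cls: n / sum(c.values()) for cls, n in c.items()}
        for sf, c in counts.items()
    }
```

For genome-scale data, sorting genes by start and using `bisect` would avoid the linear scan; the scan is kept here for readability.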
Insertions of full LTR elements
Most of the non-redundant elements with two identifiable LTRs belonged to the Copia superfamily, but a large proportion of retroelements had non-identifiable internal regions, or regions that corresponded to host genes or other non-LTR TEs, as has also been shown for poplar (Additional file 5). Many of these may constitute either non-autonomous elements or genes captured by TEs; these two possibilities are not mutually exclusive. For example, in soybean (Glycine max), an element has been described with an insertion of 10.5 kb containing a mixture of segments derived from non-coding sequence and disease resistance genes. Such elements could still be mobilized in trans by autonomous elements if they retain their LTRs, polypurine tracts (PPTs), and primer binding sites (PBSs). Many of the undetermined elements in flax had these features (they are part of the recognition algorithms of LTR_STRUC and LTR_FINDER), and therefore may still be active. In fact, 32 of these undetermined elements had 100% LTR pair similarity and 69 had at least 99% similarity, indicating relatively recent insertions. It is likely that at least some of the larger flax LTR elements could be classified as LARDs (Large Retrotransposon Derivatives), which have been characterized in detail in the Triticeae, rice and Medicago [100–102], while some of the shorter than expected LTR retrotransposons are probably TEs that have lost their internal coding regions, usually classified as Terminal-repeat Retrotransposons in Miniature (TRIMs). In terms of TE sizes, estimates for plants range from 2 to 11.8 kb for Copia elements and from 4.6 to 18 kb for Gypsy elements. However, a survey of LTR retroelements in rice using LTR_finder found wide size ranges in LTR retrotransposons, in agreement with the larger variation found in flax (Additional file 5).
Nevertheless, our averages agree with the pattern, common in most plants, of Gypsy elements being larger than Copia elements. When comparing the activity of the LTR elements (Figure 6), the Copia elements appeared to have been increasingly and continuously active over the last 5 million years. Meanwhile, Gypsy elements have been active for the last 7–8 million years, but to a lesser extent than Copia and the undetermined elements. In fact, after a peak of activity 3–4 Mya, Gypsy elements have been less active until the present. In comparison, in poplar the activity of full-length Copia TEs does not seem to overshadow the activity of Gypsy elements, although full-length Copia elements are more abundant than Gypsy. Although the activity of all these retroelements varies, it is interesting to note that all of them may have been triggered between 5 and 10 Mya. It is tempting to speculate that a duplication event of the flax genome may have triggered the activity of the retrotransposons, and indeed a whole genome duplication in this time frame has been inferred based on molecular phylogenies and analysis of the Ks distribution in protein coding genes [38, 40]. However, rapid turnover of elements is also common [106, 107] and could account for the absence of detection at more ancient evolutionary times, since TEs may become unrecognizable. When only the most recent flax LTR element insertions were evaluated, Copia LTR elements had more copies and more putative expression, and were located close to genes. The lone, recently inserted Gypsy element had no related ESTs. A similar insertion pattern was seen in Arabidopsis, where the number of Copia elements with identical LTRs is higher than that of Gypsy elements, and recent Copia insertions are closer to genes than Gypsy insertions. It cannot be ruled out that the short read assembly methodology used for the flax WGS is biased towards more efficient identification of regions surrounding genes.
Nevertheless, we found that Gypsy elements followed the opposite trend to Copia, meaning that both types of elements were detected, whether or not they were closely associated with genes. This observation and the agreement with other studies on this trend support our conclusions.
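Recent insertions are recognized here by LTR similarity: the two LTRs of an element are identical when it inserts and diverge independently afterwards, so 100% LTR similarity points to a very recent insertion. A widely used way to convert LTR divergence into time is T = K / (2r), where K is the Kimura two-parameter distance between the 5' and 3' LTRs and r is a substitution rate. This section does not state which distance model or rate underlies the ages discussed for Figure 6, so the sketch below assumes the Kimura model and a rate of 1.3e-8 substitutions per site per year (a value often used for grasses, not a flax estimate); absolute ages from it are illustrative only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
ASSUMED_RATE = 1.3e-8  # substitutions/site/year: an ASSUMPTION, not a flax estimate


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two aligned LTRs of equal length.

    Alignment columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the LTRs are too divergent (saturation)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_years(ltr5: str, ltr3: str, rate: float = ASSUMED_RATE) -> float:
    """T = K / (2r): both LTRs accumulate substitutions independently."""
    return k2p_distance(ltr5, ltr3) / (2 * rate)
```

Under this assumed rate, two LTRs that are 99% identical through transitions alone (K of about 0.0101) give an age of roughly 0.39 million years.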
We showed that transposable elements occupied more than 23% of the flax WGS assembly and were dominated by LTR elements. The distribution of TEs was not random, and there were genomic regions enriched in these repetitive sequences, which may constitute heterochromatic sections of the genome. In regions shared by both TEs and genes, transposons may have a repressive effect on gene expression, as indicated by a negative correlation between TE coverage and gene expression. Overrepresented families in close proximity to or overlapping genes were mainly from the DNA transposon group, but the Copia group was also often localized to the flanking regions of genes. Copia retrotransposons have been increasingly active in the last 5 million years and have more members with conserved internal domains, in contrast to the lower activity and conservation of Gypsy elements. It is possible, however, that older insertions are more difficult to detect because of high mutation rates, especially for TEs located in heterochromatic regions. Because of their recent activity, abundance and diversity, Copia elements are potential shapers of the flax genome. Further studies, especially under stress-eliciting conditions, are necessary to understand their regulatory effect on adjacent genes and how their activation patterns may have influenced the evolution of other flax species.
Identification of putative TEs within the flax WGS assembly
An unmasked WGS assembly of flax comprising 318,250,901 bases was used as input for TE detection. De novo identification of transposable elements was performed using RepeatScout, PILER, LTR_finder and LTR_STRUC. Repeats identified by RepeatScout under default parameters were filtered for low complexity using Tandem Repeats Finder and nseg. Repeats with fewer than 10 hits in the genome were eliminated from the library. For PILER-DF analysis, the full genome was compared to itself with PALS (part of the PILER implementation) using default parameters. Families of dispersed repeats were created using a minimum family size of 3 members and a maximum length difference of 5% between all family members. The consensus sequence for each family was created after aligning the sequences with MUSCLE. LTR TEs were found with LTR_finder using the option -w 2 to obtain a table output that could be parsed to extract the sequences corresponding to the elements. LTR_STRUC was used under default parameters. The sequences output by all of these programs were used to create a unified repeat library that could be compared to previously characterized elements. Repeats were annotated by comparing the library to a Viridiplantae TE database downloaded from Repbase (http://www.girinst.org/repbase/ - update 20110920) and to a Plant Repeat Database (http://plantrepeats.plantbiology.msu.edu/) of TEs from the families Brassicaceae, Fabaceae, Gramineae and Solanaceae (v2_1_0 update 20112006) using tBLASTx and BLASTn, and to the RepeatPeps database of TEs distributed with RepeatMasker (update 20110920) using BLASTx. To test whether TEs might have captured fragments of other genes, or belonged to gene families instead of TE families, BLASTx was performed against the GenBank nr database. Repeats were classified into a TE superfamily if they showed E values of 1e-5 or lower with a common annotation in at least two of the databases to which they were compared.
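The two filters described above (dropping RepeatScout repeats with fewer than 10 genomic hits, and assigning a superfamily only when at least two databases agree at E ≤ 1e-5) can be written as small functions. This is a sketch under assumptions: BLAST results are taken as already parsed into a hypothetical `Hit` record, the database labels are placeholders, and treating a tie between two equally supported superfamilies as unclassified is a choice made for this illustration rather than a rule stated in the text.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-5  # a hit votes only if E <= 1e-5
MIN_GENOME_HITS = 10   # RepeatScout repeats with fewer genomic hits are dropped
MIN_AGREEING_DBS = 2   # databases that must agree on the superfamily


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit of a library repeat (hypothetical record)."""
    database: str     # placeholder labels, e.g. "Repbase", "RepeatPeps"
    superfamily: str  # annotation of the matching subject, e.g. "Copia"
    evalue: float


def keep_repeat(genome_hits: int) -> bool:
    """Copy-number filter applied to RepeatScout output."""
    return genome_hits >= MIN_GENOME_HITS


def classify(hits: list[Hit]) -> str | None:
    """Return the superfamily supported by enough databases, else None.

    Each database votes at most once per superfamily, and only significant
    hits vote. Ties between equally supported superfamilies return None
    (an assumption of this sketch).
    """
    votes: Counter[str] = Counter()
    seen: set[tuple[str, str]] = set()
    for hit in hits:
        key = (hit.database, hit.superfamily)
        if hit.evalue <= E_VALUE_CUTOFF and key not in seen:
            seen.add(key)
            votes[hit.superfamily] += 1
    ranked = votes.most_common()
    if not ranked or ranked[0][1] < MIN_AGREEING_DBS:
        return None
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # conflicting, equally supported annotations
    return ranked[0][0]
```

For instance, a repeat matching Copia in Repbase (E = 1e-20) and in RepeatPeps (E = 1e-8) is labelled Copia even if a third database reports Gypsy at E = 1e-3, because that hit is not significant enough to vote.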
Repeats characterized as putative TEs by the previous approach were joined to the Viridiplantae database of TEs (update 20110920) to form a library used with RepeatMasker v3.3.0 (http://www.repeatmasker.org/) to determine the distribution and coverage of TEs in the genome assembly. RMBlast was used as the search algorithm with a Smith-Waterman cutoff of 225 (this cutoff was used for all RepeatMasker analyses). To automatically annotate the masked regions (matches of the TEs in the genome) into their respective TE superfamilies, a custom Perl script was used (kindly provided by Robert Hubley, Institute for Systems Biology, http://www.systemsbiology.org/). A table of TE abundance and coverage was built after filtering and annotation. Percentages were calculated relative to the total number of bases, including runs of Xs and Ns, since some elements may include undetermined bases; therefore, total percentages may differ slightly from those reported in the original description of the flax genome. The TE values of the WGS assembly were compared with TEs identified in BAC-end sequences.
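Per-superfamily summaries like those in Table 1 (number of matches, base pairs occupied, share of TE sequence and share of the assembly) can be derived from the masked intervals. The sketch assumes RepeatMasker hits already parsed into hypothetical (scaffold, start, end, superfamily) tuples with half-open coordinates, and that overlaps between different superfamilies were resolved upstream; hits of the same superfamily are merged so that no base is counted twice. As described above, the genome denominator includes runs of N and X.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or abutting half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coverage_table(hits, genome_size):
    """Summarize masked TE hits per superfamily (hypothetical helper).

    hits: iterable of (scaffold, start, end, superfamily), half-open coordinates.
    genome_size: total assembly length, including runs of N and X.
    Returns {superfamily: (matches, bp_occupied, pct_of_te_bp, pct_of_genome)}.
    Assumes different superfamilies do not overlap each other.
    """
    intervals = defaultdict(lambda: defaultdict(list))
    matches = defaultdict(int)
    for scaffold, start, end, superfamily in hits:
        intervals[superfamily][scaffold].append((start, end))
        matches[superfamily] += 1
    occupied = {
        sf: sum(end - start
                for per_scaffold in by_scaffold.values()
                for start, end in merge(per_scaffold))
        for sf, by_scaffold in intervals.items()
    }
    total_te_bp = sum(occupied.values())
    return {
        sf: (matches[sf], bp, 100 * bp / total_te_bp, 100 * bp / genome_size)
        for sf, bp in occupied.items()
    }
```

For this assembly the denominator would presumably be the 318,250,901-base input, e.g. `coverage_table(parsed_hits, genome_size=318_250_901)`, with `parsed_hits` standing for the parsed RepeatMasker output.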
Putative expression and distribution of TE families
Clusters of TEs with 80% similarity within each superfamily were created using CD-HIT. Only de novo identified members were used for this analysis, since they represented TE sequences identified from the flax genome. The members of each cluster were considered to represent a family of TEs, following the terminology of Wicker et al. One representative member of each family (the longest sequence in each cluster) was compared against 286,252 flax ESTs from GenBank using BLAT. A hit to an EST was classified as positive only if at least 70% of the EST sequence matched the query sequence. A family was considered putatively expressed if it had at least one EST match. The proportion of TE families with putative expression was calculated for each of the major groups. The same analysis was performed comparing the TE family representative sequences to the flax assembly, and a TE was counted as a copy of the TE family if at least 80% of its sequence matched the query. A correlation coefficient was calculated between the copy number of each family and its number of EST matches.
Relationship of TEs to genes
The distribution of TEs relative to predicted genes in the WGS assembly was analyzed for all scaffolds ≥ 1 Mb (71 large scaffolds). After the coordinates of predicted genes and TEs were mapped to the scaffolds, proportional coverage and the associated statistics for genes and TEs were obtained both for whole scaffolds and for 50 kb windows within each scaffold, using the Genomic HyperBrowser (http://hyperbrowser.uio.no/rc/, candidate version). To test whether the distributions of TEs and genes were correlated, a correlation coefficient was calculated from the proportional coverage of the large scaffolds. Proportional coverage graphs and heat maps comparing TEs and genes were built for selected scaffolds divided into 50 kb windows (bins): four scaffolds with large proportions of TEs, four with large proportions of genes and four with similar proportions of TEs and genes. The heat maps were built with MultiExperiment Viewer.

To test whether TEs overlapped genes more than expected by chance in any of the scaffolds over 1 Mb, we used Monte Carlo (MC) methods. Random samples were generated by preserving the lengths and positions of the genes and the lengths of the TEs, and randomizing only the TE positions, which closely reflects the biological context. We used a minimum of 100 MC samples with no maximum, a sequential MC threshold of 20 and an MCFDR of 0.05; the analysis was repeated for the scaffolds divided into 50 kb bins (2182 bins in total). For each scaffold, the proportion of gene coverage overlapped by TEs was calculated as the number of base pairs covered by both genes and TEs divided by the total number of base pairs covered by genes. Scaffolds with a high proportion of overlap were further analyzed by calculating the overlap proportion in 50 kb bins. The MC analysis was then repeated for each TE superfamily separately.
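The randomization test can be sketched as follows. This is a simplified illustration, not the Genomic HyperBrowser implementation: gene intervals stay fixed, each TE keeps its length but receives a uniformly random start on the scaffold, and the statistic is the fraction of gene base pairs overlapped by TEs. Sampling stops once at least 100 samples have been drawn and 20 samples are at least as extreme as the observed value (a Besag-Clifford style sequential rule). The finite `max_samples` cap is an assumption added for safety, and the MCFDR multiple-testing step across scaffolds is not shown.

```python
import random


def merge(intervals):
    """Merge overlapping half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def intersect_bp(a, b):
    """Base pairs covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def gene_overlap_fraction(genes, tes):
    """Fraction of gene base pairs that are also covered by TEs."""
    gene_bp = sum(end - start for start, end in merge(genes))
    return intersect_bp(genes, tes) / gene_bp


def mc_overlap_test(genes, tes, scaffold_len, min_samples=100,
                    max_samples=100_000, h=20, seed=0):
    """One-sided sequential MC test: do TEs overlap genes more than by chance?"""
    rng = random.Random(seed)
    observed = gene_overlap_fraction(genes, tes)
    lengths = [end - start for start, end in tes]
    n = k = 0  # samples drawn, samples at least as extreme as observed
    while n < max_samples and not (n >= min_samples and k >= h):
        randomized = []
        for length in lengths:
            # keep the TE length, draw a uniform start that fits on the scaffold
            start = rng.randint(0, scaffold_len - length)
            randomized.append((start, start + length))
        n += 1
        if gene_overlap_fraction(genes, randomized) >= observed:
            k += 1
    # sequential estimate h/n once h extremes are seen, else (k + 1) / (n + 1)
    p_value = k / n if k >= h else (k + 1) / (n + 1)
    return observed, p_value
```

With one gene at (0, 100) and one TE at (500, 600) on a 1,000 bp scaffold, the observed overlap is 0, every random placement is at least as extreme, and the sketch stops at the 100-sample minimum with p = 1.0.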
Putative expression of the genes in the scaffolds over 1 Mb was determined by comparing the predicted mRNAs to 286,252 flax ESTs from GenBank using BLAT. A hit to an EST was classified as positive only if it matched at least 70% of the EST sequence, and a gene was considered putatively expressed if it had at least one positive EST match. The proportion of putatively expressed genes was calculated for each of the large scaffolds and compared with the proportional coverage of TEs. To determine whether any superfamily was biased toward inserting within genes relative to the other superfamilies, the number of TE hits inside genes was counted for each superfamily with the Genomic HyperBrowser, using the midpoint of each TE sequence to decide whether the TE lay inside a gene. The number of TE hits inside genes was then compared with the number of hits across all scaffolds over 1 Mb using heterogeneity chi-square tests with a Bonferroni correction. These analyses were repeated for TEs within the adjacent 1 kb and the adjacent 5 kb upstream or downstream of the genes.
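The midpoint rule and the superfamily comparison can be sketched as below. The 2×2 layout (one superfamily against all others pooled, inside versus outside genes) is an assumption about how the heterogeneity test was set up, not a description taken from the text. The chi-square statistic then has one degree of freedom, so its p-value can be computed with `math.erfc`, and p-values are Bonferroni-adjusted by the number of superfamilies tested.

```python
import math
from bisect import bisect_right


def midpoint_in_gene(te, genes_sorted):
    """True if the TE midpoint falls inside a gene.

    genes_sorted: sorted, non-overlapping half-open (start, end) gene intervals.
    """
    mid = (te[0] + te[1]) // 2
    # index of the last gene whose start is <= mid
    idx = bisect_right(genes_sorted, (mid, math.inf)) - 1
    return idx >= 0 and genes_sorted[idx][0] <= mid < genes_sorted[idx][1]


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]] (no continuity correction), df = 1."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # for 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def superfamily_bias(counts):
    """counts: {superfamily: (hits_inside_genes, total_hits)}.

    Each superfamily is tested against all other superfamilies pooled;
    p-values are Bonferroni-adjusted by the number of tests.
    """
    all_inside = sum(inside for inside, _ in counts.values())
    all_hits = sum(total for _, total in counts.values())
    m = len(counts)
    results = {}
    for sf, (inside, total) in counts.items():
        a, b = inside, total - inside       # this superfamily: inside, outside
        c = all_inside - inside             # other superfamilies, inside genes
        d = (all_hits - total) - c          # other superfamilies, outside genes
        stat, p = chi2_2x2(a, b, c, d)
        results[sf] = (stat, min(1.0, p * m))
    return results
```

For the 1 kb and 5 kb analyses, the same functions could be reused after replacing each gene with its upstream and downstream flanking windows. Tables with an empty row or column make the denominator zero and would need to be excluded first.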
Insertions of full LTR elements and evolution
Because LTRs were the most prevalent elements in the flax genome, they were analyzed in further detail. Results from LTR_finder and LTR_STRUC were filtered for redundant sequences using CD-HIT. Because the two LTRs of an LTR retrotransposon are identical at the time of insertion, the divergence between the LTR pair of each putative element can be used to estimate the age of the element. We aligned LTR pairs with ClustalW and used the Kimura two-parameter method to estimate the number of nucleotide substitutions per site (K). The insertion time was estimated as t = K/(2r), where t is the insertion time (converted to millions of years), K is the number of nucleotide substitutions per site and r is the nucleotide substitution rate. We used a rate of 1.5 × 10⁻⁸ substitutions per site per year, as reported for the chalcone synthase and alcohol dehydrogenase genes in Arabidopsis and Arabis species. This rate has previously been used for dating LTR retrotransposon insertions in Arabidopsis, and it is very close to the estimate used for dating LTR retroelements in rice, which assumes at least a 2-fold higher mutation rate in TEs than in coding regions. The library of non-redundant elements with 100% LTR similarity was used to search the flax assembly and the flax ESTs with BLAT to establish the abundance, distribution and overall putative expression of the recent insertions. Only hits covering 100% of the query sequence with no gaps or mismatches were selected, as these represented complete elements mapped to the genome. The distances between LTR retrotransposon elements and the nearest genes were determined using the Genomic HyperBrowser.
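The dating step can be made concrete with a short Python sketch of the Kimura two-parameter distance and t = K/(2r). It assumes the two LTRs are already aligned (for example by ClustalW) and skips gap and ambiguous columns, which is one common convention rather than necessarily the one used in this study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(ltr5, ltr3):
    """Kimura two-parameter distance K between two aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    transitions = transversions = sites = 0
    for x, y in zip(ltr5.upper(), ltr3.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_time_myr(k, r=1.5e-8):
    """t = K / (2r) with r in substitutions/site/year; result in millions of years."""
    return k / (2 * r) / 1e6
```

An element with identical LTRs gives K = 0 and therefore t = 0, which is why 100% LTR similarity is read as evidence of a recent insertion. As a worked example, K = 0.03 gives t = 0.03 / (2 × 1.5 × 10⁻⁸) = 10⁶ years, i.e. 1 million years. The logarithms are undefined when 1 - 2P - Q ≤ 0, so saturated, highly diverged pairs need separate handling.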
Finally, non-redundant LTR element sequences were used as input to extract protein domains from both Copia and Gypsy elements with RepeatExplorer, by comparing the flax LTR elements to a database of curated LTR retrotransposon protein domain sequences. The comparison parameters were a minimum similarity of 60%, a minimum identity of 40% and a ratio of 0.8 between the hit length and the length of the database sequence. The domains were tabulated to determine the distribution of conserved domains in each superfamily.
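The domain filtering and tabulation can be sketched as follows, assuming the domain hits have been exported as records with the fields shown. The field names and domain labels are hypothetical and do not reflect the RepeatExplorer output format; the 0.8 length ratio is treated as a minimum. Each element is counted at most once per domain.

```python
from collections import Counter, defaultdict

MIN_SIMILARITY = 60.0    # percent
MIN_IDENTITY = 40.0      # percent
MIN_LENGTH_RATIO = 0.8   # hit length / database domain length


def tabulate_domains(hits):
    """Count, per superfamily, how many elements carry each domain.

    hits: dicts with hypothetical keys 'element', 'superfamily', 'domain',
    'similarity', 'identity', 'hit_len' and 'db_len'.
    """
    seen = set()
    table = defaultdict(Counter)
    for h in hits:
        passes = (h["similarity"] >= MIN_SIMILARITY
                  and h["identity"] >= MIN_IDENTITY
                  and h["hit_len"] / h["db_len"] >= MIN_LENGTH_RATIO)
        key = (h["element"], h["domain"])
        if passes and key not in seen:
            seen.add(key)
            table[h["superfamily"]][h["domain"]] += 1
    return table
```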
Availability of supporting data
LGG: Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9. Centennial Centre for Interdisciplinary Science (CCIS), 5–114.
MKD: Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9. Centennial Centre for Interdisciplinary Science (CCIS), 5–114.
The authors thank Robert Hubley (Institute for Systems Biology, Seattle, WA, USA) for providing a custom Perl script for the automatic annotation of masked regions in the flax genome. This research was supported by TUFGEN (Total Utilization of Flax Genomics), a project funded by Genome Alberta, Genome Prairie and Genome Canada. MKD is also supported by NSERC (Natural Sciences and Engineering Research Council) Canada.
- Bennetzen JL: Transposable element contributions to plant gene and genome evolution. Plant Mol Biol. 2000, 42: 251-269. 10.1023/A:1006344508454.
- Bennetzen JL: Transposable elements, gene creation and genome rearrangement in flowering plants. Curr Opin Genet Dev. 2005, 15: 621-627. 10.1016/j.gde.2005.09.010.
- Kumar A, Bennetzen JL: Plant retrotransposons. Annu Rev Genet. 1999, 33: 479-532. 10.1146/annurev.genet.33.1.479.
- Dooner HK, Weill CF: Give-and-take: interactions between DNA transposons and their host plant genomes. Curr Opin Genet Dev. 2007, 17: 486-492. 10.1016/j.gde.2007.08.010.
- Kazazian HH: Mobile elements: drivers of genome evolution. Science. 2004, 303: 1626-1632. 10.1126/science.1089670.
- Feschotte C, Pritham EJ: DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet. 2007, 41: 331-368. 10.1146/annurev.genet.40.110405.090448.
- Smykal P, Bacova-Kerteszova N, Kalendar R, Corander J, Schulman AH, Pavelek M: Genetic diversity of cultivated flax (Linum usitatissimum L.) germplasm assessed by retrotransposon-based markers. Theor Appl Genet. 2011, 122: 1385-1397. 10.1007/s00122-011-1539-2.
- Kalendar R, Flavell AJ, Ellis THN, Sjakste T, Moisy C, Schulman AH: Analysis of plant diversity with retrotransposon-based molecular markers. Heredity. 2011, 106: 520-530. 10.1038/hdy.2010.93.
- Civan P, Svec M, Hauptvogel P: On the coevolution of transposable elements and plant genomes. J Bot. 2011, 2011: 10.1155/2011/893546.
- The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- Schnable PS, Ware D, Fulton RS, Stein JC, Wei FS, Pasternak S, Liang CZ, Zhang JW, Fulton L, Graves TA, et al: The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science. 2009, 326: 1112-1115. 10.1126/science.1178534.
- Feschotte C, Jiang N, Wessler SR: Plant transposable elements: Where genetics meets genomics. Nat Rev Genet. 2002, 3: 329-341.
- Krom N, Recla J, Ramakrishna W: Analysis of genes associated with retrotransposons in the rice genome. Genetica. 2008, 134: 297-310. 10.1007/s10709-007-9237-3.
- Nakatsuka T, Nishihara M, Mishiba K, Hirano H, Yamamura S: Two different transposable elements inserted in flavonoid 3',5'-hydroxylase gene contribute to pink flower coloration in Gentiana scabra. Mol Genet Genomics. 2006, 275: 231-241. 10.1007/s00438-005-0083-7.
- Tang SX, Hass CG, Knapp SJ: Ty3/gypsy-like retrotransposon knockout of a 2-methyl-6-phytyl-1,4-benzoquinone methyltransferase is non-lethal, uncovers a cryptic paralogous mutation, and produces novel tocopherol (vitamin E) profiles in sunflower. Theor Appl Genet. 2006, 113: 783-799. 10.1007/s00122-006-0321-3.
- Baek KH, Skinner DZ, Ling P, Chen XM: Molecular structure and organization of the wheat genomic manganese superoxide dismutase gene. Genome. 2006, 49: 209-218. 10.1139/G05-102.
- Verries C, Bes C, This P, Tesniere C: Cloning and characterization of Vine-1, a LTR-retrotransposon-like element in Vitis vinifera L., and other Vitis species. Genome. 2000, 43: 366-376.
- Richter TE, Ronald PC: The evolution of disease resistance genes. Plant Mol Biol. 2000, 42: 195-204. 10.1023/A:1006388223475.
- Jiang N, Bao ZR, Zhang XY, Eddy SR, Wessler SR: Pack-MULE transposable elements mediate gene evolution in plants. Nature. 2004, 431: 569-573. 10.1038/nature02953.
- Hoen DR, Park KC, Elrouby N, Yu ZH, Mohabir N, Cowan RK, Bureau TE: Transposon-mediated expansion and diversification of a family of ULP-like genes. Mol Biol Evol. 2006, 23: 1254-1268. 10.1093/molbev/msk015.
- Hudson ME, Lisch DR, Quail PH: The FHY3 and FAR1 genes encode transposase-related proteins involved in regulation of gene expression by the phytochrome A-signaling pathway. Plant J. 2003, 34: 453-471. 10.1046/j.1365-313X.2003.01741.x.
- Lin RC, Ding L, Casola C, Ripoll DR, Feschotte C, Wang HY: Transposase-derived transcription factors regulate light signaling in Arabidopsis. Science. 2007, 318: 1302-1305. 10.1126/science.1146281.
- McClintock B: Mutable Loci in Maize. Carnegie Institute of Washington Year Book. 1948, 47: 155-169.
- McClintock B: The Origin and Behavior of Mutable Loci in Maize. Proc Natl Acad Sci USA. 1950, 36: 344-355. 10.1073/pnas.36.6.344.
- McClintock B: Chromosome Organization and Genic Expression. Cold Spring Harb Symp Quant Biol. 1951, 16: 13-47. 10.1101/SQB.1951.016.01.004.
- McClintock B: Controlling Elements and the Gene. Cold Spring Harb Symp Quant Biol. 1956, 21: 197-216. 10.1101/SQB.1956.021.01.017.
- Grandbastien MA, Spielmann A, Caboche M: Tnt1, a Mobile Retroviral-Like Transposable Element of Tobacco Isolated by Plant-Cell Genetics. Nature. 1989, 337: 376-380. 10.1038/337376a0.
- Pouteau S, Grandbastien MA, Boccara M: Microbial Elicitors of Plant Defense Responses Activate Transcription of a Retrotransposon. Plant J. 1994, 5: 535-542. 10.1046/j.1365-313X.1994.5040535.x.
- Hirochika H: Activation of Tobacco Retrotransposons During Tissue-Culture. EMBO J. 1993, 12: 2521-2528.
- Takeda S, Sugimoto K, Otsuki H, Hirochika H: Transcriptional activation of the tobacco retrotransposon Tto1 by wounding and methyl jasmonate. Plant Mol Biol. 1998, 36: 365-376. 10.1023/A:1005911413528.
- Hirochika H, Sugimoto K, Otsuki Y, Tsugawa H, Kanda M: Retrotransposons of rice involved in mutations induced by tissue culture. Proc Natl Acad Sci USA. 1996, 93: 7783-7788. 10.1073/pnas.93.15.7783.
- Han FP, Liu ZL, Tan M, Hao S, Fedak G, Liu B: Mobilized retrotransposon Tos17 of rice by alien DNA introgression transposes into genes and causes structural and methylation alterations of a flanking genomic region. Hereditas. 2004, 141: 243-251.
- Kalendar R, Tanskanen J, Immonen S, Nevo E, Schulman AH: Genome evolution of wild barley (Hordeum spontaneum) by BARE-1 retrotransposon dynamics in response to sharp microclimatic divergence. Proc Natl Acad Sci USA. 2000, 97: 6603-6607. 10.1073/pnas.110587497.
- Lopes FR, Carazzolle MF, Pereira GAG, Colombo CA, Carareto CMA: Transposable elements in Coffea (Gentianales: Rubiacea) transcripts and their role in the origin of protein diversity in flowering plants. Mol Genet Genomics. 2008, 279: 385-401. 10.1007/s00438-008-0319-4.
- Lockton S, Gaut BS: The Contribution of Transposable Elements to Expressed Coding Sequence in Arabidopsis thaliana. J Mol Evol. 2009, 68: 80-89. 10.1007/s00239-008-9190-5.
- Vicient CM: Transcriptional activity of transposable elements in maize. BMC Genomics. 2010, 11: 601. 10.1186/1471-2164-11-601.
- de Araujo PG, Rossi M, de Jesus EM, Saccaro NL, Kajihara D, Massa R, de Felix JM, Drummond RD, Falco MC, Chabregas SM, et al: Transcriptionally active transposable elements in recent hybrid sugarcane. Plant J. 2005, 44: 707-717. 10.1111/j.1365-313X.2005.02579.x.
- McDill J, Repplinger M, Simpson BB, Kadereit JW: The Phylogeny of Linum and Linaceae Subfamily Linoideae, with Implications for Their Systematics, Biogeography, and Evolution of Heterostyly. Systematic Botany. 2009, 34: 386-405. 10.1600/036364409788606244.
- Millam S, Obert B, Pret'ova A: Plant cell and biotechnology studies in Linum usitatissimum - a review. Plant Cell Tissue Organ Cult. 2005, 82: 93-103. 10.1007/s11240-004-6961-6.
- Wang Z, Hobson N, Galindo L, Zhu S, McDill J, Yang L, Hawkins S, Neutelings G, Datla R, Lambert G, et al: The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J. 2012, 10.1111/j.1365-313X.2012.05093.x.
- Muir A, Westcott N: Flax, the genus Linum. 2003, Taylor & Francis Inc
- Cullis CA: Mechanisms and control of rapid genomic changes in flax. Ann Bot. 2005, 95: 201-206. 10.1093/aob/mci013.
- Cullis CA, Cleary W: Rapidly Varying DNA-Sequences in Flax. Can J Genet Cytol. 1986, 28: 252-259.
- Cullis CA: DNA Differences between Flax Genotrophs. Nature. 1973, 243: 515-516. 10.1038/243515a0.
- Cullis CA: DNA-Sequence Organization in the Flax Genome. Biochimica Et Biophysica Acta. 1981, 652: 1-15. 10.1016/0005-2787(81)90203-3.
- Chen YM, Schneeberger RG, Cullis CA: A site-specific insertion sequence in flax genotrophs induced by environment. New Phytol. 2005, 167: 171-180. 10.1111/j.1469-8137.2005.01398.x.
- Ragupathy R, Rathinavelu R, Cloutier S: Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome. BMC Genomics. 2011, 12: 217. 10.1186/1471-2164-12-217.
- Venglat P, Xiang DQ, Qiu SQ, Stone SL, Tibiche C, Cram D, Alting-Mees M, Nowak J, Cloutier S, Deyholos M, et al: Gene expression analysis of flax seed development. BMC Plant Biology. 2011, 11: 74.
- Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al: A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007, 8: 973-982. 10.1038/nrg2165.
- Staton SE, Ungerer MC, Moore RC: The Genomic Organization of Ty3/Gypsy-Like Retrotransposons in Helianthus (Asteraceae) Homoploid Hybrid Species. Am J Bot. 2009, 96: 1646-1655. 10.3732/ajb.0800337.
- Miller JT, Dong FG, Jackson SA, Song J, Jiang JM: Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics. 1998, 150: 1615-1623.
- Presting GG, Malysheva L, Fuchs J, Schubert IZ: A TY3/GYPSY retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J. 1998, 16: 721-728. 10.1046/j.1365-313x.1998.00341.x.
- Jiang JM, Birchler JA, Parrott WA, Dawe RK: A molecular view of plant centromeres. Trends Plant Sci. 2003, 8: 570-575. 10.1016/j.tplants.2003.10.011.
- Santini S, Cavallini A, Natali L, Minelli S, Maggini F, Cionini PG: Ty1/copia- and Ty3/gypsy-like DNA sequences in Helianthus species. Chromosoma. 2002, 111: 192-200. 10.1007/s00412-002-0196-2.
- Cavallini A, Natali L, Zuccolo A, Giordani T, Jurman I, Ferrillo V, Vitacolonna N, Sarri V, Cattonaro F, Ceccarelli M, et al: Analysis of transposons and repeat composition of the sunflower (Helianthus annuus L.) genome. Theor Appl Genet. 2010, 120: 491-508. 10.1007/s00122-009-1170-7.
- Cossu RM, Buti M, Giordani T, Natali L, Cavallini A: A computational study of the dynamics of LTR retrotransposons in the Populus trichocarpa genome. Tree Genetics & Genomes. 2012, 8: 61-75. 10.1007/s11295-011-0421-3.
- Kidwell MG: Transposable elements and the evolution of genome size in eukaryotes. Genetica. 2002, 115: 49-63. 10.1023/A:1016072014259.
- Chan AP, Crabtree J, Zhao Q, Lorenzi H, Orvis J, Puiu D, Melake-Berhan A, Jones KM, Redman J, Chen G, et al: Draft genome sequence of the oilseed species Ricinus communis. Nat Biotechnol. 2010, 28: 951-U953. 10.1038/nbt.1674.
- Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.
- Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D, et al: The genome of the domesticated apple (Malus x domestica Borkh.). Nat Genet. 2010, 42: 833.
- The International Brachypodium Initiative: Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010, 463: 763-768. 10.1038/nature08747.
- International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature. 2005, 436: 793-800. 10.1038/nature03895.
- Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al: The Sorghum bicolor genome and the diversification of grasses. Nature. 2009, 457: 551-556. 10.1038/nature07723.
- Ming R, Hou SB, Feng Y, Yu QY, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KLT, et al: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008, 452: 991-U997. 10.1038/nature06856.
- Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL, Jaiswal P, Mockaitis K, Liston A, Mane SP, et al: The genome of woodland strawberry (Fragaria vesca). Nat Genet. 2011, 43: 109-116. 10.1038/ng.740.
- Schmutz J, Cannon SB, Schlueter J, Ma JX, Mitros T, Nelson W, Hyten DL, Song QJ, Thelen JJ, Cheng JL, et al: Genome sequence of the palaeopolyploid soybean. Nature. 2010, 463: 178-183. 10.1038/nature08670.
- The French-Italian Public Consortium for Grapevine Characterization: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449: 463-U465. 10.1038/nature06148.
- Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN, et al: The genome of Theobroma cacao. Nat Genet. 2011, 43: 101-108. 10.1038/ng.736.
- Huang SW, Li RQ, Zhang ZH, Li L, Gu XF, Fan W, Lucas WJ, Wang XW, Xie BY, Ni PX, et al: The genome of the cucumber, Cucumis sativus L. Nat Genet. 2009, 41: 1275-U1229. 10.1038/ng.475.
- Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF: Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 2006, 16: 1252-1261. 10.1101/gr.5282906.
- Hawkins JS, Hu GJ, Rapp RA, Grafenberg JL, Wendel JF: Phylogenetic determination of the pace of transposable element proliferation in plants: copia and LINE-like elements in Gossypium. Genome. 2008, 51: 11-18. 10.1139/G07-099.
- Hu GJ, Hawkins JS, Grover CE, Wendel JF: The history and disposition of transposable elements in polyploid Gossypium. Genome. 2010, 53: 599-607. 10.1139/G10-038.
- Schmidt T: LINEs, SINEs and repetitive DNA: non-LTR retrotransposons in plant genomes. Plant Mol Biol. 1999, 40: 903-910. 10.1023/A:1006212929794.
- Lisch D: Mutator transposons. Trends in Plant Science. 2002, 7: 498-504. 10.1016/S1360-1385(02)02347-6.
- Rossi M, Araujo PG, Van Sluys MA: Survey of transposable elements in sugarcane expressed sequence tags (ESTs). Genetics and Molecular Biology. 2001, 24: 147-154.
- Chang W, Schulman AH: BARE retrotransposons produce multiple groups of rarely polyadenylated transcripts from two differentially regulated promoters. Plant J. 2008, 56: 40-50. 10.1111/j.1365-313X.2008.03572.x.
- Casacuberta JM, Vernhettes S, Grandbastien MA: Sequence Variability within the Tobacco Retrotransposon Tnt1 Population. EMBO J. 1995, 14: 2670-2678.
- Vitte C, Panaud O: Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol Biol Evol. 2003, 20: 528-540. 10.1093/molbev/msg055.
- SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, MelakeBerhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, et al: Nested retrotransposons in the intergenic regions of the maize genome. Science. 1996, 274: 765-768. 10.1126/science.274.5288.765.
- Meyers BC, Tingley SV, Morgante M: Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res. 2001, 11: 1660-1676. 10.1101/gr.188201.
- Wright SI, Agrawal N, Bureau TE: Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res. 2003, 13: 1897-1903.
- Volff JN: Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. Bioessays. 2006, 28: 913-922. 10.1002/bies.20452.
- Wicker T, Guyot R, Yahiaoui N, Keller B: CACTA transposons in Triticeae. A diverse family of high-copy repetitive elements. Plant Physiology. 2003, 132: 52-63.
- Lal SK, Giroux MJ, Brendel V, Vallejos CE, Hannah LC: The maize genome contains a Helitron insertion. Plant Cell. 2003, 15: 381-391. 10.1105/tpc.008375.
- Gupta S, Gallavotti A, Stryker GA, Schmidt RJ, Lal SK: A novel class of Helitron-related transposable elements in maize contain portions of multiple pseudogenes. Plant Mol Biol. 2005, 57: 115-127. 10.1007/s11103-004-6636-z.
- Kapitonov VV, Jurka J: Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA. 2001, 98: 8714-8719. 10.1073/pnas.151269298.
- Du C, Fefelova N, Caronna J, He LM, Dooner HK: The polychromatic Helitron landscape of the maize genome. Proc Natl Acad Sci USA. 2009, 106: 19916-19921.
- Zhang JB, Zhang F, Peterson T: Transposition of reversed Ac element ends generates novel chimeric genes in maize. PLoS Genet. 2006, 2: 1535-1540.
- Takagi K, Maekawa M, Tsugane K, Iida S: Transposition and target preferences of an active nonautonomous DNA transposon nDart1 and its relatives belonging to the hAT superfamily in rice. Mol Genet Genomics. 2010, 284: 343-355. 10.1007/s00438-010-0569-9.
- Pereira V: Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome. Genome Biol. 2004, 5: R79. 10.1186/gb-2004-5-10-r79.
- Sandmeyer S: Integration by design. Proc Natl Acad Sci USA. 2003, 100: 5586-5588. 10.1073/pnas.1031802100.
- Malik HS, Eickbush TH: Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol. 1999, 73: 5186-5190.
- White SE, Habera LF, Wessler SR: Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of gene structure and expression. Proc Natl Acad Sci USA. 1994, 91: 11792-11796. 10.1073/pnas.91.25.11792.
- Kashkush K, Feldman M, Levy AA: Transcriptional activation of retrotransposons alters the expression of adjacent genes in wheat. Nat Genet. 2003, 33: 102-106. 10.1038/ng1063.
- Hollister JD, Smith LM, Guo YL, Ott F, Weigel D, Gaut BS: Transposable elements and small RNAs contribute to gene expression divergence between Arabidopsis thaliana and Arabidopsis lyrata. Proc Natl Acad Sci USA. 2011, 108: 2322-2327. 10.1073/pnas.1018222108.
- Hollister JD, Gaut BS: Epigenetic silencing of transposable elements: A trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res. 2009, 19: 1419-1428. 10.1101/gr.091678.109.
- Ahmed I, Sarazin A, Bowler C, Colot V, Quesneville H: Genome-wide evidence for local DNA methylation spreading from small RNA-targeted sequences in Arabidopsis. Nucleic Acids Res. 2011, 39: 6919-6931. 10.1093/nar/gkr324.
- Wawrzynski A, Ashfield T, Chen NWG, Mammadov J, Nguyen A, Podicheti R, Cannon SB, Thareau V, Ameline-Torregrosa C, Cannon E, et al: Replication of Nonautonomous Retroelements in Soybean Appears to Be Both Recent and Common. Plant Physiol. 2008, 148: 1760-1771. 10.1104/pp.108.127910.
- Kalendar R, Vicient CM, Peleg O, Anamthawat-Jonsson K, Bolshoy A, Schulman AH: Large retrotransposon derivatives: Abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics. 2004, 166: 1437-1450. 10.1534/genetics.166.3.1437.
- Vitte C, Chaparro C, Quesneville H, Panaud O: Spip and Squiq, two novel rice non-autonomous LTR retro-element families related to RIRE3 and RIRE8. Plant Sci. 2007, 172: 8-19. 10.1016/j.plantsci.2006.07.008.
- Wang H, Liu JS: LTR retrotransposon landscape in Medicago truncatula: more rapid removal than in rice. BMC Genomics. 2008, 9: 382. 10.1186/1471-2164-9-382.
- Witte CP, Le QH, Bureau T, Kumar A: Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring plant genomes. Proc Natl Acad Sci USA. 2001, 98: 13778-13783. 10.1073/pnas.241341898.
- Vitte C, Panaud O: LTR retrotransposons and flowering plant genome size: emergence of the increase/decrease model. Cytogenet Genome Res. 2005, 110: 91-107. 10.1159/000084941.
- Xu L, Zhang Y, Su Y, Liu L, Yang J, Zhu YY, Li CY: Structure and evolution of full-length LTR retrotransposons in rice genome. Plant Systematics and Evolution. 2010, 287: 19-28. 10.1007/s00606-010-0285-2.
- Vitte C, Panaud O, Quesneville H: LTR retrotransposons in rice (Oryza sativa, L.): recent burst amplifications followed by rapid DNA loss. BMC Genomics. 2007, 8: 218. 10.1186/1471-2164-8-218.
- Ma JX, Devos KM, Bennetzen JL: Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 2004, 14: 860-869. 10.1101/gr.1466204.
- Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21: I351-I358. 10.1093/bioinformatics/bti1018.
- Edgar RC, Myers EW: PILER: identification and classification of genomic repeats. Bioinformatics. 2005, 21: I152-I158. 10.1093/bioinformatics/bti1003.
- Xu Z, Wang H: LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007, 35: W265-W268. 10.1093/nar/gkm286.
- McCarthy EM, McDonald JF: LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics. 2003, 19: 362-367. 10.1093/bioinformatics/btf878.
- Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999, 27: 573-580. 10.1093/nar/27.2.573.
- Wootton JC, Federhen S: Statistics of Local Complexity in Amino-Acid-Sequences and Sequence Databases. Comput Chem. 1993, 17: 149-163. 10.1016/0097-8485(93)85006-X.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
- Huang Y, Niu BF, Gao Y, Fu LM, Li WZ: CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010, 26: 680-682. 10.1093/bioinformatics/btq003.
- Kent WJ: BLAT - The BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.
- Sandve GK, Gundersen S, Rydbeck H, Glad IK, Holden L, Holden M, Liestol K, Clancy T, Ferkingstad E, Johansen M, et al: The Genomic HyperBrowser: inferential genomics at the sequence level. Genome Biol. 2010, 11: R121. 10.1186/gb-2010-11-12-r121.
- Saeed AI, Hagabati NK, Braisted JC, Liang W, Sharov V, Howe EA, Li JW, Thiagarajan M, White JA, Quackenbush J: TM4 microarray software suite. DNA Microarrays, Part B: Databases and Statistics. 2006, 411: 134.
- Sandve GK, Ferkingstad E, Nygard S: Sequential Monte Carlo multiple testing. Bioinformatics. 2011, 27: 3235-3241. 10.1093/bioinformatics/btr568.
- Zar JH: Biostatistical analysis. 1999, USA: Prentice-Hall, Inc.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- Kimura M: A Simple Method for Estimating Evolutionary Rates of Base Substitutions through Comparative Studies of Nucleotide-Sequences. J Mol Evol. 1980, 16: 111-120. 10.1007/BF01731581.
- Ma JX, Bennetzen JL: Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA. 2004, 101: 12404-12410. 10.1073/pnas.0403715101.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.