- Research article
- Open Access
Characterization of repetitive DNA landscape in wheat homeologous group 4 chromosomes
- Ingrid Garbus†1,
- José R Romero†1,
- Miroslav Valarik2,
- Hana Vanžurová2,
- Miroslava Karafiátová2,
- Mario Cáccamo3,
- Jaroslav Doležel2,
- Gabriela Tranquilli4,
- Marcelo Helguera5 and
- Viviana Echenique1
© Garbus et al.; licensee BioMed Central. 2015
Received: 2 January 2015
Accepted: 24 April 2015
Published: 12 May 2015
The number and complexity of repetitive elements varies between species, and repetitive elements are generally most abundant in species with larger genomes. Combining the flow-sorted chromosome arm approach to genome analysis with second generation DNA sequencing technologies provides a unique opportunity to study the repetitive portion of each chromosome, enabling comparisons among them. Additionally, different sequencing approaches may provide different depths of insight into repeatome content and structure. In this work we analyze and characterize the repetitive sequences of the homeologous group 4 chromosome arms of Triticum aestivum cv. Chinese Spring, obtained through Roche 454 and Illumina sequencing technologies, hereinafter marked by the subscripts 454 and I, respectively.
Repetitive sequences were identified with the RepeatMasker software using the interspersed repeat database mips-REdat_v9.0p. The input sequences consisted of our 4DS454 and 4DL454 scaffolds and the 4ASI, 4ALI, 4BSI, 4BLI, 4DSI and 4DLI contigs downloaded from the International Wheat Genome Sequencing Consortium (IWGSC).
Repetitive sequence content varied from 55% to 63% for all chromosome arm assemblies except 4DLI, in which the repeat content was 38%. Transposable elements, small RNAs, satellites, simple repeats and low complexity sequences were analyzed. SSR frequency was one per 24 to 27 kb for all chromosome assemblies except 4DLI, where it was three times higher. Dinucleotides and trinucleotides were the most abundant SSR repeat units. (GA)n/(TC)n was the most abundant SSR except in 4DLI, where the most frequently identified SSR was (CCG/CGG)n. Retrotransposons, followed by DNA transposons, were the most highly represented repeats, mainly composed of the Gypsy and CACTA/En-Spm superfamilies, respectively. This whole chromosome sequence analysis allowed the identification of three new LTR retrotransposon families belonging to the Copia superfamily, one belonging to the Gypsy superfamily and two TRIM retrotransposon families. Their physical distribution in the wheat genome was analyzed by fluorescent in situ hybridization (FISH), and one of them, the Carmen retrotransposon, was found to be specific to the centromeric regions of all wheat chromosomes.
The presented work is the first in-depth report of wheat repetitive sequences analyzed at the chromosome arm level, providing a first insight into the repeatome of T. aestivum homeologous group 4 chromosomes.
Wheat (Triticum aestivum L. em Thell, 2n = 42; AABBDD) has an allohexaploid genome that arose from two polyploidization events. The first brought together the genomes of two diploid species, one related to the wild species Triticum urartu (2n = 2x = 14; AuAu) and one related to Aegilops speltoides (2n = 2x = 14; SS). This hybridization formed the allotetraploid Triticum turgidum (2n = 4x = 28; AABB), which underwent a second hybridization event with a diploid grass species, Aegilops tauschii (DD), producing the ancestral allohexaploid T. aestivum (2n = 6x = 42; AABBDD). Thus, the hexaploid wheat genome is characterized by its large size (~17 Gb) and complexity, with repetitive sequences accounting for ~80% of the genome [2,3].
The number and complexity of repetitive elements varies between species, and species with larger genomes generally have more repetitive elements. Repetitive sequences can be divided into three main classes: transposable elements, tandem repeats, and high copy number genes, such as ribosomal or histone genes. Transposable elements (TEs) are the best-defined class and constitute the most abundant component of many genomes, accounting for 10% to 85% of genome content. Based on their transposition mechanism, TEs can be subdivided into two classes. Class I elements (retrotransposons) move via a so-called “copy and paste” mechanism using RNA intermediates and mainly comprise long terminal repeat (LTR) retrotransposons and non-LTR retrotransposons, such as LINEs and SINEs (long and short interspersed nuclear elements, respectively). Class II elements (DNA transposons) transpose without an RNA intermediate, either by a cut-and-paste mechanism (terminal inverted repeat (TIR) transposons), by rolling-circle DNA replication (Helitrons), or by mechanisms that remain unknown [5,6].
Tandem repeats represent a second class of repetitive sequences that can account for a large portion of genomic DNA; they comprise any sequence found in consecutive copies along a DNA strand, arranged in tandem arrays of a monomeric unit. Tandem repeats are typically localized to specialized chromosome regions such as centromeres, telomeres, and the heterochromatic knobs of many eukaryotes, and can be categorized according to the size of the repeated unit: microsatellites, or simple sequence repeats, have units of 1–6 nucleotides, minisatellites have units of 10–60 nucleotides, and satellites have units of more than 60 nucleotides. Satellites are the main class of tandem repeats and are thought to play a role in organizing and stabilizing the specialized chromosome regions in which they are found, which are important for chromosome behavior during cell division. Whereas some satellite repeats are chromosome-specific, others are more broadly distributed [9,10].
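The size classes above amount to a simple rule on repeat-unit length. A minimal sketch in Python, with a hypothetical function name; because the quoted ranges leave 7–9 nt units undefined, the sketch reports them as unassigned instead of guessing a class.

```python
def classify_tandem_repeat(unit_length: int) -> str:
    """Size class of a tandem repeat from its repeat-unit length (nt).

    Thresholds follow the text: microsatellites 1-6 nt, minisatellites
    10-60 nt, satellites > 60 nt. Units of 7-9 nt fall outside the quoted
    ranges and are reported as "unassigned" rather than guessed.
    """
    if unit_length < 1:
        raise ValueError("repeat unit length must be at least 1 nt")
    if unit_length <= 6:
        return "microsatellite"
    if 10 <= unit_length <= 60:
        return "minisatellite"
    if unit_length > 60:
        return "satellite"
    return "unassigned"


if __name__ == "__main__":
    for unit in ["GA", "CAGCG", "ACGTACGTACGT"]:
        print(unit, len(unit), classify_tandem_repeat(len(unit)))
```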
Repetitive sequences have a large influence on genome structure, function and evolution but, at the same time, complicate genomic analysis. These highly variable genome components, especially TEs, are subject to dynamic evolution, mainly through insertions, illegitimate and unequal recombination, and interchromosomal and tandem duplications. In bread (hexaploid) wheat, polyploidization and the prevalence of TEs have resulted in massive gene duplication and movement. From a practical point of view, repetitive sequences are a potential source of a wide range of markers useful for genome diversity and evolution analysis, genetic mapping and marker-assisted selection. These include markers based on short tandem repeats, such as Sequence Tagged Microsatellite Sites (STMS) and Simple Sequence Repeats (SSRs) (reviewed in ), and markers based on transposable elements, such as sequence-specific amplification polymorphism (SSAP), retrotransposon-based insertion polymorphism (RBIP), inter-retrotransposon amplified polymorphism (IRAP), retrotransposon-microsatellite amplified polymorphism (REMAP), repeat junction–junction marker (RJJM), insertion-site-based polymorphism (ISBP) [18,19], and repeat junction marker (RJM).
The complete characterization of TEs, together with the elucidation of their distribution across genomes and the mechanisms responsible for that distribution, is essential for understanding the nature and consequences of genome size variation between species, as well as the large-scale organization and evolution of plant genomes. However, this type of analysis is hindered by the large size and polyploid nature of the bread wheat genome. The International Wheat Genome Sequencing Consortium (IWGSC; www.wheatgenome.org) has adopted the flow-sorted chromosome arm approach to the analysis of the wheat genome, achieving a great reduction in complexity. The combination of second generation sequencing technologies with DNA from flow-sorted chromosomes and chromosome arms became the basis for survey sequencing of all wheat chromosome arms. Despite some limitations in building contigs and scaffolds, these survey sequences provide a unique opportunity to study the repetitive portion of each chromosome individually, enable comparisons among different chromosomes, and may enable the identification of chromosome- or genome-specific sequences.
As members of the IWGSC, our laboratory obtained a survey sequence of wheat chromosome arms 4DS and 4DL and, through a combination of different approaches, built a virtual map including 1973 syntenic genes and predicted ~5,700 genes on bread wheat chromosome 4D. An even distribution of repetitive elements was also reported in both arms, but the repeat fraction of this chromosome was not characterized. Here, we focused on the chromosome 4D repeatome: we analyzed and characterized the repetitive sequences of the chromosome 4D arms obtained through Roche 454 sequencing technology (JROL0000000) and compared them with the 4A, 4B and 4D sequences obtained through Illumina sequencing technology, hereinafter distinguished by the subscripts 454 and I, respectively. Identified transposable elements were sorted by class and classified into families. Novel LTR retrotransposon subfamilies were identified, analyzed, and characterized using specific bioinformatics tools, and their physical localization and distribution along the whole wheat genome was assessed by fluorescent in situ hybridization (FISH).
Results and discussion
Quantification of repetitive sequences from wheat homeologous group 4 chromosome arms
Repetitive elements identified in Triticum aestivum (var. Chinese Spring) homeologous group 4 chromosome arms
Comparison of the repeat content of T. aestivum chromosomes and chromosome arms obtained through different sequencing technologies
The repetitive regions of 4DS454 and 4DL454 were almost homogeneously distributed along both chromosome arms, which is likely due to limitations in the assembly of repetitive sequences and to the genetic map and GenomeZipper used, both of which are positively biased toward gene-containing regions.
Classification of repetitive sequences from wheat homeologous group 4 chromosome arms
SSR classification according to the classes
SSR markers still have potential for whole genome or sub-genome mapping [39,40] and breeding. Additionally, some SSRs have proved very useful as physical markers for cytogenetic mapping and metaphase chromosome identification, and for enhancing chromosome sorting by FISHIS. Since most di- and trinucleotide SSRs have already been localized, we focused on SSRs with longer repeat units. Comparing the occurrence of unique SSR motifs among chromosomes and chromosome arms allowed the identification of putatively arm-specific SSRs (Additional file 2: Table S2). The (CAGCG)n/(CGCTG)n and (CCGTA)n/(TACGG)n motifs showed specificity for 4DL, and (CGTAG)n/(CTACG)n showed specificity for 4BL. Additionally, (TTACG)n/(CGTAA)n was found to be specific to chromosome 4D. However, FISH localization on metaphase chromosomes showed that these microsatellites produced weak dispersed signals on almost all chromosomes (data not shown). These findings suggest that quantitative assessment of SSRs in the survey sequence assembly may not be representative because, as discussed above, highly repetitive tandem repeats can collapse in assemblies of short sequencing reads; nevertheless, a catalog of the available microsatellites and other repeats can provide useful information for identifying candidate marker sequences and for marker development.
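The arm-specificity comparison described above can be sketched as a set operation over motif catalogs. This is an illustrative sketch, not the authors' pipeline: it assumes each arm's SSRs are available as a set of motif strings, treats a motif and its reverse complement as one SSR (as in the (CAGCG)n/(CGCTG)n pairs above), and additionally collapses rotations (e.g. CAGCG and AGCGC), which is a common convention in SSR tools but an assumption here. Function names and the toy input are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical_motif(motif: str) -> str:
    """One representative for a motif, its rotations and the rotations of its
    reverse complement (lexicographically smallest), so that e.g. (CAGCG)n
    and (CGCTG)n are counted as the same SSR."""
    variants = []
    for m in (motif.upper(), reverse_complement(motif)):
        variants.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(variants)


def arm_specific_motifs(motifs_by_arm):
    """Map each arm to the canonical motifs that occur in no other arm."""
    arms_with_motif = defaultdict(set)
    for arm, motifs in motifs_by_arm.items():
        for motif in motifs:
            arms_with_motif[canonical_motif(motif)].add(arm)
    specific = {arm: set() for arm in motifs_by_arm}
    for motif, arms in arms_with_motif.items():
        if len(arms) == 1:
            specific[next(iter(arms))].add(motif)
    return specific


if __name__ == "__main__":
    # Toy input for illustration only, not the survey-sequence data.
    toy = {"4DL": {"CAGCG", "GA"}, "4BL": {"CGTAG", "TC"}, "4AS": {"AG"}}
    print(arm_specific_motifs(toy))
```

With the toy input, only the two pentanucleotide motifs come out as arm-specific, while (GA)n/(TC)n is shared by all three arms. A motif flagged this way is only a candidate: as the FISH results above show, in silico specificity need not translate into a localized signal.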
Identification and annotation of novel LTR retrotransposons
LTR retrotransposons account for a significant fraction of many genomes and are even the predominant component of some large genomes. Their typical structural characteristics include: 1) two highly similar LTR sequences; 2) target site duplications; 3) a primer binding site and a polypurine tract; and 4) protein-coding domains for enzymes important for retrotransposition. Additionally, non-autonomous LTR retrotransposons have been described in plants as large retrotransposon derivatives (LARDs) and terminal repeat retrotransposons in miniature (TRIMs), both of which have the typical structural features of LTR retrotransposons but lack protein-coding capability in their internal domain [44,45].
Description of the 6 LTR retrotransposon candidates identified on 4D chromosome scaffolds (columns include: number of copies in genome; LTR region similarity; insertion time, × 10⁶ years)
BLASTX alignment of coding sequences encoded by the novel LTR retrotransposons with the TREP database (columns include: TREP protein code; associated LTR retrotransposon; score, bits; conservative substitutions)
Taxonomy and family members of the novel LTR retrotransposons (two families listed as non-autonomous retrotransposons, TRIM)
Identification of members of the novel LTR retrotransposon families
The presence of full-length copies of the novel LTR retrotransposons in the genome was tested using the candidate LTR retrotransposons as probes against the T. aestivum chromosome arm assemblies acquired from the IWGSC database, following previously proposed criteria. Between 21 and 757 copies were identified per candidate, with RLC_Facunda_JROL01000922-1 being the most abundant (Table 4).
However, these values are probably inaccurate because of the short length of the sequences deposited in the databases: a single large LTR retrotransposon could give rise to several hits. To address this, we adopted an additional approach consisting of BLASTN searches against the T. aestivum (WGS project accession CALP000000000) and Ae. tauschii (WGS project accession AOCO000000000) whole genome shotgun sequence (wgs) databases, using the six candidate LTR retrotransposons as probes. The resulting sequences were used as input for the LTR_FINDER and LTR_STRUC programs, and the output sequences were extracted from the wgs and manually analyzed to verify their identity with the probe LTR retrotransposon. This procedure identified one member of the LTR retrotransposon family for candidates RLC_Genoveva_JROL01007197-1 and RLX_Gabrielle_JROL01007833-1, two for RLC_Facunda_JROL01000922-1 and RLG_Francisca_JROL01008273-1, and three for RLC_Carmen_JROL01007734-1 (Additional file 5: Table S4). Interestingly, thirty-one new LTR retrotransposons were identified when probing with RLX_Victoria_JROL01006440-1 (Additional file 5: Table S4).
Finally, all positive hits obtained through the BLASTN search of the T. aestivum and Ae. tauschii wgs probed with the six candidate LTR retrotransposons were additionally searched by BLAST against a local database, constructed by adding the six novel LTR retrotransposons to the MIPS database. The alignments between the wgs sequences and the local database were manually analyzed. To be considered a copy of a candidate LTR retrotransposon, a wgs sequence needed to (i) show identity exclusively to the candidate, or (ii) exhibit remarkably higher identity to the probe LTR retrotransposon than to any other LTR retrotransposon. Sequences that fulfilled these criteria were extracted from the wgs. This approach showed that at least one strong-hit copy was present in the wgs database for each of the six candidates, together with several partial copies (Table 6; Additional file 6: Table S5).
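The two acceptance criteria can be written as a small decision function. This is a sketch under stated assumptions: the percent identities are taken as already extracted from the BLAST comparison against the local database, and since the text does not quantify "remarkably higher", the 5-percentage-point margin is a hypothetical placeholder. Apart from the candidate identifier, the element names are made up.

```python
def is_candidate_copy(identities: dict, probe: str, margin: float = 5.0) -> bool:
    """Apply the two copy criteria to one wgs sequence.

    `identities` maps each element in the local database (MIPS plus the six
    candidates) that the wgs sequence aligned to, to its percent identity.
    (i) the only match is the probe, or (ii) identity to the probe exceeds the
    best identity to any other element by at least `margin` percentage points.
    The default margin is a hypothetical stand-in for "remarkably higher".
    """
    if probe not in identities:
        return False
    others = [pid for name, pid in identities.items() if name != probe]
    if not others:
        return True  # criterion (i)
    return identities[probe] - max(others) >= margin  # criterion (ii)


if __name__ == "__main__":
    probe = "RLC_Carmen_JROL01007734-1"
    print(is_candidate_copy({probe: 91.0}, probe))                    # exclusive hit
    print(is_candidate_copy({probe: 91.0, "other_TE": 89.5}, probe))  # too close
    print(is_candidate_copy({probe: 91.0, "other_TE": 78.0}, probe))  # clear winner
```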
Special attention was paid to the LTR family RLX_Victoria_JROL01006440-1, since most of its members were identified by LTR_FINDER and/or LTR_STRUC and thus considerable structural information about them is available. Most members of the family ranged in size from 1898 to 3250 bp and carried LTRs of 120 to 1051 bp, whereas one member was 8698 bp in length. Closer inspection revealed that this member was not a single retrotransposon but four Victoria LTR retrotransposons in tandem. Complete elements were flanked by 4- to 6-bp target site duplications. BLASTX alignment of the family members with retrotransposon proteins in the GyDB and NCBI databases revealed only short fragments of some proteins, such as AP and INT. Since no complete ORF could be identified, the internal domains of the elements appear to lack coding capability. Within the internal region, the primer binding site was complementary to the methionine tRNA in 50% of the sequences, corresponded to other tRNAs in 32%, and could not be identified in the remaining 18%. A 15-nt polypurine tract was identified upstream of the 3′ LTR. As none of the identified members of the family possesses the complete ORFs necessary for an autonomous TE, and taking into account the size of the members, the family was classified as TRIM non-autonomous LTR retrotransposons. The six novel retrotransposon families will be included in the next update of the Plant Genome and Systems Biology Repeat Element Database (PGSB-REdat).
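The structural features used above (4–6 bp target site duplications and a 15-nt polypurine tract upstream of the 3′ LTR) can be checked with simple string operations once LTR boundaries are known, for example from LTR_FINDER or LTR_STRUC output. A minimal sketch; the function names, the 100-nt search window and the toy sequences are assumptions, and real elements would also need the PBS check and manual review.

```python
import re
from typing import Optional


def find_tsd(left_flank: str, right_flank: str, sizes=(6, 5, 4)) -> Optional[str]:
    """Longest target site duplication: the last k nt before the 5' LTR equal
    to the first k nt after the 3' LTR, trying k = 6, 5, 4."""
    left, right = left_flank.upper(), right_flank.upper()
    for k in sizes:
        if len(left) >= k and len(right) >= k and left[-k:] == right[:k]:
            return left[-k:]
    return None


def find_ppt(internal_region: str, min_len: int = 15, window: int = 100) -> Optional[str]:
    """Longest purine (A/G) run of >= min_len nt within the last `window` nt of
    the internal region, i.e. immediately upstream of the 3' LTR."""
    tail = internal_region[-window:].upper()
    runs = re.findall(r"[AG]{%d,}" % min_len, tail)
    return max(runs, key=len) if runs else None


if __name__ == "__main__":
    print(find_tsd("CCCCATGCA", "ATGCATTTT"))
    print(find_ppt("C" * 50 + "AAGAGGAAGGAGAAGAG" + "CT"))
```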
The present work constitutes the first insight into the repetitive sequences of wheat homeologous group 4 chromosomes analyzed at the chromosome arm level. The detailed study of repetitive elements is more relevant than previously thought, since repetitive elements seem to play important roles in genome structure and size variation and also contribute to the evolution of genes and their function. In accordance with results obtained for other grasses, CACTA/En-Spm and Gypsy were the most abundant DNA transposon and retrotransposon superfamilies, respectively, which may suggest conserved roles in genome regulation. The characterization of the tandem repeat content along homeologous group 4 allowed us to compile a list of SSR motifs in these chromosomes. Six novel LTR retrotransposon families were characterized, including three Copia, one Gypsy, and two TRIM LTR retrotransposon families. Despite the extensive research performed on Triticeae genomes and the high number of reported elements, the identification of six new families indicates that further families probably remain to be described. However, a reference sequence is crucial for a more detailed quantitative study of repeatome content and structure.
Sequences from chromosome 4D
Sequences from Triticum aestivum cv. Chinese Spring ditelosomic (DT) lines for the 4D chromosome arms were obtained through Roche 454 sequencing technology and assembled into 8141 and 7077 scaffolds for 4DS and 4DL, respectively, hereafter named 4DS454 and 4DL454, as described previously. Additionally, the sequences of the wheat homeologous group 4 chromosome arms obtained through Illumina sequencing technology were downloaded from the IWGSC website and are referred to as 4ASI, 4ALI, 4BSI, 4BLI, 4DSI and 4DLI. When additional comparisons were needed, sequences of chromosome arms outside homeologous group 4 were also downloaded from the IWGSC website.
Identification of repetitive elements
Repetitive sequences were identified using RepeatMasker (RM). The program input consisted of FASTA-formatted files, and the output consisted of a detailed annotation of the repeats present in each query sequence. Sequence comparisons were performed with the cross_match alignment software (version open-3.3.0).
From the cross_match output list, the name and class of each matching interspersed repeat were used to classify and count the elements belonging to small RNAs, satellites, simple repeats and low complexity regions, using a custom Perl script.
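The counting step was done with a custom Perl script that is not published here. As an illustration only, the same idea in Python, assuming the standard whitespace-separated RepeatMasker .out layout (score, divergence, deletions, insertions, query, begin, end, left, strand, repeat name, class/family, ...). Class labels depend on the repeat library, so mapping them onto the groups used here (small RNA, satellites, simple repeats, low complexity) is left to the caller; the example rows are made up.

```python
from collections import Counter


def count_repeat_classes(lines) -> Counter:
    """Count RepeatMasker .out annotations per repeat class.

    Annotation lines start with the Smith-Waterman score; the 11th column
    holds "class/family" (e.g. "Satellite/centr"), of which only the class
    part is kept. Header and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        counts[fields[10].split("/")[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical .out excerpt; repeat names and coordinates are made up.
    example = """\
   SW   perc perc perc  query      position in query    matching  repeat          position in repeat
score   div. del. ins.  sequence   begin  end  (left)   repeat    class/family    begin  end (left)  ID

  463   1.3  0.0  0.0  scaffold_1   120   160  (9840) + (GA)n     Simple_repeat       1   41   (0)   1
  250  10.5  2.1  0.0  scaffold_1   900   980  (9020) + A-rich    Low_complexity      1   82   (0)   2
  310  12.0  0.0  1.0  scaffold_2    50   140  (5000) C sat_x     Satellite/centr   (0)   90     1   3
"""
    print(count_repeat_classes(example.splitlines()))
```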
TE interspersed repeat family signatures were identified using the Mips-REdat_v9.0p database hosted by MIPS at PlantsDB, which contains ~42,000 sequences with a total length of ~350 Mb. Sequences with ≥95% identity over ≥95% of their length were considered redundant, and only the longest element from each cluster was used for further analysis.
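The ≥95%/≥95% redundancy filter can be sketched as clustering followed by keeping the longest member of each cluster. This assumes pairwise identities and alignment lengths from an all-versus-all comparison of the library, measures coverage against the shorter sequence, and uses single linkage; the clustering tool and linkage actually used are not stated, so treat this as one possible reading with hypothetical names.

```python
def remove_redundant(lengths, hits, min_identity=95.0, min_coverage=0.95):
    """Keep the longest sequence of each redundancy cluster.

    lengths: {sequence_id: length in bp}
    hits: iterable of (id_a, id_b, percent_identity, aligned_length), assumed
          to come from an all-versus-all similarity search of the library.
    Two sequences are linked when identity >= min_identity over
    >= min_coverage of the shorter one; linked sequences form clusters
    (single linkage, via union-find).
    """
    parent = {seq_id: seq_id for seq_id in lengths}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, identity, aligned in hits:
        shorter = min(lengths[a], lengths[b])
        if identity >= min_identity and aligned >= min_coverage * shorter:
            parent[find(a)] = find(b)

    clusters = {}
    for seq_id in lengths:
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return {max(members, key=lengths.get) for members in clusters.values()}


if __name__ == "__main__":
    lengths = {"TE_a": 5000, "TE_b": 4900, "TE_c": 3000}
    hits = [("TE_a", "TE_b", 97.0, 4850), ("TE_a", "TE_c", 90.0, 2900)]
    print(sorted(remove_redundant(lengths, hits)))
```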
Repetitive elements were classified according to the hierarchy suggested by the IWGSC: class, subclass and superfamily. DNA transposons were divided into subclasses based on whether or not they contained terminal inverted repeats (TIRs). Retrotransposons were classified as LTR or non-LTR retrotransposons on the basis of the presence or absence of LTRs.
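The candidate names reported here (RLC_, RLG_, RLX_) follow the three-letter code of the unified classification of Wicker et al. (2007): class, order and superfamily. A sketch that decodes such names into that hierarchy; the lookup tables are deliberately partial and cover only a few superfamilies, and the function name is hypothetical.

```python
CLASSES = {"R": "Class I (retrotransposon)", "D": "Class II (DNA transposon)"}
ORDERS = {
    "R": {"L": "LTR", "I": "LINE", "S": "SINE"},
    "D": {"T": "TIR", "H": "Helitron"},
}
# Partial superfamily table; "X" marks an unassigned superfamily.
SUPERFAMILIES = {
    "RL": {"C": "Copia", "G": "Gypsy", "X": "unknown"},
    "DT": {"C": "CACTA", "M": "Mutator", "A": "hAT", "T": "Tc1-Mariner",
           "H": "PIF-Harbinger", "X": "unknown"},
}


def decode_te_code(name: str):
    """Split a name such as 'RLC_Carmen_...' into (class, order, superfamily)."""
    code = name.split("_", 1)[0].upper()
    if len(code) != 3 or code[0] not in CLASSES or code[1] not in ORDERS[code[0]]:
        raise ValueError(f"unrecognized three-letter code in {name!r}")
    superfamily = SUPERFAMILIES.get(code[:2], {}).get(code[2], "not in this table")
    return CLASSES[code[0]], ORDERS[code[0]][code[1]], superfamily


if __name__ == "__main__":
    for te in ["RLC_Carmen_JROL01007734-1", "RLG_Francisca_JROL01008273-1",
               "RLX_Victoria_JROL01006440-1"]:
        print(te, decode_te_code(te))
```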
Identification and annotation of novel LTR retrotransposons
The 4DS454 and 4DL454 scaffolds were scanned for LTR retrotransposons using LTR_FINDER and LTR_STRUC. The FASTA-formatted scaffolds from the chromosome arm database were used as input for both programs, and the output consisted of putative novel LTR retrotransposon sequences. LTR_FINDER was run with default parameters with the following exceptions: the minimum LTR size was set to 100 bp and the minimum distance between LTRs (internal domain) was set to 1000 bp. The Arabidopsis thaliana (639 tRNAs; release Feb 2004), Brachypodium distachyon JGI v1.08x (661 tRNAs), Oryza sativa (764 tRNAs), Sorghum bicolor version 1.0 (649 tRNAs) and Zea mays version 4a.53 (1168 tRNAs) databases deposited at the Genomic tRNA Database were used to predict the tRNA binding sites (primer binding sites) typical of LTR retrotransposon structure; tRNA gene prediction was performed using tRNAscan-SE. Additional de novo identification of LTR retrotransposons, based on a homology-independent search for structural features, was performed with LTR_STRUC. The candidate LTR retrotransposons were extracted from the scaffolds, manually inspected, and clustered using CD-HIT (ver. 4.5.7, Jan 3 2012). The candidates were then aligned by BLAST against MIPS-REdat and manually checked to determine whether they belonged to known families, using the criteria proposed by Wicker et al.: two elements belong to the same family if they are at least 80% identical in at least 80% of their coding regions and internal domains, or within their LTRs, or in both. The LTRs were aligned using ClustalX. Transposon-associated proteins were identified using BLASTX alignments against NCBI and GyDB. Annotation of LTR retrotransposons was performed according to . The copy number of the candidate LTR retrotransposons was estimated from alignments with the survey sequences of all T. aestivum chromosome arms deposited at URGI.
Alignments showing at least 80% identity and at least 80% coverage after manual inspection were considered positive hits. Additional copies of the novel LTR retrotransposons were searched for in the T. aestivum (WGS project accession CALP000000000) and Ae. tauschii (WGS project accession AOCO000000000) whole genome shotgun sequence databases deposited at NCBI.
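The 80% identity / 80% coverage filter can be sketched over BLAST tabular output. This assumes the default 12-column `-outfmt 6` layout of BLAST+ and known query lengths; coverage is taken per HSP from the query coordinates, HSPs are not merged, and the manual inspection step is not modeled, so counts from this sketch are at best a first pass. The query name, length and hit rows are hypothetical.

```python
def count_positive_hits(blast_tabular_lines, query_lengths,
                        min_identity=80.0, min_coverage=0.80):
    """Count hits per query in BLAST tabular output (-outfmt 6, standard
    12 columns) that reach min_identity (%) over min_coverage of the query.

    Coverage is computed per HSP from qstart/qend; HSPs of the same subject
    are not merged, and the manual inspection step is not modeled.
    """
    counts = {query: 0 for query in query_lengths}
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, identity = fields[0], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        coverage = (abs(qend - qstart) + 1) / query_lengths[query]
        if identity >= min_identity and coverage >= min_coverage:
            counts[query] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical query length and hits, for illustration only.
    lengths = {"candidate_1": 1000}
    rows = [
        "candidate_1\tctg_1\t92.5\t950\t60\t4\t1\t950\t100\t1049\t0.0\t1500",
        "candidate_1\tctg_2\t85.0\t500\t70\t2\t1\t500\t10\t509\t1e-150\t700",
        "candidate_1\tctg_3\t75.0\t900\t200\t9\t1\t900\t5\t904\t1e-120\t600",
    ]
    print(count_positive_hits(rows, lengths))
```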
Estimation of insertion time
The insertion time of each retrotransposon was estimated using the formula T = K/(2r), where T is the time of divergence, K is the average number of substitutions per aligned site and r is the average synonymous substitution rate. To estimate the divergence time of LTR retrotransposons, r was set to 1.36 × 10⁻⁸ substitutions per site per year. The 5′ and 3′ LTRs of each candidate were aligned using ClustalW to obtain K.
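As a worked example of T = K/(2r) with the rate given above, an LTR pair differing by K = 0.01 substitutions per site gives T = 0.01/(2 × 1.36 × 10⁻⁸) ≈ 3.7 × 10⁵ years, i.e. about 0.37 × 10⁶ years. The text does not state which distance estimator was used for K, so the sketch below offers both the raw p-distance and the Kimura two-parameter correction (the latter is an assumption, chosen because it is widely used for LTR dating); gapped columns are skipped and function names are hypothetical.

```python
import math

PURINES = frozenset("AG")


def ltr_divergence(ltr5: str, ltr3: str, kimura: bool = True) -> float:
    """Substitutions per site (K) between the aligned 5' and 3' LTRs.

    Columns with gaps or non-ACGT characters are skipped. With kimura=True
    the Kimura two-parameter correction is applied; with kimura=False the
    raw proportion of differing sites (p-distance) is returned.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be taken from the same alignment")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a != b:
            if (a in PURINES) == (b in PURINES):
                transitions += 1  # A<->G or C<->T
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if not kimura:
        return p + q
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("divergence too high for the Kimura correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_time(k: float, r: float = 1.36e-8) -> float:
    """T = K / (2r): years since insertion, r in substitutions/site/year."""
    return k / (2 * r)


if __name__ == "__main__":
    five, three = "ACGTACGTAC", "ACGTACGTAT"  # one C->T transition in 10 sites
    k = ltr_divergence(five, three)
    print(round(k, 4), round(insertion_time(k)))
```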
Phylogenetic tree construction
Phylogenetic analyses were conducted in MEGA4. Aligned sequences were used to generate trees using the Maximum Parsimony (MP) method. The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the LTRs analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated sequences clustered together in the bootstrap test (500 replicates) is shown next to the branches. The MP tree was obtained using the Close-Neighbor-Interchange algorithm with search level 3 [65,66], in which the initial trees were obtained by random addition of sequences (10 replicates). All positions containing gaps or missing data were eliminated from the dataset (complete deletion option).
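Two steps of this analysis are easy to show in isolation: the complete deletion option and the generation of bootstrap pseudo-replicates by resampling alignment columns with replacement. This is a sketch only; the Maximum Parsimony search with Close-Neighbor-Interchange and the 50% consensus were run in MEGA4 and are not reimplemented, and treating 'N' as missing data is an assumption. Sequence names are hypothetical.

```python
import random


def complete_deletion(alignment: dict) -> dict:
    """Remove every column containing a gap or missing data in any sequence.
    Here '-' and '?' are treated as gaps/missing data, and 'N' as missing
    (the latter is an assumption)."""
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in "-?Nn" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def bootstrap_replicates(alignment: dict, replicates: int = 500, seed: int = 1):
    """Yield pseudo-alignments made by resampling columns with replacement;
    each one would then be analyzed with the same tree search (MP here)."""
    rng = random.Random(seed)
    names = list(alignment)
    n_columns = len(alignment[names[0]])
    for _ in range(replicates):
        columns = [rng.randrange(n_columns) for _ in range(n_columns)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


if __name__ == "__main__":
    aln = {"ltr_1": "AC-GT", "ltr_2": "ACTGT", "ltr_3": "A?TGT"}
    cleaned = complete_deletion(aln)
    print(cleaned)
    print(sum(1 for _ in bootstrap_replicates(cleaned, replicates=500)))
```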
In situ localization of newly identified and highly abundant repetitive elements
Here, we applied fluorescent in situ hybridization labeling in suspension (FISHIS), which uses an additional chromosome-specific fluorescent marker that binds quantitatively to chromosomes. FISHIS uses microsatellite probes, but only the GAA SSR proved applicable for reliable chromosome sorting, for 12–13 of the 21 bread wheat chromosomes.
The probe for the Afa repeat was labeled with digoxigenin (Roche, Mannheim, Germany), and probes for the selected microsatellites (4DL_SSR1, 4DL_SSR2, 4BL_SSR1 and 4D_SSR) were labeled with Texas Red (Invitrogen, Camarillo, CA, USA) according to . A 260-bp fragment of the Afa family repeat was amplified by PCR with primers AS-A and AS-B on wheat genomic DNA. The Texas Red-labeled SSR probes were prepared according to  using the PCR primers 4DL_SSR1, (CAGCG)6/(CGCTG)3; 4DL_SSR2, (CCGTA)6/(TACGG)4; 4BL_SSR1, (CGTAG)6/(CTACG)4; and 4D_SSR, (TTACG)6/(CGTAA)4. PCR amplification was carried out in a C-1000 Touch™ thermal cycler (Bio-Rad, USA) in a volume of 15 μl containing 1 μmol/l of each primer, 200 μmol/l of each dNTP (with the dATP supplied as a 1:2 mixture of Texas Red-labeled dUTP and dATP), 1.5 mmol/l MgCl2 and 0.5 U OneTaq DNA Polymerase (New England Biolabs, USA) in the supplier-recommended buffer. Amplification consisted of 40 cycles of 30 s at 95°C, 30 s at 60°C and 30 s at 72°C.
Probes for the newly identified transposable elements were labeled directly with Texas Red (Invitrogen, Camarillo, CA, USA) by nick translation of PCR products amplified with primers designed for the insertion site and internal regions of the transposons. For each transposon, two primer pairs were designed (Additional file 7: Table S6). One primer of each pair was designed directly on the insertion site, overlapping the host sequence, the TSD and the LTR sequence.
The amplicons were designed to be 0.5–4 kb long, using Primer3 software. Amplification was carried out in a C-1000 Touch™ thermal cycler (Bio-Rad, USA) in a volume of 15 μl containing 15 ng of Chinese Spring genomic DNA, 1 μmol/l of each primer, 200 μmol/l of each dNTP, 1.5 mmol/l MgCl2 and 0.5 U OneTaq DNA Polymerase (New England Biolabs, USA) in the supplier-recommended buffer. The PCR products were separated on a 1% agarose gel. When multiple PCR products were obtained, the band of the expected size was excised from the gel, extracted and used for labeling as described above. The identities of the PCR fragments were verified by Sanger sequencing from both primers.
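The reaction recipes above specify final concentrations; converting them into pipetting volumes is C1V1 = C2V2 arithmetic. In this sketch the final concentrations come from the text, but every stock concentration is a hypothetical example, not the authors' stock; with these stocks the 15 μl reaction needs 0.9 μl of 25 mmol/l MgCl2 and 1.5 μl of each 10 μmol/l primer.

```python
def pipetting_volumes(final_volume_ul: float, components: dict) -> dict:
    """Stock volumes (μl) from C1*V1 = C2*V2, with water making up the rest.

    components: {name: (stock_concentration, final_concentration)}, both in
    the same unit for a given component.
    """
    volumes = {name: final * final_volume_ul / stock
               for name, (stock, final) in components.items()}
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    # Final concentrations from the 15 μl reaction in the text; every stock
    # concentration below is a hypothetical example.
    mix = pipetting_volumes(15.0, {
        "buffer (x)": (5.0, 1.0),
        "forward primer (μmol/l)": (10.0, 1.0),
        "reverse primer (μmol/l)": (10.0, 1.0),
        "dNTP mix, each (μmol/l)": (10_000.0, 200.0),
        "MgCl2 (mmol/l)": (25.0, 1.5),
        "genomic DNA (ng/μl)": (10.0, 15 / 15.0),
        "OneTaq (U/μl)": (5.0, 0.5 / 15.0),
    })
    for name, vol in mix.items():
        print(f"{name:26s} {vol:5.2f} μl")
```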
Chromosome localization of the probes was performed using FISH on wheat metaphase chromosomes (cv. Chinese Spring). Chromosomes were isolated from the meristematic tissue of the root tips treated with ice water for two days and slides were prepared according to . The quality of chromosome spreads was checked under the microscope and the best slides were used for FISH. Post-fixation was performed according to .
A hybridization mixture consisting of 40% formamide, 250 ng of calf thymus DNA, 2× SSC, 15 ng of Afa probe, 60 ng of transposable element probe and 50% dextran sulphate to a final volume of 25 μl was applied onto the slides. The slides were denatured at 80°C for 2.5 min and incubated in a humid chamber at 37°C overnight. After hybridization, the slides were stringently washed as described in . The signals of Texas Red-labeled probes were observed directly. Digoxigenin-labeled probes were detected using anti-digoxigenin-FITC (Roche, Mannheim, Germany) at the concentration recommended by the manufacturer. Chromosomal DNA was counterstained with 4′,6-diamidino-2-phenylindole (DAPI) in Vectashield (Vector Laboratories, USA).
The preparations were evaluated using an Axio Imager Z.2 microscope (Zeiss, Oberkochen, Germany) equipped with a Cool Cube 1 camera (Metasystems, Altlussheim, Germany) and appropriate filter sets. Fluorescence signals were captured and the layers merged with ISIS software (Metasystems, Germany), and final image adjustment was done in Adobe Photoshop 6.0.
Availability of supporting data
The data sets supporting the results of this article are included within the article as Additional file 8 (File 1 and Table S7), which comprises a FASTA-formatted file with the nucleotide sequences of the members of the six novel retrotransposon families and a table with the main features of each individual sequence. Furthermore, the six novel retrotransposon families will be included in the next update of the Plant Genome and Systems Biology Repeat Element Database (PGSB-REdat).
We are grateful to the IWGSC for access to the chromosome assemblies. This work was supported by the following institutions: CONICET (International Cooperation Grant Res. 456/ 10/2/2011), ANPCyT (Préstamo BID 2012, PICT 0660), INTA (Res. 418/2012, PNCYO1127041) and Universidad Nacional del Sur (PGI–TIR, CSU-142/14).
- Petersen G, Seberg O, Yde M, Berthelsen K. Phylogenetic relationships of Triticum and Aegilops and evidence for the origin of the A, B, and D genomes of common wheat (Triticum aestivum). Mol Phylogenet Evol. 2006;39:70–82.View ArticlePubMedGoogle Scholar
- Bennetzen JL, Ma J, Devos KM. Mechanisms of recent genome size variation in flowering plants. Ann Bot (Lond). 2005;95:127–32.View ArticleGoogle Scholar
- The International Wheat Genome Sequencing Consortium. A chromosome based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 2014;345:1251788.View ArticleGoogle Scholar
- Kidwell MG. Transposable elements and the evolution of genome size in eukaryotes. Genetica. 2002;115:49–63.View ArticlePubMedGoogle Scholar
- Rebollo R, Romanish MT, Mager DL. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu Rev Genet. 2012;46:21–42.View ArticlePubMedGoogle Scholar
- Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8:973–82.View ArticlePubMedGoogle Scholar
- Lerat E. Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity. 2010;104:520–33.View ArticlePubMedGoogle Scholar
- Ugarković D, Plohl M. Variation in satellite DNA profiles causes and effects. EMBO J. 2002;21:5955–9.View ArticlePubMedGoogle Scholar
- Plohl M, Luchetti A, Mestrović N, Mantovani B. Satellite DNAs between selfishness and functionality: structure, genomics and evolution of tandem repeats in centromeric (hetero)chromatin. Gene. 2008;409:72–82.View ArticlePubMedGoogle Scholar
- Ananiev EV, Phillips RL, Rines HW. Chromosome-specific molecular organization of maize (Zea mays L.) centromeric regions. Proc Natl Acad Sci U S A. 1998;95:13073–8.View ArticlePubMed CentralPubMedGoogle Scholar
- Feuillet C, Salse J. Comparative genomics in the Triticeae. In: Feuillet C, Muehlbauer GJ, editors. Plant Genetics and Genomics. New York: Springer; 2009. p. 451–77.Google Scholar
- Beckmann JS, Soller M. Toward a unified approach to genetic-mapping of eukaryotes based on sequence tagged microsatellite sites. BIO-TECHNOLOGY. 1990;8:930–2.View ArticlePubMedGoogle Scholar
- Gupta PK, Varshney RK. The development and use of microsatellite markers for genetic analysis and plant breeding with emphasis on bread wheat. Euphytica. 2000;113:163–85.
- Waugh R, McLean K, Flavell AJ, Pearce SR, Kumar A, Thomas BB, et al. Genetic distribution of Bare-1-like retrotransposable elements in the barley genome revealed by sequence-specific amplification polymorphisms (S-SAP). Mol Gen Genet. 1997;253:687–94.
- Flavell AJ, Knox MR, Pearce SR, Ellis TH. Retrotransposon-based insertion polymorphisms (RBIP) for high throughput marker analysis. Plant J. 1998;16:643–50.
- Kalendar R, Schulman AH. IRAP and REMAP for retrotransposon-based genotyping and fingerprinting. Nat Protoc. 2006;1:2478–84.
- Luce AC, Sharma A, Mollere OS, Wolfgruber TK, Nagaki K, Jiang J, et al. Precise centromere mapping using a combination of repeat junction markers and chromatin immunoprecipitation-polymerase chain reaction. Genetics. 2006;174:1057–61.
- Devos KM, Ma J, Pontaroli AC, Pratt LH, Bennetzen JL. Analysis and mapping of randomly chosen bacterial artificial chromosome clones from hexaploid bread wheat. Proc Natl Acad Sci U S A. 2005;102:19243–8.
- Paux E, Roger D, Badaeva E, Gay G, Bernard M, Sourdille P, et al. Characterizing the composition and evolution of homeologous genomes in hexaploid wheat through BAC-end sequencing on chromosome 3B. Plant J. 2006;48:463–74.
- Wanjugi H, Coleman-Derr D, Huo NX, Kianian SF, Luo MC, Wu JJ, et al. Rapid development of PCR-based genome-specific repetitive DNA junction markers in wheat. Genome. 2009;52:576–87.
- International Wheat Genome Sequencing Consortium. www.wheatgenome.org.
- Doležel J, Šimková H, Kubaláková M, Šafář J, Suchánková P, Číhalíková J, et al. Chromosome genomics in the Triticeae. In: Feuillet C, Muehlbauer GJ, editors. Plant Genetics and Genomics. New York: Springer; 2009. p. 285–316.
- Doležel J, Kubaláková M, Paux E, Bartoš J, Feuillet C. Chromosome-based genomics in the cereals. Chromosome Res. 2007;15:51–66.
- Helguera M, Rivarola M, Clavijo B, Martis M, Vanzetti L, González S, et al. Sequence of chromosome 4D of bread wheat reveals its structure and virtual gene order. Plant Sci. 2015;233:200–12.
- Ling HQ, Zhao S, Liu D, Wang J, Sun H, Zhang C, et al. Draft genome of the wheat A-genome progenitor Triticum urartu. Nature. 2013;496:87–90.
- Jia J, Zhao S, Kong X, Li Y, Zhao G, He W, et al. Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature. 2013;496:91–5.
- Vitulo N, Albiero A, Forcato C, Campagna D, Dal Pero F, Bagnaresi P, et al. First survey of the wheat chromosome 5A composition through a next generation sequencing approach. PLoS One. 2011;6(10):e26421.
- Sehgal SK, Li W, Rabinowicz PD, Chan A, Šimková H, Doležel J, et al. Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat. BMC Plant Biol. 2012;12:64.
- Hernandez P, Martis M, Dorado G, Pfeifer M, Gálvez S, Schaaf S, et al. Next generation sequencing and syntenic integration of flow sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J. 2012;69:377–86.
- Tanaka T, Kobayashi F, Joshi GP, Onuki R, Sakai H, Kanamori H, et al. Next-Generation Survey Sequencing and the Molecular Organization of Wheat Chromosome 6B. DNA Res. 2014;21:103–14.
- Sergeeva EM, Afonnikov DA, Koltunova MK, Gusev VD, Miroshnichenko LA, Vrána J, et al. Common wheat chromosome 5B composition analysis using low-coverage 454 Sequencing. The Plant Genome. 2014;7:1–16.
- Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11:31–46.
- Kurtz S, Narechania A, Stein JC, Ware D. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics. 2008;9:517.
- Mayer KFX, Taudien S, Martis M, Šimková H, Suchánková P, Gundlach H, et al. Gene content and virtual gene order of barley chromosome 1H. Plant Physiol. 2009;151:496–505.
- Oyola SO, Otto TD, Gu Y, Maslen G, Manske M, Campino S, et al. Optimizing Illumina next-generation sequencing library preparation for extremely AT biased genomes. BMC Genomics. 2012;13:1.
- Kubaláková M, Kovářová P, Suchánková P, Číhalíková J, Bartoš J, Lucretti S, et al. Chromosome sorting in tetraploid wheat and its potential for genome analysis. Genetics. 2005;170:823–9.
- Morgante M, Hanafey M, Powell W. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002;30:194–200.
- Huo N, Lazo GR, Vogel JP, You FM, Ma Y, Hayden DM, et al. The nuclear genome of Brachypodium distachyon: analysis of BAC end sequences. Funct Integr Genomics. 2008;8:135–47.
- Röder MS, Korzun V, Wendehake K, Plaschke J, Tixier MH, Leroy P, et al. A microsatellite map of wheat. Genetics. 1998;149:2007–23.
- Pestsova E, Ganal MW, Röder MS. Isolation and mapping of microsatellite markers specific for the D genome of bread wheat. Genome. 2000;43:689–97.
- Vrána J, Kubaláková M, Šimková H, Číhalíková J, Lysák MA, Doležel J. Flow-sorting of mitotic chromosomes in common wheat (Triticum aestivum L.). Genetics. 2000;156:2033–41.
- Giorgi D, Farina A, Grosso V, Gennaro A, Ceoloni C, Lucretti S. FISHIS: fluorescence in situ hybridization in suspension and chromosome flow sorting made easy. PLoS One. 2013;8:e57994.
- Cuadrado A, Cardoso M, Jouve N. Physical organisation of simple sequence repeats (SSRs) in Triticeae: structural, functional and evolutionary implications. Cytogenet Genome Res. 2008;120:210–9.
- Witte CP, Le QH, Bureau T, Kumar A. Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring plant genomes. Proc Natl Acad Sci U S A. 2001;98:13778–83.
- Kalendar R, Vicient CM, Peleg O, Anamthawat-Jonsson K, Bolshoy A, Schulman A. Large retrotransposon derivatives: abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics. 2004;166:1437–50.
- Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Web Server issue):W265–8.
- McCarthy EM, McDonald JF. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics. 2003;19:362–7.
- CD-HIT Suite: Biological Sequence Clustering and Comparison. http://weizhong-lab.ucsd.edu/cdhit_suite/cgi-bin/index.cgi?cmd=h-cd-hit-est.
- Gypsy Database. www.gydb.org.
- Brenchley R, Spannagl M, Pfeifer M, Barker GL, D'Amore R, Allen AM, et al. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature. 2012;491:705–10.
- Marcussen T, Sandve SR, Heier L, Spannagl M, Pfeifer M, International Wheat Genome Sequencing Consortium, et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science. 2014;345:1250092.
- Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat Rev Genet. 2009;10:691–703.
- Sharma A, Presting GG. Centromeric retrotransposon lineages predate the maize/rice divergence and differ in abundance and activity. Mol Genet Genomics. 2008;279:133–47.
- Neumann P, Navrátilová A, Koblížková A, Kejnovský E, Hřibová E, Hobza R, et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mob DNA. 2011;2:4.
- RepeatMasker. www.repeatmasker.org.
- Bedell JA, Korf I, Gish W. MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics. 2000;16:1040–1.
- MIPS Plant DB. ftp://ftpmips.helmholtz-muenchen.de/plants/REdat/.
- Guidelines for Annotating Wheat Genomic Sequences. http://wheat.pw.usda.gov/ITMI/Repeats/gene_annotation.pdf.
- Genomic tRNA Database. http://lowelab.ucsc.edu/GtRNAdb/.
- Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–64.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. ClustalW and ClustalX version 2. Bioinformatics. 2007;23:2947–8.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
- SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43–5.
- Ma J, Bennetzen JL. Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci U S A. 2004;101:12404–10.
- Eck RV, Dayhoff MO. Atlas of Protein Sequence and Structure. Silver Spring, MD: National Biomedical Research Foundation; 1966.
- Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39:783–91.
- Nagaki K, Tsujimoto H, Isono K, Sasakuma T. Molecular characterization of a tandem repeat, Afa family, and its distribution among Triticeae. Genome. 1995;38:479–86.
- Vrána J, Šimková H, Kubaláková M, Číhalíková J, Doležel J. Flow cytometric chromosome sorting in plants: The next generation. Methods. 2012;57:331–7.
- Kato A, Albert PS, Vega JM, Birchler JA. Sensitive fluorescence in situ hybridization signal detection in maize using directly labelled probes produced by high concentration DNA polymerase nick translation. Biotech Histochem. 2006;81:71–8.
- Primer3 software. http://bioinfo.ut.ee/primer3-0.4.0/.
- Masoudi-Nejad A, Nasuda S, McIntosh RA, Endo TR. Transfer of rye chromosome segments to wheat by a gametocidal system. Chromosome Res. 2002;10:349–57.
- Ma L, Xiao Y, Huang H, Wang QW, Rao WN, Feng Y, et al. Direct determination of molecular haplotypes by chromosome microdissection. Nat Methods. 2010;7:299–301.
- Kubaláková M, Valárik M, Bartoš J, Vrána J, Číhalíková J, Molnár-Láng M, et al. Analysis and sorting of rye (Secale cereale L.) chromosomes using flow cytometry. Genome. 2003;46:893–905.
- Šafář J, Šimková H, Kubaláková M, Číhalíková J, Suchánková P, Bartoš J, et al. Development of chromosome-specific BAC resources for genomics of bread wheat. Cytogenet Genome Res. 2010;129:211–23.
- Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007;24:1596–9.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.