New insights into the origin of the B genome of hexaploid wheat: Evolutionary relationships at the SPA genomic region with the S genome of the diploid relative Aegilops speltoides
- Jérome Salse1, 2,
- Véronique Chagué1,
- Stéphanie Bolot2,
- Ghislaine Magdelenat3,
- Cécile Huneau1,
- Caroline Pont2,
- Harry Belcram1,
- Arnaud Couloux3,
- Soazic Gardais1,
- Aurélie Evrard1,
- Béatrice Segurens3,
- Mathieu Charles1,
- Catherine Ravel2,
- Sylvie Samain3,
- Gilles Charmet2,
- Nathalie Boudet1 and
- Boulos Chalhoub1
© Salse et al; licensee BioMed Central Ltd. 2008
Received: 16 June 2008
Accepted: 25 November 2008
Published: 25 November 2008
Several studies suggested that the diploid ancestor of the B genome of tetraploid and hexaploid wheat species belongs to the Sitopsis section, having Aegilops speltoides (SS, 2n = 14) as the closest identified relative. However, molecular relationships based on genomic sequence comparison, including both coding and non-coding DNA, have never been investigated. In an attempt to clarify these relationships, we compared, in this study, sequences of the Storage Protein Activator (SPA) locus region of the S genome of Ae. speltoides (2n = 14) to those of the A, B and D genomes co-resident in the hexaploid wheat species (Triticum aestivum, AABBDD, 2n = 42).
Four BAC clones, spanning the SPA locus of the A, B, D and S genomes respectively, were isolated and sequenced. Orthologous genomic regions were identified as delimited by shared non-transposable-element and non-coding sequences surrounding the SPA gene, and correspond to 35 268, 22 739, 43 397 and 53 919 bp for the A, B, D and S genomes, respectively. Sequence length discrepancies within and outside the SPA orthologous regions are the result of non-shared transposable element (TE) insertions, all of which occurred after the divergence of the progenitors of the four genomes.
On the basis of conserved sequence length as well as identity of the shared non-TE regions and the SPA coding sequence, Ae. speltoides appears to be more closely related evolutionarily to the B genome of T. aestivum than to the A and D genomes. However, the differential insertions of TEs, none of which are conserved between the two genomes, led to the conclusion that the S genome of Ae. speltoides diverged very early from the progenitor of the B genome, which remains to be identified.
All cereal crop species are members of the grass (Poaceae) family, which is the fourth largest family of flowering plants. With about 10 000 species growing under nearly all climates and latitudes, grasses exceed all other plant families in ecological dominance and economic importance. In terms of genome organisation they represent a very diverse family, with basic chromosome numbers ranging from 4 to 50 and genome sizes ranging from 350 Mb to 17 Gb. Fossil data and phylogenetic studies have estimated that the grasses diverged from a common ancestor 50 to 70 million years ago (MYA) [2, 3]. Archaeological records suggest that farming started concomitantly in at least three widely separated regions between 10 000 and 5 000 years ago, during the late Neolithic period. The three most important cereals were independently domesticated in three centres: wheat in south western Asia in the 'Fertile Crescent' region, maize in Mexico and rice in both south east Asia and west Africa [4–6].
Within the Poaceae, the genera Aegilops and Triticum include several diploid species (2n = 14) that, via allopolyploidization, produced several tetraploid and hexaploid wheat species, most of which have been domesticated [7–9]. T. turgidum (2n = 28, AABB) was derived from a hybridization event that occurred < 0.5 MYA between T. urartu (2n = 14, AA), the diploid donor of the A genome (hereafter gA), and another unknown species of the Sitopsis section, donor of the B genome (hereafter gB), for which the closest known relative is Ae. speltoides [7, 9, 10]. The hexaploid wheat (T. aestivum, 2n = 42, AABBDD) originated from an additional polyploidization event between the early-domesticated tetraploid T. turgidum ssp dicoccum and the diploid donor of the D genome (hereafter gD), Ae. tauschii (2n = 14, DD), 7 000 to 12 000 years ago (for review ). Several wheat phylogeny studies have tried to identify the progenitor of the B genome of polyploid wheat based on cytology , nuclear and mitochondrial DNA sequences [13–15] as well as chromosome rearrangement studies (i.e. common translocation events) [16–24]. From those studies it remains controversial whether the progenitor of the B genome is a unique Aegilops species (i.e. monophyletic origin) or whether this genome resulted from an introgression of several parental Aegilops species (i.e. polyphyletic origin). More recent and representative molecular comparisons using germplasm collections have shown that the B genome could be related to several Ae. speltoides lines but not to other species of the Sitopsis section [25, 26].
Transposable elements (TEs) have been known since the 1970s to be well represented in the wheat genome (~80%) [27, 28]. Comparative studies have shown that, besides the general conservation of coding sequences, no TE insertions are conserved between the A, B and D genomes of wheat, whereas a substantial proportion of TE insertions are shared between the A or D genomes of polyploid wheat and their respective progenitors T. urartu and Ae. tauschii [29–33]. No such studies have yet been reported comparing the B genome of these polyploid wheat species to that of its closest known diploid relative, i.e. Ae. speltoides. In the present study, we compared for the first time coding and non-coding sequences as well as dynamics of TE insertions between the S genome of Ae. speltoides and the A, B and D genomes co-resident in hexaploid wheat (T. aestivum). The SPA (for Storage Protein Activator ) locus region, encoding a member of the bZIP (basic leucine zipper) family and located on chromosome 1BL , was chosen because of its importance as a trans-acting element of seed storage protein genes and its conservation in several other cereals such as maize (Opaque 2 [36–38]), rice (RISBZ1-5 ), and barley (BLZ1-2 [40, 41]). Updated phylogenetic relationships and insights into the origin of the B genome are discussed.
Organization of SPA locus region in the A, B, D and S genomes
Overall, the 113 460 bp A genome sequence is structured as 56 830 bp (50.1% of the sequence) of class I TEs, 3 934 bp (3.5% of the sequence) of class II elements and 4.9% of unclassified TEs. Fourteen class I TEs are identified as one incompletely sequenced (at the BAC sequence extremity), five truncated (with a 5' or 3' truncated region due to nested TE insertion), four relics (only visible through alignment remnants), one fragmented (interrupted by other TEs, i.e. nested insertion) and three complete elements. The class II TEs are represented by a complete CACTA element (CACTA_1_comp, cf Additional File 2) and three MITEs (Miniature Inverted-repeat Transposable Elements). Besides the TEs, a pseudo-tubulin gene separated by 55 614 bp from the SPA gene was also identified, both genes covering 4.7% of the sequence.
The 94 732 bp B genome sequence is structured as 38 126 bp (40.2% of the sequence) of class I TEs, 22 602 bp (23.9% of the sequence) of class II elements and 0.6% of unclassified elements. Twelve class I elements are identified as two incomplete, six truncated, two relics, one fragmented and one complete element. The class II TEs consist of two complete, one fragmented and one truncated CACTA (CACTA_1 to _4, cf Additional File 2) as well as three MITEs. The SPA gene is the only gene identified on the B genome sequence, representing 4.4% of the sequence.
The 120 879 bp D genome sequence is structured as 50 540 bp (41.8% of the sequence) of class I TEs, 9 446 bp (7.8% of the sequence) of class II elements. Twenty-two class I TEs are identified as two incomplete, eight truncated, eight relics, two fragmented and two complete elements. Class II TEs are represented as three truncated CACTA elements (CACTA_1 to 3, cf Additional File 2), one mutator relic and one MITE. Three genes have been annotated on the D genome sequence, the SPA gene, a putative kinesin and a putative cortical cell-delineating gene, covering 5.2% of a 48 440 bp interval.
The 80 493 bp S genome sequence is structured as 54 965 bp (68.3% of the sequence) of class I TEs, and a single MITE class II TE. Thirteen class I TEs are identified as one incomplete, six truncated, four fragmented and two complete TEs (cf Additional File 2). As in the B genome sequence, only the SPA gene, covering 4.3% of the annotated sequence, has been identified on the S genome sequence.
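The class I TE percentages quoted above follow directly from the annotated lengths. The short Python sketch below recomputes them from the base-pair counts given in this section. The dictionary layout and the function name are illustrative only and are not part of the original annotation pipeline.

```python
# Base-pair counts are taken from the text above; the data layout is illustrative.
BAC_LENGTH_BP = {"A": 113_460, "B": 94_732, "D": 120_879, "S": 80_493}
CLASS_I_TE_BP = {"A": 56_830, "B": 38_126, "D": 50_540, "S": 54_965}


def te_fraction_percent(te_bp: int, total_bp: int) -> float:
    """Return the share of a sequence occupied by TEs, in percent (1 decimal)."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return round(100.0 * te_bp / total_bp, 1)


# Class I TE share of each BAC sequence, keyed by genome.
class_i_percent = {
    genome: te_fraction_percent(CLASS_I_TE_BP[genome], BAC_LENGTH_BP[genome])
    for genome in BAC_LENGTH_BP
}

if __name__ == "__main__":
    for genome, pct in class_i_percent.items():
        print(f"{genome} genome: {pct}% class I TEs")
```

The recomputed values agree with those reported for the four genomes, with the S genome sequence carrying the largest class I TE share.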
Identification and characterization of conserved sequences
Alignment of the four genomic regions allows the identification of the 'SPA orthologous region', which we have defined as the shared common regions delimited by conserved non-coding sequence (CNS) stretches (5' and 3' locus boundaries) that do not correspond to TEs. The 'SPA orthologous region' spans 35 268 bp, 22 739 bp, 43 397 bp and 53 919 bp for the A, B, D and S genomes, respectively (cf grey boxes in Figure 1A, 1B).
No genes other than SPA can be predicted from these four conserved sequence stretches. As coding and non-coding sequences can evolve at different rates, we performed evolutionary analyses separately for the SPA CDS (CoDing Sequence) and the remaining conserved non-coding sequences (CNS).
Conserved non-coding sequences (CNS) analysis
Table 1. Conserved coding (SPA gene) and non-coding sequences (CNS) identified between SPA-gA, -gB, -gD and -gS at the 'SPA orthologous region'. Column headings: non-coding 'SPA orthologous loci' sequences; coding 'SPA orthologous loci' sequences; CNS size (bp); number of transitions; number of transversions.
We also estimated divergence times on the basis of the number of base substitutions (Ks) accumulated since the split from the ancestral genome. Ks values were obtained for the 6 pairwise alignment combinations (Table 1). The lowest and highest Ks values correspond respectively to gB/gS (0.617, i.e. identifying the most closely related sequences) and gB/gD (1.037, i.e. the most divergent sequences).
Conserved coding sequences analysis: SPA gene structure and evolution
SPA genes are structured as six exons (cf Additional File 2). In wheat, the SPA genes (and CDS) are respectively 3 426 (1 218) bp, 3 486 (230) bp, 3 796 (1 212) bp and 4 080 (1 233) bp long for the A, B, D and S genomes (hereafter designated SPAgA, -gB, -gD and -gS genes). These SPA genes are composed of six exons ranging in size from 76 bp (SPAgA, -gB, -gD, -gS exon 4) to 432 bp (SPAgA, -gB, -gS exon 1) and five introns ranging in size from 92 bp (SPAgA, -gB, -gD intron 4) to 1297 bp (SPAgD intron 5). All exon-intron junction sites obey the GT/AG rule, as identified in other eukaryotic genes. The relative organization of the exons and introns is the same as for the other SPA-like bZIP protein genes characterized to date in cereals, i.e. the number of exons and introns is conserved and individual introns occur at relatively the same sites in the maize O2 [36–43], sorghum O2 , and barley Blz1 genes . It is interesting to note that the first and fifth introns of the homoeologous SPA genes are respectively much shorter and longer than those of the other cereal SPA-like bZIP protein genes (cf Additional File 2).
As reported by Guillaumie et al. , a TGA stop codon (+19 bp from the ATG translation initiation site) had been identified in the SPA-gB sequence, suggesting that this gene might no longer be functional. No proof of expression could be provided for the SPA-gB haplotype carrying this stop codon, as we were unable to find any corresponding ESTs. In order to clarify the origin of the TGA stop codon in the B genome, the distribution of the stop codon allele was analyzed using 18 genotypes comprising 1 diploid S genome species (Ae. longissima), 11 tetraploid (3 T. turgidum durum, 3 T. turgidum dicoccoides, 2 T. turgidum dicoccum, 2 T. timopheevii, 1 T. turgidum turgidum) and 6 hexaploid (T. aestivum cv soisson, arminda, vilmorin, chinese spring, renan, recital) genotypes. Genotyping data demonstrate that the TGA allele is present in 50% of the hexaploid wheat genotypes (T. aestivum cv soisson, vilmorin, renan) and, for the first time, in one tetraploid genotype (T. turgidum durum) out of the 11 tested, and is absent in Ae. longissima (cf Additional file 3).
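The GT/AG check on exon-intron junctions mentioned above can be illustrated with a minimal Python sketch. The function names, the coordinate convention (0-based, half-open exon spans) and the toy sequence are assumptions made for this example. They are not the annotated SPA coordinates or the authors' tooling.

```python
def introns_between(gene: str, exon_spans: list[tuple[int, int]]) -> list[str]:
    """Return the intron sequences lying between consecutive exon spans.

    Spans are 0-based, half-open (start, end), sorted and non-overlapping.
    """
    return [
        gene[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exon_spans, exon_spans[1:])
    ]


def obeys_gt_ag(intron: str) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Toy two-exon gene (hypothetical sequence, not SPA): exon, intron, exon.
TOY_GENE = "ATGGCC" + "GTAAGTTTCAG" + "GGATAA"
TOY_EXONS = [(0, 6), (17, 23)]

if __name__ == "__main__":
    for intron in introns_between(TOY_GENE, TOY_EXONS):
        print(intron, obeys_gt_ag(intron))
```

With six exons per SPA gene, the same loop would yield five introns per gene, each of which can be tested in this way.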
Differential transposable elements insertions and evolution
Size discrepancies of the 'SPA orthologous regions' can be attributed to differential TE insertions or eliminations (cf Additional File 2 and Figures 1A and 2), which occurred after the divergence of the four genomes. Hence, the size increase observed for the 'SPA orthologous region' in Ae. speltoides (35 268 bp) when compared to T. aestivum-gB (22 739 bp) is due to 7 TEs, i.e. 2 truncated Angela solo-LTRs (soloLTR_Angela_1 and _3), one complete Angela (Angela_2), one truncated Rada (Rada_1), 2 fragmented LINEs (LINE_1 and _2) and one MITE (cf Figure 2 and Additional File 2). These TEs may correspond to insertions that occurred in the Ae. speltoides genome after its divergence from the ancestor of the B genome, as they are dispersed between CNS stretches and are not present in the B genome of T. aestivum. The occurrence of eight class I TEs displaying complete LTRs and TSDs (Target Site Duplications) in the four annotated genomes (highlighted with red stars in Figure 1A) allows their insertion dates to be estimated, based on nucleotide substitution pattern analysis (cf Materials and Methods; Additional File 4). Thus, the complete Angela_2 identified in Ae. speltoides (gS), located in the 'SPA orthologous region', exhibits a transition and transversion value of 0.02 +/- 0.004, associated with an estimated insertion time of 1.3 to 1.9 MYA. The youngest insertion time was observed for the Angela_5 element annotated outside the 'SPA orthologous region' in the Ae. speltoides sequence, i.e. 0.6 to 1.1 MYA.
We sequenced for the first time an Ae. speltoides genomic region (SPA locus region) and compared it to orthologous regions of the A, B and D genomes coresident in the hexaploid wheat T. aestivum at the SPA CDS, the CNS and the TE insertion dynamics levels.
SPA gene structure comparison and haplotype variability
The SPA gene is the only gene conserved across the four genomes. A phylogenetic analysis involving SPA protein sequences from T. aestivum, Ae. speltoides, rice, barley, maize, sorghum, Arabidopsis thaliana, Nicotiana tabacum and Petroselinum crispum clearly identified a Triticeae outgroup in which the Ae. speltoides SPA sequence is more closely related to T. aestivum-gB SPA than to any other sequence in the tree. Interestingly, in this study we showed that the TGA stop codon allele, 19 bases downstream of the ATG translation initiation site, previously identified in the B genome of hexaploid wheat , is also present in the tetraploid T. turgidum. This indicates that the TGA stop codon SPA allele arose before the allohexaploidization event. The presence of both the stop TGA and TCA SPA alleles in tetraploid and hexaploid wheat accessions provides further evidence for the hypothesis of (i) recurrent hexaploidization events or (ii) gene flow through introgression between wheat species with different ploidy levels [30–33].
Differential pattern of CNS conservation
Our results reveal that a large proportion of the remaining non-genic and non-TE sequences (CNS) are highly conserved between the four genomes. At the 'SPA orthologous region', excluding the SPA gene itself, the gB/gS genome comparison shows the highest sequence identity and cumulative length as well as the lowest Ks value (89.9% over 11 976 bp with Ks = 0.617) compared to the other sequence pairs (cf Table 1). Thus, the S genome was confirmed to be the closest to the B genome in terms of cumulative conserved sequence length as well as identity, compared to any other pairwise genome combination. Small stretches of sequence that were more conserved between the S and/or the A and D genomes (cf Figure 1B) do not contradict the general pattern of an overall higher CNS conservation between the S and B genomes. This is the first precise report of close relationships between the S and B genomes based on both coding and non-coding sequence comparisons. CNSs (within introns or upstream regulatory sequences) have recently been surveyed in cereals (maize vs rice) and mammals (human vs mouse) [45, 46]. It has been shown that CNSs are more abundant in loci embedding regulatory genes such as transcription factors (like the SPA gene described in our study) and that, despite divergence from a common ancestor, grass genes have dramatically fewer (5- to 20-fold) and smaller CNSs than mammalian genes. One possible explanation is that, in contrast to vertebrate genomes, plant genomes have been subjected to more rounds of whole genome duplication (polyploidization) events that have profoundly affected their organisation, the subfunctionalisation of duplicated genes leading to a greater per-gene loss of CNS .
Differential TE insertion dynamics
The progenitor enigma of the B genome of polyploid wheat species
According to the two allopolyploidization events that gave rise to T. aestivum, the D genome of hexaploid wheat diverged relatively recently from that of its donor Ae. tauschii (0.08–0.12 MYA), whereas divergence of the A and B genomes from their respective progenitors occurred much earlier (< 0.5 MYA) [7, 9, 10]. For almost 50 years, it has remained controversial whether the source of the B genome is unique (i.e. monophyletic origin) and related to Ae. speltoides, or whether this genome resulted from an introgression of several parental Aegilops species (i.e. polyphyletic origin) [9, 12–24, 48]. Recent data on molecular comparisons using germplasm collections clearly show that the B genome could be related to several Ae. speltoides lines but not to other species of the Sitopsis section [25, 49].
Comparison of the A genome of polyploid wheat species with that of its progenitor T. urartu at the PSR920 region has shown very high CDS conservation (99.5% sequence identity at the third base of codons and 99.6% for introns). Moreover, Dvorak et al. found four conserved TEs (inserted prior to divergence) in the 103 kb of intergenic sequence, whereas four and one other TEs were inserted in the A genome of T. urartu and that of T. durum, respectively, after their divergence from a common ancestor. Our present comparison based on CDS and CNS confirms that the B genome is closer to the S genome of Ae. speltoides than to the A and D genomes. However, SPA sequence divergence and the differential insertions/deletions of TEs, none of which is conserved between the two genomes, indicate that Ae. speltoides diverged very early (> 3 MYA, in our study) from the B genome progenitor.
The present study, based on detailed CDS, CNS and TE dynamics comparisons, clearly shows that the evolutionary relationship between the B genome and the S genome of Ae. speltoides is not as close as that reported in the literature between the A genome of polyploid wheat species and its identified progenitor, T. urartu. Thus, a B genome progenitor remains to be identified.
BAC Clone Isolation
A BAC (Bacterial Artificial Chromosome) library from T. aestivum cv renan and an Ae. speltoides BAC library (Chalhoub et al., unpublished) were screened with SPA PCR markers [34, 42]. Assignment of the BAC clones from the hexaploid species to the A, B, or D genomes was based on their further characterization by HindIII restriction fragment length polymorphisms and specific PCR primers . To ensure maximum coverage of the SPA locus, the longest BAC clones for the A (Ren1424A05, Accession#: FM242575), B (Ren0871J20, Accession#: FM242576), D (Ren2409K09, Accession#: FM242578) and S (Sho42-9K3, Accession#: FM242577) genomes were sequenced.
BAC sequencing and annotation
BAC shotgun sequencing was performed at the Centre National de Séquençage (Evry, France). Genes and repeated elements (TEs and short repeats) were identified by computing and integrating results based on BLAST algorithms [51, 52], predictor programs, and different software, as detailed below.
Gene structure analysis
Gene structures and putative functions were identified by combining results of BLASTN and BLASTX alignments against the dbEST (http://www.ncbi.nlm.nih.gov/) and SwissProt (http://expasy.org/sprot/) databases with results of 2 gene predictor programs, Eugene with the rice (Oryza sativa) training version and FgeneSH (with default parameters; http://linux1.softberry.com/berry.phtml). To incorporate heterologous information, we only recovered potential gene coding sequences. The CDS (CoDing Sequence) structures correspond to a consensus derived from the three preceding information sources. The gene content parameter represents the sum of known genes, hypothetical genes, unknown genes, and pseudogenes. Known genes were named based on BLASTX results against proteins with known functions (SwissProt). CDSs were considered as (i) hypothetical genes if their identification was only based on the predictors (as a consensus of the structures suggested by both predictors), without any evidence of putative function based on BLASTX results; (ii) unknown genes if the identification was only based on matching ESTs, without any evidence of putative function based on BLASTX results; (iii) pseudogenes if frame shifts needed to be introduced within the CDS structure to better fit a putative function based on BLASTX results. Truncated pseudogenes (genes disrupted by a large insertion or deletion) and highly degenerated CDS sequences were considered as gene relics.
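The classification rules above can be summarised as a small decision function. The following Python sketch is one possible reading of those rules, not the authors' implementation: the `CdsEvidence` record and its field names are hypothetical, and the choice to label a CDS supported by both predictors and ESTs (but no BLASTX function) as an unknown gene is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class CdsEvidence:
    """Evidence collected for one candidate CDS (field names are hypothetical)."""
    blastx_function: bool    # BLASTX hit to a protein of known function
    predicted_by_both: bool  # consensus structure suggested by both predictors
    est_match: bool          # at least one matching EST
    needs_frameshift: bool   # frame shifts needed to fit the BLASTX-based function


def classify_cds(ev: CdsEvidence) -> str:
    """Assign a gene category following the rules described in the text."""
    if ev.blastx_function:
        # A putative function exists; frame shifts mark the CDS as a pseudogene.
        return "pseudogene" if ev.needs_frameshift else "known gene"
    if ev.est_match:
        # Assumption: EST support takes priority when predictors also agree.
        return "unknown gene"
    if ev.predicted_by_both:
        return "hypothetical gene"
    return "unassigned"


if __name__ == "__main__":
    example = CdsEvidence(blastx_function=True, predicted_by_both=True,
                          est_match=False, needs_frameshift=True)
    print(classify_cds(example))  # pseudogene
```

Gene relics (truncated pseudogenes and highly degenerated CDSs) are left out of the sketch because recognising them depends on structural information that a simple evidence record does not capture.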
Transposable elements (TE)
TEs were detected by comparison with two databases of repetitive elements: TREP (http://wheat.pw.usda.gov/ITMI/Repeats/) and Repbase (http://www.girinst.org/Repbase_Update.html). Core domains (nucleic coordinates of known elements) were identified through BLASTN alignments against TREPnr. LTRs (Long Terminal Repeats) and TE boundaries were identified through BLASTN alignments against Repbase. Putative polyproteins were identified by BLASTX alignments against TREPprot. We used 1e-04 as a cutoff for BLASTN alignment results (either on TREPnr or Repbase). No cutoff was imposed for BLASTX results on TREPprot. Nested insertions of TEs were considered only when complete reconstruction of the split element was possible with no ambiguity. Other TE structures (either novel or highly degenerated TEs) were identified within the remaining unassigned DNA either by LTR_STRUC or by BLASTX against the NCBI nr database (http://www.ncbi.nlm.nih.gov/). When possible (i.e. for complete TEs), target-site duplications were indicated in the commentary of the element.
Pairwise comparisons of the four BAC clones, including the analysis of each BAC sequence against itself, were performed using the program Dotter in order to identify or confirm direct repeats, LTRs, local duplications, and deletion events as well as MITEs. Multiple sequence comparisons were performed with the PIPMAKER software . As a final screening, unassigned DNA (free of annotated genes or TEs) was aligned using BLASTX against the NCBI nonredundant database (http://www.ncbi.nlm.nih.gov). This BLASTX analysis allowed the extension of several TE features already identified. TEs were classified and named based on the unified classification from Wicker et al. according to the referred nomenclature (i.e. element name, BAC name, appearance rank) and designated as complete, truncated, or degenerated sequences as suggested by the TREP or Repbase databases.
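As an illustration of the BLASTN cutoff described above, the Python sketch below keeps only hits with an e-value at or below 1e-04. It assumes the hits are available in the standard 12-column tabular BLAST format (e-value in the eleventh column). The text does not state which output format was used, and the hit lines and names are invented for the example.

```python
EVALUE_CUTOFF = 1e-4  # BLASTN cutoff stated in the text


def filter_blastn_hits(lines, cutoff=EVALUE_CUTOFF):
    """Yield (query, subject, evalue) for tabular hits with e-value <= cutoff.

    Assumes 12 tab-separated columns per hit, with the e-value in column 11.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular hit line
        evalue = float(fields[10])
        if evalue <= cutoff:
            yield fields[0], fields[1], evalue


# Hypothetical hit lines: one strong match and one weak match.
EXAMPLE_LINES = [
    "# hypothetical BLASTN tabular output",
    "BAC_A\tAngela_consensus\t92.5\t800\t60\t0\t1\t800\t1\t800\t1e-50\t900",
    "BAC_A\tweak_hit\t70.0\t40\t12\t0\t5\t44\t10\t49\t0.5\t30",
]

if __name__ == "__main__":
    for query, subject, evalue in filter_blastn_hits(EXAMPLE_LINES):
        print(query, subject, evalue)
```

Treating the cutoff as inclusive (`<=`) is a choice made for this sketch. No such filter would be applied to the BLASTX results against TREPprot, for which no cutoff was imposed.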
Short repeated motifs
Short repeated motifs were identified either as inverted repeats (by using EINVERTED with default parameters; http://emboss.bioinformatics.nl/cgi-bin/emboss/einverted) or tandem repeats (Tandem Repeat Finder, with default parameters; http://tandem.bu.edu/trf/trf.advanced.submit.html). Only repeated domains (i.e. tandem or inverted) longer than 100 bp were kept in our annotation results.
Unassigned DNA sequences
Unassigned DNA corresponds to sequences in which neither CDS nor TE was identified. Such unassigned DNA may contain short repetitive units (tandem repeats or inverted repeats).
Integration of annotation results
The information obtained for genes, TEs and short repeats was cross-analysed and integrated into ARTEMIS .
Identification of conserved domains was performed based on multiple alignments (ClustalW) of translated SPA CDS (identified from the sequence annotation procedure).
The phylogenetic analysis was performed using the Neighbor-joining method on a ClustalX alignment of protein sequences, with 1 000 bootstrap replicates. The BLOSUM 62 matrix was chosen for substitution identification. Sequence divergence dating was performed based on the rate of nonsynonymous (Ka) vs. synonymous (Ks) substitutions calculated with MEGA-3 . The average substitution rate (r) of 6.5 × 10⁻⁹ substitutions per synonymous site per year for grasses was used to calibrate the ages of the considered genes [64, 65]. The time (T) since gene divergence was estimated using the formula T = Ks/r.
Determination of retrotransposon insertion dates
Full-length retrotransposons were analysed by comparing their 5' and 3' LTR sequences in order to date their insertion time , based on the assumption that the two LTRs of a single element are identical at the time of insertion. The two LTRs were aligned and the numbers of transition and transversion mutations were counted. The insertion times were dated using the Kimura two-parameter method (K2P) and a mutation rate of 6.5 × 10⁻⁹ substitutions per synonymous site per year . The time (T) since element insertion was estimated using the formula T = K2P/2r.
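The dating formula T = Ks/r can be written as a one-line helper. This Python sketch follows the formula as stated in the text. The function name is illustrative and the Ks value in the usage example is a hypothetical number chosen for the arithmetic, not a value from Table 1.

```python
GRASS_SUBSTITUTION_RATE = 6.5e-9  # substitutions per synonymous site per year (from the text)


def divergence_time_mya(ks: float, rate: float = GRASS_SUBSTITUTION_RATE) -> float:
    """Return the divergence time in million years, using T = Ks / r."""
    if ks < 0:
        raise ValueError("ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / rate / 1e6


if __name__ == "__main__":
    hypothetical_ks = 0.0195  # illustrative value only
    print(f"{divergence_time_mya(hypothetical_ks):.1f} MYA")
```

Under this calibration, each 0.0065 substitutions per synonymous site corresponds to one million years.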
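The LTR-based dating described above can be sketched in Python as follows. The distance function uses the standard Kimura two-parameter formula, K = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function names and the handling of gaps (alignment columns with non-ACGT characters are skipped) are assumptions for this example and do not describe the authors' software.

```python
import math

RATE = 6.5e-9  # substitutions per site per year, as stated in the text
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two aligned LTR sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("aligned LTRs must have the same length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_mya(k2p: float, rate: float = RATE) -> float:
    """Insertion time in million years, using T = K2P / (2 r)."""
    return k2p / (2 * rate) / 1e6


if __name__ == "__main__":
    print(f"{insertion_time_mya(0.02):.2f} MYA")
```

If the 0.02 value reported for Angela_2 is read as its K2P distance, the formula gives about 1.5 MYA, which falls within the 1.3 to 1.9 MYA range reported for that element.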
The Renan BAC library and sequencing of the BAC clones from the A, B and D genomes of hexaploid wheat were supported by the Genoplante consortium http://www.genoplante.com. Sequencing of the Aegilops tauschii BAC clone was supported by the APCNS2003 project 'Comparative genome sequencing in wheat' http://www.cns.fr/externe/English/Projets/Projet_LE/LE.html.
- Feuillet C, Keller B: Comparative genomics in the grass family: molecular characterization of grass genome structure and evolution. Ann Bot (Lond). 2002, 89: 3-10. 10.1093/aob/mcf008.
- Kellogg EA: Evolutionary history of the grasses. Plant Physiol. 2001, 125: 1198-1205. 10.1104/pp.125.3.1198.
- Gaut BS: Evolutionary dynamics of grass genomes. New Phytologist. 2002, 154: 15-28. 10.1046/j.1469-8137.2002.00352.x.
- Harlan JR: Crops and Man. 1992, Madison, Wisconsin: American Society of Agronomy, Inc.
- Zohary D, Hopf M: Domestication of plants in the Old World. 2000, New York: Oxford University Press, 3rd edition.
- Piperno DR, Flannery KV: The earliest archaeological maize (Zea mays L) from highland Mexico: new accelerator mass spectrometry dates and their implications. Proc Natl Acad Sci USA. 2001, 98: 2101-2103. 10.1073/pnas.98.4.2101.
- Feldman M, Lupton FGH, Miller TE: Wheats. Evolution of Crops. Edited by: Smartt J, Simmonds NW. 1995, London: Longman Scientific, 184-192. 2nd edition.
- Eckardt NA: A sense of self: The role of DNA sequence elimination in allopolyploidization. Plant Cell. 2001, 13: 1699-1704. 10.1105/tpc.13.8.1699.
- Huang S, Sirikhachornkit A, Su XJ, Faris J, Gill B, Haselkorn R, Gornicki P: Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops complex and the evolutionary history of polyploid wheat. Proc Natl Acad Sci USA. 2002, 99: 8133-8138. 10.1073/pnas.072223799.
- Blake NK, Lehfeldt BR, Lavin M, Talbert LE: Phylogenetic reconstruction based on low copy DNA sequence data in an allopolyploid: The B genome of wheat. Genome. 1999, 42: 351-360. 10.1139/gen-42-2-351.
- Nesbitt M, Samuel D: From staple crop to extinction? The archaeology and history of the hulled wheats. Hulled wheats. Proceedings of the First International Workshop on Hulled Wheats. Promoting the conservation and use of underutilized and neglected crops 4. Edited by: Hammer K, Heller J. 1996, 41-100.
- Zohary D, Feldman M: Hybridization between amphidiploids and the evolution of polyploids in the wheat (Aegilops-Triticum) group. Evolution. 1962, 16: 44-61. 10.2307/2406265.
- Dvorák J, Zhang HB, Kota RS, Lassner M: Organization and evolution of the 5S ribosomal RNA gene family in wheat and related species. Genome. 1989, 32: 1003-1016.
- Dvorák J, Zhang HB: Variation in repeated nucleotide sequences sheds light on the phylogeny of the wheat B and G genomes. Proc Natl Acad Sci USA. 1990, 87: 9640-9644. 10.1073/pnas.87.24.9640.
- Terachi T, Ogihara Y, Tsunewaki K: The molecular basis of genetic diversity among cytoplasms of Triticum and Aegilops. 7. Restriction endonuclease analysis of mitochondrial DNA from polyploid wheats and their ancestral species. Theor Appl Genet. 1990, 80: 366-373. 10.1007/BF00210074.
- Feldman M: Identification of unpaired chromosomes in F1 hybrids involving Triticum aestivum and T. timopheevii. Can J Genet Cytol. 1966, 8: 144-151.
- Feldman M: The mechanism regulating pairing in Triticum timopheevii. Wheat Inf Serv. 1966, 21: 1-2.
- Hutchinson J, Miller TE, Jahier J, Shepherd KW: Comparison of the chromosomes of Triticum timopheevii with related wheats using the techniques of C-banding and in situ hybridization. Theor Appl Genet. 1982, 64: 31-40. 10.1007/BF00303647.
- Gill BS, Chen PD: Role of cytoplasm specific introgression in the evolution of the polyploid wheats. Proc Natl Acad Sci USA. 1987, 84: 6800-6804. 10.1073/pnas.84.19.6800.
- Naranjo T, Roca A, Goicoechea PG, Giráldez R: Arm homoeology of wheat and rye chromosomes. Genome. 1987, 29: 873-882.
- Naranjo T: Chromosome structure of durum wheat. Theor Appl Genet. 1990, 79: 397-400. 10.1007/BF01186085.
- Jiang J, Gill BS: Different species-specific chromosome translocations in Triticum timopheevii and T. turgidum support the diphyletic origin of polyploid wheats. Chromosome Res. 1994, 2: 59-64. 10.1007/BF01539455.
- Devos KM, Dubcovsky J, Dvorák J, Chinoy CN, Gale MD: Structural evolution of wheat chromosomes 4A, 5A and 7B and its impact on recombination. Theor Appl Genet. 1995, 91: 282-288. 10.1007/BF00220890.
- Maestra B, Naranjo T: Structural chromosome differentiation between Triticum timopheevii and T. turgidum and T. aestivum. Theor Appl Genet. 1999, 98: 744-750. 10.1007/s001220051130.
- Kilian B, Ozkan H, Deusch O, Effgen S, Brandolini A, Kohl J, Martin W, Salamini F: Independent wheat B and G genome origins in outcrossing Aegilops progenitor haplotypes. Mol Biol Evol. 2007, 24: 217-227. 10.1093/molbev/msl151.
- Salina EA, Lim KY, Badaeva ED, Shcherban AB, Adonina IG, Amosova AV, Samatadze TE, Vatolina TY, Zoshchuk SA, Leitch AR: Phylogenetic reconstruction of Aegilops section Sitopsis and the evolution of tandem repeats in the diploids and derived wheat polyploids. Genome. 2006, 49 (8): 1023-35. 10.1139/G06-050.
- Smith DB, Flavell RB: Characterisation of the wheat genome by renaturation kinetics. Chromosoma (Berl). 1975, 50: 223-242.
- Vedel E, Delseny M: Repetitivity and variability of higher plant genomes. Pl Physiol Biochem. 1987, 25: 191-210.
- Wicker T, Yahiaoui N, Guyot R, Schlagenhauf E, Liu ZD, Dubcovsky J, Keller B: Rapid genome divergence at orthologous low molecular weight glutenin loci of the A and Am genomes of wheat. Plant Cell. 2003, 15: 1186-1197. 10.1105/tpc.011023.
- Isidore E, Scherrer B, Chalhoub B, Feuillet C, Keller B: Ancient haplotypes resulting from extensive molecular rearrangements in the wheat A genome have been maintained in species of three different ploidy levels. Genome Res. 2005, 15 (4): 526-36. 10.1101/gr.3131005.
- Chantret N, Salse J, Sabot F, Rahman S, Bellec A, Laubin B, Dubois I, Dossat C, Sourdille P, Joudrier P, Gautier MF, Cattolico L, Beckert M, Aubourg S, Weissenbach J, Caboche M, Bernard M, Leroy P, Chalhoub B: Molecular basis of evolutionary events that shaped the hardness locus in diploid and polyploid wheat species (Triticum and Aegilops). Plant Cell. 2005, 17 (4): 1033-45. 10.1105/tpc.104.029181.
- Dvorak J, Akhunov ED, Akhunov AR, Deal KR, Luo MC: Molecular characterization of a diagnostic DNA marker for domesticated tetraploid wheat provides evidence for gene flow from wild tetraploid wheat to hexaploid wheat. Mol Biol Evol. 2006, 23 (7): 1386-96. 10.1093/molbev/msl004.
- Gu YQ, Salse J, Coleman-Derr D, Dupin A, Crossman C, Lazo GR, Huo N, Belcram H, Ravel C, Charmet G, Charles M, Anderson OD, Chalhoub B: Types and rates of sequence evolution at the high-molecular-weight glutenin locus in hexaploid wheat and its ancestral genomes. Genetics. 2006, 174 (3): 1493-504. 10.1534/genetics.106.060756.
- Albani D, Hammond-Kosack MC, Smith C, Conlan S, Colot V, Holdsworth M, Bevan MW: The wheat transcriptional activator SPA: a seed-specific bZIP protein that recognizes the GCN4-like motif in the bifactorial endosperm box of prolamin genes. Plant Cell. 1997, 9: 171-184. 10.1105/tpc.9.2.171.
- Guillaumie S, Charmet G, Linossier L, Torney V, Robert N, Ravel C: Colocation between a gene encoding the bZip factor SPA and an eQTL for a high-molecular-weight glutenin subunit in wheat (Triticum aestivum). Genome. 2004, 47 (4): 705-13. 10.1139/g04-031.PubMedView ArticleGoogle Scholar
- Schmidt RJ, Ketudat M, Aukerman MJ, Hoschek G: Opaque-2 is a transcriptional activator that recognizes a pecific target site in 22-kD zein genes. Plant Cell. 1992, 4: 689-700. 10.1105/tpc.4.6.689.PubMedPubMed CentralView ArticleGoogle Scholar
- Schmidt RJ: Opaque-2 and zein gene expression. Control of Plant Gene Expression. Edited by: Verma DPS. 1993, Boca Raton, FL: CRC Press, 337-355.Google Scholar
- Vicente-Carbajosa J, Moose SP, Parsons RL, Schmidt RJ: A maize zinc-finger protein binds the prolamin box in zein gene promoters and interacts with the basic leucine zipper transcriptional activator Opaque2. Proc Natl Acad Sci USA. 1997, 94 (14): 7685-90. 10.1073/pnas.94.14.7685.PubMedPubMed CentralView ArticleGoogle Scholar
- Onodera Y, Suzuki A, Wu CY, Washida H, Takaiwa F: A rice functional transcriptional activator, RISBZ1, responsible for endosperm-specific expression of storage protein genes through GCN4 motif. J Biol Chem. 2001, 276 (17): 14139-52.PubMedGoogle Scholar
- Vicente-Carbajosa J, Onate L, Lara P, Diaz I, Carbonero P: Barley BLZ1: a bZIP transcriptional activator that interacts with endosperm-specific gene promoters. Plant J. 1998, 13: 629-640. 10.1111/j.1365-313X.1998.00068.x.PubMedView ArticleGoogle Scholar
- Onate L, Vicente-Carbajosa J, Lara P, Diaz I, Carbonero P: Barley BLZ2, a seed-specific bZIP protein that interacts with BLZ1 in vivo and activates transcription from the GCN4-like motif of B-hordein promoters in barley endosperm. J Biol Chem. 1999, 274 (14): 9175-82. 10.1074/jbc.274.14.9175.PubMedView ArticleGoogle Scholar
- Ravel C, Praud S, Murigneux A, Canaguier A, Sapet F, Samson D, Balfourier F, Dufour P, Chalhoub B, Brunel D, Beckert M, Charmet G: Single-nucleotide polymorphism frequency in a set of selected lines of bread wheat (Triticum aestivum L.). Genome. 2006, 49 (9): 1131-9. 10.1139/G06-067.PubMedView ArticleGoogle Scholar
- Hartings H, Maddaloni M, Lazzaroni N, Di Fonzo N, Motto M, Sakamini F, Thompson R: The O2 gene which regulates zein deposition in maize endosperm encodes a protein with structural homologies to transcriptional activators. EMBO J. 1989, 8: 2795-2801.PubMedPubMed CentralGoogle Scholar
- Pirovano L, Lanzini S, Hartings H, Lazzaroni N, Rossi V, Joshi R, Thompson RD, Salamini F, Motto M: Structural and functional analysis of an Opaque-2-related gene from sorghum. Plant Mol Biol. 1994, 24 (3): 515-23. 10.1007/BF00024119.PubMedView ArticleGoogle Scholar
- Kaplinsky NJ, Braun DM, Penterman J, Gof SA, Freeling M: Utility and distribution of conserved noncoding sequences in the grasses. Proc Natl Acad Sci USA. 2002, 99: 6147-6151. 10.1073/pnas.052139599.PubMedPubMed CentralView ArticleGoogle Scholar
- Inada DC, Bashir A, Lee C, Thomas BC Ko C, Goff SA, Freeling M: Conserved noncoding sequences in the grasses. Genome Res. 2003, 13: 2030-2041. 10.1101/gr.1280703.PubMedPubMed CentralView ArticleGoogle Scholar
- Lockton S, Gaut BS: Plant conserved non-coding sequences and paralogue evolution. Trends Gene. 2005, 21: 60-65. 10.1016/j.tig.2004.11.013.View ArticleGoogle Scholar
- Buchner P, Prosser IM, Hawkesford MJ: Phylogeny and expression of paralogous and orthologous sulphate transporter genes in diploid and hexaploid wheats. Genome. 2004, 47 (3): 526-34. 10.1139/g04-011.PubMedView ArticleGoogle Scholar
- Wang JR, Zhang L, Wei YM, Yan ZH, Baum BR, Nevo E, Zheng YL: Sequence polymorphisms and relationships of dimeric α-amylase inhibitor genes in the B genomes of Triticum and S genomes of Aegilops. Plant Science. 2007, 173: 1-11. 10.1016/j.plantsci.2007.03.006.View ArticleGoogle Scholar
- Chalhoub B, Belcram H, Caboche M: Efficient cloning of plant genomes into bacterial artificial chromosome (BAC) libraries with larger and more uniform insert size. Plant Biotechnol J. 2004, 2 (3): 181-8. 10.1111/j.1467-7652.2004.00065.x.PubMedView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Bi. 1990, 215: 403-410.View ArticleGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.PubMedPubMed CentralView ArticleGoogle Scholar
- Mathe C, Sagot MF, Schiex T, Rouze P: Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res. 2002, 30 (19): 4103-17. 10.1093/nar/gkf543.PubMedPubMed CentralView ArticleGoogle Scholar
- Salamov A, Solovyev V: Ab initio gene finding in Drosophila genomic DNA. Genome Res. 2000, 10: 516-522. 10.1101/gr.10.4.516.PubMedPubMed CentralView ArticleGoogle Scholar
- Wicker T, Matthews DE, Keller B: TREP: A database for Triticeae repetitive elements. Trends Plant Sci. 2002, 7: 561-562. 10.1016/S1360-1385(02)02372-5.View ArticleGoogle Scholar
- Jurka J: Repbase update: A database and an electronic journal of repetitive elements. Trends Genet. 2000, 9: 418-420. 10.1016/S0168-9525(00)02093-X.View ArticleGoogle Scholar
- McCarthy E, McDonald J: LTR_STRUC: A novel search and identification program for LTR retrotransposons. Bioinformatics. 2003, 19: 362-367. 10.1093/bioinformatics/btf878.PubMedView ArticleGoogle Scholar
- Sonnhammer EL, Durbin R: A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene. 1995, 167: 1-10. 10.1016/0378-1119(95)00714-8.View ArticleGoogle Scholar
- Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W: PipMaker-a web server for aligning two genomic DNA sequences. Genome Res. 2000, 10 (4): 577-86. 10.1101/gr.10.4.577.PubMedPubMed CentralView ArticleGoogle Scholar
- Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH: A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007, 8 (12): 973-82. 10.1038/nrg2165.PubMedView ArticleGoogle Scholar
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: Sequence visualization and annotation. Bioinformatics. 2000, 16: 944-945. 10.1093/bioinformatics/16.10.944.PubMedView ArticleGoogle Scholar
- Aiyar A: The use of CLUSTAL W and CLUSTAL X for multiple sequence alignment. Methods Mol Biol. 2000, 132: 221-41.PubMedGoogle Scholar
- Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004, 5: 150-163. 10.1093/bib/5.2.150.PubMedView ArticleGoogle Scholar
- Gaut BS, Morton BR, McCaig BC, Clegg MT: Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc Natl Acad Sci USA. 1996, 93 (19): 10274-9. 10.1073/pnas.93.19.10274.PubMedPubMed CentralView ArticleGoogle Scholar
- SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL: The paleontology of intergene retrotransposons of maize. Nat Genet. 1998, 20 (1): 43-5. 10.1038/1695.PubMedView ArticleGoogle Scholar
- Kimura M: A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980, 16 (2): 111-20. 10.1007/BF01731581.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.