Insights into the Musa genome: Syntenic relationships to rice and between Musa species
- Magali Lescot†1, 9,
- Pietro Piffanelli†1, 10,
- Ana Y Ciampi3,
- Manuel Ruiz1,
- Guillaume Blanc5,
- Jim Leebens-Mack6,
- Felipe R da Silva3,
- Candice MR Santos3,
- Angélique D'Hont1,
- Olivier Garsmeur1,
- Alberto D Vilarinhos1, 7,
- Hiroyuki Kanamori8,
- Takashi Matsumoto8,
- Catherine M Ronning2,
- Foo Cheung2,
- Brian J Haas2,
- Ryan Althoff2,
- Tammy Arbogast2,
- Erin Hine2,
- Georgios J PappasJr4,
- Takuji Sasaki8,
- Manoel T SouzaJr3,
- Robert NG Miller4,
- Jean-Christophe Glaszmann1 and
- Christopher D Town2
© Lescot et al; licensee BioMed Central Ltd. 2008
Received: 11 June 2007
Accepted: 30 January 2008
Published: 30 January 2008
Musa species (Musaceae, Zingiberales), including bananas and plantains, are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale analyses of Musa genomic sequence have been conducted. This study compares genomic sequence in two Musa species with orthologous regions in the rice genome.
We produced 1.4 Mb of Musa sequence from 13 BAC clones, annotated and analyzed them along with 4 previously sequenced BACs. The 443 predicted genes revealed that Zingiberales genes share GC content and distribution characteristics with eudicot and Poaceae genomes. Comparison with rice revealed microsynteny regions that have persisted since the divergence of the Commelinid orders Poales and Zingiberales at least 117 Mya. The previously hypothesized large-scale duplication event in the common ancestor of major cereal lineages within the Poaceae was verified. The divergence time distributions for Musa-Zingiber (Zingiberaceae, Zingiberales) orthologs and paralogs provide strong evidence for a large-scale duplication event in the Musa lineage after its divergence from the Zingiberaceae approximately 61 Mya. Comparisons of genomic regions from M. acuminata and M. balbisiana revealed highly conserved genome structure, and indicated that these genomes diverged circa 4.6 Mya.
These results point to the utility of comparative analyses between distantly-related monocot species such as rice and Musa for improving our understanding of monocot genome evolution. Sequencing the genome of M. acuminata would provide a strong foundation for comparative genomics in the monocots. In addition, a genome sequence would aid genomic and genetic analyses of cultivated Musa polyploid genotypes in research aimed at localizing and cloning genes controlling important agronomic traits for breeding purposes.
Knowledge concerning the genetic diversity, the origin of cultivars [5–12] and Musa genome structure [13–15] has greatly increased over the last few years. The haploid genome of Musa species has been estimated at between 560 and 600 Mb [16, 17], roughly four times larger than that of the model plant Arabidopsis (125 Mb) and 30% larger than that of rice (390 Mb). Genetic maps have been developed [20–23] and, recently, BAC resources were generated for both M. acuminata [24, 25] and M. balbisiana. A cytogenetic map based on BAC-FISH is being anchored to genetic maps in order to better characterize structural variation among M. acuminata genomes. These resources will pave the way for studies of Musa genome structure and evolution through comparisons with other monocot and eudicot genomes.
The utility of genomic comparisons of monocot and eudicot plants (e.g. [27–30]) is growing with the availability of the complete genome sequences of rice, Arabidopsis and poplar, and active genome sequencing projects for a growing number of other angiosperms. Most genome-scale comparative investigations within the monocots have focused on closely-related species belonging to the family Poaceae [27, 33–36]. Numerous papers have described extensive microsynteny between rice, barley, wheat, maize, Sorghum and sugarcane [27, 35, 37–42], although the degree of conservation varies between chromosomal locations. Fewer attempts have been made to investigate synteny between distantly-related plants. In addition, whereas extensive genomic resources have been developed for rice and other cereal species in the grass family (Poaceae), there are relatively few data on gene content or genome structure for non-grass monocots (Figure 1). Recently, the genomic sequences of the first two Musa BAC clones and a BAC end sequencing study of the M. acuminata genome have been published. Here we present data on the genomic structure and organization of 1.8 Mb of Musa nuclear genomic DNA (including the two BACs sequenced previously), show for the first time the existence of microsynteny between Musa, rice and Arabidopsis, characterize the extent of microsynteny between the two Musa species representing the progenitors of most cultivated genotypes, analyze monocot EST sequences and discuss the evolutionary implications of these results. The BAC clones sequenced in this study were identified by hybridization with gene sequences previously selected to correspond to one or a few loci in Musa, rice and Arabidopsis, and thus possibly contain sequences orthologous to those of these distantly related plant species.
Selection of Musa BAC clones using broad-spectrum Sorghum cDNA and Musa RFLP probes
List of probes used to identify the Musa BAC clones sequenced as part of the present study. Estimated copy numbers of these sequences in rice, Sorghum and Musa are indicated for SbRPG (Sorghum bicolor) sequences. MA4 are BAC clones from M. acuminata cv. Calcutta-4 and MBP are BAC clones from M. balbisiana cv. Pisang Klutuk Wulung.
Probe name and AC number* | Estimated copy number in rice by Blast analysis (rice gene locus identifier) | Estimated copy number in Sorghum by Southern blot analysis | Estimated copy number in Musa by Southern blot analysis | Number of identified Musa BAC clones | Number of Musa BAC fingerprint groups | Musa BAC clones sequenced (size) and AC number*
chlorophyll A-B binding protein type I
more than 4
MA4_8L21 (115790 bp) AC186748
MA4_78I12 (150982 bp) AC186750
(143796 bp) AC186747
mitochondrial rieske protein
MBP_81C12 (142973 bp) AC186754
beta 1–3 glucanase
One BAC clone was selected for sequencing for probe SbRPG132. For probes SbRPG373, SbRPG661 and SbRPG851, which were found to be present in one or two copies in rice, two Musa BACs with distinct HindIII fingerprints that might be derived from homeologous regions were selected for sequencing with the aim of studying the evolution of lineage-specific duplications in both Musa and rice (Table 1). Two BAC clones from M. acuminata cv. Calcutta-4 (Musa A) and two BACs from M. balbisiana cv. PKW (Musa B), isolated using the genetically-mapped single-copy RFLP probes CIR560 and CIR257, were also fully sequenced with the objective of studying the extent of synteny between the Musa A and B species as well as against the rice genome. These RFLP probes were selected because they correspond to genomic clones encoding genes of known function, CIR257 for a GA-20 oxidase and CIR560 for a beta 1–3 glucanase, previously shown to be associated with traits of agronomic importance, namely the control of plant height [45, 46] and stress response [47–50], respectively.
Analysis of 1.8 Mb of Musa genomic sequences reveals particular features for the Musa genes
Musa genome statistics
Features of Musa genes in comparison with those of Arabidopsis and rice.
GC content: overall (%)
Exon length (bp)
Intron length (bp)
Number of exons/gene
Gene length (bp)
Protein length (aa)
Gene density (kb/gene)
Base Composition and GC Distribution along the Musa genes
Analysis of Musa repetitive elements
Several approaches were used to characterize the genomic sequence with respect to repeats. Database searches of the predicted genes against a non-redundant protein database (see Methods section) revealed a total of 78 transposable element (TE)-related sequences. Excluding the TE-rich BAC MA4_78I12, there are on average ~2.6 retrotransposons (class I TEs) per 100 kb. Only one protein encoded by a class II TE was detected. BAC sequences were also screened for previously characterized Musa RADKA repeats; an average of 1.8 RADKA-related repeats (GenBank Accessions AF399938-AF399941, AF399943-AF399946 and AF399948) per 100 kb was identified. In an attempt to identify as yet uncharacterized repeats, the BAC sequences were also analyzed with RepeatScout. After removing repeats having similarity to Arabidopsis or rice proteins, Musa CDS, RADKA sequences and transposable elements, six repeats with at least three copies were identified (data not shown). Five of these sequences have no significant hits to genes in GenBank, while the sixth matches GenBank accession X99496, with strong similarity to part of the Musa ycf2 chloroplast gene. Analysis of individual BACs with PrintRepeats shows that each BAC contains only a small number of regions that are repeated within the BAC, an observation that is supported by the relative ease with which the BAC sequences could be closed and finished.
Microsynteny analysis between Musa and either rice or Arabidopsis
The 443 Musa predicted proteins were aligned against the rice and Arabidopsis proteomes. The results showed that 268 and 224 Musa proteins have hits with an E-value threshold of 1e-10 against the rice and Arabidopsis proteomes, respectively. The relative positions of the homologous genes identified in the rice and Arabidopsis genomes were compared to the order of the corresponding Musa genes with the i-ADHoRe software. Using this stringent approach, we were able to identify nine Musa BAC sequences showing microsynteny among the 17 Musa BACs analyzed: eight cases with rice and one case with Arabidopsis (Additional file 5).
The only significant case of microcolinearity found between Musa and Arabidopsis involved three consecutive genes (Additional file 9). Interestingly, this Musa-Arabidopsis syntenic block was not found to be conserved in rice.
Syntenic relationships between two regions of M. acuminata and M. balbisiana
Level of synonymous substitution (Ks) between homologous sequences in M. acuminata and M. balbisiana.
GDSL-motif lipase/hydrolase family protein
protein kinase family protein
protein kinase family protein
leucine-rich repeat-containing protein kinase family protein
gibberellin 20-oxidase family protein
glucose-inhibited division A family protein
leucine rich repeat family protein
transcriptional repressor protein-related
protein kinase family protein
exostosin family protein
kinesin light chain-related
Divergence between M. acuminata and M. balbisiana
In order to evaluate the degree of divergence between the two Musa genomes, we obtained maximum likelihood estimates of Ks for pairs of orthologous genes identified in the M. acuminata and M. balbisiana BACs. We restricted our analysis to those genes (detailed in Table 3) that were intact and matched known gene sequences. For example, the gene model for the 14th locus in the M. acuminata genome (Figure 7; L14) is similar to a pectinesterase-related protein, but it was excluded from the analysis because the predicted coding sequence contained several in-frame stop codons, indicating that this sequence is a pseudogene. The estimated Ks values ranged from 0.0231 (Additional file 10; L19) to 0.0960 (L17), far below saturation levels (i.e. Ks << 1), with an average of 0.0410 (Table 3). Applying an average synonymous substitution rate of 4.5 per 10⁹ years for nuclear genes in the Zingiberales (see below), this suggests that M. acuminata and M. balbisiana diverged approximately 4.6 Mya.
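As a worked check of this date, under the usual assumption that both lineages have accumulated synonymous substitutions at the same constant rate since they split (so that Ks = 2rT; the symbols r and T are introduced here for illustration), the quoted values give:

```latex
T = \frac{K_s}{2r}
  = \frac{0.0410}{2 \times 4.5 \times 10^{-9}\ \mathrm{yr}^{-1}}
  \approx 4.6 \times 10^{6}\ \mathrm{yr}
```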
Evidence for a large-scale duplication event in the Musa ancestor
We also analyzed the 18,612 ginger (Zingiber officinale; Zingiberaceae, Zingiberales) EST-derived unigenes available on the TIGR Plant Transcript Assemblies web site (sequences generated by David Gang, University of Arizona) and found no evidence of large-scale duplication in the Ks distribution for paralogous pairs (Figure 8). Moreover, the modal Ks for reciprocal best matches between the Musa and Zingiber unigene sets is 0.78 (Figure 8), larger than the mode for Musa paralogous pairs. The age of the most recent common ancestor of the Musaceae and Zingiberaceae is estimated at 87 Mya [3, 72, 73]. This implies an average synonymous substitution rate of 4.5 per 10⁹ years (0.78 synonymous substitutions per site/(2 × 87,000,000 years)), intermediate between rates estimated for the Poaceae (6.1–6.5 per 10⁹ years) and palms in the order Arecales (2.61 per 10⁹ years). We must emphasize that all of these rate estimates are approximate, based on rough estimates of minimum divergence times. However, regardless of ambiguities in substitution rate calibrations, our results indicate that the predicted large-scale duplication that occurred in the Musa lineage (Ks = 0.55) post-dates the divergence of the lineages leading to Zingiber and Musa (Ks = 0.78), but occurred well before the separation of Musa A and Musa B (Ks = 0.0410).
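The calibration arithmetic in this paragraph can be reproduced with a short Python sketch. The function names are ours, and the calculation assumes a strict molecular clock (Ks = 2 × rate × time), which the text itself flags as approximate.

```python
def substitution_rate(ks, divergence_years):
    """Synonymous substitutions per site per year, assuming Ks = 2 * rate * time."""
    return ks / (2 * divergence_years)


def age_in_years(ks, rate):
    """Age of a divergence or duplication event for a given Ks and clock rate."""
    return ks / (2 * rate)


# Calibrate on the Musa-Zingiber split (modal Ks = 0.78, estimated at 87 Mya).
rate = substitution_rate(0.78, 87_000_000)

# Apply the same rate to the Musa duplication peak and to the A/B genome split.
duplication_age = age_in_years(0.55, rate)
species_split = age_in_years(0.0410, rate)

print(f"rate = {rate:.2e} substitutions per site per year")
print(f"Musa duplication: about {duplication_age / 1e6:.0f} Mya")
print(f"M. acuminata / M. balbisiana split: about {species_split / 1e6:.1f} Mya")
```

With the values quoted in the text, this gives a rate close to 4.5 per 10⁹ years, a duplication age of about 61 Mya and an A/B split of about 4.6 Mya, in line with the figures reported here.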
Ks values were also computed on 1,034 pairs of homologous genes identified between the Musa ESTs and the rice genome sequences. As expected, the distribution of Ks values between rice-Musa homologs forms a single peak centred around Ks = 1.7 (Figure 8). Using this Ks value to estimate the age of the Poales-Zingiberales split is less straightforward than described above for the Musa-Zingiber split, because synonymous substitution rates clearly vary between these Commelinid monocot lineages.
BAC fingerprint analyses revealed that whereas SbRPG854 hybridized to a single locus in the Musa genome, SbRPG132 hybridized to 6 regions, SbRPG663 hybridized to 5 loci, and two loci were identified for each of SbRPG373, SbRPG661 and SbRPG851 (Table 1 and Additional file 1). BACs representing both distinct loci hybridizing to probes SbRPG661, SbRPG373 and SbRPG851 were sequenced with the aim of dating the duplication relative to the divergence of the Musa and rice lineages. Pair-wise estimates of Ks, the number of synonymous substitutions per synonymous site, were 0.93 (± 0.25), 1.39 (± 0.19) and 1.43 (± 0.60) for Musa homologs of the coding regions of SbRPG661 (thioredoxin), SbRPG851 (phosphoglycerate kinase) and SbRPG373 (hypothetical protein), respectively. Phylogenetic analyses suggested that the SbRPG851 Musa homologs duplicated prior to the divergence of the Poales and the Zingiberales (probably independently of the large-scale duplication described above), whereas the SbRPG661 and SbRPG373 Musa homologs are sister to each other in the gene tree, suggesting that these duplications arose after the divergence of the Poales and the Zingiberales (data not shown).
We also analyzed the degree of conservation between genomic regions surrounding SbRPG661, SbRPG851 and SbRPG373 duplicated genes in Musa and rice and found no synteny in regions anchored by these homologs. This absence of synteny could be explained by duplication events and subsequent gene losses or by the translocation of the focal genes.
Analysis of Musa genes reveals some particular features
Sequencing and annotation of ~1.8 Mb of Musa genomic sequence indicated that most of the BACs analyzed were gene rich with a low content of transposable elements. Our analyses of the 443 predicted Musa genes revealed that Musa genes generally have a "rice-like" bimodal GC distribution with a very asymmetrical and long tail towards high GC content, as in previous studies [43, 44]. However, a second class of "Arabidopsis-like" genes was found, with an overall low GC content and no significant gradient along the coding sequence. In contrast to a previous comparison of grass and non-grass monocots [52, 53], our analyses suggest that Zingiberales genes share some characteristics with the genomes of both eudicots and members of the Poaceae. This result suggests that the Musa genome is more similar to cereal genomes than are those of onion, asparagus and the basal-most monocot lineage, Acorus.
Syntenic relationships between distantly-related monocots
Whereas widespread conservation of synteny has been well established for members of the grass family (Poaceae), gene order has not been generally conserved between rice and Arabidopsis. Few studies have compared genome structure between members of the Poaceae and other monocot families, and recent comparisons between onion, garden asparagus and rice have failed to find evidence of conservation of macro- or micro-synteny [76, 77]. However, a previously developed genomic tag approach has allowed the detection of anchor points between grasses and other monocots. In this study we were able to identify microsyntenic regions in the Musa and rice genomes that have persisted over some 117 million years of evolution since these two lineages diverged. However, in all syntenic regions detected, the shared genes were separated by intervening genes, reflecting the occurrence of numerous insertions and deletions of genes in both rice and Musa. Insertions and deletions have been observed between rice and Arabidopsis regions showing micro-colinearity and, to a much lower extent, between colinear regions among Poaceae genomes [37, 79]. Further sequencing of the Musa and other monocot genomes will provide more insight into the extent of lineage-specific gene gain and loss in otherwise syntenic regions.
A first insight into syntenic relationships between Musa A and B
We focused our pilot study on two genomic regions containing genes of agronomic importance for Musa and rice to gain insight into the extent of conservation between the two cultivated species, M. acuminata (A genome) and M. balbisiana (B genome). Our data revealed an extremely high level of colinearity between the two Musa genomes in both regions. However, several insertions and deletions occurred during the period of divergence (~4.6 Mya) of the two Musa species. The high level of microsynteny between the two genomes is likely to accelerate gene isolation in M. balbisiana once the construction of the whole genome physical map of M. acuminata has been completed by the Global Musa Genomics Consortium.
Unveiling the paleopolyploid nature of Musa species
Accumulating data indicate that polyploidy is one of the most important evolutionary mechanisms influencing the structure and content of angiosperm genomes. Our work indicates ancient polyploidization in the lineage leading to Musa approximately 60 Mya. Similar lineage-specific events have been described in the Poaceae [81, 82], Brassicaceae [56, 83, 84], Populus, Solanaceae, Leguminosae, Papaveraceae, Acorus, the Magnoliids and the Nymphaeaceae. Polyploidy has clearly been an important source of genetic variation across the angiosperms, as retained duplicate genes typically show divergent patterns of gene expression [85, 86]. In Musa, as in other plant species, novel phenotypes can emerge from this genomic amalgam, including some with high visibility to natural selection, such as organ size and disease resistance.
Of particular interest is the "composite" nature of the duplicated rice regions relative to the syntenic Musa BAC MA4_25J11; different sets of genes were lost in rice chromosomes 1 and 5, respectively, as compared to Musa. This type of evolution is likely to reflect a dynamic of duplication and independent evolution in both monocot lineages, including recurrent cycles of genome duplication followed by diploidization. This phenomenon was also identified in an analysis of differential gene loss following duplication events in rice and Arabidopsis. Furthermore, our phylogenetic analyses of gene sets including the genes on Musa BAC MA4_25J11, rice orthologs and related genes found in the Arabidopsis genome and TIGR gene indices corroborate previous results suggesting that a genome-wide duplication in the common ancestor of all major cereal lineages is responsible for the large duplicated segments observed in the rice genome [61, 62, 87]. This finding illustrates how comparative analyses of distantly-related monocot species can complement studies on cereal genomes.
Is rice a good model to study the structure and evolution of Musa genomes?
The use of rice as a reference species to accelerate map-based cloning projects by extrapolating marker position data and increasing marker density in targeted regions has proven efficient among cereal crops (e.g. barley, wheat, Sorghum), with a perceivable trend towards decreased efficiency as phylogenetic distance increases. Our analyses of the amount of microsynteny between rice and Musa suggest that there are cases in which predictions based upon microsynteny are useful, but that this may not hold generally. In addition, although our data showed that the Musa genome is more similar to cereal genomes than are those of onion, asparagus and the basal monocot Acorus, the differences observed confirmed that cereal genomes are not representative of all monocots [52, 53, 76, 77]. This work also highlights that comparative analyses between distantly-related species such as rice and Musa are very important for improving our understanding of monocot genomes and, more generally, of angiosperm genome evolution.
In conclusion, this study represents the first effort to investigate the existence and extent of microsynteny between rice and Musa, two distantly-related monocot species. Our analyses revealed a higher degree of synteny than has been reported for other comparisons between rice and species outside of the grass family. In addition, we identified evidence for extensive microsynteny between the two Musa species representing the progenitors of most cultivated genotypes. Finally, we identified evidence for an ancient genome-scale duplication event in the lineage leading to Musa and highlighted the complexity of analyzing the structure and evolution of plant genomes following independent cycles of genome duplication and diploidization.
Selection of Musa BAC clones
Nine probes known from previous data to be conserved between rice, Musa acuminata cv. Madang, Musa balbisiana cv. PKW and Arabidopsis, and revealing single or very low copy number loci, were selected. These nine probes (SbRPG) correspond to Sorghum cDNAs developed by Rustica Prograin Génétique and CIRAD. These cDNAs and two Musa genomic probes, CIR257 and CIR560, were used to screen high density filters of the M. acuminata Calcutta-4 BAC library according to standard protocols (see Table 1). The probes CIR257 and CIR560 were also used to screen the M. balbisiana cv. PKW BAC library. BAC DNA of positive clones was isolated using a Qiagen Robot 9600 and Qiagen 96-well BAC DNA isolation kit and digested with the restriction enzyme HindIII. The HindIII fingerprints were then hybridized with the corresponding probe to determine the number of loci.
Chromosome preparations were made as described in D'Hont et al from root tips of M. acuminata cv. Calcutta-4 cultivated in a glasshouse. Fluorescent in situ hybridizations (FISH) were performed as described in D'Hont et al, with 30 ng of BAC DNA labeled with digoxigenin or biotin as probes and 50 ng/μl of sheared salmon sperm DNA. The chromosomes were counterstained with DAPI (4',6-diamidino-2-phenylindole).
Selected BAC clones were sequenced by similar shotgun approaches at The Institute for Genomic Research (TIGR), Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA-CENARGEN), Universidade Catolica de Brasilia (UCB) and the National Institute of Agrobiological Sciences (NIAS). At TIGR, purified BAC DNA was sheared by nebulization, size-selected (2–3 kb) and ligated into a pUC-derived vector, pHOS1, using BstXI linkers. BAC DNA sent for sequencing to EMBRAPA and UCB was fragmented at the Genoscope Centre (Evry, Paris, France) using a hydroshearer, size-selected (5 kb) and ligated into the pcDNA2.1 vector using BstXI linkers. Clones were sequenced from both ends using ABI Big Dye terminator chemistry on ABI 3730 sequencing machines at TIGR, and using a DYEnamic™ ET Terminator Sequencing Kit (Amersham Pharmacia Biotech) on Applied Biosystems 377 sequencers at EMBRAPA and UCB. Sequences were assembled using TIGR assembler, and additional directed sequencing reactions were performed as necessary to complete the sequence to high quality. BAC shotgun sequencing at NIAS was performed using shotgun clones (2 kb and 5–7 kb) at 10x coverage and the Big Dye Terminator Kit (ABI) on ABI 3700 sequencers; reads were assembled with Phred/phrap software [90, 91], and contig gaps were filled where necessary.
Annotation of the BAC assemblies was carried out using the TIGR annotation pipeline, a collection of software known as Eukaryotic Genome Control (EGC) that serves as the central data management system. Each BAC sequence was processed through a series of algorithms for predicting genes (Genscan+, Genemark.hmm, Glimmer) [92–94], splice sites [95, 96] and tRNAs. The AAT package was used for homology searches against nucleotide and protein databases, which included plant-specific cDNA and EST sequences, TIGR plant gene indices, a non-redundant amino acid database filtered from public sources, and SwissProt. Protein models generated by the searches and predictions were further searched against hidden Markov model (HMM) databases, including PFAM, and automatically assigned a putative name based on domain hits or homology to previously identified proteins. Gene structures and names were manually inspected and refined as necessary. Annotated gene models were scanned for Musa transposable element nucleotide sequences downloaded from GenBank and then compared to a curated database of transposable element-encoded proteins. The top match from each hit was used to classify the transposable element.
Comparison of BACs with one another
In order to determine whether the BACs selected by hybridization actually arose from duplicated regions of the M. acuminata (A) genome or homeologous regions of the M. balbisiana (B) genome, or to identify duplicated regions in the M. acuminata (A) genome (pairs of BACs hybridizing with the same probes), each BAC was compared against all other BACs using MUMmer, Dotter and an all-by-all BLASTP search. The sequence identity of the overlapping sequences between the BAC pairs MA4_82I11 and MBP_81C12, and MA4_54N7 and MBP_91N22, was computed with Stretcher from the EMBOSS package.
The 443 Musa predicted proteins were aligned against the rice and Arabidopsis proteomes using the BLASTP program (e-value < 1e-10). The i-ADHoRe software, which looks for regions where the gene order is similar between two genomic sequences, was used with the following parameters: a gap size and a cluster gap of 40 each, a q value of 0.9, three anchor points and a probability cutoff of 0.001. For four BACs (MA4_25J11, MA4_8L21, MBP_91N22 and MuH9), we tried to extend the regions of synteny between Musa and rice found by i-ADHoRe by conducting reciprocal BLASTP searches between the corresponding genes in the homologous regions.
Musa genes were used in BLASTX searches to query a database of rice and Arabidopsis gene family clusters. Translated BLAST searches (tBLASTX) against the TIGR plant gene indices were also performed, and inferred protein sequences with e-values < 1e-30 were compiled with homologous Musa, rice and Arabidopsis sequences. Amino acid alignments of the compiled sequences were constructed using MUSCLE and manually adjusted. Parsimony analyses were performed on the amino acid alignments using PAUP* v4.0b10.
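To make the roles of the gap size and anchor-point parameters concrete, here is a deliberately simplified Python sketch of collinearity detection. It is not i-ADHoRe's algorithm or interface: it is a greedy chain over homologous gene pairs, and the function names, the input layout (gene-order indices along a Musa BAC and a rice chromosome) and the use of a single gap threshold are all illustrative assumptions.

```python
def _close(prev, nxt, max_gap):
    """True if nxt follows prev within max_gap genes in both genomes."""
    step_musa = nxt[0] - prev[0]
    step_rice = abs(nxt[1] - prev[1])
    return 0 < step_musa <= max_gap and 0 < step_rice <= max_gap


def collinear_blocks(anchors, max_gap=40, min_anchors=3):
    """Group homologous gene pairs into candidate collinear blocks.

    anchors: iterable of (musa_index, rice_index) pairs, where each index is
    the position of a gene in the gene order of its genomic sequence.
    A block is kept only if it chains at least min_anchors pairs.
    """
    blocks, current = [], []
    for anchor in sorted(set(anchors)):
        if current and _close(current[-1], anchor, max_gap):
            current.append(anchor)
        else:
            if len(current) >= min_anchors:
                blocks.append(current)
            current = [anchor]
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks


# Hypothetical anchors: three nearby pairs, then two isolated ones.
example = [(1, 100), (4, 103), (9, 110), (30, 500), (31, 900)]
blocks = collinear_blocks(example)
```

In this toy input only the first three pairs form a block; the intervening, non-shared genes between them correspond to the insertions and deletions that a gap allowance is meant to tolerate. A real tool additionally evaluates the statistical significance of each cluster, which this sketch omits.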
Construction of unigenes
Musa EST sequences were provided by the Global Musa Genomics Consortium. These sequences were first assembled into unigenes using the TGICL package to eliminate sequence redundancy. Because unigenes are derived from EST sequences, and so have no annotated open reading frames and may contain frameshift sequencing errors, the following approach was taken. Each unigene was aligned against the rice proteome (downloaded from GenBank) using BLASTX. The best match was considered significant if the alignment length was >100 amino acids and the Expect value (E) was <1e-15. The open reading frame was then extracted from the unigene sequence using the Genewise program (which can infer frameshift sites), with the corresponding best match protein as a guide.
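The significance filter described above can be sketched in Python as follows. The sketch assumes the BLASTX results are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), that the alignment length column is counted in amino acids, and that the "best match" is the hit with the highest bit score; the function name is ours.

```python
def significant_best_hits(blast_lines, min_length=100, max_evalue=1e-15):
    """Map each unigene to its best rice protein if that hit passes both cutoffs."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comment or malformed lines
        query, subject = fields[0], fields[1]
        length = int(fields[3])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        # Keep only the highest-scoring hit per unigene.
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, length, evalue, bitscore)
    # Apply the thresholds to the best hit only, as described in the text.
    return {
        query: hit[0]
        for query, hit in best.items()
        if hit[1] > min_length and hit[2] < max_evalue
    }
```

The retained unigene-protein pairs would then be passed to the frameshift-aware extraction step; that step is not reproduced here.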
Estimation of the level of synonymous substitution between two sequences
For each pair of coding sequences, the two translation products were aligned using the MUSCLE program and the resulting alignment was used as a guide to align the nucleotide sequences. After removing gaps and N-containing codons, the level of synonymous substitution (Ks) was estimated using the maximum likelihood method implemented in CODEML under the F3x4 model.
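The step of threading the nucleotide sequences onto the protein alignment can be illustrated with a minimal Python sketch. It assumes each coding sequence is in frame, contains exactly one codon per residue of its aligned translation, and that gaps are written as "-"; the function name is hypothetical, and the Ks estimation itself is not reproduced.

```python
def codon_alignment(aligned_prot_a, aligned_prot_b, cds_a, cds_b):
    """Return the two CDS aligned codon by codon, following a protein alignment.

    Columns where either protein has a gap, or where either codon contains
    an ambiguous base (N), are dropped.
    """
    codons_a = (cds_a[i:i + 3] for i in range(0, len(cds_a) - len(cds_a) % 3, 3))
    codons_b = (cds_b[i:i + 3] for i in range(0, len(cds_b) - len(cds_b) % 3, 3))
    kept_a, kept_b = [], []
    for residue_a, residue_b in zip(aligned_prot_a, aligned_prot_b):
        # Consume one codon for every non-gap residue to stay in register.
        codon_a = next(codons_a) if residue_a != "-" else None
        codon_b = next(codons_b) if residue_b != "-" else None
        if codon_a is None or codon_b is None:
            continue  # gapped column
        if "N" in codon_a.upper() or "N" in codon_b.upper():
            continue  # ambiguous codon
        kept_a.append(codon_a)
        kept_b.append(codon_b)
    return "".join(kept_a), "".join(kept_b)
```

The two returned strings have equal length and a shared reading frame, which is the form of input that codon-based substitution models expect.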
Distribution of the age of duplication of Musa genes
All-against-all nucleotide sequence similarity searches were done among the open reading frames extracted from the unigene sequences using BLASTN. Sequences aligned over >300 bp and showing at least 40% identity were defined as pairs of paralogs. We then estimated Ks for each pair of paralogs. We systematically discarded one sequence from any pair of paralogs showing no synonymous substitutions (Ks = 0), as well as all Ks values involving this sequence, to avoid including redundant entries of the same gene in the analysis. A gene family of n members results from n-1 gene duplication events. However, the number of possible pairwise comparisons within a gene family (n × (n-1)/2) can be substantially larger than the number of gene duplications, which results in multiple estimates of the ages of some duplications. To eliminate the redundant Ks values, pairs of duplicated sequences were grouped into gene families using a single linkage clustering method. We then used a previously described hierarchical clustering method to reconstruct the approximate phylogeny of each gene family: (1) Initially, each sequence in the family was treated as a separate cluster. (2) Then, the Ks values for all possible pairs of clusters were compared. (3) The pair of clusters having the smallest Ks value was replaced by a single new cluster containing all their sequences. (4) The median Ks value was chosen to represent the duplication event that gave rise to the two merged clusters. (5) Steps 2 to 4 were repeated until all sequences were contained in a single cluster. When two clusters A and B contained more than one sequence, their associated Ks value in step 2 corresponded to the median Ks obtained for all possible pairs of any sequence from A and any sequence from B.
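The family grouping and the five clustering steps can be sketched in Python as below. This is an illustration under stated assumptions rather than the code used in the study: pairwise Ks values are assumed to be stored in a dictionary keyed by unordered sequence pairs, pairs that were never measured are simply ignored when taking medians, and the function names are ours.

```python
from itertools import combinations
from statistics import median


def gene_families(pair_ks):
    """Single-linkage grouping: sequences joined by any paralog pair share a family."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pair_ks:
        parent[find(a)] = find(b)
    families = {}
    for seq in list(parent):
        families.setdefault(find(seq), []).append(seq)
    return list(families.values())


def duplication_ks(members, pair_ks):
    """Return one Ks value per inferred duplication (n - 1 values for n members)."""
    clusters = [frozenset([m]) for m in members]           # step 1
    events = []
    while len(clusters) > 1:                                # step 5
        best = None
        for a, b in combinations(clusters, 2):              # step 2
            values = [pair_ks[frozenset((x, y))]
                      for x in a for y in b
                      if frozenset((x, y)) in pair_ks]
            if values and (best is None or median(values) < best[0]):
                best = (median(values), a, b)
        if best is None:
            break  # remaining clusters share no measured pair
        ks, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]  # step 3
        events.append(ks)                                    # step 4
    return events


# Hypothetical three-member family: three pairwise values, two duplications.
pair_ks = {
    frozenset(("a", "b")): 0.1,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.7,
}
events = duplication_ks(["a", "b", "c"], pair_ks)
```

For this three-member family the three pairwise estimates collapse to two duplication events: the a/b duplication, and an older event dated by the median of the a-c and b-c values.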
- AC number: accession number
- CDS: coding DNA sequence
- EST: expressed sequence tag
- FISH: fluorescent in situ hybridization
- GISH: genomic in situ hybridization
- Ks: synonymous substitution rate
- Mya: million years ago
- PKW: Pisang Klutuk Wulung
- RFLP: restriction fragment length polymorphism
We thank the Montpellier Languedoc-Roussillon Genopole® for hosting the BAC library production and screening. We thank the Genoscope Centre in Evry, Paris, France for assisting AYC to carry out the subcloning of the five BAC clones sequenced at EMBRAPA and UCB.
Access to the Syngenta Musa EST database, donated by Syngenta to the International Network for the Improvement of Banana and Plantain (INIBAP) for use within the framework of the Global Musa Genomics Consortium is acknowledged.
This work was supported by CIRAD, INIBAP, NIAS, EMBRAPA, UCB, the National Council for Scientific and Technological Development (CNPq) in Brazil, TIGR and Generation Challenge program.
- Arias P, Dankers C, Liu P, Pilkauskas P: The world banana economy 1985–2002. FAO. 2003, [http://www.fao.org/documents/show_cdr.asp?url_file=/docrep/007/y5102e/y5102e00.htm]
- Janssen T, Bremer K: The age of major monocot groups inferred from 800+ rbcL sequences. Botanical Journal of the Linnean Society. 2004, 146: 385-398. 10.1111/j.1095-8339.2004.00345.x.
- Sanderson MJ, Thorne JL, Wikström N, Bremer K: Molecular evidence on plant divergence times. American Journal of Botany. 2004, 91: 1656-1665.
- Simmonds N, Shepherd K: The taxonomy and origins of the cultivated bananas. Bot J Linn Soc. 1955, 55: 302-312.
- Bartos J, Alkhimova O, Dolezelova M, De Langhe E, Dolezel J: Nuclear genome size and genomic distribution of ribosomal DNA in Musa and Ensete (Musaceae): taxonomic implications. Cytogenet Genome Res. 2005, 109 (1–3): 50-57.PubMed
- Carreel F, Fauré S, Gonzalez de Leon D, Lagoda PJL, Perrier X, Bakry F, Tezenas du Montcel H, Lanaud C, Horry JP: Evaluation of the genetic diversity in diploid bananas (Musa sp.). Genetics, Selection, Evolution. 1994, 26: 125s-136s. 10.1051/gse:19940709.
- Carreel F, Gonzalez de Leon D, Lagoda P, Lanaud C, Jenny C, Horry JP, Tezenas du Montcel H: Ascertaining maternal and paternal lineage within Musa by chloroplast and mitochondrial DNA RFLP analyses. Genome. 2002, 45 (4): 679-692. 10.1139/g02-033.PubMed
- Grapin A, Noyer JL, Dambier D, Carreel F, Lanaud C, Baurens F-C, Lagoda PJL: Diploid Musa acuminata genetic diversity with Sequence Tagged Microsatellite Sites. Electrophoresis. 1998, 19: 1374-1380. 10.1002/elps.1150190829.PubMed
- Noyer JL, Causse S, Tomekpe K, Bouet A, Baurens FC: A new image of plantain diversity assessed by SSR, AFLP and MSAP markers. Genetica. 2005, 124 (1): 61-69. 10.1007/s10709-004-7319-z.PubMed
- Raboin LM, Carreel F, Noyer J-L, Baurens F-C, Horry J-P, Bakry F, Tezenas Du Montcel H, Ganry J, Lanaud C, Lagoda PJL: Diploid Ancestors of Triploid Export Banana Cultivars: Molecular Identification of 2n Restitution Gamete Donors and n Gamete Donors. Molecular breeding. 2005, 16 (4): 333-341. 10.1007/s11032-005-2452-7.
- Ude G, Pillay M, Nwakanma D, Tenkouano A: Genetic Diversity in Musa acuminata Colla and Musa balbisiana Colla and some of their natural hybrids using AFLP Markers. Theor Appl Genet. 2002, 104 (8): 1246-1252. 10.1007/s00122-002-0914-4.PubMed
- Ge XJ, Liu MH, Wang K, Schaal BA, Chiang TY: Population structure of wild bananas, Musa balbisiana, in China determined by SSR fingerprinting and cpDNA PCR-RFLP. Molecular ecology. 2005, 14: 933-944. 10.1111/j.1365-294X.2005.02467.x.PubMed
- Baurens FC, Noyer JL, Lanaud C, Lagoda PJ: Use of competitive PCR to assay copy number of repetitive elements in banana. Mol Gen Genet. 1996, 253 (1–2): 57-64.PubMed
- D'Hont A, Paget-Goy A, Escoute J, Carreel F: The interspecific genome structure of cultivated banana, Musa spp. revealed by genomic DNA in situ hybridization. Theor Appl Genet. 2000, 100: 177-183. 10.1007/s001220050024.
- Valarik M, Simkova H, Hribova E, Safar J, Dolezelova M, Dolezel J: Isolation, characterization and chromosome localization of repetitive DNA sequences in bananas (Musa spp.). Chromosome Res. 2002, 10 (2): 89-100. 10.1023/A:1014945730035.PubMed
- Kamate K, Brown S, Durand P, Bureau JM, De Nay D, Trinh TH: Nuclear DNA content and base composition in 28 taxa of Musa. Genome. 2001, 44 (4): 622-627. 10.1139/gen-44-4-622.PubMed
- Lysak M, Dolezelova M, Horry J, Swennen R, Dolezel J: Flow cytometric analysis of nuclear DNA content in Musa. Theor Appl Genet. 1999, 98: 1344-1350. 10.1007/s001220051201.
- Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408 (6814): 796-815. 10.1038/35048692.
- International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature. 2005, 436 (7052): 793-800. 10.1038/nature03895.
- Fauré S, Noyer J, Horry J, Bakry F, Lanaud C, Gonzalez D, Leon D: A molecular marker-based linkage map of diploid bananas (Musa acuminata). Theor Appl Genet. 1993, 87: 517-526. 10.1007/BF00215098.PubMed
- Noyer J, Dambier D, Lanaud C, Lagoda P: The saturated map of diploid banana (Musa acuminata). Abstract Plant & Animal Genome V Conference. 1997
- Vilarinhos A, Carreel F, Rodier M, Hippolyte I, Benabdelmouna A, Triaire D, Bakry F, Courtois B, D'Hont A: Characterization Of Translocations In Banana By FISH Of BAC Clones Anchored To A Genetic Map. Plant & Animal Genomes XIV Conference. 2006, San Diego, CA, January 14–18, 2006
- Tropgenedb. [http://tropgenedb.cirad.fr/en/banana.html]
- Ortiz-Vazquez E, Kaemmer D, Zhang HB, Muth J, Rodriguez-Mendiola M, Arias-Castro C, James A: Construction and characterization of a plant transformation-competent BIBAC library of the black Sigatoka-resistant banana Musa acuminata cv. Tuu Gia (AA). Theor Appl Genet. 2005, 110 (4): 706-713. 10.1007/s00122-004-1896-1.PubMed
- Vilarinhos AD, Piffanelli P, Lagoda P, Thibivilliers S, Sabau X, Carreel F, D'Hont A: Construction and characterization of a bacterial artificial chromosome library of banana (Musa acuminata Colla). Theor Appl Genet. 2003, 106 (6): 1102-1106.PubMed
- Safar J, Noa-Carrazana JC, Vrana J, Bartos J, Alkhimova O, Sabau X, Simkova H, Lheureux F, Caruana ML, Dolezel J, et al: Creation of a BAC resource to study the structure and evolution of the banana (Musa balbisiana) genome. Genome. 2004, 47 (6): 1182-1191. 10.1139/g04-062.PubMed
- Devos KM: Updating the 'crop circle'. Curr Opin Plant Biol. 2005, 8 (2): 155-162. 10.1016/j.pbi.2005.01.005.PubMed
- Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, Town CD, Young ND: Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol. 2005, 5 (1): 15-10.1186/1471-2229-5-15.PubMedPubMed Central
- Town CD, Cheung F, Maiti R, Crabtree J, Haas BJ, Wortman JR, Hine EE, Althoff R, Arbogast TS, Tallon LJ, et al: Comparative Genomics of Brassica oleracea and Arabidopsis thaliana Reveal Gene Loss, Fragmentation, and Dispersal after Polyploidy. Plant Cell. 2006, 18 (6): 1348-1359. 10.1105/tpc.106.041665.PubMedPubMed Central
- Zhu H, Choi HK, Cook DR, Shoemaker RC: Bridging model and crop legumes through comparative genomics. Plant Physiol. 2005, 137 (4): 1189-1196. 10.1104/pp.104.058891.PubMedPubMed Central
- Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313 (5793): 1596-1604. 10.1126/science.1128691.PubMed
- Jackson S, Rounsley S, Purugganan M: Comparative sequencing of plant genomes: choices to make. Plant Cell. 2006, 18 (5): 1100-1104. 10.1105/tpc.106.042192.PubMedPubMed Central
- Bowers JE, Arias MA, Asher R, Avise JA, Ball RT, Brewer GA, Buss RW, Chen AH, Edwards TM, Estill JC, et al: Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses. Proc Natl Acad Sci USA. 2005, 102 (37): 13206-13211. 10.1073/pnas.0502365102.PubMedPubMed Central
- Buell CR, Yuan Q, Ouyang S, Liu J, Zhu W, Wang A, Maiti R, Haas B, Wortman J, Pertea M, et al: Sequence, annotation, and analysis of synteny between rice chromosome 3 and diverged grass species. Genome Res. 2005, 15 (9): 1284-1291. 10.1101/gr.3869505. Epub 2005 Aug 1218PubMed
- La Rota M, Sorrells ME: Comparative DNA sequence analysis of mapped wheat ESTs reveals the complexity of genome relationships between rice and wheat. Funct Integr Genomics. 2004, 4 (1): 34-46. 10.1007/s10142-003-0098-2.PubMed
- Singh NK, Raghuvanshi S, Srivastava SK, Gaur A, Pal AK, Dalal V, Singh A, Ghazi IA, Bhargav A, Yadav M, et al: Sequence analysis of the long arm of rice chromosome 11 for rice-wheat synteny. Funct Integr Genomics. 2004, 4 (2): 102-117. 10.1007/s10142-004-0109-y. Epub 2004 Apr 2014PubMed
- Ilic K, SanMiguel PJ, Bennetzen JL: A complex history of rearrangement in an orthologous region of the maize, sorghum, and rice genomes. Proc Natl Acad Sci USA. 2003, 100 (21): 12265-12270. 10.1073/pnas.1434476100. Epub 12003 Oct 12266PubMedPubMed Central
- Gu Y, Coleman-Derr D, Kong X, Anderson O: Rapid genome evolution revealed by comparative sequence analysis of orthologous regions from four triticeae genomes. Plant Physiol. 2004, 135 (1): 459-470. 10.1104/pp.103.038083.PubMedPubMed Central
- Jaillon O, Aury J, Brunet F, Petit J, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004, 431 (7011): 946-957. 10.1038/nature03025.PubMed
- Jannoo N, Grivet L, Chantret N, Garsmeur O, Glaszmann JC, Arruda P, D'Hont A: Orthologous comparison in a gene-rich region among grasses reveals stability in the sugarcane polyploid genome. Plant Journal. 2007,
- Salse J, Piegu B, Cooke R, Delseny M: Synteny between Arabidopsis thaliana and rice at the genome level: a tool to identify conservation in the ongoing rice genome sequencing project. Nucleic Acids Res. 2002, 30 (11): 2316-2328. 10.1093/nar/30.11.2316.PubMedPubMed Central
- Salse J, Piegu B, Cooke R, Delseny M: New in silico insight into the synteny between rice (Oryza sativa L.) and maize (Zea mays L.) highlights reshuffling and identifies new duplications in the rice genome. Plant J. 2004, 38 (3): 396-409. 10.1111/j.1365-313X.2004.02058.x. Erratum in: Plant J. 2004 Jun;2038(2005):2873PubMed
- Aert R, Sagi L, Volckaert G: Gene content and density in banana (Musa acuminata) as revealed by genomic sequencing of BAC clones. Theor Appl Genet. 2004, 109 (1): 129-139. 10.1007/s00122-004-1603-2.PubMed
- Cheung F, Town CD: A BAC end view of the Musa acuminata genome. BMC Plant Biol. 2007, 7 (29): 29-10.1186/1471-2229-7-29.PubMedPubMed Central
- Oikawa T, Koshioka M, Kojima K, Yoshida H, Kawata M: A role of OsGA20ox1, encoding an isoform of gibberellin 20-oxidase, for regulation of plant stature in rice. Plant Mol Biol. 2004, 55 (5): 687-700. 10.1007/s11103-004-1692-y.PubMed
- Spielmeyer W, Ellis MH, Chandler PM: Semidwarf (sd-1), "green revolution" rice, contains a defective gibberellin 20-oxidase gene. Proc Natl Acad Sci USA. 2002, 99 (13): 9043-9048. 10.1073/pnas.132266399.PubMedPubMed Central
- Nishizawa Y, Saruta M, Nakazono K, Nishio Z, Soma M, Yoshida T, Nakajima E, Hibi T: Characterization of transgenic rice plants over-expressing the stress-inducible beta-glucanase gene Gns1. Plant Mol Biol. 2003, 51 (1): 143-152. 10.1023/A:1020714426540.PubMed
- Thomas BR, Romero GO, Nevins DJ, Rodriguez RL: New perspectives on the endo-beta-glucanases of glycosyl hydrolase Family 17. Int J Biol Macromol. 2000, 27 (2): 139-144. 10.1016/S0141-8130(00)00109-4.PubMed
- Romero GO, Simmons C, Yaneshita M, Doan M, Thomas BR, Rodriguez RL: Characterization of rice endo-beta-glucanase genes (Gns2-Gns14) defines a new subgroup within the gene family. Gene. 1998, 223 (1–2): 311-320. 10.1016/S0378-1119(98)00368-0.PubMed
- Simmons CR, Litts JC, Huang N, Rodriguez RL: Structure of a rice beta-glucanase gene regulated by ethylene, cytokinin, wounding, salicylic acid and fungal elicitors. Plant Mol Biol. 1992, 18 (1): 33-45. 10.1007/BF00018454.PubMed
- Childs KL, Hamilton JP, Zhu W, Ly E, Cheung F, Wu H, Rabinowicz PD, Town CD, Buell CR, Chan AP: The TIGR Plant Transcript Assemblies database. Nucleic Acids Res. 2007, D846-851. 10.1093/nar/gkl785. 35 Database
- Kuhl JC, Cheung F, Yuan Q, Martin W, Zewdie Y, McCallum J, Catanach A, Rutherford P, Sink KC, Jenderek M, et al: A unique set of 11,008 onion expressed sequence tags reveals expressed sequence and genomic differences between the monocot orders Asparagales and Poales. Plant Cell. 2004, 16 (1): 114-125. 10.1105/tpc.017202. Epub 2003 Dec 2011PubMedPubMed Central
- Kuhl JC, Havey MJ, Martin WJ, Cheung F, Yuan Q, Landherr L, Hu Y, Leebens-Mack J, Town CD, Sink KC: Comparative genomic analyses in Asparagus. Genome. 2005, 48: 1052-1060. 10.1139/g05-073.PubMed
- Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): i351-358. 10.1093/bioinformatics/bti1018.PubMed
- Parsons JD: Miropeats: graphical DNA sequence comparisons. Comput Appl Biosci. 1995, 11 (6): 615-619.PubMed
- Simillion C, Vandepoele K, Saeys Y, Van de Peer Y: Building genomic profiles for uncovering segmental homology in the twilight zone. Genome Res. 2004, 14 (6): 1095-1106. 10.1101/gr.2179004.PubMedPubMed Central
- Sampedro J, Lee Y, Carey RE, dePamphilis C, Cosgrove DJ: Use of genomic history to improve phylogeny and understanding of births and deaths in a gene family. Plant J. 2005, 44 (3): 409-419. 10.1111/j.1365-313X.2005.02540.x.PubMed
- Vandepoele K, Simillion C, Van de Peer Y: Detecting the undetectable: uncovering duplicated segments in Arabidopsis by comparison with rice. Trends Genet. 2002, 18 (12): 606-608. 10.1016/S0168-9525(02)02796-8.PubMed
- Sonnhammer EL, Koonin EV: Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet. 2002, 18 (12): 619-620. 10.1016/S0168-9525(02)02793-2.PubMed
- Blanc G, Barakat A, Guyot R, Cooke R, Delseny M: Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell. 2000, 12 (7): 1093-1101. 10.1105/tpc.12.7.1093.PubMedPubMed Central
- Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA. 2004, 101 (26): 9903-9908. 10.1073/pnas.0307901101.PubMedPubMed Central
- Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, et al: The Genomes of Oryza sativa: a history of duplications. PLoS Biol. 2005, 3 (2): e38-10.1371/journal.pbio.0030038.PubMedPubMed Central
- A Global Programme for Musa Improvement. [http://www.promusa.org/]
- Blanc G, Wolfe KH: Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004, 16 (7): 1667-1678. 10.1105/tpc.021345. Epub 2004 Jun 1618PubMedPubMed Central
- Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, et al: Widespread genome duplications throughout the history of flowering plants. Genome Res. 2006, 16 (6): 738-749. 10.1101/gr.4825606.PubMedPubMed Central
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. 10.1126/science.290.5494.1151.PubMed
- Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y: Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci USA. 2005, 102 (15): 5454-5459. 10.1073/pnas.0501102102.PubMedPubMed Central
- Schlueter SD, Wilkerson MD, Huala E, Rhee SY, Brendel V: Community-based gene structure annotation. Trends Plant Sci. 2005, 10 (1): 9-14. 10.1016/j.tplants.2004.11.002.PubMed
- Hughes AL, Friedman R, Ekollu V, Rose JR: Non-random association of transposable elements with duplicated genomic blocks in Arabidopsis thaliana. Mol Phylogenet Evol. 2003, 29 (3): 410-416. 10.1016/S1055-7903(03)00262-8.PubMed
- Vandepoele K, Simillion C, Van de Peer Y: Evidence that rice and other cereals are ancient aneuploids. Plant Cell. 2003, 15 (9): 2192-2202. 10.1105/tpc.014019.PubMedPubMed Central
- The TIGR Plant Transcript Assemblies database. [http://plantta.tigr.org/]
- Kress WJ, Prince LM, Hahn WJ, Zimmer EA: Unraveling the evolutionary radiation of the families of the Zingiberales using morphological and molecular evidence. Syst Biol. 2001, 50 (6): 926-944. 10.1080/106351501753462885.PubMed
- Bremer K: Early Cretaceous lineages of monocot flowering plants. Proc Natl Acad Sci USA. 2000, 97 (9): 4707-4711. 10.1073/pnas.080421597.PubMedPubMed Central
- Gaut BS, Morton BR, McCaig BC, Clegg MT: Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc Natl Acad Sci USA. 1996, 93 (19): 10274-10279. 10.1073/pnas.93.19.10274.PubMedPubMed Central
- Bennetzen JL, Ma J, Devos KM: Mechanisms of recent genome size variation in flowering plants. Ann Bot (Lond). 2005, 95 (1): 127-132. 10.1093/aob/mci008.
- Martin WJ, McCallum J, Shigyo M, Jakse J, Kuhl JC, Yamane N, Pither-Joyce M, Gokce AF, Sink KC, Town CD, et al: Genetic mapping of expressed sequences in onion and in silico comparisons with rice show scant colinearity. Mol Genet Genomics. 2005, 1-8.
- Jakse J, Telgmann A, Jung C, Khar A, Melgar S, Cheung F, Town CD, Havey MJ: Comparative sequence and genetic analyses of asparagus BACs reveal no microsynteny with onion or rice. Theor Appl Genet. 2006, 114 (1): 31-39. 10.1007/s00122-006-0407-y.PubMed
- Lohithaswa HC, Feltus FA, Singh HP, Bacon CD, Bailey CD, Paterson AH: Leveraging the rice genome sequence for monocot comparative and translational genomics. Theor Appl Genet. 2007, 115 (2): 237-243. 10.1007/s00122-007-0559-4.PubMed
- Dubcovsky J, Ramakrishna W, SanMiguel PJ, Busso CS, Yan L, Shiloff BA, Bennetzen JL: Comparative sequence analysis of colinear barley and rice bacterial artificial chromosomes. Plant Physiol. 2001, 125 (3): 1342-1353. 10.1104/pp.125.3.1342.PubMedPubMed Central
- Adams KL, Wendel JF: Polyploidy and genome evolution in plants. Curr Opin Plant Biol. 2005, 8 (2): 135-141. 10.1016/j.pbi.2005.01.001.PubMed
- Paterson AH, Bowers JE, Peterson DG, Estill JC, Chapman BA: Structure and evolution of cereal genomes. Curr Opin Genet Dev. 2003, 13 (6): 644-650. 10.1016/j.gde.2003.10.002.PubMed
- Wang X, Shi X, Hao B, Ge S, Luo J: Duplication and DNA segmental loss in the rice genome: implications for diploidization. New Phytol. 2005, 165 (3): 937-946. 10.1111/j.1469-8137.2004.01293.x.PubMed
- Vision TJ, Brown DG, Tanksley SD: The origins of genomic duplications in Arabidopsis. Science. 2000, 290 (5499): 2114-2117. 10.1126/science.290.5499.2114.PubMed
- Bowers JE, Abbey C, Anderson S, Chang C, Draye X, Hoppe AH, Jessup R, Lemke C, Lennington J, Li Z, et al: A high-density genetic recombination map of sequence-tagged sites for sorghum, as a framework for comparative structural and evolutionary genomics of tropical grains and grasses. Genetics. 2003, 165 (1): 367-386.PubMedPubMed Central
- Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004, 16 (7): 1679-1691. 10.1105/tpc.021410.PubMedPubMed Central
- Duarte JM, Cui L, Wall PK, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis CW: Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Mol Biol Evol. 2006, 23 (2): 469-478. 10.1093/molbev/msj051.PubMed
- Rong J, Abbey C, Bowers JE, Brubaker CL, Chang C, Chee PW, Delmonte TA, Ding X, Garza JJ, Marler BS, et al: A 3347-locus genetic recombination map of sequence-tagged sites reveals features of genome organization, transmission and evolution of cotton (Gossypium). Genetics. 2004, 166 (1): 389-417. 10.1534/genetics.166.1.389.PubMedPubMed Central
- Boivin K, Deu M, Rami J-F, Trouche G, Hamon P: Towards a saturated sorghum map using RFLP and AFLP markers. Theor Appl Genet. 1999, 98: 320-10.1007/s001220051076.
- Luo M, Wang YH, Frisch D, Joobeur T, Wing RA, Dean RA: Melon bacterial artificial chromosome (BAC) library construction using improved methods and identification of clones linked to the locus conferring resistance to melon Fusarium wilt (Fom-2). Genome. 2001, 44: 154-116. 10.1139/gen-44-2-154.PubMed
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.PubMed
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8 (3): 175-185.PubMed
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997, 268 (1): 78-94. 10.1006/jmbi.1997.0951.PubMed
- Lukashin AV, Borodovsky M: GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 1998, 26 (4): 1107-1115. 10.1093/nar/26.4.1107.PubMedPubMed Central
- Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H: Interpolated Markov models for eukaryotic gene finding. Genomics. 1999, 59 (1): 24-31. 10.1006/geno.1999.5854.PubMed
- Pertea M, Lin X, Salzberg SL: GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001, 29 (5): 1185-1190. 10.1093/nar/29.5.1185.PubMedPubMed Central
- Hebsgaard SM, Korning PG, Tolstrup N, Engelbrecht J, Rouze P, Brunak S: Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res. 1996, 24 (17): 3439-3452. 10.1093/nar/24.17.3439.PubMedPubMed Central
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25 (5): 955-964. 10.1093/nar/25.5.955.PubMedPubMed Central
- Huang X, Adams MD, Zhou H, Kerlavage AR: A tool for analyzing and annotating genomic sequences. Genomics. 1997, 46 (1): 37-45. 10.1006/geno.1997.4984.PubMed
- Quackenbush J, Liang F, Holt I, Pertea G, Upton J: The TIGR gene indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Res. 2000, 28 (1): 141-145. 10.1093/nar/28.1.141.PubMedPubMed Central
- Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28 (1): 45-48. 10.1093/nar/28.1.45.PubMedPubMed Central
- Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res. 2002, 30 (1): 276-280. 10.1093/nar/30.1.276.PubMedPubMed Central
- Transposable elements database on TIGR FTP site. [ftp://ftp.tigr.org/pub/data/TransposableElements/transposon_db.pep]
- Delcher AL, Phillippy A, Carlton J, Salzberg SL: Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002, 30 (11): 2478-2483. 10.1093/nar/30.11.2478.PubMedPubMed Central
- Sonnhammer EL, Durbin R: A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene. 1995, 167 (1–2): GC1-10. 10.1016/0378-1119(95)00714-8.PubMed
- Dotter web site. [http://bioinfo.hku.hk/doc/dotter.html]
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.PubMedPubMed Central
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.PubMed
- Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.PubMed
- The Floral Genome Project – PlantTribes. [http://fgpdev.huck.psu.edu/tribe.pl]
- The TIGR plant gene indices web site. [http://www.tigr.org/tdb/tgi/plant.shtml]
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340. Print 2004PubMedPubMed Central
- Swofford DL: PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts. 2003
- The Global Musa Genomics Consortium. [http://www.musagenomics.org]
- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, et al: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003, 19 (5): 651-652. 10.1093/bioinformatics/btg034.PubMed
- Birney E, Thompson JD, Gibson TJ: PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res. 1996, 24 (14): 2730-2739. 10.1093/nar/24.14.2730.PubMedPubMed Central
- Yang Z: Phylogenetic analysis by maximum likelihood (PAML), version 2. University College London, England. 1999
- Goldman NaZY: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994, 11 (5): 725-736.PubMed
- Chase MW: Monocot relationships: an overview. American Journal of Botany. 2004, 91: 1645-1655. 10.3732/ajb.91.10.1645.PubMed
- The Expressed Sequence Tags Database. [http://www.ncbi.nlm.nih.gov/dbEST/]
- Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, et al: The institute for genomic research Osa1 rice genome annotation database. Plant Physiol. 2005, 138 (1): 18-26. 10.1104/pp.104.059063.PubMedPubMed Central
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.