Because of the large number of chromosomes (often >100) and the autopolyploid nature of the genome, both high-density genetic mapping and physical mapping have proven challenging in sugarcane. Currently, there is no physical map and no saturated genetic map that covers all chromosomes. Alternative approaches therefore need to be tested for a potential sugarcane genome sequencing project. Our results show that the sorghum genome is an excellent template for assembly of sugarcane euchromatic sequences. The initial assembly of pooled 454 BAC sequences was 40% larger than expected from the estimated insert sizes, which was likely caused by multiple assemblies of repetitive sequences. After aligning the sequences with the sorghum genome using orthologous genes as anchors, 78.2% of the sugarcane BAC contigs could be ordered unambiguously and 53.1% of the sugarcane BAC sequences aligned with the sorghum genic regions. Sequences that were not aligned consisted of repetitive and non-coding sequences.
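The idea of ordering contigs against a reference using orthologous genes as anchors can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `order_contigs_by_anchors`, the input layout, and the use of the median anchor position are assumptions made only for illustration. A contig whose anchors all fall on one reference chromosome is placed at the median anchor coordinate; a contig with no anchors, or with anchors on several chromosomes, is reported as ambiguous.

```python
from statistics import median

def order_contigs_by_anchors(anchors):
    """Order contigs along a reference genome using anchor-gene hits.

    anchors: dict mapping contig name -> list of (ref_chromosome, ref_position)
             tuples, one per orthologous gene found on that contig
             (hypothetical input layout, for illustration only).
    Returns (ordered_contigs, ambiguous_contigs).
    """
    placed, ambiguous = [], []
    for contig, hits in anchors.items():
        chromosomes = {chrom for chrom, _ in hits}
        # No anchors, or anchors on more than one chromosome: cannot place.
        if len(chromosomes) != 1:
            ambiguous.append(contig)
            continue
        chrom = next(iter(chromosomes))
        # Use the median anchor coordinate as the contig's position.
        position = median(pos for _, pos in hits)
        placed.append((chrom, position, contig))
    placed.sort()  # sort by chromosome, then by position
    return [contig for _, _, contig in placed], ambiguous

# Toy data: contig names and coordinates are invented.
example = {
    "contig3": [("chr1", 500), ("chr1", 700)],
    "contig1": [("chr1", 100)],
    "contig2": [("chr1", 300), ("chr2", 50)],
    "contig4": [],
}
ordered, ambiguous = order_contigs_by_anchors(example)
print(ordered)    # contigs placed along the reference
print(ambiguous)  # contigs that could not be placed
```

In this toy example, contig1 and contig3 are ordered along chr1, while contig2 (anchors on two chromosomes) and contig4 (no anchors) remain unplaced.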
The suitability of the sorghum genome as a template for sugarcane genomic sequence assembly, at least for the genic regions, will be critical for strategic planning to sequence the sugarcane genome. Current BAC-by-BAC or whole-genome shotgun sequencing approaches would require a high-density genetic map, ideally with a density of two markers per Mb, and a physical map with 10× genome coverage. For sugarcane, the only BAC library available was constructed from the commercial hybrid cultivar R570 and has 1.3× genome coverage. A 10× coverage BAC library would require one million clones with an average insert size of 100 kb, an expensive and laborious task. A high-density genetic map would require mapping 20,000 markers for the 115 chromosomes of R570, and these markers would have to be sequence tagged to be useful for sequence assembly, not anonymous markers such as amplified fragment length polymorphism (AFLP) markers. In the past 20 years, 13 sugarcane maps have been constructed; each of them covers only a fraction of the genome with fewer than 2,000 markers, and the majority of the markers in recent maps are AFLP markers [16, 19–22]. Fortunately, the cost of sequencing is declining rapidly with increased throughput. Most likely, a draft of the sugarcane genome will be generated, using the sorghum genome as a template for sequence assembly, before an ultra-high-density (2 markers per Mb) genetic map and a physical map are available.
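The resource estimates above follow from simple arithmetic, sketched here under the assumption that the genome size used is the approximately 10 Gb 2C value for R570 given elsewhere in this discussion. The function and constant names are illustrative only.

```python
# Assumption: 2C genome size of R570 is about 10 Gb.
GENOME_BP = 10_000_000_000

def clones_for_coverage(genome_bp, coverage, insert_bp):
    """Number of BAC clones needed to reach a given fold coverage."""
    return genome_bp * coverage // insert_bp

def markers_for_density(genome_bp, markers_per_mb):
    """Number of markers needed for a given density per Mb."""
    return genome_bp // 1_000_000 * markers_per_mb

# 10x coverage with 100 kb average inserts
clones = clones_for_coverage(GENOME_BP, coverage=10, insert_bp=100_000)
# Two markers per Mb across the whole genome
markers = markers_for_density(GENOME_BP, markers_per_mb=2)

print(clones)   # 1000000
print(markers)  # 20000
```

Both values agree with the one million clones and 20,000 markers quoted in the text.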
The sugarcane genome has gone through at least two rounds of genome-wide duplication to become an octoploid since its divergence from a common ancestor shared with sorghum. The two rounds of duplication might have occurred within 2 million years, after the speciation event that separated the two wild species S. robustum (x = 10) and S. spontaneum (x = 8), since these two species have different basic chromosome numbers [9–12]. Although each octoploid has eight genomes, it is not possible to distinguish the individual genomes, and every genome is a mosaic of segments from all eight, because every chromosome is free to pair and recombine with any of the other seven homologous chromosomes during meiosis. It should be noted, however, that most genetic maps of sugarcane show some evidence of preferential pairing [19, 27]. For this reason, it might not be possible to distinguish the two recent genome-wide duplications, and a minimum tiling path of BAC clones would be as good a representative as any single genome in the octoploid. The hybrid cultivar R570 has 2n = 115 chromosomes with potentially 12 genomes. We selected a single BAC from each of 20 euchromatic regions corresponding to 20 distinct chromosome arms (18 arms in the end, because of one empty clone and one misplaced BAC), representing one of the potential 12 genomes. We found more genes in the sequenced sugarcane fragments than in the aligned homologous regions of sorghum (209 vs. 202), and more putative sugarcane-specific genes (17) than sorghum-specific genes (12). Two of the 19 initially annotated sugarcane-specific genes have orthologs in other parts of the sorghum genome, which left 17 that are most likely sugarcane specific. All 17 putative sugarcane-specific genes were validated by sugarcane ESTs, while only one of the 12 putative sorghum-specific genes was validated by sorghum ESTs.
Moreover, 12 of the 17 sugarcane-specific genes have no match in the non-redundant protein database in GenBank, suggesting that they are likely involved in sugarcane-specific processes. Although we masked the repetitive sequences of the BACs using a plant repeat database, it is possible that some of these genes are low-copy transposable elements, since we do not have a sugarcane-specific repeat database.
The sugarcane EST project (SUCEST) yielded a database containing 237,954 ESTs assembled into 33,620 unigenes from 26 different cDNA libraries. This EST database validated 74.2% of the 209 annotated genes on the 19 sugarcane BACs, while only 60.4% of the 202 annotated sorghum genes were validated by sorghum ESTs. It might be a general rule that the EST databases of polyploid organisms represent a higher percentage of genes than those of their diploid counterparts, because the multiple allelic forms of each gene (12 in the case of sugarcane hybrids) would increase the chance that at least one allelic form is sequenced in a collection covering a wide range of tissues and developmental stages. However, more alleles do not necessarily increase the chance that a particular gene is expressed in any given tissue or developmental stage, as we discovered two developmental-stage-specific genes in our RT-PCR experiment involving 47 predicted genes.
The subtribe Saccharinae includes three major biofuel crops: sugarcane, Miscanthus, and sorghum. Sugarcane and Miscanthus are closely related and belong to the Saccharum complex. Sorghum is their closest relative outside of the Saccharum complex. Our estimate that sugarcane and sorghum shared a common ancestor about 7.7 million years ago is in line with the 8–9 million years estimated by Jannoo et al. This time frame should also apply to Miscanthus, as it is a member of the Saccharum complex.
Most of the BAC sequences aligned collinearly with the sorghum sequences. However, one of the BACs (172L01) aligned to multiple sorghum chromosomes, indicating large-scale chromosomal rearrangements between the sugarcane and sorghum genomes. Numerous local, small-scale (within a BAC) rearrangements between the two genomes were also detected. These sequence rearrangements at both intra- and inter-chromosomal scales reflect the evolutionary history of the two species since their divergence about 8 million years ago. Our sugarcane BAC sequences provide a view of one representative genome among the possible 12 genomes in R570. It would be even more interesting to document the rearrangements among sugarcane homologs, which should be far fewer.
The 2C genome size of R570 is about 10 Gb, an average of 87 Mb per chromosome among its 115 chromosomes, which is larger than the ~73 Mb per chromosome in sorghum. However, our data suggest that the sorghum sequences are expanded compared with the orthologous sugarcane sequences studied, owing to accumulation of retroelements, which contradicts the genome size estimates from flow cytometry. If what we observed truly reflects the features of these two genomes, the basic genome of sugarcane (x = 10 or x = 8) could be smaller than that of sorghum. The discrepancy between the direct sequence comparison and the genome size estimates could be due to the tendency of flow cytometry to overestimate genome size, as demonstrated by the sequenced genomes of rice and poplar [29, 30]. It is also possible that the discrepancy is caused by inaccurate assembly of repetitive sequences in the sugarcane BACs sequenced on the 454 FLX platform. Finally, the small sample of sugarcane BACs that we studied may not be representative of the genome as a whole.
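The per-chromosome comparison quoted above can be reproduced with a short calculation. The sketch below uses only the figures given in the text (about 10 Gb for the 2C genome of R570, 115 chromosomes, and ~73 Mb per chromosome in sorghum); the variable and function names are illustrative.

```python
GENOME_2C_MB = 10_000            # ~10 Gb 2C genome size of R570, in Mb
N_CHROMOSOMES = 115              # 2n chromosome number of R570
SORGHUM_MB_PER_CHROMOSOME = 73   # approximate value quoted in the text

def mean_chromosome_size_mb(genome_mb, n_chromosomes):
    """Average chromosome size in Mb."""
    return genome_mb / n_chromosomes

sugarcane_mean = mean_chromosome_size_mb(GENOME_2C_MB, N_CHROMOSOMES)
ratio = sugarcane_mean / SORGHUM_MB_PER_CHROMOSOME

print(f"{sugarcane_mean:.0f} Mb per chromosome")  # 87 Mb per chromosome
print(f"{ratio:.2f}x the sorghum average")        # 1.19x the sorghum average
```

On these flow cytometry figures the average sugarcane chromosome is roughly one fifth larger than the average sorghum chromosome, which is the expectation that the direct sequence comparison calls into question.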
Sugarcane has been cultivated and improved over thousands of years, beginning in prehistoric times with selection on natural variation and continuing with the modern techniques of hybridization and genetic engineering. Enormous yield increases have been achieved in the last century by breeding for yield, disease and insect resistance, and stress tolerance. While sugarcane farmers throughout the world face constant challenges to sustain profitability and protect the environment, breeders face not only those challenges but also a biological constraint, as the gap between average farm yield and genetic yield potential is narrowed through improved agronomic practices. Sequencing the complex genome of autopolyploid sugarcane will provide the genomic resources to study the genes and gene interactions controlling sugar yield, biomass yield, and other agronomic traits. A sugarcane genome sequence has the potential to revolutionize sugarcane improvement programs by enabling high-throughput, genome-wide screening for genomic selection and the mining of promoters of specific alleles.