A preliminary analysis of genome structure and composition in Gossypium hirsutum
- Wangzhen Guo†1,
- Caiping Cai†1,
- Changbiao Wang1,
- Liang Zhao1,
- Lei Wang1 and
- Tianzhen Zhang1 (corresponding author)
© Guo et al; licensee BioMed Central Ltd. 2008
Received: 31 March 2008
Accepted: 01 July 2008
Published: 01 July 2008
Upland cotton has the highest yield, and accounts for > 95% of world cotton production. Decoding upland cotton genomes will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species. Here, we employed GeneTrek and BAC tagging information approaches to predict the general composition and structure of the allotetraploid cotton genome.
A total of 142 BAC sequences from Gossypium hirsutum cv. Maxxa were downloaded from NCBI (http://www.ncbi.nlm.nih.gov) and their origin was confirmed. Analysis of these BAC sequences revealed that the tetraploid cotton genome contains over 70,000 candidate genes, with duplicated gene copies in homoeologous A- and D-subgenome regions. Gene distribution is uneven, with gene-rich and gene-free regions of the genome. Twenty-one percent of the 142 BACs lacked genes. BAC gene density ranged from 0 to 33.2 genes per 100 kb, and most gene islands contained only one gene, with an average of 1.5 genes per island. Retroelements were a major component of the genome, with LTR/gypsy elements the most abundant, followed by LTR/copia. Most LTR retrotransposons were truncated and arranged in nested structures. In addition, 166 polymorphic loci amplified with SSRs developed from 70 BAC clones were tagged on our backbone genetic map. Seventy-five percent (125/166) of the polymorphic loci were tagged on the D-subgenome. By comprehensively analyzing the molecular size of amplified products among tetraploid G. hirsutum cv. Maxxa, acc. TM-1, and G. barbadense cv. Hai7124, and diploid G. herbaceum var. africanum and G. raimondii, 37 BACs, 12 from the A- and 25 from the D-subgenome, were further anchored to their corresponding subgenome chromosomes. Comparison of a large number of gene sequences from BACs of the two subgenomes suggested that introns might not contribute to the size difference between subgenomes in Gossypium.
This study provides the first glimpse of cotton genome complexity and serves as a foundation for future whole-genome sequencing of tetraploid cotton.
Cotton is the world's most important natural textile fiber and a significant oilseed crop. The cotton genus (Gossypium L.) includes approximately 45 diploid species (2n = 2x = 26), differentiated cytogenetically into eight genome groups (A-G and K), and five allotetraploid species (2n = 4x = 52). Diploid Gossypium species differentiated approximately 5–10 million years ago (Mya); polyploidization, however, is estimated to have occurred more recently, 1–2 Mya. All allotetraploids were formed from interspecific hybridization events between an A-genome-like ancestral African species and a D-genome-like North American species. The closest extant relatives of the original tetraploid progenitors are the A-genome species G. herbaceum L. (A1) and the D-genome species G. raimondii Ulbrich (D5). Of these species, four were independently domesticated for fiber: the two tetraploids G. hirsutum L. (AD)1 and G. barbadense L. (AD)2, and the two diploids G. herbaceum L. (A1) and G. arboreum L. (A2).
Upland cotton (G. hirsutum L.) has the highest yield and accounts for over 95% of the annual worldwide cotton crop, whereas extra-long staple (ELS) or Pima cotton (G. barbadense L.) accounts for less than 2%. The two diploid species G. herbaceum L. (A1) and G. arboreum L. (A2) are planted less often. In cultivated tetraploid cotton species, the D-subgenome plays an important role in genome structure, function and evolution. For example, many quantitative trait loci (QTL) for fiber-related traits have been detected in the D-subgenome of tetraploid cotton [4–9]. D-genome species do not produce spinnable fiber; however, important genes or regulators for fiber morphogenesis and fiber properties have been detected in this genome. Understanding the contribution of the A- and D-subgenomes to gene expression in the allotetraploids may therefore greatly facilitate fiber trait improvement [11, 12]. To attain this goal, decoding cotton genomes will provide a foundation for understanding the functional and agronomic significance of polyploidy and genome size variation within Gossypium.
Genome size differences are evident between the tetraploids and their diploid progenitors. The haploid genome size is estimated to be ~980 Mb for G. raimondii Ulbrich, ~1.86 Gb for G. arboreum L., and ~2.83 Gb for G. hirsutum L. Variation in DNA content among diploid species reflects increases and decreases in the copy numbers of various repeat families, especially retrotransposon-like elements. The methods most appropriate for elucidating whole-genome sequence information in cotton are BAC-by-BAC sequencing and gene-enrichment approaches. A pilot study has been initiated by the U.S. Department of Energy Joint Genome Institute to generate the whole-genome shotgun sequence of G. raimondii. Meanwhile, gene-enrichment techniques such as methylation filtration and Cot-based cloning have also been used to compare G. raimondii, G. arboreum, G. hirsutum, and G. barbadense (B. Scheffler, Workshop communication).
The whole-genome sequence analysis of G. hirsutum will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species that accounts for > 95% of world cotton production. Prior to large-scale sequencing of the tetraploid G. hirsutum genome, a microcolinearity analysis of a few pairs of homoeologous BACs was completed and indicated that sequence conservation between homoeologous BACs was high in both intergenic and genic regions. In addition, Grover et al. (2007) suggested that size differences between homoeologous BACs were attributable to differential accumulation of retroelements.
The GeneTrek approach has been proposed as an efficient way to evaluate the general properties of any genome [19, 20] and has been successfully applied to predict components of the maize genome. To better understand the general composition and structure of the tetraploid cotton genome, we employed GeneTrek and BAC tagging information approaches in the present study. This methodology allowed us to evaluate the structure and composition of the allotetraploid genome based on 142 G. hirsutum cv. Maxxa BAC clones downloaded from the National Center for Biotechnology Information (NCBI). The study provides a first glimpse of cotton genome complexity. The results indicate that gene distribution in the cotton genome is uneven, with gene-rich and gene-free regions, and that the genome is rich in repetitive elements. Introns might not contribute to the size difference between subgenomes in Gossypium; instead, the two-fold size difference between the A- and D-subgenomes might largely be attributed to large amplifications of transposable elements in low-gene-density or gene-free regions.
Confirmation of the origin of the 142 BACs
The 142 BACs were first submitted by mistake as part of the maize sequencing project by the Genome Sequencing Center, Washington University School of Medicine, and were later corrected to G. hirsutum cv. Maxxa BAC clones. We therefore downloaded these BACs from the National Center for Biotechnology Information (NCBI) and confirmed their origin by developing BAC-SSR markers from the 142 BAC sequences.
Each BAC was scanned for dinucleotide to hexanucleotide repeats of at least 18 bp in length. A total of 694 microsatellite sequences were detected. Among them, 208 SSRs were dinucleotides, 118 trinucleotides, 69 tetranucleotides, 80 pentanucleotides and 219 hexanucleotides. In addition, 578 SSR primer pairs were developed and tested for amplification in G. hirsutum cv. Maxxa and in our two mapping parents, G. hirsutum acc. TM-1 and G. barbadense cv. Hai7124. All 578 primer pairs amplified fragments of the expected size in G. hirsutum cv. Maxxa, and 161 primer pairs from 79 BACs amplified polymorphisms between TM-1 and Hai7124, yielding a polymorphism rate of 27.85%. Both the high level of transferability among G. hirsutum cv. Maxxa, acc. TM-1, and G. barbadense cv. Hai7124 and the high level of polymorphism between TM-1 and Hai7124 indicated that these 142 BAC sequences originated from the Maxxa genome. Further, these genomic SSR markers have potential for use in future cotton genomics and molecular breeding. The newly developed SSR primer sequences, GenBank accession numbers, repeat motifs and numbers, expected product sizes, and polymorphism data between TM-1 and Hai7124 are presented in Additional file 1.
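The scan described above can be illustrated with a minimal sketch. It is not the SSRIT tool used in the study; it is a plain regular-expression search for perfect repeats with motif lengths of 2 to 6 bp and a total length of at least 18 bp. The function name find_ssrs and the choice to discard motifs that are themselves repeats of a shorter unit are assumptions made for illustration.

```python
import re

# Minimum total SSR length in bp, as stated in the text.
MIN_SSR_LENGTH = 18


def find_ssrs(sequence, min_length=MIN_SSR_LENGTH):
    """Return (start, end, motif, repeat_count) for perfect SSRs.

    Coordinates are 0-based, end-exclusive. Motif lengths 2..6 cover
    dinucleotide to hexanucleotide repeats.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len in range(2, 7):
        # Smallest repeat count whose total length reaches min_length.
        min_repeats = -(-min_length // motif_len)  # ceiling division
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
        )
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs built from a shorter unit (e.g. "ATAT" or "AA"),
            # so each repeat is reported once under its shortest motif.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), match.end(), motif, repeat_count))
    return sorted(hits)
```

Called on a BAC sequence string, the function returns one tuple per repeat tract; imperfect or compound repeats, which dedicated tools may also report, are outside the scope of this sketch.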
Global analysis of genome structure and composition of tetraploid cotton
Summary of annotation results for 142 randomly selected cotton BACs
Total number of BACs analyzed: 142
Combined BAC lengths: nearly 14.2 Mb
Amount of identified repetitive DNA (percentage): 5.7 Mb (40.1%)
Amount of unidentified DNA with predicted ORF structure (number, percentage): 1.4 Mb (1,653, 9.9%)
Unidentified ORFs that show collinearity with cotton EST database (percentage): (value missing in source)
Number of genes with similarity or collinearity support: (value missing in source)
Number of hypothetical genes with low similarity or collinearity support: (value missing in source)
Overall gene density: one gene per 34.5 kb
Number of estimated total cotton genes: more than 70,000
Local gene density and distribution
Tandem duplication of genes
Thirty gene islands contained more than two genes, and in those islands, several types of tandemly duplicated genes encoding the same function were identified (see Additional file 4). According to the molecular function classification of these duplicated genes, most were related to binding, such as sar1 GTP-binding secretory factor, ire kinase, RNA-binding protein 10, swi2 snf2-like protein, succinate dehydrogenase flavoprotein alpha subunit, adenylate kinase, and sll2 protein. Other genes functioned in catalytic activities, including genes encoding ornithine carbamoyltransferase, glucose-methanol-choline oxidoreductase family protein, adenylosuccinate lyase, protein phosphatase-5, protein kinase family protein, methylmalonate-semialdehyde dehydrogenase, calcineurin-like phosphoesterase family proteins and serine carboxypeptidase II. Additional genes were assigned to transporter activity, such as plasma membrane intrinsic proteins; to structural molecule activity, such as 50S ribosomal protein L15; and to unknown molecular function, such as growth-regulating factor 1, among others. Several disease-resistance gene clusters resided in BACs AC187066, AC190836 and AC202830. These specific gene clusters presumably accumulated more mutations in both coding and upstream promoter regions to favor a broader response to pathogen attack. Several QTLs related to Verticillium resistance were also found in these regions; this association warrants further investigation.
Mobile elements analysis
Types of transposable elements in the cotton genome (table; recovered column: length occupied, bp; values missing in source)
Sequence analysis of gene-free BACs
Repetitive elements in 7 gene-free BACs (table; recovered columns: No. mobile elements, No. intact LTRs; values missing in source)
Comparative analysis of genome structure and composition between A- and D- subgenome chromosomes
Temporal mapping of 70 BACs based on SSRs
Identified tagging of 37 BACs based on amplified product analysis
The subgenome belongings of BAC clones
Comparative sequence analysis between the A- and D-subgenome chromosomes
Thirty-seven BACs with verified origins were identified in this study: 12 BACs belonging to the A-subgenome, with a total length of 1,200,814 bp and 69 gaps (average 5.75 gaps/BAC), and 25 BACs belonging to the D-subgenome, covering 2,374,313 bp with 37 gaps (average 1.48 gaps/BAC). These results indicate that A-subgenome BACs contain regions that are more difficult to sequence than those from the D-subgenome. Furthermore, the genes predicted from the 37 BACs were evaluated to determine whether intron size correlates with the genome size difference between the A- and D-subgenome chromosomes. In the 12 A-subgenome BACs, 67 genes were predicted, with an average of 937 bp of exons and 920 bp of introns per gene; in the 25 D-subgenome BACs, 104 genes were predicted, with an average of 1297 bp of exons and 1414 bp of introns per gene. Because genes in the larger A-subgenome did not carry longer introns than those in the smaller D-subgenome, introns might not contribute to the size difference between subgenomes in Gossypium.
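The per-subgenome figures in this paragraph can be re-derived with a short calculation. The sketch below uses only the counts and lengths quoted above; the dictionary layout, the function name summarize, and the two extra derived quantities (kb of BAC sequence per predicted gene, and the share of BAC sequence occupied by introns) are illustrative assumptions rather than statistics reported by the authors.

```python
# Values copied from the text for the 37 subgenome-assigned BACs.
subgenome_bacs = {
    "A": {"bacs": 12, "total_bp": 1_200_814, "gaps": 69,
          "genes": 67, "exon_bp": 937, "intron_bp": 920},
    "D": {"bacs": 25, "total_bp": 2_374_313, "gaps": 37,
          "genes": 104, "exon_bp": 1297, "intron_bp": 1414},
}


def summarize(stats):
    """Derive per-BAC and per-gene summaries for one subgenome."""
    return {
        # Average number of sequence gaps per BAC.
        "gaps_per_bac": round(stats["gaps"] / stats["bacs"], 2),
        # BAC sequence (kb) per predicted gene; an illustrative density.
        "kb_per_gene": round(stats["total_bp"] / stats["genes"] / 1000, 1),
        # Share of the BAC sequence taken up by introns of predicted genes.
        "intron_fraction": round(
            stats["genes"] * stats["intron_bp"] / stats["total_bp"], 3
        ),
    }


for name, stats in subgenome_bacs.items():
    print(name, summarize(stats))
```

Under these inputs the gap averages match the 5.75 and 1.48 gaps/BAC given in the text, and the average intron length is shorter in the A-subgenome sample than in the D-subgenome sample, which is the comparison behind the conclusion about introns.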
Characteristics of genome structure in allotetraploid cotton
Cotton is the world's most important natural textile fiber and a significant oilseed crop. Cotton fiber is also an outstanding single-cell model for studying plant cell elongation and cell wall and cellulose biosynthesis. Of all 50 cotton species, Gossypium hirsutum provides over 95% of the annual cotton crop worldwide. Elucidating the composition and structure of the tetraploid cotton genome, especially that of upland cotton, will vastly expand opportunities in cotton research and agronomic improvement worldwide. However, cotton possesses a complex genome, so whole-genome sequencing of tetraploid cotton represents a substantial challenge. The GeneTrek approach has been proposed as an efficient means to evaluate the general properties of any genome by annotating a small set of randomly selected BACs [19, 20]. In maize, sequence analysis of 100 randomly selected BACs led to the prediction of 42,000–56,000 genes with at least 66% repetitive DNA. In addition, sequence analysis of 74 randomly selected BACs showed that the maize nuclear genome contains about 37,000 candidate genes and 5,500 truncated and probable pseudogenes, although the distribution of genes and repetitive elements is uneven. In the present study, properties of the upland cotton genome, such as total gene number, amount and distribution of repetitive DNA, and gene distribution, were predicted for the first time based on the annotation of 142 randomly sequenced BACs. Previously reported gene densities in homoeologous BACs were one gene per 7.5 kb in the CesA region and, in the AdhA region, one gene per 20 kb for the A-subgenome and one gene per 13 kb for the D-subgenome. Our data from randomly selected BACs gave an overall density of one gene per 34.5 kb, leading to the prediction of more than 70,000 genes in upland cotton. Because upland cotton is an allotetraploid and has duplicated copies of genes in homoeologous regions of the A- and D-subgenomes, approximately 35,000 genes were predicted in each subgenome.
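The extrapolation from sampled gene density to a whole-genome gene count can be sketched as a simple proportion. This is a naive calculation assuming uniform scaling of the sampled density across the ~2.83 Gb genome; the paper does not state which adjustments, if any, lie behind its more conservative figure of "more than 70,000", so the function and constant names below are illustrative only.

```python
GENOME_SIZE_MB = 2830.0  # ~2.83 Gb haploid G. hirsutum genome (from the text)
KB_PER_GENE = 34.5       # overall density: one gene per 34.5 kb (from the text)
SAMPLED_MB = 14.2        # combined length of the 142 BACs (from the text)


def extrapolate_gene_number(genome_mb, kb_per_gene):
    """Scale a sampled gene density to the whole genome (naive proportion)."""
    return genome_mb * 1000.0 / kb_per_gene


total_genes = extrapolate_gene_number(GENOME_SIZE_MB, KB_PER_GENE)
genes_per_subgenome = total_genes / 2  # assumes an even A/D split
sampled_fraction = SAMPLED_MB / GENOME_SIZE_MB

print(f"naive genome-wide estimate: {total_genes:,.0f} genes")
print(f"per subgenome (even split): {genes_per_subgenome:,.0f} genes")
print(f"fraction of genome sampled: {sampled_fraction:.2%}")
```

The naive proportion lands somewhat above 70,000, which is consistent with the stated lower bound, and the sampled fraction agrees with the roughly 0.5% genome coverage reported for the 142 BACs.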
In tetraploid cotton, the distribution of genes is uneven, with gene-rich and gene-free regions. We also found 21% of BACs lacked genes and 72.5% of the gene islands contained only one gene. These results indicated that selecting only gene-rich BACs for cotton genome sequencing is not adequate to cover the entire genome, owing to the fact that more than one fifth of BACs exhibit an absence of genes.
In this study, 1,653 predicted gene models lacked homology to other species in the NCBI protein database. In addition, we verified 208 ESTs by BLASTN queries against the cotton EST database. However, we could not confirm if these transcripts were related to mobile elements, gene candidates, or special products in cotton. Therefore, we have not used the information to predict the structure and composition of the upland cotton genome. However, the functions and properties of these transcripts warrant further study to enhance the understanding of the complex upland cotton genome.
Structure difference between A- and D-subgenome chromosomes
In plants, the following factors have been summarized as the main mechanisms of genome size expansion: (1) long terminal repeat (LTR) retrotransposable element amplification and insertion, such as that in maize; (2) variation in intron size; (3) expansion of tandemly repetitive DNA sequences; (4) segmental duplications; (5) accumulation of pseudogenes; and (6) transfer of organellar DNA to the nucleus. The cultivated cotton species Gossypium hirsutum has long been known as an allotetraploid possessing a nuclear A- and D-subgenome. A- and D-genome species diverged from a common ancestor approximately 5–10 Mya and acquired genomes that differ nearly twofold in size. Given the putative mechanisms of genome size expansion described above, it is uncertain which mechanism(s) played an important role in shaping the composition and structure of the tetraploid cotton genomes. To explore this question, several studies have been initiated through comparative sequence analysis of specific genomic regions or by application of more global approaches [14, 16, 18]. Grover et al. (2004) investigated A- and D-genome size evolution in tetraploid cotton in a 104 kb contiguous sequence surrounding the CesA1 gene, and found no evidence of genome size variation between the A- and D-subgenome genic regions. In a similar study, Grover et al. (2007) obtained aligned lengths surrounding the AdhA gene of 101.7 kb in the A-subgenome, 49 kb in the D-subgenome, 112.3 kb in the diploid A-genome and 55 kb in the diploid D-genome. The results revealed that the variation in aligned length was mainly attributable to differential accumulation of retroelements. Hawkins et al. (2006) compared diploid A- and D-genome size differences using the whole genome shotgun (WGS) method and concluded that 40%–65% of each genome is composed of transposable elements, with Copia-like sequences accumulated in smaller genomes and Gypsy-like sequences in larger genomes.
Based on the sequence analysis of 37 BACs of known subgenome origin, we found no relationship between intron size and the subgenome size difference in Gossypium. However, A-subgenome BACs had an average of 5.75 gaps/BAC, whereas D-subgenome BACs had an average of 1.48 gaps/BAC, demonstrating that BACs from the A-subgenome are more difficult to assemble than those from the D-subgenome. This and previous studies revealed conservation of homoeologous sequence and structure in gene-rich regions, suggesting that large amplifications of transposable elements may reside not in gene-rich regions but in low-gene-density or gene-free regions. In future studies, the structure and function of DNA sequences in these gap regions can be confirmed by whole-BAC sequence assembly analysis, and A-specific and D-specific regions related to transposable elements can be located using combined BAC-FISH technology.
The D-subgenome has a more rapid evolutionary rate in different tetraploid cotton species
Sequence and marker analyses from several previous studies indicated that varied evolutionary pressures might act on the D-subgenomes of different tetraploid cotton species. In both G. hirsutum and G. barbadense, the D-subgenome maintained greater nucleotide and allelic diversity than did the A-subgenome, a result supported by comparisons of duplicated paralogous Adh loci [34, 35]. In addition, G. raimondii-derived EST-SSR markers had high polymorphic frequencies between G. hirsutum and G. barbadense. In this paper, we investigated whether each BAC belonged to the A- or D-subgenome. BACs carrying SSR markers were largely tagged to the D-subgenome, as determined by integrating polymorphic marker loci into our tetraploid cotton backbone linkage groups. Our results further confirmed previous studies in which sequence and structure conservation of homoeologs between the A- and D-subgenomes was high. These data are consistent with the evolutionary history of tetraploid cotton progenitors, in which diploid A- and D-genome species were derived from the same ancestor approximately 5–10 Mya. Alternatively, relaxed selection may have acted on the D-subgenomes of different tetraploid cotton species, as evidenced by greater DNA sequence diversity among D-subgenomes than among A-subgenomes in different tetraploid cotton species.
The study provided us the first glimpse at cotton genome complexity, and the results indicated that the gene distribution in cotton genome is uneven with gene-rich and gene-free regions, and rich in repetitive elements. This study will serve as a foundation for tetraploid cotton whole genome sequencing in the future.
One hundred forty-five cotton BAC sequences were downloaded from the National Center for Biotechnology Information (NCBI) on June 2, 2007. The BACs were initially submitted as Zea mays as part of the maize sequencing project by the Genome Sequencing Center, Washington University School of Medicine. However, further analysis determined that the clones were from Gossypium hirsutum cv. Maxxa. The sequence data used in this paper were the product of collaborative efforts by The Maize Sequencing Consortium, including the University of Arizona, Cold Spring Harbor Laboratory, Iowa State University, and the Genome Sequencing Center at Washington University School of Medicine in St. Louis. From the 145 BACs, we selected the 142 with sizes > 20 kb for gene annotation. A 32,101 bp gap region from AC189045 was later excluded because its predicted genes were phage-related, indicating that the sequence data were contaminated. Finally, 142 BACs spanning nearly 14.2 Mb (0.5%) of the cotton genome were used for the analysis.
Genetic mapping of BAC clones based on the simple sequence repeats (SSRs)
Each BAC was searched for SSRs with the online software SSRIT. SSRIT, written in Perl, is a microsatellite search tool available at the USDA-ARS Center for Bioinformatics and at Comparative Genomics at Cornell University. Dinucleotide, trinucleotide, tetranucleotide, pentanucleotide and hexanucleotide SSRs were detected with SSRIT. The search standards for the different repeat motifs were as described in Wang et al. (2006). Primer pairs flanking the SSRs were designed using the program Primer3.0 and tested against our mapping parents, G. barbadense cv. Hai7124 and G. hirsutum acc. TM-1, standard lines for genetic and genomic research. Furthermore, the polymorphic SSRs were integrated into our backbone genetic map of allotetraploid cultivated cotton using JoinMap 3.0 software with a minimum log-of-odds (LOD) score of 6.0. The subgenome origin of mapped BACs was further identified using mapping results and molecular size comparisons among G. hirsutum cv. Maxxa, G. hirsutum acc. TM-1, and G. barbadense cv. Hai7124, with diploid G. herbaceum var. africanum and G. raimondii as controls.
Annotation of LTR retrotransposons and other mobile elements
Repetitive elements were predicted with RepeatMasker, CENSOR, and BLAST identity searches against characterized elements in REPBASE (version 8.5). Building on these predictions, LTR retrotransposons were further identified with LTR_finder software and manually verified by structural features such as paired LTRs and TSDs, a primer binding site and a polypurine tract.
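Two of the structural checks mentioned above can be sketched in a few lines. This is not the LTR_finder algorithm and does not reproduce the manual verification used in the study; it only illustrates testing a candidate element for flanking target site duplications (TSDs) and for the TG...CA termini typical of LTR retrotransposons. The function names and the TSD sizes of 4 to 6 bp are assumptions made for illustration.

```python
def find_tsd(sequence, element_start, element_end, sizes=(4, 5, 6)):
    """Return the longest identical flanking k-mer pair (the TSD), or None.

    element_start is 0-based inclusive and element_end is exclusive, so
    sequence[element_start:element_end] is the candidate element.
    """
    sequence = sequence.upper()
    for size in sorted(sizes, reverse=True):
        # Both flanks must lie fully inside the sequence.
        if element_start - size < 0 or element_end + size > len(sequence):
            continue
        left = sequence[element_start - size:element_start]
        right = sequence[element_end:element_end + size]
        if left == right:
            return left
    return None


def has_tg_ca_termini(element):
    """Check whether the element starts with TG and ends with CA."""
    element = element.upper()
    return element.startswith("TG") and element.endswith("CA")
```

A candidate that passes both checks would still need the remaining features named in the text, such as a primer binding site and a polypurine tract, before being treated as an intact element.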
Sequence analysis and gene annotation
BAC sequences were subjected to three ab initio gene prediction programs: FGENESH (Softberry), GENSCAN+ and GENEMARK.HMM. Gene models served as query sequences to search the National Center for Biotechnology Information (NCBI) non-redundant protein database and the Arabidopsis thaliana protein database. All BLASTP hits were manually evaluated to determine whether a gene model was likely to be a real gene, based on e-value, query alignment and hit annotation. Gene models that were integral parts of known repetitive elements were removed before further analysis. In addition, sequences with gene models but no annotations were subjected to BLASTN queries against the cotton EST database released on the NCBI website. Gene Ontology (GO) terms for tandemly duplicated genes were obtained from UniProt Gene Ontology. The GO terms for the best homologous hits were used to determine the molecular function, cellular component and biological process ontologies for these sequences.
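The evaluation of BLASTP hits by e-value and query alignment can be approximated by an automated pre-filter, sketched below. In the study the hits were evaluated manually, so this is only an illustration: it assumes BLAST tabular output with the standard twelve columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the thresholds max_evalue and min_query_cov, like the function name, are hypothetical values not taken from the paper.

```python
import csv
import io


def supported_gene_models(blast_tab, query_lengths,
                          max_evalue=1e-10, min_query_cov=0.5):
    """Return query IDs with at least one hit passing both thresholds.

    blast_tab     -- text of a 12-column tab-separated BLAST report
    query_lengths -- dict mapping query ID to protein length (residues)
    """
    supported = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid = row[0]
        qstart, qend = int(row[6]), int(row[7])
        evalue = float(row[10])
        # Fraction of the query covered by this single alignment.
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if evalue <= max_evalue and coverage >= min_query_cov:
            supported.add(qseqid)
    return supported
```

Gene models that pass such a filter would still need their hit annotations inspected, for example to exclude matches to transposable element proteins, as the text describes.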
BAC: Bacterial Artificial Chromosome
FISH: Fluorescence In Situ Hybridization
QTL: Quantitative Trait Loci
LTR: Long Terminal Repeat
TSD: Target Site Duplication
SSR: Simple Sequence Repeat
WGS: Whole Genome Shotgun
We gratefully acknowledge the Maize Sequencing Consortium in the USA, including the University of Arizona, Cold Spring Harbor Laboratory, Iowa State University, and the Genome Sequencing Center at Washington University School of Medicine in St. Louis, for the free release of the cotton BAC sequence data. This program was financially supported in part by National Science Foundation in China (30671324, 30730067), the Program for New Century Excellent Talents in University (NCET-04-0500), Jiangsu Natural Science Foundation, China (BK2007719), and the Program for 111 project (B08025).
- Fryxell PA: A revised taxonomic interpretation of Gossypium L. (Malvaceae). Rheedea. 1992, 2: 108-165.
- Wendel JF, Cronn RC: Polyploidy and the evolutionary history of cotton. Adv Agron. 2003, 78: 139-186.
- National Cotton Council, USA. [http://www.cotton.org/]
- Jiang C, Wright RJ, El-Zik KM, Paterson AH: Polyploid formation created unique avenues for response to selection in Gossypium. Proc Natl Acad Sci USA. 1998, 95: 4419-4424.
- Kohel RJ, Yu J, Park Y-H, Lazo GR: Molecular mapping and characterization of traits controlling fiber quality in cotton. Euphytica. 2001, 121: 163-172.
- Park YH, Alabady MS, Ulloa M, Sickler B, Wilkins TA, Yu J, Stelly DM, Kohel RJ, el-Shihy OM, Cantrell RG: Genetic mapping of new cotton fiber loci using EST-derived microsatellites in an interspecific recombinant inbred line cotton population. Mol Gen Genomics. 2005, 274: 428-441.
- Paterson AH, Saranga Y, Menz M, Jiang CX, Wright RJ: QTL analysis of genotype × environmental interactions affecting cotton fiber quality. Theor Appl Genet. 2003, 106: 384-396.
- Shen XL, Guo WZ, Zhu XF, Yuan YL, Zhang TZ: Molecular mapping of QTLs for qualities in three diverse lines in Upland cotton using SSR markers. Mol Breed. 2005, 15: 169-181.
- Ulloa M, Saha S, Jenkins JN, Meredith WR, McCarty JC, Stelly DM: Chromosomal assignment of RFLP linkage groups harboring important QTLs on an intraspecific cotton (Gossypium hirsutum L.) joinmap. J Hered. 2005, 96: 132-144.
- Applequist WL, Cronn R, Wendel JF: Comparative development of fiber in wild and cultivated cotton. Evol Dev. 2001, 3: 3-17.
- Saha S, Raska DA, Stelly DM: Upland cotton (Gossypium hirsutum L.) × Hawaiian cotton (G. tomentosum Nutt. ex. Seem) F1 hybrid hypoaneuploid chromosome substitution series. J Cotton Sci. 2006, 10: 146-154.
- Yang SS, Cheung F, Lee JJ, Ha M, Wei NE, Sze SH, Stelly DM, Thaxton P, Triplett B, Town CD, Chen JZ: Accumulation of genome-specific transcripts, transcription factors and phytohormonal regulators during early stages of fiber cell development in allotetraploid cotton. Plant J. 2006, 47: 761-775.
- Chen JZ, Scheffler BE, Dennis E, Triplett B, Zhang T, Guo W, Chen X, Stelly DM, Rabinowicz PD, Town C: Towards Sequencing Cotton (Gossypium) Genomes. Plant Physiol. 2007, 145: 1303-1310.
- Grover CE, Kim HR, Wing RA, Paterson AH, Wendel JF: Incongruent patterns of local and global genome size evolution in cotton. Genome Res. 2004, 14: 1474-1482.
- Zhao XP, Si Y, Hanson RE, Crane CF, Price HJ, Stelly DM, Wendel JF, Paterson AH: Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton. Genome Res. 1998, 8: 479-492.
- Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF: Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 2006, 16: 1252-1261.
- U.S. Department of Energy Joint Genome Institute. [http://www.jgi.doe.gov/]
- Grover CE, Kim HR, Wing RA, Paterson AH, Wendel JF: Microcolinearity and genome evolution in the AdhA region of diploid and polyploid cotton (Gossypium). Plant J. 2007, 50: 995-1006.
- Bennetzen JL, Coleman C, Liu R, Ma J, Ramakrishna W: Consistent over-estimation of gene number in complex plant genomes. Curr Opin Plant Biol. 2004, 7: 732-736.
- Devos KM, Ma J, Pontaroli AC, Pratt LH, Bennetzen JL: Analysis and mapping of randomly chosen bacterial artificial chromosome clones from hexaploid bread wheat. Proc Natl Acad Sci USA. 2005, 102: 19243-19248.
- Liu R, Vitte C, Ma J, Mahama AA, Dhliwayo T, Lee M, Bennetzen JL: A GeneTrek analysis of the maize genome. Proc Natl Acad Sci USA. 2007, 104: 11844-11849.
- NCBI. [http://www.ncbi.nlm.nih.gov]
- Graham MA, Marek LF, Shoemaker RC: Organization, expression and evolution of a disease resistance gene cluster in soybean. Genetics. 2002, 162: 1961-1977.
- Yang C, Guo WZ, Zhang TZ: Molecular mapping of QTL for Verticillium wilt resistance in G. barbadense L. Plant Sci. 2008, 174: 290-298.
- Guo W, Cai C, Wang C, Han Z, Song X, Wang K, Niu X, Wang C, Lu K, Shi B: A microsatellite-based, gene-rich linkage map reveals genome structure, function and evolution in Gossypium. Genetics. 2007, 176: 527-541.
- Kim HJ, Triplett BA: Cotton fiber growth in planta and in vitro: Models for plant cell elongation and cell wall biogenesis. Plant Physiol. 2001, 127: 1361-1366.
- Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B: Structure and architecture of the maize genome. Plant Physiol. 2005, 139: 1612-1624.
- SanMiguel P, Bennetzen JL: Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann Bot (Lond). 1998, 82: 37-44.
- Deutsch M, Long M: Intron-exon structure of eukaryotic model organisms. Nucleic Acids Res. 1999, 27: 3219-3228.
- Morgante M, Hanafey M, Powell W: Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002, 30: 194-200.
- Wendel JF: Genome evolution in polyploids. Plant Mol Biol. 2000, 42: 225-249.
- Zhang J: Evolution by gene duplication: An update. Trends Ecol Evol. 2003, 18: 292-298.
- Adams KL, Palmer JD: Evolution of mitochondrial gene content: Gene loss and transfer to the nucleus. Mol Phylogenet Evol. 2003, 29: 380-395.
- Small RL, Wendel JF: Differential evolutionary dynamics of duplicated paralogous Adh loci in allotetraploid cotton (Gossypium). Mol Biol Evol. 2002, 19: 597-607.
- Small RL, Ryburn JA, Wendel JF: Low levels of nucleotide diversity at homoeologous Adh loci in allotetraploid cotton (Gossypium L.). Mol Biol Evol. 1999, 16: 491-501.
- Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S: Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res. 2001, 11: 1441-1452.
- Wang CB, Guo WZ, Cai CP, Zhang TZ: Characterization, development and exploitation of EST-derived microsatellites in Gossypium raimondii Ulbrich. Chin Sci Bull. 2006, 51: 557-561.
- Primer3.0. [http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi]
- Repeatmasker. [http://www.repeatmasker.org]
- Jurka J, Klonowski P, Dagman V, Pelton P: CENSOR – a program for identification and elimination of repetitive elements from DNA sequences. Comput Chem. 1996, 20: 119-121.
- Jurka J: Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000, 9: 418-420.
- LTR_finder software. [http://tlife.fudan.edu.cn/ltr_finder]
- FGENESH. [http://www.softberry.com]
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997, 268: 78-94.
- Lukashin A, Borodovsky M: GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 1998, 26: 1107-1115.
- The Arabidopsis thaliana protein database. [ftp://ftp.tigr.org/pub/data/a_thaliana]
- Udall JA, Swanson JM, Haller K, Rapp RA, Sparks ME, Hatfield J, Yu Y, Wu Y, Dowd C, Arpat AB: A global assembly of cotton ESTs. Genome Res. 2006, 16: 441-450.
- UniProt Gene Ontology. [http://www.geneontology.org/GO.current.annotations.shtml]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.