We have developed a large-insert BIBAC library for Upland cotton in a BIBAC vector (pCLD04541) that is competent for direct plant transformation via both Agrobacterium [29, 30] and biolistic bombardment. The library contains 76,800 clones with a reasonably large average insert size of 135 kb. When the insert-empty clones and the clones derived from chloroplast DNA are excluded, the 76,800 clones provide a 4.1× coverage of the Upland cotton haploid genome, giving an approximately 99% probability of obtaining at least one positive clone from the library with a single-copy sequence probe. We screened the library with 13 gene-specific overgo probes designed from the unique sequences of genes and obtained 2–8 positive clones for every probe. These results support the estimated genome coverage of the library, even though some of the probes might hybridize to two or more copies of the sequences because Upland cotton is an allotetraploid containing two homoeologous genomes, the A- and D-subgenomes. Importantly, unlike the existing cotton BAC libraries [11–15], this BIBAC library is the first one competent for direct plant transformation via both Agrobacterium [29, 30] and biolistic bombardment. The plant transformability of the BIBAC library streamlines the map-based cloning of cotton genes and QTLs (Introduction) and large-scale functional analysis of the cotton genome through high-throughput BIBAC transformation [22, 24]. Therefore, the BIBAC library reported here will promote many aspects of cotton genome research, including map-based gene and QTL cloning, genome physical mapping, genome sequencing and functional analysis of the Upland cotton genome.
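The coverage and probability figures above follow from standard library-screening arithmetic. The sketch below is illustrative only: it takes the reported 4.1× effective coverage as given and applies the widely used Clarke-Carbon approximation, P = 1 - exp(-c), together with a Poisson expectation for the number of positive clones per probe. The function names are hypothetical, and treating a probe as matching either one locus or two homoeologous loci is a simplifying assumption, not a description of how the authors computed their values.

```python
import math


def prob_at_least_one_clone(coverage: float) -> float:
    """Clarke-Carbon approximation: P = 1 - exp(-c) for genome coverage c."""
    return 1.0 - math.exp(-coverage)


def expected_positive_clones(coverage: float, copies: int = 1) -> float:
    """Expected number of hits for a probe matching `copies` loci (Poisson mean)."""
    return coverage * copies


# Effective genome equivalents reported in the text (after excluding
# insert-empty and chloroplast-derived clones).
COVERAGE = 4.1

p_single = prob_at_least_one_clone(COVERAGE)
hits_single = expected_positive_clones(COVERAGE, copies=1)
# Assumption: a probe that matches both the A- and D-subgenome homoeologs.
hits_homoeologs = expected_positive_clones(COVERAGE, copies=2)

print(f"P(at least one clone, single-copy probe) = {p_single:.3f}")
print(f"Expected positives, single-copy probe    = {hits_single:.1f}")
print(f"Expected positives, two homoeologs       = {hits_homoeologs:.1f}")
```

Under these assumptions the probability comes out at roughly 0.98, close to the approximately 99% quoted above, and the expected number of positives per probe lies between about four (single locus) and about eight (both homoeologs), which is compatible with the 2–8 positive clones observed.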
We have used the BIBAC library to isolate BIBACs containing or closely linked to the genes controlling fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and cotton bacterial blight resistance. This experiment has not only further verified the quality and demonstrated the utility of the BIBAC library for cotton genomics research, but also provided the tools essential for characterization of the genes at the genomic level and promoted the use of the genes in molecular breeding through BIBAC transformation. Using the BIBACs, we have developed a high-throughput system for functional analysis of the entire Upland cotton genome in the model plant species, Arabidopsis thaliana, with which the entire Upland cotton genome could be transformed into Arabidopsis within approximately three years by one scientist (M.P. Z., Y. Z., D.M. S., and H.-B. Z., unpublished).
Furthermore, the BESs generated in this study provide 9,711 STSs (sequence-tagged sites), with one STS approximately every 250 kb along the Upland cotton genome. These STSs will facilitate integrative physical and genetic mapping, and sequencing of the Upland cotton genome using next-generation sequencing technology [47, 48]. Using the BIBAC library, we have constructed separate genome-wide physical maps for both the A- and D-subgenomes of Upland cotton cv. TM-1 and identified the minimal tiling path (MTP) clones (15,277 clones) spanning the physical map. We have also initiated a project to sequence the Upland cotton cv. TM-1 genome based on the BIBAC library using next-generation sequencing technology (D.M. S., H.-B. Z.). From this BIBAC library and a BAC library, we are developing integrated physical and genetic maps of the individual chromosomes of the Upland cotton genome, with 1–10 contigs per chromosome, which will be necessary for assembling the chromosome-sized pseudomolecules of the Upland cotton genome sequence.
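The STS spacing quoted above can be checked with simple arithmetic. The sketch below is a back-of-the-envelope illustration: it multiplies the number of STSs by the reported average spacing to obtain the genome size implied by those two figures and, conversely, derives the spacing from an assumed haploid genome size. The value of 2,400 Mb is an assumption introduced here for illustration and is not taken from this section; the function names are likewise hypothetical.

```python
N_STS = 9_711               # STSs derived from the BESs (from the text)
REPORTED_SPACING_KB = 250   # approximate spacing stated in the text
ASSUMED_GENOME_MB = 2_400   # ASSUMPTION: illustrative haploid genome size


def implied_genome_size_mb(n_sts: int, spacing_kb: float) -> float:
    """Genome size (Mb) implied by n_sts markers at an average spacing."""
    return n_sts * spacing_kb / 1_000


def mean_spacing_kb(genome_mb: float, n_sts: int) -> float:
    """Average distance (kb) between markers spread over the genome."""
    return genome_mb * 1_000 / n_sts


implied_mb = implied_genome_size_mb(N_STS, REPORTED_SPACING_KB)
spacing_kb = mean_spacing_kb(ASSUMED_GENOME_MB, N_STS)

print(f"Genome size implied by 9,711 STSs at 250 kb: {implied_mb:.0f} Mb")
print(f"Mean spacing for an assumed {ASSUMED_GENOME_MB} Mb genome: {spacing_kb:.0f} kb")
```

Both directions of the calculation agree to within a few percent, so the stated density of one STS per approximately 250 kb is internally consistent with a haploid genome in the range of 2.4 Gb.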
Analysis of the BESs has shown that the 9,711 BESs, ranging in length from 50 to 842 bp with an average length of approximately 400 bp, contain about 3,000 exons. If the exons of cotton genes have an average length of 200–500 bp [49, 50], the 3,000 exons are likely from approximately 3,000 genes, suggesting that approximately 30% of the 9,711 BESs are parts of genes. Among the transposable elements, the LTR retroelement family Gypsy/DIRS1 is the most predominant in the Upland cotton genome, representing over 77% of the transposable elements in the genome. This result is consistent with the estimates of Hawkins et al. for the genomes of its putative diploid progenitor species, G. raimondii and G. herbaceum, obtained by random shotgun clone sequencing. The more than 1,000 new SSRs (79% of the SSRs) identified from the BESs indicate the abundance of SSR loci in the Upland cotton genome and provide additional tools for cotton genome analysis, and gene and QTL mapping.
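The estimate that roughly 30% of the BESs fall within genes is a ratio of the two counts given above. A minimal sketch of that arithmetic follows. It assumes, as the text does, that each detected exon comes from a different gene, and additionally that each exon sits on a different BES; the variable names are illustrative only.

```python
N_BES = 9_711        # BESs analysed (from the text)
MEAN_BES_BP = 400    # approximate average BES length (from the text)
N_EXONS = 3_000      # approximate number of exons detected (from the text)

# ASSUMPTION: one exon per gene and one exon per BES, so the number of
# genic BESs equals the number of exons.
genic_fraction = N_EXONS / N_BES

# Total sequence sampled by the BESs, for context.
total_bes_bp = N_BES * MEAN_BES_BP

print(f"Fraction of BESs overlapping genes: {genic_fraction:.1%}")
print(f"Total BES sequence sampled: {total_bes_bp / 1e6:.2f} Mb")
```

If several exons of one gene were captured on the same BES, the true fraction of genic BESs would be lower than this ratio, so the figure is best read as an upper-end approximation.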
The comparative BES analysis has shown that the Upland cotton genome has significantly diverged from the G. raimondii genome. The unexpectedly high genome divergence between the two species could be attributed to the rapid evolution of the G. raimondii genome, the polyploidization and post-polyploidization evolution of Upland cotton, or both. However, rapid evolution of the G. raimondii genome alone does not appear to be compatible with the finding of this study that the A-subgenome of Upland cotton has the same level of similarity to the G. raimondii genome as its D-subgenome does, even though G. raimondii has only a D genome. Therefore, rapid genome evolution after polyploidization must have occurred in Upland cotton, or in both Upland cotton and G. raimondii. Moreover, it appears that an extensive exchange of genomic elements has occurred between the A- and D-subgenomes of Upland cotton during and/or after polyploidization, leading to a higher similarity between the two subgenomes at the level of genomic element sequences. This would explain why similar alignment results were obtained when the A- and D-subgenome-specific BESs were aligned to a third genome (the G. raimondii genome in this study). Zhang et al. studied the variation in gene number of the nucleotide-binding site (NBS)-encoding gene family and the receptor-like kinase (RLK)-encoding gene family between the tetraploid cottons, including Upland cotton, and their putative diploid progenitor species, including G. raimondii and G. herbaceum. They found that the tetraploid cottons have numbers of NBS and RLK genes similar to those of their putative diploid progenitor species, suggesting that a large number of genes in the families were lost during and/or after cotton polyploidization.
It has been found recently that most of the fundamental functional elements constituting the rice genome, including genes, DNA transposable elements, simple sequence repeats and low complexity repeats, vary very little in content among different chromosomes even though the chromosomes are non-homologous, suggesting the existence of genomic element exchanges among chromosomes within a genome (Liu Y-H, Zhang MP, Wu C, Huang JJ, Zhang H-B: DNA is structured as a linear “Jigsaw Puzzle” in the genomes of Arabidopsis, rice and budding yeast, submitted for publication). These results provide indirect, but strong, support for the above hypothesis. However, further studies are needed to determine the molecular mechanisms underlying the rapid genome evolution and the genomic element exchanges between subgenomes in polyploid cotton cells after polyploidization.