A combined functional and structural genomics approach identified an EST-SSR marker with complete linkage to the Ligon lintless-2 genetic locus in cotton (Gossypium hirsutum L.)

Background Cotton fiber length is an important quality attribute to the textile industry and longer fibers can be more efficiently spun into yarns to produce superior fabrics. There is typically a negative correlation between yield and fiber quality traits such as length. An understanding of the regulatory mechanisms controlling fiber length can potentially provide a valuable tool for cotton breeders to improve fiber length while maintaining high yields. The cotton (Gossypium hirsutum L.) fiber mutation Ligon lintless-2 is controlled by a single dominant gene (Li2) that results in significantly shorter fibers than a wild-type. In a near-isogenic state with a wild-type cotton line, Li2 is a model system with which to study fiber elongation. Results Two near-isogenic lines of Ligon lintless-2 (Li2) cotton, one mutant and one wild-type, were developed through five generations of backcrosses (BC5). An F2 population was developed from a cross between the two Li2 near-isogenic lines and used to develop a linkage map of the Li2 locus on chromosome 18. Five simple sequence repeat (SSR) markers were closely mapped around the Li2 locus region with two of the markers flanking the Li2 locus at 0.87 and 0.52 centimorgan. No apparent differences in fiber initiation and early fiber elongation were observed between the mutant ovules and the wild-type ones. Gene expression profiling using microarrays suggested roles of reactive oxygen species (ROS) homeostasis and cytokinin regulation in the Li2 mutant phenotype. Microarray gene expression data led to successful identification of an EST-SSR marker (NAU3991) that displayed complete linkage to the Li2 locus. Conclusions In the field of cotton genomics, we report the first successful conversion of gene expression data into an SSR marker that is associated with a genomic region harboring a gene responsible for a fiber trait. The EST-derived SSR marker NAU3991 displayed complete linkage to the Li2 locus on chromosome 18 and resided in a gene with similarity to a putative plectin-related protein. The complete linkage suggests that this expressed sequence may be the Li2 gene.

Results: Two near-isogenic lines of Ligon lintless-2 (Li 2 ) cotton, one mutant and one wild-type, were developed through five generations of backcrosses (BC 5 ). An F 2 population was developed from a cross between the two Li 2 near-isogenic lines and used to develop a linkage map of the Li 2 locus on chromosome 18. Five simple sequence repeat (SSR) markers were closely mapped around the Li 2 locus region with two of the markers flanking the Li 2 locus at 0.87 and 0.52 centimorgan. No apparent differences in fiber initiation and early fiber elongation were observed between the mutant ovules and the wild-type ones. Gene expression profiling using microarrays suggested roles of reactive oxygen species (ROS) homeostasis and cytokinin regulation in the Li 2 mutant phenotype. Microarray gene expression data led to successful identification of an EST-SSR marker (NAU3991) that displayed complete linkage to the Li 2 locus.
Conclusions: In the field of cotton genomics, we report the first successful conversion of gene expression data into an SSR marker that is associated with a genomic region harboring a gene responsible for a fiber trait. The ESTderived SSR marker NAU3991 displayed complete linkage to the Li 2 locus on chromosome 18 and resided in a gene with similarity to a putative plectin-related protein. The complete linkage suggests that this expressed sequence may be the Li 2 gene.

Background
Cotton seed fibers are initially ovule epidermal cells that terminally differentiate into fiber cells typically on the DOA. Approximately 25% of the ovule epidermal cells differentiate into fiber cells during the initiation stage of cotton fiber development and subsequently undergo a period of rapid elongation known as the elongation stage [1,2]. The rate of fiber elongation peaks at approximately 6 to 12 DPA and nears cessation at 22 DPA [3]. During peak elongation fiber cells can increase in length at rates of 2 mm/day or more depending on environment and genotype [1,4]. The length of fibers is mostly variety specific, but can also be affected by environmental conditions such as temperature during the elongation stage of development [5]. The elongation stage is followed by a brief period known as the transition stage that usually begins from 12 to 16 DPA in field conditions and depending on environmental factors such as lower temperatures that are shown to delay the onset of the transition stage [6]. The SCW stage immediately follows transition and is characterized by a dramatic increase in SCW-related gene transcripts like cellulose synthases and changes in cell wall composition as large amounts of cellulose are deposited in the SCW [7]. The SCW stage persists until about 32 DPA at which time the fiber cell is composed of approximately 95% cellulose with the remaining 5% of non-cellulosic materials comprised of proteins, polysaccharides, pectins, and waxes that reside mostly in the PCW and cuticle [3,8]. The final stage of fiber development is maturation that ceases from 40 to 60 DPA depending on environment and genotype [9]. At this time the cotton bolls crack and open, exposing the seed fibers to external ambient conditions causing them to desiccate and take on the fluffy appearance normally associated with cotton fibers.
Varieties of cultivated Upland cotton (Gossypium hirsutum L.) that display fiber mutation phenotypes including lintless and fuzzless seeds were first described in the early twentieth century [10,11]. Currently, numerous naturally occurring cotton fiber mutations have been identified globally and characterized at the genetic, and more recently, gene expression levels [12][13][14][15][16]. These fiber mutations include, among others, the glabrous seeds in the fiberless mutant lines MD17, SL1-7-1, and XZ142w [17,18]; seeds with only lint fibers and no fuzz fibers in the Naked seed lines N 1 [11] and n 2 [19]; and seeds that are described as extremely short lint fibers in the Ligon lintless-1 (Li 1 ) [20] and Ligon lintless-2 (Li 2 ) [21] mutant lines. The fiber mutations of N 1 , Li 1 , and Li 2 are single gene dominant traits [20,21] while the n 2 fiber mutation is a single gene recessive trait [19]. Recent genetic studies on the Li 2 mutation also indicate that it may have incomplete penetrance as evidenced by mutant and WT fibers at different boll locations on the same plants [22], or possibly phenotype variation due to epigenetics. These naturally occurring mutants and their wild-type fiber NILs provide a unique and powerful model system to study cotton fibers at various stages of development including initiation, elongation, and secondary cell wall biosynthesis.
The Li 1 gene was mapped to chromosome 22 using SSR [23] and RFLP [24] markers while the Li 2 gene was mapped to chromosome 18 by phenotype association with cotton aneuploid stocks [25], and linkage analysis by RFLP markers [24]. More recently, a draft of the physical map of the diploid cotton D-genome progenitor G. raimondii was released and used along with tetraploid cotton A-and D-subgenome genetic maps to generate a consensus genetic-physical map of the cotton genome that included flanking markers for the Li 2 gene on chromosome 18 [26]. These were RFLP markers designated A1552 and Gate4BC11 that flanked the Li 2 gene in an 8.9 cM region and were mapped using an interspecific F 2 mapping population composed of 158 individuals from a cross between G. hirsutum Li 2 and G. barbadense cv. Pima S-7. Previously, another RFLP marker designated Gate4BF10 was reported to flank the Li 2 gene along with A1552 in a 1.5 cM region of chromosome 18 [24]. However, the more recently released cotton consensus genetic-physical map developed by the same laboratory indicated that Gate4BF10 was mapped at two locations on chromosome 18 [26], which leave the mapping accuracy of this marker in doubt. In a separate study, two Li 2 F 2 segregating populations were developed and used to screen SSR markers for linkage to the Li 2 genetic locus. The closest SSR marker was mapped to chromosome 18 and located 6.051 cM from the Li 2 gene in an interspecific segregating population, and 9.266 cM from the Li 2 gene in an intraspecific segregating population [27].
In a near-isogenic state with the cotton line Texas Marker-1 (TM-1), both the Li 1 and Li 2 mutants have seed fibers that are extremely short (< 6 mm) compared to WT fibers that are typically greater than 20 mm in length [20,21,28]. As a monogenic dominant trait, the short-fiber phenotypes of Li 1 and Li 2 are identical in either a homozygous dominant or heterozygous state. Unlike the Li 1 mutant, which exhibits pleiotropy in the form of severely stunted and deformed plants in both the homozygous dominant and heterozygous state [20], the Li 2 mutant plants appear healthy and morphologically identical to the homozygous recessive wild-type plants with the exception of shorter seed fibers [21].
Cytological evidence suggests that the seed fibers of Li 1 mutants undergo initiation in the same manner as WT fibers, but begin to show some distorted morphological features during the early elongation stage of development [23]. Since the seed fibers of Li 1 and Li 2 fibers are shortened lint fibers, these cotton mutants represent excellent candidates to study the molecular mechanisms of fiber elongation. A recent gene expression study using microarrays on Li 1 mutant and WT cotton NILs that focused on the SCW stage of fiber development identified genes potentially responsible for the phenotypic differences observed in mutant Li 1 fibers compared to WT fibers [12]. Several genes in particular were differentially expressed during SCW biosynthesis that could potentially be involved with the Li 1 phenotype including EXPANSINS, tubulin genes, sucrose synthase (SuSy), and genes encoding MYB transcription factors [12].
Since extensive research has been ongoing with the Li 1 mutant, our laboratory selected the Li 2 mutant in an established near-isogenic state with the Upland cotton variety DP5690 as a model system to study fiber elongation events using a combined functional and structural genomics approach of microarray gene expression and molecular marker analysis. The Li 2 mutant was also selected due to concerns over the pleiotropic effects of the Li 1 mutation on development timing and the desire to harvest fiber samples from mutant and WT plants simultaneously. Understanding the molecular events that control fiber elongation and identifying regulatory elements involved in this process can provide cotton researchers and breeders with means of improving fiber length while maintaining yield either through markerassisted selection or a transgenic approach. The main objective of this research was to identify genes that were differentially expressed during the development of WT and mutant Li 2 fibers and convert the gene expression data into portable molecular markers for use in association mapping to identify the Li 2 locus.
Here we report: 1) the development of two Li 2 NILs of cotton (G. hirsutum) in the backcross five (BC 5 ) generation; 2) no apparent phenotypic differences in seed fibers of mutant Li 2 Li 2 plants compared to WT li 2 li 2 plants during the initiation and early elongation stages of fiber development; 3) mapping the Li 2 locus with SSR markers; 4) the identification of genes differentially expressed in fibers of mutant Li 2 plants compared to WT plants using microarray gene expression analysis on selected developmental time-points; 5) confirmation of microarray gene expression profiles by RT-qPCR; 6) successful conversion of differentially expressed genes into EST-SSR markers; and 7) the identification of an EST-SSR marker representing a putative plectin-related protein or regulatory element that has complete linkage to the Li 2 genetic locus suggesting it may be the Li 2 gene.

Plant materials and greenhouse experimental design
Two NILs of Li 2 Upland cottons that were homozygous dominant (Li 2 Li 2 ) and homozygous recessive (li 2 li 2 ) for the Li 2 locus were developed in a backcross program at Stoneville, MS in field and greenhouse environments. Mutant Texas marker-1 (TM-1) cotton plants containing the Li 2 gene were crossed with the Upland cotton variety DP5690 and F 1 progeny were backcrossed for five generations (BC 5 ) by SSD to DP5690 which served as the recurrent parent in each backcross. The DP5690 recurrent parent was a pure inbred line that was selfpollinated for nine generations via SSD. Progeny in each backcross were selected based on phenotype for the Li 2 short-fiber mutation. The pedigree of the two Li 2 NILs is detailed in Additional file 1.
A total of 102 Li 2 cotton plants and 80 WT li 2 li 2 cotton plants were planted in the greenhouse on six tables.
The mutant Li 2 plants were six BC 5 F 3 lines that originated via SSD. A total of 17 individual plants were grown for each line to confirm that they were homozygous Li 2 Li 2 [29]. Once the genotypes were confirmed the populations were culled to 72 mutant Li 2 Li 2 plants and 72 WT li 2 li 2 plants that were placed in the same greenhouse on six tables in a randomized complete block design. The individual mutant and WT cotton plants were labeled into three pools representing three biological replicates. Cotton bolls were harvested at the following time-points during development: -3, -1, 0, 1, 3, 5, 8, 12, 16, and 20 DPA. Bolls from the same cotton line, biological replicate, and developmental time-point were harvested from all six tables to account for environmental variability within the greenhouse and bulked for subsequent analyses. The number of bolls per bulked sample varied according to developmental time-point, with a greater number of bolls required for the earliest time-point to ensure sufficient biological material and progressively fewer bolls required for each successive time-point. For example, at opposite ends of the developmental time-course, ovules from approximately 20 -30 bolls were bulked for each -3 DPA sample, and ovules with fibers attached from approximately 8 -10 bolls were bulked for each 20 DPA sample. Harvested bolls were placed immediately on ice and transported to the laboratory where they were dissected on ice and the majority of the ovules frozen in liquid nitrogen and stored at -80°C. A small number of ovules from each sample from the -1 to 5 DPA time-points were used for SEM as described herein.

Mapping population
A mutant Li 2 Li 2 homozygous plant was used as the female in a cross with its near-isoline WT li 2 li 2 DP5690. One hundred and thirty-six F 2 plants were planted in the field in Stoneville, MS in 2009. The Li 2 trait of each F 2 progeny plant was evaluated twice at approximately 30 DPA and after boll maturation and opening.

SSR marker analysis and genetic mapping
Young leaves were collected from each one of the F 2 plants in the described mapping population. Total DNA was extracted from fresh leaves using 2.0% hexadecyltrimethylammonium bromide [30]. DNA was purified using Omega EZNA ® DNA isolation column (Omega Bio-Tek, Norcross, GA). As previously reported, the Li 2 locus resides on chromosome 18 [24,25,27]. To rapidly identify SSR markers closely linked to the Li 2 locus, we first selected 86 SSR markers that were previously mapped on either chromosome 18 or its homeologous chromosome 13 based on several published maps [31][32][33][34][35]. The probes of the RFLP markers reported by Rong et al. (2005) [24] were not available to us, and thus were not evaluated in our population. Bulked segregant analysis [36] was then used to identify potentially linked markers. For the Li 2 bulk, DNA of 10 F 2 plants that had the Li 2 phenotype were pooled at equal ratio and diluted to 10 ng/μL. The WT bulk consisted of pooled DNA from 10 F 2 progeny that had normal lint phenotype. SSR primers that generated polymorphic patterns between bulks were tested using the 20 individual DNA samples that were included in the bulks. The markers linked to the Li 2 locus were analyzed on 136 individual F 2 progeny plants as previously described [37]. All SSR primer sequences can be obtained from Cotton Marker database (http://www.cottonmarker.org) except DPL0547. The primer sequences of the SSR markers associated with Li 2 locus are listed in Additional file 2. Segregation data for the Li 2 trait and SSR markers were mapped using program JoinMap3.0 [38,39] with logarithm of odds score = 25.

Scanning electron microscopy
To prepare the samples for SEM analysis, cotton ovules were placed in tissue fixative consisting of 3% (v/v) glutaraldehyde in 0.1 M sodium phosphate, pH 7.0 and stored at 4°C. The time-points utilized for SEM were -1 to 5 DPA. After fixation, the cotton ovules were dehydrated in a graded ethanol series starting from 20% (v/v) up to 100% (v/v) ethanol. After three changes of 100% ethanol, the ovules were placed in American Optical microporous specimen capsules under 100% ethanol [40] and critical point dried from liquid carbon dioxide by standard methodology in a Ladd Critical Point Dryer model 28,000 (Ladd Research, Williston, VT). The ovules were mounted on standard Cambridge SEM stubs using double-stick Avery photo tabs, #06001. The SEM mounts were coated with 60/40 gold/palladium using a Hummer™ II Sputter Coater (Ladd Research, Williston, VT) to a thickness of 200 nm. The specimens were examined in a XL30 Environmental Scanning Electron Microscope (FEI Company, Hillsboro, OR) at an accelerating voltage from 10-15 kV under high vacuum conditions.

Cotton fiber total RNA isolation
Cotton fibers were isolated from developing ovules using a glass bead shearing technique to separate fibers from the ovules [41]. Total RNA was isolated from detached fibers using the Sigma Spectrum™ Plant Total RNA Kit (Sigma-Aldrich, St. Louis, MO) with the optional oncolumn DNase1 digestion according to the manufacturer's protocol. The concentration of each RNA sample was determined using a NanoDrop 2000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE). The RNA quality for each sample was determined by RNA integrity number (RIN) using an Agilent Bioanalyzer 2100 and the RNA 6000 Nano Kit Chip (Agilent Technologies Inc., Santa Clara, CA) with 250 ng of total RNA per sample.

Reverse transcription quantitative real-time PCR
The experimental procedures and data analysis related to RT-qPCR were performed according to the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines [42]. The cDNA synthesis reactions were performed using the iScript™ cDNA Synthesis Kit (Bio-Rad Laboratories, Hercules, CA) according to the manufacturer's instructions with 1 μg of total RNA per reaction used as template. Control cDNA synthesis reactions to check for genomic DNA contamination during RT-qPCR consisted of the same template and components as the experimental reactions without the reverse transcriptase enzyme. The RT-qPCR reactions were performed with iTaq™ SYBR ® Green Supermix (Bio-Rad Laboratories) in a Bio-Rad CFX96 real time PCR detection system. Thermal cycler parameters for RT-qPCR were as follows: 95°C 3 minutes, 50 cycles of 95°C 15 seconds, 60°C 30 seconds. A dissociation curve was generated and used to validate that a single amplicon was present for each RT-qPCR reaction. The calculations for amplification efficiencies of the target and reference genes, RNA inhibition assays, and the relative quantifications of the different target gene transcript abundances were performed using the comparative C q method as described in the ABI Guide to Performing Relative Quantitation of Gene Expression Using Real-Time Quantitative PCR (Applied Biosystems, Foster City, CA) with the following modification: the average of three reference gene C q values was determined by taking the geometric mean which was used to calculate the ΔC q values for the individual target genes [43]. The endogenous reference genes used in the RT-qPCR reactions were the 18S rRNA (Genbank accession U42827), ubiquitin-conjugating protein (Genebank AI730710), and alpha-tubulin 4 (TubA4, Genbank AF106570) [44]. The reference and target gene primer sequences, and target gene descriptions including in silico specificity screens using BLASTx [45] are shown in Additional file 2.
Initial analyses of specific transcripts in fibers of Li 2 mutant and WT plants were performed to determine the most informative time-points to use for microarray analysis. The genes selected were based on previous studies that indicated developmental regulation of these genes during different stages of fiber development, specifically elongation and SCW biosynthesis. The elongation stage-related genes α-expansin1 (GhExp1) [46] and Cu/Zn superoxide dismutase (GhCSD1) [47] and the SCW-related genes cellulose synthase2 (GhCesA2) [48] and a b-1,3-glucanase-like gene [49] that was previously shown to be up-regulated in the fiber SCW stage [50] were selected for preliminary RT-qPCR. The nucleotide primer sequences and gene accessions are shown in Additional file 2. The RT-qPCR reactions were performed with cDNA from fibers at initiation, elongation, and SCW thickening stages at the following time-points: DOA, 1, 3, 5, 8, 12, 16, and 20 DPA. Three biological replications were used for each time-point sample.

Microarray hybridizations and data analysis
The microarray experiments conducted in this study followed the minimum information about a microarray experiment (MIAME) guidelines [51]. The microarray utilized was the commercially available Affymetrix Gen-eChip ® Cotton Genome Array (Affymetrix Inc., Santa Clara, CA), that represents 21,854 transcripts gathered from four species of cotton (G. arboreum, G. barbadense, G. hirsutum, and G. raimondii) and a variety of tissue types including fibers at the initiation, elongation, and SCW biosynthesis stages of development. For each sample 500 ng of cotton fiber RNA was utilized for labeling using the Affymetrix GeneChip ® 3' IVT Express Kit and Cotton Genome Array hybridizations were performed according to standardized Affymetrix protocols. The developmental time-points from Li 2 mutant and WT fibers used for microarray hybridizations were 0, 8, and 12 DPA with two biological replicates used for each time-point sample. Procedures for data normalization and assessment of statistically and biologically significant genes were performed as described by Benedito et al.
To gain insight into the biological processes represented by the differentially expressed genes in the WT and mutant Li 2 fibers, an over-representation analysis (ORA) was performed on the genes that were more abundantly transcribed and statistically significant in fibers of each cotton NIL at each time-point using the program Blast2GO [53]. The Blast2GO program utilizes a Fisher's exact test for ORA and corrects the p-value to control the increased false discovery rate (FDR) associated with multiple hypotheses testing. For this experiment the FDR-corrected p-value cutoff for significance was 0.05. J-L). While no differences between WT and mutant fibers were evident in the SEM (Figure 2), there does appear to be some slight length differences at 5 DPA as shown in Figure 1, so it can be stated that the mutant fiber phenotype becomes evident by approximately 5 DPA. These data are consistent with comparable fiber length measurements obtained from developing fibers of TM-1, Li 2 , and F 3 progeny derived from a cross of TM-1 and Li 2 [28].

Confirmation of homozygosity in the mutant
Preliminary RT-qPCR using genes differentially expressed during cotton fiber development The expression of the elongation stage-related gene GhExp1 followed a similar pattern of expression in WT and Li 2 mutant fibers, with transcript abundance decreasing at the beginning of the SCW stage from 16 -20 DPA. However, the transcript abundance of GhExp1 was a significantly decreased (2.24-fold) in Li 2 mutant fibers compared to WT fibers during the presumed peak of fiber elongation at 8 DPA. The transcript abundance of the second elongation stage-related gene, GhCSD1, followed the previously shown pattern of expression during fiber development with transcript abundance higher during fiber elongation compared to the SCW biosynthesis stage [47]. In fibers of the Li 2 mutant, the expression levels of GhCSD1 remained significantly lower compared to WT fibers during elongation stage time-points (5 -12 DPA) and remained at a seemingly basal level of expression with no change in transcript abundance in mutant fibers over time-points normally associated with elongation and SCW biosynthesis from 5 -20 DPA ( Figure 3A).
The SCW-related genes GhCesA2 and b-1,3-glucanase-like followed the expected expression profiles in WT and Li 2 mutant fibers with up-regulation occurring concurrent with onset of the transition stage from 12 -16 DPA and SCW biosynthesis 16 -20 DPA ( Figure  3B). While the pattern of expression for the SCWrelated genes was the same in WT and Li 2 mutant fibers, the transcript abundance of both genes was significantly higher in the WT fibers at 16 and 20 DPA for GhCesA2 (3.31-and 4.62-fold, respectively) and 20 DPA for the b-1,3-glucanase-like gene (6.64-fold) as shown in Figure 3B.

Microarray gene expression analysis
The initial RT-qPCR analysis indicated significantly different expression levels of the elongation stage-related genes GhExp1 and GhCSD1 during time-points associated with the elongation stage of fiber development ( Figure 3A). Since there were no apparent morphological differences between WT and Li 2 mutant fibers  On the DOA, there were no significantly enriched biological processes represented by the differentially expressed genes in WT or mutant fibers. This result corresponded with the lack of any observable differences in fiber morphology between WT and mutant Li 2 fibers during initiation and early elongation. In 8 DPA fibers of the WT line, over-represented biological processes included, among others, receptor activity, signaling, cytoskeleton, and cell growth, while genes more abundantly transcribed at 8 DPA in mutant Li 2 fibers were enriched in stress and stimulus responses. The genes involved in signaling, cell growth, and cytoskeleton biological processes at 8 DPA in the WT fibers included genes whose products are known to regulate cell elongation and expansion in Arabidopsis thaliana. Among these were genes known to encode protein kinases that regulate cell elongation in A. thaliana through brassinosteroid (BR) signaling pathways such as orthologues of BRASSINOSTEROID INSENSITIVE1 (BRI1), and HERCULES1 (HERK1) and HERCULES2 (HERK2). Some of the genes that encode proteins with the potential to alter or regulate cell elongation are shown in Table 1 along with the functionally characterized phenotypes resulting from inhibition or over-expression in A. thaliana. Genes of interest with higher transcript abundance in Li 2 mutant fibers compared to WT fibers at 8 and 12 DPA, particularly those related to stress and stimulus responses, involved phytohormone signaling pathways for ethylene and cytokinin biosynthesis and regulation of phytohormone homeostasis. The potential implications of genes and biological processes overrepresented in WT and Li 2 mutant fibers during elongation are discussed later and summarized in Table 1. The complete results of the ORA analysis are presented in Additional file 3 which includes the individual microarray probeset IDs for the over-represented biological processes for each cotton NIL and time-point.

Corroboration of the microarray data
A total of twelve genes were selected for corroboration of the microarray data, in addition to the cotton Cu/Zn superoxide dismutase (GhCSD1) gene that was utilized for preliminary RT-qPCR data to select fiber developmental time-points for microarray analysis (Figure 3). Of the four genes selected for the preliminary RT-qPCR, only the GhCSD1 gene is accurately represented in the Affymetrix microarray data with the probeset ID Ghi.2036.1.S1_s_at (Table 2, Additional file 2). The GhCesA2 gene shares 100% homology with the probesets nucleotide sequence, but is below the limit of detection during the initiation and elongation stages selected for microarray analysis. The GhExp1 and b-1,3-glucanase-like genes used to design primers for RT-qPCR (Additional file 2) do not share enough homology with the Affymetrix nucleotide sequences in the regions corresponding to the probsets to allow for accurate comparison of transcript abundance between microarray and RT-qPCR.
The genes selected for microarray corroboration by RT-qPCR were chosen primarily from differentially expressed genes (≥ 2-fold; < Bonferroni-corrected pvalue threshold 2.07194E-06) in WT and mutant fibers at 8 and 12 DPA. Some selected genes were more abundantly expressed at 8 and/or 12 DPA in WT fibers, while others were more abundantly expressed in mutant fibers at the same time-points. These genes encode regulatory elements such as transcription factors including the G. hirsutum fiber initiation-related MYB transcription factor GhMYB109; gene products involved in carbohydrate metabolism such as beta-galactosidase, glycosyltransferases; and a cellulose-synthase-like cell wall-related protein. Genes that were not differentially expressed during elongation time-points were also selected to confirm the validity of the microarray data. These included genes encoding a R2R3-MYB transcription factor (probeset ID Ghi.9209.1.S1_at) and sucrose synthase (probeset ID Ghi.8665.1.S1_s_at). The microarray data were successfully corroborated for most timepoints with the exception of the SANT/MYB transcription factor gene (probeset ID Ghi.1711.1.S1_s_at) at 8 DPA. The complete results of the microarray data and the corresponding RT-qPCR data for all twelve genes and the GhCSD1 gene are summarized in Table 2 and Additional file 4.

Fine mapping the Li 2 locus region with codominant SSR markers
Of the 86 SSR markers screened, 6 (6.97%) were polymorphic between two DNA bulks from Li 2 mutant and WT plants. Except for the marker TMB2295 that had 5 recombinations with the Li 2 trait when the 20 individuals comprising the bulks were tested, the markers CIR216, DC30513, DPL0547, DPL0922, and MUSB1135 co-segregated with the Li 2 trait. These 5 markers were further evaluated in the whole F 2 population. A map was constructed around the Li 2 region (Figure 4). The Li 2 locus was flanked by markers DPL0547 and DPL0922 at genetic distances of 0.87 cM and 0.52 cM, respectively.

Identification of an EST-SSR marker with complete linkage to the Li 2 locus
In an effort of using gene expression data to identify candidate molecular markers for association mapping of the Li 2 gene, we conducted a BLAST search [54] of the Cotton Marker Database (CMD) (http://www.cottonmarker.org) using consensus sequences and singletons representing differentially expressed genes from the  (Figure 4). The Blastx hits for the fulllength cDNA with highest similarity in the NCBI nonredundant database included a putative transcription factor isolated from Ricinus communis; plectin-related proteins from A. thaliana and Vitis vinifera; and unknown proteins from Populus trichocarpa and Glycine max. The full-length predicted protein sequences are highly conserved among cotton and the other plant species for all of the above mentioned protein sequences.  The NCBI conserved domain database (CDD) [56] indicated a putative DNA binding domain present in the predicted protein sequence. The microarray probeset that led to the discovery of the linkage between NAU3991 and Li 2 locus, Ghi.8501.1.A1_at was represented as TC276355 in the Dana-Farber Cancer Institute Cotton Gene Index (DFCI) 11.0 database http://compbio.dfci.harvard.edu/ tgi/. The RT-qPCR primers used to corroborate the microarray data for probeset Ghi.8501.1.A1_at was designed from TC276355 (Additional file 2). The expression profile of TC276355 matched the microarray data for probesets Ghi.8501.1.A1_at and indicated that the gene was up-regulated in WT fibers during the fiber elongation and transition stages (8 -16 DPA) with significantly higher transcript abundance in WT fibers compared to mutant fibers at 8 and 12 DPA ( Figure 5).

Discussion
Naturally occurring cotton fiber mutations present an excellent opportunity to study the molecular events controlling fiber development. When a monogenic fiber mutation in the homozygous state is coupled with its homozygous WT counterpart, especially as NILs, an ideal model system is created for a comparative study of fiber development. Here we have presented such a model system in the Li 2 short-fiber mutation as an advanced (BC 5 ) NIL in the G. hirsutum DP5690 background. In addition to the Li 2 short-fiber mutant phenotype being caused by a single dominant allele, no pleiotropic effects are observed in the development, morphology, or fecundity of mutant cotton plants as either homozygotes or heterozygotes. Also, no pleiotropic effects of the Li 2 mutation are observed in mutant fiber development before the elongation stage. In this regard, the Li 2 mutant is useful to study molecular events specific to and controlling fiber elongation.
It was previously demonstrated that the fibers of another dominant, monogenic short-fiber mutation similar to Li 2 , designated Li 1 , have a thicker SCW than WT fibers and Li 2 mutant fibers. This was demonstrated by measuring the ratio of fiber weight-to-length to infer the degree of SCW thickening in fibers of Li 1 , Li 2 , TM-1, and normal homozygous recessive Li 2 segregates (F 3 ) from a cross between Li 2 and TM-1 [28]. The thickened SCW of Li 1 fibers was further confirmed in another study that measured the incorporation of [ 14 C] glucose in the SCW and indicated a five-fold increase in crystalline cellulose synthesis per millimeter of fiber in the Li 1 mutant compared to WT fibers [57]. This suggests that the pleiotropic effect of the short lint fiber Li 1 mutation also affects multiple stages of fiber development, while the Li 2 mutation remains specific to fiber length. It was also speculated that the thickened SCW walls of Li 1 fibers may be due to increased SCW cellulose synthase activity or an increased density of cellulose synthase subunits per unit length of fiber causing an increase in cellulose deposition in the SCW [57]. The later speculation would likely result in an increase in the transcript abundances of SCW CesA genes such as GhCesA2 in Li 1 mutant fibers compared to WT fibers during the SCW stage. A recent comparative gene expression study on the SCW stage of Li 1 mutant and WT TM-1 fibers would have provided insight into this theory, however the SCW CesA genes were not discussed and the microarray dataset was not made publically available [12]. The authors did report 8.8-fold higher transcript levels of a sucrose synthase (SuSy) gene (Genbank accession U73588) in Li 1 mutant fibers compared to WT TM-1 fibers that could support the observed increased cellulose deposition in the SCW of Li 1 mutant fibers [12,57]. Our RT-qPCR data for the SCW-related genes GhCesA2 and b-1,3-glucanase-like indicated significantly higher levels of expression at the beginning of SCW synthesis in WT fiber compared to Li 2 mutant fibers ( Figure 3B). This would be the converse of the Li 1 fibers if the density of CesA subunits per unit length of fiber were higher in Li 1 mutant fibers compared to WT fibers and supports the fiber elongation-specific, non-pleiotropic nature of the Li 2 mutation. The SuSy gene that was selected as a non-differentially expressed gene for microarray corroboration also supports this hypothesis (Table 2; Additional file 4). The RT-qPCR on all developmental time-points in our study using primers specific for the G. hirsutum SuSy gene (Genbank accession U73588) matched the GhCesA2 expression at 20 DPA ( Figure 3B) with statistically significant 2.54-fold higher SuSy transcript abundance in WT fibers compared to mutant fibers (Additional file 4).
The significantly higher expression levels of the GhCSD1 gene in WT fibers compared to mutant fibers coincided well with a previously postulated model that suggested a role for the modulation of ROS in fiber cell elongation. The model suggested that short fiber cotton species such as the diploid G. longicalyx are unable to regulate ROS accumulation during elongation, leading to stress conditions, an increase in the expression of stress-related genes, and a decrease in the rate of fiber cell elongation [58]. Comparison of the fiber gene expression profiles of G. longicalyx and diploid G. arboreum during elongation (5 DPA) revealed over-representation of biological processes involved in response to stress, presumably due to over-accumulation of ROS, in the shorter fibers of G. longicalyx. Results indicating the same differences in ROS homeostasis were obtained by the same research laboratory in two more microarray gene expression studies that compared short and long fiber cotton species during elongation [59,60]. A similar over-representation of stress response-related genes (GO:0006950) was observed in Li 2 mutant fibers compared to WT fibers at 8 DPA in our study, including genes responsive to oxidative stress and involved in ROS homeostasis (Additional file 3). A total of 105 genes were included in the over-representation of stress responses in Li 2 mutant fibers at 8 DPA and included ROS-related genes such as an NADPH-dependent aldoketo reductase (AKR) with high similarity to the A. thaliana AKR4C9 protein involved in the reduction of reactive α,β-unsaturated carbonyls produced through lipid peroxidation [61]; an alternative oxidase highly similar to the stress-responsive A. thaliana ALTERNA-TIVE OXIDASE 1A (AOX1A); a G. hirsutum ascorbate peroxidase (GhAPX1) involved in H 2 O 2 homeostasis [62]; a G. hirsutum class III peroxidase (GhPOX1) suggested to play a role in ROS production [63]; a transcription factor highly similar to the A. thaliana MULTIPROTEIN BRIDGING FACTOR 1C (AtMBF1C) that is over-expressed in response to multiple stimuli including elevated H 2 O 2 levels [64]; and several putative APETELA2 (AP2)-domain transcription factors that are also up-regulated in transgenic A. thaliana over-expressing AtMBF1C [65]. Given that some of these gene products are involved in the production of ROS required for normal fiber cell elongation, while others are involved in the reduction oxidative stress and repair of oxidative damage, it is possible that the mechanisms of ROS modulation and homeostasis are compromised in Li 2 mutant fibers.
Numerous genes involved in biological processes such as cell expansion and elongation that are up-regulated in the WT fibers compared to mutant fibers during the time-points coinciding with peak rates of fiber elongation. Genes that were more abundantly transcribed (≥ 2fold) and statistically significant in either Li 2 mutant or WT fibers during elongation were closely examined to discern the possible effects of their up-or down-regulation based on functional analysis in cotton and heterologous systems. Some examples of these genes with high similarity to their A. thaliana orthologues are briefly discussed. Detailed descriptions of these genes are also shown in Table 1. The A. thaliana knockout mutants for many of these genes developed by either T-DNA insertion, ethyl methanesulfonate (EMS) mutagenesis, or x-ray mutagenesis suggest roles in cell expansion and elongation. Some examples of the genes more abundantly transcribed in WT fibers during elongation included putative A. thaliana orthologues that when silenced by mutagenesis caused shortened petiole cells in herk1 theseus1 (the1) double mutants [66] and herk1 herk2 the1 triple mutants [67]; inhibition of root hair tip growth in mutants for the potassium transporter TINY ROOT HAIR 1 (TRH1) [68]; a 50% reduction in root hair length in mutants for a type XI myosin protein, XIK [69]; shortened leaf trichomes, reduced polarized expansion of hypocotyls, and shorter hypocotyl cells in mutants for the actin polymerization factor CROOKED (ARPC5) [70]; reduced length of root cells in mutants for the glycosylphosphatidylinositol (GPI)-anchored protein COBRA [71]; and reduced cell sizes resulting in dwarfism in the mutant brassinosteroid insensitive 1 (bri1) [72]. The functionality of a cotton orthologue of BRI1 was previously demonstrated by complementation to rescue the WT phenotype of an A. thaliana bri1 mutant. Furthermore, chemical inhibition of a BR mediated response was demonstrated to inhibit fiber initiation and elongation in in vitro cotton ovule cultures and in planta [73,74]. Two more G. hirsutum genes encoded the transcription factor GhMYB109 and the microtubule protein Beta tubulin1 (GhTub1) more abundantly in WT fibers compared to mutant fibers. Antisense suppression of GhMYB109 in transgenic cotton caused smaller and shrunken fibers initials and inhibition of fiber elongation resulting in a short-fiber phenotype [75,76]. The GhTub1 gene was transcribed 46-fold and 7-fold higher in WT fibers at 8 and 12 DPA, respectively, and resulted in longitudinal cell growth when over-expressed in fission yeast cells under the control of an inducible promoter [77].
It is well-documented that increased levels of natural and/or synthetic cytokinins inhibit cotton fiber cell elongation in the in vitro cotton ovule culture system [78,79]. Genes that were more abundantly transcribed in Li 2 mutant fibers compared to WT fibers encode proteins that are involved in cytokinin signaling and included putative orthologues of the A. thaliana HISTI-DINE-CONTAINING PHOSPHOTRANSMITTER 1 (AtAHP1) that functions as a positive regulator of cytokinin signaling [80]; the A. thaliana homeobox domain transcription factor HAT22 that is up-regulated in response to exogenous application of cytokinins [81]; and orthologues of the A. thaliana UDP-glycosyltransferases UGT73C2 and UGT73C5/DON-GLUCOSYL-TRANSFERASE 1 (AtDOGT1), glycosylate cytokinins which are implicated in cytokinin homeostasis [82]. Over-expression of UGT73C5/DOGT1 in transgenic A. thaliana results in a dwarf phenotype with reduced elongation of hypocotyls similar to BR-deficient mutants [83]. Increased cytokinin levels as determined indirectly by enzyme-linked immunosorbent assay measurements were previously reported in the ovules and fibers of four cotton fiber mutants compared to WT [84]. This study included Ligon lintless cotton, but did not specify Li 1 or Li 2 . While immunoassays that measure plant hormone are generally semi-quantitative, the results do coincide with our gene expression results and suggest a possible role for cytokinin signaling and homeostasis in the Li 2 mutant phenotype (Table 1). Any one of these genes or gene families could plausibly explain the Li 2 short-fiber phenotype based on extensive functional characterization in cotton and heterologous systems such as A. thaliana. However, the fact that Li 2 is a monogenic dominant trait suggests that the differential expression of these genes in Li 2 mutant and WT fibers are downstream effects of a defect in a regulatory element controlling fiber development, specifically elongation in the case of the Li 2 mutation.
Little information is available regarding the cotton gene harboring the NAU3991 EST-SSR marker with complete linkage to the Li 2 locus. The function of this gene in plants is speculative and could be structural or regulatory in nature based on the putative DNA binding motif and homology to the mammalian plectin protein that acts as a microfilament crosslinker in mammalian systems [85]. Presently, no functional information is available for this gene in any plant species and no phenotypes listed for mutants of A. thaliana or other plant species.
One of the major limitations of a gene expression analysis study is the lack of empirical evidence for functionality of any of the identified candidate genes that may cause an aberrant phenotype or enhance a trait of interest. This leads to a great deal of speculation as we have done here and in other cotton gene expression studies. This is especially true for a quantitative trait such as cotton fiber quality. The main goal of this gene expression study focused on practical concerns such as conversion of gene expression data into quantifiable, portable, and useable systems such as molecular markers. In regards to the regulation of a complex and multi-tiered event such as cotton fiber development by a single dominant gene, it should be expected that many downstream events from the developmental point of global regulation will be dramatically altered resulting in a high false discovery rate of candidate genes that truly are differentially expressed between, for example, mutant and WT fibers, but may be acting synergistically to cause the mutant phenotype. Since Li 2 is a monogenic dominant trait, it is possible that not one of these genes alone is the actual mutant allele causing the phenotype. The discovery of an EST-SSR marker based on gene expression data and with complete linkage to the Li 2 locus presents a novel approach for data mining a genetic marker database. Presently, we are cloning the full-length cDNAs and genomic DNAs for the NAU3991 marker gene from the Li 2 mutant and WT plants and have begun screening a G. hirsutum bacterial artificial chromosome (BAC) library. Identification of BAC clones containing the NAU3991 marker gene(s) will allow analysis of the genomic regions flanking the Li 2 locus and facilitate functional analysis of the Li 2 gene in transgenic cotton.

Conclusion
The cotton short fiber mutation Ligon lintless-2 (Li 2 ) appears specific to the elongation stage of fiber development with no apparent morphological differences between WT and mutant fibers until approximately 5 DPA. Comparative microarray gene expression profiling identified genes differentially expressed between WT and mutant fibers and suggested a role for ROS and cytokinin regulation that could result in the short fiber phenotype observed in the Li 2 mutant fibers. The microarray gene expression data was successfully converted into an EST-derived SSR marker, NAU3991, which displayed complete linkage to the Li 2 locus on chromosome 18. The complete linkage suggested that the gene harboring the NAU3991, or another unknown gene closely linked to the EST-SSR marker, may be the Li 2 gene. The function of the gene harboring the NAU3991 marker is unknown in plant species, but shares homology with a gene encoding a plectin protein that acts as a microfilament crosslinker in mammalian systems.

Additional material
Additional file 1: Pedigree of the Li 2 mutant and WT NILs. The NILs utilized in this study were in the G. hirsutum cv. DP5690 genetic background and in the BC 5 generation.
Additional file 2: Primer pair sequences designed for marker analysis and RT-qPCR. The forward and reverse nucleotide primer sequences are shown along with the microarray probesets IDs, BLASTx sequence descriptions, and relevant accession numbers.
Additional file 3: Over-representation analysis (ORA) of genes differentially expressed in Li 2 WT and mutant fibers ≥ 2-fold. The original p-values for the ORA are shown along with the FDR-corrected and FWER-corrected p-values. The statistical significance cutoff for the ORA was an FDR-corrected p-value of 0.05. The TestSeqs are the microarray probesets IDs over-represented for the indicated biological process.
Additional file 4: Corroboration of the microarray gene expression data by RT-qPCR on Li 2 mutant and WT fibers. Gene expression profiles of twelve genes selected to verify the microarray gene expression data. Affymetrix probeset IDs and predicted gene products are shown on the graph titles for each gene. The DPA time-points that revealed a significant (≥ 2-fold; p-value < 0.05) difference in transcript abundance are indicated by an asterisk and the fold-change in transcript abundance is shown on the graphs above each indicated time-point. Error bars indicate standard deviation from 3 biological replicates.