Enrichment of a set of microRNAs during the cotton fiber development
© Kwak et al. 2009
Received: 21 April 2009
Accepted: 29 September 2009
Published: 29 September 2009
Skip to main content
© Kwak et al. 2009
Received: 21 April 2009
Accepted: 29 September 2009
Published: 29 September 2009
Cotton (Gossypium hirsutum) is one of the most important economic crops and provides excellent fibers for textile manufacture. In addition to its industrial and agricultural importance, the fiber cell (plant trichome) also is a biological model system for exploring gene expression and regulation. Small RNAs regulate many aspects of plant growth and development. However, whether small RNAs are involved in regulation of fiber cell development is unknown.
We adopted a deep sequencing approach developed by Solexa (Illumina Inc.) to investigate global expression and complexity of small RNAs during cotton fiber initiation and development. We constructed two small RNA libraries prepared from wild type (WT) and fuzz/lintless (flMutant in the WT background) cotton ovules, respectively. Each library was sequenced individually and generated more than 6-7 million short sequences, resulting in a total of over 13 million sequence reads. At least 22 conserved candidate miRNA families including 111 members were identified. Seven families make up the vast majority of expressed miRNAs in developing cotton ovules. In total 120 unique target genes were predicted for most of conserved miRNAs. In addition, we identified 2 cell-type-specific novel miRNA candidates in cotton ovules. Our study has demonstrated significant differences in expression abundance of miRNAs between the wild-type and mutant, and suggests that these differentially expressed miRNAs potentially regulate transcripts distinctly involved in cotton fiber development.
The present study is the first to deep sequence the small RNA population ofG. hirsutumovules where cotton fibers initiate and develop. Millions of unique miRNA sequences ranging from 18~28 nt in length were detected. Our results support the importance of miRNAs in regulating the development of different cell types and indicate that identification of a comprehensive set of miRNAs in cotton fiber cells would facilitate our understanding of the regulatory mechanisms for fiber cell initiation and elongation.
MicroRNAs (miRNAs) are a class of short (~21 nt), endogenous non-coding small RNAs that have base pair sequences complementary with specific target genes to repress their translation or induce their degradation. While in animals regulation of gene expression by miRNAs is achieved by sequence-specific targeting of the 3' untranslated region of mRNAs, the plant miRNAs generally interact with their targets through perfect or near-perfect complementarity to direct mRNA degradation [1,2]. A growing number of new miRNAs in plants have been identified in recent years. To date, more than 1000 genes encoding miRNAs have been annotated inArabidopsis, rice and other plant species . Moreover, several other classes of small RNAs (also known as small interacting RNAs, siRNAs), distinguished by their origin and biological function, have been identified. These include heterochromatic siRNAs,trans-acting siRNAs (ta-siRNAs), natural antisense siRNAs (nat-siRNAs), Piwi-interacting RNAs , and a recently identified class of small RNAs associated with gene promoters (PASRs) and 3' termini (TASRs) .
Identification of comprehensive sets of miRNAs and other small RNAs in different plant species is a critical step to facilitate our understanding of regulatory mechanisms or networks for target genes and cell development. The higher plants contain diverse cell types. Each of them has its own initiating program, structure and biological function. Cellular differentiation is accompanied by changes in transcription, translation and many other physiological processes . Cotton fibers are single-celled epidermal trichomes and provide an outstanding model for investigation of cellular and developmental events which also occur in Arabidopsis leaf trichomes . The development of a fiber cell is a complex morphological and molecular process, which is characterized by cell cycle status, transcriptional control and multiple cytoskeletal functions comprising an integrated hierarchy of regulation. Recently, a number of genes controlling early fiber initiation and late development have been identified, and some of them have been functionally characterized.GhMYB109, a putative ortholog ofAtMYBGL1, is specifically expressed in fiber cell initials and elongating cells . Moreover, ectopic expression ofGaMYB2induces a single trichome in epidermis of Arabidopsis seeds . Two cotton genes containing WD40 domains complement the Arabidopsisttg1mutant . Comparative studies onArabidopsisleaf trichomes and multiple gene expression have provided great insights into the cotton ovular fiber development [6,11–13]. Using microarray technology, hundreds of transcripts were analyzed and exhibited distinct expression patterns during the early stage of fiber cell development [6,11–16]. Using a computational approach, we initially identified 37 potential miRNAs from cotton (G. hirsutum); further, 96 potential targets were detected for these potential miRNAs . More recently, several other labs also performedin silicoidentification of dozens of conserved miRNAs from the same species [18–20]. Amongst the putative targets these studies reported were transcription factors (e.g. MYB), auxin responsive proteins and other genes related to fiber development [17,18]. To explore the role of small RNAs in cotton fibers, Abdurakhmonov and co-workers recently analyzed small RNA sequences from 0-10 days post anthesis (DPA) developing cotton ovules and obtained hundreds of small RNAs . Although 583 unique sequence signatures of small RNAs were achieved, only two conserved miRNAs were detected. It is most likely that the traditional method does not sequence deeply enough to sample the full complexity of small RNAs in ovules. Recently developed high-throughput sequencing technologies provide a powerful approach to identify and quantify sRNAs/miRNAs . Small RNAs are best discovered and measured by deep sequencing methods that have high sensitivity and specifiCity [23–25]. Additionally, it is feasible to explore or annotate miRNAs in organisms whose genome sequences have not been completed. Here, we adopted a deep sequencing method developed by Solexa (Illumina Inc.) to identify small RNAs from cotton ovules and analyze abundance and complexity of small RNAs. We constructed two small RNA libraries prepared from wild type (WT) and fuzz/lintless (Mutant in the same background) cotton ovules, respectively. Samples were collected from 0-10 DPA developing cotton ovules, which cover major morphological changes as well as several underlying developmental processes including fiber initiation and elongation . Each library was sequenced individually and generated more than 6-7 million short sequences, with a total of over 13 million sequence reads. We obtained more than 100 conserved miRNAs representing 36 families from the cotton ovules. Many of them were originally identified in this study. In addition, two non-conserved novel miRNA candidates were identified.
The dataset was used to query the non-coding RNAs sequences deposited at NCBI GenBankhttp://www.ncbi.nlm.nih.gov/and the Rfam database  in order to separate the small RNAs that match to non-coding sequences such as rRNA, tRNA, snRNA and snoRNA. These accounted for 332,455 reads (46,542 unique sequences) in wild-type and 292,158 reads (45,612 unique sequences) in mutant. Since the cotton genome has not been completely sequenced, we used Short Oligonucleotide Analysis Package (SOAP,http://soap.genomics.org.cn) to annotate small RNA sequences that map to TIGR Cotton Transcript Assemblies (TIGR)http://plantta.jcvi.org/. With only 70,667 TAs available, less than 10% of the small RNAs from each library were mapped.
Identified known (or conserved) candidate miRNA families inGossypium hirsutumwildtype (WT) and fuzz/lintless mutant (M) ovules.
The number of reads reflects enrichment of miRNAs. Most of the miRNA read frequencies exhibit significant differences between the two libraries. Expression of miR165/166, miR159, miR160, miR162, miR167, miR171, miR172, miR390, miR393, miR394, miR396, miR403, miR408, miR503, miR535 and miR894 were significantly up-regulated in mutant compared to the wild-type, whereas miR156/157, miR164, miR168, miR169, miR395, miR397 and miR399 showed down-regulation in mutant relative to the wild-type. The abundantly presented families like miR165/166, miR160 and miR167 were expressed very highly in the mutant. MiR399 was detected 6-fold higher in wild-type than in mutant. Several other miRNA families such as miR393, miR408 and miR530 also showed higher levels of expression in mutant than in wild-type. To a lesser degree, miR395 and miR397 showed comparatively low expression levels in mutant. Interestingly, the highly enriched miR156/157 family was found at similar levels in both libraries.
Read frequency and mature sequences ofGossypium hirsutumnovel miRNA candidates.
5p Sequence (5' -- 3')
3p Sequence (5' -- 3')
Predicted target functions for the identified cotton conserved candidate miRNAs.
Number of targets
Regulation of transcription
To date, more than one thousand plant miRNA genes have been annotated and some of them have been well characterized . However, the number of plant miRNAs appears not to be saturated and many other functional miRNAs in plant species remain to be investigated. Compared to annotated miRNAs from Arabidopsis and rice, very few miRNAs from cotton plants have been identified. Recently, several studies performedin silicoidentification of miRNAs fromG. hirsutum[17–19,32]. Approximate 18 highly conserved miRNA families were detected and several less conserved miRNAs (or families) were found. When compared to the miRNAs predicted previously, most of them could be recovered by deep sequencing, and only a small portion of them (e.g. miR391 and miR400) were not [17–19,32]. These missing miRNAs might be attributed to the fact that the sequences (e.g. EST or GSS) used for prediction were derived from tissues such as leaves or roots rather than ovules. Also, it was likely that false positive predictions were included. On the other hand, several miRNAs (or families) such as miR159, miR172, miR319, miR473, miR477, miR479, miR530, miR535, miR858 and miR894 were not successfully predicted, suggesting that these miRNAs may be tissue-specific in cotton ovules. MiR172 and miR390 have been recently cloned from cotton (G. hirsutum) ovule using a traditional cloning approach , both of which were also detected in this study.
The present study is the first to deep sequence the small RNA population ofG. hirsutumovules where cotton fiber cells initiate and develop. Millions of unique siRNA sequences ranging from 18~28 nt in length were detected. Analysis of the evolutionary conservation of these miRNAs revealed 111 individual conserved miRNAs belonging to 22 families. Together with the severalG. hirsutummiRNAs existing in miRBase (Release 12.0, Sept, 2008, www.sanger.ac.uk/Software/Rfam/ftp.shtml), this study will bring the number of miRNAs inG. hirsutumup to 120.
The vast majority of conserved miRNAs from cotton ovules is not surprising. Most of the miRNAs identified in this study are conserved in Arabidopsis and only a few are conserved in other plant species. The phenomenon can be explained by the fact that cotton fibers (seed trichomes) and epidermal hairs (leaf trichomes) are phenotypically similar; both types of trichomes use a common mechanism,e.g. that closely associated with the transcription factors for regulating trichome initiation and development [9,16]. Notably, some highly conserved miRNA families such as miR156/157, miR167 and miR172 were sequenced more than ten thousands or even one hundred thousands times in a single library. These highly conserved miRNAs may represent a relationship between evolutionary conservation and expression abundance . On the opposite, some miRNA families that are less conserved or even species-specific have very low abundance. From an evolutionary view, these miRNAs play a role in establishing and maintaining phenotypic diversity between different groups of organisms and are involved in regulation of the lineage-specific pathways and functions [24,33]. In addition to the conserved miRNAs, two putative miRNAs identified in this study do not have orthologs in Arabidopsis and other species (Table2). Since the non-conserved miRNAs usually express at a low level and in specific cell types (ovules), it is suggested that these cotton-specific miRNAs may have expanded after the divergence of the monocot and dicot plant lineages, supporting the presumption that a diverse set of miRNAs are evolving rapidly and independently with each species .
Our comparative analysis of small RNA abundance between the wild-type andflmutant indicated that the mutant contains an altered small RNA population, with a smaller proportion of total RNA reads. However, inflmutants many small RNA families with 21-22 nt were enriched (Figure 2). Differential miRNA abundance was also found between the wild-type and mutant. A surprising observation was that the majority of miRNA families inflmutant had significantly higher abundance than in wild-type (Figure 3B). This result suggests that the mutant has a changed regulation ofMIRNAexpression during the fiber development. Further identification of the regulatory process and metabolic pathway in mutants will provide insights into the impaired miRNA biogenesis and abnormal trichome cell differentiation.
It is of interest that a large number of miRNAs from ovules potentially target transcription factors, which was consistent with our previous predictions . InArabidopsis, miR159 mediates cleavage ofMYB101andMYB33transcripts, the two transcription factors that function as positive regulators of abscisic acid (ABA) response . Similarly, miR319 is complementary to a highly conserved motif in the coding region of the GAMYB-related clade ofMYB[36,37]. In this study, several members of putative MYB families were predicted to have binding sites for miR169, miR396, miR399 and miR858, suggesting the potential regulation of fiber development. MiR858 is of particular interest because it targets theGhMYB10mRNA. Ectopic expression ofGhMYB10in transgenic tobacco plants causes abnormal cell shapes of leaf trichomes .
Amongst the predicted targets of miR167 was Auxin Response Factors (ARF). These proteins are bound to the auxin response elements and regulate auxin-mediated transcriptional activation/repression.In vitro-cultured cotton ovules, exogenous auxin is required to promote fiber cell development . Our data have demonstrated that in the mutant, miR167 was expressed more highly. In rice culture cells, miR167 was shown to cleave ARF8 mRNA. The abundance of miR167 was controlled by the level of auxin in growth medium. When cells were grown in auxin-free medium, miR167 level decreased . Similarly, a putative ARF8 transcript was predicted to be targeted by miR167 in cotton ovules. This suggests that auxin levels are possibly higher in the mutant ovules. We predicted miR160 to target an ARF10-like mRNA transcript, which expressed at higher levels in the mutant library, just like miR167. MiR393 targets putative Transport Inhibitor Response 1 (TIR1) transcripts in cotton. TIR1 is an auxin receptor involved in a mechanism leading to the Aux/IAA degradation . Inhibition of TIR1 by miR393 would down-regulate auxin signaling. MiR393 showed an expression level nearly 2 fold higher in mutants than in wild-type. Interestingly, most of the miRNAs involved in the auxin pathway were found to be up-regulated in the mutant.
In Arabidopsis, several miRNAs like miR399 and miR395 are induced by the nutrient deficiency [40,41]. Under normal growth conditions, these miRNAs do not express. However, both miR399 and miR395 were moderately sequenced in this study, particularly in the wild-type ovules (Table1). This can be attributed to the advantage that deep sequencing can detect miRNAs at a very low level. Under phosphorus starvation, miR399 targets a ubiquitin-conjugating E2 enzyme, which in turn regulates Pi acquisition [41,42]. In this study, deep sequencing identified 8 miR399 members and 7 unique genes were predicted as potential targets of miR399. More interestingly, all of these predicted targets appear not to be correlated with phosphate metabolism. This result provides a new clue to the multiple roles of miR399 that may play in diverse cell types or species. MiR399f/g were predicted to have complementarity to a putative MYB family transcription factor. Besides, miR399g targets four cotton vacuolar ATP synthase subunit B transcripts. In cotton fiber, elongation is driven by turgor pressure generated by vacuolar H+-ATPase activity on tonoplasts . The process occurs synchronously with the increase in the rate of cell elongation, indicating that vacuolar H+-ATPase may play a crucial role in cotton fiber development .
Expression of miR398 was much lower in mutant than in wild-type. Previously, miR398 in Arabidopsis was identified to target gene coding Cu/Zn superoxide dismutase . Similar target was predicted for the miR398 from cotton ovules. Interestingly, specific cotton Cu/Zn superoxide dismutase have been recently detected in the secondary cell walls of developing cotton fibers and are suggested to be involved in cell wall growth . Whether miR398 regulates superoxide dismutase and cell wall growth would be an interesting topic to be investigated.
Using deep sequencing method many of conserved miRNAs from cotton ovules were identified. Our results indicated that there are differential expression profiles of miRNAs from the wild-type and mutant ovules, which can be expected to regulate transcripts distinctly involved in cotton fiber development. Further identification of these differentially expressed miRNAs from ovules would allow better understanding of the regulatory mechanisms for fiber cell development.
Upland cotton plants (Gossypium hirsutumL.) cv. Xuzhou 142 (wild type; WT) andfuzzless-lintlessmutant (M) in Xuzhou 142 background were field grown at the Jiangsu Agricultural Academy of Sciences under regular field conditions during spring/summer 2008. Flowers were tagged and developing ovules were harvested and directly dissected from 0 to 10 DPA ovaries in early mornings. The excised ovules were frozen in liquid nitrogen and stored at -80°C for analysis. Wild type and mutant ovules of 1, 2, 3, 4, 5, 6, 8, 10 DPA were selected for total RNA isolation.
Ovular total RNA was extracted using the pBiozol Total RNA Extraction Reagent (BioFlux) according to the manufacturer's instructions, supplemented with two extra chloroform washes before nucleic acid precipitation. A 1% agarose gel, stained by ethidium bromide, was run to preliminarily indicate the integrity of the RNA. All RNA samples were quantified and examined for protein contamination (A260 nm/A280 nm ratios) and reagent contamination (A260 nm/A230 nm ratios) by a Nanodrop ND 1000 spectrophotometer. In addition, the RIN (RNA integrity number) determined by the Agilent Technologies 2100 Bioanalyzer was greater than 8 for all samples.
Total RNA from wild type and mutant was prepared for small RNA Sequencing-by-Synthesis according to the prescribed procedure and standards of the Illumina Sample Preparation Protocol. The samples were quantified and equalized so that equivalent amounts of RNA from mutant and wild-type were analyzed. In brief, total RNA was purified by electrophoretic separation on a 15% TBE-urea denaturing PAGE gel and small RNA regions corresponding to the 18-30 nucleotide bands in the marker lane were excised and recovered. The 18-30 nt small RNAs were 5' and 3' RNA adapter-ligated by T4 RNA ligase and at each step length validated and purified by urea PAGE gel electrophoretic separation. The adapter-ligated small RNA was subsequently transcribed into cDNA by SuperScript II Reverse Transcriptase (Invitrogen) and PCR amplified, using primers that anneal to the ends of the adapters. The amplified cDNA constructs, too, were purified and recovered. The final quality of the library was ensured by validation of the size, purity and concentration of the cDNA library on an Agilent Technologies 2100 Bioanalyzer. The two constructed cDNA libraries subsequently underwent Solexa/Illumina's proprietary flow-cell cluster generation and bridge amplification. After which the 1G sequencer, during automated cycles of extension, recorded fluorophore excitation and determined the sequence of bases for each cluster.
Raw sequence reads were produced by the Illumina 1G Genome Analyzer at BGI-Shenzhen, China and processed into clean full length reads by the BGI small RNA pipeline. During this procedure all low quality reads, including 3' adapter reads and 5' adapter contaminants were removed. The remaining high quality sequences were trimmed of their adapter sequences and sequences larger than 30 nt and smaller than 18 nt were discarded. All high quality sequences, even those with only a single unique read, were considered as significant and further analyzed. Unique small RNA sequences were mapped to cotton TIGR reference sequences by SOAP . Small RNAs derived from rRNAs, tRNAs, snRNAs and snoRNAs deposited at the Rfam and NCBI GenBank databaseshttp://ftp.ncbi.nlm.nih.govwere identified by NCBI blast. In order to determine conserved miRNAs, unique sequences were aligned with known miRNAs from miRBasehttp://microrna.sanger.ac.uk/seguence/index.html(Release 12.0, Sept, 2008) with a maximum of two mismatches, where gaps count as mismatches. Potentially novel miRNAs were identified by folding the flanking genome sequence of unique small RNAs using MIREAPhttps://sourceforge.net/projects/mireap/, followed by the prediction of the secondary structure by mFold 3.1 . The essential criteria  were used for selecting the miRNA candidates,e.g. sequences of miRNA precursors can fold into a hairpin secondary structure that contains the ~21 nt mature miRNA sequence from one arm and miRNA* derived from the opposite arm, both of which form a duplex with two nucleotide, 3' overhangs. For prediction of miRNA targets, the procedure and criteria were followed as described previously [17,48]. More strictly, at most three mismatches between miRNA sequences and potential mRNA targets were allowed in this study. The biological function of the predicted targets was retrieved from the Universal Protein Resourcehttp://www.uniprot.org.
We used the chi-squared test to determine the statistical significance of the differences between the two libraries and applied the Yates correction for one degree of freedom. Our null hypothesis is based on that between the wild-type (expected frequency) and the mutant (observed frequency) there is no significant difference. In order to do so we normalized the total mutant sequence reads to the total high quality reads of the wild-type library. This approach allowed us to determine whether the deviations (the difference between observed and expected) were the result of chance, or whether they were due to other factors. In the case of a calculated probability p < 0.01 we reject our null hypothesis and conclude that a factor other than chance is operating for the deviation to be so great.
This research was financially supported by the National Basic Research Program of China, 863 Project of China (2008AA10Z128).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.