Double feature selection and cluster analyses in mining of microarray data from cotton
© Alabady et al; licensee BioMed Central Ltd. 2008
Received: 11 October 2007
Accepted: 20 June 2008
Published: 20 June 2008
Cotton fiber is a single-celled seed trichome of major biological and economic importance. In recent years, genomic approaches such as microarray-based expression profiling were used to study fiber growth and development to understand the developmental mechanisms of fiber at the molecular level. The vast volume of microarray expression data generated requires a sophisticated means of data mining in order to extract novel information that addresses fundamental questions of biological interest. One of the ways to approach microarray data mining is to increase the number of dimensions/levels to the analysis, such as comparing independent studies from different genotypes. However, adding dimensions also creates a challenge in finding novel ways for analyzing multi-dimensional microarray data.
Mining of independent microarray studies from Pima and Upland (TM1) cotton using double feature selection and cluster analyses identified species-specific and stage-specific gene transcripts that argue in favor of discrete genetic mechanisms that govern developmental programming of cotton fiber morphogenesis in these two cultivated species. Double feature selection analysis identified the highest number of differentially expressed genes that distinguish the fiber transcriptomes of developing Pima and TM1 fibers. These results were based on the finding that differences in fibers harvested between 17 and 24 days post-anthesis (dpa) represent the greatest expressional distance between the two species. This powerful selection method identified a subset of genes expressed during primary (PCW) and secondary (SCW) cell wall biogenesis in Pima fibers that exhibits an expression pattern that is generally reversed in TM1 at the same developmental stage. Cluster and functional analyses revealed that this subset of genes is primarily regulated during the transition stage that overlaps the termination of PCW and onset of SCW biogenesis, suggesting that these particular genes play a major role in the genetic mechanism that underlies the phenotypic differences in fiber traits between Pima and TM1.
The novel application of double feature selection analysis led to the discovery of species- and stage-specific genetic expression patterns, which are biologically relevant to the genetic programs that underlie the differences in the fiber phenotypes in Pima and TM1. These results promise to have profound impacts on the ongoing efforts to improve cotton fiber traits.
Microarray technology provides data in high-dimensional space defined by the size of the genome under investigation. With such high-dimensional data, feature selection methods are essentially classification tools used to identify gene clusters that reveal biologically meaningful relationships. A classical use of feature selection analysis is to identify the most discriminating features or dimensions in a matrix of microarray data. Developing new methods to discriminate between sets of microarray data for both dimensions (time points/conditions) and features (genes) will improve data mining processes that in turn will lead to the discovery of biologically relevant relationships. In cotton fiber genomics, microarrays provide a robust technology for identifying developmentally regulated genes during cotton fiber morphogenesis in the two major cultivated species, G. barbadense L. cv. Pima S7 (Gb) and G. hirsutum L. cv. TM1 (Gh). These two species vary in fiber characteristics and yield: G. barbadense offers superior fiber quality properties such as length, fineness, and strength, while G. hirsutum is characterized by high yield. Breeding programs around the world are working towards developing high-yielding G. hirsutum cultivars with the fiber properties of G. barbadense. In both species, fiber development occurs in four overlapping stages: initiation (-3 to 5 dpa), elongation (3 to 21 dpa), secondary cell wall synthesis (14 to 45 dpa), and maturation (40 to 55 dpa). Despite the similarity in timing and duration of developmental stages, inherent differences in the developmental programs lead to the production of fiber with discrete phenotypic differences. Therefore, elucidating the genetic mechanisms that underlie these differences is crucial to designing strategies for the genetic enhancement of cotton fiber traits with superior Pima characteristics.
In this respect, transcriptome profiling of developing Gb and Gh fibers is pivotal to discovering the specific genetic program that drives the development of fiber in these genotypes. Of more importance is the identification of the developmental signals that trigger differential regulation of biological processes that yield the discrete Gb and Gh phenotypes. Few studies to date have examined fiber genomics at the developmental level in a single cotton species (reviewed in [5, 6]), and no studies have focused on molecular differences between the two species (Gb and Gh) at the transcriptional level. In our lab, stage-specific developmentally regulated genes during fiber morphogenesis were identified independently in Pima and TM1 (Alabady and Wilkins, in preparation). In this study, we describe a novel application of feature selection analysis to simultaneously select between both features (genes) and dimensions (time points) of the developmental transcriptome of the two species. This novel application is termed "double feature analysis" as it enables simultaneous selection between features and dimensions in an unsupervised learning context, and therefore differs from more traditional feature selection, which selects within only one variable.
The objectives of this study were to 1) Discover the top discriminating genes between the transcriptomes of Pima and TM1 fibers at the most distant developmental time points using the novel application of double feature selection (DFS) analysis, and 2) Determine the intersecting genes between developmentally regulated fiber profiles (Pima and TM1) and the top differentially expressed genes identified by DFS, and to link the intersected genes to specific stages of fiber development.
Reliability of the microarray data and design
Developmentally regulated expression profiles in Pima and TM1
Table 1. Developmental dissection of developmentally regulated genes in Pima and TM1 fiber transcriptomes revealed species- and stage-specific clusters.
Columns: No. of genes (%) | No. of clusters | Stage-specific clusters (genes)
Pima developmentally regulated profile
Up¹ during PCW synthesis only (4.65) | 1 (323), 5 (239)
Up¹ during SCW deposition only (3.65) | 2 (232), 9 (209)
Up¹ in PCW and SCW (3.31) | 7 (77), 10 (86),
Down² in PCW and SCW (10.76) | 3 (493), 6 (588), 8 (217)
TM1 developmentally regulated profile
Up¹ during PCW synthesis only (2.95) | 5 (234), 8 (122)
Up¹ during SCW deposition only (4.64) | 6 (160), 7 (400)
Up¹ in PCW and SCW (2.92)
Down² in PCW and SCW (8.38) | 2 (392), 3 (517), 4 (103)
Top differentially expressed genes between Pima and TM1 profiles of fiber transcriptome
Pima-specific pattern (3.85)
TM1-specific pattern (5.81)
Differential expression profiles in Pima versus TM1 fibers
Top discriminating genes partially intersect with developmentally regulated genes
Identifying the intersecting genes between the developmentally regulated profiles identified by independent analysis of the Pima and TM1 transcriptomes and the top discriminating genes identified by DFS serves two purposes: it assesses the level of significance in the top discriminating genes, and it identifies species-specific differentially regulated profiles. A custom Perl script was developed to identify intersecting genes between the following three data sets: developmentally regulated fiber genes from Pima, developmentally regulated fiber genes from TM1, and the DFS top discriminating genes (Figure 2D). Three intersecting gene sets of interest were generated. Set 1 includes 125 genes that intersect the top differentially expressed genes identified by DFS and both developmental profiles (Additional file 1), and represents highly distant differentially expressed genes between the developmentally regulated fiber profiles of Pima and TM1. Specific to the Pima developmental profile, Set 2 includes 338 genes that intersect with the top DFS differentially expressed genes and were not significantly expressed in TM1. Similarly, Set 3 includes 240 genes that intersect with TM1 developmentally regulated genes and the top DFS differentially expressed genes, and were not significantly expressed in Pima (Figure 2D). These three gene sets are anticipated to be key to elucidating the genetic mechanisms that underpin phenotypic differences in the physical fiber traits of the two species. The expression patterns of Set 1 genes in Pima fiber were completely opposite to those of TM1, and likely reflect genetic differences in developmental programming that account for the genotypic differences responsible for the fiber phenotype.
Moreover, the fact that Set 1 genes are significantly regulated in a manner that is diametrically opposed in the two species suggests that the expression level of these genes might be involved in triggering or stimulating the expression of genes in Set 2 (Pima-specific) and Set 3 (TM1-specific). In this article, functional analysis of genes in Set 1 (Additional file 1) is the main focus of the discussion.
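The three-way intersection described above can be sketched in a few lines. This is a minimal illustration in Python, not the custom Perl script used in the study; the function name `intersect_gene_sets` and the toy gene identifiers are hypothetical, and the sketch assumes that Set 1 is the overlap of all three data sets while Sets 2 and 3 are the DFS genes shared with only one developmental profile.

```python
def intersect_gene_sets(pima_dev, tm1_dev, dfs_top):
    """Split the DFS top genes by their overlap with the developmental profiles.

    pima_dev, tm1_dev: IDs of developmentally regulated genes in each species.
    dfs_top: IDs of the top discriminating genes from double feature selection.
    """
    pima_dev, tm1_dev, dfs_top = set(pima_dev), set(tm1_dev), set(dfs_top)
    set1 = dfs_top & pima_dev & tm1_dev    # regulated in both species
    set2 = (dfs_top & pima_dev) - tm1_dev  # Pima-specific
    set3 = (dfs_top & tm1_dev) - pima_dev  # TM1-specific
    return set1, set2, set3


# Toy identifiers, for illustration only
set1, set2, set3 = intersect_gene_sets(
    pima_dev={"g1", "g2", "g3"},
    tm1_dev={"g1", "g4", "g5"},
    dfs_top={"g1", "g2", "g4", "g6"},
)
print(sorted(set1), sorted(set2), sorted(set3))
```

Using sets keeps each lookup constant-time, which matters little for a few thousand genes but makes the intent of each gene set explicit.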
Species and stage specific gene profiles
Table 2. Developmental, function and pathway analyses of 125 cotton fiber genes (Set 1) identified by expression profiling and double feature selection analysis, highlighting the top represented GO pathways.
Columns: Expression pattern in Pima | Expression pattern in TM1 | Top pathway (P value) | Top GO (P value)
Slightly up in SCW | Highly up in SCW | Reductive carboxylase cycle (0.013) | Response to biotic stimulus (0.02)
Down in SCW | Slightly up in the overlapped transition from PCW to SCW | Stilbene, coumarine and lignin biosynthesis (0) | Purine nucleotide binding (0.025)
Up in both PCW and SCW | Down in both PCW and SCW | Starch and sucrose metabolism (0.03) | Sucrose metabolism (0.02)
Up in PCW and down in SCW | Up in both PCW and SCW | VEGF signaling pathway (0.02) | Serine-type peptidase activity (0.0003)
Down in the transition from PCW to SCW and up in SCW | Up in the PCW and down in the transition from PCW to SCW | Phenylalanine metabolism (0.004) | Transferase activity (0.01)
Although developmental programming of fiber morphogenesis is very similar in both Pima and TM1, species-specific differences in the structure and composition of the cell walls produce discrete fiber phenotypes and fiber quality traits. It is crucial to discover the genes responsible for governing fiber properties, and especially those genes diagnostic of the Pima fiber phenotype, for the genetic enhancement of fiber quality. The main objective here was to apply double feature selection analysis to microarray data as a means of identifying transcriptional differences between developing Pima and TM1 fibers, and to gain novel insight into the mechanisms that underlie the phenotypic differences. From a global perspective, developmentally regulated genes during fiber morphogenesis only partially overlap in Pima and TM1, indicating that discrete genetic mechanisms governing fiber morphogenesis are involved. The higher percentage of the transcriptome that is specifically regulated in developing Pima fiber (17.24%) relative to TM1 (11.75%) is consistent with genetic mapping studies showing a decided bias for Pima alleles in interspecific mapping populations. Moreover, the portions of the transcriptome contributing to each of the developmental stages are higher in Pima than in TM1 for all stages of fiber development, with the surprising exception of the SCW stage, in which 4.64% of the transcriptome is specific to SCW synthesis in TM1 as opposed to only 3.65% in Pima (Table 1). Given that the superiority of Pima fiber quality relative to TM1 appears to be primarily controlled by cellulose synthesis and associated metabolic processes during SCW biogenesis, this difference lends even greater credence to our findings. This study provides the first compelling evidence that developmental programming of fiber morphogenesis in Pima and Upland cottons is linked to discrete genetic mechanisms that govern the fiber phenotype (Table 1).
Such information will prove vital to molecular breeding programs that focus on the genetic enhancement of fiber quality. The identification of developmentally-regulated, stage- and species-specific gene clusters in this study is the first step toward developing a comprehensive understanding of fiber development at the molecular genetics level. Moreover, genetic dissection of the fiber transcriptome into stage- and species-specific profiles paves the way for addressing important biological questions in plant cell development and applications in agricultural biotechnology to improve fiber traits.
The novel application of feature selection analysis established that the most distant time points, in terms of gene expression, between Pima and TM1 development occur at 17 and 24 dpa (Figure 2A). With a small genotype-based variation in developmental timing and duration of the PCW/SCW transition stage, the interval between 17 and 24 dpa represents the general frame for the PCW/SCW transition stage. The duration of this stage is thought to be species-specific and longer in species that produce longer fibers, which will have a direct influence on both fiber length and strength. This explains the importance of identifying the top discriminating genes at these two time points between the transcriptomes of the two genotypes. Based on expression profiles of the fiber transcriptome at these two points, the top discriminating genes (1167) between Pima and TM1 are naturally classified into Pima-specific (3.85%) and TM1-specific (5.81%) groups, as identified by K-means clustering and supported by SVD analysis (Figures 2B and 2C). The reliability and biological relevance of the results prove that double feature selection analysis offers a powerful new tool with applications in data mining of microarray data. Species-specific regulation patterns represented 60.21% of the top discriminating genes: 28.96% Pima-specific, 20.56% TM1-specific, and 10.71% regulated in both Pima and TM1 (Figure 2D). The results suggest that genes that intersect in the three data sets are major players of functional importance in fiber developmental programs that, in turn, dictate the physical differences between Pima and TM1 fiber phenotypes. Interestingly, expression patterns of Set 1 genes (10.71%) in Pima fibers are the opposite of the patterns produced in TM1, leading to the speculation that expression of Set 1 genes influences, either directly or indirectly, the execution of genetic programs encoded by gene Sets 2 (28.96%) and 3 (20.56%) in Pima and TM1, respectively.
Therefore, it is possible that differential expression of Set 1 genes between Pima and TM1 may trigger or stimulate the expression of genes in Sets 2 and 3 in a species-specific manner. Thus, Set 1 genes may be key to the successful genetic manipulation of cotton fiber traits. Further classification of Set 1 genes identified five species- and stage-specific gene clusters regulated primarily at the transition stage between PCW and SCW synthesis, and the early phases of SCW biogenesis (Table 2). Functional analysis of these five species- and stage-specific clusters unveiled major pathways that are differentially regulated in Pima in contrast to TM1 fibers. Cotton fiber cell walls are >96% cellulose at maturity, and differential regulation of carbohydrate metabolism is no doubt a critical component in determining the fiber phenotype, which in turn hinges on the structure and composition of the PCW and SCW. Therefore, the fact that sucrose metabolism (p = 0.02) is up-regulated during PCW and SCW in Pima versus TM1 and is associated with higher sucrose synthase activity is an important discovery. Sucrose synthase is thought to interact with the cellulose synthase complex, presumably acting as a metabolic channel that converts sucrose into glucose via UDP-glucose to add sugar moieties to the growing glucan chain during cellulose synthesis. Therefore, differential expression of sucrose synthase may be key to the development of superior Pima fibers. Similarly, phenolic synthesis pathways (PAL and coumarin) are up-regulated during PCW and PCW/SCW overlapping stages in TM1, whereas only the PAL pathway is up-regulated from 21 to 24 dpa in developing Pima fibers. It was reported that coumarin inhibits cellulose synthesis in both PCW and SCW biogenesis in in vitro ovule cultures.
Based on this and our results, we hypothesize that the up-regulation of the coumarin biosynthesis pathway inhibits PCW extension in TM1 relative to Pima fibers, in which coumarin biosynthesis is down-regulated. Increasing PAL activities result in elevated levels of ferulic acid (FA), which in turn may lead to an increase in the level of cell wall-bound diferulic acid to curtail cell wall extensibility during the elongation stage. In contrast, the up-regulation of PAL activities in the early stages of Pima SCW biogenesis suggests a major role in the termination of the cell elongation phase, and in determining important properties such as fiber length. Similar results were obtained by Wakabayashi et al., who showed that abscisic acid curtails the extensibility of the cell wall of wheat coleoptiles by decreasing cell wall-bound ferulic and diferulic acid.
The novel application of double feature selection (DFS) combined with cluster analyses to mine independent microarray experiments proved effective in discovering new biologically relevant information not previously detected in microarray data. DFS revealed major biological processes that were linked to stage-specific events during cotton fiber development. Important metabolic processes, including sucrose metabolism and the phenylpropanoid pathways (PAL and coumarin), are also developmentally and differentially regulated in a genotype-specific manner. Based on these results, we propose that these processes in particular play a crucial, stage-specific role that in turn profoundly influences genotypic differences in fiber characteristics.
In independent experiments, two representative genotypes of cultivated cotton species, G. barbadense L. cv. Pima S7 (Gb) and G. hirsutum L. cv. TM1 (Gh), were grown in a randomized block design under identical greenhouse conditions. Total RNA was extracted from single-celled developing fibers harvested at 5, 8, 10, 14, 17, 21 and 24 days post-anthesis (dpa). Developmental profiles of the cotton fiber transcriptome were generated on an oligonucleotide microarray platform (NCBI-GEO Accession GPL6917) using a double loop experimental design that included a dye swap hybridization strategy (Figure 1A). A total of 28 direct microarray hybridizations were performed per genotype to generate the transcriptome profiles. Custom Perl and Python scripts were employed to filter microarray raw data prior to normalization and statistical analysis using Linear Models for Microarray Data (LIMMA) in the R (version 1.9.0) statistical software package. Processed and raw data for the Gb and Gh microarray experiments were deposited in NCBI-GEO in MIAME-compliant format with the accession numbers GSE11689 and GSE11693, respectively.
Data filtration and normalization
Data were filtered by applying the following steps: 1) low-quality spots were manually flagged; 2) background noise on each sub-array was reduced by subtracting the mean background intensity from the spot signal intensities; 3) corrected spot intensities lower than the mean of corrected buffer spots plus 3 standard deviations were excluded; 4) the absolute ratios (M = R/G) of dye swaps were inverted and the upper 30% quantile of divergent data points was removed, a cut-off that enhanced the correlation between dye swaps by increasing the linearity; and 5) all genes in which fewer than 60% of the data points passed all the filtration steps were excluded. In order to adjust for the effects that arise from variation inherent to microarray technology rather than biological differences in the RNA samples, filtered data were normalized using intensity-dependent normalization. The robust scatter plot smoother 'Lowess', as implemented in R and LIMMA, was used to perform a local A-dependent normalization. To make full use of within-array duplicated spots, LIMMA's pooled correlation method was used to estimate the strength of the correlation between duplicated spots by fitting separate linear models to the expression data for each gene, but with a common value for between-replicate correlations.
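Filtration steps 2, 3 and 5 above can be illustrated with a short Python sketch. It is not the authors' Perl/Python code: the function names and the numeric values are hypothetical, the sample standard deviation is assumed for the buffer spots, and step 5 is read as retaining a gene only when at least 60% of its data points pass the earlier filters.

```python
from statistics import mean, stdev


def passes_signal_filter(spot, background, buffer_corrected):
    """Steps 2-3: background-correct a spot and compare it with the buffer spots.

    buffer_corrected: background-corrected intensities of the buffer spots.
    A spot is kept when its corrected intensity is at least mean + 3 SD.
    """
    corrected = spot - background
    threshold = mean(buffer_corrected) + 3 * stdev(buffer_corrected)
    return corrected >= threshold


def keep_gene(passed_flags, min_fraction=0.6):
    """Step 5 (as assumed here): keep a gene if enough data points passed."""
    return sum(passed_flags) / len(passed_flags) >= min_fraction


# Hypothetical intensities: buffer mean 12, SD 2, so the threshold is 18
buffer_spots = [10.0, 12.0, 14.0]
print(passes_signal_filter(130.0, 100.0, buffer_spots))  # corrected = 30
print(passes_signal_filter(115.0, 100.0, buffer_spots))  # corrected = 15
print(keep_gene([True] * 6 + [False] * 4))
```

The dye-swap quantile filter of step 4 is omitted because the text does not specify how divergence between swapped ratios was measured.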
Data accuracy and reliability
The accuracy and reliability of the microarray data were evaluated through the following steps: 1) the quality of the microarray data for every hybridization was assessed before and after normalization via MA scatter plots of log2 ratio versus log2 amplitude signals; 2) dye bias was adjusted based on the correlation between Cy3 and Cy5 signals from self-hybridization control experiments; and 3) the reliability of the double loop design was determined by estimating the linearity between data derived from direct and indirect routes, and between two indirect routes in the double loop design (Figure 1).
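For readers unfamiliar with MA plots, the two plotted quantities can be computed as in the following Python sketch. It uses the conventional definitions for two-colour arrays, M = log2(R/G) and A = (log2 R + log2 G)/2, which are assumed here rather than stated in the text; the function name `ma_values` is hypothetical.

```python
from math import log2


def ma_values(red, green):
    """Return (M, A) for one spot from its red (Cy5) and green (Cy3) intensities.

    M is the log2 ratio; A is the mean log2 intensity (the "amplitude" axis).
    Both intensities must be positive.
    """
    m = log2(red / green)
    a = 0.5 * (log2(red) + log2(green))
    return m, a


# A spot four times brighter in the red channel than in the green channel
print(ma_values(8.0, 2.0))
```

After a successful intensity-dependent normalization, the cloud of M values is expected to be centred on zero across the range of A.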
Linearization and significance analysis
Normalized high-quality data were fitted to seven linear models to identify significantly differentially regulated genes at each developmental stage relative to each of the other time points (global reference). The empirical Bayes (eBayes) method in LIMMA, which computes moderated t-statistics, moderated F-statistics, and log-odds of differential expression, was applied to assess the significance of differential expression at each time point. A correction for multiple testing using False Discovery Rates (FDR) was applied, and significant changes in gene expression were limited to p < 0.05. After identifying the significantly expressed genes in all linear models, the following data sets were created for each genotype: 1) transcriptome profiles that include the expression coefficient relative to the expression at 5 dpa for all genes, as calculated from all data points (8 per gene per time point) generated by the double loop hybridization design, and 2) developmentally regulated profiles that include all significantly differentially expressed genes during fiber development relative to the expression at 5 dpa.
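The multiple-testing step can be illustrated with the Benjamini-Hochberg step-up adjustment, written here in plain Python. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

A gene is then called significant when its adjusted value falls below the chosen threshold, 0.05 in this study.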
Double feature selection
Feature selection has been used in many microarray data analyses in both supervised learning (classification) and unsupervised learning (clustering) contexts. Feature selection in our analysis is in the context of clustering. Double feature selection analysis was developed to discover the developmental time point(s) at which the greatest expression distance (dissimilarity) occurred between the fiber developmental stages of the two genotypes, with simultaneous identification of the most differentially expressed genes between the transcriptomes of Pima and TM1 fibers. For this purpose, the expression matrix of the fiber transcriptome was used for each genotype. The novel double feature selection was developed as follows:
For each combination, the summation of D(i), for i = 1, ..., 12,063, was computed. The time points in the combination that produces the highest summation of D values are the most discriminating time points. D was normalized by the number of time points in the combination. Two D values cannot be compared if their numbers of time points are different. That is, we cannot directly compare a D from a combination of two time points with a D from three time points. In our analysis, we considered the most discriminating combination of two time points. Additional theoretical work would be needed if one wants to pick the most discriminating time points from all possible time point combinations without relying on prior knowledge of cotton fiber development. To investigate the discriminatory power of the double feature selection analysis, the top discriminatory genes (D ≥ 2) between Pima and TM1 fiber transcriptomes were clustered using K-means, with the silhouette score used as a measure of clustering goodness; this score combines both within-cluster cohesion and between-cluster separation. In detail, for the $i$th object (gene), the silhouette score $s_i$ is defined as $s_i = (b_i - a_i)/\max(a_i, b_i)$, where $a_i$ is the average distance to all other objects in the cluster to which the $i$th object belongs, and $b_i$ is the minimum, over the clusters not containing the $i$th object, of the average distance between the $i$th object and all the objects in that cluster. In the K-means clustering, each object is associated with a silhouette score. The average of these scores ranges from -1 to 1, where 1 is a perfect clustering, 0 is a random clustering, and -1 is a bad clustering. Moreover, the expression distance matrix of the top discriminating genes was assessed for the presence of natural clusters using the singular value decomposition.
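The silhouette score defined above can be computed directly from its definition, as in the following Python sketch. Euclidean distance is assumed because the distance measure used for this step is not stated; the function name, the zero score for singleton clusters, and the toy expression values are likewise illustrative choices rather than details from the study.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return one silhouette score per point (requires at least two clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(d) / len(d)
            for d in (
                [dist(p, q) for j, q in enumerate(points) if labels[j] == c]
                for c in clusters if c != labels[i]
            )
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy expression values at two time points for four genes in two clusters
genes = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
scores = silhouette_scores(genes, [0, 0, 1, 1])
print(sum(scores) / len(scores))
```

Averaging the per-gene scores gives the single value used to compare clusterings; assigning the same toy genes to mismatched clusters drives the average below zero.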
Cluster analysis of developmentally regulated profiles
Differentially regulated genes in Pima and TM1 developing fibers were independently clustered using K-means. Similarly, the intersected genes between Pima developmentally regulated, TM1 developmentally regulated, and the top discriminating genes developed by double feature selection were clustered based on expression similarity. K-means clustering using Pearson correlation as the similarity measure and 100 iterations was applied. This cluster analysis identified different stage-specific expression patterns in each gene set, and therefore linked the expression data with the various developmental stages in Pima and TM1 fibers.
Biological relevance of the results
Genes were functionally analyzed in three steps: 1) the highest level of differential expression in each pattern was correlated to the corresponding fiber developmental stage and hence to the major biochemical processes within the stage, 2) major cellular activities, biological processes, and molecular functions were identified using GO annotation, and 3) metabolic pathways and associated genes were identified, when possible, using KEGG pathways.
DFS: double feature selection
PCW: primary cell wall
SCW: secondary cell wall
SVD: singular value decomposition
The authors gratefully acknowledge the contribution of Brad A Sickler to performing part of the microarray experiments and funding by the Texas Governor's Emerging Technology Superior Research Award granted to T.A.W.
- Xing EP, Jordan MI, Karp RM: Feature selection for high-dimensional genomic microarray data. Proceedings of the Eighteenth International Conference on Machine Learning. 2001, 601-608.
- John GH, Kohavi R, Pfleger K: Irrelevant features and the subset selection problem. 1994, New Brunswick, NJ, USA, Morgan Kaufmann, 129.
- Jirapech-Umpai T, Aitken S: Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes. BMC Bioinformatics. 2005, 6 (1): 148. 10.1186/1471-2105-6-148.
- Loguercio LL, Zhang JQ, Wilkins TA: Differential regulation of six novel MYB-domain genes defines two distinct expression patterns in allotetraploid cotton (Gossypium hirsutum L.). Molecular and General Genetics MGG. 1999, 261 (4): 660-671. 10.1007/s004380050009.
- Wilkins TA, Arpat AB: Mini Review The cotton fiber transcriptome. 2005, Blackwell Synergy, 124 (3): 295.
- Lee JJ, Woodward AW, Chen ZJ: Gene Expression Changes and Early Events in Cotton Fibre Development. Ann Bot (Lond). 2007.
- Rosenzweig BA, Pine PS, Domon OE, Morris SM, Chen JJ, Sistare FD: Dye-Bias Correction in Dual-Labeled cDNA Microarray Gene Expression Measurements. 2004, National Institute of Environmental Health Sciences, 112 (4): 480-488.
- Rousseeuw P: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. 1987, Elsevier Science Publishers BV Amsterdam, The Netherlands, 20 (1): 53-65.
- Golub GH, Van Loan CF: Matrix Computations. 1996, Johns Hopkins University Press.
- Paterson AH, Saranga Y, Menz M, Jiang CX, Wright R: QTL analysis of genotype × environment interactions affecting cotton fiber quality. Theor Appl Genet. 2003, 106 (3): 384-396.
- Beasley CA: Cellulose content in fibers of cotton, which differ in their lint lengths and extent of fuzz. Physiologia Plantarum. 1973, 45: 6.
- Amor Y, Haigler CH, Johnson S, Wainscott M, Delmer DP: A Membrane-Associated Form of Sucrose Synthase and Its Potential Role in Synthesis of Cellulose and Callose in Plants. PNAS. 1995, 92 (20): 9353-9357. 10.1073/pnas.92.20.9353.
- Montezinos D, Delmer DP: Characterization of inhibitors of cellulose synthesis in cotton fibers. 1980, Springer, 148 (4): 305-311.
- Wakabayashi K, Hoson T, Kamisaka S: Abscisic acid suppresses the increases in cell wall-bound ferulic and diferulic acid levels in dark-grown wheat (Triticum aestivum L.) coleoptiles. 2007, Jpn Soc Plant Physiol, 38 (7): 811-817.
- Arpat A, Waugh M, Sullivan JP, Gonzales M, Frisch D, Main D, Wood T, Leslie A, Wing R, Wilkins T: Functional genomics of cell elongation in developing cotton fibers. Plant Mol Biol. 2004, 54 (6): 911-929. 10.1007/s11103-004-0392-y.
- Kerr MK, Churchill GA: Statistical design and the analysis of gene expression microarray data. 2001, Cambridge University Press, 77 (02): 123-128.
- The R project for statistical computing. [http://www.r-project.org/]
- Hüser AT, Becker A, Brune I, Dondrup M, Kalinowski J, Plassmeier J, Pühler A, Wiegräbe I, Tauch A: Development of a Corynebacterium glutamicum DNA microarray and validation by genome-wide expression profiling during growth with propionate as carbon source. 2003, Elsevier, 106 (2-3): 269-286.
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30 (4): e15. 10.1093/nar/30.4.e15.
- Smyth GK, Yang YH, Speed T: Statistical Issues in cDNA Microarray Data Analysis. 2003, Humana Press.
- Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. 2004, bepress, 3 (1): 3.
- Reiner A, Yekutieli D, Benjamini Y: Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics. 2003, 19 (3): 368-375. 10.1093/bioinformatics/btf877.
- Konig R, Baldessari D, Pollet N, Niehrs C, Eils R: Reliability of gene expression ratios for cDNA microarrays in multiconditional experiments with a reference design. Nucl Acids Res. 2004, 32 (3): e29. 10.1093/nar/gnh027.
- Han J, Kamber M: Data Mining: Concepts and techniques. 2006, Morgan Kaufmann.
- The Gene Ontology. [http://www.geneontology.org/]
- KEGG pathway database. [http://www.genome.ad.jp/kegg/pathway.html]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.