- Research article
- Open Access
Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome
© Aubourg et al; licensee BioMed Central Ltd. 2007
- Received: 13 July 2007
- Accepted: 02 November 2007
- Published: 02 November 2007
Since the completion of the Arabidopsis thaliana genome sequence, the Arabidopsis community and the annotation centers have been working to improve gene annotation at the structural and functional levels. In this context, we used the large CATMA resource on the Arabidopsis transcriptome to search for genes missed by the different annotation processes. Probes on the CATMA microarrays are specific gene sequence tags (GSTs) based on the CDS models predicted by the Eugene software. Among the 24 576 CATMA v2 GSTs, 677 lie in regions considered intergenic by the TAIR annotation. We analyzed the cognate transcriptome data in the CATMA resource and carried out data mining to characterize novel genes and improve gene models.
The statistical analysis of the results of more than 500 hybridized samples distributed among 12 organs provides an experimental validation for 465 novel genes. The hybridization evidence was confirmed by RT-PCR for 88% of the 465 novel genes. Comparisons with the current annotation show that these novel genes often encode small proteins, with an average size of 137 aa. Our approach has also led to the improvement of pre-existing gene models through both the extension of 16 CDS and the identification of 13 gene models erroneously consisting of two merged CDS.
This work is a significant step forward in the improvement of the Arabidopsis genome annotation. We increased the number of validated Arabidopsis genes by 465 novel transcribed genes, to which we attached several functional annotations such as expression profiles, sequence conservation in plants, cognate transcripts and protein motifs.
- Transcriptome Data
- Arabidopsis Genome
- Arabidopsis Gene
- Normalize Logarithm Intensity
- Hybridization Threshold
Since the completion of the whole-genome sequencing of the model plant Arabidopsis thaliana and its first annotation by the international Arabidopsis community, gene prediction results have been regularly updated. Indeed, MIPS and TIGR have made a new annotation release available each year, taking into account the completion of the genome sequence, the improvement of gene prediction tools and the increasing number of transcript sequences in the databases. The latest version is based on recent annotation carried out by TAIR. In addition to this global semi-automatic annotation, other studies have improved Arabidopsis gene detection using orphan ESTs [5, 6], comparative genomics [7, 8], or the combination of data through expert curation of gene families.
In the framework of the European CATMA project [10, 11], a microarray was produced with 24 576 specific gene sequence tags (GSTs). These GSTs were defined from the Arabidopsis genome sequence to be highly specific in order to minimize cross-hybridization. The GST design was based not only on the TIGR annotation, but also on the predictions of protein-coding genes obtained with the Eugene v1.0 software. Indeed, by combining different sources of information (transcripts, splicing sites, translation initiation sites, coding potential and protein similarities), Eugene has provided an alternative Arabidopsis genome annotation. Compared with the TAIR version 6.0 annotation release, the CATMA v2 GSTs tag 21 260 Arabidopsis TAIR genes and 677 regions defined up to now as intergenic. These 677 GSTs, specific to the CATMA resource, are excellent tools to reveal possible under-predicted functional genes in Arabidopsis. Furthermore, several predicted genes are tagged by at least 2 distinct GSTs, most often one overlapping each end of the gene. Previous work on gene annotation pointed out that erroneous gene merging is a common shortcoming of gene predictors [14, 15]. With different GSTs associated with the same gene, we have a powerful way to identify such situations.
Publicly available transcriptome data produced with the CATMA microarrays were used to investigate these questions. The dataset of 1044 hybridizations using 522 different samples covers numerous developmental stages, biotic and abiotic stresses and mutants. All the microarray experiments were performed in our laboratory with a standardized protocol for labeling, hybridization, data normalization and statistical analysis, which ensures that the data are homogeneous.
Selection of candidate GSTs
Characterization of novel genes
Sequence comparisons at the protein level and a search for PFAM motifs were applied to each newly identified gene. For 215 genes (46%), significant similarities were detected with at least one other locus in the Arabidopsis genome and/or with proteins from other species, indicating that they belong to known gene families (Figure 3). Nevertheless, inference of function by similarity could be made for only 71 genes (15%), and the remaining 394 genes encode proteins of unknown biochemical function. Surprisingly, 86 genes (18%) were previously annotated by AGI members at the BAC scale (Figure 3), but their models were ignored in the whole-genome annotation done later, probably because of poor supporting data.
In 61% of the cases, the latest Eugene v1.59 annotation provided a gene model. In the remaining 39%, we have evidence of transcriptional units overlapping the GST position but no additional information on their intron-exon structure. Between the Eugene version used to design the CATMA GSTs and the latest Eugene version, the number of false positive predicted genes decreased but some true positive genes were lost.
The topological distribution of the 465 novel genes is quite similar to that of all Arabidopsis coding genes. They are evenly distributed across the 5 chromosomes and are rarely present in the peri-centromeric regions or other identified heterochromatic regions.
In 16 additional cases, expression signals associated with candidate GSTs highlighted an erroneous annotation of the neighboring gene and led to the improvement of gene models by significant extension of their respective CDS. The extension of these 16 CDS (by one to four exons at the 3' or 5' end) is always confirmed by the coherent extension of similarities with homologous proteins (see Additional File 2).
Expression of novel genes
Erroneous gene merging
The CATMA microarrays, based on both Eugene v1.0 and TIGR annotations, allowed us to discover 465 novel genes and to improve 29 gene models (16 CDS extensions and 13 gene splits). Furthermore, the analysis of the transcriptome data from 522 hybridized samples adds a functional dimension by documenting numerous expression conditions for these novel and corrected genes. The biological and biochemical roles of the large majority of the novel genes remain unknown, since only 15% of them share similarities with proteins of known function (Figure 3). However, the analysis of the large transcriptome dataset available through CATdb may provide the first insights into their functions. Inference of function for unknown genes by such a compendium approach has already been successfully reported in yeast.
The fact that the Eugene Markov model detects a high coding potential at these loci suggests that the novel genes encode proteins, with a short mean size (Figure 4), and are not RNA genes or long extensions of neighboring gene UTRs. Despite recent work based on different methods [20, 24], our results show that the "intergenic" fraction of the Arabidopsis genome is reduced further by the discovery of short genes expressed under a limited number of conditions. An application of the Affymetrix tiling array has recently highlighted novel transcribed regions in the Arabidopsis genome. The intersection with our results concerns only 16 genes. The fact that the tiling approach missed several novel genes detected by our CATMA-based approach might be explained by the comparatively limited number of mRNA samples used by Hanada et al. In April 2007, TAIR released the 7th version of the genome annotation, with 681 new genes compared to the previous one. Only 70 of the 465 novel genes identified in this work have been re-annotated at the structural level. As expected, these 70 genes are mainly those supported by cognate transcript sequences (see Additional File 1). Together, these results illustrate that annotation is a long and difficult task, and that many years are needed after the first release of the sequence of a complex eukaryotic genome to obtain (nearly) full knowledge of its gene content. Even 7 years after the publication of the complete sequence of the 5 Arabidopsis chromosomes, this goal has not yet been achieved. As our work shows, further progress requires combining several complementary approaches based on high-throughput experimental work and ab initio predictions.
Due to the diversity of the possible approaches, in terms of confidence level and information content, the integration of the results is a process of increasing complexity that benefits a large community through the step-by-step updates of the Arabidopsis gene annotation previously done by TIGR and pursued by TAIR.
The transcriptome data used in this work were all produced with the CATMA v2 microarray. They include 522 hybridized samples from 40 different projects, which cover 12 organ types: cells (61 samples), protoplasts (18), roots (78), hypocotyls (28), stems (10), leaves (136), flowers (10), mature pollen (2), siliques (4), seeds (16), aerial parts (40) and whole plants (119). Hybridizations include 49 specific developmental conditions, i.e. specific developmental stages and organs, 39 mutants and 63 different abiotic/biotic stresses or treatments. All the transcriptome data are available in the CATdb database. They have also been deposited in either the NCBI GEO or the EBI ArrayExpress repository (see Additional File 4).
For each CATMA array, the raw data are the logarithm of the median feature pixel intensity at 635 nm (red) and 532 nm (green); no background is subtracted. A normalization per array is performed to remove systematic biases. First, spots that are considered badly formed features are excluded. Then, a global intensity-dependent normalization is performed using the lowess procedure to correct the dye bias. Finally, for each block, the median of the log-ratios over the entire block is subtracted from each individual log-ratio value to correct block effects (print-tip, washing and/or drying effects). At the end of the normalization step, a normalized log-ratio, i.e. an expression difference (in log base 2) between the two samples co-hybridized on the same array, is given for each spot. A normalized logarithm intensity for each sample is also calculated, according to the within-array correction proposed by Yang and Thorne.
Since each comparison of two samples is performed as a dye-swap, the log-ratio between the two co-hybridized samples is defined as the average of the normalized log-ratios of the two arrays of the dye-swap, and the intensity signal of a sample is defined as the average of the normalized logarithm intensities of the two arrays of the dye-swap.
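The last normalization step and the dye-swap averaging described above can be sketched as follows. This is a minimal illustration only: the function names and the toy values are hypothetical, the lowess dye-bias correction is deliberately left out, and the two arrays of a dye-swap are assumed to have already been oriented in the same direction (same sample in the numerator).

```python
from statistics import median


def block_median_center(log_ratios, blocks):
    """Subtract from each log-ratio the median log-ratio of its block."""
    by_block = {}
    for value, block in zip(log_ratios, blocks):
        by_block.setdefault(block, []).append(value)
    block_medians = {block: median(values) for block, values in by_block.items()}
    return [value - block_medians[block] for value, block in zip(log_ratios, blocks)]


def dye_swap_average(array_1, array_2):
    """Average per-spot values from the two arrays of a dye-swap.

    Assumption: both lists are in the same spot order and the log-ratios
    are already expressed in the same orientation on both arrays.
    """
    return [(a + b) / 2 for a, b in zip(array_1, array_2)]


# Toy example: six spots printed in two blocks (block ids 0 and 1).
centered = block_median_center([1.0, 2.0, 3.0, 10.0, 12.0, 14.0], [0, 0, 0, 1, 1, 1])
averaged = dye_swap_average([1.0, -0.5], [0.5, -1.5])
```

The same averaging function applies to both quantities defined in the text: the normalized log-ratios and the normalized logarithm intensities.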
Determination of the hybridized GSTs
We have developed a new statistical procedure to determine the set of probes whose intensity signal is considered significant, since existing procedures either rely on an arbitrary threshold based on an estimate of the local background or require knowledge of a population of non-hybridized probes. Our procedure is divided into two steps. The first step consists of estimating the intensity distribution using mixture models. The use of a mixture of distributions is natural, as each component of the mixture can be interpreted as a cluster of probes with similar signal intensities. The histograms under study have two characteristics: first, the signal is bounded, the lower bound being linked to the auto-fluorescence of probes; second, a large number of probes have a signal close to the lower bound. This leads to asymmetric histograms with a left peak. For this reason, we use a truncated Gaussian mixture model in order to model the peak indirectly. The introduction of truncation parameters allows us to re-weight the densities on a compact support whose bounds are defined by the minimal and maximal values of the intensity signal. The model parameters are estimated with a modified EM algorithm. Specifically, we modified the traditional EM algorithm proposed by Dempster by including a fixed-point algorithm in the M-step to take into account the bias in the empirical estimators. To best fit the histogram, a collection of mixture models of untruncated, left-, right- and left-right-truncated Gaussian distributions is considered and, for each of them, the number of components varies between 1 and 5. The best model is chosen using the Bayesian Information Criterion (BIC). The second step of our procedure is to define a hybridization threshold from the estimated density based on the components of the mixture.
It is done as follows: when intensity values are ranked in descending order, the hybridization threshold is the first intensity value that the maximum a posteriori (MAP) rule does not assign to the component with the highest mean and for which one of the calculated posterior probabilities is greater than 10⁻⁴. Once the threshold is defined, an intensity signal is declared significant when it is greater than the hybridization threshold, and the associated GST is declared hybridized. Otherwise, the intensity signal is not significant and we consider that transcription of the corresponding gene is not detected.
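The threshold rule above can be illustrated with a short sketch. It is not the authors' implementation: it assumes the mixture has already been fitted, uses plain (untruncated) Gaussian components instead of the truncated ones, writes the BIC in its common "smaller is better" form, and reads the 10⁻⁴ condition as a numerical guard that fails only when all component densities underflow to zero. All function names and parameter values are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz criterion, 'smaller is better' convention (an assumption)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, weights, means, sds):
    """Posterior probability of each component given x (zeros on underflow)."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    if total == 0.0:
        return [0.0] * len(joint)
    return [j / total for j in joint]


def hybridization_threshold(intensities, weights, means, sds, min_posterior=1e-4):
    """First value, scanning downwards, not assigned by MAP to the top component."""
    top = max(range(len(means)), key=lambda k: means[k])
    for x in sorted(intensities, reverse=True):
        post = posteriors(x, weights, means, sds)
        map_component = max(range(len(post)), key=lambda k: post[k])
        if map_component != top and max(post) > min_posterior:
            return x
    return None  # every value was assigned to the highest-mean component


# Hypothetical two-component fit: a low "background" peak and a high one.
weights, means, sds = [0.6, 0.4], [7.0, 11.0], [0.5, 1.0]
intensities = [6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 10.0, 11.0, 12.0]
threshold = hybridization_threshold(intensities, weights, means, sds)
hybridized = [x for x in intensities if x > threshold]
```

In this toy case the scan stops at 8.0, so the five values strictly above it are declared hybridized. In a full pipeline, `bic` would be evaluated for each candidate mixture (truncation type and 1 to 5 components) and the fit with the best score passed to `hybridization_threshold`.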
Differential analysis for the detection of erroneous gene merging
We focus on distinct GSTs that are supposed to match the same gene, are declared differentially expressed and have log-ratios of opposite signs. To do so, a differential analysis is performed per dye-swap with a paired t-test on the normalized log-ratios. The number of observations per spot is too small to calculate a gene-specific variance. For this reason, the variance of the log-ratios is assumed to be the same for all genes, and spots displaying extreme specific variances (too small or too large) are excluded. The raw P-values are adjusted by the Bonferroni method, which controls the Family Wise Error Rate (FWER). When the Bonferroni P-value is lower than 0.05, the gene is declared differentially expressed. Genes with a missing P-value are genes with a specific variance that is too small or too large, or genes for which only one observation is available, i.e. when the spot corresponding to the gene was a badly formed feature on one of the two arrays.
Searches for cognate transcripts, RACE-PCR products, previously lost annotations, and homologous proteins in Arabidopsis or in other species were carried out by sequence comparisons (BLASTn or BLASTp) against GenBank Release 159. Additional information such as PFAM motifs, MPSS tags, GST positions and Eugene v1.59 CDS models was retrieved from the FLAGdb++ database [17, 18].
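A compact sketch of this differential analysis is given below. It makes several simplifying assumptions that are not taken from the text: the common variance is estimated as the mean of the per-GST sample variances, a standard normal reference distribution replaces the t distribution, the filtering of extreme variances is omitted, and the common variance is assumed to be non-zero. GST and gene identifiers are hypothetical.

```python
from statistics import NormalDist


def bonferroni_calls(dye_swap_log_ratios, alpha=0.05):
    """Map each GST to (mean log-ratio, Bonferroni P-value, is_differential).

    dye_swap_log_ratios: dict GST id -> (log-ratio array 1, log-ratio array 2),
    both expressed in the same orientation.
    """
    n_tests = len(dye_swap_log_ratios)
    # With two observations a and b, the sample variance is (a - b)^2 / 2.
    common_var = sum((a - b) ** 2 / 2.0 for a, b in dye_swap_log_ratios.values()) / n_tests
    std_error = (common_var / 2.0) ** 0.5  # standard error of a mean of 2 values
    results = {}
    for gst, (a, b) in dye_swap_log_ratios.items():
        mean = (a + b) / 2.0
        raw_p = 2.0 * (1.0 - NormalDist().cdf(abs(mean / std_error)))
        adjusted_p = min(1.0, raw_p * n_tests)  # Bonferroni adjustment
        results[gst] = (mean, adjusted_p, adjusted_p < alpha)
    return results


def merged_gene_candidates(results, gst_to_gene):
    """Genes whose significant GSTs show log-ratios of opposite signs."""
    by_gene = {}
    for gst, (mean, _, significant) in results.items():
        if significant:
            by_gene.setdefault(gst_to_gene[gst], []).append(mean)
    return sorted(g for g, means in by_gene.items() if min(means) < 0 < max(means))


# Toy dye-swap data for six GSTs tagging three hypothetical gene models.
data = {
    "gst1a": (2.0, 2.2), "gst1b": (-2.1, -1.9),
    "gst2a": (0.1, -0.1), "gst2b": (0.0, 0.2),
    "gst3a": (0.05, -0.05), "gst3b": (-0.1, 0.1),
}
genes = {"gst1a": "geneX", "gst1b": "geneX", "gst2a": "geneY",
         "gst2b": "geneY", "gst3a": "geneZ", "gst3b": "geneZ"}
calls = bonferroni_calls(data)
candidates = merged_gene_candidates(calls, genes)
```

Here only the two GSTs of `geneX` are significant, and they point in opposite directions, which is the signature expected when a single gene model actually covers two independently regulated genes.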
RT-PCR and sequencing
Primers for RT-PCR were designed using Primer3 with the following parameters: primer size 20–21 mers, primer minimum Tm 50°C, primer maximum Tm 65°C, maximum Tm difference 3, primer minimum GC 40%, product minimum size 130, product optimal size 150, product maximum size 200. All other parameters were left at default values. The resulting primer sets are available in supplementary data (see Additional Files 1 and 2). Reverse transcription was performed on 2 μg of total RNA using an oligo(dT) primer (18 mers) and the Superscript II reverse transcriptase (Invitrogen), for 1 hour at 42°C. The enzyme was then heat-inactivated at 65°C and the samples were treated with RNase H. Negative controls were performed without reverse transcriptase (RT-) on each sample with at least twenty primer pairs in order to check for any remaining DNA contamination. PCR amplifications were carried out from 2 μl of the RT product in the presence of 1 u of Taq DNA Polymerase (Biolab) in a 50 μl final volume, using the following program: hold for 5 min at 94°C; 35 cycles of 30 sec at 94°C, 30 sec at 58°C, and 30 sec at 72°C; and 7 min at 72°C; then 4°C. Ten μl of the RT-PCR products were run on a 2% agarose gel. The remaining part of the RT-PCR products was used for sequencing.
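As a quick sanity check, the length, GC and product-size constraints listed for the primer design can be verified on any primer pair with a few lines of code. The sketch below is independent of Primer3 and does not reproduce its melting-temperature model, so the Tm constraints are not checked; the function names and the example sequences are hypothetical.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def passes_static_filters(forward, reverse, product_size):
    """Check primer size (20-21), minimum GC (40%) and product size (130-200)."""
    sizes_ok = all(20 <= len(p) <= 21 for p in (forward, reverse))
    gc_ok = all(gc_percent(p) >= 40.0 for p in (forward, reverse))
    product_ok = 130 <= product_size <= 200
    return sizes_ok and gc_ok and product_ok


# Made-up sequences used only to exercise the checks.
ok = passes_static_filters("ATGCGTACGTTAGCCTAGGA", "TTAACCGGATCGATTGCAAGC", 150)
too_long = passes_static_filters("ATGCGTACGTTAGCCTAGGA", "TTAACCGGATCGATTGCAAGC", 250)
```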
We are grateful to all the CATMA partners and collaborators for the making of the micro-arrays and all the transcriptome projects used in the study. We acknowledge Franck Samson for FLAGdb++ developments, Carine Serizet for the Eugene v1.59 annotation of the Arabidopsis genome, and Joan Sobota for correcting the manuscript. The URGV CATMA resource has been funded by Génoplante.
- Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- Wortman JR, Haas B, Hannick LI, Smith RK, Maiti R, Ronning CM, Chan AP, Yu C, Ayele M, Whitelaw CA, White OR, Town CD: Annotation of the Arabidopsis genome. Plant Physiol. 2003, 132: 461-468. 10.1104/pp.103.022251.
- Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003, 31: 5654-5666. 10.1093/nar/gkg770.
- Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, Miller N, Mueller LA, Mundodi S, Reiser L, Tacklind J, Weems DC, Wu Y, Xu I, Yoo D, Yoon J, Zhang P: The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res. 2003, 31: 224-228. 10.1093/nar/gkg076.
- Riano-Pachon DM, Dreyer I, Mueller-Roeber B: Orphan transcripts in Arabidopsis thaliana: identification of several hundred previously unrecognized genes. Plant J. 2005, 43: 205-212. 10.1111/j.1365-313X.2005.02438.x.
- Hirsch J, Lefort V, Vankersschaver M, Boualem A, Lucas A, Thermes C, d'Aubenton-Carafa Y, Crespi M: Characterization of 43 non-protein-coding mRNA genes in Arabidopsis, including the MIR162a-derived transcripts. Plant Physiol. 2006, 140: 1192-1204. 10.1104/pp.105.073817.
- Bonnet E, Wuyts J, Rouzé P, Van de Peer Y: Detection of potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target gene. PNAS. 2004, 101: 11511-11516. 10.1073/pnas.0404025101.
- Katari MS, Balija V, Wilson RK, Martienssen RA, McCombie WR: Comparing low coverage random shotgun sequence data from Brassica oleracea and Oryza sativa genome sequence for their ability to add to the annotation of Arabidopsis thaliana. Genome Res. 2005, 15: 496-504. 10.1101/gr.3239105.
- Aubourg S, Brunaud V, Bruyère C, Cock M, Cooke R, Cottet A, Couloux A, Déhais P, Deléage G, Duclert A, Echeverria M, Eschbach A, Falconet D, Filippi G, Gaspin C, Geourjon C, Grienenberger JM, Houlné G, Jamet E, Lechauve F, Leleu O, Leroy P, Mache R, Meyer C, Nedjari H, Negrutiu I, Orsini V, Peyretaillade E, Pommier C, Raes J, Risler JL, Rivière S, Rombauts S, Rouzé P, Schneider M, Schwob P, Small I, Soumayet-Kampetenga G, Stankovski D, Toffano C, Tognolli M, Caboche M, Lecharny A: The GENEFARM project: structural and functional annotation of Arabidopsis gene and protein families by a network of experts. Nucleic Acids Res. 2005, 33: D641-D646. 10.1093/nar/gki115.
- Crowe ML, Serizet C, Thareau V, Aubourg S, Rouzé P, Hilson P, Beynon J, Weisbeek P, van Hummelen P, Reymond P, Paz-Ares J, Nietfeld W, Trick M: CATMA – A complete Arabidopsis GST database. Nucleic Acids Res. 2003, 31: 156-158. 10.1093/nar/gkg071.
- Hilson P, Allemeersch J, Altmann T, Aubourg S, Avon A, Beynon J, Bhalerao RP, Bitton F, Caboche M, Cannoot B, Chardakov V, Cognet-Holliger C, Colot V, Crowe M, Darimont C, Durinck S, Eickhoff H, de Longevialle AF, Farmer EE, Grant M, Kuiper MT, Lehrach H, Léon C, Leyva A, Lundeberg J, Lurin C, Moreau Y, Nietfeld W, Paz-Ares J, Reymond P, Rouzé P, Sandberg G, Segura MD, Serizet C, Tabrett A, Taconnat L, Thareau V, Van Hummelen P, Vercruysse S, Vuylsteke M, Weingartner M, Weisbeek PJ, Wirta V, Wittink FR, Zabeau M, Small I: Versatile gene-specific sequence tags for Arabidopsis functional genomics: Transcript profiling and reverse genetics applications. Genome Res. 2004, 14: 2176-2189. 10.1101/gr.2544504.
- Thareau V, Déhais P, Serizet C, Hilson P, Rouzé P, Aubourg S: Automatic design of gene-specific sequence tags for genome-wide functional studies. Bioinformatics. 2003, 19: 2191-2198. 10.1093/bioinformatics/btg286.
- Schiex T, Moisan A, Rouzé P: Eugène, an eukaryotic gene finder that combines several sources of evidence. Lect Notes Computational Sciences. 2001, 2066: 111-125.
- Aubourg S, Rouzé P: Genome Annotation. Plant Physiol Biochem. 2001, 39: 181-193. 10.1016/S0981-9428(01)01242-6.
- Mathé C, Sagot M-F, Schiex T, Rouzé P: Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res. 2002, 30: 4103-4117. 10.1093/nar/gkf543.
- CATdb, a CATMA Arabidopsis transcriptome database. [http://urgv.evry.inra.fr/CATdb]
- Samson F, Brunaud V, Duchêne S, De Oliveira Y, Caboche M, Lecharny A, Aubourg S: FLAGdb++: a database for the functional analysis of the Arabidopsis genome. Nucleic Acids Res. 2004, 32: D347-D350. 10.1093/nar/gkh134.
- FLAGdb++, an integrative database around plant genomes. [http://urgv.evry.inra.fr/FLAGdb]
- Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S: The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res. 2004, 14: 1641-1653. 10.1101/gr.2275604.
- Moskal WA, Wu HC, Underwood BA, Wang W, Town CD, Xiao Y: Experimental validation of novel genes predicted in the un-annotated regions of the Arabidopsis genome. BMC Genomics. 2007, 8: 18. 10.1186/1471-2164-8-18.
- Korf I, Flicek P, Duan D, Brent MR: Integrating genomic homology into gene structure prediction. Bioinformatics. 2001, 17 (Suppl 1): S140-S148.
- Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A: Pfam: clans, web tools and services. Nucleic Acids Res. 2006, 34: D247-D251. 10.1093/nar/gkj149.
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5.
- Hanada K, Zhang X, Borevitz JO, Li W-H, Shiu S-H: A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis genome are transcribed and/or under purifying selection. Genome Res. 2007, 17: 632-640. 10.1101/gr.5836207.
- TAIR, The Arabidopsis Information Resource. [http://www.Arabidopsis.org]
- TIGR, The Institute for Genomic Research (J. Craig Venter Institute). [http://www.tigr.org]
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: Mining tens of millions of expression profiles, database and tools update. Nucleic Acids Res. 2007, 35: D760-D765. 10.1093/nar/gkl887.
- Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky E, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P, Sansone S: ArrayExpress, a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 2003, 31: 68-71. 10.1093/nar/gkg091.
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30: e15. 10.1093/nar/30.4.e15.
- Yang YH, Thorne N: Normalization for two-color cDNA microarray data. IMS Lecture Notes – Monograph Series. Edited by: Goldstein DR. 2003, Science and Statistics, 40: 403-418.
- Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE, Bussemaker HJ, White KP: A gene expression map for the euchromatic genome of Drosophila melanogaster. Science. 2004, 306: 655-660. 10.1126/science.1101312.
- Dempster A, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. 1977, 39: 1-38.
- Johnson NL, Kotz S, Balakrishnan N: Continuous Univariate Distributions. Edited by: John Wiley & Sons. 1994, New-York: Series in Probability and Statistics, 2: 2
- Schwarz G: Estimating the dimension of a model. Ann Statist. 1978, 6: 461-464.
- Ge Y, Dudoit S, Speed TP: Resampling-based multiple testing for microarray data analysis. TEST. 2003, 12: 1-44. 10.1007/BF02595811.
- Rozen S, Skaletsky H: Primer3 in the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.