A new approach for the analysis of bacterial microarray-based Comparative Genomic Hybridization: insights from an empirical study
© Taboada et al; licensee BioMed Central Ltd. 2005
Received: 16 February 2005
Accepted: 27 May 2005
Published: 27 May 2005
Microarray-based Comparative Genomic Hybridization (M-CGH) has been used to characterize the extensive intraspecies genetic diversity found in bacteria at the whole-genome level. Although conventional microarray analytical procedures have proved adequate in handling M-CGH data, data interpretation using these methods is based on a continuous character model in which gene divergence and gene absence form a spectrum of decreasing gene conservation levels. However, whereas gene divergence may yet be accompanied by retention in gene function, gene absence invariably leads to loss of function. This distinction, if ignored, leads to a loss in the information to be gained from M-CGH data.
We present here results from experiments in which two genome-sequenced strains of C. jejuni were compared against each other using M-CGH. Because the gene content of both strains was known a priori, we were able to closely examine the effects of sequence divergence and gene absence on M-CGH data in order to define analytical parameters for M-CGH data interpretation. This would facilitate the examination of the relative effects of sequence divergence or gene absence in comparative genomics analyses of multiple strains of any species for which genome sequence data and a DNA microarray are available.
As a first step towards improving the analysis of M-CGH data, we estimated the degree of experimental error in a series of experiments in which identical samples were compared against each other by M-CGH. This variance estimate was used to validate a Log Ratio-based methodology for identification of outliers in M-CGH data. We compared two genome strains by M-CGH to examine the effect of probe/target identity on the Log Ratios of signal intensities using prior knowledge of gene divergence and gene absence to establish Log Ratio thresholds for the identification of absent and conserved genes.
The results from this empirical study validate the Log Ratio thresholds that have been used in other studies to establish gene divergence/absence. Moreover, the analytical framework presented here enhances the information content derived from M-CGH data by shifting the focus from divergent/absent gene detection to accurate detection of conserved and absent genes. This approach closely aligns the technical limitations of M-CGH analysis with practical limitations on the biological interpretation of comparative genomics data.
Comparison of intraspecies multi-strain bacterial genome sequence data has shown that, even over short evolutionary time scales, genome evolution is dominated by gene insertions/deletions and gene divergence [1–4]. Genome levels of intraspecies genetic diversity must be examined if we are to gain a better understanding of genome evolution  and if we are to maximize the practical use of bacterial genome sequence information, for instance for development of technical applications, e.g., vaccine or drug development.
One of the aims of bacterial intraspecies comparative genomics is to determine the overall genetic similarity between strains. Where sequence information is available, this type of analysis relies heavily on sequence homology and centres on the determination of conserved genes, strain-specific (i.e. unique) genes and, where the sequence provides unambiguous evidence, determination of orthologous and paralogous genes [6–9]. Although it has become increasingly apparent that obtaining the sequence of multiple strains per species is highly desirable, currently these types of datasets are limited in number. In their absence, other methods for performing comparative genomics have been developed. Among them, microarray-based comparative genomic hybridization (M-CGH) based on genome-sequenced strains has shown enormous potential [10–12].
Two different microarray-based approaches have been used to study the genetic composition of unknown bacterial strains. In the first approach, a control genome-sequenced strain was used as a reference to generate the probes for a microarray [13–16]. In the second approach, microarray probes were derived from the tester strain, either from a tester-derived shotgun library or a library enriched for tester-specific DNAs . With either approach, control- and tester-derived targets are co-hybridized to the microarray and control- and tester-derived signals are compared, often by computing the Log Ratio (LR) = log2(tester signal/control signal). Whereas genes with similar signal in either channel are expected to have LRs near zero, genes with LRs that deviate significantly from LR = 0 are likely to show copy number changes or sequence divergence between control and tester strains.
The relatively small number of studies on bacterial M-CGH has demonstrated the power of the method in a comparative genomics context despite a lack of consensus in current methods for analyzing M-CGH data. Although potential methods for standardizing and improving analysis have been suggested [15, 18] in practice, M-CGH data has routinely been analyzed by categorizing genes into two groups: genes that are likely to be conserved and genes that are likely to be divergent. One notable problem with this approach is that no attempt is made to differentiate between gene divergence and gene absence, despite the significant biological and evolutionary differences implied by these two types of events. A framework for improved analysis would require empirical data on the relationship between Log Ratio (LR) from M-CGH experiments and sequence conservation levels, however, to our knowledge no studies exist that have directly examined this question.
The availability of intraspecies genome data from two strains of Campylobacter jejuni [19, 20], has provided us with the opportunity to examine the quantitative relationship between the LR and probe/target identity (PTI) using our C. jejuni microarray. This experimental design allows us to directly match microarray results to the a priori interpretation of gene divergence and gene absence patterns. The goal of this study is to define the analytical parameters for the accurate prediction of gene conservation levels, leading to improved interpretation of M-CGH data. We present here the results of a detailed analysis of M-CGH experiments using the two genome-sequenced strains of C. jejuni.
Results and discussion
Determination of technical variation in M-CGH experiments
Analysis of the Log Ratio distribution of highly conserved genes
Analysis of the relationship between PTI and Log Ratio
Analysis of the Log Ratio distribution of absent genes
Determination of thresholds for highly conserved and absent genes
Microarray analysis, whether in the context of gene expression or M-CGH studies, is based on determining which genes have statistically significant differential hybridization signal between two samples. In M-CGH analysis, these differential signals are the result of sequence divergence or differences in copy number. Two critical issues rise to the forefront in M-CGH analysis: a) does a gene show genuine differential signal (i.e. outside the norms of variability due to experimental error); b) what is the nature of the event that gave rise to the differential signal (i.e. sequence divergence, copy number change)?
Because M-CGH generates hybridization data as a proxy for sequence similarity data, it is important that it be analyzed as such. While some empirical work has been carried out on probe/target identity (PTI) and data analysis using the microarray platform, the focus has largely been on optimization of species detection and/or identification in complex samples [21–23]. In these applications, the primary goal is that of optimizing probe sets and hybridization conditions to maximize the specificity of species-specific probe/target interactions, possibly at the expense of decreased assay sensitivity and thus the majority of microarrays used for species identification are oligonucleotide-based. By contrast, in comparative genomics, the primary goal is that of gene detection for the purpose of characterizing gene content, and thus the focus must shift to detection sensitivity in order to minimize the likelihood of false positive calls on gene absence events. Because oligonucleotide-based arrays can lead to erroneous gene absence calls , the majority of M-CGH studies have used amplicon-based microarrays, which are more sensitive albeit at the expense of specificity .
A common thread among bacterial M-CGH studies has been the grouping of all outliers into a single category. Currently it is unclear whether divergent and absent genes can be distinguished based on LR data alone. Although the lack of distinction between these types of events does not negate the results from these studies, it can potentially restrict further analysis of the data. For example, in any pair of intraspecies genomes, sequence similarity can be used to define genes absent in one or the other strain as well as genes that are conserved in both strains. Although the "biological interpretation" in the case of gene absence is unambiguous, many possibilities arise when sequences share any level of similarity. For instance, single nucleotide substitutions can lead to truncated or inactive gene products. Additionally, the level of sequence similarity required for full functional homology varies from gene to gene, increasing the complexity of the analysis even when DNA sequences are directly available. The inexact nature of hybridization analysis further compounds the difficulty in interpreting signal from divergent genes by M-CGH, and thus focusing on conserved and divergent genes ignores the increased reliability of gene absence calls.
In previous work, we presented data suggesting that highly negative LR values were consistent with gene deletion events, paving the way for making the distinction between divergent and absent genes based on LR data . When M-CGH data is analyzed such that gene absence events are grouped together with all other gene divergence events (i.e. as a continuous character model), it represents a significant loss of information both from a technical and from a biological point of view. In addition to the greater ambiguity in data interpretation as LRs approach the threshold for gene conservation, the continuous character approach negates the functional distinction that can be made between gene absence and gene divergence events. Because the LR thresholds described here could be used to reliably predict gene absence and gene conservation, it would be advantageous to focus the analysis of M-CGH data on the accurate detection of conserved and absent genes. While the data between the two thresholds should not be altogether discarded, the two thresholds represent boundaries defining regions in which gene absence and gene presence can be predicted with high confidence and thus should be given greater weight in subsequent analytical steps. It is important to note that the exact value of the LR thresholds presented here is specific to our experimental platform. The prediction accuracy achieved was remarkably high because of the uniform levels of variance across the multiple replicates analysed and because of the high correlation coefficients between replicates (the average ρ ≅ 0.92). This dataset was highly idealized because the relatively small number of replicates was carried out in such a way as to minimize technical variation. Nevertheless, a previous study in which we applied the thresholds described here on a large dataset showed that LRs below our "absence threshold" correlated very highly with other potential indicators of gene deletion .
Given the many documented sources of technical variability that can influence microarray results (e.g. variation in handling between individual investigators, laboratory conditions, microarray print batches), thresholds for gene presence/absence detection should be calibrated to the differential levels of technical variance found in individual microarray experiments, especially in large datasets. Kim et al  have suggested a solution to array-specific variance and normalization bias by determining thresholds specific to each array based on the point at which the LR distribution deviates from its inferred normal distribution. In practice, we have found that this approach can be susceptible to "narrow" LR distributions, leading to relaxed thresholds that yield an increased number of false positives for gene divergence. An alternative approach to deal with unequal variances and normalization biases across a dataset is based on normalizing multiple microarrays using the Z-score transformation [26, 27], in which LR values are divided by the standard deviation of the LR data distribution. Z score-based metrics could be used to replace Log Ratio-based metrics, enabling direct comparisons that are more valid because data from each microarray is "variance-calibrated".
Based on the higher than expected Log Ratio values obtained in the case of absent genes, the "relative accuracy" of Log Ratio measurements obtained from short probes is significantly compromised under the standard hybridization conditions we used. It is important to note however, that based on the average standard deviations observed (< 250 bp = 0.68; > 250 bp = 0.85), results obtained from short probes do not lack precision compared to those obtained from longer probes. Nevertheless, our results show that data obtained from short probes yield anomalously high Log Ratio values. It is only because our assay represents a closed system in which all components are known that we were able to determine that short probes can significantly underestimate Log Ratio measurements. These results would not have been readily apparent in a typical experiment since there would be no a priori knowledge on expected Log Ratio values. Although these results were obtained in a series of CGH experiments, the anomalous Log Ratio data from short probes is likely to be encountered under any type of microarray hybridization experiment, including gene expression-profiling experiments. Although longer probes performed better in our assay, this is likely a result of the higher signal intensity obtained with long probes relative to short probes. Optimal hybridization and scanning conditions for long probes would likely be sub-optimal for short probes, leading to decreased signal and a concomitant drop in Log Ratio amplitudes. Thus the problem resides not in probe length per se, but rather in the mixed probe lengths encountered in our microarray. These results have important implications towards microarray probe design because the adverse probe-length effect could be mitigated through standardizing probe length. Failing that, it would be advantageous to incorporate probe length effects into any analytical framework.
The results presented here have been used to examine the relationship between LR and sequence conservation. The variability inherent in hybridization-based approaches makes it unlikely that LR data from M-CGH experiments can be used to accurately predict the level of sequence identity among divergent genes. In view of the considerable ambiguity in interpreting the significance of gene divergence even when sequence information is available, the focus on gene divergence in M-CGH studies must be re-assessed. We have established thresholds for the use of LR values for the accurate detection of highly conserved and absent genes, which should increase the robustness of downstream data interpretation and should extend the range of biological interpretation of M-CGH data. An accurate determination of conserved and absent genes should increase the accuracy of strain genotyping, metabolic pathway prediction, and determination of conserved targets for vaccine or drug development from M-CGH data.
Bacterial strains and genomic DNA isolation
Strain RM1221 was obtained from Food Safety and Health Research Unit, USDA. Strain NCTC 11168 was obtained from the American Type Culture Collection (Mannassas, VA). Genomic DNA isolation was carried out as previously described 
Construction of a C. jejuni NCTC 11168 open reading frame DNA microarray
Genomic DNA was sheared to a mean fragment size of 1.5 Kb by nebulization in 35% glycerol at 15 PSI for 45 seconds as described by . For each sample, 5 μg of sheared DNA were fluorescently labelled using direct chemical coupling with the Label-IT (Mirus Corp., Madison, WI) cyanine dyes Cy3 and Cy5 as recommended by the manufacturer. Probes were purified by sequentially passing samples through SigmaSpin (Sigma, Oakville, ON) and Qiaquick (Qiagen, Mississauga, ON) columns. Labelled DNA sample yields and dye incorporation efficiencies were calculated using a Nanodrop ND-1000 spectrophotometer (Nanodrop, Rockland, DE). Microarray hybridizations were set-up by co-hybridizing 2 ug of differentially labelled genomic DNA samples and were carried out as previously described . NCTC 11168 versus RM1221 hybridizations were carried out in triplicate. A set of dye-swap experiments was also carried out, giving a total of 6 replicate experiments. Self-self hybridization experiments were carried out in which separate NCTC 11168 (or RM1221) genomic DNA samples were labelled with each Cy-dye and co-hybridized to the array.
Data acquisition and analysis
Microarrays were scanned and analyzed as previously described . Briefly, microarrays were scanned using a Chipreader laser scanner (BioRad, Mississauga, ON) according to the manufacturer's recommendations. Spot quantification, signal normalization and data visualization were performed using the program ArrayPro Analyzer (version 4.5; Media Cybernetics, Silver Spring, MD). Net signal intensities were obtained by performing local-ring background subtraction and spots with a signal less than 5-times above background were excluded from the analysis. Signal intensities for replicate spots were averaged and data from each channel were adjusted by sub-array normalization using cross-channel Loess regression. The ratio of tester signal to control signal for each gene was transformed to its base 2 logarithm , log2 [Tester Signal / C. jejuni NCTC 11168 Signal], hereafter referred to as "Log Ratio" (LR). LRs from the two "within slide" spot replicates were averaged. To increase the number of observations for statistical purposes, LR data from each microarray replicates were analyzed separately.
Determining level of sequence identity between probes and targets (PTI)
We used the BLAST software package  to determine the identity between microarray probes and predicted target sequences. Complete genome sequence information for C. jejuni NCTC 11168 and C. jejuni RM1221 was downloaded from the National Center for Biotechnology Information's Prokaryotic Genomes Database , GenBank records AL111168 and CP000025, respectively. We created BLAST databases from the nucleotide sequences of the open reading frames in each C. jejuni genome strain and queried them with the nucleotide sequences of each probe in our microarray using the BLASTN program. The percent identity of the best hit for each subject/query pair was determined.
List of abbreviations
microarray-based comparative genomic hybridization
standard deviation (or σ)
Funding for this work has been provided to ENT, RRA, WAF and JHEN through the National Research Council's Genomics and Health Initiative, and to CCL through Health Canada.
- Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B, Guild BC, deJonge BL, Carmel G, Tummino PJ, Caruso A, Uria-Nickelsen M, Mills DM, Ives C, Gibson R, Merberg D, Mills SD, Jiang Q, Taylor DE, Vovis GF, Trust TJ: Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature. 1999, 397: 176-80. 10.1038/16495.PubMedView ArticleGoogle Scholar
- Anjum MF, Lucchini S, Thompson A, Hinton JC, Woodward MJ: Comparative genomic indexing reveals the phylogenomics of Escherichia coli pathogens. Infect Immun. 2003, 71: 4674-83. 10.1128/IAI.71.8.4674-4683.2003.PubMedPubMed CentralView ArticleGoogle Scholar
- Edwards RA, Olsen GJ, Maloy SR: Comparative genomics of closely related salmonellae. Trends Microbiol. 2002, 10: 94-9. 10.1016/S0966-842X(01)02293-4.PubMedView ArticleGoogle Scholar
- Fukiya S, Mizoguchi H, Tobe T, Mori H: Extensive genomic diversity in pathogenic Escherichia coli and Shigella Strains revealed by comparative genomic hybridization microarray. J Bacteriol. 2004, 186: 3911-21. 10.1128/JB.186.12.3911-3921.2004.PubMedPubMed CentralView ArticleGoogle Scholar
- Lan R, Reeves PR: Intraspecies variation in bacterial genomes: the need for a species genome concept. Trends Microbiol. 2000, 8: 396-401. 10.1016/S0966-842X(00)01791-1.PubMedView ArticleGoogle Scholar
- Avison MB: Comparative Genomics: Digging for Data. Genomics, Proteomics, and Clinical Bacteriology: Methods and Reviews. Edited by: Woodford N, Johnson AP. 2004, Totowa, NJ USA: Humana Press, 266: 47-70.View ArticleGoogle Scholar
- Kuroda M, Hiramatsu K: Genome Sequencing and Annotation: An Overview. Genomics, Proteomics, and Clinical Bacteriology: Methods and Reviews. Edited by: Woodford N, Johnson AP. 2004, Totowa, NJ USA: Humana Press, 266: 29-46.View ArticleGoogle Scholar
- Payne DJ, Gwynn MN, Holmes DJ, Rosenberg M: Genomic Approaches to Antibacterial Discovery. Genomics, Proteomics, and Clinical Bacteriology: Methods and Reviews. Edited by: Woodford N, Johnson AP. 2004, Totowa, NJ USA: Humana Press, 266: 231-60.View ArticleGoogle Scholar
- Van Ijperen C, Saunders NA: Microarrays for Bacterial Typing: Realistic Hope or Holy Grail?. Genomics, Proteomics, and Clinical Bacteriology: Methods and Reviews. Edited by: Woodford N, Johnson AP. 2004, Totowa, NJ USA: Humana Press, 266: 213-28.View ArticleGoogle Scholar
- Sails AD, Swaminathan B, Fields PI: Utility of multilocus sequence typing as an epidemiological tool for investigation of outbreaks of gastroenteritis caused by Campylobacter jejuni. J Clin Microbiol. 2003, 41: 4733-9. 10.1128/JCM.41.10.4733-4739.2003.PubMedPubMed CentralView ArticleGoogle Scholar
- Schouls LM, Reulen S, Duim B, Wagenaar JA, Willems RJ, Dingle KE, Colles FM, Van Embden JD: Comparative genotyping of Campylobacter jejuni by amplified fragment length polymorphism, multilocus sequence typing, and short repeat sequencing: strain diversity, host range, and recombination. J Clin Microbiol. 2003, 41: 15-26. 10.1128/JCM.41.1.15-26.2003.PubMedPubMed CentralView ArticleGoogle Scholar
- Yan W, Chang N, Taylor DE: Pulsed-field gel electrophoresis of Campylobacter jejuni and Campylobacter coli genomic DNA and its epidemiologic application. J Infect Dis. 1991, 163: 1068-72.PubMedView ArticleGoogle Scholar
- Dorrell N, Mangan JA, Laing KG, Hinds J, Linton D, Al-Ghusein H, Barrell BG, Parkhill J, Stoker NG, Karlyshev AV, Butcher PD, Wren BW: Whole genome comparison of Campylobacter jejuni human isolates using a low-cost microarray reveals extensive genetic diversity. Genome Res. 2001, 11: 1706-15. 10.1101/gr.185801.PubMedPubMed CentralView ArticleGoogle Scholar
- Leonard EE, Takata T, Blaser MJ, Falkow S, Tompkins LS, Gaynor EC: Use of an open-reading frame-specific Campylobacter jejuni DNA microarray as a new genotyping tool for studying epidemiologically related isolates. J Infect Dis. 2003, 187: 691-4. 10.1086/368268.PubMedView ArticleGoogle Scholar
- Pearson BM, Pin C, Wright J, I'Anson K, Humphrey T, Wells JM: Comparative genome analysis of Campylobacter jejuni using whole genome DNA microarrays. FEBS Lett. 2003, 554: 224-30. 10.1016/S0014-5793(03)01164-5.PubMedView ArticleGoogle Scholar
- Taboada EN, Acedillo RR, Carrillo CD, Findlay WA, Medeiros DT, Mykytczuk OL, Roberts MJ, Valencia CA, Farber JM, Nash JH: Large-scale comparative genomics meta-analysis of Campylobacter jejuni isolates reveals low level of genome plasticity. J Clin Microbiol. 2004, 42: 4566-76. 10.1128/JCM.42.10.4566-4576.2004.PubMedPubMed CentralView ArticleGoogle Scholar
- Poly F, Threadgill D, Stintzi A: Identification of Campylobacter jejuni ATCC 43431-specific genes by whole microbial genome comparisons. J Bacteriol. 2004, 186: 4781-95. 10.1128/JB.186.14.4781-4795.2004.PubMedPubMed CentralView ArticleGoogle Scholar
- Kim CC, Joyce EA, Chan K, Falkow S: Improved analytical methods for microarray-based genome-composition analysis. Genome Biol. 2002, 3: RESEARCH0065-10.1186/gb-2002-3-11-research0065.PubMedPubMed CentralView ArticleGoogle Scholar
- Parkhill J, Wren BW, Mungall K, Ketley JM, Churcher C, Basham D, Chillingworth T, Davies RM, Feltwell T, Holroyd S, Jagels K, Karlyshev AV, Moule S, Pallen MJ, Penn CW, Quail MA, Rajandream MA, Rutherford KM, van Vliet AH, Whitehead S, Barrell BG: The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences. Nature. 2000, 403: 665-8. 10.1038/35001088.PubMedView ArticleGoogle Scholar
- Fouts DE, Mongodin EF, Mandrell RE, Miller WG, Rasko DA, Ravel J, Brinkac LM, Deboy RT, Parker CT, Daugherty SC, Dodson RJ, Durkin AS, Madupu R, Sullivan SA, Shetty JU, Ayodeji MA, Shvartsbeyn A, Schatz MC, Badger JH, Fraser CM, Nelson KE: Major structural differences and novel potential virulence mechanisms from the genomes of multiple campylobacter species. PLoS Biol. 2005, 3: e15-10.1371/journal.pbio.0030015.PubMedPubMed CentralView ArticleGoogle Scholar
- Wu L, Thompson DK, Li G, Hurt RA, Tiedje JM, Zhou J: Development and evaluation of functional gene arrays for detection of selected genes in the environment. Appl Environ Microbiol. 2001, 67: 5780-90. 10.1128/AEM.67.12.5780-5790.2001.PubMedPubMed CentralView ArticleGoogle Scholar
- Tiquia SM, Wu L, Chong SC, Passovets S, Xu D, Xu Y, Zhou J: Evaluation of 50-mer oligonucleotide arrays for detecting microbial populations in environmental samples. Biotechniques. 2004, 36: 664-70. 672, 674-5PubMedGoogle Scholar
- Wilson WJ, Strout CL, DeSantis TZ, Stilwell JL, Carrano AV, Andersen GL: Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Mol Cell Probes. 2002, 16: 119-27. 10.1006/mcpr.2001.0397.PubMedView ArticleGoogle Scholar
- Daran-Lapujade P, Daran JM, Kotter P, Petit T, Piper MD, Pronk JT: Comparative genotyping of the Saccharomyces cerevisiae laboratory strains S288C and CEN.PK113-7D using oligonucleotide microarrays. FEMS Yeast Res. 2003, 4: 259-69. 10.1016/S1567-1356(03)00156-9.PubMedView ArticleGoogle Scholar
- Denef VJ, Park J, Rodrigues JL, Tsoi TV, Hashsham SA, Tiedje JM: Validation of a more sensitive method for using spotted oligonucleotide DNA microarrays for functional genomics studies on bacterial communities. Environ Microbiol. 2003, 5: 933-43. 10.1046/j.1462-2920.2003.00490.x.PubMedView ArticleGoogle Scholar
- Cheadle C, Vawter MP, Freed WJ, Becker KG: Analysis of microarray data using Z score transformation. J Mol Diagn. 2003, 5: 73-81.PubMedPubMed CentralView ArticleGoogle Scholar
- Colantuoni C, Henry G, Zeger S, Pevsner J: SNOMAD (Standardization and NOrmalization of MicroArray Data): web-accessible gene expression data analysis. Bioinformatics. 2002, 18: 1540-1. 10.1093/bioinformatics/18.11.1540.PubMedView ArticleGoogle Scholar
- Campychip Description. [http://ibs-isb.nrc-cnrc.gc.ca/ibs/immunochemistry/campychips_e.html]
- Bodenteich AS, Chissoe Y, Wang F, Roe BA: Shotgun cloning as the strategy of choice to generate templates for high throughput dideoxynucleotide sequencing. Automated DNA sequencing and analysis techniques. Edited by: Adams MD, Fields C, Venter C. 1994, London, UK: Academic Press, 42-50.Google Scholar
- Smyth GK, Yang YH, Speed T: Statistical issues in cDNA microarray data analysis. Methods Mol Biol. 2003, 224: 111-36.PubMedGoogle Scholar
- Altschul SF, Lipman DJ: Protein database searches for multiple alignments. Proc Natl Acad Sci U S A. 1990, 87: 5509-13.PubMedPubMed CentralView ArticleGoogle Scholar
- National Center For Biotechnology Information's Prokaryotic Genomes Database. [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi]