SAMMD: Staphylococcus aureus Microarray Meta-Database
© Nagarajan and Elasri. 2007
Received: 22 May 2007
Accepted: 02 October 2007
Published: 02 October 2007
Staphylococcus aureus is an important human pathogen, causing a wide variety of diseases ranging from superficial skin infections to severe, life-threatening infections. S. aureus is one of the leading causes of nosocomial infections, and its ability to resist multiple antibiotics poses a growing public health problem. To understand the mechanisms of S. aureus pathogenesis, researchers have generated several global expression profiles. These transcriptional profiles cover regulatory mutants of S. aureus and wild-type strains grown under different conditions. Together, these profiles have produced a large amount of data, but there is no uniform annotation system for examining them comprehensively. We report the development of the Staphylococcus aureus Microarray Meta-Database (SAMMD), which includes data from all published S. aureus transcriptional profiles. SAMMD is a web-accessible database that helps users perform a variety of analyses, both comparing their own data against the existing transcriptional profiles and searching within them.
SAMMD is hosted and available at http://www.bioinformatics.org/sammd/. It currently holds over 9500 entries for regulated genes from 67 microarray experiments. SAMMD will help staphylococcal scientists analyze their expression data and interpret it at a global level. It will also allow them to compare and contrast their transcriptomes with other published transcriptomes.
Staphylococcus aureus is an important human pathogen, causing diseases ranging from superficial skin infections to severe, life-threatening infections. S. aureus is one of the leading causes of nosocomial infections, and it poses a serious threat because of its ability to acquire resistance to multiple antibiotics. Microarray studies enable analysis of a pathogen's response at a global level. Several studies have examined the global expression profiles of S. aureus in response to different effectors such as vancomycin, mild acid, and stress. There are also several transcriptional profiles of mutants in regulatory genes such as sigB, sarA, and mgrA. To date, about 30 published journal articles contain about 67 microarray experiments in S. aureus. The usefulness of this large amount of expression data is limited because it is not located in a centralized source. In addition, the data that have been deposited in public databases are difficult to compare directly with data generated by individual researchers. We have addressed these issues by building the Staphylococcus aureus Microarray Meta-Database (SAMMD), which contains all published microarray data generated for S. aureus. SAMMD is a web-accessible database that allows users to mine information about one or several genes. SAMMD can also be used to compare a whole transcriptome with published data.
Databases are increasingly useful in biology as huge amounts of data are generated by high-throughput techniques such as microarray technology. Computational tools are essential to analyse these vast archives of data and extract biological information. Scientists are encouraged to deposit published data in public databases such as the National Center for Biotechnology Information Gene Expression Omnibus (NCBI-GEO), the European Bioinformatics Institute (EBI) ArrayExpress, or the Stanford Microarray Database (SMD). To date, however, most groups have not complied, and as a result a large amount of published DNA microarray data is inaccessible for further analysis by other scientists. This makes it difficult to reanalyse the data or compare them with other experiments.
Even when raw data are available online, the lack of computational tools and expertise, as well as differences in the platforms used to generate the data, make it difficult to take full advantage of these resources. SAMMD addresses these issues by providing a central location for S. aureus microarray data. SAMMD was designed to let users quickly and easily mine the large and growing collection of S. aureus transcriptomic data across different platforms. Its search functions allow in-depth analysis of the expression of a single gene or a collection of genes. SAMMD is therefore a valuable tool for understanding the molecular mechanisms of pathogenesis in S. aureus.
SAMMD is the first database that contains all the transcriptomic data for S. aureus. Similar databases have been developed for other organisms. For instance, the Saccharomyces Genome Database (SGD) is devoted to the yeast Saccharomyces cerevisiae; it is a comprehensive genome database that also contains information about several yeast microarray experiments. Two other similar databases are devoted to E. coli gene expression and to human microarray data (LOLA).
SAMMD is a highly valuable tool for staphylococcal research. It can be used to study a single gene, several genes, or a genome-wide transcriptome. SAMMD can also be used to gain insights into mutational status from transcriptomic data. Because SAMMD will be updated continually as new transcriptomes are published, its utility and value will continue to grow.
SAMMD is a relational database consisting of five tables (the database schema is presented on the SAMMD help page). The "annotation" table includes ORF IDs from six S. aureus strains (N315, COL, Mu50, MW2, MRSA252, and MSSA476), obtained from the primary source (the National Center for Biotechnology Information). The "experiment" table includes information about each microarray experiment: the regulator or growth condition, number of replicates, strains, array platform, data analysis software, fold-change cut-off value, and PubMed ID (PMID). The experiment table also indicates whether RNA-stabilising agents were used and whether the raw datasets are available. The "reference" table contains references to the published journal articles from which the data were extracted. The "regulated" table contains the list of differentially expressed genes (represented from all of the above-mentioned strains of S. aureus) and the effect of the mutation or growth condition on each gene (up or down). A table labelled "others" contains information about non-transcriptomic DNA microarray experiments, such as genome comparisons. The tables are linked through corresponding primary and foreign keys.
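The five-table layout described above can be sketched with Python's built-in sqlite3 module. This is a minimal, hypothetical sketch: the table names come from the text, but every column name, type, and constraint below is an illustrative assumption (the real schema is on the SAMMD help page), and the example rows are invented. Note that regulated.gene_id is deliberately not a foreign key, because genes absent from N315 are stored under their original names.

```python
import sqlite3

# Hypothetical layout inferred from the description; column names are illustrative only.
SCHEMA = """
CREATE TABLE reference (
    pmid     INTEGER PRIMARY KEY,       -- PubMed ID of the source article
    citation TEXT NOT NULL
);
CREATE TABLE experiment (
    exp_id         INTEGER PRIMARY KEY,
    condition      TEXT NOT NULL,       -- regulator mutant or growth condition
    replicates     INTEGER,
    strain         TEXT,
    platform       TEXT,
    software       TEXT,
    fold_cutoff    REAL,
    rna_stabilised INTEGER,             -- 1 if an RNA-stabilising agent was used
    raw_available  INTEGER,             -- 1 if the raw dataset is available
    pmid           INTEGER REFERENCES reference(pmid)
);
CREATE TABLE annotation (
    annotation_id INTEGER PRIMARY KEY,
    n315_id TEXT UNIQUE, col_id TEXT, mu50_id TEXT,
    mw2_id TEXT, mrsa252_id TEXT, mssa476_id TEXT,
    product_name TEXT                   -- NCBI product name
);
CREATE TABLE regulated (
    exp_id    INTEGER NOT NULL REFERENCES experiment(exp_id),
    gene_id   TEXT NOT NULL,            -- N315 ID, or original ID if absent from N315
    direction TEXT NOT NULL CHECK (direction IN ('up', 'down')),
    PRIMARY KEY (exp_id, gene_id)
);
CREATE TABLE others (
    other_id    INTEGER PRIMARY KEY,
    description TEXT,                   -- e.g. a genome comparison
    pmid        INTEGER REFERENCES reference(pmid)
);
"""


def build_db():
    """Create an in-memory database with the five tables and enforce foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


conn = build_db()
# Invented example rows for illustration only.
conn.execute("INSERT INTO reference VALUES (1, 'Hypothetical article')")
conn.execute("INSERT INTO experiment (exp_id, condition, pmid) VALUES (10, 'regulator X mutant', 1)")
conn.executemany("INSERT INTO regulated VALUES (?, ?, ?)",
                 [(10, "SA0001", "up"), (10, "SA0002", "down")])

# All experiments in which a given gene is regulated, with the direction of change.
rows = conn.execute(
    """SELECT e.condition, r.direction
       FROM regulated AS r JOIN experiment AS e ON e.exp_id = r.exp_id
       WHERE r.gene_id = ?""",
    ("SA0001",),
).fetchall()
print(rows)
```

The join at the end mirrors the kind of single-gene lookup the database is meant to support: one query returns every condition in which a gene was reported as regulated.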
Details about the experiments, such as fold-change cut-off, microarray platform, and strains, were obtained by carefully studying the journal articles. The lists of regulated genes were extracted either from the journal articles (PDF files) or from their supplemental files (Word or Excel files). To enable comparison studies, the extracted ORF IDs were mapped to N315 ORF IDs using Perl scripts written on the basis of the TIGR annotation files (S. aureus version 6). Currently, ORF IDs from five strains (MW2, COL, Mu50, MRSA252, and MSSA476) are mapped to N315 ORF IDs. Genes that are not found in strain N315 are included in the database under their original names. The extracted data were entered into the database using the phpMyAdmin interface.
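The mapping step above can be illustrated with a short Python sketch (the original scripts were written in Perl and are not reproduced here). It assumes a hypothetical two-column, tab-separated lookup file of "strain ORF ID, N315 ORF ID" pairs; the real TIGR annotation files have their own format, and the example pairs below are invented for illustration. IDs with no N315 counterpart fall through unchanged, matching the rule that such genes keep their original names.

```python
# Hypothetical example pairs; a real table would be generated from the TIGR annotation files.
EXAMPLE_MAP_LINES = [
    "# strain ORF ID<TAB>N315 ORF ID",
    "SACOL0001\tSA0001",
    "SAV0001\tSA0001",
]


def load_mapping(lines):
    """Build {strain ORF ID: N315 ORF ID} from tab-separated lines (assumed format)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        strain_id, n315_id = line.split("\t")[:2]  # assumes well-formed lines
        mapping[strain_id] = n315_id
    return mapping


def map_to_n315(orf_ids, mapping):
    """Translate ORF IDs to N315 IDs; IDs without an N315 counterpart keep their original name."""
    return [mapping.get(orf, orf) for orf in orf_ids]


mapping = load_mapping(EXAMPLE_MAP_LINES)
print(map_to_n315(["SACOL0001", "SA0002", "SAR9999"], mapping))
```

N315 IDs pass through unchanged as well, so lists mixing several strains can be normalised in a single call.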
Data quality was checked at various points, either by Perl scripts or by manual inspection. For instance, most S. aureus microarrays include a few redundant genes from different strains on the same slide, and these redundant genes are reported as separate entries in the list of regulated genes. In SAMMD, such duplicate entries were removed from the extracted and mapped gene lists. SAMMD also has a feedback form that users can use to email the authors about any errors or other problems they encounter.
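Once redundant array probes have been mapped to the same N315 ID, duplicates can be collapsed as sketched below. This is a hypothetical illustration, not the authors' script: it keeps the first occurrence of each gene and, as an added assumption not described in the text, flags genes whose duplicate entries disagree on direction so they can be inspected manually.

```python
def remove_duplicates(entries):
    """Collapse repeated genes from one experiment's mapped list, keeping first-seen order.

    `entries` is a list of (n315_id, direction) pairs after ID mapping.
    Returns the de-duplicated list and a sorted list of genes reported with
    conflicting directions (a hypothetical check for manual inspection).
    """
    seen = {}
    unique = []
    conflicts = set()
    for gene, direction in entries:
        if gene not in seen:
            seen[gene] = direction
            unique.append((gene, direction))
        elif seen[gene] != direction:
            conflicts.add(gene)
    return unique, sorted(conflicts)


# Invented example: SA0001 appears twice consistently, SA0002 twice inconsistently.
unique, conflicts = remove_duplicates(
    [("SA0001", "up"), ("SA0001", "up"), ("SA0002", "down"), ("SA0002", "up")]
)
print(unique, conflicts)
```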
We have also implemented a full-text search against the "NCBI Product Name" column of the "annotation" table, available under the "ORF Function" search option. This lets users search SAMMD using keywords. The full-text search supports Boolean operators, allowing users to add "AND", "OR", "NOT", "" and "*" to refine their searches. The SAMMD help page gives more detailed help, with examples of how to use these Boolean operators.
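The Boolean keyword search described above can be approximated in memory as follows. This is a sketch under stated assumptions, not SAMMD's implementation: it assumes "OR" separates alternatives, adjacent terms (or "AND") must all match, "NOT" excludes the next term, quotation marks group a phrase, and a trailing "*" is a prefix wildcard. A MySQL-backed site could instead rely on the database's own MATCH ... AGAINST boolean-mode full-text search. The ORF IDs and product names in the example are hypothetical.

```python
import re
import shlex

WORD = re.compile(r"[a-z0-9]+")  # product names are split into lowercase words


def term_matches(term, text):
    """Whole-word, phrase, or trailing-wildcard match of one term (assumed semantics)."""
    words = WORD.findall(text.lower())
    term = term.lower()
    if term.endswith("*"):  # prefix wildcard, e.g. entero*
        prefix = term[:-1]
        return bool(prefix) and any(w.startswith(prefix) for w in words)
    phrase = WORD.findall(term)  # quoted or hyphenated terms become word sequences
    n = len(phrase)
    return n > 0 and any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def group_matches(group, text):
    """All positive terms must match and no NOT-ed term may match."""
    negate = False
    has_positive = False
    for tok in group:
        if tok == "AND":  # operators are recognised only in upper case
            continue
        if tok == "NOT":
            negate = True
            continue
        hit = term_matches(tok, text)
        if negate and hit:
            return False
        if not negate:
            if not hit:
                return False
            has_positive = True
        negate = False
    return has_positive  # a group of only NOT terms matches nothing


def boolean_search(query, products):
    """Return ORF IDs whose product name satisfies the query."""
    groups = [[]]
    for tok in shlex.split(query):  # shlex keeps "quoted phrases" as one token
        if tok == "OR":
            groups.append([])
        else:
            groups[-1].append(tok)
    return [orf for orf, name in products.items()
            if any(group_matches(g, name) for g in groups)]


# Hypothetical ORF IDs and product names for illustration.
products = {
    "ORF_A": "alpha-hemolysin precursor",
    "ORF_B": "gamma-hemolysin component B",
    "ORF_C": "accessory gene regulator protein A",
    "ORF_D": "staphylococcal enterotoxin B",
}
print(boolean_search("hemolysin AND NOT gamma", products))
```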
Users can also download relevant data from the database. The site provides current statistics on the number of records in the database and a detailed help page with example gene lists and usage illustrations. Contact information is included for additional help with SAMMD and for comments and suggestions from users to improve the database.
SAMMD should be of considerable use to scientists working on S. aureus. Determining the transcriptional status of a particular gene from the literature can be cumbersome because different naming conventions are used to denote genes in different strains. Given the number of experiments and the length of the regulated gene lists, searching manually for the transcriptional status of a particular gene is laborious. SAMMD helps molecular microbiologists overcome these problems.
SAMMD makes comparative transcriptome analysis convenient. Scientists who perform microarray experiments can now easily compare their list of regulated genes (transcriptome) with the other transcriptomes in the database, and they can also compare the datasets already in SAMMD with one another. Using the list of overlapping genes, they can identify regulatory patterns and connections between their transcriptome and other transcriptomes.
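A comparison of a user's gene list against stored transcriptomes reduces to set intersection, as in the hypothetical sketch below. It assumes both sides have already been mapped to N315 ORF IDs and carry an "up"/"down" label; the count of genes regulated in the same direction is an added illustration rather than a documented SAMMD feature, and the experiment names and gene lists are invented.

```python
def compare_transcriptomes(user, experiments):
    """Rank stored experiments by overlap with a user's regulated-gene list.

    `user` and each value of `experiments` map N315 ORF IDs to 'up' or 'down'.
    Returns (experiment name, sorted shared genes, number regulated in the
    same direction) tuples, largest overlap first.
    """
    results = []
    for name, regulated in experiments.items():
        shared = sorted(set(user) & set(regulated))
        concordant = sum(1 for gene in shared if user[gene] == regulated[gene])
        results.append((name, shared, concordant))
    results.sort(key=lambda r: (-len(r[1]), r[0]))  # ties broken by experiment name
    return results


# Invented example data.
my_genes = {"SA0001": "up", "SA0002": "down", "SA0003": "up"}
stored = {
    "experiment_1": {"SA0001": "up", "SA0003": "down", "SA0009": "up"},
    "experiment_2": {"SA0002": "down"},
}
for name, shared, same_direction in compare_transcriptomes(my_genes, stored):
    print(name, shared, same_direction)
```

Reporting direction agreement alongside raw overlap helps distinguish a condition that mimics the user's experiment from one that acts in the opposite direction on the same genes.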
We plan to incorporate a graphical module to display the list of overlapping genes generated by an advanced search with a user-entered gene list.
Microarray gene expression databases such as NCBI-GEO and EBI-ArrayExpress hold raw data for only a few of the 30 published microarray papers listed in SAMMD. Moreover, the raw data deposited in these databases are not easily usable by biologists.
By developing SAMMD, we have addressed these issues. SAMMD is a searchable database of S. aureus microarray gene expression data. Such a database is valuable for staphylococcal research in light of increasing multiple-antibiotic resistance in S. aureus. SAMMD will allow scientists to study the role of individual genes in the context of global transcriptomes and to compare new transcriptomes with published ones. SAMMD will thereby facilitate understanding of the complex regulatory networks of S. aureus.
SAMMD is built entirely on open-source principles, and its use is licensed under the GNU General Public License (GPL). The database is available at http://bioinformatics.org/sammd/
SAMMD: Staphylococcus aureus Microarray Meta-Database
ORF: Open Reading Frame
GPL: General Public License
NCBI: National Center for Biotechnology Information
GEO: Gene Expression Omnibus
EBI: European Bioinformatics Institute
SGD: Saccharomyces Genome Database
SMD: Stanford Microarray Database
This study was supported by a grant (1R15AI062727-01A1) from the National Institute of Allergy and Infectious Diseases (NIAID) to MOE and by The Mississippi Functional Genomics Network (NCRR/NIH P20 RR016476).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.