- Methodology article
- Open Access
Rapid detection and curation of conserved DNA via enhanced-BLAT and EvoPrinterHD analysis
© Yavatkar et al; licensee BioMed Central Ltd. 2008
- Received: 17 October 2007
- Accepted: 28 February 2008
- Published: 28 February 2008
Multi-genome comparative analysis has yielded important insights into the molecular details of gene regulation. We have developed EvoPrinter, a web-accessed genomics tool that provides a single uninterrupted view of conserved sequences as they appear in a species of interest. An EvoPrint reveals with near base-pair resolution those sequences that are essential for gene function.
We describe here EvoPrinterHD, a 2nd-generation comparative genomics tool that automatically generates from a single input sequence an enhanced view of sequence conservation between evolutionarily distant species. Currently available for 5 nematode, 3 mosquito, 12 Drosophila, 20 vertebrate, 17 Staphylococcus and 20 enteric bacteria genomes, EvoPrinterHD employs a modified BLAT algorithm [enhanced-BLAT (eBLAT)], which detects up to 75% more conserved bases than identified by the BLAT alignments used in the earlier EvoPrinter program. The new program also identifies conserved sequences within rearranged DNA, highlights repetitive DNA, and detects sequencing gaps. EvoPrinterHD currently holds over 112 billion bp of indexed genomes in memory and allows a subset of genomes to be selected for analysis. An EvoDifferences profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. Finally, EvoPrinterHD incorporates options that allow for (1) re-initiation of the analysis using a different genome's aligning region as the reference DNA to detect species-specific changes in less-conserved regions, (2) rapid extraction and curation of conserved sequences, and (3) for bacteria, identification of unique or uniquely shared sequences present in subsets of genomes.
EvoPrinterHD is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with cis-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, EvoPrinterHD facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.
- Vertebrate Genome
- Test Genome
- Alignment Parameter
- Conserve Sequence Block
- Comparative Genomic Tool
Comparative analysis of orthologous DNA has revealed that many cis-regulatory enhancers contain multi-species conserved sequences (MCSs) that are essential for their transcriptional regulation (reviewed by [1–4]). We have previously described EvoPrinter and cis-Decoder, both web-accessed tools for discovering and comparing conserved sequences that are shared among three or more orthologs [4, 5]. Generated from superimposition of multiple pair-wise BLAT alignments, an EvoPrint provides an ordered uninterrupted representation of conserved sequences as they exist in the genome of interest. When multiple species are included in the analysis, near base-pair resolution of conserved sequences required for gene function can be achieved. For example, when 12 Drosophila species, representing ~200 million years of cumulative evolutionary divergence, are included in the EvoPrint process, one can identify sequences that are essential for cis-regulatory function (both enhancers and minimal promoters), conserved protein-encoding sequences, and micro-RNA binding sites. EvoPrinterHD is a second-generation alignment tool that automates the comparative analysis to rapidly identify a significantly higher percentage of conserved sequences shared among evolutionarily distant orthologs even if they exist within rearranged DNA. In contrast to most comparative multi-sequence alignment tools (reviewed by ), which display columns of sequences that contain gaps to optimize alignments, the species-centric EvoPrint is a single uninterrupted sequence and thus displays more bases in a single view than is possible with conventional alignments. In addition, the uninterrupted readout allows for the rapid extraction and automated curation of conserved DNA from the genome of interest.
At the core of the original multi-genome EvoPrinter alignment algorithms is the BLAT algorithm for pairwise alignments. Although BLAT alignments generate uninterrupted representations of the aligning regions, one drawback of BLAT when performing alignments of evolutionarily distant DNAs, as initially noted by Kent, is that short regions of homology that span the non-overlapping 11-mers go undetected. We developed eBLAT to overcome the inability of BLAT to detect these short blocks of homology. To accomplish this, each genome is indexed in three independent ways, each staggered differently; additionally, the alignment parameters have been adjusted to enhance the detection of short blocks of sequence conservation. By performing three independent alignments using the staggered indices with the optimized alignment parameters and then superimposing the resulting alignments to show all aligning sequences, the overall detection of conserved sequences has been improved by as much as 75% when evolutionarily distant orthologous sequences are aligned.
In addition to the automated alignments for bacteria, nematode, mosquito, Drosophila, and vertebrate genomes, and the higher eBLAT resolution, EvoPrinterHD includes algorithms that search the intra-genomic aligning regions for rearrangements, duplications and sequencing gaps. EvoPrints generated with composite eBLATs highlight conserved sequences within the reference DNA irrespective of genomic rearrangements within one or more of the aligning regions. Four additional programs have been added: (1) an EvoDifferences profile, portraying in a single view the conserved sequences that are detected in all but one of the species included in the EvoPrint; (2) input reference DNA exchange, allowing for detection of species-specific changes in the less-conserved DNA flanking MCSs; (3) automated extraction and curation of conserved sequence blocks (CSBs), facilitating their comparative analysis; and (4) for bacteria, an EvoUnique print that highlights unique or uniquely shared sequences among subsets of genomes. Due in part to its speed and flexibility of genome selection, EvoPrinterHD interfaces well with other web-accessed tools. The time required to undertake a comparative genome analysis of sequences that contain putative cis-regulatory enhancers is significantly reduced. For example, a 12 Drosophila EvoPrint analysis and curation of CSBs within a 2 kb genomic region that contains a cluster of transcription factor DNA-binding sites (discovered using the FlyEnhancer genome motif search tool) requires less than 30 seconds. Once CSBs are discovered, subsequent analysis via cis-Decoder algorithms enables the generation of conserved sequence tag libraries that further facilitate enhancer comparative studies.
The following is a description of the sequential steps and accompanying algorithms used by EvoPrinterHD to identify conserved sequences shared among multiple genomes. Instructions and a tutorial for optimizing its use can be accessed at the EvoPrinterHD web site.
In addition to the original non-overlapping 11-mer genomic index of BLAT, EvoPrinterHD indexes each genome into a second set of non-overlapping 11-mers, offset by four base pairs from the initial indexing, and into a third set of non-overlapping 9-mers. The resulting staggered indexing increases the likelihood that homologous regions missed by any one of the individual indices will be identified. The use of multiple genome indices and optimization of the alignment phase parameters (see below) is the basis of the enhanced detection of conserved sequences between evolutionarily distant orthologous DNAs.
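The staggered indexing described above can be illustrated with a minimal Python sketch. It is not the BLAT or EvoPrinterHD implementation; the function names are hypothetical, and the starting offset of the 9-mer index (zero here) is an assumption, since the text does not state it.

```python
from collections import defaultdict


def index_non_overlapping(genome, k, offset=0):
    """Map each non-overlapping k-mer, tiled from `offset`, to its start positions."""
    index = defaultdict(list)
    for start in range(offset, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return dict(index)


def build_staggered_indices(genome):
    """Return the three indices: 11-mers, 11-mers offset by 4 bp, and 9-mers."""
    return [
        index_non_overlapping(genome, k=11, offset=0),
        index_non_overlapping(genome, k=11, offset=4),
        index_non_overlapping(genome, k=9, offset=0),  # offset 0 is assumed
    ]


# Toy genome: an 11-bp block that straddles two tiles of the first index.
toy_genome = "TTTT" + "ACGTACGGTCA" + "GGGGGGGGGGG"
first, second, third = build_staggered_indices(toy_genome)
block_missed_by_first = "ACGTACGGTCA" not in first
block_found_by_second = second.get("ACGTACGGTCA") == [4]
```

In this toy case the block starts 4 bp into the first tiling, so only the offset index contains it as a whole word, which is the situation the additional indices are meant to catch.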
EvoPrinterHD currently holds in memory three independent indices of each of 37 bacteria, 3 mosquito, 5 nematode, 12 Drosophila and 20 vertebrate genomes, representing ~112 billion bp in total memory.
Modification of BLAT search and alignment parameters
The alignment sensitivity of EvoPrinterHD for discovering short blocks of conserved sequence homology between evolutionarily distant orthologs was increased by optimizing the Genomic Finding (gf) client program parameters of the original BLAT algorithm. The search and alignment parameters were adjusted by: (1) optimizing the stringency factor for low homology alignments by increasing it from 0.0005 to 0.001, (2) reducing the initial expansion gap between adjacent hits from a setting of four to three, (3) reducing the additional expansion gap penalty from three to one, (4) increasing the maximum allowable gaps and inserts from 12 to 16, and (5) changing the value of the allowable codon gap parameter from two to three to optimize for codon polymorphisms in open reading frames.
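For reference, the five adjustments listed above can be gathered into one small Python record. The field names are hypothetical labels chosen for this sketch and do not correspond to actual BLAT source identifiers or command-line flags; only the numeric values come from the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AlignmentSettings:
    # Field names are illustrative only, not real BLAT identifiers.
    stringency_factor: float
    initial_expansion_gap: int
    additional_expansion_gap_penalty: int
    max_gaps_and_inserts: int
    codon_gap: int


ORIGINAL = AlignmentSettings(0.0005, 4, 3, 12, 2)
ADJUSTED = AlignmentSettings(0.001, 3, 1, 16, 3)


def changed_settings(before, after):
    """Return {name: (old, new)} for every setting that differs."""
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


changes = changed_settings(ORIGINAL, ADJUSTED)
```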
Detecting conserved sequences with EvoPrinterHD algorithms
To maximize the identification of short CSBs between evolutionarily divergent orthologs, EvoPrinterHD generates 3 different input reference DNA vs. test genome BLAT alignments to the same aligning region using the three indices described above. As an output of the client program, EvoPrinterHD then generates a superimposed composite of the 3 different alignments. The algorithm does this by first creating an array of nucleotide strings from the 3 input reference DNA BLAT alignment sequences and then looping through the strings one base at a time, outputting a capital letter when at least one of the 3 readouts has an aligning base at that position, thereby generating a composite readout that displays all conserved bases. The program also generates BLAT readouts of the test genome aligning region; both sets of readouts are stored in memory for later analysis, EvoPrint generation, and exchange of input reference DNA, which is accomplished by selecting one of the aligning region sequences as the new reference sequence to reinitiate the analysis. The algorithm also generates eBLATs for the second and third highest scoring aligning regions for each of the selected genomes.
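The superimposition step can be sketched in a few lines of Python. This is an illustration under one assumption: each readout is the reference sequence with aligned bases in uppercase and non-aligned bases in lowercase. The function name is hypothetical.

```python
def superimpose_any(readouts):
    """Composite readout: uppercase wherever at least one readout is aligned."""
    if not readouts:
        raise ValueError("at least one readout is required")
    length = len(readouts[0])
    if any(len(r) != length for r in readouts):
        raise ValueError("all readouts must cover the same reference sequence")

    composite = []
    for column in zip(*readouts):  # one reference position at a time
        base = column[0]
        aligned = any(b.isupper() for b in column)
        composite.append(base.upper() if aligned else base.lower())
    return "".join(composite)


# Three readouts of the same 8-bp reference, one per genome index.
eblat = superimpose_any(["ACgtacgt", "acGTacgt", "acgtacGT"])
```

Here `eblat` is `"ACGTacGT"`: each index contributes a different pair of aligned bases, and the composite keeps all of them.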
The mosquito, nematode, Drosophila and Staphylococcus EvoPrinterHD algorithms automatically generate, respectively, 27, 45, 108 and 153 pairwise BLAT alignments, assemble 9, 15, 36 and 51 eBLAT readouts, and then superimpose the individual pairwise eBLAT alignments (3 per genome) to generate a color-coded composite-eBLAT (ceBLAT) for each aligning region. The vertebrate EvoPrinterHD and enteric bacteria EvoPrinterHD both generate up to 180 pairwise BLAT alignments, assembling 60 eBLAT readouts and 20 ceBLATs. To reduce alignment times, EvoPrinterHD algorithms currently employ two Dell PowerEdge (2.8 GHz/64 GB RAM; 6950 series) dual quad-core processor servers operating in parallel with the RedHat Enterprise Linux 5 operating system and the Network File System to simultaneously query multiple indexed genomes.
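The alignment counts quoted above follow from simple arithmetic, shown here as a short Python check. The factorization (three indexed alignments per eBLAT, three eBLATs per genome, one ceBLAT per genome) is inferred from the text rather than taken from the program itself, and the names are hypothetical.

```python
INDICES_PER_EBLAT = 3   # one BLAT alignment per staggered genome index
EBLATS_PER_GENOME = 3   # first, second and third highest scoring regions


def alignment_workload(n_genomes):
    """Return (pairwise BLAT alignments, eBLAT readouts, ceBLATs)."""
    eblats = n_genomes * EBLATS_PER_GENOME
    blats = eblats * INDICES_PER_EBLAT
    ceblats = n_genomes
    return blats, eblats, ceblats


workloads = {
    "mosquito": alignment_workload(3),
    "nematode": alignment_workload(5),
    "Drosophila": alignment_workload(12),
    "Staphylococcus": alignment_workload(17),
    "vertebrate": alignment_workload(20),
}
```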
We also compared EvoPrinterHD-generated EvoPrints to multi-genome alignments obtained from the UCSC comparative genome bioinformatics alignment program [20, 21]. The alignment resolution of EvoPrinterHD is equivalent to the multi-species UCSC alignments in detecting CSBs. The two alignment programs detect the same conserved sequences with 93% to 95% correspondence in five different enhancers compared (Figure 2C; and data not shown).
EvoPrinterHD repeat finder
Identification of rearranged and duplicated conserved sequences
Generating EvoPrints, EvoDifferences profiles and EvoUnique prints
Based on the data provided on the alignment scorecard, different combinations of ceBLAT alignments can be chosen to generate an EvoPrint. The EvoPrinter algorithm creates an array of nucleotide strings from each of the selected alignments and then looks for conservation of sequence by looping through each of the strings one base at a time, outputting an uppercase base for only those input reference DNA nucleotides that are aligned in all of the different ceBLATs included in the analysis (Figure 5B). Those DNA bases within the input DNA that are not shared with all species are represented as lowercase nucleotides. The "All Alignments or None" option for each species allows for rapid changes in the repertoire of species alignments used to generate an EvoPrint. As a default setting, EvoPrinterHD selects ceBLATs to generate an EvoPrint; however, the user can select just the highest scoring alignment to generate an EvoPrint, and doing so eliminates potential false positives that are identified as repeat sequences. As discussed above, when evolutionarily distant species are included in the analysis, MCSs containing genomic rearrangements in one or more of the selected genomes are identified in the second and third eBLAT alignments. To include the rearranged sequences in the analysis, ceBLATs are used to generate the EvoPrint. The use of the intra-species ceBLATs in the EvoPrint procedure, rather than selecting first, second or third alignments for generation of the EvoPrint, enhances the ability of EvoPrinterHD to identify and display, in a single uninterrupted sequence, conserved sequences within the input DNA even though the MCSs reside within genomic rearrangements in one or more of the orthologous DNAs included in the comparative analysis. Our experience indicates that highly repetitious sequences do not interfere with the use of ceBLATs, because the presence and position of repeats vary across the species used to generate the EvoPrint.
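The EvoPrint rule described above (uppercase only where every selected alignment agrees) can be sketched as follows. This is a minimal illustration that assumes each ceBLAT is given as the reference sequence with aligned bases in uppercase; the function name is hypothetical.

```python
def evoprint(ceblats):
    """Uppercase only those reference bases that are aligned in every ceBLAT."""
    if not ceblats:
        raise ValueError("at least one ceBLAT is required")
    length = len(ceblats[0])
    if any(len(c) != length for c in ceblats):
        raise ValueError("all ceBLATs must cover the same reference sequence")

    printed = []
    for column in zip(*ceblats):  # one reference position at a time
        base = column[0]
        conserved = all(b.isupper() for b in column)
        printed.append(base.upper() if conserved else base.lower())
    return "".join(printed)


# Three species aligned to the same 8-bp reference.
result = evoprint(["ACGTacgt", "ACgtACgt", "ACGTACgt"])
```

Only the first two bases are aligned in all three species, so `result` is `"ACgtacgt"`. Dropping a species from the list is the programmatic analogue of deselecting its alignments on the scorecard.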
For the vertebrate and enteric bacteria analyses, genomes can be added to or removed from the initial analysis simply by returning to the selection page and selecting or deselecting genomes. Because EvoPrinterHD holds the previous alignments in memory, the time required to add additional genomes to the comparative analysis is significantly reduced.
An additional readout, the EvoDifferences profile, is also displayed along with the EvoPrint; it highlights the unique differences (conserved sequence losses) that each species contributes to the comparative analysis (Figures 2B and 5C). The EvoDifferences profile can also be considered a "relaxed EvoPrint" since bases identified by the different colors are present in all species except for the single species denoted by that color. The apparent absence of a conserved sequence or base change in a single species could have several explanations: (1) the difference represents a unique evolutionary change, (2) it may be the result of a sequencing error, and/or (3) the sequence is present but not identified by the ceBLAT due to three or more genomic rearrangements in the aligning region.
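A minimal Python sketch of the EvoDifferences idea follows. It assumes each species' ceBLAT is the reference sequence with aligned bases in uppercase, and it reports a species name in place of a color; the function name and species labels are hypothetical.

```python
def evodifferences(ceblats_by_species):
    """For each reference position, name the single species that lacks the base.

    Positions aligned in every species, or missing in two or more species,
    are reported as None.
    """
    names = list(ceblats_by_species)
    lengths = {len(ceblats_by_species[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("all ceBLATs must cover the same reference sequence")

    profile = []
    for i in range(lengths.pop()):
        missing = [n for n in names if not ceblats_by_species[n][i].isupper()]
        profile.append(missing[0] if len(missing) == 1 else None)
    return profile


profile = evodifferences({"sp1": "ACGt", "sp2": "ACgt", "sp3": "AcGt"})
```

In this toy input the second base is lost only in `sp3` and the third only in `sp2`, so the profile is `[None, "sp3", "sp2", None]`.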
For bacteria, a third readout, the color-coded EvoUnique print, highlights those bases in the input reference DNA that are unique (that do not align with any of the other genomes included in the analysis) and those bases that align with only a single other or two other genomes included in the analysis (data not shown).
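The EvoUnique classification can be sketched in the same style. The example assumes one readout per other genome, each being the input reference DNA with aligned bases in uppercase; the labels returned stand in for the colors of the actual print, and all names are hypothetical.

```python
def evounique(readouts):
    """Classify each reference base by how many other genomes align to it.

    Returns "unique" (no genome aligns), "one" or "two" (aligns with exactly
    one or two other genomes), or None for more widely shared bases.
    """
    if not readouts:
        raise ValueError("at least one readout is required")
    if len({len(r) for r in readouts}) != 1:
        raise ValueError("all readouts must cover the same reference sequence")

    labels = {0: "unique", 1: "one", 2: "two"}
    classes = []
    for column in zip(*readouts):
        n_aligned = sum(b.isupper() for b in column)
        classes.append(labels.get(n_aligned))
    return classes


classes = evounique(["ACGt", "ACgt", "Acgt"])
```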
Parsing and curation of selected conserved sequences
To facilitate the comparative analysis of different conserved sequences from different enhancers, EvoPrinterHD allows for the curation of CSBs by enabling the user to automatically extract and collate CSBs in both forward and reverse-complemented orientations (data not shown). The "extract conserved sequence block" option (located at the top of each EvoPrint readout) provides for the automatic extraction, naming and consecutive numbering of 6 bp or longer CSBs from selected regions of an EvoPrint or EvoDifferences profile (see tutorial). In addition to the annotated list of forward and reverse sequences, the readout shows the selected EvoPrinted region from which the conserved sequences were extracted. A link is also provided to the cis-Decoder CSB comparative algorithms.
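The extraction step can be approximated with a short Python sketch. It assumes an EvoPrint is a string in which conserved bases are uppercase, treats each run of 6 or more uppercase bases as a CSB, and uses a hypothetical naming scheme; it is not the EvoPrinterHD code.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_csbs(evoprint, min_len=6, prefix="CSB"):
    """Extract, name and number runs of conserved (uppercase) bases."""
    pattern = re.compile(r"[ACGT]{%d,}" % min_len)
    blocks = []
    for number, match in enumerate(pattern.finditer(evoprint), start=1):
        forward = match.group()
        blocks.append({
            "name": f"{prefix}-{number}",  # hypothetical naming scheme
            "start": match.start(),        # 0-based offset in the EvoPrint
            "forward": forward,
            "reverse": reverse_complement(forward),
        })
    return blocks


csbs = extract_csbs("acgtACGTACggTTTTTTAAccGGGcc")
```

The 3-bp run `GGG` falls below the 6-bp threshold and is skipped, leaving two numbered blocks with both orientations recorded.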
Identifying species-specific changes in less-conserved DNA
EvoPrinterHD affords a rapid, convenient way to detect and curate DNA sequence conservation between related and evolutionarily distant animals. When multiple genomes are included in the analysis, the uninterrupted EvoPrint readout provides a species-centric view of conserved sequences that are required for gene function. EvoPrinterHD advances the EvoPrint method by providing an automated higher-definition view of sequence conservation from which the conserved sequence blocks can be rapidly curated for subsequent analysis. EvoPrinterHD also identifies genomic regions within one or more of the selected species that harbor rearrangements of the conserved DNA, and identifies unique or uniquely shared DNA sequences within bacterial genomes.
Genome sequence files and their assembly dates
The following genome sequence files were curated from the Genome Bioinformatics Group of the University of California, Santa Cruz: Human, March 2006 (hg18); Chimpanzee, March 2006 (panTro2); Rhesus, January 2006 (rheMac2); Rat, November 2004 (rn4); Mouse, February 2006 (mm8); Cat, March 2006 (felCat3); Dog, May 2005 (canFam2); Horse, January 2007 (equCab1); Cow, March 2005 (bosTau2); Opossum, January 2006 (monDom4); Chicken, May 2006 (galGal3); Xenopus tropicalis, August 2005 (xenTro2); Zebrafish, March 2006 (danRer4); Tetraodon, February 2004 (tetNig1); Fugu, October 2004 (fr2); Stickleback, February 2006 (gasAcu1); Medaka, April 2006 (oryLat1); D. melanogaster, April 2006 (dm3); D. simulans, April 2005 (droSim1); D. sechellia, October 2005 (droSec1); D. yakuba, November 2005 (droYak2); D. erecta, August 2005 (droEre1); D. ananassae, August 2005 (droAna2); D. pseudoobscura, November 2005 (dp3); D. persimilis, October 2005 (droPer1); D. virilis, August 2005 (droVir2); D. mojavensis, August 2005 (droMoj2); D. grimshawi, August 2005 (droGri1); C. elegans, January 2007 (ce4); C. brenneri, January 2007 (caePb1); C. briggsae, January 2007 (cb3); C. remanei, March 2006 (caeRem2); and P. pacificus, February 2007 (priPac1). The genome sequence files for the Elephant, June 2005; Hedgehog, June 2006; and Armadillo, June 2005 were downloaded from the Broad Institute.
The following bacteria genome sequence files were curated from the BacMap database of the University of Alberta: Staphylococcus aureus COL; Staphylococcus aureus MRSA252; Staphylococcus aureus MSSA476; Staphylococcus aureus Mu50; Staphylococcus aureus MW2; Staphylococcus aureus N315; Staphylococcus aureus subsp. aureus NCTC 8325; Staphylococcus aureus RF122; Staphylococcus aureus subsp. aureus USA300; Staphylococcus epidermidis ATCC 12228; Staphylococcus epidermidis RP62; Staphylococcus haemolyticus JCSC1435; Escherichia coli 536; Escherichia coli APEC O1; Escherichia coli CFT073; Escherichia coli O157:H7 EDL933; Escherichia coli K12 MG1655; Escherichia coli W3110; Escherichia coli O157:H7 Sakai; Klebsiella pneumoniae MGH 78578; Salmonella enterica Choleraesuis SC-B67; Salmonella enterica Paratyphi A ATCC 9150; Salmonella typhimurium LT2; Salmonella enterica CT18; Salmonella enterica Ty2; Shigella boydii Sb227; Shigella dysenteriae Sd197; Shigella flexneri 2a 2457T; and Shigella flexneri 301. The genome sequence files for Staphylococcus aureus subsp. aureus JH1, Staphylococcus aureus subsp. aureus JH9, Staphylococcus aureus Mu3, and Staphylococcus aureus subsp. aureus str. Newman were curated from the European Bioinformatics Institute of the European Molecular Biology Laboratory. The genome sequence file for Escherichia coli UTI89 was taken from the Enteropathogen Resource Integration Center, and genome sequence data for Salmonella bongori were downloaded from the Sanger Institute Sequencing Centre.
The mosquito genome sequence files for Aedes aegypti, Anopheles gambiae and Culex pipiens were curated from the VectorBase database.
We are grateful to Jim Kent, Kory Johnson and Howard Nash for helpful discussions and advice during the EvoPrinterHD development phase. We also thank Ken Weeks and Jack Bishop for their technical expertise and acknowledge the editorial expertise and assistance of Judith Brody. This research was supported by the Intramural Research Program of the NIH, NINDS.
- Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE: Human-mouse genome comparisons to locate regulatory sites. Nat Genet. 2000, 26: 225-228. 10.1038/79965.
- Yuh CH, Brown CT, Livi CB, Rowen L, Clarke PJ, Davidson EH: Patchy interspecific sequence similarities efficiently identify positive cis-regulatory elements in the sea urchin. Dev Biol. 2002, 246: 148-161. 10.1006/dbio.2002.0618.
- Berezikov E, Guryev V, Plasterk RH, Cuppen E: CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res. 2004, 14: 170-178. 10.1101/gr.1642804.
- Brody T, Rasband W, Baler K, Kuzin A, Kundu M, Odenwald WF: cis-Decoder discovers constellations of conserved DNA sequences shared among tissue-specific enhancers. Genome Biol. 2007, 8: R75. 10.1186/gb-2007-8-5-r75.
- Odenwald WF, Rasband W, Kuzin A, Brody T: EVOPRINTER: a multi-genomic comparative tool for rapid identification of functionally important DNA. Proc Natl Acad Sci. 2005, 102: 14700-14705. 10.1073/pnas.0506915102.
- Kent WJ: BLAT-the BLAST-like alignment tool. Genome Res. 2002, 12: 656-64. 10.1101/gr.229202. Article published online before March 2002.
- Blanchette M: Computation and analysis of genomic multi-sequence alignments. Annu Rev Genomics Hum Genet. 2007, 8: 193-213. 10.1146/annurev.genom.8.080706.092300.
- Markstein M, Zinzen R, Markstein P, Yee KP, Erives A, Stathopoulos A, Levine MA: A regulatory code for neurogenic gene expression in the Drosophila embryo. Development. 2004, 131: 2387-94. 10.1242/dev.01124.
- EvoPrinter. [http://evoprinter.ninds.nih.gov/]
- Li X, Gutjahr T, Noll M: Separable regulatory elements mediate the establishment and maintenance of cell states by the Drosophila segment-polarity gene gooseberry. EMBO J. 1993, 12: 1427-1436.
- Ip YT, Levine M, Bier E: Neurogenic expression of snail is controlled by separable CNS and PNS promoter elements. Development. 1994, 120: 199-207.
- Margolis JS, Borowsky ML, Steingrimsson E, Shim CW, Lengyel JA, Posakony JW: Posterior stripe expression of hunchback is driven from two promoters by a common enhancer element. Development. 1995, 121: 3067-3077.
- Wharton KA, Crews ST: CNS midline enhancers of the Drosophila slit and Toll genes. Mech Dev. 1993, 40: 141-154. 10.1016/0925-4773(93)90072-6.
- Lehman DA, Patterson B, Johnston LA, Balzer T, Britton JS, Saint R, Edgar BA: Cis-regulatory elements of the mitotic regulator, string/Cdc25. Development. 1999, 126: 1793-1803.
- Sun Y, Jan LY, Jan YN: Transcriptional regulation of atonal during development of the Drosophila peripheral nervous system. Development. 1998, 125: 3731-3740.
- Gindhart JG, King AN, Kaufman TC: Characterization of the cis-regulatory region of the Drosophila homeotic gene Sex combs reduced. Genetics. 1995, 139: 781-95.
- Reddy KL, Wohlwill A, Dzitoeva S, Lin MH, Holbrook S, Storti RV: The Drosophila PAR domain protein 1 (Pdp1) gene encodes multiple differentially expressed mRNAs and proteins through the use of multiple enhancers and promoters. Dev Biol. 2000, 224: 401-14. 10.1006/dbio.2000.9797.
- Gallo SM, Li L, Hu Z, Halfon MS: REDFly: a regulatory element database for Drosophila. Bioinformatics. 2006, 22: 381-383. 10.1093/bioinformatics/bti794.
- Hoch M, Seifert E, Jäckle H: Gene expression mediated by cis-acting sequences of the Kruppel gene in response to the Drosophila morphogens bicoid and hunchback. EMBO J. 1991, 10: 2267-78.
- Genome Bioinformatics Group of UC Santa Cruz. [http://hgdownload.cse.ucsc.edu/downloads.html]
- Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W: Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 2004, 14: 708-15. 10.1101/gr.1933104.
- Liu Z, Yang X, Tan F, Cullion K, Thiele CJ: Molecular cloning and characterization of human Castor, a novel human gene up-regulated during cell differentiation. Biochem Biophys Res Commun. 2006, 344: 834-844. 10.1016/j.bbrc.2006.03.207.
- Mellerick DM, Kassis JA, Zhang SD, Odenwald WF: castor encodes a novel zinc finger protein required for the development of a subset of CNS neurons in Drosophila. Neuron. 1992, 9: 789-803. 10.1016/0896-6273(92)90234-5.
- Kambadur R, Koizumi K, Stivers C, Nagle J, Poole SJ, Odenwald WF: Regulation of POU genes by castor and hunchback establishes layered compartments in the Drosophila CNS. Genes Dev. 1998, 12: 246-60. 10.1101/gad.12.2.246.
- Broad Institute. [http://www.broad.mit.edu/mammals/]
- BacMap database of University of Alberta. [http://wishart.biology.ualberta.ca/BacMap/]
- European Bioinformatics Institute of the European Molecular Biology Laboratory. [http://www.ebi.ac.uk/genomes/bacteria.html]
- Enteropathogen Resource Integration Center. [http://www.ericbrc.org/portal/eric/ecoliut189]
- Sequencing Centre Sanger Institute. [http://xbase.bham.ac.uk/genome.pl?id=1843]
- Lawson D, Arensburger P, Atkinson P, Besansky NJ, Bruggner RV, Butler R, Campbell KS, Christophides GK, Christley S, Dialynas E, Emmert D, Hammond M, Hill CA, Kennedy RC, Lobo NF, MacCallum MR, Madey G, Megy K, Redmond S, Russo S, Severson DW, Stinson EO, Topalis P, Zdobnov EM, Birney E, Gelbart WM, Kafatos FC, Louis C, Collins FH: VectorBase: a home for invertebrate vectors of human pathogens. Nucleic Acids Res. 2007, 35: D503-505. 10.1093/nar/gkl960.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.