When we conducted a comprehensive comparison of promoter sequences between human and mouse orthologous genes, we noted that some promoter pairs of orthologous genes consisted of non-orthologous promoters. One possible source of these non-orthologous promoters is false pairing in the ortholog table. Another possible reason is the presence of alternative promoters [38, 39], which can result in non-corresponding TSSs being selected in human and mouse. A third possible cause is the existence of species-specific promoters; for example, our group recently reported human promoter sequences whose counterparts are completely missing from the mouse genomic sequence. Nevertheless, despite these problems, which may lead to mis-pairing of non-orthologous promoters, as much as 82% of the promoter pairs in the data set were shown to be evolutionarily related. Although dynamic aspects of TSSs, such as TSS diversification and TSS turnover, have recently been highlighted [38, 39, 41, 42], our results show that the representative TSS of each gene has generally been maintained during the evolution of the human and mouse lineages.
We focused on gene pairs whose promoters appeared to be truly evolutionarily related and examined the relationship between promoter conservation and gene function. We found that the GO terms with high promoter conservation are related to signaling events both inside and outside the cell. If promoter conservation levels reflect the amount of regulatory information contained in the sequence, these results suggest that such genes require more regulatory information embedded in their promoters. It is reasonable to suppose that more regulatory information enables more finely tuned changes in expression levels, thereby allowing these proteins to work effectively as signaling molecules. In contrast, genes involved in metabolism, which showed low promoter conservation, may require relatively little regulatory information in their promoter sequences. Consistent with this, a recent study revealed that housekeeping genes tend to show reduced upstream sequence conservation. Specifically, for ribosomal proteins, Perry et al. pointed out that most of their promoters contain transposable elements, resulting in low conservation. The reduced regulatory information in ribosomal protein promoters might be compensated for by translational regulation directed by the 5' terminal oligopyrimidine (TOP) sequence in their mRNAs.
Related observations on regulatory sequence conservation have been made for specific categories of genes. Iwama and Gojobori found that transcription factor genes, particularly those related to developmental processes, show high upstream sequence conservation, suggesting that these genes form highly connected regulatory networks. Lee et al. reported that genes involved in adaptive processes tend to have highly conserved upstream regions in mammalian genomes and suggested that their transcriptional regulation involves complex combinatorial circuitry. Other approaches, based on whole-genome comparisons, have found highly conserved non-coding regions associated with developmental genes [34, 46, 47]. However, as Lee et al. suggested, most of these regions lie far from genes and overlap little with promoter regions. Thus, these regions appear to be conserved independently of the promoter regions.
The conserved elements in a promoter may be either very short or spread over a region much longer than the 1,200 bases analyzed. In either case, our measure may report poor conservation even when a substantial amount of conserved sequence is present. The local alignment score we used to measure promoter conservation can be roughly considered a combination of identity and alignment length. Identity reflects the rates of substitutions and indels, whereas length reflects larger insertions, such as transposon insertions. When we examined the promoter conservation tendency for each GO term using alignment length or percentage identity as the measure of conservation, the two tendencies were consistent with each other (Additional file 11). Thus, for each functional category, the evolutionary pressures on alignment length and on identity act in the same direction.
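To illustrate why a local alignment score behaves roughly like a combination of identity and length, the following is a minimal Python sketch of Smith-Waterman local alignment with a linear gap penalty. The scoring parameters (match = 2, mismatch = -1, gap = -2) and the function name are illustrative assumptions, not the settings or tool used in this study; the sketch reports the best local score together with the length and percentage identity of the corresponding alignment.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of sequences a and b with a linear gap penalty.

    Returns (score, alignment_length, percent_identity) for the best-scoring
    local alignment. The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # H[i][j] holds the best local alignment score ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment,
    # counting alignment columns and identical positions along the way.
    i, j = best_pos
    columns = matches = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1

    identity = 100.0 * matches / columns if columns else 0.0
    return best, columns, identity
```

For example, `smith_waterman("ACGTA", "ACCTA")` aligns all five columns with one mismatch and returns `(7, 5, 80.0)`. For a gapless alignment of length L with identity fraction p, the score under these parameters is L(3p - 1), here 5 × (2.4 - 1) = 7: the score grows with both length and identity, which is why the two components can be examined separately.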
When we investigated the relationship between protein conservation and promoter conservation in mammals, we observed only a very weak correlation between them. This suggests that substantial portions of the evolutionary changes in promoter and protein sequences are under different selective pressures. This observation is consistent with reports in nematodes and yeast, and thus the very weak correlation between protein and promoter conservation might be universal, from unicellular organisms to higher vertebrates.
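The text does not state which correlation coefficient was used. As an assumed example, the sketch below computes Spearman's rank correlation between per-gene protein and promoter conservation scores (hypothetical inputs, such as protein identity and promoter alignment score for each orthologous pair). A rank-based coefficient is chosen here because it is insensitive to the differing scales and skewed distributions of the two measures; tied values receive average ranks.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A rho near zero, as in the very weak correlation described above, indicates that a gene's protein conservation says little about its promoter conservation. In practice, `scipy.stats.spearmanr` computes the same statistic together with a p-value.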
To understand the relationship between protein and promoter sequence conservation in terms of gene function, we decomposed it by GO category. When we dissected protein conservation as well as promoter conservation, different trends were observed for proteins and promoters. For proteins, high conservation was observed for terms related to a wide range of gene expression processes that occur in the cytosol and the nucleoplasm, whereas low conservation was observed for terms related to extracellular regions, the cell surface and membrane-bounded organelles (such as mitochondria, peroxisomes and lysosomes). The result for membrane-bounded organelles may seem surprising, given that they often carry out basic, conserved metabolic processes; however, they can also be considered topologically "outside" the cell, because they lie on the opposite side of a membrane from the cytosol. The determinants of the protein evolutionary rate [49, 50] need to be identified to fully explain this phenomenon. Nevertheless, our observations reveal trends in protein sequence evolution across functional categories. Comparing these trends with those of promoters, we found that the two are not parallel: protein conservation depends on whether the protein resides on the cytosolic side, whereas promoter conservation seems to depend on whether the gene is related to signaling or metabolism. Specifically, cytoplasmic ribosomal proteins, which reside in the cytosol and are engaged in metabolism, show high protein conservation despite low promoter conservation. Conversely, cell-cell signaling genes, which act outside or at the surface of the cell to convey signals, show low protein conservation despite high promoter conservation. These terms may provide evidence that protein and promoter sequence evolution are decoupled.
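The GO-based decomposition can be sketched as follows, assuming (hypothetically) that each gene's protein and promoter conservation scores are standardized to z-scores across all genes and then averaged over the genes annotated with each term. Terms whose mean protein and promoter z-scores have opposite signs correspond to the nonparallel pattern described above, such as high protein but low promoter conservation. The function names, input layout and sign-based criterion are illustrative assumptions, not the procedure used in this study; a real analysis would also test whether each term's deviation is statistically significant.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a dict of gene -> score to mean 0 and unit SD.

    Assumes the scores are not all identical (otherwise SD is zero).
    """
    scores = list(values.values())
    mu, sd = mean(scores), pstdev(scores)
    return {g: (v - mu) / sd for g, v in values.items()}


def term_profiles(protein, promoter, go_terms):
    """Mean protein and promoter z-score per GO term.

    protein, promoter: dict gene_id -> conservation score (hypothetical inputs).
    go_terms: dict term -> iterable of gene_ids annotated with that term.
    Returns dict term -> (mean protein z, mean promoter z), skipping terms
    with no gene present in both score tables.
    """
    zp, zq = zscores(protein), zscores(promoter)
    profiles = {}
    for term, genes in go_terms.items():
        shared = [g for g in genes if g in zp and g in zq]
        if shared:
            profiles[term] = (mean(zp[g] for g in shared),
                              mean(zq[g] for g in shared))
    return profiles


def decoupled_terms(profiles):
    """Terms whose protein and promoter conservation deviate in opposite directions."""
    return sorted(t for t, (p, q) in profiles.items() if p * q < 0)
```

Under this sketch, a term like cytoplasmic ribosome would appear with a positive protein mean and a negative promoter mean, and cell-cell signaling with the reverse.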