Comparative analysis of cancer genes in the human and chimpanzee genomes

Background: Cancer is a major medical problem in modern societies. However, the incidence of this disease in non-human primates is very low. To study whether genetic differences between human and chimpanzee could contribute to their distinct cancer susceptibility, we have examined in the chimpanzee genome the orthologous genes of a set of 333 human cancer genes.
Results: This analysis has revealed that all examined human cancer genes are present in chimpanzee, contain intact open reading frames and show a high degree of conservation between both species. However, detailed analysis of this set of genes has shown some differences in genes of special relevance for human cancer. Thus, the chimpanzee gene encoding p53 contains a Pro residue at codon 72, while this codon is polymorphic in humans and can code for Arg or Pro, generating isoforms with different abilities to induce apoptosis or interact with p73. Moreover, sequencing of the BRCA1 gene has shown an 8 kb deletion in the chimpanzee sequence that prematurely truncates the co-regulated NBR2 gene.
Conclusion: These data suggest that small differences in cancer genes, such as those found in tumor suppressor genes, might influence the differences in cancer susceptibility between human and chimpanzee. Nevertheless, further analysis will be required to determine the exact contribution of the genetic changes identified in this study to the different cancer incidence in non-human primates.


Background
Cancer is a major and growing clinical problem in modern societies. Although usually referred to as a single disease, cancer comprises more than 200 different pathologies, which are characterized by uncontrolled cell growth that may lead to the invasion of surrounding tissues and the subsequent generation of metastases in distant organs of the body [1,2]. Tumor development is a complex process in which genetic, epigenetic and environmental factors are implicated [3][4][5][6]. The importance of genetic factors in cancer is now well established, as mutations in specific genes have been associated with neoplastic transformation and the development of specific cancer types [3,7,8]. This is further supported by the existence of hereditary cancer syndromes, caused by germ-line mutations in specific genes and responsible for about 5% of all diagnosed malignancies [9][10][11]. Over the last two decades, a number of studies have focused on the identification of the different genes that can contribute to cancer. These studies have led to the conclusion that alterations in three types of genes (oncogenes, tumor-suppressor genes, and stability genes) are mainly associated with the genesis of cancer [3]. They have also helped to elucidate the molecular mechanisms through which these genes act during tumor development and progression [12]. Finally, in some cases this knowledge has resulted in the introduction of new therapeutic strategies for cancer treatment [13][14][15][16].
Chimpanzees (Pan troglodytes) are our closest living relatives. Their physiology is more similar to that of humans than that of any other model organism, and the study of several human diseases in chimpanzee has led to a better understanding of pathologies such as hepatitis or AIDS [17,18]. Interestingly, a number of studies have reported that cancer incidence in non-human primates is very low. This is especially evident for epithelial neoplasms such as breast, prostate or lung carcinomas, which are responsible for more than 20% of human deaths but whose incidence in great apes is less than 2% [18][19][20][21][22]. The difference in cancer incidence between humans and chimpanzees could derive mainly from three factors: i) exposure to different environmental factors, including diet and habits, ii) differences in life expectancy and iii) genetic differences that might make humans more susceptible to cancer development.
The completion of the first draft of the chimpanzee genome sequence has opened the possibility to study whether genetic differences between human and chimpanzee could contribute to the observed differences in cancer susceptibility between both species [23]. To address this question, we have used the chimpanzee genome sequence to identify and compare the orthologous genes of a set of 333 human genes that have been previously implicated in cancer development [3,7]. This analysis has revealed that all analyzed human cancer genes are present in the chimpanzee and exhibit a high degree of conservation between both species.
However, further detailed analysis of a series of genes of special interest in human cancer, such as p53 and BRCA1, has revealed some differences from the corresponding chimpanzee counterparts. In this work, we present the results of this comparative genomic analysis and discuss the putative relevance of the observed genetic changes for explaining some aspects of the differential cancer susceptibility between human and chimpanzee.

Analysis of cancer genes in the chimpanzee genome
As an initial attempt to study cancer-associated genes in the chimpanzee, we have compared 333 human cancer genes to the chimpanzee genome and identified and analyzed the orthologous genes (see Additional data file 1). The set of human cancer genes was selected from the literature based on mutational analysis and/or roles in processes such as chromosomal stability, promoter methylation or control of mitotic checkpoints [3,4,7,24-28]. As can be seen in Figure 1, more than 50% of the genes belong to three main functional categories: transcription factors, phosphorylation (including kinases, kinase inhibitors and phosphatases), and DNA repair. Other functional groups include structural proteins, GTPases and GTPase regulators, and proteins involved in ubiquitylation. Finally, minor groups include tumor suppressors, the protein and RNA components of telomerase, and proteins implicated in apoptosis, among others.
To perform this study, the cDNA sequence of each human cancer gene was first compared with the chimpanzee genome in order to determine the presence or absence of the corresponding ortholog, and then to predict its complete cDNA sequence. This analysis revealed that all analyzed human cancer genes are present in the chimpanzee genome. The high sequence coverage of this set of cancer genes (96% coverage at the nucleotide level, 1,161,044 nucleotides) has allowed us to perform a detailed comparison, at the nucleotide and amino acid levels, of cancer genes between both species (see Additional data file 1 for more details). Direct comparison of human and chimpanzee cancer genes indicates that they are highly conserved, showing 99.38% identity at the protein level and 99.19% at the nucleotide level, which is similar to the average amino acid identity between both organisms (99.38%) [23]. Interestingly, we have identified 71 cancer genes (representing 21% of the total) encoding proteins which are 100% identical to their human counterparts (Additional data file 2). The conservation of cancer genes between human and chimpanzee suggests that they perform essential cellular functions and is in agreement with previous studies [29]. This proposal is consistent with the fact that most analyzed cancer genes encode intracellular proteins highly conserved from yeast to human, and implicated in fundamental processes such as cell signalling, cell cycle control and maintenance of genomic stability [3].
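The identity percentages quoted above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and that gap columns are excluded from the denominator; alignment tools differ on that convention, so this is not necessarily how the reported figures were computed. The function name and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns do not count toward identity
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy peptides invented for illustration (not real human or chimpanzee data):
# nine of ten aligned residues match.
human_toy = "MEEPQSDPSV"
chimp_toy = "MEEPQSDLSV"
print(percent_identity(human_toy, chimp_toy))
```

The same function works for nucleotide alignments, since it only compares characters column by column.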

Analysis of insertions and deletions in protein coding regions from the chimpanzee genome
Despite the high conservation between human and chimpanzee cancer genes described above, in the course of the present study we identified 20 chimpanzee genes (6% of all analyzed genes) that encode proteins containing or lacking specific residues due to the insertion or deletion of codons in the corresponding open reading frames (Table 1). The analysis of these genes may help to identify human-specific changes in protein-coding genes. A total of 9 chimpanzee genes encode proteins containing extra amino acids in their sequence, 6 of them located within trinucleotide repeats resulting in longer expansions in chimpanzee, with up to 4 extra glutamine residues in the case of MN1. On the other hand, 12 chimpanzee genes encode proteins with fewer amino acids than their human counterparts. As with the expanded proteins in chimpanzee, 8 of the 12 expansions in human proteins were located in trinucleotide repeats. The higher proportion of longer alleles in humans than in chimpanzee is consistent with previously reported results [30]. Interestingly, analysis of these expanded regions using information retrieved from EST databases revealed that six human genes, ASPSCR1, DEK, FANCC, MLLT3, MYST4 and ZNF384, are polymorphic at these loci. As functional trinucleotide tandem repeats that vary in humans have been shown to be also variable in chimpanzee [30], it is likely that all these genes present variability in both species. These results suggest that these regions might be prone to genetic instability, and open the possibility of investigating whether all the genes showing human-chimpanzee variability in trinucleotide repeats are polymorphic in humans and influence the risk of tumor development.

Table 1. Columns: Gene, Codons, Residues inserted, Human status. Row groups: codons inserted in chimpanzee; codons inserted in human.
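As a rough illustration of how an expansion inside a trinucleotide repeat shows up at the protein level, the sketch below compares the longest run of a given residue (glutamine by default) in two orthologous proteins. The function names and the toy sequences are hypothetical; the sequences are invented to mimic a difference of 4 glutamines and are not the real MN1 proteins.

```python
import re


def longest_run(protein: str, residue: str = "Q") -> int:
    """Length of the longest uninterrupted run of `residue` in `protein`."""
    pattern = re.escape(residue.upper()) + "+"
    runs = re.findall(pattern, protein.upper())
    return max((len(run) for run in runs), default=0)


def run_length_difference(human: str, chimp: str, residue: str = "Q") -> int:
    """Positive if the chimpanzee run is longer, negative if it is shorter."""
    return longest_run(chimp, residue) - longest_run(human, residue)


# Invented toy sequences: 5 glutamines in "human", 9 in "chimp".
human_toy = "MAQQQQQLPS"
chimp_toy = "MAQQQQQQQQQLPS"
print(run_length_difference(human_toy, chimp_toy))
```

Comparing only the longest run is a simplification; a codon-level alignment would be needed to place each inserted or deleted codon exactly.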
During the course of this analysis, we also observed that 52 genes (representing 15.6% of all analyzed cancer genes) contained putative frameshifts and/or premature stop codons in their coding sequences (Additional data file 3). This high number of conflictive regions is unlikely to reflect bona fide differences between these species, and more likely results from sequencing problems or artefacts derived from the assembly process. To evaluate these possibilities, we took advantage of the availability of two independent assemblies of the chimpanzee genome, called ARACHNE and PCAP [23]. The ARACHNE assembly, which has more coverage and fewer order conflicts, is used by most public genome browsers and was employed in the initial steps of this study. The PCAP assembly is a de novo assembly that is non-redundant with ARACHNE and provides an additional resource for resolving this kind of sequencing conflict. To distinguish artefacts of the assembly process from real differences or sequencing errors, we compared the conflictive regions identified in cancer genes to the PCAP assembly. This strategy resolved 83% of the conflicts, as the corresponding regions were correct in the PCAP assembly, suggesting that they represented errors generated during the assembly process. Nevertheless, 9 cancer genes (BCL3, ELL, GAS7, MLL, MLLT3, NF1, RECQL4, SMARCB1 and TPR) still contained the same frameshift in both assemblies. Because these genes might represent real differences between human and chimpanzee cancer genes, we PCR-amplified all these conflictive regions, and the resulting DNA fragments were subjected to direct sequencing. This strategy allowed us to confirm that these genes do not contain frameshifts or premature stop codons, and encode functional proteins in all cases. These findings draw attention to the degree of accuracy of the chimpanzee genome sequence, despite the existence of two assemblies. Ongoing efforts to increase the shotgun coverage from ~4-fold to ~8-fold redundancy will produce the higher quality sequence necessary for a reliable ascertainment of specific interesting discrepancies.
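The triage logic described above (flag a coding sequence, then check whether the second assembly shows the same problem) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure actually used: a length that is not a multiple of three is taken as a crude proxy for a frameshift, whereas in practice frameshifts would be detected by aligning against the human cDNA. All function names and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problems(cds: str) -> set:
    """Return the set of problems found in a predicted coding sequence."""
    cds = cds.upper()
    problems = set()
    if len(cds) % 3 != 0:
        # assumption: length not divisible by 3 is used as a frameshift proxy
        problems.add("frameshift")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # any stop codon before the last complete codon is treated as premature
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.add("premature_stop")
    return problems


def classify(arachne_cds: str, pcap_cds: str) -> str:
    """Compare the same gene in two assemblies and suggest a next step."""
    in_arachne = cds_problems(arachne_cds)
    in_pcap = cds_problems(pcap_cds)
    if not in_arachne and not in_pcap:
        return "no conflict"
    if in_arachne and not in_pcap:
        return "likely ARACHNE assembly artefact"
    if in_pcap and not in_arachne:
        return "likely PCAP assembly artefact"
    return "conflict in both assemblies: verify by PCR and direct sequencing"


# Invented toy sequences: the first carries an internal TAA stop codon.
print(classify("ATGGCCTAAGGATGA", "ATGGCCAAAGGATGA"))
```

A gene flagged in both assemblies is not necessarily a true difference, which is why the last category only recommends experimental verification.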

Analysis of missense mutations in cancer genes in the chimpanzee genome
Mutations in human cancer genes can arise from different causes, including chromosomal translocations resulting in fusion proteins, disruption of the coding sequence by frameshifts or premature stop codons, and point mutations that modify the structure and function of the corresponding protein. As mentioned above, we did not find evidence in the chimpanzee genome of gene fusions, frameshifts or premature stop codons in the set of analyzed cancer genes. However, it is interesting to study the status in the chimpanzee genome of residues that are variable in human genes and are associated with cancer.
For this purpose, we extracted the information on variant alleles in human genes from the Human Gene Mutation Database (HGMD) and the Online Mendelian Inheritance in Man (OMIM) databases [31,32], and determined the corresponding residue in the chimpanzee protein. We found two genes associated with breast cancer, BRCA2 and ERBB2, in which the chimpanzee sequence differed from the human sequence at residues that have been reported to be polymorphic in humans. Both variants, BRCA2 (N372H) and ERBB2 (V655I), have been associated with different risks of developing breast cancer [33,34]. Interestingly, in both cases the chimpanzee sequence (N372 and I655) corresponds to the allele that has been associated with a reduced risk of cancer, and both alleles are found at high frequencies in humans, around 0.8 for N372 [34] and from 0.7 to 0.9 for I655.
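Determining "the corresponding residue in the chimpanzee protein" amounts to mapping a human residue number through a pairwise alignment. A minimal sketch is shown here, assuming the two proteins are already aligned with "-" for gaps and that positions are 1-based in the ungapped human sequence. The function name and the toy alignment are hypothetical and do not represent BRCA2 or ERBB2.

```python
def residue_at(aligned_ref: str, aligned_other: str, ref_position: int) -> str:
    """Residue (or '-') in `aligned_other` paired with the 1-based
    `ref_position` of the ungapped reference sequence."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    if ref_position < 1:
        raise ValueError("positions are 1-based")
    seen = 0
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if ref_char != "-":
            seen += 1  # count only real reference residues, not gaps
            if seen == ref_position:
                return other_char
    raise IndexError("position beyond the end of the reference sequence")


# Invented toy alignment: the "human" sequence has a one-residue gap.
human_aln = "MK-NLV"
chimp_aln = "MKANLI"
print(residue_at(human_aln, chimp_aln, 3))  # human residue 3 is N
print(residue_at(human_aln, chimp_aln, 5))  # human residue 5 is V
```

Counting only non-gap reference characters keeps the numbering tied to the human protein, which is how variants such as N372H are reported.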
We also identified at least two cases in genes associated with colon carcinomas, MLH1 (A441T) and MSH2 (S323C), and one with prostate carcinomas, PON1 (I102V), in which the susceptibility allele in human appears to be the wild-type allele in chimpanzee. However, due to the limited clinical and functional data on these variants, the biological significance of these findings is still unknown, and further studies will be required to determine the pathogenic activity of these alleles.

Analysis of the tumor suppressor gene p53 in the chimpanzee genome
A detailed analysis of the gene encoding the tumor suppressor p53, which is the most frequently mutated gene in human cancer [35], showed a single amino acid difference between human (Arg at position 72) and chimpanzee (Pro72). Interestingly, this p53 codon is polymorphic in humans and the allele encoding Pro72 is frequent in some populations [36]. Analysis of chimpanzee DNA from four different geographic regions revealed a Pro/Pro genotype in all cases, which confirmed our findings using the chimpanzee genomic sequence. However, we cannot rule out the possibility that this codon might be polymorphic in chimpanzee and that the Arg72 allele could be present in some individuals; excluding this would require the analysis of a greater number of samples. In this regard, sequencing of 14 chimpanzee samples revealed a Pro/Pro genotype in all of them, suggesting that Pro72 is the most common allele in chimpanzee, with a Poisson-based maximum estimate of 0.1 for the frequency of an alternative allele at the 95% confidence level.
Additionally, we have sequenced the same p53 region in different primates, including gorilla, orangutan and mandrill, and determined that codon 72 codes for Pro in all species (Fig. 2). The fact that apes and some Old World monkeys contain Pro at this position suggests that it is the ancestral allele and that the Arg72 allele is unique to the human lineage. Nevertheless, the ancestral Pro/Pro genotype is still present in humans, representing about 8.3% of the Central-European population and 45% of the African Yoruba population (see dbSNP rs1042522) [37]. This finding raises interesting questions about the different susceptibility to cancer of both organisms, as several studies have shown that both isoforms are functionally different in their ability to induce apoptosis or interact with p73 [38][39][40].
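The allele frequency bound of about 0.1 can be reproduced with a short calculation. The sketch assumes that the 14 diploid samples contribute 28 independently sampled chromosomes and that none of them carried Arg72; the exact calculation behind the reported figure is not given in the text, so both a binomial bound and its Poisson approximation are shown. The function name is hypothetical.

```python
import math


def unseen_allele_upper_bound(n_individuals: int,
                              confidence: float = 0.95,
                              ploidy: int = 2):
    """Upper bounds on the frequency of an allele observed zero times.

    Returns (binomial_bound, poisson_bound):
      binomial: largest f such that (1 - f) ** n >= 1 - confidence
      poisson:  largest f such that exp(-n * f)  >= 1 - confidence
    where n is the number of chromosomes sampled.
    """
    n_chromosomes = n_individuals * ploidy
    alpha = 1.0 - confidence
    binomial_bound = 1.0 - alpha ** (1.0 / n_chromosomes)
    poisson_bound = -math.log(alpha) / n_chromosomes
    return binomial_bound, poisson_bound


binomial_bound, poisson_bound = unseen_allele_upper_bound(14)
print(round(binomial_bound, 3), round(poisson_bound, 3))
```

Both bounds round to roughly 0.1 for 28 chromosomes, and both shrink approximately in proportion to 1/n, which is why a larger sample would be needed to exclude a rare Arg72 allele.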

Analysis of the BRCA1 locus in the chimpanzee genome
The tumor suppressor BRCA1 has an unusual evolutionary history, in that ratios of replacement to silent nucleotide substitutions (KA/KS) are greater than one when comparing human and chimpanzee lineages, but not in lineages of other primates or other mammals [41][42][43]. This observation is consistent with positive selection pressure on BRCA1 during recent hominoid evolution. We evaluated the complete human and chimpanzee genomic sequences of BRCA1 in light of this observation. In human and chimpanzee, the BRCA1 locus spans ~130 kb, including the BRCA1 gene itself and a partially duplicated 5' (telomeric) region. The duplication includes the highly conserved bidirectional promoter regulating BRCA1 and the adjacent gene NBR2, a BRCA1 partial pseudogene, and NBR1/M17S2. The number of nucleotide differences between human and chimpanzee sequences at the BRCA1 locus is not remarkable, 1.15%, and does not vary appreciably between the BRCA1 locus and the duplicated region. However, human and chimpanzee sequences differ by an 8 kb insertion/deletion within the partially duplicated region. The deletion in the chimpanzee sequence prematurely truncates the NBR2 gene. The consequences for BRCA1 expression in chimpanzee of the truncation of the adjacent, co-regulated gene are not known. In mouse, the region is not duplicated and the bidirectional promoter co-regulates Brca1 and Nbr1 [44].
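For readers unfamiliar with the KA/KS ratio, the sketch below shows its final arithmetic step under a simple counting approach: the proportions of replacement and silent differences are each corrected for multiple hits with the Jukes-Cantor formula and then divided. It assumes that the numbers of differences and of sites have already been counted; the cited studies may have used other estimation methods, and the counts in the example are invented, not BRCA1 data.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """KA/KS from pre-counted replacement and silent differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("KS is zero, so the ratio is undefined")
    return ka / ks


# Hypothetical counts chosen only to illustrate a ratio above one.
ratio = ka_ks(nonsyn_diffs=30, nonsyn_sites=4000, syn_diffs=5, syn_sites=1500)
print(ratio > 1.0)
```

A ratio above one means replacement changes accumulate faster per site than silent changes, which is the pattern interpreted in the text as positive selection.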

Discussion
The availability of the chimpanzee genome sequence provides an excellent opportunity to explore the genetic basis for some of the biological differences between human and our most closely related organism [23]. A striking finding in this regard is the observation that non-human primates show a lower incidence of cancer than humans, specifically for epithelial carcinomas [18,20]. To investigate whether this different susceptibility to cancer was due to genetic differences between humans and chimpanzees, we performed a comparative analysis in the chimpanzee genome of a set of 333 human genes that have been directly implicated in tumor development [3,7]. This analysis revealed that all human cancer genes have a clear ortholog in the chimpanzee genome, and show a percentage of identity at the protein level similar to the genome average (99.38%). Nevertheless, the strict conservation of this group of more than 330 genes contrasts with the recent analysis of other groups of genes, including a set of 560 proteases, which shows that at least 7 genes are absent in either the human or the chimpanzee genome [45], or the CD33-related Siglecs, which also show specific loss of some genes in the human lineage [46]. These data suggest that, although cancer genes show a percentage of identity similar to the genome average, none of them has been lost in either human or chimpanzee, confirming previous studies showing a higher conservation of genes implicated in essential cellular functions [29].
This study has shown that despite the high degree of identity in cancer genes, there are 1542 amino acid changes whose contribution to the different cancer susceptibility between human and chimpanzee should be further investigated. Nevertheless, additional factors contributing to the observed differences could include changes in diet, lifestyle or exposure to mutagenic agents [47][48][49]; physiological differences in immune system or in life expectancy and aging rates [50]; variations in gene expression, alternative splicing or in DNA methylation patterns [51,52]; or alterations in other genes not analyzed in this study. In this regard, we must emphasize that we have compiled a group of 333 genes that are causally implicated in cancer development as a result of mutational analysis or owing to their participation in processes such as control of mitotic checkpoints, chromosomal stability or promoter methylation. This set of genes includes those annotated in a recent census of human cancer genes [7], but also incorporates novel genes described to be mutated in human carcinomas [3,4,25-28]. However, the possibility that other, as yet unknown, cancer-related genes are responsible for the different susceptibility to this disease between human and chimpanzee cannot be ruled out.
Figure 2. Amino acid alignment of the p53 proline-rich domain from different primates
In relation to the existence of functional differences in the immune system of human and non-human primates, it is now clear that the immune and inflammatory responses play important roles in cancer development and progression [53][54][55][56]. Accordingly, structural and functional variations in genes associated with these processes might be responsible for the species-specific susceptibility to certain tumors. Life expectancy is another factor that might contribute to the observed differences in cancer incidence between human and chimpanzee. Although age is known to represent a major risk factor for cancer development, the observation that non-human primates have a lower incidence of epithelial carcinomas but not of other cancers [18] suggests that life expectancy might contribute only partially to the observed differences in cancer susceptibility. On the other hand, and despite the strong conservation in the coding regions of cancer genes from human and chimpanzee, it is possible that differences in regulatory elements outside these coding regions can result in changes in the expression levels or in the tissue distribution patterns [57]. In fact, recent reports have shown considerable differences in the expression levels of orthologous genes from different primate species [58][59][60], supporting the idea that regulatory changes might account for some interspecies differences, including cancer susceptibility [61].
Despite the high conservation of cancer genes between both species, we identified 20 genes containing several codon insertions or deletions in their protein coding regions, although the functional significance of these differences, including their putative association with cancer, will require further studies. It is interesting to note that in 70% of these cases, the insertion/deletion event occurred within trinucleotide repeats, suggesting that these regions could be prone to genetic instability, in both human and chimpanzee genomes. Analysis of EST databases allowed us to confirm this hypothesis, as six human genes, ASPSCR1, DEK, FANCC, MLLT3, MYST4 and ZNF384, were polymorphic in the number of repeats in these regions. In the case of genes implicated in cancer, it will also be interesting to study if some of the identified haplotypes could confer a higher risk of tumor development.
The fact that most cancer genes show a high degree of conservation between human and chimpanzee, prompted us to analyze in more detail the observed changes in genes previously reported to be of special relevance in human cancer, such as the tumor suppressors p53 and BRCA1. We found that chimpanzee p53 shows a single amino acid difference with human p53, resulting in a protein containing Pro instead of Arg at position 72. Interestingly, this codon is polymorphic in human, and the Pro72 residue is common in all studied human populations [36,62].
Sequencing of this region in other non-human primates allowed us to confirm that this Pro residue has been conserved during evolution, and suggests that the Arg72 allele, present in 55-92% of the human population, arose in the human lineage. The presence of Pro72 in chimpanzee might have physiological consequences, as several reports have found interesting functional differences between both p53 isoforms. In fact, p53-Arg72 protein has an increased ability to induce apoptosis, to translocate from the nucleus to the mitochondria, to be degraded by the E6 oncoprotein of human papillomavirus or to inactivate p73 when mutated [38][39][40]. Different studies have provided evidence for an increased risk of cancer development associated with the Arg72 genotype, although this topic has been controversial and further studies will be necessary to definitively address this question [39,40,63]. Therefore, although the presence of Pro72 in chimpanzee p53 might contribute to the reduced cancer susceptibility in non-human primates, further work will be required to confirm this hypothesis.
Analysis of the tumor suppressor BRCA1 has shown that it has been subjected to positive selection during recent hominoid evolution, as the overall KA/KS ratios are greater than one when comparing human and chimpanzee lineages, but not in lineages of other primates or mammals, as has been recently shown [43]. Additionally, the chimpanzee BRCA1 locus contains an 8 kb deletion in the partially duplicated 5' region. This deletion prematurely truncates the NBR2 gene, which is regulated by a bidirectional promoter that co-regulates NBR2 and BRCA1. The consequences for BRCA1 expression in chimpanzee of the truncation of the adjacent, co-regulated gene are not known. The distinctive evolution of human and chimpanzee BRCA1 suggests that an evolutionary approach may be important for understanding selection at this, and perhaps other, cancer-associated genes [43,64,65].

Conclusion
In summary, in this work we have performed an analysis of a defined set of 333 genes to try to elucidate the molecular basis of human-chimpanzee differences in one aspect of significant biomedical relevance: the interspecies variability in cancer susceptibility. The overall picture emerging from this comparative analysis is one that reflects the high degree of conservation in this group of cancer genes, although specific differences in relevant genes such as p53 and BRCA1 can illuminate new functional and evolutionary aspects of these tumor-suppressor genes. Altogether, the limited genetic variability found in this human-chimpanzee comparative analysis might contribute to the different cancer susceptibility between these closely related species. However, further investigations will be necessary to determine the influence of these genetic changes in cancer, or whether additional factors, such as changes in gene regulation, immune system genes, life expectancy or environmental influences, might also contribute to this process.

Methods
Bioinformatic screening of the chimpanzee genome
A database of human cancer genes was constructed by using information from the literature, and the corresponding cDNA and protein sequences were retrieved from the GenBank database. Each human cDNA sequence was compared against the ARACHNE and PCAP chimpanzee genome assemblies by using a combination of BLAT and BLAST algorithms [66,67]. The corresponding chimpanzee cDNA and protein sequences were extracted and compared to the human ortholog by using the EMBOSS sequence analysis package. The full list of cancer genes, accession numbers and comparisons to the human orthologs is available in Additional data file 1. In all cases, the primary chimpanzee cDNA sequence was obtained from the ARACHNE assembly, as this sequence is used by the Chimpanzee Genome Sequencing Consortium and by most public genome browsers (NCBI, Ensembl, or UCSC). However, in those cases where the chimpanzee gene was incomplete, the PCAP assembly was used to fill gaps in the genome, and the corresponding sequence was incorporated into the chimpanzee cDNA sequence for our analysis.
Conflictive regions were defined as those presenting frameshifts or premature stop codons in the chimpanzee coding sequence. Those regions were carefully examined in both the ARACHNE and PCAP assemblies, as well as in the corresponding region of the human genome sequence and in EST databases. In four cases (LMO2, MLLT7, MN1, and MYST4) the putative frameshift or premature stop codon in the chimpanzee sequence resulted from incorrect human sequence entries that themselves contained frameshifts. Correction of the human cDNA with the aid of EST and genomic sequences resulted in curated human cDNA and protein sequences, and in the absence of a frameshift or premature stop codon in the corresponding chimpanzee prediction. All other conflictive regions in the chimpanzee genome were resolved by PCR amplification and direct sequencing of the corresponding gene using chimpanzee genomic DNA.

PCR amplification and direct sequencing of chimpanzee genes
Chimpanzee, gorilla, orangutan and mandrill genomic DNA from different geographic regions was obtained using standard phenol-chloroform procedures. To analyze conflictive regions in the chimpanzee genome, we designed specific oligonucleotides flanking the exon of interest (Additional data file 3), and the corresponding region was PCR-amplified from chimpanzee total genomic DNA in a Perkin Elmer 9700 thermocycler using High Fidelity Taq DNA polymerase, or the GC-RICH PCR system for regions rich in GC content (Roche Diagnostics). The amplified product was purified and subjected to automatic sequencing on an ABI Prism 310 DNA sequencer (Applied Biosystems), using the 5' oligonucleotide as primer.