Strand bias in complementary single-nucleotide polymorphisms of transcribed human sequences: evidence for functional effects of synonymous polymorphisms
© Qu et al; licensee BioMed Central Ltd. 2006
Received: 09 March 2006
Accepted: 17 August 2006
Published: 17 August 2006
Complementary single-nucleotide polymorphisms (SNPs) may not be distributed equally between two DNA strands if the strands are functionally distinct, such as in transcribed genes. In introns, an excess of A↔G over the complementary C↔T substitutions had previously been found and attributed to transcription-coupled repair (TCR), demonstrating the valuable functional clues that can be obtained by studying such asymmetry. Here we studied asymmetry of human synonymous SNPs (sSNPs) in the fourfold degenerate (FFD) sites as compared to intronic SNPs (iSNPs).
The identities of the ancestral bases and the direction of mutations were inferred from human-chimpanzee genomic alignment. After correction for background nucleotide composition, an excess of A→G over the complementary T→C polymorphisms, which was observed previously and can be explained by TCR, was confirmed in FFD SNPs and iSNPs. However, when SNPs were examined separately according to whether or not they mapped to a CpG dinucleotide, an excess of C→T over G→A polymorphisms was found in non-CpG site FFD SNPs but was absent from iSNPs and CpG site FFD SNPs.
The genome-wide discrepancy of human FFD SNPs provides novel evidence for widespread selective pressure due to functional effects of sSNPs. The similar asymmetry pattern of FFD SNPs and iSNPs that map to a CpG can be explained by transcription-coupled mechanisms, including TCR and transcription-coupled mutation. Because CpG sites are hypermutable, a larger share of CpG site FFD SNPs are relatively young and have been exposed to less selection than non-CpG FFD SNPs, which can explain the difference in asymmetry between CpG site and non-CpG site FFD SNPs.
Single-nucleotide polymorphisms (SNPs) involve two complementary base substitutions, one on each DNA strand. Where the two DNA strands are functionally distinct (such as in transcribed sequences), the two complementary substitutions may not occur with equal frequency on each strand, owing to transcription-related mutation/repair mechanisms or to selective pressure from functional effects on mRNA. A↔G vs. C↔T asymmetry between the two DNA strands is well known in prokaryotes. In humans, there is an excess of C↔T over G↔A among mutations causing Mendelian disorders, whereas an excess of A→G substitutions in the sense strand of transcribed intronic sequences was found when a ~1.5 Mb region of human chromosome 7 was compared with its chimpanzee orthologue. Both reports attributed the bias to transcription-coupled repair (TCR), and further support for a transcription-coupled effect has come from the correlation of strand bias in the nucleotide composition of transcribed sequences with transcription levels.
However, the conflicting results observed within coding and intronic sequences have not been explored further. It is highly unlikely that TCR distinguishes between exons and introns. Furthermore, our current knowledge of TCR[6, 7] suggests that its action would affect the proportion of A→G vs. T→C mutations, but should not affect other mutations. An alternative explanation for the observed discrepancy between exons and introns is that synonymous exonic substitutions in mammals may be under non-trivial selective pressures, as has been suggested by some recent studies[8, 9]. An important effect of synonymous coding mutations is their association with gene splicing[10, 11]. In humans, evidence of selection on synonymous variations may have a profound effect on how we view the role of synonymous variations in genetic disease and phenotypic variability.
These studies leave room for further research: the analysis of disease-causing mutations required assumptions about the likelihood of coming to clinical attention based on chemical differences between substituted amino acids, while the work on intronic sequences was confined to a single ~1.5 Mb region, so the genome-wide applicability of its results remains to be proven. Neither study explored differences between introns and exons to distinguish mutation/repair effects from alterations in RNA function. To our knowledge, strand asymmetry in human SNPs has not been fully examined for possible clues about the mutational mechanisms that created them and/or their potential functional significance. We therefore undertook a systematic examination of human coding SNPs at fourfold degenerate (FFD) codon sites and of a random sample of intronic SNPs (iSNPs) for strand asymmetry between A↔G and C↔T polymorphisms.
The asymmetry pattern of A↔G and C↔T iSNPs and FFD SNPs
Non-CpG vs. CpG*
χ2 = 3.2, v = 1, p = 0.074
χ2 = 2.1, v = 1, p = 0.151
Non-CpG vs. CpG*
χ2 = 10.9, v = 1, p = 0.001
χ2 = 50.1, v = 1, p < 0.001
The nucleotide composition of human genome introns and FFD codons
The proportions of A→G and C→T FFD substitutions corrected by codon compositions
Asymmetry ratio (95% CI); test statistic
A→G vs. T→C: 1.56 (1.23, 1.97); χ2 = 13.6, v = 1, p < 0.001
C→T vs. G→A: 1.25 (1.05, 1.49); χ2 = 6.2, v = 1, p = 0.013
A→G vs. T→C: 1.63 (1.35, 1.97); χ2 = 26.0, v = 1, p < 0.001
C→T vs. G→A: 0.75 (0.67, 0.85); χ2 = 21.3, v = 1, p < 0.001
Decreased FFD SNPs at A|T dinucleotides
χ2 = 7.9, v = 1, p = 0.005
χ2 = 19.1, v = 1, p < 0.001
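The p-values reported above for one degree of freedom (v = 1) follow directly from the χ2 statistic. The short Python sketch below shows that relationship using only the standard library; the function name is illustrative and is not taken from the authors' analysis, and small differences from the printed values can arise because the χ2 statistics are rounded to one decimal place.

```python
import math


def chi2_p_value_1df(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 degree of freedom the statistic is the square of a standard
    normal variable, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    if chi2 < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Statistics taken from the tables above.
    for stat in (3.2, 10.9, 6.2, 7.9):
        print(f"chi2 = {stat}: p = {chi2_p_value_1df(stat):.3f}")
```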
Discussions and conclusion
It is of great interest that the C→T excess over the complementary G→A in non-CpG FFD SNPs is not seen in iSNPs or in FFD SNPs that are part of a CpG. As iSNPs and FFD SNPs should be subject to the same transcription-coupled mechanisms, including TCR and transcription-coupled mutation (TCM), the C→T excess of FFD SNPs must be driven by mechanisms other than mutational/repair factors. As an alternative, biologically significant effects of synonymous SNPs (sSNPs) on aspects of RNA function other than protein coding may exist and be subject to selective pressures. Unlike in lower organisms, it is still contentious in mammals whether selection for translational efficiency does[19, 20] or does not[21–24] play a major role in shaping codon usage (and therefore sSNP frequencies). There is little variation in iso-acceptor tRNA gene numbers, and population sizes are likely too small to reflect very weak selective pressures. On the other hand, translation may be affected by RNA secondary structure which, like splicing, mRNA stability, or other less well understood RNA functions, may be significantly altered by single-nucleotide changes. Such mechanisms have recently been suggested in a few studies[8, 9, 25]. If sSNPs do have such biological effects, there is evidence to suggest that changes in mRNA secondary structure are likely to play an important role in mediating them[25, 26]. Given the evidence of compromised mRNA stability in the presence of A|T dinucleotides at dicodon boundaries[16, 17], G→A polymorphisms at FFD sites may have deleterious effects that C→T polymorphisms do not, thus creating selective pressure that favors C→T if the next codon begins with a T. In this report we show that this is indeed the case.
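The dicodon-boundary argument can be made concrete with a small Python sketch. It assumes a coding sequence given as a string in frame (codon positions 0, 1, 2, ...) and simply asks whether a derived allele at a third codon position creates an A|T dinucleotide with the first base of the next codon. The function name and the example sequences are hypothetical and serve only to illustrate the reasoning.

```python
def creates_at_boundary(cds: str, ffd_index: int, derived: str) -> bool:
    """Return True if the derived base at a third codon position forms an
    A|T dinucleotide with the first base of the following codon.

    cds        coding-strand sequence, in frame, starting at codon position 1
    ffd_index  0-based index of the polymorphic third-position site
    derived    derived (mutant) base on the coding strand
    """
    if ffd_index % 3 != 2:
        raise ValueError("site is not a third codon position")
    if ffd_index + 1 >= len(cds):
        return False  # no following codon, so no dicodon boundary
    return derived.upper() == "A" and cds[ffd_index + 1].upper() == "T"


if __name__ == "__main__":
    # GCG|TTT: a G->A change at the FFD site of GCG gives GCA|TTT (A|T formed).
    print(creates_at_boundary("GCGTTT", 2, "A"))  # True
    # GCC|TTT: a C->T change gives GCT|TTT (T|T, no A|T formed).
    print(creates_at_boundary("GCCTTT", 2, "T"))  # False
```

With a T at the start of the next codon, the G→A change creates the A|T dinucleotide while the complementary-type C→T change does not, which is the asymmetry the paragraph above relies on.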
The different asymmetry patterns at non-CpG and CpG sites can be attributed to the hypermutability of the latter. The effects of selection on the observed mutation patterns are most pronounced at relatively slowly mutating, non-CpG sites. Because CpG sites are hypermutable, a larger share of CpG site FFD SNPs are younger and have been exposed to less selective pressure than non-CpG FFD SNPs. Through the same selection on A|T dinucleotides, A→G polymorphisms may also be under more selective pressure than T→C polymorphisms, which can also explain why the A→G excess is not significantly different in FFD non-CpG and intronic CpG sites.
In conclusion, we confirm genome-wide the excess of A→G over T→C mutations previously reported in a small region of chr. 7, a finding that points to TCR as an important factor in human mutagenesis. More importantly, our analysis of FFD SNPs clearly suggests a mechanism that operates differentially in intronic vs. exonic sequences. We propose that selective pressure related to changes in mRNA stability is the most likely explanation. In view of the balance between selective and mutational pressures, we provide a satisfactory explanation for the previous contradictory findings on mutation rates in humans[3, 4, 28]. Our finding further highlights the importance of not overlooking potential functions of sSNPs, which may not be as selectively neutral as is generally thought, an important consideration given the wealth of complex-disease association data expected from the new genotyping technologies.
SNP information collection
Because some SNPs recorded in the NCBI dbSNP database may be unreliable and may result from DNA sequencing errors, we performed the investigation using the Perlegen dataset of DNA variation genotyping[30, 31]. The SNPs were all identically ascertained by microarray resequencing of the genome, and verified in multiple populations. Only single-nucleotide polymorphisms with two alleles were included. SNPs on sex chromosomes were not included in this study. Reference sequences of the SNPs on the 22 pairs of human autosomes were bulk-downloaded from the NCBI dbSNP database build 124.
The orientation of SNP reference sequence
The dbSNP reference sequences of iSNPs cannot be aligned with mRNA sequence directly. Some FFD SNP reference sequences include intronic sequence, and some genes have different mRNA transcripts owing to alternative splicing. Therefore, instead of aligning SNP sequences with mRNA sequences, we wrote Java scripts to determine the orientation of a dbSNP reference sequence relative to the DNA coding strand. The corresponding NCBI genome DNA contig sequence was first downloaded from the NCBI reference sequences. Then, the SNP reference sequence was aligned with the contig sequence around the SNP contig position, and its orientation in the contig sequence was determined. The orientation of the mRNA sequence in the same contig sequence was acquired from the dbSNP annotation. From these two orientations, the orientation of the SNP reference sequence relative to the mRNA sequence was known, and the corresponding nucleotide polymorphism on the DNA coding strand was determined accordingly.
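The orientation logic described above can be sketched in a few lines of Python. This is not the authors' Java code: exact substring matching stands in for the alignment step, strands are encoded as +1/-1 relative to the contig, and all function names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def snp_strand_in_contig(flank: str, contig_window: str) -> int:
    """Return +1 if the SNP reference flank matches the contig as given,
    -1 if only its reverse complement matches.

    Exact matching is a simplification of the alignment step; a real
    reference sequence would need an alignment tolerant of mismatches.
    """
    window = contig_window.upper()
    forward = flank.upper() in window
    reverse = reverse_complement(flank).upper() in window
    if forward == reverse:
        raise ValueError("flank matches neither strand or both strands")
    return 1 if forward else -1


def alleles_on_coding_strand(alleles, snp_strand: int, mrna_strand: int):
    """Express the SNP alleles on the coding strand of the gene.

    If the SNP reference and the mRNA have the same orientation in the
    contig, the alleles are kept; otherwise each is complemented.
    """
    if snp_strand == mrna_strand:
        return tuple(a.upper() for a in alleles)
    return tuple(reverse_complement(a).upper() for a in alleles)


if __name__ == "__main__":
    contig = "TTGACCGTAAGC"               # hypothetical contig window
    strand = snp_strand_in_contig("TTACGG", contig)
    print(strand, alleles_on_coding_strand(("A", "G"), strand, 1))
```

In the usage example the flank only matches as a reverse complement, so an A/G polymorphism in the reference sequence is reported as T/C on the coding strand of a gene annotated on the contig's forward strand.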
Correction for nucleotide or codon compositions
In order to determine the relative rates of each substitution, the observed counts were corrected for the background frequencies of nucleotides or codons. Both the intronic and FFD nucleotide compositions were acquired from the 14,029 genes annotated by the CCDS database[34, 35]. For background intronic nucleotide compositions, the first introns as well as the first and last 200 bp of each intron were excluded. As an example of correction, for A→G polymorphism, the observed number (NA→G) corrected by the frequency of adenine (PA) was calculated as:
The corrected proportions of each type of polymorphism within the A↔G-C↔T pair were calculated in the same way. For the computation of the asymmetry ratio of complementary polymorphisms, such as A→G vs. T→C,
The 95% CI was computed by logistic regression analysis.
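The correction and the asymmetry ratio can be illustrated with the Python sketch below. Two assumptions are made here and are not confirmed by the text: the correction is taken to be division of the observed count by the background frequency of the ancestral base, and the interval is a Wald interval on the log scale that treats the background frequencies as fixed, which stands in for the logistic regression used in the study. The counts and frequencies in the usage example are invented for illustration.

```python
import math


def corrected_count(n_observed: float, background_freq: float) -> float:
    """Observed count divided by the background frequency of the ancestral
    base (assumed form of the correction)."""
    if not 0.0 < background_freq <= 1.0:
        raise ValueError("background frequency must be in (0, 1]")
    return n_observed / background_freq


def asymmetry_ratio_with_ci(n_a, p_a, n_b, p_b, z: float = 1.96):
    """Asymmetry ratio of two complementary polymorphisms, e.g. A->G vs.
    T->C, with an approximate 95% interval.

    The interval is a Wald interval on log(ratio) with standard error
    sqrt(1/n_a + 1/n_b); it is a stand-in for the logistic regression
    mentioned in the text, not a reproduction of it.
    """
    ratio = corrected_count(n_a, p_a) / corrected_count(n_b, p_b)
    se = math.sqrt(1.0 / n_a + 1.0 / n_b)
    lower = ratio * math.exp(-z * se)
    upper = ratio * math.exp(z * se)
    return ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts and background frequencies, for illustration only.
    ratio, lower, upper = asymmetry_ratio_with_ci(150, 0.25, 120, 0.30)
    print(f"asymmetry ratio = {ratio:.2f} ({lower:.2f}, {upper:.2f})")
```

A ratio above 1 with an interval that excludes 1 would indicate an excess of the first polymorphism over its complement after correcting for base composition.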
We thank Luc Marchand for technical support and the three anonymous reviewers for their valuable comments. Funded by Genome Canada, Genome Quebec, the Juvenile Diabetes Research Foundation International, and Canadian Institutes of Health Research. HQQ was supported by a postdoctoral fellowship from the Montreal Children's Hospital Foundation. JM is a recipient of a Canada Research Chair.
- Frank AC, Lobry JR: Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene. 1999, 238 (1): 65-77. 10.1016/S0378-1119(99)00297-8.
- Lobry JR: Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol. 1996, 13 (5): 660-665.
- Krawczak M, Ball EV, Cooper DN: Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am J Hum Genet. 1998, 63 (2): 474-488. 10.1086/301965.
- Green P, Ewing B, Miller W, Thomas PJ, Green ED: Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 2003, 33 (4): 514-517. 10.1038/ng1103.
- Majewski J: Dependence of mutational asymmetry on gene-expression levels in the human genome. Am J Hum Genet. 2003, 73 (3): 688-692. 10.1086/378134.
- Jiricny J, Su SS, Wood SG, Modrich P: Mismatch-containing oligonucleotide duplexes bound by the E. coli mutS-encoded protein. Nucleic Acids Res. 1988, 16 (16): 7843-7853.
- Lamers MH, Perrakis A, Enzlin JH, Winterwerp HH, de Wind N, Sixma TK: The crystal structure of DNA mismatch repair protein MutS binding to a G x T mismatch. Nature. 2000, 407 (6805): 711-717. 10.1038/35037523.
- Chamary JV, Hurst LD: Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: evidence for selectively driven codon usage. Mol Biol Evol. 2004, 21 (6): 1014-1023. 10.1093/molbev/msh087.
- Willie E, Majewski J: Evidence for codon bias selection at the pre-mRNA level in eukaryotes. Trends Genet. 2004, 20 (11): 534-538. 10.1016/j.tig.2004.08.014.
- Cartegni L, Chew SL, Krainer AR: Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. 2002, 3 (4): 285-298. 10.1038/nrg775.
- Pagani F, Baralle FE: Genomic variants in exons and introns: identifying the splicing spoilers. Nat Rev Genet. 2004, 5 (5): 389-396. 10.1038/nrg1327.
- Majewski J, Ott J: Distribution and characterization of regulatory elements in the human genome. Genome Res. 2002, 12 (12): 1827-1836. 10.1101/gr.606402.
- Keightley PD, Gaffney DJ: Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents. Proc Natl Acad Sci U S A. 2003, 100 (23): 13402-13406. 10.1073/pnas.2233252100.
- Bird AP: CpG-rich islands and the function of DNA methylation. Nature. 1986, 321 (6067): 209-213. 10.1038/321209a0.
- Strachan T, Read AP: Human molecular genetics. 1999, Oxford, BIOS Scientific, 2nd Ed.
- Carroll SS, Chen E, Viscount T, Geib J, Sardana MK, Gehman J, Kuo LC: Cleavage of Oligoribonucleotides by the 2′,5′-Oligoadenylate-dependent Ribonuclease L. J Biol Chem. 1996, 271 (9): 4988-4992. 10.1074/jbc.271.9.4988.
- Carlini DB: Context-dependent codon bias and messenger RNA longevity in the yeast transcriptome. Mol Biol Evol. 2005, 22 (6): 1403-1411. 10.1093/molbev/msi135.
- Beletskii A, Bhagwat AS: Transcription-induced mutations: increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc Natl Acad Sci U S A. 1996, 93 (24): 13919-13924. 10.1073/pnas.93.24.13919.
- Comeron JM: Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence. Genetics. 2004, 167 (3): 1293-1304. 10.1534/genetics.104.026351.
- Lavner Y, Kotlar D: Codon bias as a factor in regulating expression via translation rate in the human genome. Gene. 2005, 345 (1): 127-138. 10.1016/j.gene.2004.11.035.
- Eyre-Walker AC: An analysis of codon usage in mammals: selection or mutation bias?. J Mol Evol. 1991, 33 (5): 442-449. 10.1007/BF02103136.
- Kanaya S, Yamada Y, Kinouchi M, Kudo Y, Ikemura T: Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol. 2001, 53 (4-5): 290-298. 10.1007/s002390010219.
- Duret L: Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. 2002, 12 (6): 640-649. 10.1016/S0959-437X(02)00353-2.
- dos Reis M, Savva R, Wernisch L: Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004, 32 (17): 5036-5044. 10.1093/nar/gkh834.
- Duan J, Antezana MA: Mammalian mutation pressure, synonymous codon choice, and mRNA degradation. J Mol Evol. 2003, 57 (6): 694-701. 10.1007/s00239-003-2519-1.
- Chamary JV, Hurst LD: Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 2005, 6 (9): R75. 10.1186/gb-2005-6-9-r75.
- Subramanian S, Kumar S: Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes. Genome Res. 2003, 13 (5): 838-844. 10.1101/gr.1152803.
- Hanawalt PC: Transcription-coupled repair and human disease. Science. 1994, 266 (5193): 1957-1958.
- Kimura M: The neutral theory of molecular evolution. 1983, Cambridge; New York, Cambridge University Press.
- Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR: Whole-Genome Patterns of Common DNA Variation in Three Human Populations. Science. 2005, 307 (5712): 1072-1079. 10.1126/science.1105436.
- Perlegen Sciences, Inc. [http://www.perlegen.com/].
- NCBI dbSNP database [http://www.ncbi.nlm.nih.gov/SNP/].
- NCBI reference sequences [ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/].
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ: The UCSC Genome Browser Database. Nucleic Acids Res. 2003, 31 (1): 51-54. 10.1093/nar/gkg129.
- UCSC Genome Bioinformatics [http://genome.ucsc.edu/].
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.