Gene conversion in the rice genome
- Shuqing Xu†1, 7, 8, 9,
- Terry Clark†1, 2,
- Hongkun Zheng†1, 3,
- Søren Vang†4,
- Ruiqiang Li1, 3,
- Gane Ka-Shu Wong1, 5,
- Jun Wang1, 3, 6 and
- Xiaoguang Zheng1
© Xu et al; licensee BioMed Central Ltd. 2008
Received: 16 September 2007
Accepted: 25 February 2008
Published: 25 February 2008
Gene conversion causes a non-reciprocal transfer of genetic information between similar sequences. Gene conversion can both homogenize genes and recruit point mutations thereby shaping the evolution of multigene families. In the rice genome, the large number of duplicated genes increases opportunities for gene conversion.
To characterize gene conversion in rice, we defined 626 multigene families in which 377 gene conversions were detected using the GENECONV program. Over 60% of the conversions we detected were between chromosomes. Inter-chromosomal conversions between chromosomes 1 and 5, 2 and 6, and 3 and 5 were more frequent than the genome average (Z-test, P < 0.05). The frequency of gene conversion on the same chromosome decreased with the physical distance between gene conversion partners. Ka/Ks analysis indicates that gene conversion is not tightly linked to natural selection in the rice genome. To assess the contribution of segmental duplication to gene conversion statistics, we determined the locations of conversion partners with respect to inter-chromosomal segmental duplications. Fewer than ten percent of conversions were associated with segmental duplication. Pseudogenes in the rice genome with low similarity to Arabidopsis genes showed a greater likelihood of gene conversion than those with high similarity to Arabidopsis genes. Functional annotations suggest that at least 14 multigene families related to disease or bacteria resistance were involved in conversion events.
The evolution of gene families in the rice genome may have been accelerated by conversion with pseudogenes. Our analysis suggests a possible role for gene conversion in the evolution of pathogen-response genes.
Gene conversion involves a non-reciprocal transfer of information between two homologous genes where one segment replaces nucleotides in its corresponding homolog. Gene conversion is generally considered a homogenizing force on the genome, although it has two distinct consequences: homogenization and diversification. In homogenization, gene conversion causes concerted evolution in gene families through the transfer of sequence between paralogs. However, diversification can occur, for example, when a pseudogene or otherwise unexpressed gene segment is transferred into another, functioning gene. Alteration of gene function through diversification can have advantageous consequences, such as in immune system diversification involving the major histocompatibility complex genes [2–4].
In eukaryotes, gene conversion has been classified into two types based on the conversion targets; one involves allele conversion and the other involves repeated genes. Conversion between alleles occurs at the same loci on sister-chromatids or between homologous chromosomes. Conversion events between repeated genes can occur at different loci on the same chromatid, sister-chromatids, homologous chromosomes, or non-homologous chromosomes . These events leave signatures in genome sequences that are detectable through specialized statistical analysis. In this study, we use such statistical methods with genome sequence data and annotations to study the gene conversion history of multigene families in the rice genome.
Genome-wide searches for gene conversion events between paralogs have been performed in yeast, Caenorhabditis elegans, and mouse and rat. Significant evidence for gene conversion events has been detected on human chromosomes 21 and Y [9–12]. Analyses of selected regions of the Arabidopsis thaliana genome suggest that the divergence during the process of gene evolution is affected by gene conversion [13, 14]. However, prior to our work, no genome-wide conversion analysis had been reported for plants. As a result, little is known about how gene conversion influences the evolution of multigene families in plant genomes.
The rice genome has evidence of ancient whole genome duplication, as well as recent chromosomal and segmental duplication [15, 16]. Because of the increase in paralogs through duplication, the rice genome may have undergone potentially many gene-conversion and unequal-crossover events in its evolution. Studies of these events can enhance our understanding of evolutionary processes behind multigene families in the rice genome. Toward this end, we have mined the rice genome database  for gene conversion traces.
Number and length of gene conversions detected in the rice genome
Gene conversion with pseudogenes may accelerate gene family evolution; in this model, pseudogenes are postulated as a source of genetic information. The introduction of genetic material from pseudogenes may lead to higher divergence in orthologs between rice and related species. To test this hypothesis, we established rice gene families as having either low similarity (LS) or high similarity (HS) to Arabidopsis thaliana. HS gene families are defined as having statistically significant sequence similarity to Arabidopsis genes. Conversely, LS gene families have low similarity to Arabidopsis. It follows that LS gene families are more likely to be rice specific.
Genes with low similarity to Arabidopsis were found to have more gene-conversion events than genes with high similarity. * Fisher's Exact Test P value < 0.01.
To rule out effects from assembly artifacts on our study, we performed a similar analysis on the japonica rice genome published by the International Rice Genome Sequencing Project (IRGSP). The gene conversion characteristics were indistinguishable between the two assemblies (data not shown).
Distribution of conversion events on the chromosomes
The number of genes involved in conversions on each chromosome compared to the estimated number based on chromosome length. The estimated number for a chromosome is the total number of conversions in the genome divided by the genome sequence length and multiplied by the length of that chromosome. Columns: Genes with conversions; Estimated genes with conversions.
From a total of 3844 gene pairs in the 626 multigene families, 2903 pairs were located on different chromosomes, with 941 pairs co-located on the same chromosome. It follows that the proportion of inter-chromosomal pairs involved in conversions was ~8% (229/2903), and intra-chromosomal pairs involved in conversions was ~16% (148/941). Although the number of inter-chromosomal conversion events was higher than intra-chromosomal conversion, the inter-chromosomal fraction was lower with respect to the potential for conversion based on total gene pairs. Thus, candidate pairs on the same chromosome apparently result in a higher likelihood for gene conversion. This does not necessarily represent an intrinsic bias for intra-chromosomal gene conversion; alternatively, it may represent an opportunistic positioning for gene conversion to occur.
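The proportions quoted above follow directly from the pair counts. A minimal sketch of the arithmetic in Python is given here; the variable names are illustrative and are not taken from the original analysis.

```python
# Pair counts reported in the text.
inter_pairs, intra_pairs = 2903, 941   # candidate gene pairs by location
inter_conv, intra_conv = 229, 148      # pairs with a detected conversion

# Fraction of candidate pairs in each class that show a conversion.
inter_rate = inter_conv / inter_pairs
intra_rate = intra_conv / intra_pairs

print(f"inter-chromosomal: {inter_rate:.1%}")
print(f"intra-chromosomal: {intra_rate:.1%}")
print(f"intra/inter ratio: {intra_rate / inter_rate:.1f}")
```

On these counts, a candidate pair on the same chromosome is roughly twice as likely to show a conversion as a pair on different chromosomes, even though inter-chromosomal conversions are more numerous in absolute terms.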
To determine the contribution of segmental duplication to the observed conversion events, we compared the conversion data with duplicated segments in the Beijing indica assembly. Of the 229 inter-chromosomal conversion events, only 21 (~9%) involved segmental duplications.
Conversion bias based on sequence similarity and orientation
The orientations of gene pairs were also examined. Among the 3844 gene pairs, 2172 had the same direction and 1672 had reverse directions. Of the 377 conversion events, 227 (60%) occurred between gene pairs with the same direction (see details in Additional File 1). The proportions of conversions among same-direction gene pairs (227/2172 = ~10%) and reverse-direction gene pairs (150/1672 = ~9%) were similar. The larger number of gene conversion events in the same direction coincides with the larger number of gene pairs in the same direction; however, from our data we cannot determine whether the conversion bias is an intrinsic preference.
Evolutionary selection correlated to gene function
To determine if gene pairs involved in conversions are subject to evolutionary selection pressure, synonymous substitution rates (Ks) and non-synonymous substitution rates (Ka) were used. The Ka/Ks ratio can reflect the selection pressure between gene pairs caused by evolutionary processes. We calculated and compared the Ka/Ks ratios for two groups: (1) the closest homologs in each multigene family where at least one homolog was involved in gene conversion and (2) all close homologs in each multigene family (see Methods). The Ka/Ks profiles for the two groups were indistinguishable (data not shown).
We assessed the function of genes involved in conversions using the protein nr database at NCBI. Although the function of many genes is presently unknown, we identified approximately 14 gene families involved in conversions that are related to disease or bacteria resistance. These include genes coding for phospholipase D, cytochrome P450, receptor-like kinase and receptor kinase-like proteins. Some conversions were also found in related Arabidopsis gene families [13, 20] (see details in Additional File 1). The highest conversion frequency was found in the phospholipase D (AK070203) gene family. Phospholipase D has been identified as an enzyme generating secondary messengers in plants, triggering defense against bacterial attacks.
The proliferation of duplications during the evolution of the rice genome may have increased the potential for gene conversion and crossover events within multigene families through an increase in donor sequences. In our analysis of rice, the likelihood for gene conversion was found to be greater between pairs on the same chromosome than pairs on different chromosomes, even though more pairs were found for the latter case. The large number of duplicated repeats between chromosomes provides numerous opportunities for inter-chromosomal gene conversion. That only ~9% of gene conversion occurred between pairs involved in inter-chromosome segmental duplications indicates that the observed conversions were primarily from other sources.
Our analysis considers fragments with uncharacteristically high similarity as candidates for gene conversion. High-similarity between fragments may also be caused by strong stabilizing selection. However, the fragments identified in gene conversion events are situated in spans of sequence flanked by sequence with low similarity. The low-similarity context suggests gene conversion and unequal crossing over as possible explanations for the high-similarity inner fragments.
We mapped all identified conversions onto the rice genome sequence (Figure 2). The most frequent conversions were between chromosomes 1 and 5, 2 and 6, and 3 and 5. Our data show a decrease in intra-chromosomal gene conversion frequency as the distance between genes increases. This distance dependence agrees with previous genome-wide studies of yeast [6, 7]. In C. elegans, a high proportion of conversions was observed between tandemly duplicated members of gene families. The higher conversion frequency between genes with short separation on the same chromosome may be a consequence of a relationship between conversion and recombination. In Arabidopsis, the upper limit of pairwise distance between genes involved in conversion was found to be 40 kb. This is similar to the value we found in the rice genome (Figure 4).
Conversions involving pseudogenes could accelerate gene family evolution, and may accelerate divergence of some gene families relative to their orthologs. In this study, we found that pseudogenes are more prone to participate in gene conversion in the LS gene families than in HS gene families in the rice genome. Thus, conversion with pseudogenes in LS gene families may contribute to the acceleration of LS gene evolution. The rationale behind this is linked to the susceptibility of pseudogenes to accumulate mutations more rapidly than expressed genes, which may then be transferred to conversion partners. This may occur where pseudogene fragments are recruited into functional genes. By definition, LS genes have low similarity with Arabidopsis, which means LS gene families are potentially rice-specific. Based on this, we can postulate that inter-conversion with pseudogenes may be a source of rice-specific genes. A similar mechanism is suspected for some human speciation events . Moreover, our findings support the view that pseudogenes contain potential material for new genes .
The question remains whether genes involved in conversions are subject to selective pressure that differs from that on non-converted genes. A mechanism has been suggested that favors selection of some gene clusters in the tomato plant exhibiting traces of gene conversion and conferring disease resistance. But does this occur in rice, and is it genome wide? We calculated Ka and Ks values and their ratios for groups of genes with and without conversions; no differences were observed between the two groups. These similar Ka/Ks ratios may indicate that the genes involved in conversions were not subject to significant selective pressure, and imply that genome-wide gene conversions were not tightly linked to selection pressure in the rice genome. As suggested by Mondragon-Palomino and Gaut, these indistinguishable Ka/Ks ratios may be a result of both methodological and biological influences. Inter-conversions through recombination and gene conversion may also influence the accuracy of Ka/Ks analysis. Although the co-occurrence of gene conversion and positive selection has been found in some studies, there is evidence that gene conversion is independent of positive selection [13, 26].
The diversification within gene families could be caused by conversion with variant paralogs . This mechanism has been widely observed in mammalian immune systems [2, 28, 29]. In tomato and Arabidopsis, gene conversion has been detected in genes related to disease and bacteria resistance [13, 24]. Our genome-wide analysis of rice identified at least fourteen such genes potentially influenced by gene conversion events. Some of these genes have counterparts in Arabidopsis , while others were specific to rice. Our results contribute to the view that intergenic gene conversion can create variety within a gene family, providing a mechanism for the adaptive evolution, such as disease resistance . The diversified paralogs conferring disease resistance would be advantageous for adaptive reorganization and response to various diseases or bacteria.
We have detected 377 gene conversion events in the rice genome. The overall characteristics of gene conversion in the rice genome suggest influences by extensive duplication events throughout the evolution of rice. Our data further suggest that conversion with pseudogenes may have accelerated the evolution of multigene families. In particular, the adaptive evolution of disease resistance in rice may have been significantly influenced by gene conversion.
The initial 28,469 full-length rice cDNA sequences were downloaded from RIKEN and FAIS centers . We aligned these cDNAs to the Oryza sativa indica genome sequence using BLAT . We removed the following sequences: redundant genes, namely those that are the smaller cDNA of two alignments with overlap of at least 100 bp; unlikely protein coding genes with open reading frames less than 100 amino acids; and those sequences with more than 10% transposon-like content identified by using RepeatMasker with RepBase . Eventually, 13,089 reliable protein-coding genes were obtained  and used as consensus sequences.
Paralogs in the rice genome (Indica 9311) were identified based on similarity with the 13,089 reference protein sequences using BLAST . The pipeline used to identify paralogs is similar to the FGF approach . The reference protein sequence was defined as the consensus sequence for each family. The members of gene families were defined by alignments with at least 80% similarity over 70% of the corresponding consensus sequence. If the subject sequence exhibited significant similarity to more than one consensus protein, we considered only the highest-scoring match. The family members were aligned together with the consensus protein sequence using GeneWise .
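The membership rule described above (at least 80% similarity over at least 70% of the consensus sequence, keeping only the highest-scoring consensus per subject) can be sketched as follows. This is a minimal illustration, not the FGF pipeline itself: the `Hit` record, its field names, and the example identifiers are hypothetical, and the similarity and score values are assumed to come from a prior BLAST-style search.

```python
from dataclasses import dataclass

MIN_SIMILARITY = 0.80   # similarity cutoff stated in the text
MIN_COVERAGE = 0.70     # fraction of the consensus that must be aligned

@dataclass
class Hit:
    subject: str          # candidate paralog (hypothetical identifier)
    consensus: str        # reference protein it aligned to
    similarity: float     # fraction of similar residues in the alignment
    aligned_len: int      # consensus residues covered by the alignment
    consensus_len: int    # full length of the consensus protein
    score: float          # alignment score, used to pick the best consensus

def assign_families(hits):
    """Map each subject to the single best consensus that passes both cutoffs."""
    best = {}
    for h in hits:
        coverage = h.aligned_len / h.consensus_len
        if h.similarity < MIN_SIMILARITY or coverage < MIN_COVERAGE:
            continue
        if h.subject not in best or h.score > best[h.subject].score:
            best[h.subject] = h
    families = {}
    for h in best.values():
        families.setdefault(h.consensus, []).append(h.subject)
    return families

hits = [
    Hit("g1", "famA", 0.92, 300, 400, 510.0),
    Hit("g1", "famB", 0.85, 290, 380, 450.0),   # passes, but scores lower than famA
    Hit("g2", "famA", 0.81, 250, 400, 300.0),   # coverage 0.625, rejected
    Hit("g3", "famB", 0.88, 300, 380, 470.0),
]
print(assign_families(hits))
```

In this toy input, `g1` matches two consensus proteins and is assigned only to the higher-scoring one, while `g2` is dropped because its alignment covers too little of the consensus.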
Families with more than two genes were defined as multigene families. In the 626 multigene families, we identified 3844 gene pairs; a gene pair is defined as one gene and its closest homolog in the same family. We defined high-similarity (HS) genes as those with a homolog in Arabidopsis and low-similarity (LS) genes as those without. LS genes and HS genes were identified based on tblastn searches using an E-value threshold of 10^-7, as previously reported, involving at least 50% of a given Arabidopsis protein or 100 amino acids. The cDNAs were also aligned to protein sequences using GeneWise. Those cDNAs containing multiple stop codons or frame-shift mutations were considered to be pseudogenes.
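The pseudogene criterion can be illustrated with a small sketch. It makes several assumptions that are not from the original pipeline: the input is a coding sequence already in frame, "multiple" is read as more than one premature stop codon, and the frame-shift count is supplied by the caller (for example, from a GeneWise alignment) rather than computed here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def count_premature_stops(cds):
    """Count in-frame stop codons, ignoring the final (terminating) codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return sum(codon in STOP_CODONS for codon in codons[:-1])

def is_putative_pseudogene(cds, frameshifts=0):
    """Flag a sequence with multiple premature stops or any frame-shift.

    `frameshifts` is a hypothetical input: the number of frame-shifts
    reported by an external protein-to-cDNA alignment.
    """
    return count_premature_stops(cds) > 1 or frameshifts > 0

print(is_putative_pseudogene("ATGTAAGGGTAGCCCTGA"))            # two internal stops
print(is_putative_pseudogene("ATGGGGCCCTGA"))                  # intact reading frame
print(is_putative_pseudogene("ATGGGGCCCTGA", frameshifts=1))   # frame-shift reported
```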
Detection of Gene Conversion Events
The aligned sequences within the multigene families were used to detect conversion events using the GENECONV program version 1.81 . This program detects pairs of sequences that share unusually long stretches of similarity in regions of overall lower similarity . The methods used by GENECONV make it difficult to detect conversion events as candidate region lengths approach zero, i.e., for very short sequences. For example, if a conversion event contains only 3 bp of information, we cannot confidently assert the criterion of unusually long stretches of similarity, a signature of gene conversion. Both global and pairwise P-values were calculated based on 1000 permutations of original data and a BLAST-like searching algorithm.
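The idea of a permutation-based P-value for an unusually long stretch of similarity can be shown with a toy example. This sketch is not GENECONV's algorithm, whose statistic and permutation scheme are more involved; it assumes a pairwise comparison reduced to a list of booleans, one per polymorphic site, where `True` means the two sequences agree at that site. The function names and the add-one correction are illustrative choices.

```python
import random

def longest_match_run(sites):
    """Length of the longest run of True (matching) sites."""
    best = run = 0
    for match in sites:
        run = run + 1 if match else 0
        best = max(best, run)
    return best

def permutation_p_value(sites, n_perm=1000, seed=0):
    """P-value for the observed longest run under random site order."""
    rng = random.Random(seed)
    observed = longest_match_run(sites)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if longest_match_run(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

clustered = [True] * 20 + [False] * 20    # one long matching stretch
alternating = [True, False] * 20          # matches spread evenly

print(permutation_p_value(clustered))
print(permutation_p_value(alternating))
```

A clustered stretch of matches is rarely reproduced by shuffling the sites and so receives a small P-value, whereas evenly scattered matches are not unusual under permutation. The example also shows why very short fragments are hard to call: a run of only a few sites is matched by chance in most permutations.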
We used only the global P-values for fragments, which are corrected for multiple comparisons across all possible sequence pairs. The g2 option provided by GENECONV was used to allow some accumulation of mutations in conversion regions following a candidate conversion event. With the g2 option, inner fragments with global P-values less than 0.05 were considered statistically significant conversion events; 215 of the detected pairs had P-values less than 0.01, and 162 had P-values higher than 0.01 but less than 0.05. The inner fragment indicates a possible gene conversion event between ancestors of the two sequences in the alignment where the outer fragments have diverged.
The lengths of the converted regions were also determined using the multiple-sequence alignment used by GENECONV. We removed the gaps involved in the conversion regions of both paralogs to compensate for multiple-sequence alignment effects thereby improving the length estimate for the conversion regions. The lengths were also checked manually against pairwise alignments using randomly selected pairs without a significant difference with our multiple-sequence alignment analysis.
We compared all paralogs in each gene family. If the paralogs of a family shared more than 95% similarity, we only considered one of them as a conversion partner with other high-scoring alignments designated as members of the gene family. The goal of this step was to eliminate conversion copies resulting from subfamilies with recent duplications.
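The redundancy filter can be sketched as a greedy selection of representatives. The text does not state which member of a near-identical group was retained, so keeping the first one encountered is an assumption here, as are the function name and the `similarity` callback.

```python
def collapse_recent_duplicates(genes, similarity, threshold=0.95):
    """Greedily keep one representative per group of near-identical paralogs."""
    representatives = []
    for gene in genes:
        # Keep the gene only if it is not too similar to any kept representative.
        if all(similarity(gene, rep) <= threshold for rep in representatives):
            representatives.append(gene)
    return representatives

# Hypothetical pairwise similarities within one family.
pair_sim = {
    frozenset(("a", "b")): 0.98,   # recent duplicates
    frozenset(("a", "c")): 0.82,
    frozenset(("b", "c")): 0.81,
}

def lookup(x, y):
    return pair_sim[frozenset((x, y))]

print(collapse_recent_duplicates(["a", "b", "c"], lookup))
```

Here `a` and `b` share more than 95% similarity, so only `a` remains as a potential conversion partner alongside `c`.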
Similarity, direction, distance and evolutionary selection analysis
The similarities between gene pairs were estimated based on their cDNA sequences. Sequence alignments and similarity scores were calculated between all paralogs within each multigene family. The genes were mapped onto the genome sequence using TBLASTN; direction and distance between gene pairs were determined from the physical map. The expected number of genes involved in conversion events on each chromosome was estimated based on the length of chromosomes. In computing sequence similarity, we compared the sequences with and without the identified converted region to exclude the contribution of converted sequence to the overall score.
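The length-based expectation for each chromosome amounts to distributing the genome-wide count in proportion to chromosome length. A minimal sketch follows; the chromosome lengths and the total used in the example are placeholders for illustration, not the rice values.

```python
def expected_by_length(total_events, chrom_lengths):
    """Distribute a genome-wide count across chromosomes in proportion to length."""
    genome_length = sum(chrom_lengths.values())
    return {chrom: total_events * length / genome_length
            for chrom, length in chrom_lengths.items()}

# Placeholder lengths in Mb and a placeholder genome-wide count.
lengths = {"chr1": 40.0, "chr2": 35.0, "chr3": 25.0}
expected = expected_by_length(100, lengths)
for chrom, value in expected.items():
    print(chrom, round(value, 1))
```

Comparing the observed count on a chromosome with this expectation indicates whether that chromosome carries more or fewer converted genes than its length alone would predict.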
Ka and Ks values of gene pairs were calculated using the LPB93 method. Ka/Ks ratios were compared between gene pairs for gene-conversion partners consisting of 3844 pairs in 626 multigene families. To increase the calculation sensitivity, gene pairs with Ks > 1 were removed because the Ka/Ks calculations by LPB93 may not be reliable for highly diverged gene pairs. Functional information for gene families involved in gene conversion was obtained from NCBI.
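The Ks filter can be sketched as below, assuming Ka and Ks have already been estimated for each pair by an external tool. The pair names and values are made up, and skipping pairs with Ks equal to zero is an added assumption to avoid division by zero; it is not stated in the text.

```python
def ka_ks_ratios(pairs, max_ks=1.0):
    """Return Ka/Ks per pair, skipping saturated (Ks > max_ks) or zero-Ks pairs."""
    ratios = {}
    for name, (ka, ks) in pairs.items():
        if ks > max_ks or ks == 0:
            continue
        ratios[name] = ka / ks
    return ratios

# Hypothetical (Ka, Ks) estimates for three gene pairs.
pairs = {
    "pair1": (0.05, 0.25),
    "pair2": (0.30, 1.40),   # Ks > 1, removed as too diverged
    "pair3": (0.10, 0.0),    # Ks = 0, ratio undefined
}
print(ka_ks_ratios(pairs))
```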
LS: low similarity to Arabidopsis;
HS: high similarity to Arabidopsis;
IRGSP: International Rice Genome Sequencing Project.
This project was supported by Chinese Academy of Sciences (KSCX2-YW-N-023; GJHZ0518), Ministry of Science and Technology under high-tech program 863 (2006AA02Z334; 2006AA10A121), Beijing Municipal Science and Technology Commission (D07030200740000), and National Natural Science Foundation of China (90608010; 90208019; 90403130; 30221004; 90612019; 30392130). Other support came from Ole Rømer grants from the Danish Natural Science Research Council and the Danish Medical Research Council.
- Teshima KM, Innan H: The effect of gene conversion on the divergence between duplicated genes. Genetics. 2004, 166 (3): 1553-1560. 10.1534/genetics.166.3.1553.
- Weiss EH, Mellor A, Golden L, Fahrner K, Simpson E, Hurst J, Flavell RA: The Structure of a Mutant H-2-Gene Suggests That the Generation of Polymorphism in H-2 Genes May Occur by Gene Conversion-Like Events. Nature. 1983, 301 (5902): 671-674. 10.1038/301671a0.
- Martinsohn JT, Sousa AB, Guethlein LA, Howard JC: The gene conversion hypothesis of MHC evolution: a review. Immunogenetics. 1999, 50 (3–4): 168-200. 10.1007/s002510050593.
- Richman AD, Herrera LG, Nash D, Schierup MH: Relative roles of mutation and recombination in generating allelic polymorphism at an MHC class II locus in Peromyscus maniculatus. Genet Res. 2003, 82 (2): 89-99. 10.1017/S0016672303006347.
- Petes TD, Hill CW: Recombination between Repeated Genes in Microorganisms. Annu Rev Genet. 1988, 22: 147-168. 10.1146/annurev.ge.22.120188.001051.
- Drouin G: Characterization of the gene conversions between the multigene family members of the yeast genome. J Mol Evol. 2002, 55 (1): 14-23. 10.1007/s00239-001-0085-y.
- Semple C, Wolfe KH: Gene duplication and gene conversion in the Caenorhabditis elegans genome. J Mol Evol. 1999, 48 (5): 555-564. 10.1007/PL00006498.
- Ezawa K, Oota S, Saitou N: Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. Genome-wide search of gene conversions in duplicated genes of mouse and rat. Mol Biol Evol. 2006, 23 (5): 927-940. 10.1093/molbev/msj093.
- Padhukasahasram B, Marjoram P, Nordborg M: Estimating the rate of gene conversion on human chromosome 21. Am J Hum Genet. 2004, 75 (3): 386-397. 10.1086/423451.
- Jeffreys AJ, May CA: Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat Genet. 2004, 36 (2): 151-156. 10.1038/ng1287.
- Rozen S, Skaletsky H, Marszalek JD, Minx PJ, Cordum HS, Waterston RH, Wilson RK, Page DC: Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature. 2003, 423 (6942): 873-876. 10.1038/nature01723.
- Bosch E, Hurles ME, Navarro A, Jobling MA: Dynamics of a human interparalog gene conversion hotspot. Genome Res. 2004, 14 (5): 835-844. 10.1101/gr.2177404.
- Mondragon-Palomino M, Gaut BS: Gene conversion and the evolution of three leucine-rich repeat gene families in Arabidopsis thaliana. Mol Biol Evol. 2005, 22 (12): 2444-2456. 10.1093/molbev/msi241.
- Haubold B, Kroymann J, Ratzka A, Mitchell-Olds T, Wiehe T: Recombination and gene conversion in a 170-kb genomic region of Arabidopsis thaliana. Genetics. 2002, 161 (3): 1269-1278.
- Yu J, Hu SN, Wang J, Wong GKS, Li SG, Liu B, Deng YJ, Dai L, Zhou Y, Zhang XQ, et al: A draft sequence of the rice genome (Oryza sativa L. ssp indica). Science. 2002, 296 (5565): 79-92. 10.1126/science.1068037.
- Yu J, Wang J, Lin W, Li SG, Li H, Zhou J, Ni PX, Dong W, Hu SN, Zeng CQ, et al: The Genomes of Oryza sativa: A history of duplications. PLoS Biol. 2005, 3 (2): e38. 10.1371/journal.pbio.0030038.
- International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature. 2005, 436 (7052): 793-800. 10.1038/nature03895.
- Modrich P, Lahue R: Mismatch repair in replication fidelity, genetic recombination, and cancer biology. Annu Rev Biochem. 1996, 65: 101-133. 10.1146/annurev.bi.65.070196.000533.
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2007, 35 (Database issue): D5-12. 10.1093/nar/gkl1031.
- Gaut BS, Morton BR, McCaig BC, Clegg MT: Substitution rate comparisons between grasses and palms: Synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc Natl Acad Sci USA. 1996, 93 (19): 10274-10279. 10.1073/pnas.93.19.10274.
- Laxalt AM, Munnik T: Phospholipid signalling in plant defence. Curr Opin Plant Biol. 2002, 5 (4): 332-338. 10.1016/S1369-5266(02)00268-6.
- Hayakawa T, Angata T, Lewis AL, Mikkelsen TS, Varki NM, Varki A: A human-specific gene in microglia. Science. 2005, 309 (5741): 1693.
- Balakirev ES, Ayala FJ: Pseudogenes: are they "junk" or functional DNA?. Annu Rev Genet. 2003, 37: 123-151. 10.1146/annurev.genet.37.040103.103949.
- Parniske M, HammondKosack KE, Golstein C, Thomas CM, Jones DA, Harrison K, Wulff BBH, Jones JDG: Novel disease resistance specificities result from sequence exchange between tandemly repeated genes at the Cf-4/9 locus of tomato. Cell. 1997, 91 (6): 821-832. 10.1016/S0092-8674(00)80470-5.
- Wong WSW, Yang ZH, Goldman N, Nielsen R: Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004, 168 (2): 1041-1051. 10.1534/genetics.104.031153.
- Kuang H, Woo SS, Meyers BC, Nevo E, Michelmore RW: Multiple genetic processes result in heterogeneous rates of evolution within the major cluster disease resistance genes in lettuce. Plant Cell. 2004, 16 (11): 2870-2894. 10.1105/tpc.104.025502.
- Zimmer EA, Martin SL, Beverley SM, Kan YW, Wilson AC: Rapid Duplication and Loss of Genes-Coding for the Alpha-Chains of Hemoglobin. Proc Natl Acad Sci USA. 1980, 77 (4): 2158-2162. 10.1073/pnas.77.4.2158.
- Ohta T: Role of Diversifying Selection and Gene Conversion in Evolution of Major Histocompatibility Complex Loci. Proc Natl Acad Sci USA. 1991, 88 (15): 6716-6720. 10.1073/pnas.88.15.6716.
- Ohta T: Role of gene conversion in generating polymorphisms at major histocompatibility complex loci. Hereditas. 1997, 127 (1–2): 97-103. 10.1111/j.1601-5223.1997.00097.x.
- Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, et al: Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science. 2003, 301 (5631): 376-379. 10.1126/science.1081288.
- Kent WJ: BLAT – the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664. 10.1101/gr.229202.
- Jurka J: Repeats in genomic DNA: mining and meaning. Curr Opin Struct Biol. 1998, 8 (3): 333-337. 10.1016/S0959-440X(98)80067-5.
- Wang W, Zheng HK, Fan CZ, Li J, Shi JJ, Cai ZQ, Zhang GJ, Liu DY, Zhang JG, Vang S, et al: High rate of chimeric gene origination by retroposition in plant genomes. Plant Cell. 2006, 18 (8): 1791-1802. 10.1105/tpc.106.041905.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Zheng H, Shi J, Fang X, Li Y, Vang S, Fan W, Wang J, Zhang Z, Wang W, Kristiansen K, et al: FGF: a web tool for Fishing Gene Family in a whole genome database. Nucleic Acids Res. 2007, 35 (Web Server issue): W121-125. 10.1093/nar/gkm426.
- Birney E, Durbin R: Using GeneWise in the Drosophila annotation experiment. Genome Res. 2000, 10 (4): 547-548. 10.1101/gr.10.4.547.
- Sawyer S: Statistical Tests for Detecting Gene Conversion. Mol Biol Evol. 1989, 6 (5): 526-538.
- Li WH: Unbiased Estimation of the Rates of Synonymous and Nonsynonymous Substitution. J Mol Evol. 1993, 36 (1): 96-99. 10.1007/BF02407308.
- NCBI. [http://www.ncbi.nlm.nih.gov/]