Seven different genes encode a diverse mixture of isoforms of Bet v 1, the major birch pollen allergen

Background: Pollen of the European white birch (Betula pendula, syn. B. verrucosa) is an important cause of hay fever. The main allergen is Bet v 1, a member of the pathogenesis-related class 10 (PR-10) multigene family. To establish the number of PR-10/Bet v 1 genes and the isoform diversity within a single tree, PCR amplification, cloning and sequencing of PR-10 genes were performed on two diploid B. pendula cultivars and one interspecific tetraploid Betula hybrid. Sequences were attributed to putative genes based on sequence identity and intron length. Information on transcription was derived by comparison with homologous cDNA sequences available in GenBank/EMBL/DDBJ. PCR-cloning of multigene families is accompanied by a high risk of PCR recombination artifacts. We screened for and excluded these artifacts, and also detected putative artifact sequences among database sequences.

Results: Forty-four different PR-10 sequences were recovered from B. pendula and assigned to thirteen putative genes. Sequence homology suggests that three genes were transcribed in somatic tissue and seven genes in pollen. The transcription of three other genes remains unknown. In total, fourteen different Bet v 1-type isoforms were identified in the three cultivars, of which nine isoforms were entirely new. Isoforms with high and low IgE-reactivity are encoded by different genes and one birch pollen grain has the genetic background to produce a mixture of isoforms with varying IgE-reactivity. Allergen diversity is even higher in the interspecific tetraploid hybrid, consistent with the presence of two genomes.

Conclusion: Isoforms of the major birch allergen Bet v 1 are encoded by multiple genes, and we propose to name them accordingly. The present characterization of the Bet v 1 genes provides a framework for the screening of specific Bet v 1 genes among other B. pendula cultivars or Betula species, and for future breeding for trees with a reduced allergenicity. Investigations towards sensitization and immunotherapy should anticipate that patients are exposed to a mixture of Bet v 1 isoforms of different IgE-reactivity, even if pollen originates from a single birch tree.


Background
Pathogenesis-related class 10 (PR-10) proteins constitute the largest group of aeroallergens and are among the four most common food allergens [1]. The main allergen is a PR-10 pollen protein from the European white birch (Betula pendula) termed Bet v 1 [2]. Birch pollen is a major cause of Type I allergies in the temperate climate zone of the northern hemisphere. Over 95% of the tree pollen-sensitized patients in Scandinavia display IgE binding to Bet v 1, while 60% react exclusively to this allergen [3]. Pollen of other Fagales species contains Bet v 1 homologues that share epitopes with Bet v 1 [4].
PR-10 proteins are present as a multigene family across a range of phylogenetically distant species, including Gymnosperms, Monocots, and Dicots [5][6][7]. As a consequence, several foods contain Bet v 1 homologues, including nuts, vegetables, and Rosaceae fruits [6,8,9]. Patients who are sensitized to Bet v 1 may experience mild allergic symptoms upon consumption of these foods due to IgE cross-reactivity. Symptoms of this so-called oral allergy syndrome (OAS) are mainly limited to the oral cavity. Cross-reactivity has clearly been demonstrated by allergic responses to the celery protein Api g 1, which are initiated by sensitization to Bet v 1 [8].
The B. pendula genome contains multiple PR-10 genes with varying expression patterns. Among these, the Bet v 1 allergens are expressed in pollen. The first Bet v 1 isoform was identified by immunoscreening a pollen cDNA expression library with serum of birch pollen allergic patients [2]. Other Bet v 1 isoforms have been sequenced by various authors since then [10][11][12][13]. Pollen mixtures from multiple trees were found to contain multiple Bet v 1 isoforms [13]. Bet v 1 isoforms differ in the ability to bind IgE and in the T-cell proliferation score [14]. Two other types of PR-10 proteins were detected in birch cells that were grown in a liquid medium in the presence of microbial pathogens [15]. These proteins are expressed in roots and leaves under basal conditions or induced under various stress-related conditions [15,16]. The PR-10 genes from B. pendula form a homogeneous group, based on sequence similarities. Homogeneity is suggested to be maintained by concerted evolution [17]. Arrangements of PR-10 genes into clusters, such as found for Mal d 1 genes in apple (Malus domestica), may facilitate concerted evolution [18].
Given the prominent role of Bet v 1 allergens in the sensitization to hay fever and OAS, birch is a relevant target for development of allergy prevention strategies. Selection and breeding of hypoallergenic trees, or the application of genetic modification to develop these, may potentially reduce the allergenic load caused by birch. Knowledge of the diversity of PR-10 genes, their expression, and the allergenicity of the gene products is required to facilitate these strategies. In the present study, we amplified, cloned and sequenced PR-10 alleles from three B. pendula cultivars to establish the number of PR-10/Bet v 1 genes and the isoform diversity within a single tree.

PCR recombination artifacts
When PCR amplification is performed on groups of closely related sequences, such as the PR-10 gene family, accurate sequences are essential to distinguish between members. When we initially determined the relationships among the recovered sequences, most clades in the Bayesian consensus tree had low posterior probabilities. Several sequences occupied intermediate positions between well-defined clusters. This suggested the possibility of recombination. Recombination could have occurred in vivo through a crossing-over or gene conversion between prior existing genes, or in vitro during the PCR through strand-switching or re-annealing of incompletely amplified fragments.
Evidence supports the view that recombinant sequences were PCR artifacts. Recombination signals were abundant in the sequences obtained after the 30-cycle PCR (Figure 1a) and virtually absent when 22 cycles were employed (Figure 1b). Several GenBank sequences showed clear evidence for recombination too (Figure 1c). Putative recombinants from our experiments lacked or nearly lacked unique mutations and could be separated into two or three stretches that were identical to other sequences obtained. The GenBank recombinants shared close to 100% sequence identity to combinations of other accessions.
The use of independent PCRs resolves which sequences are genuine, since the probability of isolating identical artifacts in independent PCRs is extremely low. Most sequences without recombination signal were confirmed in independent PCRs (Table 1), while those with a recombination signal were not. The only exception was the PR-10.03B02.01 allele from 'Tristis' that was found in three independent PCRs. This allele was an in vivo recombination of the first 300 bp, including the intron, from the PR-10.03D gene and 183 bp from the original PR-10.03B gene as found in 'Schneverdinger Goldbirke' and 'Long Trunk'. Putative recombination artifacts were quite abundant in the 26- and 30-cycle PCRs (27-46% of the sequences), but rare in the 22- and 24-cycle PCRs (2-11%) (Table 1). We conservatively maintained the sequences that were confirmed in independent PCRs and those with at least three unique mutations for further analysis. A minimum of three unique mutations was chosen to ensure that potential base mis-incorporation artifacts were excluded.
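The retention rule described above (keep a sequence if it was confirmed in independent PCRs, or if it carries at least three unique mutations) can be expressed as a small filter. The following Python sketch is illustrative only: the `CloneSequence` record, its field names, the example entries, and the reading of "confirmed" as "seen in at least two independent PCRs" are assumptions made for this example, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class CloneSequence:
    """Hypothetical record for one distinct cloned sequence."""
    name: str
    independent_pcrs: int   # number of independent PCRs in which it was recovered
    unique_mutations: int   # mutations not shared with any other recovered sequence


def keep_for_analysis(seq: CloneSequence,
                      min_pcrs: int = 2,
                      min_unique_mutations: int = 3) -> bool:
    """Return True if the sequence passes the conservative retention rule.

    Assumption: "confirmed in independent PCRs" is read as being recovered
    in at least `min_pcrs` independent reactions.
    """
    confirmed = seq.independent_pcrs >= min_pcrs
    distinct_enough = seq.unique_mutations >= min_unique_mutations
    return confirmed or distinct_enough


# Made-up example entries, for illustration only.
candidates = [
    CloneSequence("confirmed_allele", independent_pcrs=3, unique_mutations=0),
    CloneSequence("divergent_singleton", independent_pcrs=1, unique_mutations=4),
    CloneSequence("putative_chimera", independent_pcrs=1, unique_mutations=0),
]
retained = [c.name for c in candidates if keep_for_analysis(c)]
print(retained)
```

A sequence found only once and lacking unique mutations, which is the typical profile of a PCR chimera, is the only kind this rule discards.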
Figure 1. Phylogenetic profiles for detection of recombination. Phylogenetic profile of the sequences from B. pendula 'Long Trunk' obtained after a PCR of (a) 30 cycles (n = 72 sequences) and (b) 20 cycles (n = 53). (c) Phylogenetic profile of the GenBank PR-10 sequences from B. pendula (n = 66). The x-axis represents the sequence position (5'-3' including only informative positions). The y-axis indicates the phylogenetic correlation. Low values are indicative of recombination [27]. Low values at the edges are artifacts of the employed method.

Several of the PR-10 sequences from B. pendula that were available from GenBank/EMBL/DDBJ also showed clear evidence for recombination (Figure 1c). However, it was not possible to do a similar check as mentioned above for the GenBank sequences, and we can only hypothesize on the presence of artifact sequences without such direct evidence. Given the regular occurrence of artifact sequences in our experiments, we maintained only those 40 GenBank sequences that were not under suspicion of recombination artifacts for further analysis.

Phylogenetic analysis: newly isolated sequences
We sequenced 404 individual clones in both directions (Table 1). Fourteen different sequences were identified in the diploid cultivar 'Schneverdinger Goldbirke', 7 of which were unique for this cultivar. Fifteen different sequences (of which 10 were unique) were identified in the diploid cultivar 'Tristis'. Approximately twice as many different sequences, namely 28, were identified in the tetraploid cultivar 'Long Trunk'; of these, 16 sequences were of B. pendula origin (8 unique), and 12 sequences were from another Betula species (all unique). In total, 32 different sequences were found.
The Open Reading Frame (ORF) of the sequences was highly conserved and the alignment was straightforward. All but one ORF contained 483 nucleotides, coding for a putative protein that is 160 amino acids long. One sequence from 'Schneverdinger Goldbirke' required the inclusion of an indel between base 388 and 389 of the consensus. This sequence was denoted as a pseudogene, since the indel introduced a stop codon at 7 codons downstream. It cannot be excluded that this pseudogene is expressed, since the stop codon was located near the 3' end. The intron position was identical in all sequences and located at codon 62, being inserted between the first and second nucleotide. Most alleles had 5' splicing sites of AG:GT, with the exception of one allele that had a GG:GT splicing site. The 3' splicing sites were AG:GC or AG:GA. This is in concordance with known motifs for plant introns. The introns were relatively AT-rich (55-65%).
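The coordinates quoted above follow from simple codon arithmetic. The sketch below is a minimal illustration; the function names are hypothetical, and it assumes that the 483-nucleotide ORF length includes the stop codon, which is consistent with a 160-residue protein.

```python
def protein_length(orf_nt: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_nt < 6 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and contain a stop codon")
    return orf_nt // 3 - 1  # the final codon is the stop codon


def coding_bases_before_intron(codon: int, bases_of_codon_before_intron: int) -> int:
    """Number of coding nucleotides that precede the intron.

    `codon` is 1-based; `bases_of_codon_before_intron` is how many nucleotides
    of that codon lie upstream of the intron (0, 1 or 2).
    """
    if codon < 1 or not 0 <= bases_of_codon_before_intron <= 2:
        raise ValueError("invalid codon number or intron phase")
    return (codon - 1) * 3 + bases_of_codon_before_intron


def at_content(seq: str) -> float:
    """Percentage of A and T bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)


print(protein_length(483))                 # 160
print(coding_bases_before_intron(62, 1))   # 184
```

Under these assumptions, an intron inserted between the first and second nucleotide of codon 62 lies after coding nucleotide 184.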
We determined the relationships among the PR-10 sequences from B. pendula. For this, the 'Long Trunk' sequences that were assigned to the other parental species were excluded. Excluding primer traces, 171 of the 452 aligned exon positions were variable, while 150 positions were phylogenetically informative. The consensus tree from the Bayesian analysis indicated several well-defined clusters (Figure 2). We implemented a cut-off level of 98% identity and allowed maximally two alleles per cultivar per gene to estimate the number of genes. In this way, we putatively identified ten genes in 'Schneverdinger Goldbirke', eleven in 'Tristis', and thirteen in 'Long Trunk'. Thirteen different genes were distinguished when the information was combined. Each gene was identified as such in at least two birch cultivars and was characterized by a distinct intron, in most cases of a different size between 84 and 152 bp (Figure 2). The similarity between different alleles of one gene ranged from 98.9 to 100% identity in the exons, which corresponds to 0-5 SNPs. Bet v 1.02A and Bet v 1.02B were not well distinguished in the coding sequences, but had distinguishable introns. The pseudogene PR-10.03B-p01 from 'Schneverdinger Goldbirke' was identical to the PR-10.03B allele from 'Long Trunk' except for its indel.
When all alleles from 'Long Trunk' were included, 182 variable positions were identified among the 452 aligned exon positions, of which 154 were phylogenetically informative. The topology of the consensus tree from the Bayesian analysis showed that seven specific 'Long Trunk' genes were clustered pairwise with the B. pendula genes and these are likely to represent orthologous genes from the second Betula species (not shown). Given the high identity (up to 100%) to PR-10 sequences from B. ermanii, this species, or a close relative, is likely to represent the second parental species (unpublished data, Schenk et al.). Intron sequences of orthologous genes mostly showed slight differences in length or base pair composition. Three genes were recovered only from the unknown parent species and five only from B. pendula, indicating that Betula species do not necessarily have the same (number of) genes.
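The relation between SNP counts and percent identity over the 452 aligned exon positions can be checked with a few lines of Python. This is a minimal sketch, not the procedure used in the study: the function names are hypothetical, the sequences are toy strings, gaps are not handled, and it assumes that the 98% cut-off is inclusive. In the study itself, gene assignment also relied on the Bayesian tree and on intron characteristics.

```python
def snp_count(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two aligned sequences."""
    if not seq_a:
        raise ValueError("empty alignment")
    return 100.0 * (len(seq_a) - snp_count(seq_a, seq_b)) / len(seq_a)


def same_putative_gene(seq_a: str, seq_b: str, cutoff: float = 98.0) -> bool:
    """Apply an identity cut-off (assumed inclusive) to a pair of exon sequences."""
    return percent_identity(seq_a, seq_b) >= cutoff


# Toy alignment of 452 positions: five mismatches versus ten mismatches.
reference = "A" * 452
allele_5_snps = "C" * 5 + "A" * 447
allele_10_snps = "C" * 10 + "A" * 442

print(round(percent_identity(reference, allele_5_snps), 1))   # 98.9
print(same_putative_gene(reference, allele_5_snps))            # True
print(same_putative_gene(reference, allele_10_snps))           # False
```

With 452 aligned positions, five SNPs give 98.9% identity, which matches the lower bound reported for alleles of one gene, whereas ten SNPs fall below the 98% cut-off.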

Phylogenetic analysis: GenBank sequences versus newly obtained sequences
All but two GenBank sequences had an ORF of 483 base pairs that coded for 160 amino acids and was generally interrupted by a position-conserved intron. Two sequences required the inclusion of an indel, following base 354 of the consensus sequence. This resulted in a stop codon directly after the insertion. The GenBank dataset was combined with the newly obtained sequences. Primer traces were discarded, resulting in an aligned stretch of 425 bp from base 28 to 452 of the consensus. Several GenBank sequences are identical at this stretch, reducing the number of unique GenBank sequences in the analysis to thirty-three. 173 out of 425 aligned positions were variable, while 146 were phylogenetically informative. The information from the Bayesian consensus tree was added to Table 2 by indicating in which gene cluster the GenBank alleles landed. Similarly to the analysis of the newly isolated sequences we used a cut-off of 98% identity. In the resulting classification, 35 out of 40 alleles that were assigned to a particular gene showed more than 99% identity for the exons. The lowest similarity of an allele that still clustered with a particular gene in the phylogenetic analysis was 98.4% identity with the reference sequence (Table 2).
We classified the genes into five subfamilies (I-V) based on identities of the coding regions and the intron. The average identity between alleles within each subfamily was 95-100%. The GenBank sequences were in part derived from RNA extractions from specific tissues. We used this tissue information to predict the transcription of the PR-10 genes (Table 2). For this, alleles of a single gene are assumed to have the same mode and location of transcription. Subfamilies I and II consist of four and three genes, respectively, and include alleles that are homologous to the pollen-expressed Bet v 1 allergens [2,13]. For four genes within these subfamilies, we found alleles that were 100% identical to pollen mRNA-derived sequences previously deposited in GenBank. Alleles from two other genes were 99.8% identical (1 SNP difference) to pollen mRNA-derived sequences from GenBank, which we take to predict the location of transcription for these six genes with a very high level of confidence. For only one of the genes in subfamily II, there was no mRNA-derived homologue in the GenBank database, but the high homology (97. .2%) to the other genes in this subfamily suggests that this gene will be expressed in pollen as well.
Subfamily III consists of four genes, two of which have alleles that are 99.5% identical to homologues of the previously described ypr10a and ypr10b sequences, which are transcribed in roots and leaves [15]. Transcription of the other two genes in subfamily III is unknown, as is the transcription of the single gene in subfamily IV. Subfamily V consists of a single gene. One of the recovered alleles was 100% identical to the previously described ypr10c sequence that is also expressed in roots and leaves [15]. Given the fact that all sequences in subfamily III-V are less than 90% homologous to pollen-derived mRNAs and that there is no evidence of expression in pollen, we consider these as non-allergens.

Nomenclature
All recovered B. pendula alleles were added to

Figure 3. Amino acid sequences, amino acids that affect IgE-reactivity, and T-cell epitopes of the PR-10 proteins. Amino acid sequences of the PR-10 proteins from B. pendula 'Tristis' (Tr), 'Schneverdinger Goldbirke' (Sv), and the B. pendula alleles from 'Long Trunk' (Lt). Amino acids associated with high allergenicity are marked with grey boxes and those associated with low IgE-reactivity (located within B-cell epitopes) are marked with black boxes [12,14]. The locations of the two major T-cell activating regions are indicated above the consensus [22].

PR-10 proteins and allergenicity
A high similarity between proteins increases the chance that they share epitopes, while on the other hand, a single amino acid change may influence allergenicity drastically. The high homogeneity among the PR-10 genes of B. pendula was reflected by a higher allelic variation at the nucleic acid level compared to the protein level. Hence, the 45 different genomic sequences encoded 32 different putative isoforms (Figure 3). This is consistent with the relatively large number of synonymous mutations compared to the number of non-synonymous mutations (Ka/Ks ratios) [21]. The number of synonymous (Ks) and non-synonymous (Ka) substitutions per site were calculated from pairwise comparisons of all alleles from the three B. pendula varieties. The average value was 0.080 for Ka and 0.247 for Ks, resulting in an average Ka/Ks ratio of 0.33 (n = 45). Analysis of the occurrence of non-synonymous mutations per codon indicated two interesting regions. The region between codon Asn 42 and Ile 56 lacked non-synonymous mutations and is characterized by a phosphate-binding loop with the sequence motif GxGGxGx (Figure 3). Relatively many amino acid differences were present beyond codon 125, especially between isoforms of the genes PR-10.04/PR-10.05 and PR-10.01 to PR-10.03 (Figure 3).
Previous research has identified isoforms with varying IgE-reactivity within mixtures of pollen. In fact, this is also true for pollen from a single tree. Ferreira et al. [14] [12,14]. These are marked in Figure 3. The tetraploid 'Long Trunk' contained several isoforms with unique amino acid substitutions due to its putative hybrid origin. This greatly enlarges the variation in putative IgE-reactivity among isoforms of this cultivar.

PCR artifacts
When PCR amplification is performed on groups of closely related sequences, such as the PR-10 gene family, accurate sequences are essential to distinguish between members. We used Pfu polymerase, which has proofreading functionality and reduces the base substitution error rate. However, previous research has shown that Pfu polymerase may generate more, and more complex, recombination artifacts than Taq polymerase [23,24] through incomplete primer extension and re-annealing to a different template [25], or strand switching between different templates [26]. Reducing the number of cycles [24] was an efficient way to lower the number of artifacts.
We identified PCR recombination artifacts in several of our sequences using the computer program Phylpro [27]. The comparison of independent PCRs enabled sequence validation and exclusion of both recombination and base substitution artifacts. The high occurrence of recombination artifacts in our experiments (27-46% after 30 PCR cycles) is not uncommon. For example, Wang and Wang [28] report 32% recombination artifacts after 30 cycles of PCR amplification for 16S rRNA genes. The occurrence of recombinant sequences within a mixture of approximately 15 different sequences is expected to be high, because almost all recombinations are detected. Half of the recombinations would, for example, remain undetected if only two templates are present. The presence of 13 genes also increases the amount of template compared to a single copy gene.
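The dependence of detectability on the number of templates can be made explicit with a simple model. The sketch below is a back-of-the-envelope illustration, not a calculation from the study: it assumes that all templates are equally abundant, that a strand switch lands on a randomly chosen template, and that a switch onto a copy of the same template leaves no detectable trace. The function name is hypothetical.

```python
from fractions import Fraction


def detectable_fraction(n_templates: int) -> Fraction:
    """Fraction of template switches that join two different templates.

    Assumes equally abundant templates and a uniformly random switch partner,
    so a switch hits the same template with probability 1 / n_templates.
    """
    if n_templates < 1:
        raise ValueError("at least one template is required")
    return 1 - Fraction(1, n_templates)


for n in (2, 15):
    print(n, float(detectable_fraction(n)))
```

Under these assumptions, half of the recombination events go unnoticed with two templates, whereas about 93% are detectable in a mixture of 15 different sequences, in line with the reasoning above.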
Using the same analysis to detect recombination in GenBank/EMBL/DDBJ accessions, we observed a recombination signal in 22 out of 62 (35%) accessions. These are not necessarily all PCR artifacts, as some recombinations may have occurred in vivo during evolution of the genes. However, we found only one true recombinant in our dataset and no evidence of past recombination events in the comparison between sequences from two different species (within the hybrid). This indicates that the occurrence of in vivo recombination is probably rare.
Base substitution error rates for mixtures of non-proofreading Taq and proofreading Pfu are approximately 5.6 × 10⁻⁶ under optimal conditions [29]. This error rate is ~2-4 fold higher when only Taq is used [29,30]. Without confirmation in independent PCRs these errors cannot be excluded and it is very likely that Bet v 1 sequences with base substitution errors have been deposited in GenBank/EMBL/DDBJ. Therefore, not all published isoforms will be clinically relevant. In addition, the clinical relevance of the isoforms will be influenced by their expression levels. If multiple allergen isoforms exist, there is a risk of selecting a recombinant isoform with low IgE-reactivity as a diagnostic tool, or even selecting isoforms that resulted from PCR artifacts. We therefore strongly suggest the use of primers that are highly specific for one gene [18] or, preferably, the application of multiple independent PCRs to facilitate sequence validation in future sequencing work on allergens.

PR-10 and Bet v 1 genes
The PR-10 gene family of Betula pendula was shown to encompass at least thirteen genes. This is a conservative estimate since we used strict inclusion criteria. The distinction between genes is supported by the presence of a distinct intron. Each gene was identified as such in at least two birch cultivars. We attributed previously described GenBank sequences to these genes. Alleles from ten identified genes had previously been described [2,10-13], while we identified three new genes. The genes are grouped into five subfamilies, based on sequence homologies in the ORF and intron. Differences in transcription coincide with the division between subfamilies. An organization of PR-10 genes into subfamilies was also reported for Malus domestica [18] and for Pinus monticola [5].
A striking feature of the PR-10 isoforms in B. pendula is their homogeneity, which may extend to other Fagales species, such as alder (Alnus glutinosa) and hazel (Corylus avellana). The intron has conserved 3' and 5' splicing sites and is always located at codon 62, as is, for example, reported for C. avellana and M. domestica [11]. High homogeneity may result from strong purifying selection or from concerted evolution. The presence of low Ka/Ks ratios among the isoforms suggests the occurrence of purifying selection. Evidence for concerted evolution is present in the overall gene tree of the PR-10 family [17]. Concerted evolution causes genes to evolve as a single unit with members exchanging genetic information through gene conversion and unequal crossing-over. Tandemly arranged genes may have high conversion rates [31], while this is a prerequisite for the occurrence of unequal crossing-over. Most PR-10 genes in apple map to two loci and are arranged in a duplicated cluster [18]. This organization may be a common feature for PR-10 genes. However, as pointed out by Nei and Rooney [32], the molecular mechanism of gene conversion is not well understood, and the model of birth-and-death evolution of genes may also explain the evolution of the PR-10 gene family. The presence of pseudogenes, although at a low frequency, is therefore of particular interest. An analysis of a species at an intermediate evolutionary distance, such as C. avellana, would be useful to clarify which mechanisms determine the evolution of PR-10 genes and to investigate a possible recent radiation of PR-10 genes.
The birch genome contains at least seven pollen-expressed genes that encode a mixture of Bet v 1 isoforms with varying IgE-reactivity. Swoboda et al. [13] found that pollen mixtures from multiple trees contain multiple Bet v 1 isoforms. We identified 14 different Bet v 1 isoforms in the three cultivars, nine of which are entirely new. The IgE-reactivity has been tested for several isoforms using recombinant proteins [14]. The allergenicity of the new isoforms can be examined in the future by expressing the isoforms as recombinant proteins and using these in a Skin Prick Test (SPT) or T-cell activation tests. Ferreira et al. [14] divided the Bet v 1 isoforms into three groups according to their IgE-reactivity and confirmed the division between high, moderate, or low IgE-reactivity in an SPT. One high and one low IgE-reactive isoform from their analysis were 100% identical to isoforms that we have obtained from a single tree, while two intermediate IgE-reactive isoforms differed only by one amino acid from the alleles of two other identified genes. This strongly suggests that isoforms of different IgE-reactivity are in fact alleles encoded by different genes. Thus, each examined cultivar has the genetic background to express a mixture of isoforms with a high, moderate, and low IgE-reactivity. We plan to confirm this at the protein level in the near future.

Nomenclature
The nomenclature of Bet v 1 raises several issues. The first isoforms were termed Bet v 1a to Bet v 1n by Swoboda et al. [13], but these have subsequently been renamed and incorporated into the official database of the allergen nomenclature committee [19]. This database currently lists 37 allergen isoforms that have been termed Bet v 1.0101 to 1.3101. However, as can be seen from the list of known isoforms in Table 2, not all published isoforms have been added to this list, even though several of these isoforms were obtained from pollen mRNA. On the other hand, isoforms which have been recovered only from mRNA from roots and leaves are included as allergens (Bet v 1.1101 to Bet v 1.1301). Also, several of the described isoforms are highly suspicious as we observed clear recombination signals. As a result, the list is a random series of alleles that belong to different genes and has no biological basis. Similar problems were described for the Mal d 1 and Mal d 3 allergens by Gao et al. [18,20]. To allow differentiation between alleles from different genes for the Bet v 1 alleles we have adopted their system.

Allergenicity of birch trees
The exact isoform composition of the three cultivars differed due to allelic variations. This may result in differences in allergenicity between cultivars. However, an exact copy of the most allergenic allele, Bet v 1a, was present in all three cultivars. Quantity measurements on expression indicate that Bet v 1a is the dominant isoform in pollen [13]. Given that the diversity of Bet v 1 isoforms within a single tree is larger than the diversity between the examined B. pendula cultivars, a characterization of Bet v 1-type isoforms should be done in other Betula species as well.
Investigations towards sensitization and immunotherapy should anticipate that patients are exposed to a mixture of Bet v 1 isoforms of different IgE-reactivity, even if pollen originates from a single birch tree. Differences in allergenicity between birch trees may also result from variation in allergen content. Variation in allergen content has e.g. been shown for apple [33] and olive pollen [34].
Many Betula species and Betula hybrids have higher ploidy levels (tetraploid, hexaploid, and even octaploid) than B. pendula and are likely to contain increased numbers of allergen isoforms, as we found in the tetraploid cultivar 'Long Trunk'. For example, the tetraploid B. pubescens is dispersed throughout Europe, while other exotic birch species are increasingly introduced as cultivars, contributing to a larger allergen pool. However, interspecific Betula hybrids, which have a higher ploidy level, may also pose a potential source of hypoallergenic trees. Especially hybrids between less related species often display a reduced fertility, which may result in a reduced or aborted pollen production.
If breeding for hypoallergenic trees is implemented, approaches should take into account that the Bet v 1/PR-10 genes may be clustered and differences in allergenicity between clusters of genes may be used to guide breeding efforts. Clustering may be determined by mapping studies [18] and by screening and partial sequencing of a genomic library. In the genome sequence of Populus [35] we can indeed observe an organization of PR-10 genes into clusters. Other approaches to generate hypoallergenic trees may include the search for sterile or low pollen producing trees, or the application of RNA interference technology, which proved to be successful in silencing the Mal d 1 allergens in apple [36] without phenotypic abnormalities. One issue that needs to be resolved is that although birch PR-10/Bet v 1 proteins have been suggested to act as plant steroid carriers [37], the exact conditions under which transcription is induced are still unknown. Given the abundance of Bet v 1 in birch pollen, silencing may affect pollen viability. However, for breeding of hypoallergenic cultivars that are propagated vegetatively this would be considered a welcome side-effect.

Conclusion
We have shown that the PR-10 gene family of Betula pendula encompasses at least thirteen genes that can be grouped into five distinct subfamilies. Differences in expression coincide with the division between subfamilies. Genes from two subfamilies were shown to be transcribed in pollen, based on a high (99.8-100%) homology with cDNA sequences available in GenBank/EMBL/DDBJ. The seven genes that belong to these subfamilies encode a mixture of Bet v 1 isoforms of varying IgE-reactivity. The present characterization of the PR-10 family in birch provides a framework for the screening of Bet v 1 genes among other Betula species or B. pendula cultivars and for potential breeding approaches for birch trees with a reduced allergenicity.

Plant material
The natural distribution range of B. pendula Roth (syn. B. verrucosa) covers almost the whole of Europe. Several B. pendula cultivars have been bred, including interspecific hybrids (also referred to as interspecies hybrids) between B. pendula and other Betula species. We collected young leaves from three B. pendula cultivars in the collection of PPO Boskoop (WUR, the Netherlands), namely 'Long Trunk', 'Schneverdinger Goldbirke', and 'Tristis'. Fresh leaf samples were sent to Plant Cytometry Services (Schijndel, The Netherlands) and screened by flow cytometry to estimate the ploidy level. Diploid (B. pendula) and tetraploid (B. pubescens) controls were included. The cultivars 'Schneverdinger Goldbirke' and 'Tristis' were diploid, while the cultivar 'Long Trunk' was tetraploid. The latter is likely to be an interspecific hybrid between B. pendula and a second, unknown, Betula species. The alleles recovered from 'Long Trunk' were either assigned to B. pendula based on sequence and intron similarity or were considered specific for this cultivar. The specific 'Long Trunk' alleles were analyzed separately. DNA was extracted using the DNeasy Plant Mini kit (Qiagen) according to the manufacturer's instructions.

and Taq polymerase (Goldstar) (5 U/μl), and 20-80 ng template DNA. H2O was added to obtain a total volume of 20 μl. PCR mixtures were subjected to the following conditions: initial heating step at 95°C for 15 min, denaturation at 94°C for 30 s, annealing at 50°C for 45 s, and extension at 72°C for 60 s. A final extension step of 10 min at 72°C was added after 22-30 cycles. Given the observation of recombination among the recovered sequences, we subsequently varied the number of PCR cycles at intervals of 2 cycles. The minimum number of cycles was established by visual inspection of the amplification products on agarose gel at 22 for the BpI primer pair and at 24 for the BpII primer pair. Originally, 30 PCR cycles had been used. We repeated the experiment at 22-24 cycles to ensure that amplification was in its linear phase.

PCR, cloning, and sequencing
To obtain the A-tailing that facilitates the ligation procedure, five additional cycles were run on 1-4 μl of PCR product with Taq polymerase (Goldstar). PCR conditions were similar to those described above. PCR products were purified with the MinElute PCR Purification Kit (Qiagen). Purified samples were ligated into the pGEM-T easy Vector (Promega) and established in Escherichia coli XL1 Blue competent cells (Stratagene) according to the manufacturer's instructions. White colonies were picked from agar plates and grown overnight at 37°C in freeze medium. PCR-based screening was performed with vector-specific M13 primers. PCR products were purified with Sephadex G-50 (Millipore). The DYEnamic™ ET Terminator Cycle Sequencing Kit (Amersham) was used for the sequence reaction. Sequence products were analyzed on a 96-capillary system (ABI 3730xl).
Genomic B. pendula sequences have been submitted to GenBank as DQ296566-DQ296598 and DQ325525-DQ325535, and the specific 'Long Trunk' sequences as DQ296599-DQ296610.