We report on a significant resource of over 130,000 ESTs derived from a range of Actinidia species (Table 1). We targeted tissues and developmental stages in order to sample genes involved in physiological and biochemical processes including fruit ripening, flavor development, control of color and the synthesis of chemicals with health-related attributes. For this reason, the two most widely cultivated species of kiwifruit, A. chinensis and A. deliciosa, are well represented, together accounting for over 100,000 ESTs (Table 1). In addition, fruit and bud libraries are also well represented, with over 38,000 and 50,000 ESTs, respectively. A. chinensis and A. deliciosa are so closely related, as is A. setosa (Li) C.F. Liang et A.R.Ferguson, that they are variously treated as being distinct species or as varieties of the one species. The other two main species studied, A. arguta and A. eriantha Benth., also have commercial potential but are more distantly related.
The genus Actinidia is unusual in the extent of its inter-taxal and intra-taxal variation in ploidy. In the wild, there is a structured reticulate pattern of diploids, tetraploids, hexaploids, and octoploids in diminishing frequency, associated, in at least some taxa, with geographic separation of ploidy races. A. deliciosa is hexaploid, A. setosa is diploid, and there are diploid and tetraploid races of A. chinensis, the tetraploids apparently coming from a restricted part of the natural distribution of the species. Most evidence suggests that diploid A. chinensis was a progenitor of tetraploid A. chinensis and hexaploid A. deliciosa, but it is not clear whether genomes from other species have contributed. The basic chromosome number (n = 29) is high and it seems increasingly likely that diploid A. chinensis is itself a rediploidized palaeopolyploid.
As is common in EST sequencing projects (e.g., [6, 8, 10]), there is a high degree of redundancy in the ESTs, with clustering reducing the number of unique sequences from over 132,000 to 41,858 NRs (18,070 TCs, 23,788 singletons). We would expect this number of NRs to be an overestimate of the number of genes in Actinidia, especially given that the database contains sequences from multiple species of Actinidia. Using the same correction used in the apple EST paper, we expect an Actinidia genome to have around 27,000 genes.
On average, 20% (± 2% standard error) of the sequences from each library with over 1000 ESTs were singletons, suggesting a high degree of novelty in these libraries. On average, 28% (± 4%) of sequences did not have a homolog in the various public databases based on BLAST searches with an E value > 1.0e-10. An average of 16% (± 3%) of ESTs were identified as 3' UTR candidates based on the presence of a poly(A) tail within 40 bp of the start (taking into account reverse sequences). These 3' sequences would not be expected to be identified by BLAST searches and so would affect the apparent novelty of a library. Fewer than 12% of NRs did not have BLAST matches (E > 10) in the Arabidopsis proteome, Uniref, NCBI ref or SwissProt databases.
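The 3' UTR screen described above can be illustrated with a minimal sketch. It assumes that a candidate is any read in which a run of A (or of T, taken here as the reverse-strand image of a poly(A) tail) begins within the first 40 bp. The minimum run length of 10 and the function names are illustrative assumptions, not parameters reported for this study.

```python
import re


def is_3prime_utr_candidate(seq, window=40, min_run=10):
    """Flag a read whose poly(A) or poly(T) run starts within `window` bp of its start.

    min_run is an assumed threshold; the text does not state one.
    """
    # Keep just enough sequence to hold a full run that starts inside the window.
    head = seq.upper()[: window + min_run - 1]
    pattern = r"A{%d,}|T{%d,}" % (min_run, min_run)
    return re.search(pattern, head) is not None


def candidate_fraction(seqs, window=40, min_run=10):
    """Fraction of reads in a library flagged as 3' UTR candidates."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    flagged = sum(is_3prime_utr_candidate(s, window, min_run) for s in seqs)
    return flagged / len(seqs)
```

Applied per library, `candidate_fraction` gives the kind of proportion that is averaged across libraries in the text, although the exact criteria used there may differ.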
There was only a small degree of overlap in NRs between libraries. Libraries from different species and different tissues showed a 5 to 9% overlap in NRs, libraries from different species but the same tissue showed an 8 to 10% overlap and libraries from the same species but different tissue showed a 7 to 13% commonality in NRs. These comparisons were made over five large libraries with more than 9,000 EST members each and an average of 2.1 ESTs per NR. These results suggest that there were more NRs in common between libraries made from the same tissue or from the same species, but this tendency was not particularly marked.
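As an illustration of how such pairwise comparisons might be computed, the sketch below treats each library as a set of NR identifiers and reports the shared NRs as a percentage of all NRs found in either library. This denominator is an assumption, since the text does not state how overlap was normalized, and the library and NR names in the usage note are hypothetical.

```python
from itertools import combinations


def percent_overlap(nrs_a, nrs_b):
    """Shared NRs as a percentage of the NRs present in either library (assumed definition)."""
    a, b = set(nrs_a), set(nrs_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def pairwise_overlap(libraries):
    """Map each pair of library names (in sorted order) to its percent overlap."""
    return {
        (x, y): percent_overlap(libraries[x], libraries[y])
        for x, y in combinations(sorted(libraries), 2)
    }
```

For example, `pairwise_overlap({"fruit_lib": {"NR1", "NR2", "NR3"}, "bud_lib": {"NR3", "NR4"}})` reports one shared NR out of four in total for the single pair.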
Detecting SNPs using an automatically assembled EST database is a cost-effective way to discover new DNA polymorphisms and develop novel markers, although it can be a challenging task, especially in polyploid Actinidia species. A significant proportion of the sequence variants predicted from overlapping ESTs within an NR will correspond to "real" SNPs, which means the sequence differences found are allelic variants of a given locus and not sequencing errors or differences between paralogs, homoeologs or orthologs. Homoeolog SNPs could be particularly common in the polyploid accessions of species such as A. deliciosa and A. arguta that make up a large proportion of this database, but are also possible in diploids as a result of conserved gene pairs of paleopolyploid origin. Allelic SNPs can be used directly and converted into molecular markers for genetic mapping, population genetics and linkage disequilibrium studies or for marker-assisted selection. A SNP marker for determining the sex of kiwifruit seedlings has already been successfully utilized. Since the database contains sequence data from multiple species, and ~40% of TCs are made up of more than one species, several SNPs detected in the Actinidia EST database correspond to sequence differences between orthologous loci from different Actinidia species. Hence, they cannot fully be considered allelic SNPs, but rather species-specific variations. However, as kiwifruit breeding programs often use controlled crosses between different species, the interspecific SNPs will segregate in the progeny and be useful as markers.
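The distinction between candidate SNPs and species-specific variation can be sketched in code. The example below assumes the ESTs in an NR are already aligned and gap-padded to equal length, calls a position a candidate SNP when the second most common base is seen in at least two reads (an assumed filter against sequencing errors), and then checks whether the bases at that position separate cleanly by species. It is not the SNP detection pipeline used for this database, and it cannot by itself distinguish alleles from paralogs or homoeologs.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor=2):
    """Return (position, major_base, minor_base) for columns with a supported minor base.

    aligned_reads: equal-length strings; any character other than A/C/G/T is ignored.
    min_minor: assumed minimum number of reads supporting the minor base.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    snps = []
    for pos in range(length):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_minor:
            snps.append((pos, ranked[0][0], ranked[1][0]))
    return snps


def is_interspecific(aligned_reads, species, pos):
    """True if each species shows a single base at pos and the bases differ between species."""
    by_species = {}
    for read, sp in zip(aligned_reads, species):
        if read[pos] in "ACGT":
            by_species.setdefault(sp, set()).add(read[pos])
    all_bases = set().union(*by_species.values()) if by_species else set()
    return len(all_bases) > 1 and all(len(b) == 1 for b in by_species.values())
```

Positions for which `is_interspecific` returns True correspond to the species-specific variations discussed above, which may still segregate in interspecific crosses.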
The incidence of SSRs in NRs was higher in Actinidia (30%) than in apple (20%), and the frequency of di-nucleotide and tri-nucleotide SSRs differed between these two species. This increase was evident in all of the sequence classes but greatest in AG and AC (double the incidence among apple NRs). Although the Actinidia EST resource represents several species, whereas the apple resource came mainly from one species, this would not explain these differences. Perhaps the longer period of domestication in apple, based on a narrow genetic base compared to kiwifruit, may explain the difference. Alternatively, it may reflect that a greater proportion of homoeologs have grouped into TCs in the polyploid kiwifruit data than in the apple dataset.
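A simple way to reproduce this kind of SSR tally is a regular-expression scan for perfect di- and tri-nucleotide repeats. The sketch below is illustrative only: the minimum repeat counts (six for di-nucleotides, five for tri-nucleotides) are assumptions, as the thresholds used for the Actinidia and apple comparison are not given in this section.

```python
import re

# Assumed minimum number of repeat units per motif length (not from the text).
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, unit, n_repeats) for perfect di- and tri-nucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One captured unit followed by at least n - 1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


def ssr_incidence(seqs, min_repeats=MIN_REPEATS):
    """Percentage of sequences containing at least one SSR."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    with_ssr = sum(bool(find_ssrs(s, min_repeats)) for s in seqs)
    return 100.0 * with_ssr / len(seqs)
```

Grouping the returned units by motif class (for example AG or AC) would give the per-class incidences compared between the two datasets.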
Overall, the codon usage of the three Actinidia species shares many similarities with that of other dicotyledons represented in the codon usage database. Comparisons with Arabidopsis codon usage showed that A. deliciosa and A. eriantha differ markedly for 15 and 17 amino acids, respectively, whereas A. chinensis differs in its preference for eight particular amino acids. Further comparisons with apple, grape, pear, peach, loblolly pine, tomato, citrus, potato and tobacco showed that the codon preference of the Actinidia species is most similar to that of apple. A. deliciosa differs from apple only in its codon preference for aspartate, glycine, isoleucine and leucine. A. eriantha also differs from apple for these four amino acids and also serine. The codon preferences of A. chinensis and apple also differ for only four amino acids, these being asparagine, glutamine, threonine and valine. CpG suppression is also evident in Actinidia species, with an XCG/XCC ratio of between 0.68 and 0.71 for the three species evaluated. This modest level of suppression of the CpG di-nucleotides is similar to that of apple (0.64) and differs markedly from that of Arabidopsis, which shows nearly no suppression (0.92), and from the high level found in grape (0.35). This may well reflect different levels of methylation in the coding sequences used by different species of plants.
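The XCG/XCC ratio can be computed directly from in-frame codon counts. The sketch below assumes the ratio is the total count of codons of the form NCG divided by the total count of codons of the form NCC, where N is any nucleotide, and that each input is a coding sequence starting in frame. The function names are illustrative.

```python
from collections import Counter


def codon_counts(cds_list):
    """Count in-frame codons across coding sequences, skipping codons with ambiguous bases."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    return counts


def xcg_xcc_ratio(counts):
    """Ratio of NCG codons to NCC codons; values well below 1 suggest CpG suppression."""
    xcg = sum(n for codon, n in counts.items() if codon.endswith("CG"))
    xcc = sum(n for codon, n in counts.items() if codon.endswith("CC"))
    if xcc == 0:
        raise ValueError("no NCC codons observed; ratio is undefined")
    return xcg / xcc
```

The same `codon_counts` table could also serve as the starting point for the per-amino-acid codon preference comparisons described above.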
MapMan was used to assign function to the Actinidia NRs and thus to their constituent ESTs. Only 32% of the ESTs did not have an Arabidopsis homolog at E < 1.0e-10. In general, the functional distribution of NRs was very similar to the functional distribution of Arabidopsis proteins (Table 3), suggesting that the sampling of Actinidia ESTs represented the major functional classes of plant genes well. This is surprising given the biased selection of libraries, with virtually no root ESTs sequenced. However, the high number of bud meristem libraries meant that genes expressed in metabolically active dividing tissue were sampled.
Fruit of the Actinidia genus show several characteristics that distinguish them from other fruit species. These include flesh color (green is the most common, but yellow, orange and red fruit also occur in the genus), chemical composition including high vitamin C and quinic acid contents, and a novel aroma composition (Additional file 7), characterized by abundant esters. In addition, kiwifruit has been identified as a fruit with the potential to cause allergenicity among consumers, although this is a problem common to many other fruit. For this reason, we analyzed the Actinidia EST database to identify genes involved in these pathways and products. These analyses showed the depth and usefulness of the database for selecting candidate genes for most steps in the selected pathways. The other useful characteristic of the Actinidia EST database is the wide range of genetic and phenotypic diversity sampled across the Actinidia genus (Fig. 1) and the value of using this diversity to discover novel traits through functional genomics and through mapping and positional cloning approaches.