The Fabaceae (Leguminosae) is the third largest angiosperm family, containing c. 18,000 species attributed to 650 genera [1–3]. Legumes provide major benefits to cropping systems and the environment, owing to their ability to perform symbiotic nitrogen fixation. In comparison to cereals, for which a broad range of genetic and genomic resources is available, genomic databases for legumes are generally still underdeveloped. However, recent advances in sequencing and genotyping technologies offer the opportunity to rapidly improve the status of a given species at relatively low cost. Major efforts are currently being directed towards the development of species-specific genomic tools and datasets. As an example, the whole genome sequence of soybean, a warm-season grain legume, has recently been determined (http://www.phytozome.net/soybean).
Cool-season food legumes within the Hologalegina clade of the Fabaceae sub-family Papilionoideae, which include lentil, chickpea, field pea and faba bean (pulses), are important food and fodder crops, especially in developing countries such as those of the Indian sub-continent. These species are important components of farming systems across Western Asia, the Middle East, North Africa, the Indian sub-continent, North America and Australia. In Australia, pulses are sown over c. 2 million hectares and produce c. 2.5 million tonnes of grain with a commodity value of over AU$675 million. Despite close phylogenetic relationships, pulse species vary considerably in aspects of biology such as genome size, fundamental chromosome number, ploidy level, and degree of reproductive self-compatibility. The genome size of chickpea is relatively small (c. 700 Mb), but pulses of the Vicieae tribe (lentil, pea and faba bean) exhibit much larger genome sizes (in the range of 4–13 Gb). Recently, generation of large-scale lentil transcriptome data by our group has substantially increased the volume of publicly available genomic data for this species. Similar strategies have been pursued for field pea and faba bean in the current study.
Field pea, which is the third most important grain legume crop globally (at 5.5 million hectares per year) after soybean and common bean (Phaseolus vulgaris L.), is a self-pollinating diploid (2n = 2x = 14) species with a genome size of c. 5 Gbp. Various studies have been performed to determine the genetic basis of multiple phenotypic traits in field pea [9–11] and to quantify diversity between different pea cultivars [12–16]. Recently, a comprehensive transcriptome analysis of field pea was performed using second-generation sequencing technologies, which will contribute significantly to the enrichment of genomic resources for this species. In contrast, faba bean has not been widely adopted on a global basis. In terms of cultivation area, this species ranks fourth among the cool-season food legumes (at 2.6 million hectares per year) after field pea, chickpea and lentil (http://faostat.fao.org). Faba bean has been traditionally cultivated in the Mediterranean basin, the Nile valley, Ethiopia, Central and East Asia, Latin America, Northern Europe, North America and Australia. Faba bean is a diploid taxon (2n = 2x = 12), and exhibits facultative cross-pollination at frequencies ranging from 4–84%. The nuclear genome size of faba bean is one of the largest yet described among crop legumes, at c. 13 Gb. Formal genetic analysis of faba bean, such as through genetic linkage mapping and identification of quantitative trait loci (QTLs), has so far been hindered by these aspects of biology.
Conventional breeding methods based on phenotypic assessment are currently in use for breeding line selection in field pea and faba bean. Such methods are logistically demanding and time-consuming, especially for traits that require specific biotic or abiotic challenges, such as resistance to individual diseases. In addition to this, when breeding for types eaten as immature seed, quality testing adds considerable complexity to the relevant programs. There is consequently a major requirement for species-specific molecular genetic markers and derived linkage maps for field pea and faba bean, to enable germplasm advancement through genomics-assisted selection.
Current publicly available genetic and genomic tools for field pea and faba bean are limited in extent [20–23], comprising 18,552 and 5,253 expressed sequence tags (ESTs), respectively, in GenBank. In addition, a recently sequenced Pisum sativum transcriptome generated a total of 81,449 unigenes that are also available for download in fully annotated FASTA format. Second-generation DNA sequencing systems such as the Roche 454 massively-parallel pyrosequencing platform are capable of rapidly producing species-specific genomic resources to address these shortcomings. This system can generate 4–6 × 10⁸ bp from each run, with individual read lengths of 400–500 bp, and is suitable for de novo sequencing of small genomes, whole genome resequencing, SNP detection, and in particular, sequencing of transcriptomes.
ESTs obtained from transcriptome sequencing provide valuable resources for gene discovery, large-scale expression analysis, improved genome annotation, elucidation of phylogenetic relationships and facilitation of breeding programs for both plants and animals through provision of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) genetic markers. SSR loci have been widely used for improvement of a range of crop species. Only a limited number of SSRs are available in the public domain for field pea and faba bean, creating an incentive for further discovery and validation. In comparison with genomic DNA-derived SSRs, those located in ESTs are functionally associated with genic regions, and support potential diagnostic genetic marker development [31–34].
This study describes the development, de novo assembly and gene annotation of a transcriptome dataset derived from cDNA samples obtained from several tissues at various stages of development of multiple field pea and faba bean genotypes. Clustering and annotation to generate a unigene set has permitted computational identification of SSR loci, and the design and evaluation of a set of EST-SSR marker-directed primer pairs.
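As an illustration of what computational identification of SSR loci involves, the following is a minimal sketch in Python. It is not the pipeline used in this study: the function name `find_ssrs`, the restriction to perfect (uninterrupted) repeats, and the minimum repeat counts per motif length are all assumptions chosen for the example.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies required for each
# motif length (di- to hexanucleotide). These values are illustrative
# assumptions, not parameters reported by the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    `start` is the 0-based position of the first base of the repeat tract.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of `motif_len` bases followed by at least (min_rep - 1)
        # further identical copies, found via a regex backreference.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves tandem copies of a shorter unit
            # (e.g. "ATAT" or "AAA"); the shorter unit is reported separately
            # if it passes its own threshold.
            is_compound = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence with one (AT)7 tract and one (AAG)5 tract.
    example = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TTC"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

In practice, each unigene would be scanned in this way and primer pairs designed against the sequence flanking each detected repeat tract; dedicated SSR detection tools additionally handle imperfect and compound repeats, which this sketch does not.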