Next-generation sequencing has narrowed the gap between the genomic platforms of major crops and the limited resources currently available for orphan crops. Complete transcriptome sequencing has enabled the development of species-specific molecular markers, in silico expression analyses, gene discovery, and the study of phylogenetic relationships [43, 44].
In this research, we used 454 cDNA sequences to assemble the transcriptomes of two tissues (L1 and L2) of yellow lupin. We recovered a large number of previously unknown and uncharacterized yellow lupin gene sequences (Table 2). The total number of sequences in the combined library was largely the sum of those in L1 and L2. The L1 library favored the inclusion of longer 3’UTR regions, thus reducing the amount of coding sequence available to assemble longer combined contigs (L1L2). As a consequence, two or more sequences belonging to the same transcript may not have been assembled together, causing an overestimation of the number of expressed sequences. The larger proportion of 3’UTR regions in L1 is also consistent with its lower GC content, a feature typically associated with untranslated regions [45, 46]. In addition, some expressed sequences are tissue specific and will not assemble into combined contigs. For instance, several genes related to seed dormancy and germination are not expressed in vegetative and floral tissues [47, 48], and similar tissue specificity has been observed in a number of tissues and plant species [49–51]. The assembly of L1L2 generated 55,309 isotigs, of which 30,811 had similarity to putative proteins found in other plant species. Comparisons against L. japonicus, M. truncatula and G. max identified 31,520 lupin sequences similar to at least one of the three model legume databases, of which 22,219 were similar to all three. Lotus and Medicago belong to the Galegoid subclade, which includes mostly temperate legume species. Glycine is a member of the Phaseoloid subclade, which comprises mostly tropical species. Lupins belong to the Genistoid subclade, which is sister to, and distant from, most of the described Papilionoid subclades, particularly those containing most domesticated species.
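Counting sequences similar to at least one model legume versus all three amounts to a union versus an intersection of per-database hit sets. A minimal sketch, assuming each BLAST comparison has already been reduced to the set of lupin isotig IDs with a significant hit; the isotig IDs and the helper name `hit_overlap` are hypothetical, and the similarity cut-off used to define a hit is not specified here:

```python
def hit_overlap(hits_by_species):
    """Return (isotigs hitting at least one database, isotigs hitting all of them)."""
    hit_sets = list(hits_by_species.values())
    at_least_one = set().union(*hit_sets)
    in_all = set.intersection(*hit_sets) if hit_sets else set()
    return at_least_one, in_all


# Hypothetical isotig IDs with a significant hit in each model legume database.
hits = {
    "L. japonicus": {"isotig001", "isotig002", "isotig003"},
    "M. truncatula": {"isotig001", "isotig002", "isotig004"},
    "G. max": {"isotig001", "isotig002", "isotig005"},
}
any_hit, all_hits = hit_overlap(hits)
print(len(any_hit), len(all_hits))  # 5 2
```

The intersection is always a subset of the union, which is why the number of sequences similar to all three databases (22,219) cannot exceed the number similar to at least one (31,520).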
Although micro-repeat motifs are frequent in plant genomes and their transcriptomes, the frequency of SSR discovery depends on the search criteria [42, 54–56]. We analyzed the 55,309 lupin isotig sequences using MISA and identified 2,796 SSR motifs, an average frequency of one SSR per 17.75 kbp. Tri-nucleotide repeats were the most frequent motifs in L. luteus expressed sequences, as reported in numerous other plant species [26, 28, 54, 55, 57]. The abundance of tri-nucleotide EST-SSRs has been attributed to the fact that length variation in these repeats does not cause frameshift mutations. Indeed, 1,435 EST-SSRs were found within coding regions. Among tri-nucleotide repeats, AT-rich motifs predominated (74.5%), as also observed in soybean, Citrus and Arabidopsis [54, 57]. Among di-nucleotide repeats, AT was the most frequent motif, in contrast with Arabidopsis, soybean, maize, rice, wheat and barley, where AC/GT repeats were the most frequent [26, 28, 54, 55, 57]. The high proportion of untranslated sequences (specifically 3’UTRs), contributed mainly by L1, could explain the bias toward A/T-rich repeats observed in yellow lupin. There were no CG repeats in the lupin sequences, similar to results obtained in barrel medic, rice, corn, soybean, wheat, sorghum, Arabidopsis, apricot and peach.
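The SSR search can be illustrated with a simplified, MISA-style scan for perfect tandem repeats. This is a sketch, not MISA itself: the minimum repeat numbers in `MIN_REPEATS` are assumed values chosen for illustration, compound and interrupted SSRs are not merged, and all helper names are hypothetical. Motifs are grouped into canonical classes (the smallest rotation of the motif or of its reverse complement), which is why AC, CA, GT and TG are reported together as AC/GT. The density function reproduces the "one SSR per X kbp" statistic; the reported 17.75 kbp for 2,796 SSRs implies about 49.6 Mbp of examined isotig sequence.

```python
import re

# Minimum number of tandem copies per repeat-unit length. These MISA-style values
# are an illustrative assumption, not necessarily the criteria used in this study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def smallest_period(motif):
    """Length of the shortest unit that tiles the motif (e.g. 'ATAT' -> 2)."""
    for p in range(1, len(motif) + 1):
        if len(motif) % p == 0 and motif[:p] * (len(motif) // p) == motif:
            return p
    return len(motif)


def motif_class(motif):
    """Canonical class: smallest rotation of the motif or its reverse complement."""
    rc = motif.translate(COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, rc) for i in range(len(m))]
    return min(rotations)


def find_ssrs(seq):
    """Return perfect SSRs as sorted (start, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A unit of `unit` bases followed by at least min_rep - 1 identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if smallest_period(motif) < unit:
                continue  # e.g. 'ATAT' is a di-nucleotide repeat, reported at unit 2
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


def kbp_per_ssr(total_bp, n_ssrs):
    """Average SSR density expressed as 'one SSR per X kbp'."""
    return total_bp / 1000 / n_ssrs


seq = "GG" + "AT" * 7 + "CCG" + "AAT" * 5 + "G"
for start, motif, reps in find_ssrs(seq):
    print(start, motif, reps, motif_class(motif))
# 2 AT 7 AT
# 19 AAT 5 AAT
print(kbp_per_ssr(49_629_000, 2796))  # 17.75
```

Tallying `motif_class` over all hits, and splitting hits by whether they fall in a coding region or a UTR, would give the kind of motif-class and location summaries discussed above.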
We used GBrowse to visualize lupin ESTs aligned to the M. truncatula chromosomes (Figure 3). This approach can reveal potentially paralogous sequences and allows alignments to be color-coded by BLAST significance. A total of 25,400 L. luteus contigs were localized and found to be distributed across the entire Medicago genome, with chromosomes Mt1 and Mt3 having the highest number of gene matches. Each yellow lupin sequence mapped to an average of 3.7 locations, which may correspond in part to the rounds of genome duplication previously described for the Medicago genome. Understanding syntenic relationships among species is essential to exploit the tools developed for comparative genomic analysis. Using this approach, we developed a new type of molecular marker based on conserved microsynteny (CMS) between orphan and model species. Genome comparisons among M. truncatula, G. max and L. japonicus have shown that, in general, most genes in a Papilionoid legume species are likely to be found within a relatively long syntenic region of any other Papilionoid species. Positive amplification and sequencing of L. luteus intergenic regions, using PCR primers located in adjacent M. truncatula genes, suggested the existence of microscale synteny between these legume species. Roughly 40% of the targeted L. luteus intergenic regions amplified, which points to the usefulness of conserved legume chromosome blocks for genomic studies of orphan crops. Failed amplification of the remaining primer pairs could be a consequence of non-synteny, but technical limitations could also explain negative PCR results. For instance, non-coding DNA regions are known to be highly variable among species [63, 64], and negative PCR results could easily be due to excessively long L. luteus intergenic regions.
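The CMS idea can be sketched as a scan over the M. truncatula gene order for neighboring genes that both have a lupin EST hit; primers placed in the conserved sequence of the two flanking genes would then target the lupin intergenic region between them. This sketch is hypothetical: the `Gene` record, the gene IDs, the 1-based inclusive coordinates and the `max_gap_bp` cut-off (motivated by the observation that long intergenic regions may fail to amplify) are illustrative assumptions, not the primer design procedure used in this study. It also shows how an average number of mapped locations per lupin sequence, such as the 3.7 reported above, can be computed from (query, location) alignment pairs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with 1-based, inclusive coordinates."""
    gene_id: str
    chrom: str
    start: int
    end: int


def candidate_cms_regions(genes, genes_with_lupin_hit, max_gap_bp=5_000):
    """Yield (left_gene, right_gene, gap_bp) for neighboring model-species genes
    that both have a lupin EST hit and are separated by a short intergenic gap."""
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)  # genome order on each chromosome
        for left, right in zip(chrom_genes, chrom_genes[1:]):
            gap = right.start - left.end - 1  # bases strictly between the two genes
            if (left.gene_id in genes_with_lupin_hit
                    and right.gene_id in genes_with_lupin_hit
                    and 0 < gap <= max_gap_bp):
                yield left.gene_id, right.gene_id, gap


def mean_locations_per_query(hits):
    """Average number of distinct genomic locations per lupin query sequence."""
    per_query = Counter(query for query, _location in set(hits))
    return sum(per_query.values()) / len(per_query) if per_query else 0.0


genes = [
    Gene("Mt1g001", "Mt1", 1_000, 2_000),
    Gene("Mt1g002", "Mt1", 2_501, 3_500),
    Gene("Mt1g003", "Mt1", 20_000, 21_000),
    Gene("Mt3g001", "Mt3", 500, 900),
]
with_hit = {"Mt1g001", "Mt1g002", "Mt1g003"}
print(list(candidate_cms_regions(genes, with_hit)))  # [('Mt1g001', 'Mt1g002', 500)]

hits = [("lup1", "Mt1:100"), ("lup1", "Mt3:200"), ("lup2", "Mt1:100"), ("lup1", "Mt1:100")]
print(mean_locations_per_query(hits))  # 1.5
```

The Mt1g002/Mt1g003 pair is rejected only because its 16,499 bp gap exceeds the assumed cut-off; whether such a region amplifies in lupin depends on the actual length of the lupin intergenic sequence, which the model genome cannot predict.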
Few studies have reported the use of EST-SSRs in Lupinus species [19, 21, 22]; most efforts have focused on genetic linkage mapping and diversity studies in L. angustifolius, L. albus and L. luteus. To validate our polymorphic L. luteus markers, we tested 50 EST-SSRs on a population of 64 L. luteus genotypes. An analysis of genotypic diversity revealed several clusters within the L. luteus germplasm. The lack of a clear pattern related to the geographical origin (country) of the accessions has three possible explanations. 1) The number of accessions may not have been large enough for a clear pattern to emerge. 2) L. luteus is widely distributed across the Mediterranean region, mainly due to human introductions. This could have homogenized natural genetic distinctiveness, leaving mostly population subdivisions based on breeding histories. 3) Finally, some accessions may have been misclassified, obscuring an existing geographical clustering pattern.
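Clustering genotypes from EST-SSR data typically starts from a pairwise genetic distance. The distance used for the diversity analysis is not described in this section; as one common choice, the sketch below computes the proportion-of-shared-alleles distance (1 minus the fraction of alleles shared, over loci scored in both genotypes). It assumes diploid allele calls recorded as fragment sizes, and the accession names and allele sizes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_allele_distance(a, b):
    """Proportion-of-shared-alleles distance between two diploid genotypes.

    Each genotype is a list with one (allele, allele) tuple of fragment sizes
    per EST-SSR locus, in the same locus order; None marks a missing call.
    """
    shared = 0
    scored = 0
    for call_a, call_b in zip(a, b):
        if call_a is None or call_b is None:
            continue  # skip loci not scored in both genotypes
        # Multiset intersection counts 0, 1 or 2 shared alleles at this locus.
        shared += sum((Counter(call_a) & Counter(call_b)).values())
        scored += 1
    if scored == 0:
        raise ValueError("no locus scored in both genotypes")
    return 1 - shared / (2 * scored)


def distance_matrix(genotypes):
    """Pairwise distances for a {name: genotype} mapping, keyed by name pairs."""
    return {
        (i, j): shared_allele_distance(genotypes[i], genotypes[j])
        for i, j in combinations(sorted(genotypes), 2)
    }


# Hypothetical accessions scored at three EST-SSR loci.
genotypes = {
    "acc_A": [(150, 150), (210, 214), (98, 102)],
    "acc_B": [(150, 154), (210, 214), None],
    "acc_C": [(156, 156), (218, 218), (98, 98)],
}
for (i, j), d in distance_matrix(genotypes).items():
    print(i, j, f"{d:.3f}")
# acc_A acc_B 0.250
# acc_A acc_C 0.833
# acc_B acc_C 1.000
```

A matrix like this could then be passed, in condensed form, to a hierarchical clustering routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` (UPGMA); whether the study used this distance or clustering method is not stated here.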
We observed that a high number of yellow lupin EST-SSRs amplified fragments in two other lupin species, L. hispanicus and L. mutabilis (Table 1). The higher number of markers transferable between L. luteus and L. hispanicus than between L. luteus and L. mutabilis confirmed the closer genetic relationship of the former pair [5, 65]. These two closely related species have the same chromosome number (2n = 52) and are still interfertile, generating a natural hybrid called hispanicoluteus. Phylogenetic studies have placed New World and Old World lupins in two different clades [5, 65, 67]. Thus, the EST-SSRs that amplified in L. mutabilis (2n = 48), the only cultivated New World lupin, are expected to have high transferability rates to other lupin species, such as L. albus and L. angustifolius. Understanding the genetic diversity among closely related lupin species will facilitate the transfer of favorable variation into cultivated species. For instance, L. hispanicus has been suggested as a reservoir of favorable variation for a number of biotic and abiotic stresses currently affecting L. luteus [68, 69].