Transcriptome sequences are a valuable resource, especially for species without a completely sequenced genome, such as peanut. They accelerate gene discovery, provide an asset for molecular marker development, and allow studies of gene expression and evolutionary genome dynamics. In the present study, Next Generation Sequencing (NGS) generated large numbers of sequence reads rapidly and cost-effectively, enabling the development of genomic resources for exploiting the stress resistances harbored by two wild diploid relatives of peanut.
Several recent studies have indicated that short reads from the 454 GS 20 and GS FLX platforms can be used effectively to characterize gene regions in a number of less-studied species, including some tropical legumes [26, 28–30, 45, 46]. In the present study, the average read length for both species was 280 bp, which provided up to 163 Mbp of high-quality reads for both diploid Arachis genomes studied in a single sequencing run. Compared with other studies in legumes, relatively few singletons were produced (8,922 for A. duranensis and 10,189 for A. stenosperma), and both the average contig length and the average number of reads per contig were comparatively high (475.5 bp and 26 reads/contig) (Table 2) [27, 29, 47]. This may be partly due to the very stringent quality and assembly parameters used, which may also partly explain why only 5% of the contigs produced in this study (1,012) failed to show significant functional annotation.
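The assembly metrics discussed above (singleton count, mean contig length, mean reads per contig and the share of contigs without annotation) are simple aggregates over a contig table. The Python sketch below computes them; the contig records, the singleton count and the helper name `assembly_summary` are hypothetical and chosen only for illustration, not taken from this study.

```python
from statistics import mean

# Hypothetical contig records: (contig_id, length_bp, n_reads, annotated).
# Illustrative values only, not data from this study.
contigs = [
    ("c1", 512, 30, True),
    ("c2", 431, 18, True),
    ("c3", 610, 41, False),
    ("c4", 350, 15, True),
]
singletons = 7  # hypothetical number of reads left unassembled

def assembly_summary(contigs, singletons):
    """Summarize an assembly with the metrics discussed in the text."""
    lengths = [length for _, length, _, _ in contigs]
    reads = [n_reads for _, _, n_reads, _ in contigs]
    unannotated = sum(1 for *_, annotated in contigs if not annotated)
    return {
        "contigs": len(contigs),
        "singletons": singletons,
        "mean_contig_length_bp": mean(lengths),
        "mean_reads_per_contig": mean(reads),
        "pct_unannotated": 100 * unannotated / len(contigs),
    }

print(assembly_summary(contigs, singletons))
```

In a real pipeline these records would come from the assembler's output and the annotation step; only the aggregation is shown here.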
The lack of a completely sequenced and annotated reference genome makes it very difficult to estimate the proportion of genes covered in this study for either species. However, if we take as a reference other diploid legumes whose genomes have been completely sequenced, such as Medicago truncatula (38,835 genes) and Lotus japonicus (42,395 genes), and assume a similar number of genes in Arachis, the A. duranensis (21,714) and A. stenosperma (17,912) unigenes could represent up to 54% and 44% of the genes, respectively. It is also important to note that more than one contig or singleton can originate from a single gene, due either to non-overlapping sequence reads or to high levels of sequence error in a single read.
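The quoted percentages can be reproduced by dividing each unigene count by the gene count of each reference legume and averaging the two ratios. The sketch below assumes that averaging step (the exact calculation is not stated in the text) and treats every unigene as a distinct gene, which, as noted above, tends to overestimate coverage. The helper name `coverage_estimates` is hypothetical.

```python
# Gene counts of the two reference legumes, as quoted in the text.
reference_genes = {"Medicago truncatula": 38_835, "Lotus japonicus": 42_395}
# Unigene counts assembled in this study.
unigenes = {"A. duranensis": 21_714, "A. stenosperma": 17_912}

def coverage_estimates(n_unigenes, references):
    """Percent of each reference gene set matched by n_unigenes, plus the mean.

    Assumes one unigene per gene (an upper-bound simplification).
    """
    per_ref = {name: 100 * n_unigenes / n_genes for name, n_genes in references.items()}
    mean_pct = sum(per_ref.values()) / len(per_ref)
    return {**per_ref, "mean": mean_pct}

for species, n in unigenes.items():
    est = coverage_estimates(n, reference_genes)
    print(species, {k: round(v) for k, v in est.items()})
# A. duranensis: 56, 51, mean 54; A. stenosperma: 46, 42, mean 44
```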
Transcription factors (TFs) are of special interest because of their role in controlling plant developmental processes and responses to environmental conditions, including functions of key importance to agronomic performance. They play an essential role in the signal transduction networks that lead from the perception of stress signals to the expression of stress-responsive genes and, unlike most structural genes, tend to control multiple pathway steps within a transcriptional cascade. TFs are therefore expected to be excellent candidates for modifying complex traits in crop plants, and TF-based technologies are likely to be a prominent part of the next generation of successful biotechnology crops [48, 49]. In the present study, 1% of the transcripts were identified as TFs. Their overall distribution among the known TF protein families was consistent with previous studies in other legumes such as soybean, chickpea, pigeonpea and cultivated peanut [4, 28, 30, 50, 51], with the bZIP, MYB, NAC, bHLH, AP2-EREBP and WRKY families highly represented in both A. duranensis and A. stenosperma transcripts.
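Family percentages like those reported here and in the following paragraphs are shares of the TF-annotated transcripts assigned to each family. The following minimal sketch assumes a hypothetical mapping from transcript IDs to family labels (the annotation step itself is not shown), and `family_distribution` is a hypothetical name.

```python
from collections import Counter

def family_distribution(annotations):
    """Percentage of TF transcripts per family, most abundant first.

    `annotations` maps transcript IDs to a TF family label (hypothetical input).
    """
    counts = Counter(annotations.values())
    total = sum(counts.values())
    return [(family, 100 * n / total) for family, n in counts.most_common()]

# Hypothetical example with 10 TF-annotated transcripts.
example = {
    "t1": "bZIP", "t2": "bZIP", "t3": "MYB", "t4": "bZIP", "t5": "NAC",
    "t6": "MYB", "t7": "WRKY", "t8": "bHLH", "t9": "AP2-EREBP", "t10": "NAC",
}
for family, pct in family_distribution(example):
    print(f"{family}: {pct:.0f}%")
```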
The most highly expressed TF family was the basic leucine zipper (bZIP) family, whose members regulate many central developmental and physiological processes as well as abiotic and biotic stress responses. For example, bZIP TFs have been associated with the water-deficit response in the relatively drought-resistant tepary bean (Phaseolus acutifolius) and with the abscisic acid (ABA)-regulated gene expression required for the dehydration response in Arabidopsis. Likewise, this TF family was the most highly expressed in A. duranensis plants subjected to gradual water-limited stress (18%), suggesting a role for this family in this relatively drought-tolerant species. The bZIP family was also the most highly expressed TF family in A. stenosperma leaves challenged with C. personatum (18%). bZIP TFs have already been implicated in defense responses in other host-fungus interactions, such as the response to stripe rust via ethylene/methyl jasmonate-dependent signal transduction pathways in wheat, and in regulating the expression of stress-responsive genes such as PR-1 and glutathione S-transferase in Arabidopsis.
The second most highly expressed TF family in drought-stressed A. duranensis plants (12%) and fungus-infected A. stenosperma leaves (14%) was MYB, a family reported to act through the ABA signaling cascade to regulate stomatal movement, and hence water loss, as well as disease resistance in Arabidopsis and rice [57, 58]. The plant-specific NAC TF family was also highly expressed in A. duranensis (10%) and, to a lesser extent, in A. stenosperma (2%). NAC protein function has previously been described in potato and Brassica napus under fungal infection [59, 60], and NAC proteins have been shown to significantly increase drought tolerance in soybean and chickpea [61, 62].
Dehydration-responsive element binding (DREB) proteins, a subgroup of the AP2/EREBP family, have an important role in plant response and adaptation to abiotic stresses. In this study, they constituted 7% of the TFs in A. duranensis plants subjected to water-limited stress. A previous study of transgenic peanut plants overexpressing DREB1A showed that the changes in the antioxidative machinery of these plants under water-limiting conditions played no causative role in their improved transpiration efficiency [5, 64, 65]. Nonetheless, different DREB homologues have been shown to play different roles in increasing tolerance to cold, salt and drought in different plant species, and they have been extensively studied in Arabidopsis, rice and soybean, where they are correlated with increased dehydration tolerance [66–69]. A further consideration is that recent studies indicate that the function of central regulators such as NAC, WRKY and zinc finger proteins may be modulated by mechanisms such as microRNA (miRNA)-mediated posttranscriptional silencing, reactive oxygen species signaling, and epigenetic processes such as DNA methylation and posttranslational histone modifications. A more comprehensive elucidation of the role and dynamics of drought- and defense-responsive TFs in plants may therefore be required.
Retroelements, particularly long terminal repeat (LTR) retrotransposons, constitute the major part of the repetitive DNA in plant genomes. Some of these elements appear to be constitutively expressed, whereas others are silent and can be activated by certain stress signals such as tissue culture, ionizing irradiation, wounding or polyploidization. Indeed, data from the whole-genome sequencing of several eukaryotes strongly suggest that, far from being incidental, transposable element activity plays an extremely important role in the plasticity and regulation of host gene functions. The mechanisms by which stress induces the activity of an element are not fully understood. It has been shown, however, that most expression features of Tnt1, a Solanaceae retrotransposon, can be deduced from the structure of its regulatory regions, located in the LTR, which contains several cis-acting elements similar to well-characterized motifs involved in the activation of defense genes. In addition, the Tnt1A G-box-like sequence is related to the typical ABA-responsive element (ABRE) sequences and is identical to the MYC recognition sequence present in many drought-inducible genes [71, 72].
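Locating cis-acting elements such as those described for the Tnt1 LTR amounts to scanning both strands of a sequence for short consensus motifs. The sketch below assumes commonly cited consensus sequences (G-box CACGTG, an ABRE core ACGTG and the MYC/E-box CANNTG); these patterns, the helper names and the test fragment are illustrative assumptions, not the Tnt1 or FIDEL sequences, and a real analysis would use curated motif databases.

```python
import re

# Commonly cited consensus sequences, assumed here for illustration only.
MOTIFS = {
    "G-box": "CACGTG",
    "ABRE core": "ACGTG",
    "MYC/E-box": "CA..TG",  # CANNTG, where N is any base
}

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT characters kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan_motifs(seq, motifs=MOTIFS):
    """Return sorted (motif, strand, start) hits; start is 0-based on the + strand.

    Palindromic motifs (G-box, CANNTG) are reported once per strand.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, pattern in motifs.items():
        # The lookahead lets overlapping occurrences all be reported.
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(seq):
            hits.append((name, "+", m.start()))
        for m in regex.finditer(rc):
            # Convert a minus-strand hit back to + strand coordinates.
            hits.append((name, "-", len(seq) - m.start() - len(m.group(1))))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))

# Hypothetical LTR-like fragment.
for hit in scan_motifs("ttcacgtgaa"):
    print(hit)
```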
In the present study, many transcripts from both species showed similarity to retroelements. We therefore studied in more detail FIDEL, the only fully characterized Ty3-gypsy retrotransposon described in allotetraploid peanut (A. hypogaea) and its putative diploid ancestors A. duranensis (A genome) and A. ipaënsis (B genome). qRT-PCR analysis showed that FIDEL had an increased expression ratio both in A. duranensis roots subjected to gradual water-limited stress and in A. stenosperma leaves inoculated with the fungus, compared with non-challenged plants. In tobacco and other Solanaceae, drought stress and fungal infection have been described as triggering independent mechanisms of plant defense response and activation of transcription factors and retroelements [71, 73]. In our study, both biotic and abiotic stresses induced FIDEL or FIDEL-related sequences. However, it is not known whether this induction reflects the activation of specific FIDEL copies, of FIDEL-harboring regions, or some more specific response.
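The text does not specify which model was used to compute the qRT-PCR expression ratios. One widely used option is the efficiency-corrected Pfaffl ratio, which reduces to the familiar 2^-ΔΔCt when both amplification efficiencies equal 2. The sketch below assumes that model; the Ct values and the function name `relative_expression` are hypothetical.

```python
def relative_expression(ct_target_control, ct_target_treated,
                        ct_ref_control, ct_ref_treated,
                        e_target=2.0, e_ref=2.0):
    """Efficiency-corrected expression ratio (Pfaffl model, assumed here).

    e_* are amplification efficiencies as fold-increase per cycle
    (2.0 = 100% efficiency). With both equal to 2.0 this is 2^-ddCt.
    """
    delta_target = ct_target_control - ct_target_treated
    delta_ref = ct_ref_control - ct_ref_treated
    return (e_target ** delta_target) / (e_ref ** delta_ref)

# Hypothetical mean Ct values for a target (e.g. a retroelement) and a
# reference gene in control vs. stressed samples (not data from this study).
ratio = relative_expression(ct_target_control=26.0, ct_target_treated=24.0,
                            ct_ref_control=20.0, ct_ref_treated=20.0)
print(ratio)  # 4.0
```

A ratio above 1 indicates induction in the treated sample relative to the control, after normalization to the reference gene.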
In response to pathogen effectors, plants have co-evolved specific cytoplasmic resistance (R) protein receptors, which recognize the molecular signatures of individual pathogen effectors and activate a second line of defense known as effector-triggered immunity (ETI), previously also known as gene-for-gene or race-specific resistance. In contrast to the non-specific PAMP-triggered immunity (PTI), which occurs in all members of a particular plant species, ETI operates at the intraspecific level, with resistant genotypes possessing the necessary R gene allele. The conservation of motifs within R genes, such as those in nucleotide-binding site leucine-rich repeat (NBS-LRR) domains, has facilitated their characterization in diverse plant taxa. Putative R genes, or Resistance Gene Analogs (RGAs), are commonly clustered as a result of duplication events occurring under diversifying selection. In Arachis, a previous investigation of RGA content in a number of wild species showed that, of the 78 NBS sequences identified, most fell within legume-specific clades, some of which appear to have undergone extensive copy number expansions. In the present study, all five RGA sequences showed increased expression upon C. personatum inoculation, compared with their basal expression in the control samples. This was somewhat unexpected, as R genes are mostly constitutively expressed in resistant genotypes, where the proteins they encode mediate specific molecular recognition of pathogenic microorganisms and trigger signaling cascades that activate defense reactions [76, 77]. A broader characterization of the transcriptional response of a suite of defense genes following stimulation of these R genes (e.g. kinases, peroxidases, transcription factors and NPR1), and of the defense pathways that they trigger, is being conducted via Illumina deep sequencing. This will allow a better understanding of their contribution to the overall resistance response of A. stenosperma to C. personatum.
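The conserved NBS motifs mentioned above are what make RGAs detectable by pattern searches. As an illustration only, the sketch below flags translated sequences carrying approximate consensus patterns for the P-loop (kinase-1a, roughly GxGGxGKTT) and GLPL motifs; these regular expressions, the helper names and the test fragment are assumptions, and profile-based searches (e.g. against the NB-ARC domain) are considerably more sensitive.

```python
import re

# Approximate consensus patterns for two conserved NBS motifs (assumed).
NBS_MOTIFS = {
    "P-loop (kinase-1a)": re.compile(r"G.GG.GK[ST]T"),
    "GLPL": re.compile(r"GLPL"),
}

def nbs_motif_hits(protein):
    """Map each motif name to the 0-based start positions where it matches."""
    protein = protein.upper()
    return {name: [m.start() for m in rx.finditer(protein)]
            for name, rx in NBS_MOTIFS.items()}

def is_candidate_rga(protein):
    """Flag a translated sequence as a candidate RGA if every motif is present."""
    return all(nbs_motif_hits(protein).values())

# Hypothetical translated fragment containing both motifs.
fragment = "MAKVVGIWGMGGVGKTTLAQKLSEEIRGLPLALKVLG"
print(nbs_motif_hits(fragment))  # {'P-loop (kinase-1a)': [8], 'GLPL': [27]}
print(is_candidate_rga(fragment))  # True
```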
The transcriptome databank produced in this study enabled the development of 2,325 SSR primer pairs, of which 214 were polymorphic between the two species. These new markers will enrich the current reference AA-genome diploid Arachis map and other Arachis tetraploid maps under construction. In addition, these EST-SSR markers potentially offer advantages over SSRs located in non-transcribed regions, owing to their generally more consistent amplification efficiency and enhanced cross-species transferability.
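Mining EST-SSRs from a transcript set amounts to finding perfect tandem repeats of short units above a minimum copy number. The sketch below assumes MISA-style thresholds (at least 6 copies for dinucleotides and 5 for tri- to hexanucleotides); these thresholds, the helper name `find_ssrs` and the test sequence are illustrative assumptions, not the parameters used in this study, and primer design is not shown.

```python
import re

# Minimum tandem copies per unit length (assumed, MISA-style thresholds).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, n_copies) for perfect di- to hexanucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_n in min_repeats.items():
        # One unit of `unit` ACGT bases followed by at least (min_n - 1) more copies.
        pattern = re.compile(rf"([ACGT]{{{unit}}})\1{{{min_n - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip units that are repeats of a shorter unit (e.g. AA, ATAT, CAGCAG).
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)

# Hypothetical transcript fragment with an (AT)6 and a (CAG)5 repeat.
print(find_ssrs("GGATATATATATATCCCAGCAGCAGCAGCAGTT"))
# [(2, 'AT', 6), (16, 'CAG', 5)]
```

Compound and imperfect repeats, which dedicated tools also report, are deliberately ignored in this simplified version.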
The development of new SSRs is of special interest in Arachis because they are still the markers of choice in this genus, given the difficulties of applying SNP markers in the cultivated tetraploid species. These new markers will therefore help to enrich existing genetic maps, generate more informative genetic and genomic tools and enable the identification of orthologous genes through genome synteny analysis.