A number of drought-responsive and salinity-responsive genes have been identified, cloned and characterized from an array of plant species and notably in model plant species such as Medicago , Arabidopsis , rice , soybean , Lotus  and poplar , etc. In contrast, in case of chickpea, where crop production is adversely affected by drought and salinity, not much information is available on candidate genes or molecular markers associated with tolerance/resistance to these stresses. This study attempted to develop a comprehensive transcriptomics resource of chickpea.
cDNA libraries, ESTs and unigenes
Plant roots are the primary sites for perception and injury during water stress, including salinity and drought. In many circumstances, it is the stress sensitivity of the root that limits the productivity of the entire plant [25, 26]. Plants are known to use more than one mechanism to resist unfavourable environmental conditions. For example, under drought conditions, plants may 'escape' the stress by undergoing rapid phenological development, completing their lifecycle before the onset of serious water deficit  or 'avoid' the stress by maintaining relatively high tissue water potential even at low soil-moisture content, balancing between water loss and turgor . Therefore, root tissues were targeted for generating the drought and salinity-responsive ESTs in the present study.
With an objective of compiling as many drought-responsive and salinity-responsive ESTs as possible, different kinds of drought stress treatments (Polyethylene glycol (PEG) induction, dehydration, slow drought stress (dry down) in greenhouse and slow drought stress in field conditions) were imposed on drought-responsive genotypes, while salinity-responsive genotypes were stressed using 80 mM NaCl. This study provided 20,162 ESTs, 4,558 drought-responsive unigenes (UG-I), 2,595 salinity-responsive unigenes (UG-II), and a total of 9,569 chickpea unigenes (UG-IV). This is the first report on the generation of such large number of ESTs based on Sanger sequencing in chickpea. In the past, ESTs have been generated at ICRISAT  and by other research groups that represented a total of 7,097 ESTs in the public domain at the time of analysis in March 2008 . Thus the present study contributes a 3 fold increase in ESTs.
Characterization of chickpea unigenes
Four sets of unigenes were characterized in terms of their similarity with ESTs from other legume species available in the public domain and to deduce a putative function and assign them to a particular GO class. All the four unigene sets showed highest similarity with Medicago (58.8%-81.9%) and least similarity with groundnut (27.3%-41.0%). Other legume species showed similarity with chickpea unigenes in the range of 44.3% - 79.6% as shown in Table 1. The similarity of chickpea unigenes with other legume species is in general agreement with legume phylogenetic tree displaying relative order of speciation [30, 31]. Medicago is the most closely related species in the phylogenetic tree which explains the larger number of significant hits with Medicago sequences. There were some exceptions however; chickpea unigenes showed higher similarity to soybean (65.8%) as compared to Lotus (53.3%) whereas the phylogenetic distance between chickpea and Lotus is less than it is with soybean. The smaller EST dataset available in Lotus (183,153) as compared to soybean (880,561) is probably responsible for this observation.
While analysing sequence similarity of compiled chickpea unigenes (UG-IV) with other legume species, 7,815 (81.6%) unigenes had significant similarity to ESTs of at least one analysed legume species. Conservation of 552 (5.7%) unigenes across the legume species was observed. Sequence comparison of UG-IV unigenes with other plant species i.e. rice, poplar and Arabidopsis showed 5,088 (53.1%) unigenes sharing significant similarity with ESTs of atleast one of the model plant species and 1,742 (18.2%) conserved across all three model plant species, while 283 (2.9%) of total chickpea unigenes shared significant similarity with ESTs of all the plant species analysed. This suggests that 283 (2.9%) chickpea unigenes can be considered conserved in plants; 552 (5.7%) unigenes can be considered conserved across legumes. It is also interesting to note that 53 (0.5%) chickpea unigenes did not show any homology with any of the legume EST datasets searched. This indicates that at least 53 (0.5%) new unigenes without legume homologs in the public domain have been contributed to through this study. About 38 (0.3%) chickpea unigenes did not show any similarity with sequences from any of the plant species EST datasets searched or UniProt database and may thus be considered novel.
In terms of BLASTX analysis, a putative function could be deduced for 2,965 (UG-III) of 6,404 unigenes and 2,071 could be functionally categorised based on ESTs generated in this study. Analysis of the entire chickpea unigene set (UG-IV) revealed that only 3,147 (32.8%) out of 9,569 unigenes could be functionally categorised. The functional annotation will be very useful in selecting unigenes for developing microarray or functional molecular markers [32, 33].
The functional categorization of each unigene dataset into GO categories revealed a similar percentage distribution of genes in the four categories: cellular process, metabolic process, binding and catalytic activity. These four functional categories described were also reported as major categories in Arabidopsis , rice , tomato  and also in barley .
Clustering analysis to identify patterns of gene expression
Clustering analysis identified patterns of gene expression which are unique to different libraries. The profiles of some of the interesting gene families and genes that could play an important role in stress response were investigated (Figure 5).
The majority of the genes in cluster I (containing 22 contigs) are those highly expressed in ICC 1882_Drought_GH library. Apart from very highly represented 'Isoflavone-7-O-methyltransferase 9' (7 IOMT-9) (UniProt ID: O22309, 93 transcripts), an isoflavanoid biosynthetic enzyme which plays a role in plant resistance to pathogens , many transcripts with putative annotations to nucleic acid binding, protein binding activity classes were co-expressed in this cluster. Two un-annotated contigs (Contig640 and Contig720) identified in this cluster were considerably expressed in ICC 1882_Drought_GH library with transcript numbers of 150 and 449, respectively. Considering the expression ratio difference of these uncharacterised proteins, further studies may reveal their possible role during stress conditions in plants.
The sub-cluster IIa contains genes specifically expressed in the ICC 4958_dehydration root library. Two contigs homologous to the biotic stress-responsive 'probable pleiotropic drug resistance' (PDR) protein (UniProt ID: Q7PC86) and 'phenylalanine ammonia-lyase' (PAL) (UniProt ID: P45732) were identified. Both PDR and PAL are involved in the phenylpropanoid and flavanoid/isoflavanoid pathways leading to phenylpropanoid biosynthesis in plants in response to various biotic and abiotic stresses [37, 38]. The relative transcript counts of PDR and PAL signify their associated role in stress signaling processes.
The gene coding for 'dead ringer protein homolog' (UniProt ID: Q8MQH7) was found to be highly induced in ICC 1882_PEG_Induction library of subcluster IIb. Functionally, these proteins protect other proteins from denaturation by heat . Their up-regulation in the drought tolerant genotype suggests a significant role in imparting tolerance against drought.
All 23 contigs in the sub-cluster IIIa were highly expressed in the ICCV2_Salinity library. UniProt classification of these contigs indicated that a large fraction of them (11) were involved in cellular processes and nine contigs homologous to different stress related proteins. 'ABA responsive' (ABR) related transcripts were observed higher in drought- and salinity-sensitive genotype specific libraries such as ICC 1882_PEG and JG 11_Salinity libraries. Higher accumulation of ABR transcript levels has been reported in stressed plants . Similarly, high accumulations of 'heat shock cognate 70 kDa protein' (HSP70s) (UniProt ID: P27322) was observed in stress-sensitive genotype related ICCV2_Salinity root library as compared to JG 11_Salinity and ICC 1882_Drought_Glasshouse libraries. This implies a significant role of 'HSPs' in protecting the plant cells during abiotic stresses. Interestingly, the co-expression of Contig773, Contig1406 and Contig1457 corresponding to 'NAD(P)H-dependent 6-deoxychalcone synthase' (UniProt ID; P26690), 'Isoflavone-7-O-methyltransferase 9' (UniProt ID; O22309) and 'Isoflavone reductase' (UniProt ID; Q00016), respectively were observed in the subcluster IIIa. All these genes were involved in the phytoalexin biosynthesis pathway and isoflavanoid phytoalexin has been reported to occur principally in legumes during defence response of plants .
The sub-cluster IIIb represents genes that are highly expressed in the ICC 1882_dehydration library, accounting for 7.6% of all contigs in the hierarchial cluster. Six out of eight genes co-expressed in this cluster have been putatively annotated and categorized to stress response related genes. Contigs coding for 'dehydrin' DHN3 (Contig91) (UniProt ID: P28461) and hydrophilic 'late embryogenesis abundant (LEA) protein 1' (UniProt ID: Q49816; Q49817) (Contig638 and 534) were co-expressed in this cluster. Involvement of LEA-1 genes has been reported during seed development at low water potentials . The presence of protective compound such as 'sugar transport protein 13' (UniProt ID: Q94AZ2) during water stress conditions  in this cluster suggests a protective reaction to osmotic stress in sensitive genotypes as compared with tolerant genotypes. Similarly, the occurrence of 'glycine-rich cell wall proteins' (GRPs) (UniProt ID: A3C5A7) involved in cell wall lignification during pathogen attack [13, 44], 'pathogenesis-related protein 1A/1B precursor' (UniProt ID: P32937) and 'chitinase-3-like protein 4 precursor' (UniProt ID: Q91Z98) in this cluster signifies their protective role during stress response. These observations suggest the possible involvement of the above mentioned genes in protection and repair of damaged cell walls caused under stress.
The sub-cluster IIId includes 23 (21.9%) contigs whose transcripts are highly expressed in both the ICC 4958_Drought_Field library and ICC 1882_Drought_Field library and also contains a majority of stress-responsive functionally important genes. Up-regulation of membrane spanning genes such as 'probable aquaporin PIP-type' (UniProt ID: P25794; Q9ATM4), which mediate regulation of root hydraulic conductivity in response to environmental stimuli  were observed also in ICCV2_Salinity library as well as their significant expression in the other two libraries mentioned above and is not unexpected since its involvement is well reported in plant drought stress response. It is noteworthy that the transcripts annotating to 'metallothionein- like proteins' (MTs) (UniProt ID: Q39458, Q39459, Q9SSK5) involved in heavy metal detoxification and accumulate in response to high metal concentration, nutrient deprivation and heat shock , identified in this cluster were highly expressed in ICC 4958_Drought_Field library and even more so in ICCV 2_Salinity library. Their high expression level in the ICCV 2_Salinity library may be correlated with their putative role in detoxification of salts accumulated in salinity stressed roots. Contigs similar to classical 'arabinogalactan proteins' (AGPs) (UniProt ID: Q9ZT16), abundant in the plant cell wall and plasma membrane  were identified to be highly expressed in the ICC 4958_Drought_Field library. 'Fasciclin like Arabinogalactans' (FLAs) is a class of AGPs known to be regulated both during developmental processes and stress responses. Different classes of FLAs have been associated with ABA and down-regulated by drought stress . Likewise, a high copy number of transcripts corresponding to the ABR gene (UniProt ID: Q06931) were highly expressed in ICC 1882_Drought_Field library (423 transcripts), ICC 4958_Drought_Field library (291 transcripts) and moderately expressed in ICCV 2_Salinity library (55 transcripts). The predicted expression pattern of AGPs and ABRs identified in this study is also consistent with experimental observations in Arabidopsis .
It is important to note that although a high number of transcripts annotated to genes known to be involved in stress responses have been identified in clusters IIb, IIc, IIIc and IV, their functional inter-relativity could not be explained. In addition, many uncharacterised and unknown proteins with significant expression levels have also been identified in these clusters. Further investigation of these genes is required to understand their importance in stress signaling and/or tolerance mechanisms in chickpea. Although these genes have been identified in response to abiotic stress, many of these genes are likely to be involved in conferring resistance to biotic stresses as well, since a crosstalk between signal transduction occurring during biotic and abiotic stress is well known phenomenon . In summary, these candidate genes will be very useful to integrate in genetic maps and link them with QTLs for abiotic/biotic stress tolerance as well as targeting them in genetic engineering or reverse genetics approaches.
Transcriptome resource for developing functional markers
The chickpea unigene dataset (9,569) (UG-IV) defined in the present study can be used to develop a variety of molecular markers as stated by Varshney and colleagues . In case the polymorphism detected by a particular type of molecular marker correlates with variation in coding region affecting gene function, the molecular marker could become a candidate marker for a trait of interest.
Mining of 9,569 unigenes (UG-IV) showed occurrence of 3,728 SSRs, however majority of these SSRs represent mononucleotide repeats (1,793). This is a common feature of database mining of ESTs for identification of SSRs [16, 49]. After excluding the monomeric SSRs, higher proportion of SSRs was dominated by dimeric SSRs (126) followed by trimeric SSRs (110). In general, studies dealing with mining of ESTs for SSRs reported higher abundance of trimeric SSRs [16, 50], however there are several studies  that show exceptions as well. Indeed the number and distribution of SSRs of a particular class also depends on the criteria and tool used for mining the ESTs .
In terms of converting identified SSRs into potential SSR markers for chickpea genetics and breeding, primer pairs could be designed for 177 SSRs (Additional file 10). This increases the repertoire of SSR markers available in chickpea . The EST-SSR markers developed in this study were compared with those developed earlier and available in public domain [e.g.[52–55]] so that only non-redundant set of SSR markers should be developed. As a result, for validation purpose, only 77 SSR primer pairs with preferred criteria were synthesized. Fifty of these newly developed SSR markers on 24 chickpea lines, had varied alleles (average 4.6 per marker) per locus with an average PIC value of 0.43 per marker. This suggests a moderate discriminatory power of this new set of SSR markers. Di-nucleotide SSRs (ICCeM0011, ICCeM0046, ICCeM0048, ICCeM0035, ICCeM0051, ICCeM0063 and ICCeM0033) with atleast 12 repeat units and compound motifs (ICCeM0013, ICCeM0054 and ICCeM0055) had relatively higher number of alleles (7-12) and high PIC values ranging from 0.73 to 0.86. Such correlation between number of alleles, maximum number of repeat units and PIC value has been observed in other earlier studies .
The present study also provided 36,086 SNPs in 2,047 unigenes that can be used for converting into SNP markers. SNPs are the most abundantly found co-dominant polymorphic sites in greater proportion both in intronic and exonic regions of the genomes, occurring with variable frequencies and becoming very popular in plant genetics and breeding due to their amenability for high throughput genotyping. EST mining has been a popular approach for large scale identification of SNPs . However, the error rates in EST sequencing may sometime lead to erroneous SNP calls. Therefore, to enhance more reliability of SNPs identified, deep multiple alignments (=5 reads per contig containing SNP) were considered. In case of chickpea, although a few SNP markers have been reported [58, 59], the present study, probably provides the first comprehensive set of SNPs for chickpea. Although high-throughput SNP genotyping platform such as GoldenGate assay of Illumina are available that allows genotyping of large number of SNPs (e.g. 1,536) in parallel , conversion of SNPs into CAPS assay is a cost effective method that can be used in low-tech laboratories . The present study provides 240 candidate genes where SNPs can be assayed by CAPS.