This study demonstrates that DDD combined with microarray data is an effective method to identify and analyze genes specifically expressed in the developing seed of wheat. Becerra et al. used EST virtual subtraction combined with microarray data analysis to discover Arabidopsis genes specifically expressed in immature seeds. Eujayl et al. identified differentially expressed unigenes in developing wheat seeds of various species and different stages using only DDD analysis. In Eujayl's study, apart from seed storage genes, 46 other unigenes were identified as seed-specific, although only 23 of them were described in their report. Of these 23 reported unigenes, 10 were included in our result of 201 unigenes. Of the remaining 13, 3 unigenes had been retired from the database, 5 had no microarray data, and the other 5 were shown by the microarray data not to be seed-specific.
The expression level of a gene is commonly estimated using two analysis approaches, referred to as 'analog' and 'digital'. To identify seed-specific genes in wheat, both approaches were used in our study. Among the 407 unigenes preliminarily screened by DDD, there are 33 CG gene models encoding wheat storage proteins such as gliadin, glutenin, triticin and avenin. There are also two other CG gene models encoding late embryogenesis abundant (LEA) proteins, which are the most abundantly expressed proteins in seeds. Since the accumulation of seed storage proteins and the accumulation of LEA proteins are both highly seed-specific processes, the coverage of these genes with known tissue specificity demonstrates the feasibility of the DDD method in wheat. However, because the quantity and diversity of wheat ESTs are limited, the results of DDD screening alone were not entirely accurate. In our study, 55 unigenes that were identified as seed-specific by DDD analysis were shown not to be seed-specific by microarray analysis. Similarly, the microarray-alone analysis also produced false positives: 95 genes were identified as seed-specific by microarray analysis, but the corresponding EST profiles suggest that they were also expressed in tissues other than seeds. To make the results more reliable, it was therefore necessary to combine the DDD analysis with microarray data when screening for seed-specific genes in wheat. Finally, a total of 201 unigenes, the intersection of the two methods, were identified for further study.
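The combination step described above amounts to intersecting two candidate lists. The following is a minimal sketch of that idea, not the pipeline used in this study: the function name `combine_screens` and the unigene identifiers are hypothetical, and it assumes each screen has already been reduced to a set of unigene IDs. Note that in practice a DDD-only candidate may be either contradicted by microarray data or simply lack a probeset; this sketch does not distinguish the two cases.

```python
def combine_screens(ddd_hits, microarray_hits):
    """Partition candidates by which screen(s) called them seed-specific."""
    ddd_hits = set(ddd_hits)
    microarray_hits = set(microarray_hits)
    return {
        # Supported by both methods: the set kept for further study.
        "both": ddd_hits & microarray_hits,
        # Called by DDD only (rejected by, or absent from, the microarray data).
        "ddd_only": ddd_hits - microarray_hits,
        # Called by microarray only (EST profiles do not support specificity).
        "microarray_only": microarray_hits - ddd_hits,
    }


# Hypothetical identifiers, for illustration only.
ddd = {"Ta.0001", "Ta.0002", "Ta.0003", "Ta.0004"}
array = {"Ta.0002", "Ta.0003", "Ta.0005"}

result = combine_screens(ddd, array)
print(sorted(result["both"]))
```

Keeping the two single-method sets alongside the intersection makes it easy to inspect which candidates were dropped and why.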
Cross-species comparisons with the model species Oryza sativa were used to test the specificity of the data. The results showed that 62 (63%) of the homologous genes were seed-specific in rice, as indicated by the microarray data. Among these 62 genes, 23 were detected in the list of 70 genes retrieved by the DDD + microarray method. The results have three implications. First, a large number of genes with seed-related functions may have diverged within monocots, because approximately 40% of the wheat seed-specific proteins surveyed in the study produced no significant BLASTp hits in the rice protein database. For instance, the gliadins produce no hits among rice proteins, consistent with the variation in the predominant storage protein type in cereals, which is gliadin in wheat and glutelin in rice [30–32]. Second, some seed-specific genes are functionally conserved between rice and wheat, and possibly in other species as well. These results could serve as a reference for identifying seed-specific genes in other crops. Third, the fact that 63% of the identified homologs were also specifically expressed in the seeds of rice provides further validation of the methods used in the current study. Additionally, a reverse analysis (sensitivity test) was performed to assess the quality of the data. Genes that have been identified as seed-specific in rice were searched to find their counterparts in wheat, and 43 (61%) homologous genes were identified. Among the 43 wheat homologs, 27 were shown to be seed-specific and were present in the list of 201 unigenes retrieved in the first place. Further GO enrichment analysis showed that most of the GO terms of rice seed-specific genes were similar to those of wheat. The same specificity and sensitivity analyses were also performed for the lists of seed-specific genes screened by DDD alone and by microarray alone (Table 2).
Compared with microarray analysis or DDD analysis alone, the intersection of the two methods is clearly more reliable and does not drastically reduce coverage. The reliability analysis and the similarity of the functional ontology further support the validity of the method.
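The cross-species check described above can be illustrated with a short sketch. It assumes, hypothetically, that BLASTp results have already been reduced to a mapping from each query gene to its best homolog in the other species (or `None` when there is no significant hit). The function name `conserved_fraction` and all gene identifiers are invented for illustration and are not taken from the study.

```python
def conserved_fraction(homolog_map, specific_in_other):
    """Return (conserved genes, fraction of homolog-bearing genes conserved).

    homolog_map: query gene -> homolog ID in the other species, or None.
    specific_in_other: set of genes called seed-specific in the other species.
    """
    # Keep only query genes that have a significant homolog.
    with_homolog = {g: h for g, h in homolog_map.items() if h is not None}
    # A gene counts as conserved if its homolog is also seed-specific.
    conserved = {g for g, h in with_homolog.items() if h in specific_in_other}
    fraction = len(conserved) / len(with_homolog) if with_homolog else 0.0
    return conserved, fraction


# Hypothetical wheat -> rice best hits, for illustration only.
wheat_to_rice = {"w1": "r1", "w2": "r2", "w3": None, "w4": "r4"}
rice_seed_specific = {"r1", "r4", "r9"}

conserved, fraction = conserved_fraction(wheat_to_rice, rice_seed_specific)
print(sorted(conserved), round(fraction, 2))
```

Running the same function with the roles of the two species swapped gives the reverse (sensitivity) check.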
To further confirm the results of the in-silico analysis, 6 unigenes that strictly followed the selection formula were randomly selected from the CG gene models among the 201 unigenes for qRT-PCR analysis. The results showed that all 6 of the selected genes were specifically expressed in developing seeds. Two unigenes that did not strictly satisfy the DDD selection criteria, but had significantly higher expression in the seed pool than in the contrast pool, were found not to be seed-specific. Taken together, this evidence indicates that the selection methods used in this study are stringent and effective for screening seed-specific genes in wheat.
During the analysis, it is worth noting that not all the unigenes screened by DDD have a corresponding probeset. There are three major reasons for this. First, the microarray data available for wheat are still limited and not all openly accessible. Second, given the size and complexity of the wheat genome, the wheat Affymetrix 61 K GeneChip® can cover only a limited number of wheat genes. Third, because the unigene database is updated frequently, some unigene clusters are retired and the ESTs in those clusters may be retracted or distributed to other new clusters (http://www.ncbi.nlm.nih.gov/UniGene/help.cgi?item=FAQ); the microarray annotation cannot keep up with these updates, so many unigenes have no corresponding probesets. Because of this limitation, the number of seed-specific genes identified with the combined methods could be lower than the actual number. For instance, 2 unigenes that were screened by DDD but rejected during the selection because they had no corresponding probesets were in fact shown by qRT-PCR to be seed-specific (Figure 5B). Despite these challenges, microarray data provide valuable information for validating the DDD screening results, especially for genes with corresponding specific probesets.