Sesame (Sesamum indicum L.), a member of the Pedaliaceae, is a diploid (2n = 26) dicotyledon and one of the oldest oilseed crops, growing widely in tropical and subtropical areas [1, 2]. Sesame seeds are an important source of oil (44–58%), protein (18–25%), and carbohydrates (13.5%), and are traditionally consumed directly. They are used as active ingredients in antiseptics, bactericides, viricides, disinfectants, moth repellants, and antitubercular agents because they contain natural antioxidants such as sesamin and sesamolin. Among the primary edible oils, sesame oil has the highest antioxidant content and contains abundant fatty acids such as oleic acid (43%), linoleic acid (35%), palmitic acid (11%), and stearic acid (7%). In addition, sesame oil is important in the food industry because of its distinct flavor. These characteristics have stimulated interest in the biochemical and physiological composition of sesame oil.
Previous studies on sesame have mainly focused on quantitative genetics, traditional genetic breeding, and genetic relationships and diversity among sesame germplasm collections [9, 10]. Although much effort has been devoted to cloning key genes and characterizing fatty acid elongation and unsaturated fatty acid biosynthesis in sesame [11–13], the molecular mechanisms behind fatty acid biosynthesis and metabolism remain unclear. The publicly available datasets are of limited use for future sesame research, such as elucidating the molecular mechanisms of specific traits and understanding the complexity of the transcriptome, gene expression regulation, and gene networks. Progress in novel gene discovery and molecular breeding in sesame has been limited by the lack of genomic information. For example, only 3,328 expressed sequence tag (EST) sequences from sesame had been deposited in the dbEST GenBank database as of January 2011.
Molecular markers play an important role in many aspects of plant breeding, such as identification of the genes responsible for desirable traits. Molecular markers have been widely used to map important genes and assist with the breeding of oil crops. However, in sesame, only 10 genomic simple sequence repeat (SSR) and 44 EST-SSR markers have been developed. Genetic relationships and diversity among germplasm collections have been investigated mostly using AFLP, ISSR, and RAPD markers. In sesame, marker-assisted selection and molecular breeding lag behind other crops owing to a lack of effective molecular markers. Thus, a rapid and cost-effective approach to develop molecular markers for sesame is required. Compared with other types of molecular markers, SSRs have many advantages, such as simplicity, effectiveness, abundance, hypervariability, reproducibility, codominant inheritance, and extensive genomic coverage. Based on the original sequences used to identify simple repeats, SSRs can be divided into genomic SSRs and EST-SSRs. Traditional methods to isolate and identify genomic SSRs are costly, labor-intensive, and time-consuming [17, 18]. In addition, the interspecific transferability of genomic SSRs is limited because of either a disappearance of the repeat region or degeneration of the primer binding sites. In contrast, EST-SSRs are derived from expressed sequences, which are more evolutionarily conserved than noncoding sequences; therefore, EST-SSR markers have a relatively high transferability. With the increasing number of ESTs deposited in public databases, an expanding number of EST-SSRs have been developed, and the polymorphism and transferability of EST-SSRs have been evaluated in many plant species [20–30].
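To make the notion of identifying simple repeats in a sequence concrete, the following is a minimal Python sketch of an SSR scan over a single EST-like sequence. It is an illustration only and not the procedure used in this study: the function name `find_ssrs` and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for the example, and dedicated SSR search tools apply more elaborate criteria (for example, handling of compound and interrupted repeats).

```python
import re

# Assumed thresholds for illustration: motif length -> minimum number of tandem copies.
# These values are hypothetical and are not taken from the paper.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a position-sorted list of (start, motif, copies) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # Group 2 captures the motif; \2{n,} requires at least n further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. AAAA or ATAT),
            # so that each locus is reported once under its shortest motif.
            is_compound = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if is_compound:
                continue
            copies = len(match.group(1)) // motif_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGCC" + "AG" * 7 + "TTGACC" + "CTT" * 5 + "GA"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

In practice, such a scan would be run over every assembled transcript, and primers would then be designed in the sequence flanking each reported repeat; those steps are outside the scope of this sketch.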
The transcriptome is the complete set and quantity of transcripts in a cell at a specific developmental stage or under a specific physiological condition. The transcriptome provides information on gene expression, gene regulation, and amino acid content of proteins. Therefore, transcriptome analysis is essential to interpret the functional elements of the genome and reveal the molecular constituents of cells and tissues. Transcriptome or EST sequencing is an efficient way to generate functional genomic-level data for non-model organisms. Large collections of EST sequences are invaluable for gene annotation and discovery [31, 32], comparative genomics, development of molecular markers [34, 35], and population genomics studies of genetic variation associated with adaptive traits. Recently, an increasing number of EST datasets have become available for model and non-model organisms, but relatively few ESTs are currently available for sesame.
Numerous technologies have been developed to analyze and quantify the transcriptome. Initially, traditional Sanger sequencing was used, but this approach is costly, time-consuming, and sensitive to cloning biases because it involves cDNA library construction, cloning, and labor-intensive sequencing. Because of the deep coverage and single base-pair resolution provided by next-generation sequencing instruments, RNA sequencing (RNA-seq) is an efficient method to analyze transcriptome data. Theoretically, any high-throughput sequencing technology can be used for RNA-seq, such as the Illumina Genome Analyzer, Applied Biosystems' SOLiD, and the Roche 454 Life Sciences system. Because 454 pyrosequencing produces longer reads than the other two platforms [37–39], the 454 system is usually adopted for non-model organisms to create a transcriptome database, and a short-read-based technology such as the Solexa platform has been used for resequencing. Recent algorithmic and experimental (e.g., Illumina/Solexa mate-pair and short-read paired-end libraries) advances are likely to increase the applicability of Illumina sequencing and de novo assembly, which have been used successfully and increasingly for model [40, 42–44] and non-model organisms [39, 45–47]. These technologies are efficient, inexpensive, and reliable for genome and transcriptome sequencing, and suitable for non-model organisms such as sesame.
In this study, we sampled the pooled transcriptomes of roots, leaves, shoot tips, flowers, and developing seeds of sesame using Illumina paired-end sequencing technology to generate a large-scale EST database and develop a set of EST-SSRs. To our knowledge, this study is the first to characterize the complete transcriptome of sesame by analyzing large-scale transcript sequences using an Illumina paired-end sequencing strategy. These EST datasets will serve as a valuable resource for novel gene discovery and marker-assisted selective breeding in sesame.