Data processing scheme for tomato full-length cDNA sequences. (A) Workflow for processing the tomato full-length cDNA sequences. Four separate full-length-enriched libraries, LEFL1, FC, LEFL2, and LEFL3, were constructed. From randomly chosen clones, we obtained 30,679, 8,046, 18,697, and 27,216 high-quality 5'-end sequences from the LEFL1, FC, LEFL2, and LEFL3 libraries, respectively. These high-quality 5'-end sequences were registered in the EST division of the DDBJ. They were combined with 238,157 public tomato ESTs and then clustered into 76,276 groups. Clusters containing at least one FC or LEFL sequence were selected. In each selected cluster, the FC or LEFL sequence with the longest 5'-extension was chosen as the representative and sent for full-length sequencing. Full-length sequencing was completed for 13,227 cDNAs, which were registered in the high-throughput cDNA (HTC) division of the DDBJ. From these 13,227 HTCs, 12,106 non-redundant full-length cDNAs were chosen. The set of 12,106 full-length cDNAs was screened for non-coding RNA-derived cDNAs, pathogen transcript-derived cDNAs, chimeric clones, and cDNAs containing retained introns. After these sequences were excluded, the remaining set of 11,597 non-redundant HTCs was checked for coding sequences (CDS). Finally, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated for subsequent sequence analyses. (B) Distribution of the number of 5'-end sequences derived from the FC and LEFL cDNA libraries in each cluster.
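
The representative-selection step in panel A can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `EndRead` record, the function names, and the integer `five_prime_extension` field (standing in for however the 5'-extension was actually measured) are hypothetical and introduced only for illustration. The sketch keeps clusters that contain at least one FC or LEFL read and picks, in each, the FC or LEFL read with the longest 5'-extension.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Library names taken from the caption; all other names are hypothetical.
FULL_LENGTH_LIBRARIES = ("LEFL1", "FC", "LEFL2", "LEFL3")


@dataclass(frozen=True)
class EndRead:
    read_id: str
    library: str               # "LEFL1", "FC", "LEFL2", "LEFL3", or "public"
    five_prime_extension: int  # assumed measure of how far the read extends 5'


def pick_representative(cluster: List[EndRead]) -> Optional[EndRead]:
    """Return the FC/LEFL read with the longest 5'-extension, or None."""
    candidates = [r for r in cluster if r.library in FULL_LENGTH_LIBRARIES]
    if not candidates:
        return None  # cluster holds only public ESTs, so it is not selected
    # On ties, max() keeps the first read encountered.
    return max(candidates, key=lambda r: r.five_prime_extension)


def select_representatives(
    clusters: Dict[str, List[EndRead]]
) -> Dict[str, EndRead]:
    """Map each selected cluster ID to its representative read."""
    representatives = {}
    for cluster_id, members in clusters.items():
        rep = pick_representative(members)
        if rep is not None:
            representatives[cluster_id] = rep
    return representatives


# Toy input: one mixed cluster and one cluster of public ESTs only.
clusters = {
    "c1": [
        EndRead("LEFL1_001", "LEFL1", 12),
        EndRead("LEFL1_002", "LEFL1", 40),
        EndRead("EST_A", "public", 95),
    ],
    "c2": [EndRead("EST_B", "public", 30)],
}
representatives = select_representatives(clusters)
```

In this toy input, the public EST in `c1` has the largest extension value but is ignored because only FC and LEFL reads are eligible, and `c2` is dropped because it contains no FC or LEFL read.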