Schematic diagram of the assignment and annotation of SAGE tags. Each processing step was performed using a custom Perl script (Additional file 1). UniGenes are annotated by BLASTX, with the UniGene sequences searched against the non-redundant (nr) protein database. Tags are preferentially assigned to annotated UniGenes and, where a tag matches multiple UniGenes, are assigned to the UniGene with the highest cumulative frequency, to reduce redundancy within the data. Fuzzy matching tolerates up to 2 bp of mismatch between the tag and the representative UniGene sequence.
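
The assignment rules in this caption can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation (Additional file 1), and it rests on several assumptions: mismatches are counted as substitutions only (Hamming distance), the tag is compared against every window of the representative sequence, all matches within the tolerance are treated equally, and the cumulative frequency of each UniGene is already available as a number. The names `UniGene`, `best_mismatch`, and `assign_tag` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UniGene:
    uid: str                    # hypothetical identifier
    sequence: str               # representative UniGene sequence
    annotation: Optional[str]   # BLASTX-derived annotation, or None if unannotated
    cumulative_frequency: int   # assumed to be precomputed elsewhere


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_mismatch(tag: str, sequence: str) -> Optional[int]:
    """Smallest mismatch count of the tag over all windows of the sequence."""
    n = len(tag)
    if n == 0 or len(sequence) < n:
        return None
    return min(hamming(tag, sequence[i:i + n])
               for i in range(len(sequence) - n + 1))


def assign_tag(tag: str, unigenes: List[UniGene],
               max_mismatch: int = 2) -> Optional[UniGene]:
    """Pick one UniGene for a tag, or None if nothing matches.

    Candidates are UniGenes matching within max_mismatch substitutions.
    Annotated candidates are preferred; remaining ties are resolved by
    the highest cumulative frequency.
    """
    candidates = []
    for u in unigenes:
        d = best_mismatch(tag, u.sequence)
        if d is not None and d <= max_mismatch:
            candidates.append(u)
    if not candidates:
        return None
    return max(candidates,
               key=lambda u: (u.annotation is not None, u.cumulative_frequency))


# Toy data, invented purely for illustration.
unigenes = [
    UniGene("U1", "AAAACATGTTTTGGGGCC", None, 50),
    UniGene("U2", "CCCCCATGTTTTGGGGAA", "kinase", 10),
    UniGene("U3", "CCCCCATGTTTTGGGGAA", "kinase-like", 30),
]
chosen = assign_tag("CATGTTTTGGGG", unigenes)
```

In this toy example all three UniGenes contain the tag, so the unannotated `U1` is passed over despite its higher frequency, and `U3` is chosen over `U2` on cumulative frequency.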