Rarefaction analysis of gene representation in different libraries. For each library, different numbers of randomly sampled reads were searched with BLAST against Arabidopsis peptides (TAIR9), and the number of distinct AGIs tagged at least 1, 5, 10 and 100 times was recorded. The resulting data were fitted by non-linear regression to y = (ax)/(b+x) (continuous lines). (A) Reads were randomly sampled from the union of all 12 libraries. The total number of different AGIs identified is 17,104. (B) Reads were randomly sampled from either normalized leaf library one (LVN.1, blue) or the non-normalized leaf library (LVR.1, red). The total numbers of different AGIs used for annotation are 13,298 for the normalized library and 8,515 for the non-normalized library. Data from all libraries are summarized in Table 4, and the corresponding plots are presented in Additional file 7.
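The caption does not name the software used, so the following is only a minimal sketch of the two steps it describes, under stated assumptions. It assumes each read has already been reduced to its best-hit AGI (or None when there is no hit) and that reads are drawn without replacement. The helper names `rarefy` and `fit_saturation` are hypothetical. For the fit of y = ax/(b+x), it uses profile least squares, which is a swapped-in technique and not necessarily the authors' method: for fixed b the optimal a has a closed form, and b is located by golden-section search on log(b). This assumes the error is unimodal in log(b) over the search range.

```python
import math
import random
from collections import Counter


def rarefy(read_hits, sample_sizes, thresholds=(1, 5, 10, 100), seed=0):
    """Hypothetical helper: for each sample size n, draw n reads without
    replacement and count distinct AGIs tagged at least k times.

    read_hits: one AGI identifier per read (None = no BLAST hit, assumed).
    Returns {k: [count for each n in sample_sizes]}.
    """
    rng = random.Random(seed)
    curves = {k: [] for k in thresholds}
    for n in sample_sizes:
        sample = rng.sample(read_hits, n)  # sampling without replacement
        tags = Counter(agi for agi in sample if agi is not None)
        for k in thresholds:
            curves[k].append(sum(1 for c in tags.values() if c >= k))
    return curves


def fit_saturation(xs, ys, b_lo=1e-6, b_hi=1e9, iters=200):
    """Hypothetical helper: least-squares fit of y = a*x / (b + x).

    For a fixed b the best a is sum(f*y)/sum(f*f) with f = x/(b+x), so only b
    is searched, by golden-section search on log(b) (assumes unimodal error).
    """
    def best_a_and_sse(b):
        f = [x / (b + x) for x in xs]
        a = sum(fi * yi for fi, yi in zip(f, ys)) / sum(fi * fi for fi in f)
        sse = sum((yi - a * fi) ** 2 for fi, yi in zip(f, ys))
        return a, sse

    lo, hi = math.log(b_lo), math.log(b_hi)
    g = (math.sqrt(5) - 1) / 2  # golden-ratio shrink factor
    for _ in range(iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if best_a_and_sse(math.exp(c))[1] < best_a_and_sse(math.exp(d))[1]:
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    b = math.exp((lo + hi) / 2)
    a, _ = best_a_and_sse(b)
    return a, b
```

In this form, a is the asymptote (the number of AGIs the curve approaches as sequencing depth grows) and b is the number of reads at which half of that asymptote is reached, which is why this saturation model suits rarefaction curves.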