Effect of motif length and sampling region on the false-positives data set. The data set of true positive TFBSs was ordered and grouped by motif length. For each motif, 10 random matching hits were sampled from upstream promoter regions (both 1 kb and 5 kb upstream of the TSS), and the average nucleosome occupancy (Poccupied) of these false positives was calculated for each motif length. For comparison, the average nucleosome occupancy was also determined for the true TFBSs grouped by motif length. The reference (false-positive) sets have a higher average nucleosome occupancy than the true TFBSs for every motif length, indicating that the hits of longer motifs do not enrich the reference data set with true positive hits. Consequently, all motif lengths from 5 to 16 bp can be used to construct a reference false-positives data set.
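The per-length averaging described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `sample_hits` and `mean_occupancy_by_length`, the fixed random seed, and the input layout (pairs of motif length and Poccupied value) are assumptions made for illustration only.

```python
import random
from collections import defaultdict
from statistics import mean


def sample_hits(hits, n=10, rng=None):
    """Draw up to n random matching hits for one motif (hypothetical helper)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed, assumed here for reproducibility
    hits = list(hits)
    if len(hits) <= n:
        return hits  # fewer than n hits available: keep them all
    return rng.sample(hits, n)


def mean_occupancy_by_length(records):
    """Average occupancy per motif length.

    records: iterable of (motif_length_bp, p_occupied) pairs (assumed layout).
    Returns a dict mapping motif length to mean occupancy, sorted by length.
    """
    grouped = defaultdict(list)
    for length, p_occupied in records:
        grouped[length].append(p_occupied)
    return {length: mean(values) for length, values in sorted(grouped.items())}
```

Applying `mean_occupancy_by_length` once to the sampled false positives and once to the true TFBSs gives the two per-length curves that the caption compares.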