Figure 1

K-mer composition of the MDR index. The cumulative fraction of collapsed discrete 20-mers and of all 20-mers in the set is shown as a function of repeat level, up to 500 copies. The curve for the collapsed discrete 20-mers converges to 1 rapidly, indicating that most 20-mers in the set occur relatively infrequently in the genome. The curve for all available 20-mers converges to 1 more slowly, reflecting a small fraction of high-frequency 20-mers in the set.
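
The two curves can be illustrated with a minimal sketch. It rests on assumptions not stated in the caption: the "collapsed discrete" curve is taken to count each distinct 20-mer once, the "all" curve to count every occurrence, and the k-mers are taken from a single strand of one input string. The function name `cumulative_fractions` and its parameters are hypothetical and are not part of the MDR index software.

```python
from collections import Counter


def cumulative_fractions(sequence, k=20, max_copies=500):
    """Cumulative k-mer fractions for repeat levels 1..max_copies.

    Returns (distinct_curve, all_curve). Entry i of each list is the
    fraction of k-mers whose copy number is <= i + 1.
    Assumption: single strand only, no reverse-complement merging.
    """
    # Copy number of every k-mer in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    if not counts:
        raise ValueError("sequence is shorter than k")

    n_distinct = len(counts)          # each k-mer counted once
    n_total = sum(counts.values())    # every occurrence counted

    # Histogram: copy number -> number of distinct k-mers at that copy number.
    by_copy = Counter(counts.values())

    distinct_curve, all_curve = [], []
    distinct_seen = occurrences_seen = 0
    for level in range(1, max_copies + 1):
        n_at_level = by_copy.get(level, 0)
        distinct_seen += n_at_level
        occurrences_seen += level * n_at_level
        distinct_curve.append(distinct_seen / n_distinct)
        all_curve.append(occurrences_seen / n_total)
    return distinct_curve, all_curve
```

Under these assumptions, a single highly repeated k-mer adds only one unit to the distinct curve but adds its full copy number to the all-occurrences curve, which is why the latter approaches 1 more slowly.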