From: Short DNA sequence patterns accurately identify broadly active human enhancers

Short DNA sequence patterns accurately distinguish broadly active human enhancers from the genomic background and from context-specific enhancers. Classifiers trained using (a) all possible 6-mers or (b) the density of TF motifs as features can identify the broadly active human enhancers. ROC curves were calculated using 10-fold cross-validation, averaging the ROC curves obtained in each round of validation. The area under each curve (AUC) is given in parentheses. Shaded areas are bounded by the maximum and minimum observed ROC curves. Precision-recall curves are given in Additional file 1: Figure S8. (c) The log2(fold enrichment) of all 6-mers in the 1961 broadly active enhancers vs. the corresponding negative sets. Enrichments were calculated for each of the 4096 6-mers. Box plots show the median and 1st/3rd quartiles, while the black point and line indicate the mean and standard deviation. The fold changes for the four dinucleotide repeat motifs (DRMs) are indicated on each distribution by GC, CA, GA, and TA.
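
To make the quantity in panel (c) concrete, the following is a minimal Python sketch of a per-6-mer log2 fold enrichment between a positive and a negative sequence set. It is an illustration under stated assumptions, not the authors' pipeline: the function names `count_kmers` and `log2_fold_enrichment`, the use of relative frequencies, and the pseudocount of 1 are all assumptions made here, and the sketch does not collapse reverse complements.

```python
from collections import Counter
from itertools import product
from math import log2


def count_kmers(seqs, k=6):
    """Count every overlapping k-mer made only of A/C/G/T across the sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[kmer] += 1
    return counts


def log2_fold_enrichment(pos_seqs, neg_seqs, k=6, pseudocount=1.0):
    """Return {k-mer: log2(freq in positives / freq in negatives)} for all 4**k k-mers.

    The pseudocount is an assumption of this sketch; it keeps the ratio
    finite for k-mers that are absent from one of the two sets.
    """
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    pos = count_kmers(pos_seqs, k)
    neg = count_kmers(neg_seqs, k)
    pos_total = sum(pos.values()) + pseudocount * len(all_kmers)
    neg_total = sum(neg.values()) + pseudocount * len(all_kmers)
    return {
        kmer: log2(((pos[kmer] + pseudocount) / pos_total)
                   / ((neg[kmer] + pseudocount) / neg_total))
        for kmer in all_kmers
    }
```

Calling `log2_fold_enrichment(enhancer_seqs, background_seqs)` with two lists of sequence strings (hypothetical variable names) returns one value for each of the 4096 6-mers; the distribution of those values is what a box plot like the one in panel (c) summarizes.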

