Table 1 Summary statistics of 3326 motif pairs identified in 569 genomes

From: Unsupervised statistical discovery of spaced motifs in prokaryotic genomes

                  Min   1st Qu.  Median  Mean   3rd Qu.  Max
d                 5     19       43      39.9   53       89
Initial_n         6     41       62      80.6   100      644
Reduced_n         6     27       44      57.8   73       633
Difference        0     2        9       22.7   26       335
Cutoff            6     18       31      47.8   64       521
Reduced_n/Cutoff  1     1.06     1.16    1.33   1.42     4.93
Gene              0%    20%      81%     61%    96%      100%
Intergenic        0%    2%       12%     35%    73%      100%
Overlap           0%    0%       2%      4%     4%       100%

  1. The summary statistics are computed per motif pair. Abbreviations used in the table: d, spacer length of a motif pair; Initial_n, number of copies of the motif pair before alignment; Reduced_n, number of copies after alignment and elimination of duplicate spacers; Difference, Initial_n minus Reduced_n; Cutoff, significance cutoff (the lowest number of copies for the motif pair to be considered significant); Reduced_n/Cutoff, the ratio of Reduced_n to Cutoff, indicating the relative significance of the motif pair; Gene, the percentage of a motif pair's occurrences found in genes; Intergenic, the percentage of a motif pair's occurrences in intergenic regions; Overlap, the percentage of a motif pair's occurrences that overlap a gene start or end. These percentages are calculated as follows: for each significant motif, we run a query with Pattern Locator, which reports the percentage of the motif's occurrences that fall in genes, fall in intergenic regions, or overlap gene starts or ends. The quantiles in the table are taken over these percentages across all significant motifs.
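The derived columns defined in the note (Difference, Reduced_n/Cutoff, the Gene/Intergenic/Overlap percentages) and the six summary columns can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the `MotifPair` record, its field names and the helper `six_number_summary` are hypothetical, the location counts merely stand in for what a Pattern Locator query reports, and the toy values are invented rather than taken from the 569 genomes. The column labels Min, 1st Qu., Median, Mean, 3rd Qu., Max match the layout of R's `summary()`; its default quantile rule (type 7, linear interpolation) is assumed here through `method="inclusive"`.

```python
from dataclasses import dataclass
from statistics import mean, quantiles


@dataclass
class MotifPair:
    """Hypothetical per-motif-pair record; field names mirror the table columns."""
    d: int           # spacer length
    initial_n: int   # copies before alignment
    reduced_n: int   # copies after alignment and removal of duplicate spacers
    cutoff: int      # lowest copy number treated as significant
    in_gene: int     # occurrences inside genes (assumed location counts)
    intergenic: int  # occurrences in intergenic regions
    overlap: int     # occurrences overlapping a gene start or end

    @property
    def difference(self) -> int:
        # Difference = Initial_n - Reduced_n
        return self.initial_n - self.reduced_n

    @property
    def relative_significance(self) -> float:
        # Reduced_n / Cutoff; values >= 1 mean the pair passes its cutoff
        return self.reduced_n / self.cutoff

    def is_significant(self) -> bool:
        return self.reduced_n >= self.cutoff

    def location_percentages(self) -> tuple:
        # Gene / Intergenic / Overlap as percentages of all located occurrences
        total = self.in_gene + self.intergenic + self.overlap
        if total == 0:
            raise ValueError("no located occurrences")
        return tuple(100 * c / total for c in (self.in_gene, self.intergenic, self.overlap))


def six_number_summary(values):
    """Return (Min, 1st Qu., Median, Mean, 3rd Qu., Max).

    method="inclusive" interpolates linearly between order statistics,
    the same rule as R's default quantile type 7 (an assumption, not
    confirmed to be what the authors used). Needs at least two values.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, mean(values), q3, max(values))


# Toy values for illustration only; not data from the study.
pairs = [
    MotifPair(d=10, initial_n=30, reduced_n=20, cutoff=16, in_gene=15, intergenic=4, overlap=1),
    MotifPair(d=20, initial_n=50, reduced_n=48, cutoff=40, in_gene=12, intergenic=36, overlap=0),
    MotifPair(d=40, initial_n=12, reduced_n=8, cutoff=10, in_gene=5, intergenic=3, overlap=0),
    MotifPair(d=30, initial_n=80, reduced_n=60, cutoff=30, in_gene=60, intergenic=0, overlap=0),
]

# Only significant pairs enter the summary, as in the table.
significant = [p for p in pairs if p.is_significant()]
print(len(significant))  # 3: the d=40 pair falls below its cutoff
print(six_number_summary([p.difference for p in significant]))
print(pairs[0].location_percentages())  # (75.0, 20.0, 5.0)
```

Keeping the quartile rule in one helper makes it easy to swap in a different quantile definition; with only a few values, the choice noticeably shifts the 1st and 3rd quartiles, while Min, Max and Mean are unaffected.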