Table 1 Segment characteristics for Arabidopsis thaliana

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

| Data Set | # Sequences/Chromosomes | Min. Seq. Length | Max. Seq. Length | Mean Seq. Length | Std. Deviation | Total Nucleotides | Genome Percentage |
|---|---|---|---|---|---|---|---|
| 3' UTRs | 19,771 | 8 | 3,118 | 228.134 | 152.106 | 4,510,410 | 3.78 |
| 5' UTRs | 18,585 | 8 | 3,214 | 140.088 | 130.288 | 2,603,531 | 2.18 |
| Introns | 118,319 | 8 | 10,234 | 164.446 | 178.484 | 19,457,029 | 16.32 |
| Core Promoters | 27,023 | 100 | 100 | 100 | 0 | 2,702,300 | 2.27 |
| Proximal Promoters | 27,023 | 900 | 900 | 900 | 0 | 24,320,700 | 20.41 |
| Distal Promoters | 27,025 | 1,371 | 2,000 | 1,999.96 | 5.01105 | 54,048,839 | 45.35 |
| Genome-wide | 5 | 18,585,000 | 30,432,600 | 23,837,300 | 4,432,780 | 119,186,497 | 100.00 |

  1. Overview of the characteristic properties of the non-coding segments and of the entire genome of Arabidopsis thaliana. The number of sequences is the number of unique sequences in the respective segment; for the entire genome, the sequences are the complete chromosomes. Min. Seq. Length and Max. Seq. Length are the lengths of the shortest and longest sequence in the set. Mean Seq. Length is the average length of the sequences in the set, and Std. Deviation is the standard deviation of the lengths about that mean. Total Nucleotides is the total number of nucleotides contained in the sequences of the set, and Genome Percentage expresses that nucleotide count as a percentage of the entire genome.
  2. Some sequences in the segments are shorter than 8 nucleotides. Since these sequences cannot harbour any putative regulatory elements in the context of this study, they are removed from the table. This omits a total of 179 nt for the 3' UTRs, 1207 nt for the 5' UTRs and 26 nt for the introns. These sequences are, however, included in the calculation of the background for the different segments, since they contribute to the overall nucleotide distribution.
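
The column definitions and the 8-nucleotide cutoff in the footnotes above can be illustrated with a minimal Python sketch. It is not the authors' code: the helper name `segment_stats` and its return keys are hypothetical, the input is assumed to be a plain list of sequence lengths (no deduplication of sequences is attempted), and the population standard deviation is assumed because the source does not say which variant was used. The genome size is taken from the Genome-wide row of the table.

```python
from statistics import mean, pstdev

GENOME_SIZE = 119_186_497  # Total Nucleotides in the Genome-wide row of Table 1
MIN_LENGTH = 8             # sequences shorter than this are left out of the table


def segment_stats(lengths, genome_size=GENOME_SIZE, min_length=MIN_LENGTH):
    """Summarise one segment from its sequence lengths (hypothetical helper)."""
    lengths = list(lengths)
    # Drop sequences too short to be reported in the table.
    kept = [n for n in lengths if n >= min_length]
    if not kept:
        raise ValueError("no sequences of at least min_length nucleotides")
    total = sum(kept)
    return {
        "n_sequences": len(kept),
        "min_length": min(kept),
        "max_length": max(kept),
        "mean_length": mean(kept),
        # Assumption: population standard deviation; the source does not specify.
        "std_deviation": pstdev(kept),
        "total_nucleotides": total,
        # Nucleotides in the dropped short sequences (cf. footnote 2).
        "omitted_nucleotides": sum(lengths) - total,
        "genome_percentage": 100 * total / genome_size,
    }


# Core promoters: 27,023 sequences of exactly 100 nt each.
core = segment_stats([100] * 27_023)
print(core["total_nucleotides"], round(core["genome_percentage"], 2))
```

For the core promoters, whose sequences all have a fixed length of 100 nt, this gives a total of 2,702,300 nucleotides and a genome percentage of 2.27, in line with the corresponding table row. In this sketch the short sequences are only excluded from the summary; a background model would still be built from the unfiltered input, as footnote 2 describes.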