
Table 1 Segment characteristics for Arabidopsis thaliana

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

| Data Set | # Sequences | Min. Seq. Length (nt) | Max. Seq. Length (nt) | Mean Seq. Length (nt) | Std. Deviation (nt) | Total Nucleotides | Genome Percentage (%) |
|---|---:|---:|---:|---:|---:|---:|---:|
| 3' UTRs | 19,771 | 8 | 3,118 | 228.134 | 152.106 | 4,510,410 | 3.78 |
| 5' UTRs | 18,585 | 8 | 3,214 | 140.088 | 130.288 | 2,603,531 | 2.18 |
| Introns | 118,319 | 8 | 10,234 | 164.446 | 178.484 | 19,457,029 | 16.32 |
| Core Promoters | 27,023 | 100 | 100 | 100 | 0 | 2,702,300 | 2.27 |
| Proximal Promoters | 27,023 | 900 | 900 | 900 | 0 | 24,320,700 | 20.41 |
| Distal Promoters | 27,025 | 1,371 | 2,000 | 1,999.96 | 5.01105 | 54,048,839 | 45.35 |
| Genome-wide | 5 | 18,585,000 | 30,432,600 | 23,837,300 | 4,432,780 | 119,186,497 | 100.00 |
  1. Overview of the characteristic properties of the non-coding segments and of the entire Arabidopsis thaliana genome. # Sequences is the number of unique sequences in each segment; for the entire genome, the sequences are the complete chromosomes. Min. Seq. Length and Max. Seq. Length are the lengths of the shortest and the longest sequence in the set, Mean Seq. Length is the average sequence length, and Std. Deviation is the standard deviation of the lengths around that mean. Finally, Total Nucleotides is the total number of nucleotides contained in the sequences of the set, and Genome Percentage expresses that count as a percentage of the nucleotide count of the entire genome.
  2. Some sequences in the segments are shorter than 8 nucleotides. Because such sequences cannot harbour any putative regulatory elements in the context of this study, they were omitted from the table: 179 nt for the 3' UTRs, 1,207 nt for the 5' UTRs and 26 nt for the introns. They are, however, included in the calculation of the background for the different segments, since they contribute to the overall nucleotide distribution.
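The quantities defined in the two footnotes can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the function `summarize_segment`, the `SegmentSummary` record and the input format (a list of already-deduplicated sequence strings for one segment plus the genome size in nt) are hypothetical. The footnotes do not say whether Std. Deviation is the population or the sample form, so the sketch assumes the population form (`statistics.pstdev`). Sequences shorter than 8 nt are left out of the length statistics but, as footnote 2 describes, still counted in the background nucleotide composition.

```python
import statistics
from collections import Counter
from dataclasses import dataclass

MIN_LENGTH = 8  # threshold from footnote 2: shorter sequences cannot harbour elements


@dataclass
class SegmentSummary:
    """Hypothetical record mirroring the columns of Table 1."""
    n_sequences: int
    min_length: int
    max_length: int
    mean_length: float
    std_length: float
    total_nucleotides: int
    genome_percentage: float
    omitted_nucleotides: int
    background: dict


def summarize_segment(sequences, genome_size, min_length=MIN_LENGTH):
    """Summarize one segment set; `sequences` is assumed to hold unique sequences."""
    kept = [s for s in sequences if len(s) >= min_length]
    omitted = [s for s in sequences if len(s) < min_length]
    if not kept:
        raise ValueError("no sequences of at least min_length nucleotides")
    lengths = [len(s) for s in kept]
    total = sum(lengths)

    # Background composition uses ALL sequences, including the omitted short ones.
    counts = Counter()
    for s in sequences:
        counts.update(s.upper())
    n_acgt = sum(counts[base] for base in "ACGT")  # ignore N and other IUPAC codes
    background = {base: counts[base] / n_acgt for base in "ACGT"}

    return SegmentSummary(
        n_sequences=len(kept),
        min_length=min(lengths),
        max_length=max(lengths),
        mean_length=statistics.fmean(lengths),
        std_length=statistics.pstdev(lengths),  # population SD (assumption)
        total_nucleotides=total,
        genome_percentage=100.0 * total / genome_size,
        omitted_nucleotides=sum(len(s) for s in omitted),
        background=background,
    )
```

As a worked check of the Genome Percentage arithmetic against the table itself: for the core promoters, 100 × 2,702,300 / 119,186,497 ≈ 2.27, matching the reported value.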