Data Set | # Sequences/Chromosomes | Min. Seq. Length | Max. Seq. Length | Mean Seq. Length | Std. Deviation | Total Nucleotides | Genome Percentage |
---|---|---|---|---|---|---|---|
3' UTRs | 19,771 | 8 | 3,118 | 228.134 | 152.106 | 4,510,410 | 3.78 |
5' UTRs | 18,585 | 8 | 3,214 | 140.088 | 130.288 | 2,603,531 | 2.18 |
Introns | 118,319 | 8 | 10,234 | 164.446 | 178.484 | 19,457,029 | 16.32 |
Core Promoters | 27,023 | 100 | 100 | 100 | 0 | 2,702,300 | 2.27 |
Proximal Promoters | 27,023 | 900 | 900 | 900 | 0 | 24,320,700 | 20.41 |
Distal Promoters | 27,025 | 1,371 | 2,000 | 1,999.96 | 5.01105 | 54,048,839 | 45.35 |
Genome-wide | 5 | 18,585,000 | 30,432,600 | 23,837,300 | 4,432,780 | 119,186,497 | 100.00 |
- Overview of the characteristic properties of the non-coding segments and of the entire genome of Arabidopsis thaliana. # Sequences/Chromosomes is the number of unique sequences in the respective segment; for the entire genome, the sequences are the complete chromosomes. Min. Seq. Length and Max. Seq. Length are the lengths of the shortest and the longest sequence in the set. Mean Seq. Length is the average length of the sequences in the set, and Std. Deviation is the standard deviation of the sequence lengths around that mean. Total Nucleotides is the total number of nucleotides contained in the sequences of the set, and Genome Percentage expresses this nucleotide count as a percentage of the nucleotide count of the entire genome.
- Some sequences in the segments are shorter than 8 nucleotides. Since such sequences cannot harbour any putative regulatory elements in the context of this study, they are excluded from the table. This omits a total of 179 nt for the 3' UTRs, 1,207 nt for the 5' UTRs and 26 nt for the introns. The omitted sequences are, however, included in the calculation of the background for the different segments, since they contribute to the overall nucleotide distribution.
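The column definitions and the 8-nucleotide cut-off described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce the table: the function name `summarize_lengths`, the dictionary keys and the choice of the population standard deviation (`statistics.pstdev`) are assumptions, and the source does not state whether the population or the sample standard deviation was used.

```python
from statistics import mean, pstdev

MIN_LENGTH = 8  # sequences shorter than this are left out of the table


def summarize_lengths(lengths, genome_size, min_length=MIN_LENGTH):
    """Summarize a set of sequence lengths in the style of the table columns.

    `lengths` holds one length (in nt) per sequence; `genome_size` is the
    total nucleotide count of the genome. All names here are hypothetical.
    """
    kept = [n for n in lengths if n >= min_length]
    if not kept:
        raise ValueError("no sequence reaches the minimum length")
    total = sum(kept)
    return {
        "count": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": mean(kept),
        # Assumption: population standard deviation.
        "std": pstdev(kept),
        "total_nt": total,
        "genome_pct": 100.0 * total / genome_size,
        # Nucleotides in the sequences dropped by the length cut-off.
        "omitted_nt": sum(n for n in lengths if n < min_length),
    }


# Genome Percentage for the 3' UTR row, using the totals from the table.
print(f"{100.0 * 4_510_410 / 119_186_497:.2f}")  # prints 3.78
```

Applied to the per-sequence lengths of one segment, the returned dictionary would correspond to one table row, with `omitted_nt` playing the role of the nucleotide counts quoted in the note on sequences shorter than 8 nt.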