In this study we have taken advantage of the wealth of fully annotated staphylococcal genomes to take a detailed look at STAR elements. To our knowledge, this is the first in-depth study of these interspersed repeats at the sequence level across multiple staphylococcal species, and it provides a unique insight into their evolution.
STAR elements are highly abundant in S. aureus, and yet we have shown that strain variation in the STAR element nucleotide sequences strongly correlates with evolutionary lineage, as derived by MLST. This is unexpected, as intergenic regions such as the STAR loci, which consist of repetitive elements dispersed throughout the genome, would be expected to accumulate mutations and hence evolve at a higher rate than the conserved, functional MLST loci, where mutations are observed at a very low rate. These findings suggest that STAR elements are functional and may be under strong purifying selection.
STAR elements were sequenced from the gapR, hprK and orf loci from multiple S. aureus strains. In the majority of loci where multiple STAR repeats were present, the spacer sequences were often identical or differed by 1–3 nucleotides, resulting in tandem repeats of ~50 nucleotides. Such repetitive sequences would be expected to be unstable and to exhibit frequent alterations in repeat number due to slipped-strand mispairing during DNA replication. This process is likely to drive rapid alterations in repeat number, but not sequence, at many of these loci, as found with some other bacterial tandem repeats [3, 20, 21]. Congruent with this theory, strains belonging to the same ST contain identical or highly conserved spacer sequences between the interspersed STAR motifs at a specific locus, even when repeat numbers vary. This also suggests that localised expansion and contraction of the repeat region continue to occur as the strains diverge from one another.
In contrast, the spacer sequences are distinct at each STAR locus, even within a particular genome. Due to the repetitive nature of STAR elements, it has previously been suggested that homologous recombination between repeats occurs as a means of large-scale genomic rearrangement, or could provide a simple means of propagating these repeats to different loci throughout the genome. As the spacers are distinct between unrelated strains and at different STAR loci within a strain, homologous recombination is unlikely to be occurring at a high frequency between STAR loci, either intergenomically or intragenomically. Either of these processes would result in gene conversion and the emergence of a dominant spacer sequence variant across multiple loci, a phenomenon we did not identify in this study. From the evidence presented here, we suggest that variation in repeat number within a locus is limited to duplication or deletion of motifs from within that locus during DNA replication or repair, and is not due to recombination with elements present elsewhere in the genome. We also suggest that the mechanism for dispersal of STAR elements to new positions throughout the S. aureus genome may not involve recombination, as originally hypothesised.
The gapR STAR locus was the least structurally stable of the three loci studied. The loss of the elements in the Group 2 and 2b structures occurs at the same "deletion" site, and the surrounding DNA is undisturbed compared to that of the Group 1 and 1b strains. This is similar to another class of interspersed bacterial repeats known as enterobacterial repetitive intergenic consensus (ERIC) sequences, which have been identified across the eubacterial kingdom. The sequence surrounding an inserted ERIC remains unchanged, indicating a precise insertion or deletion event via a mechanism distinct from classic transposition [23, 24]. It is unclear whether a similar conserved mechanism is involved in the total loss or gain of STAR loci, or whether the deletion site is merely acting as a hotspot for STAR element translocation. The partial loss of elements seen in strains such as RF122 (Group 3) does not occur at this deletion site, and may represent a different mechanism of repeat propagation, or an error in repeat translocation in an ancestral strain that has been maintained in subsequent generations. There is no evidence of total loss or gain of the gapR STAR locus in the recent evolution of S. aureus strains, as the Group 2 and Group 3 isolates each fall into distinct evolutionary lineages. This strongly implies that the deletion process is infrequent, and that the loss or gain of the gapR STAR locus may have occurred in early ancestors of these lineages and been retained in subsequent isolates. Pourcel et al. observed a similarly complex structure for the STAR elements in the SA0906 locus (locus 28 in this study), with specific structural variants restricted to certain lineages. These findings provide further evidence of the conservation of each STAR locus within a strain and lineage.
The observed correlation between evolutionary lineage and both the structure of the gapR locus and the spacer sequences of the gapR, hprK and orf loci suggests that STAR element loci retain lineage-specific phylogenetic information and may be utilised as major determinants of lineage in typing schemes. The genome-wide mapping of STAR elements across the 15 S. aureus strains studied here identified 12 loci that were present in every genome sequence and a further 11 loci that were present in 85% of the genome sequences. The vast majority of these loci (20/23) contain more than one repeat and exhibit variable repeat numbers (data not shown), making them prime candidates for the development of future typing schemes. Some STAR loci have already been utilised in typing schemes for S. aureus, first using an RFLP typing method, and more recently as part of a larger multiple-locus variable-number tandem-repeat analysis (MLVA) scheme alongside other variable-number tandem repeats (VNTRs) and staphylococcal interspersed repeat units (SIRUs) [11, 13–15]. The recent extended MLVA scheme utilised six STAR element loci, of which five were completely conserved in a collection of 240 strains, although only four are present in up to 85% of the strains studied here. Therefore, the highly conserved loci identified here should be examined for their potential value as markers of lineage.
We have found that STAR elements are not restricted to specific genomic neighbourhoods across staphylococcal species. This suggests that the elements are not simply decaying from an early Staphylococcus progenitor as the genus has diverged over time, but rather that each species has acquired STAR elements in independent events, after which the elements have proliferated to distinct locations in each genome. Furthermore, STAR elements are maintained at a much higher level in the S. aureus and S. lugdunensis genomes than in other staphylococcal species. The higher prevalence of these elements in S. aureus and S. lugdunensis may be due to the presence of a dispersal mechanism (e.g. a transposase) that is absent in the other species studied here, to the absence of a mechanism that prevents the spread of repetitive elements in these two species, or to strong selection for the function of these elements.
The highly conserved nature of STAR elements within a CC suggests a functional role. Unlike eukaryotic genomes, which can contain more than 50% repetitive DNA, prokaryotic genomes are streamlined, as the propagation of non-functional "selfish" DNA is a burden to rapidly dividing organisms and is selected against [3, 25]. Other repeat elements in bacteria have functions in cell physiology, such as transcriptional control and protection of the microbial genome against foreign DNA [6, 26, 27]. A functional role for STAR elements is supported by evidence that some STAR elements are present in the leader regions of mRNAs, although the significance of this for gene expression has yet to be investigated. Alternatively, these repetitive sequences may have a general function in chromosome structure or stability, as seen with some eukaryotic repeat elements, and such a function could have led to their maintenance and spread within staphylococcal genomes. STAR repeats are found associated with loci encoding virulence factors, metal transporters and several essential metabolic enzymes. The significance of STAR repeats in the intergenic regions of these particular loci requires further investigation.
Interestingly, both S. aureus and S. lugdunensis tend to be much more pathogenic in humans than other staphylococcal species, with S. lugdunensis N920143 carrying several homologues of S. aureus virulence and colonisation factors that are not found in other staphylococcal species. Our finding that STAR elements are present at higher levels in two of the more virulent staphylococcal species may indicate that STAR elements play a role in pathogenesis. With the huge increase in the number of available genome sequences, the occurrence of STAR repeats in other bacterial species warrants further investigation to confirm their existence and function outside the staphylococcal genus.