Complete genome sequence of Arthrobacter sp. PAMC25564 and its comparative genome analysis for elucidating the role of CAZymes in cold adaptation

Background The Arthrobacter group is a known set of bacteria from cold regions, the species of which are highly likely to play diverse roles at low temperatures. However, their survival mechanisms in cold regions such as Antarctica are not yet fully understood. In this study, we compared the genomes of 16 strains within the Arthrobacter group, including strain PAMC25564, to identify genomic features that help it to survive in the cold environment. Results Using 16S rRNA sequence analysis, we identified a species of Arthrobacter isolated from cryoconite. We designated it as strain PAMC25564 and elucidated its complete genome sequence. The genome of PAMC25564 is composed of a circular chromosome of 4,170,970 bp with a GC content of 66.74 % and is predicted to include 3,829 genes, of which 3,613 are protein coding, 147 are pseudogenes, 15 are rRNA coding, and 51 are tRNA coding. In addition, we provide insight into the redundancy of the genes using comparative genomics and suggest that PAMC25564 has glycogen and trehalose metabolism pathways (biosynthesis and degradation) associated with carbohydrate-active enzymes (CAZymes). We also explain how strain PAMC25564 produces energy in an extreme environment, wherein it utilizes polysaccharide or carbohydrate degradation as a source of energy. The genetic pattern analysis of CAZymes in cold-adapted bacteria can help to determine how they adapt and survive in such environments. Conclusions We have characterized the complete Arthrobacter sp. PAMC25564 genome and used comparative analysis to provide insight into the redundancy of its CAZymes for potential cold adaptation. This provides a foundation for understanding how the Arthrobacter strain produces energy in an extreme environment, which is by way of CAZymes, consistent with reports on the use of these specialized enzymes in cold environments. 
Knowledge of glycogen metabolism and cold adaptation mechanisms in Arthrobacter species may promote in-depth research and subsequent application in low-temperature biotechnology. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07734-8.



Background
The Arthrobacter genus is a member of the family Micrococcaceae, which belongs to the phylum Actinobacteria [1,2]. Arthrobacter species are often isolated from soil, where they contribute to biochemical cycles and decontamination [3]. Additionally, these species have been isolated worldwide from a variety of environments, including sediments [4], human clinical specimens [5], water [6], glacier cryoconite [7], sewage [8], and glacier ice [9]. Cold environments represent about 75 % of the earth, and their study provides information about new microorganisms and their evolution in cold environments [10]. Psychrophilic microorganisms have colonized all permanently cold environments, from the deep sea to mountains and polar regions [11]. Cold-adapted microorganisms utilize a wide range of metabolic strategies to grow in diverse environments. In general, the ability to adapt to low temperatures requires that microorganisms sense a decrease in temperature, which induces upregulation of cold-associated genes [12]. Arthrobacter is a gram-positive obligate aerobe that requires oxygen to grow in a variety of environments. Obligate aerobes grow through cellular respiration and use oxygen to metabolize substances like sugars, carbohydrates, or fat to obtain energy [13,14]. However, there is still a lack of research on how obligate aerobes acquire adequate energy in cold environments.
Carbohydrate active enzymes (CAZymes) have functions associated with biosynthesis, binding, and catabolism of carbohydrates. This classification system is based on amino acid sequence similarity, protein folds, and enzymatic mechanism. Thus, one can understand overall enzyme function and carbohydrate metabolism through CAZymes [15]. These enzymes are classified based on their catalytic activity: glycoside hydrolase (GH), carbohydrate esterase (CE), polysaccharide lyase (PL), glycosyltransferase (GT), and auxiliary activity (AA). In addition, CAZymes may have non-catalytic subunits like a carbohydrate-binding module (CBM). CAZymes are already well known in biotechnology, and their industrial applications are of interest to many researchers because they produce precursors for bio-based products such as food, paper, textiles, animal feed, and various chemicals, including biofuels [16,17].
Most bacteria can use glycogen as an energy storage compound, and the enzymes involved in its metabolism are well known. A recent study showed the physiological impact of glycogen metabolism on the survival of bacteria living in extreme environments [18]. Some microorganisms can adapt quickly to continuously changing environmental conditions by accumulating energy storage compounds to cope with transient starvation periods. These strategies use glycogen-like structures such as polysaccharides composed of α-D-glycosyl units connected by α-1,4 linkages and branched by α-1,6 glycosidic linkages. Such biopolymers differ in their chain length and branching occurrence. To be used as carbon and energy sources, their glucose units are released by specific enzymes [19].
Microorganisms have synergistic enzymes capable of decomposing plant cell walls to release glucose. This phenomenon can be used for energy supply to maintain microbial growth [20]. Starch is an excellent source of carbon and energy for microbes that produce proteins responsible for extracellular hydrolysis of starch, in-cell absorption of fructose, and further decomposition into glucose [21]. In addition, strains that metabolize glycogen show important physiological functions, including use of energy storage compounds for glycogen metabolism. These pathways act as carbon pools that regulate carbon fluxes [22], and partly, this ability is attributed to CAZymes. Using comparative genome analysis of bacteria isolated from cold environments and the genetic patterns of CAZymes within them, this study provides an understanding of how survival adaptation can be achieved in extremely low-temperature environments.

Results and discussion
Profile of the complete genome of Arthrobacter sp. PAMC25564
As shown in Table 1, the complete genome of Arthrobacter sp. PAMC25564 is composed of a circular chromosome of 4,170,970 bp with a 66.74 % GC content. A total of 3,829 genes were predicted on the chromosome, of which 3,613 protein-encoding genes were functionally assigned, whereas the remaining genes were predicted as hypothetical proteins. We annotated 147 pseudogenes, 15 rRNA genes, and 51 tRNA genes distributed through the genome. Of the predicted genes, 3,449 (90.08 %) were classified into 20 functional Clusters of Orthologous Groups (COG) categories, whereas the remaining 380 (9.92 %) remained unclassified. The most numerous COG categories were function unknown (category S, 705 genes), transcription (category K, 298 genes), amino acid transport and metabolism (category E, 280 genes), carbohydrate transport and metabolism (category G, 276 genes), and energy production and conversion (category C, 259 genes) (Fig. 1). Many of these genes are related to amino acid transport, transcription, carbohydrate transport, and energy production/conversion, which suggests that this strain utilizes CAZymes for energy storage and carbohydrate metabolism. Most bacteria rely on cell respiration to catabolize carbohydrates to obtain the energy used during photosynthesis for converting carbon dioxide into carbohydrates. The energy is stored temporarily in the form of high-energy molecules such as ATP and used in several cell processes [23,24]. However, Arthrobacter is already known as a genus of bacteria that is commonly found in cold environments. All species in this genus are gram-positive obligate aerobes and as such require oxygen to grow. These organisms use oxygen to metabolize substances like sugars, polysaccharides, or fats to obtain energy through cellular respiration [14]. Therefore, based on these results, we predicted that the PAMC25564 strain could also utilize carbohydrate degradation to obtain energy.
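The COG proportions quoted above follow directly from the reported gene counts. The short Python sketch below reproduces that arithmetic; the variable and function names are illustrative only, and the per-category shares are computed relative to the COG-classified genes, which is an assumption about the denominator rather than something stated in the text.

```python
# Counts reported in the text for strain PAMC25564.
TOTAL_GENES = 3829
COG_CLASSIFIED = 3449

# Largest COG categories named in the text (category letter -> gene count).
TOP_COG = {"S": 705, "K": 298, "E": 280, "G": 276, "C": 259}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


classified_pct = percent(COG_CLASSIFIED, TOTAL_GENES)
unclassified_pct = percent(TOTAL_GENES - COG_CLASSIFIED, TOTAL_GENES)

# Assumption: shares are taken relative to the COG-classified genes.
top_share = {cat: percent(n, COG_CLASSIFIED) for cat, n in TOP_COG.items()}

if __name__ == "__main__":
    print(f"classified: {classified_pct} %, unclassified: {unclassified_pct} %")
    for cat, share in top_share.items():
        print(f"category {cat}: {share} % of classified genes")
```

The classified and unclassified percentages obtained this way agree with the 90.08 % and 9.92 % given in the text.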

16S rRNA phylogenetic analysis and average nucleotide identity (ANI) values
The identification of A. sp. PAMC25564 was verified using 16S rRNA sequence analysis (Fig. 2) [26]. In general, bacterial comparative genome analysis uses ANI methods. As shown in Fig. 3, the ANI values ranged from 70.67 to 98.46 %. Thus, the comparative genome results are much lower than the common ANI values of 92-94 %. The ANI analysis shows the average nucleotide identity of all bacterial orthologous genes that are shared between any two genomes and offers a robust resolution between bacterial strains of the same or closely related species (i.e., species showing 80-100 % ANI) [27]. However, ANI values do not represent genome evolution, because orthologous genes can vary widely between the genomes being compared. Nevertheless, ANI closely reflects the traditional microbiological concept of DNA-DNA hybridization relatedness for defining species, so many researchers use this method, since it takes into account the fluid nature of the bacterial gene pool and hence implicitly considers shared functions [28]. These results mean that the PAMC25564 strain could either belong to the species from which Arthrobacter diverged or be a new species closely related to Pseudarthrobacter. However, this study classified the strain and allocated a species through 16S rRNA sequencing and ANI. While our classification is not conclusive, the PAMC25564 strain will probably be reclassified into the genus Pseudarthrobacter in a follow-up study.
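To make the ANI concept concrete, the following Python sketch averages percent identities over orthologous fragment pairs and compares the result with a species threshold. It is a deliberate simplification: tools such as OrthoANI also fragment the genomes and find reciprocal best hits, and that step is omitted here. The function names, the example identities, and the default threshold of 94 % (the upper end of the 92-94 % range quoted above) are assumptions for illustration.

```python
def average_nucleotide_identity(fragment_identities):
    """Mean percent identity over orthologous fragment pairs (simplified ANI)."""
    if not fragment_identities:
        raise ValueError("need at least one aligned fragment pair")
    return sum(fragment_identities) / len(fragment_identities)


def same_species(ani, threshold=94.0):
    """Assumed rule of thumb: ANI at or above the threshold suggests one species."""
    return ani >= threshold


if __name__ == "__main__":
    # Hypothetical per-fragment identities (in %) between two genomes.
    identities = [98.0, 99.0, 98.5]
    ani = average_nucleotide_identity(identities)
    print(f"ANI = {ani:.2f} %, same species: {same_species(ani)}")
```

Under this rule, the highest pairwise value reported in Fig. 3 (98.46 %) would fall above the threshold and the lowest (70.67 %) well below it.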

CAZyme-encoding genes in Arthrobacter sp. PAMC25564
Among the 3,613 identified protein-encoding genes in PAMC25564, 108 were significantly annotated and classified into CAZyme groups (GH, GT, CE, AA, CBM, and PL) using dbCAN2. The results provided an insight into the carbohydrate utilization mechanisms of PAMC25564. The SignalP tool predicted that 11 of the CAZyme genes of strain PAMC25564 contain signal peptides. We found that proteins were distributed as follows: 33 GHs, 45 GTs, 23 CEs, 5 AAs, and 2 CBMs. However, no protein was assigned to the PL group. The GH gene annotations revealed that the PAMC25564 genome contains genes involved in glycogen and trehalose metabolism pathways, such as β-glucosidase (GH1) and glycogen debranching proteins (CBM48 and GH13_11) (Table 2). Previous studies showed the complex interplay of glycogen metabolism in colony development of Streptomycetes (reported only in Actinomycetes species), showing that spore germination is followed by an increase in glycogen metabolism [29]. The underlying genetic and physiological mechanisms of spore germination remain unknown, but some mechanisms associated with the accumulation of nutrients such as biomass and storage materials in the substrate mycelium during morphological phases of development have been reported [30]. However, little research has been done on whether gram-positive obligate aerobes have glycogen metabolism mechanisms. Recently, Shigella sp. PAMC28760, a pathogen isolated from Antarctica, was also reported to be able to adapt and survive in cold environments through glycogen metabolism [31]. Also, Bacillus sp. TK-2 has been reported to possess evolutionary adaptability to cold through CAZyme genes related to degradation of polysaccharides including cellulose and hemicellulose [32]. These complete genome analyses uncover genomic information and evolutionary insights regarding diverse strains and species from cold environments. 
However, characteristics of glycogen metabolism in prokaryotes remain less well studied than those in eukaryotes, and the metabolism of microorganisms isolated from cold environments is not well understood [33]. This study predicts the role of CAZymes in cold adaptation, specifically for those PAMC25564 genes involved in glycogen and trehalose metabolism.
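The class distribution reported above can be checked, and similar tallies produced from a list of family labels, with a few lines of Python. This is a minimal sketch: the helper names and the regular expression for family labels (for example GH13_11 or CBM48) are assumptions, not part of dbCAN2.

```python
import re
from collections import Counter

CAZY_CLASSES = ("GH", "GT", "CE", "AA", "CBM", "PL")

# Assumed label format: class prefix, family number, optional subfamily suffix.
_FAMILY_RE = re.compile(r"^(GH|GT|CE|AA|CBM|PL)\d+(?:_\d+)?$")


def cazy_class(family):
    """Return the CAZy class prefix of a family label such as 'GH13_11'."""
    match = _FAMILY_RE.match(family)
    if match is None:
        raise ValueError(f"unrecognized CAZy family label: {family!r}")
    return match.group(1)


def tally_classes(families):
    """Count family labels per CAZy class, reporting zero for absent classes."""
    counts = Counter(cazy_class(f) for f in families)
    return {cls: counts.get(cls, 0) for cls in CAZY_CLASSES}


# Class counts reported in the text for strain PAMC25564.
REPORTED = {"GH": 33, "GT": 45, "CE": 23, "AA": 5, "CBM": 2, "PL": 0}

if __name__ == "__main__":
    print("total CAZymes:", sum(REPORTED.values()))
    # The three families named in the text as glycogen/trehalose related.
    print(tally_classes(["GH1", "CBM48", "GH13_11"]))
```

The reported class counts sum to the 108 CAZyme genes given in the text.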
Comparison of Arthrobacter sp. PAMC25564 genome characteristics with those from closely related species
We compared CAZyme genes from Arthrobacter species to speculate about their bacterial lifestyles and identified relevant CAZymes for potential applications in biotechnology. Considering the accessibility of available genome data, the complete genomes of 26 strains were chosen for the comparative analysis of CAZymes: 19 genomes of Arthrobacter spp., one genome of A. crystallopoietes, three genomes of A. alpinus, and three genomes of Pseudarthrobacter spp. (Table 3). Our results showed that the number of total CAZymes in each genome ranged from a minimum of 56 (A. sp. YC-RL1) to a maximum of 166 (P. chlorophenolicus A6) (Fig. 4). We predicted that common CAZyme genes such as CE14, CE9, GH23, GH65, GT2, GT20, GT28, GT39, GT4, and GT51 would appear in each of the 26 genomes. However, when we compared only the strains isolated from cold environments, we found that more CAZyme genes were shared than among all 26 comparison genomes. They include CE1, CE4, CE9, CE10, CE14, AA3, AA7, CBM48, GH1, GH3, GH13, GH15, GH23, GH25, GH38, GH65, GH76, GT2, GT4, GT20, GT28, GT39, and GT51. In particular, CAZyme members GH13, GH65, GH77, GT5, and GT20 (glycogen and trehalose-related genes) are involved in energy storage. This study focuses on those genes related to adaptations in metabolism that allow the species to withstand cold environments. These genes are involved in glycogen degradation and trehalose pathways and were found in strains PAMC25564, 24S4-2, FB24, Hiyo8, KBS0702, MN05-02, PGP41, QXT-31, U41, UKPF54-2, A6, Ar51, and sphe3. These Arthrobacter species isolated from extreme environments have a family of CAZymes and related genes for proteins with a strong ability to store and release energy, and this permits them to survive in such cold areas. We found that strain PAMC25564 had the largest number of CAZyme genes related to glycogen metabolism and the trehalose pathway. 
In general, CAZymes are a large group of proteins that are mainly responsible for the degradation and biosynthesis/modification of polysaccharides, but not all the members of this group are secreted proteins. This study confirms small differences in the gene pattern of CAZymes between species (Additional file 1: Figure S1).
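The comparison above amounts to intersecting the sets of CAZyme families found in each genome: the more genomes are included, the smaller the shared set can become. The Python sketch below illustrates this with made-up data; the strain names and family sets are hypothetical placeholders and do not reproduce Table 3 or Fig. 4.

```python
def core_families(genomes):
    """Return the CAZyme families present in every genome.

    `genomes` maps a strain name to an iterable of family labels.
    """
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        return set()
    return set.intersection(*family_sets)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    toy = {
        "strain_a": {"GT2", "GT4", "GH13", "CE4"},
        "strain_b": {"GT2", "GT4", "GH13"},
        "strain_c": {"GT2", "GT4", "PL1"},
    }
    print("shared by all:", sorted(core_families(toy)))

    # Restricting the comparison to a subset can only keep or enlarge the core.
    cold_subset = {name: toy[name] for name in ("strain_a", "strain_b")}
    print("shared by subset:", sorted(core_families(cold_subset)))
```

Because a subset of genomes imposes fewer constraints, its shared set always contains the shared set of the full collection, which matches the pattern described for the cold-environment strains.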

Bacterial glycogen metabolism in a cold environment
Glycogen is an energy source for plants, animals, and bacteria and is one of the most common carbohydrates. Glycogen consists of D-glucose residues joined by α(1→4) links, and it is a structural part of cellulose and dextran [43]. Glycogen is a polymer with approximately 95 % α-1,4 linkages and 5 % α-1,6 branching linkages. In bacteria, glycogen metabolism includes five essential enzymes: ADP-glucose pyrophosphorylase (GlgC), glycogen synthase (GlgA), glycogen branching enzyme (GlgB), glycogen phosphorylase (GlgP), and glycogen debranching enzyme (GlgX) [44]. To adapt and survive in a cold environment, organisms need well-developed functional energy storage systems, one of which is glycogen synthesis. Bacteria have a passive energy-saving strategy, slow glycogen degradation, to adapt to cold environmental conditions such as nutrient deprivation. Glycogen is hypothesized to function as a long-lasting energy reserve, which has been reported as a Durable Energy Storage Mechanism (DESM) to account for the long-term survival of some bacteria in cold environments [45]. Metabolism of maltodextrin has been linked with osmoregulation and sensitivity of bacterial endogenous induction to hyperosmolarity, which is related to glycogen metabolism. Glycogen-generated maltotetraose is dynamically metabolized by maltodextrin phosphorylase (MalP) and maltodextrin glucosidase (MalZ), while 4-α-glucanotransferase (MalQ) is responsible for maltose recycling to maltodextrins [46]. Maltotetraose is produced using GlgB, MalZ, MalQ, and glucokinase (Glk), which act on maltodextrin and glucose. On the other hand, glucose-1-phosphate can be formed by MalP for glycogen synthesis or glycolysis [47]. Thus, glycogen degradation can play an essential role in bacterial adaptation to the environment. Additionally, maltose may form capsular α-glucan, which plays a role in environmental adaptation through the (TreS)-Pep2-GlgE-GlgB pathway [48,49]. 
Previous studies indicate that trehalose is involved in bacterial adaptation to temperature fluctuation, hyperosmolarity, and desiccation resistance. Recently, the accumulation of trehalose and glycogen under cold conditions in Propionibacterium freudenreichii has been reported [50,51]. Therefore, the role of glycogen in bacterial energy metabolism is closely linked to several metabolic pathways associated with bacterial persistence under environmental stresses such as starvation, drying, temperature fluctuations, and hyperosmolarity. Maltodextrin and trehalose pathways are examples of the relationship between glycogen and other metabolic pathways, as shown in Fig. 5. However, further exploration is needed to elucidate the relationship of glycogen with other compounds and the mechanisms involved in bacterial persistence strategies [46]. The comparative analysis of predicted pathways for glycogen metabolism in Arthrobacter isolates (Additional file 2: Table S1) showed that in PAMC25564 trehalose biosynthesis follows three metabolic pathways (OtsAB, TreYZ, and TreS), as in Mycobacterium [52]. The trehalose biosynthesis pathway is well known in numerous bacteria, for example, as a defense strategy involving the accumulation of trehalose. Three metabolic pathways to regulate osmotic stress have been reported in Corynebacterium glutamicum [53]. These three metabolic pathways are used for producing trehalose in C. glutamicum, where the genes galU/otsAB allow trehalose levels to increase up to sixfold [54,55]. This pathway was found in A. sp. PAMC25564, and it was predicted that such an isolate could produce energy in cold environments.
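Checking which of the three trehalose biosynthesis routes is complete in an annotated genome reduces to a subset test on gene names. The Python sketch below assumes that each route requires exactly the genes its name suggests (otsA and otsB, treY and treZ, treS); this mapping and the function name are simplifying assumptions and do not reproduce the annotation pipeline used in this study.

```python
# Assumed minimal gene sets for the three trehalose biosynthesis routes.
TREHALOSE_PATHWAYS = {
    "OtsAB": {"otsA", "otsB"},
    "TreYZ": {"treY", "treZ"},
    "TreS": {"treS"},
}


def complete_pathways(genes):
    """List the routes whose required genes are all present in `genes`."""
    present = set(genes)
    return [
        name
        for name, required in TREHALOSE_PATHWAYS.items()
        if required <= present
    ]


if __name__ == "__main__":
    # A genome carrying all five genes has three complete routes.
    print(complete_pathways({"otsA", "otsB", "treY", "treZ", "treS"}))
    # Without otsB, the OtsAB route is incomplete.
    print(complete_pathways({"otsA", "treY", "treZ", "treS"}))
```

A genome lacking a single required gene, such as otsB, drops the corresponding route from the list while the other routes remain available.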

Glycogen metabolism and the trehalose pathway in Arthrobacter species
We investigated the glycogen metabolic pathways in each Arthrobacter species strain (Fig. 6). To determine the three pathways of glycogen metabolism and trehalose biosynthesis in Arthrobacter species coming from diverse environments, the level of dissimilarity was analyzed based on the composition of GH, GT, and other major enzymes from the 16 genomes. The analysis showed that only QXT-31, U41, and PGP41 shared the same genes and pathway as our strain, while other strains have a slightly different pattern. This study assumed that the PAMC25564 strain uses different pathways to obtain energy or degrade polysaccharides. Based on the above-mentioned pathway-related genes, we confirmed that strains YN, Rue61, PAMC25486, ZXY-2, ERGS1:01, YC-RL1, ATCC21022, R.3.8, and A3 lack the malQ gene, which is responsible for maltose recycling to maltodextrins. These strains also showed a low number of genes for the three main pathways of trehalose biosynthesis. Therefore, the energy supply may be compromised in such isolates. Although most strains showed galU/otsAB genes, strains Hiyo8 and ERGS1:01 lack the otsB gene (Fig. 6; Additional file 2: Table S1). This result suggests that these strains would produce a significantly lower amount of trehalose than the isolates having the otsB gene. Additionally, we investigated the phosphotransferase system-related genes in strains R.3.8 and A3. These enzymes constitute another method used by bacteria for sugar uptake when the source of energy is phosphoenolpyruvate. As a result, these two strains probably produce polysaccharides by themselves or from an external source using phosphoenolpyruvate rather than consuming energy. Most of the compared strains were isolated in low-temperature (-18 to 15 ℃) and low-contamination environments. 
These species have been reported as promising strains due to their fast adaptability to the environment, and these results suggest that this adaptability is related to glycogen metabolism, trehalose, and maltodextrin pathways, which may have an impact on industrial applications. Microorganisms with related genes can make trehalose production economical and are able to draw on their own energy. This result also suggests that trehalose metabolism in microorganisms depends on the requirements of bacterial metabolism in the given environmental conditions, and this is one of the characteristics of bacteria that grow in extreme environments such as the cold.
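The gene absences described in this section can be encoded as a small presence/absence lookup using the + and - symbols of Additional file 2: Table S1. The Python sketch below records only the absences stated in the text; a "+" therefore means only that the text does not list the strain as lacking the gene, which is an assumption rather than a confirmed annotation. The data structure and function name are illustrative.

```python
# Strains that the text reports as lacking each gene.
LACKS = {
    "malQ": {
        "YN", "Rue61", "PAMC25486", "ZXY-2", "ERGS1:01",
        "YC-RL1", "ATCC21022", "R.3.8", "A3",
    },
    "otsB": {"Hiyo8", "ERGS1:01"},
}


def presence_row(strain, genes=("malQ", "otsB")):
    """Return '+'/'-' per gene; '+' only means 'not reported as absent'."""
    return {gene: "-" if strain in LACKS[gene] else "+" for gene in genes}


if __name__ == "__main__":
    for strain in ("PAMC25564", "Hiyo8", "ERGS1:01", "A3"):
        row = presence_row(strain)
        print(f"{strain:10s} malQ {row['malQ']}  otsB {row['otsB']}")
```

Of the strains named in the text, only ERGS1:01 appears in both lists and so lacks both malQ and otsB.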

Conclusions
In this study, we elucidated the complete genome sequence of Arthrobacter sp. PAMC25564 and conducted a comparative genome analysis with other species to study CAZyme patterns. We isolated bacteria from cryoconite under laboratory conditions and confirmed that the isolate is an Arthrobacter species, based on the analysis of 16S rRNA sequences. Although the isolation of this species from extreme or contaminated environments has been reported previously, there are no reports on the use of CAZymes in cold environments. We predicted that Arthrobacter sp. PAMC25564 could produce energy autonomously as fast as it can adapt to the environment. The PAMC25564 strain genome is 4.17 Mb in size with a GC content of 66.74 %. The analysis of its complete genome suggested that the isolate has glycogen, trehalose, and maltodextrin pathways associated with CAZyme genes. We confirmed that PAMC25564 has 108 active CAZyme genes from the following groups: 5 AA, 2 CBM, 23 CE, 33 GH, and 45 GT. In addition, a comparative genome analysis of Arthrobacter species revealed that they adapt quickly to the environment. In conclusion, we expect the genome sequence analysis to provide valuable information regarding novel functional enzymes, especially CAZymes, which are active at low temperatures and can be used for biotechnological applications and fundamental research purposes. This study provides a foundation to understand how the PAMC25564 strain produces energy in an extreme environment.

Methods
Isolation and genomic DNA extraction of Arthrobacter sp. PAMC25564 The

Genome annotation of Arthrobacter sp. PAMC25564
The genome of strain PAMC25564 was annotated using the rapid annotation subsystem technology (RAST) server [57]. The predicted gene sequences were translated and searched in the National Center for Biotechnology Information (NCBI) non-redundant database, the COG from the eggNOG v.4.5.1 database [58], and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database with a cutoff value of 0.01 [59]. A circular map of the PAMC25564 genome was prepared using the CGView comparison tool [60]. CAZyme gene analyses were carried out by running dbCAN tool [61] scans using hidden Markov model (HMM) profiles downloaded from dbCAN2 HMMdb (version 7.0). At the same time, we obtained results from SignalP (version 4.0) about the presence of signal peptides in CAZyme genes [62]. The e-value cutoff was 1e-15 and the coverage cutoff was > 0.35. In addition, we used DIAMOND [63] (e-value < 1e-102) and Hotpep [64] (frequency > 2.6, hits > 6) to improve the prediction accuracy.
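The HMM cutoffs stated above can be expressed as a simple filter over hit records. The Python sketch below is illustrative only: the record type, its field names, and the example hits are hypothetical, the comparison operators (strict inequalities) are an assumption, and the code does not parse actual dbCAN2 output.

```python
from dataclasses import dataclass

# Cutoffs taken from the text; strict inequalities are an assumption.
HMM_EVALUE_MAX = 1e-15
HMM_COVERAGE_MIN = 0.35


@dataclass
class HmmHit:
    """Hypothetical record for one HMM hit (not the dbCAN2 output format)."""
    gene: str
    family: str
    evalue: float
    coverage: float


def passes_cutoffs(hit):
    """Keep a hit only if it satisfies both the e-value and coverage cutoffs."""
    return hit.evalue < HMM_EVALUE_MAX and hit.coverage > HMM_COVERAGE_MIN


if __name__ == "__main__":
    # Made-up hits for illustration.
    hits = [
        HmmHit("gene_0001", "GH13", 1e-50, 0.90),
        HmmHit("gene_0002", "GT4", 1e-10, 0.80),   # e-value too high
        HmmHit("gene_0003", "CE4", 1e-30, 0.20),   # coverage too low
    ]
    kept = [hit.gene for hit in hits if passes_cutoffs(hit)]
    print(kept)
```

Requiring both conditions at once favors precision: a hit with a strong e-value but low coverage of the HMM profile is discarded rather than counted as a CAZyme.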

Phylogenetic analysis
Strain PAMC25564 was compared with other Arthrobacter species using 16S rRNA phylogenetic analysis. Alignments were performed using the Basic Local Alignment Search Tool from the NCBI database and analyzed using EzBioCloud (www.ezbiocloud.com). 16S rRNA sequences were aligned using MUSCLE [65,66], and MEGA X [67] was used to reconstruct a neighbor-joining tree and a maximum likelihood tree with 1,000 bootstrap replications.

Comparative genomics of Arthrobacter sp. PAMC25564
We used all complete genome sequences of Arthrobacter species available in GenBank (https://www.ncbi.nlm.nih.gov). First, we determined the relationship of PAMC25564 with other strains from the same species using complete genome sequences and checked their similarity by comparing ANI values calculated using OrthoANI in OAT (the Orthologous Average Nucleotide Identity Tool) [68]. The genome information of several Arthrobacter species is available in GenBank, and we compared the CAZymes from registered species referenced in CAZy (http://www.cazy.org). Strains with similarity were first selected based on the 16S rRNA sequence, and then 25 strains with a complete genome were chosen. All those sequences were downloaded from the database, and all CAZymes were reannotated using the dbCAN2 server. Some of these strains (PAMC25486, ERGS1:01, and ERGS4:06) were reported to have been isolated from cold environments such as Spitsbergen in the Arctic and Himalayan glaciers.
Table 1. Comparative analysis of predicted pathways for glycogen and trehalose metabolism in Arthrobacter species. The symbol + indicates that the isolate produces the enzyme, and the symbol - indicates that it does not.