Genome sequencing of the neotype strain CBS 554.65 reveals the MAT1–2 locus of Aspergillus niger

Background Aspergillus niger is a ubiquitous filamentous fungus widely employed as a cell factory thanks to its ability to produce a wide range of organic acids and enzymes. Its genome was one of the first Aspergillus genomes to be sequenced in 2007, due to its economic importance and its role as a model organism to study fungal fermentation. Nowadays, the genome sequences of more than 20 A. niger strains are available. These, however, do not include the neotype strain CBS 554.65. Results The genome of CBS 554.65 was sequenced with PacBio. A high-quality nuclear genome sequence consisting of 17 contigs with an N50 value of 4.07 Mbp was obtained. The assembly covered all 8 centromeric regions of the chromosomes. In addition, a complete circular mitochondrial DNA assembly was obtained. Bioinformatic analyses revealed the presence of a MAT1-2-1 gene in this genome, in contrast to the most commonly used A. niger strains, such as ATCC 1015 and CBS 513.88, which contain a MAT1-1-1 gene. A nucleotide alignment showed a different orientation of the MAT1–1 locus of ATCC 1015 compared to the MAT1–2 locus of CBS 554.65, relative to conserved genes flanking the MAT locus. Among 24 newly sequenced isolates of A. niger, half had a MAT1–1 locus and the other half a MAT1–2 locus. The genomic organization of the MAT1–2 locus in CBS 554.65 is similar to that of other Aspergillus species. In contrast, the region comprising the MAT1–1 locus is flipped in all sequenced strains of A. niger. Conclusions This study, besides providing a high-quality genome sequence of an important A. niger strain, suggests the occurrence of genetic flipping or switching events at the MAT1–1 locus of A. niger. These results provide new insights into the mating system of A. niger and could contribute to the investigation and potential discovery of sexuality in this species long thought to be asexual.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07990-8.

ATCC 1015 as reference and then by submitting the genome sequence of CBS 554.65. The genome assembly has been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB42544.
PCRs were performed on the genomic DNA of CBS 554.65 to amplify 1756 bp in the left region (with primers chr5_left_fwd: ACTTATCCCTCGTCAATGA and chr5_left_rev: GGTCGACTTTTTGGGGAAA) and 1638 bp in the right region (with primers chr5_right_fwd_1: TTCTCCATATTGTCAGCCAT and chr5_right_rev_1: CATCGCTTCTTTTCCTCGGA) of chr5_00008F. PCR products were sequenced by Microsynth AG. The MAT locus sequences of 24 A. niger isolates were extracted from complete genome sequences obtained with Illumina technology and assembled using SPAdes [20] (unpublished data). In 18 of the 24 A. niger isolates the MAT locus was distributed over multiple scaffolds. To verify the location and orientation of the MAT genes in these strains, diagnostic PCRs followed by sequencing were performed to close the gaps within the in silico assembled MAT locus. Primers used for gap closure are listed in Table S2 (Additional File 2).
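The expected product size of a primer pair such as those listed above can be checked in silico before running the PCR. The following is a minimal sketch, assuming exact primer matches on a single template; the function names and the toy template are hypothetical, and the real chr5_00008F contig sequence is not reproduced here.

```python
# Minimal in silico PCR sketch (exact matching only, no mismatches allowed).
# Primer sequences are taken from the text; the template is a made-up toy.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd: str, rev: str):
    """Length of the product spanned by fwd and rev primers, or None.

    The forward primer is searched on the given strand; the reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


CHR5_LEFT_FWD = "ACTTATCCCTCGTCAATGA"
CHR5_LEFT_REV = "GGTCGACTTTTTGGGGAAA"

# Hypothetical toy template: 7 bp flank, forward primer site, 100 bp insert,
# reverse primer site, 6 bp flank.
toy_template = (
    "GATTACA" + CHR5_LEFT_FWD + "ACGT" * 25
    + reverse_complement(CHR5_LEFT_REV) + "TTGACC"
)

print(amplicon_length(toy_template, CHR5_LEFT_FWD, CHR5_LEFT_REV))
```

On the real contig, the same function would be expected to return the 1756 bp reported for the left region, provided both primers match exactly once.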

In silico analyses
The genome sequences of strains ATCC 1015 and NRRL3 were retrieved from JGI [21]. The position of the MAT genes within the MAT locus was analyzed with FungiDB [22] for strains for which a complete genome sequence is available, or with BLAST against the whole-genome shotgun contig database (wgs) of A. niger for A. welwitschiae strains. Sequence analyses and alignments were performed with CLC Main Workbench 8.0.1 (QIAGEN). Homologues of the MAT genes in the 24 A. niger isolates were identified by local BlastN searches using the genes obtained from CBS 554.65 and ATCC 1015 as queries. The sequences of the assembled MAT loci have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB42577.

Results and Discussion
Morphology of strain CBS 554.65
The strain CBS 554.65 is the A. niger neotype, a reference strain for morphological and taxonomical analyses. The morphology of this strain grown on minimal medium and malt extract agar is shown in Fig. 1. On both media CBS 554.65 forms abundant conidia, black on minimal medium and dark brown on malt extract agar.

Genome sequence and analysis
The genome sequencing of the neotype strain CBS 554.65 yielded 5.3 Gbp in 287,000 subreads. The mean length was 18.4 kbp for the longest subreads and half of the data was in reads longer than 29 kbp. The assembly consisted of 17 contigs with a total of 40 Mbp and 55.2-fold coverage. Half of the genome is contained in 4 scaffolds (L50), the smallest of which has a length of 4.07 Mbp (N50). The GC content is 50.3%. The nuclear genome was annotated with Augustus, using the genome of strain ATCC 1015 as reference. Based on this automated annotation, 12,240 protein-coding genes were predicted.
Table 1 (fragment): strains compared include ATCC 1015 [5] and NRRL3 [23,24], with a further entry cited as [4,5]; genome size (Mb): 40.
… against the complete genome of strain NRRL3 showed that the putative centromeres are almost completely lacking from the genome assembly of NRRL3 (Figure 2, grey areas in the BLAST graph). Although difficult to identify, centromeric regions in filamentous fungi are composed of complex and heterogeneous AT-rich sequences which can stretch up to 450 kb [26,27]. Due to the likely presence of near-identical long repeats, centromeres are difficult to sequence and assemble [27], which explains why they are lacking in strain NRRL3. Transposons and retrotransposons have been identified in the centromeres of other eukaryotes, including fungi [26,28]. The BLAST analysis against NRRL3 showed that, besides the putative centromeric regions, other large regions of the CBS 554.65 genome have no homologous counterpart in NRRL3, explaining the difference in size between the strains. To confirm that these unique regions are not artifacts, the sequencing reads of CBS 554.65 were remapped to the genome. In total, 192,283 reads were remapped to the genome and the mean length of the remapped reads was 15,215.97 bp (see total coverage graph in Figure 2). Sequencing of the PCR products confirmed the sequence obtained by genome sequencing.
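The N50 and L50 values quoted for the assembly follow from a simple cumulative sum over the sorted contig lengths. The sketch below illustrates the calculation; the function name is hypothetical and the example lengths are toy values, not the contig lengths of the CBS 554.65 assembly.

```python
# N50/L50 sketch: sort contigs from longest to shortest and walk down the
# list until at least half of the total assembly length has been covered.

def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the smallest contig needed to reach 50% (N50),
            # 'count' is how many contigs were needed (L50).
            return length, count
    raise ValueError("no contig lengths given")


# Toy lengths (arbitrary units), chosen only to illustrate the calculation.
toy_lengths = [8, 6, 5, 4, 3, 2, 1, 1]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Applied to the 17 contig lengths of the assembly, this calculation would be expected to give the reported L50 of 4 and N50 of 4.07 Mbp.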
The longer reads obtained with PacBio sequencing make it possible to span repetitive sequences that are probably missing from previous genome sequences of A. niger obtained with Illumina, explaining the observed difference in genome size. The number of protein-coding genes in CBS 554.65 is in the range found for ATCC 1015 and NRRL3. The large difference in the number of protein-coding genes in strain CBS 513.88 is likely caused by overprediction, as previously suggested [5].

Mitochondrial DNA
Many genome projects focus on the nuclear genome, while the mitochondrial DNA is often neglected. In A. niger only one mitochondrial DNA (mtDNA) assembly has been reported, for the strain N909 [30]. In this study, the mtDNA of strain CBS 554.65 was de novo assembled from PacBio reads as a circular DNA with a length of 31,363 bp. MtDNA is abundant in whole-genome sequencing projects and the read coverage of the assembly (average: 1,220x, min: 328x, max: 1,674x) is thus higher than for the nuclear genome. In total 18 ORFs, 26 tRNA and 2 rRNA sequences were annotated (Fig. 3). All 15 core mitochondrial genes reported for Aspergillus species were identified with a comparable gene organization [31]. In addition, three accessory genes, orf1L, orf3 and endo1, were annotated. The gene endo1 is located in the intron of cox1 and encodes a putative homing endonuclease belonging to the LAGLIDADG family, frequently found in the cox1 intron of other filamentous fungi [31]. The gene orf3 encodes a hypothetical protein of 191 residues, which is also present in the mtDNA of strain N909 but was not annotated there.
Surprisingly, this unknown protein has a good hit against an unknown protein of Staphylococcus aureus (99% identity), but not against other proteins of Aspergillus species. In A. niger strain N909 two other unknown proteins are encoded by orf1 and orf2. In A. niger CBS 554.65 these two open reading frames are joined into a single long one, yielding a potential protein product of 739 amino acid residues. This is comparable to an open reading frame located at the same position, between nad1 and nad4, in the mtDNA of A. flavus NRRL 3357 (AFLA_m0040), with a size of 667 amino acid residues. In the N-terminal region of both putative proteins, transmembrane-spanning regions can be predicted, suggesting a location in a mitochondrial membrane; the C-terminal regions, however, are not conserved between the A. niger and A. flavus proteins. We suggest using the mitochondrial assembly of CBS 554.65 as the reference sequence for A. niger mitochondria, because strain N909 is known to be resistant to oligomycin. This resistance is typically linked to mutations in the mtDNA, either in atp6 or atp9, and indeed two mutations are found in atp6 of strain N909 (L26W and S173L).

Discovery and sequencing of a MAT1-2 A. niger strain
The genome sequencing and analysis of strain CBS 554.65 made it possible to determine the mating type of this strain. The sequence of the putative MAT1-2-1 gene (g9041) was searched against the whole nucleotide database using BlastN, giving as hits the mating-type HMG-box protein MAT1-2-1 of other aspergilli, including A. neoniger (with an identity of 93.25%) and A. tubingensis (with an identity of 93.07%). As such, we consider gene g9041 to be homologous to the MAT1-2-1 gene of other Aspergillus species.
This is in line with a previous study which indicated the presence of a MAT1-2-1 sequence in the CBS 554.65 strain through a PCR approach [8]. Here we report the first complete genome sequence of an A. niger strain having a MAT1-2-1 gene. The availability of this genome sequence represents an important tool for further studies investigating the sexual potential of A. niger. The presence of both opposite mating-type genes in different strains belonging to the same species represents a strong hint of a sexual lifestyle [10].
… ochraceoroseus. Aspni7|1160288 has a domain with a predicted role in proteolysis and its homolog in other aspergilli is present at another genomic locus, not in proximity to the MAT gene. A homolog of gene g9046 was found by BlastN search in Aspergillus vadensis, at a different location in the genome than the MAT locus. These results suggest that these unique genes are likely not part of the "core" MAT locus. The gene g9040-2 is a putative homolog of the MAT1-2-4 gene in A. fumigatus, an additional mating-type gene required for mating and cleistothecia formation [32]. Another difference between ATCC 1015 and CBS 554.65 concerns the gene putatively encoding a HAD-like protein. While this gene is complete in CBS 554.65 (g9045), it appears disrupted in ATCC 1015 and is, therefore, annotated twice in this strain (Aspni7|1095364 and Aspni7|1128138). The other genes present in the selected genomic region show a high level of conservation, with higher synteny further away from the MAT genes (genes in the purple and blue boxes). Moreover, genes encoding the DNA lyase apnB, the cytoskeleton assembly control factor slaB and the anaphase-promoting complex subunit apcE are present in both MAT loci. These genes are normally found in the MAT loci of other fungi, including yeast [17], and their presence in the MAT loci of A. niger further confirms the high level of conservation characterizing this locus.
In heterothallic ascomycetes the MAT genes are commonly located between the genes apnB and slaB [17]. From the alignment in Fig. 4 the position of the MAT genes relative to apnB and slaB can be analyzed. In CBS 554.65 the MAT1-2-1 gene (g9041) is flanked upstream by apnB and, seven genes downstream, by slaB. In contrast, in the MAT1-1 locus of strain ATCC 1015 the MAT gene is flanked downstream by apnB and upstream by a conserved sequence including adeA, while slaB is found on the same side as apnB. The entire genomic locus, containing the MAT1-1-1 gene and eight other genes (23 kbp, indicated by the red arrow in Fig. 4), shows a flipped orientation compared to the corresponding locus in CBS 554.65 containing the MAT1-2-1 gene (indicated by an orange arrow in Fig. 4). The ORF direction of the conserved genes apnB, coxM and apcE additionally confirms the different orientation of this locus in the two strains. By sequence analysis, a repetitive 7 bp DNA motif (5′-TTACACT) was found in the MAT1-1 locus (orange triangles in Fig. 4) at the sites where the homology between the MAT1-1 and MAT1-2 loci breaks (in proximity to adeA and slaB). An additional site of this motif was found in the gene encoding a HAD-like hydrolase (Aspni7|1128138). This motif is present at similar positions in two other sequenced MAT1-1 strains of A. niger (N402, CBS 513.88). In contrast, the MAT1-2 strain presents this motif only at the site close to the adeA gene and in the putative HAD-like hydrolase gene (g9045), but not at the site close to the slaB gene. Methods to identify the opposite mating type in natural isolates often rely on primers designed to bind to apnB and slaB, since these are the genes that commonly flank the MAT gene itself [33,34]. In A. niger strains of both mating types, slaB is found more than 12 kbp away from the MAT gene, which might help explain why the MAT1-2 locus was never previously described for this species, with only one study mentioning it [8].
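Locating a short motif such as 5′-TTACACT in a locus sequence amounts to an exact string search on both strands. The following is a minimal sketch of such a search, not the procedure used in the study (which relied on CLC Main Workbench); the function names and the toy sequence are hypothetical.

```python
# Exact motif search on both strands. Positions are 0-based and always refer
# to the forward strand; '-' hits are where the reverse complement occurs.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = "TTACACT"):
    """Return a sorted list of (position, strand) for every motif occurrence."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        pos = seq.find(pattern)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(pattern, pos + 1)  # step by 1 to allow overlaps
    return sorted(hits)


# Hypothetical toy sequence with two forward-strand sites and one
# reverse-strand site (AGTGTAA is the reverse complement of TTACACT).
toy_locus = "GG" + "TTACACT" + "CCC" + "AGTGTAA" + "GG" + "TTACACT"
for position, strand in find_motif(toy_locus):
    print(position, strand)
```

Run on the extracted MAT1-1 and MAT1-2 locus sequences, the reported sites near adeA, slaB and the HAD-like hydrolase gene would be expected among the hits, although a 7 bp motif will also occur by chance elsewhere in a region of this size.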
Not only the particular orientation of the MAT locus but also the presence of a repetitive motif in the MAT loci suggests that a genetic switch or a flipping event might have occurred or is still ongoing in A. niger, which might affect the expression of the MAT genes. Genetic switching events at the MAT locus are known for other ascomycetes, particularly yeasts. For instance, in S. cerevisiae a switching mechanism involving an endonuclease and two inactive but intact copies of the MAT genes allows the cell to switch its mating type [35]. In the methylotrophic yeasts Komagataella phaffii and Ogataea polymorpha, expression of the MAT gene is instead regulated via a flip/flop mechanism [36,37]. In these species, a 19 kbp sequence including both mating-type genes is flipped so that one MAT gene lies close to the centromere (5 kbp from the centromere) and is, therefore, silenced, while the other is transcribed. In CBS 554.65 the region comprising the MAT1-2-1 gene is located around 280 kbp downstream of the putative centromere, which is much further away than observed for K. phaffii and O. polymorpha. However, in certain basidiomycetes, such as Microbotryum saponariae and Microbotryum lagerheimii, the mating-type locus HD (containing the homeodomain genes) is around 150 kbp distant from the centromere and linked to it [38]. It was proposed that the proximity to the centromere in these species might be enough to reduce recombination events [38]. The effect of the distance between the centromere and the MAT genes in A. niger merits further attention, especially in view of a potential sexual cycle in this species.
Inversions at the MAT locus have been described for certain homothallic filamentous fungi such as Sclerotinia sclerotiorum and Sclerotinia minor [39,40]. Field analysis of a large number of isolates showed that strains belonging to these species can present either a non-inverted or an inverted MAT locus. In the inverted orientation two of the four MAT genes at the locus have the opposite orientation and one gene is truncated. In the case of S. sclerotiorum, differences in gene expression were observed between inverted and non-inverted strains. This inversion, induced by crossing-over between two identical inverted repeats present in the locus, likely happens during the sexual cycle before meiosis [39]. The analysis of a larger number of A. niger natural isolates is required to investigate whether opposite orientations of both MAT loci exist for this species as well and what the implications of such inversions might be. Chromosomal inversions are considered to prevent recombination between sex-determining genes in higher eukaryotes, such as animals and plants [41]. Further studies are therefore required to investigate whether a mechanism similar to those already described in other fungal species also operates in A. niger, which might help to explain the difficulty in establishing whether this species has a sexual cycle.

Genetic comparison of MAT loci in different aspergilli and additional A. niger strains
Due to the particular configuration observed in this study for the MAT1-1 locus of strain ATCC 1015, the orientation of the MAT locus of additional Aspergillus species for which a genome sequence is available was analyzed (Table 3). Firstly, the genes adeA and slaB were retrieved because they are conserved and often found at the right and left flank of the MAT gene, respectively (Fig. 4). Subsequently, the position of the MAT gene was checked relative to the three conserved genes apnB, coxM and apcE. The MAT gene could be located either between adeA and apnB, as in ATCC 1015 (flipped position), or between apnB and slaB, as in CBS 554.65 (conserved position). The results of this analysis are reported in Table 3. A complete table with the identifiers of all genes analyzed is reported in Additional file 4.
Table 3. MAT genes located between adeA and apnB have a flipped orientation, while MAT genes located between apnB and slaB have a conserved orientation. Aspergillus species are grouped in sections based on the most recent classification [50]. For each species it is indicated whether a sexual cycle has been reported.
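The decision rule described above (MAT gene between adeA and apnB means flipped; between apnB and slaB means conserved) can be written down compactly. The sketch below is only an illustration of that rule: the function name is hypothetical, and the gene orders are simplified from the description of Fig. 4 rather than taken from the actual annotations.

```python
# Classify the MAT locus orientation from the linear order of marker genes.
# Assumption: the input is a list of gene names in their order along the
# locus; intervening genes may be present and are simply ignored.

def classify_mat_orientation(gene_order, mat_gene):
    """Return 'flipped', 'conserved' or 'undetermined'."""
    index = {gene: i for i, gene in enumerate(gene_order)}

    def mat_between(left, right):
        if left not in index or right not in index or mat_gene not in index:
            return False
        low, high = sorted((index[left], index[right]))
        return low < index[mat_gene] < high

    flipped = mat_between("adeA", "apnB")
    conserved = mat_between("apnB", "slaB")
    if flipped and not conserved:
        return "flipped"
    if conserved and not flipped:
        return "conserved"
    return "undetermined"


# Simplified, hypothetical gene orders based on the description in the text.
atcc_1015 = ["adeA", "MAT1-1-1", "apnB", "slaB"]
cbs_554_65 = ["adeA", "apnB", "MAT1-2-1", "slaB"]

print(classify_mat_orientation(atcc_1015, "MAT1-1-1"))
print(classify_mat_orientation(cbs_554_65, "MAT1-2-1"))
```

The 'undetermined' outcome covers cases where a marker gene is missing from the scaffold or where both conditions hold at once, which is the situation that called for diagnostic PCRs in isolates with a fragmented MAT locus.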
In the analyzed Aspergillus sequences the MAT gene (either MAT1-1-1 or MAT1-2-1) was mostly found between the genes apnB and slaB, as in CBS 554.65 (conserved position). … Fig. 4 is a peculiar feature of A. niger and its close relative A. welwitschiae. Despite this unusual orientation, the presence of a 1:1 MAT1-1:MAT1-2 ratio among 24 randomly selected natural A. niger isolates is an important observation, which suggests that sexual reproduction is occurring in this species. Moreover, A. niger was previously shown to be able to form sclerotia [51–54], an important prerequisite for sexual development in closely related species. Therefore, further research should focus on the possibility of efficiently inducing a sexual cycle in A. niger.

Conclusions
The annotated genome sequence of CBS 554.65, the A. niger neotype strain, represents an important tool for further studies, considering the high quality of this genome sequence, which covers all 8 centromeres and includes a complete mtDNA sequence. The analysis of this genome revealed the presence of a second mating-type locus (MAT1-2) in this strain, making it suitable for investigating fungal development in A. niger. The position and orientation of the MAT1-2-1 gene of A. niger, both in the CBS 554.65 strain and in 10 natural isolates, were found to be similar to those of other aspergilli, with the MAT gene located between the genes apnB and slaB. In contrast, the unusual position of the MAT1-1-1 gene found in the ATCC 1015 strain and in 12 other analyzed natural isolates might indicate that flipping or switching events occurred at the MAT locus. Further research is required to investigate whether this difference in the position of the MAT genes in strains of opposite mating type could affect the expression of the genes included in this genomic region and, therefore, the ability of A. niger to reproduce sexually.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.
Availability of data and material