A BAC based physical map and genome survey of the rice false smut fungus Villosiclava virens
BMC Genomics volume 14, Article number: 883 (2013)
Rice false smut caused by Villosiclava virens is a devastating fungal disease that spreads in major rice-growing regions throughout the world. However, the genomic information for this fungal pathogen is limited and the pathogenic mechanism of this disease is still not clear. To facilitate genetic, molecular and genomic studies of this fungal pathogen, we constructed the first BAC-based physical map and performed the first genome survey for this species.
High molecular weight genomic DNA was isolated from young mycelia of the Villosiclava virens strain UV-8b and a high-quality, large-insert and deep-coverage Bacterial Artificial Chromosome (BAC) library was constructed with the restriction enzyme HindIII. The BAC library consisted of 5,760 clones, which covers 22.7-fold of the UV-8b genome, with an average insert size of 140 kb and an empty clone rate of lower than 1%. BAC fingerprinting generated successful fingerprints for 2,290 BAC clones. Using the fingerprints, a whole genome-wide BAC physical map was constructed that contained 194 contigs (2,035 clones) spanning 51.2 Mb in physical length. Bidirectional-end sequencing of 4,512 BAC clones generated 6,560 high quality BAC end sequences (BESs), with a total length of 3,030,658 bp, representing 8.54% of the genome sequence. Analysis of the BESs revealed general genome information, including 51.52% GC content, 22.51% repetitive sequences, 376.12/Mb simple sequence repeat (SSR) density and approximately 36.01% coding regions. Sequence comparisons to other available fungal genome sequences through BESs showed high similarities to Metarhizium anisopliae, Trichoderma reesei, Nectria haematococca and Cordyceps militaris, which were generally in agreement with the 18S rRNA gene analysis results.
This study provides the first BAC-based physical map and genome information for the important rice fungal pathogen Villosiclava virens. The BAC clones, physical map and genome information will serve as fundamental resources to accelerate the genetic, molecular and genomic studies of this pathogen, including positional cloning, comparative genomic analysis and whole genome sequencing. The BAC library and physical map have been opened to researchers as public genomic resources (http://gresource.hzau.edu.cn/resource/resource.html).
Rice false smut caused by Villosiclava virens (Cooke Tak) (anamorph Ustilaginoidea virens) [1, 2] has emerged as a devastating disease in rice, due to the intense application of nitrogen and phosphorus fertilizers and the cultivation of high-yielding semi-dwarf rice cultivars worldwide. Previously, rice false smut was considered a minor rice disease because of its rare occurrence in limited regions, but it has spread widely in the last 20 years and has become a severely devastating disease in many major rice-growing regions, including Asia, Africa, the United States, South America and Italy [3, 4]. Rice false smut dramatically damaged rice production in 1988 and has continued to occur frequently. The ustiloxin produced by this pathogen in infected plant tissues is a cyclopeptide mycotoxin that inhibits the growth of microtubules and is toxic to humans and livestock.
To date, the knowledge of V. virens is still very limited. Ashizawa et al. reported a sensitive method to quantify V. virens pathogens in soil samples using real-time PCR . Ladhalakshmi et al. studied the intensity of rice false smut in India and found that the percentage of false smut-infected tillers ranged from 5% to 85% in the southern states, and 2% to 75% in northern states . Atia et al. first investigated rice false smut in Egypt and reported that the production loss caused by this pathogen ranged from 1.0% to 10.9% . Tanaka et al. established a simple transformation system of this pathogen using electroporation of intact conidial cells . Fu et al. described the morphologic characteristics more precisely .
At present, control of rice false smut is far from ideal. To find effective and environmentally friendly control methods, more morphological, molecular, genomic and genetic data on V. virens are required to reveal the infection process, the interaction mechanism between host and pathogen, the genetic variation and diversity, and the genome composition of this species.
BAC libraries, physical maps and BESs serve as important tools in genetic, molecular and genomic studies. BAC libraries are used as templates in targeted or whole genome sequencing, physical map construction and functional complementation of genes in positional cloning. Physical maps provide frames for genome sequencing and physical positions of genes and markers. BESs are accurate and inexpensive genome samples , from which initial insights into the genome composition and candidates of molecular markers can be obtained [10, 11]. The combined resources of BAC library, physical map and BESs of a genome play even more powerful roles synergistically in the above mentioned and extended research fields. BAC clones will largely increase the value and utility in targeted genome sequencing and positional cloning when mapped on a physical map. BESs embedded in physical map can be used as anchors in genome comparisons to detect sequence assembly errors of the same source genome and large structural changes of phylogenetically close genomes [12, 13].
BAC libraries and physical maps have been constructed for several agriculturally important fungal organisms, such as Magnaporthe oryzae[14, 15], Blumeria graminis, Fusarium graminearum, Cryptococcus neoformans, Trichoderma reesei and Ustilago maydis. We recently constructed a BAC library for a V. virens strain, UV-2 . The BAC library contains 10,368 clones and has an average insert size of 124.4 kb. However, no physical map was constructed and no BESs were produced with this BAC library. Here we report the construction of a BAC-based physical map and genome survey of the V. virens strain UV-8b. To our knowledge, this is the first physical map and genome sequence information developed for V. virens.
Phylogenetic analysis of strain UV-8b
The V. virens strain UV-8b was a single spore isolated from Japonica rice Zhonghua 11. To analyze the phylogenetic relationship between this strain and other fungal pathogens, we sequenced its 18S rRNA gene and compared it with other fungal 18S rRNA gene sequences. The 18S rRNA gene sequence of UV-8b showed a 99% identity to those of other V. virens strains and the phylogenetic tree constructed with the NJ algorithm clustered UV-8b into V. virens clade (Additional file 1: Figure S1). The UV-8b strain is also related to the members of Metarhizium, Trichoderma and Cordyceps (about 98% identities among 18S rRNA gene sequences).
BAC library construction, fingerprinting and contig assembly
To obtain basic genome resources for V. virens, we constructed a BAC library and a BAC-based physical map of the V. virens strain UV-8b. The BAC library consists of 5,760 clones arrayed in fifteen 384-well plates. Analysis of 180 random BAC clones showed that the library had an average insert size of 140 kb, with a size range from 25 to 190 kb, and an empty-vector rate lower than 1% (Table 1; Additional file 2: Figure S2). The library was calculated to cover 22.7-fold of the UV-8b genome (based on a genome size of 35.5 Mb, Dr. Shaojie Li, personal communication).
We fingerprinted 2,688 BAC clones using five restriction enzymes (BamHI, EcoRI, XbaI, XhoI, HaeIII). After quality filtering as described in the methods, fingerprint profiles of 2,290 BAC clones were qualified for FPC assembly. The 2,290 BAC clones covered 9-fold genome equivalents and contained an average of 124 bands (consensus bands; CBs) per clone. Based on the average insert size of 140 kb, one CB was estimated to be 1.13 kb (Table 1).
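The coverage and band-size figures above follow from simple arithmetic on the reported clone counts, average insert size and estimated genome size. A minimal sketch in Python, in which the variable and function names are illustrative and not taken from the study:

```python
# Values reported in the text; 35.5 Mb is the genome size estimate cited above.
GENOME_SIZE_KB = 35_500
AVG_INSERT_KB = 140
LIBRARY_CLONES = 15 * 384          # fifteen 384-well plates
FINGERPRINTED_CLONES = 2_290       # clones that passed fingerprint filtering
AVG_BANDS_PER_CLONE = 124          # consensus bands (CBs) per clone


def genome_coverage(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Return the number of genome equivalents covered by n_clones."""
    return n_clones * insert_kb / genome_kb


library_cov = genome_coverage(LIBRARY_CLONES, AVG_INSERT_KB, GENOME_SIZE_KB)
fingerprint_cov = genome_coverage(FINGERPRINTED_CLONES, AVG_INSERT_KB, GENOME_SIZE_KB)
kb_per_band = AVG_INSERT_KB / AVG_BANDS_PER_CLONE

print(f"library coverage: {library_cov:.1f}x")
print(f"fingerprinted coverage: {fingerprint_cov:.1f}x")
print(f"kb per consensus band: {kb_per_band:.2f}")
```

All three results depend directly on the 35.5 Mb genome size estimate and the 140 kb average insert size, so they would shift if either value were revised.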
The fingerprint data of the 2,290 clones were imported into FPC V9.4 for contig assembly. A series of tests were performed to find optimal parameters for the assembly. Table 2 shows the results of assembly with tolerance 4 and different cutoff values. Based on these tests, we chose 10^-15 as the initial cutoff value for contig assembly. This setting assembled 2,035 clones into 196 contigs containing 111 (4.85%) Q clones, and left 255 (11.14%) clones as singletons. Contig101 (4 clones) was end-merged to contig76 (18 clones), and contig193 (29 clones) was end-merged to contig1 (3 clones), by the “End to End” function at terminal cutoff 10^-12 and match value 2, reducing the 196 contigs to 194. This result is referred to as PhaseIA and was used as the standard version. The PhaseIA contigs covered 51.2 Mb in physical length. The discrepancy between the genome length and the physical length of all contigs might be caused by redundancy among contigs, which could be detected and merged with more evidence. Using the “End to End” function at terminal cutoff 10^-8 and match value 1, we merged another 74 contigs. This result is referred to as PhaseIB. The BAC library and both versions of the physical map have been opened to researchers as public genomic resources (http://gresource.hzau.edu.cn/resource/resource.html).
To evaluate the PhaseIA contig quality, we used PCR with primers from repeat masked BESs to verify the overlaps of clones in contigs. We randomly picked contig184 (2 clones), contig155 (3 clones), contig149 (4 clones), contig70 (7 clones), contig50 (12 clones), contig36 (20 clones) and contig17 (28 clones), and designed 3, 3, 4, 7, 8, 7 and 8 pairs of primers for PCR, respectively. As a result, only one clone (U08J21 located in the contig17) could not be distinctly confirmed (Figure 1; Additional file 3).
BAC end sequencing
To perform a genome survey and provide anchor sequences on the physical map for genome comparisons, we sequenced 4,512 BAC clones that included those clones used in fingerprinting at both ends. A total of 6,560 high quality BESs were generated after quality trimming, of which 5,676 were paired-end (86.52%) sequences and 884 were single-end sequences (13.48%) (Table 1). The maximal and the average length of the BESs were 798 bp and 462 bp (Figure 2), respectively. The total length of the BESs was 3,030,658 bp representing 8.54% of the whole genome. The GC content was 51.52%. The 6,560 high quality BESs are available in GenBank [GenBank:JY267549 to GenBank:JY274108].
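The BES summary statistics above can be reproduced from the reported counts. The sketch below also includes a small GC-content helper of the kind that could be applied to the trimmed reads. The helper and all names are illustrative assumptions, not the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Counts reported in the text.
TOTAL_BES = 6_560
PAIRED_BES = 5_676
TOTAL_BES_BP = 3_030_658
GENOME_BP = 35_500_000   # estimated genome size cited above

single_bes = TOTAL_BES - PAIRED_BES
paired_pct = 100 * PAIRED_BES / TOTAL_BES
mean_len = TOTAL_BES_BP / TOTAL_BES
genome_pct = 100 * TOTAL_BES_BP / GENOME_BP

print(single_bes, round(paired_pct, 2), round(mean_len), round(genome_pct, 2))
print(gc_content("ATGCGCNNTA"))  # ambiguous N bases are ignored
```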
Analysis of repetitive DNA in BESs
Repeat sequences are usually a major component of eukaryotic genomes. To gain an initial insight into the composition of repeat elements contained in UV-8b BESs, RepeatMasker was used to identify the known repeat elements from existing databases. The result indicated that a total length of 138,502 bp (4.57%) of the known repeat sequences was identified and contained in 1,273 (19.41%) reads, among which only one read was completely recognized as repeat sequence. In the terms of the repeat category, retroelements were dominant and represented 3.07% of the total BES length, of which the LTR elements Ty1/Copia and Gypsy/DIRS1 accounted for 2.05% and 1.01%, respectively, while the LINE elements accounted for only 0.01% of the total BES length. Small RNA and simple repeats accounted for 0.04% and 0.95% of the total BES length, respectively (Table 3). It is interesting that few DNA transposons were identified in BESs in contrast to retroelements.
RepeatScout was used for de novo detection of the repeat sequences contained in UV-8b BESs, with the criteria described in the methods. A cumulative 682,351 bp (22.51%) were marked as repeat sequences with this pipeline, and were contained in 2,642 (40.27%) reads. Among these reads, 163 (2.48%) were marked as complete repeat sequences. A total of 1,384 reads in this result were not contained in the RepeatMasker result, and 15 reads in the RepeatMasker result were not contained in this result. After repeat masking, the BESs were self-BLASTed as described in the methods and no reads showed more than three matches to others, indicating the high sensitivity of the RepeatScout pipeline.
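The read counts from the two repeat-finding approaches can be cross-checked with set arithmetic: the number of reads flagged by both tools should be the same whether it is derived from the RepeatScout total or from the RepeatMasker total. A short check using only the counts reported above (variable names are illustrative):

```python
# Read counts reported in the text.
REPEATMASKER_READS = 1_273
REPEATSCOUT_READS = 2_642
ONLY_REPEATSCOUT = 1_384     # RepeatScout reads absent from the RepeatMasker result
ONLY_REPEATMASKER = 15       # RepeatMasker reads absent from the RepeatScout result
TOTAL_BES = 6_560

# The overlap derived from each side should agree if the counts are consistent.
shared_from_scout = REPEATSCOUT_READS - ONLY_REPEATSCOUT
shared_from_masker = REPEATMASKER_READS - ONLY_REPEATMASKER

# Reads flagged by at least one of the two tools.
union = REPEATSCOUT_READS + ONLY_REPEATMASKER
union_pct = 100 * union / TOTAL_BES

print(shared_from_scout, shared_from_masker, union, round(union_pct, 2))
```

Both derivations give the same overlap, so the four reported counts are mutually consistent.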
Comparative analysis of UV-8b with other fungal pathogens through BESs
For functional genomics comparison and evolutionary studies of the UV-8b genome, the following 10 well-characterized fungal pathogen genomes were chosen: Magnaporthe oryzae, Botrytis cinerea, Puccinia spp, Fusarium graminearum, Fusarium oxysporum, Blumeria graminis, Mycosphaerella graminicola, Colletotrichum spp, Ustilago maydis and Melampsora lini. They were voted as the most scientifically/economically important fungal pathogens by plant mycologists . Four other fungi, Metarhizium anisopliae, Trichoderma reesei, Nectria haematococca and Cordyceps militaris, were also chosen as related species for this study, because they were close to V. virens in evolution distance and their whole genome sequences were available.
To identify the microsyntenic regions between UV-8b and the above genomes, the repeat-masked UV-8b BESs were used in BLAST analysis against the above genome sequences. As shown in Table 4, 0.18-8.34% of masked BESs matched the top 10 plant fungal pathogen genomes. Among the ascomycete pathogens, F. oxysporum (8.34%) and F. graminearum (8.19%) showed the most hits, followed by C. graminicola (8.13%), and B. graminis (1.07%) showed the fewest hits. Among the basidiomycete pathogens, U. maydis (0.95%) has the smallest genome size but the highest number of hits; P. graminis (0.18%) and Melampsora laricis (0.24%) showed fewer hits. Among masked BESs, 18.81%, 11.88%, 10.50% and 10.35% matched M. anisopliae, T. reesei, N. haematococca and C. militaris, respectively, and 326 masked BESs matched all of those species (Figure 3). The similarity results were generally in agreement with the 18S rRNA gene analysis.
Among the BLAST hits, if paired-ends hit to target genomes with the criteria described in , the regions were considered to be collinear between UV-8b and the target genomes. The results (Table 4) showed that F. oxysporum and F. graminearum have more collinear regions than the others in the top 10 pathogen genomes. In the four related genomes, an insect fungal pathogen M. anisopliae had the most collinear regions. The higher degree of synteny between UV-8b and M. anisopliae was consistent with the results of the species distribution in the gene annotation step. Since most of the target genomes were not assembled completely (Table 4), the numbers of paired-end BESs potentially collinear with target genomes could be higher than detected.
In order to detect large syntenic regions, we used the SyMAP  program based on the BESs embedded in the contigs to anchor UV-8b PhaseIA contigs to the genomes of M. anisopliae, T. reesei, N. haematococca and C. militaris. Under the SyMAP default criteria, M. anisopliae had most anchored contigs, followed by T. reesei (Table 5), consistent with the comparative analysis results mentioned above (Table 4). Figure 4 shows an example of the graphical representation of the collinear regions.
Analysis of simple sequence repeats (SSRs)
SSRs are potential genetic markers due to their high rate of polymorphisms. To investigate the SSR contents and their distribution in UV-8b BESs, we scanned the BES dataset with SciRoKo 3.4. First, the CAP3 program was used to reduce the redundancy of BESs; it clustered 1,821 BESs into 803 contigs and left 4,739 reads as singletons. The total length of these reads was 2,719,880 bp. Among these random genome sequences, a total of 1,023 SSR loci were identified from 849 reads with the criteria described in the methods. The SSRs had an average length of 25.17 bp, an average standard deviation of 10.55 bp and a density of 376.12/Mb.
Of these SSRs, 339 (33.14%), 57 (5.57%), 223 (21.80%), 113 (11.05%), 158 (15.44%), and 133 (13.00%) represented mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide types, which were composed of 2, 3, 10, 23, 53 and 85 SSR motifs, respectively (Table 6). The most abundant SSR types are mononucleotide and trinucleotide, in which the most abundant SSR motifs are A (268) and AGC (42), respectively. The dinucleotide (29.40 bp) and hexanucleotide (29.28 bp) SSR types had the longest average lengths among the different SSR types. The AACC motif (42.33 bp) had the longest length among the different SSR motifs. With the SSRs and their flanking sequences as input, a total of 836 primer pairs were designed for SSR loci by the Primer3 program. Information on the SSRs and the primers is shown in an additional file (Additional file 4: Table S1).
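The SSR density and the per-type percentages reported above can be recomputed from the raw counts. A minimal sketch, with illustrative names and only the numbers given in the text:

```python
# SSR counts by motif length, as reported in the text.
SSR_COUNTS = {
    "mono": 339,
    "di": 57,
    "tri": 223,
    "tetra": 113,
    "penta": 158,
    "hexa": 133,
}
SCANNED_BP = 2_719_880   # total length of the non-redundant BES set

total_ssrs = sum(SSR_COUNTS.values())
density_per_mb = total_ssrs / (SCANNED_BP / 1_000_000)
type_pct = {name: 100 * n / total_ssrs for name, n in SSR_COUNTS.items()}

print(total_ssrs, round(density_per_mb, 2))
for name, pct in type_pct.items():
    print(f"{name}: {pct:.2f}%")
```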
To compare the SSR contents and distribution patterns, the GSS sequences of B. graminis, Fusarium virguliforme, M. oryzae and T. reesei were downloaded from NCBI and scanned for SSRs using the same parameters of CAP3 and SciRoKo programs. The SSR contents and distribution patterns varied obviously (Figure 5). The SSR densities of the above species were 96.27/Mb, 107.71/Mb, 173.60/Mb and 234.06/Mb, respectively, in contrast to 376.12/Mb in UV-8b. The result indicated that UV-8b and T. reesei, which were closest in phylogenetic distance among the four species, had the highest SSR densities. The frequencies of the SSR types were also different among the above species. Mononucleotide types were dominant in UV-8b and M. oryzae, whereas the trinucleotide was most common in T. reesei. It is interesting that dinucleotides had the lowest frequency in all of the species. As for the frequencies of individual SSR motifs, SSR motif A was the most common motif in UV-8b, M. oryzae and B. graminis, AG was most common in T. reesei, and AGC was most common in F. virguliforme.
Before gene annotation, the repeat-masked BESs were pre-processed by the CAP3 program to reduce sequence redundancy. A total of 640 contigs were formed by the CAP3 program and 5,215 reads were left as singletons. The cumulative length of the processed sequences was 2,797,772 bp. A further 876 sequences (398,742 bp), whose effective lengths were shorter than 100 bp, were removed to improve the accuracy of the results. The final 4,979 sequences (contigs + singletons), whose total length was 2,399,048 bp, were compared with the EST and NR databases of NCBI to identify coding regions. A total of 1,592 (31.97%) reads with a cumulative length of 835,095 bp (34.81%) were identified as homologous to ESTs (E-value cutoff of ≤ 10^-10). A total of 2,219 (44.57%) reads with a cumulative length of 1,149,495 bp (47.91%) matched the NR database (E-value cutoff of ≤ 10^-6), of which 1,492 reads were homologous to both the EST and the NR databases. Taken together, 2,319 (46.58%) reads (1,592 + 2,219 - 1,492) with a cumulative length of 1,194,135 bp (49.78%) were identified as containing coding regions, and the cumulative length of coding regions was 863,826 bp, representing 36.01% of the total sequences (2,399,048 bp). Figure 6 shows the target species distribution in the NR database. M. anisopliae and M. acridum had the most BLASTX hits.
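The combined count of reads with coding evidence is an inclusion-exclusion calculation over the EST and NR hit sets. The sketch below reproduces it from the reported numbers; the variable names are illustrative.

```python
# Counts reported in the text.
EST_HITS = 1_592        # reads homologous to ESTs
NR_HITS = 2_219         # reads matching the NR database
BOTH = 1_492            # reads matching both databases
TOTAL_READS = 4_979     # contigs + singletons after length filtering
TOTAL_BP = 2_399_048
CODING_BP = 863_826

# Inclusion-exclusion: reads supported by at least one database.
either = EST_HITS + NR_HITS - BOTH
est_only = EST_HITS - BOTH
nr_only = NR_HITS - BOTH

either_pct = 100 * either / TOTAL_READS
coding_pct = 100 * CODING_BP / TOTAL_BP

print(either, est_only, nr_only, round(either_pct, 2), round(coding_pct, 2))
```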
A total of 928 unique GO terms were assigned to 1,324 reads, and each read was associated with 3.37 GO numbers on average. The genes showed a wide range of functional categories (Figure 7; Additional file 5: Table S2). The binding and catalytic activities were most abundant in the molecular function category, whereas the cellular and metabolic processes were most common in the biological process category. A total of 971 reads matched to the InterProScan database provided a reliable dataset to understand gene function. On the other hand, 171 unique EC (Enzyme Code) annotations were assigned to 387 reads, and 74 pathways in which these enzymes participated were identified by the KEGG map module of BLAST2GO , such as the tricarboxylic acid cycle (TCA cycle). Six enzymes (from 7 reads) of the 171 unique EC were involved in the TCA cycle.
Rice is the staple food of more than 50% of people worldwide, and the problem of food deficiency is more and more severe with the expanding human population [28, 29]. Rice false smut caused by V. virens has emerged as a devastating disease in rice, and the ustiloxin produced by the pathogen is toxic to humans and livestock . However, little is known about this fungal pathogen to date. In this study, we constructed the first BAC-based physical map and generated a large set of BESs for V. virens. These resources will serve as fundamental tools for molecular, genetic and genomic studies of this pathogen.
Due to the lack of reference sequences and effective molecular markers, the contigs could not be edited. We used PCR with the primers derived from the masked BESs to evaluate the contig quality. From a total of 76 clones analyzed, only one clone was not verified by the PCR experiment, indicating that the contig assembly is reliable. The control samples and several pairs of primers in one contig helped to discriminate the false positive PCR bands.
Transposable elements (TEs) contribute largely to the evolution of fungal genomes [30, 31]. In UV-8b, we found that the known repeat elements represented 4.57% of the total BESs and are mainly LTR elements. Few DNA transposons were identified. This may be because the percentage of retroelements is higher than DNA transposons in the V. virens repeat family or because DNA transposons of V. virens are less homologous with the available repetitive sequences in the Fungi sub-database of RepeatMasker. In the M. oryzae genome, the retroelements were also more common than DNA transposons [32, 33].
By de novo searching of the repetitive sequences contained in the UV-8b BES dataset with RepeatScout, a total of 682,351 bp (22.51%) of sequence, distributed in 2,642 reads (40.27%), was marked as repeat sequence. In our results, the core repetitive sequences identified by RepeatScout occurred in 4 to 156 BESs, while fragments with lower occurrence may be false positives or low-copy repetitive sequences. However, the percentage of repeat sequences decreased from 22.51% to 16.25% if the threshold for hits in BESs was set to >5 instead of >3 times (please note that the BES sequences accounted for only 8.54% of the genome sequence). In the M. oryzae P131 assembly, 10.3% of the genome sequence was identified as repetitive.
A total of 1,384 reads identified by RepeatScout were not identified by RepeatMasker, indicating that they are new repeat elements that have not been collected in the database. Fifteen known repeat element-containing reads identified by RepeatMasker were not identified by RepeatScout. It is possible that these elements have high-copy numbers in other fungi but low-copy numbers in the UV-8b genome or that they were under-represented in the BESs.
Bischoff et al. analyzed the phylogenetic placement of Villosiclavae and claimed that it is related to, but distinct from, the Clavicipitaceae and Hypocreaceae clades. This is in agreement with our phylogenetic analysis, in which most of the strains closely related to V. virens UV-8b belong to Clavicipitaceae and Hypocreaceae. In both genome comparison (Figure 3; Table 4) and gene annotation (Figure 6), the BLAST hit distributions were also consistent with the phylogenetic tree, except for T. reesei. This exception could be because the genome sequences used for genome comparison were draft sequences, and because fewer genomic resources of T. reesei were deposited in GenBank for gene annotation.
To date, little is known about the collinearity of chromosome segments among filamentous ascomycete fungi compared to plant and animal genomes. Synteny relationships could facilitate the acquisition of knowledge about genome evolution and dynamics, comparative genomics and phylogeny [35, 36]. We compared the repeat-masked BESs to the top 10 fungal pathogens to search for microsyntenic regions. The result showed that F. oxysporum and F. graminearum had the most hits. The hosts of F. oxysporum range from arthropods to humans and also include gymnosperm and angiosperm plants, whereas F. graminearum is notorious for causing Fusarium head blight. However, M. oryzae, a well-known pathogen of rice, showed few syntenic regions.
Alignments of contigs and BAC clones to the target or reference genomes through BESs were widely used to detect phylogenetic relationships and large structural genomic variations between species, such as expansion, contraction, inversion and rearrangement in plants [12, 39]. These alignments could also assist in the sequence assembly and detect the assembly errors of the genome sequence of the same species . In this study, the numbers of UV-8b contigs aligned to the target genomes were not high. This was most probably due to both the high diversities among fungal genomes and the incompleteness of the target genome sequences.
SSRs play an important role in genetic diversity analysis and genetic map construction due to their high level of polymorphism, co-dominance and robustness. Before substantial genome sequence becomes available, BESs, as random genome survey sequences, are an important resource for mining SSR markers. We found 1,023 SSRs with an average length of 25.17 bp and a density of 376.12/Mb, for which primers for 836 loci were designed successfully. These primers are candidates for genetic analysis by PCR. It is interesting that UV-8b has an SSR content and distribution pattern similar to those of M. oryzae but not to those of the closely related T. reesei.
We constructed the first-generation BAC-based physical map of V. virens and acquired 3,030,658 bp of BAC end sequences, representing 8.54% of the genome. The BAC library was equivalent to 22.7X genome coverage with an average insert size of 140 kb. A total of 2,035 BAC clones were assembled into 194 contigs and 255 clones were left as singletons. The BAC library and physical map provide tools for positional cloning, comparative genomics and whole genome sequencing of V. virens. In addition, the BAC end sequence analysis provides a glimpse into the V. virens genome composition, including 51.52% GC content, 22.51% repetitive sequences, 376.12/Mb SSR density and approximately 36.01% coding regions. We believe that all of this information is valuable for expediting genomic and genetic research into the important rice false smut fungus.
The 18S rRNA gene identification and phylogenetic analysis
The 18S rRNA gene of UV-8b was amplified by PCR using the common primers NS1 and NS8. The sequence was compared with those in the NCBI GenBank database using BLASTN. The sequences were edited using ClustalX 1.83 software. The phylogenetic tree was constructed using the neighbor-joining (NJ) algorithm and tested with 1,000 bootstrap replicates in MEGA5.
High molecular weight genomic DNA preparation
The V. virens strain UV-8b was subcultured on PSA medium (1 L: 200 g peeled potato, 20 g sucrose and 15 g agar; natural pH) at 28°C for 5 days. The fresh mycelium was harvested and transferred onto new PSA plates covered with a layer of cellophane to propagate a sufficient amount of fresh mycelium. The fresh mycelium was collected, ground properly and cultured in liquid complete medium [44, 45] at 28°C, 180 rpm for 65 h. The culture was filtered through 2–4 layers of cheesecloth. The collected mycelium pellet was washed first with sterile ddH2O twice and then with 0.7 M NaCl twice, and incubated in 0.7 M NaCl solution containing 8 mg/ml Driselase (SIGMA D9515) at 31°C at 100 rpm for 3 h to release protoplasts. The protoplast-containing mixture was filtered through one layer of miracloth twice. The protoplast-containing solution was centrifuged at 1500 g for 15 min. The pellet was washed three times with 0.7 M NaCl and once with 1.2 M sorbitol, and then resuspended in a minimal volume (usually ~1 ml) of 1.2 M sorbitol, as a compromise between obtaining as high a DNA concentration as possible and producing enough DNA plugs for at least one attempt at BAC library construction (it is difficult to obtain a high DNA concentration from rice fungi). The protoplast suspension was mixed with an equal volume of 1% low melting point (LMP) agarose (prepared with 1.2 M sorbitol) at 45°C and then transferred into plug molds (Bio-Rad) to form plugs. The plugs were treated following our published protocol.
BAC library construction
BAC library construction was performed as previously described [46–48]. The linearized dephosphorylated low-copy BAC vector pIndigoBAC536-S was prepared with HindIII from a high-copy composite vector pHZAUBAC1 as previously described in Shi et al.. Individual BAC clones were arrayed in 384-well plates and stored at −80°C in our laboratory. The insert size of the BAC library was estimated by digesting random BAC clones with I-SceI and analyzing the digested products on 1% CHEF agarose gel at 5–15 s linear ramp, 6 V/cm, 14°C in 0.5× TBE buffer for 17 h.
BAC plasmid DNA preparation
BAC plasmid DNA was extracted as described by Kim et al. with minor modifications. BAC clones were inoculated in deep 96-well plates with a 96-well replicator, and each well contained 1.2 ml of 2 × YT medium plus 12.5 μg/ml chloramphenicol. The plates were covered with Airpore gas-permeable plate sealant (AXYGEN) and shaken on an orbital shaker at 180 rpm at 37°C for 20 hours. BAC plasmid DNA was extracted manually using the AxyPrep™ Easy-96 Plasmid Kit (24×96-prep (AXYGEN)), according to manufacturer’s instructions, and dissolved in 35 μl 1 mM Tris–HCl, pH 8.0.
BAC fingerprinting and contig assembly
BAC fingerprinting was performed using the SNaPshot kit (ABI No. 4323159) as described by Luo et al. and Kim et al. with minor modifications. SNaPshot reaction products were purified and dissolved in 10 μl of Hi-Di formamide (ABI No. 4311320) containing 0.15 μl of GeneScan-500 LIZ Size Standard (ABI No. 4322682). An ABI 3730 DNA analyzer with 50 cm capillaries (Applied Biosystems, Foster City, California) was used to separate fingerprinting fragments. Fingerprint profiles that contained fragment peaks between 50 and 200 were collected. FPC software version V9.4 was used for contig assemblies. The FPC parameters were adjusted as described [50, 52]. A series of cutoff and tolerance values were tested to obtain an optimal assembly, following the principle of decreasing the number of contigs without excessively increasing the number of questionable clones. After each round, when more than 5 Q clones existed in a contig, the “DQer” function was used to break up the Q contig with a step value of 2. Finally, the tolerance value was set to 4 and the Sulston cutoff value was set to 10^-15. At the end, the contigs were improved using the “End to End” automerge function.
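The Sulston cutoff used above is a threshold on the probability that two clones share their observed number of matching bands purely by chance; clone pairs scoring below the cutoff are treated as overlapping. The sketch below uses the binomial form of the score as it is commonly given for FPC. It is an illustration rather than the FPC implementation, and the gel length value is a hypothetical placeholder, not a parameter reported in this study.

```python
from math import comb


def sulston_score(n_low: int, n_high: int, n_match: int,
                  tolerance: float, gel_length: float) -> float:
    """Probability that at least n_match bands coincide by chance.

    n_low / n_high: band counts of the clone with fewer / more bands.
    tolerance and gel_length must be expressed in the same units.
    """
    b = 2 * tolerance / gel_length        # chance that two random bands match
    p_no_match = (1 - b) ** n_high        # a band matches none of n_high bands
    return sum(
        comb(n_low, j) * (1 - p_no_match) ** j * p_no_match ** (n_low - j)
        for j in range(n_match, n_low + 1)
    )


CUTOFF = 1e-15        # initial cutoff reported in the text
GEL_LENGTH = 36_000   # hypothetical value, for illustration only

# Two clones with 124 bands each (the reported average), tolerance 4.
few = sulston_score(124, 124, 10, tolerance=4, gel_length=GEL_LENGTH)
many = sulston_score(124, 124, 40, tolerance=4, gel_length=GEL_LENGTH)

print(f"10 shared bands: {few:.3g}, overlap called: {few < CUTOFF}")
print(f"40 shared bands: {many:.3g}, overlap called: {many < CUTOFF}")
```

Under these assumed parameters, a pair sharing only 10 bands is not joined while a pair sharing 40 bands is, which illustrates why a more stringent cutoff yields more, smaller contigs and fewer questionable clones.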
The primers used to evaluate the quality of the contigs generated at cutoff 10^-15 were designed from masked BESs by primer5, with the exception of contig184, whose 3 pairs of primers were SSR primers generated in the SSR analysis. The conditions of the bacterial liquid PCR reaction were 94°C for 5 min for initial denaturation, followed by 35 cycles of denaturation at 94°C for 30 sec, annealing for 30 sec, and extension at 72°C for 40 sec, and a final extension for 10 min. The annealing temperature was selected based on the Tm values of the primers. The PCR products were separated in 1.0% agarose gels. The presence or absence of bands of the expected sizes was examined. The host cells, empty vector, and the clones U13J12 (contig6), U03H01 (contig133), U03J10 (contig53), and U03A03 (contig100) were randomly selected as control samples.
BAC end sequencing
BAC end sequencing was performed as previously described  with some modifications. BAC clones were sequenced at both ends on an ABI 3730 DNA Analyzer using Big-Dye v3.1 (Applied Biosystems, Foster City, California), following the manufacturer’s instructions. The two primers BACf (5′aacgacggccagtgaattg3′) and BACr (5′gataacaatttcacacagg3′) were used as forward and reverse sequencing primers, respectively. Sequences were base-called using Phred , and the vector and low-quality (Phred value <16) sequences were removed using the program LUCY . The reads less than 100 bp in length were removed. All the trimmed sequences were deposited in the GenBank database [GenBank:JY267549 to GenBank:JY274108].
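To make the quality thresholds above concrete: a Phred value Q corresponds to an estimated base-call error probability of 10^(-Q/10), so the cutoff of 16 corresponds to roughly a 2.5% error rate. The sketch below shows that conversion together with a deliberately simple end-trimming and length filter. It is not LUCY's algorithm, which uses a more elaborate windowed approach; the function names are illustrative.

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred quality value to an estimated error probability."""
    return 10 ** (-q / 10)


def trim_ends(quals, min_q=16):
    """Return (start, end) of the span left after dropping low-quality ends."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return start, end


def keep_read(quals, min_q=16, min_len=100):
    """Keep a read only if its trimmed span is at least min_len bases long."""
    start, end = trim_ends(quals, min_q)
    return end - start >= min_len


# Toy quality strings (made up for illustration).
good_read = [5] * 10 + [30] * 120 + [8] * 5
short_read = [30] * 99

print(round(phred_to_error(16), 4))
print(trim_ends(good_read), keep_read(good_read), keep_read(short_read))
```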
Analysis of repetitive DNA in BESs
The known classes of repeat elements in the UV-8b BAC end sequences were identified with the RepeatMasker v3.3.0 pipeline (http://www.repeatmasker.org) using the Fungi subdatabase of RepBase 17.07. The BAC end sequences were then searched for novel repeat elements with RepeatScout 1.0.5. Only the sequences that were repeated > 3 times and were > 50 bp in length in the BES dataset were kept. Then, the remaining BESs were self-BLASTed to search for additional sequences that were repeated > 3 times and were > 50 bp in length [12, 58].
Genome comparative analysis
The genome sequences of the fungal pathogens M. oryzae, B. cinerea, P. graminis, F. graminearum, F. oxysporum, C. graminicola and U. maydis were downloaded from the Broad Institute Database (http://www.broadinstitute.org). The genome sequences of the fungal pathogens B. graminis, M. graminicola, M. laricis, M. anisopliae, N. haematococca, T. reesei and C. militaris were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov). The repeat-masked BESs were BLASTed against the above genome sequences using BLASTN with an E-value cutoff of 1e-05. The matched sequences that were longer than 50 bp and had more than 80% identity were collected and analyzed. The BESs were also used to anchor the corresponding contigs to the genome sequences of M. anisopliae, F. oxysporum, N. haematococca and T. reesei using the SyMAP V3.4 program (http://www.agcol.arizona.edu/software/symap/).
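The hit-filtering step can be sketched as follows. This is a minimal Python example that assumes the BLASTN results were written in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the text does not state which output format was used, and the function name and the sample rows are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=1e-5, min_length=50, min_identity=80.0):
    """Return (query, subject, identity, length, evalue) for hits passing all thresholds.

    lines: iterable of tab-separated BLAST tabular rows (assumed 12-column layout).
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank lines and comment lines
        cols = line.split("\t")
        pident = float(cols[2])               # percent identity
        length = int(cols[3])                 # alignment length in bp
        evalue = float(cols[10])
        if evalue <= max_evalue and length > min_length and pident > min_identity:
            kept.append((cols[0], cols[1], pident, length, evalue))
    return kept

# Hypothetical rows for illustration only (not data from this study).
example_rows = [
    "BES_0001\tscaffold_1\t92.50\t240\t18\t0\t1\t240\t1000\t1239\t3e-90\t350",
    "BES_0002\tscaffold_7\t78.00\t300\t66\t0\t1\t300\t500\t799\t1e-40\t170",
    "BES_0003\tscaffold_2\t95.00\t45\t2\t0\t1\t45\t10\t54\t2e-08\t60.2",
]
passing = filter_blast_hits(example_rows)
```

In this illustration only the first row passes: the second fails the identity threshold and the third fails the length threshold.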
Analysis of simple sequence repeats
The BESs were clustered by the program CAP3 with default parameters to reduce the redundancy of the dataset. The non-redundant sequences were scanned by SciRoKo 3.4 to search for potential SSRs, with the criteria of a minimum repeat number of 3 and a minimum total length of 15 bp. The full standardization of SciRoKo, which groups all the similar and complementary SSR motifs together, was used for the SSR statistics, e.g. “TC”, “CT”, “AG”, and “GA” were grouped into “AG”. The genome sequences (GSS section) of B. graminis, F. virguliforme, M. oryzae and T. reesei were downloaded from NCBI (as of December 2012) and mined for SSRs with the same criteria for comparisons of the SSR contents and distribution patterns. The primers flanking SSRs were designed by standalone primer3 and the DesignPrimers program in the SciRoKo 3.4 package.
After the repeat elements were masked, the masked BESs were clustered by CAP3 with default parameters to reduce redundancy. To identify protein-coding regions, the masked and non-redundant BESs were BLASTed against the GenBank EST database using the program BLASTN with an E-value cutoff of 1e-10 and against the non-redundant protein database using the program BLASTX with an E-value cutoff of 1e-06. Different functions of the program BLAST2GO were used to analyze the BLASTX-identified sequences: GO terms were annotated by the GO function, the motifs and domains were identified by the InterProScan function and the pathways were annotated by the Enzyme Code and KEGG function.
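The motif grouping and the search criteria can be illustrated with a short Python sketch. It is not SciRoKo: it detects perfect repeats only, and the function names and the motif size range of 1 to 6 bp are assumptions. Standardization is modelled here as taking the alphabetically smallest string among all rotations of a motif and of its reverse complement, which reproduces the “TC”, “CT”, “AG”, “GA” to “AG” grouping described above.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def standardize_motif(motif):
    """Smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))

def find_perfect_ssrs(seq, min_repeats=3, min_length=15, max_motif=6):
    """Return (start, end, standardized motif, repeat count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # One motif copy followed by at least (min_repeats - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            total = len(match.group(0))
            if is_primitive(motif) and total >= min_length:
                hits.append((match.start(), match.end(), standardize_motif(motif), total // k))
    return hits

# Illustrative sequence: eight copies of "TC" flanked by short homopolymers.
ssrs = find_perfect_ssrs("GGG" + "TC" * 8 + "AAA")
```

The primitive-motif check keeps a dinucleotide run such as (TC)8 from also being reported as a tetranucleotide (TCTC)4 repeat.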
Tanaka E, Ashizawa T, Sonoda R, Tanaka C: Villosiclava virens gen. nov., comb. nov., the teleomorph of Ustilaginoidea virens, the causal agent of rice false smut. Mycotaxon. 2008, 106 (1): 491-501.
Tanaka E, Kumagawa T, Tanaka C, Koga H: Simple transformation of the rice false smut fungus Villosiclava virens by electroporation of intact conidia. Mycoscience. 2011, 52 (5): 344-348. 10.1007/S10267-011-0115-6.
Fu R, Ding L, Zhu J, Li P, Zheng A-p: Morphological structure of propagules and electrophoretic karyotype analysis of false smut Villosiclava virens in rice. The Journal of Microbiology. 2012, 50 (2): 263-269. 10.1007/s12275-012-1456-3.
Ashizawa T, Takahashi M, Moriwaki J, Hirayae K: Quantification of the rice false smut pathogen Ustilaginoidea virens from soil in Japan using real-time PCR. Eur J Plant Pathol. 2010, 128 (2): 221-232. 10.1007/s10658-010-9647-4.
Yaegashi H, Fujita Y, Sonoda R: Severe outbreak of false smut of rice in 1988. Plant Protection Tokyo. 1989, 43 (6): 311-314.
Koiso Y, Li Y, Iwasaki S, Hanaoka K, Kobayashi T, Sonoda R, Fujita Y, Yaegashi H, Sato Z: Ustiloxins, antimitotic cyclic peptides from false smut balls on rice panicles caused by Ustilaginoidea virens. J Antibiot. 1994, 47 (7): 765. 10.7164/antibiotics.47.765.
Ladhalakshmi D, Laha GS, Singh R, Karthikeyan A, Mangrauthia SK, Sundaram RM, Thukkaiyannan P, Viraktamath BC: Isolation and characterization of Ustilaginoidea virens and survey of false smut disease of rice in India. Phytoparasitica. 2012, 40 (2): 171-176. 10.1007/s12600-011-0214-0.
Atia MMM: Rice false smut (Ustilaginoidea virens) in Egypt. Zeitschrift Fur Pflanzenkrankheiten Und Pflanzenschutz-Journal of Plant Diseases and Protection. 2004, 111 (1): 71-82.
González VM, Rodríguez-Moreno L, Centeno E, Benjak A, Garcia-Mas J, Puigdomènech P, Aranda MA: Genome-wide BAC-end sequencing of Cucumis melo using two BAC libraries. BMC Genomics. 2010, 11 (1): 618. 10.1186/1471-2164-11-618.
Ragupathy R, Rathinavelu R, Cloutier S: Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome. BMC Genomics. 2011, 12 (1): 217. 10.1186/1471-2164-12-217.
Soler L, Conte M, Katagiri T, Howe A, Lee B-Y, Amemiya C, Stuart A, Dossat C, Poulain J, Johnson J: Comparative physical maps derived from BAC end sequences of tilapia (Oreochromis niloticus). BMC Genomics. 2010, 11 (1): 636. 10.1186/1471-2164-11-636.
Lin H, Xia P, Wing RA, Zhang Q, Luo M: Dynamic Intra-Japonica Subspecies Variation and Resource Application. Mol Plant. 2012, 5 (1): 218-230. 10.1093/mp/ssr085.
Deng Y, Pan Y, Luo M: Detection and correction of assembly errors of rice Nipponbare reference sequence. Plant Biol. in press
Zhu H, Choi S, Johnston AK, Wing RA, Dean RA: A large-insert (130 kbp) bacterial artificial chromosome library of the rice blast fungus Magnaporthe grisea: genome analysis, contig assembly, and gene cloning. Fungal Genet Biol. 1997, 21 (3): 337-347. 10.1006/fgbi.1997.0996.
Martin SL, Blackmon BP, Rajagopalan R, Houfek TD, Sceeles RG, Denn SO, Mitchell TK, Brown DE, Wing RA, Dean RA: MagnaportheDB: a federated solution for integrating physical and genetic map data with BAC end derived sequences for the rice blast fungus Magnaporthe grisea. Nucleic Acids Res. 2002, 30 (1): 121-124. 10.1093/nar/30.1.121.
Pedersen C, Wu BQ, Giese H: A Blumeria graminis f.sp hordei BAC library - contig building and microsynteny studies. Curr Genet. 2002, 42 (2): 103-113. 10.1007/s00294-002-0341-8.
Chang Y-L, Cho S, Kistler HC, Hsieh C-S, Muehlbauer GJ: Bacterial artificial chromosome-based physical map of Gibberella zeae (Fusarium graminearum). Genome. 2007, 50 (10): 954-962. 10.1139/G07-079.
Doerks T, Copley RR, Schultz J, Ponting CP, Bork P: Systematic identification of novel protein domain families associated with nuclear functions. Genome Res. 2002, 12 (1): 47-56. 10.1101/gr.203201.
Diener SE, Chellappan MK, Mitchell TK, Dunn-Coleman N, Ward M, Dean RA: Insight into Trichoderma reesei’s genome content, organization and evolution revealed through BAC library characterization. Fungal Genet Biol. 2004, 41 (12): 1077-1087. 10.1016/j.fgb.2004.08.007.
Meksem K, Shultz J, Tebbji F, Jamai A, Henrich J, Kranz H, Arenz M, Schlueter T, Ishihara H, Jyothi LN, et al: A bacterial artificial chromosome based physical map of the Ustilago maydis genome. Genome. 2005, 48 (2): 207-216. 10.1139/g04-099.
Liu QL, Wang XM, Wang GJ, Luo CX, Tan XQ, Luo MZ: Construction and analysis of a BAC library of Ustilaginoidea virens UV-2 genome. Microbiology China. in press
Dean R, Van Kan JA, Pretorius ZA, Hammond-Kosack KE, Di Pietro A, Spanu PD, Rudd JJ, Dickman M, Kahmann R, Ellis J, et al: The Top 10 fungal pathogens in molecular plant pathology. Mol Plant Pathol. 2012, 13 (4): 414-430. 10.1111/j.1364-3703.2011.00783.x.
Soderlund C, Nelson W, Shoemaker A, Paterson A: SyMAP: A system for discovering and viewing syntenic regions of FPC maps. Genome Res. 2006, 16 (9): 1159-1168. 10.1101/gr.5396706.
Kofler R, Schlotterer C, Lelley T: SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics. 2007, 23 (13): 1683-1685. 10.1093/bioinformatics/btm157.
Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.
Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A: High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008, 36 (10): 3420-3435. 10.1093/nar/gkn176.
Skamnioti P, Gurr SJ: Against the grain: safeguarding rice from rice blast disease. Trends Biotechnol. 2009, 27 (3): 141-150. 10.1016/j.tibtech.2008.12.002.
Liu JL, Wang XJ, Mitchell T, Hu YJ, Liu XL, Dai LY, Wang GL: Recent progress and understanding of the molecular mechanisms of the rice-Magnaporthe oryzae interaction. Mol Plant Pathol. 2010, 11 (3): 419-427. 10.1111/j.1364-3703.2009.00607.x.
Thon MR, Martin SL, Goff S, Wing RA, Dean RA: BAC end sequences and a physical map reveal transposable element content and clustering patterns in the genome of Magnaporthe grisea. Fungal Genet Biol. 2004, 41 (7): 657-666. 10.1016/j.fgb.2004.02.003.
Thon MR, Pan H, Diener S, Papalas J, Taro A, Mitchell TK, Dean RA: The role of transposable element clusters in genome evolution and loss of synteny in the rice blast fungus Magnaporthe oryzae. Genome Biol. 2006, 7 (2): R16. 10.1186/gb-2006-7-2-r16.
Xue M, Yang J, Li Z, Hu S, Yao N, Dean RA, Zhao W, Shen M, Zhang H, Li C, et al: Comparative analysis of the genomes of two field isolates of the rice blast fungus Magnaporthe oryzae. PLoS Genet. 2012, 8 (8): e1002869. 10.1371/journal.pgen.1002869.
Dean RA, Talbot NJ, Ebbole DJ, Farman ML, Mitchell TK, Orbach MJ, Thon M, Kulkarni R, Xu JR, Pan HQ, et al: The genome sequence of the rice blast fungus Magnaporthe grisea. Nature. 2005, 434 (7036): 980-986. 10.1038/nature03449.
Bischoff J, Sullivan R, Kjer K, White J: Phylogenetic placement of the anamorphic tribe Ustilaginoideae (Hypocreales, Ascomycota). Mycologia. 2004, 96 (5): 1088-1094. 10.2307/3762091.
Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, Town CD, Young ND: Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol. 2005, 5 (1): 15. 10.1186/1471-2229-5-15.
Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH: Synteny and collinearity in plant genomes. Science. 2008, 320 (5875): 486-488. 10.1126/science.1153917.
Teetor-Barsch GH, Roberts DW: Entomogenous Fusarium species. Mycopathologia. 1983, 84 (1): 3-16. 10.1007/BF00436991.
Nelson PE, Dignani MC, Anaissie EJ: Taxonomy, biology, and clinical aspects of Fusarium species. Clin Microbiol Rev. 1994, 7 (4): 479-504.
Kim H, San Miguel P, Nelson W, Collura K, Wissotski M, Walling JG, Kim JP, Jackson SA, Soderlund C, Wing RA: Comparative physical mapping between Oryza sativa (AA genome type) and O. punctata (BB genome type). Genetics. 2007, 176 (1): 379-390. 10.1534/genetics.106.068783.
Cavagnaro PF, Chung SM, Szklarczyk M, Grzebelus D, Senalik D, Atkins AE, Simon PW: Characterization of a deep-coverage carrot (Daucus carota L.) BAC library and initial analysis of BAC-end sequences. Mol Genet Genomics. 2009, 281 (3): 273-288. 10.1007/s00438-008-0411-9.
White TJ, Bruns T, Lee S, Taylor J: Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. PCR Protocols: A guide to methods and applications. 1990, 18: 315-322.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25 (24): 4876-4882. 10.1093/nar/25.24.4876.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28 (10): 2731-2739. 10.1093/molbev/msr121.
Guo M, Chen Y, Du Y, Dong Y, Guo W, Zhai S, Zhang H, Dong S, Zhang Z, Wang Y: The bZIP transcription factor MoAP1 mediates the oxidative stress response and is critical for pathogenicity of the rice blast fungus Magnaporthe oryzae. PLoS Pathog. 2011, 7 (2): e1001302. 10.1371/journal.ppat.1001302.
Zhang H, Tang W, Liu K, Huang Q, Zhang X, Yan X, Chen Y, Wang J, Qi Z, Wang Z: Eight RGS and RGS-like proteins orchestrate growth, differentiation, and pathogenicity of Magnaporthe oryzae. PLoS Pathog. 2011, 7 (12): e1002450. 10.1371/journal.ppat.1002450.
Luo M, Wing RA: An improved method for plant BAC library construction. Methods Mol Biol. 2003, 236: 3-20.
Luo MZ, Wang YH, Frisch D, Joobeur T, Wing RA, Dean RA: Melon bacterial artificial chromosome (BAC) library construction using improved methods and identification of clones linked to the locus conferring resistance to melon Fusarium wilt (Fom-2). Genome. 2001, 44 (2): 154-162.
Wang C, Shi X, Liu L, Li H, Ammiraju JS, Kudrna DA, Xiong W, Wang H, Dai Z, Zheng Y, et al: Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives. Genetics. in press
Shi X, Zeng H, Xue Y, Luo M: A pair of new BAC and BIBAC vectors that facilitate BAC/BIBAC library construction and intact large genomic DNA insert exchange. Plant Methods. 2011, 7 (1): 33. 10.1186/1746-4811-7-33.
Luo MC, Thomas C, You FM, Hsiao J, Shu OY, Buell CR, Malandro M, McGuire PE, Anderson OD, Dvorak J: High-throughput fingerprinting of bacterial artificial chromosomes using the SNaPshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics. 2003, 82 (3): 378-389. 10.1016/S0888-7543(03)00128-9.
Soderlund C, Humphray S, Dunham A, French L: Contigs built with fingerprints, markers, and FPC V4.7. Genome Res. 2000, 10 (11): 1772-1787. 10.1101/gr.GR-1375R.
Nelson WM, Bharti AK, Butler E, Wei FS, Fuks G, Kim H, Wing RA, Messing J, Soderlund C: Whole-genome validation of high-information-content fingerprinting. Plant Physiol. 2005, 139 (1): 27-38. 10.1104/pp.105.061978.
Luo M, Kim H, Kudrna D, Sisneros NB, Lee S-J, Mueller C, Collura K, Zuccolo A, Buckingham EB, Grim SM: Construction of a nurse shark (Ginglymostoma cirratum) bacterial artificial chromosome (BAC) library and a preliminary genome survey. BMC Genomics. 2006, 7 (1): 106. 10.1186/1471-2164-7-106.
Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.
Li S, Chou HH: Lucy 2: an interactive DNA sequence quality trimming and vector removal tool. Bioinformatics. 2004, 20 (16): 2865-2866. 10.1093/bioinformatics/bth302.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110 (1–4): 462-467.
Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (suppl 1): i351-i358. 10.1093/bioinformatics/bti1018.
Aggarwal R, Benatti TR, Gill N, Zhao C, Chen MS, Fellers JP, Schemerhorn BJ, Stuart JJ: A BAC-based physical map of the Hessian fly genome anchored to polytene chromosomes. BMC Genomics. 2009, 10: 293. 10.1186/1471-2164-10-293.
This work is supported by the Special Fund for Agro-Scientific Research in the Public Interest of P. R. China (200903039–3).
The authors declare that they have no competing interests.
XW carried out the BAC end sequencing, fingerprinting, contig quality evaluation and data analysis, and drafted the manuscript. QL constructed the BAC library and carried out the BAC end sequencing, fingerprinting and phylogenetic analysis. HW carried out the BAC end sequencing and fingerprinting. ML, GW and C-XL designed the project. ML supervised the work. ML and GW finalized the manuscript. All the authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Figure S1: Neighbor-joining phylogenetic tree of 18S rRNA gene sequences. Phylogenetic tree showing the phylogenetic position of the strain UV-8b and the strains of the genera Villosiclava, Trichoderma, Metarhizium, Cordyceps and Torrubiella. Percentages at nodes are bootstrap values based on 1,000 replications. The scale bar represents 0.01 substitutions per nucleotide position. The 18S rRNA gene sequences of the strains UV-8b and UV-2 were generated in our laboratory. The 18S rRNA gene sequence of T. reesei QM6a was obtained from the draft genome sequence by BLAST. The 18S rRNA gene sequences of the other strains were obtained from GenBank. (PNG 423 KB)
Additional file 2: Figure S2: Insert size analysis of randomly selected UV-8b BAC clones. The plasmid DNA of 180 randomly selected BAC clones from the UV-8b BAC library was digested with I-SceI, and the DNA fragments were separated on a 1% CHEF agarose gel. Lane 21 was the MidRange PFG marker I (NEB). V is vector. (PNG 311 KB)
Additional file 3: Contig quality evaluation. The primers were designed from repeat-masked BAC end sequences and were used to verify the overlap between clones in the PhaseIA contigs. (DOC 6 MB)
Additional file 4: Table S1: Information about the SSRs and the primers. With the SSRs and their flanking sequences as input, a total of 836 primer pairs were designed for SSR loci by the Primer3 program. (XLS 224 KB)
Additional file 5: Table S2: GO annotations of gene products predicted from UV-8b BESs. GO annotations and terms at level 3. (XLS 50 KB)
About this article
Cite this article
Wang, X., Liu, Q., Wang, H. et al. A BAC based physical map and genome survey of the rice false smut fungus Villosiclava virens. BMC Genomics 14, 883 (2013). https://doi.org/10.1186/1471-2164-14-883
- Rice false smut
- Villosiclava virens
- BAC library
- BAC end sequencing
- BAC fingerprinting
- Physical map