ContigScape: a Cytoscape plugin facilitating microbial genome gap closing

Tang, Biao; Wang, Qi; Yang, Minjun; Xie, Feng; Zhu, Yongqiang; Zhuo, Ying; Wang, Shengyue; Gao, Hong; Ding, Xiaoming; Zhang, Lixin; Zhao, Guoping; Zheng, Huajun

doi:10.1186/1471-2164-14-289

Software
Open access
Published: 30 April 2013

BMC Genomics volume 14, Article number: 289 (2013)

Abstract

Background

With the emergence of next-generation sequencing, the availability of prokaryotic genome sequences is expanding rapidly. A total of 5,276 genomes have been released since 2008, yet only 1,692 of them were complete. The final phase of microbial genome sequencing, particularly gap closing, is frequently the rate-limiting step, either because complex genomic structures cause sequencing bias even at high coverage or because repeat sequences cause gaps in the assembly.

Results

We have developed a Cytoscape plugin to facilitate gap closing for high-throughput sequencing data from microbial genomes. This plugin is capable of interactively displaying the relationships among genomic contigs derived from various sequencing formats. The sequence contigs of plasmids and special repeats (IS elements, ribosomal RNAs, terminal repeats, etc.) can be displayed as well.

Conclusions

Displaying relationships between contigs using graphs in Cytoscape rather than tables provides a more straightforward visual representation. This will facilitate a faster and more precise determination of the linkages among contigs and greatly improve the efficiency of gap closing.

Background

The emergence of next-generation sequencing (NGS) technology has greatly facilitated genome sequencing. The long reads produced by Roche 454 or PacBio SMRT make de novo assembly easier to complete. Despite the relatively even representation of sequences produced by 454 and other NGS methods, assemblies still contain tens to hundreds of contigs because of repeat sequences or GC/AT-rich regions in the genomes. Therefore, determining the order of contigs and filling the gaps between them by PCR are two essential and rate-limiting steps in the final phase of whole-genome sequencing. The ‘Newbler Assembler’ developed by Roche 454 uses strict parameters to avoid misassembly and can therefore break some contigs apart. For example, a single read may be split between two contigs because of base-calling variation among reads, and in some extreme cases no real gap exists between two such “contigs”. Several existing scaffolders for high-throughput sequencing (HTS) genome assemblies, such as GRASS [1], SSPACE [2], OPERA [3] and MIP Scaffolder [4], may provide effective scaffolding; however, they lack global visualization and must trade off scaffold length against accuracy. Most visualization tools that are often used for genome completion and enable users to verify the assembly of contigs, such as Consed [5], DNASTAR Lasergene [6] and Gap [7], can display only linear relationships among contigs [8]. To provide a genome-level overview, ABySS-Explorer [9] and TGNet [10] were developed. TGNet incorporates several scripts for converting transcripts to facilitate assembly and represents contigs graphically as points. ABySS-Explorer [9] is another global viewer of contig assemblies. However, neither program was designed to handle repeat contigs or to display the reads that link contigs and thereby indicate the locations of gaps and repeat contigs [8, 10] (Table 1). These programs also lack functions specific to microbial genome analysis.
Therefore, we developed ContigScape, a Cytoscape [11] plugin that displays all relationships among the contigs of a microbial genome, including each contig and the reads linking them, so that users can confirm gaps and repetitive sequences. Our goal is to display the original relationships of all contigs instead of a manually trimmed result, because the real association of contigs should be depicted as a network rather than a linear linkage. Furthermore, repeat contigs, gaps and even plasmids can be highlighted, filtered, and customized.

Table 1 Comparison to other genomic display tools

ContigScape is a Java plugin for Cytoscape [11], an established, free and open-source software platform for the visualization and analysis of molecular interaction networks that runs on Windows, Linux and Mac platforms. ContigScape is a simple plugin designed to make gap closing in microbial genome sequencing more efficient.

Implementation

Sequencing of samples, de novo assembly of the genomes, and scaffolding

All genome sequences listed in Table 2 have been released in GenBank; the samples came from different laboratories in China and were sequenced by the Chinese National Human Genome Center at Shanghai. In our approach, genome sequencing was conducted using the Roche 454 GS FLX system and the GS FLX Titanium Sequencing Kit. Reads were then assembled de novo using Newbler v2.3. We constructed mate-pair DNA libraries with insert sizes larger than 3 kb and sequenced them on the Illumina HiSeq 2000 platform. A random subset of the mate-pair reads was used for mapping and analysis with scaffold.pl (a Perl script, see Additional file 1, which uses the BWA [18], SAMtools [19], FASTX-Toolkit and BEDTools [20] programs).

Table 2 Strains used in this study and general sequence information

Programming language, systems, and external programs

ContigScape was developed based on Cytoscape, which is available for Linux, Windows and MacOS X. The core programming language of ContigScape is Java. Users are provided with a comprehensive manual that explains all functions (see Additional file 1).

Counting contig abundance and copy number, and display

Our interest lies in estimating the abundance of repeat contigs. We define a repeat contig as one whose read coverage is at least twice the average genome coverage. Average genome coverage is the total number of read bases assembled into contigs divided by the total length of all contigs. When users load a Contig Relationship Scape (CRS) file without the original assembly result, the plugin by default estimates genome coverage as the average coverage of all contigs larger than 20 kb (in our experience, repeat contigs larger than 20 kb are rare in microbial genomes, except for plasmids). Contig abundance is the total number of read bases assembled into a contig divided by the contig length, and the copy number of a contig, which represents how many times it is repeated, is its abundance divided by the average genome coverage. We define a specific contig as one whose read coverage is less than 1.5-fold the average (1.5 is the default value and can be changed by users). Contigs whose coverage is between 1.5- and 2-fold the average are therefore probable repeats, which need to be confirmed by counting the linkages at the contig ends or by PCR. For example, in Figure 1B7, 106S-106E is a repeat contig confirmed by two linkages at each end. PCR is then needed to determine whether the true order is “37S-37E-106S-106E-41S-41E-106E-106S-42E-42S” or “37S-37E-106S-106E-41E-41S-106E-106S-42E-42S”.
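
The coverage rules above can be sketched in Python as follows. This is a minimal illustration, not ContigScape's Java code: the `Contig` record and the function names are hypothetical, and pooling read bases over the large contigs (rather than averaging their per-contig coverages) when no assembly is available is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int      # contig length (bp)
    read_bases: int  # total read bases assembled into this contig


def average_coverage(contigs, assembly_available=True, min_len=20_000):
    """Average genome coverage = read bases / contig length, pooled over contigs.

    Without the original assembly (CRS input only), only contigs longer than
    min_len are pooled, because large contigs are rarely repeats.
    """
    pool = contigs if assembly_available else [c for c in contigs if c.length > min_len]
    if not pool:
        raise ValueError("no contigs available to estimate genome coverage")
    return sum(c.read_bases for c in pool) / sum(c.length for c in pool)


def classify(contig, avg_cov, specific_max=1.5, repeat_min=2.0):
    """Return (copy_number, label) using the thresholds described in the text."""
    abundance = contig.read_bases / contig.length  # contig abundance
    copy_number = abundance / avg_cov
    if copy_number >= repeat_min:
        label = "repeat"
    elif copy_number < specific_max:
        label = "specific"
    else:
        label = "probable repeat"  # confirm via end linkages or PCR
    return copy_number, label


contigs = [
    Contig("A", 100_000, 2_000_000),  # 20x
    Contig("B", 50_000, 1_000_000),   # 20x
    Contig("R", 5_000, 250_000),      # 50x
    Contig("P", 3_000, 105_000),      # 35x
]
avg = average_coverage(contigs, assembly_available=False)  # pools A and B only
for c in contigs:
    print(c.name, classify(c, avg))
```

With these made-up numbers the average is 20x, so R (50x, copy number 2.5) is classified as a repeat and P (35x, copy number 1.75) as a probable repeat.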

Meanwhile, the average number of linkages between contigs can be computed as $Z = \frac{1}{n}\sum_{i=1}^{n} \mathrm{linkNum}_i$, where $\mathrm{linkNum}_i$ is the number of linking reads in the i-th relationship between large contigs (size $\geq$ 20 kb), n is the number of such relationships, and Z is the average number of linkages. The ratio of a relationship's link number to Z determines the width of the edge representing that linkage in Cytoscape.
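
A small sketch of the edge-width rule, assuming linkages are given as hypothetical `(contig_a, contig_b, link_num)` tuples and contig lengths as a dictionary. It returns the raw ratio linkNum / Z; how ContigScape maps that ratio to a pixel width in Cytoscape is not described here and is left out.

```python
def mean_large_link(links, lengths, min_len=20_000):
    """Z: mean number of linking reads over relationships between contigs
    that are both at least min_len bp long."""
    counts = [n for a, b, n in links if lengths[a] >= min_len and lengths[b] >= min_len]
    if not counts:
        raise ValueError("no linkages between large contigs")
    return sum(counts) / len(counts)


def relative_widths(links, z):
    """Relative edge width of each linkage: linkNum / Z."""
    return {(a, b): n / z for a, b, n in links}


lengths = {1: 50_000, 2: 30_000, 3: 25_000, 4: 800}
links = [(1, 2, 40), (2, 3, 20), (3, 4, 5)]
z = mean_large_link(links, lengths)  # only (1, 2) and (2, 3) qualify
widths = relative_widths(links, z)
```

Here Z is the mean of 40 and 20, so the weak (3, 4) linkage, which involves an 800-bp contig, gets a width well below 1 while still being drawn.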

Principles of displaying Roche 454 genome assembly results

Roche 454 reads now exceed 700 base pairs in length and thus can be used to resolve gaps caused by small repeats. The ‘Newbler Assembler’ can produce a ‘454Contigs.ace’ file, which contains all assembly information and can be viewed in ‘Consed’ [5]. As indicated in Supporting Figure 2, when a read is split between two contigs, its coordinates in each contig are shown after the read name, followed by the number of the contig to which the read is linked. Reads spanning two linked contigs are labeled ‘fmX’ when the 5’ end of the read lies in contigX and ‘toY’ when the 3’ end of the read lies in contigY. This labeling convention of the ‘Newbler Assembler’, combined with the long 454 reads, allows all ‘fm’ and ‘to’ information to be extracted from the ‘454Contigs.ace’ file. This information can then be arranged into a relationship table (Figure 2C, D), such as ‘5’-end-Contig1’ linked to ‘3’-end-Contig2’. This relationship table can then be displayed by ContigScape, as shown in Figure 3D.
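
The extraction step might look like the sketch below. The read-name pattern `<read>.<start>-<end>.fm<X>` / `.to<Y>` is an assumption based on the description above, not a verified specification of Newbler's ace output, and the input is taken as already-parsed `(contig_number, read_name)` placements. The sketch only counts distinct reads per contig pair; deciding which ends (S or E) are joined would additionally require read orientation and position, which it omits.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) name of a split read: "<read>.<start>-<end>.fm<X>" or
# "<read>.<start>-<end>.to<Y>"; adjust the pattern to the real ace file.
SPLIT_READ = re.compile(r"^(?P<read>[^.]+)\.\d+-\d+\.(?:fm|to)(?P<other>\d+)$")


def count_split_read_links(placements):
    """placements: iterable of (contig_number, read_name) pairs from the ace file.

    Returns {(contig_a, contig_b): number of distinct reads}, with contig_a < contig_b.
    """
    reads_per_pair = defaultdict(set)
    for contig, name in placements:
        match = SPLIT_READ.match(name)
        if match is None:
            continue  # the read lies entirely within one contig
        other = int(match.group("other"))
        pair = (min(contig, other), max(contig, other))
        # Both halves of a split read map to the same pair; the set counts it once.
        reads_per_pair[pair].add(match.group("read"))
    return {pair: len(reads) for pair, reads in sorted(reads_per_pair.items())}


placements = [
    (1, "R1.1-300.to2"), (2, "R1.301-600.fm1"),  # one read split across 1 and 2
    (1, "R2.5-200.to2"),
    (3, "R3"),                                    # unsplit read, ignored
    (2, "R4.1-50.to5"),
]
links = count_split_read_links(placements)
```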

Principles of displaying scaffolds constructed by mate-pair reads

A scaffold is a consensus sequence formed by ordered contigs, with ‘N’s filling any gaps. The most common method uses mate-pair information to assemble contigs into scaffolds: scaffolding programs can estimate the distance between two contigs from the fragment size of the mate-pair library. For example, if the two reads of a 3-kb mate pair map to two different contigs, the two contigs can be joined into a scaffold, and the gap size is approximately 3 kb minus the distances between the mapping loci and the adjacent contig ends. This method allows repeat regions shorter than 3 kb to be bridged. However, ambiguous linkages can occur if the repeat region is longer than the fragment size of the mate-pair library (Figure 4). Similar to the results from ‘Newbler’ for 454 reads, ContigScape can display a relationship network within scaffolds by counting the number of mate-pair reads linking large contigs (>500 bp).
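
The gap arithmetic and the link counting can be illustrated with the sketch below. It assumes mate pairs have already been mapped (for example with BWA) and reduced to hypothetical `(contig_a, contig_b)` tuples, and it ignores read orientation; `estimate_gap` and `count_mate_links` are illustrative names, not functions of ContigScape or scaffold.pl.

```python
from collections import Counter


def estimate_gap(insert_size, dist_a, dist_b):
    """Approximate gap between two contigs bridged by one mate pair.

    dist_a / dist_b: distance (bp) from each read's mapping locus to the end of
    its contig that faces the gap.
    """
    return insert_size - (dist_a + dist_b)


def count_mate_links(pairs, lengths, min_len=500):
    """Count mate pairs whose two reads map to different contigs longer than min_len."""
    counts = Counter()
    for a, b in pairs:
        if a != b and lengths[a] > min_len and lengths[b] > min_len:
            counts[(min(a, b), max(a, b))] += 1
    return counts


lengths = {1: 10_000, 2: 8_000, 3: 400}
pairs = [(1, 2), (2, 1), (1, 1), (1, 3)]
print(estimate_gap(3_000, 1_200, 900))
print(count_mate_links(pairs, lengths))
```

In practice the estimate would be averaged over all pairs supporting a linkage; a negative value suggests that the contigs overlap or that the fragment was shorter than the nominal insert size.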

Results and discussion

Visualization

Copies of a repeat are usually collapsed into a single contig and thus cause gaps. After sequencing, two repeat regions (R1 and R2, Figure 3A) were assembled into the R1/R2 repeat contig (Figure 3B), and ContigScape reported all of its possible linkages with other regions (1–4, Figure 3C–D). PCR validation guided by these predicted linkages then excludes the incorrect relationships and yields the correct consensus sequence. Repeat contigs are shown in red in ContigScape (Figure 3D) to distinguish them from normal contigs, which are shown in dark blue (default setting). In addition, the number of reads connecting two contigs is shown as a label on each linkage edge, and linkage reliability is indicated by edge thickness.
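
Why PCR is needed can be seen by enumerating the candidate resolutions of a collapsed repeat. The sketch below assumes, for illustration, that the flanking regions split into a left side and a right side of the repeat (hypothetically, regions 1 and 2 on one side and 3 and 4 on the other) and ignores orientation; each copy of the repeat must join one left flank to one right flank.

```python
from itertools import permutations


def candidate_resolutions(left_flanks, right_flanks):
    """Each copy of a collapsed repeat R joins one left flank to one right flank.

    Returns every one-to-one assignment as a list of (left, "R", right) paths.
    """
    if len(left_flanks) != len(right_flanks):
        raise ValueError("each repeat copy needs one left and one right flank")
    return [
        [(left, "R", right) for left, right in zip(left_flanks, order)]
        for order in permutations(right_flanks)
    ]


for candidate in candidate_resolutions([1, 2], [3, 4]):
    print(candidate)
```

With two copies there are two candidates, and with k copies there are k! candidates in this simplified model, which is why high-copy repeats quickly become expensive to resolve by PCR alone.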

The key feature of ContigScape is determining the linkage between two contigs assembled from 454 or Illumina reads. An ‘Ace’ file can be opened directly in ContigScape, and the relationships among contigs can be saved in CRS format (see the sample files tabbed.txt and tabbedCov.txt in Additional file 1). The CRS format consists of two files, each with three columns: ‘tabbed.txt’ contains the number of connections between contigs, and ‘tabbedCov.txt’ describes the length and coverage of each contig. ‘tabbed.txt’ is similar to an AGP file in describing how chromosomes and scaffolds are assembled from component contigs, but it does not require the contigs to be sorted in advance. Loading the two files produces an initial graph, and the final graph is obtained with the layout functions of Cytoscape. Researchers can also generate CRS files by converting the results of the GRASS, SSPACE, OPERA and MIP scaffolders.
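
A reader for the two CRS files might look like the sketch below. The column order (`end_a`, `end_b`, `read_count` in tabbed.txt; `contig`, `length`, `coverage` in tabbedCov.txt), the tab separator and the skipping of `#` comment lines are assumptions inferred from the description above; check them against the sample files in Additional file 1. Header lines, if present, would also need to be skipped.

```python
def _rows(lines):
    """Yield tab-separated fields, skipping blank and '#' comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield line.rstrip("\r\n").split("\t")


def parse_links(lines):
    """tabbed.txt (assumed columns): end_a, end_b, number of linking reads."""
    return [(a, b, int(n)) for a, b, n in _rows(lines)]


def parse_coverage(lines):
    """tabbedCov.txt (assumed columns): contig, length (bp), coverage."""
    return {contig: (int(length), float(cov)) for contig, length, cov in _rows(lines)}


links = parse_links(["37E\t106S\t25\n", "106E\t41S\t23\n"])
coverage = parse_coverage(["106\t1450\t61.2\n", "41\t52000\t20.4\n"])
```

With real files the same functions accept open file handles, e.g. `with open("tabbed.txt") as fh: links = parse_links(fh)`.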

Another prominent characteristic of ContigScape is that it calculates the coverage of each contig and labels any contig whose coverage is at least twice the average as a ‘repeat contig’. Each contig is represented by one edge and two nodes, with ‘XS’ and ‘XE’ indicating the 5’ end (Start) and 3’ end (End) of contigX (X is a number), respectively. Each linkage (set of linking reads) is represented by a single edge whose thickness varies with the number of supporting reads. The number on a contig edge indicates the contig length, whereas the number on a linkage edge indicates the number of linking reads.
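
The graph model described above (two end nodes per contig, one contig edge, one edge per linkage) can be sketched as plain Python lists, independent of the Cytoscape API. The input shapes are hypothetical (a `{contig: (length, coverage)}` dictionary and `(end_a, end_b, reads)` tuples), the color names are placeholders, and the repeat test uses the twofold threshold from the text.

```python
def build_graph(coverage, links, avg_cov):
    """coverage: {contig: (length, coverage)}; links: [(end_a, end_b, reads)].

    Returns (nodes, edges); each edge is (source, target, kind, label, color).
    """
    nodes, edges = [], []
    for contig, (length, cov) in coverage.items():
        start, end = f"{contig}S", f"{contig}E"  # 5' (Start) and 3' (End) nodes
        nodes += [start, end]
        color = "red" if cov >= 2 * avg_cov else "darkblue"  # repeat vs normal contig
        edges.append((start, end, "contig", length, color))  # label = contig length
    for end_a, end_b, reads in links:
        edges.append((end_a, end_b, "linkage", reads, None))  # label = linking reads
    return nodes, edges


nodes, edges = build_graph(
    {"1": (50_000, 20.0), "106": (1_450, 61.2)},
    [("1E", "106S", 25)],
    avg_cov=20.0,
)
```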

Application of technology to display 454 contigs and scaffolding by mate-pair reads

We have used this tool to visualize eleven genomes (Table 2, Figure 5), accelerating their completion (nine of them have been published). After de novo assembly with 454 Newbler, researchers can use the global view to estimate the complexity of a genome and the difficulty of gap closing. Figure 5 shows marked differences among the assemblies of the eleven genomes in the total number of contigs and the number of repeat contigs. In addition, ContigScape has been applied to gap closing for another 40 genomes (Figure 6), comprising bacteria, archaea, viruses and fungi; the contig networks of Streptomyces, Leptospira and Ralstonia are complex, whereas the contig graphs of Brucella, Mycoplasma and Ketogulonicigenium are simple. Gap closing for A. hospitalis W1 was clearly straightforward. Its graph showed that the 28-kb contig3 was a tandem repeat, which had previously been identified as an integrated plasmid [26]. It is also easy to determine whether a plasmid is circular and whether its copy number exceeds two, as in A. orientalis HCCB10007 and E. tarda EIB202 [25]. The 24th graph of Figure 6 shows four circular plasmids, each composed of a single contig. The graph of Mycobacterium tuberculosis CCDC5079 also contained a high-copy-number contig, which BLAST identified as the insertion element IS6110. The 14 rRNA operons of Bacillus thuringiensis BMB171 [24], each approximately 5 kb long, are also clearly displayed (Figure 5). The assembly graph of Cotesia vestalis bracovirus [27] contained many independent closed rings, which were identified as 35 non-redundant circular genome segments. The number of contigs for the fungus Cordyceps militaris [29] exceeded 2,000, so these contigs require further scaffolding.

We applied ContigScape to a recently assembled Streptomyces sp. genome with 111 contigs sequenced by Roche 454 without scaffolding. We added seven contigs (contig140, 141, 142, 143, 144, 145 and 146) to the two CRS files to show different plasmids (Figure 1B). After processing, we found 25 repeat contigs, constituting six plasmids, eight rRNA operons and one telomere (contig28, Figure 1B2). The remaining repeats include IS elements, phage or other sequences. Figure 1A shows that 52 nodes have no linkage; these require additional scaffolding information, and PCR is therefore necessary to fill the remaining gaps. Relationships requiring validation are indicated by green edges.

Whether a repeat contig derives from the chromosome or from a plasmid is judged mainly from the linkages at its two ends. Four different types are shown in Figure 1B: (1) repeat contigs connected in a circle (Panel 3); (2) a single contig linked only to itself (Panels 4 and 6); (3) a repeat contig with one end not linked to any other contig, usually representing a linear chromosome telomere or a linear plasmid end (Panels 1 and 2); and (4) a linear plasmid composed of a single repeat contig with no connections to any other contig (Panel 5). However, ContigScape cannot identify a plasmid that is linear and single-copy. In our experience, the situations described above allow an effective estimate of whether a contig belongs to a plasmid. Nevertheless, researchers must confirm whether it is a plasmid by PCR, sequencing and annotation.
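
A per-contig version of patterns (2) to (4) can be sketched by looking at what each end is linked to. This is an illustrative heuristic only, assuming linkages as hypothetical `(end_a, end_b, reads)` tuples with ends named `XS`/`XE`; detecting pattern (1), a circle of several contigs, would require a cycle search over the whole graph and is not attempted here.

```python
from collections import defaultdict


def end_partners(links):
    """Map each contig end (e.g. '143E') to the set of ends it is linked to."""
    partners = defaultdict(set)
    for end_a, end_b, _reads in links:
        partners[end_a].add(end_b)
        partners[end_b].add(end_a)
    return partners


def classify_ends(contig, links):
    """Heuristic end-linkage pattern for one contig (patterns 2 to 4 in the text)."""
    partners = end_partners(links)
    start, end = f"{contig}S", f"{contig}E"
    at_start, at_end = partners.get(start, set()), partners.get(end, set())
    if not at_start and not at_end:
        return "isolated"          # pattern 4: possible linear plasmid
    if (at_start | at_end) <= {start, end}:
        return "self-linked"       # pattern 2: possible single-contig circular plasmid
    if not at_start or not at_end:
        return "one free end"      # pattern 3: possible telomere or linear plasmid end
    return "both ends linked"      # chromosomal or part of a multi-contig arrangement


links = [
    ("4E", "4S", 120),
    ("28E", "30S", 15), ("28E", "31S", 12),
    ("80E", "78E", 9), ("80S", "12E", 5),
]
for contig in ["4", "28", "80", "99"]:
    print(contig, classify_ends(contig, links))
```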

In Figure 1B, 143E is connected to both 142E and 144E (Panel 3). However, 143E and 142E are linked by 800 connections, whereas 143E and 144E are linked by only 10. In this case, the latter might be a nonspecific connection caused by short overlaps among reads. Additionally, Figure 1B shows that contig78 in the linear plasmid 80E-80S-78E-78S-54E-54S also has another copy in the chromosome (Panel 1).
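
One way to flag such weak connections automatically is to compare each linkage with the strongest linkage at the same contig end. The function name and the 5% cutoff below are hypothetical choices for illustration; the text gives no threshold, and flagged linkages still need to be checked by the user.

```python
from collections import defaultdict


def flag_weak_links(links, min_fraction=0.05):
    """Return linkages whose read count is below min_fraction of the strongest
    linkage at either of their two ends (possible nonspecific connections)."""
    strongest = defaultdict(int)
    for end_a, end_b, reads in links:
        strongest[end_a] = max(strongest[end_a], reads)
        strongest[end_b] = max(strongest[end_b], reads)
    return [
        (end_a, end_b, reads)
        for end_a, end_b, reads in links
        if reads < min_fraction * strongest[end_a] or reads < min_fraction * strongest[end_b]
    ]


links = [("143E", "142E", 800), ("143E", "144E", 10), ("142S", "141E", 600)]
print(flag_weak_links(links))
```

With the counts from Figure 1B, the 10-read linkage is flagged because it is far below the 800-read linkage at the shared end 143E.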

We also applied this program to another Streptomyces sp. genome with 145 contigs sequenced by Roche 454 with mate-pair information (Figure 7). The mate-pair reads allow the relationships between contigs to be interpreted more reliably. Figure 7C represents a linear chromosome with an 18-kb repeat at its ends (telomeres).

Display functionality of ContigScape

ContigScape has several unique features for microbial genome analysis (Figure 1). First, the “find genomic features” function can identify contigs belonging to plasmids or terminal repeats, determine whether a plasmid is linear or circular, and calculate the read coverage of the plasmid (Figure 1B). Second, ContigScape can locate the ends of linear chromosomes from a repeat contig that has two edges at one end and none at the other. After the ‘Ace’ file is loaded, the genomic structure network can be displayed, including the linkages among contigs, contig sizes and the number of repeats. In addition, another plugin, NetworkAnalyzer [30], can be used to quantify the complexity of the network (genome) and thus estimate the amount of work required to complete the genome. When viewing the graph, users can load the terminal 1,000 bp at both the 5’ and 3’ ends of a contig, joined by 20 ‘N’s that stand in for the internal sequence. Clicking the edge between two contigs displays the sequence containing the corresponding contig ends. The displayed sequence can be used to design primers in ContigScape and to run BLAST searches against NCBI databases. Users can also open the “edit panel” to edit the connections in the network. Beyond gap closing in bacterial genomes, ContigScape can also be used to finish BAC or plasmid sequences. It can also display scaffolding results from different methods once they have been converted into a CRS file and imported. The workflow of ContigScape is shown in Figure 8. Other functions of ContigScape are described in Additional file 1 (see the ContigScape manual).
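
The end-sequence display can be approximated as below. `end_preview` follows the description (1,000 bp from each end, 20 Ns between them); `junction_sequence`, which orients two contigs so that the selected ends face each other across a gap, is an assumption about how a clicked edge could be rendered, as is returning short contigs unchanged. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_preview(seq, flank=1000, spacer=20):
    """Terminal `flank` bp of each end, joined by `spacer` Ns.

    Contigs no longer than 2 * flank are returned unchanged (an assumption).
    """
    if len(seq) <= 2 * flank:
        return seq
    return seq[:flank] + "N" * spacer + seq[-flank:]


def junction_sequence(seq_x, end_x, seq_y, end_y, flank=1000, spacer=20):
    """Sequence around a gap between end `end_x` ('S' or 'E') of contig X and
    end `end_y` of contig Y, orienting both contigs so the gap lies between them."""
    x = seq_x if end_x == "E" else revcomp(seq_x)  # X's linked end must come last
    y = seq_y if end_y == "S" else revcomp(seq_y)  # Y's linked end must come first
    return x[-flank:] + "N" * spacer + y[:flank]


preview = end_preview("A" * 1500 + "C" * 1500)
print(len(preview))
print(junction_sequence("AAAC", "E", "GTTT", "S", flank=2, spacer=3))
```

The flanks on either side of the Ns are the regions from which PCR primers pointing into the gap would be chosen.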

Discussion

Comparative assembly [31] uses a reference genome sequence as a guide to discern repeat contigs. However, comparative assembly has three obvious weaknesses: (1) the target species must previously have been sequenced and assembled; (2) structural variations exist between the target genome and the reference; and (3) it cannot resolve large insertions. For example, we resequenced Amycolatopsis mediterranei S699 and assembled the genome de novo [21]. Compared with the previously released A. mediterranei S699 assembly [32], which was assembled using A. mediterranei U32 as a reference, the genome we sequenced contained a 10-kb insertion. The differences can likely be attributed to the different strategies used for genome assembly [21]. De novo assembly is a reliable way to avoid these weaknesses of comparative assembly.

Each sequencing technology has its own biases that result in coverage gaps. As coverage increases, the number of gaps decreases. However, gaps can still occur when reads cannot span a large repeat region. Handling repeat contigs correctly is therefore important. During scaffold construction, repeat contigs often cause errors in scaffolding or in the creation of linkages. Some programs link two unique contigs with one repeat contig, so that each repeat contig is used only once even if the repeat occurs several times in the genome. Correctly identifying repeat contigs will therefore greatly reduce the effort invested in genome assembly. Displaying straightforward graph-based relationships of contigs in Cytoscape rather than tables also facilitates a faster and more precise determination of the linkages among contigs. Our goal is to display the original relationships of all contigs rather than manually trimmed results, because the true association of contigs should be depicted as a network rather than a linear linkage.

ContigScape is not an assembly program and cannot replace the phred/phrap/consed package; rather, the two are complementary. Consed [33] and its companion program ‘autofinish’ [34] are very useful in gap closing. In our finishing strategy, the PHD files of all contigs, together with the ABI 3730 data obtained by sequencing PCR products, are ultimately assembled with phrap and edited in consed. ContigScape serves as a canvas for judging and editing the order of contigs and for visually evaluating the global complexity of a shotgun assembly. The plugin can directly process only a few types of NGS assembly data, such as 454Contigs.ace files and mate-pair reads; assembly results from other programs must first be converted into CRS files.

Conclusions

Using ContigScape, contigs can be displayed, and repeat contigs, gaps, and even plasmids can be highlighted, filtered, and customized. We designed unique functions for microbial genome analysis in ContigScape, such as identifying plasmids, determining whether they are linear or circular, and estimating their read coverage. We believe that, with the development of third-generation sequencing technologies, gap closing will become much easier because fewer contigs will be assembled. Long repeats will still hamper assembly, especially in larger genomes, and ContigScape can play an important role in gap closing for these genomes.

Accession numbers

The genome sequences have been deposited at NCBI under the accession numbers:

[GenBank: CP003729], [GenBank: CP002819], [GenBank: CP002820], [GenBank: CP003410], [GenBank: CP002884], [GenBank: CP002919], [GenBank: CP001903], [GenBank: CP001904], [GenBank: CP001135], [GenBank: CP002535], [GenBank: HQ009524-HQ009558], [GenBank: CP002513], [GenBank: AEVU00000000].

Availability and requirements

Project name: ContigScape

Project home page: http://sourceforge.net/projects/contigscape/.

Operating systems: Windows, Linux, Mac OS X.

Programming language: Java, Perl

Software packages (Linux): Fastx_toolkit 0.0.13, BEDTools 2.14.3, BWA 0.5.7, Samtools 0.1.18

Other requirements: Java 1.6 or higher, Cytoscape 2.8.3 (after installing Java and Cytoscape, place ContigScape.jar in the cytoscape2.8.3/plugins folder).

License: GNU

Restriction for non-academics: Users willing to use ContigScape for non-academic purposes should contact the corresponding author for details.

Abbreviations

NGS: Next-generation sequencing
HTS: High-throughput sequencing
CRS: Contig Relationship Scape

References

1. Gritsenko AA, Nijkamp JF, Reinders MJ, de Ridder D: GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies. Bioinformatics. 2012, 28 (11): 1429-1437. 10.1093/bioinformatics/bts175.
2. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W: Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011, 27 (4): 578-579. 10.1093/bioinformatics/btq683.
3. Gao S, Sung WK, Nagarajan N: Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol. 2011, 18 (11): 1681-1691. 10.1089/cmb.2011.0170.
4. Salmela L, Makinen V, Valimaki N, Ylinen J, Ukkonen E: Fast scaffolding with small independent mixed integer programs. Bioinformatics. 2011, 27 (23): 3259-3265. 10.1093/bioinformatics/btr562.
5. Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing. Genome Res. 1998, 8 (3): 195-202.
6. Burland TG: DNASTAR’s Lasergene sequence analysis software. Methods Mol Biol. 2000, 132: 71-91.
7. Bonfield JK, Smith K, Staden R: A new DNA sequence assembly program. Nucleic Acids Res. 1995, 23 (24): 4992-4999. 10.1093/nar/23.24.4992.
8. Nielsen CB, Cantor M, Dubchak I, Gordon D, Wang T: Visualizing genomes: techniques and challenges. Nat Methods. 2010, 7 (3 Suppl): S5-S15.
9. Nielsen CB, Jackman SD, Birol I, Jones SJ: ABySS-Explorer: visualizing genome sequence assemblies. IEEE Trans Vis Comput Graph. 2009, 15 (6): 881-888.
10. Riba-Grognuz O, Keller L, Falquet L, Xenarios I, Wurm Y: Visualization and quality assessment of de novo genome assemblies. Bioinformatics. 2011, 27 (24): 3425-3426. 10.1093/bioinformatics/btr569.
11. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-2504. 10.1101/gr.1239303.
12. Bonfield JK, Whitwham A: Gap5–editing the billion fragment sequence assembly. Bioinformatics. 2010, 26 (14): 1699-1703. 10.1093/bioinformatics/btq268.
13. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12 (6): 996-1006.
14. Stalker J, Gibbins B, Meidl P, Smith J, Spooner W, Hotz HR, Cox AV: The Ensembl Web site: mechanics of a genome browser. Genome Res. 2004, 14 (5): 951-955. 10.1101/gr.1863004.
15. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP: Integrative genomics viewer. Nat Biotechnol. 2011, 29 (1): 24-26. 10.1038/nbt.1754.
16. Huang W, Marth G: EagleView: a genome assembly viewer for next-generation sequencing technologies. Genome Res. 2008, 18 (9): 1538-1543. 10.1101/gr.076067.108.
17. Schatz MC, Phillippy AM, Shneiderman B, Salzberg SL: Hawkeye: an interactive visual analytics tool for genome assemblies. Genome Biol. 2007, 8 (3): R34. 10.1186/gb-2007-8-3-r34.
18. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.
19. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The sequence alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.
20. Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26 (6): 841-842. 10.1093/bioinformatics/btq033.
21. Tang B, Zhao W, Zheng H, Zhuo Y, Zhang L, Zhao GP: Complete genome sequence of Amycolatopsis mediterranei S699 based on de novo assembly via a combinatorial sequencing strategy. J Bacteriol. 2012, 194 (20): 5699-5700. 10.1128/JB.01295-12.
22. Xu J, Zheng HJ, Liu L, Pan ZC, Prior P, Tang B, Xu JS, Zhang H, Tian Q, Zhang LQ: Complete genome sequence of the plant pathogen Ralstonia solanacearum strain Po82. J Bacteriol. 2011, 193 (16): 4261-4262. 10.1128/JB.05384-11.
23. Mi S, Song J, Lin J, Che Y, Zheng H: Complete genome of Leptospirillum ferriphilum ML-04 provides insight into its physiology and environmental adaptation. J Microbiol. 2011, 49 (6): 890-901. 10.1007/s12275-011-1099-9.
24. He J, Shao X, Zheng H, Li M, Wang J, Zhang Q, Li L, Liu Z, Sun M, Wang S: Complete genome sequence of Bacillus thuringiensis mutant strain BMB171. J Bacteriol. 2010, 192 (15): 4074-4075. 10.1128/JB.00562-10.
25. Yang M, Lv Y, Xiao J, Wu H, Zheng H, Liu Q, Zhang Y, Wang Q: Edwardsiella comparative phylogenomics reveal the new intra/inter-species taxonomic relationships, virulence evolution and niche adaptation mechanisms. PLoS One. 2012, 7 (5): e36987. 10.1371/journal.pone.0036987.
26. You XY, Liu C, Wang SY, Jiang CY, Shah SA, Prangishvili D, She Q, Liu SJ, Garrett RA: Genomic analysis of Acidianus hospitalis W1 a host for studying crenarchaeal virus and plasmid life cycles. Extremophiles. 2011, 15 (4): 487-497. 10.1007/s00792-011-0379-y.
27. Chen YF, Gao F, Ye XQ, Wei SJ, Shi M, Zheng HJ, Chen XX: Deep sequencing of Cotesia vestalis bracovirus reveals the complexity of a polydnavirus genome. Virology. 2011, 414 (1): 42-50. 10.1016/j.virol.2011.03.009.
28. Li Y, Zheng H, Liu Y, Jiang Y, Xin J, Chen W, Song Z: The complete genome sequence of Mycoplasma bovis strain Hubei-1. PLoS One. 2011, 6 (6): e20999. 10.1371/journal.pone.0020999.
29. Zheng P, Xia Y, Xiao G, Xiong C, Hu X, Zhang S, Zheng H, Huang Y, Zhou Y, Wang S: Genome sequence of the insect pathogenic fungus Cordyceps militaris, a valued traditional Chinese medicine. Genome Biol. 2011, 12 (11): R116. 10.1186/gb-2011-12-11-r116.
30. Assenov Y, Ramirez F, Schelhorn SE, Lengauer T, Albrecht M: Computing topological parameters of biological networks. Bioinformatics. 2008, 24 (2): 282-284. 10.1093/bioinformatics/btm554.
31. Pop M, Phillippy A, Delcher AL, Salzberg SL: Comparative genome assembly. Brief Bioinform. 2004, 5 (3): 237-248. 10.1093/bib/5.3.237.
32. Verma M, Kaur J, Kumar M, Kumari K, Saxena A, Anand S, Nigam A, Ravi V, Raghuvanshi S, Khurana P: Whole genome sequence of the rifamycin B-producing strain Amycolatopsis mediterranei S699. J Bacteriol. 2011, 193 (19): 5562-5563. 10.1128/JB.05819-11.
33. Gordon D: Viewing and editing assembled sequences using consed. Curr Protoc Bioinformatics. 2003, Chapter 11: Unit 11.2.
34. Gordon D: Automated finishing with autofinish. Genome Res. 2001, 11 (4): 614-625. 10.1101/gr.171401.

Acknowledgments

We would like to thank the students of the gap-closing group at the Chinese National Human Genome Center at Shanghai for their suggestions about the plugin. This work was supported by grants from the National Natural Science Foundation of China (30830002, 31121001, 31270056), the National Basic Research Program of China (2012CB721102) and the Shanghai Rising-Star Program (11QA1404600).

Author information

Authors and Affiliations

State Key Laboratory of Genetic Engineering, Department of Microbiology, School of Life Sciences, Fudan University, Shanghai, 200433, China
Biao Tang, Xiaoming Ding & Guoping Zhao
Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, 201203, China
Biao Tang, Minjun Yang, Yongqiang Zhu, Shengyue Wang, Guoping Zhao & Huajun Zheng
CAS Key Laboratory of Pathogenic Microbiology & Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100190, China
Qi Wang, Feng Xie, Ying Zhuo, Hong Gao & Lixin Zhang
CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
Guoping Zhao
Department of Microbiology and Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
Guoping Zhao
Graduate School of Chinese Academy of Sciences, Beijing, 100049, China
Qi Wang

Corresponding authors

Correspondence to Lixin Zhang, Guoping Zhao or Huajun Zheng.

Additional information

Competing interests

The authors declare that they have no competing interests.

Biao Tang, Qi Wang contributed equally to this work.

Electronic supplementary material

Additional file 1: Listing all links of ContigScape, user manual and test datasets (DOCX 18 KB).

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Tang, B., Wang, Q., Yang, M. et al. ContigScape: a Cytoscape plugin facilitating microbial genome gap closing. BMC Genomics 14, 289 (2013). https://doi.org/10.1186/1471-2164-14-289

Received: 29 December 2012
Accepted: 20 April 2013
Published: 30 April 2013
DOI: https://doi.org/10.1186/1471-2164-14-289
