WildSilkbase: An EST database of wild silkmoths
© Arunkumar et al; licensee BioMed Central Ltd. 2008
Received: 24 December 2007
Accepted: 17 July 2008
Published: 17 July 2008
Functional genomics has particular promise in silkworm biology for identifying genes involved in a variety of biological functions, including synthesis and secretion of silk, sex determination pathways, insect-pathogen interactions, chorionogenesis and molecular clocks. Wild silkmoths have hardly been the subject of detailed scientific investigation, owing largely to the non-availability of molecular and genetic data on these species. As a first step, in the present study we generated large-scale expressed sequence tags (ESTs) from three economically important species of wild silkmoths. To make these resources available to the global scientific community, an EST database called 'WildSilkbase' was developed.
WildSilkbase is a catalogue of ESTs generated from several tissues at different developmental stages of three economically important saturniid silkmoths: the Indian golden silkmoth, Antheraea assama; the Indian tropical tasar silkmoth, A. mylitta; and the eri silkmoth, Samia cynthia ricini. The database currently holds 57,113 ESTs, which are clustered and assembled into 4,019 contigs and 10,019 singletons. Data can be browsed and downloaded using a standard web browser. Users can search the database by BLAST query, keywords or Gene Ontology query. The BLAST page offers options to restrict searches to ESTs from a particular species, tissue or developmental stage. Other features of WildSilkbase include cSNP discovery, a GO viewer, a homologue finder, an SSR finder and links to other related databases. WildSilkbase is freely available from http://www.cdfd.org.in/wildsilkbase/.
A total of 14,038 putative unigenes were identified in the three species of wild silkmoths. These genes provide important resources for gaining insight into the function and evolution of wild silkmoths. We believe that WildSilkbase will be extremely useful for researchers working on comparative genomics, functional genomics and molecular evolution in general, and on gene discovery, gene organization, transposable elements and genome variability of insect species in particular.
The sudden spurt in sequencing projects in recent years has resulted in an exponential increase in the genome sequence repertoire of species that are close relatives of many model organisms. The availability of these sequence resources has accelerated comparative genomic analysis and thus added to our understanding of the organismal biology of these species. In fact, new insights into the human genome came only after the sequences of related species such as chimpanzee, monkey and ape were published. However, in the insect order Lepidoptera, which includes many economically important insects such as silkmoths, agricultural pests and butterflies, only the domesticated silkworm, Bombyx mori, has been studied in detail, making it the best-studied insect after Drosophila. Comparative genomics in this order is therefore still in its infancy.
Functional genomics has particular promise in silkworm biology for identifying genes involved in the synthesis and secretion of silk and pathways involved in processes such as insect-pathogen interactions and sex determination. Among lepidopterans, several databases have been developed for B. mori: Silkbase for ESTs, Kaikobase for whole genome sequence, Silkdb for ESTs and whole genome sequence, and Silksatdb for microsatellites. Apart from B. mori, more than 13,000 ESTs have been made available for butterflies through Butterflybase and 32,217 ESTs for the pest species Spodoptera frugiperda through Spodobase. Wild silkmoths, however, are the least represented, owing largely to the non-availability of genomic resources for these species. Hence, the generation and utilization of genomic information from wild lepidopterans will be extremely useful in understanding these species at the molecular level.
The members of the family Saturniidae, collectively known as saturniids, are among the largest and most spectacular of the Lepidoptera, with an estimated 1,300 to 1,500 described species distributed worldwide. The family includes the giant silkmoths, royal moths and emperor moths. The muga silkmoth, Antheraea assama (n = 15), confined to the north-eastern states of India, is unique among saturniid moths and the least understood of them. The silk proteins of this species have not been studied so far, despite their unique property of giving a golden lustre to the silk thread. Samia cynthia ricini (n = 13), a multivoltine silkworm commonly called the 'eri silkworm', is known for its white or brick-red eri silk. It is distributed in India, China and Japan, and its ecoraces (~16) are distributed across the Palaearctic and Indo-Australian biogeographic regions. The tropical Indian tasar silkmoth, Antheraea mylitta (n = 31), is native to tropical India and is represented by more than 20 well-described, genetically distinct ecoraces.
Pursuing the genetics and genomics of saturniids is of significance for the following reasons: a) As is typical of lepidopterans, B. mori females are heterogametic, with a ZW chromosome constitution, whereas males are ZZ. Sex chromosomes are considered to be under evolutionary constraints different from those of autosomes, and the W chromosome is reported to be strongly female determining. The sex chromosome system of the saturniid silkmoth A. assama, on the other hand, is ZZ/ZO, as compared to the ZZ/ZW system observed in other silkmoths. A comparative study of the sex-determining genes would thus reveal diverse sex determination mechanisms in silkmoths. b) Photoperiod plays an important role in the life history traits of wild silkmoths, and hence it is important to investigate the genes involved in circadian rhythm. c) Silk fibres of different wild silkmoths show vast differences in their tenacity, texture, lustre and many other biophysical properties. In the light of these differences, it is of interest to study the genes encoding the silk proteins of wild silkmoths and compare them with those of the mulberry silkmoth. d) Information on immune response genes in these species can throw light on the diversity of the immune repertoire in these moths and may lead to the identification of novel immune genes.
Table 1: Details of cDNA libraries, number of ESTs generated in each wild silkmoth species and results of EST analysis.
Columns: Total no. of ESTs | No. of contigs | No. of singletons | No. of unigenes
Library entries: 96 hours after oviposition; Samia cynthia ricini; 96 hours after oviposition; 12–24 hours after injection of E. coli into hemocoel of fifth instar larvae; 12–24 hours after injection of Candida albicans into hemocoel of fifth instar larvae; 24 hours after injection of E. coli into hemocoel of fifth instar larvae
Construction and content
EST sequences were generated from three wild silkmoth species by sequencing cDNA clones amplified from mRNA isolated from several tissues at different developmental stages. The protocols used for RNA extraction, cDNA library preparation and the EST processing pipeline are available online in the 'Protocols' section of the database. Sequence traces were base-called with the Phred program, and a cut-off Phred score of 15 was used to extract quality sequences from the chromatograms. To further improve sequence quality, the ESTs were screened for vector sequences, and any vector sequence detected was removed using the 'Cross Match' program. EST reads shorter than 100 bp were discarded. The majority of the ESTs included in the database are between 400 and 600 bp long.
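The two numeric thresholds in the paragraph above (Phred score 15, minimum length 100 bp) can be illustrated with a minimal sketch. The text does not say how the quality cut-off was applied, so the rule used here, keeping the longest contiguous run of bases at or above the cut-off, is an assumption made for illustration only; the function names are hypothetical and are not part of Phred or Cross Match.

```python
from typing import List, Optional, Tuple

PHRED_CUTOFF = 15  # quality cut-off stated in the text
MIN_LENGTH = 100   # reads shorter than this (in bp) are discarded


def longest_high_quality_run(quals: List[int], cutoff: int = PHRED_CUTOFF) -> Tuple[int, int]:
    """Return (start, end) of the longest run of bases with quality >= cutoff.

    Assumed trimming rule for illustration; the original pipeline may differ.
    """
    best = (0, 0)
    start = None
    # A trailing sentinel below the cut-off closes a run that reaches the read end.
    for i, q in enumerate(list(quals) + [cutoff - 1]):
        if q >= cutoff:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def filter_read(seq: str, quals: List[int]) -> Optional[str]:
    """Trim a read to its best-quality run; return None if it is too short."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")
    start, end = longest_high_quality_run(quals)
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= MIN_LENGTH else None
```

A read that comes back as `None` corresponds to one that would be dropped by the 100 bp length filter.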
To produce a non-redundant EST dataset for functional annotation and comparative analysis, the 57,113 ESTs were clustered and assembled with the TGICL package using the default CAP3 options. EST sequences were merged into contigs based on regions of similarity. A total of 14,038 EST clusters, consisting of 4,019 contigs and 10,019 singletons and putatively regarded as unigenes, were generated (Table 1). Poly-A tails were trimmed from these unigene sequences using the TrimEST program of EMBOSS, and the trimmed unigene sequences were then annotated with GO terms. The GO annotation is based on the closest homologues identified by BLAST search against the Seqdblite FASTA sequence flat file. All the unigenes were assigned a biological process, molecular function and cellular component using the GO database. ESTs are potential resources for SSR and SNP marker discovery, and hence they were screened for SSRs using Tandem Repeats Finder (TRF). For extraction of repeats, we used the following TRF parameters: match = 2, mismatch = 3, indel = 5, match probability = 0.8, indel probability = 0.1, minimum score = 25 and maximum period = 10. The A. mylitta EST dataset was further analysed for potential SNPs in cDNA sequences (cSNPs) using the SEAN SNP Prediction Program with default settings. A total of 118 cSNPs were predicted in 1,412 EST sequences. After experimental validation, these predicted cSNPs will be useful for analysing genetic variation and population structure in A. mylitta.
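As a rough illustration of how the TRF parameters listed above map onto a command line, the sketch below builds the argument list for the `trf` executable, whose positional arguments are the input file followed by Match, Mismatch, Delta, PM, PI, Minscore and MaxPeriod. Two assumptions are made: TRF takes the two probabilities as whole-number percentages (so 0.8 and 0.1 become 80 and 10), and the options `-d` (write a data file) and `-h` (suppress HTML output) are added for convenience. The input file name `unigenes.fasta` and the function name are hypothetical; the exact invocation used by the authors is not given in the text.

```python
from typing import List

# Parameter values as listed in the text; probabilities expressed as percentages.
TRF_PARAMS = {
    "match": 2,       # matching weight
    "mismatch": 3,    # mismatch penalty
    "delta": 5,       # indel penalty
    "pm": 80,         # match probability (0.8)
    "pi": 10,         # indel probability (0.1)
    "minscore": 25,   # minimum alignment score to report
    "maxperiod": 10,  # maximum period size to report
}

# Order in which TRF expects its positional parameters after the file name.
TRF_ORDER = ["match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod"]


def trf_command(fasta_path: str, executable: str = "trf") -> List[str]:
    """Build an argument list suitable for subprocess.run (not executed here)."""
    values = [str(TRF_PARAMS[name]) for name in TRF_ORDER]
    return [executable, fasta_path] + values + ["-d", "-h"]


if __name__ == "__main__":
    print(" ".join(trf_command("unigenes.fasta")))
```

Returning a list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without shell quoting.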
Utility and discussion
To categorise transcripts by function, we used the GO classification. The 'GO Viewer' interface is designed for browsing GO terms as a tree. The number next to each GO term is the number of gene products annotated to that term that are included in the database and selected in the current view. The BLAST search offered by WildSilkbase allows users to compare any query sequence against the A. assama, S. c. ricini and A. mylitta EST and putative unigene sequence datasets. BLAST search results are returned directly to the user's web browser in HTML format (Figure 2). The sequence IDs on the BLAST result page are linked to the corresponding sequence information, such as organism name, tissue of origin, sequence length, unigene ID and sequence. A link to a ClustalW alignment file of the sequences matched in the databases is also provided on the result page.
The 'cSNP' web page provides direct access to the cSNPs of A. mylitta. The results include the contig ID, contig sequence length, the ESTs included in the contig, SNP location, alleles and consensus sequence.
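The per-term counts shown in a GO tree view can be sketched as follows. This is only an illustration on a toy tree: whether the GO Viewer counts gene products annotated to descendant terms as well as to the term itself is not stated in the text, so the roll-up over descendants is an assumption, and the unigene IDs (`U1`, `U2`, `U3`) and function name are hypothetical.

```python
from typing import Dict, List, Set


def products_per_term(
    children: Dict[str, List[str]],
    direct: Dict[str, Set[str]],
    root: str,
) -> Dict[str, int]:
    """Count distinct gene products annotated to each term or its descendants."""
    collected: Dict[str, Set[str]] = {}

    def collect(term: str) -> Set[str]:
        # Memoise so that a term reachable through several parents is visited once.
        if term in collected:
            return collected[term]
        products = set(direct.get(term, set()))
        for child in children.get(term, []):
            products |= collect(child)
        collected[term] = products
        return products

    collect(root)
    return {term: len(products) for term, products in collected.items()}


if __name__ == "__main__":
    # Toy fragment of the molecular function ontology with hypothetical unigene IDs.
    children = {
        "molecular_function": ["binding", "catalytic activity"],
        "binding": ["protein binding"],
    }
    direct = {
        "binding": {"U1"},
        "protein binding": {"U1", "U2"},
        "catalytic activity": {"U3"},
    }
    for term, count in products_per_term(children, direct, "molecular_function").items():
        print(f"{term} ({count})")
```

Using sets means a unigene annotated to both a term and one of its child terms is counted once for the parent.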
The database also provides information on wild silkmoth biology, cDNA library construction and the EST processing pipeline. A 'Picture Gallery' section gives access to pictures of wild silkmoths. Links to several other databases and resources related to ESTs and insects are provided on the 'Useful Links' page. A 'General Help' page is included for easy and efficient use of the database, and technical terms occurring in the database are hyperlinked to the 'Glossary' page for quick reference. WildSilkbase gives users access to all of these applications. The EST sequences of wild silkmoths have also been deposited in NCBI and can be accessed through the NCBI EST sequence database, dbEST (accession numbers: A. assama, FE952359-FE963860 and FG203277-FG226965; A. mylitta, EB742119-EB743530; S. c. ricini, DC858270-DC878540).
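The dbEST accession ranges quoted above can be turned into per-species accession counts with simple arithmetic. The sketch below assumes that each range is inclusive and contiguous (every accession between the first and last belongs to the submission), which the text does not state; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Accession ranges exactly as quoted in the text.
DBEST_RANGES: Dict[str, List[str]] = {
    "A. assama": ["FE952359-FE963860", "FG203277-FG226965"],
    "A. mylitta": ["EB742119-EB743530"],
    "S. c. ricini": ["DC858270-DC878540"],
}

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def range_size(accession_range: str) -> int:
    """Number of accessions in an inclusive range such as 'EB742119-EB743530'."""
    first, last = accession_range.split("-")
    m_first = ACCESSION.fullmatch(first)
    m_last = ACCESSION.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"malformed accession range: {accession_range}")
    return int(m_last.group(2)) - int(m_first.group(2)) + 1


def accessions_per_species(ranges: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum the sizes of all ranges listed for each species."""
    return {species: sum(range_size(r) for r in rs) for species, rs in ranges.items()}


if __name__ == "__main__":
    for species, count in accessions_per_species(DBEST_RANGES).items():
        print(f"{species}: {count}")
```

Under that assumption the A. mylitta range contains 1,412 accessions, the same as the number of A. mylitta ESTs screened for cSNPs. The range sizes are a property of the accession numbering and need not equal the number of ESTs held in the database.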
Gene Ontology annotation
GO mapping to molecular function revealed that the majority of genes from the tissue transcriptomes of wild silkmoths are distributed almost equally between 'binding' (the property of binding macromolecules) and catalytic activity. The next most abundant molecular function in the transcriptomes was structural molecule activity. For biological process, the majority of the ESTs belonged to the categories of physiological processes and cellular processes. For cellular component, most of the gene products were assigned to 'cell' and 'synapse part' (Figures 3 and 4). Putative unigenes for each GO category, for any of the three wild silkmoth species, can be browsed, viewed and downloaded from the 'GO Viewer' option of WildSilkbase.
Information on sequence similarity among genomes is a major resource for finding functional regions and predicting their functions. Comparing the genomes of closely related species is useful for finding the key sequence differences that may account for differences between the organisms. Comparative genomics is thus a powerful and burgeoning discipline that has become more and more informative as genomic sequence data accumulate. Lepidopteran insects show taxonomically specific biological phenomena, including sex determination, pheromone-dependent sexual communication, silk production, silk protein organisation, circadian rhythms, insect-plant interactions and insect-microbe interactions. Comparing the genes of wild silkmoths with those of other lepidopterans and of model insect species would shed light on the conservation and divergence of different gene families. In the present study we compared the putative unigenes of the three wild silkmoth species with one another and with the unigenes of four insect species: B. mori, D. melanogaster, A. mellifera and T. castaneum.
WildSilkbase aims to provide user-friendly access to EST data on wild silkmoths. The database will be updated continuously as new information on wild silkmoth ESTs becomes available. We welcome feedback from users for further improvement of the database. Researchers working on silkmoths are encouraged to submit their wild silkmoth EST data to WildSilkbase, so that it can become a single-window portal for all information on wild silkmoths. In future, additional features will be added, and enhanced functional annotation will be mined from other cluster and pathway databases to make WildSilkbase more user-friendly and to extract the maximum information from it. We firmly believe that WildSilkbase will be extremely useful for researchers working on the ecology, evolution, functional and comparative genomics, genetics and biochemistry of insects.
Availability and requirements
WildSilkbase is freely available through http://www.cdfd.org.in/wildsilkbase/. All questions, comments and suggestions should be sent to email@example.com.
JN acknowledges financial assistance from the Department of Biotechnology, Government of India, under the Centre of Excellence for Genetics and Genomics of Silkmoths grant. The EST sequencing of S. c. ricini was supported by Grants-in-Aid from MEXT and JSPS (Japan) to Toru Shimada, and by the Collaboration Program for Scientific Research between DST (India) and JSPS (Japan). KPA is the recipient of a fellowship from the Council of Scientific and Industrial Research (CSIR), India.
- Mita K, Morimyo M, Okano K, Koike Y, Nohata J, Kawasaki H, Kadono-Okuda K, Yamamoto K, Suzuki MG, Shimada T, Goldsmith MR, Maeda S: The construction of an EST database for Bombyx mori and its application. Proc Natl Acad Sci U S A. 2003, 100 (24): 14121-14126. 10.1073/pnas.2234984100.
- Mita K, Kasahara M, Sasaki S, Nagayasu Y, Yamada T, Kanamori H, Namiki N, Kitagawa M, Yamashita H, Yasukochi Y, Kadono-Okuda K, Yamamoto K, Ajimura M, Ravikumar G, Shimomura M, Nagamura Y, Shin IT, Abe H, Shimada T, Morishita S, Sasaki T: The genome sequence of silkworm, Bombyx mori. DNA Res. 2004, 11 (1): 27-35. 10.1093/dnares/11.1.27.
- Wang J, Xia Q, He X, Dai M, Ruan J, Chen J, Yu G, Yuan H, Hu Y, Li R, Feng T, Ye C, Lu C, Wang J, Li S, Wong GK, Yang H, Wang J, Xiang Z, Zhou Z, Yu J: SilkDB: a knowledgebase for silkworm biology and genomics. Nucleic Acids Res. 2005, 33 (Database issue): D399-402. 10.1093/nar/gki116.
- Prasad MD, Muthulakshmi M, Arunkumar KP, Madhu M, Sreenu VB, Pavithra V, Bose B, Nagarajaram HA, Mita K, Shimada T, Nagaraju J: SilkSatDb: a microsatellite database of the silkworm, Bombyx mori. Nucleic Acids Res. 2005, 33 (Database issue): D403-6. 10.1093/nar/gki099.
- Papanicolaou A, Gebauer-Jung S, Blaxter ML, Owen McMillan W, Jiggins CD: ButterflyBase: a platform for lepidopteran genomics. Nucleic Acids Res. 2008, 36 (Database issue): D582-7.
- Negre V, Hotelier T, Volkoff AN, Gimenez S, Cousserans F, Mita K, Sabau X, Rocher J, Lopez-Ferber M, d'Alencon E, Audant P, Sabourault C, Bidegainberry V, Hilliou F, Fournier P: SPODOBASE: an EST database for the lepidopteran crop pest Spodoptera. BMC Bioinformatics. 2006, 7: 322. 10.1186/1471-2105-7-322.
- Grimaldi DA, Engel MS: Evolution of the Insects. 2005, New York, Cambridge University Press, xv: 755.
- Hasimoto H: The role of the W chromosome in the sex determination of Bombyx mori. Jap J Genet. 1933, 8: 245-247. 10.1266/jjg.8.245.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8 (3): 175-185.
- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003, 19 (5): 651-652. 10.1093/bioinformatics/btg034.
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
- Consortium GO: The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 2006, 34 (Database issue): D322-6. 10.1093/nar/gkj021.
- Seqdblite FASTA sequence (Gene Ontology). [http://archive.geneontology.org/latest-full/]
- Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999, 27 (2): 573-580. 10.1093/nar/27.2.573.
- Huntley D, Baldo A, Johri S, Sergot M: SEAN: SNP prediction and display program utilizing EST sequence clusters. Bioinformatics. 2006, 22 (4): 495-496. 10.1093/bioinformatics/btk006.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Gene Ontology Database Downloads. [http://www.geneontology.org/GO.downloads.shtml]
- Hardison RC: Comparative genomics. PLoS Biol. 2003, 1 (2): E58. 10.1371/journal.pbio.0000058.
- Mahendran B, Ghosh SK, Kundu SC: Molecular phylogeny of silk-producing insects based on 16S ribosomal RNA and cytochrome oxidase subunit I genes. J Genet. 2006, 85 (1): 31-38. 10.1007/BF02728967.
- Unigene database (NCBI). [ftp://ftp.ncbi.nih.gov/repository/UniGene/]
- Arunkumar KP, Metta M, Nagaraju J: Molecular phylogeny of silkmoths reveals the origin of domesticated silkmoth, Bombyx mori from Chinese Bombyx mandarina and paternal inheritance of Antheraea proylei mitochondrial DNA. Mol Phylogenet Evol. 2006, 40 (2): 419-427. 10.1016/j.ympev.2006.02.023.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.