Segmental duplications in the silkworm genome
© Zhao et al.; licensee BioMed Central Ltd. 2013
Received: 15 April 2013
Accepted: 30 July 2013
Published: 31 July 2013
Segmental duplications (SDs), or low-copy repeats, play important roles in both gene and genome evolution. SDs have been extensively investigated in many organisms; however, no information is available about SDs in the silkworm, Bombyx mori.
In this study, we identified and annotated the SDs in the silkworm genome. Our results suggest that SDs (≥1 kb in length and ≥90% sequence identity) constitute ~1.4% of the silkworm genome sequence; this proportion is similar to that in Drosophila melanogaster but smaller than that in mammals. Almost half (42%) of the SD sequences are not assigned to chromosomes, indicating that SDs pose a challenge for genome assembly. We also experimentally validated large duplications using qPCR. Analysis of SD content indicated that genes related to immunity, detoxification, reproduction, and environmental signal recognition are significantly enriched in the silkworm SDs.
Our results suggest that segmental duplications have been problematic for the sequencing and assembly of the silkworm genome. SDs may have important biological significance in immunity, detoxification, reproduction, and environmental signal recognition in the silkworm. This study provides insight into the evolution of the silkworm genome and an invaluable resource for insect genomics research.
Genome sequencing provides the opportunity to assess fundamental biological processes of genome evolution. With the increasing number of finished genome sequences, the field of genome evolution is experiencing a renaissance of activity, and many questions about genome architecture and genome evolution are being resolved using computational studies. However, the identification and characterization of highly homologous sequences in the genome remain problematic. Segmental duplications (SDs), defined as low-copy repeats of DNA segments (blocks of sequence ≥ 1 kb in length and showing ≥ 90% sequence identity), are a class of homologous sequences. Since SDs are hotspots of copy number variation (CNV) as well as pools of gene innovation and disease-causing rearrangement [2–15], they have long been regarded as being involved in functional redundancy, adaptive evolution, and the structural dynamics of chromosomal evolution. Thus, identification and annotation of SDs are important for understanding the structure and evolution of a genome.
Although SDs have been analyzed in many organisms whose genome sequences have been completed [2–11], no such analysis has been performed in the domesticated silkworm, Bombyx mori. The silkworm genome sequence has been released [16, 17] and a large amount of hierarchical bacterial artificial chromosome (BAC) data is available, which provides an opportunity to identify and annotate SDs in the silkworm genome. In this study, we used two computational methods to identify the SDs. The first, whole-genome assembly comparison (WGAC), is a BLAST-based approach that performs an all-by-all comparison of the assembled genome sequence. The second, whole-genome shotgun sequence detection (WSSD), distinguishes unique from duplicated sequence on the basis of the depth of coverage after whole-genome shotgun sequence reads are aligned to a reference genomic segment. Duplicated regions display a higher read depth than the average. Real-time fluorescent quantitative PCR (q-PCR) has been used to validate large duplicated sequences [19–21]. Here, we present a set of silkworm SDs that provides a framework for future evolutionary studies. This resource also provides invaluable information for finishing the silkworm genome.
Genome-wide identification of the silkworm SDs
Table: q-PCR validation of a subset of WSSD duplications in BACs (columns: WSSD duplicated regions; read depth, #/5 kb)
In this study, we detected a total of 6.6 Mb of SDs in B. mori, covering ~1.4% of the silkworm genome (6.6 Mb/432 Mb; Figure 1) and ranging in size from 1 kb to 23 kb (Additional file 1). Previous studies suggested that high-identity duplications (identity > 94%) frequently collapse within working draft sequence assemblies and may represent artificial duplications within an assembly. We compared the WGAC results with those obtained by the WSSD approach and found that 45.1% of the SDs identified by WSSD were not detected by WGAC, which may be caused by collapsed duplications (Figure 1). In addition, 0.79 Mb of the duplications detected by WGAC were also detected by WSSD; these are the high-confidence SDs in the silkworm genome.
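The comparison between the two call sets can be illustrated with a small interval-intersection routine. This is only a sketch under stated assumptions: the function names and the input layout (a dictionary mapping a chromosome name to half-open `(start, end)` intervals) are hypothetical and are not taken from the original pipeline.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def shared_bp(calls_a, calls_b):
    """Count the bases covered by both call sets.

    Each call set maps a chromosome name to a list of (start, end) intervals.
    """
    total = 0
    for chrom in calls_a.keys() & calls_b.keys():
        a = merge_intervals(calls_a[chrom])
        b = merge_intervals(calls_b[chrom])
        i = j = 0
        # Walk both sorted, non-overlapping lists in parallel.
        while i < len(a) and j < len(b):
            lo = max(a[i][0], b[j][0])
            hi = min(a[i][1], b[j][1])
            if lo < hi:
                total += hi - lo
            # Advance whichever interval ends first.
            if a[i][1] < b[j][1]:
                i += 1
            else:
                j += 1
    return total
```

Dividing the shared length by the total length of either call set gives the kind of overlap fraction reported in Figure 1.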
Patterns of the silkworm SDs
Among the duplicated segments, there is a class of large tracts termed “duplication blocks”: if other SDs were identified within 100 kb of the coordinates of an SD, the whole region was termed a duplication block (gaps were excluded). We found that such duplication blocks contained protein-coding genes (Figure 4). For some chromosomes, for example chromosomes 6, 14, 17 and 27, a large share of the SDs (45.5%-83.3%) are located near gaps (within 1 Mb) in the reference genome sequence, indicating that these gaps may themselves be high-copy duplications. Furthermore, our results showed that a large proportion of the SDs in the silkworm genome were on ChrUn. Thus, SDs are probably a source of problems in the assembly of the silkworm genome.
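The duplication-block definition above can be sketched as a single pass over sorted SD coordinates. This is an illustrative reading of the 100 kb rule, not the authors' script: the function name and the tuple layout are hypothetical, and the handling of assembly gaps is omitted.

```python
def duplication_blocks(sds, max_gap=100_000):
    """Group SDs on one chromosome into duplication blocks.

    sds is a list of (start, end) coordinates. An SD joins the current
    block when it starts within max_gap bases of the block's end.
    Returns (block_start, block_end, n_sds) for blocks with at least two SDs.
    """
    blocks = []
    for start, end in sorted(sds):
        if blocks and start - blocks[-1][1] <= max_gap:
            blocks[-1][1] = max(blocks[-1][1], end)
            blocks[-1][2] += 1
        else:
            blocks.append([start, end, 1])
    # A lone SD with no neighbor within max_gap is not a block.
    return [tuple(b) for b in blocks if b[2] > 1]
```

In this sketch, SDs chained together by successive gaps of at most 100 kb fall into one block, so a block can span far more than 100 kb.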
Sequence properties of the silkworm segmental duplications
Table: Repertoires and evolutionary mechanisms of selected duplicated genes or gene families in mammals and silkworm
Cytochrome P450 enzyme: catalyze the oxidation of organic substances
≥ 3; ≥ 3
Participate as central control elements in signal transduction pathways
Variety of physiological processes, such as cell signaling, defense and development
Glucose-methanol-choline oxidoreductase (GMC): developmental or physiological process, immunity
Responsible for the detection of odor molecules
30 K proteins (30KPs, Lipoprotein_11): physiological processes such as energy storage, embryonic development, and immune response
Table: Repeat properties of the silkworm genome, duplication, and flanking region (columns: 2.5-kb flanking region; enrichment in duplication content; total bp analyzed)
Experimental validation of a subset of SDs
The SDs in 10 of 11 BACs (91%) were confirmed as positive duplications by qRT-PCR (Figure 6). It should be emphasized that not all true duplications can be detected by qRT-PCR; in particular, low-copy duplications with a divergent reads ratio > 0.8 are difficult to detect. Thus, a false positive rate of 9% is a conservative estimate for our WSSD strategy.
Quality of SD detection
SDs have been extensively studied in many organisms, including vertebrates and invertebrates [5, 6, 8, 18, 22, 24, 25, 29]. Here we performed a systematic analysis of segmental duplication in the silkworm genome using two different approaches: a sequence assembly-based approach (WGAC) and a whole-genome shotgun sequence detection method (WSSD). The power of SD detection depends largely on the quality of the underlying sequence assembly and on the strategy used. Four factors influence the detection of SDs in a genome assembly: (1) depth of genome sequencing, (2) methodology of assembly, (3) quality of genome annotation and (4) level of allelic variation. To take advantage of the low level of allelic variation, we implemented a previously described modification of the WSSD approach, which entails a quality assessment of the underlying reads to calculate percent identity and determine the proportion of variants within a given region of the genome.
This study has some limitations. On the one hand, the number of SDs may be underestimated. For example, regions of extremely high sequence identity may collapse during assembly, which may lead to an underestimate of the fraction of the genome showing relatively high identity. This is why some highly homologous gene families, such as carboxylesterases, were not detected in our study, although other highly homologous gene families were detected (e.g., cytochrome P450 genes, serine proteases, histones). In addition, the power of SD analysis depends largely on the genomic sequence and its annotation. The presence of sequence gaps as well as contig orientation may influence the detection of SDs in a genome. The current silkworm genome sequence covers only about 85% of the genome and contains many gaps. Thus, this study may underestimate the SD content of the silkworm. On the other hand, there may be some false positives among the SDs identified using WSSD, possibly because of the incomplete annotation of repeats in the current silkworm genome. To obtain accurate information about the large SDs and to exclude false positives, further annotation of repeats as well as FISH experiments are needed in future studies. In the silkworm genome, about 85% of the SDs are shorter than 2 kb. This suggests that SDs in the silkworm are much smaller than those in mammals, which is consistent with other invertebrates such as D. melanogaster. Thus, PCR validation would be more suitable.
Despite these limitations, some important trends regarding the SDs in the silkworm emerged. Our estimate of the SD content is consistent with that in D. melanogaster but much lower than that in mammals [6, 15, 24] (Additional file 8). We propose that this difference may have biological causes that remain to be investigated. A previous study also suggested that SDs are much less abundant in invertebrates than in vertebrates.
Based on a new assembly of the silkworm genome, we found that the SDs were distributed nonuniformly across the genome (Figure 4). For example, SDs are enriched on some chromosomes (Additional file 3). In addition, some SDs reside within 1 Mb of the “gaps” on chromosomes 6, 14, 17 and 27 (Figure 4), suggesting that SDs may be problematic regions for both clone-based and whole-genome shotgun sequencing methods.
Implications for genome assembly
The published silkworm genome sequence represents one of the first attempts to sequence and assemble a lepidopteran genome mainly on the basis of shotgun sequencing read data. One of the greatest challenges in genome assembly lies in segmental duplications, because of the high degree of sequence identity among their copies [31, 33–35]. There are three possible outcomes when SDs are encountered during sequencing and assembly: (1) the SDs are recognized as distinct and resolved properly; (2) because virtually identical sequence reads are present in the database, the sequences are underrepresented; and (3) the SDs are misassembled into the genome. The second and third outcomes create numerous gaps. Thus, genome-wide studies of segmental duplication content are an effective way to assess the quality of whole-genome sequence assemblies and provide important information for users of the genome sequence.
Several important conclusions regarding genome assembly can be drawn from this study. The complex, highly duplicated nature of SDs is not amenable to high-throughput assembly methods without refinement. For example, some whole-genome shotgun assemblers, such as Arachne, would collapse highly identical duplications. Currently, three types of gaps are recognized within the working draft sequence [31, 38]. The first type, termed trivial gaps, is no more than 100 bp in length. The second type comprises gaps between ordered clones or sequence contigs, which are easily closed by sequencing bridging clones obtained from paired-end sequence data. The third type is more complicated because it is associated with SDs. Closing this kind of gap is difficult because the SDs in the genome must be recognized first. Some gaps in the silkworm genome belong to this type, since some SDs are distributed in the flanks of these gaps (Figure 4). The “unplaced” chromosome (ChrUn) showed a significant enrichment for SDs (Additional file 2), with almost 42% of the duplications assigned to ChrUn. Further efforts should target these regions in order to obtain a better sequence of the silkworm genome. Figure 1 shows the comparison of SDs detected by WSSD and WGAC; the results suggest that 9.82% of SDs could only be detected by WSSD. Using the experimental qRT-PCR data to estimate the false positive rate (9%), we conclude that 0.065 Mb of SDs have not been resolved within the genome. Thus, our results suggest that, at present, clone-ordered-based approaches to sequence assembly appear to be more effective for identifying the true locations, organization and complexity of SDs. Furthermore, intrachromosomal SDs are comparatively few in the current silkworm genome assembly. Two reasons may contribute to this: (1) as many as 39.5% of interchromosomal duplications were found to have paralogous sequences on ChrUn, and the gaps on the chromosomes might lead to an underestimate of the intrachromosomal SDs; (2) the silkworm genome has some distinctive features: it has 28 chromosomes but is only about 432 Mb in size, so the chromosomes are relatively small (about 15.4 Mb on average), and its TE content is large (~35%). It is therefore possible that intrachromosomal duplications are few because the chromosomes are short. A previous study showed that interchromosomal duplications are shorter (median length 2.5 kb) while intrachromosomal duplications are much larger (median length 20 kb) in the bovine genome. However, the silkworm genome lacks large duplications, and most of the duplications were shorter than 2.5 kb.
SD content analysis
The correct assembly of SDs is often not considered a high priority, especially in the draft phase of a genome sequence, because such regions are thought to be gene-poor. However, in some organisms, such as human, highly duplicated segments (~6%-7%) are rich in TEs and genes. A similar pattern is found in the silkworm. The gene content in the silkworm SDs occupies ~2% of the genome, whereas the SDs constitute only 1.4% of the genome sequence. In addition, some TEs, such as DNA transposons and LTR retrotransposons, were enriched in the SDs (Table 3). Compared with other insects (e.g., fruit fly), the silkworm genome harbors many TEs, about 35% of the genome, and LTR retrotransposons are the most common TEs in B. mori. Thus, TEs could be involved in the formation of SDs in the silkworm. In addition, many duplicated genes and gene families were found to reside in the SDs, and some of them have been implicated in lineage-specific adaptation to a particular environment. Antimicrobial peptide (AMP) genes, which play important roles in the innate immune system of insects, were found to be enriched in the silkworm SDs (Additional file 6). Some GMC genes, which have expanded in the silkworm and are associated with immunity, were also found in the SDs. Members of the lepidopteran-specific Lipoprotein_11 family and of the serine protease gene family, which are related to the immune response, were enriched in the SDs. Furthermore, since the silkworm frequently encounters a wide variety of secondary products in mulberry leaves, such as plant allelochemicals, it has evolved special enzymes to cope with the digestion of these secondary products [26, 43]. For example, cytochrome P450 enzymes are involved in such biological processes in the silkworm. In this study, we found that some members of the cytochrome P450 gene family are located in the silkworm SDs. Some genes involved in silk production, such as proteasome genes, were also found in SDs. In this sense, SDs may play important roles in the evolution of species-specific functions.
The identification of genes in SDs has both practical and biological implications. Previous studies showed that SDs are candidates for the evolution of organism-specific genes [44, 45]. Some gene families under selection in vertebrates have been identified, such as cytochrome P-450 and olfactory receptor genes [46, 47]. However, the functions of many genes in the silkworm SDs remain unclear on the basis of BLASTP searches against the nr database. We used these unannotated genes located in SDs as queries to search the protein sets of related insects, especially lepidopteran species. We found that most of these unannotated genes had orthologs in other insects, especially in Lepidoptera (Additional file 9). For example, BGIBMGA003910-PA, which is poorly annotated in the silkworm database, has orthologs in other insects (such as the monarch butterfly Danaus plexippus, Heliconius melpomene, Dendroctonus ponderosae and Nasonia vitripennis), but the identity was much higher in Lepidoptera (Additional file 10). The silkworm is an economically important insect and a model organism for molecular genetic and genomic studies of the order Lepidoptera. Our study provides invaluable information on the SDs in the silkworm, which will facilitate understanding of the evolution of the silkworm genome as well as the biology of the silkworm.
We analyzed the SDs in the silkworm genome for the first time and found that SDs (≥1 kb in length and ≥90% sequence identity) constitute ~1.4% of the silkworm genome sequence. This proportion is similar to that in D. melanogaster but smaller than that in mammals. Almost half (42%) of the SD sequences are not assigned to chromosomes, suggesting that SDs pose a challenge for genome assembly. Large duplications were also validated by qPCR experiments. Genes related to immunity, detoxification, reproduction, and environmental signal recognition are significantly enriched in the silkworm SDs, implying that SDs may have important biological significance in these physiological processes. Our results provide insight into the evolution of the silkworm genome and an invaluable resource for insect genomics research.
We downloaded the silkworm genomic sequence (9×) from the silkworm genome database (SilkDB, http://silkworm.genomics.org.cn/) and the whole genome shotgun sequence (WGS) reads from . The BAC sequences were obtained from NCBI (http://www.ncbi.nlm.nih.gov/). This BAC set contained 46 clones distributed across 22 chromosomes, representing 1.8% of the silkworm genome.
Whole-genome alignment comparison (WGAC)
We used a combination of sequence analysis software and a set of Perl scripts to optimize the detection of large segmental duplications (length ≥ 1 kb and identity ≥ 90%).
The large contigs in the silkworm genome were broken into tractable 400 kb segments. High-copy repeats were identified using RepeatMasker (Smit and Green http://www.repeatmasker.org/, version open-3.3.0). Because the silkworm genome is rich in TEs (~35%), we used our own TE dataset (http://gene.cqu.edu.cn/BmTEdb/) as the repeat database when running RepeatMasker. The reference contigs were masked at a 10% divergence level from TEs, and all high-copy repeats were then removed from the sequences. The resulting unique genomic DNA then underwent global BLASTN searches with reduced affine gap extension parameters, which allowed large gaps of up to 1000 bp to be traversed. Alignments between the 400 kb segments were generated using the parameters (−G 180 –E 1 –q 80 –r 30 –z 3 × 10-9 –Y 3 × 10-9 –e 1e-20 –F F). We discarded self-alignments and used a set of Perl scripts to reinsert the high-copy repeats into these alignments. BLASTN results were parsed for alignments with length ≥1 kb and identity ≥88%. These initial seed alignments were subsequently used to create local alignments, which were then trimmed to define their end points. We then performed an optimal global alignment to generate accurate alignment statistics. Only alignments with length ≥ 1 kb and identity ≥ 90% were considered in our analysis.
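The segmentation and filtering steps of the WGAC procedure can be sketched as follows. This is a simplified illustration, not the Perl scripts used in the study: the `Alignment` record, its field names and both helper functions are hypothetical, and repeat masking, BLASTN itself and the global realignment are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One pairwise alignment between two genome segments (hypothetical record)."""
    query: str
    q_start: int
    q_end: int
    subject: str
    s_start: int
    s_end: int
    identity: float  # percent identity, 0-100


def split_into_segments(seq_len, size=400_000):
    """Return half-open (start, end) coordinates of consecutive segments."""
    return [(s, min(s + size, seq_len)) for s in range(0, seq_len, size)]


def is_self_alignment(aln):
    """True when a region is aligned to exactly itself."""
    return (aln.query, aln.q_start, aln.q_end) == (aln.subject, aln.s_start, aln.s_end)


def filter_alignments(alignments, min_len=1000, min_identity=90.0):
    """Drop self-alignments and keep hits that pass length and identity cutoffs."""
    kept = []
    for aln in alignments:
        if is_self_alignment(aln):
            continue
        # Use the shorter of the two aligned spans as the alignment length.
        length = min(aln.q_end - aln.q_start, aln.s_end - aln.s_start)
        if length >= min_len and aln.identity >= min_identity:
            kept.append(aln)
    return kept
```

Calling `filter_alignments(hits, min_identity=88.0)` would correspond to the seed stage, and the default 90% cutoff to the final selection after global alignment.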
Whole-genome shotgun sequence detection of duplications (WSSD)
We used the WSSD strategy previously developed during the analysis of the human genome to assess duplication content in the silkworm. For a given genomic sequence, this method assesses the depth of coverage and compares it with the average coverage depth. In duplicated regions, the depth of coverage shows a statistically significant increase due to the recruitment of paralogous reads. WSSD preferentially identifies large SDs (≥10 kb in length, ≥ 94% sequence identity). We used two classes of sequences: (1) all finished BAC sequences deposited in GenBank; (2) the whole silkworm genome sequence.
First, short genome reads (<50 bp) and vector sequences were filtered out. After filtering, ~1.83 G of clean reads remained (52–964 bp in length, ~4.5× coverage of the genome) (Additional file 11). Each repeat-masked reference silkworm genome sequence was compared by Megablast against the entire set of silkworm whole-genome shotgun sequence reads (WGS, 3,810,411 reads). Our analysis was thus based on a comparison of 3,810,411 WGS reads against the 432 Mb silkworm genome sequence. About 86.4% of the reads (3,290,836) were remapped to the assembly. We used the following parameters (−D 3 –J F –P 93 –U T –F m –s 220), which allow for greedy-algorithm extension into adjacent repetitive regions. We wrote a Perl script to detect every segment. Alignments were considered if they represented 90% of the reads with a rescored similarity of > 94%.
We used a sliding window method in the WSSD pipeline to calculate the read depth (RD) value. Reads were first counted in overlapping 5 kb windows sliding in 1 kb steps. Initial calls were selected if at least six of seven sequential 5 kb overlapping windows had RD values that differed significantly from the average. Since the read length varied considerably, the standard deviation of the read length (~380) was high in the silkworm. Furthermore, no reference set of segmental duplications has previously been reported in the silkworm, so it is impossible to determine the accurate RD value of SDs in the silkworm, and no set of unique regions validated by FISH or other experiments is available. Thus, we removed the SD regions identified by WGAC, together with their 10 kb flanking regions, before estimating the average RD. We defined a significant alignment depth as one greater than 3 standard deviations above the mean (Additional file 11). Only SD calls greater than 10 kb in length were kept in the final dataset. Because the silkworm strain Dazao (the sequenced strain) is a highly inbred experimental line, it has reduced allelic variation. We therefore used a more sensitive metric for the detection of SDs. This method increased sensitivity for detecting large single-duplication events (including recent but low-frequency tandem duplications). In this way, we kept candidate duplicated regions in which the ratio of divergent reads (defined as those aligned to the reference sequence with identity higher than 99.8%) was higher than 0.5.
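The read-depth calling logic described above can be sketched in a few functions. This is a minimal illustration under stated assumptions rather than the WSSD pipeline itself: the function names are hypothetical, reads are represented only by their start positions, and `control_depths` stands for window depths from regions assumed to be unique (for example, after excluding WGAC calls and their flanks).

```python
from bisect import bisect_left
from statistics import mean, stdev


def window_depths(read_starts, seq_len, window=5000, step=1000):
    """Count reads starting in each overlapping window [w, w + window)."""
    starts = sorted(read_starts)
    depths = []
    for w_start in range(0, max(seq_len - window, 0) + 1, step):
        w_end = w_start + window
        depths.append(bisect_left(starts, w_end) - bisect_left(starts, w_start))
    return depths


def flag_windows(depths, control_depths, n_sd=3.0):
    """Flag windows whose depth exceeds the control mean by more than n_sd SDs."""
    threshold = mean(control_depths) + n_sd * stdev(control_depths)
    return [d > threshold for d in depths]


def candidate_runs(flags, run=7, min_hits=6):
    """Return (first, last) window-index ranges where at least min_hits of
    run consecutive windows are flagged; overlapping ranges are merged."""
    regions = []
    for i in range(len(flags) - run + 1):
        if sum(flags[i:i + run]) >= min_hits:
            if regions and i <= regions[-1][1] + 1:
                regions[-1][1] = i + run - 1
            else:
                regions.append([i, i + run - 1])
    return [tuple(r) for r in regions]
```

Window indices can be converted back to coordinates as `first * step` and `last * step + window`, after which calls shorter than 10 kb would be discarded.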
The gene content of the silkworm segmental duplications was assessed using the glean consensus gene set (http://silkworm.genomics.org.cn/). We obtained a total of 14,623 silkworm peptides from SilkDB. In addition, using Gene Ontology (GO), we tested whether molecular function, biological process, and pathway terms were under- or overrepresented in SDs. Pfam was also used to annotate the functions of the genes in the SDs.
We also investigated the distribution of genes and segmental duplications on the genomic sequences. It should be noted that a portion of the silkworm genes are not well annotated or are annotated as “Unknown function”, which may lead to an underestimate of the influence of genes in SDs.
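The text does not state which statistic was used for the over- and underrepresentation tests. As one common choice, offered here purely as an assumption, a hypergeometric test compares the number of SD genes carrying a term with the number expected from the whole gene set. The function names and arguments below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): probability of at least k annotated genes among n SD genes,
    when K of the N genes in the genome carry the annotation."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def hypergeom_lower_tail(k, K, n, N):
    """P(X <= k): the corresponding test for underrepresentation."""
    lower = max(0, n - (N - K))
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, k + 1))
    return favorable / comb(N, n)
```

With the gene set described above, `N` would be 14,623; in practice the resulting P-values would also need correction for the number of terms tested.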
Quantitative real-time PCR (qRT-PCR) validation
Primer Premier 5.0 was used to design primers for the qRT-PCR experiments (Additional file 12). Each PCR reaction was prepared as follows: 10 μl of SYBR-Green PCR master mix, 1 μl of each primer (10 μM), 7 μl of water, and 1 μl of template (whole genome DNA). Quantitative real-time PCR was carried out using the ABI StepOnePlus system. The thermocycler program had an initial denaturation step at 95 °C followed by 40 cycles consisting of a 10-s denaturation at 95 °C, a 40-s annealing at 60 °C, and a 30-s extension at 72 °C. At the end of each reaction, a dissociation curve was generated to help detect primer dimers or other unwanted amplification products that may produce a detectable cycle threshold (Ct) value.
We chose three regions (control_1, control_2, control_3) as controls for all qRT-PCR experiments, representing a single-copy region, a 4-copy region and TEs. Copy number was analyzed according to the comparative Ct method. ΔCT and ΔΔCT were calculated as ΔCT = CT(target) − CT(control, single copy) and ΔΔCT = ΔCT(SD sample) − ΔCT(single-copy sample), respectively. To assess the accuracy of this method, we used the pipeline to calculate the copy number of control_2, which was identified in silico as having 4 copies. The result showed that this region had ~3.95 copies according to our method. Thus, it is reasonable to apply this pipeline to assess the segmental duplications.
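The comparative Ct calculation can be written out in a few lines. The conversion from ΔΔCT to copy number is not spelled out in the text; the sketch below assumes the usual relation, copy number ≈ 2^(−ΔΔCT) relative to a single-copy region, with perfect amplification efficiency. The function names and the `efficiency` parameter are hypothetical.

```python
def delta_ct(ct_target, ct_control):
    """Delta CT: target Ct minus the Ct of the single-copy control."""
    return ct_target - ct_control


def copy_number(delta_ct_sample, delta_ct_single_copy, efficiency=2.0):
    """Estimate copy number relative to a single-copy region.

    Assumes each PCR cycle multiplies the product by `efficiency`
    (2.0 means perfect doubling), so copies = efficiency ** (-ddCT).
    """
    dd_ct = delta_ct_sample - delta_ct_single_copy
    return efficiency ** (-dd_ct)
```

Under this assumption, a region that reaches the threshold two cycles earlier than the single-copy control (ΔΔCT = −2) is estimated at 4 copies, and an estimate of ~3.95 copies corresponds to a ΔΔCT of about −1.98.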
This work was supported by the Hi-Tech Research and Development (863) Program of China (2013AA102507) and by a grant from the Natural Science Foundation Project of CQ CSTC (cstc2012jjB80007).
- Eichler EE, Sankoff D: Structural dynamics of eukaryotic chromosome evolution. Science. 2003, 301 (5634): 793-797. 10.1126/science.1086132.View ArticlePubMedGoogle Scholar
- Muller HJ: Bar duplication. Science. 1936, 83 (2161): 528-530.View ArticlePubMedGoogle Scholar
- Ohno S: Evolution by gene duplication. 1970, New York: Springer-VerlagView ArticleGoogle Scholar
- Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297 (5583): 1003-1007. 10.1126/science.1072047.View ArticlePubMedGoogle Scholar
- Cheng Z, Ventura M, She X, Khaitovich P, Graves T, Osoegawa K, Church D, DeJong P, Wilson RK, Paabo S, Rocchi M, Eichler EE: A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature. 2005, 437 (7055): 88-93. 10.1038/nature04000.View ArticlePubMedGoogle Scholar
- Bailey JA, Church DM, Ventura M, Rocchi M, Eichler EE: Analysis of segmental duplications and genome assembly in the mouse. Genome Res. 2004, 14 (5): 789-801. 10.1101/gr.2238404.PubMed CentralView ArticlePubMedGoogle Scholar
- She X, Cheng Z, Zollner S, Church DM, Eichler EE: Mouse segmental duplication and copy number variation. Nat Genet. 2008, 40 (7): 909-914. 10.1038/ng.172.PubMed CentralView ArticlePubMedGoogle Scholar
- Nicholas TJ, Cheng Z, Ventura M, Mealey K, Eichler EE, Akey JM: The genomic architecture of segmental duplications and associated copy number variants in dogs. Genome Res. 2009, 19 (3): 491-499.PubMed CentralView ArticlePubMedGoogle Scholar
- Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM, Clark RA, Schwartz S, Segraves R, Oseroff VV, Albertson DG, Pinkel D, Eichler EE: Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005, 77 (1): 78-88. 10.1086/431652.PubMed CentralView ArticlePubMedGoogle Scholar
- Graubert TA, Cahan P, Edwin D, Selzer RR, Richmond TA, Eis PS, Shannon WD, Li X, McLeod HL, Cheverud JM, Ley TJ: A high-resolution map of segmental DNA copy number variation in the mouse genome. PLoS Genet. 2007, 3 (1): e3-10.1371/journal.pgen.0030003.PubMed CentralView ArticlePubMedGoogle Scholar
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marsha CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F: Global variation in copy number in the human genome. Nature. 2006, 444 (7118): 444-454. 10.1038/nature05329.PubMed CentralView ArticlePubMedGoogle Scholar
- Bailey JA, Liu G, Eichler EE: An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet. 2003, 73 (4): 823-834. 10.1086/378594.PubMed CentralView ArticlePubMedGoogle Scholar
- Lupski JR: Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. 1998, 14 (10): 417-422. 10.1016/S0168-9525(98)01555-8.View ArticlePubMedGoogle Scholar
- Ji Y, Eichler EE, Schwartz S, Nicholls RD: Structure of chromosomal duplications and their role in mediating human genomic disorders. Genome Res. 2000, 10 (5): 597-610. 10.1101/gr.10.5.597.View ArticlePubMedGoogle Scholar
- Samonte RV, Eichler EE: Segmental duplications and the evolution of the primate genome. Nat Rev Genet. 2002, 3 (1): 65-72.View ArticlePubMedGoogle Scholar
- Mita K, Kasahara M, Sasaki S, Nagayasu Y, Yamada T, Kanamori H, Namiki N, Kitagawa M, Yamashita H, Yasukochi Y, Kadono-Okuda K, Yamamoto K, Ajimura M, Ravikumar G, Shimomura M, Nagamura Y, Shin-I T, Abe H, Shimada T, Morishita S, Sasaki T: The genome sequence of silkworm, Bombyx mori. DNA Res. 2004, 11 (1): 27-35. 10.1093/dnares/11.1.27.View ArticlePubMedGoogle Scholar
- Xia Q, Zhou Z, Lu C, Cheng D, Dai F, Li B, Zhao P, Zha X, Cheng T, Chai C, Pan G, Xu J, Liu C, Lin Y, Qian J, Hou Y, Wu Z, Li G, Pan M, Li C, Shen Y, Lan X, Yuan L, Li T, Xu H, Yang G, Wan Y, Zhu Y, Yu M, Shen W: A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science. 2004, 306 (5703): 1937-1940.22.View ArticlePubMedGoogle Scholar
- Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE: Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001, 11 (6): 1005-1017. 10.1101/gr.GR-1871R.PubMed CentralView ArticlePubMedGoogle Scholar
- Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, Song J, Schnabel RD, Ventura M, Taylor JF, Garcia JF, Tassell CP, Sonstegard TS, Eichler EE, Liu GE: Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res. 2012, 22 (4): 778-790. 10.1101/gr.133967.111.PubMed CentralView ArticlePubMedGoogle Scholar
- Sakudoh T, Nakashima T, Kuroki Y, Fujiyama A, Kohara Y, Honda N, Fujimoto H, Shimada T, Nakagaki M, Banno Y, Tsuchida K: Diversity in copy number and structure of a silkworm morphogenetic gene as a result of domestication. Genetics. 2011, 187 (3): 965-976. 10.1534/genetics.110.124982.PubMed CentralView ArticlePubMedGoogle Scholar
- D’Haene B, Vandesompele J, Hellemans J: Accurate and objective copy number profiling using real-time quantitative PCR. Methods. 2010, 50 (4): 262-270. 10.1016/j.ymeth.2009.12.007.View ArticlePubMedGoogle Scholar
- Fiston-Lavier AS, Anxolabehere D, Quesneville H: A model of segmental duplication formation in Drosophila melanogaster. Genome Res. 2007, 17 (10): 1458-1470. 10.1101/gr.6208307.PubMed CentralView ArticlePubMedGoogle Scholar
- She X, Jiang Z, Clark RA, Liu G, Cheng Z, Tuzun E, Church DM, Sutton G, Halpern AL, Eichler EE: Shotgun sequence assembly and recent segmental duplications within the human genome. Nature. 2004, 431 (7011): 927-930. 10.1038/nature03062.View ArticlePubMedGoogle Scholar
- Liu GE, Ventura M, Cellamare A, Chen L, Cheng Z, Zhu B, Li C, Song J, Eichler EE: Analysis of recent segmental duplications in the bovine genome. BMC Genomics. 2009, 10: 571-10.1186/1471-2164-10-571.PubMed CentralView ArticlePubMedGoogle Scholar
- Tuzun E, Bailey JA, Eichler EE: Recent segmental duplications in the working draft assembly of the brown Norway rat. Genome Res. 2004, 14 (4): 493-506. 10.1101/gr.1907504.
- Ai J, Zhu Y, Duan J, Yu Q, Zhang G, Wan F, Xiang ZH: Genome-wide analysis of cytochrome P450 monooxygenase genes in the silkworm, Bombyx mori. Gene. 2011, 480 (1–2): 42-50.
- Sun W, Shen YH, Yang WJ, Cao YF, Xiang ZH, Zhang Z: Expansion of the silkworm GMC oxidoreductase genes is associated with immunity. Insect Biochem Mol Biol. 2012, 42 (12): 935-945. 10.1016/j.ibmb.2012.09.006.
- Zhang Y, Dong Z, Liu S, Yang Q, Zhao P, Xia Q: Identification of novel members reveals the structural and functional divergence of lepidopteran-specific Lipoprotein_11 family. Funct Integr Genomics. 2012, 12 (4): 705-715. 10.1007/s10142-012-0281-4.
- Vergara IA, Mah AK, Huang JC, Tarailo-Graovac M, Johnsen RC, Baillie DL, Chen N: Polymorphic segmental duplication in the nematode Caenorhabditis elegans. BMC Genomics. 2009, 10: 329. 10.1186/1471-2164-10-329.
- Yu QY, Lu C, Li WL, Xiang ZH, Zhang Z: Annotation and expression of carboxylesterases in the silkworm, Bombyx mori. BMC Genomics. 2009, 10: 533. 10.1186/1471-2164-10-533.
- Eichler EE: Segmental duplications: what’s missing, misassigned, and misassembled–and should we care? Genome Res. 2001, 11 (5): 653-656. 10.1101/gr.188901.
- The International Silkworm Genome Consortium: The genome of a lepidopteran model insect, the silkworm Bombyx mori. Insect Biochem Mol Biol. 2008, 38 (12): 1036-1045. 10.1016/j.ibmb.2008.11.004.
- Green P: Against a whole-genome shotgun. Genome Res. 1997, 7 (5): 410-417.
- Eichler EE: Masquerading repeats: paralogous pitfalls of the human genome. Genome Res. 1998, 8 (8): 758-762.
- Eichler EE: Repetitive conundrums of centromere structure and function. Hum Mol Genet. 1999, 8 (2): 151-155. 10.1093/hmg/8.2.151.
- The BAC Resource Consortium: Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001, 409 (6822): 953-958. 10.1038/35057192.
- Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES: ARACHNE: a whole-genome shotgun assembler. Genome Res. 2002, 12 (1): 177-189. 10.1101/gr.208902.
- Bork P, Copley R: The draft sequences: filling in the gaps. Nature. 2001, 409 (6822): 818-820. 10.1038/35057274.
- Osanai-Futahashi M, Suetsugu Y, Mita K, Fujiwara H: Genome-wide screening and characterization of transposable elements and their distribution analysis in the silkworm, Bombyx mori. Insect Biochem Mol Biol. 2008, 38 (12): 1046-1057. 10.1016/j.ibmb.2008.05.012.
- Gregory TR: Synergy between sequence and size in large-scale genomics. Nat Rev Genet. 2005, 6: 699-708. 10.1038/nrg1674.
- Bulet P, Hetru C, Dimarcq JL, Hoffmann D: Antimicrobial peptides in insects; structure and function. Dev Comp Immunol. 1999, 23 (4–5): 329-344.
- Zhao P, Wang GH, Dong ZM, Duan J, Xu PZ, Cheng TC, Xiang ZH, Xia QY: Genome-wide identification and expression analysis of serine proteases and homologs in the silkworm Bombyx mori. BMC Genomics. 2010, 11: 405. 10.1186/1471-2164-11-405.
- Asano N, Yamashita T, Yasuda K, Ikeda K, Kizu H, Kameda Y, Kato A, Nash RJ, Lee HS, Ryu KS: Polyhydroxylated alkaloids isolated from mulberry trees (Morus alba L.) and silkworms (Bombyx mori L.). J Agric Food Chem. 2001, 49 (9): 4208-4213. 10.1021/jf010567e.
- Johnson ME, Viggiano L, Bailey JA, Abdul-Rauf M, Goodwin G, Rocchi M, Eichler EE: Positive selection of a gene family during the emergence of humans and African apes. Nature. 2001, 413 (6855): 514-519. 10.1038/35097067.
- Paulding CA, Ruvolo M, Haber DA: The Tre2 (USP6) oncogene is a hominoid-specific gene. Proc Natl Acad Sci U S A. 2003, 100 (5): 2507-2511. 10.1073/pnas.0437015100.
- Nei M, Rooney AP: Concerted and birth-and-death evolution of multigene families. Annu Rev Genet. 2005, 39: 121-152. 10.1146/annurev.genet.39.073003.112240.
- Thomas JH: Rapid birth-death evolution specific to xenobiotic cytochrome P450 genes in vertebrates. PLoS Genet. 2007, 3 (5): e67. 10.1371/journal.pgen.0030067.
- Komoto N, Quan GX, Sezutsu H, Tamura T: A single-base deletion in an ABC transporter gene causes white eyes, white eggs, and translucent larval skin in the silkworm w-3(oe) mutant. Insect Biochem Mol Biol. 2009, 39 (2): 152-156. 10.1016/j.ibmb.2008.10.003.
- The International Silkworm Genome Consortium: The genome of a lepidopteran model insect, the silkworm Bombyx mori. Insect Biochem Mol Biol. 2008, 38 (12): 1036-1045. 10.1016/j.ibmb.2008.11.004.
- Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, Wang J, Li S, Li R, Bolund L: WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 2006, 34 (Web Server issue): W293-W297.
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer ELL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2010, 38 (Database issue): D211-D222.
- Zhang X, Cheng T, Wang G, Yan Y, Xia Q: Cloning and evolutionary analysis of tobacco MAPK gene family. Mol Biol Rep. 2012, 40 (2): 1407-1415.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.