  • Research article
  • Open access

Hyper-expansion of large DNA segments in the genome of kuruma shrimp, Marsupenaeus japonicus



Higher crustaceans (class Malacostraca) represent the most species-rich and morphologically diverse group of non-insect arthropods, and many of its members are commercially important. Although crustacean DNA sequence information is growing exponentially, little is known about the genome organization of Malacostraca. Here, we constructed a bacterial artificial chromosome (BAC) library and performed BAC-end sequencing to obtain genomic information on kuruma shrimp (Marsupenaeus japonicus), one of the most widely cultured crustacean species, and found a highly redundant sequence in the BAC library. We then examined a BAC clone containing this redundant sequence to determine its length, copy number and location in the kuruma shrimp genome.


The Mj024A04 BAC clone, which includes one such redundant sequence, contained 27 putative genes and appeared to have a normal genomic DNA structure. Notably, three of the putative genes encode proteins homologous to inhibitor of apoptosis protein (IAP), and seven encode proteins homologous to those of white spot syndrome virus (WSSV), a virulent pathogen of crustaceans. Colony hybridization and PCR analysis of 381 BAC clones showed that more than half of them (52.5%) carry DNA segments homologous to the representative BAC clone Mj024A04. The Mj024A04 partial sequence was detected multiple times in the kuruma shrimp nuclear genome, with a calculated copy number of at least 100. Microsatellite-based BAC genotyping indicated that Mj024A04-homologous sequences were cloned from at least 48 different chromosomal loci. The absence of micro-syntenic relationships with the available genome sequences of Daphnia and Drosophila suggests that these fragments are unique to kuruma shrimp among the arthropod genomes sequenced to date.


Our results demonstrate that hyper-expansion of large DNA segments has taken place in the kuruma shrimp genome. Although we analyzed only part of the duplicated DNA segments, our results suggest that the shrimp genome will be difficult to analyze with standard methodology. Repetitive sequences such as segmental duplications therefore need to be identified and avoided when studying other unique structures of the shrimp genome.


Crustacean genomes are extremely diverse in size: C-values range from 0.14 pg to 64.62 pg, a difference of about 460-fold [1, 2]. Despite the economic importance of crustaceans and their huge biomass production, little is known about their genome organization, especially in Malacostraca (which includes shrimps and crabs), beyond the presence of numerous repetitive sequences [3–5]. Although the genome sequence of one crustacean, the water flea Daphnia pulex, has recently been determined, it cannot be taken as representative of crustacean genomes: recent phylogenetic analyses based on DNA sequence data and morphological comparisons between Hexapoda (including insects) and Crustacea unexpectedly showed that Branchiopoda (including Daphnia) is phylogenetically much closer to Hexapoda than to Malacostraca [6–8]. The organization of crustacean genomes, and in particular the genetic differences between the Branchiopoda-Hexapoda group and its sister groups, therefore remains to be elucidated.

Penaeid shrimps, which belong to the order Decapoda within Malacostraca, have been the subject of intense research. Owing to their commercial value, several studies of expressed sequence tags (ESTs) and genetic linkage maps have been published in the past few years. However, little in-depth information is available on their large genome, which is estimated to be about 70% of the human genome in size and rich in AT and AAT sequences [9–11]. As a first step toward understanding shrimp genome organization, we constructed a BAC library (named MjBL2) from kuruma shrimp (Marsupenaeus japonicus) and performed BAC-end sequencing. The results showed extreme redundancy of certain sequences among many BAC clones in the MjBL2 library. We chose one BAC clone (Mj024A04) for detailed analysis of its entire sequence and its redundancy in the shrimp genome, and found numerous copies of DNA segments containing the Mj024A04-sequence. This suggests that hyper-expansion of these peculiar DNA segments occurred through segmental duplication events during the evolution of the kuruma shrimp genome.


BAC library construction and BAC-end sequencing

To obtain an overview of the composition and organization of the kuruma shrimp nuclear genome, we constructed a BAC library (MjBL2) from genomic DNA prepared from the hemocytes of 13 shrimps and analyzed BAC-end sequences (BESs). MjBL2 consists of 49,152 BAC clones arrayed in 128 384-well microtiter plates and stored at -80°C. The average insert size was estimated to be 135 kb by NotI digestion of 205 randomly selected BAC clones. BES analysis was then performed on 192 BAC clones randomly selected from MjBL2, and the retrieved reads were assembled into contigs [DDBJ: AG993477-AG993734]. The resulting BESs were classified into 29 singletons and 51 contigs consisting of 2 to 24 reads each. Notably, BLASTN and BLASTX analyses revealed that many of these BESs (20 reads by BLASTN, 55 reads by BLASTX) contained a sequence encoding a protein similar to the inhibitor of apoptosis protein (IAP) reported in black tiger shrimp (Penaeus monodon) (see Additional file 1).
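MjBL2 is an arrayed library, and 128 plates of 384 wells account for exactly the 49,152 clones reported. Clone names such as Mj024A04, together with the plate numbers used later (plates 001 to 008), suggest that each name encodes a plate, row and column. The sketch below assumes, hypothetically, a scheme of `Mj` + three-digit plate + row letter (A to P) + two-digit column (01 to 24); the paper does not state this convention, and `parse_clone_id` and `WellAddress` are illustrative names.

```python
import re
from dataclasses import dataclass

# Hypothetical naming scheme (not stated in the paper): "Mj" + plate + row + column
CLONE_ID = re.compile(r"^Mj(\d{3})([A-P])(\d{2})$")
N_PLATES, N_ROWS, N_COLS = 128, 16, 24  # 384-well plates: rows A-P, columns 1-24


@dataclass(frozen=True)
class WellAddress:
    plate: int
    row: str
    column: int

    def linear_index(self) -> int:
        """0-based position of the clone across the whole arrayed library."""
        row_idx = ord(self.row) - ord("A")
        return (self.plate - 1) * N_ROWS * N_COLS + row_idx * N_COLS + (self.column - 1)


def parse_clone_id(name: str) -> WellAddress:
    """Split a clone name into plate, row and column under the assumed scheme."""
    m = CLONE_ID.match(name)
    if m is None:
        raise ValueError(f"unrecognized clone name: {name!r}")
    plate, row, column = int(m.group(1)), m.group(2), int(m.group(3))
    if not (1 <= plate <= N_PLATES and 1 <= column <= N_COLS):
        raise ValueError(f"address out of range: {name!r}")
    return WellAddress(plate, row, column)


# 128 plates x 384 wells = 49,152 clones, the library size reported for MjBL2
library_size = N_PLATES * N_ROWS * N_COLS
```

For example, `parse_clone_id("Mj024A04")` yields plate 24, row A, column 4 under this assumed scheme.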

DNA sequence of a representative BAC clone

One of the BAC clones possessing a sequence similar to the black tiger shrimp IAP gene (Mj024A04) was randomly selected from MjBL2, and its entire DNA sequence was determined by shotgun sequencing. In silico annotation of the resulting 120-kb genomic sequence (Mj024A04-sequence) revealed 27 putative genes with an apparently normal exon-intron structure (Figure 1, see Additional file 2 and Additional file 3) [DDBJ: AP010878]. As shown in Figure 1, large GGTTA repeats in the middle of the sequence flank gene 09, which encodes a protein similar to a reverse transcriptase of Takifugu rubripes [12]. Notably, of the other 26 genes, three (genes 01, 06 and 24) encode proteins homologous to the IAPs of Xenopus laevis (African clawed frog), Drosophila melanogaster (fruit fly) and Rattus norvegicus (Norway rat), and seven (genes 11, 13, 14, 15, 16, 17 and 18) are homologous to ORFs of white spot syndrome virus (WSSV), the major pathogenic dsDNA virus of shrimp, which is highly virulent to penaeid shrimps as well as to other crustaceans such as crabs and crayfish [13, 14].
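The text does not describe how the GGTTA repeat arrays were delimited. As an illustration only, the sketch below scans a sequence for runs of perfect tandem copies of a repeat unit. The minimum copy number (three) is an arbitrary assumption, and degenerate repeats would need an approximate tool (for example Tandem Repeats Finder) rather than an exact regular expression.

```python
import re


def find_tandem_repeats(seq: str, unit: str = "GGTTA", min_copies: int = 3):
    """Return (start, end, copies) for runs of >= min_copies perfect tandem copies of `unit`.

    Positions are 0-based and end-exclusive. Only the given strand is searched; scan the
    reverse complement with the reverse-complemented unit (TAACC for GGTTA) for the other.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")  # e.g. (?:GGTTA){3,}
    seq = seq.upper()
    return [(m.start(), m.end(), (m.end() - m.start()) // len(unit))
            for m in pattern.finditer(seq)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
```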

Figure 1

Schematic organization of putative genes on the kuruma shrimp BAC clone Mj024A04. Twenty-seven putative genes (boxes) and intergenic regions (lines) are shown with their transcriptional orientation (arrows). Putative gene 09 is flanked by large GGTTA repeats (double lines).

Redundancy of Mj024A04-sequence homologues in the kuruma shrimp BAC library

To determine how much of the Mj024A04-sequence is redundant in the MjBL2 BAC library, we performed colony hybridization using three probes corresponding to the 5'-end (F), middle (M) and 3'-end (R) of the Mj024A04-sequence (primers used for probe amplification are listed in Additional file 4). Surprisingly, 200 of the 381 BAC clones tested (52.5%) were positive for at least one of the three probes (results are shown schematically in Additional file 5), indicating that the relevant DNA fragments are highly redundant in the MjBL2 library. The 200 positive clones were sorted into six groups according to their hybridization pattern (F, F+M, M, F+M+R, M+R and R) (Additional file 5). We then tested by PCR for the presence of 17 of the 27 putative genes (primers in Additional file 4) in 21 BAC clones (three clones each from the six hybridization groups and from the probe-negative group) that were confirmed to be independent by DNA fingerprinting with HindIII and EcoRI (see Additional file 6). Ten of the 17 genes (genes 01, 02, 03, 06, 08, 09, 22, 24, 25 and 27) were selected because they match other genes in the databases with E-values below 1e-10, and the other seven (genes 11 and 13 to 18) because of their homology to WSSV genes. As shown in Figure 2, two genes (01 and 02) were detected in group F; seven (01 to 03, 06, 08, 11 and 13) in group F+M; seven (03, 06, 08, 09, 11, 13 and 14) in group M; all genes except gene 09 in group F+M+R; twelve (09, 11, 13 to 18, 22, 24, 25 and 27) in group M+R; and nine (14 to 18, 22, 24, 25 and 27) in group R. As expected, some genes were not detected because the sizes of the kuruma shrimp DNA inserts in the selected BAC clones varied. Nevertheless, in group F+M+R, whose clones were expected to contain all the genes, every gene except gene 09, which was identified as a retrotransposon, was indeed detected (Figure 2).
The sporadic detection of the retrotransposon (gene 09) among the BAC clones can be explained by the known tendency of retrotransposons to integrate at random sites [15, 16]. This random distribution suggests that the primordial DNA fragment lacked this gene.
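The grouping of clones by hybridization pattern is a simple lookup from the three probe signals to a label, and the positive rate is 200/381 = 52.5%. A minimal sketch, assuming each clone is recorded as an (F, M, R) tuple of booleans; the function names are hypothetical, and the F+R case is flagged because it does not occur among the reported groups.

```python
from collections import Counter

# Labels for the six hybridization patterns reported in the text, keyed by (F, M, R)
GROUPS = {
    (True, False, False): "F",
    (True, True, False): "F+M",
    (False, True, False): "M",
    (True, True, True): "F+M+R",
    (False, True, True): "M+R",
    (False, False, True): "R",
}


def classify(f: bool, m: bool, r: bool) -> str:
    """Map a clone's F/M/R colony-hybridization signals to a group label."""
    if not (f or m or r):
        return "negative"
    # Assumption: an insert overlapping both ends of one contiguous copy should also
    # cover the middle, so F+R without M is flagged rather than silently grouped.
    return GROUPS.get((f, m, r), "F+R (unexpected)")


def summarize(signals):
    """signals: iterable of (f, m, r) tuples, one per BAC clone.

    Returns (counts per group, number of probe-positive clones, positive fraction).
    """
    counts = Counter(classify(*s) for s in signals)
    n_total = sum(counts.values())
    n_pos = n_total - counts.get("negative", 0)
    return counts, n_pos, n_pos / n_total
```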

Figure 2

Amplification of known putative genes using BAC clone samples. F, M and R probes were designed at the 5'-end, middle and 3'-end portions of the Mj024A04-sequence as described in the text. The putative genes (left column) were detected in BAC clones showing different hybridization patterns with the F, M and R probes (top row). Three BAC clones from each hybridization group were tested. Reactions with three BAC clones that showed no signal (negative control: neg), with Mj024A04 (positive control: pos) and without template (-) are also included.

Detection of Mj024A04-sequence and its copy number in the kuruma shrimp genome

We next used Southern blot hybridization with several restriction enzymes to detect multiple copies of the Mj024A04-sequence in the kuruma shrimp genome. Multiple bands were observed for each of four putative genes (genes 01, 09, 16 and 27) (Figure 3), confirming that these genes are present in multiple copies. In addition, we performed fluorescence in situ hybridization (FISH) using the labeled Mj024A04 BAC clone as a probe. FISH images showed numerous fluorescent spots in the nuclei of adult shrimp testis cells (Figure 4). To quantify this duplication, we estimated the copy numbers of genes 01, 09, 16 and 27 by quantitative PCR using genomic DNA prepared from seven organs (brain, hemocytes, heart, testis, muscle, swimleg and intestine) and three larvae. Except for gene 09 (the retrotransposon), these putative genes were present at 100 times the copy number of the putative single-copy gene transglutaminase (TGase), indicating multiple copies of the Mj024A04-sequence (Figure 5). Taken together, these results suggest that the large DNA fragment represented by Mj024A04 occurs many times in the genome.

Figure 3

Southern blot hybridization of kuruma shrimp genomic DNA. The putative genes used for probe synthesis (gene 01, Birc-2 Prov protein; 09, reverse transcriptase; 16, WSSV-like; 27, Semaphorin-1A) are indicated at the bottom. Genomic DNA was digested with the restriction enzymes or enzyme combinations indicated at the top.

Figure 4

FISH analysis of the Mj024A04-sequence in adult kuruma shrimp testis cells. Multiple fluorescent signals of Alexa Fluor 594-labeled Mj024A04 appear as red spots in nuclei counterstained with Hoechst 33258 (blue).

Figure 5

Copy numbers of four putative genes in the kuruma shrimp genome. Putative genes 01 (Birc-2 Prov protein), 09 (reverse transcriptase), 16 (WSSV-like) and 27 (Semaphorin-1A) were used to calculate copy numbers in different kuruma shrimp tissues by quantitative PCR. Data are copy numbers of each gene relative to TGase, shown as means ± standard deviation (bars) of three experiments.

BAC genotyping and PCR detection of putative genes in Mj024A04-sequence

To exclude possible cloning bias, we performed BAC genotyping using three microsatellite polymorphisms. Among 342 BAC clones positive for the F, M or R probes, screened from MjBL2 plates 001 to 008 (representing 0.2-fold coverage of the shrimp genome), 299 different genotypes were detected (see Additional file 7). PCR-based detection of the putative genes in eight independent BAC clones with different genotypes showed the presence of almost all of the 17 putative genes (Figure 6). Assuming that every shrimp is heterozygous at each locus, two genotypes (one from each homologous chromosome) should be detected per locus in each individual. Because MjBL2 was constructed from 13 individuals, at most 26 different haplotypes were expected for any one chromosomal locus. The 299 different genotypes detected therefore indicate that at least 11 different chromosomal loci contain duplications of the entire Mj024A04-sequence. Because only 342 BAC clones from plates 001 to 008 were genotyped, we also used a probability-based estimation method to predict how many genotypes would be detected if many more BAC clones were screened and genotyped, assuming that each genotype is present only once among the 86 diploid chromosomes of the kuruma shrimp [9]. The estimate was 1,240 genotypes (95% confidence interval 960 to 1,658), suggesting that at least 48 different chromosomal loci in each haploid genome may carry the sequence (see Additional file 8).

Figure 6

Amplification of known putative genes using randomly selected BAC clones with different genotypes. All BAC clones used for BAC genotyping were selected on the basis of their hybridization patterns with the F, M and R probes, which correspond to the 5'-end (F), middle (M) and 3'-end (R) of the Mj024A04-sequence, as described in the text and Methods. Genotypes were classified on the basis of three microsatellite repeats as described in the text. The putative genes (top row) were detected in BAC clones with different genotypes (left column). Reactions without template (Nt) and without primers (Np) are included as negative controls, and reactions with Mj024A04 as template are included as a positive control.

The kuruma shrimp Mj024A04-sequence is unique among arthropod genomes

We performed micro-synteny comparisons of the Mj024A04-sequence with the genome sequences of two other arthropods, Drosophila melanogaster (version FB2008_02) [17] and Daphnia pulex (release 1, 2007/07/07) [18], using the TBLASTN algorithm. We found no micro-syntenic relationship between Mj024A04 and the Drosophila or Daphnia genomes (data not shown), suggesting that the Mj024A04-sequence is unique among known arthropod genomes.
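The paper reports only that no micro-synteny was found and does not give the criterion used. One common way to formalize such a check, assumed here for illustration, is to take the best TBLASTN hit of each Mj024A04 gene in the target genome and ask whether two or more different genes land close together on the same scaffold. The 100 kb window and two-gene minimum are hypothetical parameters, parsing of TBLASTN output is left out, and gene order and orientation are not checked.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query_gene: str  # putative gene on the shrimp BAC, e.g. "gene01"
    scaffold: str    # target chromosome or scaffold carrying the best TBLASTN hit
    position: int    # hit start coordinate on that scaffold (bp)


def microsyntenic_blocks(hits, max_gap=100_000, min_genes=2):
    """Report clusters of best hits on one scaffold that contain >= min_genes distinct
    query genes, where consecutive hits lie within max_gap bp (hypothetical criterion)."""
    by_scaffold = defaultdict(list)
    for h in hits:
        by_scaffold[h.scaffold].append(h)

    blocks = []
    for scaffold, hs in by_scaffold.items():
        hs.sort(key=lambda h: h.position)
        cluster = [hs[0]]
        for h in hs[1:]:
            if h.position - cluster[-1].position <= max_gap:
                cluster.append(h)  # still within the window: extend the cluster
                continue
            if len({c.query_gene for c in cluster}) >= min_genes:
                blocks.append((scaffold, [c.query_gene for c in cluster]))
            cluster = [h]  # start a new cluster at this hit
        if len({c.query_gene for c in cluster}) >= min_genes:
            blocks.append((scaffold, [c.query_gene for c in cluster]))
    return blocks
```

An empty result for every target genome would correspond to the "no micro-synteny" outcome described above.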


BAC library construction and BAC-end sequencing for a first characterization of the kuruma shrimp genome

The nuclear DNA content of kuruma shrimp has been reported to be 2.83 pg, indicating that its genome size is almost the same as those of other penaeid shrimps such as Litopenaeus vannamei and Penaeus monodon, whose genomes are reported to be approximately 2,000 Mbp [19]. In this study, we first constructed a BAC library from kuruma shrimp. The average insert size of MjBL2 BAC clones was estimated to be 135 kb, giving a total MjBL2 insert size of approximately 6,600 Mbp, so that MjBL2 represents 3.3-fold coverage of the kuruma shrimp genome. Although MjBL2 is not suitable for physical mapping and genome sequencing because it was constructed from 13 shrimps, it is useful as a first step in characterizing the kuruma shrimp genome. We performed BES analysis to obtain a first glimpse of the sequence composition of the unsequenced kuruma shrimp genome. The results were very surprising: with only 192 clones analyzed, we detected 51 contigs, each containing 2 to 24 reads. This suggests that, following the typical BAC construction method [20], we obtained multiple copies of the same DNA fragments from the kuruma shrimp genome. However, the putative genes annotated by BLAST, such as the homologue of the black tiger shrimp IAP gene, do not appear to be genes with intrinsic duplication activity like transposable elements. To investigate this abnormality of the kuruma shrimp genome, we analyzed these DNA segments further.
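The coverage arithmetic can be made explicit. Assuming a 2,000 Mbp haploid genome (the penaeid estimate cited above) and that every well holds an insert of the mean size, 49,152 clones × 135 kb ≈ 6,636 Mbp, or about 3.3 genome equivalents; the same calculation for plates 001 to 008 (3,072 clones) gives the 0.2-fold coverage mentioned in the genotyping results. The second function applies the Clarke-Carbon formula, a standard approximation that the paper itself does not use, for the probability that a given locus is present in the library.

```python
def library_coverage(n_clones: int, mean_insert_kb: float, genome_mbp: float) -> float:
    """Genome equivalents = total cloned DNA (Mbp) / haploid genome size (Mbp)."""
    return n_clones * mean_insert_kb / 1000.0 / genome_mbp


def prob_locus_represented(n_clones: int, mean_insert_kb: float, genome_mbp: float) -> float:
    """Clarke-Carbon estimate P = 1 - (1 - I/G)^n that a given locus is in the library."""
    frac = mean_insert_kb / 1000.0 / genome_mbp  # I/G: fraction of the genome per clone
    return 1.0 - (1.0 - frac) ** n_clones


# Assumed inputs: 384 filled wells per plate, 135 kb mean insert, 2,000 Mbp genome
full = library_coverage(128 * 384, 135, 2000)         # whole MjBL2, about 3.3-fold
plates_1_to_8 = library_coverage(8 * 384, 135, 2000)  # plates 001-008, about 0.2-fold
p_full = prob_locus_represented(128 * 384, 135, 2000)  # roughly 0.96
```

Because MjBL2 pools DNA from 13 individuals, "coverage" here means pooled genome equivalents rather than coverage of any one animal's genome.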

Gene contents of a representative BAC clone Mj024A04

The Mj024A04 BAC clone, randomly selected from the BAC clones possessing black tiger shrimp IAP sequences, was fully sequenced, and 27 genes were predicted in silico. Of these, three genes are homologous to IAP. Apoptosis is a genetically programmed pathway of controlled cell suicide with critical roles in processes such as development, tissue homeostasis, DNA damage responses and pathology [21]. IAPs block apoptosis by inhibiting the proteolytic activity of caspases, the central components of the apoptotic machinery, through direct binding via the baculoviral IAP repeat (BIR) domains present in IAPs [22]. Cellular homologues, called BIR-domain-containing proteins (BIRPs), are characterized by a variable number of BIR domains and have been identified in yeasts, nematodes, flies and higher vertebrates [21–23]. In Drosophila, four IAP homologues (Thread or IAP1, IAP2, Bruce and Deterin or CG12265) have been found [24]. We analyzed the phylogenetic relationships of the BIR domains of putative genes 01, 06 and 24 with those of several other organisms (see Additional file 9). The BIR domains of putative gene 24 clustered with those of the black tiger shrimp IAP, suggesting that putative gene 24 may have the same function as black tiger shrimp IAP [25]. Of particular interest, putative gene 06 contains five BIR domains; this is the first report of a BIRP containing more than three BIR domains. BIR domains play important roles in protein-protein interactions, and the presence of multiple BIR domains in a single protein increases the affinity of BIRPs for a target protein. In addition, the range of target molecules with which BIRPs can interact increases with the number of BIR domains [22]. Putative gene 06, with five BIR domains, may therefore be a novel BIRP with a distinct function.

Furthermore, of the other 24 genes, seven are homologous to ORFs of WSSV. Certain mammalian dsDNA viruses, such as herpesviruses and poxviruses, are known to mimic the structure and function of host genes to evade detection and destruction by the host immune system [26]. Similarly, potential horizontal gene transfers have been found in baculoviruses, infectious pathogens of insects [27, 28], and such viral genomes can therefore be regarded as repositories of important information about host immune processes [29]. The presence of multiple WSSV-like genes in the kuruma shrimp genome suggests that similar mimicry mechanisms or horizontal gene transfers may also occur in this virus group. Moreover, because homologous proteins are absent from current databases, this information will provide a good starting point for understanding unknown WSSV-host interactions.

The first identification of multiple duplications of large DNA segments in the shrimp genome

Because high-resolution whole-genome sequences are not yet available for any malacostracan or decapod species, we cannot yet determine whether the multiple copies of the peculiar large DNA segments (Mj024A04-sequence) found in kuruma shrimp are also present in other species. However, micro-synteny analysis revealed that the Mj024A04-sequence is not found in two other arthropod genomes, Drosophila and Daphnia, suggesting that the large DNA fragments were duplicated after the establishment of Malacostraca within Crustacea.

It is also unclear whether the redundancy results from polyploidization or from segmental duplication. Previous studies revealed a wide range of chromosome numbers and variation in genomic DNA content among decapod species, suggesting the possibility of polyploidization. However, reassociation kinetics of genomic DNA and electrophoretic analysis of enzyme polymorphisms have indicated that polyploidization is a rare event [30, 31]. We therefore assume that the highly redundant large DNA segments in kuruma shrimp arose from segmental duplication events.

Segmental duplications (SDs) are duplicated blocks of genomic DNA, typically 1 kb to 200 kb in size [32]. SDs consist of apparently normal genomic DNA containing high-copy repeats and gene sequences with intron-exon architecture, so they are difficult to detect a priori without a well-assembled genome [32]. The human genome is the best studied with respect to SDs. The human reference genome contains an abundance of large DNA segments with copy numbers ranging from 2 to 18, representing ≥5% of the genome, which have accumulated over 40 million years of evolution [33]. These duplications are enriched up to 10-fold within the pericentromeric and subtelomeric regions of human chromosomes [32].

SDs have also been reported in Drosophila melanogaster [34]. In the fly, SDs account for ~1.4% of the genome (1.66 Mbp/118.35 Mbp) and range from 346 bp to 81.1 kb in length. Compared with the human genome, the Drosophila genome appears to be significantly poorer in large (>10 kb) duplicated blocks (only 7.21%). Chromosome 4, which appears to be enriched in heterochromatic domains, and the pericentromeric regions of chromosomes X, 2 and 3 also have a high SD density in Drosophila.

In the human genome, subtelomeres are reported to be notably rich in degenerate telomeric repeats relative to adjacent single-copy sequences and other genomic regions (~10- and ~100-fold, respectively) [33]. We counted kuruma shrimp BAC clones harboring GGTTA repeats by colony hybridization [35]. The proportion of GGTTA-positive clones was about 2.7 times higher among BAC clones positive for the F, M or R probes than among all BAC clones tested (45.4% and 17.1%, respectively), suggesting that the Mj024A04-sequence and its duplicates are located predominantly in subtelomeric and perhaps pericentromeric regions.

Absence of transcripts of Mj024A04 putative genes in several adult shrimp tissues

We attempted to detect RNA transcripts of some of the putative genes in several kuruma shrimp tissues, but expression was very weak despite their high copy number. Together with their subtelomeric localization, this suggests that the low expression might be caused by epigenetic control mechanisms such as CpG methylation, histone hypoacetylation and histone methylation. We attempted to detect CpG methylation in the Mj024A04 segments by genomic Southern blot analysis with the CpG methylation-insensitive restriction enzyme MspI and its methylation-sensitive isoschizomer HpaII, but detected no CpG methylation, indicating that transcription of the Mj024A04 genes is strongly suppressed by other factors (see Additional file 10).
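The logic of the MspI/HpaII comparison can be sketched as follows: both enzymes recognize CCGG, but HpaII is blocked when the internal cytosine is CpG-methylated whereas MspI is not, so identical band patterns from the two digests indicate no detectable CpG methylation at those sites. This is a simplified model: methylated sites are supplied by hand, and it ignores both the sensitivity of MspI to methylation of the outer cytosine and CpGs outside CCGG sites. The function names are illustrative.

```python
def ccgg_sites(seq: str) -> list[int]:
    """0-based start positions of CCGG, the site recognized by both MspI and HpaII."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def ccgg_digest(seq: str, enzyme: str, methylated_sites=frozenset()) -> list[int]:
    """Fragment lengths of a linear sequence cut at C^CGG.

    Simplified model: HpaII skips sites whose internal CpG cytosine is methylated
    (given as site start positions in `methylated_sites`); MspI cuts every CCGG site.
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError(f"unsupported enzyme: {enzyme}")
    cuts = [s + 1 for s in ccgg_sites(seq)
            if not (enzyme == "HpaII" and s in methylated_sites)]
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]
```

For an unmethylated sequence `ccgg_digest(seq, "HpaII")` equals `ccgg_digest(seq, "MspI")`; a methylated site merges two HpaII fragments into one larger band.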


Genome rearrangements are common in eukaryotes and facilitate not only species diversification but also genetic variation within species. Studies based on whole-genome sequences of primates suggest that a significant proportion of lineage-specific duplications result in differences in gene expression patterns and have mechanistic consequences for chromosome structure [36]. Furthermore, a study on Plasmodium falciparum, a causative agent of severe human malaria, revealed that eight SDs located on seven different chromosomes show copy number polymorphism among strains, and that the expression levels of genes within these SDs are partly correlated with gene copy number [37]. These studies suggest that SDs are widely distributed and play significant roles in generating biological differences among closely related species. The biological significance of SDs in kuruma shrimp (Marsupenaeus japonicus) remains obscure because no complete genome sequence is yet available for any decapod species. Nonetheless, it will be interesting to learn how SDs and the numerous putative genes they carry, such as the WSSV homologues, function in this species. Furthermore, such hyper-expansion of DNA segments should be taken seriously into account in whole-genome sequencing and in the effective construction of genetic linkage maps for this economically important species.


BAC library construction and sequencing of BAC ends

The kuruma shrimp BAC library (MjBL2) was constructed as described previously [20], with minor modifications. Briefly, hemocytes from 13 kuruma shrimps were embedded in 1% low-melting-point agarose plugs and digested with proteinase K. The high-molecular-weight DNA was partially digested with HindIII and size-fractionated by electrophoresis on a CHEF DR-II apparatus (BioRad). Genomic DNA larger than 150 kb was extracted with NaI and GELase (EPICENTRE), ligated into the pBAC-lac vector and used to transform E. coli DH10B T1 phage-resistant cells (Invitrogen). A total of 49,152 BAC clones were picked and arrayed on 128 384-well microtiter plates with a Q-Pix (Genetix). High-density replica (HDR) filters were made using a Bio Grid (Bio Robotics). BAC-end sequencing was performed at the Dragon Genomics Center (Takara Bio, Shiga, Japan), and the retrieved AB1 files were clustered using Phred, Phrap and Consed [38–40]. To identify significant matches to sequences deposited in public databases, the BLASTN and BLASTX algorithms were used after masking repeat elements with RepeatMasker (version 3.2.8) [41], with cross_match as the search engine.

Shotgun sequencing, data assembly and analysis

A shotgun library was made from purified Mj024A04 BAC DNA using a shotgun library construction kit (Invitrogen). Colony PCR conditions were: initial denaturation at 95°C for 5 min; 35 cycles of denaturation at 95°C for 30 sec, annealing at 55°C for 30 sec and extension at 72°C for 2 min; and a final extension at 72°C for 5 min. M13 forward and reverse primers and rTaq DNA polymerase (Bioneer) in a total volume of 15 μl were used for colony PCR. Excess primers and dNTPs were removed with ExoSAP-IT (GE Healthcare) following the manufacturer's instructions. Sequencing reactions were performed with SP6 and T7 primers using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) following the manufacturer's instructions, and the products were electrophoresed on an ABI 3130xl Genetic Analyzer (Applied Biosystems). The retrieved AB1 files were base-called and assembled with Phred, Phrap and Consed [38–40]. Sequence gaps were closed by a combination of re-sequencing of shotgun clones and direct BAC sequencing. Putative genes were predicted with GENSCAN [42], and the amino acid sequences of the predicted genes were annotated using the BLASTP algorithm. Micro-synteny analysis was performed by applying the TBLASTN algorithm to two databases, FlyBase (version FB2008_02) [17] and wFleaBase (first release, 2007/07/07) [18].

Southern blot hybridization analysis

Kuruma shrimp genomic DNA (20 μg) was digested to completion with BamHI, EcoRI, HindIII, BamHI + EcoRI, BamHI + HindIII, EcoRI + HindIII, or BglII + DraI, and separated on a 0.7% agarose gel. After depurination in 0.25 N HCl and denaturation in 1.5 M NaOH/0.5 M NaCl, the DNA was blotted onto positively charged nylon membranes (Pall Gelman Laboratory) in 0.4 N NaOH. Hybridization was performed with probes labeled with [α-32P]dCTP using the Random Primer DNA Labeling Kit Ver. 2 (Takara) at 42°C in PerfectHyb hybridization solution (TOYOBO) for 4 h, and membranes were washed three times with 2× SSC/0.1% SDS at 50°C for 30 min. Autoradiograms were developed with a STARION FLA-9000 reader (Fujifilm).
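Because a probe detects only the restriction fragments that contain its target, a single-copy gene is expected to give one or a few bands per digest, and multiple bands point to many copies in different sequence contexts. The band sizes expected from the cloned copy itself could be predicted by an in silico digest of the Mj024A04-sequence (DDBJ AP010878). The sketch below is illustrative: it handles the single and double digests used here on a linear sequence, scans one strand only (all five recognition sites are palindromic), and ignores partial digestion and methylation sensitivity; fragments at the BAC ends are truncated relative to the genome.

```python
# Recognition sequence and cut offset on the top strand (bases after the site start)
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BglII": ("AGATCT", 1),    # A^GATCT
    "DraI": ("TTTAAA", 3),     # TTT^AAA (blunt)
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut positions of one enzyme in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    out, i = [], seq.find(site)
    while i != -1:
        out.append(i + offset)
        i = seq.find(site, i + 1)
    return out


def digest(seq: str, *enzymes: str) -> list[int]:
    """Fragment lengths after a single or combined digestion, in sequence order."""
    cuts = sorted({p for e in enzymes for p in cut_positions(seq, e)})
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]
```

For example, `digest(bac_seq, "BglII", "DraI")` would model the BglII + DraI lane, where `bac_seq` is a hypothetical variable holding the BAC sequence.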

Chromosomal localization of Mj024A04-sequence

Mj024A04 BAC DNA was fluorescently labeled as a FISH probe by nick translation using the FISH Tag DNA Multicolor Kit (Invitrogen) according to the manufacturer's instructions. Specimens were prepared from testis cells as previously reported [43]. After final heat denaturation of the labeled probe and heat denaturation and dehydration of the specimens, hybridization was performed in 2× SSC/65% formamide hybridization buffer at 37°C for 24 h. Three washes were performed at 45°C for 5 min each, in 2× SSC/50% formamide, 1× SSC and 4× SSC/0.1% Tween 20, respectively. Finally, the specimens were counterstained with Hoechst 33258 (Invitrogen) and examined under a Nikon Eclipse E600 epifluorescence microscope (Nikon). Photographs were taken with a MicroMax cooled CCD camera and IPLab software (Nippon Roper).

Copy number estimation of Mj024A04 genes

Primer pairs for quantitative PCR were designed for four predicted genes (genes 01, 09, 16 and 27) and for the putative single-copy gene transglutaminase (TGase; DQ436474) using Primer Express Software Version 3.0 (Applied Biosystems) (primers are listed in Additional file 4). Genomic DNA (0.1 ng) prepared from the brain, hemocytes, heart, testis, muscle, swimleg and intestine, and from three larvae, was used as template in a 20 μl reaction mixture containing 10 μl of SYBR Green PCR Master Mix (Applied Biosystems), 1 μl of genomic DNA template (or of plasmid containing the target sequence, as standard), 8.2 μl of deionized water and 0.4 μl each of 10 μM forward and reverse primers. PCR was performed and quantified on a 7300 Real-Time PCR System (Applied Biosystems) under the following conditions: 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 15 sec and 60°C for 1 min, with a dissociation stage of 95°C for 15 sec, 60°C for 30 sec and 95°C for 15 sec. Each template was assayed in three independent reactions. The copy number of each putative gene was estimated by absolute quantification, with Ct values of the amplified target fragments in each sample computed by the SDS program using default parameters.
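Absolute quantification rests on a standard curve: Ct is linear in log10 of the starting copy number, so a plasmid dilution series fixes the slope and intercept, and unknowns are read off the fitted line. The sketch below reproduces that logic outside the SDS software, assuming a separate curve for each target gene and for TGase, and treating the ratio of target copies to TGase copies as the relative copy number plotted in Figure 5. The Ct values in the example are synthetic, not data from this study, and the function names are illustrative.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, efficiency); efficiency 1.0 means perfect doubling per cycle.
    """
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to get the starting copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)


def relative_copy_number(ct_target, curve_target, ct_ref, curve_ref):
    """Copies of the target gene divided by copies of the single-copy reference (TGase)."""
    return (copies_from_ct(ct_target, *curve_target[:2])
            / copies_from_ct(ct_ref, *curve_ref[:2]))


# Synthetic example: one ideal curve (slope -3.32, intercept 38) used for both assays
logs = [2, 3, 4, 5, 6]
curve = fit_standard_curve(logs, [38 - 3.32 * x for x in logs])
ratio = relative_copy_number(24.72, curve, 31.36, curve)  # 10^4 vs 10^2 copies
```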

BAC genotyping using microsatellites

MjBL2 BAC clones showing positive signals against F, M and R probes, which correspond to the 5'-end (F), middle (M) and 3'-end (R) of the Mj024A04-sequence, were used for BAC genotyping with 3 microsatellite markers (MS02, MS33 and MS64; primers used for probe DNA and microsatellite repeats amplification were shown in the Additional file 4). Approximately 15 ng of BAC DNA was used as template in a 10 μ l reaction mixture containing 1 μ l of 10× Ex Taq buffer, 1 μ l of dNTP mixture (2.5 mM each), 0.1 μ l of Ex Taq (5 U/μ l) (Takara), 1 μ l of BAC DNA template, 0.1 μ l of 100 μ M forward and reverse primer and 6.7 μ l of deionized water. PCR reactions were performed by TGradient Thermocycler96 (Biometra) with following condition: first denaturation step at 95°C for 5 min, followed by 40 cycles of 95°C for 30 sec, appropriate annealing temperature for 30 sec and 72°C for 30 sec and a final extension step at 72°C for 5 min. Appropriate annealing temperature determined for each primer pair was 57°C for MS02 and 59°C for MS33 and MS64. Amplified fragments were separated and detected with ABI PRISM 3100 Genetic Analyzer and signal intensity was scored with GeneScan and Genotyper software following instruction manuals (Applied Biosystems). Based on the results obtained from BAC genotyping of 342 BAC clones, we estimated the probable number of different genotypes if more BAC clones (>342) were used for screening and genotyping. The random sampling of size N was performed with the assumption that a population having θ different genotypes was present with the same abundance. The population was supposed to be large enough so that sampling with replacement is satisfied. Then, let Y i be a random outcome in the i-th sampling as follows:

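$$Y_i = \begin{cases} 1 & \text{if the genotype obtained in the } i\text{-th sampling was not obtained in any previous sampling},\\ 0 & \text{otherwise}. \end{cases}$$

The probability distribution of the initial trial is obviously $\Pr(Y_1 = 1) = 1$. Furthermore, the conditional distribution of $Y_i$ given the previous outcomes is

$$\Pr(Y_i = 1 \mid Y_1, \ldots, Y_{i-1}) = \frac{\theta - \sum_{j=1}^{i-1} Y_j}{\theta}, \qquad \Pr(Y_i = 0 \mid Y_1, \ldots, Y_{i-1}) = \frac{\sum_{j=1}^{i-1} Y_j}{\theta}.$$

We now focus on the distribution of the observed number of different genotypes among the first n samplings,

$$K_n = \sum_{i=1}^{n} Y_i.$$

The conditional distribution of $K_n$ given $K_{n-1}$ is then

$$\Pr(K_n = k \mid K_{n-1} = k) = \frac{k}{\theta}, \qquad \Pr(K_n = k + 1 \mid K_{n-1} = k) = \frac{\theta - k}{\theta}.$$

Then, a recursive formula for the marginal distribution of $K_n$ can be derived as

$$\Pr(K_n = k) = \frac{k}{\theta}\,\Pr(K_{n-1} = k) + \frac{\theta - k + 1}{\theta}\,\Pr(K_{n-1} = k - 1), \qquad n \ge 2,$$

starting from $\Pr(K_1 = 1) = 1$. Once the outcome $K_N = k_N$ of random sampling of size N is observed, the likelihood function of θ, $L(\theta) = \Pr(K_N = k_N; \theta)$, can be obtained from this probability distribution, which is calculated with the recursive formula above. The parameter θ is then estimated by maximizing $L(\theta)$. The 100(1−α)% confidence interval is derived from the likelihood profile as

$$\left\{\theta : 2\left[\log L(\hat{\theta}) - \log L(\theta)\right] \le \chi^2(1-\alpha)\right\},$$

where $\hat{\theta}$ is the maximum likelihood estimate of θ and $\chi^2(1-\alpha)$ is the upper 100α-percent point of the χ² distribution with 1 degree of freedom.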
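The estimation procedure can be sketched in a few lines. This assumes, as the sampling model implies, that Y_i marks a genotype not seen before, so that the number of distinct genotypes after m draws satisfies Pr(K_m = k) = (k/θ)·Pr(K_{m−1} = k) + ((θ − k + 1)/θ)·Pr(K_{m−1} = k − 1). The function names, the integer grid search over θ and the upper search bound `theta_max` are illustrative choices and not the authors' code. The work is done in log space so that long sampling runs do not underflow.

```python
import math
from statistics import NormalDist

NEG_INF = float("-inf")


def _log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_likelihood(theta: int, n: int, k_obs: int) -> float:
    """log Pr(K_n = k_obs; theta) for n draws with replacement from theta
    equally abundant genotypes, computed with the recursion above."""
    if not 1 <= k_obs <= min(n, theta):
        return NEG_INF
    # logp[k] = log Pr(K_m = k); K never decreases, so k > k_obs is irrelevant.
    logp = [NEG_INF] * (k_obs + 1)
    logp[1] = 0.0  # Pr(K_1 = 1) = 1
    for m in range(2, n + 1):
        new = [NEG_INF] * (k_obs + 1)
        for k in range(1, min(m, k_obs) + 1):
            repeat = logp[k] + math.log(k / theta)                   # genotype seen before
            novel = logp[k - 1] + math.log((theta - k + 1) / theta)  # new genotype
            new[k] = _log_add(repeat, novel)
        logp = new
    return logp[k_obs]


def fit_theta(n: int, k_obs: int, theta_max: int, alpha: float = 0.05):
    """Grid-search MLE of theta on k_obs..theta_max plus the profile-likelihood
    100(1 - alpha)% interval {theta : 2[l(theta_hat) - l(theta)] <= chi2_1}."""
    if not 1 <= k_obs <= n:
        raise ValueError("need 1 <= k_obs <= n")
    ll = {t: log_likelihood(t, n, k_obs) for t in range(k_obs, theta_max + 1)}
    theta_hat = max(ll, key=ll.get)
    # Upper-alpha point of chi-square (1 df) = square of the N(0,1) 1 - alpha/2 quantile.
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    inside = [t for t, v in ll.items() if 2 * (ll[theta_hat] - v) <= crit]
    # If max(inside) == theta_max, the upper bound is cut off by the grid.
    return theta_hat, (min(inside), max(inside))
```

For the data described here, `n` would be 342 and `k_obs` the number of distinct genotypes observed among them; a call such as `fit_theta(n=342, k_obs=k_observed, theta_max=1000)` (with `k_observed` and `theta_max` supplied by the user) returns the estimate and interval. The grid search costs on the order of theta_max · n · k_obs operations, which is slow in pure Python but adequate for a sketch. If every sampled clone has a distinct genotype (k_obs = n), the likelihood increases without bound in θ and the estimate simply sits at `theta_max`.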

Authors' information

Current affiliation:


1. Vertebrate Section, National Fisheries Research and Development Institute, 940 Quezon Ave., Quezon City, 1103, Philippines

2. Biology Department, Ateneo de Manila University, Katipunan Ave., Loyola Heights, Quezon City, 1108 Philippines

RM: Charoen Pokphand Food Public Company Limited, Shrimp Culture Research Center, 82/2 M 4, Rama II Rd., Bangtorat, Amphor Muang, Samutsakorn City, 74000 Thailand

References

  1. Rees DJ, Dufresne F, Glémet H, Belzile C: Amphipod genome sizes: first estimates for Arctic species reveal genomic giants. Genome. 2007, 50: 151-158. 10.1139/G06-155.

  2. Rheinsmith EL, Hinegardner R, Bachmann K: Nuclear DNA amounts in crustacea. Comp Biochem Physiol B. 1974, 15: 343-348. 10.1016/0305-0491(74)90269-7.

  3. Vaughn JC, Traeger FJ: Conservation of repeated DNA base sequences in Crustacea: a molecular approach to decapod phylogeny. J Mol Evol. 1976, 29: 111-131. 10.1007/BF01732470.

  4. Christie NT, Skinner DM: Evidence for nonrandom alterations in a fraction of the highly repetitive DNA of a eukaryote. Nucleic Acids Res. 1980, 8: 279-298. 10.1093/nar/8.2.279.

  5. McClintock TS, Derby CD: Shelling out for genomics. Genome Biol. 2006, 7: 312. 10.1186/gb-2006-7-4-312.

  6. Giribet G, Edgecombe GD, Wheeler WC: Arthropod phylogeny based on eight molecular loci and morphology. Nature. 2001, 413: 157-161. 10.1038/35093097.

  7. Regier JC, Shultz JW, Kambic RE: Pancrustacean phylogeny: hexapods are terrestrial crustaceans and maxillopods are not monophyletic. Proc Biol Sci. 2005, 272: 395-401. 10.1098/rspb.2004.2917.

  8. Glenner H, Thomsen PF, Hebsgaard MB, Sørensen MV, Willerslev E: Evolution. The origin of insects. Science. 2006, 314: 1883-1884. 10.1126/science.1129844.

  9. Chow S, Dougherty WJ, Sandifer PA: Meiotic chromosome complements and nuclear DNA contents of four species of shrimps of the genus Penaeus. J Crustacean Biol. 1990, 10: 29-36. 10.2307/1548667.

  10. Bachmann K, Rheinsmith EL: Nuclear DNA amounts in pacific Crustacea. Chromosoma. 1973, 43: 225-236. 10.1007/BF00294271.

  11. Maneeruttanarungroj C, Pongsomboon S, Wuthisuthimethavee S, Klinbunga S, Wilson KJ, Swan J, Li Y, Whan V, Chu KH, Li CP, Tong J, Glenn K, Rothschild M, Jerry D, Tassanakajon A: Development of polymorphic expressed sequence tag-derived microsatellites for the extension of the genetic linkage map of the black tiger shrimp (Penaeus monodon). Anim Genet. 2006, 37: 363-368. 10.1111/j.1365-2052.2006.01493.x.

  12. Dalle Nogare DE, Clark MS, Elgar G, Frame IG, Poulter RT: Xena, a full-length basal retroelement from tetraodontid fish. Mol Biol Evol. 2002, 19: 247-255.

  13. van Hulten MC, Witteveldt J, Peters S, Kloosterboer N, Tarchini R, Fiers M, Sandbrink H, Lankhorst RK, Vlak JM: The white spot syndrome virus DNA genome sequence. Virology. 2001, 286: 7-22. 10.1006/viro.2001.1002.

  14. Yang F, He J, Lin X, Li Q, Pan D, Zhang X, Xu X: Complete genome sequence of the shrimp white spot bacilliform virus. J Virol. 2001, 75: 11811-11820. 10.1128/JVI.75.23.11811-11820.2001.

  15. Kazazian HH: Mobile elements: drivers of genome evolution. Science. 2004, 303: 1626-1632. 10.1126/science.1089670.

  16. SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, Bennetzen JL: Nested retrotransposons in the intergenic regions of the maize genome. Science. 1996, 274: 765-768. 10.1126/science.274.5288.765.

  17. FlyBase.

  18. wFleaBase.

  19. Zhang X, Zhang Y, Scheuring C, Zhang HB, Huan P, Wang B, Liu C, Li F, Liu B, Xiang J: Construction and Characterization of a Bacterial Artificial Chromosome (BAC) Library of Pacific White Shrimp, Litopenaeus vannamei. Mar Biotechnol (NY). 2009,

  20. Asakawa S, Abe I, Kudoh Y, Kishi N, Wang Y, Kubota R, Kudoh J, Kawasaki K, Minoshima S, Shimizu N: Human BAC library: construction and rapid screening. Gene. 1997, 191: 69-79. 10.1016/S0378-1119(97)00044-9.

  21. Bryant B, Blair CD, Olson KE, Clem RJ: Annotation and expression profiling of apoptosis-related genes in the yellow fever mosquito, Aedes aegypti. Insect Biochem Mol Biol. 2008, 38: 331-345.

  22. Srinivasula SM, Ashwell JD: IAPs: what's in a name? Mol Cell. 2008, 30: 123-135. 10.1016/j.molcel.2008.03.008.

  23. Verhagen AM, Coulson EJ, Vaux DL: Inhibitor of apoptosis proteins and their relatives: IAPs and other BIRPs. Genome Biol. 2001, 2: reviews3009.1-3009.10. 10.1186/gb-2001-2-7-reviews3009.

  24. Waterhouse RM, Kriventseva EV, Meister S, Xi Z, Alvarez KS, Bartholomay LC, Barillas-Mury C, Bian G, Blandin S, Christensen BM, Dong Y, Jiang H, Kanost MR, Koutsos AC, Levashina EA, Li J, Ligoxygakis P, Maccallum RM, Mayhew GF, Mendes A, Michel K, Osta MA, Paskewitz S, Shin SW, Vlachou D, Wang L, Wei W, Zheng L, Zou Z, Severson DW, Raikhel AS, Kafatos FC, Dimopoulos G, Zdobnov EM, Christophides GK: Evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes. Science. 2007, 316: 1738-1743. 10.1126/science.1139862.

  25. Leu JH, Kuo YC, Kou GH, Lo CF: Molecular cloning and characterization of an inhibitor of apoptosis protein (IAP) from the tiger shrimp, Penaeus monodon. Dev Comp Immunol. 2008, 32: 121-133. 10.1016/j.dci.2007.05.005.

  26. Finlay BB, McFadden G: Anti-immunology: evasion of the host immune system by bacterial and viral pathogens. Cell. 2006, 124: 767-782. 10.1016/j.cell.2006.01.034.

  27. Katsuma S, Kawaoka S, Mita K, Shimada T: Genome-wide survey for baculoviral host homologs using the Bombyx genome sequence. Insect Biochem Mol Biol. 2008, 38: 1080-1086. 10.1016/j.ibmb.2008.05.008.

  28. Hughes AL, Friedman R: Genome-wide survey for genes horizontally transferred from cellular organisms to baculoviruses. Mol Biol Evol. 2003, 20: 979-987. 10.1093/molbev/msg107.

  29. Alcami A: Viral mimicry of cytokines, chemokines and their receptors. Nat Rev Immunol. 2003, 3: 36-50. 10.1038/nri980.

  30. Hedgecock D, Tracey ML, Nelson K: Genetics. The Biology of Crustacea. Edited by: Abele LG. 1982, New York: Academic Press, 2: 283-403.

  31. Otto SP, Whitton J: Polyploid incidence and evolution. Annu Rev Genet. 2000, 34: 401-437. 10.1146/annurev.genet.34.1.401.

  32. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297: 1003-1007. 10.1126/science.1072047.

  33. Linardopoulou EV, Williams EM, Fan Y, Friedman C, Young JM, Trask BJ: Human subtelomeres are hot spots of interchromosomal recombination and segmental duplication. Nature. 2005, 437: 94-100. 10.1038/nature04029.

  34. Fiston-Lavier AS, Anxolabehere D, Quesneville H: A model of segmental duplication formation in Drosophila melanogaster. Genome Res. 2007, 17: 1458-1470. 10.1101/gr.6208307.

  35. Alcivar-Warren A, Meehan-Meola D, Wang Y, Guo X, Zhou L, Xiang J, Moss S, Arce S, Warren W, Xu Z, Bell K: Isolation and mapping of telomeric pentanucleotide (TAACC)n repeats of the Pacific whiteleg shrimp, Penaeus vannamei, using fluorescence in situ hybridization. Mar Biotechnol (NY). 2006, 8: 467-480. 10.1007/s10126-005-6031-z.

  36. Cheng Z, Ventura M, She X, Khaitovich P, Graves T, Osoegawa K, Church D, DeJong P, Wilson RK, Pääbo S, Rocchi M, Eichler EE: A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature. 2005, 437: 88-93. 10.1038/nature04000.

  37. Mok BW, Ribacke U, Sherwood E, Wahlgren M: A highly conserved segmental duplication in the subtelomeres of Plasmodium falciparum chromosomes varies in copy number. Malar J. 2008, 7: 46. 10.1186/1475-2875-7-46.

  38. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.

  39. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.

  40. Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing. Genome Res. 1998, 8: 195-202.

  41. RepeatMasker.

  42. GENSCAN.

  43. Hasegawa M: Simple and rapid technique for a chromosome study of crustacea [in Japanese]. Researches on Crustacea. 1981, 11: 110-112.

  44. InterProScan.

Acknowledgements


This work was supported by the Sasakawa Scientific Research Grant from The Japan Science Society and Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Author information

Corresponding author

Correspondence to Ikuo Hirono.

Additional information

Authors' contributions

TKoyama designed and performed all molecular biology experiments and drafted the manuscript. SA and NS assisted in the design of the experiments and helped to draft the manuscript. TKatagiri and RM assisted in the construction of BAC library. FFF assisted in the construction of BAC library and helped to draft the manuscript. AS assisted in the BAC sequence assembly. KF and TS assisted in the BAC genotyping. TKitakado performed statistical analysis of BAC genotyping data. MDS helped to draft the manuscript. HK, TA and IH conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: BAC-end-anchored gene homologues identified by BLAST search. The significant matches (E-value < 0.1) of BAC-end sequences against public databases are listed. (XLS 32 KB)


Additional file 2: 27 putative genes with nearest homologues in the BAC clone Mj024A04. The BLASTP top hits and the E-values are shown. (XLS 22 KB)


Additional file 3: Exon-intron architecture of putative genes in BAC clone Mj024A04. Exon-intron architectures predicted by GENSCAN are shown. (XLS 20 KB)


Additional file 4: Primers used in this study. The orientation (forward or reverse) of each primer is indicated by 'f' or 'r' at the end of the primer name. Primers named 'Gene--f or r' were used for PCR detection of each putative gene fragment in BAC clones. Primers for quantitative PCR of 4 putative genes (genes 01, 09, 16 and 27) in the BAC clone Mj024A04 and of TGase as the internal control are indicated by 'qt' at the beginning of the primer name. Primers for the three microsatellite markers are indicated by 'MS' at the beginning of the primer name. The three microsatellite repeat regions (02, 33 and 64) were identified with the RepeatMasker program [41]. Forward primers (f) were labeled with 6-FAM, and reverse primers (r) were designed as tailed primers (Applied Biosystems). (XLS 22 KB)


Additional file 5: Schematic representation and frequency of BAC clones that hybridized with probes F, M and R. The number and percentage of positive clones in each group are shown. Data are based on the hybridization results for 381 clones in MjBL2 Plate 24. The location of each probe used in this study is indicated by red boxes. (PDF 33 KB)


Additional file 6: DNA fingerprints of kuruma shrimp BAC clones. Three BAC clones were randomly selected from each hybridization-positive group (indicated by the positive probes at the top of each figure) and from the negative (neg) group. BAC DNA of all clones and of Mj024A04 (B) was digested with EcoRI and HindIII. (PDF 2 MB)


Additional file 7: BAC genotyping using three microsatellite markers (MS02, MS33 and MS64) in Mj024A04. Clones with the same allele sizes at all three microsatellite markers were assigned to the same genotype group. A one-base difference was regarded as experimental error. (XLS 46 KB)


Additional file 8: Calculation of the number of genotypes in the shrimp genome. Possible total numbers of genotypes were calculated using the recursive formula for the marginal distribution and the observed number of different genotypes. The x-axis indicates the number of genotypes (θ); the y-axis indicates the log-likelihood for each given number of genotypes. The 90% and 95% confidence intervals (CI) are indicated above. (PDF 22 KB)


Additional file 9: Phylogenetic tree of BIR domains of BIRPs. BIR domains in the putative IAP genes found in Mj024A04 were compared with BIR domains from several organisms. Amino acid sequences of putative IAP genes 01, 06 and 24 were predicted by GENSCAN [42]. Each BIR domain was identified using InterProScan (version 22.0) [44]. The multiple sequence alignment and the phylogenetic tree of BIR domains were constructed using ClustalW after excluding all gap positions, with confidence assessed by 1000 bootstrap replicates. When multiple BIR domains were found in a single gene, they are labelled alphabetically at the end of the gene name. The GenBank identifier (GI) numbers of the BIRP amino acid sequences and the regions of the BIR domains used in the analysis are as follows: bir-1_CAEEL (17564820; 15-88), bir-2_CAEEL (17557418; 22-99 and 165-242), Bir1p_SACCE (6322548; 20-117 and 153-241), bruce_DROME (45550729; 246-322), Bruce_HOMSA (153792694; 284-360), cIAP-1_HOMSA (14770185; 44-115, 182-252 and 267-338), cIAP-2_HOMSA (13639695; 27-98, 167-237 and 253-324), deterin_DROME (21355525; 26-102), gp019_BMNPV (9630835; 27-98 and 129-201), gp041_OPMNV (9629979; 22-93 and 124-195), gp242_MSEV (9631408; 15-77), IAP_GVCP (1170470; 5-75 and 106-177), IAP_PENMO (133754273; 12-83, 103-173 and 253-324), Iap2B_DROME (28573797; 7-78, 111-181 and 210-281), ML-IAP_HOMSA (11545910; 85-156), NAIP_HOMSA (119393878; 58-129, 157-229 and 276-347), OpIAP_ORGPSMNPV (9629973; 16-86 and 109-180), sfIAP_SPOFR (7021325; 98-168 and 208-279), Survivin_HOMSA (59859878; 13-89), survivin_SCHPO (162312092; 20-100 and 115-195), threadB_DROME (24664971; 42-112 and 224-295), VF193_IIV6 (33302608; 35-110), XIAP_HOMSA (12643387; 24-95, 161-232 and 263-332).
The gene IDs of the Daphnia pulex BIRPs retrieved from wFleaBase [18] and the regions of the BIR domains used in the analysis are as follows: Bruce_DAPPL (NCBI_GNO_248214; 320-396), Deterin_DAPPL (NCBI_GNO_774064; 21-105), IAP2_DAPPL (NCBI_GNO_324854; 9-75 and 158-229), thread_DAPPL (NCBI_GNO_284524; 52-122 and 158-229). The BIR domains of putative genes 01, 06 and 24 used in the analysis are as follows: Gene 01 (18-95, 124-194 and 244-315), Gene 06 (36-106, 124-199, 268-339, 659-730 and 787-858), Gene 24 (9-80, 100-170 and 248-319). (PDF 380 KB)


Additional file 10: Southern blot hybridization of putative genes for detection of CpG methylation. Kuruma shrimp genomic DNA (20 μg) was completely digested, electrophoresed and blotted. Hybridization and washing were performed under low-stringency conditions at 42°C. The restriction enzymes used are indicated by their initials (M, MspI; H, HpaII). The putative gene used for probe synthesis is indicated at the bottom. The left lane is the λ/HindIII marker used as a size standard. (PDF 482 KB)
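The MspI/HpaII comparison in Additional file 10 works because the two enzymes are isoschizomers that both cut C^CGG, but HpaII is blocked when the internal cytosine of the site is CpG-methylated while MspI is not. A methylated site therefore yields a longer HpaII fragment than MspI fragment. The sketch below predicts fragment lengths for a hypothetical linear sequence. It assumes only internal-C (CpG) methylation and ignores other patterns, such as methylation of the outer cytosine, which can block MspI. The function names are illustrative.

```python
RECOGNITION = "CCGG"  # site shared by MspI and HpaII; both cut between C and CGG


def ccgg_sites(seq):
    """0-based start positions of every CCGG site in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 3) if s[i:i + 4] == RECOGNITION]


def fragment_lengths(seq, enzyme, methylated=frozenset()):
    """Fragment lengths of a linear sequence after complete digestion.

    methylated: start positions of CCGG sites whose internal C is methylated
    (CmCGG). Simplifying assumption: this blocks HpaII but not MspI.
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError("enzyme must be 'MspI' or 'HpaII'")
    cuts = [i + 1 for i in ccgg_sites(seq)
            if enzyme == "MspI" or i not in methylated]
    bounds = [0, *cuts, len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

For example, with CCGG sites at positions 4 and 12 of a 20 bp sequence and the first site methylated, MspI gives fragments of 5, 8 and 7 bp whereas HpaII gives 13 and 7 bp, so a probe covering the first 13 bp would detect a larger band in the HpaII lane.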

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Koyama, T., Asakawa, S., Katagiri, T. et al. Hyper-expansion of large DNA segments in the genome of kuruma shrimp, Marsupenaeus japonicus. BMC Genomics 11, 141 (2010).
