Hyper-expansion of large DNA segments in the genome of kuruma shrimp, Marsupenaeus japonicus

Background: Higher crustaceans (class Malacostraca) represent the most species-rich and morphologically diverse group of non-insect arthropods, and many of its members are commercially important. Although crustacean DNA sequence information is growing exponentially, little is known about the genome organization of Malacostraca. Here, we constructed a bacterial artificial chromosome (BAC) library and performed BAC-end sequencing to provide genomic information for kuruma shrimp (Marsupenaeus japonicus), one of the most widely cultured crustacean species, and found a redundant sequence in the BAC library. We examined a BAC clone that includes the redundant sequence to further analyze its length, copy number and location in the kuruma shrimp genome.

Results: The Mj024A04 BAC clone, which includes one redundant sequence, contained 27 putative genes and seemed to display a normal genomic DNA structure. Notably, of the putative genes, 3 encode proteins homologous to the inhibitor of apoptosis protein and 7 encode proteins homologous to those of white spot syndrome virus, a virulent pathogen known to affect crustaceans. Colony hybridization and PCR analysis of 381 BAC clones showed that almost half of the BAC clones carry DNA segments whose sequences are homologous to the representative BAC clone Mj024A04. The Mj024A04 partial sequence was detected multiple times in the kuruma shrimp nuclear genome, with a calculated copy number of at least 100. Microsatellite-based BAC genotyping clearly showed that Mj024A04 homologous sequences were cloned from at least 48 different chromosomal loci. The absence of micro-syntenic relationships with the available genomic sequences of Daphnia and Drosophila suggests that these fragments are unique to kuruma shrimp among the arthropod genome sequences currently available.

Conclusions: Our results demonstrate that hyper-expansion of large DNA segments took place in the kuruma shrimp genome. Although we analyzed only a part of the duplicated DNA segments, our results suggest that it is difficult to analyze the shrimp genome with standard analytical methodology. Hence, it is necessary to avoid repetitive sequences (such as segmental duplications) when studying other unique structures in the shrimp genome.

Background
The genomes of crustaceans are extremely diverse in size, with C-values ranging from 0.14 pg to 64.62 pg, a difference of about 460-fold [1,2]. Despite their economic importance and huge production biomass, little is known about the genome organization of crustaceans, especially Malacostraca (including shrimps and crabs), except for the presence of numerous repetitive sequences [3][4][5].
Although the genomic DNA sequence of one crustacean, the water flea Daphnia pulex, has recently been determined, it seems improper to draw general conclusions about the crustacean genome from it. Recent phylogenetic analyses based on DNA sequence data and morphological comparison between Hexapoda (including insects) and Crustacea provided the unexpected finding that Branchiopoda (including the water flea Daphnia) is phylogenetically much closer to Hexapoda than to Malacostraca [6][7][8]. Therefore, the crustacean genome, and in particular the genetic differences between the Branchiopoda and Hexapoda group and its sister groups, needs to be elucidated.
The penaeid shrimp, which is classified into Decapoda in Malacostraca, has been the subject of intense research. Due to its commercial value, several papers on expressed sequence tag (EST) analysis and genetic linkage mapping have been published in the past few years. However, in-depth information on its large genome, which is estimated to be about 70% of the human genome in size and rich in AT and AAT sequences, is largely lacking [9][10][11]. As the first step towards understanding the shrimp genome organization, we constructed a BAC library (named MjBL2) from kuruma shrimp (Marsupenaeus japonicus) and performed BAC-end sequencing. The results clearly showed extreme redundancy of certain sequences in many BAC clones of the MjBL2 library. We chose one BAC clone (Mj024A04) for detailed analysis of its entire sequence and its redundancy in the shrimp genome, and found numerous copies of DNA segments that contain the Mj024A04-sequence. This indicates that hyper-expansion of such peculiar DNA segments occurred through segmental duplication events during evolution of the kuruma shrimp genome.

BAC library construction and BAC-end sequencing
To provide an overview of the composition and organization of the kuruma shrimp nuclear genome, we constructed a BAC library (MjBL2) using kuruma shrimp genomic DNA prepared from hemocytes of 13 shrimps and analyzed the BAC-end sequences (BESs). MjBL2 consists of 49,152 BAC clones, which were arrayed in 128 microtiter plates and stored at -80°C. The average insert size was estimated to be 135 kb by NotI digestion of 205 randomly selected BAC clones. BES analysis was performed using 192 BAC clones randomly selected from MjBL2, and the retrieved reads were assembled into contigs [DDBJ: AG993477-AG993734]. The resulting BESs were classified into 29 singletons and 51 contiguous sequences consisting of 2 to 24 reads. Notably, the BLASTN and BLASTX analyses revealed that many of these BESs (20 reads in BLASTN, 55 reads in BLASTX) contained a sequence encoding a protein similar to "inhibitor of apoptosis protein (IAP)" reported in black tiger shrimp (Penaeus monodon) (see Additional file 1).

DNA sequence of a representative BAC clone
One of the BAC clones (Mj024A04) that possessed a sequence similar to the "black tiger shrimp IAP" gene was randomly selected from MjBL2 for detailed analysis of its entire DNA sequence by the shotgun sequencing method. The resulting genomic DNA sequence of 120 kb (Mj024A04-sequence) was analyzed by in silico annotation, revealing 27 putative genes in what appeared to be a normal genomic region with exon-intron structure (Figure 1, see Additional file 2 and Additional file 3) [DDBJ: AP010878]. As shown in Figure 1, large GGTTA repeats were found in the middle of the sequence, flanking gene 09, which encodes a protein similar to a reverse transcriptase of Takifugu rubripes [12]. Notably, of the other 26 genes, three (gene 01, 06 and 24) encode proteins homologous to the IAPs of three species, Xenopus laevis (African clawed frog), Drosophila melanogaster (fruit fly) and Rattus norvegicus (Norway rat), and seven (gene 11, 13, 14, 15, 16, 17 and 18) were homologous to ORFs in "White Spot Syndrome Virus (WSSV)", the major shrimp pathogenic dsDNA virus, which is highly virulent to penaeid shrimps as well as other crustaceans such as crabs and crayfish [13,14].

Redundancy of Mj024A04-sequence homologues in the kuruma shrimp BAC library
To determine what portion of the Mj024A04-sequence was redundant in the MjBL2 kuruma shrimp BAC library, we performed colony hybridization using three distinct probes that correspond to the 5'-end (F), middle (M) and 3'-end (R) of the Mj024A04-sequence (primers used for probe DNA amplification are shown in Additional file 4). Surprisingly, numerous BAC clones were positive for at least one of the three probes (200 out of 381: 52.5%; results are shown schematically in Additional file 5), suggesting that the relevant DNA fragments are highly redundant in the MjBL2 library. The 200 positive BAC clones were sorted into 6 groups based on the hybridization pattern (F, F+M, M, F+M+R, M+R and R) (Additional file 5). Furthermore, we examined amplification of 17 of the 27 putative genes (primers in Additional file 4) on 21 BAC clones (3 clones each from 7 groups) that were proven to be independent by DNA fingerprinting with restriction enzymes HindIII and EcoRI (see Additional file 6). Ten of the 17 genes (gene 01, 02, 03, 06, 08, 09, 22, 24, 25 and 27) were selected because they match other genes in the databases with E-values less than 1e-10, and the other seven (gene 11 and 13 to 18) were selected because of their homology to WSSV genes. As seen in Figure 2, two genes (01 and 02) were present in group F; seven genes (01 to 03, 06, 08, 11 and 13) in group F+M; seven genes (03, 06, 08, 09, 11, 13 and 14) in group M; all genes except gene 09 in group F+M+R; twelve genes (09, 11, 13 to 18, 22, 24, 25 and 27) in group M+R; and nine genes (14 to 18, 22, 24, 25 and 27) in group R. As expected, some genes were not detected because the size of the inserted kuruma shrimp DNA fragments in the selected BAC clones varied. Nevertheless, in the group F+M+R, which was supposed to contain all genes, all of the genes except gene 09, which was assigned as a retro-transposon, were indeed detected (Figure 2). The independent amplification of the retro-transposon (gene 09) in the BAC clone samples can be explained by its known nature, which tends to be randomly integrated [15,16]. The random appearance of the retro-transposon suggests that the primordial DNA fragment was devoid of this gene.

Figure 1
Schematic organization of putative genes on the kuruma shrimp BAC clone Mj024A04. Twenty-seven putative genes (boxes) and inter-genic regions (lines) are indicated with transcriptional orientation (arrows). Putative gene 09 is flanked with large GGTTA repeats (double lines).

Figure 2
Amplification of known putative genes using BAC clone samples. F, M and R probes were designed at the 5'-end, middle and 3'-end portions of the Mj024A04-sequence as described in the Results. The putative genes (indicated in the left column) were detected with BAC clones that showed different hybridization patterns with the F, M and R probes (indicated in the top line). Three BAC clones for each hybridization group were tested. Reactions with three BAC clones that showed no signal (negative control: neg), Mj024A04 (positive control: pos) and without templates (-) are also included.

Detection of Mj024A04-sequence and its copy number in the kuruma shrimp genome
We next employed Southern blot hybridization to detect multiple copies of the Mj024A04-sequence in the kuruma shrimp genome using several different restriction enzymes. The results showed multiple DNA bands for each of the 4 putative genes tested (gene 01, 09, 16 and 27) (Figure 3), confirming the presence of multiple copies of these genes. In addition, we performed fluorescent in situ hybridization (FISH) using the labeled Mj024A04 BAC clone. FISH images clearly showed numerous fluorescence spots in the nuclei of adult shrimp testis cells (Figure 4). Given this evidence for duplication of the large segments, we further examined the copy number of the putative genes by quantitative PCR of gene 01, 09, 16 and 27 using genomic DNAs prepared from 7 different organs (brain, hemocytes, heart, testis, muscle, swimleg and intestine) and 3 larvae. Our results indicated that the copy numbers of those putative genes are 100 times higher than that of the putative single-copy gene transglutaminase (TGase), except for gene 09 (retro-transposon). This suggested the presence of multiple copies of the Mj024A04-sequence (Figure 5). Taken together, our results suggest that the large DNA fragment Mj024A04 occurs numerous times in the genome.

BAC genotyping and PCR detection of putative genes in Mj024A04-sequence
To exclude a possible cloning bias, we performed BAC genotyping using three microsatellite polymorphisms. Among 342 F-, M- or R-positive BAC clones screened from MjBL2 plates 001 to 008, which represent 0.2 coverage of the shrimp genome, 299 different genotypes were detected (see Additional file 7). PCR-based putative gene detection on eight independent BAC clones selected from different genotypes showed the presence of almost all of the 17 putative genes (Figure 6). Assuming that all shrimps are heterozygous, two genotypes should be detected per locus in each individual. Since we constructed the MjBL2 library from 13 individuals, at most 26 different haplotypes were expected for one chromosomal locus. The 299 different genotypes detected indicate that at least 11 different chromosome loci contain duplications of the entire Mj024A04-sequence. As we used only 342 BAC clones from MjBL2 plates 001 to 008, a probability estimation method was also applied to estimate how many genotypes could be detected if screening and genotyping were performed with many more BAC clones. This was done with the assumption that each genotype is present only once in the 86 diploid chromosomes of the kuruma shrimp [9]. The estimate was 1240 genotypes, with a 95% confidence interval of 960 to 1658, suggesting that at least 48 different chromosome loci might be present in each haploid genome (see Additional file 8).
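The step from genotype counts to a number of loci is a simple ratio, and a minimal sketch of that arithmetic is given here. The function name `genotypes_per_locus_capacity` is hypothetical; the only inputs are the figures quoted in the text (13 individuals, hence at most 26 haplotypes per locus, and 299 observed or 1240 estimated genotypes). The sketch prints the unrounded ratios and leaves the rounding to the reader.

```python
def genotypes_per_locus_capacity(n_genotypes: float, n_individuals: int, ploidy: int = 2) -> float:
    """Ratio of distinct genotypes to the maximum number of haplotypes
    that a single chromosomal locus can contribute (individuals x ploidy)."""
    max_haplotypes_per_locus = n_individuals * ploidy
    return n_genotypes / max_haplotypes_per_locus


# Figures taken from the text: 13 shrimps, 299 observed and 1240 estimated genotypes.
observed_ratio = genotypes_per_locus_capacity(299, 13)
estimated_ratio = genotypes_per_locus_capacity(1240, 13)
print(f"observed genotypes / 26 haplotypes  = {observed_ratio:.1f}")
print(f"estimated genotypes / 26 haplotypes = {estimated_ratio:.1f}")
```

Any number of loci smaller than these ratios could not account for the genotype counts under the stated assumption of at most 26 haplotypes per locus.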

Discussion
BAC library construction and BAC-end sequencing for a first characterization of the kuruma shrimp genome
The amount of kuruma shrimp nuclear DNA has been reported to be 2.83 pg, indicating that the kuruma shrimp genome size is almost the same as that of other penaeid shrimps such as Litopenaeus vannamei and Penaeus monodon, whose genome sizes are reported to be approximately 2,000 Mbp [19]. In this study, we first constructed a BAC library from the kuruma shrimp. The average insert size of MjBL2 BAC clones was estimated to be 135 kb, and the total MjBL2 insert size can be calculated as approximately 6,600 Mbp, showing that MjBL2 represents 3.3 times coverage of the kuruma shrimp genome. Although MjBL2 is not suitable for physical mapping and genome sequencing because it was constructed from 13 shrimps, it is useful as the first step in characterizing the kuruma shrimp genome. We performed BES analysis to acquire a first glimpse into the sequence composition of the unsequenced kuruma shrimp genome. The results of the BES analysis were very surprising because, even with only 192 clones analyzed, we detected 51 contigs, and each contig contained multiple reads, varying from 2 to 24. This suggested that, following the typical BAC construction method [20], we obtained multiple copies of the same DNA fragments from the kuruma shrimp genome. However, the putative genes that we annotated by BLAST, such as the black tiger shrimp IAP gene homologue, do not seem to be genes with potential duplication activity like transposable elements. To further ascertain the abnormality of the kuruma shrimp genome, we analyzed these DNA segments in more detail.
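The coverage figures quoted above follow from the clone count, the mean insert size and the assumed genome size. The following is a minimal sketch of that arithmetic; the function name `library_coverage` is hypothetical, the genome size of 2,000 Mbp is the approximate value cited in the text, and the 384 wells per plate used for the plate 001 to 008 subset is taken from the Methods.

```python
def library_coverage(n_clones: int, mean_insert_kb: float, genome_mbp: float):
    """Return (total insert size in Mbp, fold coverage of the genome)."""
    total_mbp = n_clones * mean_insert_kb / 1000.0
    return total_mbp, total_mbp / genome_mbp


# Whole MjBL2 library: 49,152 clones with a mean insert of 135 kb.
total_mbp, coverage = library_coverage(49_152, 135, 2_000)
print(f"MjBL2: {total_mbp:.0f} Mbp total, {coverage:.1f}x coverage")

# Subset used for genotyping: plates 001 to 008, 384 wells each.
subset_mbp, subset_coverage = library_coverage(8 * 384, 135, 2_000)
print(f"Plates 001-008: {subset_mbp:.0f} Mbp, {subset_coverage:.1f}x coverage")
```

Because the genome size is itself an approximation, the resulting coverage values should be read as rough estimates.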

Gene contents of a representative BAC clone Mj024A04
The Mj024A04 BAC clone, randomly selected from BAC clones that possessed black tiger shrimp IAP sequences, was fully sequenced, and 27 genes were predicted in silico. Of the 27 predicted genes, we found three genes homologous to IAP. Apoptosis is a genetically programmed pathway of controlled cell suicide that has critical roles in several processes such as development, tissue homeostasis, DNA damage responses and pathological processes [21]. IAPs have been shown to block apoptosis by inhibiting the proteolytic activity of caspases, the central components of the apoptotic machinery, through direct binding of the Baculoviral IAP Repeat (BIR) domains present in the IAPs [22]. Cellular homologues called BIR-domain-containing proteins (BIRPs) are characterized by the presence of a variable number of BIR domains. These homologues have been identified in yeasts, nematodes, flies and higher vertebrates [21][22][23]. In Drosophila, four kinds of IAP homologues (Thread or IAP1, IAP2, Bruce and Deterin or CG12265) have been found [24]. We analyzed the phylogenetic relationships of putative genes 01, 06 and 24 with other BIR domains in several organisms (see Additional file 9). The BIR domains in putative gene 24 clustered together with the BIR domains found in the black tiger shrimp IAP gene, suggesting that putative gene 24 may have the same function as black tiger shrimp IAP [25]. Of particular interest, putative gene 06 contains five BIR domains, and this is the first report of a BIRP containing more than three BIR domains. BIR domains are known to play important roles in protein-protein interactions, and it has been shown that the presence of multiple BIR domains in a single protein molecule increases the affinity of BIRPs for a target protein. In addition, the range of target molecules with which BIRPs can interact also increases with the number of BIR domains [22]. Hence, putative gene 06, which has five BIR domains, may be a novel BIRP with a different function.
Furthermore, of the other 24 genes, we found seven genes homologous to ORFs in WSSV. It is known that certain mammalian dsDNA viruses, such as herpesvirus and poxvirus, mimic the structure and function of host genes to evade detection and destruction by the host immune system [26]. Similarly, "potential horizontal gene transfers" have been found in baculoviruses, infectious pathogens of insects [27,28]; hence, such viral genome structures can be regarded as repositories of important information about host immune processes [29]. The presence of multiple WSSV-like genes in the kuruma shrimp genome strongly suggests that similar mimicking mechanisms or horizontal gene transfers also occur in this virus group. Moreover, given the absence of homologous proteins in the current database, this information will provide a good starting point for understanding unknown WSSV-host interactions.
The first identification of multiple duplications of large DNA segments in the shrimp genome
As high-resolution whole genome sequences are not yet available for Malacostraca or Decapoda species, it is difficult to conclude whether the multiple copies of peculiar large DNA segments (Mj024A04-sequence) found in kuruma shrimp are also present in other species. However, micro-synteny analysis revealed that the Mj024A04-sequence is not found in two other arthropod genomes, Drosophila and Daphnia, suggesting that the duplication of these large DNA fragments occurred after the establishment of Malacostraca within the Crustacea.
It is also unclear whether the redundancy is the result of polyploidization or segmental duplication. Previous studies revealed a wide range of chromosome numbers and variation in genomic DNA content among several species in Decapoda, suggesting the possibility of polyploidization. However, re-association kinetics of genomic DNA and electrophoretic analysis of enzyme polymorphism have suggested that polyploidization is a rare event [30,31]. Thus, we assumed that the highly redundant large DNA segments in the kuruma shrimp may have arisen from segmental duplication events.
Segmental duplications (SDs) are duplicated blocks of genomic DNA, typically ranging in size from 1 kb to 200 kb [32]. SDs are composed of apparently normal genomic DNA containing high-copy repeats and gene sequences with intron-exon architecture; hence, they are difficult to detect a priori without well-assigned genome information [32]. In this regard, the human genome is the most studied genome with respect to SDs. The human reference genome contains an abundance of large DNA segments with various copy numbers (from 2 to 18), representing ≥ 5% of the genome, that have accumulated through evolution over 40 million years [33]. These duplications are shown to be clustered, with up to 10-fold enrichment, within pericentromeric and subtelomeric regions of human chromosomes [32].
SDs are also reported in Drosophila melanogaster [34]. In the fly, SDs account for ~1.4% of the genome (1.66 Mbp/118.35 Mbp), ranging from 346 bp to 81.1 kb in length. The Drosophila genome appears to be significantly poor in large (>10 kb) duplicated blocks, with only 7.21%, as compared to the human genome. Chromosome 4, which appears to be enriched in heterochromatic domains, and the pericentromeric regions of chromosomes X, 2 and 3 in Drosophila also have high SD density.
It is reported that subtelomeres are notably rich in degenerate telomeric repeats relative to adjacent single-copy sequences or other genomic regions (~10- and 100-fold, respectively) in the human genome [33]. We analyzed the number of kuruma shrimp BAC clones harboring GGTTA repeats based on colony hybridization [35]. The results showed that the rate of GGTTA-positive BAC clones was nearly 3 times higher among the BAC clones positive for the F, M or R probes than among all BAC clones tested (45.4% and 17.1%, respectively), suggesting that the Mj024A04-sequence and its duplicates are located predominantly in subtelomeric regions and perhaps in pericentromeric regions.

The absence of transcripts of putative genes in Mj024A04 in several tissues of an adult shrimp
We attempted to detect RNA transcripts for some of the putative genes in several tissues of kuruma shrimp, but gene expression was very weak despite their high copy number. Together with the subtelomeric localization, we considered that this low level of gene expression might be caused by epigenetic control mechanisms, such as CpG methylation, histone hypoacetylation and histone methylation. Although we attempted to detect CpG methylation in Mj024A04 segments by genomic Southern blot analysis with the CpG-methylation-insensitive restriction enzyme MspI and its sensitive isoschizomer HpaII, we could not detect any CpG methylation, suggesting that transcription of Mj024A04 is suppressed by other factors (see Additional file 10).

Conclusions
Genome rearrangements are common phenomena in eukaryotes, which facilitate not only species diversification but also genetic variation within species. Studies based on whole genome sequences in primates suggest that a significant proportion of lineage-specific duplications result in different gene expression patterns and in mechanistic consequences of changes in chromosome structure [36]. Furthermore, in a study on Plasmodium falciparum, a causative agent of severe human malaria, the authors revealed that eight SDs, which are located on seven different chromosomes, show copy number polymorphism among different strains. The expression levels of the genes found within the SDs are also correlated in part with the gene copy number [37]. These studies strongly suggest that SDs are widely distributed and play significant roles in creating biological differences among closely related species. The biological significance of SDs in kuruma shrimp Marsupenaeus japonicus is still obscure due to the lack of entire genome sequence information for Decapoda species. Nonetheless, it will be interesting to determine how SDs and numerous putative genes such as the WSSV homologues act in this species. Furthermore, such hyper-expansion of DNA segments should be taken into serious consideration in whole-genome sequencing and in the effective construction of genetic linkage maps of this economically important species.

BAC library construction and sequencing of BAC ends
The kuruma shrimp BAC library (MjBL2) was constructed according to the protocol described previously, with minor modification [20]. Briefly, hemocytes from 13 kuruma shrimps were embedded in 1% low melting agarose plugs and digested in the presence of proteinase K. The high-molecular-weight DNA was partially digested with HindIII and size fractionated by electrophoresis on a CHEF DR-II apparatus (BioRad). Genomic DNA over 150 kb was extracted with NaI and GELase (EPICENTRE), ligated into the pBAC-lac vector and used for transformation of E. coli DH10B T1 phage resistant cells (Invitrogen). A total of 49,152 BAC clones were picked and arrayed on 128 microtiter plates, each with 384 wells, by Q-Pix (Genetix). High Density Replica (HDR) filters were made using Bio Grid (Bio Robotics). BAC-end sequencing was performed at the Dragon Genomics Center (Takara Bio, Shiga, Japan), and the retrieved AB1 files were processed for clustering using Phred, Phrap and Consed [38][39][40]. To identify significant matches to the deposited sequences in the public database, BLASTN and BLASTX algorithms were employed after masking repeat elements with RepeatMasker (version 3.2.8) [41] using cross-match as a search engine.

Shotgun sequencing, data assembly and analysis
A shotgun library was made from purified DNA of the Mj024A04 BAC clone using a shotgun library construction kit (Invitrogen). Colony PCR conditions were as follows: an initial denaturation step for 5 min at 95°C, followed by 35 cycles of denaturation at 95°C for 30 sec, annealing at 55°C for 30 sec and extension at 72°C for 2 min, and a final extension step at 72°C for 5 min to complete the reaction. M13 forward and reverse primers and rTaq DNA polymerase (Bioneer), mixed in a total volume of 15 μl, were used for the colony PCR. Excess primers and dNTPs were removed by ExoSAP-IT (GE Healthcare), following the manufacturer's instructions. Sequencing reactions were performed with SP6 and T7 primers using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) following the manufacturer's instructions and electrophoresed on an ABI 3130xl Genetic Analyzer (Applied Biosystems). Retrieved AB1 files were base-called and assembled by Phred, Phrap and Consed [38][39][40]. The sequence gaps were closed using a combination of resequencing of shotgun clones and BAC direct sequencing. Putative genes were predicted by GENSCAN [42]. Amino acid sequences of the predicted genes were annotated using the BLASTP algorithm. Micro-synteny analysis was performed by applying the TBLASTN algorithm to two databases, FlyBase (version FB2008_02) [17] and wFleaBase (first release, 2007/07/07) [18].

Southern blot hybridization analysis
Kuruma shrimp genomic DNA (20 μg) was digested completely with BamHI, EcoRI, HindIII, BamHI and EcoRI, BamHI and HindIII, EcoRI and HindIII, BglII and DraI, respectively, and separated on a 0.7% agarose gel. After hydrolysis in 0.25 N HCl and denaturation in 1.5 M NaOH and 0.5 M NaCl, the gel was blotted onto positively charged nylon membranes (Pall Gelman Laboratory) in 0.4 N NaOH. Hybridization was performed with the probe labelled with [α-32P]dCTP using Random Primer DNA Labeling Kit Ver. 2 (Takara) at 42°C in PerfectHyb hybridization solution (TOYOBO) for 4 hrs, and washes were carried out 3 times with 2× SSC/0.1% SDS at 50°C for 30 min. The autoradiogram was developed with a STARION FLA-9000 Reader (Fujifilm).

Chromosomal localization of Mj024A04-sequence
Mj024A04 BAC DNA was fluorescently labeled as a FISH probe by the nick translation method using the FISH Tag DNA Multicolor kit (Invitrogen) according to the manufacturer's instructions. The specimens were prepared from testis cells according to a previous report [43]. After the final heat denaturation of the labeled probe and heat denaturation and dehydration of the specimens, hybridization was performed in 2× SSC/65% formamide hybridization buffer at 37°C for 24 hrs. Washes were performed three times with 2× SSC/50% formamide, 1× SSC and 4× SSC/0.1% Tween 20, respectively, at 45°C for 5 min. Finally, the specimens were counterstained with Hoechst 33258 (Invitrogen) and examined under a Nikon Eclipse E600 epifluorescence microscope (Nikon). Photographs were taken with a MicroMax Cooled-CCD and IPLab software (Nippon Roper).

Copy number estimation of Mj024A04 genes
Primer pairs for quantitative PCR were designed for 4 predicted genes (gene 01, 09, 16 and 27) and the putative single-copy gene, transglutaminase (TGase; DQ436474), using Primer Express Software Version 3.0 (Applied Biosystems) (primers are shown in Additional file 4). Kuruma shrimp genomic DNA (0.1 ng) prepared from the brain, hemocytes, heart, testis, muscle, swimleg, intestine and 3 larvae was used as template in a 20 μl reaction mixture containing 10 μl of SYBR Green PCR Master Mix reagent (Applied Biosystems), 1 μl of genomic DNA template or plasmid containing target DNA sequences as standard, 8.2 μl of deionized water and 0.4 μl each of 10 μM forward and reverse primer. PCR reactions were performed and quantified on the 7300 Real-Time PCR System (Applied Biosystems). All PCR reactions were performed as follows: 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 15 sec and 60°C for 1 min, with a dissociation stage at 95°C for 15 sec, 60°C for 30 sec and 95°C for 15 sec. The PCR reaction was repeated three times for each template. The copy number of each putative gene was estimated by the absolute quantification method, and Ct values of the amplified target genomic DNA fragments in each sample were computed by the SDS program, using default parameters.
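Absolute quantification with a plasmid standard is commonly done by fitting a standard curve of Ct against the logarithm of the template copy number and reading sample copy numbers off that curve. The following is a minimal sketch of that general approach, not the SDS program's implementation. All function names, the dilution series and the Ct values are hypothetical illustration values, and using one shared curve for both the target gene and TGase is a simplifying assumption.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain a copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical plasmid dilution series and Ct values (not data from the study).
std_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
std_ct = [28.04, 24.72, 21.40, 18.08, 14.76]
slope, intercept = fit_standard_curve(std_copies, std_ct)

# Hypothetical sample Ct values for a target gene and the single-copy reference.
target_copies = copies_from_ct(18.08, slope, intercept)
reference_copies = copies_from_ct(24.72, slope, intercept)
relative_copy_number = target_copies / reference_copies
print(f"slope = {slope:.2f}, copies relative to reference = {relative_copy_number:.0f}")
```

In practice each amplicon would normally get its own standard curve, since amplification efficiency can differ between primer pairs.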

BAC genotyping using microsatellites
MjBL2 BAC clones showing positive signals against the F, M and R probes, which correspond to the 5'-end (F), middle (M) and 3'-end (R) of the Mj024A04-sequence, were used for BAC genotyping with 3 microsatellite markers (MS02, MS33 and MS64; primers used for amplification of probe DNA and microsatellite repeats are shown in Additional file 4). Approximately 15 ng of BAC DNA was used as template in a 10 μl reaction mixture containing 1 μl of 10× Ex Taq buffer, 1 μl of dNTP mixture (2.5 mM each), 0.1 μl of Ex Taq (5 U/μl) (Takara), 1 μl of BAC DNA template, 0.1 μl each of 100 μM forward and reverse primer and 6.7 μl of deionized water. PCR reactions were performed on a TGradient Thermocycler 96 (Biometra) under the following conditions: a first denaturation step at 95°C for 5 min, followed by 40 cycles of 95°C for 30 sec, the appropriate annealing temperature for 30 sec and 72°C for 30 sec, and a final extension step at 72°C for 5 min. The annealing temperature determined for each primer pair was 57°C for MS02 and 59°C for MS33 and MS64. Amplified fragments were separated and detected with an ABI PRISM 3100 Genetic Analyzer, and signal intensity was scored with GeneScan and Genotyper software following the instruction manuals (Applied Biosystems).

Based on the results obtained from BAC genotyping of 342 BAC clones, we estimated the probable number of different genotypes that would be found if more BAC clones (>342) were used for screening and genotyping. Random sampling of size $N$ was modeled under the assumption that the population contains $\theta$ different genotypes, all present at the same abundance. The population was supposed to be large enough that sampling with replacement is a valid approximation. Let $Y_i$ be the random outcome of the $i$-th sampling:

$$Y_i = \begin{cases} 1 & \text{if the genotype in the } i\text{-th sample is newly observed} \\ 0 & \text{otherwise} \end{cases}$$

and let $K_n = \sum_{i=1}^{n} Y_i$ be the number of different genotypes observed in the first $n$ samples. The probability distribution of the initial trial is obviously given as $\Pr(Y_1 = 1) = 1$. Furthermore, the conditional distribution of $Y_i$ given previous outcomes is expressed by

$$\Pr(Y_i = 1 \mid K_{i-1} = k) = \frac{\theta - k}{\theta}, \qquad \Pr(Y_i = 0 \mid K_{i-1} = k) = \frac{k}{\theta}.$$

The conditional distribution of $K_n$ is easily given as

$$\Pr(K_n = k \mid K_{n-1} = k - 1) = \frac{\theta - k + 1}{\theta}, \qquad \Pr(K_n = k \mid K_{n-1} = k) = \frac{k}{\theta}.$$

Then, a recursive formula for the marginal distribution of $K_n$ can be derived as

$$\Pr(K_n = k) = \frac{k}{\theta}\,\Pr(K_{n-1} = k) + \frac{\theta - k + 1}{\theta}\,\Pr(K_{n-1} = k - 1).$$

Once the outcome of $K_N$ from the random sampling of size $N$ is observed, a likelihood function of $\theta$ (say $L(\theta)$) can be obtained through its probability distribution, which is calculated by the recursive formula above. The parameter $\theta$ is then estimated by maximizing $L(\theta)$. The $100(1-\alpha)\%$ confidence interval is also derived from the likelihood ratio as

$$\left\{\theta : 2\left[\log L(\hat{\theta}) - \log L(\theta)\right] \le \chi^2_1(1-\alpha)\right\}$$

where $\hat{\theta}$ is the maximum likelihood estimate of $\theta$ and $\chi^2_1(1-\alpha)$ is the upper $100\alpha$-percent point of the $\chi^2$ distribution with 1 degree of freedom.
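The estimation procedure described in this section can be sketched in a few lines of Python. This is an illustrative reimplementation under the stated model (θ equally abundant genotypes, sampling with replacement), not the authors' code. The function names are hypothetical. The recursion uses the transition probabilities implied by the model, k/θ for drawing an already observed genotype and (θ-k+1)/θ for drawing a new one. For the search over θ, the sketch swaps in the classical closed form of the same distribution, whose θ-dependent part is θ!/(θ-k)! · θ^(-N), because it is much cheaper to evaluate than the recursion; the Stirling-number factor is dropped since it does not depend on θ.

```python
import math

CHI2_1_95 = 3.841459  # 0.95 quantile of the chi-square distribution, 1 d.f.


def prob_distinct_recursive(theta, n_samples, k_obs):
    """Pr(K_N = k_obs) computed with the recursion on Pr(K_n = k)."""
    probs = [0.0] * (k_obs + 1)  # probs[k] = Pr(K_n = k), starting at n = 1
    probs[1] = 1.0               # the first sample is always a new genotype
    for _ in range(2, n_samples + 1):
        new = [0.0] * (k_obs + 1)
        for k in range(1, k_obs + 1):
            stay = probs[k] * k / theta                    # genotype seen before
            grow = probs[k - 1] * (theta - k + 1) / theta  # genotype is new
            new[k] = stay + grow
        probs = new
    return probs[k_obs]


def log_lik_kernel(theta, n_samples, k_obs):
    """log L(theta) up to an additive constant that is free of theta."""
    return (math.lgamma(theta + 1) - math.lgamma(theta - k_obs + 1)
            - n_samples * math.log(theta))


def estimate_theta(n_samples, k_obs, theta_max=20_000, crit=CHI2_1_95):
    """Integer grid search for the MLE and the likelihood-ratio interval.

    theta_max is an arbitrary search bound chosen for this sketch.
    """
    thetas = range(k_obs, theta_max + 1)
    loglik = [log_lik_kernel(t, n_samples, k_obs) for t in thetas]
    best = max(loglik)
    mle = thetas[loglik.index(best)]
    inside = [t for t, v in zip(thetas, loglik) if 2.0 * (best - v) <= crit]
    return mle, inside[0], inside[-1]


# 342 genotyped BAC clones, 299 distinct genotypes (values from the text).
mle, lower, upper = estimate_theta(342, 299)
print(f"theta_hat = {mle}, 95% interval = {lower} to {upper}")
```

With the values from the text, the sketch should give an estimate and interval close to the reported 1240 (960 to 1658). If every sampled clone had a distinct genotype, the likelihood would keep increasing with θ and the search would simply return the bound `theta_max`.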