Molecular phenotype of zebrafish ovarian follicle by serial analysis of gene expression and proteomic profiling, and comparison with the transcriptomes of other animals

Background The ability of an oocyte to develop into a viable embryo depends on the accumulation of specific maternal information and molecules, such as RNAs and proteins. A serial analysis of gene expression (SAGE) was carried out in parallel with proteomic analysis on fully-grown ovarian follicles from zebrafish (Danio rerio). The data obtained were compared with ovary/follicle/egg molecular phenotypes of other animals, published or available in public sequence databases. Results Sequencing of 27,486 SAGE tags identified 11,399 different ones, including 3,329 tags with an occurrence superior to one. Fifty-eight genes were expressed at over 0.15% of the total population and represented 17.34% of the mRNA population identified. The three most expressed transcripts were a rhamnose-binding lectin, beta-actin 2, and a transcribed locus similar to the H2B histone family. Comparison with the large-scale expressed sequence tags sequencing approach revealed highly expressed transcripts that were not previously known to be expressed at high levels in fish ovaries, like the short-sized polarized metallothionein 2 transcript. A higher sensitivity for the detection of transcripts with a characterized maternal genetic contribution was also demonstrated compared to large-scale sequencing of cDNA libraries. Ferritin heavy polypeptide 1, heat shock protein 90-beta, lactate dehydrogenase B4, beta-actin isoforms, tubulin beta 2, ATP synthase subunit 9, together with 40 S ribosomal protein S27a, were common highly-expressed transcripts of vertebrate ovary/unfertilized egg. Comparison of transcriptome and proteome data revealed that transcript levels provide little predictive value with respect to the extent of protein abundance. All the proteins identified by proteomic analysis of fully-grown zebrafish follicles had at least one transcript counterpart, with two exceptions: eosinophil chemotactic cytokine and nothepsin. 
Conclusion This study provides a complete sequence data set of maternal mRNA stored in zebrafish germ cells at the end of oogenesis. This catalogue contains highly-expressed transcripts that are part of a vertebrate ovarian expressed gene signature. Comparison of transcriptome and proteome data identified downregulated transcripts or proteins potentially incorporated in the oocyte by endocytosis. The molecular phenotype described provides groundwork for future experimental approaches aimed at identifying functionally important stored maternal transcripts and proteins involved in oogenesis and early stages of embryo development.


Background
Folliculogenesis and oogenesis include the formation of ovarian follicles, the initiation and completion of meiosis, and the accumulation of specific information and molecules such as RNAs, proteins, or imprinted genes in the female germ cells to sustain embryo development to the stage where zygotic gene activation takes over [1][2][3].
The zebrafish, Danio rerio, is currently the most popular fish model in developmental and genomic analyses and a genome sequencing project is currently underway [4]. A large number of expressed sequence tags (ESTs) is already available [5]. A number of methods are currently used for gene expression profiling. They differ in scale, economy, and sensitivity. Delineation of the transcriptome of teleost fish ovaries has been evaluated using large-scale EST sequencing [6] of cDNA libraries of zebrafish [7] and Atlantic salmon [8,9] or subtractive hybridisation of cDNA libraries [10] of medaka [11] or rainbow trout [12] gonads. While these methods give an idea of transcript abundance or enrichment in a specific tissue, a few genes expressed at high levels usually represent a large proportion of the total transcripts and are thus more frequently represented in the EST database [13].
Serial analysis of gene expression (SAGE) based on the enumeration of directionally reliable short cDNA sequences (tags), provides qualitative as well as quantitative analysis of a large number of genes in a defined tissue [14,15]. This is a method of choice for discovering novel genes and spliced variants. This technique has been widely applied in human studies and various SAGE tags/ SAGE libraries have been generated from different cells/ tissues, including human oocytes [16,17], thus enabling the successful identification of differentially expressed genes in normal physiological processes and pathological conditions [18].
The aim of this work was to profile the transcriptome of fully-grown zebrafish follicles using the SAGE method and compare it with the protein repertoire determined at the same stage of oogenesis after one-dimensional (1D) and two-dimensional (2D) polyacrylamide-gel electrophoresis (PAGE) protein fractionation and in-gel proteolysis, followed by tandem mass spectrometry (MS/MS) identification of the resulting peptides [19]. The database generated was compared with several vertebrate ovary/unfertilized egg transcriptomes generated with the large-scale EST sequencing approach in order to identify functionally important maternal transcripts and proteins stored in germ cells.

Zebrafish fully-grown follicle transcriptome
An essential step in SAGE library analysis is the unambiguous assignment of each SAGE tag to the corresponding mRNA or EST sequence. This tag-to-gene mapping requires an initial in silico extraction of the virtual SAGE tags found in public EST/cDNA sequence databases. Existing web sites [20][21][22][23] provide the correspondence between SAGE tags and transcripts. Only a limited number of species have been subjected to SAGE analysis, as seen on the SAGEmap web site [20,24], which presents SAGE tag to UniGene mapping for eighteen species. As no previous study had used the SAGE strategy in any fish species, there was no in silico SAGE tag database available for zebrafish. A generic computer package named FISHTAG was designed to extract virtual SAGE tags from UniGene and TIGR EST databases and generate a zebrafish in silico SAGE tag database, ZEBRATAG. Applying SAGE to the zebrafish fully-grown ovarian follicle generated a catalogue of 27,486 sequenced tags, ZEBRAOV. The list of these experimental SAGE tags and their relative frequencies were deposited in the NCBI gene expression and hybridization array data repository (GEO) [25] under accession number GSE3679 (Additional file 1: Table 1). Analysing these tags led to the identification of 11,399 different tag species, including 3,329 tags with an occurrence greater than one. ZEBRAOV was subdivided into eighty-three abundance classes, from 1 to 284 tag copies per tag species (Figure 1), according to frequency of occurrence.
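The in silico tag extraction described above can be illustrated with a minimal sketch. It is not the FISHTAG implementation, whose internals are not described here. It assumes the conventional SAGE design (NlaIII anchoring site CATG, 10-bp tags) and a sense-strand cDNA without poly(A) tail; the function name `virtual_tags` and the rule of skipping sites too close to the 3' end are illustrative assumptions. Positions are labelled R1, R2, R3 from the 3' end, following the notation used in the text.

```python
# Hypothetical sketch of virtual SAGE tag extraction (not the FISHTAG code).
ANCHOR_SITE = "CATG"  # assumption: NlaIII anchoring enzyme, as in standard SAGE
TAG_LENGTH = 10       # assumption: 10-bp short SAGE tags


def virtual_tags(cdna, max_positions=3):
    """Return {"R1": tag, "R2": tag, ...} for the 3'-most anchoring sites.

    cdna is the sense strand written 5'->3' without the poly(A) tail.
    R1 is the usable site closest to the 3' end, R2 the next one upstream.
    """
    seq = cdna.upper()
    # Collect the start index of every anchoring site, 5' to 3'.
    sites = []
    start = seq.find(ANCHOR_SITE)
    while start != -1:
        sites.append(start)
        start = seq.find(ANCHOR_SITE, start + 1)

    tags = {}
    rank = 0
    for pos in reversed(sites):  # walk from the 3' end upstream
        tag_start = pos + len(ANCHOR_SITE)
        tag = seq[tag_start:tag_start + TAG_LENGTH]
        if len(tag) < TAG_LENGTH:
            continue  # assumption: skip sites too close to the 3' end
        rank += 1
        tags[f"R{rank}"] = tag
        if rank == max_positions:
            break
    return tags
```

Under these assumptions, two transcripts that share a coding region but differ in their 3'-untranslated sequence would generally yield different R1 tags, which is the property exploited when isoforms are discriminated by SAGE.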
Around 87% (2,898) and 67% (2,223) of ZEBRAOV non-singleton tags were assigned using FISHTAG to at least one UniGene (Build#87) or TIGR (release 16.0) cluster, respectively. Fifty-eight transcripts, identified by their SAGE tags, were expressed at over 0.15%, i.e. the number of times a tag was counted in ZEBRAOV was at least 42 (Table 1). They accounted for 17.34% of the mRNA population identified and represented forty-six of the most abundant classes (Figure 1). Among these SAGE tags, thirty-seven were assigned, using the FISHTAG software package with UniGene database Build#87 as a reference, to at least one UniGene cluster in a correct position of sense (R1) or antisense (R1cr) transcript sequences, which were annotated or not. Nineteen tags were multiple-matched, i.e. they matched more than one UniGene cluster. Two tags did not match any reference cDNA sequence in GenBank™/EBI Data Bank after extraction of their SAGE tags at the first three positions and could therefore not be classified. Unmatched SAGE tags in the SAGE library could be due to the presence of genes in the zebrafish genome or spliced variant transcripts that had not previously been identified by EST data. Some of these unidentified transcripts were highly expressed in the zebrafish fully-grown follicle transcriptome (see Table 1).
The most abundant tag in ZEBRAOV was recovered 284 times, thus representing 1.033% of the mRNA population evaluated by the SAGE method (Table 1). This SAGE tag was identified four times on a chromosome 22 genomic contig (GenBank (gb) accession number gb|NW_634208|) and in two deduced transcript variants, gb|XM_702755| and gb|XM_704598|, the latter being nearly identical to the consensus sequence of full-length ovary cDNA entries gb|BM141044| and gb|CO351300|. These transcripts are part of a large conserved protein family with at least thirteen transcript variants centred on locus LOC561392 of chromosome 22.
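The relationship between tag counts and the percentages quoted above can be checked with a few lines of arithmetic. The sketch below uses only the library size given in the text (27,486 tags); the function names are illustrative.

```python
import math

TOTAL_TAGS = 27_486  # tags sequenced in ZEBRAOV (value taken from the text)


def percent_of_library(count, total=TOTAL_TAGS):
    """Relative abundance of a tag species, in percent of all sequenced tags."""
    return 100.0 * count / total


def min_count_above(threshold_percent, total=TOTAL_TAGS):
    """Smallest integer tag count whose share strictly exceeds the threshold."""
    return math.floor(threshold_percent * total / 100.0) + 1


most_abundant = percent_of_library(284)  # about 1.033 %
cutoff_count = min_count_above(0.15)     # 42 tag copies
```

A count of 41 corresponds to about 0.149% of the library, which is why 42 copies is the lowest count above the 0.15% limit.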
Deduced protein sequence gb|BM141044| was 41.6% identical in 125 amino acid overlap with the N-terminal part of human latrophilin-2 precursor protein (UniProt (up) accession number up|O95490|). It contains the galactose/rhamnose-binding lectin domain found in numerous proteins with sugar binding properties (Pfam accession number PF02140) [26], including the two domains found in rhamnose-binding lectins in catfish (up|Q9PVW8|) and rainbow trout (up|Q9IB51|, up|Q9IB52|, up|Q9IB53|) eggs. It should be noted that UniGene (ug) cluster number ug|Dr.12439| enclosing SAGE tag N°1 was in fact a chimera cluster also containing EST sequences from heterogeneous nuclear ribonucleoprotein K (hnrpk) transcript gb|NM_212994| and, therefore, could not be used as a valid reference UniGene cluster. hnrpk is in fact located on chromosome 8 in the zebrafish genome (Ensembl Gene ID ENSDARG00000018914).
A transcribed locus (gb|XM_701504|) similar to the H2B (h2b) histone family and derived from an annotated [...] The corresponding UniGene cluster ug|Dr.46793| was a chimera cluster, while the correct EST cluster of this transcribed portion of the zebrafish genome was located in TIGR (TC291160). We also observed that a [...] The gene products for the following enzymes were also recovered: lactate dehydrogenase B4 (ldhb) (SAGE tag N°11), subunit 9 of ATP synthase of mitochondrial F0 complex (atp5g) (SAGE tag N°13), retinol dehydrogenase 10 (rdh10) (SAGE tag N°28), a protein similar to cytochrome oxidase III (SAGE tag N°32), glutathione S-transferase pi (gstp1) (SAGE tag N°35), and a proteolytic enzyme similar to cathepsin S (SAGE tag N°41).
Figure 1. Distribution of ZEBRAOV library SAGE tags by abundance classes. The library was subdivided into 83 abundance classes, from 1 to 284 tag copies per tag species, according to the occurrence frequency of each tag species. Tag frequencies were plotted on a logarithmic scale on the x-axis against the number of SAGE tag species in each class on the y-axis.
Figure 2. Transcriptional activity of zebrafish fully-grown follicles and ovaries evaluated by the composition of the ZEBRAOV SAGE tag database and EST clusters in two selected cDNA libraries.

Comparison of ZEBRAOV with zebrafish ovary cDNA libraries
The cumulative percentages of SAGE tag species or EST clusters were plotted in order of abundance, according to the cumulative percentages of tag or EST counts, providing a comparative view of transcriptional activity (Figure [...]
*The total number of cDNA clones sequenced was 576 to generate ZEBRAOV instead of 11,344 with cDNA library ID.9767 and 13,029 with cDNA library ID.15519. **GenBank accession number of the sequence from which the SAGE tag was extracted. ***UniGene Build#87. All the tags were at R1 sense position with the exception of notch1a and notch1b transcripts, where a Rn1 sense position was found. Nbr is the number of times a tag was counted in ZEBRAOV. n.d., the corresponding transcript was not detected. The values indicated are the percentages of UniGene clusters as evaluated by the percentages of corresponding tags or EST counts in the mRNA population identified in the ZEBRAOV SAGE tag database and dbEST libraries.
Figure 3. Localisation of selected maternal transcripts in zebrafish fully-grown follicles as revealed by whole-mount in situ hybridization. A, Polarization of the oocyte along the animal-vegetal axis was visualized using two-colour whole-mount in situ hybridization. Cyclin B1 (ccnb1) mRNAs were identified at the animal pole (arrow) after probe labelling with fluorescein-labelled antisense riboprobes and Deleted AZoospermia-Like (dazl) transcripts located at the vegetal pole (arrowhead) after probe labelling with digoxigenin-labelled antisense riboprobes. The hybridization signal is coloured red with fluorescein and dark brown to blue with digoxigenin. B, The animal pole localisation of transcripts similar to rhamnose-binding lectin at the fully-grown follicle stage (stage IV) was detected with digoxigenin-labelled antisense riboprobes. C, No staining signal was observed using sense riboprobes of transcripts similar to rhamnose-binding lectin. D-H, Stage-specific polarised distribution of metallothionein 2 (mt2) short transcript isoform in stage IB (D, E), II (F), early stage III (G), and stage IV (H) follicles. The mt2 hybridization signal detected at early stages was widely distributed in the ooplasm, whereas it concentrated at the animal pole from early stage III to the end of vitellogenesis. For stages III and IV, the animal pole, oriented toward top of page, is indicated by an arrow. Scale bar = 100 µm.
Dr.11308 Ribosomal protein, large, P0 (Table 2B). ZEBRAOV and ID.15519 had three clusters in common, i.e. ribosomal protein L3 (rpl3), rdh10, and ccna1 (Table 2C). Twenty transcripts expressed at over 0.15% of the total mRNA population in ZEBRAOV were recovered below this limit in the ovary cDNA libraries, while some of them were not expressed at all in library ID.15519 (Table 2D). EST data from the two selected zebrafish ovary cDNA libraries were also compared with ZEBRAOV to determine the sensitivity for detecting transcripts with a characterized maternal genetic contribution [29] (Table 3). Using the same maternal transcripts as the target sequences, the ZEBRAOV database was around 1.8 times more sensitive for the detection of these transcripts than the EST method previously used to describe the zebrafish ovary transcriptome.
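The cumulative representation used to compare the SAGE library with the EST libraries can be sketched as follows. This is an illustrative reconstruction of the plotting logic described in the text, not the authors' code; the function name `cumulative_profile` and the toy counts in the usage line are hypothetical.

```python
def cumulative_profile(counts):
    """Cumulative abundance curve for a tag or EST library.

    counts: tag (or EST) counts, one per tag species (or EST cluster).
    Returns a list of (cumulative % of species, cumulative % of counts),
    with species taken in decreasing order of abundance.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    n_species = len(ordered)
    running = 0
    points = []
    for rank, count in enumerate(ordered, start=1):
        running += count
        points.append((100.0 * rank / n_species, 100.0 * running / total))
    return points


# Hypothetical toy library: one abundant species and three rare ones.
toy_curve = cumulative_profile([1, 1, 2, 6])
```

A library dominated by a few clusters gives a curve that rises steeply at low species percentages, which is the kind of over-representation bias discussed for library ID.15519.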

Polarised distribution of metallothionein 2 transcripts and those similar to rhamnose-binding lectins in zebrafish oocyte
The polarization of oocytes along the animal-vegetal axis was visualized with ccnb1 mRNAs, identified at the animal pole, and Deleted AZoospermia-Like (dazl) transcripts, located at the vegetal pole (Figure 3A). The ccnb1 transcript was SAGE tag N°106, with a tag occurrence of 25, detected in both ovary cDNA libraries (Table 3). The dazl transcript was SAGE tag N°3122, with a tag occurrence of 2, found only in library ID.15519 with two EST sequence entries. Whole-mount in situ hybridization using a labelled RNA probe that was potentially capable of hybridizing with all rhamnose-binding lectin transcript variants (SAGE tag N°1, tag occurrence 284), due to their very high sequence conservation, revealed preferential polarization at the animal pole (Figure 3B). Whole-mount in situ hybridization using a specific mt2 (SAGE tag N°5, tag occurrence 142) riboprobe confirmed the presence of this mRNA in zebrafish ovarian follicles, although only four and six EST sequence entries were found in libraries ID.15519 and ID.9767, respectively. As with rhamnose-binding lectins, colocalization of the mt2 transcript with ccnb1 mRNA by two-colour whole-mount in situ hybridization demonstrated a signal restricted to the animal pole of the oocyte (data not shown). The distribution of this transcript was stage-dependent (Figure 3D-H). The hybridization signal was homogeneously distributed in stages I and II of oogenesis and restricted to the animal pole of the oocyte from early stage III onwards.
*Nbr is the number of times a tag was counted in ZEBRAOV. **Zebrafish UniGene Build#87. ***SDS-PAGE slices numbered I to XI or two-dimensional PAGE excised spot area numbered 1 to 24 (see Figure 4).

Comparison of ZEBRAOV with ovary/egg functional genomic data from other vertebrate and non-vertebrate species
The annotated transcripts expressed at over 0.15% in zebrafish fully-grown ovarian follicles ( hsp90b, ldhb, atp5g, and 40S ribosomal protein S27a, were also some of the most expressed genes in the vertebrate ovaries. The other abundant ZEBRAOV transcripts recovered in at least one vertebrate species at a homologous cluster frequency >0.15% were zp3 or related gene zp3b, cldnd, rpl3, and ccna1. The cirbp transcript was also widely distributed, but in a lower relative proportion of mRNA. The annotated transcripts of ZEBRAOV were also compared with published data obtained from human germinal vesicle (GV)-stage oocytes by PCR-SAGE [16]. This human SAGE tag library has not been deposited at the GEO database. The published short-list of human tags was checked against UniGene Build#187 using SAGEmap tools. Out of a total of 175 SAGE tags analysed, 81 were identified in R1 sense position. This updated list was compared with the homologous UniGene clusters expressed at over 0.15% in ZEBRAOV. Three homologous clusters were identified in the human oocyte catalogue, i.e. actin gamma 1 (ACTG1)/actin beta (ACTB) (ug|Hs.514581| and ug|Hs.520640|), ZP glycoprotein 3 (ZP3) (ug|Hs.488877|), and heat shock 90 kDa protein 1, beta (HSPCB) (ug|Hs.509736|). These clusters were also recovered at high levels in SAGE tag libraries deposited on SAGEmap for human ovarian cancer cell lines (e.g. GEO accession number GSM726). 
Comparison between the updated annotated list of human GV oocyte and ZEBRAOV clusters revealed that, in addition to ACTG1/ACTB, ZP3, and HSPCB, other homologous expressed clusters, identified by their SAGE tag at R1 sense position and expressed above 0.01% of the total expressed transcripts, had also been detected in ZEBRAOV: tubulin alpha 6 (TUBA6) and beta 4 (TUBB4), programmed cell death 5 (PDCD5), proliferating cell nuclear antigen (PCNA), barrier to autointegration factor 1 (BANF1), guanine nucleotide binding protein, beta polypeptide 2-like 1 (GNB2L1), CD9 antigen (CD9), and glyceraldehyde-3-phosphate dehydrogenase (GAPD).
The most expressed transcripts in ZEBRAOV (Table 1), including the conserved expressed transcripts of vertebrate ovaries/unfertilized eggs (Table 4), were used to screen ovary/egg dbEST or SAGE tag libraries from non-vertebrate species. As expected, some homologous housekeeping genes, which encode ribosomal proteins or proteins responsible for cell structure, were among the most highly-expressed genes in these libraries.

Comparison of zebrafish follicle protein repertoire deduced from SAGE with the protein repertoire isolated after proteomic analysis
The proteins extracted from fully-grown follicles were resolved by 1D-SDS-PAGE or 2D-PAGE, subjected to in-gel tryptic digestion, and analysed by MS/MS. The protein repertoire determined was then compared with the repertoire deduced from ZEBRAOV (Table 5). Forty-three out of a total of sixty proteins identified by proteomic analysis were initially retrieved using 1D-SDS-PAGE fractionation, forty-one by 2D-PAGE fractionation, and twenty-four were common to both. Potential molecular functions of the proteins identified using proteomic analysis according to GO terms were: 26% structural molecule activity, 25% binding properties, including 3% nucleic acid binding, 22% catalytic activity, 3% each for transporter, translation-regulator, and enzyme-regulator activity, 2% each for signal transducer, antioxidant, and electron transporter activity, while the remaining 12% had no postulated molecular function. The three most abundant categories were: (i) structural molecules, represented by beta-actin, tubulin, and ZP variant isoforms, as well as ribosomal proteins; (ii) binding proteins, mostly chaperonins and heat shock proteins; and (iii) proteins with a catalytic activity, mostly oxidoreductases, like acyl-Coenzyme A dehydrogenase or enolases, and transferases, like creatine kinase and pyruvate kinase. Comparison of transcriptome and proteome data indicated that forty-three proteins were recovered with a corresponding transcript identified by an experimental SAGE tag at a correct, R1 or R1cr, position. Seven proteins were also found with a corresponding tag in R2/R3 or Rn1/Rn2 position, and eight with corresponding multiple-matched R1 or R1cr tags. Comparing transcriptome and proteome also revealed a weak predictive value between mRNA and protein abundance.
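The protein counts and GO percentages given above are internally consistent, as the short check below illustrates. It uses only numbers stated in the text; the variable and function names are illustrative.

```python
def union_size(n_a, n_b, n_both):
    """Inclusion-exclusion: size of the union of two overlapping sets."""
    return n_a + n_b - n_both


# Proteins retrieved by each fractionation method (values from the text).
N_1D, N_2D, N_COMMON = 43, 41, 24
total_proteins = union_size(N_1D, N_2D, N_COMMON)  # 60
only_1d = N_1D - N_COMMON                          # 19 found by 1D-SDS-PAGE only
only_2d = N_2D - N_COMMON                          # 17 found by 2D-PAGE only

# GO molecular-function shares in percent (values from the text); the 3%
# nucleic acid binding is already included in the 25% binding category.
go_percent = {
    "structural molecule": 26, "binding": 25, "catalytic": 22,
    "transporter": 3, "translation regulator": 3, "enzyme regulator": 3,
    "signal transducer": 2, "antioxidant": 2, "electron transporter": 2,
    "no postulated function": 12,
}
go_total = sum(go_percent.values())  # 100
```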
The MS-based protein identification approach recovered around 23% of the proteins, including bactins, zp2.3, zp2.4, zp3, zp3b, tubb2, ldhb, and ribosomal protein large, P0 (rplp0), deduced from transcripts expressed at over 0.15% and identified by an experimental SAGE tag in a correct, R1 or R1cr, position (Table 1). All the proteins identified by proteomic analysis had at least one transcript counterpart in ZEBRAOV, with two exceptions, eosinophil chemotactic cytokine (chia) and nothepsin (nots). UniGene cluster ug|Dr.831| of the chia transcript was detected three and five times in ovary cDNA libraries ID.15519 and ID.9767, respectively. UniGene cluster ug|Dr.10788| of the nots transcript was not detected in either zebrafish cDNA library used. Zebrafish chia encoded a protein highly similar to zebrafish protein isoforms similar to chitinase (gb|XP_708403.1|, gb|XP_686386|) or proteins encoded by multiple chitinase genes in rainbow trout (gb|CAD59687|) and Japanese flounder (gb|BAD15061|) and, to a lesser extent, acidic mammalian chitinase precursor, e.g. up|Q9BZP6| in humans. Zebrafish nots encoded a protein similar to vertebrate aspartic-type endopeptidases, such as zebrafish cathepsin D (up|Q8JH28|) and human cathepsin E (up|P14091|).
Proteins synthesized by bactin1 and bactin2 were not resolved by the proteomic analysis, due to the amino acid sequence identity of these isoforms. However, corresponding differentially expressed transcripts were discriminated by a specific SAGE tag, due to divergent 3'untranslated part sequences. On the contrary, a well conserved divergent 3'-untranslated part sequence led to a common SAGE tag identified in zp2.3 and zp2.4 transcripts, while a distinct SAGE tag was recovered with zp2.2 (Tables 1 and 5). The high sequence similarities of these three protein isoforms led to an unsolved protein identification on the gel map produced by proteomic analysis. Identical peptide sequences were also identified with tubulin alpha isoforms and recovered in different areas after 2D-PAGE fractionation. Sequence differences or additional identified peptides made it possible to discriminate between tubulin alpha 1, alpha 8 like 4, alpha 2, and alpha 3. In all cases, specific SAGE tags were identified for each tubulin transcript isoform, even if isoform alpha 8 like 4 was identified using a multiple-matched tag.
Furthermore, comparison of transcriptome and proteome data also revealed that two different SAGE tags, with an occurrence of 3, were identified for the unannotated deduced protein zgc:103482 (up|Q5XJA5|), due to the presence of a 3'-untranslated region that could be extended by 127 bp, as revealed by the nucleotide sequence of dbEST clone gb|BM101604|. The unique common translated region of both transcripts encoded a protein that was part of the described proteomic profile.
Vitellogenin (VTG) derivatives were identified in most of the gel pieces excised from the 1D-or 2D-PAGE. It should be noted that VTG 1 (vg1) SAGE tag (occurrence 4) and VTG 3 (vg3) multiple-matched SAGE tag (occurrence 3) were extracted from ZEBRAOV. Other annotated cleaved proteins revealed by the identification of specific peptide sequences with different mass values for the protein of interest were ZP glycoprotein forms, alpha 1 and alpha 8 like 4 tubulins, enolase 3, elongation factor 1-gamma, and nothepsin. Annotated proteins with non-cleavage posttranslational modifications, predicted by variation of the isoelectric point of the protein of interest, were: VTG derivatives, ZP glycoproteins, alpha 1 and alpha 8 like 4 tubulins, enolase 3, elongation factor 1-gamma, mitochondrial aldehyde dehydrogenase 2, chaperonin containing TCP1 subunit 6A, serpin a1, creatine kinase, and pyruvate kinase.
The zebrafish follicle protein repertoire determined by proteomic analysis (Table 5) and the corresponding transcript levels inferred from ZEBRAOV or EST counts from ovary libraries ID.9767 and ID.15519 were compared with the protein repertoire deduced from transcripts expressed at over 0.15% in human ovary cDNA libraries ID.4908, ID.5611, and ID.10552 (Table 6). Some of the proteins identified in zebrafish fully-grown follicles have a high (bactins, tubb2, ldhb, and rplp0) or moderate (glyceraldehyde-3-phosphate dehydrogenase (gapd), ribosomal protein S3 (rps3), and elongation factor 1-gamma (eef1g)) transcript count of homologous UniGene clusters in human ovary cDNA libraries. However, human clusters homologous to ribosomal proteins L7a (rpl7a) and SA (rpsa), pyruvate kinase (pkm2), and enolase 1 alpha (eno1) were highly expressed in human cDNA libraries, while the homologous transcripts were counted at very low levels in the zebrafish transcriptome, even though these proteins had been identified by proteomic analysis. It should be noted that ZEBRAOV contained an additional enolase family member homologous to enolase 3 beta (eno3), with a moderate expression level (ug|Dr.25678|, 0.109%), while its human annotated counterpart (ug|Hs.224171|) was expressed at very low levels or not at all in the human cDNA libraries used for analyses. An enolase transcript (ug|Dm.18435|) was detected in Drosophila ovary dbEST libraries ID.1058 (0.052%) and ID.1059 (0.102%).

Discussion
As in other vertebrates [1], somatic gonadal cells in zebrafish surround a single oocyte to establish a follicle [31]. The entire folliculogenesis process, from primary growth to post-vitellogenic stage takes about ten days in zebrafish [32]. Since large numbers of follicles at different developmental stages are easily obtained year round in this species, zebrafish offer an excellent alternative model for analysing some fundamental aspects of ovarian development and regulation [33], as well as identifying conserved maternal factors [29], which are important in early stages of embryo development. This study analysed the transcriptome and proteome of zebrafish fully-grown ovarian follicles and compared these data with other animal ovary/follicle/egg molecular phenotypes published or available in public sequence databases.
The delineation of the transcriptome of teleost fish ovaries has already been evaluated using large-scale EST sequencing of cDNA libraries [7][8][9], subtractive hybridisation [11], and microarray-based analyses [12]. These large-scale strategies were used, together with digital differential display analysis, to identify expressed genes specific to ovaries/follicles/oocytes and early embryos in mice [34][35][36][37][38][39], bovines [40][41][42], rats [43,44], and humans [45]. While these methods give an idea of transcript abundance or enrichment in a specific tissue, it has been demonstrated that the SAGE method is reproducible [46], provides an unbiased, quantitative report of gene expression that may be correlated with microarray data [47,48], and seems more efficient than EST-based methods for discovering novel genes and spliced variant transcripts [13]. However, one limitation of the SAGE method is the presence of transcripts that produce multiple-matched tags ([49][50][51], this study). SAGE has been applied to human oocytes [16,17] and silkworm eggs [52], and has successfully identified differentially expressed genes in human ovarian carcinomas and normal ovarian surface epithelium [53]. Large-scale analyses of proteomes from mouse oolemmal proteins [54], matured pig oocyte proteins [55], microtubule-associated proteins from Xenopus egg extracts [56], and proteins extracted from Drosophila oocytes [57] have also been carried out. To our knowledge, no previous study had analysed the transcriptome and proteome profiles of samples of ovarian origin at the same biological stage on a large scale.
The transcript repertoire obtained using the SAGE method is an accurate picture of gene expression on both qualitative and quantitative levels and gives a global expression profile of transcripts present in zebrafish fully-grown ovarian follicles. Sequencing of 27,486 SAGE tags identified 11,399 different tag species, classified into 3,437 UniGene clusters with tags in position R1 or R1cr, including 3,329 tag species with an occurrence greater than one. Comparative analysis of transcriptional activity, using the ZEBRAOV SAGE tag database and dbEST libraries currently available for zebrafish ovaries, revealed a globally similar pattern between ZEBRAOV and the ID.9767 library (Figure 2). However, a clearly different quantitative pattern was obtained with library ID.15519, due to an over-representation of the number of EST sequences attached to a small number of unique transcripts. This bias is commonly observed with the EST sequencing method [13]. Consequently, some of the abundant transcripts found by the SAGE method were not detected in library ID.15519. For example, a transcript moderately similar to zygote arrest 1 (ZAR1), with a domain similar to the atypical homeodomain (PHD) finger motif found in ZAR1 in vertebrate species, including zebrafish [58], is highly expressed in ZEBRAOV, moderately in library ID.9767, and not in library ID.15519. In a second example, the transcript of signal sequence receptor beta (ssr2), also called translocon-associated protein beta, was detected at high levels with SAGE, and very low levels with dbEST ovary library sequencing (Table 2). This protein is part of the translocon-associated protein (TRAP) complex required for the translocation of nascent polypeptides into the lumen of the endoplasmic reticulum, and the corresponding zebrafish ssr2 mRNA is maternally supplied to the egg [59].
We also found the ZEBRAOV database around 1.8 times more sensitive than EST sequencing in detecting transcripts with a characterized maternal genetic contribution (Table 3), even though the number of cDNA clones sequenced to generate the ZEBRAOV SAGE tag database was around 21 times lower. However, this number is not sufficient to identify some of these transcripts, as demonstrated with the maternal-effect vasa transcript, which was not detected in ZEBRAOV. The vasa-like genes are expressed in the germ cells of many animal species [60], including zebrafish oocytes and early-stage embryos [61,62]. Furthermore, the presence of unmatched tags in the SAGE library generated from ovarian follicles indicates the presence of genes in the zebrafish genome or spliced variant transcripts that had not previously been identified by EST data. A broader snapshot of gene expression was therefore obtained by SAGE, as previously reported [13,46]. It should be pointed out that some of these unidentified transcripts are highly expressed in the zebrafish fully-grown follicle transcriptome.
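The difference in sequencing effort can be recomputed from the clone numbers reported for the three libraries (576 clones for ZEBRAOV, 11,344 for ID.9767, and 13,029 for ID.15519). The sketch below assumes that the figure of around 21 corresponds to the mean of the two per-library ratios; this interpretation, like the function name, is an assumption rather than a statement of how the value was derived.

```python
SAGE_CLONES = 576  # clones sequenced to generate ZEBRAOV (from the text)
EST_CLONES = {"ID.9767": 11_344, "ID.15519": 13_029}  # from the text


def fold_fewer_clones(sage=SAGE_CLONES, est=EST_CLONES):
    """Ratio of EST clones to SAGE clones, per library and averaged.

    Assumption: the averaged ratio is what "around 21 times" refers to.
    """
    per_library = {lib: n / sage for lib, n in est.items()}
    mean_fold = sum(per_library.values()) / len(per_library)
    return per_library, mean_fold


per_library, mean_fold = fold_fewer_clones()
```

Under this assumption the two ratios are roughly 20 and 23, giving a mean close to 21.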
Comparison of the transcriptome of zebrafish fully-grown follicles as evaluated by the SAGE method with ovary/egg transcriptomes available for other animal species revealed both similarities and differences. SAGE revealed the presence of several tags corresponding to novel transcripts, some highly expressed in the zebrafish fully-grown follicle transcriptome and well conserved in vertebrates. As expected, some of the most abundant transcripts identified in zebrafish, encoding ribosomal proteins, housekeeping proteins such as beta-actins and tubulins, or well-known ovary-enriched proteins, like ZP protein isoforms or cyclins [7,63], are well conserved in the ovarian transcriptomes of other fish species. Homologous highly-expressed transcripts were also recovered in mammalian and Xenopus transcriptomes and, to a lesser extent, in silkworm, nematode, sea urchin, and amphioxus egg profiles [34,37] (Table 4, and Results section). Some of these transcripts are members of multigene families that may be widely expressed in the zebrafish fully-grown follicle transcriptome, e.g., ZP protein isoform transcripts. In other gene families, this high transcript level may be restricted to a few members, as illustrated with claudin genes. Claudins, the major tight junction transmembrane proteins, are members of the tetraspanin protein superfamily that mediate cellular adhesion and migration [64]. Numerous claudin genes have been identified in zebrafish [65], but only the cldnb transcript was recovered at very high levels in ZEBRAOV and library ID.9767, and cldng in library ID.9767. Some of the claudin isoform transcripts of maternal origin are then downregulated in the early stages of zebrafish embryogenesis [66].
The most abundant transcript in zebrafish fully-grown follicles belongs to a large conserved protein family containing one domain with sequence similarities to the galactose/rhamnose-binding lectin domain found in numerous proteins with sugar binding properties. This domain was initially characterized in sea urchin (Anthocidaris crassispina) egg lectin (SUEL) [67]. It was then characterized in rhamnose-binding lectins from rainbow trout (Oncorhynchus mykiss) eggs, which consist of two homologous SUEL domains repeated in tandem [68]. It has been suggested that this domain plays a role from egg maturation to fertilization [69]. Rhamnose-binding lectin in catfish (Parasilurus asotus) is composed of three tandem-repeat domains homologous to the SUEL lectin domain [70]. A cysteine-rich domain homologous to the SUEL protein has also been identified in the N-terminal part of mammalian latrophilin-2 precursor protein [71].
The SAGE approach also revealed numerous transcripts highly expressed in zebrafish that were not previously known to be significantly expressed by zebrafish ovaries, including mt2, hsp90b, ldhb, atp5g, fth1, cirbp, rplp0, and 40S ribosomal protein S27a. The relative abundance of molecules stored in oocytes may differ between species but some of the abundant transcripts found in zebrafish follicles are common highly-expressed transcripts in vertebrate ovaries/unfertilized eggs (Table 4). It is noteworthy that almost all ribosomal protein transcripts identified from the SAGE tags, expressed at over 0.15%, were recovered below this limit from the two selected zebrafish ovary cDNA libraries. In some cases, these differences may be related to the loss of these small size transcripts after size selection of cDNAs during construction of the libraries, a process that did not occur using the SAGE method.
The short mt2 transcript is a very good example of the original quantitative and qualitative data obtained by SAGE analysis. We found that mt2 was very abundantly expressed in zebrafish oocytes, at a level ten times higher than that previously inferred from analysis of zebrafish ovary dbEST libraries (Table 2). This difference in mt2 transcript levels may be due to the loss of this small size transcript during cDNA library construction. An enrichment of this transcript in fully-grown oocytes versus ovaries is less plausible due to the asynchronous development of zebrafish ovaries, containing oocytes at different stages in development [31]. In addition, whole-mount in situ hybridization demonstrated a strong stage-dependent mt2 polarized hybridization signal in the cytoplasm of zebrafish oocytes (Figure 3). These data are consistent with the metallothionein activity content of zebrafish oocytes [72] and the presence of this transcript before the mid-blastula transition of the embryo [73]. Metallothionein transcripts were also recovered from sea urchin egg and salmon ovary dbEST libraries, as well as lizard ovarian follicles (Podarcis sicula), with the highest level in ovulated eggs [74]. Expression of the rat Mt2 gene is also strongly regulated during primordial follicle assembly and development in rat ovaries [44]. SAGE may also help to distinguish between the expression of several isoforms that differ at the 3'-end of a transcript. In the same UniGene mt2 cluster a second mt2 transcript, identified with its in silico SAGE tag, contained an identical sequence in the coding region but a long untranslated 3'-part. This long transcript was not expressed in zebrafish fully-grown follicles. It should be noted that differential expression of 3'-end transcript isoforms was easily identified using the SAGE method, as also demonstrated with the ccnb2 transcript.
In addition to the mt2 short transcript, abundant transcripts of heavy chain ferritins, including fth1, related to metal binding, were also detected in ZEBRAOV. This is in accordance with a disproportionately high number of salmon ovary assembled ESTs seen in GO categories related to heavy metal (copper, iron, and zinc) [9] and the presence of ferritin H mRNA in rainbow trout (Oncorhynchus mykiss) eggs [75]. Homologous genes to zebrafish fth1 are expressed in all vertebrate ovary dbEST libraries available at UniGene, with very high relative levels in salmon, swine, dog, and human libraries. Ferritin-containing inclusions were demonstrated in yolk platelets of schistosome (Schistosoma mansoni) [76], a species in which a female-specific yolk ferritin transcript is expressed at high levels in the vitellarium [77]. Ferritin also occurs in amphibian [78,79] and snail [80] eggs. It should be noted that high-level expression of ferritin H-chain mRNA is observed in metastatic human ovarian tumours [81].
The second significant difference in transcript abundance between zebrafish fully-grown follicle transcriptomes as evaluated by SAGE and the profile defined in ovary cDNA libraries concerned the hsp90β transcript (Table 2). Extensive molecular characterization, including zebrafish transcripts, and biochemical studies have revealed that vertebrate members of the heat shock protein 90 (HSP90) family play a post-translational regulatory role within the cell by interacting with several important cellular signalling molecules and transcription factors, such as steroid receptors, and modulating their activity [82]. Homologous transcripts are highly expressed in mouse and human ovaries (Table 4) and a strong HSP90 immunoreactivity was demonstrated in rat primordial germ cells [83]. This signal was also detected in both male and female pre-meiotic germ cells. HSP90 was also identified as one of the highly abundant proteins in mature mouse eggs and is strongly associated with the plasma membrane [54]. In addition, hsp83 transcript, the Drosophila homologue of the mammalian Hsp90 family of regulatory molecular chaperones, is present at high levels through the end of oogenesis and both maternal and zygotic transcripts are spatially restricted during early embryo development [84]. All these data are consistent with the high transcript level of hsp90β in zebrafish ovaries, whereas a high number of hsp90β cDNA clones was observed in the library generated from testes but not ovaries [7]. The large discrepancy in the relative level of hsp90β transcripts observed between the SAGE and cDNA approaches may be related to an enrichment of this transcript in the terminal stages of folliculogenesis.
Other transcripts highly expressed in zebrafish follicles and consistently represented in vertebrate ovarian transcriptomes are transcripts of ATP synthase, H+ transporting, mitochondrial F0 complex, subunit c (ATP5G), cold-inducible RNA-binding protein (CIRBP), and lactate dehydrogenase B4. atp5g is highly expressed in fish ovaries and the encoded protein is one of the chains of the nonenzymatic membrane component (F0) of mitochondrial ATPase in the mitochondrial membrane. CIRBP apparently plays an essential role in cold-induced suppression of cell proliferation [85]. One of the Xenopus CIRBP homologues is a major RNA-binding protein in fully-grown oocytes and may be involved in translational regulation via modulation of oocyte ribosomal function [86]. Lactate dehydrogenase B transcripts are widely distributed in animal ovarian transcriptomes, with high levels found in mice and humans. It has been previously demonstrated that lactate dehydrogenase B mRNA is one of the most abundant transcripts in fully-grown mouse oocytes [87]. Lactate dehydrogenase mRNA appears to be translated efficiently during oocyte growth and then downregulated during maturation and after fertilization [88].
The egg is a transcriptionally inactive cell and, as such, is a storehouse of maternal mRNA and proteins required for fertilization and initiation of zygotic development. However, many of the proteins comprising the animal egg proteome have yet to be identified, as very few large-scale proteome analyses have been performed. As expected, the zebrafish follicle protein repertoire, determined by proteome analysis, included ribosomal proteins, ZP family protein members, components of the cytoskeleton, chaperonins, heat shock proteins, and VTG derivatives, but also some proteins not previously reported in ovary protein repertoires, e.g. a Sjogren syndrome antigen B homologous protein (Table 5). This RNA-binding protein binds to several small cytoplasmic RNA molecules, known as Y RNAs, and may stabilize these RNAs, preventing degradation [89]. At least eight proteins, out of a total of thirty-eight deduced using the SAGE assigned transcript method and expressed at over 0.15%, were identified by proteome analysis. The identification of abundant mRNAs without the corresponding translated proteins may be due to insufficient proteome delineation and/or the presence of oocyte stage-specific maternal transcripts, stored inside the oocyte cytoplasm and translated during early embryo development. Several proteins were distributed over more than one spot position after 1D-, 2D-PAGE separation and MS/MS (Table 5). While some of them, e.g. creatine kinase (CK), were present in spots located close together along the isoelectric focusing dimension, suggesting the presence of isoforms or post-translational modifications, the distribution of other spots, e.g. VTG derivatives, indicates a cleavage of precursor proteins with the presence of lower-molecular-weight derivative fragments. CKs play crucial roles in intracellular energy transfer and expression of a CK brain-type isoenzyme during oogenesis has been demonstrated in rodents [90,91]. A homologous transcript was also identified at high levels in amphioxus eggs (ug|Bfl.4313|, 0.119%). ckb mRNA has been shown to be maternally supplied in zebrafish embryos [92].
Comparison of the zebrafish follicle protein repertoire deduced from SAGE with the protein repertoire isolated after proteomic analysis revealed that some abundant transcripts identified by their SAGE tags, but not previously reported to be present in abundance in fish ovaries, had corresponding proteins. This was the case for lactate dehydrogenase B4 and, to a lesser extent, ribosomal protein large P0 (Tables 2 and 5). Comparison also revealed that bactin1 and bactin2 transcripts were differentially expressed in zebrafish ovarian follicles, but their protein sequences were not resolved due to the very high sequence conservation of these duplicated gene copies. On the other hand, some ZP family protein members could be discriminated at the protein level, while the same SAGE tags were generated with zp2.3 and zp2.4 or zp3a and zp3al transcripts, due to the high sequence similarities of the 3'-end untranslated part of these transcripts.
Oocyte growth, particularly in oviparous species, is characterized by intense deposition of RNAs and proteins, not necessarily of the same nature and origin. These maternal factors can be stored for very long periods of time until their use during embryonic development. Comparison of transcriptome and proteome data revealed that transcript levels provide little predictive value with respect to the extent of protein abundance, taking into account the fact that the protein identification approach used detects relatively abundant proteins from the biological extract, while the mRNA abundance evaluated by SAGE tag frequency varied by over two orders of magnitude (Table 5). Transcript profiling provides a measure of RNA abundance, which may be affected not only by transcription levels but also by RNA processing and degradation. Moreover, not all transcripts are translated and RNA abundance may not correspond to protein levels. High transcription and translation rates during folliculogenesis and oocyte growth are followed by differential translational silencing and degradation of many mRNA species, especially at the end of the oocyte growth phase [2,93]. The identification of zebrafish follicle proteins, e.g. pyruvate kinase and enolase I, by proteome analysis, with very low corresponding transcript levels but very high homologous transcript counts in human ovary transcriptomes used as an external reference, suggests a downregulation of the quantity of these transcripts and storage of the proteins at the end of zebrafish folliculogenesis.
A comparison of transcriptome and proteome data revealed two proteins encoded by chia and nots without corresponding transcripts in ZEBRAOV. chia is related to the multiple chitinase genes identified in rainbow trout and Japanese flounder [94], as well as, to a lesser extent, the acidic mammalian chitinase precursor in humans [95]. While the molecular functions of these proteins are related to chitin binding and chitinase activity, the functionality and origin of the protein identified in zebrafish fully-grown follicles remain to be determined. However, the presence of a small amount of chia transcript in the multiple follicular stage zebrafish cDNA libraries ID.9767 and ID.15519 supports a stage-specific transcription of this gene during zebrafish folliculogenesis, as previously demonstrated by the downregulation of the transcription of some fish maternal genes, e.g. VTG/very-low density lipoprotein receptor [96], at the end of oogenesis. A high rate of protein deposition has also been observed during oocyte growth in oviparous species via a receptor-mediated endocytosis of exogenous precursors. The presence of an abundant protein in the repertoire without a corresponding transcript in ZEBRAOV may be due to endocytosis of the protein from the plasma to the oocyte. Zebrafish vg1 and vg3 are mainly expressed in the liver and, to a lesser extent, in several non-liver tissues, including the adipocytes associated with several organs, such as ovaries [97]. This may explain the presence of a limited amount of vg1 and vg3 transcripts in ZEBRAOV (Table 5). These precursor proteins are synthesized outside oocytes during vitellogenesis, specifically incorporated in the oocyte by receptor-mediated endocytosis and cleaved into yolk proteins.
The identification of nothepsin in zebrafish fully-grown follicles by proteome analysis, although no transcript was detected in the ovary by Northern blot [98], EST sequencing of ovary cDNA libraries, or SAGE (this study), strongly suggests an extraovarian origin for this enzyme, which may be present in the plasma of females undergoing vitellogenesis. Zebrafish nots encodes a paralogous aspartic proteinase related to endoproteolytic proteinases, such as cathepsin D, cathepsin E, and pepsin. This gene is specifically expressed in the liver under estrogenic control [99]. The sexually dimorphic expression of nots may be related to the reproductive process, like the processing of VTG precursors or of other sex-specific proteins inside the oocyte cytoplasm.

Conclusion
This study provides a complete sequence data set of maternal mRNA stored in zebrafish germ cells at the end of oogenesis. This catalogue contains highly-expressed transcripts that were not previously known to be significantly expressed in fish ovaries, including some that are part of a vertebrate ovarian expressed gene signature. Comparison of transcriptome and proteome data identified downregulated transcripts or proteins potentially incorporated in the oocyte by endocytosis. The molecular phenotype described provides groundwork for future experimental approaches aimed at identifying functionally important maternal transcripts and proteins involved in oogenesis and early stages of embryo development.

Isolation of fully-grown ovarian follicles
Zebrafish, Danio rerio, were obtained from our facilities and maintained at 28.5°C on a 12L:12D photoperiod. Salts (0.23 g/l Instant Ocean, Aquarium System, Inc, Mentor, USA and 0.1 g/l CaSO4·2H2O) were added to reverse osmosis (Optima 60, Veolia Water STI, Blagnac, France) purified water in order to ensure an optimal water quality. The zebrafish ovaries undergo asynchronous development and oocyte development is divided into five stages: I (primary growth), II (cortical alveolus or pre-vitellogenic), III (vitellogenic), IV (maturation), and V (mature egg) [31]. Sexually mature females were anaesthetized by immersion in 2-phenoxyethanol (1/2000, v/v) and fully-grown follicles (diameter > 0.69 mm) [31] were obtained by gently squeezing the abdomen. Histological analysis demonstrated that, in some areas, the stripped oocytes were covered with a single layer of granulosa cells, attached to the zona radiata (data not shown).

SAGE library construction
Total RNAs were isolated from about 250 fully-grown follicles, isolated from five non-inbred mature females, using the NucleoSpin RNAII kit (Macherey-Nagel, Duren, Germany). Fifty micrograms of total RNAs were used to generate the SAGE library using the I-SAGE kit (Invitrogen, Cergy Pontoise, France). The SAGE library was constructed according to the manufacturer's instructions, with minor modifications. Briefly, total RNAs were bound to magnetic Dynal oligo(dT) beads and cDNAs were synthesized directly on oligo(dT) beads. cDNAs were digested with NlaIII at 37°C for 1 h and divided into two equal parts, pools A and B. These pools were ligated at 16°C for 2 h to specific adapters, adapX (5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3') and adapY (5'-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3'), containing the priming sites for PCR amplification at the 5'-end and the type IIS restriction endonuclease BsmFI site at the 3'-end. The 3'-ends of the adapters were modified with an amino group to prevent self-ligation. The two ligation products were then cleaved with the tagging enzyme BsmFI at 65°C for 1 h. The resulting tags from pools A and B were ligated at 16°C overnight in a 3 µl mixture to form ditag cassettes. The ligated ditag mixture was diluted 1:110 (v/v), and 1 µl was used in a 50 µl PCR mixture. A total of 300 ditag PCR amplifications were performed for 33 cycles using primers derived from adapX (5'-GGATTTGCTGGTGCAGTACA-3') and adapY (5'-CTGCTCGAATTCAAGCTTCT-3'), respectively. Individual ditag PCR products (100 bp) were purified on 12% (w/v) polyacrylamide gel. Adapters (40 bp) were removed from ditags by NlaIII digestion at 37°C for 2 h 30 min and ditags without adapters (26 bp) were purified on 12% (w/v) polyacrylamide gel. Purified ditags (26 bp) were ligated together at 16°C overnight and resolved in an 8% (w/v) polyacrylamide gel. Concatemer fractions ranging from 0.3 to 0.6 kb, 0.6 to 1.5 kb, and over 1.5 kb, were purified separately. The purified concatemers were cloned into the SphI site of the pZErO-1 plasmid (Invitrogen, Cergy Pontoise, France). The ligated mixture was transformed into One Shot TOP10 electrocompetent cells (Invitrogen, Cergy Pontoise, France). Positive transformants were selected by plating on low-salt Luria-Bertani plates containing 50 µg/ml Zeocin and incubating at 37°C for 24 h. The concatemer sizes were screened for 10% of total clones by colony PCR using Sp6 forward and T7 reverse primers.

SAGE library sequencing
High throughput sequencing reactions were carried out by Genome Express (Meylan, France). Automatic tag detection, extraction, counting and quality control were undertaken by Skuld-Tech (Montpellier, France). Ditags longer than 50 bp (2.97%), shorter than 24 bp (0.4%), or repeated ditags (2.69%) were discarded and not taken into account for tag number calculation. Contamination rates with linker and ribosomal 18S and 28S RNA sequences were 0.2% and 0.22%, respectively. The ZEBRAOV library was generated from 576 sequenced clones containing inserts originating from the 0.6 to 1.5 kb concatemer fraction. Consequently, 96% of the sequenced clones had concatemers of over 500 bp, resulting in an average SAGE tag number of 47 per concatemer.
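
The ditag quality-control step can be illustrated with a minimal Python sketch. The length bounds are those quoted above; the function name `filter_ditags`, the toy sequences, and the choice to keep the first copy of a repeated ditag (rather than dropping every copy) are assumptions made for illustration and do not describe the actual Skuld-Tech pipeline. The last line relates the 27,486 tags reported for the library to the 576 sequenced clones.

```python
def filter_ditags(ditags, min_len=24, max_len=50):
    """Drop ditags outside [min_len, max_len] and repeated ditags.

    Assumption: the first copy of a repeated ditag is kept and
    later copies are discarded.
    """
    seen = set()
    kept = []
    for ditag in ditags:
        if not (min_len <= len(ditag) <= max_len):
            continue  # too short or too long
        if ditag in seen:
            continue  # repeated ditag
        seen.add(ditag)
        kept.append(ditag)
    return kept


# Toy input (hypothetical sequences, not data from the library).
toy = ["A" * 26, "A" * 26, "C" * 20, "G" * 60, "T" * 28]
kept = filter_ditags(toy)

# Average number of tags per sequenced clone from the reported totals.
tags_per_clone = 27486 / 576
```

With these totals the ratio is just under 48, in line with the average of 47 tags per concatemer stated in the text.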

Data analysis and tag-to-gene mapping
To our knowledge, we were the first to carry out SAGE analysis on a fish species. Consequently, linking an experimental SAGE tag to an annotated transcript required the development of a species-specific SAGE tag database from the tags extracted from ESTs or transcripts available for a particular fish species. FISHTAG, a generic computer package written in PERL and implemented on a UNIX workstation, was designed for automatic extraction and annotation of SAGE tags from the sequences deposited in UniGene and TIGR public sequence databases (Rousselot et al., in preparation). Briefly, three 14-nucleotide-long in silico tags, each beginning with the 5'-CATG sequence, were extracted from each reference transcript sequence at the first three sense positions, starting from the 3'-end of the transcript, and named R1, R2, or R3 in the presence of a polyadenylation signal and poly(A) tail. The tags were named Rn1, Rn2, Rn3 for EST sequences without a polyadenylation signal, a poly(A) tail, or both. Furthermore, the corresponding in silico tags of complementary reversed antisense sequences of deposited ESTs called "3'-reads" were also extracted and named R1cr, R2cr, or R3cr in the presence of a polyadenylation signal and poly(A) tail. The tags were named Rn1cr, Rn2cr, Rn3cr for EST sequences without a polyadenylation signal, a poly(A) tail or both. A table was constructed and additional data, like EST cluster descriptions, GenBank™ annotations, and URL links to the UniGene or TIGR sites were attached to each extracted in silico tag. This table was then used as a reference for tag-to-gene mapping by comparison between the experimental SAGE tags and the EST-derived in silico extracted SAGE tags.
The zebrafish sequence data were downloaded from UniGene and TIGR ftp sites [100,101]. There were 673,076 public UniGene sequence entries (Build#87), including 99,968 3'-read entries, in 31,681 UniGene clusters. Data retrieved from TIGR contained 33,752 unique TC leader sequences (release 16.0). The EST sequences used and included in UniGene were from more than two-hundred dbEST libraries originating from at least twenty different zebrafish tissues or developmental stages [102]. The in silico zebrafish SAGE-tag database, ZEBRATAG, generated using FISHTAG, was used to combine the copy number of each experimental SAGE tag from the ZEBRAOV library with its annotation.
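
Combining tag copy numbers with annotations amounts to a lookup of each experimental tag in the in silico reference table. A minimal sketch of that step, assuming the reference is held as a plain dictionary from tag sequence to annotation; the function name, the dictionary layout, and the toy tags are hypothetical, and the 0.15% cut-off is the abundance threshold used elsewhere in this study.

```python
from collections import Counter


def annotate_tags(experimental_tags, reference, threshold_pct=0.15):
    """Count experimental tags and attach reference annotations.

    Tags absent from the reference table are reported as 'unmatched'.
    """
    counts = Counter(experimental_tags)
    total = sum(counts.values())
    rows = []
    for tag, n in counts.most_common():
        pct = 100.0 * n / total
        rows.append({
            "tag": tag,
            "count": n,
            "percent": pct,
            "annotation": reference.get(tag, "unmatched"),
            "abundant": pct > threshold_pct,
        })
    return rows


# Toy data (hypothetical tags and annotation, not ZEBRAOV content).
tags = ["CATGAAAAAAAAAA"] * 3 + ["CATGCCCCCCCCCC"]
reference = {"CATGAAAAAAAAAA": "gene_A"}
rows = annotate_tags(tags, reference)
```

Unmatched tags are kept in the output rather than dropped, since such tags may correspond to transcripts not yet represented in EST data.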

Ovary/egg cDNA library sequence data used
The teleost fish ovary dbEST cDNA libraries currently available in the GenBank™ database and used for EST sequence analysis were: ID.15519 (NIH_ZGC_5) with 13,029 EST sequences classified into 2,610 clusters and ID.9767 (Gong zebrafish ovary), with 11,344 EST sequences classified into 2,794 clusters, for zebrafish. The other non-vertebrate dbEST cDNA library accession numbers used for comparison were ID.13749, with 6,003 EST sequences classified into 1,775 clusters for unfertilised sea urchin (Strongylocentrotus purpuratus) eggs (UniGene Build#9), ID.17404, with 38,522 EST sequences classified into 3,720 clusters for amphioxus (Branchiostoma floridae) eggs (UniGene Build#1), and ID.1977, with 53,273 EST sequences classified into 5,777 clusters for fertilized nematode (Caenorhabditis elegans) eggs (UniGene Build#25). In addition, a silkworm (Bombyx mori) egg SAGE tag library [52] was also used.

Whole-mount in situ hybridization
Follicles were fixed in 4% paraformaldehyde overnight at 4°C, rinsed three times in phosphate buffered saline (PBS) buffer, twice in methanol, and stored at -20°C in methanol until used. A first primer pair (sense oligonucleotide 5'-CAGCAGAATCATGCGCTC-3' and antisense oligonucleotide 5'-CAACATATGGACGACAGG-3') was used to amplify a 611 bp cDNA fragment from nucleotides -10 to +601 (numbered from the translation initiator codon) of a transcript (GenBank™/EBI (gb) Data Bank accession number gb|BM141044|) with similarities to human latrophilin-2, identified as a rhamnose-binding lectin. A second primer pair (sense oligonucleotide 5'-GACTGGAACTTGCAACTG-3' and antisense oligonucleotide 5'-GACGGTACAGGAAACAGAT-3') was used to amplify a 445 bp cDNA fragment from nucleotides +24 to +468 of mt2 short transcript (gb|BC049475|). A third primer pair (sense oligonucleotide 5'-TATCCAGAAGCATCGTCAGG-3' and antisense oligonucleotide 5'-CCTTCACATCACACTCATGC-3') was used to amplify a 764 bp cDNA fragment from nucleotides +57 to +820 of dazl transcript (gb|NM_131524|). A fourth primer pair (sense oligonucleotide 5'-AGGCTGCTTCAGGAGACC-3' and antisense oligonucleotide 5'-CCTAAAGAAGTGACGGTACC-3') was used to amplify a 769 bp cDNA fragment from nucleotides +550 to +1318 of ccnb1 transcript (gb|AB040435|). These amplified cDNA fragments were used as templates to generate the corresponding RNA probes. Both antisense and sense digoxigenin- and fluorescein-labelled RNA probes were obtained using T7 or SP6 RNA polymerase (Promega, France) and the digoxigenin/fluorescein RNA labelling mix (Roche, Germany), following the manufacturer's instructions. RNA probes were purified using the ProbeQuant G-50 micro columns (Amersham Biosciences, England) and checked for purity by denaturing agarose gel electrophoresis. Whole-mount in situ hybridization was carried out as previously described (http://zfin.org/zf_info/zfbook/chapt9/9.82.html), with minor modifications. The PBS buffer used contained 0.04% (w/v) KCl. Samples were treated with 20 µg/ml proteinase K for 20 min and hybridized at 58°C with 50% formamide. Two-colour whole-mount in situ hybridization was carried out with a fluorescein-labelled RNA probe to detect the most-strongly expressed gene, then with the digoxigenin-labelled RNA probe. Samples were mounted in 100% glycerol and observed under an Eclipse E1000 Nikon microscope. Histology was performed according to the procedure previously described [103].

Figure 4. Electrophoretic pattern of zebrafish fully-grown follicle proteins. A, SDS-polyacrylamide gradient 8-16% slab gel of total yolk proteins with increasing amounts of proteins deposited per lane. Eleven gel slices, numbered I to XI, were cut after staining the gel with See-Band Forte (GeBA). B, Two-dimensional polyacrylamide gel electrophoresis of total yolk proteins. Twenty-four spot areas, numbered 1 to 24, were excised after staining the gel with GeBA. In both procedures, gel pieces were in-gel digested with trypsin and the resulting peptides identified by mass spectrometry (see Table 5). The rectangles from which VTG derivatives were isolated are identified with a solid line while the rectangles with no derivatives (numbers 8, 9, and 20) are labelled with a broken line. M, relative molecular weights of the standards × 10⁻³.
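
The fragment sizes quoted for the four primer pairs in the whole-mount in situ hybridization protocol can be checked against their nucleotide coordinates. The short Python sketch below assumes the usual numbering convention in which +1 is the first nucleotide of the initiator codon and there is no position 0; this convention is an assumption, although it is consistent with the 611 bp reported for the -10 to +601 fragment. The function name is hypothetical.

```python
def amplicon_length(start, end):
    """Length in bp of a fragment spanning start..end (inclusive).

    Assumption: coordinates are numbered from the initiator codon,
    with -1 immediately followed by +1 (no position 0).
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # correct for the missing position 0
    return length


# Coordinates as given in the text for the four probes.
fragments = {
    "rhamnose-binding lectin": (-10, 601),
    "mt2": (24, 468),
    "dazl": (57, 820),
    "ccnb1": (550, 1318),
}
sizes = {name: amplicon_length(*coords) for name, coords in fragments.items()}
```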

Protein extraction
Proteins were extracted from zebrafish fully-grown follicles using TRI Reagent™ (Sigma-Aldrich, U.S.A.) [104]. A 100 µl aliquot of frozen follicles in L-15 was mixed with 1 ml of cold TRI Reagent and incubated at room temperature for 10 min. 200 µl chloroform were added and mixed, incubated at room temperature for 10 min and centrifuged at 10,000 g at 4°C for 10 min. The protein phase was separated from the DNA and RNA and the proteins precipitated with 300 µl 100% ethanol at room temperature for 10 min. After centrifugation at 2,000 g at 4°C for 10 min, the supernatant containing the proteins was precipitated with isopropanol at room temperature for 10 min, and then centrifuged at 10,000 g at 4°C for 10 min. The protein pellet was washed twice with 2 ml 0.3 M guanidine hydrochloride in 95% ethanol and incubated for 20 min at room temperature, then centrifuged at 6,500 g at 4°C for 10 min. The pellet was further washed three times with 10 ml 100% ethanol for 20 min at room temperature and centrifuged at 6,500 g at 4°C for 10 min. The remaining salts were removed by adding 1.5 ml 80% cold acetone (-20°C) and centrifuging at 6,000 g at 4°C for 10 min three times. The air-dried pellet was used for 1D- and 2D-PAGE.

1D-, 2D-PAGE and tandem mass spectrometry
Protein pellets were dissolved in 2D-PAGE sample buffer containing 7 M urea, 2 M thiourea, 2% CHAPS, 65 mM DTT, 1.25% ampholytes (pH 3-10, BioRad) with a trace of bromophenol blue. The protein amounts were assayed with the Bradford reagent (BioRad, Hercules, U.S.A.). For isoelectric focusing, 600 µg of each protein sample were applied to 11 cm IPG Strips (pH 3-10 NL, BioRad). Isoelectric focusing was carried out for 70,000 Vh with an IPG-Phor System (Amersham Biosciences), followed by second-dimension electrophoresis, using pre-cast Criterion Tris-HCl gels (4-20% Linear Gradient, BioRad). 1D SDS-PAGE (133 × 87 × 1 mm) was performed on a home-made polyacrylamide gradient gel 8-16%. The gels were stained with See-Band Forte (GeBA). Images were taken with a Fluor-S imager (BioRad) and analysed using PDQuest software (BioRad). Spots were excised, in-gel digested with trypsin, and identified either by peptide mass fingerprinting and CID using a 4700 MALDI-TOF-TOF mass spectrometer (Applied Biosystems) or a nanocapillary RP-HPLC and ESI-QIT mass spectrometer (LCQ-Deca, ThermoFinnigan). The MS data were analysed using Sequest [105], Pep-Miner [106] and Mascot [107] software tools and searching the NCBInr or ZFIN Zebrafish databases. Each peptide identified was then manually checked against the corresponding Swiss-Prot/TrEMBL/PIR UniProt Data Bank entry at the UniProt web site [108]. The presence of a protein in zebrafish fully-grown follicle protein extract was confirmed when at least one peptide sequence perfectly matching the target UniProt protein entry appeared in the proteome database at least twice.

…nated the work, and drafted the manuscript. All authors read and approved the final manuscript.