Proteomic characterization of spermiogenesis in C. elegans
Un-activated spermatids were collected from males using a novel microfluidic dissection technique. This male dissection technique utilizes a custom microfluidic device with a fine glass needle to slice through the cuticle and testis of males to release stored spermatids (Fig. 2). The un-activated spermatids were lysed to characterize non-membrane-bound sperm proteins (Fig. 1b). The un-activated spermatid proteome was dominated by the MSP, confirming that pure sperm cell samples were being collected (Additional file 1). The most abundant proteins, however, were from the Nematode-Specific Peptide family, group D (NSPD), which comprised approximately 50% of the total protein abundance. Since mass spectrometry identified a single peptide motif for these proteins, NSPD abundance was described at the gene family level. The NSPD family is uncharacterized, but has been previously shown to exhibit a pattern of male-enriched expression [17]. Actin proteins were also identified at < 1% abundance, which is comparable to previous biochemical estimates [6]. While relatively few total protein calls were made, fully one third of the un-activated spermatid proteome is previously uncharacterized in biological function.
To isolate soluble proteins within the membranous organelle from those associated with the sperm body, we took advantage of natural membranous organelle-membrane fusion during sperm activation. Since this analysis required higher throughput, un-activated spermatids were collected using a male crushing technique (modified from [18, 19]). This method squeezes the testis out of males to release spermatids. Spermatids were then activated in vitro by changing the intracellular pH [8] and the proteomes of the membranous organelle secretions and activated sperm fractions were collected via centrifugation (Fig. 1b). Again, the MSP was in high abundance, though now identified in both the membranous organelle and activated sperm proteomes (Fig. 3). Interestingly, our data reveal three previously unannotated genes (Y59E9AR.7, Y59H11AM.1, and ZK1248.4) as MSPs based on high nucleotide sequence identity and presence of the MSP domain [20]. Overall, 62% of the proteins identified in the un-activated spermatid proteome were also identified in either the membranous organelle or activated sperm proteome. The lack of one-to-one correspondence between the un-activated proteome and the two activated components is unsurprising given the low total number of proteins identified and the pseudo-quantitative nature of shotgun proteomics. Nevertheless, all the proteins identified were previously found in the un-activated spermatid proteome collected by Ma et al. [21].
The proteins released from the membranous organelle during activation were distinct from those remaining in the activated sperm (Fig. 3a). Seventeen proteins were unique to the membranous organelle proteome, including the NSPD family, which comprised 10% of the total membranous organelle protein abundance (Fig. 3b). The actin gene family was also unique to the membranous organelle, as were several other housekeeping-related gene families. Within the activated sperm proteome, we identified 14 unique proteins, the majority of which were involved in energy production (Fig. 3c). Of notable interest were the genes F34D6.7, F34D6.8, and F34D6.9, which again were described using a single abundance measure because mass spectrometry identified identical peptide sequences for them. Their products were in fact the most abundant membranous organelle proteins after MSP, with a ten-fold greater abundance in membranous organelles than in activated sperm (Fig. 3b–c). The F34D6.7, F34D6.8, and F34D6.9 genes in C. elegans display male-specific expression [17], consistent with our observations. They are organized as an array, distinct from other genes in this region, and have a nucleotide sequence similarity of 93.9%. Given their genomic organization, sequence similarity, and co-localization of expression, these genes appear to be a small gene family that originated via tandem duplication. Additionally, an amino acid BLAST search of these F34D6 sequences in NCBI reveals that they are nematode-specific. Thus, they comprise a newly identified Nematode-Specific Peptide family, which we designate as NSP group F (NSPF).
Proteome composition is largely conserved between species
Spermatids were also collected from the obligate outcrossing nematode C. remanei. To compare proteome composition between divergent species, we condensed all protein calls to the gene family level. Within C. remanei, we identified 64 gene families in the membranous organelle proteome and 94 gene families within the activated sperm proteome, with 51 families being shared between the proteomes (Additional file 2). Of all the proteins identified, eight did not have an annotated C. elegans ortholog. However, a BLAST search against the C. elegans genome indicates that three of these genes (CRE18007, CRE13415, CRE00499) may have unannotated orthologs. Of the remaining unique genes, three appear to be paralogs (CRE12049, CRE30219, CRE30221), suggesting a potential C. remanei-specific sperm protein family. A total of 34 gene families were identified in both C. elegans and C. remanei, capturing the majority of highly abundant genes identified. However, more proteins of low abundance were identified in C. remanei. Three gene families (NSPD, Actin, and Ribosomal Proteins, Large subunit) that were unique to the membranous organelle proteome in C. elegans were identified in low abundance within activated sperm in C. remanei, potentially because of differential success in activating C. remanei sperm in vitro (Additional file 2). Two notable differences between species were the presence of histone proteins and the absence of NSPF orthologs in C. remanei.
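The family-level comparison described above amounts to summing protein calls within each gene family and then comparing the resulting sets. A minimal Python sketch follows. The gene identifiers, family names, and abundance values are hypothetical placeholders, not values from the additional files, and the function names are illustrative only.

```python
from collections import defaultdict


def collapse_to_families(protein_abundance, gene_to_family):
    """Sum per-protein abundances into per-family totals.

    Genes with no family assignment are kept under their own identifier.
    """
    totals = defaultdict(float)
    for gene, abundance in protein_abundance.items():
        totals[gene_to_family.get(gene, gene)] += abundance
    return dict(totals)


def compare_proteomes(families_a, families_b):
    """Return (shared, only_a, only_b) as sets of family names."""
    a, b = set(families_a), set(families_b)
    return a & b, a - b, b - a


# Hypothetical example data, for illustration only.
gene_to_family = {"gene_a1": "FamA", "gene_a2": "FamA", "gene_b1": "FamB"}
organelle = collapse_to_families(
    {"gene_a1": 5.0, "gene_a2": 3.0, "gene_b1": 1.0}, gene_to_family
)
sperm = collapse_to_families({"gene_a1": 4.0, "gene_x": 2.0}, gene_to_family)
shared, only_organelle, only_sperm = compare_proteomes(organelle, sperm)
```

Applied to the C. remanei counts reported above (64 and 94 families, 51 shared), the same set arithmetic gives 13 families found only in the membranous organelle proteome, 43 found only in activated sperm, and 107 distinct families overall.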
Evolutionary analysis of membranous organelle proteins
Proteomic analysis identified NSPD and NSPF proteins as being highly abundant and localized their expression to the membranous organelle. Yet no information exists about the molecular or biological function of these genes. To better understand the nature of these gene families, we analyzed their evolutionary history across the Elegans supergroup within Caenorhabditis. We made custom annotations of these gene families in 11 species using the annotated C. elegans genes (ten NSPD and three NSPF) as the query dataset. Our sampling included the three lineage transitions to self-fertilizing hermaphroditism [22, 23] and the single lineage transition to sperm gigantism [24] found within this supergroup.
Across all 12 species we identified 69 NSPD homologs (Additional file 3). The NSPD gene family ranged from three to ten gene copies, with C. elegans having the highest copy number and C. kamaaina having the lowest (Fig. 4). Coding sequence length was largely conserved between paralogs, but differed across species. Sequence length differences were driven largely by a 24–30 base pair region in the middle of the gene containing repeats of asparagine and glycine residues, which tended to be the same length within a species, but differed across species (Additional file 4). Despite these species-specific repeats, amino acid sequence identity between paralogs was high, ranging from 81.3 to 95.3%. No secondary structure was predicted for these genes and in fact they were biochemically categorized as being 73% intrinsically disordered due to low sequence complexity and amino acid composition biases [25, 26].
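For readers unfamiliar with the identity measure, pairwise percent identity between two aligned paralogs can be sketched as below. This is an illustration under stated assumptions, not the method used in the study. The sequences are made up, and the denominator convention (counting only columns where neither sequence has a gap) is one of several in use; the text does not state which one was applied.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same alignment and so have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned peptides with a length difference in an N/G repeat.
identity = percent_identity("MKNGNGNGLA", "MKNG--NGLV")
```

Under this convention, species-specific repeat length differences appear as gapped columns and do not lower the identity score, which is one way high identity can coexist with variable repeat length.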
The NSPD genes were broadly distributed across the genome, occurring as single copies on multiple chromosomes or scaffolds in each species (Additional file 3). This seemingly independent arrangement of individual genes throughout the genome precluded a robust syntenic analysis. Additionally, phylogenetic analysis showed NSPD genes predominantly cluster within species and thus they do not convey a strong signal of ancestral gene orthology (Additional file 5). Since orthologous genes could not be assigned, the protein coding sequences were analyzed within the four monophyletic clades represented. Even within these shorter evolutionary timescales, orthologous genes were not readily apparent, again suggesting species-specific evolution at the gene family level. To assess variation in evolutionary rate across the gene family, we estimated a single, alignment-wide ratio of non-synonymous to synonymous substitutions (ω) using reduced sequence alignments. Specifically, we removed the species-specific amino acid repeats in the middle of the gene, which were highly sensitive to alignment parameters. The ω-values varied widely from 0.07 to 0.37 with the more recently derived clades having higher values (Fig. 4), although none indicates a strong signal of positive selection. Rather, these genes seem to be weakly constrained outside of the species-specific repeats, which was unexpected given their disordered nature.
We identified and annotated 22 NSPF orthologs in ten species (Additional file 3). Like the NSPD family, the NSPF genes do not have a predicted secondary structure and are 40% intrinsically disordered. They are, however, biochemically predicted to be signaling peptides (mean signal peptide score = 0.9) with a predicted cleavage site between amino acid residues 20 and 21 (Additional file 6). No NSPF genes were found in the C. sp. 34 genome, even though this genome is very well assembled. Nine species have two gene copies, while C. doughertyi has a single copy and, as mentioned, C. elegans has three annotated copies. Examination of 249 sequenced C. elegans natural isolates [27] suggests that nspf-2 arose through a duplication of nspf-1: all copies of nspf-1 align to the same position, whereas the intergenic space varies across the isolates. This duplication appears fixed within the C. elegans lineage (though one strain, CB4856, has a premature stop codon), and sequence identity is high between duplicates. Additionally, the C. elegans NSPF gene family has translocated to Chromosome II while the other species show conserved synteny to Chromosome IV (Fig. 5). Using syntenic relationships coupled with gene orientation and phylogenetic clustering, we were able to assign gene orthology within the family (Additional file 7). Within these orthologous groups, species relationships were largely recapitulated with ω-values of 0.53 and 0.26 for the nspf-1 and nspf-3 orthologs, respectively. However, when the C. elegans lineage was excluded, the ω-values sharply decreased to 0.15 for the nspf-1 and 0.17 for the nspf-3 orthologs, indicating a pattern of sequence constraint (Fig. 6). We explicitly tested if the C. elegans lineage was evolving at a different rate than the other lineages. Indeed, the nspf-1 (ω = 1.1, C.I. of ω = 0.78–1.5, − 2Δln = 5.11) and to a lesser extent the nspf-3 (ω = 0.57, C.I. of ω = 0.34–0.87, − 2Δln = 2.34) C. elegans lineages showed some evidence of positive selection, although the differences in the likelihoods of the two models were not statistically significant.
Functional analysis of the NSPF gene family
Given the high abundance of the NSPF protein, the conserved nature of these genes, and their potential as signaling peptides, we hypothesized these genes could be important for male fertility either during spermatogenesis or in sperm competition. Using CRISPR, we knocked out the three NSPF genes in the C. elegans standard laboratory strain (N2) to directly test the function of this gene family. We quantified male reproductive success by allowing single males to mate with an excess of females over a 24 h period. Very little difference in progeny production was observed between knockout and wildtype males (t = − 0.81, df = 26, p = 0.42; Fig. 6a). Given the size of our experiment and the large sampling variance in individual fecundity, we would have been able to detect a difference between backgrounds of 24% with 80% power, so we may have missed particularly subtle effects. We also measured the role of these genes in male competitive success, finding again that knocking out these genes had no effect on male fertility (Fig. 6b). In fact, knockout males were no worse competitors than wildtype males (z = − 0.12, p = 0.90) and produced roughly 50% of the progeny measured (proportions test: χ2 = 1.27, df = 1, p = 0.26, C.I. of progeny produced = 27.4–55.9%). Overall, then, despite its prevalence within the sperm membranous organelle, the NSPF gene family does not appear to play an important role in male fertilization success.
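The detectable-difference statement above can be illustrated with a standard two-sample power approximation. The sketch below is not the authors' calculation. It assumes two equal groups, a two-sided test at α = 0.05, and a normal approximation to the t distribution. The coefficient of variation (`cv`) is a hypothetical input, since the fecundity variance is not reported in this section, and the group size of 14 is only what df = 26 would imply under an equal split.

```python
from statistics import NormalDist


def min_detectable_difference(n_per_group, cv, alpha=0.05, power=0.80):
    """Smallest difference in means, as a fraction of the mean, that a
    two-sided two-sample comparison detects with the requested power.

    Uses the normal approximation:
        (z_{1 - alpha/2} + z_{power}) * cv * sqrt(2 / n)
    cv is the within-group standard deviation divided by the mean.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return (z_alpha + z_power) * cv * (2 / n_per_group) ** 0.5


# Hypothetical inputs: 14 males per group (assumed), cv = 0.25 (assumed).
mde = min_detectable_difference(n_per_group=14, cv=0.25)
```

Because the detectable difference scales with the square root of 1/n, halving it would require roughly four times as many males per group, which is why subtle fertility effects are hard to rule out in assays with high individual variance.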