Contrasting patterns of evolutionary constraint and novelty revealed by comparative sperm proteomic analysis

Background Rapid evolution is a hallmark of reproductive genetic systems and arises through the combined processes of sequence divergence, gene gain and loss, and changes in gene and protein expression. While studies aiming to disentangle the molecular ramifications of these processes are progressing, we still know little about the genetic basis of evolutionary transitions in reproductive systems. Here we conduct the first comparative analysis of sperm proteomes in Lepidoptera, a group that broadly exhibits dichotomous spermatogenesis, in which males simultaneously produce a functional fertilization-competent sperm (eupyrene) and an incompetent sperm morph lacking DNA (apyrene). Through the integrated application of evolutionary proteomics and genomics, we characterize the genomic patterns associated with the origination of this unique spermatogenic process and assess the importance of genetic novelty in Lepidoptera sperm biology. Results Comparison of the newly characterized Monarch butterfly (Danaus plexippus) sperm proteome to those of the Carolina sphinx moth (Manduca sexta) and the fruit fly (Drosophila melanogaster) demonstrated conservation at the level of protein abundance and post-translational modification within Lepidoptera. In contrast, comparative genomic analyses across insects reveal significant divergence at two levels that differentiate the genetic architecture of sperm in Lepidoptera from other insects. First, a significant reduction in orthology among Monarch sperm genes relative to the remainder of the genome in non-Lepidopteran insect species was observed. Second, a substantial number of sperm proteins were found to be specific to Lepidoptera, in that they lack detectable homology to the genomes of more distantly related insects. Lastly, the functional importance of Lepidoptera specific sperm proteins is broadly supported by their increased abundance relative to proteins conserved across insects.
Conclusions Our results suggest that the origin of heteromorphic spermatogenesis early in Lepidoptera evolution was associated with a burst of genetic novelty. This pattern of genomic diversification is distinct from the remainder of the genome and thus suggests that this transition has had a marked impact on Lepidoptera genome evolution. The identification of abundant sperm proteins unique to Lepidoptera, including proteins distinct between specific lineages, will accelerate future functional studies aiming to understand the developmental origin of dichotomous spermatogenesis and the functional diversification of the fertilization incompetent apyrene sperm morph.


Introduction
Spermatozoa exhibit an exceptional amount of diversity at both the ultrastructural and molecular levels despite their central role in reproduction [1]. One of the least understood peculiarities in sperm variation is the production of heteromorphic sperm via dichotomous spermatogenesis, the developmental process where males produce multiple distinct sperm morphs that differ in their morphology, DNA content and/or other characteristics [2]. Remarkably, one sperm morph is usually fertilization incompetent and often produced in large numbers; such morphs are commonly called "parasperm", in contrast to fertilizing "eusperm" morphs. Despite the apparent inefficiencies of producing sperm morphs incapable of fertilization, dichotomous spermatogenesis has arisen independently across a broad range of taxa, including insects, brachiopod molluscs and fish. This paradoxical phenomenon, where a substantial investment is made into gametes that will not pass on genetic material to the following generation, has garnered substantial interest, and a variety of hypotheses regarding parasperm function have been postulated [3]. In broad terms, these can be divided into three main functional themes: (1) facilitation, where parasperm aid the capacitation or
corrected spectral counts and the R (v 3.0.0) package EdgeR [21]. Results were corrected for multiple testing using the Benjamini-Hochberg method within EdgeR.

Lift-over between D. plexippus version 1 and 2 gene sets
Two versions of gene models and corresponding proteins are currently available for D. plexippus. Official gene set 1 (OGS1) was generated using the genome assembly as initially published [22], while the more recent official gene set 2 (OGS2) was generated along with an updated genome assembly [23]. While our proteomic analysis employs the more recent OGS2 gene models, at the time of our analysis only OGS1 gene models were included in publicly available databases for gene function and orthology (e.g. Uniprot and OrthoDB). In order to make use of these public resources, we assigned OGS2 gene models to corresponding OGS1 gene models by sequence alignment. Specifically, OGS2 coding sequences (CDS) were aligned to OGS1 CDS using BLAT [24], requiring 95% identity; the best aligning OGS1 gene model was assigned as the match for the OGS2 query. In this way, we were able to link predictions of OGS1 gene function and orthology in public databases to OGS2 sequences in our analysis. Of the 584 OGS2 loci identified in the sperm proteome, 18 could not be assigned to an OGS1 gene.
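The best-hit assignment described above can be illustrated with a minimal Python sketch. It assumes that alignments have already been parsed into simple records and that "best aligning" means the highest alignment score among hits passing the 95% identity filter; the record layout, the gene identifiers and the ranking criterion are hypothetical and are not taken from the original pipeline.

```python
from typing import Dict, Iterable, NamedTuple


class Alignment(NamedTuple):
    """One OGS2-to-OGS1 CDS alignment (hypothetical, simplified record)."""
    query: str       # OGS2 gene model identifier
    target: str      # OGS1 gene model identifier
    identity: float  # percent identity of the alignment
    score: float     # alignment score used to rank competing hits (assumed)


def lift_over(alignments: Iterable[Alignment],
              min_identity: float = 95.0) -> Dict[str, str]:
    """Map each OGS2 query to its best-scoring OGS1 target above the threshold."""
    best: Dict[str, Alignment] = {}
    for aln in alignments:
        if aln.identity < min_identity:
            continue  # fails the identity requirement, so it is not eligible
        current = best.get(aln.query)
        if current is None or aln.score > current.score:
            best[aln.query] = aln
    return {query: aln.target for query, aln in best.items()}


# Toy input with made-up identifiers.
alignments = [
    Alignment("OGS2_geneA", "OGS1_gene1", 99.2, 1500.0),
    Alignment("OGS2_geneA", "OGS1_gene2", 96.0, 900.0),
    Alignment("OGS2_geneB", "OGS1_gene3", 91.0, 2000.0),  # below 95% identity
]
mapping = lift_over(alignments)
unassigned = {aln.query for aln in alignments} - set(mapping)
```

Queries left in `unassigned` correspond to loci that cannot be linked to an OGS1 gene model under these assumptions.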

Functional annotation and enrichment analysis
Two approaches were employed for functionally annotating D. plexippus sperm protein sequences. First, we obtained functional annotations assigned by Uniprot to corresponding D. plexippus OGS1 protein sequences (Additional file 1) [25]. Additionally, we used the Blast2GO software to assign descriptions of gene function and gene ontology categories [26]. The entire set of predicted protein sequences from OGS2 was BLASTed against the GenBank non-redundant protein database, with results filtered for E < 10^-5, and also queried against the InterPro functional prediction pipeline [27]. Functional enrichment of GO terms present in the sperm proteome relative to the genomic background was performed using Blast2GO's implementation of a Fisher's exact test with a false discovery rate of 0.01%.
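The enrichment test for a single GO term can be sketched as a one-sided Fisher's exact test on a 2x2 table. The Python below is an illustration only: Blast2GO's implementation may differ (for example in sidedness and in how the false discovery rate is controlled), and the counts used are invented.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact P(X >= k) for a hypergeometric draw.

    k: proteins in the test set annotated with the GO term
    n: proteins in the test set
    K: proteins in the background annotated with the GO term
    N: proteins in the background (the test set is a subset of it)
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities of observing k or more annotated proteins.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 3 of 4 sampled proteins carry a term present in 5 of 20 background proteins.
p_enriched = fisher_enrichment_p(k=3, n=4, K=5, N=20)
p_trivial = fisher_enrichment_p(k=0, n=4, K=5, N=20)  # always 1 by construction
```

In a full analysis one such p-value would be computed per GO term and then adjusted for multiple testing.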

Orthology predictions and analysis
Two approaches were employed for establishing orthology among proteins from different species. First, we used the proteinortho pipeline [28] to assess 3-way orthology between D. plexippus OGS2, M. sexta OGS1 [29], and D. melanogaster (flybase r6.12) gene sets. Proteinortho uses a reciprocal blast approach to group genes with significant sequence similarity into clusters of orthologs and paralogs. For each species, genes with protein isoforms were represented by the longest sequence in the proteinortho analysis. D. melanogaster and M. sexta ortholog predictions were then cross referenced to the published sperm proteomes of these two species [9,30], allowing a three-way assessment of orthology in relation to presence in the sperm proteome. Using proteinortho allowed the direct analysis of the D. plexippus OGS2 sequences, which were not analyzed for homology in OrthoDB8 [31].
A taxonomically broader set of insect ortholog relationships was obtained from OrthoDB8 and used to assess the proportion of orthologs among sperm proteins relative to the genomic background. A randomized sampling procedure was used to determine the null expectation for the proportion of orthologous proteins found between D. plexippus and the queried species. A set of 584 proteins, the number equal to detected D. plexippus sperm proteins, was randomly sampled 5000 times from the entire Monarch OGS2 gene set. For each sample, the proportion of genes with an ortholog reported in OrthoDB8 was calculated, yielding a null distribution for the proportion of orthologs expected between D. plexippus and the queried species. For each query species, the observed proportion of orthologs in the sperm proteome was compared to this null distribution to determine whether the sperm proteome had a different proportion of orthologs than expected and to assign significance. Comparisons were made to 12 other insect species, reflecting five insect orders: Lepidoptera (Heliconius melpomene, M.
were chosen to maximize species distribution across the full phylogenetic breadth of Lepidoptera, while also utilizing the most comprehensively annotated genomes based on published CEGMA scores (http://lepbase.org, [32]). Taxonomically restricted proteins were defined as those identified consistently across a given phylogenetic range without homology in any outgroup species. Proteins exhibiting discontinuous phylogenetic patterns of conservation were considered unresolved. Percentage identities from BLAST searches between Monarch and Manduca were averaged for those proteins identified as Lepidoptera specific, those not specific to the Lepidoptera but with resolved taxonomic distribution, those with identified Drosophila orthology, and those without orthology in Drosophila. Mann-Whitney U tests were conducted to compare each average to the average percentage identity of Lepidoptera specific proteins. Bonferroni corrections were applied in cases of multiple testing.
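The randomized sampling procedure for the ortholog proportion can be sketched in Python as follows. This is a toy illustration under stated assumptions: the gene set and ortholog calls are simulated, the sample size and number of draws are reduced for speed, and the two-sided empirical p-value is one plausible way to assign significance, since the exact rule used is not specified in the text.

```python
import random
from typing import List, Sequence, Set


def ortholog_null(all_genes: Sequence[str], with_ortholog: Set[str],
                  sample_size: int, n_samples: int = 5000,
                  seed: int = 0) -> List[float]:
    """Null distribution of the ortholog proportion in random gene samples."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_samples):
        sample = rng.sample(all_genes, sample_size)  # sampling without replacement
        null.append(sum(g in with_ortholog for g in sample) / sample_size)
    return null


def empirical_p(null: Sequence[float], observed: float) -> float:
    """Two-sided empirical p-value based on distance from the null mean (assumed rule)."""
    mean = sum(null) / len(null)
    extreme = sum(abs(x - mean) >= abs(observed - mean) for x in null)
    return (extreme + 1) / (len(null) + 1)


# Simulated genome: 1000 genes, 600 of which have an ortholog in the query species.
genome = [f"gene{i}" for i in range(1000)]
with_ortholog = set(genome[:600])
null = ortholog_null(genome, with_ortholog, sample_size=50, n_samples=1000, seed=1)
p_value = empirical_p(null, observed=0.30)  # hypothetical observed proportion
```

With real data, `sample_size` would be 584 and `n_samples` 5000, and the observed value would be the proportion of sperm proteins with an OrthoDB8 ortholog in the query species.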

Results and Discussion
Monarch sperm proteome
Characterization of the Monarch sperm proteome as part of this study, in conjunction with our previous analysis in Manduca [9], allowed us to conduct the first comparative analysis of sperm in Lepidoptera, and in insects more broadly, to begin to assess the origin and evolution of dichotomous spermatogenesis at the genomic level. Tandem mass spectrometry (MS/MS) analysis of Monarch sperm, purified in triplicate, identified 380 proteins in two or more replicates and 553 proteins by two or more unique peptides in a single replicate. Together this yielded a total of 584 high confidence protein identifications (Additional file 2). Of these, 41% were identified in all three biological replicates. Comparable with our previous analysis of Manduca sperm, proteins were identified by an average of 7.9 unique peptides and 21.1 peptide spectral matches. This new dataset thus provides the necessary foundation to refine our understanding of sperm composition at the molecular level in Lepidoptera. (Note: Drosophila melanogaster gene names will be used throughout the text where orthologous relationships exist with named genes; otherwise Monarch gene identification numbers will be used.)
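One reading of the identification criteria above is that a protein is retained if it is detected in two or more replicates, or if it is supported by two or more unique peptides in at least one replicate. The Python sketch below encodes that reading; the data layout and protein names are hypothetical, and the actual filtering may have been performed differently.

```python
from typing import Dict, List, Set


def high_confidence(unique_peptides: Dict[str, List[int]]) -> Set[str]:
    """Select proteins meeting either confidence criterion.

    unique_peptides maps a protein ID to its unique peptide count in each replicate.
    """
    kept = set()
    for protein, counts in unique_peptides.items():
        replicates_detected = sum(c > 0 for c in counts)
        # Criterion 1: seen in >= 2 replicates; criterion 2: >= 2 unique peptides in one replicate.
        if replicates_detected >= 2 or max(counts) >= 2:
            kept.add(protein)
    return kept


# Toy peptide counts across three replicates (invented values).
counts = {
    "P1": [1, 1, 0],  # two replicates, one peptide each: kept
    "P2": [3, 0, 0],  # one replicate, three unique peptides: kept
    "P3": [1, 0, 0],  # single peptide in a single replicate: dropped
    "P4": [2, 4, 5],  # meets both criteria: kept
}
kept = high_confidence(counts)
```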

Gene Ontology analysis of molecular composition
Gene ontology (GO) analyses were first conducted to confirm the similarity in functional composition between the Monarch and other insect sperm proteomes. Biological process analyses revealed a significant enrichment for several metabolic processes, including the tricarboxylic acid (TCA) cycle (p = 2.22E-16), electron transport chain (p = 9.85E-18), oxidation of organic compounds (p = 1.33E-25) and generation of precursor metabolites and energy (p = 1.09E-30) (Fig. 1a). GO categories related to the TCA cycle and electron transport have also been identified as enriched in the Drosophila and Manduca sperm proteomes [9]. Generation of precursor metabolites and energy, and oxidation of organic compounds are also the two most significantly enriched GO terms in the Drosophila sperm proteome [30]. Thus, broad metabolic functional similarities exist between the well-characterized insect sperm proteomes.
An enrichment of proteins involved in microtubule-based processes was also observed, a finding that is also consistent with previously characterized insect sperm proteomes. Amongst the proteins identified are cut up (ctp), a dynein light chain required for spermatogenesis [33], actin 5 (Act5), which is involved in sperm individualization [34], and DPOGS212342, a member of the recently expanded X-linked tektin gene family in Drosophila sperm [35]. Although functional annotations are limited amongst the 10% most abundant proteins (see below), several contribute to energetic and metabolic pathways. For example, stress-sensitive B (sesB) and adenine nucleotide translocase 2 (Ant2) are gene duplicates that have been identified in the Drosophila sperm proteome and, in the case of Ant2, function specifically in mitochondria during spermatogenesis [36]. Also identified was Bellwether (blw), an ATP synthetase alpha chain which is required for spermatid development [37].
The widespread representation of proteins functioning in mitochondrial energetic pathways is consistent with the contribution of giant, fused mitochondria (i.e. nebenkern) in flagellum development and presence of mitochondrial derivatives in mature spermatozoa (Fig. 1a-b) [38]. During lepidopteran spermatogenesis, the nebenkern divides to form two derivatives, which flank the axoneme during elongation; ultrastructure and size of these derivatives vary greatly between species and between the two sperm morphs [7]. In Drosophila, the nebenkern acts as both an organizing center for microtubule polymerization and a source of ATP for axoneme elongation; however, it is unclear to what extent these structures contribute to the energy required for sperm motility. Of particular note is the identification of porin, a voltage-gated anion channel that localizes to the nebenkern and is critical for sperm mitochondrion organization and individualization [39]. Consistent with these patterns, Cellular Component analysis also revealed a significant enrichment of proteins in a broad set of mitochondrial structures and components, including the respiratory chain complex I (p = 7.73E-09), proton-transporting V-type ATPase complex (p = 9.90E-08) and the NADH dehydrogenase complex (p = 7.73E-09) (Fig. 1b). Aside from those categories relating to mitochondria, a significant enrichment was also observed amongst categories relating to flagellum structure, including microtubule (p = 5.43E-18) and cytoskeleton part (p = 2.54E-12). The two most abundant proteins in the proteome identified in both Monarch and Manduca, beta tubulin 60D (βTub60D) and alpha tubulin 84B (αTub84B), contributed to these GO categories. αTub84B is of particular interest as it performs microtubule functions in the post-mitotic spermatocyte, including the formation of the meiotic spindle and sperm tail elongation [40].
Molecular Function GO analysis revealed an enrichment of oxidoreductase proteins acting on NAD(P)H (p = 7.06E-19), as well as more moderate enrichments in several categories relating to peptidase activity or regulation of peptidase activity (data not shown). The broad representation of proteins involved in proteolytic activity is worthy of discussion, not solely because these classes of proteins are abundant in other sperm proteomes, but also because proteases are involved in the breakdown of the fibrous sheath surrounding Lepidoptera eupyrene sperm upon transfer to the female [7]. This process has been attributed to a specific ejaculatory duct trypsin-like arginine C-endopeptidase (initiatorin) in the silkworm (B. mori) [41] and a similar enzymatic reaction is needed for sperm activation in Manduca [42]. Blast2GO analyses identified three serine-type proteases in the top 5% of proteins based on abundance, including a chymotrypsin peptidase (DPOGS213461) and a trypsin precursor (DPOGS205340). These highly abundant proteases, particularly those that were also identified in Manduca (two of the most abundant proteases and 10 in total), are excellent candidates for a sperm activating factor(s) in Lepidoptera.

Conservation of Lepidoptera Sperm Proteomes
Our previous analysis of Manduca was the first foray into the molecular biology of Lepidopteran sperm and was motivated by our interest in the intriguing heteromorphic sperm system that is found in nearly all species in this order [7]. Here we have aimed to delineate the common molecular components of lepidopteran sperm through comparative analyses. Orthology predictions between the two species identified relationships for 405 (69%) Monarch sperm proteins and 298 of these (73.5%) were previously identified by MS/MS in the Manduca sperm proteome [9]. An identical analysis in Drosophila identified 203 (35%) Monarch proteins with orthology relationships, including 107 (52.7%) that were previously characterized as components of the Drosophila sperm proteome [30,43]. Thus, and as would be expected given the taxonomic relationship of these species, there is a significantly greater overlap in sperm components between the two Lepidopteran species (two-tailed Chi-square = 25.55, d.f. = 1, p < 0.001).
Recent comparative analyses of sperm composition across mammalian orders successfully identified a conserved "core" sperm proteome comprised of more slowly evolving proteins, including a variety of essential structural and metabolic components [61]. To characterize the "core" proteome in insects, we conducted a GO analysis using Drosophila orthology, ontology and enrichment data to assess the molecular functionality of proteins identified in the proteome of all three insect species. This revealed a significant enrichment for proteins involved in cellular respiration (p = 4.41e-21), categories associated with energy metabolism, including ATP metabolic process (p = 1.64e-15), generation of precursor metabolites and energy (p = 9.77e-21), and multiple nucleoside and ribonucleoside metabolic processes. Analysis of cellular component GO terms revealed a significant enrichment for mitochondrion related proteins (p = 3.72e-22), respiratory chain complexes (p = 8.25e-12), dynein complexes (p = 1.37e-5), and axoneme (p = 3.31e-6). These GO category enrichments are consistent with a core set of metabolic, energetic, and structural proteins required for general sperm function. Similar sets of core sperm proteins have been identified in previous sperm proteome comparisons [9,30,43,44]. Among this conserved set are several with established reproductive phenotypes in Drosophila. These include proteins associated with sperm individualization, including cullin3 (Cul3) and SKP1-related A (SkpA), which act in a cullin-dependent E3 ubiquitin ligase complex required for caspase activity in sperm individualization [45], gudu, an Armadillo repeat containing protein [46], and porin (mentioned previously) [39]. Two proteins involved in sperm motility were also identified: dynein axonemal heavy chain 3 (dnah3) [47] and an associated microtubule-binding protein, growth arrest specific protein 8 (Gas8) [48].
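The comparison of sperm proteome overlap between species pairs can be expressed as a 2x2 chi-square test on the counts reported above (298 of 405 Monarch orthologs found in Manduca sperm versus 107 of 203 found in Drosophila sperm). The Python sketch below is an illustration; the use of Yates' continuity correction is an assumption, made because it gives a statistic close to the reported value, and is not stated in the text.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> float:
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # continuity correction (assumed)
            chi2 += diff * diff / expected
    return chi2


# Rows: Manduca comparison, Drosophila comparison.
# Columns: ortholog present in that species' sperm proteome, ortholog absent from it.
chi2 = chi_square_2x2(298, 405 - 298, 107, 203 - 107)  # approximately 25.55
```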

Comparative analysis of protein abundance
Despite the more proximate link between proteome composition and molecular phenotypes, transcriptomic analyses far outnumber similar research using proteomic approaches. Nonetheless, recent work confirms the utility of comparative evolutionary proteomic studies in identifying both conserved [49] and diversifying proteomic characteristics [50]. We have previously demonstrated a significant correlation in protein abundance between Manduca and Drosophila sperm, although this analysis was limited by the extent of orthology between these taxa [9]. To further investigate the evolutionary conservation of protein abundance in sperm, we compared normalized abundance estimates between Monarch and Manduca, which revealed a highly significant correlation (R^2 = 0.43, p < 1 × 10^-15) (Fig. 2A). We note that this correlation is based on semi-quantitative estimates [20] and would most likely be stronger if more refined absolute quantitative data were available. Several proteins identified as highly abundant in both species are worthy of further mention. Sperm leucyl aminopeptidase 7 (S-Lap7) is a member of a gene family first characterized in Drosophila that has recently undergone a dramatic expansion, is testis-specific in expression and encodes the most abundant proteins in the D. melanogaster sperm proteome [51]. As would be expected, several microtubule structural components were also amongst the most abundant proteins (top 20), including αTub84B and tubulin beta 4b chain-like protein, as well as succinate dehydrogenase subunits A and B (SdhA and SdhB), porin, and DPOGS202417, a trypsin precursor that undergoes conserved post translational modification (see below).
We next sought to identify proteins exhibiting differential abundance between the two species. As discussed earlier, Monarch and Manduca have distinct mating systems; female Monarch butterflies remate considerably more frequently than Manduca females, increasing the potential for sperm competition [10]. These differences may be reflected in molecular diversification in sperm composition between species. An analysis of differential protein abundance identified 45 proteins with significant differences (P < 0.05; Fig. 2B), representing 7% of the proteins shared between species (Additional file 3).
identified as more abundant in Manduca. These included dynein light chain 90F (Dlc90F), which is required for proper nuclear localization and attachment during sperm differentiation [52], and cut up (ctp), a dynein complex subunit involved in nucleus elongation during spermiogenesis [33]. Serine protease immune response integrator (spirit) is also of interest considering the proposed role of endopeptidases in Lepidoptera sperm activation [41,42]. Although it would be premature to draw any specific conclusions, some of these proteins play important mechanistic roles in sperm development and function and will be of interest for more targeted functional studies.
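The Methods state that differential abundance results were corrected for multiple testing with the Benjamini-Hochberg method within EdgeR. The adjustment step itself can be sketched in a few lines of Python; this is not the EdgeR code path, only the standard step-up procedure, and the p-values are invented.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # toy per-protein p-values
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

In this toy case only the smallest p-value remains below 0.05 after adjustment, which shows how the correction can reduce the number of proteins called differentially abundant.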

Post-translational modification of sperm proteins
During spermatogenesis, the genome is repackaged and condensed on protamines and the cellular machinery required for protein synthesis is expelled. Consequently, mature sperm cells are considered primarily quiescent [53]. Nonetheless, sperm undergo dynamic molecular transformations after they leave the testis and during their passage through the male and female reproductive tract [54]. One mechanism by which these modifications occur is via post translational modification (PTM), which can play an integral part in the activation of sperm motility and fertilization capacity [55,56].
melanogaster group [58]. The relative paucity of phosphorylation PTMs may reflect the fact that phosphorylation is one of the more difficult PTMs to identify with certainty via mass spectrometry based proteomics [59]. However, it is also noteworthy that sperm samples in this study were purified from the male seminal vesicle, and thus before transfer to the female reproductive tract. Although far less is known about the existence of capacitation-like processes in insects, dynamic changes in the mammalian sperm phosphoproteome are associated with sperm capacitation and analogous biochemical alterations might occur within the female reproductive tract of insects [56]. We note that a similar extent of protein phosphorylation has been detected from Drosophila sperm samples purified in a similar manner (unpublished data; Whittington and Dorus). Lastly, identical acetylation and phosphorylation PTM patterns were identified for Monarch and Manduca HACP012 (DPOGS213379), a putative seminal fluid protein of unknown function previously identified in the Postman butterfly (Heliconius melpomene) [60,61]. The identification of HACP012 in sperm, in the absence of other seminal fluid components, is unexpected but its identification was unambiguous as it was amongst the most abundant 10% of identified Monarch proteins. Seminal protein HACP020 (DPOGS203866), which exhibits signatures of recent adaptive evolution [61], was also identified as highly abundant (5th percentile overall); this suggests that some seminal fluid proteins may also be co-expressed in the testis and establish an association with sperm during spermatogenesis.

Rapid evolution of genetic architecture
Rapid gene evolution [62] and gene creation/loss [63], including de novo gene creation [64], are predominant processes that contribute to the diversification of male reproductive systems. Our previous study identified an enrichment in the number of Lepidoptera specific proteins (i.e. those without homology outside of Lepidoptera) in the sperm proteome relative to other reproductive proteins. We were unable, however, to determine from a single species whether novel genes contributed to sperm biology more broadly across all Lepidoptera. Here we employed two comparative genomic approaches to confirm and expand upon our original observation. First, we obtained whole-genome orthology relationships between Monarch and nine species, representing five insect orders, and compared the proportion of the sperm proteome with orthologs to the whole genome using a random subsampling approach. No significant differences were observed for three of the four Lepidoptera species analyzed and an excess of orthology amongst sperm proteins was identified in the Postman butterfly (p < 0.05; Fig. 3). In contrast, we identified a significant deficit of sperm orthologs in all comparisons with non-Lepidopteran genomes (all p < 0.01). Orthology relationships in OrthoDB are established by a multi-step procedure involving reciprocal best match relationships between species and identity within species to account for gene duplication events since the last common ancestor. As such, the underrepresentation of orthology relationships is unlikely to be accounted for by lineage-specific gene duplication. Therefore, rapid evolution of sperm genes appears to be the most reasonable explanation for the breakdown of reciprocal relationships (see below). This conclusion is consistent with a diverse body of evidence that supports the influence of positive selection on male reproductive genes [62,65], including those functioning in sperm [43,66-68]. We note that we cannot rule out the influence of de novo creation but it is currently difficult to assess the contribution of this mechanism to the overall pattern.
The second analysis aimed to characterize the distribution of taxonomically restricted Monarch sperm proteins using BLAST searches across 12 insect species. Based on the analysis above, our a priori expectation was that a substantial number of proteins with identifiable homology amongst Lepidoptera would be absent from more divergent insect species. This analysis identified a total of 45 proteins unique to Monarch, 140 proteins (23.9% of the sperm proteome) with no homology to proteins in non-Lepidopteran insect taxa and 173 proteins conserved across all species surveyed (Fig. 4a). Proteins with discontinuous taxonomic matches (n = 171) were considered "unresolved". Although the number of Monarch-specific proteins is considerably higher than the eight Manduca-specific proteins found in our previous study, the number of Lepidoptera specific proteins is comparable to our previous estimate in Manduca (n = 126). These observations support the hypothesis that a substantial subset of lepidopteran sperm proteins are likely to be rapidly evolving and thus exhibit little detectable similarity. To pursue this possibility, we estimated Lepidoptera specific protein divergence between Monarch and Manduca and compared the distribution of amino acid divergence to those proteins identified in other insect species (Fig. 4b). The average percentage identity of Lepidoptera specific proteins (55.1% ± 17.6) was significantly lower than all non-Lepidopteran specific proteins (74.6% ± 13.4, W = 3074.5, p = 3.35e-16), those with Drosophila orthology (75.5% ± 13.6, W = 8285.5, p < 1 × 10^-15), and those non-Lepidopteran specific proteins without Drosophila orthology (62.4% ± 18.4, W = 2980, p = 1.16e-5). Therefore, we can conclude that Lepidoptera specific proteins evolve more rapidly than other sperm proteins and that proteins with resolved orthology relationships in Drosophila experience higher levels of conservation than those that do not. To assess their potential contribution to sperm function, we used protein abundance as a general proxy in the absence of functional annotation for nearly all of these proteins. As was observed in Whittington et al. [9], Lepidoptera specific proteins were found to be significantly more abundant than the remainder of the sperm proteome (D = 0.2, p = 0.0009, Fig. 4c).
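The assignment of proteins to taxonomically restricted, conserved or unresolved categories can be sketched as a simple rule over BLAST presence/absence calls. The Python below is a minimal illustration that assumes species are ordered from the closest relative of Monarch to the most distant outgroup; the category labels and the ordering convention are hypothetical and are not taken from the original analysis.

```python
from typing import Sequence


def classify_restriction(hits: Sequence[bool]) -> str:
    """Classify a protein from homolog presence/absence across ordered species.

    hits[i] is True if a homolog was detected in species i, with species ordered
    from the closest relative of the focal species to the most distant (assumed).
    """
    if not any(hits):
        return "focal-specific"  # no homolog in any other species
    if all(hits):
        return "conserved"       # homolog detected in every species surveyed
    first_miss = list(hits).index(False)
    if any(hits[first_miss:]):
        return "unresolved"      # discontinuous pattern: a hit beyond a miss
    return "restricted"          # continuous hits, then absent in all outgroups


examples = {
    "no_hits": classify_restriction([False, False, False, False]),
    "all_hits": classify_restriction([True, True, True, True]),
    "ingroup_only": classify_restriction([True, True, False, False]),
    "patchy": classify_restriction([True, False, True, False]),
}
```

Under this convention, a Lepidoptera specific protein would be a "restricted" protein whose hits cover exactly the lepidopteran species at the start of the list.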

Conclusion
This comparative proteomic analysis of heteromorphic sperm, a first of its kind, provides important perspective and insights regarding the functional and evolutionary significance of this enigmatic reproductive phenotype. Our analyses indicate that a substantial number of novel sperm genes are shared amongst Lepidoptera, thus distinguishing them from other insect species without dichotomous spermatogenesis, and suggest they are associated with heteromorphic spermatogenesis and the diversification of apyrene and eupyrene sperm. Our comparative and quantitative analyses, based on protein abundance measurements in both species, further suggest that some of these proteins contribute to apyrene sperm function and evolution. Given that apyrene sperm constitute the vast majority of cells in our co-mixed samples, it is reasonable to speculate that higher abundance proteins are either present in both sperm morphs or specific to apyrene cells and thus good candidates for further study in relation to apyrene sperm functionality.