Insects
The N. lugens strain was originally collected from a rice field on the Huajiachi Campus of Zhejiang University in Hangzhou, China. The insects used in this experiment were the offspring of a single female. Insects were reared on rice seedlings (Xiu shui 128) at 28 °C under a 12:12 h light:dark photoperiod.
Preparation of N. lugens MRT transcriptome database
N. lugens males were anesthetized on ice for 20 min and dissected under a Leica S8APO stereomicroscope. Whole MRTs (including the TE, VD, and MAGs) (Fig. 1) were isolated, quickly washed in diethylpyrocarbonate (DEPC)-treated phosphate-buffered saline (PBS; 137 mM NaCl, 2.68 mM KCl, 8.1 mM Na2HPO4, and 1.47 mM KH2PO4, pH 7.4), and immediately frozen at −80 °C. The MRT sample was used for transcriptome and DGE sequencing, and the MAG sample was used for DGE sequencing.
Total RNA was isolated from N. lugens MRT and MAG samples using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions. Transcriptome libraries, including the DGE libraries, were sequenced on an Illumina HiSeq™ 2000. Transcriptome reads were assembled using Trinity (v2012-10-05). Unigenes were annotated as described previously [23]. For each gene, the longest assembled transcript was taken as the unigene. The read count of each unigene was normalized to RPKM (reads per kilobase per million mapped reads) to represent its expression level. The coding sequence (CDS) of each unigene was predicted using blastx and ESTScan (3.03). The resulting peptide database was used to support the proteomic analysis.
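The two bookkeeping steps above, selecting the longest transcript per gene and normalizing read counts to RPKM, can be sketched as follows. This is a minimal illustration and not the Trinity or authors' pipeline. It assumes that a transcript-to-gene mapping, transcript lengths, and read counts are already available. The function names and the input layout are hypothetical.

```python
def longest_per_gene(transcripts):
    """Pick the longest transcript of each gene as its unigene.

    transcripts: dict mapping transcript ID -> (gene ID, length in bp)
    (hypothetical input layout).
    Returns a dict mapping gene ID -> (transcript ID, length in bp).
    """
    unigenes = {}
    for tx_id, (gene_id, length) in transcripts.items():
        # Keep this transcript if it is the first seen for the gene or longer than the current pick.
        if gene_id not in unigenes or length > unigenes[gene_id][1]:
            unigenes[gene_id] = (tx_id, length)
    return unigenes


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 10^9 / (length in bp * total mapped reads), which equals
    reads / (length in kb * mapped reads in millions).
    """
    return read_count * 1e9 / (length_bp * total_mapped_reads)
```

For example, a 2,000 bp unigene with 500 reads in a library of 10 million mapped reads has an RPKM of 25.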
Seminal fluid protein sample preparation
Seminal fluid samples were collected from males and mated females, and soluble protein samples were collected from unmated females (all individuals were 4–7 days post-eclosion). Mated females were obtained by placing one female and one male in a glass tube containing a rice seedling for 2 h. The female copulatory bursa (CB) and seminal receptacle (SR) (Fig. 1) were dissected in PBS and squeezed with a grinding rod in 100 μl PBS containing 1 % protease inhibitor cocktail (Thermo, USA). Reproductive tracts from ≈ 50 females were pooled for each biological replicate. MAGs (Fig. 1) were dissected in the same way, and MAGs from ≈ 50 males were pooled for each biological replicate. Samples were centrifuged at 12,000 rpm for 20 min at 4 °C. The supernatant was then transferred to a new tube and stored at −80 °C. Three biological replicates were prepared for each sample type: MAG, mated-female reproductive tract (mated-FRT), and unmated-female reproductive tract (unmated-FRT).
Samples were prepared using the filter-aided sample preparation (FASP) method [27]. Samples were loaded into 3 kDa ultrafiltration centrifuge tubes (Millipore) and centrifuged at 14,000 g for 20 min.

Then 100 μl of UA solution [8 M urea (Sigma), 0.1 M Tris/HCl pH 8.5, 1 % EDTA (Thermo), 1 % protease inhibitor cocktail (Thermo)] was added, and the tubes were centrifuged at 14,000 g for 20 min. This step was repeated twice.

Next, 2 μl of 200 mM DTT (Sigma) was added, and samples were vortexed for 1 min and incubated at 37 °C for 1 h. Then 20 μl of 200 mM iodoacetamide (Sigma) was added. Samples were vortexed for 1 min, incubated at 25 °C for 1 h in the dark, and centrifuged at 14,000 g for 20 min. UA (100 μl) was added to each sample, followed by centrifugation at 14,000 g for 20 min; this step was repeated once. Then 200 μl of 0.05 M NH4HCO3 (Sigma) was added, and samples were centrifuged at 14,000 g for 20 min; this step was repeated twice.

The retained sample was transferred to a 10 kDa ultrafiltration centrifuge tube, and 40 μl of 0.05 M NH4HCO3 and trypsin (Promega; 5 μg in total) were added. Samples were incubated at 30 °C for 12 h and then centrifuged at 14,000 g for 20 min. A further 40 μl of 0.05 M NH4HCO3 was added, and samples were centrifuged at 14,000 g for 30 min.

The filtrate was transferred to a 1.5 ml centrifuge tube and dried in a vacuum freeze-drying device. Dried samples were dissolved in 25 μl of 0.1 % formic acid (Sigma). Peptide concentrations were determined by A280 absorbance using a NanoDrop UV–vis spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA).
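The A280 measurement above is used to decide how much peptide solution to load. The following sketch converts an A280 reading into a concentration and a loading volume. It assumes the common "1 Abs = 1 mg/ml" conversion and a 1 cm path length. Neither value is stated in the source, and a sequence-specific extinction coefficient may have been used instead. The 2 μg target comes from the loading amount given in the UPLC/MS/MS methods. The function names are hypothetical.

```python
def peptide_conc_mg_per_ml(a280, abs_per_mg_ml=1.0, path_cm=1.0):
    """Beer-Lambert estimate of concentration from an A280 reading.

    abs_per_mg_ml: absorbance of a 1 mg/ml solution over 1 cm. The default of 1.0
    is the generic "1 Abs = 1 mg/ml" assumption, not a value from the source.
    path_cm: optical path length in cm (NanoDrop readings are reported per 1 cm).
    """
    return a280 / (abs_per_mg_ml * path_cm)


def load_volume_ul(target_ug, conc_mg_per_ml):
    """Volume (μl) that contains target_ug of peptide; note that 1 mg/ml == 1 μg/μl."""
    return target_ug / conc_mg_per_ml
```

With a reading of A280 = 0.8, this gives about 0.8 μg/μl, so roughly 2.5 μl would carry the 2 μg loaded per injection.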
UPLC/MS/MS methods and data analyses
Peptide mixtures (2 μg) were loaded onto the trap column at a flow rate of 10 μl/min for 2 min using a Thermo Scientific Easy nanoLC 1000. Before gradient elution, the trap column was equilibrated with 12 μl at a maximum pressure of 500 bar. The analytical column was then equilibrated with 3 μl at a maximum pressure of 500 bar.

Peptides were eluted using the following five-step linear gradient (A: ddH2O with 0.1 % formic acid; B: ACN with 0.1 % formic acid): 0–10 min, 3–8 % B; 10–120 min, 8–20 % B; 120–137 min, 20–30 % B; 137–143 min, 30–90 % B; and 143–150 min, 90 % B. The column flow rate was maintained at 250 nL/min. The chromatographic system consisted of a trapping column (75 μm × 2 cm, nanoViper, C18, 3 μm, 100 Å) and an analytical column (50 μm × 15 cm, nanoViper, C18, 2 μm, 100 Å).

Data were collected on a Thermo LTQ-Orbitrap Velos Pro equipped with a Nanospray Flex ionization source and an FTMS (Fourier transform mass spectrometry) analyzer, combined with a Thermo LTQ-Orbitrap Elite equipped with an ion trap analyzer. For the FTMS, full MS scans were acquired at a resolution of 60,000 in positive polarity with profile data type. The top 20 ions were then isolated for MS/MS by CID (1.0 m/z isolation width, 35 % collision energy, activation Q 0.25, 10 ms activation time). The scan range was m/z 300–2000. The ion trap analyzer was operated with normal mass range, rapid scan rate, and centroid data type.
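The gradient program above is piecewise linear. The following sketch encodes its breakpoints and returns the %B delivered at any time point. It also computes the eluent volume pumped at the stated 250 nL/min. The breakpoints are taken directly from the text. The function and constant names are hypothetical, and the sketch ignores pump dwell volume.

```python
# (time in min, % solvent B) breakpoints read from the gradient described above
GRADIENT = [(0, 3), (10, 8), (120, 20), (137, 30), (143, 90), (150, 90)]
FLOW_NL_PER_MIN = 250  # column flow rate stated in the text


def percent_b(t_min, program=GRADIENT):
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def eluent_volume_ul(duration_min, flow_nl_per_min=FLOW_NL_PER_MIN):
    """Volume (μl) delivered over duration_min at a constant flow rate."""
    return flow_nl_per_min * duration_min / 1000
```

For example, `percent_b(65)` returns 14.0, midway through the 8–20 % segment. `eluent_volume_ul(150)` gives 37.5 μl for the whole run.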
Mass spectrometry data were analyzed using the SEQUEST HT search engine in a Proteome Discoverer 1.4 workflow (Thermo Fisher Scientific, Bremen, Germany). Datasets were searched against an N. lugens MRT peptide database of 17,902 sequences generated from the transcriptome unigenes. The search parameters were as follows:
- mass tolerances of 10 ppm for MS and 0.8 Da for MS/MS;
- trypsin as the proteolytic enzyme, with up to two missed cleavages;
- oxidation and deamidation as dynamic modifications;
- carbamidomethylation as a static modification.

Only high-confidence peptides were retained. The false discovery rate (FDR) was controlled at 1 % using a target-decoy strategy, in which searches were run against both the MRT peptide database and a decoy database.
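The following sketch illustrates the two numeric filters described above: the 1 % target-decoy FDR and the 10 ppm precursor tolerance. It is a generic illustration of the target-decoy idea and not Proteome Discoverer's own validator. It assumes each peptide-spectrum match (PSM) carries a single score, where higher is better, and a decoy flag. Function names are hypothetical.

```python
def fdr_score_cutoff(psms, max_fdr=0.01):
    """Lowest score cutoff at which the target-decoy FDR estimate is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    The FDR at a cutoff is estimated as (# decoy PSMs) / (# target PSMs) scoring
    at or above it. Returns None if no cutoff satisfies max_fdr.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the lowest score that still keeps the estimated FDR in bounds.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def ppm_error(observed_mz, theoretical_mz):
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6
```

Target PSMs scoring at or above the returned cutoff would be accepted. Taking the lowest qualifying cutoff is equivalent to accepting PSMs with a q-value ≤ `max_fdr`. A precursor would pass the MS tolerance if `abs(ppm_error(obs, theo)) <= 10`.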
Identification of seminal fluid proteins of N. lugens
High-confidence proteins were identified using the following criteria:

1) Proteins identified in at least two replicates of a sample type (at least two MAG, two unmated-FRT, or two mated-FRT samples) were considered “true” detections.

2) Seminal fluid proteins must have been identified in both MAG and mated-FRT samples.

3) Secretion signal peptides of detected proteins were predicted using SignalP 4.1 (www.cbs.dtu.dk/services/SignalP/). For some sequences, the CDS prediction was truncated at the N-terminus, so no signal peptide could be predicted. For proteins without a predicted signal peptide, we therefore re-predicted the CDS from the unigene nucleotide sequences using ESTScan (http://myhits.isb-sib.ch/cgi-bin/estscan). We then reran SignalP on the new CDS sequences to improve signal peptide detection. Proteins possessing a signal peptide were considered to be secreted.

4) Some proteins did not possess a signal peptide. Proteins without signal peptides that were not detected in unmated-FRT samples and showed male-specific expression (analyzed as described previously [13]) were also predicted to be secreted.

5) Other proteins that were not predicted to be secreted but had homologues among the SFPs of other insects were classified as unconfirmed SFPs.
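The five criteria above form a decision procedure, sketched below. The record layout is hypothetical: it stores, for each protein, the number of replicates (out of three) in which it was detected, plus boolean flags for the SignalP, male-specific expression, and homology results. Two interpretations are assumptions. A protein counts as detected in a sample type when it appears in at least two replicates. "Not detected in unmated-FRT samples" is read as absent from all unmated-FRT replicates.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    # Hypothetical record; *_reps = number of replicates (out of 3) with detection.
    mag_reps: int
    mated_frt_reps: int
    unmated_frt_reps: int
    has_signal_peptide: bool  # after ESTScan re-prediction where needed
    male_specific: bool
    sfp_homolog_in_other_insects: bool


def classify(p, min_reps=2):
    """Apply the SFP criteria in order; returns 'SFP', 'unconfirmed SFP' or 'not SFP'."""
    # Criteria 1 and 2: reliably detected in both MAG and mated-FRT samples.
    if not (p.mag_reps >= min_reps and p.mated_frt_reps >= min_reps):
        return "not SFP"
    # Criterion 3: a predicted signal peptide implies secretion.
    if p.has_signal_peptide:
        return "SFP"
    # Criterion 4: no signal peptide, absent from unmated FRT (assumed: zero
    # replicates), and male-specific expression.
    if p.unmated_frt_reps == 0 and p.male_specific:
        return "SFP"
    # Criterion 5: not predicted secreted, but homologous to other insects' SFPs.
    if p.sfp_homolog_in_other_insects:
        return "unconfirmed SFP"
    return "not SFP"
```

Ordering the checks this way mirrors the text: criterion 5 applies only to proteins that failed the secretion tests in criteria 3 and 4.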
Annotation of seminal fluid proteins and comparison with other insects
In addition to automated annotation, we manually annotated the detected sequences. BLAST results from NCBI, conserved domains, and GO terms were combined to annotate proteins. Brief descriptions from NCBI, SMART (http://smart.embl-heidelberg.de/) descriptions of conserved domains, and functional descriptions of gene names from UniProtKB (http://www.uniprot.org/help/uniprotkb) were used to classify the function of each sequence. Based on these matches, proteins were classified into one of the following categories:
- cell structure (including cell structure proteins and their binding proteins);
- metabolism;
- protein modification machinery;
- proteolysis regulators (proteases and protease inhibitors);
- signal transduction (including hormones);
- transporters and protein export machinery;
- RNA and protein synthesis (transcription factors, transcription machinery, and protein synthesis enzymes).

Proteins belonging to other functional categories were classified as “other” (including salivary proteins, chitin-binding proteins, binding proteins, proteasome machinery, protein kinases, ubiquitination pathway proteins, protein phosphatases, and oxidoreductases). Proteins that could not be assigned a function were classified as “unknown”.
Seminal fluid proteome sequences of D. melanogaster, A. aegypti, A. albopictus, A. mellifera, and Homo sapiens [28] were chosen for comparison with N. lugens SFPs. The sources were as follows:
- D. melanogaster SFP sequences were extracted from FlyBase (http://flybase.org/) using the IDs given in reference [16].
- A. aegypti SFP sequences were extracted from Ensembl Metazoa (http://metazoa.ensembl.org/info/website/ftp/index.html) using the IDs given in reference [18].
- A. albopictus SFP sequences were taken directly from reference [19].
- A. mellifera SFP sequences were extracted from NCBI (http://www.ncbi.nlm.nih.gov/sites/batchentrez) using the IDs given in reference [8].

Signal peptides of these proteins were identified as described in 2.5. Conserved domains were predicted using the Batch Web CD-Search Tool (http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). N. lugens SFPs sharing a conserved domain with SFPs of other insects were marked as “Domain”. The remaining proteins with blastp hits (E-value < 10⁻⁵) to other insect SFPs were marked as “Blast”. The same method was used to compare SFPs between insect species.
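The "Domain" and "Blast" labels follow a simple precedence rule, sketched below. A shared conserved domain takes priority. Otherwise, a blastp hit below the E-value cutoff is required. The sketch assumes that domain IDs from the CD-Search output have already been collected into sets and that the best blastp E-value per N. lugens protein is known. The function name and inputs are hypothetical.

```python
def mark_homology(query_domains, other_sfp_domains, best_blast_evalue, max_evalue=1e-5):
    """Label an N. lugens SFP by its relationship to other insects' SFPs.

    query_domains: set of conserved-domain IDs predicted for the N. lugens protein.
    other_sfp_domains: set of conserved-domain IDs found among the other species' SFPs.
    best_blast_evalue: lowest blastp E-value against those SFPs, or None if no hit.
    Returns "Domain", "Blast", or None.
    """
    # A shared conserved domain takes precedence over a sequence-level hit.
    if query_domains & other_sfp_domains:
        return "Domain"
    # Otherwise fall back to a significant blastp hit.
    if best_blast_evalue is not None and best_blast_evalue < max_evalue:
        return "Blast"
    return None
```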
To locate the detected proteins in the N. lugens genome scaffold sequences, we ran megablast (E-value < 10⁻²⁰, identity > 95 %) with the unigene sequences encoding the detected proteins against the scaffold sequences.
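The cutoffs above can be applied to BLAST+ tabular output. Such output could, for example, be produced with `blastn -task megablast -query unigenes.fa -db scaffolds -evalue 1e-20 -perc_identity 95 -outfmt 6`. This command and its file names are illustrative, not the authors' command. The sketch below parses the 12 default `-outfmt 6` columns and keeps hits that pass both thresholds. The function name is hypothetical.

```python
import csv
import io

# Default BLAST+ tabular (-outfmt 6) columns, in order
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def locate_on_scaffolds(tabular_text, max_evalue=1e-20, min_identity=95.0):
    """Return {query: [(scaffold, sstart, send), ...]} for hits passing both cutoffs."""
    hits = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) < max_evalue and float(rec["pident"]) > min_identity:
            hits.setdefault(rec["qseqid"], []).append(
                (rec["sseqid"], int(rec["sstart"]), int(rec["send"])))
    return hits
```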
Phylogenetic analysis
The functional serine protease domains of N. lugens seminal fluid trypsins were aligned with those of seminal fluid trypsins from other insect species using ClustalX. A phylogenetic tree was constructed by the maximum likelihood (ML) method in MEGA 5.05 (http://www.megasoftware.net/). Node support was assessed by bootstrap analysis with 1000 replicates.
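Bootstrap analysis works by rebuilding the tree on many pseudo-replicate alignments. Each pseudo-replicate is made by resampling alignment columns with replacement. Node support is then the percentage of replicate trees that recover a given clade. The following sketch shows only the resampling step, which MEGA performs internally. Tree inference on each replicate is not shown. The function names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement.

    alignment: list of equal-length aligned sequences (one string per taxon).
    """
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Every taxon uses the same sampled column indices, so sites stay aligned.
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_replicates(alignment, n=1000, seed=1):
    """Generate n pseudo-replicate alignments with a reproducible random seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```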
Reverse-transcription quantitative PCR (RT-qPCR) analysis
MRTs and their component tissues (testes, vas deferens, and male accessory glands) (Fig. 1) were dissected from males, and FRTs were dissected from females (all 4–7 days post-eclosion). Because the mRNA yield from a single tissue is extremely low, tissues from 40 individuals were pooled for each tissue sample. RT-qPCR was performed according to the method of [29]. Primers used in RT-qPCR for the tissue-specific expression of seminal fluid protein genes are given in Additional file 1: Table S1.
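The source defers the RT-qPCR quantification to [29] and does not state the calculation. Relative quantification of tissue-specific expression is often done with the 2^−ΔΔCt method, sketched below. This sketch assumes that method, roughly 100 % amplification efficiency, and an unspecified reference gene. The function name and Ct values are hypothetical.

```python
def relative_expression_ddct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """2^-ΔΔCt relative expression of a target gene versus a calibrator tissue.

    ct_target / ct_ref: Ct values of the target and reference gene in the sample tissue.
    ct_target_cal / ct_ref_cal: the same two Ct values in the calibrator tissue.
    Assumes ~100 % amplification efficiency for both genes.
    """
    d_ct_sample = ct_target - ct_ref          # normalize to the reference gene
    d_ct_cal = ct_target_cal - ct_ref_cal
    dd_ct = d_ct_sample - d_ct_cal            # compare with the calibrator tissue
    return 2 ** (-dd_ct)
```

For example, suppose a target has a ΔCt of 5 in the MAG and 7 in a calibrator tissue. Its ΔΔCt is then −2, which corresponds to 4-fold higher relative expression in the MAG.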