Establishing bee populations, HB testing, breeding and sample collection
The collection of honey bee samples, field testing and breeding was performed at two breeding locations in Western Canada, one near Grand Forks, BC (49°N, 118°W), the other at the Research Farm of Agriculture and Agri-Food Canada in Beaverlodge, AB (55°N, 119°W). An initial experiment was performed in BC as part of the BC Bee Breeders Association Queen Testing Project to confirm that hygienic behavior could be selectively bred for in our apiaries. In this experiment, selection to enrich for HB was based on field testing using the freeze-killed brood assay explained below. A second experiment was performed both in BC and AB with the aim to correlate proteome profiles with field test results in search for biomarkers of HB. For this breeding and proteomic experiment, the year 1 (Y1) colonies in BC included stock spanning a range of HB and Varroa resistance, including local and broader Canadian stock selected for mite resistance or HB, as well as descendants of a close-mated population at the University of Minnesota inbred for hygienic behavior [20,50] and VSH lines [51,52]. The starting colonies in AB consisted of eight populations described previously [27] of which five were sampled for proteomic analysis; these originated from Ontario (ON), California 1 (CA1), California 2 (CA2), Chile (Ch), and Saskatchewan (SK). Instrumental insemination (ii) was used for all breeding, except for the Y2 breeding in BC. Instrumental insemination of virgin queens from high- and low-scoring colonies followed a partial diallel cross design [53] which created high and low scoring colonies, as well as hybrids, intended to facilitate the identification protein expression patterns unique to HB. In Y3, we also performed ii of virgin queens in BC to evaluate the heritability of the protein markers identified. All inseminated or closed mated queens were introduced into new colonies. After the queens started laying, colonies were allowed to develop for at least six weeks to allow worker populations to turn over before they were tested for HB and antennae were collected for proteomic analysis. Colonies were assessed for HB using the freeze-killed brood method [14], where the proportion of sealed cells that nurse bees uncap (uncapped, U) and remove dead pupae from (removed, R) is counted at 24 and 48 h using two separate tests performed one week apart on each colony. For proteomic analysis, antennae were cut from adult workers sampled from brood frames (three pools of ten bees from each colony). Invertebrate research (except on cephalopods) does not require ethics certification at our institution.
Reagents
All chemicals used were of analytical grade or better and all solvents were of HPLC-grade or better; all, with the exceptions specified below, were obtained from ThermoFisher-Scientific (St. Waltham, MA, USA). Chemicals for protein expression and purification and for binding assays were purchased from Sigma-Aldrich and were of reagent grade, with the exception of selected compounds used in binding assays, that were prepared using conventional synthetic routes. Selected reagents were purchased from the following commercial sources: Endopeptidase Lys-C (Wako Chemicals, Osaka, Japan); porcine modified trypsin (Promega, Nepean, Ontario, Canada); loose ReproSil-Pur 120 C18-AQ 3 μm (Dr Maisch, Ammerbuch-Entringen, Germany); 96-well full skirt PCR plates (Axygen, Union City, CA, USA); fused silica capillary tubing (Polymicro, Phoenix, AZ, USA); protease inhibitor mixture (Roche Applied Science, Basel, Switzerland); NuPAGE Novex BisTris Gels (Invitrogen, Carlsbad, CA, USA). All cloning enzymes were from New England Biolabs. Oligonucleotides were custom synthesized at Eurofins MWG GmbH, Ebersberg, Germany.
Matrix for sample analysis
The isotopic labelling strategy employed here is limited to triplexing so to enable a comparison of the protein expression profile for one colony to all others we employed a design similar to what we have done previously that maximized the statistical power of the experiment to detect effects in our parameters of interest [27]. We collected three replicate samples from each colony, grouped the samples in blocks of three, assigned one of the three isotopic labels to each sample, and assigned colonies to blocks in order to minimize the variance of the hygienic behavior variables. We constrained the experiment so no two colonies from the same population were in the same block and no two samples from the same colony were assigned the same isotopic label. This ensured the experimental design did not confound the hygienic behavior effect with the bee population or the isotopic label.
Protein preparation for mass spectrometry
Bee antennae samples were washed three times with phosphate-buffered saline (PBS) and bead-homogenized in buffer (50 mM Tris-Cl, 150 mM NaCl, 1% NP-40, 1% DTT) for three 20 s bursts at 6.5 M/s, with 1 min rest on ice between each burst. Insoluble material was pelleted at 600 relative centrifugal force (RCF) and protein was precipitated from the supernatants using 800 μL of ethanol, 20 μL of 2.5 M sodium acetate (pH 5.5) and 2 μL of glycogen (10 mg/ml). The precipitation was allowed to proceed at room temperature for 90 min. After centrifugation at 16,000 r.c.f. for 15 min, the pellets were dried and solubilized in buffered urea (6 M urea, 2 M thiourea, 100 mM Tris-Cl at pH 8.0, 20 mM DTT). Any insoluble material was then removed by centrifugation at 16,000 r.c.f. for 15 min. Protein concentrations were measured by a micro Bradford assay using serial dilutions of bovine serum albumin to generate a standard curve. Protein samples were resolved on 1-D Nu-PAGE (Invitrogen) gels and visualized with Coomassie Safe Blue (Pierce) to check the protein stability and quantity. For each sample, 20 μg of protein was diluted to 1 μg/μl in urea buffer (6 M urea, 2 M thiourea, 100 mM Tris-Cl, pH 8.0) before digestion [54].
Peptide clean-up and labelling
Ten micrograms of digested peptides were purified with STop And Go Extraction (STAGE) tips [55] and labelled via reductive dimethylation using formaldehyde isotopologues [56,57]. In each triplex block one sample received 10 μl of 200 mM CH2O (light) and 1 μl of 1 M NaBH3CN, one received 10 μl of 200 mM C2H2O (medium) and 1 μL of 1 M NaBH3CN and one received 10 μL of 200 mM 13C2H2O (heavy) and 1 μL of 1 M NaBH3CN. The labelling reaction was performed twice on each sample for 1 h each. The reactions were terminated by the addition of 20 μL of 3 M NH4Cl. Samples were adjusted to pH <3 by adding sample buffer (3% (w/v) acetonitrile, 1% (v/v) trifluoroacetic acid, 0.5% (v/v) acetic acid). Finally, 4 μg of each of the three differentially-labeled samples were combined and cleaned up again with a STAGE tip; two technical replicates were prepared for each block, with one to be analyzed on the LTQ-FT and the other on the LTQ-Orbitrap. Samples were stored on STAGE tips at 4°C as needed.
Liquid chromatography-tandem mass spectrometry (LC-MS/MS)
Peptides were eluted from the STAGE tips using elution buffer (0.1% trifluoroacetic acid, 80% acetonitrile). Then they were dried and resuspended in sample buffer (1% trifluoroacetic acid, 3% acetonitrile, 0.5% acetic acid). LC-MS/MS was performed using an 1100 Series nanoflow high performance liquid chromatography system (Agilent Technologies) on-line coupled to a linear trapping quadrupole (LTQ)-Fourier transform (FT) or a LTQ-Orbitrap (ThermoFisher Scientific, Bremen, Germany). Peptide separation was performed by reversed phase chromatography using a 75 μm inner diameter fused silica emitter self-packed with 3 μm Reprosil-Pur C18-AQ resin (Dr. Maisch). Peptides were loaded in 4.8% (v/v) acetonitrile, 0.5% (v/v), acetic acid at 0.6 μL/min and then resolved at 200 nL/min for 75 min using a linear gradient of acetonitrile from 4.8% to 64% in 0.5% (v/v) acetic acid. Operating in data dependent mode, the LTQ-FT was set up to acquire full scan data in the FT detector over a mass range of 350–1600 m/z before performing FT selected ion monitoring (SIM) and MS/MS in the ion trap on the top 3 most intense multiply charged ions [58]. The LTQ-OrbitrapXL was set to acquire a full-range scan at 60,000 resolution from 350 to 1600 Th in the Orbitrap to simultaneously fragment the top five peptide ions in each cycle in the LTQ (minimum intensity 1000 counts). Parent ions were then excluded from MS/MS for the next 30 s. Singly-charged ions were excluded since in ESI mode peptides usually carry multiple charges. The Orbitrap was continuously recalibrated using the lock-mass function.
Protein identification and quantification
Fragment spectra peak lists were created using DTASuperCharge [59] with default parameters. For each block of samples, the peak list generated from LTQ-FT was combined with the peak list from LTQ-Orbitrap before performing Mascot search (v2.2) against the Honey Bee, A. mellifera Amel_4.0 translation (forward plus reversed sequences) of the genome with additional entries for human keratins, porcine trypsin and LysC. Tryptic cleavage rules (R/K, except preceding a P) were specified with up to two missed cleavages allowed. Carbamidomethyl (C) was set as a fixed modification, Acetyl (Protein N-term), Deamidated (NQ), Oxidation (M), Dimethyl (K), Dimethyl (N-term), Dimethyl: 2H(4) (K), Dimethyl: 2H(4) (N-term), Dimethyl: 2H(6)13C(2) (K), Dimethyl: 2H(6)13C(2) (N-term) as variable modifications. Peptide tolerance was set to 10 ppm and MS/MS tolerance was 0.6 Da for the initial search. After recalibration of systematic mass errors the peptide mass accuracy is typically <2 ppm. The false discovery rate within each block was limited to 1%, estimated by counting the number of ‘hits’ against the reversed sequences. Across the whole experiment, however, the FDR approaches zero since no reversed hits survived the filter requiring that a protein had to be detected in at least one quarter of all blocks. All peptides with an IonsScore ≥25 were quantified using MSQuant (v1.5) [59]; after automated quantitation all files were manually edited to ensure consistent quantitation and the peak area ratios were exported for further analysis. An in-house script, finalList.pl (available here: http://www.chibi.ubc.ca/wp-content/uploads/2013/01/finalList.pl_.txt) for applying parsimony (Occam’s razor) to generate a non-redundant list of identified proteins from a large pool of independent experiments was adapted to simultaneously calculate average peptide ratios for each protein in each block. All raw data are available from the Honey Bee Peptide Atlas (http://www.peptideatlas.org/builds/honeybee/) and ProteomeXchange (identifier PXD001616) while all the proteins identified and their quantitative ratios can be found in Additional file 4: Table S1.
Statistical analysis and marker selection
Identification of proteins whose expression patterns correlated with population of behavioral data was performed as described previously [27]. Briefly, logarithms of intensities were normalized by first subtracting the average of the three measurements in each block (for each protein independently) and then centering and standardizing within each label (across proteins) by the median and median absolute deviation. For each protein, a Linear Mixed Effects model was used to estimate the effect of each predictor variable, either population or hygienic behavior, on the protein expression level, adjusting for block and label factors. In the case of the BL-Y1 dataset, analysis of the effect of hygienic behavior was done adjusting for population of origin. For the predictor variables, an estimated effect, standard error and P-value were computed for each protein response. FDRs (Q-values) were computed for the set of P-values of a given predictor over all protein response variables to adjust for multiple comparisons. All calculations were performed in the R statistical language. In addition to selecting markers with low Q values from the BL-Y1 dataset, we used the Y1 colonies from the two different apiaries and selected proteins that had P < .05 across all field parameters in both datasets. After completing the proteomic analysis of all Y1, Y2 and Y3 datasets, we further selected proteins by ordering them based on an overall HB correlation factor The HB correlation was computed by adding a heritability factor to an average of the HB factors computed from each dataset. The HB factor in each dataset was calculated by combining a biological and statistical factor as detailed in the Additional 5. The heritability factor for each protein, was based on a regression of the protein level observed in the F1 daughters on the observed level in the paternal (Sir) and maternal (Dam) parent colonies (see Additional 5 for more details).
Expression clustering and gene ontology enrichment
SOTA (self-organizing tree algorithm) clustering was used to determine one side probability metrics for all thirty-eight population-significant (Q ≤ .05) proteins across all honey bee populations. Using MultiExperiment Viewer, six hard clusters were generated and hierarchical dendrograms for population and proteins were constructed using Euclidean distances [60]. For each cluster, gene ontology (GO) enrichment analysis was performed based on the Drosophila orthologs to the complete protein sequence of the bee proteins identified. DAVID (Database for Annotation, Visualization, and Integrated Discovery) [61,62] was used to calculate enrichments between protein lists of interest using the entire identified antenna proteome characterized here (470 proteins) as background.
Expression of recombinant proteins and binding assays
RNA extraction and cDNA synthesis
Total RNA was extracted using TRI® Reagent (Sigma), following the manufacturer’s protocol. cDNA was prepared from total RNA by reverse transcription, using 200 units of SuperScript™ III Reverse Transcriptase (Invitrogen) and 0.5 mg of an oligo-dT primer in a 50 μL reaction volume. The mixture also contained 0.5 mM of each dNTP (GE-Healthcare), 75 mM KCl, 3 mM MgCl2, 10 mM DTT and 0.1 mg/ml BSA in 50 mM Tris–HCl, pH 8.3. The reaction mixture was incubated at 50°C for 60 min and the product was directly used for PCR amplification or stored at −20°C.
Polymerase chain reaction
Aliquots of 1 μL of crude cDNA were amplified in a Bio-Rad Gene Cycler thermocycler, using 2.5 units of Thermus aquaticus DNA polymerase (GE-Healthcare), 1 mM of each dNTP (GE-Healthcare), 1 μM of each PCR primer, 50 mM KCl, 2.5 mM MgCl2 and 0.1 mg/ml BSA in 10 mM Tris–HCl, pH 8.3, containing 0.1% v/v Triton X-100. At the 5’ end, we used specific primers corresponding to the sequence encoding the first five amino acids of the mature protein. The primers also contained an NdeI restriction site, for ligation into the expression vector and providing at the same time the ATG codon for an additional methionine in position 1. At the 3’ end specific primers were used, encoding the last six amino acids, followed by a stop codon and an EcoRI restriction site for ligation into the expression vector. Therefore, we used the following primers for the each protein (enzyme restriction sites are italicized):
fwAmelOBP16: 5’- GAGGAATAACATATGACACATGAGGAATT -3’
rvAmelOBP16: 5’- GAATTCTTAGGAATTTAATATATCAGT -3’
fwAmelOBP18: 5’- GAGGAATAACATATGACACTTGAAGAATT -3’
rvAmelOBP18: 5’- GAATTCTTAGCCACTTAACATTTCTTT -3’
After a first denaturation step at 95°C for 5 min, we performed 35 amplification cycles (1 min at 95°C, 30 s at 50°C, 1 min at 72°C) followed by a final step of 7 min at 72°C. We obtained amplification products of 300–400 bp, in agreement with the expected sizes.
Cloning and sequencing
The crude PCR products were ligated into a pGEM (Promega) vector without further purification, using a 1:5 (plasmid:insert) molar ratio and incubating the mixture overnight, at room temperature. After transformation of E. coli XL-1 Blue competent cells with the ligation products, positive colonies were selected by PCR using the plasmid’s primers SP6 and T7 and grown in LB/ampicillin medium. DNA was extracted using the Plasmid MiniPrep Kit (Euroclone) and custom sequenced at Eurofins MWG (Martinsried, Germany).
Cloning in expression vectors
pGEM plasmids containing the appropriate sequences were digested with Nde I and Eco RI restriction enzymes for 2 h at 37°C and the digestion products were separated on agarose gel. The obtained fragments were purified from gel using QIAEX II Extraction kit (Qiagen) and ligated into the expression vector pET5b (Novagen, Darmstadt, Germany), previously linearized with the same enzymes. The resulting plasmids were sequenced and shown to encode the mature proteins.
Preparation of the proteins
For expression of recombinant proteins, each pET-5b vector containing the appropriate odorant-binding protein (OBP) sequence was used to transform BL21(DE3)pLysS and BL21(DE3)Rosetta-gami E. coli cells, for OBP18 and OBP16 respectively. Protein expression was induced by addition of IPTG to a final concentration of 0.4 mM when the culture had reached a value of O.D.600 = 0.8. Cells were grown for further 2 h at 37°C, in the case of OBP18, while they were grown overnight at 30°C for OBP16 expression. They were then harvested by centrifugation and sonicated. After centrifugation, OBP16 was soluble while OBP18 was present as inclusion bodies. To solubilize it, the pellet from 1 L of culture was dissolved in 10 mL of 8 M urea, 1 mM DTT in 50 mM Tris buffer, pH 7.4, then diluted to 100 mL with Tris buffer and dialysed three times against Tris buffer.
Purification of the proteins was accomplished by combinations of chromatographic steps anion-exchange resins, such as DE-52 (Whatman), QFF or Mono-Q (GE-Healthcare), followed by gel filtration on Sephacryl-100 or Superose-12 (GE-Healthcare) along with standard protocols previously adopted for other odorant-binding proteins [63,64]. The electrophoretic analysis of crude bacterial pellets and representative fractions from the last purification steps for OBP16 and OBP18 are shown in Additional file 6: Figure S1.
Fluorescence measurements
Emission fluorescence spectra were recorded on a Jasco FP-750 instrument at 25°C in a right angle configuration, with a 1 cm light path quartz cuvette and 5 nm slits for both excitation and emission. The protein was dissolved in 50 mM Tris–HCl buffer, pH 7.4, while ligands were added as 1 mM stock solutions in methanol.
Fluorescence binding assays
To measure the affinity of the fluorescent ligand 1-NPN (N-phenyl-1-naphthylamine) to each protein, a 2 μM solution of the protein in 50 mM Tris–HCl, pH 7.4, was titrated with aliquots of 1 μM ligand in methanol to final concentrations of 2–16 μM. The probe was excited at 337 nm and emission spectra were recorded between 380 and 450 nm. The affinity of other ligands was measured in competitive binding assays, using 1-NPN as the fluorescent reporter at 2 μM concentration and 2–16 μM concentrations of each competitor.
To avoid the artefact provided by the strong fluorescent signals observed in the presence of ligands capable of forming micelles, such as long-chain fatty acids, we used 0.2-0.6 μM concentrations of each competitor. In fact, when this happens, the probe can bind inside the hydrophobic core of the micelle, emitting a signal similar to that produced in the binding pocket of a protein.
For determining binding constants, the intensity values corresponding to the maximum of fluorescence emission were plotted against free ligand concentrations. Bound ligand was evaluated from the values of fluorescence intensity assuming that the protein was 100% active, with a stoichiometry of 1:1 protein:ligand at saturation. The curves were linearized using Scatchard plots. Dissociation constants of the competitors were calculated from the corresponding IC50 values (concentrations of ligands halving the initial fluorescence value of 1-NPN), using the equation: KD = [IC50]/1 + [1-NPN]/K1-NPN, [1-NPN] being the free concentration of 1-NPN and K1-NPN being the dissociation constant of the complex Protein/1-NPN.
Availability of supporting data
All mass spectrometry raw data used here is available in either Peptide Atlas (http://www.peptideatlas.org/builds/honeybee/) or the ProteomeXchange Consortium [65] via the PRIDE partner repository with the dataset identifier PXD001616”.