Stem cells, generation of macrophages and experimental media
The human embryonic stem cell line HUES-2 was obtained from the HUES Facility, University of Harvard [36]. Feeder-free PSC cells were cultured in mTeSRTM-1 medium (Stem Cell Technologies) on Matrigel (Corning)-coated tissue culture dishes, passaged with TrypLE (Invitrogen) with the addition of 10 μmol/L Rho-kinase inhibitor Y-27632 (Abcam). A double-nicking CRISPR-Cas9 approach was used to generate SAMHD1-knockout stem cell lines [37]. Plasmid pX462 (gift from Feng Zhang; Addgene plasmids cat. 48141, [38]), expressing the guide RNA, D10A-mutated Cas9 and a puromycin-selection cassette was adapted to target SAMHD1 at exon 4 (GTGTATCAATGATTCGGACGAGG and CGATACATCAAACAGCTGGGAGG; PAM underlined) or exon 5 target sites (CGTTCACTTATCTGCAGCTCTGG and GGATGTCTAGTTCACGCACTGGG; PAM underlined) using protocols previously described [38, 39]. PSCs were transfected with all four plasmids targeting SAMHD1 using the Neon® Transfection system (Invitrogen) according to manufacturer’s guidelines (2 × 106 cells electroporated with 15 μg DNA using a 100 μL tip at 1000 V, 40 ms pulse width, one pulse), cultured without antibiotics for 48 h and then for 48 h with selection in 0.4 μg/mL puromycin (Sigma). Single-cell clones were generated by plating transfected PSCs at low density onto mitotically-inactivated mouse embryonic feeder (MEF) cells [40, 41] on gelatin-coated tissue culture plates in stem cell medium (KO-DMEM, 2 mmol/L L-Glutamine, 100 mmol/L nonessential amino acids, 20% serum replacement, and 8 ng/mL FGF2; Invitrogen). Clones with modifications at the SAMHD1 locus were identified by high resolution melt analysis. Sequencing confirmed that clone E2 had an out-of-frame insertion (29 and 49 bp) into each allele of SAMHD1 exon 4 and G9 had an out-of-frame deletion (43 bp) in one allele and an in-frame deletion (39 bp) that deleted the essential allosteric GTP binding site (amino acids 135 to 147) and would alter the catalytic site. Macrophages derived from these clones were screened for SAMHD1 expression by western blotting using a mouse anti-SAMHD1 antibody (clone 2D7, Insight Biotechnology Ltd) and a rabbit anti-GAPDH antibody (Sigma) (Additional file 1: Figure S1).
Feeder-free PSC cells were cultured in mTeSRTM-1 medium (Stem Cell Technologies) on Matrigel (Corning)-coated tissue culture dishes, passaged with TrypLE (Invitrogen) with the addition of 10 μmol/L Rho-kinase inhibitor Y-27632 (Abcam). A protocol devised in our laboratory was used to generate macrophages from PSC cultures. Briefly, embryoid bodies were formed using the spin method in AggreWells™800 (Stemcell Technologies) plates, each of which was split into two monocyte factories in T175 tissue culture flasks containing ~150 embryoid bodies. The monocytes released into the supernatant were harvested regularly and plated into 96-well plates at 5 × 104 cells per well in macrophage differentiation medium consisting of XVIVO™15 (Lonza) supplemented with 100 ng/mL M-CSF (Invitrogen), 2 mM glutamax (Invitrogen), 100 U/mL penicillin and 100 μg/mL streptomycin (Invitrogen). Four days after plating the media was replaced with fresh macrophage differentiation media with additional 10% fetal bovine serum (FBS; Invitrogen) and the cells were used on day 7 of differentiation. We used three batches (A, B, G, each corresponding to one AggreWells™800 plate) of wild-type cells and two batches (G9: C, D and E2: E, F) each of the two SAMHD1-knockout clones for the Polaris macrophage stimulation experiments.
Four kinds of media (‘conditioned’ and ‘standard’, with or without LPS) were used in the microfluidic chips. To generate conditioned media, stem cell-derived monocytes were plated at 5 × 105 cells/well in a 12-well tissue culture plate and differentiated for 4 days in macrophage differentiation medium followed by 3 days in the presence of 10% FBS. The medium was then replaced with fresh macrophage medium plus 10% FBS with or without 100 ng/mL LPS (Sigma). “Standard media” were generated by incubation of macrophage differentiation medium plus 10% FBS either with or without 100 ng/ml LPS at 37 °C to simulate incubation of conditioned media with cells. For each kind of medium, after 24 h with LPS/mock stimulation, supernatants were recovered by centrifugation, 0.45 μm-filtered and stored at −80 °C then thawed and clarified by centrifugation at 14,100 rcf for 10 min before use.
Polaris™ protocols
Polaris runs followed the protocol ‘Using Polaris to Generate Single-Cell cDNA Libraries for mRNA Sequencing’ (PN 101–0082 A1, Fluidigm). Set-up parameters are shown in Table S1. Polaris integrated fluidic circuits (IFCs) were prepared for cell capture by a priming step, during which the capture chambers were also coated with the extracellular matrix compound fibronectin (25 ng/μl, cat. F4759, Sigma-Aldrich) for handling adherent cells, and capture beads (prepared to Fluidigm specifications) were loaded to prevent the release of captured single cells (Fig. 1). Priming was arranged to finish as the cell mix became ready for loading (Additional file 1: Figure S2A).
For each experimental run, one well of cells from a single batch each of wild-type and knockout cells was used. One sample was stained with CellTracker™ Orange CMRA Dye and the other with both CellTracker™ Orange CMRA and CellTracker™ Green CMFDA Dye (cat. C34551 and C7025, Thermo Fisher Scientific) in Wash Buffer (Fluidigm). Dye concentrations were adjusted upwards between runs during the experiment to improve sensitivity and resolution: 1 μM for runs 1 and 2; 2 μM for runs 3 to 12; and 3 μM for runs 13 to 20. Wild-type and knockout cells were dual- or single-stained in approximately equal numbers of experiments (Additional file 1: Figure SB). The culture medium was removed and the cells were washed twice with 250 μl Wash Buffer (Fluidigm) then stained with 100 μl of staining solution at 37 °C for 15 min before a further wash with 150 μl of Wash Buffer. The Wash Buffer was removed, 100 μl of 0.5 mM EDTA (cat. 15575, Gibco) in PBS was added and the cells were incubated at 37 °C for 15 min before the EDTA was removed and the cells were resuspended gently in 50 μl of Feed Media (X-VIVO™ 15 cat. BE02-061Q (Lonza), 1% Penicillin-Streptomycin (10,000 U/mL), cat. 15140–122, (Gibco)). Cells were counted on a TC-20TM Automated Cell Counter (Bio-Rad) and samples were adjusted to a final concentration of 350–400 cells/μl.
Fluidigm guidelines (Fluidigm Single-Cell Preparation Guide, PN 100–7697) were used to establish optimal buoyancy at an 4:1 (cells:cell suspension reagent) ratio for the Polaris experiment. For each run, 15 μl each of differentially stained wild type and knockout cells were mixed with 7.5 μl of Cell Suspension Reagent (Fluidigm, PN 101–0434) 25 μl of the resulting suspension, containing an estimated 7000–8000 cells, was loaded on the Polaris IFC. On-board imaging settings to control automatic cell selection were set for each run, with a threshold that varied between 4000 and 6000 for a constant 1.0 s exposure. After completion of the cell selection step, the IFC was removed from the Polaris system and placed in an incubator at 37 °C and 5% CO2 for 2 h to allow the cells to settle prior to dosing. Dosing culture media were centrifuged in small volumes (500 μl) at 14,100 rcf for 10 min before 27 μl of medium and 200 μl of Feed Media were loaded into the appropriate wells of the IFC (Additional file 1: Figure S2C). For 1-h dosing runs, the dosing step was stopped manually and for 8-h dosing runs, cells were also dosed at 4 h. Successive runs were scheduled to alternate between one and eight hour dosing.
The Polaris acquired images of all 48 chambers in all available fluorescence channels during cell capture, at the start of dosing and every 1 h thereafter until the end of the run. IFCs were removed from the Polaris for additional, high-resolution imaging on a Leica TCS SP8 confocal microscope (Leica Microsystems) at 3 stages of each run: after cell capture, after the 2-h incubation and at the end of the dosing step, just before lysis. (Additional file 1: Figure S3A). The imaging protocol was designed to acquire bright-field and fluorescence images, for CellTracker Orange and CellTracker Green: 488 nm and 561 nm excitation and hybrid detectors for emission at 500–550 nm and 571–630 nm respectively, were used with a 20× objective and a template to automatically locate the 48 IFC cell isolation chambers, in an image acquisition process lasting ~5 min during which the microscope chamber temperature was maintained at 37 °C. The black vinyl film on the lower surface of the Polaris chip was removed temporarily for each imaging stage. For each time-point and channel, the Polaris stored a single image of the IFC’s 48 cell chambers, 5200 × 1000 pixels in size, in which each pixel had dimensions of approximately 5.5 × 5.5 μm. On the Leica, one image of 512 × 512 pixels, each 1.39 × 1.39 pixels, was captured per cell chamber in bright field, orange and green channels. The initial aims of the image analysis were to differentiate double- and single-stained cells (wild-type and knockout or vice-versa, depending on the run) and to assess cell shape, motility and phagocytosis behaviour. The R package EBImage [42] was used with a set of bespoke R functions to automatically identify individual cell chambers and the cells they carried. Formally, within the large Polaris image(s), cells were detected as clusters of 4 or more of the brightest 30 pixels in the trimmed image corresponding to each cell chamber, for each fluorescent channel. Motility was assessed by measuring changes in the cluster positions between imaging time-points. To assess cell morphology (circularity, the proportion of pixels within the smallest circle encompassing the cluster that belong to the cluster), the higher-resolution Leica images were analysed using a re-scaled version of the same algorithm, requiring clusters of 60 or more of the brightest 411 pixels. Automated imaging assessments were checked extensively by eye, leading to the identification of irregularly shaped or unevenly stained cells. Macrophage phagocytosis of capture beads was assessed manually.
For each Polaris run, a bulk control sample of the wild-type and of the knockout cells used in the run was prepared for sequencing. Aliquots containing 2000–4000 of the stained cells loaded on the Polaris were kept at room temperature, incubated along with the loaded IFC for the 2-h incubation step, and kept at 37 °C during dosing. Within 30 min of starting the Polaris lysis step, bulk-cell samples were lysed and processed using the RNA extraction kit RNeasy Micro Kit (cat# 74004, Qiagen) and eluted in 14 μl of RNAse-free water. At the end of the dosing protocol and after any additional imaging, cells were lysed for reverse transcription and amplification for cDNA generation in the Polaris. Master mixes for this procedure were prepared using the SMARTer Ultra Low RNA Kit (Clontech), according to the Fluidigm Polaris protocol with minor modifications. The cell lysis mix (28 μl) contained 8.0 μl Polaris Lysis Reagent, 9.6 μl of a 1/200 dilution of Polaris Lysis Plus Reagent (in PCR-grade water, prepared immediately before use), 9.0 μl SMARTer kit 3′ SMART CDS Primer IIA and (for runs 1–11) 1.4 μl of diluted ERCC spike-in RNA, prepared by adding 1.0 μl of a 1/10 dilution of ERCC ExFold RNA Spike-In Mixes (cat. 4456739, Ambion) to 96.5 μl Polaris Loading Reagent and 2.5 μl SMARTer Kit RNase Inhibitor (40 U/μl). The reverse transcription reaction mix (48 μl) contained 15.5 μl 5× First-Strand Buffer (RNase-free), 1.9 μl DTT, 7.7 μl dNTP mix, 7.7 μl IIA Oligonucleotide, 1.9 μl RNAse Inhibitor, 7.7 μl SMARTscribe Reverse Transcriptase (all SMARTer Kit, Clontech), 2.4 μl Polaris Loading Reagent and 3.2 μl Polaris RT Plus Reagent. The PCR mix (90 μl) contained 63.5 μl PCR-grade water, 10.0 μl 10× Advantage 2 PCR Buffer (not SA – Short Amplicon), 4.0 μl 50× dNTP Mix, 4.0 μl 50× Advantage 2 Polymerase Mix (all Advantage 2 Kit, Clontech) and 4.0 μl SMARTer IS PCR primer. The cDNA products were harvested into a 96-well plate in the arrangement shown in Additional file 1: Figure S2D. Harvest wells with atypical volumes (some with no material, others with an excess) were excluded from further analysis. Bulk control samples comprising 1 μl of RNA extracted from the bulk samples above were each mixed with 4.5 μl of cell lysis mix and processed using the following temperature sequence: 37 °C for 5 min, 72 °C for 3 min, 25 °C for 1 min, then 4 °C (hold), then reverse transcribed using 9.0 μl of the reaction mix, at 42 °C for 90 min followed by enzyme inactivation at 70 °C for 10 min, then 4 °C (hold). Bulk-sample PCRs contained 1 μl of cDNA generated in the previous step and 9.3 μl PCR mix. PCR conditions were as follows: 95 °C for 1 min then 5 cycles of {95 °C/20 s, 58 °C/4 min, 68 °C/6 min}, 9 cycles of {95 °C/20 s, 64 °C/30 s, 68 °C/6 min}, 7 cycles of {95 °C/30 s, 64 °C/30 s, 68 °C/7 min}, then 72 °C for 10 min and 4 °C (hold). After Quant-iT™ PicoGreen (Thermo Fisher) normalization of cDNAs to 0.22 ng/μl, Illumina sequencing libraries were prepared together using Nextera XT DNA Library Preparation Kit (Illumina), according to the manufacturer’s specifications, at quarter-scale on a Beckman-Coulter FXp automated liquid handling instrument, and pooled as up to 192 multiplexed libraries using in-house dual-indexing library tags. Some samples that had failed because of empty cell chambers or the presence of more than one cell in a chamber were included, and two bulk-cell samples per run were included in each pool, which was sequenced as a single lane of 75 b paired-end reads on an Illumina HiSeq 4000 instrument in 4 different runs.
RNA-Seq data generation and initial quality control
Samples were prepared and paired-end sequenced using the Illumina HiSeq™4000 Sequencing platform, as described above. RNA-seq reads were trimmed for Nextera/Illumina adapter sequences using skewer-v0.1.125 [43]. Trimmed reads were mapped to a modified reference genome comprising the human genome, Homo sapiens GRCh37, and fasta sequences for ERCC spike-ins (ThermoFisher). Reads in gzipped fastq format were aligned using Hisat2 version-2.0.0-beta [44] with default parameters. Duplicate reads were marked using MarkDuplicates.jar implemented in Picard tools v1.92. BAM alignments were name sorted with Samtools version 1.1. Alignment metrics were calculated using CollectRnaSeqMetrics.jar implemented in Picard tools v1.92 for full BAM files and with potential PCR duplicates marked. RNA-SeQC [45] was used to calculate sequencing bias, as the median estimators in Picard can result in zero estimates. Reads mapping uniquely to genes annotated in ENSEMBL release 76 were counted using featureCounts [46] implemented in subread-v1.5.0 [47]. Read distribution between various features - assigned reads (mapped uniquely to exons), multiple mapping, ambiguous mapping, No features (mapped uniquely to intronic and intergenic regions) – was obtained from featureCounts results. Read counts were normalized to Transcripts per million (TPM), and number of detected genes per sample were calculated by counting genes with at least 1 TPM. Details of the supplied data and meta-data provided are provided in Additional file 1, including further QC using the R scater package [48]. In brief, we excluded culture chambers with visually confirmed doublets (two cells), numbers of detected genes more similar to bulk controls, and cells with very low starting cDNA. FASTQ files for the data have been uploaded to the Gene Expression Omnibus (GSE87849).
Clustering the macrophages and gene expression modeling
All modeling and analyses were performed in the R environment (version 3.2.1, x86_64-pc-linux-gnu 64-bit, Ubuntu 14.04.2 LTS). Multidimensional scaling (MDS) on the cell rank correlations - without gene weighting - was used to reduce one hour and eight hour cells to five dimensions, as detailed in the provided Additional file 1 code wrapped in the cellStates() function. Inclusion of a greater number of dimensions did not influence the clustering. Additional file 1: Figure S16 & S17 demonstrate the cell densities and individual cells for the five dimensions in the eight hour cells, where some clusters of cells can be seen to be unique to individual chips. To ensure reproducibility, the constraint was added that a cluster with fewer than three cells in more than half of the replicates not be considered in the downstream modelling. While such groups of cells may represent illuminating features in the macrophage model’s behaviour, these are not reproducible effects.
Extracting major cell states was performed using a hybrid of model-driven clustering (Gaussian mixtures) and non-parametric clustering (partitioning around medoids), as detailed in the Additional file 1 code wrapped in the consensusCluster() function. Repeated fitting of mixtures to the data - with one chip replicate omitted per iteration - was used to produce a consensus matrix of the proportion of iterations cells share clusters. This was non-parametrically re-clustered, selecting the maximum number of clusters for which experimental replication was strongly represented in each cluster. We note that although this combination of methods was employed, other tested approaches such as hierarchically clustering the consensus matrix produced similar results due to the consensus and reproducibility constraints. The defined clusters are recorded in the cell meta-data (Additional file 2). The major latent cell states identified with exploratory data analysis remain the only two identified major states in the one hour cells, with a third reproducible cluster emerging at eight hours that shares properties with the main group of cells (Additional file 1: Figure S18).
Altered gene expression was modelled as the change in mean conditioned on (i.e. tested per) cell subtype of interest. Overall mean across cell subtypes (μ) was used as a measure of global shift in gene expression, while mean absolute deviation (MAD) of the subtypes was used as a measure of variability and so context/subtype specificity. For example, for di = change in mean expression over time for cell subtype i, μ = (d1 + d2 + .. + dn)/n and MAD = (|d1-μ| + |d2-μ| + .. + |dn-μ|)/n. More robust estimators, such as the use of quantiles, provided similar top hits, so here we present μ and MAD estimates, focusing rather on a rank product framework to determine statistical significance of rank reproducibility per sequencing library [15]. A focus on gene ranks has several advantages well suited to single-cell work: it is non-parametric, robust, comments directly on reproducibility, and allows data fusion or meta-analysis without the need for complex data normalisations. For computational speed, the Heskes rank product algorithm was used to assess bounds on the statistical significance. Reported p-values are the geometric means of the upper and lower bounds provided by the rankprodbounds() function, with the qvalue package used to estimate global significance at 5% false discovery rate [49]. Rank (Spearman) correlations were used to estimate gene co-expression, followed by testing for co-expression signature enrichment with Preranked Gene Set Enrichment Analysis (default settings) [23].