Overview of single-cell qPCR and RNA-Seq of developing mouse livers
To comprehensively understand the transcriptional program during liver development, we carried out single-cell transcriptomic analysis, including qPCR, on 722 cells and RNA-Seq on 559 cells derived from mouse fetal livers at eight developmental stages, including embryonic day (E) 11.5, 12.5, 13.5, 14.5, 16.5, 18.5 and postnatal day (P) 2.5 and P3.25 (Fig. 1a-b). We first randomly selected 467 single cells and then assessed them via single-cell qPCR with genes related to cell types and liver development (Fig. 1c and Additional file 1: Figure S1). We observed that hepatic marker genes such as Afp, Alb, Ttr and Serpina1a were highly expressed in some cells from E11.5 to E16.5 livers, which were later identified as hepatoblasts. However, a similar gene expression pattern was rarely observed in single cells from E18.5 and P2.5 livers (Additional file 1: Figure S1). After removing low quality libraries, we performed RNA-Seq on 415 single cells using the same cDNA libraries as qPCR. We proposed the molecular patterns for putative LSPCs after analysis of these cells and then collected 255 single cells from another batch of fetal livers as biological replicates, and 92 single cells were chosen for RNA-Seq (Fig. 1b). We also used flow cytometry to isolate Epcam+ cells from P3.25 livers, which were likely to be cholangiocytes [7, 18], and then sequenced 52 these Epcam+ single cells (Fig. 1b).
In this study, the median mapping rates of sequencing reads within each developmental stage ranged from 57% to 78%. The median numbers of unique mapped reads ranged from 1.1 to 3.8 million per cell. The median numbers of genes detected with confidence of fragments per kilobase of exon model per million (FPKM) > 1 ranged from approximately 3000 to 6000 for all stages except Epcam+ cells from P3.25 livers, which only showed a median number of around 2000 genes despite similar sequencing depth and mapping rate (Additional file 1: Figure S2a and Additional file 2: Table S1). The decreased number of genes expressed in Epcam+ cells from P3.25 livers could be due to their more differentiated status. We introduced ERCC RNA Spike-ins as technical controls, and high correlation coefficients among single cells at each stage based on the 92 Spike-ins were observed (Additional file 1: Figure S2b), indicating low technical noise in our data. We further quantitatively evaluated the correlation between RNA-Seq and qPCR data from the same single cells, and they were positively correlated with each other (Additional file 1: Figure S2c). Here, the median correlation coefficients between single-cell RNA-Seq and qPCR were approximately 0.9 for all stages (Additional file 1: Figure S2c).
Identification of LSPCs in developing mouse livers via marker-free transcriptomic profiling
Limited markers may lead to the incorrect identification of cell populations, and single-cell transcriptomic profiling facilitates ab initio cell-type characterization. Because a very large portion of E18.5 and P2.5 cells were mature hepatocytes (Additional file 1: Figs. S1, S5), we focused on single cells from E11.5 to E16.5 for cell type identification. To reduce the disturbance in the gene expression-based cell clustering analysis, the transcripts severely corrupted by technical noise were filtered via a statistical model (Additional file 1: Figure S2d). Subsequently, hierarchical clustering (HC) enabled decomposition of these cells into six groups which were later confirmed as endothelial cells, erythrocyte, hepatoblast, macrophage, megakaryocyte and mesenchymal cells (Fig. 2a and Additional file 1: Figure S3a-b), where the classification also could be clearly visualized with t-distributed stochastic neighbor embedding (t-SNE) plot (Fig. 2b).
Among the six groups, one group was classified as the endothelial lineage, while three were assigned to myeloid lineages of the hematopoietic system, including erythrocytic, megakaryocytic, monocytic or macrophagic (likely Kupffer cells) lineages based on specific markers (Fig. 2c and Additional file 1: Figure S3b) and the Gene Ontology (GO) enrichment analysis (Additional file 1: Figure S3c), consistent with the fact that fetal liver is the main hematopoietic organ during this developmental period. These single cells expressed known lineage-specific marker genes; for example, endothelial cells express Lyve1 and Kdr; erythrocytes highly express Hba-a1 and Hbb-bt; macrophages express Ptprc (Cd45), Cd68 and Cd52; and megakaryocytes express Itga2b and Itgb3.
One of the remaining two groups belonged to hepatoblasts that express hepatic markers such as Afp and Alb, stem/progenitor-related genes such as Gpc3 and Dlk1 (Fig. 2c and Additional file 1: Figure S3a), and genes related to liver functions such as lipid metabolism and blood coagulation (Additional file 1: Figure S3c). Interestingly, the other group expressed some stem/progenitor-related genes, such as Gpc3, Dlk1, Sox9 and Sox11, but had no or low expression of genes related to hepatic lineages (Additional file 1: Figure S3b).
As known, the liver bud is expanding in septum transversum mesenchyme at E11.5 stage, so we speculated that this un-identified group may be related to the mesenchymal phenotype. By checking the expression profiles of certain marker genes closely related to LSPCs and mesenchymal cells (Fig. 2d), we figured out the distinct signatures that can distinguish hepatoblasts from mesenchymal cells. Significantly, hepatoblasts highly expressed many hepatic lineage-specific markers, such as Afp, Alb, Hnf4a, Krt18, Krt8, Hnf1b, Hhex and Met (c-Met), as well as Anpep (Cd13) and Cdh1 (E-cadherin), while the un-identified group expressed significantly higher levels of mesenchymal-related marker genes such as Vim, Col1a2, Mest, Mmp2, Pdgfra, Ncam1 and Lhx2. We thereby named the group as mesenchymal cells. The interesting thing is that Dlk1, a marker of hepatic stem cells, was expressed in the group classified as mesenchymal cells. Here we thus employed immunofluorescence assay and confirmed the existence of such cells co-expressing both Dlk1 and vimentin, a mesenchymal marker (Fig. 2e and Additional file 1: Figure S4a-b).
Epcam is a marker closely related to the differentiation of LSPC into cholangiocytes [4, 18]. Our single-cell RNA-Seq data showed that Epcam expression was detected in 1 of 2 cells of E11.5 hepatoblasts and 5 of 43 cells of E11.5 mesenchymal cells, not detected in E12.5 ~ E14.5 cells, but was detected again in 2 of 12 cells of E16.5 hepatoblasts (Fig. 2d). This temporal change of Epcam expression indicated our single-cell RNA-seq data was consistent with the previous observation via immunofluorescence assay or flow cytometric analysis [7]. As single-cell RNA-Seq may suffer drop-out issue, here we selected the 8 cells with Epcam transcript and 8 cells without Epcam transcript form E11.5 and E16.5 for further qPCR validation. For cDNA library before Nextera amplification, Epcam expression was only detected in 1 and 2 cells by microfluidic-based and tube-based qPCR, respectively; while for cDNA library after Nextera amplification, Epcam expression was detected in 4 cells by tube-based qPCR (Additional file 1: Figure S4c). The comparison of the two approaches indicated that our single-cell RNA-Seq data at the current sequencing depth was more sensitive than qPCR in detecting low expressed genes, which also showed consistence with the results in Additional file 1: Figure S2c. The reason that we didn’t detect Epcam expression in many cells in these developing livers is probably because Epcam is expressed in a small fraction of LSPCs, where the marker-free approach was difficult to detect them.
We then characterized the distribution of the six groups at each stage. Interestingly, the fetal livers from E11.5 to E16.5 all had six cell types despite different proportions (Fig. 2f). The proportion of erythrocytes increased from E11.5 to E14.5 and then decreased at E16.5, and a stepwise decrease in mesenchymal cells and a slight increase in hepatoblasts from E11.5 to E16.5 were observed.
Dynamic developmental process of LSPCs at single-cell resolution
To decipher the dynamic developmental process of LSPCs during liver development, we performed gene set enrichment analysis (GSEA) of hepatoblasts along the developmental stages. The gene expression pattern of hepatoblasts from E12.5 to E14.5 had no statistically meaningful differences, but the comparison of E14.5 and E16.5 hepatoblasts revealed that the cell cycle and mitosis-related genes were significantly enriched in the E14.5 stage (Additional file 1: Figure S5a). This finding suggested that the transition from E14.5 to E16.5 may be the critical differentiation switch for hepatoblasts via cell division.
We then analyzed all hepatoblasts from E11.5 to E16.5 livers to construct the landscape of the dynamic developmental processes. Analysis of variance (ANOVA) was first applied to rank the genes that were differentially expressed across the five developmental stages, and the top 30 genes (Additional file 3: Table S2) were used for the developmental track construction. The gradually up-regulated genes included Apoh, Ahsg, Alb, Kng2, Adh1 and Aldob, which are specific for hepatocyte function; whereas the genes that were decreased stepwise included Mdk that is highly expressed in mid-gestation [19], and Hhat (hedgehog acyltransferase) that is required for SHH signaling which is closely related to embryonic development, liver regeneration and liver cancer stem cells [20,21,22,23] (Fig. 3a-b and Additional file 1: Figure S5b). In the developmental track, it is clear that E12.5 ~ E14.5 hepatoblasts were closely related in a continuous manner (Fig. 3c). The stepwise gene expression profile changes also provided insights into the transcriptional programs of hepatoblast differentiation and maturation (Fig. 3d).
Theoretically, hepatoblasts will differentiate into hepatocytes and cholangiocytes, but the hepatoblasts from E11.5 to E16.5 only exhibited stepwise increased expression of hepatic-related proteins without expressing biliary markers. Are these cells indeed hepatoblasts or differentiated immature hepatocytes? To figure this out, we compared these embryonic hepatoblasts with P2.5 hepatic cells. Only one hepatic cell from the P2.5 stage was closely related to E16.5 hepatoblasts, and all others were distinct from hepatoblasts, where they had low or no expression of stem/progenitor-related markers, such as Gpc3, Dlk1 and Afp, but higher expression of hepatocyte-related genes, such as Fbp1, Mat1a and Sult1a1 (Additional file 1: Figure S5c-d). The data indicated that the majority of hepatic cells from P2.5 livers were hepatocytes, although there were a few hepatoblasts, possibly representing precursors for oval cells within adult livers. Here, our data indicated that hepatoblasts in E11.5 ~ E16.5 are authentic hepatoblasts.
LSPC differentiation into cholangiocytes
As hepatoblasts are the bi-potential progenitors for both hepatocytes and cholangiocytes, it is important to reveal the cell fate decision of LSPC differentiation into cholangiocytes. LSPCs are generally believed to co-express hepatocyte and cholangiocyte markers, and we checked such a possibility in these single liver cells from E11.5 to E16.5. Only 4 cells from E11.5, E12.5 and E14.5 livers co-expressed hepatocyte-specific markers, such as Krt8 and Krt18, and cholangiocyte-specific markers, such as Krt7 and Krt19 (Fig. 2d). The results indicated that only a minority of hepatoblasts from mouse fetal livers co-express hepatocyte and cholangiocyte markers at this developmental period. Our randomly selected single cells contain few cells showing markers of cholangiocytes, probably due to the asymmetric lineage fate of hepatoblast differentiation into hepatocyte and cholangiocyte. We thus employed flow cytometry to enrich the relative rare mouse cholangiocytes using the well-known marker Epcam [6, 7, 18], which will enable single-cell transcriptomic comparison between cholangiocytes and hepatoblasts and facilitate understanding the fate-decision stage for differentiation into cholangiocytes. We obtained 52 Epcam+ single cells from P3.25 fetal livers and sequenced them.
We compared the expression of LSPC-related markers between hepatoblasts and the P3.25 Epcam+ cells (Fig. 4a-b). Some genes such as Hnf4a, Dlk1, Anpep and Prom1 (Cd133) were highly expressed in hepatoblasts, but not in the Epcam+ cells. Significantly, Gpc3 expression was maintained in all Epcam+ cells, whilst Afp was only expressed in about half of the Epcam+ cells. Another interesting phenomenon is that the expression level of Cdh1 was increased in the Epcam+ cells, suggesting high E-cadherin level could be a putative marker for cholangiocyte isolation. The genes highly and specifically expressed in the Epcam+ cells included Sox9 and Spp1, which are well-documented markers for cholangiocytes. The expression of Krt7 or Krt19 emerged in some of the Epcam+ cells, but not in E11.5 ~ E16.5 hepatoblasts. We also checked the expression pattern of marker genes from Fig. 4a in P3.25 Epcam+ cells and hepatoblasts to hESC-derived cholangiocytes and hepatoblasts (hESC-Chol and hESC-HB) [24]. Seven genes from Fig. 4a were found to be differently expressed between hESC-Chol and hESC-HB, and similar changing expression patterns were also observed between mouse cholangiocytes and hepatoblasts (Additional file 1: Figure S6a). The expression of DLK1 was decreased in hESC-Chol in comparison with hESC-HB, also consistent with disappearing expression of Dlk1 in single cells of mouse cholangiocytes (Additional file 1: Figure S6b). There was one exception that AFP expression was increased in hESC-Chol, while single cells of mouse cholangiocytes showed heterogeneous Afp expression pattern (Additional file 1: Figure S6b). This is probably because the hESC derived cholangiocytes are not exactly the same as their in vivo counterparts. The collective data indicated that these Epcam+ cells are cholangiocytes.
Currently it is still not clear when LSPCs are fated to hepatocytes or cholangiocytes. To decipher the issue, we compared transcriptomic feature of these P3.25 Epcam+ cholangiocytes with hepatoblasts from E11.5 ~ E16.5 using the differentially expressed transcripts among these stages (see Fig. 3). Interestingly, the gene expression profiles of P3.25 cholangiocytes were more closely related to that of E11.5 hepatoblasts rather than other hepatoblasts from later developmental stages (Fig. 4c-d). The resembled gene expression profiles between cholangiocytes and early hepatoblasts hint that hepatoblasts may be fated to cholangiocytes at an early stage. However, this single-cell genomics-derived model still needs further validation, especially solid evidence from well-designed lineage tracing experiments employing reliable markers.
We further focused on the signaling pathways related to liver development, including the Wnt/β-catenin, Notch, TGF-β and Hedgehog pathways. The results showed that Jag1, Notch2 and Hes1 were significantly (p < 0.01) higher expressed in cholangiocytes, suggesting that Notch activation was associated with cholangiocytes differentiation (Fig. 4a-b), consistent with previous results [18].
As known, Transcription factors (TFs) play important roles in liver development. Therefore, we screened TFs that were differentially expressed between hepatoblasts and cholangiocytes and found more TFs were specifically or highly expressed in hepatoblasts (Additional file 4: Table S3), suggesting stem/progenitor cells with higher plasticity could maintain a high self-renewal and differentiation capability under complex regulatory conditions. We then performed expression covariance network analysis and constructed two TF covariance networks specific to hepatoblasts and cholangiocytes (Fig. 4e). Within the hepatoblast-specific network, the TFs that had significant correlations with more TFs included Atf4, Tfam, Hmga1, Maz, Sall4, Mbd3, Tfdp1, Hnf4a, Ssrp1 and E2f4, some of which are required for liver functions or regulation of cell self-renewal and differentiation. For examples, zinc finger transcription factor Sall4 controls the cell fate decision of hepatoblasts and serves as a marker for progenitor subclass of hepatocellular carcinoma [25, 26]. For the cholangiocyte-specific network, such TFs included Sox9, Nfib, Nfia, Ehf, Id2, Gatad1, Pbx1 and Hes1, some of which are known to be involved in various stem cell-related signaling pathways. For example, Hes1 is closely related to Notch pathway, while the HMG-box transcription factor Sox9 has been shown to be related to progenitor status and differentiation of cholangiocytes [27, 28]. The TF covariance analysis not only supports that some well-studied TFs may play critical roles in LSPC differentiation but also provides candidate TFs for future investigation.
Assessment and prediction of LSPC biomarkers
The isolation of LSPCs in previous studies generally relied on limited surface markers. The gene expression patterns of commonly used markers confirmed the heterogeneity within LSPCs (Fig. 5a and Additional file 1: Figure S7a). We checked the combined expression profiles of some representative genes. As an example, the marker-positive cells with the gene pairs such as Dlk1 vs. Lgr5 and Cd24a vs. Igdcc4 (Nope) were not identical in hepatoblasts at E13.5 although overlapped (Fig. 5a). The results indicated that LSPCs isolated with different markers may represent overlapping but not identical LSPC pool. As our single cells were randomly selected from fetal livers, systematic assessment of the sensitivity and specificity of isolation markers for LSPCs could be performed according to the proposed approach (Fig. 5b).
As the phenotypes of LSPCs are dynamically changing during the developmental process, we separately analyzed 11 marker genes commonly used for cell isolation in hepatoblasts at each stage (Fig. 5c and Additional file 1: Figure S7b). Cdh1 exhibited the best sensitivity and specificity for hepatoblast isolation from E12.5 to E16.5 livers (Fig. 5d), followed by Anpep and Prom1 (Additional file 1: Figure S7c). Dlk1 was expressed in both hepatoblasts and mesenchymal cells at early stage, where its specificity gradually increased from E11.5 to E16.5 for hepatoblasts and decreased from E11.5 to E14.5 for mesenchymal cells (Fig. 5d). The specificity of Igdcc4 for hepatoblasts gradually increased from E11.5 to E16.5, despite its decreased sensitivity (Additional file 1: Figure S7c). We also checked Gpc3, and the specificity decreased in mesenchymal cells but increased in hepatoblasts over time (Additional file 1: Figure S7c). Our data provide a systematic evaluation of the isolation markers for LSPCs, which is helpful for evaluating the reliability of previously isolated LSPCs and for predicting isolation markers for further validation. For example, Gcgr and Cdhr2 are predicted to be isolation markers for hepatoblasts (Additional file 1: Figure S7c).
To validate the results, we performed flow cytometric analysis of mouse fetal liver cells from different stages with antibodies against E-cadherin, Anpep, Dlk1 and Prom1. The co-expression of E-cadherin, Anpep and Dlk1 was observed in E12.5 ~ E16.5 fetal liver cells (Fig. 5e and Additional file 1: Figure S8a-b), consistent with our single-cell RNA-Seq data. However, there was inconsistency in regard to Dlk1 expression in E12.5 fetal livers. Flow cytometry indicated that Dlk1+ cells almost perfectly overlapped with E-cadherin+ cells (Additional file 1: Figure S8a), whereas RNA-Seq data showed that Dlk1 expression could not distinguish between hepatoblasts and mesenchymal cells at E12.5 stage and that Cdh1 is a specific marker for hepatoblasts (Fig. 5d). This finding suggested that protein levels are not always consistent with transcript levels. Flow cytometric analysis also revealed that only a portion of the Prom1+ cells from E14.5 and E16.5 fetal livers were Dlk1-positive (Fig. 5f and Additional file 1: Figure S8c). In general, both single-cell RNA-Seq and flow cytometry supported E-cadherin, Anpep and Dlk1 as appropriate biomarkers for isolation of LSPCs, while Prom1 may slightly differ from the above three markers (Figs. 2d, 5e-f and Additional file 1: Figs. S7c, S8).