The small airway epithelium (SAE), the cell population lining the bronchial tree ≥ 6 generations, plays a central role in normal lung function and in the pathogenesis of many lung disorders. Among the most common SAE-associated diseases are those caused by cigarette smoking, including chronic obstructive pulmonary disease (COPD) and lung adenocarcinoma. The development of massively parallel RNA sequencing (RNA-Seq) technology permits quantitative assessment of poly(A)+ mRNA levels with a high degree of sensitivity [19–24]. Compared to hybridization-based methodologies of transcriptome analysis, RNA-Seq has low background, broad dynamic range and high specificity. Using this approach, we have built upon the body of microarray-generated data to provide a quantitative characterization of the transcriptome of the normal healthy human SAE and of how it changes with smoking [11–18]. By comparing the SAE RNA-Seq data to those of other tissues and organs, the present study grouped the SAE transcriptome into 2 categories: (1) ubiquitous genes, i.e., SAE genes shared with other organs and tissues, and (2) SAE-enriched genes, i.e., those expressed by the SAE but not in the majority of other organs and tissues. Using this classification, and based on the capacity of RNA-Seq to provide quantification of mRNA, we further characterized the effect of smoking on the SAE transcriptome.
Comparison of the expression profiles of different tissues by RNA-Seq and Serial Analysis of Gene Expression (SAGE) allows the identification of ubiquitous and tissue-specific genes [24, 51]. By comparing to the RNA-Seq data obtained for other organs and tissues, we found that among 15,877 genes expressed in the SAE, 52% are enriched in the SAE in a relatively selective manner and 48% are ubiquitous. Interestingly, the SAE transcriptome includes more tissue-characteristic RNAs than other epithelial (breast, kidney, colon) and non-epithelial (heart, brain, skeletal muscle, adipose tissue, lymph node) tissues, where ubiquitous genes account for 65 to 85% of the transcriptome. This may reflect the high purity of the epithelial cells obtained by bronchial brushing (i.e., they are not contaminated by endothelium, connective tissue or inflammatory cells, and therefore the sample does not appear to include genes expressed by contaminating cell types). Notably, SAE genes with low expression levels accounted for 36% of the SAE-enriched genes and only 2% of the ubiquitous genes, indicating that the molecular uniqueness of the SAE is determined to a considerable degree by low-abundance transcripts. From the functional perspective, ubiquitous SAE genes dominated in the categories related to housekeeping biologic processes such as translation and transcription, whereas SAE-enriched genes were prevalent in more specific categories such as immunity, signal transduction, adhesion, and ion transport.
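The two-way grouping described above can be illustrated with a short Python sketch. It is not the pipeline used in the present study: the expression cutoff (`min_level`), the reading of "majority" as more than half of the comparison tissues, the function name, and the toy gene and tissue names are all assumptions introduced here for illustration.

```python
def classify_sae_genes(sae, others, min_level=1.0):
    """Label each SAE-expressed gene as 'ubiquitous' or 'SAE-enriched'.

    sae:       dict mapping gene -> expression level in the SAE
    others:    dict mapping tissue -> (dict mapping gene -> expression level)
    min_level: hypothetical cutoff above which a gene counts as expressed
    """
    labels = {}
    n_tissues = len(others)
    for gene, level in sae.items():
        if level < min_level:
            continue  # not expressed in the SAE, so it is not classified
        # Count the comparison tissues in which the gene is expressed.
        n_expressed = sum(
            1 for profile in others.values()
            if profile.get(gene, 0.0) >= min_level
        )
        # Assumption: "majority" means more than half of the other tissues.
        if n_expressed > n_tissues / 2:
            labels[gene] = "ubiquitous"
        else:
            labels[gene] = "SAE-enriched"
    return labels


# Toy values, invented for illustration only.
sae = {"GENE_A": 50.0, "GENE_B": 3.0, "GENE_C": 0.0}
others = {
    "tissue_1": {"GENE_A": 40.0, "GENE_B": 0.0},
    "tissue_2": {"GENE_A": 10.0, "GENE_B": 0.0},
    "tissue_3": {"GENE_A": 0.0, "GENE_B": 5.0},
}
labels = classify_sae_genes(sae, others)
```

In this toy input, GENE_A is expressed in two of the three comparison tissues and is labeled ubiquitous, GENE_B is expressed in only one and is labeled SAE-enriched, and GENE_C is left out because it is not expressed in the SAE.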
SAE Transcriptome Specialization
Specialized biological properties of a given organ or tissue are determined by a unique pattern of genes expressed in distinct cell populations typical for each tissue. The human SAE is composed of various cell types, including ciliated, secretory (mostly Clara cells but also surface epithelium mucus-producing cells), basal, undifferentiated columnar, and rare neuroendocrine cells [1, 2, 13, 31, 52]. Although most of the SAE-enriched genes are represented by transcripts expressed at low levels, the top 30 highly expressed SAE-enriched genes accounted for about 20% of the total SAE mRNA, suggesting that a limited number of genes may dictate the specific pattern of biological processes dominating in the SAE under steady-state conditions. Detailed analysis of the most highly expressed SAE-enriched genes revealed a unique pattern of epithelial differentiation and molecular functions.
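A statement such as "the top 30 SAE-enriched genes account for about 20% of the total SAE mRNA" corresponds to a simple ratio of summed expression values. The Python sketch below shows one way to compute it, assuming that summed expression levels are used as a proxy for mRNA share; the function name and the toy numbers are hypothetical and are not data from the study.

```python
def top_n_share(subset_levels, all_levels, n):
    """Share of total expression contributed by the n most abundant
    genes in a subset (e.g., the SAE-enriched genes).

    subset_levels: expression levels of the genes in the subset
    all_levels:    expression levels of all genes in the transcriptome
    """
    total = sum(all_levels)
    if total == 0:
        return 0.0
    top = sorted(subset_levels, reverse=True)[:n]
    return sum(top) / total


# Toy values, invented for illustration only.
all_levels = [60.0, 20.0, 10.0, 10.0]
enriched_levels = [60.0, 10.0]
share = top_n_share(enriched_levels, all_levels, n=1)
```

Here the single most abundant gene of the subset contributes 60 of 100 expression units, so `share` is 0.6.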
Genes related to secretory epithelial differentiation dominated the most highly expressed SAE-enriched genes. Of the total SAE transcripts identified, 13% mapped to the secretoglobin SCGB1A1. The high level of expression of SCGB1A1 is expected in the SAE, where Clara cells are enriched and play an important role in pulmonary host defense [27–29]. SCGB1A1 is involved in the regulation of critical processes in the distal airways, such as protection against oxidative stress, maintenance of normal airway lining fluid homeostasis, regulation of inflammation and airway reactivity during respiratory infection, and control of macrophage activation in the lung [53–57]. Another secretoglobin, SCGB3A1, originally called HIN-1, was the second most highly expressed SAE-enriched gene. Previous studies have identified the lungs as a major site of SCGB3A1 expression in humans. Expression of SCGB3A1 is induced during epithelial differentiation, restricted to terminally differentiated airway epithelial cells, and down-regulated in cancer [58, 59]. There is evidence that SCGB3A1 is also produced by Clara cells and exerts growth-inhibitory activities. Consistent with its putative tumor-suppressor function, SCGB3A1 is aberrantly methylated in various types of cancer, including lung carcinomas. Together with these previous observations, the quantitative data in the present study suggest that SCGB3A1 could be a major steady-state tumor-suppressor gene in the human SAE.
High expression of Clara cell-associated secretoglobin genes in the SAE was accompanied by enrichment of forkhead box A1 (FOXA1), NK2 homeobox 1 (NKX2-1), FOXA2, and CCAAT/enhancer binding protein, alpha (C/EBPα), transcription factors that constitute a major regulatory network for the development and maintenance of SAE and Clara cell differentiation [43, 63–65]. NKX2-1 interacts with FOXA1, FOXA1 and FOXA2 complement each other, and both NKX2-1 and FOXA2 are thought to act upstream of C/EBPα in lung epithelial differentiation [65, 66]. A number of secretory genes not previously described for the human SAE were identified by RNA-Seq as highly abundant SAE-enriched genes, including tetraspanin-1 (TSPAN1), a protein involved in secretory pathways in glandular cells; cytochrome CYP4B1, a CYP family member localized within the secretory granules of the respiratory mucosa; and microseminoprotein-beta (MSMB), an androgen-responsive secretory protein regulating cell growth and apoptosis.
Mucosal host defense
Secretory leukocyte peptidase inhibitor (SLPI) and polymeric immunoglobulin receptor (PIGR), two key components of the mucosal defense system, were among the most highly expressed SAE-enriched genes. SLPI makes multiple contributions to pulmonary defense, including neutralization of neutrophil elastase, one of the major mediators of lung derangement associated with inflammatory diseases, direct antimicrobial and anti-inflammatory activities, and augmentation of anti-oxidant resistance by increasing glutathione levels in the respiratory surface fluid [70–73]. PIGR is essential for the transepithelial basal-to-apical transport of the polymeric immunoglobulin IgA to the epithelial surface, where it functions to sample and neutralize luminal pathogens. Lipocalin 2 (LCN2), a siderophore-binding antimicrobial protein secreted by pulmonary epithelial cells, and the whey acidic protein four-disulfide core domain 2 (WFDC2), a SLPI-related gene with potential innate immune functions, were also among the most abundant genes enriched in the SAE. Among the most highly expressed genes in the SAE was ELF3, a helix-loop-helix transcription factor expressed in diverse epithelial tissues and implicated in the regulation of inflammatory responses. Given that the airway epithelium is at the interface of the environment (the apical surface) and potential inflammatory/immune mediators (the basolateral surface), the host defense genes identified in the present study as the most abundant SAE genes may play a central role in both mediating and controlling the responses of the airway to environmental xenobiotics and pathogens.
The ability to resist the deleterious effects of oxidative stress is critical for the SAE, which continuously interacts with oxidants present in the inhaled air. Apart from the secretory genes with anti-oxidant functions such as SCGB1A1 and SLPI, a number of other genes directly related to protection from oxidative stress, including glutathione S-transferases pi 1 and alpha 1, and glutathione peroxidase 1 (GPX1), were identified as highly expressed SAE-enriched genes. One of these components, GPX1, also known as Clara cell-specific protein CC26, is selectively expressed by Clara cells, suggesting that the high abundance of both secretory and oxidative stress-related features in the SAE might reflect a secretory cell origin of at least some of the anti-oxidant mechanisms in the human SAE.
Consistent with the abundance of ciliated cells in the human SAE, transcription factor FOXJ1, the major regulator of ciliogenesis and ciliated cell differentiation in the airway epithelium [41, 42], was among the top 20 SAE-enriched genes and was the most highly expressed transcription factor. Other cilia-related genes enriched in the SAE were tektin-1 and -2, structural determinants of the basal bodies of cilia; the cilia apical structure protein sentan; the dynein chains DNAI1, DNALI1 and DNAI2; and sperm associated antigen SPAG6, the classic components of motile cilia. In addition to these well-known genes, RNA-Seq analysis revealed that several recently discovered cilia-related genes were highly enriched in the human SAE, including the member of the membrane-spanning 4-domain family MS4A8B, which has high sequence homology to cilia-associated gene L985P.
Surprisingly, a considerable number of mucus-related genes, such as trefoil factor 3, mucin 1 and mucin 5B [82, 83], were highly expressed in the SAE transcriptome along with AGR2, a secretory factor related to goblet cell differentiation [84, 85]. Of note, as compared to the large airways, where secreted polymeric mucins are abundant, the SAE transcriptome was enriched in membrane-tethered mucins such as MUC1, MUC4, MUC15, MUC20, MUC16, and MUC13, which have various signaling functions.
Stem/progenitor cell features
Although Clara cells are considered to be stem/progenitor cells of the mouse bronchiolar epithelium [8, 88], the identity of the stem/progenitor cell population of the SAE in humans is unknown. Several genes related to stem/progenitor cells were identified in the present study as SAE-enriched genes, including aquaporin-3, a marker of basal cell and suprabasal cell populations with progenitor cell function described for the human tracheobronchial epithelium, and aldehyde dehydrogenase ALDH1, a marker of normal and malignant stem cells in various tissues [90, 91]. It is notable that among the top 5 highly expressed SAE-enriched transcription factors were ELF3, which promotes epithelial morphogenesis, and the embryonic stem cell-related gene SOX2, recently shown to be important for the progenitor cell function of the airway basal cells and Clara cells and for induction of the airway epithelial cell phenotype in mice [36–38]. Due to its high sensitivity, RNA-Seq analysis also identified markers of putative stem/progenitor cells previously found in the airway epithelium at relatively low frequency, such as keratin 14, a marker of a basal cell subpopulation, and surfactant protein C, a gene ascribed to a unique population of bronchoalveolar stem cells in mice. Together, the RNA-Seq data of the present study demonstrate SAE expression of multiple pathways potentially relevant for the maintenance of the human SAE via local stem/progenitor cell activity.
Transmembrane receptors, signaling ligands and growth factors
The most highly expressed transmembrane receptor in the SAE of nonsmokers was DDR1 (discoidin domain receptor 1), a receptor tyrosine kinase. The DDR1 protein is localized to the basolateral surface of human bronchial epithelium, where it interacts with type IV collagen with consequent activation of its tyrosine kinase activity. The second most abundant SAE-enriched receptor was CELSR1 (cadherin, EGF LAG seven-pass G-type receptor 1), a G protein-coupled receptor known to be critical for branching morphogenesis in mouse lung. The most highly expressed SAE-enriched ligand was midkine (MDK), which has a role in lung morphogenesis and is believed to be essential for vascular maintenance in the adult lung. In mouse, midkine expression is controlled by Nkx2-1, which, as mentioned above, is also highly expressed in the human SAE. Among the highly expressed ligands, there was a clear prevalence of chemokines such as MDK, CXCL1 and CX3CL1. Consistent with this observation, expression of diverse cytokines by airway epithelium and by cell lines derived from airway epithelium is well established, and epithelial-derived chemokines are recognized to play an important role in attracting immune and inflammatory cells [97, 98].
The RNA-Seq data also pointed to potentially novel aspects of cell signaling in epithelial biology. For example, the oxytocin receptor (OXTR) was expressed at high levels in all subjects who were male. This was initially surprising given the roles of oxytocin in childbirth, lactation and brain biology, but, relevant to the airway epithelium, a role for oxytocin in autocrine signaling in small cell lung cancer has been described.
SAE Transcriptome Response to Smoking
Extensive microarray studies have identified a dramatic effect of smoking on the gene expression profile of human airway epithelium [11–18]. By using RPKM quantification as a measure of smoking-dependent changes in SAE transcript levels, the present study expands the insights into the airway epithelial response to smoking. The quantitative RNA-Seq analysis revealed that smoking suppressed the expression level of a greater number of genes than it induced. Interestingly, among the up-regulated genes, smoking had a larger effect on SAE-enriched genes than on ubiquitous genes. From the functional perspective, the SAE-enriched smoking-up-regulated genes were related to transcription, signal transduction, protease/antiprotease homeostasis, and immunity.
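For reference, RPKM (reads per kilobase of exon model per million mapped reads) is conventionally computed as 10^9 × C / (N × L), where C is the number of reads mapped to a gene, N is the total number of mapped reads in the sample, and L is the length of the gene's exon model in base pairs. The Python sketch below implements this standard definition together with a simple log2 fold change between two groups. The fold-change helper, its pseudocount, and the toy numbers are assumptions for illustration and do not describe the statistical procedure used in the present study.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(smoker_level, nonsmoker_level, pseudocount=0.01):
    """log2 ratio of mean expression in smokers vs. nonsmokers.

    The pseudocount is a hypothetical guard against division by zero
    for genes with no detectable expression in one group.
    """
    return math.log2(
        (smoker_level + pseudocount) / (nonsmoker_level + pseudocount)
    )


# Toy numbers, invented for illustration only.
value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
change = log2_fold_change(4.0, 1.0, pseudocount=0.0)
```

With these inputs, 500 reads on a 2 kb gene in a library of 10 million mapped reads give an RPKM of 25, and a fourfold higher level in smokers corresponds to a log2 fold change of 2. Positive values indicate smoking-induced genes and negative values indicate smoking-suppressed genes.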
The top 2 SAE-enriched genes, the Clara cell-associated genes SCGB1A1 and SCGB3A1, were both down-regulated by smoking, with a large magnitude of change in expression levels. Smoking, and especially COPD, has been associated with the loss of Clara cells, and the levels of SCGB1A1 in both induced sputum and serum are lower in smokers with COPD as compared to both nonsmokers and healthy smokers [104–107]. It is possible that down-regulation of Clara cell secretoglobins, with their anti-oxidant, anti-inflammatory and tumor-suppressor activities [60, 108, 109], is a critical component of smoking-related development of COPD. The decreased number of SCGB1A1-expressing Clara cells in smokers is generally accompanied by an increased frequency of mucin-secreting cells. Indeed, a subset of highly expressed SAE-enriched genes, such as C20orf114 (also known as long PLUNC1) and MSMB, both associated with a mucin-producing secretory cell phenotype [110, 111], were among the smoking-induced genes with the highest amplitude of up-regulation. Other genes related to a secretory phenotype, such as WFDC2, TSPAN1, TFF3, S100P, and short PLUNC, were also induced by smoking; each of these genes has been associated with epithelial carcinogenesis [67, 112–115]. Thus, a broad induction of a mucin-producing cell secretory program, characteristic of epithelial malignancies, may represent an early molecular phenotype relevant to smoking-induced carcinogenesis in the distal airways.
Other smoking-induced changes among the highly expressed SAE-enriched genes included up-regulation of the oxidative stress-related genes ALDH3A1 and GSTA2, also associated with cancer development [116, 117], and down-regulation of genes associated with epithelial differentiation such as CD74, C9orf24 (also known as ciliated bronchial epithelium 1), and luminal cell-associated keratin 19 [118, 119]. Some of these changes have not been previously detected by microarrays, likely due to saturation of the microarray signal at high expression levels and/or the higher sensitivity of the RNA-Seq methodology to gene expression changes with a relatively low overall fold-difference between the groups.
The ability of RNA-Seq to assess genes with low steady-state expression was utilized in the present study to characterize the effect of smoking on the expression of low-abundance SAE genes. Although some of the changes, such as up-regulation of the oxidative stress-responsive genes AKR1B10, CABYR, and CYP1B1, have been previously reported [11, 45–47], RNA-Seq quantification revealed a number of novel smoking-responsive genes, including smoking-induced NOS3, a gene encoding a nitric oxide synthase isoform usually expressed by endothelial cells but induced in the airway epithelium in association with squamous differentiation, and smoking-suppressed Ly6/neurotoxin 1 (LYNX1), an allosteric modulator of nicotinic acetylcholine receptors.
Functional classification of the low-level, smoking-related genes also identified the class of ion transport genes as being modulated by smoking. One example was CNGB1, a smoking-induced gene that encodes a cyclic nucleotide-gated channel that was first identified for its role in light-activated cellular polarization in retinal photoreceptor cells and was later linked to olfactory receptor function. The discovery that airway ciliated cells have olfactory receptors that operate by the same signal transduction pathways as visual rhodopsin suggests a role for CNGB1 in airway epithelial function. Also notable among smoking-dependent genes were a series of ion channels whose overall low expression level in the SAE may reflect expression predominantly in neuroendocrine cells, which constitute < 0.01% of total airway epithelium. For example, CACNG4, the gamma subunit of a voltage-gated calcium channel, is a smoking-induced gene. Previous reports suggest that this gamma subunit is expressed primarily in brain, but expression of voltage-gated calcium channels in neuroendocrine cells and neuroendocrine-derived tumors has been demonstrated.