Data-mining the FlyAtlas online resource to identify core functional motifs across transporting epithelia

Background Comparative analysis of tissue-specific transcriptomes is a powerful technique to uncover tissue functions. Our FlyAtlas.org provides authoritative gene expression levels for multiple tissues of Drosophila melanogaster (1). Although the main use of such resources is single gene lookup, there is the potential for powerful meta-analysis to address questions that could not easily be framed otherwise. Here, we illustrate the power of data-mining of FlyAtlas data by comparing epithelial transcriptomes to identify a core set of highly expressed genes across the four major epithelial tissues (salivary glands, Malpighian tubules, midgut and hindgut) of both adults and larvae.
Method Parallel hypothesis-led and hypothesis-free approaches were adopted to identify core genes that underpin insect epithelial function. In the former, gene lists were created from transport processes identified in the literature, and their expression profiles mapped from the flyatlas.org online dataset. In the latter, gene enrichment lists were prepared for each epithelium, and genes (both transport-related and unrelated) consistently enriched in transporting epithelia were identified.
Results A key set of transport genes, comprising V-ATPases, cation exchangers, aquaporins, potassium and chloride channels, and carbonic anhydrase, was found to be highly enriched across the epithelial tissues, compared with the whole fly. Additionally, a further set of genes, not previously predicted to have epithelial roles, was co-expressed with the core transporters, extending our view of what makes a transporting epithelium work. Further insights were obtained by studying the genes uniquely overexpressed in each epithelium; for example, the salivary gland expresses lipases, the midgut expresses organic solute transporters, the tubules specialize in purine metabolism and the hindgut overexpresses genes of still unknown function.
Conclusion Taken together, these data provide a unique insight into epithelial function in this key model insect, and a framework for comparison with other species. They also provide a methodology for function-led datamining of FlyAtlas.org and other multi-tissue expression datasets.


Introduction
Most cells in multicellular organisms share a common genome, but improve their collective fitness by delegating specialized functions to specialized tissues. As mRNA is costly to make, genes that are particularly abundantly expressed in a tissue can provide a valuable indication of likely important functions within that tissue. Based on this premise, comparative atlases of gene expression across multiple tissues and life stages have become valuable and heavily used tools in the functional genomics arsenal [1-3]. At the simplest level, such resources allow an experimenter to establish which tissues express a gene of interest most abundantly, a necessary preliminary to a reverse-genetic work-up [4]. However, as well as allowing simple gene-by-gene lookup, such datasets allow new insights to be synthesised by data mining. For example, large microarray datasets are ideal for clustering genes by co-expression, and thence for inference of shared cis-acting regulatory elements [5] and gene regulatory networks [6-8]. There is also scope for meta-analysis of function, a relatively unexplored area. For example, it is possible to ask the question: "which genes are uniquely expressed in the larval, rather than adult, CNS?" This paper illustrates this methodology through a meta-analysis of tissue-specific transcriptomic datasets generated in our lab. These datasets form the FlyAtlas.org online resource [9,10], which has quickly become one of the most widely used Drosophila online resources, and are used here to seek a common expression signature shared by the major epithelia.
The FlyAtlas.org online resource [9,10] curates Affymetrix-derived expression data (in 4 biological replicates) for each of 18 matched adult and 8 larval tissues, and one cell line, so providing unique opportunities to investigate expression across different tissues. The aim of this paper is thus to identify both the common and unique transport components across the major Drosophila transporting epithelia, using both a hypothesis-led approach, based on already known transport processes, and a hypothesis-free approach, based on enriched expression in one or more of these tissues. Insects make an ideal starting point for such a study, because it is generally agreed that all insect epithelia are energized by an apical plasma membrane H+ V-ATPase (the "Wieczorek model", Figure 1), rather than the basolateral Na+, K+ ATPase familiar to vertebrate physiologists [11,12], although we have shown the latter ATPase also to be important [13]. Although transcriptomic abundance is not necessarily a predictor of active protein, epithelia are particularly suited to such an approach, because the relatively low turnover numbers of most transporters require high levels of both the proteins and their encoding mRNAs. We have previously shown that, across the large V-ATPase gene family, very high mRNA abundance is indeed a good indicator of functional significance in epithelia [14,15]. The concept of a core epithelial transcriptome is thus plausible, and so here we test the model by meta-analysis of larval and adult transcriptomes of the key epithelia of the alimentary canal: the salivary glands, midgut, Malpighian tubules, and hindgut (Figure 1). We adopted parallel hypothesis-led and hypothesis-free approaches (Figure 2), both to discover genes that underlie functions already described in the physiological literature, and to uncover new co-enriched genes that might provide novel insights into epithelial function.

Epithelial transcriptomes cluster separately from other tissues
The first step is to establish that there is indeed a story to tell, and that epithelial transcriptomes resemble each other more than they resemble those of other tissues. Principal component analysis (PCA) clearly showed grouping of the epithelial tissues that was separable from neuronal or reproductive tissues, in both larvae and adults (Figure 3). This tight clustering of the 4 biological replicates of each tissue, and of the epithelial transcriptomes together and distinct from other tissues, provides broad validation for the idea that seeking an epithelial core transcriptome is a tractable and worthwhile enterprise.
Figure 1 (Wieczorek model) Transport is energized by an apical protonmotive V-ATPase, which establishes a gradient that drives an Na+/H+ or K+/H+ exchanger. These ions enter basally through unspecified mechanisms, likely to be cotransports or channels.
Given that epithelia sit together as a distinct group, it is logical to ask which epithelia are most closely related to each other in terms of transcriptional profile. Hierarchical clustering [17-19] confirmed that, even though most insect tissues undergo extensive remodelling during metamorphosis, the pairs of cognate adult and larval tissue transcriptomes clustered more closely together than to any other tissue. Within the hierarchy, the midgut and hindgut transcriptomes were most similar to each other, as were the tubules and salivary glands (Figure 3B). This may reflect a basic difference between absorptive (midgut and hindgut) and fluid secretory (salivary gland and tubule) epithelia, respectively. These differences are more marked than those which would have been predicted from development; the salivary glands, tubules and hindgut are ectodermal, but the midgut endodermal, in embryonic origin.
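The hierarchical clustering of tissue transcriptomes described in this section can be illustrated with a minimal sketch. It is not the procedure used to produce Figure 3B: the choice of average linkage and of 1 minus the Pearson correlation as the distance are assumptions, the function names are hypothetical, and the tissue profiles are made-up toy values rather than FlyAtlas signals.

```python
from itertools import combinations
from math import sqrt

# Toy expression profiles (one value per gene). All numbers are invented
# for illustration; they are NOT FlyAtlas data.
profiles = {
    "tubule_adult": [9, 8, 1, 1, 2, 1],
    "tubule_larva": [8, 9, 1, 2, 1, 1],
    "midgut_adult": [1, 2, 9, 8, 1, 1],
    "midgut_larva": [2, 1, 8, 9, 1, 2],
    "brain_adult":  [1, 1, 1, 2, 9, 8],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur."""
    clusters = [(name,) for name in profiles]

    def distance(a, b):
        # Assumed distance: 1 - Pearson correlation between two tissues.
        return 1.0 - pearson(profiles[a], profiles[b])

    def linkage(pair):
        # Average linkage: mean distance over all cross-cluster tissue pairs.
        c1, c2 = pair
        total = sum(distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=linkage)
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = average_linkage(profiles)
for left, right in merges:
    print(left, "+", right)
```

With these toy profiles the earliest merges join the adult and larval versions of the same tissue, which is the pattern reported for the real data; other linkage rules or distance measures could order later merges differently.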
Testing the model: is there a core epithelial signature?
The first approach adopted (Figure 2) was to profile the expression of genes underpinning functions considered to be integral to the Wieczorek model for insect epithelia, namely the V-ATPase, and putative exchangers and co-transports (Table 1). The FlyAtlas dataset allows both larval and adult tissues to be compared. Additionally, two non-fluid-transporting tissues (brain and testes) were selected as out-groups, to allow comparison with the true epithelia. By inspection it is clear that, for the major classes of transporter listed in Table 1, it is possible to identify a minimal core epithelial module across most, if not all, epithelia.
In all cases, the plasma membrane isoform of the V-ATPase dominates, and a single gene is favoured for each subunit in all the epithelia studied. These genes also map precisely to those previously implicated by in situ hybridization and other techniques [14], increasing our confidence in the accuracy of this transcriptome-led approach. The Wieczorek model also requires an apical exchanger, possibly electrogenic [20]. Here, there is more variability; but all epithelia show very high levels of one or more of Nhe1, Nha1 or Nha2. In the Malpighian tubule, NHA1 & 2 have previously been shown to localize to the apical plasma membrane, and to constitute the 'Wieczorek exchanger' [21], although Nha2 has been localized to the apical plasma membrane of the stellate (rather than principal) cell in Aedes tubule [22].
Basolaterally, it is important that the plasma membrane is 'balanced' so that the cell is not unduly stressed by the potent transport demands of the apical surface. It has been traditional in the insect literature to downplay the role of the Na+, K+ ATPase, because insect epithelia are famously insensitive to the inhibitor ouabain [23]. However, in the Drosophila Malpighian tubule, the genes encoding the α and β subunits of the Na+, K+ ATPase were found to be very abundantly expressed [15], and the pump was shown to be normally protected by a co-localized OATP transporter, so conferring apparent ouabain insensitivity on a ouabain-sensitive pump [13]. Table 1 shows that the same α and β subunit genes (atpalpha and nirvana/nirvana3) are abundantly and specifically expressed in every epithelium, confirming the general importance of the Na+, K+ ATPase in insect epithelia.
A basolateral Na+ or K+ entry step is also required for transepithelial transport. Various cotransports have been implicated, and the Na+-Dependent Anion Exchanger NDAE1 [24] and Na+/K+/2Cl- cotransport NKCC [25] both show enriched expression in most epithelia. However, most insect epithelia actively transport K+ in preference to Na+; and the conspicuous story in the FlyAtlas dataset is the dominance of epithelial transcriptomes by inward rectifier K+ channels (Table 1). Although the K+ channel repertoire is the most diverse in any organism, only these three channels are ever abundant in epithelia. These inward-rectifying channels would allow entry, but not exit, of K+, and so would provide a perfect foil to the apical exchanger. Consistent with this, secretion by the tubule is known to be inhibited by basolateral application of antidiabetic sulphonylureas such as glibenclamide, classical inhibitors of inward rectifier channels [26].
The Wieczorek model focuses on the electrogenic active transport of cations, but this is only part of epithelial function. Many transporting epithelia are specialized to move water, typically with active cation transport that energizes a passive anion flux (typically of chloride); the resulting transepithelial flux of salt drives osmotic movement of water. Although in some insect tubules chloride movement has been argued to be paracellular [27], it is reasonable to look for enriched expression of chloride and water channels across the FlyAtlas dataset. These are indeed observed across the epithelia. Although there is variability in the choice of channels, all epithelia show high levels of expression of one or more of the CLC or CLIC chloride channels; and of one of three aquaporins (there are a total of 6 in Drosophila).
Figure 2 Comparison of the hypothesis-led and hypothesis-free approaches. The former seeks to identify genes underlying processes demonstrated experimentally, or predicted, in the literature. The latter is based on co-expression or enrichment in tissues of interest compared with other tissues, or the whole organism. Although both identify genes of interest that underlie known functions, the hypothesis-free approach also identifies co-enriched genes without prior knowledge, potentially leading to unexpected research hypotheses.
Carbonic anhydrase is regularly found at high levels in epithelia, such as the human kidney [28]. Although the reaction it catalyses (hydration of CO2: CO2 + H2O ⇌ H+ + HCO3-) is reversible and spontaneous, this enzyme has a high turnover number, and is thought to be critical in providing sufficient ions for transport to occur at high rates. Although there are several carbonic anhydrase genes in the genome, only CAH1 is found at high levels in all epithelia.
Generally, then, the Wieczorek model holds for these epithelia, but the minimal core can reliably be extended to include other channels, aquaporins, exchangers and cotransports to produce a new model for insect epithelia.
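The "one or more of" reasoning used for the exchangers, inward rectifier channels, chloride channels and aquaporins can be made explicit with a small sketch. In the paper this step was done by inspection of Table 1; the function name, the gene names, the threshold of 500 and all signal values below are hypothetical and chosen only to show the logic.

```python
TISSUES = ["salivary_gland", "midgut", "tubule", "hindgut"]

# Invented signals for three hypothetical members of one transporter
# family. These are NOT FlyAtlas values.
signal = {
    "exch_a": {"salivary_gland": 900, "midgut": 40, "tubule": 30, "hindgut": 700},
    "exch_b": {"salivary_gland": 20, "midgut": 850, "tubule": 60, "hindgut": 15},
    "exch_c": {"salivary_gland": 10, "midgut": 25, "tubule": 1200, "hindgut": 35},
}

def family_coverage(signal, family, tissues, threshold):
    """For each tissue, list the family members whose signal exceeds threshold."""
    return {
        tissue: [gene for gene in family if signal[gene][tissue] > threshold]
        for tissue in tissues
    }

# Threshold is an arbitrary illustrative cutoff for "very high" expression.
coverage = family_coverage(signal, list(signal), TISSUES, threshold=500)

# The family counts as part of the core if every epithelium has at least
# one highly expressed member, even when no single gene is high everywhere.
family_is_core = all(coverage[tissue] for tissue in TISSUES)
single_gene_everywhere = [
    gene for gene in signal
    if all(signal[gene][tissue] > 500 for tissue in TISSUES)
]

print(coverage)
print("family covers all epithelia:", family_is_core)
print("single genes high in all epithelia:", single_gene_everywhere)
```

In this toy case the family covers every epithelium although no individual member does, which is the situation described for Nhe1, Nha1 and Nha2.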

Novel gene signatures common to transporting epithelia
The hypothesis-led approach confirmed the existence of a conserved epithelial core transcriptome (summarized in Figure 4). However, one advantage of global datasets is that they also permit a hypothesis-free approach (Figure 2). Are there any unsuspected commonalities between epithelial tissues, and what do they tell us about insect epithelial function? There are several potential methods to identify such genes; for example, one could identify all those genes scored by the Affymetrix software as 'present' in all epithelia and 'absent' in all other tissues. However, this would be an excessively stringent criterion, and indeed would not identify any genes in this dataset. Accordingly, we settled on a simple enrichment of RNA signal in each tissue, compared with the whole organism, and, to restrict the number of hits to a manageable level, present only genes with an enrichment >2.5 in all epithelial tissues (Table 2). Although some of the genes identified in the consensus model (like Drip and Nhe1) also feature in this table, most do not. This is for one of two reasons. Either (as for several V-ATPase subunits) they are also expressed generally at reasonable abundance throughout the organism, so reducing the apparent enrichment; or (as for the inward rectifier channels) different epithelia select one from a restricted set, so no single gene makes the table. This approach is thus conservative in nature, but any genes that emerge are potentially of great interest. The list includes transcription factors (bowl, lola and hr39), cytoskeletal or vesicle/trafficking proteins (Msp-300, synaptobrevin), septate junctional or cell polarity proteins (such as Scrib) and a collection of cell defence genes, such as cytochrome P450s and the zinc finger protein Traf-like. The list is also enriched for some cell signalling genes, notably two enigmatic G-protein coupled receptors of the Methuselah family, and a G-protein, Gαs60A.
It is thus interesting that the transcriptomic enumeration of the Wieczorek model for insect epithelia can be complemented by an array of further genes based on the hypothesis-free approach, and that these sit naturally in groups of epithelial determination and development, cell junctions and polarity, trafficking and defence.
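The enrichment criterion behind Table 2 (signal in each tissue divided by whole-organism signal, with a cutoff of 2.5 in all epithelial tissues) can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the function name, the data layout and the gene names are hypothetical, and every number is invented.

```python
EPITHELIA = ["salivary_gland", "midgut", "tubule", "hindgut"]

# Invented mean signals per gene and tissue; NOT FlyAtlas values.
tissue_signal = {
    "gene_x": {"salivary_gland": 400, "midgut": 300, "tubule": 600, "hindgut": 350},
    "gene_y": {"salivary_gland": 1200, "midgut": 1000, "tubule": 1500, "hindgut": 900},
    "gene_z": {"salivary_gland": 500, "midgut": 20, "tubule": 30, "hindgut": 10},
}
# Invented whole-organism reference signal for the same genes.
whole_fly = {"gene_x": 100, "gene_y": 800, "gene_z": 50}

def core_enriched(tissue_signal, whole_fly, tissues, cutoff=2.5):
    """Return genes whose tissue / whole-organism ratio exceeds cutoff in every tissue."""
    hits = []
    for gene, by_tissue in tissue_signal.items():
        ratios = [by_tissue[tissue] / whole_fly[gene] for tissue in tissues]
        if all(ratio > cutoff for ratio in ratios):
            hits.append(gene)
    return hits

hits = core_enriched(tissue_signal, whole_fly, EPITHELIA)
print(hits)
```

The toy genes mirror the two reasons given for genes missing the table: gene_y is abundant in the epithelia but also throughout the organism, so its ratios stay below the cutoff, while gene_z is enriched in only one epithelium.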

Unique gene expression patterns that delineate specialized function
Table legend. For brevity, the mean normalized Affymetrix signal is shown for each tissue and gene. Errors are typically 5-10% of the mean, and can be found by direct interrogation of the full dataset at flyatlas.org. Brain, testis and whole fly signals are included as non-epithelial out-groups for comparison.
Finally, having identified a common transcriptomic motif for the major transporting epithelia in insects, it is interesting to seek transcriptomic insights as to the unique roles played by each epithelium. To accomplish this, the 50 most tissue-specific genes in either larvae or adults (again, based on enrichment compared to the whole organism) for each tissue were identified (Additional file 1: Tables S1, S2, S3 and S4), and their functions (where known) identified from FlyBase or from the literature. This need not be a purely in silico exercise: generations of classical insect physiology allow the data to be interpreted in the context of the known physiology of each tissue.
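Selecting the most tissue-specific genes by enrichment can be sketched in a few lines. The sketch makes two assumptions that the text does not confirm: that a gene's score is the larger of its adult and larval enrichment, and that adults and larvae each have their own whole-organism reference. The function name, gene names and all values are hypothetical.

```python
def top_tissue_specific(adult, larva, whole_adult, whole_larva, n=50):
    """Rank genes by their best enrichment (adult or larval) over the whole organism."""
    best = {}
    for stage_signal, whole in ((adult, whole_adult), (larva, whole_larva)):
        for gene, value in stage_signal.items():
            reference = whole.get(gene, 0)
            if reference > 0:  # skip genes with no usable reference signal
                best[gene] = max(best.get(gene, 0.0), value / reference)
    # Highest enrichment first, truncated to the n top-ranked genes.
    return sorted(best, key=best.get, reverse=True)[:n]

# Invented signals for one tissue at two life stages; NOT FlyAtlas values.
adult = {"g1": 500, "g2": 100, "g3": 90}
larva = {"g1": 50, "g2": 900, "g3": 80}
whole_adult = {"g1": 50, "g2": 100, "g3": 100}
whole_larva = {"g1": 50, "g2": 100, "g3": 100}

top_two = top_tissue_specific(adult, larva, whole_adult, whole_larva, n=2)
print(top_two)
```

Here g1 ranks first on its adult enrichment and g2 second on its larval enrichment, so a gene that is specific at only one life stage is still captured.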

Salivary glands
As in humans, insect salivary glands are thought to produce a watery secretion containing enzymes, to aid in the maceration and initial digestion of food. Secretion is under neural control (typically by biogenic amines) [29,30]. There may also be a defence function, protecting the rest of the alimentary canal by including antimicrobial peptides in the secreted saliva. Larval and adult salivary glands do not necessarily perform identical functions, as diet can change radically in the life cycle of holometabolous insects. Although fluid secretion is by the classical Wieczorek model (Table 1), larval and adult salivary glands use different exchangers, with Nhe1 present throughout, but Nha1 specific to the larva (Additional file 1 Table S1). For cotransports, NKCC predominates in the larva, but Ndae in the adult; and aquaporins are more prominent in the larva, suggesting an increased emphasis on fluid secretion during the active growing phase when food intake is maximal. As in the closely related blowfly (Calliphora erythrocephala), control of secretion in the adult is via serotonin: separate 5-HT receptors (5-HT2 and 5-HT7) drive secretion through two independent signaling mechanisms that use the key second messengers cAMP and Ca2+ respectively [31]. However, these are not the only adult salivary gland receptors; the putative GABA/glycine receptor CG7589 is expressed at extraordinary levels. By contrast, in the larva, the only G-protein coupled receptor of any abundance is methuselah-like 4, an enigmatic receptor of unknown function.
Although saliva is traditionally considered rich in digestive enzymes, levels of amylases, proteases and peptidases were unremarkable compared with other tissues. However, lipases are strongly enriched, suggesting that saliva helps to burst cells open, rather than to assist downstream digestion. Lysozyme is virtually salivary gland specific, implying both digestive and defensive roles. Defence indeed seems to play a key role, with cecropin C also being virtually salivary gland specific. Another surprise is the relative specificity of expression of yellow and yellow-d, major components of bee royal jelly, a caste-determining secretion from the analogous hypopharyngeal glands [32]. Drosophila lacks a royalactin gene, and does not feed its young; nonetheless, the parallel in gene expression across a broad phylogenetic range is compelling.
Figure 4 Graphical summary of the core epithelial transcriptome from Table 1, illustrating a common 'core' set of transporters shared by transporting insect epithelia. Note that the localization (apical, basolateral etc.) is not proven by transcriptomic data, but is based on experimental physiology in previous publications.

Midgut
The midgut transcriptomes of larvae and adults are broadly similar (Figure 4), and are conspicuous for almost midgut-specific digestive enzymes and organic solute carriers and transporters (Additional file 1 Table S2). There is also specialization for innate immunity, as the midgut is the first highly permeable tissue encountered by incoming food (Figure 1). In this context, peritrophic membrane constituents provide a mechanical protection for the delicate midgut apical microvilli.
Perhaps most intriguing is the midgut-specific expression of vha100-4, a subunit of the V-ATPase. This is surprising because the midgut is already abundantly served by highly-expressed a-subunit genes (Table 1). In particular, the putative plasma membrane isoform, which includes vha100-2 (Figure 4), is highly expressed in midgut. The solution is that the midgut is itself a complex tissue, with multiple domains. It is thus possible that vha100-4 serves a specialized function within a geographically distinct subregion of the midgut. There are two candidate processes which involve H+ transport, and which do not occur anywhere else in the animal. There is a region of low pH, associated with the cuprophilic, or goblet, cells in the anterior midgut [33]; and a region of high pH at the posterior midgut [34]. We speculate that this isoform is associated with one of these unique functions.

Malpighian tubules
Although the transcriptome of the Malpighian tubule was previously described [15], it is instructive to reexamine it in the light of a much more complete microarray (Affymetrix version 2 cf. 1), and against the full FlyAtlas collection of transcriptomes, allowing unique expression to be asserted with much more authority (Additional file 1 Table S3). The tubules strictly obey the consensus transport model (Figure 4), with abundant representation of both inward rectifier potassium channels and aquaporins. Of these, irk3 and CG17764 are relatively tubule-specific. Otherwise, they are conspicuous for organic solute transporters, with the major families hugely represented. Of interest, many of the classical eye colour genes (e.g. white, plum, scarlet) are highly tubule-enriched, reflecting the role of the tubule in storing and processing pigment precursors. The tubule is also enriched for several genes associated with purine metabolism (the major route for nitrogen excretion); in particular, rosy [35], urate oxidase [36] and 5-hydroxy isourate hydrolase. The control repertoire of the tubule has been discussed extensively elsewhere [27,37-39]; of particular note is the tissue-specific expression of the receptor for the capa neuropeptides [40,41], and of one of the two cyclic GMP dependent protein kinase genes, Pkg21D [42,43].

Hindgut
The ectodermally-derived hindgut is the "last chance saloon" for rescue of desirable solutes (for example, water, ions, sugars, amino acids). The hindgut also finally adjusts the osmoregulatory poise of the insect, in terrestrial insects typically by producing hyperosmotic excreta to protect against the ever-present danger of desiccation.
Additional file 1 Table S4 shows that sodium regulation is conspicuous in the hindgut transcriptome, as are general substrate transporters of the OAT family. The hindgut is one of the few places in Drosophila where FlyAtlas reports that sodium channels of the pickpocket/degenerin family (notably ppk6 and ppk12) are detectably expressed; elsewhere in Drosophila, they have been implicated in mainly sensory roles [44,45]. The hindgut is known to play a key role in selective Na+ reabsorption; Na+ is a conserved ion in most herbivorous insects [46]. Ion transport peptide (ITP) acts to raise hindgut Na+ reabsorption through the second messenger cAMP [47-49]; consistent with this, the phosphorylation site prediction algorithms NetPhos 2.0 [50] and Disphos 1.3 [51] both predict multiple serine, and at least one threonine, phosphorylation consensus sites in each of ppk6 and ppk12 (data not shown). It is also conspicuous that many genes of unknown function are selectively and strongly expressed in the hindgut, hinting at processes that are yet to be identified.

Conclusions
Here, we review a typical data-mining workflow for expression resources such as FlyAtlas.org that allows very general, rather than single-gene, insights to be obtained, and illustrate its utility with analysis of the nature of epithelial function. Although this approach is illustrated in Drosophila, the results are likely to have a more general significance across the insects (and thus half of all living species), and the workflow could equally be applied to other tissues, or to mammalian systems for which authoritative expression datasets are available. The epithelia that constitute the alimentary tract, and thus the major transport sites in the insect, have diverse embryonic origins, but still demonstrate a coherent core transport transcriptome. Remarkably, this shared transcriptomic signature extends beyond ion and solute transport, to epithelial specification, structure and defence.
The data presented here provide clear evidence for the generality of an extended Wieczorek model, which explains the transepithelial active transport of sodium and potassium, based on a primary electrogenic pumping of protons by a conserved plasma membrane isoform of the V-ATPase. They also confirm the general importance of the Na+, K+ ATPase, previously deprecated in insect models for ion transport because of apparent insensitivity to the Na+, K+ ATPase inhibitor, ouabain [13]. These results also complicate the search for the apical 'Wieczorek exchanger'; although we have previously shown that in tubules the recently discovered NHAs dominate [21], in other epithelia the other major class of cation-proton exchanger, the NHEs, dominates (Table 1). It may be that this diversity reflects the differing requirements of different epithelia for transporting sodium and potassium. However, with a relatively clear picture of differential expression of the CPA gene family, it should be easier to frame experimental questions to address the issue.
It is also interesting to see how individual tissues add to the basic consensus motif to achieve tasks specific to each epithelium. The salivary glands, for example, are specialized for the breakdown of cell membranes, perhaps both to aid digestion and to destroy pathogens. Unlike the other epithelia, they are notably controlled by 5-HT, and they express the enigmatic yellow proteins, just like the corresponding glands in honeybees. The midgut is loaded with digestive enzymes, and (presumably) uptake transporters, and the tubules express probably the widest profile of organic solute transporters of any tissue. The hindgut emphasises sodium flux, consistent with sodium being a relatively scarce resource for phytophagous insects.
A unique combination of history and availability has made Drosophila the insect of choice for a wide range of investigations, and indeed the availability of a well-annotated genome sequence, transcriptomics and powerful genetic tools has more than offset its very small size. However, Drosophila melanogaster is one of perhaps 30 M insect species, and so it will be interesting to see to what extent the models developed here can be generalized. To date, the model for insect epithelia being dominated by an apical V-ATPase has not been seriously challenged, so early indications are that broad applicability is likely. Of course, the demonstration of a core transcriptomic profile for insect epithelia of diverse function and embryonic origin also raises another question: could such an approach be generalized to vertebrates?