Skin transcriptome profiles associated with coat color in sheep

Background Previous molecular genetic studies of physiology and pigmentation of sheep skin have focused primarily on a limited number of genes and proteins. To identify additional genes that may play important roles in coat color regulation, Illumina sequencing technology was used to catalog global gene expression profiles in skin of sheep with white versus black coat color. Results There were 90,006 and 74,533 unigenes assembled from the reads obtained from white and black sheep skin, respectively. Genes encoding for the ribosomal proteins and keratin associated proteins were most highly expressed. A total of 2,235 known genes were differentially expressed in black versus white sheep skin, with 479 genes up-regulated and 1,756 genes down-regulated. A total of 845 novel genes were differentially expressed in black versus white sheep skin, consisting of 107 genes which were up-regulated (including 2 highly expressed genes exclusively expressed in black sheep skin) and 738 genes that were down-regulated. There was also a total of 49 known coat color genes expressed in sheep skin, from which 13 genes showed higher expression in black sheep skin. Many of these up-regulated genes, such as DCT, MATP, TYR and TYRP1, are members of the components of melanosomes and their precursor ontology category. Conclusion The white and black sheep skin transcriptome profiles obtained provide a valuable resource for future research to understand the network of gene expression controlling skin physiology and melanogenesis in sheep.


Background
Sheep are the most important fiber producing animals worldwide. Fiber diameter, length and color are key traits contributing to the economic value of sheep and are determined by both genetics [1,2] and environment [3]. Factors that determine coat color in sheep are becoming of increasing interest. White fleece holds greatest economic value due to its ability to be dyed to virtually any color, whereas interest in natural colors is increasing due to the green revolution and consumer preference for natural products.
Coat color genes are good candidates for facilitation of traceability of animal breeds [4]. Coat color is determined by amounts and types of melanin produced and released by melanocytes resident in the skin [5,6]. The genetic basis for coat color is well understood in rodents [7,8], with many common genes also implicated in regulation of coat color in other species, including sheep. For example, MC1R and ASIP are known to be major regulators of coat color in mice and MC1R [9] and ASIP [10] loci are functionally linked to undesirable coat color phenotypes in sheep. In addition, tyrosinase-related protein 1 (TYRP1) is a strong positional candidate gene for color variation in Soay sheep [11]. Recent studies have combined SNP analysis and gene expression profiling to dissect the basis for the piebald pigmentation phenotype in Merino sheep [12]. Despite considerable knowledge of the genetic regulation of coat color in mice and identification of loci involved in coat color regulation in fiber producing species, the molecular mechanisms, at the level of gene expression, associated with differences in coat color phenotype are not well understood. This information is critical not only to enhanced basic understanding of regulation of melanogenesis, but also to the identification of novel pharmacological and molecular genetics approaches to regulate or select for coat color in fiber producing species.
Transcriptional profiling is a powerful approach for identification of genes globally and functionally expressed in various tissues including skin [13]. Limited information is currently available regarding differences in transcriptome profiles of skin associated with coat color in fiber producing species. To investigate genes that may play important roles in coat color regulation in sheep skin and gain insight into molecular mechanisms responsible for biochemistry of skin and fibers (including pigmentation) in animals producing hair such as sheep and alpaca, we investigated the transcriptome profiles in skin of sheep with black versus white coat color using high throughput RNA deep sequencing. Results provided novel insight into differences in gene expression associated with coat color, including key genes implicated in the melanogenesis pathway.

Assembly of unigenes
After the raw reads were filtered, 51,297,002 clean reads with 52.2% GC percentage and 51,655,390 clean reads with 53.7% GC percentage were obtained from white and black sheep skin, respectively. These clean reads were assembled into unigenes, yielding 90,006 and 74,533 unigenes from white and black sheep skin, respectively. There were 2,892 and 2,884 unigenes with sequence size greater than 3,000 nucleotides in white and black sheep skin, respectively. The longest unigene sequenced was more than 9,000 nt in length and the average size of the majority of coding sequence (CDS) identified was 300 nt. There were 1,367 unigenes with more than 3,000 nt of CDS ( Figure 1).

Functional classification of the unigenes
BLAST analysis (e-value < 0.00001) of the sheep skin unigenes against the protein and nucleotide databases revealed 37,768 known genes, of which, 36,438 were annotated through COG classification analysis. These genes were grouped into 25 classes based on their putative functions and the largest group of genes was classified into general function only (15%; Figure 2). The known genes were also annotated through GO classification analysis and grouped into 3 categories (biological process, 46.1%; cellular component, 36.1%; molecular function, 16%) based on their putative functions ( Figure 3).

Genes encoding transcription factors expressed in sheep skin
In the transcriptome of sheep skin, there was at least 527 genes identified encoding for transcription factors (Additional file 1: Table S1). The highly expressed transcription factor genes in both white and black sheep skin include transcription factor jun-B-like, nascent polypeptideassociated complex subunit alpha isoform b, endothelial differentiation-related factor 1 isoform alpha, homeobox protein DLX-3 (DLX3), transcription factor AP-1 (JUN), cyclic AMP-dependent transcription factor ATF-4 (ATF4), and transcription factor GATA3 (GATA3) ( Table 2). Most of these transcription factors do not show differential expression between white and black skin. However, some less abundantly expressed transcription factors such as Pbx, Tcf4, Nr2f1, Sox11 and Sox4, were differentially expressed in white and black sheep skin. Transcription factors that are known to regulate mRNA expression of coat color genes, such as MITF and CREB/ATF bZIP transcription factor were also found to be expressed in sheep skin. Interestingly, both genes were expressed approximately 2 times higher in white sheep skin than black sheep skin.

Differentially expressed genes in white versus black sheep skin
Using an algorithm based on a previously described method [14], genes differentially expressed between white and black sheep skin were identified. There were a total of 2,235 known genes identified as differentially expressed in white versus black sheep skin, of which 1,756 were downregulated (≤ 2 fold) and 479 were up-regulated (≥ 2 fold) in skin from black sheep compared with skin from white sheep (see Additional file 2: Table S2). For the GO analysis, 1,904, 1,784 and 1,787 differentially expressed genes were grouped in cellular component, molecular function and biological process categories, respectively. Most of the differentially expressed genes were classified into two GO categories (cellular process and cell and cell part; Table 3 and Figure 4). The majority of the GO terms including pigmentation do not appear to be significantly enriched in the differentially expressed genes.
A total of 845 novel genes were identified as differentially expressed, of which 738 were down-regulated (≤ 2 fold) and 107 were up-regulated (≥ 2fold) in skin from black sheep compared with skin from white sheep. Of the 107 up-regulated genes, 16 genes were exclusively expressed in black sheep skin (including 2 genes highly expressed), and   Table S3). In order to validate the transcriptome sequencing results, we selected 10 genes at random for real time PCR to determine their relative expression in black and white sheep skin. These genes, identified as differentially expressed in black versus white sheep skin based on transcriptome sequencing analysis, included known genes related to coat color in mammals. The results of real time PCR showed that 8 of the 10 selected genes had significantly higher expression in black sheep skin compared with white sheep skin, which was consistent with the transcriptome sequencing data. Among the differentially expressed genes, TYR showed the greatest differential expression between black and white sheep skin ( Figure 5).

KEGG pathway analysis
Of the 2,235 known genes differentially expressed in white versus black sheep skin, 1,903 had a specific KEGG pathway annotation. Of these KEGG pathway annotated genes, 324 were down-regulated in black sheep skin. These down-regulated genes are mainly involved in oxidative phosphorylation, and glycolysis and/or gluconeogenesis. Remaining KEGG pathway annotated genes were associated with 241 pathways including those functionally related to coat color in skin such as melanogenesis, tyrosine metabolism and Wnt signaling. For example, there were 20, 53 and 27 differentially expressed genes involved in tyrosine metabolism, Wnt signaling and    melanogenesis pathways, respectively. The enriched GO terms for genes identified in sheep skin transcriptome related to pigmentation and melanogenesis and their relative expression in black versus white skin are shown in Table 4 and Figure 6.

Differential expression of known coat color genes
Approximately 127 genes involved in different pathways controlling coat color formation have been identified in the mouse [15]. Those known coat color genes are routinely classified into six general functions including: Melanocyte development, Components of melanosomes and their precursors, Melanosome construction/protein routing, Melanosome transport, Eumelanin and pheomelanin and Systemic effects [15]. Expression of a total of 49 of aforementioned coat color genes was detected in sheep skin in present studies, and 13 genes showed higher expression in black sheep skin and 8 genes showed higher expression in white sheep skin. Interestingly, all genes encoding for the components of melanosomes and their precursors had higher expression in black sheep skin (see Additional file 4: Table S4). The coat color genes in the 'Eumelanin and pheomelanin' functional category showed higher expression in black sheep skin, while genes in the 'Melanosome construction/protein routing (HPSrelated)' category displayed lower expression in black sheep skin. Among the coat color genes showing higher expression in black sheep skin, TYRP1 showed the highest expression in black sheep skin versus white sheep skin, followed by TYR, MLPH, MATP and Si ( Table 5). The genes associated with oculocutaneous albinism (OCA) such as HPS1, HPS3, HPS4, HPS5 and HPS6 were expressed in sheep skin but most of them did not show differential expression associated with coat color.

Discussion
Mammalian coat color exhibits a wide range of shades and is dictated by melanin production in melanocytes (melanogenesis). Melanogenesis involves a complex molecular regulation [7]. In order to understand the molecular mechanisms of coat color formation, previous studies have reported the generation of ESTs from both sheep and alpaca skin through traditional Sanger    [16,17]. A previous study examined differences in gene expression associated with black spots in fleece of Corriedale sheep using microarray technology [18]. To further investigate genes that may play important roles in sheep skin, particularly in fiber/coat pigmentation, over 100 million transcriptome sequence reads were generated from white and black sheep skin using the Illumina technology. From these reads there were 37,768 known unigenes identified as expressed in sheep skin, among which 2,235 were differentially expressed in black versus white sheep skin. It is acknowledged that study design was not optimal due to limited biological replication because single pooled samples (n = 3 per coat color) were used in transcriptome sequencing analysis and the same three samples from white sheep skin and from black sheep skin were used individually for quantitative real time PCR validation of the sequencing results. Despite such limitations, results have significantly enhanced understanding of sheep skin transcriptome composition and potential differences in gene expression associated with coat color that are foundational to further study in the future. Genes encoding for ribosomal proteins, keratin family members and keratin associated proteins were among the most highly expressed genes detected in sheep skin. The ribosome is a central player in the translation system and its function is to decode the nucleotide sequence carried by the mRNA and convert it into an amino acid primary structure [19]. Abundant presence of ribosomal proteins in sheep skin suggests the importance of high rates of protein translation in sheep skin. In channel catfish skin, the expression of ribosomal proteins was high presumably due to higher levels of translational activities [20,21]. Of the top 30 highly expressed genes in sheep skin, all 9 keratin family members and keratin associated proteins displayed down regulation in black sheep skin, which was the same as observed in piebald Merino sheep skin [12]. Collectively, results support Garcia's view that no single keratin gene alone appears to be responsible for the coat color trait [12]. Hair keratins contain a much higher content of cysteine residues in their non-helical domains and thus form tougher and more durable structures via intermolecular disulfide bond formation [22]. Therefore, high expression of keratins is likely crucial for fleece strength. Genes encoding for important oxidative and dehydrolytic enzymes such as NADH5 and COX1 were also highly expressed in sheep skin. The coenzyme NAD (nicotinamide adenine dinucleotide) is a key electron-carrier which mediates hundreds of reactions. The redox state of the NAD-NADH couple plays a central role in energy metabolism [23], signal transduction [24], and transcriptional regulation [25], which is consistent with the need for mitochondrial biogenesis, energy and other proteins during the strong metabolism characteristic of adult sheep skin development [26].
The human hair follicle (HF) has a variable response to potent androgens, such as testosterone (T) and dihydrotestosterone (DHT). The pilosebaceous unit (including HF and sebaceous gland) enzymatically converts weak androgens, such as dehydroepiandrosterone (DHEA) and androstenedione (AD), to more potent androgens, such as T and DHT. In HF of scalp, androgens shorten the anagen growth phase of the hair cycle, causing the HF to regress and recede. The conversion of androgens is dependent on oxidized-reduced pyridine cofactors, NAD, NADH, and NADPH [27]. So, the high level of expression of NADH likely improves the conversion of androgens in certain body regions, influencing terminal hair growth.
Transcription factors (TFs) perform important regulatory functions by controlling a variety of cellular processes [28]. In the mouse genome, 1,445 genes were identified to encode for TFs and 983 were expressed in the brain [29]. In the current studies expression of 527 TF genes was detected in sheep skin, including general TFs such as endothelial differentiation-related factor 1 isoform alpha, DLX3, JUN, ATF4 and GATA3. The high level of expression of these genes detected in skin reflects their importance in regulation of general transcriptional pathways in sheep skin.
Several novel genes were also identified in sheep skin, and a portion of such genes were differentially expressed. Two of the novel genes detected lacked ORF in sequence reads detected, and were highly abundant and exclusively expressed in black sheep skin. BLAST analysis of these 2 novel genes did not find any similar sequences in NCBI database (including EST), suggesting that they could be specific to sheep skin. The differentiated phenotype of melanocytes must be due, at least in part, to differential transcription of melanocyte-specific genes [30]. Thus, these two novel genes may play an important role in promoting pigmentation and dark coat colors.
The GO and KEGG pathway analyses of differentially expressed genes revealed that most were associated with the function of cell and cell part ontology categories. Of particular interest in our dataset were pathways related to pigmentation and melanogenesis. Of the differentially expressed genes, the genes in the category related to 'the components of melanosomes and their precursor' and 'Eumelanin and pheomelanin' were up-regulated in skin from sheep with black coat color. The function of genes in "the components of melanosomes and their precursor" and 'Eumelanin and pheomelanin' are melanin synthesis and the switch between eumelanin and pheomelanin [15]. The darker pigmentation of skin, and possibly of hair, is associated with a higher numbers of melanosomes, although the number of melanocytes remains constant [7,31]. Melanocytes in black hair follicles contain the greatest number of melanosomes (which are eumelanosomes), while the melanosomes in brown hair bulbs are smaller and those in blonde hair are very poorly melanised. The relationship of less melanin with lighter skin/hair phenotype has been reported in several species, including humans [32], alpaca [17], llama [33] and horse [34]. In both domestic sheep and Soay sheep, light coat color is known to be due to a decrease in the ratio of eumelanin to pheomelanin, relative to black coat color [35].
Genes in the 'Melanosome construction/protein routing (HPS-related)' ontology category, such as HPS5, Lysosomal trafficking regulator (Lyst) and Pallidin, were all down-regulated in skin of sheep with black coat color. The functions of genes in the 'Melanosome construction/protein routing (HPS-related)' categories are related to organelle biogenesis [15]. The key to melanin production is the organelle that is the site of melanogenesis, the melanosome, whose architecture, intracellular distribution and enzyme catalog are critical [30]. HPS5 protein is a component of the biogenesis of lysosomerelated organelles complex-2 (BLOC-2) and its deficiency can result in Hermansky-Pudlak syndrome (HPS-5) [36]. HPS is a disorder of lysosome-related organelle (melanosome) biogenesis, resulting in oculocutaneous albinism [37,38]. It has been reported that HPS5 melanocytes have an approximately normal contingent of the melanogenic protein, TYR [36]. Elucidation of the relationship between lower level of expression of HPS5 and other genes in this ontology category with black coat color phenotype will require further investigation.
Among the differentially expressed coat color genes, TYRP1 showed the greatest level of differential expression in black versus white sheep skin. TYRP1, one of the members of the tyrosinase family, is a I type membrane bound protein that is expressed in both melanocytes and the retinal epithelium. TYRP1 is involved in the distal eumelanic pathway and plays a role in stabilizing TYR, which is the critical and rate-determining enzyme in melanogenesis [39]. There existed a significant association between coat color and TYRP1 in Soay sheep [11]. In the free-living Soay sheep, coat color is either dark brown or light tawny color. The light phenotype is determined by homozygosity of a single recessive amino acid change (G → T transversion) at coding position 869 in the TYRP1 gene [11]. This is consistent with studies in domestic sheep, where light coat color is known to be due to a decrease in the ratio of eumelanin to pheomelanin, relative to black coat color [35].

Conclusions
In summary, to our knowledge this is the first report of transcriptome analysis of sheep skin from animals with white and black coat color. The present studies have described and revealed a set differentially expressed known and novel genes in sheep skin potentially related to coat color and other physiological functions. The 16 novel genes exclusively expressed in skin of sheep with black coat color are of particular interest for further studies to elucidate their functional roles in coat color regulation. Results are foundational for future studies to potentially manipulate coat color via pharmacological and genetic approaches.

Sheep skin sampling and total RNA extraction
Housing and care of sheep and collection of skin samples for use in the described experiments were conducted in accordance with the International Guiding Principles for Biomedical Research Involving Animals (http://www. cioms.ch/frame 1985 texts of guidelines.htm). The animals were locally anaesthetized with hydrochloridum (1.5 ml of 3%, i.h.), following the approval (reference number 2010 [088]) provided by the Animal Hospital of Shanxi Agricultural University to decrease the animal suffering. Six healthy 2-year-old white and black female Sunite sheep 3 sheep per color) were selected for sample collection from the sheep farm in Sunite, Inner Mongolia, China. A piece of skin (8 mm in diameter) from the neck was collected via punch skin biopsy under local anesthesia and immediately placed in liquid nitrogen. Total RNA from the sample was extracted using Trizol reagent (Invitrogen, USA) according to the manufacturer's instructions. The RNA integrity was evaluated by gel electrophoresis and the RNA purity was checked by the ratio of OD 260 /OD 280 and RIN value. RNA samples with RIN value greater than 7.5 and OD 260 /OD 280 ratio greater than 1.7 were selected for deep sequencing.

Library generation and sequencing
Three RNA samples from black or white sheep skin were pooled before mRNA isolation. Beads with Oligo (dT) were used to isolate poly (A) mRNA from sheep skin total RNA. The isolated mRNA was fragmented followed by first-strand cDNA synthesis using random hexamer-primers. The second-strand cDNA was synthesized using buffer, dNTPs, RNaseH and DNA polymerase I. The short cDNA fragments were purified using QiaQuick PCR extraction kit (Qiagen, USA). The fragment ends were repaired and A tailed followed by ligation to sequencing adaptors. Suitable size fragments were selected following agarose gel electrophoresis and used as templates for PCR amplification. Sequencing of the library was performed using Illumina HiSeq™ 2000.

Unigene assembly and functional annotation
Raw reads were cleaned by removing adaptors and low quality reads before assembly. Unigene assembly was carried out using the short reads assembly program, Trinity (http://www.genomics.cn). Blastx alignment (e-value < 0.00001) between the unigenes and protein databases (nr, Swiss-Prot, KEGG and COG) was performed, and the best aligned results were used to decide sequence direction of the unigenes. If results of different databases conflicted with each other, a priority order of nr > Swiss-Prot,>KEGG > COG was followed when deciding sequence direction of the unigenes. If a unigene was not aligned to one of the above databases, ESTScan software was used to determine its sequence direction. Unigene sequences were first aligned by blastx to protein databases and then aligned by blastn to nucleotide database nt (e-value < 0.00001), retrieving proteins with the highest sequence similarity with the given unigenes along with their protein functional annotations. Proteins with the highest ranks in the blast results were taken to determine the coding region sequences of unigenes. Coding sequences were translated into amino acid sequences with the standard codon usage. Gene Ontology (GO) functional annotation was based on nr annotation [40]. Blast2GO program (http:// www.blast2go.com) was used to assign GO annotations, and WEGO software (http://wego.genomics.org.cn/cgibin/wego/index.pl) was used to perform GO functional classification for all unigenes.

Identification of differentially expressed genes and pathway analysis
A rigorous algorithm based a previously described method [14] was used to identify differentially expressed genes between white and black skin. The FDR (false discovery rate) value of ≤ 0.001 and RPKM ratio of > 2 were used in the analysis [41]. Differentially expressed genes (DEG) were mapped to each term of GO database (http://www.geneontology.org/) and the gene numbers for each GO term were calculated. A list of genes and gene numbers for every GO term was obtained. Hypergeometric test was used to find significantly enriched GO terms in DEG against the genome background. The calculated p-values went through Bonferroni correction, using corrected p-value ≤ 0.05 as a threshold. GO terms fulfilling this condition were defined as significantly enriched GO terms in DEG. With the help of KEGG [42] pathway database (http://www.genome.jp/kegg/ pathway.html), the biological complex behaviors of the DEG were further studied.