Landscape and evolution of cis- and trans-regulation in chicken genome between two contrasted breeds analyzed in three tissues at one day of age

Gene expression variation is an important mechanism underlying phenotypic variation, and can occur via cis- and trans-regulation. In order to understand the role of cis- and trans-regulatory variation on population divergence of chicken, we developed reciprocal crosses of two chicken breeds, White Leghorn and Cornish Game, with major differences in size and reproductive traits, and used them to identify the degree of cis versus trans variation in brain, liver and muscle of both male and female samples at 1-day age. We provided a landscape about how the transcriptomes are regulated between two contrasted breeds by allele specific expression analysis. Our results showed that compared with the cis-regulated genes, trans-acted genes existed more extensively in the chicken genome. Furthermore, a widespread compensatory tendency exists in chicken genome. Most importantly, we found the evidence of stronger purifying selection for trans-regulatory variation than the cis-elements.


3, 4].
Cis-and trans-regulation elements are thought to differ in important genetic and evolutionary properties [ 5,6]. In diploid individuals, cis-regulatory elements regulate expression in an allele-specific manner, and heterozygotes for cis-regulatory variation express allelic imbalances at the transcriptional and translational level. By comparison , trans-regulatory factors interact with target sequences to regulate both alleles [ 1]. Trans-regulatory divergence is enriched for dominant or recessive alleles, while the effects of cis-regulatory variants are additivity [ 6,7]. Compared with trans-acting mutations, advantageous cis-regulatory variants more likely to enriched to fixation, because the additive effects expose rare alleles to selection [ 5].
Both cis-and trans-regulatory variation is important in phenotypic variation [ 1,[8][9][10], and previous work in a range of species, including Drosophila [ 7], Mouse [ 11,12] and Coffea [ 13] (Table 1), have used allele specific expression (ASE) [ 14] to differentiate cis-from trans-regulated gene expression variation. Gene expression regulation in birds may be different from mammals, insects or plants. For instance, genomic imprinting has been found in mammals and some plants [ [15][16][17], but appears to be largely absent in birds thus far assessed [18][19][20]; dosage compensation mechanism exists in some diploid species to buffer the effect of copy number difference of genes on sex chromosome [21][22][23], but it was incomplete in birds [24][25][26][27][28]. Given this, we thought it is important to study the gene expression regulatory mechanisms in birds.
Chicken is a model animal in researches on birds, and a remarkable example of rapid phenotypic divergence, with artificial selection resulting in major size, behavioral and reproductive differences among breeds [ 29]. Previous studies have identified frequent allele specific expression between different chicken breeds [ 19,20]. The rapid change under domestication offers a unique example to determine the relative importance of cis-versus trans-regulatory variation underlying phenotypic change. We used reciprocal crosses of White Leghorn (WL), a breed selected for high egg output and a key layer breed, and Cornish Game breeds (CG), a breed selected for rapid growth and muscle development and a cornerstone broiler breed [ 30], to assess the role of different forms of regulatory variation in brain, liver and muscle of both males and females at 1-day age.

Significant gene expression differences in based in breed, tissue and sex
The two inbred chicken strains, Cornish-Game and White-Leghorn, which exhibit major differences in growth rate, egg production and behavior, were used to generate purebred and reciprocal hybrid F1 progeny (Fig. 1). In order to determine breed-specific variants, we resequenced the four parents of the two hybrid crosses, recovering on average 100.73 million pair-end reads per sample after quality control. From this, we identified on average 4.74 million SNPs per parental genome which we used to generate simulated parental genomes. From these SNPs, we selected SNPs which were homozygous in each breed (parental birds) but heterozygous in the hybrid birds (cross 2 and cross 3), resulting in 1.4 million of heterozygous SNPs, to identify the allele-specific RNA-seq reads of offspring in the following steps.
For each hybrid cross, we collected RNA-Seq data from brain, liver and muscle for three male and three female F1 progeny 1-day post-hatch. On average, we recovered 29.17 million mappable reads per sample. In order to eliminate the effect of the sex chromosomes, we removed all Z and W genes from our analysis and focused entirely on autosomal loci. We recovered significant differences in gene expression in based in tissue, sex and parent of origin (Fig. 2). Tissue was the most powerful factor in gene expression. While in different tissues, the sex played a leading role in gene expression of brain, and the strain effected gene expression of liver the most, while in muscle, the parent of origin seems more powerful because samples were divided into two parts by mother origin. We therefore retained all three variables in our subsequent analysis. This results in 12 treatment groups, comprised of 3 tissues, 2 sexes and 2 reciprocal crosses in our study.
The 24,881 genes from the Ensembl v87 annotation were analyzed. About one fifth of them contained homozygous SNPs and expressed in our samples (Table S1). We observed significant expression differences (p-value <0.05, binomial test corrected for multiple comparisons by q-value method) between the purebred females (cross 1 vs. cross 4) for 14.71 % of genes in brain, 36.45 % in liver, and 38.38 % in muscle (take the homozygous SNP list of cross 2 for example). In males, we observed significant expression differences for 17.64 % of genes in brain, 41.87 % in liver, and 37.84 % in muscle (Table S1). The differential expression of genes between strains appears to play an important role in phenotypic divergence. For example, the gene IGFBP2, TGFBI, PDGFRL and IGF2R all showed significant evidence of expression bias between our two breeds. These genes are related to the chicken growth, which could explain the difference of growth rate between two chicken breeds.
An effective pipeline was applied on the allele-specific expression analysis To identify the origin of offspring's mRNA, we have tried a new pipeline by using a R package of 'asSeq' [ 31]. Briefly, A set of R scripts are available for genotype phasing with the 1.4 million of SNPs identified in the previous step. About 2% of these SNPs located on the exon region. The large number of SNPs provide higher chance that an RNA-seq read overlaps with a heterozygous genetic marker, hence is identified as allele-specific read. The tool of 'asSeq' has considered the condition of one read containing more than one SNPs [ 31]. We assigned a read to one of the two parental genomes if the proportion of those heterozygous SNPs suggesting the read is from that genome is larger than 0.9, to ensure the accuracy of parental origin identification.
In order to validate our ASE pipeline, we generated two artificial hybrid F1 libraries. Specifically, we concatenated the RNA-seq fastq files for two male brain samples of cross 1 and cross 4, which had roughly equal read depth. We also concatenated two female liver samples the same way. The two simulated hybrid libraries were handled consistently with other hybrid libraries, using the homozygous SNP lists of cross 2 and cross 3, respectively. We compared the expression ratio of two alleles (CG/WL) of the simulated F1 to the real expression ratio of two strains (CG/WL) of each gene, and observed a strong correlation between the two measurements of each group (Fig. S1), indicating that our ASE analysis pipeline was robust. Since our method only counted the local reads containing the homozygous SNPs, we also assessed the correlation of expression fold change (CG/WL) between the local reads method and the method of counting total reads using edgeR [32][33][34]. The correlation was also strong (Fig. S2). These results proved the feasibility of our pipeline. Our study provided an example to deal with the similar situation where there was no specific reference genome for different strains.

Genes were classified into different categories by regulatory divergency
We classified expressed genes into eight categories by the regulatory divergency [  Table 1, Fig. S3-S5). Most genes showed conserved or ambiguous expression, as expected given the recent divergence time of these two breeds. There were more than 70% genes classified as conserved in brain, and more than 40% genes in liver, and in muscle, the ratio is around 50%. However, we nonetheless observed substantial cis-and trans-variation in the hybrid F1 crosses.
There were a greater proportion of trans-regulated gene expression variations in most tissues and across both sexes, particularly in muscle Genes regulated by both cis-and trans-regulatory variations contained four categories: "cis+trans (same)", "cis+trans (opposite)", "cis×trans" and "compensatory". Genes classified as "cis+trans (same)" show evidence of cis and trans variations acting in the same direction, while genes classified in the other three categories show evidence of cis and trans variations acting in opposite directions, with different expression preference on the two alleles. We observed that genes more often display the latter pattern, and most genes were classified as "compensatory".
The gene proportion of each regulatory category was similar between different tissues and different sexes, except for some differences between the muscle and other two tissues (Fisher's exact test, Table S2). Unexpectedly, we observed few loci with consistent significant cis-or trans-regulatory variation across different groups ( fig. S6, Table S3).

Genes regulated by trans-acting variation display greater sequence conservation
We counted the number of variants located within 1 kb upstream of transcription start sites of each gene. The results showed greater variations at upstream of cis-regulated genes than trans-regulated genes (Fig. S7).
The ratio of the number of non-synonymous SNPs to the number of synonymous SNPs in coding sequence (pN/pS) was used to assess the degree of selective constraint, with genes under high constraint expected to have lower pN/pS [ 37,38]. In our research, the value of pN/pS in gene regulated by trans variant is significantly lower than cis-regulated genes in most groups ( fig. 4, Fig. S8-S9).

Discussion
On account of the short divergence time, the two chicken inbred strains are not like the mouse inbred lines with high consistency on genome. To enhance the reliability of our results, we have improved our analysis pipeline. Firstly, the SNP list we used to identify the parental origin was strictly filtered from the re-sequencing data of their parents. The SNPs were statistically homozygous in each parent, hence heterozygous in each hybrid offspring. Secondly, we counted the total number of reads covering at least one SNP across the whole exon, instead of counting the reads number of each SNP.
Compared with the previous researches possessing strain-specific reference genomes, our pipeline improved the accuracy of parental origin identification for the SNPs in hybrid offspring, since we have sequenced their parents directly. These SNPs were used to mark the parental origin of the alleles of each gene, and subsequently increased the accuracy of regulation classification. However, it also resulted in the limited number of genes could be studied in this research.
We observed a greater proportion of trans-regulated gene expression variations than cis regulatory variations in most samples. Although chicken domestication occurred several thousand years ago, commercial populations were established only in the last 200 years [29]. In our study, most genes showed conserved or ambiguous expression, and more trans regulatory variants than cis, which may attribute to the short differentiation time between White Leghorn and Cornish Game.
Genes regulated by both cis and trans variations more often acting in opposite directions, and most genes were classified as "compensatory". This result consistent with the previous researches on house mice [ 36], indicating that the cis and trans variants tend to act convergently to maintain gene expression stability. It may be due to the short divergence history between the Cornish Games and White Leghorn, and the strict selective pressures that formed and maintain each breed. Despite lacking of complete dosage compensation mechanism on sex chromosome [24][25][26][27][28], an extensive compensatory tendency still exist in chicken genome.
There were few loci with consistent significant cis-or trans-regulatory variation across different groups. This result consistent with some previous ASE analyses, which suggest rare ASE genes are consistently expressed across tissues [ 39,40]. However, the cis-and trans-regulation classification is more complex than the simple ASE analysis. Besides the characteristic of spatiotemporal specificity of gene expression, the gene expression is a complex network controlled by the interaction of cis-regulatory DNA sequences and trans-regulatory factors, which could complicate the identification of regulatory divergence. The classification involved in multiple conditions. The statistical methods could not accurately classify them with the limited expression information. Even so, the statistical result still referable and valuable for the following analysis.
Cis-regulatory elements are primarily located at upstream of coding sequences. Our results, is consistent with resent studies in Drosophila [ 7], detected greater variants within 1 kb upstream of transcription start sites of cisregulated genes than trans-regulated genes, suggesting that the classification results were reliable.
Meanwhile, genes regulated by trans-variant showed a lower value of pN/pS than cis-regulated genes in most groups. In theory, the pleiotropic effects of trans-regulatory mutations will result in selection to eliminate most deleterious trans-acting mutations [ 41]. In contrast, however we might expect a large proportion of cis-regulatory mutations to be largely neutral, and therefore to accumulate over time [ 9,42]. The large proportion of trans-regulatory mutations that we observe suggest that artificial selection has primarily acted on trans-regulatory mutations, and that neutral cis-regulatory mutations have not accumulated substantially in the short time since these breeds were established.
Trans-regulated genes display lower pN/pS values on average compared to cis-regulated genes in our data, also suggesting that trans-regulated genes are subject to increased selective constraint during chicken domestication and may have been under stronger artificial selection. This is consistent with similar studies in mice [ 11], which concluded that trans-regulated genes display greater sequence conservation.

Conclusions
Our analyses indicate important differences among tissue, sex and parent of origin in gene expression regulation. The results also suggest that artificial selection associated with domestication may have more often acted on trans-regulatory variation in the history of chicken commercial breeds establishment. We also demonstrated a pipeline to explore the allele-specific expression on the inbred lines without specific reference genome.

Methods Samples
The inbred chickens used in our study were derived from National Engineering Laboratory for Animal Breeding of China Agricultural University. All the animals were fed and handled according to the regulations and guidelines established by the Animal Care and Use Committee of China Agricultural University, and all efforts were made to minimize suffering. DNA was extracted from brachial vein blood of four parents of two reciprocal crosses, using phenol-chloroform protocols, for whole genome re-sequencing. Total RNA was extract from three tissues (brain, liver and breast muscle) of 23 chickens at the age of 1 day using Trizol.

DNA & RNA sequencing and data alignment
Both DNA and RNA were sequenced with paired-end 100bp reads with a 300bp insert. All sequencing data was filtered by NGS QC Toolkit (v2.3) [

43] with default parameters.
To ensure the accuracy of RNA-seq data alignment, we simulated four parental genomes. The resequencing data were mapped to the chicken reference genome (Gallus_gallus-5.0, http://hgdownload.soe.ucsc.edu/downloads.html#chicken) with the Burrows-Wheeler Aligner (BWA) [ 44]. The BAM files were sorted, and duplicate reads were removed using Picard toolkit (https://github.com/broadinstitute/picard). The Genome Analysis Toolkit (GATK) [ 45] was used for single-nucleotide polymorphism (SNP) calling. Nucleotide from the reference genome was substituted if the mutant base was supported by more reads than the original reference base, this step was completed by vcftools [ 46]. The four simulated parental genomes were used to replace the reference genome in the RNA-seq data alignment of hybrid crosses. For each hybrid cross, we then identified SNPs between two parents that were homozygous in each parent with >10 supporting reads from resequencing data. The SNP list divided each hybrid offspring genome into two parts basing on the parent-of-origin.
The RNA-seq data alignment was completed by STAR [ 47]. With the SNP list between two parents, we counted the allele-specific reads from the two parts of each hybrid offspring on exon set level, using the R package of 'asSeq' [ 31]. To be specific, we counted the total number of reads covering at least one SNP across the whole exon set. For the condition of one read containing more than one SNPs, we set the parameter of prop.cut as 0.9, that is, we assigned a read to one of the two parental alleles if the proportion of those heterozygous SNPs suggesting the read is from that allele is larger than 0.9. In practice, this condition could insure all the SNPs on one read were consistent. If not, they would be discarded. We then collapsed counts at the exon level into gene level according to the Ensembl gene annotation file (ftp://ftp.ensembl.org/pub/release-91/gtf/gallus_gallus). We filtered the expressed genes with the following standard: for each sex and each tissue, the total reads of both the three purebred offspring and three hybrid offspring should be in the limits of 6 to 1000. The reads count of each sample was further normalized based on the sum of reads which could be mapped to the whole genome.
One male muscle sample of cross 3 was removed because its expression pattern was abnormal. We speculated it might be mixed from other cross by error.

Classification of different regulatory categories
In order to categorize regulatory variation, we referenced the methods of regulatory divergence research on Drosophila [ 7] and house mouse [ 36]. Concretely, we used the binomial test to identify differential expression both between the two purebred progenies (P) and between the two alleles of hybrid progenies (H). Fisher's exact test was used to judge the breed-specific RNA abundance ratio difference between the P and H data sets for detecting the trans effects (T). For false discovery rate control, a method of q-value estimation was adopted [ 48] to correct the p-value of both binomial test and Fisher's exact test, and we considered it has significantly difference when the q < 0.05. We classified the expressed genes into eight 13 categories according to the following criterion: (1) Cis: Significant difference in P and H, no significant difference in T.
(2) Trans: Significant difference in P, but not H, significant difference in T.
(3) Cis + trans (same): significant difference in P, H and T, the log2-transformed strain-specific ratios in P and H have the same sign, and the difference of P higher than H.
(4) Cis + trans (opposite): significant difference in P, H and T, the log2-transformed strain-specific ratios in P and H have the same sign, and the difference of H higher than P.
(5) cis × trans: significant difference in P, H and T, and the log2-transformed strain-specific ratios in P and H have the opposite sign.
(6) compensatory: Significant difference in H, but not P, significant difference in T.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The author NY is a member of the editorial board (Associate Editor) of this journal.         Supplemental tables: Table S1 The summary of differential expression genes in hybrid and purebred progenies   Principal Component Analysis of RNA-seq data. Each point represents one sample, with shape indicating sex, color indicating tissue (All) or cross (Brain, Liver, Muscle). In this step, information of genes on Z chromosome has been excluded.

Figure 3
Classification of genes according to the expression pattern of purebred and hybrid data sets. We take the group of brain male (A) and brain female (B) of cross 2 for example (the other groups see Supplemental Material), each point represents a single gene and is colorcoded according to its regulatory category. The coordinate position shows the average log2 expression fold change between the alleles in the hybrids (y-axis) and between the two purebreds (x-axis). The proportion of each category was summarized in the bar graph (C), where we removed the conserved and ambiguous genes, and further subdivided the genes of cis+trans category into two categories, basing on the same direction or opposition direction did the cis and trans variants act in. The number above the bar represents the proportion of genes in this regulatory category, and number on the bar represents the gene count of this category.

Figure 4
Synonymous and non-synonymous SNPs (pN/pS) in cis-and trans-regulatory genes in different groups. The value of y-axis refers to the mean value of all genes in the category.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.