Landscape and evolution of cis- and trans-regulatory divergence in chicken genome between two contrasted breeds analyzed in three tissues at one day age

Background: Gene expression variation is an important mechanism underlying phenotypic variation, and can occur via cis- and trans-regulation. In order to understand the role of cis- and trans-regulatory variation on population divergence of chicken, we developed reciprocal crosses of two chicken breeds, White Leghorn and Cornish Game, with major differences in body size and reproductive traits, and used them to identify the degree of cis versus trans variation in brain, liver and muscle of both male and female samples at 1 day age. Results: We provided a landscape about how the transcriptomes are regulated in the hybrid progenies of two contrasted breeds by allele specific expression analysis. Our results showed that compared with the cis-regulatory divergence, trans-acted genes existed more extensively in the chicken genome. Furthermore, a widespread tendency of compensatory regulation exists in chicken genome. Most importantly, we found the evidence of stronger purifying selection on genes regulated by trans variations than the cis elements. Conclusions: We demonstrated a pipeline to explore the allele-specific expression in the hybrid progenies of inbred lines without specific reference genome. Our research performed the first study to describe the regulatory divergence between two contrasted breeds. The results suggested that artificial selection associated with domestication in chicken may have more often acted on trans-regulatory divergence than cis. The mouse insulin-like growth factor type-2 receptor is imprinted and closely linked to the Tme locus.

reproductive differences among breeds [29]. Previous studies have identified frequent allele specific expression between different chicken breeds [19,20]. The rapid change under domestication offers a unique example to detect the relative importance of cisversus trans-regulatory variation underlying phenotypic change. We used reciprocal crosses of White Leghorn (WL), a key layer breed selected for high egg output, and Cornish Game breeds (CG), a cornerstone broiler breed selected for rapid growth and muscle development [30], to assess the role of different forms of regulatory variation in brain, liver and muscle of both males and females at 1 day age.

Results
The profile of the parental genomes and gene expression in different tissues,

sexes of progenies
The two inbred chicken strains, Cornish-Game and White-Leghorn, which exhibit major differences in growth rate, egg production and behavior, were used to generate purebred and reciprocal hybrid F1 progenies (Fig. 1). In order to identify breed-specific variants, we sequenced the four parents of the two reciprocal crosses, recovering on average 100.73 million pair-end reads per sample after quality control. We identified on average 4.74 million SNPs per parental genome which were used to generate simulated parental genomes. From these SNPs, we picked SNPs which were homozygous in each parental bird but different from each other in a same cross (heterozygous in the hybrid progenies), resulting in two heterozygous SNP lists with on average 1.4 million of heterozygous SNPs for the two reciprocal crosses, individually, to identify the allele-specific RNA-seq reads of offspring in the following steps.
For each hybrid cross, we collected RNA-Seq data from brain, liver and muscle of three male and three female F1 progenies 1-day post-hatch. On average, we recovered 29.17 million mappable reads per sample. In order to eliminate the effect of the sex chromosomes, we removed all Z and W genes from our analysis and focused entirely on autosomal loci. We recovered significant differences in gene expression between different tissues, sexes and parents of origin (Fig. 2). Tissue was the most powerful factor for gene expression. The sex played a leading role in brain, and the strain effected gene expression of liver the most, while in muscle, the parent of origin seems more powerful because samples were divided into two parts by mother origin. We therefore retained all three variables in our subsequent analysis. It resulted in 12 treatment groups, comprised of 3 tissues, 2 sexes and 2 reciprocal crosses in this study.

An effective pipeline was applied on the allele-specific expression analysis
To identify the parental origin of offspring's mRNA, we tried a new pipeline by using a R package of 'asSeq' [31]. Briefly, a set of R scripts are available for genotype phasing with the 1.4 million of heterozygous SNPs identified in the previous step. About 2% of these SNPs located on the exon region. The large number of SNPs provide higher chance that an RNA-seq read could overlap with a heterozygous genetic marker, hence is identified as allele-specific read.
In order to validate the accuracy of our ASE pipeline, we generated two artificial hybrid F1 libraries. Specifically, we concatenated two male brain RNA-seq fastq files from cross 1 and cross 4, which had roughly equal read depth. We also concatenated two female liver samples the same way. The two simulated hybrid libraries and four original purebred libraries were handled consistently with other hybrid libraries, using the heterozygous SNP lists of both cross 2 and cross 3. We compared the expression ratio of two simulated alleles (CG/WL) to the real expression ratio of two samples (CG/WL) for each gene. A strong correlation between the two measurements were observed (Fig. S1), indicating that our ASE analysis pipeline was robust. Since our pipeline only counted the local reads containing the heterozygous SNPs, we further assessed the correlation of expression fold change (CG/WL) between the local reads method and the method of counting total reads using edgeR [32][33][34]. The correlation was also strong (Fig. S2). These results proved the feasibility of our pipeline.

Genes were classified into different categories by the type of regulatory divergency
A total of 24,881 genes from the Ensembl v87 annotation were analyzed. About one fifth of them contained heterozygous SNPs and expressed in our progeny samples (Table S1).
For the genes containing heterozygous SNPs, we observed significant expression difference (p-value <0.05, binomial test corrected for multiple comparisons by q-value method) between the purebred females (cross 1 vs. cross 4) for 14.71 % of them in brain, 36.45 % in liver and 38.38 % in muscle (take the heterozygous SNP list of cross 2 for example). In males, we observed that 17.64 % of genes in brain, 41.87 % in liver and 37.84 % in muscle expressed significantly different (Table S1).
Expressive genes were classified into different categories by the type of regulatory divergency [7, 35, 36] (Fig. 3A, B, Table 1, Fig. S3-S5). Most genes showed conserved or ambiguous expression, as expected given the recent divergence time of these two breeds.
There were more than 70% genes in brain, more than 40% genes in liver, and around 50% genes in muscle classified as conserved. Nonetheless, we observed substantial cis-and trans-variation in the hybrid crosses. There were a greater proportion of trans-regulated gene expression variations than cis in most tissues and across both sexes, particularly in muscle (Fig. 3C).
Genes classified as "cis+trans (same)" show evidence of cis and trans variations acting in the same direction, while genes classified into the other three categories show evidence of cis and trans variations acting in opposite directions, with different expression preference on the two alleles. We observed that genes more often display the latter pattern, and most genes were classified as "compensatory" (Fig. 3C).
The gene proportion of each regulatory category was similar between different tissues and different sexes, except for some differences between the muscle and other two tissues (Fisher's exact test, Table S2). Unexpectedly, we observed only few loci with consistent cis-or trans-regulatory divergence across different groups (Fig. S6). These stable cis-or trans-regulatory divergence genes appear to play important roles in phenotypic divergence. For example, the gene IGFBP2, TGFBI, PDGFRL and IGF2R all showed significant evidence of expression bias between our two breeds. These genes are related to the chicken growth, which could explain the difference of growth rate between two chicken breeds (Table S3).

Genes regulated by trans-acting variation show greater sequence conservation
We counted the number of variants located within 1 kb upstream of transcription start sites of each gene with the genome data of the four parents. The results showed greater variations at upstream of cis-regulatory divergence genes than trans-acted genes in all samples (Fig. 4A).
The ratio of the number of non-synonymous SNPs to the number of synonymous SNPs (pN/pS) in coding sequence of each gene was calculated in this research. We found that the value of pN/pS in genes regulated by trans-variant were lower than cis-regulatory divergence genes in all samples (Fig. 4B, Fig. S7-S8).

Discussion
For the previous studies on regulatory divergence, the time point they chose were not identical, from embryo to adult [7, 11, 12]. Genes express differentially among different developmental stages [37], therefore different result of the regulatory divergence would be obtained. We chose the 1-day age because it is a critical stage for the chicken development, when they make a transition from embryo to growth stage, and genes responsible for growth and immunity begin to express [38,39].
On account of the short divergence time, our two chicken inbred strains are not like the mouse inbred lines with high consistency on genome. To enhance the reliability of our results, we have improved our analysis pipeline. Firstly, the SNP list we used to identify the parental origin was strictly filtered from the re-sequencing data of the four parents.
The SNPs were statistically homozygous in each parent, hence heterozygous in each hybrid offspring. Secondly, we counted the total number of reads covering at least one SNP marker across the whole transcript, instead of counting the reads number of each SNP. Compared with the method using the existing strain-specific reference genomes, our pipeline could improve the accuracy of parental origin identification for the heterozygous SNPs in hybrid offspring, because we sequenced their parents directly. These SNPs were used to mark the parental origin for the alleles of each gene, and subsequently increased the accuracy of regulation classification. But it also resulted in the limited number of genes could be studied in this research. However, our study provided an example to deal with the similar situation where there was no specific reference genome for different strains.
Although chicken domestication occurred several thousand years ago, commercial populations were established only in the last 200 years [29]. In our study, most genes showed conserved or ambiguous expression, and more trans-regulatory variants than cis, which could attribute to the short differentiation time between White Leghorn and Cornish Game. In theory, the pleiotropic effects of trans-regulatory mutations will result in selection to eliminate most deleterious trans-acting mutations [40]. In contrast, we might expect a large proportion of cis-regulatory mutations to be largely neutral, and therefore to accumulate over time [9,41]. The large proportion of trans-regulatory mutations observe in our study suggest that artificial selection has primarily acted on transregulatory mutations, but the neutral cis-regulatory mutations have not accumulated substantially in the short time since these breeds were established.
Genes regulated by both cis-and trans-variations more often acting in opposite directions, and most genes were classified into "compensatory" in our study. This result consistent with the previous research on house mice [36], indicating that the cis-and trans-variants tend to act convergently to maintain the stability of gene expression [11,42]. Despite lacking of complete dosage compensation mechanism on sex chromosome [24-28], an extensive compensatory tendency still exist in chicken genome.
There were few loci with consistent cis-or trans-regulatory variation across different tissues and different sexes. This result consistent with some previous ASE analyses, which suggest rare ASE genes are consistently expressed across tissues [43,44]. However, the cis-and trans-regulatory divergence classification is more complex than the simple ASE analysis. Gene expression is characterized by spatiotemporal specificity. It always controlled by the interaction of cis-regulatory DNA sequences and trans-regulatory factors, which could complicate the identification of regulatory divergence. The statistical methods could not accurately classify them with the limited expression information. However, even so, the statistical result still referable and valuable for the following analysis.
Cis-regulatory elements are primarily located at upstream of coding sequences. Our results, is consistent with resent studies in Drosophila [ 7], detected greater variants within 1 kb upstream of transcription start sites of cis-regulatory divergence genes than trans-acted genes, suggesting that our classification results were reliable. Meanwhile, genes regulated by trans-variant showed a lower value of pN/pS than cis-acted genes. The pN/pS was usually used to assess the degree of selective constraint. Genes under high constraint were expected to have lower pN/pS [45,46]. Our results suggest that transregulatory divergence genes are subject to increased selective constraint during chicken domestication and may have been under stronger artificial selection. It is consistent with similar studies in mice [11], which pointed that trans-regulated genes display greater sequence conservation by computing the GERP (Genomic Evolutionary Profiling) score for each exon.

Conclusions
In this study, we demonstrated a pipeline to explore the allele-specific expression in the hybrid progenies of inbred lines without specific reference genome. With the genome sequence of parents and RNA-seq data of offspring, we classified the expressed genes on chicken genome into different categories by the type of regulatory divergency. More trans-regulatory divergence than cis was detected due to the short divergence history of the two parental breeds. A widespread compensatory tendency for the regulatory divergence exists in chicken genome. The sequence conservation analysis suggested that artificial selection associated with domestication may have more often acted on genes regulated by trans-variations in the history of chicken commercial breeds establishment.

Samples
The inbred chickens used in our study were derived from National Engineering Laboratory for Animal Breeding of China Agricultural University. All the animals were fed and handled according to the regulations and guidelines established by the Animal Care and Use Committee of China Agricultural University, and all efforts were made to minimize suffering. DNA was extracted from brachial vein blood of four parents of two reciprocal crosses, using phenol-chloroform protocols with the standard protocols. Total RNA was extract from three tissues (brain, liver and breast muscle) of 23 chickens at the age of 1 day using the Trizol reagent (Invitrogen, Carlsbad, CA, USA) with the standard protocols from the manufacturer. The DNA and RNA quality were assessed by NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc.) and agarose gel electrophoresis (AGE).

DNA & RNA sequencing and data alignment
Whole-genome sequencing of parent genomes and RNA-seq of offspring were performed using an Illumina HiSeq 2500 sequencer. Library construction and sequencing were To ensure the accuracy of RNA-seq data alignment, we simulated four parental genomes.
The re-sequencing data of four parents were mapped to the chicken reference genome (Gallus_gallus-5.0, http://hgdownload.soe.ucsc.edu/downloads.html#chicken) with the Burrows-Wheeler Aligner (BWA) [48] (v0.7.15). The BAM files were sorted, and duplicate reads were removed using Picard toolkit (https://github.com/broadinstitute/picard). The Genome Analysis Toolkit (GATK) [49] (v3.6) was used for single-nucleotide polymorphism (SNP) calling. Nucleotide from the reference genome was substituted if the mutant base was supported by more reads than the original reference base, this step was completed by vcftools [50] (v0.1.13). The four simulated parental genomes were used to replace the reference genome in the RNA-seq data alignment of hybrid crosses. For each hybrid cross, we then identified SNPs between two parents that were homozygous in each parent with >10 supporting reads from re-sequencing data. The SNP list divided each hybrid offspring genome into two parts basing on the parent-of-origin.
The RNA-seq data alignment was completed by STAR [51] (v2.5.3a). With the SNP list between two parents, we counted the allele-specific reads from the two parts of each hybrid offspring on exon set level, using the R package of 'asSeq' [31]. To be specific, we counted the total number of reads covering at least one SNP across the whole exon set.
For the condition of one read containing more than one SNPs, we set the parameter of prop.cut as 0.9, that is, we assigned a read to one of the two parental alleles if the proportion of those heterozygous SNPs suggesting the read is from that allele is larger than 0.9. In practice, this condition could insure all the SNPs on one read were consistent.
If not, they would be discarded. We then collapsed counts at the exon level into gene level according to the Ensembl gene annotation file (ftp://ftp.ensembl.org/pub/release-91/gtf/gallus_gallus). We filtered the expressed genes with the following standard: for each sex and each tissue, the total reads of both the three purebred offspring and three hybrid offspring should be in the limits of 6 to 1000. The reads count of each sample was further normalized based on the sum of reads which could be mapped to the whole genome.
One male muscle sample of cross 3 was removed because its expression pattern was abnormal. We speculated that it might be mixed from other cross by error.

Classification of different regulatory categories
In order to categorize regulatory variation, we referenced the methods of regulatory divergence research on Drosophila [7] and house mouse [36]. Concretely, the binomial test was used to identify differential expression both between the two purebred progenies (P) and between the two alleles of hybrid progenies (H). Fisher's exact test was used to judge the breed-specific RNA abundance ratio difference between the P and H data sets to detect the trans effects (T). For false discovery rate control, a method of q-value estimation [52] was adopted to correct the p-value of both binomial test and Fisher's exact test. We considered it has significantly difference when q < 0.05. The expressed genes were classified into eight categories according to the following criterion: (1) Cis: Significant difference in P and H, no significant difference in T.
(2) Trans: Significant difference in P, but not H, significant difference in T.
(3) Cis + trans (same): significant difference in P, H and T, the log2-transformed strainspecific ratios in P and H have the same sign, and the difference of P higher than H.
(4) Cis + trans (opposite): significant difference in P, H and T, the log2-transformed strainspecific ratios in P and H have the same sign, and the difference of H higher than P.
(5) cis × trans: significant difference in P, H and T, and the log2-transformed strainspecific ratios in P and H have the opposite sign.
(6) compensatory: Significant difference in H, but not P, significant difference in T.
(7) Conserved: No significant difference in H, P and T.

Sequence conservation analysis
Re-sequencing data of four parents were used to study the sequence conservation of cis-

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The author NY is a member of the editorial board (Associate Editor) of this journal.

Figure S6
Intersection of different groups of cis-and trans-regulatory genes. Figure S7 The ratio of the numbers of non-synonymous SNPs to the numbers of synonymous SNPs (pN/pS) in different groups of cross 2. Figure S8 The ratio of the numbers of non-synonymous SNPs to the numbers of synonymous SNPs (pN/pS) in different groups of cross 3.
Supplemental tables: Table S1 The summary of differential expression genes in hybrid and purebred progenies Table S2 The difference of gene proportion of each categories between different groups  Figure 3 Classification of genes according to the expression pattern of purebred and hybrid data sets. We take the group of brain male (A) and brain female (B) of cross 2 for example (the other groups see Supplemental Material), each point represents a single gene and is color-coded according to its regulatory category.
The coordinate position shows the average log2 expression fold change between the alleles in the hybrids (y-axis) and between the two purebreds (x-axis). The proportion of each category was summarized in the bar graph (C), where we removed the conserved and ambiguous genes, and further subdivided the genes of cis+trans category into two categories, basing on the same direction or opposition direction did the cis and trans variants act in. The number above the bar represents the proportion of genes in this regulatory category, and number on the bar represents the gene count of this category.