Trait-trait dynamic interaction: 2D-trait eQTL mapping for genetic variation study

Background Many studies have shown that the abundance level of gene expression is heritable. Analogous to the traditional genetic study, most researchers treat the expression of one gene as a quantitative trait and map it to expression quantitative trait loci (eQTL). This is 1D-trait mapping. 1D-trait mapping ignores the trait-trait interaction completely, which is a major shortcoming. Results To overcome this limitation, we study the expression of a pair of genes and treat the variation in their co-expression pattern as a two dimensional quantitative trait. We develop a method to find gene pairs, whose co-expression patterns, including both signs and strengths, are mediated by genetic variations and map these 2D-traits to the corresponding genetic loci. We report several applications by combining 1D-trait mapping with 2D-trait mapping, including the contribution of genetic variations to the perturbations in the regulatory mechanisms of yeast metabolic pathways. Conclusion Our approach of 2D-trait mapping provides a novel and effective way to connect the genetic variation with higher order biological modules via gene expression profiles.

The null reference distribution is generated by the following procedure. 1. Randomly generate the values from the population of all expression values. 2. In this random drawing, we only draw forty values first and the rest of forty values are selected based on the previous random drawing results. For instance, if we randomly pick the expression value from 1000 th row and 3 rd column, the corresponding 1000 th row and 43 th column will be selected for preserving the dyeswap structure in the original data set. 3. After repeat step 1 and 2 for 1 million times, we obtain 1 million randomly generated gene pairs. 4. For each randomly generated gene pair, we compute the LA scores for all marker blocks and find its highest positive and negative marker blocks as well as the corresponding LA scores.
In the Figure 1, the qq-plot is shown for the genome wide LA linkage scores versus the reference LA linkage scores. As one can see that the apparent straight line is observed from both TOP (positive) and BOT (negative) qq-plots. In order to detect the subtle difference between the distributions, Figure 2 plots the difference between the quantiles of the two distributions against the quantile of the reference distribution. From Figure 2, we observe that the difference of these two distribution increases when the reference LA score gets larger. This shows that the higher LA linkage score is, the less likely it is resulted from random chance.  One simple way to estimate the FDR is to find out the ratio of expected number of LA scores from randomly generated sample and the observed number from genome wide distribution. In Figure 3, the relation between FDR and log(P-value) is plotted for both positive (TOP) and negative (BOT) LA scores. For each p-value, we find the corresponding cutoff of LA score and count the number of discoveries from genome wide LA scores, and then calculate FDR. For example, for p-value 1e-3, based on the reference distribution, the corresponding LA score cutoffs are -0.4867 (BOT) and 0.4575 (TOP). The corresponding number of discoveries are 49646 (BOT) and 47445 (TOP). This will give the estimated FDR 33.1% (BOT) and 34.7% (TOP). Our reference distribution could be regarded as a mixture of various distributions. The power of the test will be essentially reduced. For this genome wide study, we would like to demonstrate that the signal is essentially not that strong but still detectable. The high FDR is not that surprising since the majority of gene pairs are not expected to be related. This is why we also restrict our study on a subset of gene pairs based on pathway information.

Section B: Top/Bottom 20 LA 2D-trait Mapping Results from Genome-wide Study
In Table 1, we list the 20 gene pairs with the highest positive or negative LA scores  from genome wide 2D-Linkage result. In Tables 2 and 3, gene pairs mapped to the overly represented marker block 227 are listed. Based on annotation from SGD (Saccharomyces Genome Database, http://www.yeastgenome.org/), YGR146C has no known function but is induced by transcription factor Aft2p. However, when we look at the annotations for its paired genes, we found that KRR1, SAS10, RRB1, URB1, RPF1, LTV1, HTS1 are all involved or required in assembly of ribosome or ribosomal biogenesis (see Table 3 for the gene by gene annotations). Since YGR146C is still unknown, we look at the genes located within the marker block 227. We found YGL101W, SEH1, LSG1, RUF2 and USE1. However, if we look at the gene located next to YGL101W, we found RPL28, whose gene expression profile has a correlation of 0.2240 with the marker profile of marker block 227. If we search for the genes, which trans-link to the marker block 227, we find that MRM2, Mitochondrial rRNA Methyl transferase, has correlation 0.4277 with marker block 227. In Tables 4 and 5, gene pairs mapped to the overly represented marker block 348 are given. We found that most of them have functions associated with mitochondrial-ribosome protein (MRPS28, MPL22, RSM28, SWS2), mitochondrial membrane or intermembrane space (QCR6, SHE9, CYC1), assembly of mitochondrial respiratory complexes (MBA1), intramitochondrial sorting(YLR168C) (see Table 5 for the gene by gene annotations). We further investigate genes within or near to the makerblock 348. We find TIM54 very close to markerblock 348. Based on the Gene Ontology, TIM54 is "Component of the mitochondrial Tim54p-Tim22p complex involved in insertion of polytopic proteins into the inner membrane" and localized in mitochondrial.   Table 3: Annotations (from SGD) for genes linked to marker block 227 by 2D-trait mapping.
Gene Annotation YGR146C Putative protein of unknown function; induced by iron homeostasis transcription factor Aft2p; multicopy suppressor of a temperature sensitive hsf1 mutant KRR1 Essential nucleolar protein required for the synthesis of 18S rRNA and for the assembly of 40S ribosomal subunit SAS10 Component of the small (ribosomal) subunit (SSU) processosome required for pre-18S rRNa processing; essential nucleolar protein that, when overproduced, disrupts silencing RRB1 Essential nuclear protein involved in early steps of ribosome biogenesis; physically interacts with the ribosomal protein Rpl3p URB1 Nucleolar protein required for the normal accumulation of 25S and 5.8S rRNAs, associated with the 27SA2 pre-ribosomal particle; proposed to be involved in the biogenesis of the 60S ribosomal subunit RPF1 Nucleolar protein involved in the assembly of the large ribosomal subunit; constituent of 66S pre-ribosomal particles; contains a sigma(70)-like motif, which is thought to bind RNA SER2 Phosphoserine phosphatase of the phosphoglycerate pathway, involved in serine and glycine biosynthesis, expression is regulated by the available nitrogen source LTV1 Component of the GSE complex, which is required for proper sorting of amino acid permease Gap1p; required for ribosomal small subunit export from nucleus; required for growth at low temperature HTS1 Cytoplasmic and mitochondrial histidine tRNA synthetase; encoded by a single nuclear gene that specifies two messages; efficient mitochondrial localization requires both a presequence and an amino-terminal sequence YNL114C Dubious open reading frame unlikely to encode a protein, based on available experimental and comparative sequence data; completely overlaps the verified ORF RPC19/YNL113W, an RNA polymerase subunit  Gene Annotation CYC1 Cytochrome c, isoform 1; electron carrier of the mitochondrial intermembrane space that transfers electrons from ubiquinone-cytochrome c oxidoreductase to cytochrome c oxidase during cellular respiration YLR168C Putative protein of unknown function that may be involved in intramitochondrial sorting; has similarity to Ups1p and to human PRELI; the green fluorescent protein (GFP)-tagged protein localizes to mitochondria, required for wild-type respiratory growth MRPS28 mitochondrial ribosomal small subunit component MRPL22 Mitochondrial Ribosomal Protein, Large subunit RSM28 Mitochondrial ribosomal protein of the small subunit; genetic interactions suggest a possible role in promoting translation initiation YGR235C Putative protein of unknown function; the authentic, non-tagged protein is detected in highly purified mitochondria in high-throughput studies QCR6 Subunit 6 of the ubiquinol cytochrome-c reductase complex, which is a component of the mitochondrial inner membrane electron transport chain; highly acidic protein; required for maturation of cytochrome c1 MUD1 U1 snRNP A protein, homolog of human U1-A; involved in nuclear mRNA splicing MBA1 Protein involved in assembly of mitochondrial respiratory complexes; may act as a receptor for proteins destined for export from the mitochondrial matrix to the inner membrane SWS2 Putative mitochondrial ribosomal protein of the small subunit, has similarity to E. coli S13 ribosomal protein; participates in controlling sporulation efficiency SHE9 Mitochondrial inner membrane protein required for normal mitochondrial morphology, may be involved in fission of the inner membrane; forms a homooligomeric complex

Section C: eQTL Hotspots Obtained from Genome-wide Study
We further studied all the top 605 and bottom 1213 LA 2D-trait mapping results obtained using p-value cutoff 1e-5. The list of these 2D-trait mapping result can be found at http://kiefer.stat.ucla.edu/lap2/index-y-2g1m-genome-1e-5.htm. The marker block locations and their frequencies of occurrence are shown in Figure 4 in main text and in Table 6. The high frequency marker blocks are given in Table 7 along with a description of the enriched GO (gene ontology) term for the linked LA genes.   Table 8. Annotations for some candidate regulators in Table 6 Gene Annotation FLO10 Lectin-like protein with similarity to Flo1p, thought to be involved in flocculation LSG1 Putative GTPase involved in 60S ribosomal subunit biogenesis; required for the release of Nmd3p from 60S subunits in the cytoplasm NFT1 Putative transporter of the multidrug resistance-associated protein (MRP) subfamily PCL10 Biological Processes are regulation of glycogen biosynthetic process, regulation of glycogen catabolic process REX4 Putative RNA exonuclease possibly involved in pre-rRNA processing and ribosome assembly RPL28 Ribosomal protein of the large (60S) ribosomal subunit RPL40A Ribosomal Protein of the Large subunit RUF2 H/ACA box small nucleolar RNA (snoRNA); guides pseudouridylation of large subunit (LSU) rRNA at positions U1110, U2349, and U2351 SEH1 Nuclear pore protein that is part of the evolutionarily conserved Nup84p complex (Nup84p, Nup85p, Nup120p, Nup145p, and Seh1p); homologous to Sec13p TIM54 Component of the mitochondrial Tim54p-Tim22p complex involved in insertion of polytopic proteins into the inner membrane UIP5 interacts with Ulp1p, a Ubl (ubiquitin-like protein)-specific protease for Smt3p protein conjugates YTA12 Component, with Afg3p, of the mitochondrial inner membrane m-AAA protease that mediates degradation of misfolded or unassembled proteins and is also required for correct assembly of mitochondrial enzyme complexes

Section D: 1D-trait mapping results for gene pairs identified by genome-wide 2D-trait mapping
We studied the 1D linkages of the 1818 LA 2D-trait mapping results obtained using p-value cutoff 1e-5. We found that the vast majority of these 2D-trait mapping results cannot be identified by 1D-trait mapping. Specifically, each 2D-trait mapping result is a triplet (X, Y, Z) where X and Y are two gene expression traits and Z is a marker block. We used absolute correlation to measure the relation between one gene expression trait and one marker, and used max{ corr( , ) , corr( , ) } X Z Y Z ρ = to measure whether this triplet can be at least partially identified by 1D-trait mapping. The correlations are generally quite low, with median 0.13 and 95 percentile 0.27. In contrast, significant 1D linkages correspond to much higher absolute correlations with median 0.68 and the 5 percentile is 0.59 (Figure 4 (c)). We also show the distribution of 1 min{ corr( , ) , corr( , ) } X Z Y Z ρ = in Figure 4 (a). Comparing with ρ , 1 ρ measures whether the 2D result can be completely identified by 1D mapping. As expected, 1 ρ is quite small, with median 0.06 and 95 percentile 0.16.