Evidence for contribution of common genetic variants within chromosome 8p21.2-8p21.1 to restricted and repetitive behaviors in autism spectrum disorders

Background Restricted and Repetitive Behaviors (RRB), one of the core symptom categories for Autism Spectrum Disorders (ASD), comprises heterogeneous groups of behaviors. Previous research indicates that there are two or more factors (subcategories) within the RRB domain. In an effort to identify common variants associated with RRB, we have carried out a genome-wide association study (GWAS) using the Autism Genetic Resource Exchange (AGRE) dataset (n = 1,335, all ASD probands of European ancestry) for each identified RRB subcategory, while allowing for comparisons of associated single nucleotide polymorphisms (SNPs) with associated SNPs in the same set of probands analyzed using all the RRB subcategories as phenotypes in a multivariate linear mixed model. The top ranked SNPs were then explored in an independent dataset. Results Using principal component analysis of item scores obtained from Autism Diagnostic Interview-Revised (ADI-R), two distinct subcategories within Restricted and Repetitive Behaviors were identified: Repetitive Sensory Motor (RSM) and Insistence on Sameness (IS). Quantitative RSM and IS scores were subsequently used as phenotypes in a GWAS using the AGRE ASD cohort. Although no associated SNPs with genome-wide significance (P < 5.0E-08) were detected when RSM or IS were analyzed independently, three SNPs approached genome-wide significance when RSM and IS were considered together using multivariate association analysis. These included the top IS-associated SNP, rs62503729 (P-value = 6.48E-08), which is located within chromosome 8p21.2-8p21.1, a locus previously linked to schizophrenia. Notably, all of the most significantly associated SNPs are located in close proximity to STMN4 and PTK2B, genes previously shown to function in neuron development. In addition, several of the top-ranked SNPs showed correlations with STMN4 mRNA expression in adult CEU (Caucasian and European descent) human prefrontal cortex. However, the association signals within chromosome 8p21.2-8p21.1 failed to replicate in an independent sample of 2,588 ASD probands; the insufficient sample size and between-study heterogeneity are possible explanations for the non-replication. Conclusions Our analysis indicates that RRB in ASD can be represented by two distinct subcategories: RSM and IS. Subsequent univariate and multivariate genome-wide association studies of these RRB subcategories enabled the detection of associated SNPs at 8p21.2-8p21.1. Although these results did not replicate in an independent ASD dataset, genomic features of this region and pathway analysis suggest that common variants in 8p21.2-8p21.1 may contribute to RRB, particularly IS. Together, these observations warrant future studies to elucidate the possible contributions of common variants in 8p21.2-8p21.1 to the etiology of RSM and IS in ASD. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2475-y) contains supplementary material, which is available to authorized users.


Background
Autism Spectrum Disorder (ASD) is characterized by impaired reciprocal social interactions, delayed or aberrant communication, and the presence of restricted and repetitive behaviors, frequently with restricted interests [1]. These disabilities often confer significant lifelong burdens on individuals with ASD. This fact, together with the high ASD prevalence in the general population, makes ASD a major challenge for public health systems. [2][3][4][5] Based on heritability estimates as high as 70-90 % in twin and family studies [6], great effort has been devoted to elucidating the genetic mechanism of ASD. However, it has been difficult to identify any individual genetic factors that confer even moderate risk [7,8].
Genome-wide association studies (GWAS) have implicated the region on chromosome 5p14.1 between CDH9 and CDH10 as the first potential common genetic risk factor for ASD in Caucasian populations [9,10]. Replication in independent GWAS, however, has frequently not been achieved for many candidate loci for ASD [11][12][13][14]. Phenotype and genetic heterogeneity between patients are conjectured to greatly reduce the power of overall genome-wide case-control studies in ASD, and is a likely explanation for the lack of replication and much of the 'missing heritability' in this complex disease [15]. Various attempts have been made to reduce heterogeneity in large-scale genetic studies of ASD. One proposed approach to increase statistical power to detect pathogenic loci is to design genetic association studies focusing on ASD sub-phenotypes [16][17][18][19].
Restricted and repetitive behaviors (RRB) are a core symptom of ASD [20]. Previous studies have shown that RRB have an underlying genetic component and may be influenced by genes independent of those associated with the social or communication deficits [21][22][23]. Moreover, Autism Diagnostic Interview-Revised (ADI-R), a gold-standard diagnostic tool for ASD [24], provides widely-accepted quantitative measures for RRB [25,26], making it a promising sub-phenotype for association studies. Since RRB comprises heterogeneous groups of behaviors [27,28], research during the last decade has used factor-analysis to examine the structure of RRB using different subsets of ADI-R items and subpopulations of ASD individuals that differ in symptom severity and/or ethnicity [27,[29][30][31][32]. Remarkably, in spite of their methodological differences, many of these analyses converge on a two-factor solution for RRB comprising 'repetitive sensory-motor' (RSM) and 'insistence on sameness' (IS). The RSM subcategory quantifies motor mannerisms, sensory seeking behaviors, and the repetitive use of objects, whereas the IS subcategory quantifies compulsions, rituals and difficulties with changes in routine [30].
IS and RSM were found to be differentially related to other ASD variables. Specifically, high correlations were found between RSM, but not IS, with IQ, additional adaptive behaviors, and age at first words and phrases [29,30,33], suggesting that, compared to IS, RSM may be more correlated with ASD severity [33]. Studies also indicated that the IS subcategory might be under stronger additive genetic control than the RSM subcategory. Whereas significant familial aggregation of the IS subcategory has been consistently reported [31,32], no significant concordance for familial aggregation has been reported for RSM [25,27,31,34].
Behavioral subcategories that differ in behavioral correlates and familiality are of particular interest to researchers investigating the genetic components that underlie ASD sub-phenotypes. In a recent genomewide linkage analysis [25], RSM and IS subcategories were linked to various chromosomal regions that only partially overlapped regions previously identified using ASD diagnosis as the phenotype.
In the current work, we explored the underlying structure of RRB using two independent, publicly available ASD datasets. In an effort to identify SNP markers, candidate genes and biological pathways associated with RRB, the empirically derived RRB subcategories were then used as quantitative traits for GWAS. The observation that both univariate and multivariate linear mixed models identified associated SNPs within 8p21.2-21.1 in the discovery dataset, provides the first evidence that genetic variation in this region influences RRB phenotypes in ASD. Further studies are needed to confirm the association signals within chromosome 8p21.2-8p21.1, since replication was not obtained in an independent sample of 2,588 ASD probands, possibly due to insufficient sample size and between-study heterogeneity.

Ethics statement
This study was approved by the ethics committee of the School of Basic Medical, Fudan University, China (IRB#2010CB529601). All the genetic data and phenotype data used is previously published and publicly available. Written informed consent was previously obtained from all individuals and procedures had approval from institutional review boards from all the institutions involved in recruitment and research, following national and international ethical and legal regulations and the principles of the Declaration of Helsinki.

Dataset demographics
The discovery dataset comprised individuals in the Autism Genetic Resource Exchange family-based dataset (AGRE: http://www.agre.org) [35]. AGRE has obtained informed consent from all individuals listed in their pedigree catalogue. Individuals with ASD in the AGRE cohort were diagnosed using both the Autism Diagnostic Interview-Revised (ADI-R) [24] and Autism Diagnostic Observation Schedule (ADOS) [36], widely considered to be the gold-standard diagnostic instruments for ASD. Individuals with possible non-idiopathic ASD (e.g., patients with significant chromosomal abnormalities, premature birth, or comorbid disorders) were excluded. All subjects were genotyped using the Illumina Human-Hap550 BeadChip. Genotyping details and other important information have been previously described [9]. A "cleaned" version of the raw AGRE genetic data, designated CHOP.clean100121, was downloaded from the AGRE website (4,327 subjects). Following the method described in the supplement section of the study by Wang et al. [9], population structure was examined based on the first two principal components obtained by multidimensional scaling (MDS) of a matrix of pairwise IBS (Identical By State) values between these individuals. Individuals of European ancestry were selected based on principal component 1 scores less than 0.0, and principal component 2 scores between -0.02 and 0.02. A total of 806 ASD families (3,455 individuals) were inferred to have European ancestry using the above procedure (Additional file 1). Because ADI-R record were available only for ASD probands rather than for all pedigree members, the final discovery dataset comprised 1,335 probands, ranging in age from 1.8 to 44 years (mean = 8.00 years, SD = 4.87), and were predominately male (78.7 %). The sample size of each computer-scored diagnostic group defined by AGRE was: Autism 1,152 (86.3 %), Not Quite Autism (NQA) 57 (4.27 %), and Broad Spectrum (BS) 126 (9.44 %). The specific criteria for these classifications are given on the AGRE website. Detailed information for all individuals can be found in Additional file 2.
The dataset used for replication comprised individuals in Simons Simplex Collection (SSC, version 15), a genetic study limited to families with one child with ASD (the proband). Previous reports have described the SSC data collection process, as well as the extensive phenotypic data available [37]. Informed consent was obtained at each data collection site included in the SSC. Our group obtained phenotype and genotype data for 2,591 ASD families, from which 2,588 probands with ADI-R records were selected for analysis, These probands ranged in age from 1 to 108 years (mean = 21.46 years, SD = 13.96), and were predominately male (86.67 %). The sample size of each diagnosed status defined by ADI-R was Autism: 2,346 (90.65 %), and Autism Spectrum Disorder (ASD): 242 (9.35 %). Detailed information for all the individuals can be found in Additional file 3.

Diagnostic instruments
The Autism Diagnostic Interview-Revised (ADI-R) instrument is a standardized parent interview designed to assess the presence and severity of symptoms based on the DSM-IV criteria for ASD [24]. Items designed for interviews fall within three diagnostic categories: i) social, ii) communication and iii) restricted and repetitive behaviors. Two scores are given for most items: a 'current' score, which assesses behavior during the past 3 months, and an 'ever' score, which assesses behavior in early childhood or at its greatest severity. We used the 'current' score in each item to avoid potential retrospective bias that could result from using the 'ever' score. The full range of each item scores (0-3) was used to provide maximal information of severity. Scores of 7 ("definite abnormality in the general area of the coding, but not of the type specified"), 8 ("not applicable"), and 9 ("not known or not asked") given under certain circumstances were all converted to 0, according to the algorithm listed in ADI-R [38].

Statistical analysis PCA analysis of RRB items
Factor analysis was carried out using Principal Components Analysis (PCA) with varimax rotation on 11 RRB items from the ADI-R using R [42,43]. These items were previously included in the RRB subdomains, RSM and IS by S. L. Bishop and colleagues using exploratory factor analysis [27]. Similar to previous factor analyses of the ADI-R [30,34], we employed a cutoff of 0.30 for the inclusion of an item in a factor. Correlation analyses were conducted to examine relationships between RRB subcategories.

Assessment of familiality of RRB subcategories
To investigate potential familial relationships in the empirically derived subcategories, intraclass correlations (ICCs) between sibling pairs with any ASD diagnosis (i.e., Autism, NQA, or BS) from the multiplex families were calculated (monozygotic twins were excluded) [30]. Affected sib pairs of each multiplex family were included in ICCs calculation (N = 200 from 100 families). This analysis was only done using the AGRE dataset, since SSC was limited to families with only one ASD child.

Genome-wide association analyses
The association of SNPs with the RRB subcategories was measured using a novel Genome-wide Efficient Mixed-Model Association (GEMMA) approach developed by Zhou and Stephens [44]. Briefly, GEMMA fits univariate linear mixed models for associations with single phenotypes or multivariate linear mixed models for simultaneous associations with multiple phenotypes, while controlling for sample relatedness and potential population stratification. (For addition details see [44][45][46]). GEMMA was downloaded from http://www.xzlab.org/ software.html. Raw item scores from ADI-R score sheets of ASD probands were summed for each subcategory of RRB identified by PCA analysis. Sex and age at ADI-R standardized residuals of the summed scores of each subcategory were calculated using multivariate regression. These residuals were normalized following Tukey's formula using SPSS and then used as phenotypes in genome-wide association analysis. Autosomal chromosome association results were retained [47]. First, associations between SNPs and each subcategory were tested using the GEMMA program based on a univariate linear mixed model, while applying a correction for sample structure (population stratification and hidden relatedness) through a pairwise relatedness matrix derived from SNP genotypes. Second, the GEMMA program was used to investigate associations between SNPs and RRB sub-categories in a multivariate analyses model to estimate the robustness of the associations. The use of multivariate methods has been recommended, because multivariate analyses may increase power not only to detect genetic variants that affect only one of the multiple correlated phenotypes, but also pleiotropic genetic variants [44][45][46].
Following association analysis, statistical evidence for association was evaluated by carrying out genome-wide association analysis for 1000 permutations of phenotypes. To ease the computational burden, these analyses were performed for genotyped SNPs only. To estimate genomic inflation factors for all the distributions of P-values, lambda genomic control (lambdaGC) values were calculated as the ratio of the median of the empirically observed distributions of the test statistic to the expected median. The empirical p-value for the original association was calculated as the proportion of permutation replicates with lambdaGC values greater than the lambdaGC value of the original distribution.

Cis-eQTL analysis for the candidate loci
Expression quantitative trait loci (eQTL) analysis was conducted using gene expression data obtained using adult human prefrontal cortex obtained from the "BrainCloud" study (http://braincloud.jhmi.edu) [48]. Only individuals of European ancestry (n = 112) were included in the analysis. Individual-level mRNA expression data were downloaded from the Gene Expression Omnibus (accession number GSE30272) and genotype data were obtained from dbGaP (accession number phs000417.v1.p10.) Linear regression models were used to identify SNPs located within 500 kb of the transcript being tested (cis-eQTL) with statistically significant correlations between genotype and mRNA expression levels, with RNA integrity numbers (RIN) and age included as covariates. Genotype imputation for chromosome regions of interest was performed as described elsewhere [49].

Functional enrichment analysis
To identify plausible pathways associated with RRB, we expanded our focus beyond single variants by performing functional enrichment analysis using the webaccessible bioinformatics tool, Database for Annotation, Visualization and Integrated Discovery (DAVID) [50,51]. Since DAVID can only handle gene lists, SNPs with an association P-value smaller than 0.01 were used to compile a list of genes for further analysis (i.e., a list of all genes that contain associated SNPs or are the nearest genes to associated intergenic SNPs). Analysis was performed using the software's Functional Annotation Clustering option. The"Functional Annotation Clustering" tool identifies gene annotation terms that are enriched in the input gene list compared to the gene list from entire human genome and ranks the terms according to their enrichment P-values calculated using Fisher's exact test. Subsequently, "clusters"of related gene annotation terms that are enriched in the input gene list are assigned an "enrichment score" (ES), defined as the geometric mean of the log 10 -transformed P-values for all gene annotation terms in the cluster. Enrichment scores > 1.3 are considered to be nominally significant.

Discovery (AGRE) dataset
Factor substructure of RRB Descriptive statistics of 11 ADI-R items used in these analyses are listed in Table 1. Detailed information regarding each item is given in Additional file 4. The twofactor solution provided a satisfactory fit to 11 ADI-R items in the PCA analysis (Chi Square [df = 34] = 128.61, P-value = 6.38E-13). Using a cutoff of ≥ 0.3 for the inclusion of items in a respective factor, 9 out of 11 items loaded on the two factors ( Table 2). Four items loaded on Factor 1 and five items loaded on Factor 2. Loadings on Factor 1 (Repetitive Sensory Motor: RSM subcategory) ranged from 0.46 to 0.75. Loadings on Factor 2 (Insistence on Sameness: IS subcategory) ranged from 0.30 to 0.68.Sum scores for RSM and IS were calculated by summing the scores of items included in each factor. RSM and IS gave similar score distributions that spanned the full range of possible scores (i.e., 0-11 for RSM, 0-15 for IS). For the entire sample, the mean RSM score was 4.096 (SD = 2.82), and the mean IS score was 3.73 (SD = 2.92) (Additional file 5). Together, the two subcategories accounted for 42 % of the variance of RRB. RSM and IS subcategory scores were correlated at r = 0.204 (Pvalue < 0.001), indicating that they share 4 % of their variance (r 2 = 0.04).
ANOVA was conducted to compare mean scores for each derived subcategory with respect to status category defined by AGRE: Autism, NQA (Not Quite Autism), and Broad Spectrum [52]. Both for RSM and IS, significant differences were detected among the three status categories (F = 101.8, P-value < 0.0001; F = 28.4, P-value < 0.0001). The Tukey HSD post-hoc test indicated that both RSM and IS scores for individuals in the Autism category were significantly higher than for individuals in the NQA or BS categories, who did not differ in their scores (Fig. 1).

Familiality of RRB
To test potential familial relationships between sibling pairs in our data, we calculated interclass correlations (ICCs) between sibling pairs for RSM and IS and the original RRB domain from ADI-R. A significant family genetic effect was shown for IS (ICC = 0.405, P-value = 0.005), while the ICC for RSM was 0.042 (P-value =0.416) and the ICC for RRB was 0.359 (P-value = 0.015).

Genome-wide association analysis
After filtering for minor allele frequency (MAF) < 5 % and info score r 2 < 0.3, 6,066,362 genotyped or imputed SNPs were tested for association with standardized and normalized RSM (univariate linear mixed model), IS (univariate linear mixed model), and RSM/IS (multivariate linear mixed model) scores using GEMMA mixed model association analyses. Quantile-Quantile (QQ) plots of P-values for association with IS or RSM in the AGRE cohort GWAS are shown in Fig. 2. QQ plots of P-values for association with RSM/IS and permuted RSM/IS phenotypes are also shown, with the true distribution compared to the permuted distributions (Fig. 2c). The true lambdaGC value was 1.0355, while the permutation P-value for RSM/IS association result was 0.078 based on the distribution of lamdaGC values obtain from all permutations analyzed (Fig. 2d). These results suggest little evidence for inflation of Pvalues due to stratification or other confounding biases, but provide compelling evidence for RRB association. The Manhattan plots for associations of SNPs with RSM、IS or RSM/IS quantified using GEMMA are shown in Fig. 3. We failed to detect genome-wide significant associations (P-value < 5.0E-8) with RSM or IS in the AGRE cohort. However, rs11512467 at 11q23 showed an association signal close to genome-wide significance for RSM (5.67E-08) and ten SNPs at this locus had P-values below 4.27E-07, all located within 20 kb upstream from IL10RA (Fig. 3a), Table 3). The strongest univariate association signals for IS were detected at 8p21.2-8p21.1, including the top 4 significant SNPs/rs62503729 (P-value = 1.39E-07), rs13278976 (P-value = 1.64E-07), rs13270725 (P-value = 1.64E-07) and rs35240189 (P-value = 1.65E-07), located within 10 kb downstream of the STMN4 gene (Fig. 3b), Table 4). Several other regions showing suggestive association signals (P-value < E-06) for RSM or IS were also detected, such as 1q21-1q21.2 (within the 3'-UTR of MCL1 (Myeloid cell leukemia 1) and an intron on ENSA (endosulfine alpha)), 6p24.3 (in an intron of TXNDC5 (thioredoxin domain containing 5)) ( Table 3,  Table 4). These regions cannot be easily implicated in RRB, but may still provide new insights, if confirmed in follow-up studies with larger samples or sequencing validations.
For RSM/IS, the top-ranked SNPs were identified within 8p21.2-8p21.1, completely overlapped with the most highly associated region for IS (Fig. 3c), Table 5). As shown in regional plots of association signals (Fig. 4a b  c), association with 8p21.2-8p21.1 region SNPs is greater for RSM/IS than for IS or RSM, suggesting that the multivariate model including RSM and IS might provide greater power for detecting associations. Since no significant family genetic effect was detected for RSM in our analysis, association in the multivariate association analysis was likely driven by the IS category. It should also be noted that this region did not show any significant association with ASD diagnosis after we re-analyzed the association for this region in AGRE cohort following the earlier analysis [9] (Fig. 4d). Based on these observations, we focused subsequence analyses on the 8p21.2-8p21.1 region and hypothesized that common variants in this region may be novel candidate loci for RRB, especially IS.

Replication (SSC) dataset
Based on PCA of 11 ADI-R scores in the SSC dataset, we identified the same two RRB subcategories, RSM and IS, observed in the AGRE dataset (Additional file 6). However, no statistically significant signals were detected for association with RSM、IS or RSM/IS in GEMMA univariate or multivariate analysis of 8p21.2-8p21.1 region SNPs. The top-ranked SNP rs17057065 for RSM/IS with P-value 0.041 is located in an intron of PTK2B (Regional plot showing association P-values for 8p21.2-8p21.1 region SNPs is shown in Additional file 7). The lack of association in the SSC dataset may be caused by between-study heterogeneity. SSC probands included in present analysis are from simplex families, in contrast to 90 % of the probands in the AGRE cohort who are from multiplex families. Thus, the relatedness might increase the power for association analysis in AGRE sample. An additional source of heterogeneity comes from the age distributions of the probands in these two dataset, for which the sample means are significantly different (t = 34.3, Pvalue < 0.001).
Although the associations we reported for 8p21.2-8p21.1 were not observed in SSC dataset, previous observations provide biological plausibility for the contribution of chromosome 8p and 8p21.2-p21.1 to ASD. Chromosome 8p is known to harbor numerous genes implicated in developmental neuropsychiatric disorders, including schizophrenia and ASD [53]. In the largest schizophrenia linkage analysis to date, the 8p21.2-p21.1 region was found to be associated with schizophrenia (P-value = 4.00E-04) [54]. In a subsequent publication, the same group reported significant associations for SNPs in and around DPYSL2 and ADRA1A, 8p21.2-p21.1 region genes previously associated with schizophrenia in family-based and case-control association studies [55,56]; the strongest associated SNP (rs7817434; P-value = 3.01E-04) is located 377 kb from the top signal in our analysis (rs2322600) [57]. Recent meta-analyses of CNV and  GWA studies results suggest that there are both clinical and biological links between autism and schizophrenia [58,59], so it is highly plausible that common variants in this region contribute to both ASD and schizophrenia. Significantly, a subset of schizophrenia patients display multiple repetitive behaviors [60].

Genomic features of the 8p21.2-8p21.1 region
The SNP showing the most significant association for RSM/IS, rs2322600, has a P-value just shy of genome-wide significance (P-value = 6.48E-08). In addition, twenty-four SNPs in the same region have P-values for association smaller than 5E − 07, including the top associated SNP for IS (rs62503729) and three genotyped SNPs (rs1562331, rs2322606 and rs10097861) that are in linkage disequilibrium with the top three SNPs (Additional file 8). Together, these 25 SNPs span a 117 kb region that contains three genes: STMN4, TRIM35 and PTK2B. STMN4 (stathmin-like 4) interacts directly with microtubules, causing a switch from a straight to a curved conformation that has been proposed to promote rapid microtubule depolymerization. According to the Brain-Cloud study database (http://braincloud.jhmi.edu/), STMN4 mRNA is highly expressed in both fetal and adult dorsolateral prefrontal cortex (DLPFC) (Additional file 9) [48], consistent with a role in early neuron development [61,62]. PTK2B (Protein Tyrosine Kinase 2 Beta) encodes a major focal adhesion kinase that plays a key role in neuritogenesis and neurite elongation [63]. Because all the top SNPs are located in noncoding gene regions, we hypothesized that these SNPs are linked to genetic variants that regulate the expression STMN4 or PTK2B.

Expression quantitative trait locus (eQTL) analyses using adult CEU prefrontal cortex
With the aim of exploring the molecular basis of the observed associations with RSM/IS, we investigated whether our top SNPs or their proxies (r 2 > 0.7) associate with gene expression in the dorsolateral prefrontal cortex (Brodmann area 46) using expression and genotype data of 112 healthy adult CEU individuals. Top SNPs and multiple proxy SNPs in or near STMN4 showed nominally significant association between genotype and STMN4 mRNA expression (rs2322600, P-value = 0.023; rs35240189, P-value = 0.014; rs62503729, P-value = 0.024), with the strongest association represented by rs10097861 (a proxy of rs2322600, r 2 = 0.76) at a P-value of 0.002. None of the SNPs were associated with the expression of PTK2B mRNA. Since expression data in BrainCloud dataset were from adult human brain, it is possible that the genetic variants regulate PTK2B mRNA expression only in the developing brain.

Bioinformatic evaluation
We queried the RegulomeDB database [64] to assess whether any of the 28 SNPs that associated with the RSM/IS sub-phenotypes (P-value < 5E-07, n = 28 SNPs) are located in known or predicted regulatory elements, including regions of DNase I hypersensitivity, binding sites for transcription factors and promoter regions that regulate transcription. Two SNPs, rs2322606 and rs10097861, received RegulomeDB likelihood scores of 1f (i.e., mapping to a predicted TF binding site and/or within a DNase I sensitivity peak and correlating with gene expression). Both of these variants, which are located within an intron of PTK2B, associated with

Functional enrichment pathway analysis
In total, 252 genes linked to SNPs with nominal associations (P < 0.01) with RSM/IS in the multivariate analysis of the AGRE cohort were selected for enrichment analysis using DAVID [see Additional file 10]. Using the DAVID Functional Annotation Clustering tool, we identified 7 annotation term clusters with enrichment scores > 1.3 (equivalent to a nominal P-value ≤ 0.05), including two clusters containing pathways crucial for brain development and function [see Additional file 11]. Cluster 1(enrichment score: 1.93) contained several pathways previously implicated in the pathogenesis of ASD, including neuron development (GO:0048666, P-value = 0.002797) [65], neuron projection development (GO:0031175, Pvalue = 0.003104) [66] and axon guidance (GO:0007411, P-value =0.016). Cluster 3 (enrichment score: 1.75) contained several cell-adhesion pathways, which have also As a negative control, we also carried out DAVIDbased enrichment analyses using candidate gene lists derived from association analyses of 10 sets of permuted RSM/IS phenotypes (specifically, the phenotypepermuted replicates with the ten highest lambdaGC values). Among 53 annotation term clusters with enrichment scores greater than 1.3 that were obtained for the ten gene sets, only one was related to the brain/ neuron development (Additional file 12). These results suggest that the gene list derived from the original, non-permutated association is enriched in brain and plausible ASD-related pathways. Because no enriched annotation term cluster identified in present study survived correction for multiple testing, however, the possibility of false-positive enrichments cannot be excluded.

Discussion
In this work, analysis of ADR-R RRB item scores in the AGRE and SSC datasets confirmed the existence of two previously identified subcategories in the RRB domain, RSM and IS. GWAS of RSM and IS subcategory scores based on univariate and multivariate mixed models identified common variants within 8p21.2-8p21 as possible susceptibility locus for RRB in the AGRE dataset, but not in the SSC dataset.
Univariate association analysis identified different association patterns for IS and RSM, including signals at 1q21-1q21.2、6p24.3、11q23 and 8p21.2-8p21.1 that have not previously been reported for association with RRB. Several regions in 15q have previously been linked to RSM or IS in linkage analysis, including 15q21.2-q22.2, 15q13.1-q14 [25] and 15q11-q13 [31]. Our analysis failed to detect suggestive associations for RSM or Common variants within chromosome 8p21.2-8p21.1, a locus previously linked to schizophrenia, approached genome-wide significance for RSM/IS and were also the top signals for IS in univariate association analysis. Since many genetic variants linked to ASD have a high degree of pleiotropy (i.e., where one gene affects more than one phenotype), it is reasonable that some genetic variants contribute to both RSM and IS and were detected with higher association magnitude using multivariate association model [68]. Although the associations we reported for 8p21.2-8p21.1 were not observed in the SSC dataset, we should mention that a partial trisomy of 8p (21)(22)(23) has been identified in a 6-year-old female with autism [69]. This region is also included in a large (6.14 Mbp) chromosome duplication identified in a patient with autism and self-mutilation [70]. This patient presented with abnormal behaviors, including ritualistic behaviors, self-injury and temper tantrums, consistent with the hypothesis that this chromosomal region contains a dosagesensitive gene that contributes to RRB phenotypes.
Our top SNPs and multiple proxy SNPs in or near STMN4 were identified as eQTLs for STMN4 in human prefrontal cortex, and it is plausible that STMN4 influences RRB domain phenotypes by modulating neuron development and dendritic microtubule dynamics [71].
Furthermore, based on mRNA expression data from two public databases [64,72], our top SNP, rs2322600, and several proxy SNPs correlate with expression of BNIPL3, a gene located almost 890 kilo base pairs upstream (rs2322600: P-value = 9.17E-06). Since long-range regulation of mRNA expression by genetic elements located as far away as 1 Mbp, has been previously described [73,74], in principle, this gene may also be considered a candidate for RRB. Although BNIPL3 has not previously been reported to be associated with ASD, it encodes a and ASD (d). *Each filled circle represents the P-value for one SNP, with the top SNP, represented by a purple diamond and additional associated SNPs represented by colors showing their degree of linkage disequilibrium (r 2 ) with the top SNP (as estimated internally by the Locus Zoom program based on data from CEU (Utah residents of Northern and Western European ancestry HapMap haplotypes) population. Genes within the region are shown in the lower panel, and the unbroken blue line indicates the recombination rate within the region. *Association with common variants in this region and ASD diagnosis were analyzed according to previously reported association analyses with the AGRE pedigrees [9] using Pedigree Disequilibrium Test (PDT) [83] mitochondrial outer membrane protein that is required for mitochondrial clearance and has been proposed to play a role in hypoxia-induced autophagy [75]. Recent research [67] has shown that children and adolescents with autism have high dendritic spine density in the brain and this excess is due to a defect in dendritic spine "pruning," a process essential for normal brain development [76]. The same study also showed that the abnormal spine pruning is caused by a defect in autophagy in neurons [67]. Mitochondria localize in both pre-and postsynaptic department (axon terminals and dendritic spines), and mitophagy is crucial for brain development and dendritic spine pruning [77].
PTK2B has been widely studied since it was identified as a novel Alzheimer's disease (AD) candidate gene in a large meta-analysis of AD GWAS [78]. PTK2B kinase, a major focal adhesion kinase, regulates the integrity of focal adhesions, which are major compartments for integrating signals for cell growth, apoptosis, and neuron migration, cellular functions essential for normal brain development [79]. Since several neuronal cell-adhesion genes have been identified in rare ASD cases [80,81] and a GWA study has shown that neuronal cell-adhesion molecules may be collectively associated with ASDs [9], it is plausible that PTK2B contributes to ASD through its roles in regulating integrity of focal adhesions.
Because the terms in each enriched cluster identified by the pathway analysis did not survive multiple testing correction (P-value > 0.05 after controlling for the false discovery rate (FDR) using the Benjamini-Hochberg method), we could not identify specific biological pathways that contribute to the development of RRB in ASD. However, DAVID analysis included PTK2B among the top three enriched terms in Cluster1(Additional file 11), based on its function in focal adhesion formation and regulation of adherens junction dynamics by phosphorylation switches [82], providing evidence that PTK2B is a plausible candidate gene for RRB.

Conclusions
In this study, univariate and multivariate genome-wide association studies of RRB subcategories using data from an AGRE ASD cohort enabled the detection of associated SNPs at 8p21.2-8p21.1. This region contained 25 genotyped or imputed SNPs with P-values for association with RSM/IS < 5E-07, with the top SNP (P-value = 6.48E-08) just missing genome-wide significance. Notably, 8p21.2-8p21 has previously been linked to schizophrenia and our top SNPs are located in close proximity and/or correlate with the expression of several genes with plausible connections to ASD and RRB. Because association signals in this chromosome region were not detected in the SSC ASD dataset, however, more work will be required to validate the possible contributions of common variants in 8p21.2-8p21.1 to RRB or ASD.