Characterization of siRNAs clusters in Arabidopsis thaliana galls induced by the root-knot nematode Meloidogyne incognita

Background Root-knot nematodes (RKN), genus Meloidogyne, are plant parasitic worms that have the ability to transform root vascular cylinder cells into hypertrophied, multinucleate and metabolically over-active feeding cells. Redifferentiation into feeding cells is the result of a massive transcriptional reprogramming of root cells targeted by RKN. Since RKN are able to induce similar feeding cells in roots of thousands of plant species, these worms are thought to manipulate essential and conserved plant molecular pathways. Results Small non-coding RNAs of uninfected roots and infected root galls induced by M. incognita from Arabidopsis thaliana were sequenced by high throughput sequencing. SiRNA populations were analysed by using the Shortstack algorithm. We identified siRNA clusters that are differentially expressed in infected roots and evidenced an over-representation of the 23–24 nt siRNAs in infected tissue. This size corresponds to heterochromatic siRNAs (hc-siRNAs) which are known to regulate expression of transposons and genes at the transcriptional level, mainly by inducing DNA methylation. Conclusions Correlation of siRNA clusters expression profile with transcriptomic data identified several protein coding genes that are candidates to be regulated by siRNAs at the transcriptional level by RNA directed DNA methylation (RdDM) pathway either directly or indirectly via silencing of neighbouring transposable elements. Electronic supplementary material The online version of this article (10.1186/s12864-018-5296-3) contains supplementary material, which is available to authorized users.


Background
Small RNAs (sRNAs) are 20-to 24-nucleotide (nt) non-coding RNAs that regulate gene expression at the transcriptional and post-transcriptional levels in eukaryotes [1]. There are two principal classes of small RNAs in plants, classified according to their biogenesis: micro-RNAs (miRNAs) and small interfering RNAs (siRNAs) [2,3]. MicroRNAs are 21-22 nt long and are produced from a single-strand RNA precursor folded into a hairpin. Plant miRNAs are well characterised. They induce post-transcriptional gene silencing (PTGS) principally by triggering messenger RNA (mRNA) degradation, but they can also induce translational repression [4]. By contrast, siRNAs are 21-24 nt long and are produced from double stranded RNA (dsRNA) precursors resulting from (i) the hybridisation of two complementary RNA strands or (ii) de novo synthesis from a single-stranded RNA as a new complementary strand by RNA-dependent RNA polymerases (RDRs) [5,6]. siRNAs mediate gene repression at the transcriptional or post-transcriptional level and form several classes differing in terms of predominant size [3].
Transcriptional gene silencing (TGS) takes place in two phases with different specific actors. The pre-establishment phase involves a DNA-dependent RNA polymerase (Pol) II producing aberrant transcripts, RDR6, which produces dsRNAs that are processed by Dicer-like protein 2 (DCL2) and DCL4 to produce 21-22 nt siRNAs that induce DNA methylation by DOMAINS REARRANGED METHYLTRANSFERASE (DRM) 2 and, probably, DRM1, through Argonaute 6 -RNA Induced Silencing Complex (AGO6-RISC) and PolV transcripts [7,8]. Once RdDM has been established, a stabilisation phase occurs. This phase involves single-strand RNA transcripts produced by PolIV from intergenic or repetitive regions of the genome, which are rendered double-stranded by RDR2 and processed by DCL3 to produce 24 nt hc-siRNAs. Hc-siRNAs are loaded onto AGO4-RISC to initiate RdDM through the DRM2 and, probably, DRM1 proteins at hc-siRNA-generating loci, using other transcripts produced by the DNA-dependent Pol V [8]. One of the main functions of RdDM is maintaining genome integrity, by ensuring that suppressive levels and types of DNA methylation are maintained at transposable elements. RdDM may also modulate the expression of neighbouring protein-coding genes through the spread of DNA methylation [9][10][11][12].
siRNAs are also involved in PTGS. They are loaded onto AGO1 or AGO2-RISC, where they generally either induce the cleavage of target transcripts or prevent their translation [3]. The siRNAs involved in PTGS can be classified into several subfamilies according to the origin of the precursor. Natural antisense siRNAs (nat-siRNAs) are generated by the processing of dsRNA precursors derived from endogenous RNAs with complementary sequences, through the action of DCL4 or DCL2, to generate sRNAs of 21 and 22 nt in length, respectively. Phased, secondary, small interfering RNAs (phasiRNAs) are mostly 21 nt siR-NAs derived from a RNA converted to dsRNA by RDR6 and processed by DCL4. One well characterised family of phasiRNAs are Arabidopsis trans-acting siRNAs (tasiR-NAs) [13,14]. The production of double-stranded pha-siRNA precursors is stimulated by one or more upstream miRNAs, such as TAS3-derived tasiRNAs in A. thaliana, which are generated from non-coding TAS3 transcripts by miR390 triggers [15,16].
In plants, siRNAs have been shown to regulate gene expression in various biological processes, including growth, development [15], cell differentiation [17], and plant responses to abiotic and biotic stresses [18][19][20][21]. Various PTGS-inducing siRNAs (21-22 nt) have been shown to be related to plant immunity. Examples include nat-siRNAATGB2 from A. thaliana, which is induced by Pseudomonas syringae pv. tomato and plays a positive role in disease resistance by repressing the pentatricopeptide repeats protein-like gene [22]. The nucleotide-binding leucine-rich repeat (NB-LRR) gene family is widely targeted by secondary siRNAs, and pha-siRNAs derived from NBS-LRRs play a key role in regulating plant immunity [14,23]. For example, in Arabidopsis, miR472 and its RDR6-mediated gene silencing help to modulate both PAMP-triggered (PTI) and effector-triggered (ETI) immunity [24]. TGS-mediating hc-siRNAs have also emerged as major regulators of plant immunity directing DNA methylation and/or histone modification. A role for hc-siRNA-mediated TGS in plant immunity has also come to light, as fungal elicitors induce alterations in the accumulation of certain hc-siRNAs [12]. Moreover, the RdDM machinery has been shown to be involved in plant responses to several pathogens, including Botrytis cinerea, P. syringae, and Agrobacterium tumefaciens [25][26][27].
Root-knot nematodes (RKN), Meloidogyne spp., are highly polyphagous sedentary plant parasites capable of infesting most crop species [28,29]. After penetrating host roots, RKN larvae migrate toward the vascular cylinder and reprogram gene expression in several vascular root cells, to induce their development into hypertrophied multinucleate giant feeding cells (GCs) [30]. These GCs are metabolically overactive, and serve as the sole source of the nutrients required for RKN development. The growth of the GCs and divisions of the surrounding cells lead to a root deformation known as a knot or gall. The redifferentiation of vascular cells into GCs results from the extensive reprogramming of gene expression in root cells, in response to RKN signals [31]. The expression of genes encoding proteins involved in metabolism, the cytoskeleton, cell cycle, cell rescue, defence, hormones, cellular communication and cellular transport are modified in galls and giant cells from various plant species [30,[32][33][34][35]. We are beginning to decipher the genetic pathways modified in infected roots for the formation of GCs, but little is known about the factors regulating this reprogramming of gene expression.
To date, the role of siRNAs in plant-nematode interactions has been little investigated. Two studies provided a first general overview of the small RNA populations expressed in the early feeding sites induced in A. thaliana roots by the RKN M. javanica [36] and the beet cyst nematode (CN) Heterodera schachtii [37]. The first detailed analysis of plant siRNAs expressed in response to H. schachtii identified 125 putative A. thaliana siRNAs expressed in root feeding sites named syncytia. The methylome of A. thaliana syncytia and the associated population of 24 nt siRNAs were recently studied and their abundance was found to be associated with the hypermethylation of transposable elements (TEs) and gene promoters [38]. Moreover, an analysis of the length distribution of Arabidopsis sRNAs in the early developing galls induced by M. javanica showed markedly larger numbers of 24 nt reads than in uninfected roots and an overexpression of miR390 and its secondary TAS3-siRNA in galls [36]. Resistance to RKN of Arabidopsis mutant lines for TAS3a confirmed the role of these small RNAs in the plant-RKN interaction via the control of Auxin Responsive Factor 3 expression. Overall, these analyses suggest that plant parasitic nematodes use both microRNAs and the siRNA pathway to manipulate host gene expression at the transcriptional and/or post-transcriptional levels.
We developed a sequencing strategy to characterise the siRNA populations expressed in galls at two key points in giant cell/gall development: 7 days post infection (dpi), corresponding to the phase of successive nuclear divisions without cytokinesis; and 14 dpi, corresponding to the phase of isotropic growth and an increase in DNA levels through endoreduplication without nuclear divisions [39]. In this study, we focused on genomic regions named clusters corresponding to accumulations of siRNAs differentially expressed (DE) between galls and uninfected roots. This analysis provides insight into the loci targeted by siRNAs during the plant-nematode interaction (coding genes, promoting regions, transposable elements). We then identified differentially expressed genes corresponding to differentially expressed clusters with inversely correlated expression profiles, providing biological support for the regulation of these genes by siRNA pathways.

Identification of predicted siRNA clusters in galls and uninfected roots
We sequenced small RNAs expressed in uninfected root inter-nodes and galls 7 and 14 dpi induced by the RKN M. incognita. Twelve small RNA libraries corresponding to three independent replicates of galls (G) at 7 and 14 dpi and the corresponding uninfected roots (R) were sequenced. The pipeline used for analysis of small RNAs is presented in Fig. 1. Roots galls are composed of nematode and plant tissue. The vast majority of the reads aligned with A. thaliana genome (90-95%) with only a small proportion (5-10%) of the reads aligned with M. incognita genome [40].
We predicted siRNA clusters, by annotation with ShortStack package [41] for each of the 12 libraries. Most of the sequenced reads mapped on A. thaliana genome are located within clusters (Fig. 2a). siRNAs result of the processing by Dicer of a long dsRNA precursor [42]. A DicerCall (DC) score is attributed by ShortStack to each cluster for which the majority of reads are between 20 and 24 nt in size, a characteristic typical of Dicer processing. A DC cutoff of 0.8 (meaning that at least 80% of the reads are between 20 and 24 nt long) was used to distinguish between non-DCL-derived and DCL-derived loci and to exclude all small RNAs corresponding to the degradation products of long RNAs (mRNAs, rRNAs, tRNAs) [43]. Most of the sequenced reads that are located within cluster on A. thaliana genome belong to DC-clusters (Fig. 2a, Additional file 1: Figure S1). Moreover, most of the clusters predicted were found to be DC-clusters: about 80% of the clusters in galls at 7 and 14 dpi, 72% in uninfected roots at 7 dpi and only 58% in uninfected roots at 14 dpi (Fig. 2a, Additional file 1: Figure S1). The lower proportion of DC-clusters in uninfected roots at 14 dpi principally reflects the effect of the R3-14 dpi library, which contained only 44% DC clusters (Additional file 2: Table S1). Only the R3-14 dpi library had a smaller number of reads mapping to DC-clusters (678,080) than the other libraries suggesting that the R3-14 dpi library may contain more degradation products than the other libraries. siRNA clusters produce a mixture of RNAs of different sizes. The DC score (21,22,23 or 24 nt) therefore corresponds to the most frequent size of reads mapped to the cluster and does not always reflect the diversity of reads within the cluster (Additional file 1: Figure S1). In all libraries, the majority of DC clusters (between 94 and 97%) had scores of 23 or 24 nt (Fig. 2). Only the DC clusters were retained for further analysis. As previously described [43], we assigned the predicted siRNA DC-clusters into two categories: 20-21-22 nt clusters corresponding to siRNAs and 23-24 nt clusters corresponding to hc-siRNAs.
About twice as many DC clusters were identified in galls as in uninfected roots, with a mean of 67,630 clusters at 7 dpi and 72,556 clusters at 14 dpi for galls, and a mean of 27,151 clusters at 7 dpi and 17,378 clusters at 14 dpi for uninfected roots (Fig. 2a). The mean size of uninfected root clusters was 247 nt at 7 dpi and 202 nt at 14 dpi, whereas the corresponding sizes in galls were 355 nt at 7 dpi and 340 nt at 14 dpi ( Fig. 2a; Additional file 2: Table S1). The larger size of clusters in galls indicates that the over-representation of clusters in galls is not due to a prediction bias for size fragmentation. In addition to the much larger number of clusters in galls than in uninfected roots, the coverage of clusters was also clearly different at 7 dpi, with mean coverage rates of 51 and 96 reads per cluster in galls and roots, respectively. At 14 dpi, no strong difference was observed, with a mean coverage of 52 and 60 reads per cluster in galls and roots, respectively.

Localization of siRNA clusters that are differentially expressed in galls
As siRNA clusters were predicted independently for each sample, the three libraries for each condition (gall or root) did not yield the same lists of predicted siRNA clusters (Additional file 2: Table S1). For the statistical analysis of our data, we constructed a reference set of predicted siRNA clusters with a strategy similar to that used for Physcomitrella patens [43]. For each time point, predicted A. thaliana siRNA clusters common to at least two of the three libraries in at least one condition (gall or root) were pooled in a reference set of predicted siRNA clusters (Fig. 1). This reference set comprised 86,264 and 92,578 siRNA clusters from gall and/or root libraries at 7 and 14 dpi, respectively. We compared the expression levels of these predicted siRNA clusters between galls and roots at each time point, by DESeq statistical analysis [44] with an adjusted P-value (p adj ) < 0.05 considered significant. Only differentially expressed (DE) clusters supported by a DC score and displaying significant levels of expression (at least 2 reads per million mapped reads (RPMM)) in the three libraries, for at least one condition (gall or root) were considered to be robust predicted siRNA clusters and were retained for further analysis (Additional file 3: Table S2). An analysis of the genomic positions of predicted differentially expressed siRNA clusters showed that, at 7 dpi, 2871 Fig. 1 Pipeline of siRNA analysis from prediction to statistical analysis. Data obtained from the SOLiD sequencing of the 12 libraries were cleaned of adaptors and special sequences (snRNA, snoRNA, mitRNA, tRNA, miRNA and pre-miRNA). The Shortstack algorithm mapped and identified clusters corresponding to genomic regions accumulating siRNAs. The algorithm was first run for each library independently. Bedtools was used to identify clusters that were present in at least two out of three libraries in at least one condition (galls or roots). If these clusters were separated by a distance of less than 2 nucleotides, they were then merged (default parameter) and selected to build a reference set of clusters to perform counting and statistical analysis. From counting data, a DESeq statistical analysis was performed to identify clusters that were differentially expressed between gall and root conditions. Only clusters with a DicerCall (DC-clusters) and a minimum coverage of 2 rpmm in all replicates in at least one condition (galls or roots) were considered as biologically relevant and 3157 clusters in galls were located within the body of the gene and in the promoter region, respectively ( Fig. 3; Additional file 4: Table S3). We found that 2029 and 2295 of the clusters differentially expressed in galls at 14 dpi were located within gene sequence and putative promoter regions, respectively ( Fig. 3; Additional file 4: Table S3). The numbers of clusters in putative promoter regions and within the body of genes were therefore similar. Most of these clusters were covered by reads in both gall and root samples. Overall, 275 and 809 clusters upstream from genes at 7 and 14 dpi, respectively, and 328 and 653 clusters within genes at 7 and 14 dpi, respectively, were covered exclusively by reads in a single condition, either galls or roots (Additional file 5: Table S4). Most of the predicted DE clusters were found to be upregulated in galls ( Fig. 3; Additional file 4: Table S3). At 7 dpi, the clusters upregulated in galls accounted for 72.0% of the clusters located in promoter regions and 89.7% of those located within the body of the gene. At 14 dpi, the clusters upregulated in galls accounted for 99.3% of the clusters located in promoter regions and 99.7% of those located within the body of the gene. Only two clusters were among the 20 differentially expressed clusters with the highest fold change in expression, both at 7 and 14 dpi: one of these clusters is located within a gene encoding a lecithin cholesterol acyltransferase (AT3G44830), and the other is located upstream from a gene encoding a homeodomain-like protein (AT2G13960).
The number of clusters was always larger in galls than in uninfected roots (Additional file 6: Table S5) and 23-24 nt was the main DC category of DE clusters regardless of the type of sample (galls or uninfected roots) or the genomic location (in promoter or in gene body) in comparison to the 20-22 nt DC category (Additional file 6: Table S5). However, these two DC categories clearly differed between galls and roots: the number of 23-24 nt clusters in galls was always larger  (6) proportion of DC-clusters; (7) DC-cluster mean size (nt) and (8) mean abundance of reads by DC-cluster. The data for each library are presented in Additional file 2: Table S1. b Stacked bar charts of the number of the various categories of DC-clusters (20, 21, 22, 23, 24 nt) for each condition. The proportions of each category are also presented than in uninfected roots, but the opposite pattern was observed for the 20-22 nt category, for which the number of clusters in uninfected roots was always greater than in galls ( Fig. 4 and Additional file 6: Table S5). The over-representation differentially expressed clusters in galls appeared therefore to be specific to the 23-24 nt category. This size corresponds to the hc-siRNAs that are known to repress gene expression by targeting transposable elements located in their promoter region and by inducing RdDM. We therefore focused further analyses on the differentially expressed clusters located in putative promoter regions by i) strengthening cluster bioinformatic predictions with transcriptomic data and ii) investigating the presence of transposable elements or repeats derived from transposable elements within these biologically relevant clusters.

Analysis of differentially expressed clusters of hc-siRNAs and their putative targeted genes in galls
We restricted our analysis to biologically relevant genes, by focusing on predicted DE siRNA clusters located in the putative promoters of genes differentially expressed in galls with inversely correlated expression profiles. The expression of genes located within 2 kb downstream from the siRNA clusters was investigated within DNA chips data listed in NEMATIC [45]. Only a small fraction of the predicted siRNAs targeted genes differentially expressed in galls (Additional file 3: Table S2). At 7 dpi, 118 genes covered by 129 differentially expressed siRNA clusters and located within the putative promoter region were differentially expressed in galls and the expression profiles of 59 of these genes were inversely correlated with those of the corresponding siRNA clusters, with 37 genes repressed and 22 upregulated in galls (Table 1; Additional file 7: Table S6). Of these 59 DE genes, 14 genes (23.7%) were specifically differentially expressed at 7 dpi and 31 genes (52.5%) were differentially expressed at 7, 14 and 21 dpi (Table 1). Twenty-nine of these 31 genes had similar patterns of expression at these three time points: 20 were downregulated and nine were upregulated in galls, throughout gall development. At 14 dpi, 116 genes covered by 129 DE siRNA clusters in their promoter regions were differentially expressed in galls, and the expression profiles of 69 of these genes were inversely correlated with those of the siRNA clusters; all of these genes were repressed in galls ( Table 2; Additional file 8: Table S7). Fifteen (21.7%) of the 69 DE genes with expression patterns inversely correlated with that of the DE clusters in their putative promoter regions were differentially expressed specifically at 14 dpi, 16 genes (23.2%) were differentially expressed at 7, 14 and 21 dpi. 34 genes (49.3%) were differentially expressed at 14 and 21 dpi, but not at 7 dpi (Additional file 8: Table S7).  Table with the total number of clusters differentially expressed in galls in comparison to uninfected roots at 7 and 14 dpi according to their genomic location. The number of upregulated clusters is indicated A comparison of the results presented in Additional file 7: Table S6 and Additional file 8: Table S7 identified 13 genes as differentially expressed with expression patterns inversely correlated with those of the DE siRNA-clusters targeting their putative promoters at 7 and 14 dpi (Additional file 9: Table  S8). These genes included several with molecular functions relating to catalytic activity (2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase, choline kinase 3, Phosphoglycerate mutase, beta-glucosidase and NADP-dependent malic enzyme 2), or encoding DNA-binding proteins (protein-coding NAC domaincontaining protein 58).

Colocalisation of hc-siRNAs clusters and repeats
Genes targeted by hc-siRNA and RdDM have TEs, or repeats derived from TEs, within their promoter sequences. These sequences are targeted by hc-siRNA, to induce TGS of this region through the induction of RdDM. We investigated the presence of transposon-derived repeats within or in the vicinity of the differentially expressed siRNA clusters. We retrieved the 2 kb upstream from the 5'UTR of DE genes with siRNA clusters in their promoter region and inversely correlated pattern of expression for the gene and the siRNA cluster and investigated the presence of repeats within or close to the cluster by comparing these sequences to Repbase sequences with CENSOR  These DC-clusters were all shared by galls and roots and located in promoter regions and with expression patterns inversely correlated with those of the downstream differentially expressed genes. The upregulation (up) or downregulation (down) of the cluster in galls (G) compared to uninfected roots (R) at 7 dpi (variation G/R 7 dpi), the AGI gene name, the description of the encoded protein from TAIR and the log2 values of galls/roots at 7, 14 and 21 dpi obtained from microarrays [32,45] were indicated These DC-clusters were all shared by galls and roots and located in promoter regions and with expression patterns inversely correlated with those of the downstream differentially expressed genes. The upregulation (up) of the cluster in galls (G) compared to uninfected roots (R) at 14 dpi (variation G/R 14 dpi), the AGI gene name, the description of the encoded protein from TAIR and the log2 values of galls/roots at 7, 14 and 21 dpi obtained from microarrays [32,45] were indicated [46]. At 7 dpi, 21 of the 64 clusters differentially expressed in galls, located in promoter regions and with expression patterns inversely correlated with those of the associated DE genes had sequences displaying identity to the A. thaliana TE (Fig. 5; Table 3). At 14 dpi, 13 of the 76 clusters differentially expressed in galls, located in promoter regions and displaying expression patterns inversely correlated with those of the associated DE genes displayed had sequences displaying identity to the A. thaliana TE (Fig. 5; Table 4). The other DC-clusters displayed no direct sequence identity but were located in the vicinity of repeats. Eight genes differentially expressed at 7 dpi and 17 genes differentially expressed at 14 dpi had siRNA clusters and sequences displaying identity to the sequences of transposable elements in the promoter region, although the transposable elements and siRNA clusters did not overlap ( Fig. 5; Table 5; Table 6). These genes correspond to 9 DE clusters at 7 dpi and 20 at 14 dpi that have sequence identity with TE in their vicinity. All together, 46% of the clusters differentially expressed in galls at 7 dpi and 39% of the clusters differentially expressed in galls at 14 dpi display sequence identity with A. thaliana TE or are located in vicinity of sequence homologous to TE.
Differentially expressed genes with siRNA clusters colocalised with repeats in their promoter regions overlap with genes differentially expressed in the syncytium induced by cyst nematode with methylationassociated TE profiles Cyst nematode induce the formation of hypermetabolic multinucleate feeding site, named syncytium, that results from the induction of an initial syncytial cell within the root parenchyma that then integrates several hundred of the surrounding cells through cell wall dissolution [47,48]. Therefore the feeding sites induced by CN and RKN differ by their biogenesis but have similar phenotype and share some molecular pathways [49]. We compared the list of DE genes/DE siRNA clusters in galls displaying inversely correlated patterns of expression and with repeats in the gene promoter to the list of the 526 differentially methylated TE-associated genes corresponding to genes differentially expressed in syncytia. We only identified seven genes as differentially methylated TE-associated genes that were differentially expressed in syncytia [38] and targeted by differentially expressed siRNA clusters in GCs at 7 or 14 dpi (Table 7). These genes were differentially expressed in galls and syncytia, with similar expression patterns, had repeats/TEs in

Discussion
The classification of siRNAs is complex, and their study requires a new analysis at genome level for each biological condition analyzed. For this genome-wide analysis, we used the ShortStack bioinformatics tool [41], which i) was developed for plant genomes, ii) predicts de novo areas of the genome in which small RNAs accumulate, named "clusters" and (iii) carries out statistical analyses of the read counts corresponding to these clusters, comparing the gall to root conditions. This algorithm therefore performs clustering analyses rather siRNA DC-clusters differentially expressed in galls compared to uninfected roots at 7 dpi located in promoter regions and with expression patterns inversely correlated with those of the associated differentially expressed genes and displaying sequence identity to A. thaliana transposable elements (TE). The upregulation (up) or down regulation (down) of the cluster in galls (G) compared to uninfected roots (R) at 7 dpi (variation G/R 7 dpi), the AGI gene name, the description of the encoded protein from TAIR and the log2 values of galls/roots at 7 dpi obtained from microarrays [32,45] were indicated than comparing the expression levels of single short sequences. In addition to predicting clusters generating siRNAs, ShortStack also provides information about the small-RNA population of each cluster, including the probability that this cluster results from maturation by a Dicer protein (DicerCall), size distribution, position on the strand, and the most highly represented sequence at the locus. We identified a large number of siRNA clusters in galls and uninfected roots, with the number of clusters predicted in uninfected root libraries much siRNA DC-clusters differentially expressed in galls compared to uninfected roots at 14 dpi located in promoter regions and with expression patterns inversely correlated with those of the associated differentially expressed genes and with sequence identity to A. thaliana TE. The upregulation (up) of the cluster in galls (G) compared to uninfected roots (R) at 7 dpi (variation G/R 7 dpi), the AGI gene name, the description of the encoded protein from TAIR and the log2 values of galls/roots at 7 and 14 dpi obtained from microarrays [32,45] were indicated siRNA DC-clusters differentially expressed in galls at 7 dpi located in promoter regions, with expression patterns inversely correlated with those of the associated differentially expressed genes and located in the vicinity of A. thaliana TE. The AGI gene name, the description of the encoded protein from TAIR, the TE similarity and TE name were indicated lower than that in galls. We investigated whether this larger number of clusters in the galls was due to the simple fragmentation of clusters into smaller clusters. The larger size of the clusters in galls was not consistent with such a prediction bias. The imbalance for 23-24 nt sequences seems to be characteristic of galls and has already been reported for galls collected at 3 dpi [36]. DESeq statistical analysis identified clusters differentially regulated between galls and uninfected roots. We increased the robustness of our data and the stringency of our analysis by selecting only differentially expressed clusters for which prediction was based on i) reproducible results for the various libraries, ii) a significant level of expression and iii) a high probability of resulting from Dicer cleavage. The hc-siRNAs associated with TGS are mostly targeted to promoter regions, although some examples of hc-siRNAs targeting the body of the gene have been reported [7,50]. For the study of hc-siRNAs, we therefore chose to focus on siRNA clusters targeting putative promoter regions. For the selection of clusters with regulatory action supported by biological evidence, we restricted our study to differentially expressed clusters with a DicerCall of 23-24 nt, covering promoter regions or genes differentially expressed in galls in DNA chip analysis and displaying a pattern of expression inversely correlated with that of the corresponding clusters. However, this strategy was highly restrictive, because only 2.7% of the genes with a cluster in their promoter region displayed transcriptomic regulation in NEMATIC DNA chip data. Similarly, only 5.6% of the DE siRNA clusters target a promoter region of a gene that is differentially expressed in galls at 7 and/or 14 dpi. The NEMATIC data were extracted from transcriptomic analyses performed with microarrays [45]. Microarrays are less sensitive and less exhaustive than new sequencing technologies [51]. New high-throughput sequencing-based analyses of gall transcriptomes are, therefore, required, to obtain a more complete view of the transcriptional reprogramming occurring in galls. The proportions of List siRNA DC-clusters differentially expressed in galls at 14 dpi located in promoter regions, with expression patterns inversely correlated with those of the associated differentially expressed genes and located in the vicinity of A. thaliana TE. The AGI gene name, the description of the encoded protein from TAIR, the TE similarity and TE name were indicated upregulated and downregulated siRNA clusters with inversely correlated expression differed between 7 and 14 dpi. At 7 dpi, 58 clusters (65.5%) were upregulated in galls, whereas, at 14 dpi, 100% of DE clusters were upregulated, suggesting a progressive increase in the number of clusters upregulated in galls. An analysis of the expression profiles of DE genes targeted by siRNA clusters in their putative promoters with an inversely correlated expression profile throughout gall development studied in [32] at 7, 14 and 21 dpi showed that most of the genes targeted by siRNA clusters at 7 dpi were differentially expressed at 7, 14 and 21 dpi, whereas most of the genes targeted by siRNA clusters at 14 dpi are differentially expressed at 14 and 21 dpi, but not at 7 dpi. However, these differences do not suggest any hypotheses concerning the action of siRNA, because these proportions are consistent with the general transcriptomic data obtained over the entire period of gall development [32]. Finally, TEs or sequences displaying identity to repeats were found in the putative promoter region of 40% (at 7 dpi) and 50.7% (at 14 dpi) of the DE genes displaying an inverse correlation of expression with the siRNA clusters in the promoter. These genes are good candidates for regulation by hc-siRNA during gall formation. hc-siRNAs control transposons and gene expression by inducing DNA methylation. We compared the list of hc-siRNA clusters differentially expressed in galls and supported by biological expression data with the genomic regions that were identified as differentially methylated in feeding sites in A. thaliana roots induced by the cyst nematode H. schachtii [38]. The role of siRNAs in plant-CN nematode interactions was first highlighted by a significant lower rate of infection with H. schachtii in the A. thaliana mutants dcl2-1 [37]. We identified five genes at 7 dpi and two genes at 14 dpi with i) promoter regions displaying some sequence identity to transposable elements ii) differential expression in galls and an inverse correlation of expression with DE siRNA clusters located in the promoter region of the genes concerned and iii) promoter regions differentially methylated in CN feeding sites. These genes are, therefore, robust candidates for regulation by hc-siRNA, through RdDM, during the development of feeding sites induced by plant-parasitic nematodes. Recently, two Arabidopsis mutants rdr2-rdr6 and dcl2/dcl3/dcl4 deprived of key factors for RdDM were shown to have lower rates of infection with RKN highlighting the importance of siRNAs and RdDM in Arabidopsis-Meloidogyne interaction [52].

Conclusions
In this work we provide the first analysis of siRNA clusters expressed in root galls induced by the RKN M. incognita. We identified several siRNA clusters that are candidates to modify expression of plant genes in response to RKN infection and these regulations are supported by transcriptomic expression data. These candidates should now be investigated in functional analyses, to confirm i) their regulation by RdDM and ii) their role in plant-nematode interactions. siRNAs will constitute a new field of investigation in studies of the molecular mechanisms underlying plant responses to parasitic nematodes. siRNA DC-clusters differentially expressed (DE) in galls at 14dpi located in the vicinity of Arabidopsis transposable elements. List of siRNA DC-clusters differentially expressed in galls at 7 dpi and/or 14 dpi located in promoter region, with expression patterns inversely correlated with those of the associated differentially expressed genes and that colocalise with differentially methylated regions in cyst nematode feeding sites. The AGI gene name, the description of the encoded protein from TAIR and the log2 values of galls/roots (G/R) at 7 and 14 dpi obtained from microarrays [32,45] and the log2 values of syncytia/roots (Sync/R) at 5 and 15 dpi obtained from [38] were indicated

Biological material, growth conditions and nematode inoculation
Seeds of A. thaliana (ecotype Wassilewskija) were surface-sterilised and sown on Murashige and Skoog (Duchefa) medium agar plates (0.5 x MS, 1% sucrose, 0.8% agar, pH 6.4). Plates were kept at 4°C for two days, and then transferred to a growth chamber (20°C with 8 h light and 16 h darkness). M. incognita strain "Morelos" was multiplied on tomato plants in a growth chamber (25°C, 16 h light and 8 h darkness). For in vitro nematode infection, J2 larvae were surface-sterilised with HgCl2 (0.01%) and streptomycin (0.7%) as described elsewhere [53]. We inoculated 25-day-old seedlings grown in vitro individually with 200 sterilised J2 s each, resuspended in Phytagel. Seven and 14 dpi, galls were dissected from the infected roots by hand. We also dissected internodes from uninfected roots (without apical and lateral root meristems) with the same age than infected roots for use as a negative control. Samples were immediately frozen in liquid nitrogen and stored at − 80°C. Three independent biological replicates were established for each set of conditions.

Construction and sequencing of small RNA libraries
Total RNA, including small RNAs (less than 200 nt long), was isolated from galls or uninfected roots at 7 and 14 dpi. Approximately 150 galls or internode fragments from uninfected roots were independently ground into powder in liquid nitrogen, with a mortar. Total RNA was extracted from these samples with the miR-Neasy Mini Kit (Qiagen), according to the manufacturer's instructions, with three additional washes in RPE buffer. The quality and integrity of the RNA were assessed with a Bioanalyzer (Agilent). Small RNA libraries were generated by ligation, reverse transcription and amplification (11 cycles) from total RNA (2 μg), with the reagents of the NEBNext Small RNA Library Prep Set for SOLiD. Libraries were then quantified with the Bioanalyzer High Sensitivity DNA Kit (Agilent) and sequenced on a SOLiD 5500 wildfire sequencer (Life Technologies) at the Nice-Sophia Antipolis functional genomics platform (France Génomique, IPMC, Sophia Antipolis, France).

Bioinformatic siRNA analysis
SOLiD colour spaced reads were translated into sequences for implementation in ShortStack application. For each library, adapters were trimmed and reads matching ribosomal RNA, mitochondrial RNA were removed by performing Blast analyses with the sequences listed in the Rfam database [54]. Reads mapping on sequences corresponding to snRNA, snoRNA, mitRNA, tRNA, miRNA and pre-miRNA were removed in order to keep only siRNAs. Genomic loci accumulating siRNAs were predicted by using the ShortStack version 3.3 algorithm [41,55]. ShortStack was run on each library separately with default parameters except: "nohp" mode (no miRNA research) and zero mismatches allowed. ShortStack software version 3.3 mapped trimmed and cleaned reads on a virtual concatenated genome composed by A. thaliana genome completed with plastidial and mitochondrial genomes (TAIR10.21) and M. incognita genome [56]. From alignments, Short-Stack identified a list of coordinates of loci accumulating siRNA. Bedtools [57] with Multi-intersectbed -i option was used to find shared clusters in the different libraries.
The output was filtered so that only regions that were present in at least 2 (out of 3) libraries for at least one condition either gall or uninfected root were used to serve as the final reference small RNA locus boundaries. Close clusters with a maximum distance of two nucleotides were merged with bedtools option Merge. ShortStack-count mode under default settings was then used to find relative small RNA abundance on this reference list of each library. Reads mapped on multiple loci were counted on each locus. Counts for siRNA accumulating loci from each replicate were used for differential expression analysis with the R package DESeq. Loci accumulating differentially siRNAs between galls and uninfected roots at a 5% false discovery rate (adjusted P value < 0.05; Benjamini Hochberg adjustment) were retrieved. Localisation of DE clusters was established in gene or in putative promoter gene region defined as the 2 kb upstream coding DNA sequence. Sequences derived from transposable elements were searched within promoter gene region by using CENSOR algorithm [45] that screens query sequences against a reference collection of repeats. The 2 kb upstream 5'UTR of genes were retrieved and analysed by CENSOR with default parameters.