A novel approach to co-expression network analysis identifies modules and genes relevant for moulting and development in the Atlantic salmon louse (Lepeophtheirus salmonis)

Background The salmon louse (Lepeophtheirus salmonis) is an obligate ectoparasitic copepod living on Atlantic salmon and other salmonids in the marine environment. Salmon lice cause a number of environmental problems and lead to large economical losses in aquaculture every year. In order to develop novel parasite control strategies, a better understanding of the mechanisms of moulting and development of the salmon louse at the transcriptional level is required. Methods Three weighted gene co-expression networks were constructed based on the pairwise correlations of salmon louse gene expression profiles at different life stages. Network-based approaches and gene annotation information were applied to identify genes that might be important for the moulting and development of the salmon louse. RNA interference was performed for validation. Regulatory impact factors were calculated for all the transcription factor genes by examining the changes in co-expression patterns between transcription factor genes and deferentially expressed genes in middle stages and moulting stages. Results Eight gene modules were predicted as important, and 10 genes from six of the eight modules have been found to show observable phenotypes in RNA interference experiments. We knocked down five hub genes from three modules and observed phenotypic consequences in all experiments. In the infection trial, no copepodids with a RAB1A-like gene knocked down were found on fish, while control samples developed to chalimus-1 larvae. Also, a FOXO-like transcription factor obtained highest scores in the regulatory impact factor calculation. Conclusions We propose a gene co-expression network-based approach to identify genes playing an important role in the moulting and development of salmon louse. The RNA interference experiments confirm the effectiveness of our approach and demonstrated the indispensable role of a RAB1A-like gene in the development of the salmon louse. We propose that our approach could be generalized to identify important genes associated with a phenotype of interest in other organisms. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-08054-7).


Estimation of power parameter of the scale-free network transformation
In this study, we choose the power parameter β from integers ranging from 1 to 20, using the criterion recommended by [1],i.e. we choose the smallest β that results in approximate scale-free topology as measured by the scale-free topology fitting index R 2 . The scale-free topology fitting index is defined based on the frequency distribution of the connectivities in the network. The frequency distribution of network connectivity is calculated based on discretizing connectivities, and the discretized connectivity (dk) is defined as: where k represents the vector of network connectivities (see the formula-3 and formula-4 in the paper), and the function discretize()implements a equal-width discretization method to transform a numeric vector (k) to a vector (dk) of equal length whose components indicate the bin number into which the value falls. no.bins represents the number of equal-width bins, and it can be set depending on the specific situations or it can be defined as: where m is the number of observations, i.e. the number of components in the vector k. The frequency distribution of connectivity (p.Connectivity) can be defined as: A network exhibits scale-free topology if its frequency distribution p(dk) follows a power law: where N and γ denote positive real numbers. The logarithm of forumla (4) is: indicating that there exists a straight line relationship between log(p(dk)) and log(k) in a scale-free network, and the slope of the line should be negative. Therefore, the scale-free topology fitting index (R 2 ) can be defined to measure the extent of a straight line relationship between log(p(dk)) and log(k) : where BinN o = (1, 2, ..., no.bins). Networks with a scale-free topology fitting index R 2 close to 1 are thus can be defined to be approximately scale free. Since the slope of the regression line between log(p(dk)) and log(k) is negative, we multiplied R 2 by −1, i.e. we checked the signed R 2 .
According to [1], there is a natural trade-off between maximizing scale-free topology fitting index (R 2 ) and maintaining a high mean number of connectivities. Therefore, it is recommended that only those power parameter values leading to a network satisfying scale-free topology at least approximately (signedR 2 > 0.80) should be considered. And the average of network connectivities should be high, i.e, choose the lowest power parameter β that results in approximate scale-free topology.

Statistics for module preservation analysis
The package provided five module connectivity-based preservation statistics to compare the module connectivity patterns changes between the reference network and cor.kM E specifies the correlation of module memberships of nodes in the module between the reference network and test network, while cor.kM Eall calculates the analogous correlations for all nodes between the two networks: ).
The maximum adjacency ratio (M AR) of node i is defined as follow: where There are four density-based preservation statistics for measuring preservation of module density:meanCor, meanAdj, propV arExpl and meanKM E. For a given module, meanCor represents the mean correlation in the test network multiplied by the sign of the corresponding correlations in the reference network: where obs a represents the observed value of the statistic a, and µ a and σ a is the mean and standard deviation value obtained from the permutation respectively.
Finally, the connectivity-based preservation Z statistics and the density-based preservation Z statistics are aggregated into summary preservation statistics Z connectivity and Z density : Z connectivity = median(Z cor.kIM , Z cor.M AR , Z cor.kM E , Z cor.kM Eall , Z cor.cor ) (18) Z density = median(Z meanCor , Z meanAdj , Z propV arExpl , Z meanKM E ) Since we are interested in the preservation of both connectivity pattern and density pattern of a module, the summary preservation statistics were assigned equal