- Research article
- Open Access
De novo assembly and sex-specific transcriptome profiling in the sand fly Phlebotomus perniciosus (Diptera, Phlebotominae), a major Old World vector of Leishmania infantum
© Petrella et al. 2015
- Received: 7 July 2015
- Accepted: 15 October 2015
- Published: 23 October 2015
The phlebotomine sand fly Phlebotomus perniciosus (Diptera: Psychodidae, Phlebotominae) is a major Old World vector of the protozoan Leishmania infantum, the etiological agent of visceral and cutaneous leishmaniases in humans and dogs, a worldwide re-emerging diseases of great public health concern, affecting 101 countries. Despite the growing interest in the study of this sand fly species in the last years, the development of genomic resources has been limited so far. To increase the available sequence data for P. perniciosus and to start studying the molecular basis of the sexual differentiation in sand flies, we performed whole transcriptome Illumina RNA sequencing (RNA-seq) of adult males and females and de novo transcriptome assembly.
We assembled 55,393 high quality transcripts, of which 29,292 were unique, starting from adult whole body male and female pools. 11,736 transcripts had at least one functional annotation, including full-length low abundance salivary transcripts, 981 transcripts were classified as putative long non-coding RNAs and 244 transcripts encoded for putative novel proteins specific of the Phlebotominae sub-family. Differential expression analysis identified 8590 transcripts significantly biased between sexes. Among them, some show relaxation of selective constraints when compared to their orthologs of the New World sand fly species Lutzomyia longipalpis.
In this paper, we present a comprehensive transcriptome resource for the sand fly species P. perniciosus built from short-read RNA-seq and we provide insights into sex-specific gene expression at adult stage. Our analysis represents a first step towards the identification of sex-specific genes and pathways and a foundation for forthcoming investigations into this important vector species, including the study of the evolution of sex-biased genes and of the sexual differentiation in phlebotomine sand flies.
- Sand fly
- Transcriptome de novo assembly
- Differential expression analysis
- Sex-specific gene expression
- Sex-biased genes
In the Old World, the sand fly Phlebotomus (Larroussius) perniciosus (Diptera: Psychodidae, Phlebotominae) is one of the major vectors of Leishmania infantum (Kinetoplastida: Trypanosomatidae), the parasitic protozoan that causes visceral and cutaneous leishmaniases in humans and canine reservoir hosts [1–3]. P. perniciosus is also a vector for various known and emerging arboviruses considered relevant from a public health perspective [4, 5]. Described for the first time in Malta by Newstead in 1911, P. perniciosus is widely distributed in the western Mediterranean Basin, from Morocco to Libya in North Africa and from Portugal to Croatia in Europe. One of the most important endemic foci of canine and human visceral leishmaniases of this area is located in Italy, in the Campania region . Leishmaniases are diseases of great public health concern affecting 101 countries around the world with an estimated incidence of 0.9–1.6 million new cases each year . Due to climate and social changes, leishmaniases are becoming a worldwide re-emerging public health problem, with an expanding endemicity and more than 300 million people estimated at risk of transmission world-wide . Despite their importance as disease vectors, sand flies remain understudied relative to mosquitoes and other important vectors and relatively little is known about their biology, their feeding and reproductive behaviour under natural conditions, and about their genetics [9, 10].
In the recent years, the advent of the next generation sequencing technology and bioinformatics applied to the study of insects have paved the way for the fast identification and characterization of genes involved in relevant biological processes [11–13]. Nowadays, the number of insect genomes and transcriptomes available is increasing exponentially in public databases  (http://www.1kite.org). This allows for easier and faster comparative studies between related species.
To date two unpublished genome sequencing projects of the New World species Lutzomyia longipalpis (Dillon et al., unpub. res.; https://www.vectorbase.org/organisms/lutzomyia-longipalpis) and of the Old World species P. papatasi (McDowell et al. unpub. res.; https://www.vectorbase.org/organisms/phlebotomus-papatasi) and a few transcriptomics studies are available for sand fly species. Most transcriptomics studies, performed with EST sequencing, have focused on sand fly salivary glands and the role of saliva in pathogen-vector-host interactions [15–20], on sand fly-Leishmania interactions [21–23] and on the analysis of sand fly specific tissues . Only three studies have performed a global sand fly transcriptome analysis, the first two using EST sequencing technology [25, 26] and the third by 454 pyrosequencing on wild individuals of L. longipalpis . In addition, only two studies are available about the salivary gland and midgut transcriptome of P. perniciosus [28, 29].
In the present paper, we report the de novo transcriptome assembly and the differential expression analysis of adult males and females of the sand fly P. perniciosus. This analysis led to the identification of about 8600 sex-biased genes and represents the first next generation sequencing study with Illumina technology for a sand fly species. The data set constitutes a relevant resource for future genome annotation projects for P. perniciosus and for evolutionary comparative studies with other sand fly species and other blood-sucking Nematocera species, such as mosquitoes (Culicidae), biting midges (Ceratopogonidae) and black flies (Simuliidae). In particular, our data can be useful for the study of genes involved in the control of sex-specific traits such as host-parasite interaction, reproductive biology, somatic sexual development and sex determination, accelerating the discovery of potential targets for vector-based control strategies against leishmaniasis and other arthropod-borne diseases.
Sequencing and de novo assembly of the P. perniciosus transcriptome
Summary of sequencing and assembly statistics
Total sequenced reads
Trimmomatic filtered reads
Total assembled bases (bp)
Trinity assembled transcripts
Median transcript lenght (bp)
Average transcript lenght (bp)
Transcript N50 (bp)
Shortest transcript lenght (bp)
Longest transcript lenght (bp)
Trinity transcripts > 1Kb
Trinity transcripts > 2Kb
Transcriptome quality assessment
We validated the PERNI data set for sequence completeness using three different approaches. First, we applied the ortholog hit ratio metric (OHR) , which provides an estimate of the length of the assembled transcripts using their coding regions. OHR compares the length of the coding region between a newly assembled transcript and its ortholog of a reference transcriptome. If the two have the same length, then the reconstructed transcript is likely to correspond to the full length real mRNA, including the upstream and downstream untranslated regions, and the OHR = 1. Values of OHR < 1 suggest incompleteness. We calculated OHR values of 2827 PERNI transcripts chosen to have best reciprocal TBLASTX hits (E-value: 1e-6; min. hit coverage 30 %) with the D. melanogaster transcriptome (BDGP v6.03). In ~70 % of the cases, hits cover at least 50 % of the corresponding Drosophila transcript coding regions (average and median OHR values of 0.65 and 0.63; Additional file 1: Table S1). Similar results were obtained on 3621 PERNI transcripts using as reference the transcriptome of the sand fly P. papatasi, available at the VectorBase web site (Phlebotomus-papatasi-Israel_TRANSCRIPTS_PpapI1.1), with average and median OHR values of 0.66 and 0.65, respectively (Additional file 1: Table S1). Our results are comparable to the efficiencies observed in other de novo assembled transcriptomes of insects [34–36].
Next, we compared the PERNI data set with 4516 P. perniciosus nucleotide sequences available in the NCBI GenBank database (accession date 2015.01.15), mostly ESTs from a midgut transcriptome analysis . The 4516 available sequences of P. perniciosus were clustered using CD-HIT  with default parameters to obtain 2869 unique sequences with a N50 value of 798 bp to be used for bidirectional BLASTN analysis (cut-off E-value of 1e-20) with the PERNI data set. We observed that while only 2 % of the PERNI data set matched available P. perniciosus sequences, 77 % (2219 out of 2869) of the P. perniciosus sequences available in GenBank matched transcripts in the PERNI data set. Furthermore, we observed that 1470 out of these 2219 P. perniciosus sequences available in GenBank are shorter than their corresponding transcripts in the PERNI data set. Besides validating it, these results indicate that the de novo assembly presented here improves the coverage and the average transcript length of P. perniciosus.
Finally, to test the ability to detect transcripts with low expression, we searched for three low-abundance salivary transcripts identified in other closely-relative sand fly species that are not yet identified in P. perniciosus  (Nikola Polanska pers. comm.): the hyaluronidase, pyrophosphatase and adenosine deaminase genes. We performed a BLASTP search of the protein sequences (P. orientalis hyaluronidase GenBank acc. num.: AGT96452.1; P. argentipes pyrophosphatase GenBank acc. num.: ABA12155.1; P. duboscqi adenosine deaminase GenBank acc. num.: ABI20162.1) against the PERNI data set. Despite the PERNI data set deriving from whole body male and female adults, for each of the three low-abundance salivary-specific genes, we identified a full P. perniciosus assembled orthologous transcript with a complete ORF (hyaluronidase gene: 1800 bp long transcript, 470 aa long putative protein, GenBank acc. num.: KT160228; pyrophosphatase gene: 2795 bp long transcript, 475 aa long putative protein, GenBank acc. num.: KT160227; adenosine deaminase gene: 2180 bp long transcript, 508 aa long putative protein, GenBank acc. num.: KT160229). In Additional file 2: Figure S1 we reported the multiple alignment of these three putative proteins of P. perniciosus with orthologous proteins present in GenBank.
Overall, these results indicate that we produced a high quality de novo assembly of P. perniciosus transcriptome that matches and improves existing information and includes transcripts of genes with low expression.
Prediction of protein coding transcripts
Prediction of putative long non-coding RNAs
The Annocript annotation pipeline performs also a prediction of the putative long non-coding RNAs (lncRNAs) through the Portrait software, which uses support vector machine and is optimized for non-model organisms . In brief, a transcript is annotated as non-coding if it has (i) no match in public databases, (ii) a Portrait non-coding probability > 0.95, (iii) a transcript length > 200 bp and (iv) an ORF < 100 aa. Using Annocript, we predicted 1229 putative lncRNAs in P. perniciosus. However, to obtain a more reliable prediction of the lncRNAs, we combined the Portrait results with the predictions from two other CPAT  and CPC  which use logistic regression and machine-learning methods, respectively. We set a very stringent threshold in both cases (see Methods). When considering the intersection among the three prediction methods we predict as putative lncRNAs of P. perniciosus 981 PERNI transcripts (Fig. 2) with a length range of 201–1575 bp, a N50 value of 340 bp and a GC content of 35.04 %, lower than that of the whole PERNI data set (42.12 %), in agreement with the low GC content of the lncRNAs described in other insect species . Low GC content is emerging as a feature of the long non-coding transcripts, at least in humans where they are mainly studied . This finding supports the reliability of our prediction (Additional file 7: Table S5).
Prediction of putative novel protein coding transcripts
In addition to the 11,736 annotated PERNI transcripts and the 981 putative lncRNAs, the PERNI data set contains 244 unannotated PERNI transcripts (Fig. 2) with an open reading frame longer than 100 aa (mean ORF length = 392.3 +/− 213.5), a GC content of 42.97 % and predicted as coding by Portrait (coding probability >80 %), CPAT (coding probability >80 %) and CPC (Score > 0) software (Additional file 7: Table S5). A TBLASTX analysis against the P. papatasi transcriptome revealed that 86 out of 244 PERNI transcripts have a highly conserved match (E-value < e-20; identity range 32–96 %) (data not shown). These PERNI transcripts might encode for novel proteins specific of the subfamily Phlebotominae and not present to date in the public databases.
Expression level analysis
The relative expression level of the assembled PERNI data set was assessed in pooled Illumina reads of males and females, by the FPKM metric (Fragments Per Kilobase of transcript per Million mapped reads)  using the RSEM software . In the PERNI data set FPKM values vary from less than 1 to 73,797 (median = 1.45, mean = 28.94, standard deviation = 475.7) suggesting a wide range of expression levels (Additional file 8: Table S6). We classified PERNI transcripts as: (i) not expressed if FPKM was <1 (41.44 % in males and 44.77 % in females), (ii) weakly expressed if 1 < FPKM < 10 (31.72 % in males and 28.83 % in females), (iii) moderately expressed if 10 < FPKM < 100 (23.24 % in males and 22.98 % in females), and highly expressed if FPKM > 100 (3.59 % in males and 3.43 % in females). Overall, we concluded that vast majority of transcripts are weakly or not expressed, while only a very small fraction is highly expressed, with no substantial differences between males and females (Pearson’s Chi-squared test p-value = 0.21). Among highly expressed transcripts (FPKM >100) we found a significant enrichment (p < 0.01) for 77 GO terms in females and 55 in males (Additional file 9: Table S7), suggesting that highly expressed transcripts regulate sex-specific functions.
To validate the results of the in silico expression analysis, we selected ten transcripts in the PERNI data set and evaluated their relative expression by Real Time PCR (qPCR). In particular, we selected four PERNI transcripts with similar FPKM values in both sexes, three PERNI transcripts, which seems to be male-biased and three apparently female-biased according with their FPKM values (Additional file 10: Table S8). We included the P. perniciosus orthologs of the apyrase (Corset cluster-6143.0) and the Act-3 genes (Corset cluster-14748.2) which in mosquitoes are known to be female- and male-biased, respectively [47, 48]. The first crucial step of the relative qPCR technique is the selection of the best transcript to use as reporter in the subsequent normalization. For this reason, we selected five putative reporter transcripts (see Methods), analyzed their expression levels by qPCR and examined the results using the NormFinder software . Among the five putative reporters selected, sod (encoding for superoxide dismutase) and gpdh (encoding for glycerol-3-phosphate dehydrogenase) were observed to be the most stable between adult males and females of P. perniciosus. Consequently, we decided to use both as reference genes in our relative quantization experiments. The normalized expression values (Rn) of the selected transcripts of P. perniciosus in males and females were compared to the respective normalized FPKM values (nFPKM), resulting in statistically significant positive correlation. In particular, the Pearson correlation coefficient r was 0.743 in males (p = 0.014) and 0.862 in females (p = 0.001), showing a good agreement between the results obtained from the in silico and in vivo expression analysis. This is the first report on the selection of reporter genes in adult males and females of P. perniciosus and this result will be useful in all future studies on gene expression in this species.
Differential expression analysis
Sexual dimorphism is marked between male and female sand flies with many morphological, behavioral, and physiological traits, which are typical for each sex. In particular, females are usually heavier than males while males have prominent external terminalia, with a relatively minute and slight abdomen compared to the females. Furthermore, closely related sand fly species can often be properly identified only using the species-specific morphology of sex-specific traits such as the dilatation of the distal part of the spermathecal ducts in females or the morphology of the copulatory valves (aedeagus) in males [50, 51]. Complex sexually dimorphic phenotypes in animals are mainly the result of differential expression between males and females of the same gene , however nothing is known in P. perniciosus. Therefore, to begin the study of the molecular bases of the sexual dimorphism in sand flies we interrogated our data set to identify genes differently expressed between adult males and females of P. perniciosus.
The vast majority of sex-biased transcripts (75 %, 6111 out of the 8590; 2791 male-biased and 3320 female-biased) have a functional annotation in the UniprotKB database. This figure represented 52 % of the 11,736 annotated transcripts. In addition, 64 out of 8590 sex-biased transcripts were annotated as lncRNAs (29 male-biased and 35 female-biased) and 125 out of 8590 were annotated as putative novel protein coding genes (60 male-biased and 65 female-biased) (Fig. 4d). A RNA-seq study conducted on whole body adult samples of Drosophila melanogaster revealed that the percentage of the male-biased annotated transcripts is 20.7 % and the female-biased is 8.3 % . This higher percentage of male-biased genes was attributed to the transcriptional complexity of the testes of the fruit fly . In P. perniciosus the percentage of the annotated male-biased genes (21.3 %) is very similar to that observed in Drosophila. Conversely, in P. perniciosus we observed a much higher percentage of annotated female-biased genes (25.4 %) relative to male-biased genes. This result is in agreement with the sex-biased expression in sugar-fed adults of the mosquito Anopheles gambiae, where the percentage of the female-biased genes is higher than that of the male-biased .
To identify gene classes enriched in the sex-biased PERNI transcripts we selected the annotated male- and female-biased PERNI transcripts with a FC > 2 (1103 and 1449 PERNI transcripts, respectively) and we performed a GO term enrichment analysis for each sex using a R plugin of the Annocript software and applying the Fisher Exact Test (adjusted p-value < 0.01). We identified 80 enriched GO terms in females and 56 in males (Additional file 13: Tables S10). Among the enriched terms of females, most are involved in RNA metabolism, translation and oogenesis. In males, most of the enriched terms are related to transcription, signal transduction and response to stimuli (Additional file 14: Figure S4). Interestingly, among the male enriched GO terms we observed the monooxygenase activity (GO:0004497). Male-specific insect monooxygenases were found in the male reproductive system of Blattella germanica , D. melanogaster  and Ips paraconfusus  but their role remains unknown. Considering the importance of this enzyme family in the development of insecticide resistance in insects, it deserves further attention in future studies.
As a validation of the DGE analysis, we examined the sex-specific expression pattern of the top 31 DE PERNI transcripts (20 male- and 11 female-biased), chosen to have FDR value equal to zero and an FPKM value > 1 in at least one sex (Additional file 15: Table S11), performing a semi-quantitative RT-PCR experiment. Only 8 out of 31 of the selected DE PERNI transcripts have a functional annotation (six among female-biased PERNI transcripts, including phenoloxidase, peroxidase and a vitellogenine receptor, and two among the male-biased PERNI transcripts, Trypsine and Glycerol kinase).
Interestingly, in some PCR amplifications we obtained unexpected additional fragments. In particular, we got a male-specific larger fragment in the DE PERNI transcripts F4 and M1; a female-specific larger fragment in the DE PERNI transcripts M5a, M6, M8, M15, M17 and M19; a non-sex-specific larger fragment in the DE PERNI transcript M9. We examined the unclustered Trinity assembly output to search for the presence of alternative isoforms of these nine DE PERNI transcripts and we found assembled alternative isoforms only for the M9 and M17 transcripts (data not shown). To verify the nature of the eight sex-specific alternative amplified fragments, we performed sequencing experiments, obtaining positive results for five DE PERNI transcripts (F4, M5a, M8, M15 and M17). The numerous attempts to clone and sequence the alternative fragments of M1, M6 and M19 DE PERNI transcripts have failed. The direct sequencing of the alternative fragments of the DE PERNI transcripts F4, M5a, M8, M15 and M17, revealed the presence in each of them of an additional nucleotide sequence with conserved 5’ and 3’ intron consensus sites (5’ = AG/GURAGU and 3’ = (Y)nNCAG/G) (Additional file 16: Figure S5). This result suggests that these additional amplification fragments might be produced by sex-specific alternative splicing events, via intron retention rather than alternative exon usage. Further analyses are required to clarify if the postulated sex-specific alternative splicing is somatic or germinal. These five transcripts have not been annotated by our analysis pipeline neither as coding nor as non-coding, but the presence of sex-specific alternative splicing let us hypothesize they might have a functional role.
Non-synonymous and synonymous substitution rates and their ratio Ka/Ks can be used to infer the selective pressure acting on the nucleotide coding sequences. In particular, a Ka/Ks value lower than one indicates that the compared sequences are subjected to purifying constraint for amino acid substitutions. Conversely, when the Ka/Ks value is equal or higher than one the sequences are evolving neutrally or under positive selection, respectively.
To evaluate the evolutionary forces acting on the orthologous genes between two Phlebotominae species belonging to the same and to different genera, we measured the pairwise Ka/Ks ratio from the back-translated codon alignment of 3159 putative orthologs between P. perniciosus and P. papatasi (P-P) and of 3932 putative orthologs between P. perniciosus and L. longipalpis (P-L). Putative orthologs between species were identified by pairwise BLASTP analysis (E-value: 1e-6; min. hit coverage 70 %) and aligned using the software ParaAT . For all genes Ka/Ks was always < 1 with mean Ka/Ks very similar for both comparisons (P-P: average Ka/Ks = 0.061, max = 0.794; P-L: average Ka/Ks = 0.064, max = 0.576, Additional file 17: Table S12) showing that a strong purifying selection is acting on the ortholog genes in both the comparisons.
We presented the first adult reference transcriptome of P. perniciosus and its differential expression and evolutionary analysis. Our results represent a relevant resource for functional studies in the sand fly and provides baseline knowledge for future studies on sex-specific gene expression and pathways. Furthermore, our data will be useful to perform comparative analyses among phlebotomine sand flies and other insect species, e.g. mosquitoes, to dissect regulatory and biochemical pathways involved in arthropod blood feeding, host-parasite interactions, reproductive biology and somatic sexual differentiation. Finally, the annotated genes constitute an important toolkit that could enhance the discovery of new potential targets for vector-based control strategies against leishmaniases and other arthropod-borne diseases.
RNA extraction, quality control and sequencing
The adult individuals of P. perniciosus used in this study were from a laboratory colony held at Istituto Superiore di Sanità, Rome, Italy. This colony was founded with individuals originally collected in Spain. Sand flies have been reared in standard conditions at 28 °C, 80 % relative humidity and 12:12 h light–dark regimen . Three pools for each sex (30 virgin males and 23 virgin sugar-fed emales, 7–10 days old) were utilized to extract total RNA with PureLink® RNA Mini kit (Life Technologies, Carlsbad, CA, US) according to manufacturer’s instructions. The integrity and purity of extracted total RNA were assessed using the NanoDrop 2000c (Thermo Fisher Scientific, Waltham, MA, US) and the Agilent 2100 Bioanalyzer with RNA 6000 Nano kit (Agilent, Santa Clara, CA, US). All the RNA samples had a A260/280 ratio of 2.2 and a RIN value higher than 7. About 1.5 micrograms of total RNA for each sample were sent to EMBL Genomics Core Facility in Heidelberg, Germany, for library construction and Illumina sequencing. Six strand-specific RNA sequencing libraries were produced using TruSeq Stranded mRNA Library Prep Kit (Illumina Inc., San Diego, CA, US) and used for 50 bp paired-end (PE) sequencing on two lanes of an Illumina Hiseq 1500, multiplexing the six libraries on each lane. We obtained a total of 277,105,235 PE reads. The raw data files were deposited into the Short Read Archive (SRA) of the NCBI (Accession Number: SRP059770).
De novo transcriptome assembly and clustering
The stranded PE reads from the six libraries were pooled together and de novo assembled using Trinity (release 2014-07-17) [30, 31] with default parameters on the ADA Server at the Department of Biology, University of Naples Federico II (24 cores, 256 GB of memory). We applied the “trimmo” filtering and assembling pipeline as described in Salvemini et al., 2014  without the mitochondrial and ribosomal transcripts depletion steps. Raw read pairs from each Illumina library were mapped back individually to the Trinity assembled transcriptome using the Bowtie software  allowing for multiple reads mapping (−a parameter) and the alignment outputs were stored as six separate BAM files. The clustering of the assembled transcripts was performed using the Corset software v1.02  with default parameters. The Corset algorithm, using the read sets that have been multi-mapped to the de novo assembled transcriptome, hierarchically clusters the transcripts based on the proportion of shared reads and of transcripts expression patterns. This allows for discrimination between genes that share sequence, such as paralogs, if the expression levels between the compared groups are different. Then, the Corset software outputs gene-level counts, that can be easily tested for differential expression using count-based frameworks such as edgeR  and DESeq , and it consistently performs well compared to alternative clustering methods on a range of metrics .
To investigate and summarize the functional categories present in the PERNI data set, we applied the Annocript pipeline (https://github.com/frankMusacchia/Annocript)  using the UniProtKB reference database (2014–11 version) and the longest transcript of each Corset cluster as representative (Corset transcripts, PERNI data set). The Annocript pipeline performs various similarity searches using a speed-optimized BLASTX and BLASTP parameters and a custom parallelization of RPSBLAST to achieve a faster execution. It produces a final readable annotation table with assigned proteins, domains, GO terms, Enzymes, pathways, short and ribosomal RNAs, longest ORF size and non-coding potential. In particular, we performed the following similarity searches: BLASTX against TrEMBL/UniRef and SwissProt (parameters: evalue 1E-5, threshold 18, wordsize 4), RPSBLAST against CDD profiles (parameters: evalue 1E-5), BLASTN against Rfam and rRNAs (parameters: evalue 1E-5). To perform the comparative analysis of the GO terms distribution of the PERNI data set with the GO terms of the two sand fly species L. longipalpis and P. papatasi we downloaded the transcript data set from VectorBase (Lutzomyia-longipalpis-Jacobina_TRANSCRIPTS_LlonJ1.1 and Phlebotomus-papatasi-Israel_TRANSCRIPTS_PpapI1.1, respectively); the two data set were re-annotated using the Annocript pipeline with the same parameters as for the PERNI data set.
Abundance estimation and differential expression analysis
We applied the RSEM software  and the Bowtie aligner , as implemented in the Trinity software package, to assign reads to the Corset clusters and to compute expression levels using the FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) metric. We utilized the matrix file containing the mapped read counts for each of the three biological replicates for each sex, produced by the Corset software, to perform a differential gene expression analysis between adult males and females of P. pernicious. Replicate quality control was performed using the PtR Trinity perl script (release 2014-07-17). We applied the edgeR software  which uses a negative binomial model for differential expression analysis. We performed the analysis using the Corset clusters and a significance false discovery rate (FDR) threshold value < 0.01.
Validation of RNA-seq results by qRT-PCR
Total RNA was separately extracted from pooled males and females of P. perniciosus (50 and 25 individuals, respectively), different from those used for the RNA-seq analysis, using the extraction and quantification protocol described above. Total RNA was quantified using Qubit fluorometer (Life Technologies, Carlsbad, CA, US) and reverse transcription reaction was conducted on 200 ng of male or female total RNA using the EuroScript Reverse Transcriptase kit (Euroclone, Pero, IT) with random examers, in a final volume of 20 μl. Five transcripts encoding for superoxide dismutase (sod) (Cluster-5663.0), glycerol-3-phosphate dehydrogenase (gpdh) (Cluster-15017.0), cytochrome P450 (cytp450) (Cluster-16737.0), glyceraldehyde-3-phosphate dehydrogenase (gapdh) (Cluster-19016.0) and the gene for the hypothetical protein P119 (p119) (Cluster-13349.0) similar to the NOT3 protein of Drosophila were selected to evaluate the best reporter gene to use in relative quantization Real Time PCR experiments. Primer pairs utilized are listed in Additional file 18: Methods S1. Real Time PCR amplifications were conducted on 1/40 v/v of male or female reverse transcription reaction using the Brilliant III Ultra-Fast SYBR Green QPCR Master Mix (Agilent, Santa Clara, CA, US). The reactions were performed in technical triplicates. PCR efficiency (E) and threshold cycle (CT) for each well was calculated using the online tool RealTime PCR Miner  and the NormFinder software  was used to evaluate the best reporter gene. Real Time PCR reactions were conducted as described above. The mean relative expression ratio (Rn) and standard deviation of the ten selected transcripts in males and females were calculated using sod and gpdh as endogenous control genes by applying the formula Rn = (1 + Etarget gene)-CT target gene/(1 + Econtrol genes)-CT control genes, where the E and CT values of the control genes are the geometric mean of the efficiency and threshold cycle between sod and gpdh. The FPKM counts of the selected transcripts were normalized relative to the FPKMs of the transcripts sod and gpdh (nFPKM). The Pearson correlation coefficient r between the Rn and nFPKM values of males and females was separately calculated using the R package (www.r-project.org/).
Semi-quantitative RT-PCR validation of sex-biased genes
Total RNA was extracted from pools of 50 adult males and 25 adult females using the extraction and quantification protocols previously described. Oligo-dT-primed cDNAs were prepared starting from 1 μg of total RNA using the EuroScript M-MLV Reverse Transcriptase (Euroclone, Pero, IT) following the manufacturer’s instructions. 1/40 v/v of the synthesised cDNAs were amplified by PCR in semi-quantitative conditions as described in Salvemini et al., 2006  using the sod gene of P. perniciosus as reference. RT-PCR products were analysed by 1 % agarose gel electrophoresis. PCR products were gel-purified and sequenced with the Applied Biosystem BigDye 1.1 sequencing kit. Primers utilized in RT-PCR validation of the selected sex-biased transcripts are listed in the Additional file 18: Methods S1.
Evaluation of coding potential
The prediction of coding potential of transcripts not annotated in the PERNI data set was performed, as described in Salvemini et al., 2014 , using three independent prediction methods: the Portrait software (performed in this paper as a plugin within the Annocript pipeline) [38, 41], the Coding Potential Calculator (CPC)  and the Coding Potential Assessment Tool (CPAT) . In order to identify in the PERNI data set the potential non-coding transcripts with a high reliability, we selected stringent thresholds for each prediction method. Portrait-based prediction (YES), CPC coding potential score < −1.0 and CPAT coding probability < 0.05. Only those transcripts in accordance with the three conservative cut-off values were considered as putative non-coding transcripts of P. perniciosus.
The putative coding sequence (CDS) of each PERNI transcripts of P. perniciosus was predicted using TransDecoder implemented in Trinity, with the default parameters setting. In order to select among the CDSs predicted by TransDecoder those encoding for the amino acid sequences annotated by Annocript, the reciprocal BLASTP best-hits between the two amino acid data sets were obtained using an ad-hoc perl script and used to extract the corresponding CDSs from the nucleotide data set. The same approach was applied to the transcriptome of P. papatasi  and L. longipalpis .
Putative ortholog CDSs between P. perniciosus and P. papatasi and between P. perniciosus and L. longipalpis were identified performing a best reciprocal BLASTP search, retaining only the sequences aligned with a continuous region covering at least 70 % of the query sequence. The pair-wise nucleotide alignment of the putative orthologs based on their amino acid alignment was performed using the ParaAT software . ParAT uses the software Epal2Nal to back-translate the aminoacid alignment in a codon aligment which can be used for further nucleotides substitution analysis. The non-synonymous (Ka) and synonymous (Ks) substitution rates for each ortholog pair between P. perniciosus and P. papatasi and between P. perniciosus and L. longipalpis were calculated using the YN approximate method  implemented in the KaKs_Calculator software .
Availability of data and materials
The data sets supporting the results of this article are available in the National Center for Biotechnology Information (NCBI) repository (Sequence Read Archives: SRP059770).
This study was supported by the grant STAR2013_25 to MS from University of Naples Federico II and Compagnia di San Paolo, Naples, ITALY, in the frame of Programme STAR2013 (Sostegno Territoriale alle Attività di Ricerca). We are deeply grateful to the staff of the Genomics Core Facility, EMBL, Heidelberg, GERMANY, for the high quality technical support in Library preparation and Illumina sequencing of our samples. We are deeply grateful to Riccardo Bianchi, Istituto Superiore di Sanità, Roma, Italy, for technical support in rearing sand flies. We thank Nikola Polanska for the help in the selection of low-abundance salivary transcripts of P. perniciosus. We deeply acknowledge the Sand Fly Genome Sequencing Consortium and the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) for releasing their unpublished data prior to project completion.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Maroli M, Feliciangeli MD, Bichaud L, Charrel RN, Gradoni L. Phlebotomine sandflies and the spreading of leishmaniases and other diseases of public health concern. Med Vet Entomol. 2013;27(2):123–47.View ArticlePubMedGoogle Scholar
- Gradoni L. Epidemiological surveillance of leishmaniasis in the European Union: operational and research challenges. Euro Surveill. 2013;18(30):20539.PubMedGoogle Scholar
- Dostalova A, Volf P. Leishmania development in sand flies: parasite-vector interactions overview. Parasit Vectors. 2012;5:276.PubMed CentralView ArticlePubMedGoogle Scholar
- Alkan C, Bichaud L, de Lamballerie X, Alten B, Gould EA, Charrel RN. Sandfly-borne phleboviruses of Eurasia and Africa: epidemiology, genetic diversity, geographic range, control measures. Antiviral Res. 2013;100(1):54–74.View ArticlePubMedGoogle Scholar
- Depaquit J, Grandadam M, Fouque F, Andry PE, Peyrefitte C. Arthropod-borne viruses transmitted by Phlebotomine sandflies in Europe: a review. Euro Surveill. 2010;15(10):19507.PubMedGoogle Scholar
- Gramiccia M, Scalone A, Di Muccio T, Orsini S, Fiorentino E, Gradoni L. The burden of visceral leishmaniasis in Italy from 1982 to 2012: a retrospective analysis of the multi-annual epidemic that occurred from 1989 to 2009. Euro Surveill. 2013;18(29):20535.PubMedGoogle Scholar
- Alvar J, Velez ID, Bern C, Herrero M, Desjeux P, Cano J, et al. Leishmaniasis worldwide and global estimates of its incidence. PLoS One. 2012;7(5):e35671.PubMed CentralView ArticlePubMedGoogle Scholar
- Antoniou M, Gramiccia M, Molina R, Dvorak V, Volf P. The role of indigenous phlebotomine sandflies and mammals in the spreading of leishmaniasis agents in the Mediterranean region. Euro Surveill. 2013;18(30):20540.PubMedGoogle Scholar
- Bates PA, Depaquit J, Galati EA, Kamhawi S, Maroli M, McDowell MA, et al. Recent advances in phlebotomine sand fly research related to leishmaniasis control. Parasit Vectors. 2015;8:131.PubMed CentralView ArticlePubMedGoogle Scholar
- Killick-Kendrick R, Killick-Kendrick M. Biology of sand fly vectors of Mediterranean canine leishmaniasis. In: Canine Leishmaniasis: an Update: 1999. Barcelona: Proc Intern Canine Leishmaniasis Forum; 1999. p. 26–31.Google Scholar
- Gu XC, Zhang YN, Kang K, Dong SL, Zhang LW. Antennal transcriptome analysis of odorant reception genes in the red turpentine beetle (RTB) dendroctonus valens. PLoS One. 2015;10(5):e0125159.PubMed CentralView ArticlePubMedGoogle Scholar
- Meng X, Zhang Y, Bao H, Liu Z. Sequence analysis of insecticide action and detoxification-related genes in the insect pest natural enemy pardosa pseudoannulata. PLoS One. 2015;10(4):e0125242.PubMed CentralView ArticlePubMedGoogle Scholar
- Chauhan P, Hansson B, Kraaijeveld K, de Knijff P, Svensson EI, Wellenreuther M. De novo transcriptome of Ischnura elegans provides insights into sensory biology, colour and vision genes. BMC Genomics. 2014;15:808.PubMed CentralView ArticlePubMedGoogle Scholar
- Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, et al. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346(6210):763–7.View ArticlePubMedGoogle Scholar
- Vlkova M, Sima M, Rohousova I, Kostalova T, Sumova P, Volfova V, et al. Comparative analysis of salivary gland transcriptomes of Phlebotomus orientalis sand flies from endemic and non-endemic foci of visceral leishmaniasis. PLoS Negl Trop Dis. 2014;8(2):e2709.PubMed CentralView ArticlePubMedGoogle Scholar
- Abdeladhim M, Jochim RC, Ben Ahmed M, Zhioua E, Chelbi I, Cherni S, et al. Updating the salivary gland transcriptome of Phlebotomus papatasi (Tunisian strain): the search for sand fly-secreted immunogenic proteins for humans. PLoS One. 2012;7(11):e47347.PubMed CentralView ArticlePubMedGoogle Scholar
- Kato H, Anderson JM, Kamhawi S, Oliveira F, Lawyer PG, Pham VM, et al. High degree of conservancy among secreted salivary gland proteins from two geographically distant Phlebotomus duboscqi sandflies populations (Mali and Kenya). BMC Genomics. 2006;7:226.PubMed CentralView ArticlePubMedGoogle Scholar
- Kato H, Jochim RC, Gomez EA, Uezato H, Mimori T, Korenaga M, et al. Analysis of salivary gland transcripts of the sand fly Lutzomyia ayacuchensis, a vector of Andean-type cutaneous leishmaniasis. Infect Genet Evol. 2013;13:56–66.View ArticlePubMedGoogle Scholar
- Rohousova I, Subrahmanyam S, Volfova V, Mu J, Volf P, Valenzuela JG, et al. Salivary gland transcriptomes and proteomes of Phlebotomus tobbi and Phlebotomus sergenti, vectors of leishmaniasis. PLoS Negl Trop Dis. 2012;6(5):e1660.PubMed CentralView ArticlePubMedGoogle Scholar
- Hostomska J, Volfova V, Mu J, Garfield M, Rohousova I, Volf P, et al. Analysis of salivary transcripts and antigens of the sand fly Phlebotomus arabicus. BMC Genomics. 2009;10:282.PubMed CentralView ArticlePubMedGoogle Scholar
- Pitaluga AN, Beteille V, Lobo AR, Ortigao-Farias JR, Davila AM, Souza AA, et al. EST sequencing of blood-fed and Leishmania-infected midgut of Lutzomyia longipalpis, the principal visceral leishmaniasis vector in the Americas. Mol Genet Genomics. 2009;282(3):307–17.View ArticlePubMedGoogle Scholar
- Jochim RC, Teixeira CR, Laughinghouse A, Mu J, Oliveira F, Gomes RB, et al. The midgut transcriptome of Lutzomyia longipalpis: comparative analysis of cDNA libraries from sugar-fed, blood-fed, post-digested and Leishmania infantum chagasi-infected sand flies. BMC Genomics. 2008;9:15.PubMed CentralView ArticlePubMedGoogle Scholar
- Ramalho-Ortigao M, Jochim RC, Anderson JM, Lawyer PG, Pham VM, Kamhawi S, et al. Exploring the midgut transcriptome of Phlebotomus papatasi: comparative analysis of expression profiles of sugar-fed, blood-fed and Leishmania-major-infected sandflies. BMC Genomics. 2007;8:300.PubMed CentralView ArticlePubMedGoogle Scholar
- Gonzalez-Caballero N, Valenzuela JG, Ribeiro JM, Cuervo P, Brazil RP. Transcriptome exploration of the sex pheromone gland of Lutzomyia longipalpis (Diptera: Psychodidae: Phlebotominae). Parasit Vectors. 2013;6:56.PubMed CentralView ArticlePubMedGoogle Scholar
- Dillon RJ, Ivens AC, Churcher C, Holroyd N, Quail MA, Rogers ME, et al. Analysis of ESTs from Lutzomyia longipalpis sand flies and their contribution toward understanding the insect-parasite relationship. Genomics. 2006;88(6):831–40.PubMed CentralView ArticlePubMedGoogle Scholar
- Abrudan J, Ramalho-Ortigao M, O’Neil S, Stayback G, Wadsworth M, Bernard M, et al. The characterization of the Phlebotomus papatasi transcriptome. Insect Mol Biol. 2013;22(2):211–32.PubMed CentralView ArticlePubMedGoogle Scholar
- McCarthy CB, Santini MS, Pimenta PF, Diambra LA. First comparative transcriptomic analysis of wild adult male and female Lutzomyia longipalpis, vector of visceral leishmaniasis. PLoS One. 2013;8(3):e58645.PubMed CentralView ArticlePubMedGoogle Scholar
- Anderson JM, Oliveira F, Kamhawi S, Mans BJ, Reynoso D, Seitz AE, et al. Comparative salivary gland transcriptomics of sandfly vectors of visceral leishmaniasis. BMC Genomics. 2006;7:52.PubMed CentralView ArticlePubMedGoogle Scholar
- Dostalova A, Votypka J, Favreau AJ, Barbian KD, Volf P, Valenzuela JG, et al. The midgut transcriptome of Phlebotomus (Larroussius) perniciosus, a vector of Leishmania infantum: comparison of sugar fed and blood fed sand flies. BMC Genomics. 2011;12:223.PubMed CentralView ArticlePubMedGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.PubMed CentralView ArticlePubMedGoogle Scholar
- Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–512.View ArticlePubMedGoogle Scholar
- Davidson NM, Oshlack A. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biol. 2014;15(7):410.PubMed CentralPubMedGoogle Scholar
- O’Neil ST, Dzurisin JD, Carmichael RD, Lobo NF, Emrich SJ, Hellmann JJ. Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon. BMC Genomics. 2010;11:310.PubMed CentralView ArticlePubMedGoogle Scholar
- Ewen-Campen B, Shaner N, Panfilio KA, Suzuki Y, Roth S, Extavour CG. The maternal and early embryonic transcriptome of the milkweed bug Oncopeltus fasciatus. BMC Genomics. 2011;12:61.PubMed CentralView ArticlePubMedGoogle Scholar
- Van Belleghem SM, Roelofs D, Van Houdt J, Hendrickx F. De novo transcriptome assembly and SNP discovery in the wing polymorphic salt marsh beetle Pogonus chalceus (Coleoptera, Carabidae). PLoS One. 2012;7(8):e42605.PubMed CentralView ArticlePubMedGoogle Scholar
- Salvemini M, Arunkumar KP, Nagaraju J, Sanges R, Petrella V, Tomar A, et al. De novo assembly and transcriptome analysis of the Mediterranean fruit fly Ceratitis capitata early embryos. PLoS One. 2014;9(12):e114191.PubMed CentralView ArticlePubMedGoogle Scholar
- Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.View ArticlePubMedGoogle Scholar
- Musacchia F, Basu S, Petrosino G, Salvemini M, Sanges R. Annocript: a flexible pipeline for the annotation of transcriptomes able to identify putative long noncoding RNAs. Bioinformatics. 2015;31(13):2199–201.View ArticlePubMedGoogle Scholar
- Gong ZJ, Wu YQ, Miao J, Duan Y, Jiang YL, Li T. Global transcriptome analysis of orange wheat blossom midge, Sitodiplosis mosellana (Gehin) (Diptera: Cecidomyiidae) to identify candidate transcripts regulating diapause. PLoS One. 2013;8(8):e71564.PubMed CentralView ArticlePubMedGoogle Scholar
- Chen B, Zhang YJ, He Z, Li W, Si F, Tang Y, et al. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae). Parasit Vectors. 2014;7:314.PubMed CentralView ArticlePubMedGoogle Scholar
- Arrial RT, Togawa RC, Brigido Mde M. Screening non-coding RNAs in transcriptomes from neglected species using PORTRAIT: case study of the pathogenic fungus Paracoccidioides brasiliensis. BMC Bioinf. 2009;10:239.View ArticleGoogle Scholar
- Wang L, Park HJ, Dasari S, Wang S, Kocher JP, Li W. CPAT: coding-potential assessment tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013;41(6):e74.PubMed CentralView ArticlePubMedGoogle Scholar
- Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007;35(Web Server issue):W345–349.PubMed CentralView ArticlePubMedGoogle Scholar
- Niazi F, Valadkhan S. Computational analysis of functional long noncoding RNAs reveals lack of peptide-coding capacity and parallels with 3’ UTRs. RNA. 2012;18(4):825–43.PubMed CentralView ArticlePubMedGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.View ArticlePubMedGoogle Scholar
- Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 2011;12:323.View ArticleGoogle Scholar
- Smartt CT, Kim AP, Grossman GL, James AA. The Apyrase gene of the vector mosquito, Aedes aegypti, is expressed specifically in the adult female salivary glands. Exp Parasitol. 1995;81(3):239–48.View ArticlePubMedGoogle Scholar
- Vyazunova I, Lan Q. Stage-specific expression of two actin genes in the yellow fever mosquito, Aedes aegypti. Insect Mol Biol. 2004;13(3):241–9.View ArticlePubMedGoogle Scholar
- Andersen CL, Jensen JL, Orntoft TF. Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004;64(15):5245–50.View ArticlePubMedGoogle Scholar
- Leger N, Pesson B, Madulo-Leblond G, Abonnenc E. [Differentiation of females of the subgenus Larroussius Nitzulescu 1931 (Diptera-Phlebotomidae) of the Mediterranean region]. Ann Parasitol Hum Comp. 1983;58(6):611–23.PubMedGoogle Scholar
- Killick-Kendrick R, Tang Y, Killick-Kendrick M, Sang DK, Sirdar MK, Ke L, et al. The identification of female sandflies of the subgenus Larroussius by the morphology of the spermathecal ducts. Parassitologia. 1991;33(Suppl):335–47.PubMedGoogle Scholar
- Mank JE. Sex chromosomes and the evolution of sexual dimorphism: lessons from the genome. Am Nat. 2009;173(2):141–50.View ArticlePubMedGoogle Scholar
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.PubMed CentralView ArticlePubMedGoogle Scholar
- Graveley BR, Brooks AN, Carlson JW, Duff MO, Landolin JM, Yang L, et al. The developmental transcriptome of Drosophila melanogaster. Nature. 2011;471(7339):473–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Magnusson K, Mendes AM, Windbichler N, Papathanos PA, Nolan T, Dottorini T, et al. Transcription regulation of sex-biased genes during ontogeny in the malaria vector Anopheles gambiae. PLoS One. 2011;6(6):e21572.PubMed CentralView ArticlePubMedGoogle Scholar
- Wen Z, Scott JG. Cytochrome P450 CYP6L1 is specifically expressed in the reproductive tissues of adult male German cockroaches, Blattella germanica (L.). Insect Biochem Mol Biol. 2001;31(2):179–87.View ArticlePubMedGoogle Scholar
- Kasai S, Tomita T. Male specific expression of a cytochrome P450 (Cyp312a1) in Drosophila melanogaster. Biochem Biophys Res Commun. 2003;300(4):894–900.View ArticlePubMedGoogle Scholar
- Huber DP, Erickson ML, Leutenegger CM, Bohlmann J, Seybold SJ. Isolation and extreme sex-specific expression of cytochrome P450 genes in the bark beetle, Ips paraconfusus, following feeding on the phloem of host ponderosa pine, Pinus ponderosa. Insect Mol Biol. 2007;16(3):335–49.View ArticlePubMedGoogle Scholar
- Zhang Z, Xiao J, Wu J, Zhang H, Liu G, Wang X, et al. ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun. 2012;419(4):779–81.View ArticlePubMedGoogle Scholar
- Hunt BG, Ometto L, Wurm Y, Shoemaker D, Yi SV, Keller L, et al. Relaxed selection is a precursor to the evolution of phenotypic plasticity. Proc Natl Acad Sci U S A. 2011;108(38):15936–41.PubMed CentralView ArticlePubMedGoogle Scholar
- Volf P, Volfova V. Establishment and maintenance of sand fly colonies. J Vector Ecol. 2011;36 Suppl 1:S1–9.View ArticlePubMedGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.PubMed CentralView ArticlePubMedGoogle Scholar
- Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhao S, Fernald RD. Comprehensive algorithm for quantitative real-time polymerase chain reaction. J Comput Biol. 2005;12(8):1047–64.PubMed CentralView ArticlePubMedGoogle Scholar
- Salvemini M, Mauro U, Velaeti S, Polito C, Saccone G. A new Minos vector for eye-specific expression of white + marker in Ceratitis capitata and in distantly related dipteran species. Insect Mol Biol. 2006;15(3):341–9.View ArticlePubMedGoogle Scholar
- Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000;17(1):32–43.View ArticlePubMedGoogle Scholar
- Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics. 2006;4(4):259–63.View ArticlePubMedGoogle Scholar