Phylogenetic analysis and antigenic epitope prediction for E6 and E7 of Alpha-papillomavirus 9 in Taizhou, China

Background Alpha-papillomavirus 9 (α-9) is a member of the human papillomavirus (HPV) α genus and causes 75% of invasive cervical cancers worldwide. The purpose of this study was to provide data for effective treatment of HPV-induced cervical lesions in Taizhou by analysing the genetic variation and antigenic epitopes of α-9 HPV E6 and E7.
Methods Cervical exfoliated cells were collected for HPV genotyping. Samples positive for a single α-9 HPV type were selected for E6 and E7 gene sequencing. The obtained nucleotide sequences were translated into amino acid sequences (protein primary structure) using MEGA X, and positive selection sites of the amino acid sequences were evaluated using PAML. The secondary and tertiary structures of the E6 and E7 proteins were predicted using PSIPred, SWISS-MODEL, and PyMol. Potential T/B-cell epitopes were predicted using the Immune Epitope Database (IEDB).
Results From 2012 to 2023, α-9 HPV accounted for 75.0% (7815/10423) of high-risk HPV-positive samples in Taizhou, both alone and in combination with other types. Among these, single-type-positive samples of α-9 HPV were selected, and the entire E6 and E7 genes were sequenced, including 298 HPV16, 149 HPV31, 185 HPV33, 123 HPV35, 325 HPV52, and 199 HPV58 samples. Compared with reference sequences, 34, 12, 10, 2, 17, and 17 nonsynonymous nucleotide mutations were detected in HPV16, 31, 33, 35, 52, and 58, respectively. Among all nonsynonymous nucleotide mutations, 19 positive selection sites were identified, which may have evolutionary significance in rendering α-9 HPV adaptive to its environment. Immunoinformatics predicted 57 potential linear and 59 conformational B-cell epitopes, many of which are also predicted as CTL epitopes.
Conclusion The present study provides near-comprehensive data on the genetic variations, phylogenetics, positive selection sites, and antigenic epitopes of α-9 HPV E6 and E7 in Taizhou, China, which will be helpful for local HPV therapeutic vaccine development.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10411-1.


Background
Cervical cancer remains the fourth most common cancer affecting women's health worldwide, especially in China. Persistent high-risk HPV infection is considered the primary aetiological factor for cervical cancer [1]. HPV is a small, double-stranded, circular DNA virus that exclusively infects epithelial cells of the skin or mucosa. The HPV genome is approximately 8.0 kb in size and contains six early genes (E1, E2, E4, E5, E6, E7), two late genes (L1, L2), and a long control region (LCR). The oncoproteins encoded by the E6 and E7 genes play a crucial role in HPV-driven carcinogenesis [2,3]. E6 binds to the tumour suppressor protein p53, prevents its translocation, and mediates cellular transformation by inhibiting the ability of p53 to activate transcription. E7 binds to the retinoblastoma protein (Rb) and induces cells to enter S-phase prematurely by disrupting Rb-E2F complexes. These processes impair p53 and Rb functions, which involve DNA repair, cell cycle control, and apoptosis, and ultimately result in immortalization of HPV-infected cells [4].
Persistent infection of human epithelial cells by HPV leads to integration of the viral DNA into the host genome, usually disrupting the E1 and/or E2 genes [10]. HPV integration is a key event in cervical carcinogenesis, leading to structural aberrations in the host genome or abnormal expression of target genes. In particular, the expression of E6 and E7 oncoproteins causes cellular immortalization and neoplastic transformation. The most frequent integration sites include SHKBP1, ERBB3, CASP8, HLA-A, HLA-B, TGFBR2, PIK3CA, EP300, FBXW7, PTEN, NFE2L2, ARID1A, KRAS, MAPK1, etc. [11,12]. The transformed cervical cells express HPV E6 or E7, with antigenic epitopes on the protein surface that stimulate B lymphocytes to produce antibodies [13,14]. Viral antigen peptides are presented at the cell surface by human leukocyte antigen (HLA) and are recognized by CD8+ cytotoxic T lymphocytes (CTLs) [15-17]. Therefore, E6 or E7 molecules are considered ideal targets for HPV therapeutic vaccines, which either induce cell-mediated immunity by stimulating CTLs or elicit humoral immunity by activating B lymphocytes to produce specific antibodies [18,19].
Unfortunately, although cervical cancer is a major issue in China, local data available on HPV therapeutic vaccines are still limited, and almost no consideration is given to E6 or E7 gene mutations. Hence, there is an urgent need to further study the genetic variation, positive selection sites, protein structure, and antigenic epitopes of α-9 HPV E6 or E7 to provide data for exploring effective treatment of HPV-induced cervical lesions in Taizhou, China.

Study population
This study was ethically approved by the Institutional Medical Ethics Review Board of Taizhou Hospital, China. Cervical exfoliated cells were collected from women who underwent cervical cancer screening at Taizhou Hospital. The specimens were collected by cervical scraping and stored in 2.5 ml of cell preservation buffer at -20 °C. Before specimen collection, written informed consent was obtained from all participants, and the participants' privacy was strictly protected.

PCR amplification and sequencing
Based on the reference sequences of α-9 HPV types in GenBank, specific primer pairs for the entirety of the E6 and E7 regions were designed using the Primer-BLAST tool (ncbi.nlm.nih.gov/tools/primer-blast). The primers, PCR conditions, amplicon size, and reference sequences are listed in Table S1. Genomic DNA was extracted using a DNA Extraction Kit (#GK0122, GENEray, China). PCR products were purified and sequenced at BGI, and all data were confirmed by repeating PCR and sequencing reactions at least twice. In this study, genetic variant data for HPV31 and 35 were combined with data from our previous studies on HPV16, 33, 52 and 58 [21-24].

Phylogenetic analysis and homology comparison
All successfully acquired nucleotide sequences were aligned using BioEdit. Then, a phylogenetic tree of α-9 HPV E6 and E7 variation patterns was constructed by the maximum-likelihood method with one thousand bootstrap replicates using MEGA X. The phylogenetic tree was constructed using 201 complete α-9 HPV E6 and E7 sequences, including 64 HPV16, 16 HPV31, 15 HPV33, 5 HPV35, 27 HPV52, 25 HPV58, and 49 reference sequences downloaded from NCBI.

Selective pressure analysis
The CodeML program in PAML (abacus.gene.ucl.ac.uk/software/paml.html) was used to calculate the nonsynonymous (dN)/synonymous (dS) ratio (ω) for selective pressure analysis. If nonsynonymous mutations are favoured by Darwinian selection, they will be fixed at a higher rate than synonymous mutations, resulting in dN > dS, ω = dN/dS > 1 [25].
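The interpretation of ω described above can be illustrated with a minimal sketch. It only shows how a given pair of dN and dS rates maps to a selection regime; it is not the likelihood-based site-model estimation that CodeML performs. The function names and the example rates are hypothetical and do not come from this study.

```python
from typing import Optional


def omega(dn: float, ds: float) -> Optional[float]:
    """Return the ratio dN/dS, or None when dS is zero (ratio undefined)."""
    if ds == 0:
        return None
    return dn / ds


def classify_selection(w: Optional[float]) -> str:
    """Map an omega value to the conventional selection regime."""
    if w is None:
        return "undefined"
    if w > 1:
        return "positive selection"   # dN > dS
    if w < 1:
        return "purifying selection"  # dN < dS
    return "neutral"                  # dN == dS


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise each branch.
    for dn, ds in [(0.03, 0.01), (0.01, 0.02), (0.02, 0.02), (0.01, 0.0)]:
        print(dn, ds, classify_selection(omega(dn, ds)))
```

In practice, site-specific ω values and their posterior probabilities would be read from the CodeML output rather than computed from two summary rates.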

Protein structure analysis
The obtained nucleotide sequences were translated into amino acid sequences (protein primary structure) using MEGA X. Then, the secondary and tertiary structures of the E6 and E7 proteins of α-9 HPV were predicted using PSIPred (bioinf.cs.ucl.ac.uk/psipred) and SWISS-MODEL (swissmodel.ExPASy.org), respectively.
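The translation step can be sketched as follows. This is an illustrative stand-in for what MEGA X does, assuming the standard genetic code and an in-frame coding sequence; the function name and the toy sequences are hypothetical. The AAT and AGT codons are used only to show how a single middle-position A-to-G change turns asparagine (N) into serine (S); the actual codon at any given site is not stated in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Codons containing ambiguous bases are rendered as 'X'; a trailing
    partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGAATTAA"))  # toy reference sequence
    print(translate("ATGAGTTAA"))  # same sequence with a middle-position A>G change
```

Comparing the translated variant against the translated reference, position by position, is then enough to separate synonymous from nonsynonymous changes.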

Prediction of linear and conformational B-cell epitopes
Linear B-cell epitopes were predicted from the primary sequence of the E6 or E7 protein using sequence-based methods (Kolaskar and Tongaonkar's antigenicity, tools.immuneepitope.org/bcell/). Conformational B-cell epitopes were predicted from the tertiary structure of E6 or E7 using ElliPro, which identifies protrusions in antigen surfaces (tools.immuneepitope.org/ellipro).

Prediction of cytotoxic T-cell epitopes
CTL epitopes were predicted from the E6 or E7 protein of α-9 HPV types using the NetCTL server, which accepts the FASTA format (tools.immuneepitope.org/netchop). NetCTL gives results for 9-mer peptides together with their predicted MHC binding affinity, binding affinity rescale value, C-terminal cleavage affinity, and TAP transport efficiency. C-terminal cleavage weights and TAP transport efficiency were calculated using the default values 0.15 and 0.05, respectively; 9-mer peptides with a prediction score > 0.75 were considered to be potential CTL epitopes.
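The scoring scheme described above can be sketched in a few lines. The sketch assumes that the overall NetCTL score is a weighted sum of the rescaled MHC binding affinity, the C-terminal cleavage score, and the TAP transport score, with the weights 0.15 and 0.05 given in the text. The function names and the component values in the example are hypothetical; real values would come from the NetCTL output.

```python
from typing import Iterator, Tuple


def nine_mers(protein: str, k: int = 9) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every k-mer window."""
    for i in range(len(protein) - k + 1):
        yield i + 1, protein[i:i + k]


def combined_score(rescaled_affinity: float, cleavage: float, tap: float,
                   w_cleavage: float = 0.15, w_tap: float = 0.05) -> float:
    """Weighted sum of the three component scores (assumed form)."""
    return rescaled_affinity + w_cleavage * cleavage + w_tap * tap


def is_candidate(score: float, threshold: float = 0.75) -> bool:
    """Apply the prediction score threshold used in this study (> 0.75)."""
    return score > threshold


if __name__ == "__main__":
    # Toy sequence, only to show how many windows a protein yields.
    for start, peptide in nine_mers("ABCDEFGHIJK"):
        print(start, peptide)
    # Hypothetical component scores for one peptide.
    score = combined_score(rescaled_affinity=0.6, cleavage=1.0, tap=2.0)
    print(round(score, 4), is_candidate(score))
```

A protein of length n yields n - 8 overlapping 9-mers, so each E6 or E7 sequence contributes on the order of 90 to 150 peptides per HLA supertype before thresholding.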

Protein structure analysis and homology modelling
Nonsynonymous nucleotide substitutions change the amino acid sequence, which can affect the structure and function of proteins. Our analysis showed that α-9 E6 and E7 consist of 148-158 and 97-99 residues, respectively. The template-target pairwise sequence alignments for α-9 HPV E6 and E7, together with further details, are shown in Table S1 and Figures S3-S4. All amino acid substitutions in E6 and E7 are shown in Fig. 3. As depicted in Figure S5, the E6 and E7 proteins of the six α-9 HPV types are highly homologous, so their relationships were included in our analysis. As shown in Fig. 3, the majority of amino acid substitutions are located on the outer edge of the E6 or E7 proteins and near the zinc ion, which is situated in the active site. Interestingly, we found that residue 93 is not only a common site of nonsynonymous mutation in the E6 region but also a positive selection site.
We selected the HPV variant with the highest infection rate in each type for homology modelling. The template coverage for the E6 protein of α-9 HPV included most of the protein sequence, excluding only a short N-terminal stretch (see Figure S5, A-F for the homology models and Figure S3 for the template-target pairwise sequence alignments). However, the template coverage for the E7 protein was low, excluding approximately 50 residues at the N-terminus (see Figure S5, G-L for the homology models and Figure S4 for the template-target pairwise sequence alignments).

Prediction of linear and conformational B-cell epitopes
The predicted linear B-cell epitopes of α-9 HPV E6 or E7 proteins are presented in Table 4. Conformational B-cell epitopes were predicted from protein tertiary structure models using ElliPro. The amino acid residues, the number of residues, the sequence location, and the PI score of the predicted conformational epitopes are given in Table 5, and a graphical depiction of these epitopes is provided in Figure S6. Immunoinformatics predicted 57 potential linear and 59 conformational B-cell epitopes, many of which are also predicted as CTL epitopes (Tables 4, 5 and 6).

Prediction of cytotoxic T-cell epitopes
Based on the literature, we selected the HLA-A*02 and HLA-B*62 supertypes for NetCTL epitope prediction, as well as CTL epitopes that overlap with the sites for the predicted B-cell epitopes (linear and/or conformational). The predicted CTL epitope 9-mer peptides with their predicted MHC-I binding affinity, rescaled binding affinity, proteasomal C-terminal cleavage affinity, and TAP transport efficiency are indicated in Table 6, with an overall prediction score threshold of 0.75.

Discussion
Cervical cancer remains a major challenge for women's health, with approximately 600,000 new cases and 340,000 deaths worldwide every year [1,26]. High-risk HPV genotypes are the main cause of cervical cancer, which can be largely prevented through HPV vaccination and cervical screening. However, the cervical cancer burden for women remains heavy in China, with only 3% of women aged 9-45 years receiving complete HPV vaccination [27,28]. Although prophylactic vaccines are effective at protecting against approximately 90% of HPV infections, their benefits in eliminating preexisting infection are limited. Therefore, potential therapeutic vaccines need to be developed for treatment of persistent HPV infection or cervical lesions, with the goal of activating adaptive immune responses by presenting viral antigen peptides to the immune system. However, because the genetic variations of HPV genotypes show some degree of geographical differences, it is recommended that the ideal therapeutic vaccine be based on a local type. The Taizhou area is located along the central coast of Zhejiang Province in China, with high prevalence and pathogenicity of α-9 HPV (HPV 16, 31, 33, 35, 52, 58), especially HPV52 and 58 [20]. According to our previous findings, the odds ratio (OR) for CIN2+ in women infected with α-9 HPV is 3.2 when compared to women infected with α-5, α-6, or α-7 [7]. The genetic variation of E6 or E7 genes may correlate highly with cancer risk in Taizhou [21-23]. Therefore, E6 or E7 molecules might be regarded as ideal targets for HPV therapeutic vaccine development and cervical cancer treatment. The purpose of this study was to provide data for effective prevention and treatment of HPV-induced cervical lesions in Taizhou by analysing the phylogenetic tree and epitope prediction of α-9 HPV E6 or E7.
In the E7 gene, the most common nucleotide mutation observed was A647G (N29S, 195/298, positive selection) of HPV16. Similarly, there were three amino acid substitutions at the 29th residue in HPV16 E7: N29S (65.44%), N29H (9.40%), and N29T (0.34%). In addition, all HPV58 E7 samples with G63S carry T20I, and all HPV58 E7 samples with G63D carry G41R. In our previous study, it was reported that HPV58 E7 T20I/G63S substitutions increase the risk of HPV carcinogenesis [24]. Boon et al. [31] reported that variants carrying the T20I/G63S substitutions possess a greater ability to degrade Rb and to immortalize and transform primary cells. Consistent with our selective pressure analysis, the positive selection sites 32E in HPV16 E6 and 29S in HPV16 E7 were found in Kunming, Southwest China [32]. The positive selection sites 93K in HPV33 E6 and 45A and 97Q in HPV33 E7 were found in Sichuan province, Southwest China [33]. Notably, the positive selection site 63G in HPV58 E7 has been widely reported in other regions of China, including Sichuan province [33] and Yunnan province [34,35] in Southwest China and Hubei province in central China [36], as well as in the present study (Southeast China). However, there were no reports of positive selection sites for HPV58 E6 in China [33-36]. No positive selection sites for HPV16 E6 and E7 genes have been reported in central China [37]. No positive selection sites for HPV52 E6 and E7 genes have been reported in central and Southwest China [38,39].
In addition, selective pressure analyses of non-Asian populations showed different results, with 10R, 14H, 83V in HPV E6 and 85G in HPV16 E7 under positive selective pressure [40,41]. These results indicate that genetic variations among HPV types may lead to biological advantages through fixed mutations in their genomes and that even small variations might lead to minor adaptive improvements. Furthermore, HPV genetic variants may differ in terms of infectivity, carcinogenic potential, and host immune response. Therefore, the data provided in this study may have significant implications for understanding the biological differences of HPVs in Taizhou, as well as for developing local therapeutic vaccines.
HLA participates in the local immune response to viral infection through its target recognition function, blocking HPV infection or preventing tumour cell invasion and metastasis. However, a minority of infected cells can escape host immune surveillance, causing persistent HPV infection. Immunoinformatics provides new strategies for identifying ideal epitopes for HPV therapeutic vaccine targets. Our predicted T- and B-cell epitopes may be used for development of vaccines targeting specific HPV variants, and our results suggest that amino acid substitution may influence these epitopes. For example, the prediction score of the HPV16 E6 CTL epitope ELQTTIHDI (residues 25-33) was 1.0904, and because of the D32E substitution the score of the epitope ELQTTIHEI (residues 25-33) increased to 1.3272; the HPV16 E6 peptide HEIILECVY (residues 31-39) became a new predicted epitope because of the D32E mutation. Therefore, the substitution at the positive selection site D32E in HPV16 E6 influences its antigenic epitopes, which may make the virus more difficult for the immune system to detect, thereby enhancing the adaptability of HPV to its environment. In addition, Li et al. [37] suggested that non-conservative amino acid substitutions should be fully considered when developing therapeutic vaccines, such as H31Y, D32N, D32E, I34M, L35V, E36Q, L45P, N65S, and K75T in HPV16 E6. Chen et al. [33] suggested that K93N in HPV33 E6, Q97L in HPV33 E7, R145K in HPV58 E6, and T20I in HPV58 E7 belong to ideal B-cell epitopes.
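The effect of the D32E substitution on the two HPV16 E6 peptides discussed above can be reproduced with a small sketch. It applies a substitution written in the usual notation (reference residue, position, variant residue) to a peptide whose start position in the protein is known, and lists which 9-mer windows cover the mutated site. The function names are hypothetical, and the reference form of the 31-39 peptide (HDIILECVY) is inferred here by reversing D32E in the reported variant peptide; re-scoring the resulting peptides would still require NetCTL.

```python
import re
from typing import List


def apply_substitution(peptide: str, start: int, substitution: str) -> str:
    """Apply a substitution such as 'D32E' to a peptide starting at `start`.

    Positions are 1-based protein coordinates. Raises ValueError when the
    position lies outside the peptide or the reference residue mismatches.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"cannot parse substitution: {substitution}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    index = pos - start
    if not 0 <= index < len(peptide):
        raise ValueError("position lies outside the peptide")
    if peptide[index] != ref:
        raise ValueError("reference residue does not match the peptide")
    return peptide[:index] + alt + peptide[index + 1:]


def window_starts_covering(pos: int, k: int = 9) -> List[int]:
    """Start positions of all k-mers that contain protein position `pos`."""
    return [s for s in range(pos - k + 1, pos + 1) if s >= 1]


if __name__ == "__main__":
    print(apply_substitution("ELQTTIHDI", 25, "D32E"))
    print(apply_substitution("HDIILECVY", 31, "D32E"))
    print(window_starts_covering(32))
```

Because a single residue lies within up to nine overlapping 9-mers, one substitution can alter the score of several candidate epitopes at once, which is why both the 25-33 and 31-39 peptides are affected by D32E.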

Conclusions
In summary, this is the first near-comprehensive study to explore the genetic variations, phylogenetics, positive selection sites, and antigenic epitopes of α-9 HPV E6 and E7 molecules in Taizhou, China, and the results will be helpful for local HPV therapeutic vaccine development.

Fig. 1 Distribution of different high-risk HPV genotypes in Taizhou from 2012 to 2023

Table 1
Predicted amino acid changes are also shown in the last row. "C" denotes coil, "E" denotes β-strand, and "H" denotes α-helix

Table 4
Predicted linear B-cell epitopes of E6 and E7 proteins of α-9 genus HPV
a Residues highlighted in bold were also predicted to be CTL epitopes

Table 5
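ElliPro predicted conformational B-cell epitopes of E6 and E7 proteins of α-9 genus HPV
a Residues highlighted in bold were also predicted to be CTL epitopes
b Protrusion Index of ElliPro; a higher value indicates a higher probability for a discontinuous B-cell epitope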

Table 6
a Amino acids highlighted in bold were also predicted as B-cell antigenic sites (linear and/or conformational)
b Prediction score threshold > 0.75