Skip to main content

Evaluation and limitations of different approaches among COVID-19 fatal cases using whole-exome sequencing data

Abstract

Background

COVID-19 caused by the SARS-CoV-2 infection may result in various disease symptoms and severity, ranging from asymptomatic, through mildly symptomatic, up to very severe and even fatal cases. Although environmental, clinical, and social factors play important roles in both susceptibility to the SARS-CoV-2 infection and progress of COVID-19 disease, it is becoming evident that both pathogen and host genetic factors are important too. In this study, we report findings from whole-exome sequencing (WES) of 27 individuals who died due to COVID-19, especially focusing on frequencies of DNA variants in genes previously associated with the SARS-CoV-2 infection and the severity of COVID-19.

Results

We selected the risk DNA variants/alleles or target genes using four different approaches: 1) aggregated GWAS results from the GWAS Catalog; 2) selected publications from PubMed; 3) the aggregated results of the Host Genetics Initiative database; and 4) a commercial DNA variant annotation/interpretation tool providing its own knowledgebase. We divided these variants/genes into those reported to influence the susceptibility to the SARS-CoV-2 infection and those influencing the severity of COVID-19. Based on the above, we compared the frequencies of alleles found in the fatal COVID-19 cases to the frequencies identified in two population control datasets (non-Finnish European population from the gnomAD database and genomic frequencies specific for the Slovak population from our own database). When compared to both control population datasets, our analyses indicated a trend of higher frequencies of severe COVID-19 associated risk alleles among fatal COVID-19 cases. This trend reached statistical significance specifically when using the HGI-derived variant list. We also analysed other approaches to WES data evaluation, demonstrating its utility as well as limitations.

Conclusions

Although our results proved the likely involvement of host genetic factors pointed out by previous studies looking into severity of COVID-19 disease, careful considerations of the molecular-testing strategies and the evaluated genomic positions may have a strong impact on the utility of genomic testing.

Peer Review reports

Background

The coronavirus disease (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a complex, highly infectious disease involving the respiratory, immune, cardiovascular, gastrointestinal, and neurological systems [1,2,3,4]. The first case was registered in Wuhan, Hubei Province of China in December 2019, and the disease has rapidly evolved into a global pandemic [5]. At the time of writing (February 2022), there have been more than 410 million confirmed cases and 5.8 million deaths worldwide (in Slovakia, more than 1.8 million people have been infected so far, with total death toll exceeding 18 000) (https://origin-coronavirus.jhu.edu/map.html).

The mortality rate of COVID-19 (ranges between 1–7%) is lower than that of the other two types of coronaviruses, the severe acute respiratory syndrome (SARS-CoV) and the middle east respiratory syndrome (MERS-CoV). However, the rate of human-to-human transmission is higher, as the virus can primarily be transmitted through respiratory droplets and close contact [4, 6,7,8,9,10]. COVID-19 presents a broad spectrum of varied clinical manifestations, from asymptomatic or mild symptoms to serious health outcomes leading to death [11, 12]. Even though the symptoms are highly heterogeneous, the most commonly observed ones in the large majority of infected persons include fever, cough, severe headache, muscle pain, fatigue, shortness of breath, chest tightness, and loss of taste or smell [13,14,15,16,17,18]. Besides, several minor symptoms such as gastrointestinal complications, including nausea, vomiting, and diarrhea, have also been reported [19]. In severe cases, breathing difficulties develop, namely dyspnea, with acute respiratory distress syndrome (ARDS) being the most serious complication [20].

SARS-CoV-2 infection exhibits varied infectivity and mortality rates in different worldwide populations [21, 22]. An obvious possible explanation for these findings is that a mixture of genetic and nongenetic factors interplays between the virus and host genetic background, which determines the severity of COVID-19 outcome. Older age (> 60 years old), male gender, blood type, smoking, hypertension, diabetes mellitus, obesity, cardiovascular, respiratory, and kidney disease or cancer have been identified as risk factors associated with a higher risk of death caused by COVID-19 [12, 23,24,25,26,27,28,29]. In addition, the host’s susceptibility or response to the infection can be influenced by the host's genetic variations affecting the structure or function of essential proteins playing an active role in the entry and spread of SARS-CoV-2. This assumption is supported by a study of twins reporting 50% heritability of COVID-19 risk [30].

In this respect, the COVID-19 Host Genetics Initiative (HGI), 2020 (available at https://www.covid19hg.org/) is currently leading a worldwide public effort to analyse COVID-19 information in millions of individuals in relation with genotype data. It aims to identify genetic variants associated with SARS-CoV-2 infection as well as COVID-19 rate of hospitalization and disease severity. A recent study by Baggen et al. summarizes the wealth of information on proviral SARS-CoV-2 host factors that have been produced by genome-wide functional genetic screens and interactome analyses with further discussion about their roles in cellular processes [31]. Moreover, worldwide genome-wide association studies (GWAS) have identified many risk genes and loci that could be functionally implicated in COVID-19 disease [32,33,34,35,36]. Recent whole-exome sequencing (WES)-based studies in several countries also aimed to identify the genetic factors of COVID-19 susceptibility. However, it is to be noted that these studies are focused only on investigating genetic variants in small groups of risk genes, mainly related to the initial stages of infection [37,38,39,40,41,42].

In this study, we summarized available resources of known risk variants and genes associated with COVID-19 to explain patients’s genetic predispositions for a severe course or higher mortality of COVID-19. We collected and analysed genomic variants that either: a.) showed evidence of association with COVID-19 in GWAS studies, b.) were located in genes previously reported to be a risk in selected studies by literature search, c.) showed strong association in meta-analyses conducted by the COVID-19 HGI Browser, and d.) were located in genes associated with COVID-19 by QIAGEN Clinical Insight (QCI™) Analyze software. We present a comprehensive comparative assessment of variants identified in these four groups compared to our cohort of genotyped patients. In addition, allele frequencies of identified variants were compared with two types of control groups, worldwide public data of the non-Finnish European population from the gnomAD database (NFE) and our genetic data from Non-Invasive Prenatal Testing (NIPT) in the Slovak population. Despite the growing knowledge about genetic factors associated with the risk of COVID-19, there is a lack of studies that elucidate the multitude of other factors that influence this disease. In this study, we provide different approaches to analysing genetic variants in the WES data from Slovak patients who have died of COVID-19.

Results

Analyses of variants associated with COVID-19 in GWAS Catalog

The first goal of our analysis was to investigate the variants associated with COVID-19 in GWAS Catalog (https://www.ebi.ac.uk/gwas/). Overall, 413 risk variants in a total of 403 genes were identified in the GWAS dataset after entering the keyword “COVID-19” into the GWAS Catalog Browser. After merging all identified variants from GWAS (413 risk variants) with our WES data of the deceased patients, we identified only 3 common variants – the rs11147040 missense variant, the synonymous rs72472161 variant, and the rs8176719 coding sequence variant. Other identified variants belonged to the intron, non-coding, and UTR variants group.

Analyses of COVID-19 risk variants through literature search

After conducting the literature search, we chose 3 types of studies: I. a study by Baggen et al. [31] included cellular host factors for SARS-CoV-2 infection, II. studies by COVID-19 HGI [43] and Pairo-Castineira et al. [35], which reported genome-wide loci associated with severe manifestations of COVID-19, and III. Ackermann et al. [44] which analysed respiratory failure-associated gene sets (Table S1).

When we merged all identified variants with the WES data of the deceased Slovak patients, we identified 208 (I.), 154 (II.), and 437 (III.) common risk variants, respectively. Next, we compared this data with two control groups—the NIPT data from the Slovak population, where we identified 103 (I.), 96 (II.), and 245 (III.) common risk variants and the NFE data (Table S2, Figure S1). Using the NFE data, we performed two types of analyses, first with risk variants found in the overlap publications data/WES/NIPT and the second with risk variants from the overlap of the publications data/WES. The violin-swarm plots in Fig. 1 represent the graphical comparison of the allele frequencies of risk variants from the overlap of the publications data/WES/NIPT/NFE (103, 96, and 245 common risk variants). The comparison of allele frequencies of risk variants from the overlap of the publication data/WES/NFE (208, 155, and 437 risk variants) is shown in Figure S2. Using the Mann–Whitney U test, we did not identify any significant difference between the groups in both types of comparisons.

Fig. 1
figure 1

Violin-swarm plots show allele frequencies for the Slovak NIPT and WES data and NFE in risk variants identified in 3 types of publications: I. Baggen et al. [31] (103 risk variants). Statistic: median_sN = 0.280, sd_sN = 0.285; median_W = 0.296; sd_W = 0.284; median_N = 0.265, sd_N = 0.291; Mann–Whitney U test (WES vs. Slovak NIPT): p-value = 0.3539; Mann–Whitney U test (WES vs. NFE): p-value = 0.4146. II. COVID-19 Host Genetics Initiative [43] and Pairo-Castineira et al. [35] (96 risk variants). Statistic: median_sN = 0.21, sd_sN = 0.215; median_W = 0.278; sd_W = 0.223; median_N = 0.238, sd_N = 0.212; Mann–Whitney U test (WES vs. Slovak NIPT): p-value = 0.0518; Mann–Whitney U test (WES vs. NFE): p-value = 0.0816. III. Ackermann et al. [44] (245 risk variants). Statistic: median_sN = 0.268, sd_sN = 0.281; median_W = 0.278; sd_W = 0.287; median_N = 0.274, sd_N = 0.283; Mann–Whitney U test (WES vs. Slovak NIPT): p-value = 0.4048; Mann–Whitney U test (WES vs. NFE): p-value = 0.3854. Median is labeled with ♦; sd—standard deviation, sN—Slovak NIPT, W—WES—whole-exome sequencing data from the deceased patients, N—Non-Finnish European population from the gnomAD database

Subsequently, allele counts of risk variants identified in our sample sets from the WES data were compared to allele counts of these variants in the Slovak NIPT data and the NFE data using the Fisher's exact test. By comparing risk variant allele counts in individual groups of publications (I., II., III.), we have identified 9, 12, and 10 risk variants with p-value under 0.05 in the comparison between the NIPT and the WES data and 4, 21, and 15 risk variants in the comparison between the WES and NFE data, respectively (Table S2). After implementing the Bonferroni correction, the rs35048651 risk variant was identified with a significantly different representation (p-value = 0.000005) in the I. group. Also, we identified the rs2498800 risk variant (p-value = 0.000035) in the III. group after comparison between the NIPT and WES data. Further still, the rs7853989, rs8176741, rs8176743, rs8176746, rs8176747, rs8176749, and rs8176751 risk variants were found in the II. group based on comparison between the WES and NFE data in the overlap of the publications data/WES/NIPT/NFE (Table 2). Using the Fisher's exact test, we also compared allele counts of risk variants from the overlap of the publications data/WES/NFE. Here, we found 43, 30, and 92 risk variants, respectively with a p-value under 0.05. After the Bonferroni correction, we identified the rs78472109, rs4024453, rs594178, rs146238849, rs62478356, rs10454320, rs56850341, rs150756699 and rs377124451 risk variants in the I. group. The same process led to identification of rs8176751, rs8176749, rs8176747, rs8176746, rs8176743, rs8176741 and rs7853989 risk variants in the II. group as well as rs373030497, rs9521779, rs1226613997, rs540909253, rs9521780, rs9515217, rs201005516 and rs201495169 risk variants in the III. group all characterised by a significantly different representation (Table S2).

Analyses of COVID-19 risk variants using the COVID-19 Host Genetics Initiative

In this part, we analysed the COVID-19 risk variants identified in meta-analyses conducted by the COVID-19 HGI Browser. We focused only on missense variants from all four genome-wide meta-analysis results available online (https://app.covid19hg.org/). The total number of COVID-19 missense risk variants found in each HGI group is summarized in Table 1.

Table 1 The total number of risk variants that were common across the individual comparative analysis (between NIPT, WES data and variants found in publications and individual HGI groups). *Comparison results from NIPT/WES/NFE/pub or HGI data/QCI analysis published in Results; ** Comparison results from NFE/WES/pub or HGI data/QCI analysis published in supplementary files; *** % of variants from comparison total number of NIPT/WES/Covid-19 HGI risk variants to the total number of risk variants in HGI group; **** % of variants from comparison number of WES/NFE/QCI/pub or HGI data risk variants to the total number of risk variants in comparison: WES/NFE/pub or HGI data; ***** % of variants from comparison number of NIPT/WES/NFE/QCI/pub or HGI data risk variants to the total number of risk variants in comparison: NIPT/WES/pub or HGI data

After merging all missense risk variants identified in individual HGI groups with the WES data of the deceased patients, we determined 51 common risk variants in the A2 group, 13 risk variants in the B1 group, 58 risk variants in the B2 group, and 63 risk variants in the C2 group (Table S2). This data was compared with two controls – the first represented the Slovak NIPT data, where we identified 45 common risk variants in the A2 group, 13 in the B1 group, 51 in the B2 group and 52 in the C2 group (Table S2, Figure S1). The second was the NFE data, where we performed two different analyses, first with the risk variants that were found in the overlap HGI groups data/WES/NIPT, and the second with risk variants from the overlap of HGI groups data/WES. The violin-swarm plots displaying graphical comparison of the allele frequencies of risk variants from the overlap HGI groups data/WES/NIPT/NFE are shown in Fig. 2. Likewise, the graphical comparison of the allele frequencies of risk variants from the overlap HGI groups data/WES/NFE is shown in Figure S3.

Fig. 2
figure 2

Violin-swarm plots show allele frequencies for the Slovak NIPT and WES data and NFE data. Statistical significance is labeled with *, median with ♦. A2: Critically ill vs. population controls (45 risk variants). Statistics: median_W = 0.278; sd_W = 205; median_sN = 0.182, sd_sN = 0.186; median_N = 0.217, sd_N = 0.198; Mann–Whitney U test (WES vs. Slovak NIPT): p-value = 0.06510; Mann–Whitney U test (WES vs. NFE): p-value = 0.05363. B1: Hospitalized COVID-19 vs. non-hospitalized COVID-19 (13 risk variants). Statistics: median_W = 0.389; sd_W = 0.321; median_sN = 0.386, sd_sN = 0.297; median_N = 0.326, sd_N = 0.312; Mann–Whitney U test: WES vs. Slovak NIPT: p-value = 0.41863; WES vs. NFE: p-value = 0.22057. B2: Hospitalized COVID-19 vs population controls (51 risk variants). Statistics: median_W = 0.278; sd_W = 0.179; median_sN = 0.181, sd_sN = 0.181; median_N = 0.217, sd_N = 0.176; Mann–Whitney U test: WES vs. slovak NIPT: p-value = 0.04538; WES vs. NFE: p-value = 0.01872. C2: Reported SARS-CoV-2 infections vs. population controls (52 risk variants). Statistics: median_W = 0.204; sd_W = 0.21; median_sN = 0.177, sd_sN = 0.207; median_N = 0.176, sd_N = 0.209; Mann–Whitney U test: WES vs. slovak NIPT: p-value = 0.26942; WES vs. NFE: p-value = 0.12153; sd—standard deviation, sN—Slovak NIPT, W—WES—whole-exome sequencing data from deceased patients, NFE—Non-Finnish European population from the gnomAD database

Using the Mann–Whitney U test, we observed a significant difference in the comparison between the WES data and the NIPT data (p-value = 0.02609) as well as between the WES data and the NFE data (p-value = 0.01872), both in the B2 group in the HGI groups data/WES/NIPT/NFE overlap (Fig. 2). Another significant difference was identified between the WES data and NFE data in the B2 group in the HGI groups data/WES/NFE overlap (p-value = 0.03132, Figure S3). The differences identified in the other comparisons were negligible.

Using the Fisher's exact test, we compared the allele count of risk variants identified in our sample set from the WES data to allele counts of these variants in the Slovak NIPT data as well as the NFE data. We identified 3 variants in the A2 group, 4 variants in the B2, and 2 risk variants in the C2 group when comparing the WES and NIPT data. Likewise, we found 2 risk variants in the A2 group, 2 risk variants in the B2 group, and 7 risk variants in the C2 group in the comparison between the WES and NFE data with p-value under 0.05 in the overlap HGI groups data/WES/NIPT/NFE (Table S2). After the Bonferroni correction, the rs3130984 variant was identified with a significantly different representation (p-value = 0.000056) in the comparison between the WES and NIPT data in the A2 group. Equally, 4 risk variants, rs8176747, rs8176746, rs8176743 and rs7853989 (p-value = 0.00025) were identified in the comparison between the WES and NFE data in the C2 group (Table 2). We did not observe any significant differences in the B1 and B2 groups. After applying the Fisher's exact test to compare the allele count of risk variants in the overlap of HGI groups data/WES/NFE, we found 2 risk variants in the A2 group, 2 risk variants in the B2 group, and 8 risk variants in the C2 group with p-value under 0.05. After the Bonferroni correction, the results correspond with the figures from the comparison between WES and NFE in the overlap of HGI groups data/WES/NIPT/NFE.

Table 2 Risk variants with a significant difference (after the Bonferroni correction) in allele distribution of WES data from patients who died of Covid-19 and the control data of the Slovak NIPT (W/sN) and NFE (W/N). sig—significance: *—p-value < 0.05, **—significant after the Bonferroni correction

Analyses of variants by QIAGEN Clinical Insight (QCI™) Analyze software

After analysing the variant call format files (VCF) for each patient and then applying a biological filter, we selected 4 from the 8 offered COVID-19 filters. Variants are divided into two categories to follow the concept applied in other analyses: the QCI infection group—containing 376 risk variants with the "Mild COVID-19" biological filter and the QCI critical group containing 1738 variants with "Critical COVID-19", "Severe COVID-19" and "COVID-19 related immunodeficiency 74" biological filters. Other filters were excluded from further analyses as it was not possible to clearly determine their classification into infected or critical groups. Subsequently, a number of risk variants in the WES/NFE comparison or in the NIPT/WES/pub comparison or HGI data were contrasted to the QCI analysis of the infection and critical groups. The results of individual comparisons of the number of variants are shown in Table 1.

Discussion

In the present article, we attempted to elaborate on the spectrum of risk variants and genes identified in different ways and their possible relationship to severity and/or mortality of COVID-19. We investigated the frequencies of these variants and evaluated their possible role using a cohort of 27 Slovak patients who have died of COVID-19. In our previous studies, we described the re-use of the data from NIPT for genome-scale population-specific frequency determination of small DNA variants [45], copy number variants (CNVs) [46], and variants associated with colorectal cancer and Lynch syndrome [47]. Therefore, we assumed that the NIPT data can be utilized as a control group in this population study of COVID-19. As a second control group, we selected the NFE genetic data which contains a total of 125,748 exomes and 71,702 genomes. To our knowledge, the present study is the first population analysis of COVID-19 variants worldwide and also in the Slovak population that provides different approaches to the analysis of genetic variants in the WES data of patients who have died of COVID-19.

Over the past two years, GWAS have offered the opportunity to reveal factors of genetic susceptibility factors to COVID-19 disease and provided insights into the biological basis of SARS-CoV-2 etiology. To date, a large number of risk genetic variants and genes have been identified by the GWAS approach, which has been intimately connected to the COVID-19 susceptibility and severity [32,33,34,35, 48]. As we only identified 3 common variants between the GWAS catalog and our WES data, we have found that risk variants from the GWAS catalog are not useful for analyzing and comparing our WES-derived data.

In 2021, a large consortium conducted some highly anticipated studies published last year. The COVID‐19 HGI presented results from three genome‐wide association meta‐analyses of up to 49,562 COVID‐19 patients from 46 studies spanning across 19 countries [49]. They reported 13 genome‐wide significant loci. The 3p21.31 region seemed to be associated with infection susceptibility, which was also confirmed in a study by Ellinghaus et al. This study also confirmed a potential involvement of the ABO blood-group system [32]. Similar results were also found in a study conducted by 23andMe company using their own biobank. After the Bonferroni correction in the analysis of COVID-19 missense risk variants conducted by the COVID-19 HGI Browser, we identified 5 variants with a significantly different representation; the rs3130984 missense variant located in the CDSN (Corneodesmosin) gene and four variants (rs8176747, rs8176746, rs8176743, and rs7853989) all located in the ABO gene. Two missense variants, rs8176747 and rs8176746, were found by a comparison between the WES and NFE data in the C2 HGI group and the II. group in the analysis using literature search. Recently, a GWAS found a COVID-19-association signal at locus 9q34.2 coincident with the ABO-blood group (rs8176747, rs41302905, rs8176719) in Italian and Spanish severe COVID-19 patients with respiratory failure [32, 50]. Another study published a genetic hypothesis on the role of the renin-angiotensin system (RAS) genes—ACE1 (Angiotensin-converting enzyme 1), ACE2 (Angiotensin-converting enzyme 2), and ABO-locus (rs495828, rs8176746) in COVID-19 prognosis, suspecting inherited genetic predispositions to be predictive of COVID-19 severity [51].

Since 2017, the QIAGEN Clinical Insight (QCI™) Analyze software has been available as an integrated clinical decision support solution for the annotation, interpretation and reporting of NGS data. QCI seeks to extract clinical significance and actionability from sequencing data [52]. After a literature search of studies using QIAGEN Clinical Insight (QCI™) Analyze software, we found only a few studies focusing mainly on cancer research [52,53,54]. As anticipated, a comparison of the QCI results showed an overlap in the number of risk variants found in the QCI critical group with the variants in the groups II. and III. that publications associated with the severity and mortality of COVID-19. However, we did not identify any overlap with the group I.; the publication included cellular host factors for SARS-CoV-2 infection. In the B1 HGI group, no overlap was found in the number of variants with the QCI infection and the critical group, which may be due to the relatively small number of identified missense variants (only 13 risk variants). In the C2 HGI group, we expected to overlap with both groups, but we only identified with the QCI critical group. Although usage of the QIAGEN Clinical Insight (QCI ™) Analyze software requires further optimization, this software may be a promising diagnostic, prognostic, or predictive tool in the future research of COVID-19.

Our study has several key shortcomings. First, although we used different approaches to analyse the WES data of patients who died of COVID-19, we were able to identify only 10 risk variants with significant difference (after the Bonferroni correction) in allele distribution of the WES data and data of Slovak NIPT and NFE. Next, the sample size of the deceased patients was relatively small. Moreover, data from NIPT are strongly biased towards the healthy population of females. It should also be noted that the total number of analyzed risk variants identified by COVID-19 HGI was also relatively small as we focused only on missense variants. We targeted missense variants to identify as many risk variants as possible since most of the identified variants from HGI belonged to the non-coding category. In GWAS analysis, this proved to be useless for analysis of our WES data.

Conclusions

As the COVID-19 pandemic led to a global crisis with a serious global impact, it is crucial to determine how host genetic factors are linked to the clinical outcomes. Therefore, many studies have focused on discovering the influence of host genetic factors on the SARS-CoV-2 infection risk or COVID-19 severity. The main aim of our study was to provide an informative insight into the utility of COVID-19 data from publicly available databases with practical application to WES data from COVID-19 patients. WES is routinely used and is gradually being optimized for the detection of rare and common genetic variants in humans. Genetic stratification of patients with COVID-19 from WES data would be possible, but in clinical practice, this approach requires overcoming several limitations that we also faced in our study. Moreover, WES only analyses exonic variants; therefore, the study may have failed to detect potentially related intronic variants. However, whole-genome sequencing (WGS) is becoming increasingly attractive as an alternative, due to its broader coverage, decreasing cost, and more comprehensive information.

Methods

The method consists of four main steps, presented in the diagram in Fig. 3. In the first step were analyzed the exome sequencing data derived from pulmonary tissue samples taken at autopsy of 27 (13 men, 14 women) deceased patients, with the mean age 66 years, with a confirmed diagnosis of SARS-CoV-2 infection. The evaluations were performed as part of the autopsy and subsequent laboratory and analytical tasks carried out by the Health Care Surveillance Authority to monitor the course of COVID-19. Performance of the analyses was approved by the relevant ethics committee, N° 21–49, on 8 April 2021. Laboratory processing and bioinformatic analysis of the data were performed as described in Additional file (Methods).

Fig. 3
figure 3

The diagram represents 4 main steps used in identification of risk variants associated with SARS-CoV-2 infections and the severity of COVID-19 disease

In the second step, we performed a selection of risk variants and/or genes associated with COVID-19. All risk variants associated with COVID-19 found in the GWAS Catalog were downloaded from https://www.ebi.ac.uk/gwas/search?query=COVID-19. We also performed a literature search of risk genes and genetic variants associated with infection and severity of COVID-19 published in 2020 and 2021 via the PubMed database (https://pubmed.ncbi.nlm.nih.gov/). More details are available in supplementary material. COVID-19 HGI GWAS meta-analysis release 6 data was downloaded from https://app.covid19hg.org/. Details for each study are provided on the HGI website and in the consortium paper [43].

In the third step, we compared each group of identified risk variants with our WES data of the deceased patients and with two groups of controls: NIPT data from the Slovak population and data of NFE (v3.1.2, downloaded from https://gnomad.broadinstitute.org/downloads). The detailed information about the NIPT dataset generated and analysed in this study is fully described in our previous studies [45, 55]. Allele frequencies of identified variants for each group (Slovak NIPT data, WES data, and data of NFE) in each analysis were plotted using violin-swarm plots.

In the fourth step, QIAGEN Clinical Insight (QCI™) Analyze software was used to analyse the WES data of the deceased patients. Variants were imported as an industry-standard VCF into the QCI-A web interface, which enables data interpretation. We performed variant filtering based on biological context with “COVID-19” as the keyword which included eight filters.

Statistical analysis

Python language libraries scikit-allele were used for analysing genetic variation data and matplotlib and seaborn for the graphic representation. The significance of our findings was evaluated using statistical tests implemented in the Python SciPy package. We used the Mann–Whitney U test to compare risk variants allele frequencies from the deceased COVID-19 patients and the Slovak NIPT data or respectively gnomAD data respectively as controls. In addition, observed allele frequencies for each risk variant in all groups were tested using the Fisher's exact test.

Availability of data and materials

The NIPT datasets generated and analysed during the current study are available in the DSpace repository, https://dspace.uniba.sk/xmlui/handle/123456789/27. WES datasets are available from the corresponding author on reasonable request.

References

  1. Driggin E, Madhavan MV, Bikdeli B, Chuich T, Laracy J, Biondi-Zoccai G, et al. Cardiovascular Considerations for Patients, Health Care Workers, and Health Systems During the COVID-19 Pandemic. J Am Coll Cardiol. 2020;75:2352–71.

    Article  CAS  Google Scholar 

  2. Mehta P, McAuley DF, Brown M, Sanchez E, Tattersall RS, Manson JJ, et al. COVID-19: consider cytokine storm syndromes and immunosuppression. Lancet. 2020;395:1033–4.

    Article  CAS  Google Scholar 

  3. Terpos E, Ntanasis-Stathopoulos I, Elalamy I, Kastritis E, Sergentanis TN, Politou M, et al. Hematological findings and complications of COVID-19. Am J Hematol. 2020;95:834–47.

    Article  CAS  Google Scholar 

  4. Fricke-Galindo I, Falfán-Valencia R. Genetics Insight for COVID-19 Susceptibility and Severity: A Review. Front Immunol. 2021;12:622176.

    Article  CAS  Google Scholar 

  5. Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020;395:565–74.

    Article  CAS  Google Scholar 

  6. Majumder J, Minko T. Recent Developments on Therapeutic and Diagnostic Approaches for COVID-19. AAPS J. 2021;23:14.

    Article  CAS  Google Scholar 

  7. Guarner J. Three Emerging Coronaviruses in Two Decades. American Journal of Clinical Pathology. 2020;420–421 https://doi.org/10.1093/ajcp/aqaa029.

  8. Rajgor DD, Lee MH, Archuleta S, Bagdasarian N, Quek SC. The many estimates of the COVID-19 case fatality rate. Lancet Infect Dis. 2020;20:776–7.

    Article  CAS  Google Scholar 

  9. Sun Q, Qiu H, Huang M, Yang Y. Lower mortality of COVID-19 by early recognition and intervention: experience from Jiangsu Province. Ann Intensive Care. 2020;10:33.

    Article  Google Scholar 

  10. Vincent J-L, Taccone FS. Understanding pathways to death in patients with COVID-19. Lancet Respir Med. 2020;8:430–2.

    Article  CAS  Google Scholar 

  11. Grasselli G, Zangrillo A, Zanella A, Antonelli M, Cabrini L, Castelli A, et al. Baseline Characteristics and Outcomes of 1591 Patients Infected With SARS-CoV-2 Admitted to ICUs of the Lombardy Region. Italy JAMA. 2020;323:1574–81.

    Article  CAS  Google Scholar 

  12. Richardson S, Hirsch JS, Narasimhan M, Crawford JM, McGinn T, Davidson KW, et al. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area. JAMA. 2020;2052 https://doi.org/10.1001/jama.2020.6775.

  13. Xu X-W, Wu X-X, Jiang X-G, Xu K-J, Ying L-J, Ma C-L, et al. Clinical findings in a group of patients infected with the 2019 novel coronavirus (SARS-Cov-2) outside of Wuhan, China: retrospective case series. BMJ. 2020;368:m606.

    Article  Google Scholar 

  14. Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395:507–13.

    Article  CAS  Google Scholar 

  15. Wu Z, McGoogan JM. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention. JAMA. 2020;323:1239–42.

    Article  CAS  Google Scholar 

  16. Han C, Duan C, Zhang S, Spiegel B, Shi H, Wang W, et al. Digestive Symptoms in COVID-19 Patients With Mild Disease Severity: Clinical Presentation, Stool Viral RNA Testing, and Outcomes. Am J Gastroenterol. 2020;115:916–23.

    Article  Google Scholar 

  17. Tian S, Hu N, Lou J, Chen K, Kang X, Xiang Z, et al. Characteristics of COVID-19 infection in Beijing. J Infect. 2020;80:401–6.

    Article  CAS  Google Scholar 

  18. Singhal T. A Review of Coronavirus Disease-2019 (COVID-19). Indian J Pediatr. 2020;87:281–6.

    Article  Google Scholar 

  19. Pan L, Mu M, Yang P, Sun Y, Wang R, Yan J, et al. Clinical Characteristics of COVID-19 Patients With Digestive Symptoms in Hubei, China: A Descriptive, Cross-Sectional. Multicenter Study Am J Gastroenterol. 2020;115:766–73.

    Article  Google Scholar 

  20. Cantalupo S, Lasorsa VA, Russo R, Andolfo I, D’Alterio G, Rosato BE, et al. Regulatory Noncoding and Predicted Pathogenic Coding Variants of Predispose to Severe COVID-19. Int J Mol Sci. 2021;22 https://doi.org/10.3390/ijms22105372.

  21. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20:533–4.

    Article  CAS  Google Scholar 

  22. Ali H, Alshukry A, Marafie SK, AlRukhayes M, Ali Y, Abbas MB, et al. Outcomes of COVID-19: Disparities by ethnicity. Infect Genet Evol. 2021;87:104639.

    Article  CAS  Google Scholar 

  23. Liu X, Zhou H, Zhou Y, Wu X, Zhao Y, Lu Y, et al. Risk factors associated with disease severity and length of hospital stay in COVID-19 patients. J Infect. 2020;81:e95–7.

    Article  CAS  Google Scholar 

  24. Scully EP, Haverfield J, Ursin RL, Tannenbaum C, Klein SL. Considering how biological sex impacts immune responses and COVID-19 outcomes. Nat Rev Immunol. 2020;20:442–7.

    Article  CAS  Google Scholar 

  25. Gebhard C, Regitz-Zagrosek V, Neuhauser HK, Morgan R, Klein SL. Impact of sex and gender on COVID-19 outcomes in Europe. Biol Sex Differ. 2020. https://doi.org/10.1186/s13293-020-00304-9.

    Article  Google Scholar 

  26. Jutzeler CR, Bourguignon L, Weis CV, Tong B, Wong C, Rieck B, et al. Comorbidities, clinical signs and symptoms, laboratory findings, imaging features, treatment strategies, and outcomes in adult and pediatric patients with COVID-19: a systematic review and meta-analysis. Travel Med Infect Dis. 2020;37:101825.

    Article  Google Scholar 

  27. Guan W-J, Ni Z-Y, Hu Y, Liang W-H, Ou C-Q, He J-X, et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N Engl J Med. 2020;382:1708–20.

    Article  CAS  Google Scholar 

  28. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study.  Lancet. 2020;1054–1062 https://doi.org/10.1016/s0140-6736(20)30566-3.

  29. Cummings MJ, Baldwin MR, Abrams D, Jacobson SD, Meyer BJ, Balough EM, et al. Epidemiology, clinical course, and outcomes of critically ill adults with COVID-19 in New York City: a prospective cohort study. medRxiv. 2020 https://doi.org/10.1101/2020.04.15.20067157.

  30. Williams FMK, Freidin MB, Mangino M, Couvreur S, Visconti A, Bowyer RCE, et al. Self-Reported Symptoms of COVID-19, Including Symptoms Most Predictive of SARS-CoV-2 Infection. Are Heritable Twin Res Hum Genet. 2020;23:316–21.

    Article  Google Scholar 

  31. Baggen J, Vanstreels E, Jansen S, Daelemans D. Cellular host factors for SARS-CoV-2 infection. Nat Microbiol. 2021;6:1219–32.

    Article  CAS  Google Scholar 

  32. Severe Covid-19 GWAS Group, Ellinghaus D, Degenhardt F, Bujanda L, Buti M, Albillos A, et al. Genomewide Association Study of Severe Covid-19 with Respiratory Failure. N Engl J Med. 2020;383:1522–34.

    Article  Google Scholar 

  33. Wu P, Ding L, Li X, Liu S, Cheng F, He Q, et al. Trans-ethnic genome-wide association study of severe COVID-19. Commun Biol. 2021;4:1034.

    Article  CAS  Google Scholar 

  34. Dubé M-P, Lemaçon A, Barhdadi A, Lemieux Perreault L-P, Oussaïd E, Asselin G, et al. Genetics of symptom remission in outpatients with COVID-19. Sci Rep. 2021;11:10847.

    Article  Google Scholar 

  35. Pairo-Castineira E, Clohisey S, Klaric L, Bretherick AD, Rawlik K, Pasko D, et al. Genetic mechanisms of critical illness in COVID-19. Nature. 2021;591:92–8.

    Article  Google Scholar 

  36. Hu J, Li C, Wang S, Li T, Zhang H. Genetic variants are identified to increase risk of COVID-19 related mortality from UK Biobank data. Hum Genomics. 2021;15:10.

    Article  CAS  Google Scholar 

  37. Curtis D. Variants in ACE2 and TMPRSS2 Genes Are Not Major Determinants of COVID-19 Severity in UK Biobank Subjects. Hum Hered. 2020;85:66–8.

    Article  CAS  Google Scholar 

  38. Baldassarri M, Fava F, Fallerini C, Daga S, Benetti E, Zguro K, et al. Severe COVID-19 in Hospitalized Carriers of Single Pathogenic Variants. J Pers Med. 2021;11 https://doi.org/10.3390/jpm11060558.

  39. Ravikanth V, Sasikala M, Naveen V, Latha SS, Parsa KVL, Vijayasarathy K, et al. A variant in is associated with decreased disease severity in COVID-19. Meta Gene. 2021;29:100930.

    Article  CAS  Google Scholar 

  40. Monticelli M, Hay Mele B, Benetti E, Fallerini C, Baldassarri M, Furini S, et al. Protective Role of a Variant on Severe COVID-19 Outcome in Young Males and Elderly Women. Genes . 2021;12 https://doi.org/10.3390/genes12040596.

  41. Al-Mulla F, Mohammad A, Al Madhoun A, Haddad D, Ali H, Eaaswarkhanth M, et al. And variants are potential predictors of SARS-CoV-2 outcome: A time to implement precision medicine against COVID-19. Heliyon. 2021;7:e06133.

    Article  CAS  Google Scholar 

  42. Latini A, Agolini E, Novelli A, Borgiani P, Giannini R, Gravina P, et al. COVID-19 and Genetic Variants of Protein Involved in the SARS-CoV-2 Entry into the Host Cells. Genes . 2020;11 https://doi.org/10.3390/genes11091010.

  43. COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature. 2021;600:472–7.

    Article  Google Scholar 

  44. Ackermann M, Verleden SE, Kuehnel M, Haverich A, Welte T, Laenger F, et al. Pulmonary Vascular Endothelialitis, Thrombosis, and Angiogenesis in Covid-19. N Engl J Med. 2020;383:120–8.

    Article  CAS  Google Scholar 

  45. Budis J, Gazdarica J, Radvanszky J, Harsanyova M, Gazdaricova I, Strieskova L, et al. Non-invasive prenatal testing as a valuable source of population specific allelic frequencies. J Biotechnol. 2019;299:72–8.

    Article  CAS  Google Scholar 

  46. Pös O, Budis J, Kubiritova Z, Kucharik M, Duris F, Radvanszky J, et al. Identification of Structural Variation from NGS-Based Non-Invasive Prenatal Testing. Int J Mol Sci. 2019;20 https://doi.org/10.3390/ijms20184403.

  47. Forgacova N, Gazdarica J, Budis J, Radvanszky J, Szemes T. Repurposing non-invasive prenatal testing data: Population study of single nucleotide variants associated with colorectal cancer and Lynch syndrome. Oncol Lett. 2021;22:779.

    Article  CAS  Google Scholar 

  48. Mousa M, Vurivi H, Kannout H, Uddin M, Alkaabi N, Mahboub B, et al. Genome-wide association study of hospitalized COVID-19 patients in the United Arab Emirates. EBioMedicine. 2021;74:103695.

    Article  CAS  Google Scholar 

  49. COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur J Hum Genet. 2020;28:715–8.

    Article  Google Scholar 

  50. Gemmati D, Tisato V. Genetic Hypothesis and Pharmacogenetics Side of Renin-Angiotensin-System in COVID-19. Genes . 2020;11 https://doi.org/10.3390/genes11091044.

  51. Chung C-M, Wang R-Y, Chen J-W, Fann CSJ, Leu H-B, Ho H-Y, et al. A genome-wide association study identifies new loci for ACE activity: potential implications for response to ACE inhibitor. Pharmacogenomics J. 2010;10:537–44.

    Article  CAS  Google Scholar 

  52. Perakis SO, Weber S, Zhou Q, Graf R, Hojas S, Riedl JM, et al. Comparison of three commercial decision support platforms for matching of next-generation sequencing results with therapies in patients with cancer. ESMO Open. 2020;5:e000872.

    Article  Google Scholar 

  53. Erdem HB, Bahsi T. Spectrum of germline cancer susceptibility gene mutations in Turkish colorectal cancer patients: a single center study. Turk J Med Sci. 2020;50:1015–21.

    Article  CAS  Google Scholar 

  54. Kolostova K, Pospisilova E, Pavlickova V, Bartos R, Sames M, Pawlak I, et al. Next generation sequencing of glioblastoma circulating tumor cells: non-invasive solution for disease monitoring. Am J Transl Res. 2021;13:4489–99.

    CAS  Google Scholar 

  55. Gazdarica J, Budis J, Duris F, Turna J, Szemes T. Adaptable Model Parameters in Non-Invasive Prenatal Testing Lead to More Stable Predictions. Int J Mol Sci. 2019;20 https://doi.org/10.3390/ijms20143414.

  56. Babraham bioinformatics - FastQC A quality control tool for high throughput sequence data. Available: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ [Cited 20 Dec 2021].

  57. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN]. 2013. Available: http://arxiv.org/abs/1303.3997.

  58. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.

    Article  Google Scholar 

  59. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–4.

    Article  CAS  Google Scholar 

  60. Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32:292–4.

    CAS  Google Scholar 

  61. Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36:983–7.

    Article  CAS  Google Scholar 

  62. Budis J, Krampl W, Kucharik M, Hekel R, Goga A, Lichvar M, et al. SnakeLines: integrated set of computational pipelines for sequencing reads. arXiv [q-bio.GN]. 2021. Available: http://arxiv.org/abs/2106.13649.

  63. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10 https://doi.org/10.1093/gigascience/giab008.

  64. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–11.

    Article  CAS  Google Scholar 

  65. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122.

    Article  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

This publication is the result of support from the Operational Programme Integrated Infrastructure for the project: Pangenomics for personalized clinical management of infected persons based on identified viral genome and human exoma (Code ITMS:313011ATL7), co-financed by the European Regional Development Fund; co financed by the Slovak Research and Development Agency grant PP-COVID-20–051.

Author information

Authors and Affiliations

Authors

Contributions

NF was responsible for the literature search and manuscript writing. TSe prepared the samples for whole-exome sequencing. RH, ZH, JG and ZP performed data analysis. JG, JB and JR were responsible for designing the study and supervising the work. TSz conceived the idea of the project and TSz, JB, PB, JR and PR performed proofreading of the manuscript. LK, PJ, KMK were collecting the samples for analysis. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Natalia Forgacova.

Ethics declarations

Ethics approval and consent to participate

The study was conducted according to the guidelines of the Declaration of Helsinki, revised in 2013, and approved by the Institutional Review Board of the Department of Pathology, Faculty of Medicine, Comenius University in Bratislava, Slovakia, protocol code 21–49, on 8 April 2021. In accordance with § 48 Sect. 3 of Act No. 581/2004 Coll. the Health Care Surveillance Authority of the Slovak Republic orders the performance of an autopsy during which the indicated laboratory examinations are carried out. In accordance with § 48 Sect. 14 of Act No 581/2004 Coll. in connection with the performance of autopsies, the Authority shall take biological material from dead bodies for diagnostic, research, and development purposes and for other purposes specified by law. We confirm that the analyses included in this publication were performed after the patient’s death as part of the laboratory tests performed during the autopsy. The patients signed informed consent is therefore not relevant for these examinations. The Health Care Surveillance Authority agrees with the publication of the results of the analyses, which are anonymised and do not contain any personal data that could lead to the identification of the patient.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Genes selected after literature search from 3 types of publications.

Additional file 2: Table S2.

All 900 Covid-19 associated risk variants used in our analyses with Fisher's exact test p-values. • /p-value - included in the analysis; sig - significance: * - p-value < 0.05, ** - significant after Bonferroni correction; # - overlap with risk variants from QCI infection group; § - overlap with risk variants from QCI critical group; sN - Slovak NIPT data, W - WES data of Slovak dead patients, N - non-Finnish European gnomAD data.

Additional file 3.

Methods [56,57,58,59,60,61,62,63,64,65].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Forgacova, N., Holesova, Z., Hekel, R. et al. Evaluation and limitations of different approaches among COVID-19 fatal cases using whole-exome sequencing data. BMC Genomics 24, 12 (2023). https://doi.org/10.1186/s12864-022-09084-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12864-022-09084-5

Keywords

  • SARS-CoV-2
  • COVID-19
  • Whole-exome sequencing
  • Genetic association
  • Polymorphisms
  • Gnomad
  • Non-invasive prenatal testing