Development and validation of genome-wide InDel markers with high levels of polymorphism in bitter gourd (Momordica charantia)

Background: The preferred choice for molecular marker development is identifying existing variation in populations through DNA sequencing. With the genome resources currently available for bitter gourd (Momordica charantia), it is now possible to detect genome-wide insertion-deletion (InDel) polymorphisms among bitter gourd populations, which guides the efficient development of InDel markers.
Results: Here, using bioinformatics technology, we detected 389,487 InDels from 61 Chinese bitter gourd accessions with an average density of approximately 1298 InDels/Mb. We then developed a total of 2502 unique InDel primer pairs with a polymorphism information content (PIC) ≥0.6 distributed across the whole genome. Amplification of InDels in two bitter gourd lines, '47-2-1-1-3' and '04-17', indicated that the InDel markers were reliable and accurate. To highlight their utilization, the InDel markers were employed to construct a genetic map using 113 '47-2-1-1-3' × '04-17' F2 individuals. This InDel genetic map of bitter gourd consisted of 164 new InDel markers distributed on 15 linkage groups with a coverage of approximately half of the genome.
Conclusions: This is the first report on the development of genome-wide InDel markers for bitter gourd. The validation of the amplification and genetic map construction suggests that these unique InDel markers may enhance the efficiency of genetic studies and marker-assisted selection for bitter gourd.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07499-0.

Thus, InDel markers are a valuable complement to both SSR and SNP markers in genetic studies [9,10]. The development of InDel markers is becoming readily accessible because of the rapid development of next-generation sequencing (NGS). In crop species such as rice, maize, and soybean, genome-wide InDel markers have been developed based on sequencing data from two accessions [8,11-13] and among diverse populations [14,15]. The latter approach can provide more comprehensive and informative InDel markers for a species.
Bitter gourd (Momordica charantia), also known as bitter melon, bitter cucumber, and African cucumber, is an important vegetable crop widely distributed and cultivated throughout the tropics [16]. Bitter gourd fruits have many culinary uses in different countries. In China, for example, they are often stir-fried with eggs, meats, and other vegetables, stuffed, or added to soups; in India, they are often served with yogurt, mixed with curry, or stuffed with spices and then fried in oil [17]. In addition, bitter gourd has been used in various herbal medicine systems and is associated with a wide range of beneficial health effects, such as anti-diabetic [18-20], anti-HIV [21,22], and anti-tumor [23,24] activities. As with most crops, genetic improvement of bitter gourd is a challenge for breeders, and efficient breeding protocols based on molecular markers are therefore required.
Genome-wide SSR markers have been developed for bitter gourd based on the recently published whole genome sequence [25-27]; however, no work has been done on InDel identification and marker development to date. In this study, using the Dali-11 genome as a reference, we identified genome-wide InDels from resequencing data of 61 Chinese bitter gourd accessions [27]. Based on the polymorphism information content (PIC), we selected and designed a set of highly informative, unique InDel markers. Moreover, we validated the amplification of the newly developed InDel markers in two bitter gourd inbred lines, '47-2-1-1-3' and '04-17', and constructed an InDel genetic map by genotyping the F2 population derived from a cross between '47-2-1-1-3' and '04-17'. The results from this study provide a valuable marker resource for bitter gourd research and application in MAS.

Identification and distribution of genome-wide InDels
In total, 389,487 InDels were identified among the 61 Chinese bitter gourd accessions with an average density of approximately 1298 InDels/Mb across the whole genome (~300 Mb). InDels were distributed broadly across all 11 pseudochromosomes (MC01-MC11), in accordance with the distribution of genes (Fig. 1).
Polymorphic alleles of InDels were identified in the 61 Chinese bitter gourd accessions, with the number of alleles per InDel ranging from two to seven (Fig. 1; Additional file 1: Table S1). Of these, InDels with two alleles accounted for 77.53% of all InDels and thus predominated. The number of InDels on each pseudochromosome varied from 16,384,005 (MC07) to 34,592,942 (MC08), with the density ranging from 1233 InDels/Mb (MC01) to 1498 InDels/Mb (MC05) (Fig. 2).

Development of highly polymorphic and unique InDel primers
To provide bitter gourd researchers with a set of InDels with high potential for utilization, we selected 3511 highly polymorphic InDels (MC_g61ind0001-MC_g61ind3511) with PIC ≥0.6 from the 389,487 InDels (Additional file 1: Table S2). Using their flanking sequences retrieved from the 'Dali-11' reference genome, a total of 3140 InDel primer pairs were successfully designed according to the predefined design criteria. We subsequently mapped these primer sequences back to the 'Dali-11' reference genome and obtained a set of 2502 (79.68%) unique InDel primer pairs (Additional file 1: Table S3), which are distributed throughout the genome (Fig. 3). We then evaluated the amplification of the 2502 InDels in two bitter gourd inbred lines, '47-2-1-1-3' and '04-17', and found that 2466 (98.56%) were successfully amplified. Of the 2502 InDel markers, 212 (8.47%) were confirmed to be polymorphic between the two lines (Additional file 2: Figure S1).
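The percentages in this paragraph follow directly from the reported counts. The short sketch below recomputes them; the variable and function names are illustrative only and are not taken from the study.

```python
# Counts reported in the text for each stage of marker development.
designed_primer_pairs = 3140      # primer pairs successfully designed
unique_primer_pairs = 2502        # pairs aligning uniquely to the reference
amplified_markers = 2466          # markers successfully amplified
polymorphic_markers = 212         # markers polymorphic between the two lines


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


unique_rate = percent(unique_primer_pairs, designed_primer_pairs)
amplified_rate = percent(amplified_markers, unique_primer_pairs)
polymorphic_rate = percent(polymorphic_markers, unique_primer_pairs)

print(unique_rate, amplified_rate, polymorphic_rate)
```

Note that the amplification and polymorphism rates are both expressed relative to the 2502 unique primer pairs, not relative to the 2466 amplified markers.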

Construction of the InDel genetic map
In this study, a total of 113 F2 individuals derived from the cross between '47-2-1-1-3' and '04-17' were genotyped using the 212 polymorphic InDel markers (Additional file 2: Figure S2). After filtering out 23 markers with severely missing data, 189 InDel markers were loaded into JoinMap 4.0. Finally, a total of 164 markers were integrated into 15 linkage groups (LG; LG1-LG15) (Fig. 4). The total genetic length of the InDel map is 1279.68 cM with an average distance of 7.80 cM between adjacent markers, and the genetic length of each LG ranged from 17.07 cM (LG9) to 210.70 cM (LG8) (Table 1). Using the reference genome, the InDels on each of the 15 LGs could be assigned to a location and compared with the corresponding 11 pseudochromosomes (MC01-MC11). The genetic and physical positions of the InDels on the LGs and the pseudochromosomes were highly consistent (Fig. 4). The physical coverage of this map is 148.06 Mb (Table 1), which accounts for approximately half of the 'Dali-11' reference genome (~300 Mb). Based on the genetic and physical distances, the overall recombination rate of bitter gourd was calculated to be 8.64 cM/Mb.
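The summary statistics of the map can be reproduced from the reported totals. The sketch below assumes that the average marker distance is the total genetic length divided by the number of mapped markers, which is the calculation that reproduces the reported 7.80 cM; the variable names are illustrative.

```python
# Totals reported in the text and in Table 1.
total_genetic_length_cm = 1279.68   # summed length of the 15 linkage groups
mapped_markers = 164                # InDel markers placed on the map
physical_coverage_mb = 148.06       # physical span covered by the map
genome_size_mb = 300.0              # approximate size of the 'Dali-11' genome

# Assumption: average spacing = total length / number of mapped markers.
average_distance_cm = round(total_genetic_length_cm / mapped_markers, 2)

# Overall recombination rate = genetic length / physical length covered.
recombination_rate = round(total_genetic_length_cm / physical_coverage_mb, 2)

# Fraction of the reference genome covered by the map, in percent.
coverage_percent = round(100.0 * physical_coverage_mb / genome_size_mb, 2)

print(average_distance_cm, recombination_rate, coverage_percent)
```

With the ~300 Mb genome size used in the text, the 148.06 Mb covered corresponds to roughly 49% of the genome, consistent with the statement that the map covers approximately half of it.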

Discussion
Bitter gourd is an economically important cucurbit crop. Molecular breeding for bitter gourd is far behind that of other cucurbits, such as cucumber and melon, because useful molecular markers are lacking. The two recently published bitter gourd genomes and resequencing data of diverse samples have led to the rapid identification of genome-wide polymorphisms that can be utilized for molecular studies [26,27]. InDel polymorphism is one of the most widely used PCR-based marker systems in MAS strategies. InDel markers have been extensively used in genetic mapping [13,28] and gene tagging [29-31].
This study accomplished the first large-scale investigation of genome-wide InDels in the bitter gourd genome, with the overall aim of providing a unique, polymorphic set of primers for molecular breeding research. In the present study, we identified a total of 389,487 InDels, which is twice the number of available SSR sites [25], from 61 Chinese bitter gourd accessions. Therefore, we have provided abundant candidates for InDel marker development. The average density of InDels in bitter gourd was observed to be approximately 1298 InDels/Mb, which is greater than the densities reported for cucumber (916 InDels/Mb) [32] and pepper (71 InDels/Mb) [33], but lower than those in rice (6245 InDels/Mb) [15] and tomato (1448 InDels/Mb) [34]. Moreover, we found that the identification criteria of each study were unique and that the number of InDels obtained was largely dependent on the genetic variation of the genotypes from which they were identified. Because the InDels were identified from 61 diverse accessions of Chinese bitter gourd, these InDels will have utility in genetic research on Chinese bitter gourd germplasm and will potentially be useful for materials from other geographic regions.

Fig. 1 Genome-wide distribution of InDels among the 61 Chinese bitter gourd accessions. Track A denotes the gene density; tracks B to G show the two, three, four, five, six, and seven allele sites, respectively. The unassembled scaffolds or contigs were assigned to MC00, and the gene density data were taken from a previous report [27]

Fig. 2 Number and density of InDels identified among 61 Chinese bitter gourd accessions. Bars represent the numbers of InDels; lines represent the density of InDels. A to F indicate the two, three, four, five, six, and seven allele sites; "All" indicates the total number of InDels

In addition to the value of a large number of markers in downstream genetic research, highly polymorphic sites that can be PCR amplified are more valuable for marker development. Highly variable sites will ensure the utility of InDel markers in a wider range of bitter gourd germplasm. Therefore, to obtain highly polymorphic InDels in bitter gourd, we screened 2502 unique InDel markers that had a PIC ≥0.6 from the total of 389,487 InDels. This screening criterion is more stringent than the PIC ≥0.5 used in maize [14] and rice [15]. The experimental PCR validation of the InDel markers between inbred lines '47-2-1-1-3' and '04-17' showed that 212 (8.47%) of 2502 InDel markers were polymorphic, which is lower than expected. We expect that the polymorphism of this set of 2502 unique InDel markers would be better verified in a larger panel of bitter gourd materials.
Some molecular marker systems, such as amplified fragment length polymorphisms (AFLP) [35], SSRs, sequence-related amplified polymorphism (SRAP) [36], and SNPs [26,37,38], have been used to construct genetic maps of bitter gourd. To the best of our knowledge, no previously published study has developed InDel markers to construct a genetic map of bitter gourd. In the present study, 164 new InDel markers were mapped into 15 LGs covering approximately half of the genome, and the genetic positions on the 15 LGs were largely consistent with the physical positions on the 11 pseudochromosomes, supporting the accuracy of the assembly of the 'Dali-11' reference genome [27]. The overall recombination rate observed in this study is comparable to that previously estimated by a RAD-based genetic map [38]. Taken together, the high amplification rate, the number of polymorphisms, and the successful genetic mapping indicate that this new set of InDel markers can be used for genetic studies such as mapping of bitter gourd traits.

Conclusions
Here we report the first analysis of genome-wide InDels distributed throughout the bitter gourd genome, from which we developed a set of unique and potentially useful InDel markers. We also experimentally validated the amplification of the InDels in the inbred lines '47-2-1-1-3' and '04-17' to determine the polymorphisms. The polymorphic markers were used to construct the first InDel genetic map based on a '47-2-1-1-3' × '04-17' F2 population of bitter gourd. The findings of this study indicate that the InDel markers developed here are informative and will be useful in future bitter gourd genetic studies.

InDel identification and selection in populations
Paired-end, clean reads of 61 Chinese bitter gourd accessions were mapped to the 'Dali-11' reference genome with BWA software [39] and exported as BAM files. Samtools (http://samtools.sourceforge.net) and Picard (http://broadinstitute.github.io/picard) were used to refine the mapping output of BWA. The GATK pipeline [40] was used to detect InDels for each sample. Only small insertions and deletions (≤50 bp in length) were considered. The allelic diversity of each InDel in the 61 bitter gourd samples was assessed by the polymorphism information content (PIC), which was defined as $\mathrm{PIC} = 1 - \sum_{i=1}^{n} P_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 P_i^2 P_j^2$, where $P_i$ and $P_j$ are the frequencies of the $i$th and $j$th alleles, respectively, and $n$ is the number of alleles. InDel loci with PIC ≥0.6 were retained for primer design.
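The PIC filter can be illustrated with a minimal sketch. It assumes that the definition intended here is the widely used form PIC = 1 − Σ P_i² − Σ_{i<j} 2 P_i² P_j² over the n allele frequencies; the function names and the example allele labels are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def allele_frequencies(allele_calls):
    """Convert a list of observed alleles into a list of allele frequencies."""
    counts = Counter(allele_calls)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def pic(frequencies):
    """Polymorphism information content for one locus.

    Assumed form: 1 - sum(P_i^2) - sum over i<j of 2 * P_i^2 * P_j^2.
    """
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    squared_sum = sum(p * p for p in frequencies)
    cross_term = sum(
        2.0 * frequencies[i] ** 2 * frequencies[j] ** 2
        for i in range(len(frequencies))
        for j in range(i + 1, len(frequencies))
    )
    return 1.0 - squared_sum - cross_term


# Hypothetical locus: eight allele observations with four distinct alleles.
calls = ["ref", "ref", "del2", "del2", "ins1", "ins1", "del5", "del5"]
locus_pic = pic(allele_frequencies(calls))
keep_for_primer_design = locus_pic >= 0.6
```

Under this assumed definition, a biallelic site with equal allele frequencies gives a PIC of 0.375, whereas four equally frequent alleles give about 0.703, so a threshold of 0.6 selects multi-allelic loci.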

Designing unique InDel primers and validation of PCR amplification
BatchPrimer3 (https://wheat.pw.usda.gov/demos/BatchPrimer3/) [41] was used to design InDel primers following the conditions described in a previous study [25]. Specifically, the InDel primers were designed to have the following characteristics: primer size, 18-27 bp with an optimum length of 20 bp; primer melting temperature (Tm), 57.0-63.0°C with an optimum temperature of 60°C; product size, 100-500 bp with an optimum size of 250 bp; and primer GC content, 40-60% with an optimum GC content of 50%. All the designed primer pairs were aligned back to the 'Dali-11' reference genome. Primer pairs were defined as unique if both the forward and reverse primers aligned uniquely to the reference genome.
The PCR assay was conducted in a total reaction volume of 20 μL containing 20 ng of genomic DNA, 100 μM dNTPs (Eastwin, Guangzhou, China), 0.1 μM of each forward and reverse primer, 0.5 U Taq DNA polymerase (Eastwin, Guangzhou, China), 2.0 μL of 10 × Taq buffer, and 2.0 mM MgCl2. PCR amplification was conducted under the following conditions: initial denaturation of 5 min at 94°C; followed by 25 cycles of 30 s at 94°C, 30 s at 60°C, and 1 min at 72°C; and a final extension of 5 min at 72°C. Then 2-4 μL of the amplified products were used for electrophoresis, which was run on a 6% polyacrylamide gel.
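As an illustration of the design window, the sketch below checks a primer pair against the length, GC content, and product size ranges listed above. It is not BatchPrimer3 itself: the function names and the example sequences are hypothetical, and the Tm range is deliberately not checked because the melting temperature depends on the thermodynamic model used by the design software.

```python
def gc_percent(primer):
    """Return the GC content of a primer sequence in percent."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def within_design_window(forward, reverse, product_size):
    """Check primer length (18-27 bp), GC content (40-60%),
    and product size (100-500 bp) for one primer pair."""
    primers_ok = all(
        18 <= len(primer) <= 27 and 40.0 <= gc_percent(primer) <= 60.0
        for primer in (forward, reverse)
    )
    return primers_ok and 100 <= product_size <= 500


# Hypothetical 20-bp primers with 50% GC content and a 250-bp product.
forward_primer = "ATGCATGCATGCATGCATGC"
reverse_primer = "GCTAGCTAGCTAGCTAGCTA"
pair_passes = within_design_window(forward_primer, reverse_primer, 250)
```

A check of this kind only screens designed pairs against the stated ranges; the uniqueness criterion still requires aligning both primers to the reference genome.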

Genetic map construction
JoinMap 4.0 software [42] was used to construct the genetic map. The independence logarithm of the odds (LOD) score was set to a threshold range of 3.0 to 10.0. A regression analysis with Kosambi's function was used to estimate genetic distances. The genetic and physical maps were drawn using MapChart version 2.2 software [43].
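For reference, Kosambi's mapping function converts a recombination fraction r into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. The sketch below implements this standard formula and its inverse; it illustrates the function only and does not reproduce JoinMap's regression mapping procedure, and the function names are illustrative.

```python
import math


def kosambi_cm(recombination_fraction):
    """Map distance in centimorgans for a recombination fraction (0 <= r < 0.5)."""
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_recombination_fraction(distance_cm):
    """Inverse of the Kosambi function: r = 0.5 * tanh(2d), with d in Morgans."""
    return 0.5 * math.tanh(2.0 * distance_cm / 100.0)


# Example: a recombination fraction of 0.10 between two markers.
distance = kosambi_cm(0.10)
```

For small recombination fractions the Kosambi distance is close to 100·r cM, and it grows faster than r as r approaches 0.5, reflecting the correction for undetected double crossovers.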