Genetic analysis of 42 Y-STR loci in Han and Manchu populations from the three northeastern provinces in China

Background Y-STR polymorphisms are useful in tracing genealogy and understanding human origins and migration history. This study aimed to fill a knowledge gap in the genetic diversity, structure, and haplogroup distribution of the Han and Manchu populations from the three northeastern provinces in China (Liaoning, Jilin, and Heilongjiang). Methods A total of 1,048 blood samples were collected from unrelated males residing in Dalian. Genotyping was performed using the AGCU Y37 + 5 Amplification Kit, and the genotype data were analyzed to determine allele and haplotype frequencies, genetic and haplotype diversity, discrimination capacity, and haplotype match probability. Population pairwise genetic distances (Fst) were calculated to compare the genetic relationships among Han and Manchu populations from Northeast China and other 23 populations using 27 Yfiler Plus loci set. Multi-dimensional scaling and phylogenetic analysis were employed to visualize the genetic relationships among the 27 populations. Moreover, haplogroups were predicted based on 27 Yfiler Plus loci set. Results The Han populations from Northeast China exhibited genetic affinities with both Han populations from the Central Plain and the Sichuan Qiang population, despite considerable geographical distances. Conversely, the Manchu population displayed a relatively large genetic distance from other populations. The haplogroup analysis revealed the prevalence of haplogroups E1b1b, O1b, O2, and Q in the studied populations, with variations observed among different ethnic groups. Conclusion The study contributes to our understanding of genetic diversity and history of the Han and Manchu populations in Northeast China, the genetic relationships between populations, and the intricate processes of migration, intermarriage, and cultural integration that have shaped the region’s genetic landscape. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-023-09636-3.


Background
Northeast China usually refers to the three northeastern provinces in China including Liaoning, Jilin, and Heilongjiang.As one of the major port cities of Northeast China, Dalian has attracted a considerable population from the three northeastern provinces due to its advanced economic development and plentiful job opportunities [1] (Fig. 1).At present, the Han ethnic group constitutes the largest segment of Dalian's population at 84%, followed by the Manchu group at 13% [2].
The Han Chinese was a small minority in Northeast China.However, in the late 19th century, the Qing government encouraged the development of Northeast China to prevent the Russian invasion, which led to a massive influx of Han people from the Central Plain (Henan, Shandong and Hebei) to Northeast China for seeking new economic opportunities [3] (Fig. 1).Over several centuries, more than 20 million Han people have migrated to Northeast China, with the vast majority (up to 90%) coming from Shandong and Henan provinces.Consequently, the majority of the current Han inhabitants of Northeast China can trace their roots back to the Central Plains [4].
Today, about 97% of the Manchu population resides in Northeast China, Inner Mongolia, and Beijing, with the largest concentration in Liaoning Province, the birthplace of the Manchu people [5].Despite this, the Manchu language is on the verge of extinction, as many regional Manchus have lost the ability to speak it [6].
Y-chromosomal short tandem repeats (Y-STRs) are widely employed in forensic DNA analysis, serving as a valuable tool in defining paternal lineages in situations such as identifying male suspects in sexual assault cases [7].Additionally, Y-STR polymorphisms are useful in tracing genealogy and understanding human origins and migration history [8].Haplogroups are branches on the tree of early human migrations and genetic evolution.They cluster individuals who share a common ancestor and possess specific genetic mutations, which contributes to our understanding of human migration and evolution [9].
This study aimed to fill a knowledge gap in the genetic diversity, structure, and haplogroup distribution of the Han and Manchu populations in Northeast China.We genotyped 1048 males residing in Dalian city using a 42 Y-STR system and investigated the genetic diversity and structure of this population.Among them, the genetic relationships of Han and Manchu populations from Northeast China were compared with various ethnic groups in different regions and countries based on the 27 Y-STR system.Next, we explored the Y-STR haplogroup distribution of Northeast Han and Man populations and several other populations throughout China.

Forensic characteristics
A total of 1048 unrelated males in Dalian were genotyped using AGCU Y37 + 5 Amplification Kit, which includes 27 Yfiler Plus loci.In order to estimate the genetic diversity for Y-STRs in 958 Han individuals (818 from Northeast China, 140 from other regions) and 68 Manchu individuals in Dalian (Table S1 and S2), allele frequency and forensic parameters including genetic diversity (GD) and discrimination capacity (DC) were calculated for each locus incorporated in the AGCU Y37 + 5 panel (Table S3 and S4).The data for 42 Y-STR loci can also be accessed in the Y-Chromosome STR Haplotype Reference Database (YHRD) with accession number YA006025.For both Han and Man populations in Dalian, the GD of all the loci was above 0.4 except DYS645 and 5 indel loci (Fig. 2).Three multi-locus Y-STRs including DYS527, DYS385, and DYF387S1 showed the highest GD values of more than 0.9.
To assess the Y-STR haplotype DC, haplotype diversity (HD), and haplotype match probability (HMP) of 958 Han individuals in Dalian, haplotype information was analyzed in different combinations of Y-STR loci, including minimal nine loci, PowerPlex Y12 loci, 17 Yfiler loci, PowerPlex Y23 loci, 27 Yfiler Plus loci and 29 Ymax loci (Table S5).As more Y-STR loci were included in the analysis, the percentage of unique haplotypes, HD, and DC increased, which means that the individuals had more diverse haplotypes.At the same time, HMP decreased, indicating a reduced likelihood of two individuals having matching haplotypes.Therefore, incorporating an increased number of Y-STR loci in genetic analysis enhances the precision and dependability of discerning genetic associations among individuals.However, the 27 Yfiler Plus loci set and the 29 Ymax loci set were shown to yield identical haplotype numbers, GD, HD, and HMP values, corroborating findings from previous studies [10,11].This indicates that beyond a certain threshold, the addition of extra loci may not significantly augment the accuracy of genetic examinations, especially when dealing with finite sample size.

Genetic structure and population comparisons
This population-specific analysis was focused on specific groups of unrelated male individuals born in Northeast China.These groups include the Han population from Liaoning (n = 571), Jilin (n = 89), Heilongjiang (n = 158), and the Manchu population from Liaoning (n = 62) (Table S1 and S2).Pairwise genetic distances (F st ) and corresponding p-values were calculated among these four groups and the other 23 populations, utilizing a 27 Yfiler Plus loci set (Table S6 and S7 distance (F st = 0.15996).The heatmap provides a visual representation of the degree of genetic differentiation between populations (Fig. 3).From this, it is apparent that the Kyrgyz population from Xinjiang, the Tibetan population from the Northwest show the most significant genetic differences compared to the other 25 populations.
Among the Han population from the three northeast provinces in China, the Jilin subgroup exhibited a relatively smaller genetic distance with Liaoning (F st = 0.00167) and Heilongjiang subgroups (F st = 0.00119).Conversely, the genetic distance between Liaoning and Heilongjiang subgroups was comparatively larger (F st = 0.00306).Within the broader scope of China, the Northeast Han populations show the highest genetic affinity with Shandong Han (0.00080 ≤ F st ≤ 0.00113), Henan Han (0.00137 ≤ F st ≤ 0.00266), and Sichuan Qiang (0.00188 ≤ F st ≤ 0.00414) populations.In particular, the Shandong Han population displays a closer genetic relationship to the Han populations from Jilin (F st = 0.00080) and Heilongjiang (F st = 0.00082) than to those from Liaoning (F st = 0.00113).Conversely, the Henan Han population exhibits a more pronounced genetic similarity to the Liaoning Han population (F st = 0.00137), as compared to the Jilin Han (F st = 0.00236) and the Heilongjiang Han (F st = 0.00266) populations.These findings suggest that during the migrations of the Han populations from the Central Plain to Northeast China, a larger number of individuals from Shandong chose to relocate to Jilin and Heilongjiang, whereas a considerable portion of individuals from Henan decided to settle in Liaoning.
Additionally, the Sichuan Qiang population demonstrates a genetic affinity with the Han populations from Liaoning (F st = 0.00199), Jilin (F st = 0.00188), Shandong (F st = 0.00214), Henan (F st = 0.00263), and Chongqing (F st = 0.00150).Notably, Sichuan is geographically distant from all these regions, except for Chongqing.Although several studies have uncovered evidence of a close genetic relationship between the Qiang and Han groups, none have explored this genetic relationship considering potential gene flow and geographic location [12][13][14].In this study, we found that the Sichuan Qiang population displays closer genetic proximity to the Han population from Northeast China (Liaoning and Jilin) than to those from the Central Plains (Shandong and Henan), despite Fig. 3 Genetic differentiation degree between 27 populations based on analysis of 27 Yfiler Plus loci the Northeast Han population's migration from the Central Plains.Such a pattern suggests the possibility of substantial gene flow or migration between these populations in the past.However, the underlying cause of this particular genetic proximity remains uncertain due to the absence of corresponding historical records.
According to historical records, the Qiang people are a minority ethnic group in China who traditionally lived in the mountainous regions of western China, including the areas around the Min River in Sichuan province for thousands of years [15,16].In the 17th century, Han people began to move into the Qiang's territory, primarily for economic reasons such as farming and trade.This led to increased interaction between the Han and Qiang people, and over time, intermarriage between the two groups became more common.As a result, numerous Qiang people today have some Han ancestry [16].Intriguingly, the ethnic identity of the Han Chinese population in China is typically primarily inherited through the patrilineal lineage, which is reflected in the Y-chromosome DNA.However, based on the result of this study, it appears that many Han males who intermarry with Qiang individuals or their descendants tend to adopt Qiang cultural identity over their own Han cultural identity.Accordingly, in the process of multi-ethnic integration, people's identification with ethnicity is not strictly based on paternal lineage, whereas it is more of a sociological concept than a matter of racial bloodline.
Furthermore, our study includes a Manchu population, about 91% of which is from Liaoning.Even though Han and Manchu people cohabit in Liaoning, they display a relatively substantial genetic difference (F st = 0.00657).The genetic distance between the Manchu and Han populations increases as one moves northwards from Liaoning towards Heilongjiang.The Manchu population presents a relatively large genetic distance from all the other 26 populations examined in the study (F st > 0.00500).The genetic differences between the Han and Manchu populations in Liaoning were also reported in the studies conducted by Adnan et al. [17] and He and Guo [6], albeit with a smaller set of Y-STR loci.Theoretically, the commencement of mutual integration between the Han and Manchu groups should have taken place in the 1600s during the Qing dynasty, when the Manchu people ascended to the ruling class of China.However, to maintain the purity and nobility of their ethnic bloodline as well as preserve their traditional language and customs, the Manchu rulers implemented a policy that barred intermarriage between the Manchu and Han peoples [18].This policy, in effect, impeded the integration process between these two ethnic groups.
Pairwise genetic distances between 27 populations estimated by 27 Yfiler Plus set were visualized in the Multidimensional scaling (MDS) plot (Fig. 4).The MDS plot reinforces the genetic affinity between the Han populations from the Northeast China and Central Plain, and the Sichuan Qiang population.These groups also demonstrate genetic proximity to the Guizhou Miao population as well as the Han populations from Chongqing, Jiangsu, Anhui, Guangdong, Hainan, and Shaanxi.Guo [19] and Adnan et al. [20] also demonstrated the genetic affinity between the Han populations of Liaoning, Shandong, Fig. 4 MDS plot between 27 populations based on pairwise Fst values using 27 Yfiler Plus set and Anhui.However, the Sichuan Qiang and Guizhou Miao populations were not included in their studies.Conversely, despite their geographical location within China's territory, the Northwest Tibetan and Xinjiang Kyrgyz populations appear genetically distinct, indicating their isolation from the aforementioned cluster.
Subsequently, we constructed a neighbor-joining (N-J) phylogenetic tree based on the F st values to further verify the genetic relationships among the 27 populations (Fig. 5).Generally, the phylogenetic structures are in line with the MDS analysis outcomes

Y-chromosome haplogroup prediction
A total of 818 Northeast Han (Liaoning Han, n = 571; Jilin Han, n = 89; Heilongjiang Han, n = 158) and 68 Manchu individuals as well as 13 other Chinese populations were assigned to haplogroups based on 27 Yfiler Plus loci using the Whit-Athey algorithm with the Haplogroup Predictor tool.Due to changes in some haplogroup names since 2009, when the version of the Haplogroup Predictor tool was available, the haplogroup names have been updated to ISOGG-2020 names in this study.The detailed changes in haplogroup names and the fitness probability for 17 populations are presented in Table S8.
Our findings slightly differ from previous studies, which have indicated that haplogroups O1b and O2 are the most prevalent haplogroups among East Asians [22][23][24].This discrepancy may result from the use of different marker combinations or haplogroup prediction models.Deng et al. [22] investigated Chinese Y haplogroups using nested cladistic analysis based on non-recombining portions of the Y-chromosome (NRY) SNPs, while Fan et al. [23] employed a k-nearest neighbor predictor based on 23 common Y-STR loci.Wang et al. [24] utilized Whit Athey's Haplogroup Predictor based on 17 Y-STRs.In contrast, our study used 27 Yfiler Plus loci as markers submitted to Whit Athey's Haplogroup Predictor.
Additionally, in our study, the Kyrgyz population displays a higher prevalence of haplogroup O2 (64%) and a significantly lower percentage of E1b1b (4%) compared to other Chinese populations.This observation implies that the Kyrgyz people may have a unique genetic history that sets them apart from other populations in China.Furthermore, six haplogroups (I1, I2a1a1a, J2a1a1a2b2, J2a1a, J2b, and N) were not observed in the Manchu population, in contrast to other Chinese populations.This may be indicative of limited intermarriage between the Manchu population and other groups.However, it is important to consider that the small sample size of the Manchu population could lead to less precise estimates of the population's haplogroup diversity.Furthermore, our study is limited by the absence of sequencing for single nucleotide polymorphisms, which may offer a more accurate approach to determine haplogroups.

Conclusion
In summary, this study investigated the genetic diversity, structure, and haplogroup distribution of the Han and Manchu populations from Northeast China.The results revealed several key findings.First, the Han populations from Northeast China exhibited genetic affinities with Han populations from the Central Plain and Qiang population from Sichuan, indicating historical migrations, substantial gene flow, and intermarriage among these groups.Second, the Manchu population displayed distinct genetic characteristics and maintained a relatively large genetic distance from other populations, possibly due to historical policies and limited intermarriage with the Han population.Finally, the haplogroup distribution analysis revealed the prevalence of haplogroups E1b1b, O1b, O2, and Q in the studied populations, with variations observed among different ethnic groups.Overall, this research contributes to our understanding of the genetic diversity and history of the Han and Manchu populations from Northeast China.The findings shed light on the genetic relationships among these populations and provide insights into the complex processes of migration, intermarriage, and cultural integration that have shaped the genetic landscape of the region.

Sample preparation
A total of 1048 blood samples from unrelated Chinese males were collected from Dalian, Liaoning Province, China.Their ethnic groups and birthplaces were confirmed through their ID cards and self-declarations (Liaoning Han, n = 571; Jilin Han, n = 89; Heilongjiang Han, n = 158; Northeast China Manchu, n = 68; Others, n = 162).All participants signed informed consent.All the experimental procedures were performed following the standards of the Declaration of Helsinki.This study was approved by the Medical Ethics Committee of Dalian Blood Center (Approval Number: LL-2301).

Statistical analyses
Allele and haplotype frequencies, GD, and HD were computed by GenAlEx 6.5 software.DC and HMP were calculated following the given formulae (Table S3, S4, and S5).Based on 27 Yfiler Plus loci, population pairwise genetic distance (F st ) and p values were estimated by analysis of molecular variance (AMOVA) using Arlequin v3.5 software.This allowed us to compare the genetic distances between the 4 populations under study and 23 other populations reported in the YHRD.Then a heatmap was generated using R-4.2.2 program (R Core Team), where each cell is colored corresponding to the magnitude of the pairwise F st values.MDS was also employed to create a reduced dimensionality spatial representation of populations based on F st values using 'smacof ' , 'ggplot2' and 'ggrepel'packages by R-4.2.2 program (R Core Team).Additionally, phylogenetic relationships among 40 populations were depicted by an N-J phylogenetic tree using 'ggplot2' package by R-4.2.2 program (R Core Team).Furthermore, based on 27 Yfiler Plus loci, the haplogroups of 17 populations in China were identified with the 27-Haplogroup Program in Whit Athey's Haplogroup Predictor (http://www.hprg.com/hapest5/).Then Tableau 2021.1 was employed to visualize the distribution of haplogroups across various populations.

Fig. 1
Fig. 1 Geographical locations of populations from Northeast China and other Chinese populations included in this study.The cyan and red arrows indicate the migration of Han populations.Dalian, marked as a red pentagram, is located in geographic coordinates 38°43′~40°12′n, 120°58′~123°31′e

Fig. 2
Fig. 2 Genetic diversity (GD) across 40 loci.(a) Han population in Dalian (n = 958); (b) Man population in Dalian (n = 68) . The majority of the Chinese populations formed a single large cluster, except the Northwest Tibetan, Xinjiang Kyrgyz and Inner Mongolia Duar populations.Several geographically proximate populations formed tight clusters on the tree, including Sichuan Qiang and Chongqing Han, Henan Han and Shandong Han, Heilongjiang Han and Jilin Han, as well as Jiangsu Han and Anhui Han.Interestingly, the Liaoning Han population initially clustered with the Henan Han and Shandong Han populations, then with the Chongqing Han, Sichuan Qiang, and Guizhou Miao populations, and finally with the Jilin Han and Heilongjiang Han populations.This pattern suggests a certain degree of phylogenetic isolation of the Liaoning Han from the Jilin and Heilongjiang Han populations.This divergence may be attributed to a higher level of population movement in Liaoning, potentially driven by favorable economic opportunities.It is worth noting that the Northeast Manchu and Shaanxi Han populations form a tight cluster, despite a relatively large genetic distance (F st = 0.00580).This may be attributed to specific historical intermarriage or migration patterns.Nevertheless, the small sample size of the Northeast Manchu population may lead to misleading results.

Fig. 5
Fig. 5 The neighbor-joining phylogenetic tree of 27 populations based on pairwise Fst values using 27 Yfiler Plus set.Height: degree of genetic divergence between populations

Fig. 6
Fig. 6 Haplogroup distribution in the 17 geographical populations.Haplogroups with a fitness probability less than 1% are not included in the figure.Different colors represent the fitness probability of each haplogroup."*" indicates changes in haplogroup names since 2009.In this study, the predominant haplogroup in the populations, except for the Tibetan and Kyrgyz groups, is E1b1b.Haplogroup E1b1b traces its origin back to East Africa before becoming widespread in Eurasia[21].We found that Guangxi Zhuang and Hainan Han populations notably display a higher percentage of E1b1b, at 39% and 44% respectively, in contrast to the Guangdong Han population, which has a lower percentage of E1b1b at 26%.Among the Northwest Tibetan and Xinjiang Kyrgyz populations, the presence of E1b1b is less common, with percentages of 20% and 4%, respectively.The next most prevalent haplogroups in all Chinese populations, except for the Kyrgyz group, are O1b, O2, or Q