Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system
BMC Genomics volume 16, Article number: 61 (2015)
The giant panda (Ailuropoda melanoleuca) is a critically endangered species endemic to China. Microsatellites are among the most widely used molecular markers and have proven effective for estimating population size, testing paternity and assessing genetic diversity in critically endangered species. The availability of the complete giant panda genome sequence makes it possible to carry out genome-wide scans for all types of microsatellite markers, which opens the way for the analysis and development of microsatellites in this species.
By mining the whole genome sequence of the giant panda in silico, we identified microsatellites and analyzed their frequency and distribution in different genomic regions. Based on our search criteria, a repertoire of 855,058 SSRs was detected, with mono-nucleotides being the most abundant. SSRs were found in all genomic regions and were more abundant in non-coding regions than in coding regions. A total of 160 primer pairs were designed from selected tetranucleotide microsatellite sequences to screen for polymorphic microsatellites. Genotyping blood DNA from 22 captive giant pandas yielded 51 novel polymorphic tetranucleotide microsatellite loci. Finally, 15 markers that showed good polymorphism, stability and repeatability in faecal samples were used to establish the novel microsatellite marker system for the giant panda. A genotyping database for the Chengdu captive giant pandas (n = 57) was set up using this standardized system. In addition, as applications of the marker system, a universal individual identification method was established and genetic diversity was analysed.
Microsatellite abundance and diversity were characterized in the giant panda genome. A total of 154,677 tetranucleotide microsatellites were identified, 15 of which proved to be polymorphic and stable loci. The individual identification method and the genetic diversity analysis presented in this study provide adequate material for future studies of the giant panda.
The giant panda (Ailuropoda melanoleuca), a global icon of biodiversity conservation, is threatened by human population expansion and ongoing habitat loss and is often cited as one of the most endangered species in the world [1,2]. The international community and the Chinese government have made great efforts to protect this species in recent years. However, some urgent problems remain unsolved.
Currently, the conservation strategy for the giant panda covers both captive and wild pandas. By 2013, the captive population had grown to 376 individuals, more than 200 of which were living at the Chengdu Research Base of Giant Panda Breeding (Chengdu, China) and the China Research and Conservation Center for the Giant Panda (Wolong, China). The paternity of panda offspring bred in captivity is uncertain because a female in estrus is often artificially inseminated with sperm from multiple males. As a result, it has been impossible to maintain an accurate studbook, and an accurate paternity assignment method needs to be established for the captive population. In addition, the design of conservation strategies for the wild panda population is limited by the lack of information on the population’s genetics. Although microsatellite analyses [3-8] have been used to assess genetic variability and estimate population size for giant pandas, the genetic status of the giant panda is still a matter of significant controversy. For example, some researchers concluded that wild populations might have low genetic variability [9-11], while Lu et al. and Zhang et al. concluded that wild populations might maintain high genetic variation. However, it is difficult to compare these results because the studies used different microsatellites, which has confused conservation managers trying to design effective conservation strategies. A universal genetic marker system that is both powerful and repeatable would therefore make it easier for different researchers to compare results. Although nearly 100 microsatellite markers have already been developed for the giant panda [3,8,12-16], most of them are dinucleotide repeats. Dinucleotide microsatellites are easily mistyped because of polymerase slippage during polymerase chain reaction (PCR) [17,18].
This problem is especially acute when template DNA is of low quality or concentration, as with faecal samples or degraded tissue samples [9,19,20], and high-quality samples are very difficult to obtain from the wild. Schlotterer and Tautz also found that the generation of false alleles from polymerase slippage is greatest with di-, lower with tri-, and does not occur with tetranucleotide loci. In general, tetranucleotide repeats tend to stutter less than trinucleotide and dinucleotide repeats and are much more accurate and reliable [22,23]; they have become the markers of choice and are widely used in human paternity test kits [24,25]. Disappointingly, only 15 of the published markers were tetranucleotide repeats, all with the single motif (GATA)n, and almost none of them have been used in genetic studies of wild pandas because most did not work with non-invasive samples. In this study, we focused on developing microsatellites with high levels of polymorphism, strong stability, good repeatability and a very low genotyping error rate, so that they can be widely used in giant panda studies. We therefore concentrated on tetranucleotide microsatellites to establish a universal genetic marker system.
Classically, microsatellite development requires substantial technical effort to construct enriched microsatellite libraries, including cloning, hybridization to detect positive clones, plasmid isolation, and Sanger sequencing. Most of these steps are expensive, time-consuming, or both. Moreover, traditional enrichment-based approaches for isolating microsatellite loci require a priori choices about which types of loci to target (both repeat size and repeat motif sequence), which ultimately limits success in obtaining sufficient numbers of different types of useful microsatellite loci. Fortunately, the availability of the complete giant panda genome sequence provides the opportunity to carry out genome-wide scans for all types of microsatellite markers, an approach that is much cheaper, more efficient and more successful than the previous methods. Consequently, microsatellites with a diversity of repeat motif types can be identified in order to establish a universal genetic marker system for the giant panda.
Here we employ a method that allows the rapid and efficient development of microsatellite markers for the giant panda by mining its whole genome sequence in silico. A large number of perfect microsatellite sequences covering many different repeat motif types were discovered. Moreover, the frequency and distribution of these microsatellites in different genomic regions were analyzed and an integrative database of tetranucleotide microsatellite markers was developed. The 51 novel polymorphic tetranucleotide microsatellite loci screened from the database were further tested with faecal samples to establish the universal genetic marker system for giant pandas. Furthermore, a universal individual identification method was established, which is particularly useful for assessing the population size of wild giant pandas. We also analyzed the genetic diversity of the Chengdu captive giant panda population.
SSR frequency and distribution in the giant panda genome
A total of 855,058 SSRs were identified in the giant panda genome assembly (Table 1). The relative abundance was 372 SSRs/Mb. Mono-nucleotides were the most abundant category, accounting for 48.56% of all of the SSRs, followed by di-nucleotides (26.17%) and tetra-nucleotides (18.09%). In contrast, tri-nucleotides and penta-nucleotides were less abundant.
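As a quick consistency check on these figures, the Python sketch below recomputes the tetra-nucleotide share and the sequence length implied by the reported abundance. The two counts and the 372 SSRs/Mb value are taken from the text; the function names are illustrative, and the implied length is a derived figure, not one reported by the authors.

```python
# Counts and abundance as reported in the text; everything else is derived.
TOTAL_SSRS = 855_058
TETRA_SSRS = 154_677
ABUNDANCE_PER_MB = 372  # SSRs per Mb


def relative_abundance(ssr_count, sequence_length_bp):
    """Number of SSRs per Mb of analysed sequence."""
    return ssr_count / (sequence_length_bp / 1_000_000)


def share_percent(part, whole):
    """Percentage of `whole` accounted for by `part`."""
    return 100.0 * part / whole


tetra_share = share_percent(TETRA_SSRS, TOTAL_SSRS)
# Analysed length implied by count / abundance (a derived value, in Mb).
implied_length_mb = TOTAL_SSRS / ABUNDANCE_PER_MB

print(f"tetra-nucleotide share: {tetra_share:.2f}%")
print(f"implied analysed length: {implied_length_mb:.0f} Mb")
```

The tetra-nucleotide share recomputed this way agrees with the 18.09% quoted above.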
Among the mono-nucleotide repeats, (A)n was the most abundant while (C)n was comparatively scarce (Table 2). In the di-nucleotide repeat category, (AC)n and (AG)n were the two most frequent microsatellite motifs. Over 50% of the tri-nucleotide repeats in the panda genome were (AAC)n and (AAT)n. The most abundant tetra- and penta-nucleotide motifs were (AAAT)n and (AAACA)n, which comprised about 42.03% and 31.76% of the microsatellites in these two repeat categories, respectively. (AAACAA)n was the most frequent hexa-nucleotide motif. Nearly all of the most frequent motifs were A-rich.
Densities of SSRs and relative abundances of the different microsatellite length classes (i.e., mono-, di-, tri- up to hexa-nucleotides) across the different regions of the giant panda genome are presented in Table 3. SSRs were more abundant in intergenic regions (413,585 SSRs) than in introns (270,247 SSRs) and TEs (243,474 SSRs). For each length category, more than 40% of the microsatellites were located in intergenic regions. Over 80% of the microsatellites in CDSs were tri- and hexa-nucleotides. Penta-nucleotides and hexa-nucleotides were rare in 5′UTRs, CDSs and 3′UTRs.
The 13 most abundant microsatellite classes were An, Cn, (AG)n, (AC)n, (AT)n, (AAC)n, (AAT)n, (AAAT)n, (AAAG)n, (AAAC)n, (AAGG)n, (AGAT)n and (AAACA)n (Figure 1). Together, they comprised 90.1% of all microsatellites identified. The numbers of each tetra-nucleotide repeat motif are summarized in Figure 2; the motifs shown comprised 97.42% of all tetra-nucleotide microsatellites identified in the giant panda genome.
Development of microsatellite markers
There were 154,677 tetranucleotide microsatellite sequences identified in the giant panda genome. Following the selection criteria, a total of 3,280 ‘potentially amplifiable loci’ with a repeat number in the range of 10 to 22 were isolated. A total of 336 candidate sequences that were suitable for primer design (i.e., the flanking sequences should be long enough and not be single-copy sequences) were chosen to develop an integrative database of tetranucleotide microsatellite markers for the giant panda. We designed and synthesized 160 primer pairs for amplification, targeting as many of the SSR motifs as possible (see Additional file 1: Table S1). After amplification, the 61 loci that each showed a single band of the expected size were considered further. The forward primers of these 61 loci were labelled with different fluorescent dyes and then used to genotype 22 captive giant pandas (Chengdu, blood DNA). Loci that failed to provide clear signals in the expected size range or that lacked polymorphism were not considered further. Finally, 54 novel tetranucleotide microsatellite loci were discovered for the giant panda (see Additional file 2: Table S2).
The raw sequence data of these 54 loci were deposited in NCBI (GenBank accession numbers KF907130–KF907183). Additional file 2: Table S2 presents the characteristics of the 54 tetranucleotide microsatellites discovered in this study. A total of 246 alleles were identified, and the number of alleles per locus ranged from 2 to 10. The observed and expected heterozygosities at each locus ranged from 0.091 to 0.909 and from 0.089 to 0.873, respectively (see Additional file 2: Table S2). The PIC ranged from 0.083 to 0.836 with an average of 0.533. We used the Micro-Checker software to test for genotyping errors such as null alleles, large allele dropout or stuttering. There was no evidence of large allele dropout or null alleles in the data set, and all the loci were neutral, which indicated that the data were suitable for further analysis. Loci that fail to meet Hardy–Weinberg equilibrium (HWE) can bias population genetic analysis. The HWE test indicated that 3 of the 54 newly discovered loci deviated significantly from HWE (P < 0.01, see Additional file 2: Table S2); these loci were therefore discarded. Linkage disequilibrium (LD) also influences population genetic analysis. However, it is not clear whether the 54 new tetranucleotide microsatellite loci are on different chromosomes, because the genome sequences of the giant panda were assembled into scaffolds but not assigned to chromosomes. Loci located on different scaffolds were therefore preferred in order to keep the influence of LD low (see Additional file 3: Table S3).
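For readers unfamiliar with the per-locus statistics reported above, the following Python sketch shows the textbook definitions of observed heterozygosity (Ho), expected heterozygosity (He = 1 − Σp²) and polymorphic information content (PIC). The text does not state which estimator or software produced the published values, so this is an illustration under that assumption, not a reconstruction of the authors' calculation; genotypes are assumed to be pairs of allele sizes.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2); no small-sample correction is applied here."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    cross = sum(2 * (x * x) * (y * y) for x, y in combinations(p, 2))
    return 1.0 - homozygosity - cross
```

With two equally frequent alleles, for example, these definitions give He = 0.5 and PIC = 0.375, which illustrates why PIC is always somewhat lower than He for the same locus.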
A universal genetic marker system based on microsatellites
The remaining 51 novel polymorphic tetranucleotide microsatellite loci were further tested to establish the universal genetic marker system for the giant panda. Because the system will be used for wild giant pandas, the standard loci must work with non-invasive samples. Therefore, 30 faecal DNA samples from captive giant pandas (Chengdu) were used to test the sensitivity and quality of the 51 loci. The amplification success rates of 9 loci (GPL-10, GPL-21, GPL-26, GPL-37, GPL-58, gpz-11, gpz-25, gpz-3, gpz-40) were below 50% in faecal samples, which means that these loci responded poorly to faecal DNA and were therefore not suitable for faecal DNA analysis. Another 10 loci (GPL-1, GPL-7, GPL11, gpz-50, GPL-75, gpz-36, gpz-26, gpz-48, gpz-50, gpz-55) were rejected due to a lack of polymorphism. In addition, repeat tests indicated multiple or false amplification in 16 loci. The genotypes of the remaining 16 loci were compared between faecal DNA and blood DNA (n = 15) for a further stability analysis. All loci except one (GPL-12) were confirmed to be stable and reliable because no difference was found between the genotypes of the 15 pairs of matched samples, which indicated no genotyping error for the remaining 15 loci. A test of the relationship between the exposure time of faecal samples and the stability of the loci showed that these loci can be used with faecal samples that have been exposed to the wild environment for as long as five weeks. In particular, the loci gpz-47 and gpz-06, which were the most easily amplified by PCR, were the most stable and responsive of the 15 markers. The 15 markers (Table 4), which showed high levels of polymorphism, strong stability and good repeatability, were used to genotype the remaining 27 faecal samples and to build a database for the Chengdu captive giant pandas (n = 57).
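The two screening checks described above (amplification success in faecal DNA and concordance between matched blood and faecal genotypes) can be expressed compactly. The sketch below is a minimal illustration, assuming each locus is represented as a list of genotypes with `None` marking a failed amplification; the function names and data layout are hypothetical and are not taken from the study.

```python
def amplification_success_rate(results):
    """Fraction of samples that yielded a genotype; None marks a failed PCR."""
    return sum(g is not None for g in results) / len(results)


def passes_faecal_screen(results, min_success=0.5):
    """Keep a locus unless its faecal success rate is below 50%, as in the text."""
    return amplification_success_rate(results) >= min_success


def is_concordant(blood, faecal):
    """True if every matched blood/faecal pair carries the same two alleles.

    A failed amplification in either sample counts as a mismatch.
    """
    return all(
        b is not None and f is not None and sorted(b) == sorted(f)
        for b, f in zip(blood, faecal)
    )
```

A locus would then be retained only if `passes_faecal_screen` holds for the faecal panel and `is_concordant` holds for the matched pairs; the polymorphism and repeat-amplification checks mentioned in the text are not modelled here.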
The establishment of a universal individual identification method
As one application of this microsatellite system, a universal individual identification method was established in the present study. Loci with higher expected heterozygosity (He) are more useful for individual identification. The He of the 15 selected loci ranged from a high of 0.819 to a low of 0.380 (Table 4; the first two loci listed were the most stable and responsive in this study). The number of microsatellite markers used for individual identification is extremely important because it has consequences for all subsequent analyses [17,29]. Too many markers can increase genotyping errors, false genotypes and overestimation of population size, whereas too few markers lead to underestimation. For the purpose of individual identification, the question is therefore how many loci should be used. The probability of identity (PID) developed by Waits et al. was the earlier preferred measure for individual identification; however, PIDsib (PID among sibs) gives a more conservative estimate of the minimum number of loci necessary to distinguish individuals, with a threshold of PIDsib < 0.01. To determine the minimum number of loci required for accurate individual identification of giant pandas, we started with the two most stable and responsive loci (gpz-47 and gpz-06) and investigated how the PIDsib values for the giant panda samples changed as the other 13 loci were added in order of He. Using the GIMLET program, we calculated PID(sibs) curves based on 10 loci for the captive population using different kinds of samples (Chengdu blood = 22, Chengdu faecal = 57). The curves began with gpz-47 and gpz-06, and the remaining loci were then added one by one, most informative first. The analysis revealed that a subset of six loci was enough for accurate individual identification (PIDsib < 0.01; the first six loci in Table 4) (Figure 3).
Individual identification simulations were conducted with the six loci using the CERVUS 3.0 software. The results indicated that the set of six loci is effective for individual identification among the 57 captive giant pandas. Furthermore, an individual identification test was conducted using 13 captive panda faecal samples without prior background information. The genotyping results indicated that the 13 faeces came from 11 giant pandas, which was in agreement with the records (see Additional file 4: Table S4). In addition, 60 faecal samples from wild giant pandas in the Wolong Nature Conservation Centre were assigned to 22 unique individuals, which showed that the method is also effective for wild faecal samples (see Additional file 5: Table S5).
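The way a cumulative PIDsib curve is built can be sketched in a few lines of Python. The per-locus expression used here is the commonly cited sibling form, PIDsib = 0.25 + 0.5 Σp² + 0.5 (Σp²)² − 0.25 Σp⁴, and the multi-locus value is taken as the product across loci, which assumes the loci are independent. This is an illustrative sketch, not the GIMLET implementation, and the function names are hypothetical.

```python
def pid_sib(freqs):
    """Single-locus probability of identity among siblings from allele frequencies."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative_pid_sib(loci):
    """Running product of single-locus values, adding loci in the given order."""
    values = []
    product = 1.0
    for freqs in loci:
        product *= pid_sib(freqs)
        values.append(product)
    return values


def loci_needed(loci, threshold=0.01):
    """Smallest number of loci whose cumulative PIDsib falls below the threshold.

    Returns None if the threshold is never reached with the loci supplied.
    """
    for n, value in enumerate(cumulative_pid_sib(loci), start=1):
        if value < threshold:
            return n
    return None
```

Passing the allele-frequency lists in descending order of informativeness reproduces the procedure described above of adding the most informative loci first; the real allele frequencies would come from the genotype database and are not reproduced here.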
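At its simplest, assigning faecal samples to individuals amounts to grouping samples that share the same multilocus genotype. The Python sketch below illustrates that idea with exact matching only; it is an assumption-laden simplification (no allowance for missing data or genotyping mismatches) and does not reproduce the CERVUS analysis used in the study. Sample IDs and the data layout are hypothetical.

```python
from collections import defaultdict


def normalise(genotype):
    """Sort the two alleles at each locus so (a, b) and (b, a) compare equal."""
    return tuple(tuple(sorted(locus)) for locus in genotype)


def group_samples(samples):
    """Group sample IDs that share an identical multilocus genotype.

    `samples` maps a sample ID to a sequence of (allele, allele) pairs,
    one pair per locus, with loci in the same order for every sample.
    """
    groups = defaultdict(list)
    for sample_id, genotype in samples.items():
        groups[normalise(genotype)].append(sample_id)
    return list(groups.values())
```

The number of groups returned is the number of distinct genotypes, which is how a count such as "13 faeces from 11 individuals" would arise when two pairs of samples match exactly.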
Genetic diversity of Chengdu captive population
The genetic diversity of the Chengdu captive giant panda population was analyzed using the 15 tetranucleotide microsatellite loci. As shown in Table 4, a total of 70 alleles were identified among the 57 giant pandas. The number of alleles per locus ranged from three (GPL-28, GPL-44, GPL-31, gpy-20) to a maximum of ten (gpz-20). Allelic richness (AR) at each locus ranged from 2.995 to 10.000 alleles. The mean AR in this study (4.660) is much higher than that reported by Shen et al. (mean AR = 3.957). The observed and expected heterozygosities ranged from 0.382 to 0.849 and from 0.380 to 0.819, respectively. Mean Ho and He were 0.615 and 0.598, respectively, both slightly lower than those of Shen et al. (2009) (Ho = 0.671, He = 0.634). Heterozygosity varied widely among loci. The mean polymorphic information content (PIC) was 0.541 (ranging from 0.362 to 0.783) in this captive population. HWE tests revealed that none of the loci deviated from HWE in this captive population (P > 0.01).
Genome-wide distribution and organization of microsatellites in the giant panda
In this study, we characterized the SSRs in the whole genome assembly of the giant panda and analyzed their frequency and distribution in different genomic regions. Most of the SSRs are mono-, di- and tri-nucleotides, accounting for up to 75% of all the SSRs identified. The distribution of microsatellites in the giant panda was in agreement with Li et al., who reported that di-nucleotides are the most common microsatellites in many organisms when mononucleotide repeats are not taken into account. In most genomes, motifs with short repeat units (mono- to tri-nucleotides) were more abundant than those with long repeat units, indicating that longer repeats correlate with higher instability.
Moreover, the SSRs identified in the different regions provided useful information about possible physical linkage between microsatellite loci. The highest relative abundance of SSRs was found in intergenic regions, followed by introns. These findings agree with prior studies showing that the majority of SSRs are embedded in non-coding DNA, either in intergenic sequences or in introns. Although the relative abundance of SSRs was lowest in exons, tri- and hexa-nucleotides predominated there, which was consistent with Labbe et al. and Qian et al. This bias may result from suppression of the other categories of SSRs in coding regions, which reduces the incidence of frameshift mutations caused by non-triplet repeats [39,40].
The genome-wide distribution and organization of SSRs revealed a non-random distribution of these repeats, which may be involved in genome plasticity. The wealth of genome-wide SSR motifs identified in the present study opens new perspectives for the development of a wide range of microsatellite markers in the panda genome. In particular, the tetranucleotide microsatellite data obtained in this study will be helpful in developing SSR markers that can be applied in the establishment of a universal marker system.
The establishment of a novel microsatellite marker system
In recent years, more and more researchers have become aware of the problem of microsatellite data quality and its consequences for population analyses [20,41,42]. Highly polymorphic microsatellite markers can easily suffer from mutations, allelic dropout, undetected null alleles and genotyping errors [20,43,44]. If null alleles exist at an SSR marker, a truly heterozygous individual might be scored as homozygous, leading to inaccurate and biased genetic estimates. Besides null alleles, researchers should be aware of selective neutrality, Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD). Different parts of the genome differ in mutation rate and in the selection pressure they experience, and microsatellites may vary in these respects. The 15 loci selected in this study all showed neutrality, no null alleles and no deviation from HWE, which supports their suitability as markers.
Unstable microsatellite markers, which readily produce erroneous genotypes, may cause mistakes in individual identification, paternity testing, and population structure and genetic diversity analyses. The problem is worse for DNA samples of poor quality, which may yield even more erroneous genotypes with such loci. In this study, we used an in silico approach to screen the whole genome sequence of the giant panda and selected the most stable loci from the large number of tetranucleotide microsatellite sequences. We designed 160 tetranucleotide microsatellite primer pairs, from which 51 novel loci showing good polymorphism, selective neutrality, no deviation from Hardy–Weinberg equilibrium (HWE) and high stability were selected based on blood DNA samples. However, one of the great challenges in giant panda research is that good samples are extremely difficult to obtain. Collecting blood from captive giant pandas is a complicated process that may adversely affect their health, which raises questions of research ethics for both the scientific community and the general public. It is even harder to obtain blood and muscle samples from wild giant pandas. Non-invasive genetic sampling, in which DNA is recovered from discarded sources such as shed hair and faeces, is a necessary alternative to tissue sampling of giant pandas. Although high concentrations of high-quality DNA from non-invasive samples would greatly reduce genotyping errors such as allelic dropout or false alleles [17-20], such DNA is very difficult to obtain. In all studies in which typing errors were checked, a non-negligible error rate from 0.2% to more than 15% per locus was reported. Even higher error rates are known to occur in studies using DNA of poor quality or low concentration [17,18], as is the case in non-invasive genotyping.
Therefore, the loci used to establish the novel microsatellite marker system must show a low error rate and respond well to non-invasive DNA. In previous studies, most loci were screened with blood or muscle DNA but almost never tested for responsiveness with non-invasive samples, with the result that a large number of wild faecal samples were abandoned because PCR amplification failed. In this study, the screening procedures for the 15 novel, highly stable and repeatable loci were relatively rigorous. First, non-invasive samples were used to test the sensitivity of the candidate markers, which ensured that the loci were responsive to DNA of low quality and concentration. In addition, the repeat tests conducted with faecal DNA confirmed the stability and reliability of the selected loci, which reduced the probability of genotyping error at the locus level. Moreover, the test of the relationship between the exposure time of faecal samples and the stability of the loci indicated that these loci can be used with wild samples exposed for up to five weeks. We therefore expect these 15 highly stable and repeatable loci to be widely and effectively used in future studies.
The application of the novel microsatellite marker system
Based on the 15 loci, we established the genotype database of the Chengdu captive giant pandas. The database lists the allele size range characteristic of each locus, which facilitates the accurate identification of genotypes in future studies. It contains the basic microsatellite genetic information for the Chengdu captive giant pandas and can be shared with other researchers to allow broader application. We would also welcome genetic information from other giant panda populations in order to improve the data further. In addition, universal genetic markers make it much more convenient to compare the genetic diversity of different giant panda populations and to understand their population structure.
Although China has conducted three national surveys to estimate the population size of wild giant pandas and has spent millions of dollars on them, the number is still controversial among researchers. Microsatellite analysis using faecal DNA has proven effective in estimating the population size of elusive animals, but erroneous genotypes at different loci may cause large deviations from the true result. Too many markers can increase genotyping errors and overestimation of population size, whereas too few markers lead to underestimation. Previous studies indicated that a single-locus error rate of 1% would add up to 10% using ten loci. Considering the maximum threshold of 5% of genotyping errors in population size estimation, one way to minimize potential sources of error is to reduce the number of microsatellite markers used. In most other studies on wildlife forensics, six to ten microsatellite markers are commonly used [50-52]. In any case, the minimum subset of microsatellite loci must retain sufficient discriminating power for accurate individual identification [31,47]. Following Waits et al., the value of PIDsib was used as a bound to estimate the minimum number of loci necessary to distinguish between individuals. A subset of six microsatellite loci was enough for accurate individual identification of giant pandas in this study (PIDsib < 0.01). The individual identification tests on faeces from captive and wild pandas further indicated that this subset of loci is usable and quite effective for accurate individual identification. Moreover, we would like to encourage the use of this method to establish a shared microsatellite database for wild giant pandas to facilitate and enhance further research on the species. All researchers could add the data of new individuals to the database.
Genetic information about this species would then accumulate more rapidly, making it more convenient for researchers based at different sites to study important ecological questions for wild giant pandas (such as population size, population dynamics, breeding behaviour, habitat use, and home range size).
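The trade-off discussed above between the number of loci and the cumulative genotyping error can be made concrete with a short calculation. The Python sketch below assumes that errors at different loci are independent and share the same per-locus rate, which is a simplifying assumption introduced here for illustration; the function name is hypothetical.

```python
def multilocus_error_rate(per_locus_error, n_loci):
    """Probability that at least one of n independent loci is mistyped."""
    return 1.0 - (1.0 - per_locus_error) ** n_loci


# Illustrative per-locus rate of 1%, the value quoted in the text.
for n in (6, 10, 15):
    rate = multilocus_error_rate(0.01, n)
    print(f"{n:2d} loci: {rate:.3f}")
```

For ten loci at 1% per locus the result is a little under 10%, in line with the figure quoted from previous studies; the calculation also shows how quickly the multi-locus error grows as loci are added.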
Besides individual identification, the genetic diversity of the Chengdu captive giant panda population was analysed as another application of the marker system. The analysis demonstrated that the markers developed in this study are effective for genetic diversity analyses. The mean allelic richness of the Chengdu captive population in our study was much higher than that reported by Li et al. and Shen et al. However, the level of heterozygosity was similar, which suggests that the loci developed in this study carry a higher number of alleles. One of the aims of conservation programs is to conserve genetic diversity over long periods, because genetic diversity is essential for preserving the evolutionary potential that allows a population to adapt to changing environments. Therefore, monitoring genetic diversity in different populations using high-quality markers is needed to support the long-term persistence of this species.
This analysis of microsatellites in the completely sequenced panda genome provides a snapshot of the differential coverage and density of 1–6 bp repeats in this species. In particular, mono-, di- and tri-nucleotide repeats account for nearly 75% of all the SSRs identified. The majority of SSRs were embedded in non-coding DNA, and tri- and hexa-nucleotides predominated in exons. We focused on the 154,677 tetranucleotide microsatellites because they are much more accurate and reliable than di- and tri-nucleotide microsatellites. The final 51 novel polymorphic tetranucleotide microsatellite loci were further tested with faecal samples to establish the universal genetic marker system for giant pandas. The individual identification method established on the basis of these loci is particularly useful for assessing the population size of wild giant pandas. Moreover, the effectiveness of this marker system in analysing the genetic diversity of one captive giant panda population should encourage studies of other populations. The development of large sets of markers should in turn facilitate population genetic research on the giant panda.
Sample collection and DNA preparation
Faecal and blood samples were collected from the Chengdu Research Base for Giant Panda Breeding (Chengdu, faeces = 57, blood = 22) and the China Research and Conservation Centre for the Giant Panda in the Wolong Nature Reserve, Sichuan Province (Wolong, faeces = 61). These animals included close relatives such as siblings, which were necessary for standardization of the final set of loci used for individual identification. Matched samples of blood and fresh faeces (n = 15) were included among the samples from Chengdu for the stability analysis of the markers. Captive faecal samples (n = 13) without any prior background information were collected from Chengdu for individual identification tests. Wild giant panda faecal samples (n = 60) were collected from Wolong Nature Reserve at the beginning of 2013. To reduce the chance of sampling the same individual more than once, different samples were not collected from the same home range. In addition, the locations of all samples were recorded by GPS and mapped using Arcview 3.2a.
All blood samples were obtained during yearly routine blood tests for panda health. All samples were collected in accordance with the regulations for the implementation of China on the protection of terrestrial wild animals (State Council Decree No.13), and collection was approved by the Wildlife Protection Office, Sichuan Provincial Forestry Department (China). Blood and faecal samples were carefully collected to avoid contamination and preserved in EDTA Vacutainers and sterile bags, respectively. All samples were frozen at −20°C. Total genomic DNA was extracted from blood and faecal samples using the commercially available Qiagen DNeasy Blood & Tissue Kit and Qiagen QIAamp Stool Mini Kit, respectively, according to the manufacturer’s instructions with some optimizations.
Genome sequences and SSR identification
The entire genome of the giant panda was downloaded from UCSC Genome Bioinformatics (http://genome.ucsc.edu/). The sequences of the gene models, introns, coding sequences (CDSs), 5′ untranslated regions (5′ UTRs), 3′ untranslated regions (3′ UTRs), transposable elements (TEs) and intergenic regions were generated according to their positions in the genome annotations. Intergenic regions were defined as the genomic regions not included in introns, CDSs, UTRs or TEs. Genome sequences were scanned for microsatellite content using the program MSEA v2.3 (http://code.google.com/p/msdb). Detection was restricted to perfect SSRs (i.e., those with uninterrupted repeats and compound motifs) of 1–6 bp with a minimum repeat number of 12, 7, 5, 4, 4 and 4 for mono-, di-, tri-, tetra-, penta- and hexa-nucleotide microsatellites, respectively. Repeats whose unit patterns are circular permutations and/or reverse complements of one another were considered as one type in this study [27,57]. For example, the AGG type contains AGG, GGA, GAG, CCT, CTC and TCC in different reading frames or on the complementary strand. To facilitate comparison among different repeat categories or genomic regions, we used the relative abundance, defined as the number of SSRs per Mb of sequence analyzed, and the relative density, defined as the SSR length (in bp) per Mb of sequence analyzed [37,58].
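The detection criteria and the motif-grouping rule described above can be illustrated with a small Python sketch. It is a regex-based stand-in written for this explanation, not the algorithm of the scanning program used in the study: the minimum repeat numbers come from the text, while the function names, the handling of redundant motifs (for example ATAT being treated as AT) and the choice of the alphabetically smallest form as the class label are assumptions.

```python
import re

# Minimum repeat numbers per unit length (mono- to hexa-), as given in the text.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Smallest string among all rotations of the motif and its reverse complement."""
    revcomp = motif.translate(COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(seq):
    """Return (start, end, canonical motif, repeat count) for each perfect SSR."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = (match.end() - match.start()) // unit
                hits.append((match.start(), match.end(), canonical_motif(motif), repeats))
    return hits
```

With this labelling, all six forms listed in the AGG example map to the same class, and a (GATA)n repeat is reported under (AGAT)n, matching the motif names used in the results.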
Development of SSR markers
The flanking regions of the microsatellites (200 bp on either side) were extracted from the program output in order to design primer sets for the identified loci. These sequences were then manually scanned and filtered according to the following criteria: (1) the repeats should be tetranucleotide repeats; (2) the microsatellites should not be among previously published repeat sequences; (3) the number of repeats should be in the range of 10–22; (4) the flanking sequences should be single-copy sequences and long enough to design primers (i.e., more than 20 bp). A large number of 400–500 bp sequences containing a tetranucleotide microsatellite of interest were extracted and compared with the 15 previously published tetranucleotide microsatellite sequences using the software Clustal_X 1.83. We used Primer 3 to design primers to amplify the selected sequences. The primers designed in the present study were 17–27 bp long, with a maximum of three degenerate positions and an expected product size of 100–400 bp. We then tested the primers for reproducible amplification in three giant pandas under standard PCR conditions, with annealing temperatures adjusted according to the primer sequence. During optimization, we tested whether amplification was improved by increasing or decreasing the MgCl2 concentration, or by raising or lowering the annealing temperature.
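The four selection criteria can be expressed as a simple filter. The sketch below is only an illustration: the `Candidate` record and its field names are hypothetical, the two boolean flags stand in for checks that were done manually or by sequence comparison in this study, and criterion (4) is read here as requiring single-copy flanks, which is an assumption about the intended meaning.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate locus (hypothetical record layout)."""
    locus_id: str
    motif: str
    repeats: int
    left_flank: str
    right_flank: str
    previously_published: bool  # assumed to come from a sequence comparison
    single_copy_flanks: bool    # assumed to come from a copy-number check


def passes_marker_criteria(c, min_repeats=10, max_repeats=22, min_flank=20):
    """Apply criteria (1)-(4) from the text to one candidate locus."""
    return (
        len(c.motif) == 4                               # (1) tetranucleotide
        and not c.previously_published                  # (2) novel locus
        and min_repeats <= c.repeats <= max_repeats     # (3) 10-22 repeats
        and c.single_copy_flanks                        # (4) single-copy flanks
        and len(c.left_flank) > min_flank               # (4) room for primers
        and len(c.right_flank) > min_flank
    )


def select_candidates(candidates):
    """Return the candidates that satisfy all criteria."""
    return [c for c in candidates if passes_marker_criteria(c)]
```

Keeping the thresholds as keyword arguments makes it easy to see how the number of retained loci changes if, for example, a different repeat range is tried.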
Polymorphic microsatellite isolation
After optimization, the primers that produced a single band of the expected size were selected, and their forward primers were labelled with one of three fluorescent dyes (FAM, TAMRA or HEX) (Invitrogen Shanghai Sangon Biological Engineer Technology & Services, Shanghai, China) for fragment analysis on Applied Biosystems 3100 Genetic Analyzers. Blood DNA from 22 captive giant pandas was used to evaluate the ability of the primer pairs to amplify polymorphic bands. PCR amplifications were carried out in 25 μL reaction mixtures comprising approximately 50 ng of template DNA, 1.5–2 mM MgCl2 (TaKaRa, Japan), 200 μM of each dNTP, 15 pmol of each primer and 0.3 U of Ampli Taq DNA polymerase (TaKaRa, Japan). Amplifications were performed using the following PCR procedure: an initial denaturation step of 5 min at 95°C, followed by 35 cycles of 45 s at 95°C, 30 s at the locus-specific annealing temperature (55–65°C) and 50 s at 72°C, and a final elongation of 10 min at 72°C. For genotyping, the PCR amplification products were separated by capillary electrophoresis using a denaturing acrylamide gel matrix on an ABI PRISM 377 Genetic Analyser (Applied Biosystems) with the GeneScan TAMRA 350 internal size standard (ABI). Alleles were detected using the GeneScan/Genotyper software package of Applied Biosystems. Markers with a strong tendency to form stutter peaks were excluded at this step. The remaining markers were considered to have amplified successfully if (a) the expected PCR products were observed for more than 90% of the 22 samples investigated and (b) the number of bands did not exceed the ploidy of any individual sampled (diploid in the giant panda). In addition, microsatellite fragments were considered to be of the expected size if their length was within ±30% of the target sequence length.
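The acceptance rules for a marker in blood DNA can be summarised in a few lines of Python. This is a minimal sketch rather than the procedure actually used for scoring: the function name and input layout are hypothetical, the stutter check is not modelled, and the ±30% size window is treated here as a hard requirement, which is an assumption.

```python
def locus_amplifies_well(genotypes, expected_size,
                         min_success=0.90, ploidy=2, size_tolerance=0.30):
    """Check one locus against the screening rules described in the text.

    genotypes: one entry per sample, each a list of fragment sizes in bp
               (an empty list means the sample failed to amplify).
    expected_size: target fragment length in bp from the primer design.
    """
    if not genotypes:
        return False
    scored = [g for g in genotypes if g]

    # (a) products observed in more than 90% of the samples
    success_rate = len(scored) / len(genotypes)

    # (b) no sample shows more distinct bands than the ploidy allows
    within_ploidy = all(len(set(g)) <= ploidy for g in scored)

    # fragment sizes within +/-30% of the target length
    low = expected_size * (1 - size_tolerance)
    high = expected_size * (1 + size_tolerance)
    sizes_ok = all(low <= size <= high for g in scored for size in g)

    return success_rate > min_success and within_ploidy and sizes_ok
```

With 22 samples, the strict "more than 90%" rule means that at least 20 samples must amplify (20/22 is about 91%), while 19 of 22 (about 86%) would fail.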
High-sensitivity polymorphic microsatellites
Faecal DNA from 30 captive giant pandas was used to test whether the polymorphic markers were sensitive enough to be applied to faecal DNA. To counteract the PCR inhibitors in faecal DNA, bovine serum albumin (BSA) was added to the PCR mixture. Markers were excluded if (a) the amplification success rate was less than 50% in faecal samples or (b) they produced many stutter peaks, as for the blood samples above. Meanwhile, the ‘multi-tube procedure’ was used to test the tendency for genotyping errors at these microsatellite loci.
The stability of these microsatellites
The genotyping results of blood and faecal DNA obtained from the same panda (15 pandas in total) were compared to evaluate the reliability and stability of these microsatellites. Moreover, to test whether these loci can be applied to faecal samples after long exposure to the wild environment (time gradient: one to seven weeks), the relationship between exposure time and the stability of the loci was tested in the present study.
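A comparison of paired blood and faecal genotypes could be tabulated along the following lines. This is only a sketch under stated assumptions: the blood genotype is taken as the reference, the function names, outcome labels and dictionary layout are hypothetical, and the classification of a mismatch as allelic dropout follows the general usage in the noninvasive genotyping literature rather than a rule given in this study.

```python
from collections import Counter


def compare_genotypes(reference, test):
    """Classify a faecal genotype against the blood genotype of the same
    animal. Genotypes are pairs of allele sizes; None means no product."""
    if test is None:
        return "no_amplification"
    ref, obs = tuple(sorted(reference)), tuple(sorted(test))
    if ref == obs:
        return "match"
    # Heterozygote in blood scored as a homozygote for one of its alleles.
    if ref[0] != ref[1] and obs[0] == obs[1] and obs[0] in ref:
        return "allelic_dropout"
    return "other_mismatch"


def concordance_rate(blood, faecal):
    """blood, faecal: dicts mapping (animal, locus) -> genotype.

    Returns (fraction of scored faecal genotypes that match, outcome counts).
    """
    outcomes = Counter(
        compare_genotypes(blood[key], faecal.get(key)) for key in blood
    )
    scored = sum(outcomes.values()) - outcomes["no_amplification"]
    rate = outcomes["match"] / scored if scored else float("nan")
    return rate, outcomes
```

Running the same tabulation separately for each exposure time would give one concordance rate per time point, which is one simple way to look at stability over the one to seven week gradient.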
The application of the novel microsatellite marker system
The markers that showed good polymorphism, repeatability and stability in both blood DNA and faecal DNA were used to genotype the remaining 27 faecal samples and to build a database for the Chengdu captive giant pandas (n = 57). Furthermore, we established a universal individual identification method based on the novel set of genetic markers. We also analysed the genetic diversity of the Chengdu captive giant panda population.
Statistical and genetic data analysis
We used Micro-Checker software to check for genotyping errors such as null alleles, large allele dropout or stuttering in the data set. The number of alleles (A), observed heterozygosity (Ho), expected heterozygosity (He) and polymorphic information content (PIC) were calculated, and paternity tests were performed, with CERVUS 3.0. Deviations from Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were tested using GENEPOP 3.4. To test the discrimination power of sets with different numbers of microsatellites, the probability of pairs of individuals bearing an identical multi-locus genotype (P(ID)) was computed using GIMLET 1.3.1. Because PIDsib, the P(ID) among full sibs, is more conservative, we used PIDsib as an upper limit to the probability that pairs of individuals would share the same genotype. Individual identification was analysed with CERVUS 3.0.
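For readers who want to see what these summary statistics measure, the sketch below computes them for a single locus from a list of diploid genotypes. It is not the code of CERVUS or GIMLET. It assumes the commonly used textbook forms: Nei's unbiased expected heterozygosity, the PIC of Botstein et al., and the P(ID) and P(ID)sib expressions summarised by Waits et al. (listed in the references), without small-sample corrections. All function names are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of (allele1, allele2) pairs for one locus."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def observed_heterozygosity(genotypes):
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Unbiased estimate: 2n/(2n-1) * (1 - sum p_i^2), n individuals."""
    p = allele_frequencies(genotypes).values()
    n2 = 2 * len(genotypes)
    return n2 / (n2 - 1) * (1 - sum(x * x for x in p))


def pic(freqs):
    """PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    p = list(freqs)
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1 - sum(x * x for x in p) - cross


def p_id(freqs):
    """P(ID) = sum p_i^4 + sum_{i<j} (2 p_i p_j)^2 = 2 (sum p_i^2)^2 - sum p_i^4."""
    p = list(freqs)
    s2, s4 = sum(x ** 2 for x in p), sum(x ** 4 for x in p)
    return 2 * s2 ** 2 - s4


def p_id_sib(freqs):
    """P(ID)sib = 0.25 + 0.5 sum p_i^2 + 0.5 (sum p_i^2)^2 - 0.25 sum p_i^4."""
    p = list(freqs)
    s2, s4 = sum(x ** 2 for x in p), sum(x ** 4 for x in p)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def multilocus_probability(per_locus_values):
    """Product over loci, assuming the loci are independent."""
    return math.prod(per_locus_values)
```

For a locus with two equally frequent alleles, these expressions give P(ID) = 0.375 and P(ID)sib = 0.59375, which shows why P(ID)sib is the more conservative bound and why several loci must be combined before individuals can be told apart reliably.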
DNA sequences: GenBank accessions KF907130–KF907183; see Additional file 2: Table S2 for details.
O’Brien SJ, Pan WS, Lu Z. Pandas, people and policy. Nature. 1994;369:179–180.
Ran JH, Du BB, Yue BS. Conservation of the Endangered giant panda (Ailuropoda melanoleuca) in China: successes and challenges. Oryx. 2009;43:176–178.
Lu Z, Johnson WE, Menotti-Raymond M. Patterns of genetic diversity in remaining giant panda populations. Conserv Biol. 2001;15:1596–1607.
Zhan XJ, Li M, Zhang ZJ, Goossens B, Chen Y, Wang H. Molecular censusing doubles giant panda population estimate in a key nature reserve. Curr Biol. 2006;16:451–452.
Zhang BW, Li M, Zhang ZJ, Goossens B, Zhu L, Zhang S. Genetic viability and population history of the giant panda, putting an end to the “Evolutionary Dead End”. Mol Biol Evol. 2007;24:1801–1810.
He W, Ling L, Shen FJ, Zhang W, Zhang Z, King E, et al. Genetic diversities of the giant panda (Ailuropoda melanoleuca) in Wanglang and Baoxing Nature Reserves. Conserv Genet. 2008;9:1541–1546.
Shen FJ, Zhang ZH, He W, Yue BS, Zhang A, Zhang L. Microsatellite variability reveals the necessity for genetic input from wild giant pandas (Ailuropoda melanoleuca) into the captive population. Mol Ecol. 2009;18:1061–1070.
Li YZ, Xu X, Shen FJ, Zhang WP, Zhang ZH, Hou R, et al. Development of new tetranucleotide microsatellite loci and assessment of genetic variation of giant panda in the two largest giant panda captive breeding populations. J Zool. 2010;282:39–46.
Su B, Shi L, He G, Zhang A, Song Y, Zhong S, et al. Genetic diversity in the giant panda: Evidence from protein electrophoresis. Chin Sci Bull. 1994;39:1305–1309.
Zhang Y, Ryder OA, Fan Z, Zhang H, He T, He G, et al. Sequence variation and genetic diversity of giant panda. Sci Chin Ser C. 1997;40:210–216.
Fang SG, Feng WH, Zhang AJ, Chen HW, Yu JQ, He LS, et al. The research on genetic diversity of the giant panda. In: Chengdu Zoo and Chengdu Research Base for Giant Panda Breeding ed. Proceeding of the International Symposium on the Protection of the Giant Panda (Ailuropoda melanoleuca). Chengdu, China: Sichuan Publishing House of Science and Technology; 1997. p. 141–147.
Zhang YP, Wang W, Su B. Microsatellite DNAs and kinship identification of giant panda. Zool Res. 1995;16:301–306. (In Chinese).
Shen FJ, Watts P, Zhang ZH, Zhang AJ, Sanderson S, Kemp SJ, et al. Enrichment of giant panda microsatellite markers using dynal magnet beads. Acta Genetica Sin. 2005;32:457–462. (In Chinese).
Shen FJ, Watts P, He W, Zhang ZH, Zhang AJ, Sanderson S, et al. Di-, tri- and tetranucleotide microsatellite loci for the giant panda, Ailuropoda melanoleuca. Mol Ecol Notes. 2007;7:1268–1270.
Zhang HM, Guo Y, Li DS, Wang PY, Fang SG. Sixteen novel microsatellite loci developed for the giant panda (Ailuropoda melanoleuca). Conserv Genet. 2009;10:589–592.
Wu H, Zhan XJ, Zhang ZJ, Zhu LF, Yan L, Li M, et al. Thirty-three microsatellite loci for noninvasive genetic studies of the giant panda (Ailuropoda melanoleuca). Conserv Genet. 2009;10:649–652.
Taberlet P, Waits LP, Luikart G. Noninvasive genetic sampling: Look before you leap. Trends Ecol Evol. 1999;14:323–327.
Taberlet P, Griffin S, Goossens B, Questiau S, Manceau V, Escaravage N, et al. Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Res. 1996;24:3189–3194.
Morin PA, Chambers KE, Boesch C, Vigilant L. Quantitative polymerase chain reaction analysis of DNA from noninvasive samples for accurate microsatellite genotyping of wild chimpanzees (Pan troglodytes verus). Mol Ecol. 2001;10:1835–1844.
Pompanon F, Bonin A, Bellemain E, Taberlet P. Genotyping errors: causes, consequences and solutions. Nat Rev Genet. 2005;6:847–859.
Schlotterer C, Tautz D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 1992;20:211–215.
Edwards A, Civitello A, Hammond HA, Caskey CT. DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am J Hum Genet. 1991;49:746–756.
Walsh PS, Fildes NJ, Reynolds R. Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA. Nucleic Acids Res. 1996;24:2807–2812.
Archie EA, Moss CJ, Alberts SC. Characterization of tetranucleotide microsatellite loci in the African savannah elephant (Loxodonta africana africana). Mol Ecol Notes. 2003;3:244–246.
Lu J, Riley R, Robertson M, Nelson L, Ward K. Tetranucleotide repeat polymorphism at D8S342, D8S323, D8S345, D8S315, and D8S347 loci on 8q. Hum Mol Genet. 1993;2:1743.
Castoe TA, Poole AW, Gu W, Jason AP, Daza JM, Smith EN, et al. Rapid identification of thousands of copperhead snake (Agkistrodon contortrix) microsatellite loci from modest amounts of 454 shotgun genome sequences. Mol Ecol Resour. 2010;10:341–347.
Li RQ, Fan W, Tian G, Zhu H, He L, Cai J. The sequence and de novo assembly of the giant panda genome. Nature. 2009;463:311–317.
Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P. Software for identifying and correcting genotype errors in microsatellite data. Mol Ecol Notes. 2004;4:535–538.
Broquet T, Menard N, Petit E. Non-invasive population genetics: a review of sample source, diet, fragment length and microsatellite motif effects on amplification success and genotyping error rates. Conserv Genet. 2007;8:249–260.
Creel S, Spong G, Sands JL, Rotella J, Zeigle J, Joe L, et al. Population size estimation in Yellowstone wolves with error-prone noninvasive microsatellite genotypes. Mol Ecol. 2003;12:2003–2009.
Knapp S, Craig B, Waits L. Incorporating genotyping error into non-invasive DNA-based mark-recapture population estimates. J Wildl Manag. 2009;73:598–604.
Waits J, Leberg P. Biases associated with population estimation using molecular tagging. Anim Conserv. 2000;3:191–199.
Waits LP, Luikart G, Taberlet P. Estimating the probability of identity among genotypes in natural populations: cautions and guidelines. Mol Ecol. 2001;10:249–256.
Valière N. GIMLET: a computer program for analysing genetic individual identification data. Mol Ecol Notes. 2002;2:377–379.
Marshall TC, Slate J, Kruuk L, Pemberton JM. Statistical confidence for likelihood-based paternity inference in natural populations. Mol Ecol. 1998;7:639–655.
Li YC, Korol AB, Fahima T, Beiles A, Nevo E. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol. 2002;11:2453–2465.
Qian J, Xu HB, Song JY, Xu J, Zhu YJ, Chen SL. Genome-wide analysis of simple sequence repeats in the model medicinal mushroom Ganoderma lucidum. Gene. 2013;512:331–336.
Ellegren H. Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 2004;5:435–445.
Labbe J, Murat C, Morin E, Le Tacon F, Martin F. Survey and analysis of simple sequence repeats in the Laccaria bicolor genome, with development of microsatellite markers. Curr Genet. 2011;57:75–88.
Metzgar D, Bytof J, Wills C. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 2000;10:72–80.
Bonin A, Bellemain E, Eidesen PB, Pompanon F, Brochmann C, Taberlet P. How to track and assess genotyping errors in population genetics studies. Mol Ecol. 2004;13:3261–3273.
Kalinowski ST, Taper ML, Marshall TC. Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol Ecol. 2007;16:1099–1106.
Pemberton JM, Slate J, Bancroft DR, Barrett JA. Nonamplifying alleles at microsatellite loci: a caution for parentage and population studies. Mol Ecol. 1995;4:249–252.
Chakraborty R, Li J, Zhong YX. Paternity evaluation in cases lacking a mother and nondetectable alleles. Int J Legal Med. 1994;107:127–131.
Gotoh RO, Tamate S, Yokoyama J, Tamate HB, Hanzawa N. Characterization of comparative genome-derived simple sequence repeats for acanthopterygian fishes. Mol Ecol Resour. 2013;13:461–472.
Selkoe KA, Toonen RJ. Microsatellites for ecologists: a practical guide to using and evaluating microsatellite markers. Ecol Lett. 2006;9:615–629.
Kolodziej K, Theissinger K, Brün J, Schulz HK, Schulz R. Determination of the minimum number of microsatellite markers for individual genotyping in wild boar (Sus scrofa) using a test with close relatives. Eur J Wildl Res. 2012;58:621–628.
Waits LP, Paetkau D. Noninvasive genetic sampling tools for wildlife biologists: a review of applications and recommendations for accurate data collection. J Wildl Manag. 2005;69:1419–1433.
Lukacs P, Burnham K. Review of capture–recapture methods applicable to noninvasive genetic sampling. Mol Ecol. 2005;14:3909–3919.
Hajkova P, Zemanova B, Roche K, Hajek B. An evaluation of field and noninvasive genetic methods for estimating Eurasian otter population size. Conserv Genet. 2009;10:1667–1681.
Marucco F, Pletscher D, Boitani L, Schwartz M, Pilgrim K, Lebreton J. Wolf survival and population trend using non-invasive capture–recapture techniques in the Western Alps. J Appl Ecol. 2009;46:1003–1010.
Wilson G, Frantz A, Pope L, Roper T, Burke T, Cheeseman C, et al. Estimation of badger abundance using faecal DNA typing. J Appl Ecol. 2003;40:658–666.
Frankham R, Ballou JD, Briscoe DA. Introduction to conservation genetics. Cambridge: Cambridge Univ. Press; 2002.
Wedrowicz F, Karsa M, Mosse J, Hogan FE. Reliable genotyping of the koala (Phascolarctos cinereus) using DNA isolated from a single faecal pellet. Mol Ecol Resour. 2013;13:634–641.
Du LM, Li YZ, Zhang XY, Yue BS. MSDB: A user-friendly program for reporting distribution and building databases of microsatellites from genome sequences. J Hered. 2013;104:154–157.
Demuth JP, Drury DW. Genome-wide survey of Tribolium castaneum microsatellites and description of 509 polymorphic markers. Mol Ecol Notes. 2007;7:1189–1195.
Jurka J, Pethiyagoda C. Simple repetitive DNA sequences from primates: compilation and analysis. J Mol Evol. 1995;40:120–126.
Karaoglu H, Lee CM, Meyer W. Survey of simple sequence repeats in completed fungal genomes. Mol Biol Evol. 2005;22:639–649.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The Clustal_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882.
Rozen S, Skaletsky HJ. Primer 3 on the www for general users and for biologist programmers. In: Krawetz S, Misener S, editors. Bioinformatics methods and protocols: methods in molecular biology. Totowa, NJ, USA: Humana Press; 2000. p. 365–386.
Raymond M, Rousset F. Genepop (version 1.2): population genetics software for exact tests and ecumenicism. J Hered. 1995;86:248–249.
This research was funded by the National Forestry Bureau of China (SG1412) and the National Science and Technology Support Project of China (2012BAC01B06). We would like to thank Dr. Yan Huang, Dr. Jindong Zhang and Hong Liu for their help in collecting samples. We also thank Mengyao Liu, Chaochao Yan, Yu Zhou, Ting Huang and Xuhao Song for their experimental assistance. We are grateful to Dr. Zhenxin Fan and Timothy Moermond for their helpful comments on the manuscript.
The authors declare that they have no competing interests.
JH performed the laboratory work and data analysis. YZL designed the primers and contributed to the laboratory work. LMD carried out the bioinformatics analyses. BY collected the samples and performed the experiments. FJS, HMZ, and ZHZ supervised the laboratory work. XYZ and BSY conceived of the study, and participated in its design. JH, YZL, LMD, BY, FJS, HMZ, ZHZ, XYZ, and BSY wrote the manuscript. All authors read and approved the final manuscript.
Jie Huang and Yu-Zhi Li contributed equally to this work.
The information of the 160 primer pairs.
Characteristics of the 54 polymorphic microsatellites developed in this study. Shown are locus names, primer sequences, accession numbers, repeat units, fluorescent dyes, annealing temperatures (Ta), length (bp), numbers of individuals genotyped (N), numbers of alleles (A), observed heterozygosity (HO), expected heterozygosity (HE), polymorphism information content (PIC) and HWE P values (P-value).
The linkage disequilibrium (LD) analysis between the 54 microsatellite loci.
The genotyping results for faeces from 13 captive giant pandas based on 6 microsatellites.
The genotyping results for the 22 unique wild giant panda individuals based on 6 microsatellites.
Huang, J., Li, YZ., Du, LM. et al. Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system. BMC Genomics 16, 61 (2015). https://doi.org/10.1186/s12864-015-1268-z