A total of 910,529 microsatellite markers were identified by in silico mining, and simple STRs were the most abundant (91.16%). Microsatellite density has been reported to be positively correlated with genome size [18–20], and among fully sequenced eukaryotic genomes it is highest in mammals. In plants, however, microsatellite frequency is negatively correlated with genome size.
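The in silico mining step can be pictured with the minimal sketch below. It is not the pipeline used to build this database, which is not described in this section. It assumes a greedy left-to-right scan for perfect (uninterrupted) repeats of 1 to 6 bp motifs, and the per-class minimum repeat counts in `MIN_REPEATS` are hypothetical. The function names `find_perfect_ssrs`, `is_primitive` and `density_per_mb` are illustrative only.

```python
# Hypothetical minimum repeat counts per motif length (1 = mono ... 6 = hexa);
# illustrative assumptions only, not the thresholds used to build the database.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
VALID_BASES = set("ACGT")


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Greedy scan for perfect SSRs; returns 0-based (start, motif, repeat_count)."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        step = 1
        for k in range(1, 7):  # try mono- up to hexanucleotide motifs
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= VALID_BASES or not is_primitive(motif):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                hits.append((i, motif, count))
                step = count * k  # jump past the whole repeat array
                break
        i += step
    return hits


def density_per_mb(n_ssrs, seq_length_bp):
    """SSR density expressed as loci per megabase."""
    return n_ssrs / (seq_length_bp / 1_000_000)
```

For example, `find_perfect_ssrs("G" + "A" * 12 + "G" + "AC" * 7 + "T")` reports a 12-unit mononucleotide run at position 1 and a 7-unit `AC` repeat at position 14. The sketch only finds simple repeats; separating simple from compound repeats, as in the 91.16% figure above, would additionally require merging neighbouring hits that lie within some maximum interruption distance, as tools such as MISA do.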
In the present study of water buffalo, mononucleotide motifs were the most abundant. The relative distribution of microsatellite motif length classes differs considerably from species to species.
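Comparing motif-class distributions between species or chromosomes comes down to expressing each class as a percentage of all loci. The sketch below assumes each locus is represented only by its repeat motif string; `motif_class_distribution` is a hypothetical helper, and the motif list in the usage note is toy input, not data from this study.

```python
from collections import Counter

CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def motif_class_distribution(motifs):
    """Percentage of SSR loci in each motif-length class, most abundant first."""
    counts = Counter(CLASS_NAMES[len(m)] for m in motifs)
    total = sum(counts.values())
    # most_common() orders classes by abundance, which gives the ranking directly
    return {name: 100.0 * n / total for name, n in counts.most_common()}
```

With the toy input `["A", "T", "A", "AC", "AGC"]` the result is 60% mono, 20% di and 20% tri, and the key order of the returned dictionary gives the abundance ranking that can then be compared across genomes.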
In water buffalo, longer repeats were less abundant, as expected and as reported in various studies [23, 24]. We also observed that microsatellite abundance increases with size from 10 up to 14–16 and decreases beyond this range. This can be explained by the cyclical nature of microsatellite evolution. A microsatellite is born as a simple repeat through out-of-register looping during DNA replication, once it reaches a threshold size of 8 repeat units or more. Background mutations then gradually convert the simple repeat into a compound repeat. At the simple-repeat stage the mutation rate is high and mutations predominantly add repeat units, so the array grows. Once background mutations have converted the simple repeat into an interrupted compound repeat, however, the shorter simple stretches of fewer than 8 units are pinched off in subsequent replications. This acts as an evolutionary constraint on microsatellite size; otherwise microsatellites would keep increasing in length over the course of evolution. Thus individual microsatellite arrays have a “life cycle” of sorts: they are born, they grow, and ultimately they perish. These events may stretch over tens or even hundreds of millions of years [25, 26].
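The life cycle described above can be illustrated with a deliberately crude toy simulation. Only the 8-unit threshold comes from the text; everything else is an assumption: the per-generation probabilities are hypothetical, slippage is biased towards expansion, a point mutation interrupts one unit at random and only the longer surviving pure stretch is tracked, and slippage is assumed to stop once the pure stretch falls below the threshold. This is a sketch of the verbal argument, not a model used in this study.

```python
import random

THRESHOLD = 8  # repeat units; threshold for slippage-driven growth, from the text


def simulate_life_cycle(start=THRESHOLD, generations=100_000, p_slip=1e-3,
                        p_expand=0.6, p_point=1e-5, seed=42):
    """Toy trajectory of the longest pure (uninterrupted) stretch, in repeat units.

    All rates are hypothetical. Returns the length after each generation.
    """
    rng = random.Random(seed)
    pure = start
    trajectory = [pure]
    for _ in range(generations):
        if pure < THRESHOLD:
            break  # below threshold slippage is assumed to stop: the array "perishes"
        if rng.random() < p_slip:
            # slippage event, biased towards adding a repeat unit (growth phase)
            pure += 1 if rng.random() < p_expand else -1
        if rng.random() < p_point * pure:
            # point mutation interrupts one unit; keep only the longer pure segment
            cut = rng.randint(1, pure - 1)
            pure = max(cut, pure - cut - 1)
        trajectory.append(pure)
    return trajectory
```

Plotting the returned trajectory for several seeds shows arrays that grow while they remain pure and shrink once interruptions accumulate, which is the qualitative birth, growth and death pattern the paragraph describes; the time scale has no biological calibration.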
The water buffalo microsatellite profile exhibits a similar pattern. The relative abundance of repeat motifs was in the order mono > di > tri > penta > tetra > hexa (Table 2). Although dinucleotide repeats are reported to be the most abundant class in eukaryotic genomes [27, 28], we found mononucleotide repeats to be the most abundant across all chromosomes. This higher abundance of mono- over dinucleotide repeats might be due to an inherent limitation of NGS technology, which adds spurious mononucleotides as sequencing errors. As expected for ubiquitously distributed STR markers, the longer the chromosome, the proportionately higher its total repeat content.
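Whether repeat content scales proportionately with chromosome length can be checked with a Pearson correlation between per-chromosome length and repeat count, together with per-chromosome density. The sketch below implements Pearson's r directly so it has no dependencies; the function names are hypothetical and the numbers in the usage note are toy values, not results of this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def repeats_per_mb(repeat_counts, lengths_bp):
    """Per-chromosome repeat density (loci per Mb)."""
    return [c / (l / 1_000_000) for c, l in zip(repeat_counts, lengths_bp)]
```

For toy chromosome lengths of 1, 2, 3 and 4 Mb carrying 10, 20, 30 and 40 repeats, `pearson_r` is 1.0 and every density is 10 loci per Mb. A high r combined with roughly constant densities is what "proportionately higher" repeat content implies.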
To validate previously reported STR markers, two sets were considered: heterologous STRs (developed in cattle as the original species and tested in buffalo as the focal species) and homologous STRs (developed and validated in buffalo). The heterologous markers recommended by FAO-ISAG and the homologous markers were used. Both subsets of heterologous ISAG-FAO recommended primers, for cattle and for buffalo diversity analysis, gave low validation rates of 10% and 13.33%, respectively. Cross-species amplification depends on the conservation of cattle STRs and their flanking regions in other species. Nevertheless, some of the primers showed validation of up to 36.67% (Table 3). Such results are commonly expected in cross-species amplification among Bovidae because of null alleles and genomic changes during speciation. For homologous STRs, both subsets, monomorphic and polymorphic loci, showed higher validation rates (28.57% and 24.30%, respectively). These validation results are limited because the first draft genome assembly of buffalo is based on cattle and is not yet finished.
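One simple way to picture in silico primer validation against an assembly is electronic PCR: a primer pair "validates" if the forward primer and the reverse complement of the reverse primer both match the template, in the right order, within an allowed product size. The sketch below assumes exact matches only and a hypothetical 50 to 500 bp size window; the validation criterion actually applied in this study is not specified in this section, and `in_silico_products` and `percent_validated` are illustrative names.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGTN characters are left as-is)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def in_silico_products(template, fwd, rev, min_size=50, max_size=500):
    """Sizes (bp) of products predicted from exact primer matches on both strands."""
    fwd = fwd.upper()
    rev_rc = reverse_complement(rev)  # reverse primer binds the opposite strand
    sizes = []
    for strand in (template.upper(), reverse_complement(template)):
        start = strand.find(fwd)
        while start != -1:
            end = strand.find(rev_rc, start + len(fwd))
            while end != -1:
                size = end + len(rev_rc) - start
                if size > max_size:
                    break
                if size >= min_size:
                    sizes.append(size)
                end = strand.find(rev_rc, end + 1)
            start = strand.find(fwd, start + 1)
    return sizes


def percent_validated(flags):
    """Share of primer pairs (True = at least one predicted product) as a percentage."""
    return 100.0 * sum(flags) / len(flags)
```

Real e-PCR tools tolerate primer mismatches and gaps, which matters for heterologous cattle primers on a buffalo template; exact matching, as here, would tend to underestimate cross-species amplifiability.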
The findings of this study have limitations that need to be addressed. Because the water buffalo genome is only a draft assembly based on the cattle assembly Btau 4.0, a de novo assembly is needed to obtain a buffalo-specific, chromosome-wise microsatellite profile. The current database follows cattle chromosome numbering, which is certainly not the same as that of buffalo; for example, cattle chromosome 4 is actually buffalo chromosome 8. In fact, only five chromosome numbers are common between cattle and buffalo, viz. 1, 2, 17, 18, and X. Chromosome splitting and translocation events, which are well documented, underlie the syntenic relationships between these two species. Nevertheless, the microsatellites in our database, together with the option of designing primers at any desired position on a “chromosome”, will be of great use, especially with the buffalo radiation hybrid panel, in resolving the current problems of de novo assembly. These markers can also be used for QTL and gene mapping, as well as for biodiversity analysis to set conservation priorities. The markers in our database still require wet-lab validation. As this is the first water buffalo microsatellite database, released at a juncture when de novo genome assembly is yet to be done, the use of these markers is highly warranted for “finishing” the water buffalo genome assembly. This will in turn lead to the next version of the buffalo microsatellite database, with proper buffalo-specific chromosome-wise data that are currently missing but critically needed. Such an endeavour will contribute not only to increased buffalo productivity but also to greater food security, especially in Third World and New World countries.
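Until buffalo-specific coordinates exist, marker records stored under cattle chromosome numbers have to be relabelled through a comparative map. The sketch below assumes records of the form (marker_id, cattle_chromosome, position). The correspondence table is deliberately partial: it contains only the cattle 4 = buffalo 8 example given above, and the remaining entries would have to come from published cattle–buffalo comparative maps. Everything else is reported as unresolved rather than guessed. `relabel_markers` and the marker IDs in the usage note are hypothetical.

```python
# Partial cattle -> buffalo chromosome correspondence. Only the example stated in
# the text (cattle 4 = buffalo 8) is filled in; other entries are intentionally absent.
CATTLE_TO_BUFFALO = {"4": "8"}


def relabel_markers(markers, mapping=CATTLE_TO_BUFFALO):
    """Split (marker_id, cattle_chr, position) records into relabelled and unresolved.

    Positions are carried over unchanged, which is only meaningful where a whole
    cattle chromosome corresponds to a buffalo chromosome; split or translocated
    segments would need a segment-level map instead.
    """
    relabelled, unresolved = [], []
    for marker_id, cattle_chr, pos in markers:
        buffalo_chr = mapping.get(cattle_chr)
        if buffalo_chr is None:
            unresolved.append((marker_id, cattle_chr, pos))
        else:
            relabelled.append((marker_id, buffalo_chr, pos))
    return relabelled, unresolved
```

For example, the hypothetical records `("m1", "4", 100)` and `("m2", "7", 5)` yield `("m1", "8", 100)` as relabelled and leave `m2` unresolved. Keeping the unresolved list explicit makes clear which part of the database still depends on the de novo assembly.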