Assessment of genetic diversity, relationships, and structure within a given set of germplasm is useful in plant breeding for several reasons, including: (i) assisting in the selection of parental combinations for developing progenies with maximum genetic variability for genetic mapping or further selection; (ii) describing heterotic groups [2–7]; (iii) determining the level of genetic variability when defining core subsets selected for specific traits; (iv) estimating possible loss of genetic diversity during conservation or selection programs; and (v) estimating the relative strengths of evolutionary forces (mutation, natural selection, migration or gene flow, and genetic drift) [10, 11]. In maize, the two main tasks of breeders correspond to the first two points above: developing improved inbred lines, and identifying the best parental combinations for creating hybrids that are phenotypically superior to their parents and yield significantly more. In species where heterosis and heterotic groups can be exploited, inbred lines are primarily developed by crossing elite lines within heterotic groups followed by inbreeding and selection, while hybrids are produced by crossing parents that belong to different heterotic groups. A heterotic group is a collection of closely related inbred lines that tend to produce vigorous hybrids when crossed with lines from a different heterotic group, but not when crossed with other lines of the same heterotic group. Depending on the objectives of the breeding program, breeders use different methods for selecting the best parents for crosses and for assigning lines to a particular heterotic group, including (a) pedigree relationships, (b) phenotypic performance for specific traits, (c) adaptability and yield stability, (d) top crosses, (e) diallel crosses, and (f) genetic distances estimated from morphological and molecular markers.
Genetic distance can be estimated from various types of molecular markers, including restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs).
Advances in molecular technology, however, have produced a shift towards SNP markers [15, 16]. Because of their low cost per data point, high genomic abundance, locus-specificity, codominance, potential for high-throughput analysis, and lower genotyping error rates [17–19], SNPs have emerged as a powerful tool for many genetic applications, including genetic diversity studies, linkage and quantitative trait loci (QTL) mapping, and marker-assisted breeding. Currently, chip-based technology offers the highest throughput among SNP genotyping platforms. The Illumina chip-based SNP detection technology can genotype samples at different levels of multiplexing, from 48 to 384 (BeadXpress) and 1536 (GoldenGate) to 55,000 SNPs (Infinium). Such chip-based genotyping platforms are suitable for large-scale studies that require genotyping of individual samples with thousands of SNPs. The high level of multiplexing, the high total cost, and the lengthy process of initial assay development are drawbacks of chip-based platforms. They may be unsuitable for studies where only a small to moderate number of SNPs are needed over a large number of samples, as is the case in mapping, marker-assisted recurrent selection, marker-assisted backcrossing, and quality control applications. In such cases, uniplex SNP genotyping platforms are more suitable. Furthermore, a significant percentage of the SNPs in highly multiplexed chip-based assays generally prove uninformative in any given population. It is therefore necessary to select, for each population under study, the SNPs that provide a good level of discrimination in uniplex assays.
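As an illustration of what selecting informative SNPs can involve, the following minimal Python sketch ranks biallelic SNPs by polymorphism information content (PIC) after discarding monomorphic markers and markers with a low minor allele frequency (MAF). It is not the procedure used in this study: the two-letter genotype coding, the missing-data codes, the 0.05 MAF threshold, the function names, and the toy data are all assumptions made for the example.

```python
from collections import Counter

# Assumed codes for failed or missing genotype calls (hypothetical).
MISSING = {"--", "NN", ""}


def allele_frequencies(calls):
    """Return {allele: frequency} for one SNP, ignoring missing calls.

    Each call is a two-letter string such as "AA" or "AG" (assumed coding).
    """
    counts = Counter()
    for call in calls:
        if call in MISSING:
            continue
        counts.update(call)  # "AG" contributes one A and one G
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()} if total else {}


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    correction = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygosity - correction


def rank_snps(genotypes, min_maf=0.05):
    """Return (snp, maf, pic) tuples sorted by decreasing PIC.

    `genotypes` maps a SNP name to a list of calls, one per inbred line.
    The default MAF threshold is an illustrative assumption.
    """
    ranked = []
    for snp, calls in genotypes.items():
        freqs = allele_frequencies(calls)
        if len(freqs) < 2:
            continue  # monomorphic, or every call missing
        maf = sorted(freqs.values())[-2]  # second most frequent allele
        if maf < min_maf:
            continue
        ranked.append((snp, maf, pic(freqs)))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Toy data: three SNPs scored on six hypothetical inbred lines.
toy_genotypes = {
    "snp_1": ["AA", "AA", "GG", "GG", "AG", "--"],
    "snp_2": ["CC", "CC", "CC", "CC", "CC", "TT"],
    "snp_3": ["TT", "TT", "TT", "TT", "TT", "TT"],
}

for name, maf, score in rank_snps(toy_genotypes):
    print(f"{name}\tMAF={maf:.3f}\tPIC={score:.3f}")
```

In this toy data set the monomorphic marker is dropped and the marker with balanced allele frequencies ranks first, because PIC for a biallelic marker is highest when both alleles are equally frequent. A practical selection would also need to weigh call rate, genome coverage, and assay conversion success, which this sketch ignores.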
Maize (Zea mays ssp. mays L.) is the world's third most important crop by acreage and is a multi-purpose crop used for food, animal feed, biofuel, and raw material in the synthesis of a broad range of industrial products. Over the past four decades, breeders at the International Maize and Wheat Improvement Center (CIMMYT), in collaboration with the National Agricultural Research Systems (NARS) of many maize-growing countries, have developed numerous germplasm pools, populations, and open-pollinated varieties [6, 24, 25]. CIMMYT maize germplasm is widely used by public and private sector institutions worldwide for the development of open-pollinated varieties, hybrid seed production, pedigree breeding, development of populations for QTL mapping, molecular breeding, doubled haploid production, and transgenic introduction of traits. Lu et al. characterized 770 maize lines using 1034 SNPs, including 394 tropical/subtropical lines from CIMMYT; 14 tropical/subtropical and 268 temperate lines from China; and 1 temperate and 93 tropical/subtropical lines from Brazil. The authors reported clear population structure and genetic divergence between temperate and tropical/subtropical germplasm. Yan et al. studied 632 inbred lines from temperate, tropical, and subtropical public breeding programs and reported clear structure between temperate and tropical lines, as well as complex familial relationships within the global maize collection. Wen et al. studied an association mapping panel of 359 maize inbred lines from the CIMMYT and International Institute for Tropical Agriculture (IITA) breeding programs that are tolerant or resistant to drought, low nitrogen, soil acidity, pests, and diseases. The authors reported the presence of a subgroup that largely consisted of lines developed from La Posta Sequía.
All three previous studies, however, included only some of the maize inbred lines that were either developed by the CIMMYT maize breeding programs in eastern and southern Africa or are CIMMYT Maize Lines (CMLs) widely used in the region. The main objective of our study was to investigate the population structure and patterns of relationships of the maize inbred lines from the CIMMYT maize improvement programs in Zimbabwe and Kenya, so that they can be better exploited in breeding programs. The other objectives were to assess the utility of SNPs in classifying maize inbred lines into one of the two heterotic groups commonly used by CIMMYT breeders, and to identify a subset of highly informative SNP markers for routine, low-cost genotyping of CIMMYT germplasm in the region.