The Acacia auriculiformis x A. mangium hybrid is emerging as an important forest tree for pulpwood production in South East Asia. Marker-assisted breeding is a promising approach for selecting superior trees with improved wood and pulp properties for the establishment of forest plantations. Previous efforts to develop molecular markers such as Cleaved Amplified Polymorphic Sequence (CAPS), genomic Simple Sequence Repeat (SSR) and Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) markers for Acacia hybrid did not generate sufficient markers for linkage map construction because the markers were either monomorphic or not fully informative for the biparental mapping populations. Developing molecular markers from a narrow genetic background, such as the parents of the mapping population, is an effective way to generate informative markers for linkage mapping. Towards this end, the Single Nucleotide Polymorphism (SNP) is the ideal marker because it provides affordable and high-throughput genotyping compared to other markers. A SNP is a single base change that occurs in at least 1% of the population. SNPs are also co-dominant, bi-allelic and abundant in the genome, and are thus suitable for species with low genetic diversity such as A. mangium. Besides linkage map construction, SNPs can be used for genetic diversity assessment of natural germplasms, estimation of outcrossing rates in natural germplasms and seed orchards and, more importantly, clone and hybrid identification in the breeding programs of both species. To date, only one study has reported the genetic diversity of A. mangium, using Restriction Fragment Length Polymorphism (RFLP) markers. The genetic diversity of A. auriculiformis has been studied using isozyme markers [9, 10] and Sequence Characterized Amplified Region (SCAR) markers. Although no study has compared the genetic diversity of both species, the lower SNP frequency observed in the transcriptome of A. mangium suggests that A. mangium has lower genetic diversity than A. auriculiformis.
There are two main strategies for developing SNP markers, namely the in vitro and in silico methods. The in vitro method involves polymorphism screening using DNA sequencing, while the in silico method detects polymorphisms in DNA sequences of different individuals using computational sequence analysis. Although the in silico method is cheaper and less labour intensive than in vitro detection, it is more prone to errors arising from sequencing mistakes and low sequence coverage. Currently, Next Generation Sequencing (NGS) offers affordable, high-throughput and accurate sequence data generation. NGS has proven highly effective for in silico SNP detection in many plants with a reference genome, such as Arabidopsis. For non-model species, several approaches have been adopted to overcome the lack of a reference genome: a) generation of genome sequences using the Reduced Representation Library (RRL) method [16–18]; b) use of a reference genome from a closely related species, as in catfish; c) use of a gene index as reference; d) de novo transcriptome sequencing and assembly [12, 21–23]. Many reported studies on de novo transcriptome sequencing utilized 454 sequences, which give longer read lengths for SNP discovery (e.g., Eucalyptus grandis and maize). Although various methods have been reported, their results are difficult to reproduce in other datasets because of differences in read quality, sequence coverage and the choice of mapping and SNP calling tools. It is important to understand the limitations and error rate of each dataset for effective in silico SNP detection.
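The core idea of in silico SNP detection, separating true polymorphisms from sequencing errors at adequately covered positions, can be illustrated with a minimal sketch. The function name and all thresholds below (minimum depth, minimum minor allele count and frequency) are illustrative assumptions and are not the parameters used in this study.

```python
from collections import Counter


def call_snp(bases, min_depth=8, min_minor_count=2, min_minor_freq=0.2):
    """Classify one alignment column as a candidate bi-allelic SNP.

    `bases` is a string of the read bases aligned to a single reference
    position. Returns (major_allele, minor_allele) or None.
    All thresholds are hypothetical defaults chosen for illustration.
    """
    # Ignore ambiguous calls (N) and gap characters.
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())

    # Low coverage: not enough evidence to call anything.
    if depth < min_depth or len(counts) < 2:
        return None

    (major, _), (minor, minor_n) = counts.most_common(2)

    # A minor allele seen too rarely is treated as a likely sequencing error.
    if minor_n < min_minor_count or minor_n / depth < min_minor_freq:
        return None
    return major, minor


if __name__ == "__main__":
    print(call_snp("AAAAAAGGGG"))  # well supported minor allele
    print(call_snp("AAAAAAAAAG"))  # single discordant read
    print(call_snp("AAGG"))        # insufficient coverage
```

Raising the minor allele thresholds reduces false positives from sequencing error at the cost of missing rare alleles, which is the trade-off that makes the error profile of each dataset important.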
Medium- to high-throughput custom SNP genotyping technologies such as Illumina GoldenGate, KBiosciences KASPar and Sequenom iPlex, which differ in assay type, throughput, multiplexing and cost, are suitable for linkage map construction. Among these technologies, the Illumina GoldenGate Assay has been widely applied in many plant species. It has been demonstrated in the human genome to provide affordable genotyping with high reliability, reproducibility and multiplexing of up to 1,536 SNPs. In the Illumina GoldenGate Assay, three oligonucleotides are designed for each SNP locus using the submitted flanking sequence around the SNP site. A minimum of 50 bp of flanking sequence upstream and downstream of the SNP site is required for submission. Two oligos, called Allele-Specific Oligos (ASO1 and ASO2), are specific to each allele of the SNP site, while a third oligo, known as the Locus-Specific Oligo (LSO), carries a unique address sequence and hybridizes several bases downstream of the SNP site. The assay involves several steps: DNA activation, oligonucleotide hybridization, extension and ligation, universal PCR, array hybridization and scanning. For SNP genotyping using the Illumina BeadXpress Reader, genomic DNA is first biotinylated, attached to the oligonucleotides and bound to streptavidin-coated paramagnetic beads. Extension and ligation of the hybridized oligonucleotides produce the template for PCR with three universal primers, two of which are labelled with the fluorescent dyes Cy3 and Cy5. The non-fluorescent strand of the PCR product is removed through its 5’ biotin group to generate single-stranded DNA for hybridization to VeraCode Beads. After hybridization, the BeadXpress Reader scans for fluorescence signals on the VeraCode Bead Plate and exports the intensity values to the GenCall software. GenCall uses a clustering algorithm known as GenTrain and calculates a quality score for each genotype. The intensity values for each of the two colour channels, commonly referred to as A and B, are normalized and plotted to display distinct patterns or clusters that represent the AA, AB and BB signal profiles. The AA, AB and BB clusters correspond to the homozygous genotype for allele A, the heterozygous genotype and the homozygous genotype for allele B, respectively.
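To make the relationship between the two normalized channel intensities and the three genotype clusters concrete, the sketch below converts A and B intensities to a polar representation and assigns a genotype with fixed cut-offs. This is not the GenTrain algorithm, which learns cluster positions from the data; the function names, the polar transform as written here and every threshold are assumptions made purely for illustration.

```python
import math


def polar(a, b):
    """Convert normalized A and B intensities to (theta, r).

    theta runs from 0 (pure A signal) to 1 (pure B signal);
    r is the combined signal strength.
    """
    theta = (2 / math.pi) * math.atan2(b, a)
    r = a + b
    return theta, r


def call_genotype(a, b, min_r=0.2, aa_max=0.25, bb_min=0.75,
                  ab_range=(0.4, 0.6)):
    """Assign AA, AB, BB or NC (no call) using hypothetical fixed cut-offs."""
    theta, r = polar(a, b)
    if r < min_r:
        return "NC"  # signal too weak to trust
    if theta <= aa_max:
        return "AA"
    if theta >= bb_min:
        return "BB"
    if ab_range[0] <= theta <= ab_range[1]:
        return "AB"
    return "NC"      # falls between clusters


if __name__ == "__main__":
    for a, b in [(1.0, 0.05), (0.5, 0.5), (0.05, 1.0), (0.05, 0.05)]:
        print(a, b, call_genotype(a, b))
```

Samples that fall between the cut-offs are left uncalled rather than forced into a cluster, which mirrors the general practice of rejecting ambiguous genotypes.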
The development of a high-throughput SNP assay in Acacia can be challenging for several reasons. Without a reference genome, several factors are known to affect the success of a SNP assay, such as the presence of exon-intron boundaries, secondary SNPs and indels in the flanking region, paralogous genes, genome complexity and repetitive sequences. The assay success rate has reportedly been lower in conifer forest species with complex genomes (e.g., 67% in Pinus radiata and Pinus pinaster, and 82% in spruces). Genome complexity may not be an issue for tropical hardwoods, which typically have smaller genomes and considerably fewer repetitive sequences than conifers. Although SNP transferability to other species has been reported in several plant species [34, 35], SNP development in interspecific crosses has only been reported for a few forest and aquaculture species that remain largely “wild” and naturally outcrossing [19, 35, 36]. When using interspecific crosses, genomic similarity between the species must be high to allow amplification and hybridization in SNP genotyping. The genes of A. auriculiformis and A. mangium have been reported to share 99% similarity at the nucleotide level, and thus sequence similarity is not a concern. The overall success of SNP development in Acacia hybrid will depend on short-read sequence quality, on a robust SNP detection approach that can identify sequencing errors while remaining sensitive enough to detect rare SNPs and so increase the assay success rate, and on appropriate assay design and genotype calling to obtain high-quality genotypes.
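Two of the design constraints mentioned above, the minimum 50 bp of flanking sequence on each side of the SNP and the absence of secondary SNPs or indels in the flanking region, lend themselves to a simple pre-submission filter. The sketch below is a hypothetical check and not the assay design tool itself; the function name, the 0-based coordinate convention and the choice to reject any secondary variant within the flank are assumptions.

```python
def is_designable(contig_len, snp_pos, other_variant_positions, flank=50):
    """Return True if a candidate SNP passes two simple flanking checks.

    contig_len              length of the contig carrying the SNP
    snp_pos                 0-based position of the candidate SNP
    other_variant_positions 0-based positions of other SNPs or indels
                            detected on the same contig
    flank                   required flanking length on each side (bp)
    """
    upstream = snp_pos
    downstream = contig_len - snp_pos - 1

    # Not enough sequence on one side for oligo design.
    if upstream < flank or downstream < flank:
        return False

    # Any secondary variant inside either flank may disrupt oligo binding.
    return not any(
        0 < abs(p - snp_pos) <= flank for p in other_variant_positions
    )


if __name__ == "__main__":
    print(is_designable(200, 100, [10, 190]))  # variants lie outside flanks
    print(is_designable(200, 30, []))          # too close to contig start
    print(is_designable(200, 100, [120]))      # secondary SNP in the flank
```

Exon-intron boundaries and paralogous genes cannot be detected from transcript contigs alone with a check of this kind, so they would need separate handling.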
In this study, we aimed to develop a high-throughput SNP genotyping assay for the A. auriculiformis x A. mangium hybrid, with the ultimate objective of linkage map construction. We sequenced the transcriptomes of the parents of two mapping populations and mapped the short reads against a set of gene contigs to discover SNP markers. We evaluated our SNP detection approach using a set of validated SNPs from two lignin genes detected with the in vitro approach, and further validated 96 SNPs using the Illumina GoldenGate Assay. We also investigated several factors affecting the assay success rate based on the 96-plex validation. Based on these findings, we further improved the SNP detection approach and designed an Illumina GoldenGate Assay consisting of 768 SNPs. The clustering patterns were analyzed to evaluate the reproducibility of the Illumina GoldenGate Assay. In addition, we identified polymorphic SNPs that can be transferred to natural germplasms of A. auriculiformis and A. mangium.