Flavonoids represent a class of plant secondary metabolites that have evolved a variety of physiological functions including pigmentation, pathogen defense, and UV protection. Additionally, metabolic engineering of flavonoids has become an important target for plant biotechnology, as flavonoids provide health benefits to foods and favorable agronomic traits to crops, and may be used in the future to color commercial transgenic materials such as grains to facilitate their identification and monitoring [2–4].
Commercial soybean (Glycine max (L.) Merr.) has a yellow grain. However, rare spontaneous mutants exist that have black (iRT) or brown (irT) seed coat (testa) color phenotypes. Black (iRT) and brown (irT) soybean seed coats both contain proanthocyanidins (PAs, a.k.a. condensed tannins) but differ in the presence or absence of anthocyanins. A goal for biosafety is to engineer a novel red seed coat color as a marker for transgenic soybean grains to facilitate their identification. This could potentially be achieved by suppressing anthocyanin-specific genes that are overexpressed in the black soybean seed coat. However, these genes have not yet been identified.
Six genetic loci (I, R, T, Wp, W1, and O) identified by classical genetics control flavonoid-based seed coat color in soybean. The I locus controls the presence or absence and spatial distribution of flavonoid pigments and has four alleles (I, i^i, i^k, i); I gives a completely non-pigmented seed coat, i^i restricts pigment to the hilum and i^k to a saddle-shaped region, whereas the i allele results in a fully pigmented seed coat. The recessive i allele arises from spontaneous deletion of CHS4 or CHS1 promoter sequences, which abolishes a posttranscriptional RNA silencing mechanism and thereby increases the accumulation of chalcone synthase (CHS) transcripts in the seed coat [7, 8].
The T locus is a pleiotropic locus that controls the type and abundance of flavonoid pigments in the seed coat in addition to other traits such as seed coat cracking and trichome pigmentation [5, 9, 10]. Genetic polymorphisms that affect the expression or function of the flavonoid 3'-hydroxylase gene (F3'H1) have been shown to co-segregate with recessive t alleles [11, 12].
The W1 locus controls flower color and affects seed color only in an iRt background, where the W1 and w1 alleles give imperfect black and buff seed coat colors, respectively. The W1 allele for purple flower color was shown to encode flavonoid 3',5'-hydroxylase (F3'5'H), as a 65-bp insertion in the F3'5'H gene co-segregated with white flower color (w1).
The Wp locus was suggested by microarray analysis to encode the flavanone 3-hydroxylase gene (F3H1), as high levels of F3H1 transcripts co-segregated with purple flower color (Wp) and low levels with pink (wp) flowers. In the seed coat, recessive wp resulted in a change from black (iRTWp) to a lighter grayish (iRTwp) color.
The O locus affects the color of brown (irTO) seed coats, with the recessive o allele giving a red-brown (irTo) phenotype. The O locus has been suggested to encode the PA biosynthesis gene anthocyanidin reductase (ANR), as the gene was located between markers that flank the O locus on the soybean physical map. However, molecular genetic analyses have not yet demonstrated the identity of the O locus gene.
Finally, the R locus controls the presence (R) or absence (r) of anthocyanins in black (iRT) or brown (irT) seed coats, respectively. Despite the identification of several pigment genes from soybean, only transcripts for anthocyanidin synthase (ANS) genes (ANS23-1, ANS27-1 and ANS100) have been shown by northern blot to be overexpressed in the black (iRT) seed coat compared to the brown (irT) nearly-isogenic line. The upregulation of several ANS genes suggests that the R locus encodes a regulatory factor, and raises the possibility that other isogenes for anthocyanin biosynthesis may also be upregulated.
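The genotype-to-phenotype relationships described above for fully pigmented (i) seed coats can be collected into a small lookup table. The following Python sketch is illustrative only: the names `SEED_COAT_COLOR` and `seed_coat_color` are hypothetical, the table contains only the allele combinations stated in the text, and any combination not described there returns `None` rather than a guessed color.

```python
# Seed coat colors in the fully pigmented (i) background, as listed in the text.
# Keys are (R allele, T allele, modifier allele or None).
# The table and function names are hypothetical, chosen for illustration.
SEED_COAT_COLOR = {
    ("R", "T", None): "black",            # iRT
    ("r", "T", None): "brown",            # irT
    ("R", "T", "Wp"): "black",            # iRTWp
    ("R", "T", "wp"): "lighter grayish",  # iRTwp
    ("r", "T", "O"): "brown",             # irTO
    ("r", "T", "o"): "red-brown",         # irTo
    ("R", "t", "W1"): "imperfect black",  # iRt with W1
    ("R", "t", "w1"): "buff",             # iRt with w1
}


def seed_coat_color(r_allele, t_allele, modifier=None):
    """Return the seed coat color for an i-background genotype.

    Returns None for allele combinations that the text does not describe.
    """
    return SEED_COAT_COLOR.get((r_allele, t_allele, modifier))


if __name__ == "__main__":
    print(seed_coat_color("R", "T"))        # iRT
    print(seed_coat_color("r", "T", "o"))   # irTo
    print(seed_coat_color("r", "t"))        # not described in the text
```

A strict lookup is used here so that the sketch does not assert phenotypes beyond those reported in the text.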
Recently, a cDNA encoding the UDP-glycose:flavonoid-3-O-glycosyltransferase (UF3GT) gene (UGT78K1) was isolated from the black (iRT) seed coat and shown to function in anthocyanin biosynthesis in vitro and by complementation of a gene mutation in Arabidopsis. However, UGT78K1 expression has not been investigated in relation to seed coat color.
The soybean genome sequence Glyma1 was predicted to contain 46,430 protein-coding genes, with nearly 75% of the genes present in multiple copies. This suggests a relatively high frequency of functional redundancy and increased difficulty in identifying soybean genes by traditional approaches. However, gene functions could potentially be predicted efficiently by using transcriptome analysis tools, such as the Soybean GeneChip equipped with 37,500 probe sets, in combination with broad-coverage metabolite analysis methodologies such as LC-MS/MS. The combined analysis of transcriptome and metabolite data has been shown to be a powerful approach for the functional identification of unknown genes [20–22]. Metabolite differences caused by the overexpression of a transcription factor, by exposure to a nutritional stress, and by species differences have been correlated with differences in transcriptome profiles to successfully predict the functions of unknown genes in flavonoid, glucosinolate, and alkaloid biosynthesis [21–23].
In the present study, we employed targeted and non-targeted metabolite analysis methodologies and demonstrated that black (iRT) and brown (irT) nearly-isogenic soybean seed coats differ not only in the presence or absence of anthocyanins but also extensively in procyanidin, (iso)flavonoid and phenylpropanoid composition. The underlying differences in gene expression were then identified by microarray analysis, and the putative functions of 20 unknown genes were assigned by comparison to metabolite data. From the set of differentially regulated genes, two putative late-stage anthocyanin genes were selected and the functions of their encoded enzymes were validated in vitro. In addition, a set of candidate genes potentially encoded by the R locus is provided.