- Research article
- Open Access
Sequencing and analysis of the gene-rich space of cowpea
© Timko et al; licensee BioMed Central Ltd. 2008
- Received: 05 October 2007
- Accepted: 27 February 2008
- Published: 27 February 2008
Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing.
We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO), with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A total of 5,888 GSRs had homology to genes encoding transcription factors (TFs) and transcription associated factors (TAFs), representing about 5% of the total annotated sequences in the dataset. Sixty-two of the 64 well-characterized plant TF gene families are represented in the cowpea GSRs, and these families are of similar size and phylogenetic organization to those characterized in other plants. The cowpea GSRs also provide a rich source of genes involved in photoperiodic control, symbiosis, and defense-related responses. Comparisons to available databases revealed that about 74% of cowpea ESTs and 70% of all legume ESTs were represented in the GSR dataset. As approximately 12% of all GSRs contain an identifiable simple-sequence repeat, the dataset is a powerful resource for the design of microsatellite markers.
The public availability of extensive genomic data for cowpea, a non-model legume with significant importance in the developing world, represents a significant step forward in legume research. Not only does the gene space sequence enable the detailed analysis of gene structure, gene family organization and phylogenetic relationships within cowpea, but it also facilitates the characterization of syntenic relationships with other cultivated and model legumes, and will contribute to determining patterns of chromosomal evolution in the Leguminosae. The micro- and macrosyntenic relationships detected between cowpea and other cultivated and model legumes should simplify the identification of informative markers for marker-assisted trait selection and map-based gene isolation necessary for cowpea improvement.
- Gene Ontology
- Transcription Factor Family
- WRKY Gene
- WRKY Domain
- Cowpea Cultivar
Cowpea, Vigna unguiculata (L.) Walp., is both one of the most important food and forage legumes in the semi-arid tropics and a valuable and dependable commodity for farmers and grain traders [1, 2]. Of the ~21 million acres grown worldwide, 80% of cowpea production takes place in the dry savannah of tropical West and Central Africa, mostly by poor subsistence farmers in developing countries [2, 3]. Despite its economic and social importance in the developing world, cowpea has received relatively little attention from a research standpoint and remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as those governing abiotic stress (drought, salinity, and heat) tolerance, photoperiod sensitivity, plant growth type, and seed quality, with resistances to the numerous bacterial, fungal, and viral diseases and insect, invertebrate (nematode), and herbivorous pests [1, 2]. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. Thus, relatively large genetic gains can likely be made with only modest investments in both applied plant breeding and molecular genetics.
The Leguminosae (Fabaceae) family consists of 757 genera and over 20,000 species. Diversification commenced soon after the first identifiable legumes appeared in the fossil record ~56 million years ago (Mya), and all living legumes are thought to share a common ancestor that existed ~59 Mya. The family is divided into several major clades, with most of the major crop and model species concentrated in the Papilionoideae clades "Hologalegina" (adapted to temperate climates, such as Lotus and Medicago) and "phaseoloid/millettioid" (warm-season species such as Glycine and Phaseolus). Cowpea belongs to the latter clade, along with the other major warm-season crops: common bean (P. vulgaris), pigeon pea (Cajanus cajan), and soybean (G. max). The split between Glycine and Medicago is dated at ~54 Mya and that between Glycine and Phaseolus at ~19 Mya [5, 8].
The most significant progress in legume genomics has been made for the small-genome model species, M. truncatula and L. japonicus, and for soybean (G. max), economically the most important legume crop species [8–13]. Large Expressed Sequence Tag (EST) collections are available for all three of these species, along with near-complete genome sequences and well-developed genetic and physical maps. The availability of genomics-level information lags substantially in other legumes, although some progress has been made in pea (P. sativum), common bean (P. vulgaris), alfalfa (M. sativa), and peanut (Arachis hypogaea) [7, 15–18].
Little attention has been paid to gene characterization and the development of resources in cowpea [2, 19], despite the fact that its genome size of 620 Mb is one of the smallest among the legumes and is at the lower end of plant genomes in general [20–22]. At the time of writing, fewer than 1,000 cowpea ESTs have been deposited in public databases, and most of the genomic DNA sequence available either relates to rRNA coding and spacer regions or represents anonymous sequence exploited for RFLP mapping. Increasing our knowledge of the structure and composition of the cowpea genome will help in the interpretation of genome evolution in the phaseoloid/millettioid clade and Papilionoideae in general, and will undoubtedly contribute substantially to efforts aimed at improvement of this crop.
While advances in high-throughput sequencing technologies make whole genome sequencing possible, for most plant species the associated cost remains prohibitive because of their genome size and complexity. Reduced-representation approaches, such as methylation filtration (MF) and Cot-based cloning and sequencing, have been developed to alleviate some of the difficulties presented by ubiquitous repetitive DNA [23–27]. Both techniques rely on gene enrichment for the recovery of genic sequences. While Cot-based selection separates low-copy from high-copy sequence based on differential annealing [28, 29], MF targets the hypomethylated fraction of the genome for cloning. The use of MF as an enrichment technique has been successfully demonstrated in maize, sorghum, soybean and tobacco [13, 23, 25, 30–32]. Empirical comparisons suggest that the relative efficacy of the MF and Cot-based enrichment techniques is species-dependent [25, 26, 33, 34]. Here we show how MF was successfully applied in cowpea for the enrichment of gene-rich regions and report a detailed analysis of the resulting recovered sequences.
Sequencing the gene-rich space of cowpea sampled by MF
Analysis of cowpea genespace sequence reads (GSRs) from methylation-filtered (MF) and unfiltered (UF) genomic libraries. Columns: Number of Clones (%), given once per library.
Table 2. Statistical data on cowpea genespace sequence reads (GSRs) and assemblies. Rows: Total number of sequence attempts; Total number of successful sequences; Usable read length (nucleotides); Total usable bases; Number of clusters; Number of GSR assemblies (contigs); Number of singletons; Minimum assembly length (nucleotides); Maximum assembly length (nucleotides); Mean assembly length (nucleotides); Median assembly length (nucleotides); Total length of DNA represented by GSR assemblies and singletons; Number of GSR assemblies annotated by BLAST; Number of GSR singletons annotated by BLAST.
The 263,425 successful GSRs were clustered and assembled into 52,149 GSR assemblies (contigs) and 70,679 singletons (Table 2). Relatively few singletons were generated by the assembly process, indicating that the clustering of sequences was effective. The largest cluster (CL1) contains 1,557 contigs with 17,407 members and the smallest cluster contains two GSRs. Clusters 2 through 30 contain from 354 to 57 component GSRs. Following assembly, the GSR dataset represents a nuclear coverage of ~78 Mb, which is equivalent to 52% of the sampled gene space and 12.7% of the total cowpea genome.
Gene annotation and gene ontology analysis
Statistics on homology-based annotation of cowpea GSRs. Columns: Number of Cowpea GSRs Annotated; Distinct Accession Numbers; Annotation Database Size; Percent Matched Sequences.
To determine the number of unique gene sequences represented in our dataset, the 52,149 GSR assemblies and 70,679 singletons were compared by BLAST to the NCBI GenBank peptide database, which resulted in 41,260 annotated sequences. Of these annotated sequences, 19,786 had distinct GenBank accession numbers. By comparison, 23,561 distinct GenBank accession numbers were found by BLAST annotation of the 95,364 GSRs prior to assembly. Thus, the assembly process did not enhance or significantly compromise the identification of putative gene coding sequences.
The goal of reduced representation sequencing strategies such as MF is to capture as much gene complexity as possible without the laborious task of complete genome sequencing. Most plant genomes are thought to encode between 35,000 and 40,000 genes. In legumes, gene density is estimated to be 1 gene per 6–10 kb [12, 17, 40–42]. With a gene space coverage of ~78 Mb captured by MF, the estimated minimum number of genes potentially tagged in our MF dataset should be between 7,800 and 13,000. In contrast to these predicted values, our annotation data clearly indicate that we tagged ~40,000 gene coding regions, representing a minimum of 19,786 distinct GenBank accession numbers. This latter number is also likely an underestimate, since we only included the single lowest e-value per sequence. Some of the sequences matched multiple GenBank accession numbers, and the second or third ranked e-values could represent additional coding regions on the same fragment.
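The expected range quoted above follows directly from the coverage and the gene density. As a minimal sketch of that arithmetic (the function name is illustrative and not part of the original analysis), dividing ~78 Mb by one gene per 10 kb and one gene per 6 kb gives the lower and upper bounds:

```python
def estimate_gene_count(coverage_mb: float, kb_per_gene: float) -> float:
    """Expected number of genes in `coverage_mb` megabases,
    assuming a uniform density of one gene per `kb_per_gene` kilobases."""
    coverage_kb = coverage_mb * 1000  # 1 Mb = 1,000 kb
    return coverage_kb / kb_per_gene


COVERAGE_MB = 78  # gene space coverage reported in the text

low = estimate_gene_count(COVERAGE_MB, 10)   # sparse end: 1 gene per 10 kb
high = estimate_gene_count(COVERAGE_MB, 6)   # dense end: 1 gene per 6 kb

print(f"{low:,.0f} to {high:,.0f} genes")  # prints "7,800 to 13,000 genes"
```

The uniform-density assumption is a simplification; the observed count of annotated coding regions is considerably higher than this estimate.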
Comparisons to ESTs of cowpea and other legumes
Results of BLAST comparisons of cowpea GSRs against various legume EST-derived unigenes. Columns: Number of Unigenes; Number of Matches with Cowpea GSRs. Rows include: All legume unigenes at LIS.
BLAST comparisons (tblastx) of the consensus EST-derived unigenes from other legumes and the GSRs showed that 88.6% of the unigenes from P. vulgaris matched cowpea sequences. This is not surprising, since common bean and cowpea are phylogenetically very close. As expected, EST-derived unigenes from the more distantly related M. truncatula and L. japonicus had lower match rates with cowpea, 68.68% and 60.61%, respectively. The mean percentage of matches between cowpea GSRs and available legume EST-derived unigenes was ~70%.
Mapping cowpea GSRs to M. truncatula pseudomolecules
Analysis of transcription factor (TF) families
Plants devote ~7% of their genome coding capacity to proteins that regulate transcriptional activities [44–46]. Analysis of completed plant genome sequences suggests that there are upwards of 60 TF families present in most plant genomes. In Arabidopsis [47, 48] and P. trichocarpa [49, 50] the 64 TF families vary in size from 1–2 members to over 100 members. Rice contains 63 of the dicot TF families [51, 52], missing only the SAP1 family, which is represented by only a single gene in both Arabidopsis and P. trichocarpa. About 43 of the known TF families and ~25 potentially novel plant TFs and TAFs were identified in an in silico analysis of the M. truncatula genome using the Medicago Gene Annotation Group (IMGAG) dataset as starting material. Since ~5% of the cowpea GSRs showed some homology to known TFs, we examined their distribution among the known TF families in vascular plants and, in selected cases, the complexity of cowpea TF families relative to what is found in other plant species.
BLAST homology searches were carried out using conserved domains for the 64 TF families previously defined in Arabidopsis [47, 48] and tobacco. One or more gene coding sequences for 62 of the 64 TF families previously identified in vascular plants could be identified in the cowpea GSR dataset, including sequences encoding the SAP1 TF family. Among the low-copy TF families present in other plants, one member of each of the LFY, NZZ, and ULT families, two members of the CCAAT-DR1 and Whirly gene families, and three members of the LUG and VOZ gene families were present. Only the HRT-like and S1Fa TF families were not represented among the cowpea GSRs.
The ERF, WRKY, and CONSTANS (CO)/CONSTANS-like (CO-like) gene families were chosen for a more detailed analysis. These families are well characterized in other plant species and encode proteins that regulate a variety of plant developmental, stress, and growth responses. In Arabidopsis and P. trichocarpa, the ERF and WRKY families are among the largest TF families present, and the CO-like gene family has ~20 members. However, the most important criterion for selecting these TF families for analysis was that the gene products contain short, well-conserved DNA-binding domains that can be used to estimate diversity and phylogenetic relationships among family members, and to study gene family evolution [44, 47, 53, 55]. Significant comparative information is available for the CO-like family in legumes.
The ERF family
ERF transcription factors play important regulatory roles in plant responses to both biotic and abiotic stresses, sugar signaling, and determination of organ identity [57–59]. BLAST searches of the GSR dataset were based on a representative DNA-binding domain from each of the ten major ERF subgroups (I-X) using low stringency (cutoff value = 10). The complete set of ERF domain-containing GSRs was assembled into contigs; each sequence was then manually verified and false positives were removed. This approach ensured that all possible gene family members, including the most highly divergent ones, were isolated. As a result, 111 ERF sequences were obtained, representing a minimum of 109 ERF genes. The minimum number is lower than the total number because a small number of sequences contain incomplete DNA-binding domains, and therefore some ERF sequences may represent the 5' and 3' ends of the same gene. The predicted minimum number of ERF gene family members present in cowpea (109) is similar to that predicted to be in the Arabidopsis genome (122–124) [44, 58].
The high degree of similarity in the phylogenetic arrangement of ERF genes between cowpea and Arabidopsis indicates that it should be possible to use such analyses to identify potential targets for cowpea improvement. We also constructed a phylogeny that contained the ERFs of cowpea and those of other plant species whose biological function has been reported (data not shown). Using this type of analysis we were able to identify the closest cowpea homologues of CBF1, DREB1A, TINY, CaPBF1, ORCA3 and Pti4, ERFs known to be regulators of important agronomical traits such as drought, salt tolerance, freezing tolerance, and disease and pest resistance [see Additional file 4].
The WRKY family
The WRKY TFs regulate responses to biotic and abiotic stresses, senescence, germination, and a number of developmental processes [60–63]. Each WRKY transcription factor contains at least one conserved ~60 amino acid region (the WRKY domain) with the peptide sequence WRKYGQK at the N-terminus and a Zn-finger motif at its C-terminus. The ancestral-type WRKY TF (Group I) contains two WRKY domains, one N-terminal and the other C-terminal. All other genes contain just one WRKY domain and are classified into Groups IIa, IIb, IIc, IId, IIe and III on the basis of their primary amino acid sequence and the structure of their Zn-finger motifs. A BLAST search of the cowpea GSRs was performed with each of the WRKY domains from the various subgroups and with both the N-terminal and C-terminal domains from Group I. A total of 79 contigs, each containing at least part of a WRKY domain, were obtained. Discovery of WRKY genes was technically more difficult than that of the ERF genes because most WRKY domains are interrupted by an intron separating the WRKY and Zn-finger parts of the domain. The effect of this intron was that frequently only the 5'- or 3'-end of the WRKY domain was present in the assembled contigs. Nevertheless, it was possible to estimate the minimum number (i.e., if all 5'- and 3'-ends were joined) and maximum number (i.e., if no 5'- and 3'-ends were joined) of WRKY genes present to be 53 and 79, respectively. These values are consistent with the prediction of 72 WRKY genes in Arabidopsis.
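The bounding logic for partial WRKY domains can be made explicit. The following is a minimal sketch, assuming each contig has been scored as carrying a complete domain, only the 5'-end, or only the 3'-end; the function name and the example counts are hypothetical and are not the counts underlying the values reported above.

```python
from typing import Tuple


def wrky_gene_bounds(complete: int, five_prime_only: int,
                     three_prime_only: int) -> Tuple[int, int]:
    """Return (minimum, maximum) gene counts implied by partial-domain contigs.

    Minimum: every 5'-only contig is paired with a 3'-only contig wherever
    possible, so each pair counts as one gene; unpaired leftovers still count.
    Maximum: no partial contigs belong to the same gene.
    """
    minimum = complete + max(five_prime_only, three_prime_only)
    maximum = complete + five_prime_only + three_prime_only
    return minimum, maximum


# Hypothetical counts, for illustration only.
print(wrky_gene_bounds(complete=10, five_prime_only=4, three_prime_only=6))
```

With these illustrative inputs the four 5'-only contigs pair with four of the six 3'-only contigs, so the lower bound is 16 genes and the upper bound is 20.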
Cowpea homologs of functionally characterized WRKY genes could also be identified. These include VuWRKY44 and VuWRKY36, homologs of AtWRKY70, a TF that functions at the intersection of salicylic acid and jasmonic acid signaling during defense responses; VuWRKY35, a homolog of AtWRKY6, whose product plays a role in regulating senescence; and VuWRKY27 and VuWRKY55, homologs of TRANSPARENT TESTA GLABRA (TTG2/AtWRKY44), which plays a key role in trichome and seed coat development.
CONSTANS and the CONSTANS-like gene family
The timing of flowering is an important agronomic trait in crop plants [68–71]. Many genes involved in photoperiod responsiveness are functionally conserved in monocot and dicot species [72, 73]. As a result, it was possible for us to identify GSRs encoding many members of various gene families involved in light perception (e.g., PHY, CRY), as well as GSRs encoding components of the signal transduction pathways connecting photoperiod and phytohormonal stimuli in the induction of flowering. For example, the interaction between the products of the CONSTANS (CO) and FLOWERING LOCUS T (FT) genes underlies long- and short-day responsiveness [70–74]. CO encodes a TF that plays a central role linking the circadian clock to genes controlling meristem identity [73, 75]. CO and members of the CO-like gene family are defined by the presence of two conserved domains, a Zn-finger domain that resembles a B-box domain near the N-terminus and a CCT domain near the C-terminus. In Arabidopsis, the CO-like gene family consists of three broad groups: Group I, which includes CO and factors with two Zn-finger B-boxes near the N-terminus; Group II, with one B-box; and Group III, with one B-box and a second diverged Zn-finger. There is a surprising amount of disagreement in the literature about the number of CO-like genes present in Arabidopsis, with values ranging from 17 to between 33 and 51 genes [44, 77], due to variation among researchers in defining what constitutes a CO-like gene. Some researchers accept "CO-like genes" that contain only a CCT domain but not a B-box. Since CCT domains are present in other TFs (e.g., ZIMs) and the B-box is a feature absolutely required for CO function [73, 75], in our analysis we only considered those genes that fit the stricter definition (i.e., CO and COL1-COL16). As a result, 23 cowpea CO-like genes were identified.
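The stricter definition and the three-group scheme amount to a small decision rule on domain content. A minimal sketch follows, assuming domain calls (number of B-boxes, presence of a diverged Zn-finger, presence of a CCT domain) are already available for each candidate; the `DomainCall` record, its field names, and the example sequence names are hypothetical and do not describe the procedure actually used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainCall:
    name: str                  # hypothetical sequence identifier
    b_boxes: int               # number of canonical B-box Zn-finger domains
    diverged_zn_finger: bool   # a second, diverged Zn-finger is present
    has_cct: bool              # CCT domain near the C-terminus


def classify_co_like(p: DomainCall) -> Optional[str]:
    """Assign a CO-like group, or None if the strict definition is not met."""
    # Strict definition: both a B-box and a CCT domain are required.
    if not p.has_cct or p.b_boxes == 0:
        return None
    if p.b_boxes >= 2:
        return "Group I"       # two B-boxes near the N-terminus
    if p.diverged_zn_finger:
        return "Group III"     # one B-box plus a diverged Zn-finger
    return "Group II"          # one B-box only


candidates = [
    DomainCall("seq_a", 2, False, True),
    DomainCall("seq_b", 1, True, True),
    DomainCall("seq_c", 0, False, True),   # CCT only, e.g. ZIM-like: rejected
]
for c in candidates:
    print(c.name, classify_co_like(c))
```

Rejecting CCT-only sequences is what separates the stricter count from the larger estimates in the literature.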
Vernalization acts to promote flowering by repressing the expression of another floral regulator, the MADS-domain protein termed FLOWERING LOCUS C (FLC) in Arabidopsis. Cereals appear to be missing FLC-like genes in their genomes, with the corresponding role being carried out by an unrelated Zn-finger TF. FLC is also conspicuously absent in the legumes [53, 56]. Consistent with this previous observation, we were unable to identify an FLC homolog in the MADS-domain TF family represented in the cowpea GSRs (J. Opoku, P. Rushton, and M.P. Timko, unpublished observations).
Genes controlling symbiosis and biotic stress responses
Legumes form mutually beneficial symbiotic associations with arbuscular mycorrhizal (AM) fungi and bacteria collectively known as rhizobia that are of tremendous agricultural importance [79, 80]. Establishing a fully functional symbiosis requires the successful completion of numerous steps, beginning with the recognition of chemical signals exchanged between the plant and bacterial/fungal symbiont and culminating in the differentiation of functional symbiotic cells/tissues. The process is the result of tightly regulated biochemical and molecular interactions between the legume host and its symbiont [81–86]. While the processes of nodulation and AM invasion have been extensively examined in other legume species, little experimental work has been done in cowpea. We searched the GSRs for homologs of genes known to be involved in nodulation and AM-legume symbiosis and identified the following: NFR1/NFR5, receptor kinases that perceive the bacterial derived signal in nodulation; SYMRK, receptor-like kinases that integrate perception of the signal and initiate symbiosis; NIN1 and members of the NIN-like family of TFs; GRAS-domain family proteins, such as NSP1 and NSP2; DMI1 and DMI3; putative plastidic ion channel protein components CASTOR and POLLUX; nucleoporin NUP133, required for the induction of Ca2+ spiking in nodule development; nodulin (NOD)-genes and genes encoding various nodule-specific proteins.
A large number of genes are involved in plant responses to biotic stresses (e.g., bacteria, fungi, insects, nematodes, and parasitic plants). Both resistance (R) genes and genes encoding components of the signaling pathways activated by the R genes in the defense response have been extensively studied [88–90]. The largest class of R genes encodes intracellular proteins containing a nucleotide binding-site (NBS) and C-terminal leucine-rich repeats (LRR). The NBS family can be divided into multiple subfamilies based upon the presence or absence of other domains, such as a Toll/interleukin receptor domain (TIR) region, a coiled-coil (CC) domain, and a BED finger and/or DUF 1544 domain [87, 89]. Comparison of legume and non-legume resistance gene homologs indicates that legume genes possess a unique evolutionary history, with many clades either unique to legumes or expanded within legumes. Preliminary homology-based analysis of the cowpea GSR dataset using previously identified conserved NBS domains from cowpea R genes, and NBS and LRR domains from R genes of other legume and non-legume species, identified more than 500 R genes and R gene candidates. A fuller analysis of the diversity and phylogenetic relationships of these R genes is now underway.
In addition to the R genes, many of the conserved signaling components of the disease resistance response pathways are present in the cowpea GSRs, including NDR1, RPM1, COI1, EDS1, EDS5, PR5, SGT1, RPS5, and RIN4. Consistent with previous reports in other plant species, expression of these genes in cowpea has been shown to be activated by treatment with salicylic acid, jasmonic acid, or ethylene, and by wounding or by attack by the parasitic angiosperm Striga gesnerioides [93, 94].
Categories and distribution of simple sequence repeats (SSRs)
Summary statistics from analysis of cowpea GSRs for the presence and type of simple sequence repeats (SSRs). Columns: Number in Total GSR/Number in Annotated GSR; Minimum Copy Number; Mean Copy Number; Maximum Copy Number.
Among the legumes, comparative genetic mapping established early on that linkage relationships were well conserved between closely related genera [97–102]. As more sequence information has become available, the extent to which both macro- and microsynteny relationships exist has emerged. Despite significant differences in genome size, a high level of macrosynteny exists between Medicago and the Galegoids (such as alfalfa, pea, chickpea, and Lotus), whereas less macrosynteny is observed between Medicago and the Phaseolids (such as soybean and mungbean) [103–110]. The present work provides a firm foundation for detailed comparative studies of cowpea with other warm season legumes, which apart from soybean are as yet poorly represented. Many of the cowpea coding regions were readily mapped to the M. truncatula pseudomolecules, allowing for future efforts aimed at the dissection and analysis of regions of macro- and microsynteny. Such information, in combination with improvements of the current cowpea genetic map, will facilitate positional cloning of key genes of agronomic interest.
The development of genomic-scale information and its use to conduct global transcriptomic, proteomic, and metabolomic analyses is a major goal of the legume research community [16, 22]. Such analyses are already well advanced in the model legumes [112–117]. The provision of extensive genomic data for cowpea, a non-model legume with significant importance in the developing world, represents a significant step forward in legume research. Not only does the gene space sequence provided here enable the detailed analysis of gene structure, gene family organization and phylogenetic relationships within cowpea, but it also facilitates the further characterization of syntenic relationships among cultivated and model legumes. Ultimately these types of studies will contribute to determining patterns of chromosomal evolution in the Leguminosae. The determination of micro- and macrosyntenic relationships between cowpea and other cultivated and model legumes should assist in the identification of informative markers for use in marker-assisted trait selection and map-based gene isolation. The GSR sequences we have generated also provide a resource for future studies of gene expression within cowpea. The development of oligonucleotide-based microarrays for functional genomics analysis is currently underway in our laboratory and this resource should soon be available for the legume community. We hope that the information and materials provided here will stimulate the broader goal of the genetic improvement of cowpea, which is a priority for the alleviation of the burden of biotic and abiotic stresses on subsistence farmers in developing parts of the world.
Seeds of two cowpea cultivars were used in these studies. Cultivar UCR-1115 (obtained from Dr. Jeff Ehlers, Department of Botany and Plant Sciences, University of California, Riverside, CA) was used in the pilot study and IT97K-499-35 (obtained from Dr. Mohammad Ishiyaku, Ahmadu Bello University, Zaria, Nigeria) was used in the full scale gene space sequencing project. Seeds of both genotypes are publicly available on request.
Genomic library construction and methylation filtering
Genomic DNA was purified from isolated nuclei of 1-month-old cowpea leaves as previously described, except that OptiPrep™ (Axis-Shield PoC, Oslo, Norway) was used. Purified nuclear DNA was sheared using a Hydroshear apparatus (GeneMachines, San Carlos, CA, USA) and the sheared fragments were end-repaired using the End-It™ kit (Epicentre, Madison, WI, USA). BstXI adaptors (Invitrogen, Carlsbad, CA, USA) were ligated to the end-repaired fragments, and the ligation products were size-separated by agarose gel electrophoresis. DNA fragments ranging from 0.7 to 1.5 kb were extracted from the gel and ligated to dephosphorylated, BstXI-digested pOT2 vector (from the Berkeley Drosophila Genome Project, BDGP) for use in library construction. Ligation reactions were transformed into McrBC+ and McrBC- strains of Escherichia coli for generation of methylation-filtered (MF, GeneThresher® technology) and unfiltered (UF) libraries, respectively. The MF and UF libraries were plated onto selective agar medium, recombinant colonies were randomly picked using a Genetix Q-bot robot (Research Genetics, Carlsbad, CA, USA), and the selected clones were arrayed individually into glycerol storage medium in 384-well microtiter plates for archiving and storage at -80 °C.
In a Pilot Study, clones were picked at random from the MF and UF libraries, plasmid DNA was isolated from each clone, and each clone was sequenced from one end using an ABI 3730 (PE Applied Biosystems, Foster City, CA). The resulting sequence data were analyzed as described below to estimate gene enrichment or filtering power. The purity of the nuclear genomic DNA preparation was determined by measuring two sources of DNA contamination: extracellular DNA (e.g., fungal, insect, bacterial or viral) and organellar DNA (i.e., mitochondria and chloroplast).
Library preparation and clone picking for the full-scale genespace sequencing project were carried out as in the Pilot Study. The quality and filtering capacity of each library made from IT97K-499-35 was determined. A total of 150,336 randomly selected recombinant clones were picked from the MF libraries using a Genetix Q-bot robot (Research Genetics, Carlsbad, CA, USA), arrayed and stored individually in 384-well microtiter plates. One sequence attempt was made from each end of the insert fragment for each of the individual clones. A successful sequence read met the following criteria: it contained at least 100 contiguous bases of good-quality insert sequence following vector and quality trimming, performed using the -trim_alt option of the Phred basecaller software program, and it was not derived from organellar (chloroplast and mitochondrion), vector, transposon/retrotransposon, microbial, fungal (yeast), viral or animal genomic DNA as determined by BLAST searches of relevant public databases. Matches with an e-value equal to or less than 1e-10 were considered significant.
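The length criterion can be illustrated with a short routine. This is a minimal sketch, assuming per-base Phred quality scores are available as a list of integers and taking "good quality" to mean Phred Q20 (the threshold stated for the filter power reads); the function names are hypothetical, and the contamination screen by BLAST is not covered.

```python
from typing import Sequence


def longest_good_run(quals: Sequence[int], min_q: int = 20) -> int:
    """Length of the longest run of consecutive bases with quality >= min_q."""
    best = run = 0
    for q in quals:
        run = run + 1 if q >= min_q else 0  # reset the run at a low-quality base
        best = max(best, run)
    return best


def passes_length_criterion(quals: Sequence[int],
                            min_run: int = 100, min_q: int = 20) -> bool:
    """True if the read has at least `min_run` contiguous bases at >= min_q."""
    return longest_good_run(quals, min_q) >= min_run


# A single low-quality base splits the read into runs of 99 and 100 bases.
read = [30] * 99 + [10] + [30] * 100
print(longest_good_run(read), passes_length_criterion(read))
```

In practice the scores would be taken from the trimmed insert region only, after vector removal.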
The gene enrichment or filter power (FP) was calculated by comparing the rate of gene discovery between MF and UF sequences and is based on the proportion of matches of MF sequences compared with UF sequences over a range of e-values from 10e-5 to 10e-20, such that all matches better than the given e-value are tabulated [see Additional file 1]. To ensure high-quality, unique sampling events, reads were chosen that contained at least 100 contiguous Phred Q20 bases and only one read per clone was used. Detection of genes was accomplished by a blastx search (parameters: -e 0.01; -b 5; -v 5) of the curated Arabidopsis protein database. Aside from the curation of the Arabidopsis database to remove repetitive elements, matches to proteins annotated as hypothetical were not counted, as hypothetical genes are often false gene predictions or unknown repetitive elements. The Arabidopsis protein set, which was used for the FP calculations and assessment of cross-genome annotation potential, is described elsewhere.
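One way to read this calculation is as a ratio of match rates at each e-value threshold. The sketch below assumes that FP is the proportion of MF reads with a match divided by the proportion of UF reads with a match; that formula, the function names, and the toy e-values are assumptions made for illustration, not the published definition or data.

```python
from typing import Optional, Sequence


def count_matches(best_evalues: Sequence[Optional[float]], threshold: float) -> int:
    """Count reads whose best hit is at or below `threshold` (None = no hit)."""
    return sum(1 for e in best_evalues if e is not None and e <= threshold)


def filter_power(mf_evalues: Sequence[Optional[float]],
                 uf_evalues: Sequence[Optional[float]],
                 threshold: float) -> float:
    """Assumed definition: (MF match rate) / (UF match rate) at one threshold."""
    mf_hits = count_matches(mf_evalues, threshold)
    uf_hits = count_matches(uf_evalues, threshold)
    if uf_hits == 0:
        raise ValueError("no UF matches at this threshold; ratio undefined")
    # Cross-multiplied form of (mf_hits / n_mf) / (uf_hits / n_uf).
    return (mf_hits * len(uf_evalues)) / (uf_hits * len(mf_evalues))


# Hypothetical best-hit e-values, one per read (one read per clone).
mf = [1e-30, 1e-12, 1e-8, None]
uf = [1e-25, None, None, 1e-3]

for t in (1e-5, 1e-10, 1e-20):
    print(f"e <= {t:g}: FP = {filter_power(mf, uf, t):.2f}")
```

Tabulating the ratio across several thresholds, as here, shows whether the enrichment estimate is sensitive to the stringency chosen.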
Raw sequence reads and vector-trimmed gene-space sequence reads (GSRs) are stored in FASTA format in a publicly available PostgreSQL relational database. The primary sequence dataset consists of 263,425 FASTA formatted cowpea GSRs with an average length of 610 bp (see Additional file 2). Sequence annotation and analysis were performed using both BLAST and Hidden Markov Model (HMM) based algorithms. For homology-based annotation, each GSR was searched with blastx, with a cutoff expectation (e) value of 1e-8, against the UniProtKB-TrEMBL Database, UniProtKB-Swiss-Prot Database, NCBI GenBank Proteins Database, and UniProtKB-PIR (Protein Information Resource) Database, and against the Arabidopsis, rice, Medicago, and poplar protein datasets, as described previously.
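The homology-based annotation step can be illustrated with a small parser that keeps, for each GSR, the best hit at or below the 1e-8 cutoff. This is a sketch under stated assumptions: the source does not say which BLAST output format was used, so the standard 12-column tab-delimited layout (query, subject, ..., e-value in column 11, bit score in column 12) is assumed here, and the function name `best_hits` and the identifiers in the example are hypothetical.

```python
from typing import Dict, Tuple


def best_hits(tabular_text: str, max_evalue: float = 1e-8) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to (subject ID, e-value) of its lowest-e-value hit.

    Hits with an e-value above max_evalue are ignored, so queries without a
    significant hit are absent from the result.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Toy input: two hits for GSR_1 and one non-significant hit for GSR_2.
example = "\n".join([
    "GSR_1\tprotA\t85.0\t120\t18\t0\t1\t360\t10\t129\t1e-40\t160",
    "GSR_1\tprotB\t70.0\t110\t33\t0\t4\t333\t5\t114\t1e-12\t75",
    "GSR_2\tprotC\t40.0\t60\t36\t0\t1\t180\t1\t60\t0.002\t35",
])
annotated = best_hits(example)
```

Here only GSR_1 would be annotated (with protA), because the single hit for GSR_2 does not meet the cutoff.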
Sequence assembly (clustering) was done with the TGI clustering tool (TGICL) from Harvard University, which uses megablast to group overlapping sequences into clusters, then assembles the clusters using CAP3. Parameters used were a minimum overlap length of 30 bp with 94% sequence identity.
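The grouping criterion can be illustrated with a simple single-linkage clustering over precomputed pairwise overlaps. This sketch is not TGICL or CAP3 and does not reproduce their additional rules; it only shows how a 30 bp / 94% identity threshold partitions reads into clusters and singletons. The function name `cluster_reads` and the overlap tuple layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_reads(
    read_ids: Iterable[str],
    overlaps: Iterable[Tuple[str, str, int, float]],  # (read_a, read_b, overlap_bp, identity_%)
    min_len: int = 30,
    min_identity: float = 94.0,
) -> List[List[str]]:
    """Group reads connected by qualifying overlaps (union-find, single linkage)."""
    read_ids = list(read_ids)
    parent: Dict[str, str] = {r: r for r in read_ids}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, length, identity in overlaps:
        if length >= min_len and identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for r in read_ids:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(members) for members in clusters.values())
```

Reads left in a cluster of size one correspond to singletons; larger clusters would then be passed to an assembler to build consensus sequences.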
Analysis of gene ontology
Gene Ontology (GO) annotations of GSRs were generated by Arabidopsis refseq BLAST searches. Accession numbers from the BLAST annotation were used to look up the GO term and name for each annotatable sequence. For each cowpea GSR that had a GO term, we tracked backwards by the shortest path to an ancestor at the third level. We used the GO MySQL database file go_20070204-seqdblite-tables.tar.gz. Although geneontology.org distributes the GO SQL database in MySQL, we prefer PostgreSQL. Only the usual minor conversions were necessary to load the GO data into PostgreSQL. The GO database includes a number of related tables, and SQL joins allow accession numbers to be associated with GO terms. For the sake of efficiency (to avoid repeating a SQL join on four tables) we created a new table, assoc_term_seq, to capture the available associations between the tables term, association, gene_product, and seq. One additional query gave us the term, and a second query gave us the closest level 3 ancestor for that term. A Perl script handled the database connections, the gathering of the cowpea refseq accession numbers, iteration of the SQL queries over the cowpea accession numbers, and the accumulation of counts into the GO category terms.
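The search for the closest level 3 ancestor can be sketched as a breadth-first walk up the ontology graph. The original analysis used SQL queries driven by a Perl script; the Python below is only an illustration of the idea. It assumes that the ontology root counts as level 1, that a term's level is its shortest distance from the root, and that a term already at level 3 maps to itself. The function names and the toy term identifiers are hypothetical and are not real GO accessions.

```python
from collections import deque
from typing import Dict, List, Optional


def levels_from_root(parents: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Shortest-path level of every term reachable from root (root = level 1)."""
    children: Dict[str, List[str]] = {}
    for child, plist in parents.items():
        for p in plist:
            children.setdefault(p, []).append(child)
    level = {root: 1}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in level:
                level[child] = level[term] + 1
                queue.append(child)
    return level


def nearest_ancestor_at_level(
    term: str, parents: Dict[str, List[str]], level: Dict[str, int], target: int = 3
) -> Optional[str]:
    """Walk upward breadth-first; return the first term found at the target level."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if level.get(current) == target:
            return current
        for p in parents.get(current, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return None  # term lies above the target level or is not in the graph


# Toy ontology: Z has two parents, one of which (B1) is itself at level 3.
toy_parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "B1": ["B"],
               "X": ["A1"], "Z": ["X", "B1"]}
toy_levels = levels_from_root(toy_parents, "root")
```

Counting GSRs per category then reduces to tallying the value returned by `nearest_ancestor_at_level` for each annotated term.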
EST comparisons and mapping of GSRs to M. truncatula pseudomolecules
Computational comparisons were made between the cowpea GSRs and consensus ESTs (unigenes) from other legumes available at the Legume Information System (LIS) website. Each legume EST was searched against the total cowpea GSR dataset using tblastx with a cutoff value of 1e-8. We also compared the cowpea GSRs to a dataset of 16,954 unigenes with an average size of 709 nucleotides (7,894 assemblies of average size 859 nucleotides; 9,060 singletons of average size 578 nucleotides) derived from 42,000 ESTs. The ESTs were generated from two different libraries (a root library and a leaf/stem library) comprising material from four drought stressed and non-stressed cowpea cultivars (Dan Ila, a type II drought tolerant cultivar; Tvu11986, a type I drought tolerant cultivar; Tvu7778, a drought susceptible cultivar; and 12008D (Tvu9956), an advanced forage line with good feed quality and reported drought tolerance). The EST sequence data were kindly communicated to us by Drs. Sarah Hearne and Richard Bishop (IITA and ILRI, Nairobi, Kenya). A full description of the libraries and their generation is in preparation (Hearne S, personal communication). For this comparison both tblastx and blastn were used.
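As a consistency check on the unigene statistics above, the overall average length can be recomputed from the assembly and singleton figures as a count-weighted mean. The helper name `pooled_mean` is hypothetical; the numbers are those quoted in the text.

```python
from typing import Sequence, Tuple


def pooled_mean(groups: Sequence[Tuple[int, float]]) -> Tuple[int, float]:
    """Given (count, mean_length) pairs, return (total_count, count-weighted mean)."""
    total = sum(n for n, _ in groups)
    weighted_sum = sum(n * m for n, m in groups)
    return total, weighted_sum / total


# 7,894 assemblies averaging 859 nt and 9,060 singletons averaging 578 nt.
total, mean_len = pooled_mean([(7894, 859), (9060, 578)])
print(total, round(mean_len))  # prints: 16954 709
```

The recomputed total and mean agree with the 16,954 unigenes and 709-nucleotide average reported for the dataset.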
To locate cowpea GSRs on the M. truncatula chromosome-scale pseudomolecules, we employed tblastx with a threshold of 1e-5 against the Medicago truncatula Gbrowse Mtr 1.0 pseudomolecule release.
Identification of cowpea transcription factors and phylogenetic analysis of TF gene families
Homology searches (tblastn) of the cowpea GSRs were performed with the amino acid sequences of the DNA binding domains from each of the ten major ERF subgroups [see Additional file 5 and Additional file 6], representative WRKY domains from Groups IIa, IIb, IIc, IId, IIe and III and an N-terminal and C-terminal domain from Group I [see Additional file 7 and Additional file 8], and the complete sequences of the At CO, COL6 and COL9 genes [see Additional file 9]. For each TF family, the GSRs recovered were assembled into contigs using a local web-based implementation of the Phrap program. Each contig was then individually analyzed by blastx searches against the non-redundant protein database. Sequences not containing the TF domain under analysis were discarded. The minimum number of genes for each family was calculated based on the number of unique 5', 3', full-length and partial conserved domains present. Alignments of the predicted amino acid sequences of the conserved domains were carried out using ClustalW following removal of any intronic sequences. Phylogenetic trees were produced with the PHYLIP program using the neighbor-joining method and are presented using PhyloDraw.
Identification of simple sequence repeats (SSRs)
The presence of simple sequence repeats (SSRs) in each of the 263,425 cowpea GSRs was determined using the Tandem Repeats Finder program. GSRs containing SSRs, along with information on repeat size, composition and the primers for their amplification, were parsed and loaded into relational tables for sorting, search, and joining.
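To illustrate what an SSR scan reports, the sketch below finds perfect tandem repeats with motif lengths of 1 to 6 bp using regular expressions. It is a swapped-in, much simpler technique than Tandem Repeats Finder, which also detects imperfect repeats using a statistical model. The minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, not the settings used in the study, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    for size in range(1, len(motif)):
        if len(motif) % size == 0 and motif[:size] * (len(motif) // size) == motif:
            return False
    return True


def find_ssrs(seq: str) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, repeat_count) for perfect SSRs; end is exclusive."""
    seq = seq.upper()
    found = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already reported under its shorter unit
            count = (match.end() - match.start()) // k
            found.append((match.start(), match.end(), motif, count))
    return sorted(found)


# Toy sequence containing an (AT)8 and an (AAG)5 repeat.
toy_seq = "GGC" + "AT" * 8 + "CCT" + "AAG" * 5 + "TT"
toy_ssrs = find_ssrs(toy_seq)
```

Each tuple gives the repeat coordinates, which is the information needed to design flanking primers for a candidate microsatellite marker.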
Acknowledgements
We wish to thank Ed Southern and Sonia Morgan from the Kirkhouse Charitable Trust for their support and encouragement at the outset of this project. We also thank Robert Koebner and Paul Gepts for their valuable discussion and commentary on the manuscript, Muhammad A. Budiman and Joseph A. Bedell from Orion Genomics LLC for their assistance during the project and preparation of this manuscript, and Jun Zhuang for assistance with the EST-GSR assembly. Special thanks go to Sarah Hearne (IITA), Richard Bishop (ILRI) and their colleagues for providing access to their cowpea EST information prior to its publication. Finally, we thank the members of the Timko lab for their help, especially Jianxiong Li, Bhavani Gowda, Anne Knowleton and Jennifer Brannock. This work was supported by grants from the Kirkhouse Charitable Trust and the CGIAR Generation Challenge Programme.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.