Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing
© Vega-Arreguín et al; licensee BioMed Central Ltd. 2009
Received: 02 December 2008
Accepted: 06 July 2009
Published: 06 July 2009
In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage.
To explore in depth the transcriptional diversity of an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20–454 pyrosequencing runs of a cDNA library obtained from 2-week-old Palomero Toluqueño maize plants. With the protocol reported here, over 90% of the sequences obtained were informative. These GS20–454 runs generated over 1.5 million reads, the largest number of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (37.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of the sequences in public maize EST databases; the total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads into 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20–454 sequences and the corresponding levels of gene expression.
A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, and that it can also be used to estimate the transcript abundance of specific genes. These data suggest that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.
Sequencing and analysis of expressed sequence tags (ESTs) has been a primary tool for the discovery of novel genes and for annotation of genomic sequences in plants. ESTs provide large-scale characterization of mRNA populations through single-pass sequencing of cDNA. In crop species with a highly repetitive genome like maize, EST sequencing represents a rapid and cost-effective method for analyzing the transcribed region of the genome, allowing a distinction between functional genes and pseudogenes. ESTs can be used for other functional genomic projects including gene expression profiling, microarrays, molecular markers and physical mapping. Sequencing of ESTs from a non-normalized cDNA library using a high throughput approach could be useful for the quantitative assessment of transcript abundance and also for the discovery of novel transcribed sequences. In addition, ultra-deep sequencing of a non-normalized cDNA library could overcome the high sequence redundancy rates that the library might present.
Quantitative estimates of gene expression are also possible with large numbers of ESTs derived from diverse libraries. Other high throughput approaches for quantitative and qualitative genome-wide gene expression profiling are Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS). SAGE has been largely used in animal systems and more recently SAGE collections for several plant species have been made available [4–7]. In contrast, MPSS has been more widely used in plants than in animal species [8, 9].
Large-scale pyrosequencing of cDNAs offers a unique alternative opportunity to deeply explore the nature and complexity of a given transcriptional universe. Currently, one GS20–454 sequencing run produces a minimum of 200,000 reads with an average length of 100 nt. Applications of the 454 technology in plants include the sequencing of barley BACs, Arabidopsis thaliana miRNAs, and cDNA libraries of Medicago truncatula, A. thaliana and the shoot apical meristem of maize. Although these efforts have produced a large amount of valuable transcriptional information, the procedure has not yet been optimized for the sequencing of cDNAs, and the number of sequencing runs or GS20–454 reads necessary to reach full coverage or "near identity saturation" of a target transcriptome remains to be determined. An estimation of these types of representational parameters is important for large-scale EST projects that rely on 454 technology for large-scale transcriptional analysis.
Mexico is considered the center of origin and domestication of maize. With no less than 59 native landraces and many distinct environmental adaptations, Mexican germplasm has been essential to harness important traits for crop improvement. Palomero Toluqueño is a landrace of the Central and Northern Highlands Group characterized by short plants with frequent tassel branches, small conically shaped ears, a weakly developed root system, and pubescent leaf sheaths often pigmented by anthocyanins. It is one of several ancient landraces that are believed to have spread from the Pacific Coast to Northern areas of Mexico, contributing to the emergence of popcorn elite cultivars in the USA.
As part of a genomic platform for the systematic exploration of landrace genetic diversity, we analyzed over 1.5 million quality-filtered reads generated by 4 consecutive pyrosequencing runs of a single cDNA library derived from 2-week-old plants of EDMX2233 Palomero Toluqueño maize, and compared them to publicly available ESTs and to Maize Assembled Genomic Islands (MAGIs) from the B73 maize inbred line. MAGIs are genomic sequence assemblies from regions that are enriched in transcriptionally active units. This collection of 454 quality-filtered reads was sufficient to find transcripts corresponding to 50% of public maize ESTs. Comparisons to the MAGIs revealed that 11% of them align with our collection of Palomero sequences. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes and potentially represent new maize transcripts. Our results indicate that recurrent pyrosequencing is necessary to attain a thorough representation of the transcriptional universe present in a single cDNA sample, suggesting that large-scale transcriptional sequencing of native germplasm will emerge as an important tool to characterize the functional diversity of maize and to identify genes relevant to agronomic traits of particular interest.
Generation and Sequencing of the Palomero cDNA Library
A cDNA library was generated from total RNA extracted from young aerial and root tissues of a Mexican maize landrace as described in Material and Methods. We used a procedure for preparation of the maize cDNA library that overcomes possible bias that may occur when sequencing short sequences of DNA by 454 technologies. For library construction, 3'-enrichment of sequences was avoided by using random primers rather than a poly(T) primer during a second round of cDNA synthesis; the resulting cDNA sample was sheared by nebulization and end-repaired before ligating the 454 sequencing adapter. It is expected that synthesis of cDNA using oligo-dT primers will yield sequences that are 3'-enriched relative to the entire transcriptome, resulting in sequences frequently containing polyadenylated tails that significantly reduce the length of informative reads.
Statistics of the high quality reads from four GS20–454 sequencing runs of the Palomero cDNA library and coverage of the maize unigenes from the NCBI (UniGene) and TIGR (ZMGI) databases after each sequencing run.
Columns: Num. reads (Mb); Avg length of read (nt); ZMGI (N = 115744 seq); NCBI (N = 55327 seq)
Analysis of High Quality Reads from GS20–454 Runs and Comparison to Gene Index and UniGene Databases
The number of individual reads was notably homogeneous across the 454 sequencing runs (Additional file 1). After trimming, the four sequencing runs yielded between 367,391 (37.09 Mb) and 394,851 (39.73 Mb) reads each (Table 1). This represents a considerable increase in the average number of reads reported so far for a 454 run in cDNA libraries from plants. For instance, single 454 runs of a Medicago and a maize cDNA library resulted in 252,000 (23 Mb) and 260,000 high quality reads, respectively. In addition, two sequencing runs of an Arabidopsis cDNA library yielded 541,852 ESTs. Here, we obtained 40% more high quality sequences per run than those reported previously for plants, indicating that our sequencing-by-synthesis (SBS) approach represents an efficient strategy to generate large amounts of ESTs.
Comparison to Maize Assembled Genomic Islands (MAGIs)
Comparison of the number of the NCBI maize ESTs and Palomero GS20–454 ESTs aligning by BLAST with the MAGIs.
Columns: % No hits; Num. MAGIs matched (N = 727781 seq)
Number of unique MAGIs with a match to the Palomero GS20–454 ESTs that did not have prior expression evidence in the NCBI maize ESTs.
Categories: MAGIs matching the GS20–454 ESTs; MAGIs matching both NCBI ESTs and GS20–454 ESTs; Unique MAGIs matching the GS20–454 ESTs
Gene Discovery and Characterization of Novel Transcripts
Percentage of the Palomero GS20–454 sequences from four sequencing runs that matched by BLAST to maize databases and those that did not align (e-value < 9e-07) to any maize database.
Columns: % of 454 reads matching; % of 454 reads without a match. Organelle genomes: (1 ch + 4 mit)
Representation of Emblematic Maize Genes
Representation of emblematic maize genes in the Palomero GS20–454 cDNA library.
Number of reads
knotted1-like homeodomain protein liguleless4b
knotted1-like homeodomain protein liguleless3
teosinte glume architecture1
P gene; transposon
c1 locus myb homologue
fertilization independent endosperm2
Assembly of GS20–454 Transcripts and Quantitative Assessment of Transcriptional Abundance
Assembly of the total GS20–454 raw sequences was performed using the 454 commercial software utilities. A total of 1,135,969 (74.4%) reads were assembled into 65,493 contigs, 134,888 (8.8%) reads were classified as singletons and 240,535 (15.8%) sequences were classified as "repeats" on the basis of their over-representation that is likely to reflect abundant transcripts. Sequences in this latter category include highly expressed transcripts that are generally difficult to assemble. We found that 89% of these sequences have a hit to the ZMGI, averaging 7.9 reads per gene locus, whereas a similar analysis with the contigs and singletons averaged 1.7 sequences per gene locus and 84% and 48% of the sequences have a hit to ZMGI, respectively. In addition, 89.5% of the total GS20–454 reads aligned to ZMGI. These data indicate that the unassembled sequences represent valuable information that cannot be excluded from the global analysis of the Palomero ESTs, and justify the use of individual GS20–454 reads for the coverage analysis of public databases as described in this work.
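The read percentages quoted for the assembly can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; it assumes the percentages are taken relative to the 1,526,880 raw reads (the figure that reproduces 74.4%, 8.8% and 15.8%), and the function and variable names are illustrative.

```python
# Read counts reported in the text for the assembly of the raw reads.
RAW_READS = 1_526_880  # total raw GS20-454 reads from the four runs

assembly_classes = {
    "assembled into contigs": 1_135_969,
    "singletons": 134_888,
    "repeats": 240_535,
}


def percent_of_raw(count, total=RAW_READS):
    """Return count as a percentage of the raw read total."""
    return 100.0 * count / total


for label, count in assembly_classes.items():
    print(f"{label}: {count:,} reads ({percent_of_raw(count):.1f}%)")

# Reads that fall in none of the three classes named in the text.
unclassified = RAW_READS - sum(assembly_classes.values())
print(f"not in any listed class: {unclassified:,} reads "
      f"({percent_of_raw(unclassified):.1f}%)")
```

The three classes account for about 99% of the raw reads; the text does not say how the remaining reads were handled.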
Analyses performed with the 65,493 assembled contigs included transcript abundance estimation and a survey of the contribution of our assemblies to the length of the sequences in the ZMGI. For the latter, we compared the 65,493 assembled contigs to the ZMGI database and estimated the number of GS20–454 contigs that were longer than the aligned ZMGI sequence. Of 54,743 contigs with a significant match to ZMGI, only 468 were longer than the aligned ZMGI sequence. In addition, a TGICL-dependent assembly of all 86,069 GS20–454 sequences that are candidates to represent novel transcripts resulted in 9,040 contigs and 55,146 singletons, suggesting that most of these unique sequences represent rare transcripts.
Relative expression levels of known genes or ESTs can be approximately quantified by hybridization to microarrays; however, this approach is limited to genes that have been printed on the microarray, usually genes whose sequence was previously determined or predicted from genome annotation. To determine whether the results of our high-throughput pyrosequencing approach reflect transcript abundance, we estimated the relative abundance of several transcripts based on the number of GS20–454 sequences assembled into a given contig and the length of that contig, according to the following index:
Ra = N/L, where Ra is the relative abundance, N is the number of GS20–454 sequences per contig, and L is the length of the assembled contig.
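As a minimal sketch of how the index can be applied, the following Python snippet computes Ra = N/L for a few contigs and ranks them. The contig names and numbers are invented for illustration only and are not data from this study.

```python
def relative_abundance(n_reads, contig_length):
    """Ra = N / L: reads assembled into a contig per nucleotide of contig."""
    if contig_length <= 0:
        raise ValueError("contig length must be positive")
    return n_reads / contig_length


# Hypothetical contigs (names and values are made up for this example).
contigs = {
    "contig_A": {"reads": 1200, "length": 800},
    "contig_B": {"reads": 150, "length": 1500},
    "contig_C": {"reads": 20, "length": 400},
}

# Rank contigs from highest to lowest relative abundance.
ranked = sorted(
    ((name, relative_abundance(c["reads"], c["length"]))
     for name, c in contigs.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, ra in ranked:
    print(f"{name}: Ra = {ra:.3f}")
```

Dividing by contig length keeps a long transcript from ranking above a short one simply because it offers more sequence for reads to fall on.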
Comparison of transcript abundance of representative maize genes estimated by GS20–454 sequencing and qPCR.
Gene locus matched; Ct qPCR (mean ± SD)
Similar to UP|LIRP1_ORYSA (Q03200) Light-regulated protein precursor; 11.7808 ± 0.09
Similar to UP|PSAE_HORVU (P13194) Photosystem I reaction center subunit IV, chloroplast precursor (PSI-E); 13.1666 ± 0.05
UP|Q41754_MAIZE (Q41754) Ubiquitin; 14.6629 ± 0.10
UP|TBA6_MAIZE (P33627) Tubulin alpha-6 chain (Alpha-6 tubulin); 14.0284 ± 0.03
Homologue to GB|BAD33626.1|50726105|AP005579 Polyubiquitin 2; 16.3357 ± 0.04
Homologue to UP|Q41772_MAIZE (Q41772) Cytosolic ascorbate peroxidase, complete; 16.5425 ± 0.08
Similar to UP|SODC2_MESCR (O49044) Superoxide dismutase [Cu-Zn]2; 17.9920 ± 0.08
Similar to UP|Q6YIH2_ORYSA (Q6YIH2) OsCDPK protein; 18.5091 ± 0.13
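To make the Ct values above easier to compare, they can be converted into abundances relative to the transcript with the lowest Ct. The sketch below is a reading aid and not the analysis performed in this study: it assumes an idealised amplification efficiency of 100% (template doubling every cycle) and applies no reference-gene normalisation. The shortened gene labels are for display only.

```python
# Mean Ct values copied from the qPCR table (lower Ct = more template).
mean_ct = {
    "Light-regulated protein precursor": 11.7808,
    "Photosystem I reaction center subunit IV": 13.1666,
    "Ubiquitin": 14.6629,
    "Tubulin alpha-6 chain": 14.0284,
    "Polyubiquitin 2": 16.3357,
    "Cytosolic ascorbate peroxidase": 16.5425,
    "Superoxide dismutase [Cu-Zn] 2": 17.9920,
    "OsCDPK protein": 18.5091,
}


def relative_to_most_abundant(ct_values):
    """Convert Ct values to abundance relative to the lowest-Ct transcript.

    Assumes the template doubles every cycle (100% efficiency); this is an
    idealisation that the source does not state.
    """
    ct_min = min(ct_values.values())
    return {gene: 2.0 ** -(ct - ct_min) for gene, ct in ct_values.items()}


relative = relative_to_most_abundant(mean_ct)
for gene, value in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: {value:.4f}")
```

Under these assumptions, a difference of one cycle corresponds to a two-fold difference in template, so the roughly 6.7-cycle spread in the table spans about two orders of magnitude.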
The development of pyrosequencing technologies (in particular 454 sequencing) has contributed to the total sequence information available for several multicellular organisms. In the case of maize, a single GS20–454 run with cDNA amplified from shoot apical meristems of inbred line B73 resulted in ~261,000 ESTs that were sufficient to annotate more than 25,000 genomic sequences. A similar approach was used to demonstrate that 454-based transcriptome sequencing of inbred lines allows high-throughput acquisition of gene-associated single nucleotide polymorphisms (SNPs). More recently, large-scale sequencing of 3'-UTR regions was used to resolve the expression of gene families, allowing a frequent distinction between alleles and gene family members. Although these studies have demonstrated the value of large-scale pyrosequencing technologies when applied to the analysis of specific maize transcriptomes, an in-depth estimation of the overall transcriptional universe found at a specific developmental stage had not been previously carried out.
We performed 4 consecutive GS20–454 pyrosequencing runs of a single cDNA library obtained from seedlings of Palomero Toluqueño collected 2 weeks after germination, and generated the largest collection of maize transcripts corresponding to a single developmental stage. On average we obtained over 37 Mb per run and a total of 152.37 Mb of high quality sequence, and our overall coverage was sufficient to detect transcripts similar to at least 50% of all publicly available ESTs present in the UniGene and ZMGI databases. The total number of ZMGI sequences that are represented in our transcript collection increased 14% between the first and the fourth pyrosequencing run; however, the fourth and last run only yielded an increase of 2.59%, indicating that despite the importance of increasing the number of sequencing runs in terms of statistical accuracy, the last run contributed little to the overall coverage and the discovery of novel transcripts. This percentage is slightly increased when pyrosequencing reads are compared to the MAGI collection, suggesting that MAGIs might have an under-representation of rare or low-abundance transcripts. This is supported by the fact that increasing the number of 454 sequencing runs produces a significant increase in the number of novel genomic sequences matched with expressed sequence tags, providing expression evidence for such genome regions, which most probably represent genes or transcriptionally active non-coding regions with low levels of expression. Overall, our analysis suggests that 3 consecutive pyrosequencing runs are sufficient to obtain a representation of most of the transcriptome present in Palomero plantlets.
The phenotypic and molecular diversity of maize has been essential to harness important traits for crop improvement. On the basis of landrace germplasm, the activity of modern plant breeders gave rise to inbred lines currently used in hybrid production, causing significant improvements in yield, grain quality, resistance to biotic or abiotic stress, and maturity. A genome-wide survey of gene content in B73 and Mo17 revealed that more than 20% of gene fragments examined in allelic contigs were not shared between these 2 inbred lines; reasonable predictions anticipate that the genomic divergence between 2 landraces is far greater. Our results identified more than 86,000 sequences that potentially represent novel transcripts expressed in Palomero plantlets, suggesting that a large portion of the intrinsic transcriptional diversity present in native landraces remains to be explored. The discovery of this collection of candidate novel transcripts suggests that many more should be present in different tissues and developmental stages, opening the possibility for large-scale efforts to characterize the transcriptional universe of genetically distinct native landraces.
When estimating the transcriptional abundance of representative genes, we noticed a direct correlation between the number of reads corresponding to a transcript and its level of expression assessed by qPCR, indicating the possibility that in some transcriptional ranges, deep sequencing of cDNA samples could provide an accurate estimation of transcriptional abundance. It is likely that an increase in the number of pyrosequencing runs could enhance the accuracy of this type of quantitative estimation, as the number of pyrosequencing runs necessary for deep coverage of a given transcriptome will depend on the nature and the complexity of the sample. Overall, our results suggest that a systematic and detailed characterization of gene expression in maize using high-throughput technologies will generate useful information for the understanding of maize biology.
Access to large-scale landrace transcriptional sequences promises to become an invaluable source of polymorphic information for exploring maize natural variation and exploiting allele diversity and recombination. We expect that a renewed interest in landrace germplasm will emerge with the development of new initiatives to explore the functional diversity of maize.
In conclusion, using an optimized protocol for pyrosequencing of a Palomero cDNA library we generated and analyzed the largest collection of maize transcripts corresponding to a single developmental stage. The Palomero sequences covered over 50% of all reported maize unigenes, and an estimated 5.67% of the reads potentially represent new maize transcripts. Our results indicate that recurrent pyrosequencing is necessary to attain a thorough representation of the transcriptional universe present in a single cDNA sample, as well as for transcript abundance estimation in a non-normalized cDNA library. Finally, large-scale transcriptional sequencing of native landraces represents a valuable approach to characterize the functional diversity of maize.
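The saturation argument in the preceding paragraph (each additional run adds fewer newly matched database sequences) can be illustrated with a short sketch. The function below takes, for each run, the set of database sequence IDs matched by at least one read, and reports cumulative coverage and the gain contributed by that run. The database size and the hit sets are toy values invented for the example, not data from this study.

```python
def cumulative_coverage(hits_per_run, database_size):
    """Return a list of (cumulative %, gain in percentage points) per run.

    hits_per_run: list of sets, one per run, holding the IDs of database
    sequences matched by at least one read from that run.
    """
    seen = set()
    results = []
    previous = 0.0
    for hits in hits_per_run:
        seen |= hits  # a database sequence counts once, however often it is hit
        coverage = 100.0 * len(seen) / database_size
        results.append((coverage, coverage - previous))
        previous = coverage
    return results


# Toy data: a hypothetical database of 20 sequences and four runs.
runs = [
    {1, 2, 3, 4, 5, 6, 7},
    {2, 3, 8, 9, 10},
    {1, 9, 11, 12},
    {5, 12, 13},
]

coverage_by_run = cumulative_coverage(runs, database_size=20)
for i, (cov, gain) in enumerate(coverage_by_run, start=1):
    print(f"run {i}: {cov:.1f}% covered (+{gain:.1f} points)")
```

A shrinking per-run gain of this kind is the pattern used above to judge when further runs add little new coverage.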
Seeds from Zea mays Palomero (accession# EDMX2233, CIMMYT, Mexico) were grown under greenhouse conditions for 2 weeks and then transferred to a dark room for two days before total RNA extraction.
cDNA library construction
Total RNA was extracted with TRIZOL (Invitrogen) from whole 2-week-old maize seedlings. cDNA synthesis was performed with 3.5 μg of total RNA using the Message Amp-II kit (Ambion) following the protocol recommended by the manufacturer. Briefly, first strand cDNA synthesis was primed with T7 Oligo(dT) primers. After a second strand cDNA synthesis reaction, 5–10 ng of synthesized double-stranded cDNA were amplified by in vitro transcription and the resulting 5–7 μg of antisense RNA (aRNA) was purified using Qiagen RNeasy columns (Qiagen). A second round of cDNA synthesis was performed using the aRNA as template. First and second strand cDNA synthesis were as described above except that random nonamers (Amersham) were used for the first strand synthesis. This procedure yielded about 4 μg of cDNA that were purified using the DNA Clear Kit for cDNA purification (Ambion). The cDNA was nebulized to obtain fragments of 200–700 bp before sequencing.
Approximately 3 μg of sheared cDNA were used for GS20–454 sequencing. The cDNA sample was end-repaired and adapter-ligated, and streptavidin bead enrichment, DNA denaturation and emulsion PCR were carried out, all following previously described procedures. Four sequencing runs were performed on this library and resulted in 1,526,880 reads.
Trimming of polyA/T tails and removal of low-quality sequences from the 1,526,880 raw reads were performed using the TIGR SeqClean software pipeline (http://compbio.dfci.harvard.edu/tgi/software). Sequences shorter than 50 bp after processing were excluded from the analysis. This resulted in 1,517,878 high-quality reads. For assembly, the 454 Newbler software and the TIGR Gene Indices clustering tools (TGICL) were used.
Stand-alone BLAST software was obtained from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nih.gov). The high-quality GS20–454 sequences were compared by BLAST with 903,624 unassembled maize ESTs from GenBank (downloaded in January 2007); 55,327 maize UniGenes from GenBank (Build number 61, January 18th 2007); 727,781 contigs and singletons from MAGI version 4 and the MAGI Cereal Repeat database v 3.1 (http://magi.plantgenomics.iastate.edu); 115,744 maize sequences from the TIGR Gene Indices downloaded in November 2006 (ZMGI release 17, http://compbio.dfci.harvard.edu/tgi); and the maize chloroplast (GenBank accession no. X86563) and mitochondrial (GenBank accession nos. DQ645537, Zea luxurians; AY506529, Zea mays strain NB; DQ645539, Zea mays subsp. parviglumis; DQ645538, Zea perennis) genomes. A database containing 1561 maize tRNAs (http://gtrnadb.ucsc.edu) and 421 plant snoRNAs (http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home) was used to search for these RNAs in the GS20–454 sequences. Other databases used in this study are the TIGR Plant Transcript Assemblies database (Plantta, http://plantta.jcvi.org/) and the non-redundant protein database (NR) from NCBI (ftp://ftp.ncbi.nih.gov/blast/db/). Several local MySQL databases were built to store all relevant information from the BLAST analyses. Perl scripts were used to retrieve sequences from the Palomero EST collection. Sequences of the assemblies of the GS20–454 reads were deposited in the GenBank Transcriptome Shotgun Assembly (TSA) database under accession numbers EZ048883 – EZ114339.
Primers for qRT-PCR were designed to produce amplicons of about 170 bp (see Additional file 3). The reaction mixture for quantitative PCR was as follows: 10 μl of SYBR Green master mix (Applied Biosystems), 3 μl of cDNA template (3 ng/μl) and 1 μl of each primer (10 μM). The PCR program was as follows: one cycle at 95°C for 5 min, followed by 40 cycles of 95°C for 30 sec, 65°C for 30 sec and 72°C for 40 sec. Melting curves for each product, from 60°C to 95°C at 0.2°C/sec, showed a single melting point. All Ct values are averages of at least three repetitions.
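The two filtering steps described above (tail trimming followed by a 50 bp length cutoff) can be sketched in a few lines of Python. This is not SeqClean and does not reproduce its behaviour: quality filtering is left out, the minimum tail length of 10 bases is an assumption made for the example, and the read names and sequences are invented.

```python
import re

MIN_LENGTH = 50  # reads shorter than this after trimming are excluded (from the text)
MIN_TAIL = 10    # assumed minimum homopolymer run treated as a poly(A)/poly(T) tail

# A leading poly(T) run or a trailing poly(A) run of at least MIN_TAIL bases.
LEADING_T = re.compile(rf"^T{{{MIN_TAIL},}}", re.IGNORECASE)
TRAILING_A = re.compile(rf"A{{{MIN_TAIL},}}$", re.IGNORECASE)


def trim_tails(seq):
    """Remove a leading poly(T) run and a trailing poly(A) run, if present."""
    seq = LEADING_T.sub("", seq)
    return TRAILING_A.sub("", seq)


def clean_reads(reads):
    """Trim tails and keep only reads of at least MIN_LENGTH bases."""
    kept = {}
    for name, seq in reads.items():
        trimmed = trim_tails(seq)
        if len(trimmed) >= MIN_LENGTH:
            kept[name] = trimmed
    return kept


# Invented example reads.
reads = {
    "read_1": "ACGT" * 20 + "A" * 15,    # 80 informative bases plus a poly(A) tail
    "read_2": "T" * 20 + "GCGC" * 10,    # only 40 bases remain after trimming
    "read_3": "GATTACA" * 10,            # 70 bases, no tail
}

cleaned = clean_reads(reads)
for name, seq in cleaned.items():
    print(name, len(seq))
```

In this toy input, read_2 is discarded because only 40 bases remain once its poly(T) run is removed.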
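The coverage figures reported in this work come down to counting how many database sequences are matched by at least one read. The sketch below shows one way such a count could be derived from BLAST's 12-column tab-separated output; it is not the Perl/MySQL pipeline used here. The e-value cutoff reuses the 9e-07 value that appears in one of the table captions, but how the authors applied it is an assumption, and the read and database identifiers are invented.

```python
import csv
import io

E_VALUE_CUTOFF = 9e-07  # value from a table caption; its exact use is assumed


def database_coverage(blast_tabular, database_size, max_evalue=E_VALUE_CUTOFF):
    """Return (% of database sequences hit, set of reads with a hit).

    blast_tabular: text in the 12-column tab-separated BLAST format
    (query id, subject id, % identity, alignment length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, e-value, bit score).
    """
    matched_subjects = set()
    matched_queries = set()
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query_id, subject_id, evalue = row[0], row[1], float(row[10])
        if evalue < max_evalue:
            matched_subjects.add(subject_id)
            matched_queries.add(query_id)
    return 100.0 * len(matched_subjects) / database_size, matched_queries


# Invented hits: four reads against a hypothetical database of four sequences.
rows = [
    ("read_1", "TC1", "98.0", "100", "2", "0", "1", "100", "51", "150", "2e-45", "180"),
    ("read_2", "TC1", "95.0", "90", "4", "0", "1", "90", "200", "289", "1e-30", "140"),
    ("read_3", "TC2", "100.0", "80", "0", "0", "1", "80", "10", "89", "5e-35", "150"),
    ("read_4", "TC3", "85.0", "40", "6", "0", "1", "40", "5", "44", "0.002", "38"),
]
blast_text = "\n".join("\t".join(r) for r in rows)

coverage, reads_with_hit = database_coverage(blast_text, database_size=4)
print(f"database coverage: {coverage:.1f}%")
print(f"reads with a hit: {sorted(reads_with_hit)}")
```

Reads absent from the returned set are the ones that would be counted as having no match to that database.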
We thank Raymundo Mendez for assistance with the GS20–454 sequencing, Juan Caballero and Araceli Fernandez for their help with bioinformatic analyses, Susana Fuentes for assistance with qPCR, and Gustavo Hernandez for helpful discussions. This project was supported in part by the Zea-2006 project from the Mexican Secretaría de Agricultura (SAGARPA).
- Ewing RM, Ben Kahla A, Poirot O, Lopez F, Audic S, Claverie JM: Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res. 1999, 9 (10): 950-959. 10.1101/gr.9.10.950.
- Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene expression. Science. 1995, 270 (5235): 484-487. 10.1126/science.270.5235.484.
- Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K: Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000, 18 (6): 630-634. 10.1038/76469.
- Matsumura H, Nirasawa S, Terauchi R: Technical advance: transcript profiling in rice (Oryza sativa L.) seedlings using serial analysis of gene expression (SAGE). Plant J. 1999, 20 (6): 719-726. 10.1046/j.1365-313X.1999.00640.x.
- Fizames C, Munos S, Cazettes C, Nacry P, Boucherez J, Gaymard F, Piquemal D, Delorme V, Commes T, Doumas P, Cooke R, Marti J, Sentenac H, Gojon A: The Arabidopsis root transcriptome by serial analysis of gene expression. Gene identification using the genome sequence. Plant Physiol. 2004, 134 (1): 67-80. 10.1104/pp.103.030536.
- Gibbings JG, Cook BP, Dufault MR, Madden SL, Khuri S, Turnbull CJ, Dunwell JM: Global transcript analysis of rice leaf and seed using SAGE technology. Plant Biotechnol J. 2003, 1 (4): 271-285. 10.1046/j.1467-7652.2003.00026.x.
- Poroyko V, Hejlek LG, Spollen WG, Springer GK, Nguyen HT, Sharp RE, Bohnert HJ: The maize root transcriptome by serial analysis of gene expression. Plant Physiol. 2005, 138 (3): 1700-1710. 10.1104/pp.104.057638.
- Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S: The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res. 2004, 14 (8): 1641-1653. 10.1101/gr.2275604.
- Hoth S, Morgante M, Sanchez JP, Hanafey MK, Tingey SV, Chua NH: Genome-wide gene expression profiling in Arabidopsis thaliana reveals new targets of abscisic acid and largely impaired gene regulation in the abi1-1 mutant. J Cell Sci. 2002, 115 (Pt 24): 4891-4900. 10.1242/jcs.00175.
- Wicker T, Schlagenhauf E, Graner A, Close TJ, Keller B, Stein N: 454 sequencing put to the test using the complex genome of barley. BMC Genomics. 2006, 7: 275. 10.1186/1471-2164-7-275.
- Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC: High-Throughput Sequencing of Arabidopsis microRNAs: Evidence for Frequent Birth and Death of MIRNA Genes. PLoS ONE. 2007, 2: e219. 10.1371/journal.pone.0000219.
- Cheung F, Haas BJ, Goldberg SM, May GD, Xiao Y, Town CD: Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics. 2006, 7: 272. 10.1186/1471-2164-7-272.
- Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB: Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol. 2007, 144 (1): 32-42. 10.1104/pp.107.096677.
- Emrich SJ, Barbazuk WB, Li L, Schnable PS: Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 2006, 17 (1): 69-73. 10.1101/gr.5145806.
- Vielle-Calzada J-P, Padilla J: The Mexican Landraces: Description, Classification and Diversity. Handbook of Maize: Its Biology. Edited by: Bennetzen JL, Hake SC. 2009, New York: Springer, 543-561. 10.1007/978-0-387-79418-1_27.
- Fu Y, Emrich SJ, Guo L, Wen TJ, Ashlock DA, Aluru S, Schnable PS: Quality assessment of maize assembled genomic islands (MAGIs) and large-scale experimental verification of predicted genes. Proc Natl Acad Sci USA. 2005, 102 (34): 12282-12287. 10.1073/pnas.0503394102.
- Messing J, Dooner HK: Organization and variability of the maize genome. Curr Opin Plant Biol. 2006, 9 (2): 157-163. 10.1016/j.pbi.2006.01.009.
- Childs KL, Hamilton JP, Zhu W, Ly E, Cheung F, Wu H, Rabinowicz PD, Town CD, Buell CR, Chan AP: The TIGR Plant Transcript Assemblies database. Nucleic Acids Res. 2007, 35 (Database issue): D846-851. 10.1093/nar/gkl785.
- Bomblies K, Wang RL, Ambrose BA, Schmidt RJ, Meeley RB, Doebley J: Duplicate FLORICAULA/LEAFY homologs zfl1 and zfl2 control inflorescence architecture and flower patterning in maize. Development. 2003, 130 (11): 2385-2395. 10.1242/dev.00457.
- Hubbard L, McSteen P, Doebley J, Hake S: Expression patterns and mutant phenotype of teosinte branched1 correlate with growth suppression in maize and teosinte. Genetics. 2002, 162 (4): 1927-1935.
- Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS: SNP discovery via 454 transcriptome sequencing. Plant J. 2007, 51 (5): 910-918. 10.1111/j.1365-313X.2007.03193.x.
- Eveland AL, McCarty DR, Koch KE: Transcript profiling by 3'-untranslated region sequencing resolves expression of gene families. Plant Physiol. 2008, 146 (1): 32-44. 10.1104/pp.107.108597.
- Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, Rafalski A: Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat Genet. 2005, 37 (9): 997-1002. 10.1038/ng1615.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437 (7057): 376-380.
- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003, 19 (5): 651-652. 10.1093/bioinformatics/btg034.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Chan PP, Lowe TM: GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009, 37 (Database issue): D93-97. 10.1093/nar/gkn787.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.