As a constituent of key molecules in plant cells, phosphorus (P) is an essential macronutrient for plant growth. Large quantities of P are used in fertilisers, but worldwide P resources will be exhausted by the end of this century. Phosphate (Pi) starvation is generally observed across an entire affected field. Visual symptoms of Pi starvation (−P) include dark-green leaf colour and reduced shoot elongation and leaf size. As −P progresses in wheat (Triticum aestivum L.), the oldest leaves become chlorotic and show signs of desiccation.
Wheat is a major staple crop in many parts of the world, in terms of both cultivation area and importance as a food source. To meet the increasing global demand for wheat, the crop’s nutrient use efficiency must be improved and its requirement for fertilisers reduced. Because wheat is primarily grown on soils with low P availability, such as the acidic soils of tropical and subtropical regions and the calcareous soils of temperate regions, its limited tolerance to −P is an important constraint on wheat production.
Various genetic approaches have been used to elucidate the genetic control of −P tolerance in wheat, including aneuploid analyses of the nulli-tetrasomic series and the wheat alien chromosome addition lines of the cultivar ‘Chinese Spring’ (CS), and quantitative trait locus (QTL) mapping [3–6]. QTL analyses using the −P-sensitive CS and the tolerant variety ‘Lovrin 10’ indicated that CS carries the positive alleles of the major QTLs for P use efficiency on chromosomes 3B, 4B, and 5A. In another study, seven and six QTLs controlling P uptake and P use efficiency, respectively, were repeatedly detected. A large number of QTLs for changes in agronomic traits under low or high P concentrations have been detected on all chromosomes of the hexaploid wheat genome, implying that −P tolerance is polygenic. However, such studies remain few, and a reverse genetic approach could help to characterise genes that potentially contribute to these complex multilocus traits, together with their global transcriptional networks, in Pi-starved wheat.
Several technologies, including massively parallel sequencing and microarray analysis, have recently been used to catalogue simultaneously the effects of −P on the expression of thousands of genes in model species [7–10]. Transcriptome sequencing using next-generation sequencing (NGS) technology provides high-resolution data and is a powerful tool for studying global transcriptional networks. Evaluating sequence-based expression profiles can identify −P-responsive genes and provide functional annotation for genes underlying complex multilocus traits under −P in wheat.
In model species, transcriptome profiling and quantification of gene expression are generally performed by mapping NGS reads to a reference genome sequence and assigning them to annotated genes. This strategy is not yet feasible in wheat, because its reference genome sequence and gene annotation are still incomplete; an international project to achieve these goals is in progress (IWGSC: International Wheat Genome Sequencing Consortium, http://www.wheatgenome.org/). This project may take considerable time because of the difficulty of sequencing the huge (approximately 40 times larger than that of rice), highly repetitive hexaploid wheat genome.
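In a reference-based workflow of this kind, expression is commonly quantified by counting the reads mapped to each gene and normalising for gene length and sequencing depth. One widely used measure is RPKM (reads per kilobase of transcript per million mapped reads). The minimal Python sketch below assumes that per-gene read counts and lengths are already available; the gene identifiers and numbers are hypothetical, and RPKM is shown only as an illustration, not necessarily the normalisation used in this study.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Return reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Divide by length in kb (length_bp / 1e3) and by depth in millions
    # (total_mapped_reads / 1e6); combined, this is a factor of 1e9.
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical per-gene data: gene ID -> (mapped read count, length in bp).
counts = {
    "gene_A": (500, 2000),
    "gene_B": (500, 1000),
}
total_mapped = 10_000_000  # hypothetical library size after mapping

for gene, (n_reads, length) in counts.items():
    print(gene, rpkm(n_reads, length, total_mapped))
```

Here gene_A and gene_B receive the same number of reads, but gene_B is half as long, so its RPKM (50.0) is twice that of gene_A (25.0).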
The database of putative full-length cDNAs for wheat, TriFLDB, has released approximately 16,000 full-length cDNAs (http://trifldb.psc.riken.jp/index.pl). Although this dataset is a useful reference for transcript mapping, it is incomplete: 36,000 to 50,000 genes per diploid genome have been estimated from analyses of chromosome 3B of hexaploid wheat [12, 13]. Recently, de novo transcript assembly has made comprehensive transcriptome analyses possible, and several studies have described transcriptome sequencing of non-model species, including wheat, using massively parallel sequencing technology [14, 15]. De novo assembly of short transcript sequences enables researchers to reconstruct entire transcriptomes, identify and catalogue all expressed genes, distinguish isoforms, and estimate transcript expression levels.
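Short-read de novo assemblers commonly work by splitting reads into overlapping k-mers and joining them in a de Bruijn graph, whose unambiguous (non-branching) paths are reported as contigs. The toy Python sketch below illustrates only this core idea; it assumes error-free reads from a single strand and ignores reverse complements, coverage, sequencing errors, and isolated cycles, all of which real transcriptome assemblers must handle. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            succ[prefix].add(suffix)
            pred[suffix].add(prefix)
    return succ, pred


def assemble_unitigs(reads, k):
    """Return contigs spelled by maximal non-branching paths, sorted."""
    succ, pred = build_de_bruijn(reads, k)
    nodes = set(succ) | set(pred)

    def one_in_one_out(node):
        return len(pred[node]) == 1 and len(succ[node]) == 1

    contigs = []
    for node in nodes:
        # Paths start only at branching nodes, sources, or sinks.
        if one_in_one_out(node):
            continue
        for start in succ[node]:
            path = [node, start]
            current = start
            # Extend while the path remains unambiguous.
            while one_in_one_out(current):
                current = next(iter(succ[current]))
                path.append(current)
            # Spell the path: first node plus the last base of each later node.
            contigs.append(path[0] + "".join(n[-1] for n in path[1:]))
    return sorted(contigs)


reads = ["ATGGCGTGCA", "GCGTGCAATT"]
print(assemble_unitigs(reads, k=4))
```

The two overlapping reads collapse into a single contig, ATGGCGTGCAATT. Wherever a (k-1)-mer has more than one successor, for example because of alternative splicing or a sequencing error, the path is broken into separate contigs, which is one reason why read length, error rate, and the choice of k affect the assembly obtained.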
Although de novo assembly programs (e.g., Trans-ABySS, Velvet-Oases, and Trinity) [16–18] have been developed alongside massively parallel sequencing, their usefulness for transcriptome assembly has not yet been well demonstrated, and recent advances in bioinformatics may still yield improvements. Some studies have used short-read sequence data obtained with the Illumina sequencer for de novo assembly; others have used the relatively long reads obtained with the Roche 454 pyrosequencing system or have adopted a hybrid approach combining short and long reads. In addition, contig construction is strongly affected by the quality (e.g., length) and quantity of sequence reads. Furthermore, the choice of cDNA library construction method, sequencing technology, and data pre-treatment influences the quality of the assembled transcriptome. Consequently, a comparison of several assembly programs is needed to determine the best combination of parameters, which can then serve as a guideline for sequence assembly.
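Comparisons of this kind need summary statistics for each assembly. A common, though limited, one is N50: the contig length L such that contigs of length ≥ L together account for at least half of the total assembled length. The Python sketch below computes the number of contigs, total length, and N50 for hypothetical contig-length lists; the assembly labels and lengths are invented for illustration. N50 on its own can be misleading for transcriptomes, because transcripts naturally vary widely in length, so it is best considered together with other measures.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest contig down until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarise(contig_lengths):
    """Basic per-assembly statistics used to compare programs or parameters."""
    return {
        "contigs": len(contig_lengths),
        "total_bp": sum(contig_lengths),
        "n50": n50(contig_lengths),
    }


# Hypothetical contig lengths (bp) from two assembler/parameter combinations.
assemblies = {
    "assembler_X_k25": [400, 300, 200, 100],
    "assembler_Y_k31": [500, 100, 100, 100, 100, 100],
}

for name, lengths in assemblies.items():
    print(name, summarise(lengths))
```

Both hypothetical assemblies contain 1,000 bp in total, yet the second has the higher N50 (500 versus 300) because more of its sequence is concentrated in a single long contig.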
In this study, we evaluated the de novo assembly approach by comparing the results of several programs using short-read sequence data obtained from seedlings of wheat cultivar CS grown under −P. We constructed a wheat transcript dataset by de novo assembly and quantified gene expression. As the reference for gene expression analysis, we used a non-redundant set of transcripts generated from the de novo assembly and the full-length cDNAs. This dataset was also used to assess transcripts, to investigate sequence similarity and conservation among several plant species, and to compare our results with our previous report on rice transcript profiling under −P conditions. We demonstrated that the overall mechanism regulating the expression of −P-responsive genes in wheat could be effectively characterised using short-read NGS data. A comparison of gene expression profiles in wheat and rice revealed conserved gene expression systems that appear to be essential for adaptation to −P conditions. Finally, we describe an effective method for assembling short transcript sequences to discover novel functional genes in the absence of a reference genome. The transcript assembly generated in this study should serve as a useful resource for wheat genomics and genetics.