Next Generation Sequencing (NGS) is quickly becoming the standard for generating cheap, accurate and high-throughput DNA sequence data. The major NGS platforms are Roche 454 GS-FLX Titanium (330 bp), Illumina GAIIx (75-100 bp) and SOLiD3 (50 bp), which differ in read length, error rate and cost. Transcriptome sequencing using NGS, commonly known as RNA-Seq, enables rapid and cost-effective gene and marker discovery, gene expression analysis, and detection of rare variants and splice isoforms. Most previous studies have sequenced the transcriptomes of plants for which completed reference genomes are available, such as Arabidopsis thaliana [3, 4], Medicago truncatula and Zea mays [6, 7]. Direct sequencing of the transcriptome of non-model organisms has the potential to rapidly generate valuable genomic resources in poorly known species. However, de novo transcriptome assembly is challenging due to short reads, the lack of reference sequences and the need for improved bioinformatic tools to facilitate data analysis.
Most de novo transcriptome studies have used the Roche 454 platforms [9–13] because the longer reads allow more reliable de novo assembly. However, the reactions are relatively expensive, which reduces the achievable sequencing coverage, a major factor in the accuracy of de novo assembly. Hybrid sequencing approaches using 454/Illumina technologies can successfully reduce cost and compensate for the biases of the different sequencing technologies [14, 15]. Sequencing exclusively with Illumina technology, the most widely published NGS platform, is an attractive and cheap alternative because the high coverage obtained can overcome sequencing error rates and short read length, yet relatively few de novo transcriptome studies have exploited these advantages in plants (https://atgc-illumina.googlecode.com/files/PAG_2010_AKozik_V09.pdf). As read lengths increase, paired-end library construction techniques improve and costs continue to fall, Illumina RNA-Seq will become a powerful tool for transcriptome characterization of non-model plants.
Acacia mangium and Acacia auriculiformis are important forest tree species belonging to the Fabaceae, or legume family, and are native to Australia, Papua New Guinea and Indonesia. A. mangium is widely planted in Southeast Asia because of its superior growth, wide site suitability and multiple uses [17, 18], while A. auriculiformis has higher adaptability, greater durability and is less susceptible to diseases than A. mangium. A. auriculiformis and A. mangium are predominantly out-crossing [19, 20]. Naturally crossed Acacia hybrids were first noted in Sabah in the late 1970s. These hybrids possess many attractive traits highly sought in tree improvement, such as enhanced growth, form, disease resistance and adaptability. For the wood and pulp industry, the Acacia hybrids have great potential as raw material due to superior growth, longer wood fibers and better pulp quality compared with their parents. Low lignin and high cellulose content are desirable in the pulping process, and studies have shown that cellulose accumulation increases when lignin is reduced in plants. The monolignol biosynthesis pathway is well characterized, but the coordination and regulation of genes in the pathway are not well understood. Recent studies revealed that known regulatory sequences, including several classes of transcription factors and microRNAs, play important roles in the regulation of lignin and wood formation [24, 25]. These regulatory sequences may be good candidates for selective breeding and genetic engineering programs to increase pulp yield and reduce pulping costs.
The C-values for A. auriculiformis and A. mangium (both 2n = 26) are estimated to be 0.83 pg and 0.65 pg, respectively, while the A. auriculiformis × A. mangium hybrid genome size is estimated to be 750 Mb, making the hybrid genome 1.4 times larger than the Populus trichocarpa genome. Currently, no genome sequences are available for any Acacia species, although the genomes of several model legume species such as M. truncatula and Glycine max have been sequenced. Unfortunately, all of these model legumes are in a separate subfamily, the Faboideae, while Acacia species are in the Mimosoideae subfamily. In terms of EST resources, a total of 147 ESTs from floral tissues and 8,963 from secondary xylem and shoot tissue of A. mangium, and 2,459 from inner bark of the A. auriculiformis × A. mangium hybrid, have been deposited in the NCBI dbEST. However, no genomic resources are available for A. auriculiformis. Several important genes involved in monolignol biosynthesis and wood-related pathways, including cinnamate 4-hydroxylase (C4H), caffeoyl CoA 3-O-methyltransferase (CCoAOMT), cinnamyl alcohol dehydrogenase (CAD), phenylalanine ammonia lyase (PAL), caffeic acid O-methyltransferase (COMT) and cellulose synthase (CesA), have been successfully isolated and characterized from the Acacia hybrid [30, 31].
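To put the C-values quoted above on the same scale as the 750 Mb hybrid estimate, the following minimal sketch converts picograms of DNA to megabases. It assumes the commonly used conversion factor of 1 pg ≈ 978 Mb, which is not stated in the text; the function name `c_value_to_mb` is hypothetical and used for illustration only.

```python
# Assumption (not from the text): 1 pg of DNA corresponds to roughly 978 Mb.
PG_TO_MB = 978.0


def c_value_to_mb(c_value_pg: float) -> float:
    """Convert a C-value in picograms to an approximate genome size in Mb."""
    return c_value_pg * PG_TO_MB


# C-values as quoted in the text.
c_values_pg = {"A. auriculiformis": 0.83, "A. mangium": 0.65}

for species, c_value in c_values_pg.items():
    print(f"{species}: {c_value} pg ~ {c_value_to_mb(c_value):.0f} Mb")
```

Under this assumption the two parental genomes come out at roughly 812 Mb and 636 Mb, so the 750 Mb hybrid estimate lies between them.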
Conventional breeding programs for the improvement of forest trees are slow, laborious and land-intensive due to the long life cycle and large size of trees. The application of genomic approaches facilitated by emerging DNA sequencing technologies may significantly accelerate such breeding programs. Given the lack of genomic resources for tree crops, particularly tropical species, the simple discovery of genes controlling wood-related traits will be a major step forward. Ultimately, the development of large-scale genomic resources will facilitate the application of linkage and association mapping within tree improvement programs.
Here we applied paired-end Illumina GAII sequencing to non-normalized cDNAs of A. auriculiformis and A. mangium to discover important genes involved in lignin and secondary cell wall formation in these non-model tree species. Using standard de novo assembly algorithms, we examined the quality of the contigs generated and attempted to identify wood-related genes, particularly genes in the monolignol biosynthesis pathway and their isoforms. We also sought to identify potential transcription factors involved in secondary wood formation and lignin deposition, as well as highly conserved microRNAs and their wood-related gene targets. A major objective of our analysis was to detect a large number of informative SNPs for use in linkage mapping of hybrid progenies and population genetic studies of the two parental species. Our results could provide powerful tools for the efficient selection of hybrid offspring with favorable traits, allowing rapid and continued improvement.