Conifers (Coniferales), the most important group of gymnosperms, comprise some 650 species, some of which are the largest, tallest, and oldest non-clonal terrestrial organisms on Earth. They are of immense ecological importance, dominating many terrestrial landscapes and representing the largest terrestrial carbon sink. Present in a wide range of ecosystems, they have evolved highly efficient physiological adaptations. Since the great majority of conifers are trees, and conifers have been separated from angiosperms by more than 300 million years of independent evolution, they offer a distinct perspective on plant genome biology and evolution. Studies on conifer genomes are revealing unique information that cannot be inferred from the angiosperm genomes sequenced so far (such as poplar, Eucalyptus, Arabidopsis or rice): around 30% of conifer genes have little or no sequence similarity to plant genes of known function [1, 2]. Unfortunately, conifer genomics is hindered by very large genomes (e.g. the pine genome is approximately 160 times larger than that of Arabidopsis and seven times larger than the human genome; in fact, it is larger than any other genome sequenced to date) that are replete with highly repetitive, non-coding sequences.
Conifers include the economically and ecologically important spruces (Picea spp.) and pines (Pinus), Pinus being the largest extant conifer genus with approximately 115 species. Pines are important because: (i) their timber and paper pulp are used for the construction of buildings and furniture; (ii) they are used in reforestation owing to their rapid growth and drought tolerance compared with other tree species; (iii) they help stabilise sandy soils and indirectly act as an atmospheric CO2 sink, helping to reduce global warming; and (iv) the nuts of some species are widely used in Mediterranean cuisine. Consequently, the genus Pinus is becoming a woody gymnosperm model. The main pine model species in Europe are Pinus pinaster and Pinus sylvestris, whereas Pinus taeda and Pinus contorta are their equivalents in North America. It is therefore worthwhile to deepen our knowledge of the content of the pine genome, as this would enable the exploitation of natural genetic resources and the use of new forest reproductive material suited to adapting these trees to a changing climate.
Genome-based science is playing an important role in understanding the genome content and structure of different organisms. Since whole-genome sequencing approaches are hard to apply to large genomes such as that of pine, scientists have focused on the expressed portion of the genome using dedicated technologies. For example, sequencing clones obtained by suppression subtractive hybridisation (SSH) [4–6] provides gene-enriched sequences that are specific to a particular condition. However, the dominant approach to characterising the transcriptionally active portions of pine genomes has been the sequencing of expressed sequence tags (ESTs) [7, 8], which lack non-coding DNA (mainly introns and intergenic regions). Classic ESTs are prone to artefacts during cDNA library construction and to errors during sequencing. As a result, clustering and assembly errors can occur during the reconstruction of putative transcripts and may ultimately lead to inaccurate gene annotation. Next-generation sequencing technologies, however, have removed many of the drawbacks and time-consuming steps of classic EST projects and have enabled transcriptome sequencing of many species at a fraction of the time and cost previously required. ESTs have also driven the development of pine microarrays [11–14], although there is no easy way to relate the probes printed on these microarrays to the corresponding pine sequences.
Sequencing projects need user-friendly databases to store, organise, and retrieve their sequences. Since many sequences in EST databases are reported to be highly contaminated or incorrectly pre-processed, more robust pre-processing, clustering, assembly and annotation pipelines are needed to yield reliable information. ConiferEST (now part of ConiferGDB http://www.conifergdb.org/coniferEST.php) was the first attempt to curate pine sequences through more precise pre-processing, but it was restricted to Pinus taeda traces. The DFCI Pine Gene Index http://compbio.dfci.harvard.edu/cgi-bin/tgi/gimain.pl?gudb=pine, a subset of the discontinued TGI Gene Indices, is a non-redundant database of all putative Pinus genes. Although it is a very large compilation of pine sequences, only GO and KEGG annotations are available, sequences are not separated by species, P. taeda is highly over-represented, and its interface allows only limited interaction. ForestTreeDB was created to centralise large-scale ESTs from diverse tissues of conifer and poplar trees, but it is no longer available. The TreeGenes database http://dendrome.ucdavis.edu/treegenes/ covers a wide range of forest tree species. Its effort to combine and interrelate a great variety of information deserves recognition, even though its EST pre-processing is not optimal. TreeSNPs and PineSAP are databases devoted exclusively to single nucleotide polymorphisms (SNPs) in Picea and Pinus species, respectively. Recently, Parchman and co-workers described the first high-throughput analysis of a pine species, but no accompanying database was created. Notably, none of the above databases is linked to the pine microarrays described in the literature.
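EST pre-processing, identified above as a weak point of several databases, usually involves quality clipping, removal of poly-A tails and of cloning-vector or adaptor sequences, and rejection of reads that are too short. The Python sketch below illustrates only two of these steps and rests on stated assumptions: Phred-scaled base qualities are available for each read, quality clipping uses a generic Mott-style maximal-scoring-segment rule, and the poly-A run length (8) and minimum read length (100 bp) are hypothetical thresholds. It is not the pipeline used by EuroPineDB or by any database named here; vector screening and contaminant removal are omitted.

```python
import re
from typing import List, Optional, Tuple


def mott_trim(quals: List[int], limit: float = 0.05) -> Tuple[int, int]:
    """Return (start, end_exclusive) of the highest-scoring segment,
    where each base scores `limit` minus its Phred error probability."""
    best_start = best_end = 0
    best_score = cur_score = 0.0
    cur_start = 0
    for i, q in enumerate(quals):
        cur_score += limit - 10 ** (-q / 10)  # Phred Q -> error probability
        if cur_score <= 0:
            # A low-quality stretch: restart the candidate segment after it.
            cur_score, cur_start = 0.0, i + 1
        elif cur_score > best_score:
            best_score, best_start, best_end = cur_score, cur_start, i + 1
    return best_start, best_end


# Hypothetical threshold: treat a trailing run of >= 8 A's as a poly-A tail.
POLY_A = re.compile(r"A{8,}$")


def preprocess_est(seq: str, quals: List[int], min_length: int = 100) -> Optional[str]:
    """Quality-clip a read, strip a trailing poly-A tail, and discard it
    (return None) if fewer than `min_length` bases remain."""
    start, end = mott_trim(quals)
    trimmed = POLY_A.sub("", seq[start:end].upper())
    return trimmed if len(trimmed) >= min_length else None
```

For example, a read with four low-quality leading bases and a 12-nt poly-A tail is reduced to its high-quality core, whereas a 40-bp read is rejected:

```python
seq = "GGGG" + "ACGT" * 30 + "A" * 12
quals = [5] * 4 + [30] * 132
print(preprocess_est(seq, quals) == "ACGT" * 30)   # True
print(preprocess_est("ACGT" * 10, [30] * 40))      # None
```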
Our group has been working on pine genomics for many years (e.g., EMBL accession numbers AM982822-AM983454, BX248593-BX255804, BX682240-BX683073, BX784033-BX784385, EC428477-EC428747, FM945441-FM945999 or FN256437-FN257130) and aims to provide high-quality sequences and annotations of pine genomes through EuroPineDB. Taking advantage of next-generation sequencing methods, recently released pre-processors, reliable sequence annotators, and the bioinformatics infrastructure of the University of Málaga (Spain), EuroPineDB was designed to gather the most reliable re-pre-processed, assembled, and annotated P. pinaster sequences obtained with different technologies. The database also supports retrieval by sequence similarity, description matches or microarray position, as well as browsing by species, experimental process, and annotation. As a new feature, many of its sequences have been printed on a microarray for expression analysis and can be freely browsed.