An improved method to identify LBD genes
Previous strategy to detect LBD genes is primarily based on database query or BLAST search of Arabidopsis protein sequences [1, 2], which heavily depends on Arabidopsis sequence features and has limited applications for evolutionarily divergent genomes of land plants. Therefore we develop a new method that can be widely employed to detect LBD genes. As sequences of LBD proteins are highly variable except the conserved LOB domain [3, 4], we use the PFAM profile hidden Markov models of the LOB domain (PFAM database, http://pfam.xfam.org/) to query each plant genome with a cutoff of gathering threshold (E-value 1e-5). Only proteins with matched sequence covering at least 80% of the LOB domain are regarded as LBD proteins. This improved method would be effective to preclude partial sequence matches outside the LOB domain, while allowing identification of LBD genes with relative sequence variance. We identify 431 LBD genes in 11 high-quality genomes of land plants spanning bryophytes to angiosperms (Fig. 1 and Additional file 1). The lycophyte Selaginella contains 15 LBD genes, ranking the least in land plants. The number of LBD genes nearly doubles in bryophyte Physcomitrella which has a genome duplication event [19]. Apart from basal eudicot Aquilegia, more than 34 LBD genes are identified in the angiosperm genomes (Fig. 1), suggesting extensive expansion of LBD gene family in angiosperms. Noticeably, compared with the previous study [1], our improved method identifies more LBD genes in basal-node bryophyte Physcomitrella (28 vs 26) and lycophyte Selaginella (15 vs 11).
Expansion patterns of LBD genes in angiosperms
The expansion of LBD genes in angiosperms suggests they are likely influenced by whole-genome duplications (WGDs) that substantially increase gene content [18]. Therefore, we analyze each angiosperm genome to identify different types of gene duplications that contribute to the diversity of LBD genes. According to genome positions of the affected genes, gene duplication events are categorized into different sorts: WGD, dispersed duplication, and tandem/proximal duplications. On average 84% of LBD genes in angiosperms are affected by WGD and dispersed duplications (Fig. 2a). WGD events influenced 35–45% of LBD genes in most angiosperm species, while the ratio can vary dynamically from 28% in Arabidopsis to 77% in Populus. This could be explained by the fact that Arabidopsis has lost many LBD gene duplicates following two recent WGDs within the crucifer lineage, whereas Populus retained more duplicated genes after the Salicaceae WGD event [20, 21]. Compared with WGD event, tandem duplications in Populus only affect 5.2% of LBD genes. In contrast, up to 40% of LBD genes in Vitis are influenced by tandem duplications, suggesting tandem duplication in Vitis is an important driving force for the expansion of LBD genes.
Previous studies demonstrate genes descendent from a common ancestor often share chromosomal collinearity in angiosperms [18]. Therefore, we analyze collinearity relationship of LBD genes to infer their inter-species orthology. Through iterative clustering of collinear LBD genes, we merge 133 LBD genes of dicots into 21 collinear groups and 92 LBD genes of monocots into 23 collinear groups (Fig. 2b and Additional file 2). The collinear groups contain 77% of total LBD genes in the analyzed angiosperms and each group represents a set of LBD gene orthologs originated from a common ancestor. Merging collinear groups into a higher hierarchy is not feasible because extensive genome fragmentation and rearrangement obscured syntenic blocks between dicots and monocots [18]. Although the result identifies less than 23 members of LBD gene ancestors, additional phylogenetic information is needed to collapse the collinear groups of dicots and monocots to reflect a common angiosperm ancestor.
Retracing LBD gene ancestors in land plants
A maximum likelihood (ML) tree is constructed with the aligned LOB domain in the 11 species to reveal phylogenetic relationships of LBD genes. The phylogenetic tree classifies LBD genes into class IA, IB, IC1/ID, IC2, IE, and class II, a topology similar with previously reported [1, 2] (Fig. 3). Through scrutinizing the phylogenetic tree, we identify 7 independent gene clusters of Physcomitrella and Selaginella genes that neighbor with LBD genes of all analyzed seed plants, suggesting they are ancient gene lineages that give rise to LBD genes in modern plant genomes. We detect two ancient lineages in class IA, and one ancient lineage in each of the five remaining classes (Additional file 3). Therefore, class IA is actually composed of two founder genes in early land plants. To yield a better resolution, we generate another ML tree using complete protein sequences of class IA. The phylogenetic tree verifies the existence of two ancient lineages with high support values (Shimodaira-Hasegawa-like approximate likelihood-ratio test (SH-aLRT) > 94%) (Additional file 4).
We then search gene clusters of seed plants to retrace ancestor LBD genes predating seed plants emergence. As phylogenetic tree often suffers from long branch attraction in which distant protein sequences are incorrectly grouped together [22], we employ stringent criteria in the analysis, requiring that LBD genes of gymnosperm Picea should be clustered with angiosperm genes from all the analyzed species. In total 11 gene clusters of seed plants were identified from the ML tree: three in class IC1/ID, two in class IA, IB and II, one in class IC2 and IE (Fig. 4 and Additional file 3). Because each cluster of seed plant LBD genes share the same phylogenetic origin, 11 LBD gene ancestors likely existed prior to seed plant divergence. Compared with the ancient lineages, class IB, IC1/ID and II LBD genes experience gene content expansion before seed plant evolution.
Gene collinearity and phylogenetic relationships are combined to retrace angiosperm ancestors of LBD genes. The ML tree categorizes 18 angiosperm gene clusters that include both dicot and monocot LBD genes (Fig. 4 and Additional file 3). Most of them are supported by inter-species gene collinearity except one cluster in class IE, which may be caused by massive gene duplications that likely obscure the syntenic block detection. Therefore, we propose that 18 LBD gene ancestors have been established prior to angiosperm emergence.
Reconstruct the ancestry of LBD gene family
We reconstruct evolutionary history of LBD genes with the deduced ancestor genes. In early land plants, 7 ancient lineages of LBD genes were established and remained in a stable amount until the divergence of seed plants (Fig. 4). Two rounds of gene duplication occurred before the emergence of seed plants and angiosperms, expanding LBD gene family to 11 and 18 members respectively. Further expansion of LBD genes in individual angiosperm species is highly associated with WGDs. The basal eudicot Aquilegia contains 26 LBD genes, while the amount could be increased to over 40 members in other eudicots which have undergone genome triplication or additional WGD events (Fig. 1) [23]. The analyzed monocots have survived from two successive WGDs and contain more than 34 LBD genes in the genome [24]. Noticeably, all of the major lineages of LBD genes could be detected in current angiosperm genomes, indicating they are extremely reluctant to be lost during evolution.
The reconstructed ancestry describes detailed evolutionary routes for 43 LBD genes in Arabidopsis and 34 LBD genes in rice. RNA-seq data of rice developmental atlas was further analyzed to investigate expression patterns of LBD genes from different subfamily [25]. The heatmap shows that different subfamily genes exhibit variable expression enrichment in different tissues (Fig. 5). For example, class IA LBD genes are more abundant in leaf and flower tissues, while class II LBD genes are universally expressed in diverse tissues. Meanwhile, it is often observed that LBD genes from a same subfamily genes display differential expression patterns. A noticeable case is in class IE which contains eight LBD genes (OsLBD3, OsLBD6, OsLBD7, OsLBD8, OsLBD9, OsLBD10, OsLBD15 and OsLBD16) sharing a same angiosperm ancestor. Only four of these LBD genes show abundant expression in normal growth tissues, while the other genes are detected with very weak expression level (Fig. 5), suggesting expression pattern has been shifted during these genes evolution. Another case is the class IA-2 subfamily which contains OsLBD11, OsLBD31 and OsLBD27 paralog genes. Expression of OsLBD11 is enriched in leaves, shoots and panicle (Fig. 5). Meanwhile, OsLBD31 and OsLBD27 display complementary expression patterns in either leaves or shoot and panicle, suggesting the two paralog genes underwent expression specialization after gene duplication. In Arabidopsis, similar expression divergence is also observed for class IA-2 LBD genes. For example, AtLBD6/AS2 is specifically expressed in the adaxial side of leaves [3], while AtLBD10 is only detected in the pollen grains, and AtLBD36 is expressed in a variety of tissues, including leaf vasculature, flower organs and seeds [26, 27]. These observations suggest alteration of expression level and tissue specificity occurred during LBD genes evolution.
Diversification of LBD gene subfamily
To deepen annotation and classification of LBD gene family, we developed sequence profile features for each LBD gene class using whole protein sequences alignment. The result shows that, besides the well recognized LOB domains, different gene classes possess specific characteristic protein sequences (Additional file 5). The location of these characteristic sequences varies among different classes. They can lie immediately downstream the LOB domain (class IA), present at C-terminal regions (class IB and IC1/ID), or extend flanking the LOB domain (class IC2).
In seed plants, class IB LBD genes display prominent functions in regulating root development [6, 9, 10]. Previous study detected none of class IB LBD gene exists in Selaginella moellendorphii [1], leading to the assumption that genetic programs of root development in lycophytes are distinctive to the seed plants. In contrast, our method is sufficient to identify a basal-node LBD gene in S. moellendorphii (SmoeLBD007), which clusters with class IB genes with high support (aLRT = 95%) (Fig. 6). This ancient gene lineage is preserved in all of the land plant genomes analyzed, and its direct descendant in Arabidopsis is AtLBD20, which was demonstrated to participate in pathogen defense [28]. Realtime PCR shows SmoeLBD007 is mainly expressed in root and leaf tissues (Additional file 6), suggesting it is functional during these tissues development. Meanwhile, phylogenetic analysis identifies two LBD gene groups that are specific to seed plants. These gene groups contain some key regulators of lateral root development in angiosperms, including AtLBD16, AtLBD18 and AtLBD29 in Arabidopsis, OsLBD21/CRL1 in rice and ZmayLBD002/RTCS in maize [6, 9, 10]. Selection pressure analysis shows that the seed plant-specific gene lineages exhibit a higher nonsynonymous/synonymous substitution ratio (ka/ks, p-value = 0.0039) (Fig. 6), suggesting they accumulated more nonsynonymous amino acid changes during seed plants evolution. Therefore, although class IB LBD genes are present in S. moellendorphii, they are extensively duplicated in seed plants, and recruited to root regulations through sequence change and functional specialization.
Class IA is composed of two ancient lineages, namely IA-1 and IA-2, which give rise to AtLOB and AtLBD6/AS2 in Arabidopsis respectively (Fig. 4). Consistent with the independent gene ancestry, AtLOB and AtLBD6/AS2 exhibit divergent biological functions even with highly similar amino acid compositions [5]. Class IA-2 experienced a duplication event before angiosperm emergence and generates AtLBD6/AS2 lineage and AtLBD10 lineage. Sequence alignment identifies a “SKYQ” motif immediately downstream of the LOB domain in AtLBD6/AS2 orthologs (Additional file 7). In contrast, AtLBD10 orthologs contain a totally different sequence featured with “AAYIGP”. This phenomenon suggests protein sequence change is also accompanied with evolution of class IA LBD genes in angiosperms.
Class II subfamily contains two LBD gene ancestors before seed plant appearance. In Arabidopsis, AtLBD37, AtLBD38 and AtLBD39 are derived from a common ancestor (Fig. 4) and participate in a same biological process to repress anthocyanin biosynthesis and nitrogen responsive genes [8]. Through sequence alignment, we identify a featured pattern of LxLxL motif in these proteins (Additional file 7) which is required to recruit TPL/TPR co-repressors and fulfill transcriptional repression activity [29]. The LxLxL motif is also conserved in class II LBD genes of Physcomitrella, suggesting the ability to recruit TPL/TPR co-repressors has been acquired by early land plants.