STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency

Background Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, estimating a species tree from a collection of gene trees can be complicated due to the presence of gene tree incongruence resulting from incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent process. Maximum likelihood and Bayesian MCMC methods can potentially result in accurate trees, but they do not scale well to large datasets. Results We present STELAR (Species Tree Estimation by maximizing tripLet AgReement), a new fast and highly accurate statistically consistent coalescent-based method for estimating species trees from a collection of gene trees. We formalized the constrained triplet consensus (CTC) problem and showed that the solution to the CTC problem is a statistically consistent estimate of the species tree under the multi-species coalescent (MSC) model. STELAR is an efficient dynamic programming based solution to the CTC problem which is highly accurate and scalable. We evaluated the accuracy of STELAR in comparison with SuperTriplets, which is an alternate fast and highly accurate triplet-based supertree method, and with MP-EST and ASTRAL – two of the most popular and accurate coalescent-based methods. Experimental results suggest that STELAR matches the accuracy of ASTRAL and improves on MP-EST and SuperTriplets. Conclusions Theoretical and empirical results (on both simulated and real biological datasets) suggest that STELAR is a valuable technique for species tree estimation from gene tree distributions.

to happen in many organisms (for example, birds) where ILS is likely to occur [8]. Due to this growing awareness that ILS can be present in many phylogenomic datasets, ILS-aware species tree estimation techniques have gained substantial attention from systematists. These types of method are usually known as "summary methods" as they summarize a set of gene trees to estimate the species trees [9].
Several species-tree estimation methods have been proven to be statistically consistent under the multispecies coalescent model, meaning that they will return the true species tree with high probability given a sufficiently large number of true gene trees sampled from the distribution defined by the species tree. Methods that are statistically consistent under the MSC include ASTRAL [10], MP-EST [11], *BEAST [12], NJst [13], BUCKy [14], GLASS [15], STEM [16], SNAPP [17], SVDquartets [18], STEAC [19] and ASTRID [20]. *BEAST, which is a Bayesian technique, can co-estimate both gene trees and species trees, and simulation studies suggest that *BEAST can be considered the best of the co-estimation methods with excellent accuracy, but it is computationally very intensive to run on even moderately-sized dataset [9,21]. SuperTriplets [22] is a triplet-based supertree method which tries to find an asymmetric median supertree according to a triplet dissimilarity. SuperTriplets is very fast and accurate and was used in a number of important phylogenomic studies [23][24][25][26][27][28]. ASTRAL and MP-EST are two of the most accurate and widely used summary methods that are much faster than *BEAST. ASTRAL has been shown to be faster and more accurate than MP-EST and can handle large dataset containing hundreds of species which is not possible by MP-EST [10,29].
ASTRAL finds a species tree that maximizes the number of consistent quartets between the species tree and the gene trees. MP-EST maximizes a pseudo-likelihood estimate utilizing the underlying triplet distribution of the gene trees. Quartet and triplet based methods are robust to the "anomaly zone" [30,31] (a condition where the most probable gene tree topology may not be identical to the species tree topology) as there are no anomalous rooted three-taxon species trees [30,32] and no anomalous unrooted four-taxon species trees [31,33].
We present STELAR (Species Tree Estimation by maximizing tripLet AgReement), a new coalescent-based method which finds a species tree that agrees with the largest number of triplets induced by the gene trees. STE-LAR is statistically consistent under the MSC model, fast (having a polynomial running time), and highly accurate -enabling genome wide phylogenomic analyses. We report, on an extensive experimental study, the performance of STELAR in comparison with ASTRAL, MP-EST and SuperTriplets. These quartet-and triplet-based methods are known to be highly accurate and reliable, and are being widely used. Our experiments showed that STELAR is as good as ASTRAL, and is better than MP-EST and SuperTriplets in most of the model conditions. Thus, with the desirable property of being statistically consistent and reliable performance on challenging realistic model conditions, STELAR will be a useful tool for genome-scale analyses of species phylogeny.

Implementation
Design of ASTRAL and other quartet-based methods [10,34] and their proofs of being statistically consistent are based upon the fact that unrooted 4-taxon species trees do not contain anomaly zone [31,33]. We use similar design in STELAR, utilizing the fact that rooted 3-taxon species trees do not contain anomaly zone [30,32].
STELAR is provably statistically consistent under the MSC model but its statistical guarantee can fail under model conditions in which gene tree discordance is due to gene duplication and loss or horizontal gene transfer. Furthermore, MSC model requires representing each gene by a single tree. Therefore, MSC model does not allow recombination events inside a gene [32] and consequently the statistical guarantee can fail in the presence of recombination.

Definitions and notation
Let T be a binary rooted tree, leaf-labelled by species set X with n taxa. Each internal node u in a tree T divides the set of taxa present in the subtree rooted at u into two distinct sets. We call this a subtree-bipartition, denoted by SBP T (u), which was originally defined in [35]. We denote by T u the subtree under node u of tree T. We denote the leaves in T by L(T), an arbitrary set of three species {a, b, c} ⊂ X by r, and a rooted topology on r by t r . We use a|bc to show that b and c are siblings. Possible three topologies on {a, b, c} are: a|bc, b|ca, c|ab. The probability, under the MSC model, of observing an induced gene tree triplet that matches the species tree topology is higher, and the other two triplet topologies are less probable and equal to each other [30]. The triplet tree topology that appears more frequently than the two alternative topologies is called the dominant topology. T R(T) denotes the set of n 3 triplet topologies induced by the tree T. We denote by T|r the triplet tree topology obtained by restricting T to the three species in r.

Problem definition
A natural optimization problem for estimating a species tree from a collection of gene trees under triplet consistency would be first decomposing the input gene trees into their induced triplets, and then estimating a species tree by amalgamating the dominant triplets [36]. Building such a consensus tree from a set of triplets has been shown to be an NP-hard problem, and exact and heuristic algorithms have been proposed [37,38]. This approach considers only the dominant triplets which could be problematic when the frequencies of the alternate triplet topologies are very close to that of the dominant one, and when all the dominant triplets are not compatible with each other meaning that all of them do not agree with a single species tree. Moreover, this approach has to explicitly enumerate and consider 3 * n 3 possible triplets, making it computationally expensive for larger values of n. An alternate approach would be to consider the relative frequencies of all the triplets, and infer a species tree that agrees with the largest number of triplets induced by the gene trees. Methods that do not explicitly decompose the gene trees into induced triplets and quartets are preferred over the ones that take a set of quartets or triplets as input, since the latter types of methods demand high computational time and memory, and thus are not scalable to large dataset [10,34]. This is possibly one of the reasons why quartet amalgamation method like QFM [34], despite having excellent accuracy, is not as popular as ASTRAL. Taking these issues into consideration, we introduce a constrained version of the triplet consistency problem, which we call the Constrained Triplet Consensus (CTC) problem, and provide efficient exact and heuristic solutions to the CTC without having to decompose the gene trees into their induced triplets. We formalize the CTC problem as follows.
• Input: a set G of rooted gene trees on X and a set SBP of subtree-bipartitions. • Output: a species tree ST on X that maximizes the triplet agreement with G and all its subtree-bipartitions are in SBP.
Note that when SBP is set to all possible subtreebipartitions on X , the triplet consistency problem will find the globally optimal solution (exact solution). Otherwise, when SBP contains only the subtree-bipartitions in the input gene trees, the search space will include only those species trees where every subtree-bipartition is present in at least one of the gene trees in G, and we call this the constrained version or heuristic version.

Theorem 1 Given a set G of true gene trees, solution to the CTC problem (both exact and constrained version) is a statistically consistent estimator of the species tree topology under the MSC model.
Proof Let G be a set of sufficiently large number of true gene trees that has evolved within the model species tree T under the multi-species coalescent model. We know that a rooted 3-taxon species tree does not have any anomaly zone [32]. So, as we increase the number of gene trees, each triplet topology induced by the species tree will have more frequency in G than its alternatives, with probability that approaches 1. Let w G (t r ) be the number of trees in G that agree with t r . Hence, for every triplet r and every possible tree T , w G (T|r) ≥ w G (T |r) with probability that approaches 1 for a large enough number of gene trees. Therefore, if T R is the set of all possible triplets on X , then By definition, the exact solution to the CTC problem will find a tree that will maximize the triplet agreement with G. Therefore, for a sufficiently large number of true gene trees, the optimal tree under CTC will be identical to the true species tree T with probability approaching one.
We now show that the solution to the constrained version is also statistically consistent. If we increase the number of gene trees, with probability approaching 1, at least one of the gene tree topologies in G will be identical to the true species tree topology. Therefore, with probability that tends to 1, the set SBP will contain all the subtree bipartitions from the true species tree, and therefore the solution to the CTC problem will be identical to the true species tree T.

Algorithmic design of STELAR
We propose a dynamic programming (DP) solution to the CTC problem. The algorithmic design (and corresponding theoretical results) in STELAR is structurally similar to ASTRAL. This sort of DP-based approach, which implicitly finds a maximum or minimum clique in a graph modelled from the input gene trees, was first used by [39] and later was used in Phylonet [40,41], DynaDup [35,42] and ASTRAL. The key idea in STELAR is to find an equation for computing the number of triplets in the gene trees that agree with a given subtree in the species tree, which ultimately enables us to design a dynamic programming algorithm for estimating a species tree by maximizing the triplet agreement.
We call a triplet t r = a|bc to be mapped to a subtreebipartition at an internal node , and the topology of t r is compatible (consistent) with T x (see Fig. 1). Each triplet mapped to a subtree-bipartition (X 1 |X 2 ) will have either two leaves from X 1 and one leaf from X 2 or two leaves from X 2 and one leaf from X 1 . Therefore, we have the following results.
Lemma 1 Total number of triplets mapped to a subtreebipartition x = (X 1 |X 2 ) of a rooted binary tree T, where |X 1 | = n 1 and |X 2 | = n 2 is: (1) Fig. 1 Mapping of a triplet to a subtree-bipartition. Each internal node x in a rooted tree defines a subtree-bipartition (X 1 |X 2 ). Each induced triplet in T maps to a subtree-bipartition in T. This figure shows the mapping of a triplet a|bc to a subtree-bipartition x = X 1 |X 2

the number of triplets mapped to both x and y is given by:
Proof We can obtain two possible subtree bipartitions by taking pairwise intersection: is also a leaf in both x and y, and so each triplet mapped to z i is also mapped to x and y and the number of mapped triplets can be computed by Eq. 1.
We now prove that Eq. 2 counts a mapped triplet only once. Without loss of generality, let t r = ab|c be mapped to both x and y where a, b ∈ X 1 , c ∈ X 2 , a, b ∈ Y 1 and c ∈ Y 2 . Equation 2 will count t r for z 1 = (X 1 ∩ Y 1 |X 2 ∩ Y 2 ) only as z 2 will have no element from {a, b, c}. Thus the lemma follows.
We now see how to compute the triplet consistency (TC) score of a species tree ST with respect to a set G of gene trees, which we denote by TC G (ST). If SBP(T) is the set of subtree-bipartitions in T, for a subtree-bipartition x and a set of rooted binary trees G, the total number of triplets in G that are mapped to x is given by: Thus, the TC score of an estimated species tree ST is obtained by, Therefore, we can design a dynamic programming approach which starts with the whole species set X and recursively divides it into smaller subsets and picks the partition that maximizes the triplet consistency. At each level, we compute V (A) which gives the score of an optimal subtree on leaf set A ⊆ X . At the end level, we have computed V (X ) and sufficient information so that backtracking yields the corresponding subtreebipartitions in the optimal species tree.

Dynamic programming
Base case:

Running time analysis
Lemma 3 Given a set G of k gene trees on n taxa and a subtree-bipartition x, computing the triplet consistency score TC G (x) takes O(n 2 k) time.
Proof A gene tree with n taxa contains O(n) subtreebipartitions. Thus, for a set of k input gene trees, there will be O(nk) subtree-bipartitions. To find the triplet consistency score TC G (x) of a subtree-bipartition x using Eq. 3, we need to score M(x, y) for all subtree-bipartions y present in the input gene trees. Since we represent each subtree-bipartion by a bitset, calculating M(x, y) (using Proof For a specific cluster A, V (A) can be computed in O |SBP| * n 2 k time since at worst we need to look at every subtree-bipartition in SBP, and computing TC G (x) takes O n 2 k time. Note that this is a conservative analysis since the number of subtree-bipartitions that the DP algorithm needs to consider for computing V (A) for a specific cluster A is, in reality, much less than |SBP| (especially for A where |A| < |X |). There will be O(|SBP|) possible clusters for which we need to compute V (A). Thus, the running time of STELAR is O n 2 k|SBP| 2 .

Software implementation
STELAR is a java based cross-platform application. STE-LAR code uses code from the PhyloNet package by Luay Nakhleh (bioinfo.cs.rice.edu/phylonet). The Phylonet code base was previously used by DynaDup [35,42] and ASTRAL (with permission from the authors), and STELAR is mostly based on the DynaDup code base.

Experimental studies
We compared STELAR with SuperTriplets, which is a very fast and highly accurate triplet based supertree technique and has been used in several important phylogenomic studies [22][23][24][25][26][27]. We also evaluated STELAR in comparison with ASTRAL-III [43] and MP-EST, two of the most widely used and accurate coalescent-based methods. ASTRAL has already been compared and shown to outperform many of the existing species tree estimation methods (e.g., BUCKy-pop, MRP, greedy consensus, CA-ML, etc.), especially in the presence of moderate to high levels of ILS. So we did not include those methods in this study. Most of the datasets analyzed in this study are prohibitively large for *BEAST to analyze. We ran MP-EST with 10 random starting points and selected the species tree with the highest pseudo-likelihood value.
We used previously studied simulated and biological datasets to evaluate the performance of these methods. We compared the estimated species trees to the model species tree (for the simulated datasets) or to the scientific literature (for the biological datasets), to evaluate accuracy. We used normalized Robinson-Foulds distance (RF rate) [44] to measure the tree error. The RF distance between two trees is the sum of the bipartitions (splits) induced by one tree but not by the other, and vice versa. All the trees estimated by ASTRAL-III, STELAR and MP-EST in this study are binary and so RF rate, False Positive (FP) rate and False Negative (FN) rate are identical. However, the trees estimated by SuperTriplets may produce multifurcations (unresolved trees). We performed Wilcoxon signed-rank test (with α = 0.05) to measure the statistical significance of the differences between two methods.

Datasets
We studied four collections of simulated datasets: one based on a biological dataset (37-taxon mammalian dataset) that was generated and subsequently analyzed in prior studies [10,29,45], and three other simulated datasets from [45], [46], and [47]. Table 1 presents a summary of the datasets analyzed in this paper, showing various model conditions with varying numbers of taxa (11 to 37), ILS levels (reflected in the average topological distance between true gene trees and true species tree) and gene sequence lengths. All datasets consist of gene sequence alignments generated under a multi-stage simulation process that begins with a species tree, simulates gene trees down the species tree under the multi-species coalescent model (and so can differ topologically from the species tree), and then simulates gene sequence alignments down the gene trees.
Mammalian dataset was simulated by taking the species tree estimated by MP-EST on the biological dataset studied in [48]. This species tree had branch lengths in coalescent units, which we used to produce a set of gene trees under the coalescent model. Thus, the mammalian simulation model tree has an ILS level based on a coalescent analysis of the biological mammalian dataset, and other properties of the simulation that are set to reflect the biological sequences [48] studied. We explored the impact of varying amounts of ILS, varying numbers of genes (25 ∼800), the impact of phylogenetic signal by varying the sequence length (250, 500, and 1000 base pair (bp)) for the markers. Three levels of ILS were simulated. The basic model species tree has branch lengths in coalescent units, and we produced other model species trees by multiplying or dividing all internal branch lengths in the model species tree by two. This rescaling varies the amount of ILS (shorter branches have more ILS), and also impacts the amount of gene tree estimation error in the estimated gene trees. The basic model condition with moderate amount of ILS is referred to as 1X and the model conditions with higher and lower amounts of ILS are denoted by 0.5X and 2X, respectively.
We used three other simulated datasets: 11-taxon dataset (generated by [46] and subsequently studied by [9,49]), 15-taxon dataset (generated and studied by [45]), and 500-taxon dataset (generated and studied by ASTRAL-II [47]). 11-taxon dataset was simulated under a complex process to ensure substantial heterogeneity between genes and to deviate from the molecular clock [46]. We analyzed the model condition with high amount of ILS as the model condition with lower amount of ILS is very easy to analyze and even methods without any statistical guarantee produced highly accurate tree with just 25 ∼ 50 gene trees [9]. 15-taxon datasets vary in sequence lengths and has high amount of ILS. The 500-taxon dataset was generated under the MSC model by introducing rate heterogeneity between genes and deviations from molecular clock (see [47] for more details).
We used two biological datasets: the 37-taxon mammalian datasets studied by [48] with 424 genes, and the Amniota dataset from [50] containing 16 species and 248 genes.

Results on mammalian simulated dataset
We analyzed the performance of SuperTriplets, ASTRAL-III, MP-EST and STELAR on various model conditions with varying amounts of ILS, numbers of genes and lengths of the sequences. Figure 2a shows the average RF rates of three methods on three model conditions with varying amounts of ILS. For all methods, RF rates increase as we increase the amount of ILS. SuperTriplets produced significantly less accurate trees than the other methods, and the trees are not fully resolved. For the dataset with varying amounts of ILS, SuperTriplets produced trees with RF rates 10% ∼18%, whereas the error rates of STE-LAR, MP-EST and ASTRAL-III range from 4% ∼ 6%. STELAR and ASTRAL-III are better than MP-EST on all amounts of ILS and, in some cases, the differences are statistically significant (P < 0.05). STELAR and ASTRAL are comparable in terms of tree accuracy and no statistically significant difference was noticed. Figure 2c shows the impact of the numbers of gene trees on the accuracy of the estimated species trees. STE-LAR, MP-EST and ASTRAL-III improved with increasing numbers of genes (this is expected as these methods are statistically consistent and thus increasing the numbers of genes has resulted into improved topological accuracy). However, SuperTriplets failed to leverage the increasing amount of data since it did not improve as we increased the numbers of genes. Moreover, even with 800 genes, SuperTriplets produced unresolved trees. STELAR and ASTRAL-III achieved similar accuracy and performed significantly better than MP-EST and SuperTriplets across all model conditions.

Results on 11-taxon dataset
We analyzed both the estimated and true gene trees. In both cases, we varied the number of genes from (5 ∼ 100). Figure 3 shows the results on 11-taxon dataset for both estimated and true gene trees. Similar to the mammalian dataset, SuperTriplets produced the least accurate trees. As expected, all methods improved with increased numbers of genes (except for a few cases for SuperTriplets), and had their best accuracies on true gene trees. The accuracies of ASTRAL-III, MP-EST and STELAR are almost similar with no statistically significant difference. On true gene trees, ASTRAL, MP-EST and STELAR recovered the true species trees across all the replicates with 50 or higher numbers of genes. However, the accuracy of SuperTriplets did not achieve any notable improvement as we increased the number of genes beyond 25. Unlike mammalian dataset, MP-EST was slightly better than ASTRAL-III and STE-LAR on a couple of model conditions (albeit the differences are not statistically significant with P-values greater than 0.05).

Results on 15-taxon dataset
15-taxon dataset evolved under a caterpillar model species tree with very high level of ILS [45]. We explored the performance on varying sequence lengths (100 bp and 1000 bp), and numbers of genes (100 and 1000). Unlike other two dataset, SuperTriplets returned well resolved trees without multi-furcations and matched the accuracy of other methods on this particular dataset. All the methods improved as the the number of genes and sequence length increased, and had their best accuracy (RF rate = 0) on true gene trees (Fig. 4). No statistically significant difference was observed between STELAR and ASTRAL-III. MP-EST consistently had the highest error rates across all the model conditions on estimated gene trees. On true gene trees, all the methods returned the true species tree on almost all the replicates.

Results on 500-taxon dataset
In order to assess the performance of STELAR on relatively larger datasets, we analyzed the 500-taxon dataset [47] with 1000 genes. We analyzed 10 replicates of data. Some of the replicates had a few non-binary gene trees which we discarded in order to run STELAR. After removing the non-binary gene trees, the numbers of genes in the replicates range from 993 ∼ 1000. MP-EST is computationally too demanding to run on this large dataset. Figure 5 shows the error rates of STELAR, ASTRAL-III and SuperTriplets. Both STELAR and ASTRAL produced competitive results with no statistically significant difference. However, similar to the other dataset analyzed in Fig. 3 Average RF rates of ASTRAL-III, MP-EST, STELAR and SuperTriplets on the 11-taxon dataset with high amount of ILS. We varied the numbers of genes from 5 to 100. We analyzed both the estimated and true gene trees. We show the average RF rates with standard error bars over 20 replicates Fig. 4 Comparison of ASTRAL-III, MP-EST, STELAR and SuperTriplets on the 15-taxon dataset. We varied the numbers of estimated gene trees (100 and 1000) and sequence lengths (100 bp and 1000 bp). We also analyzed the true gene trees. We show the average RF rates with standard errors over 10 replicates this study, SuperTriplets was much worse than ASTRAL-III and STELAR.

Results on biological datasets
Amniota dataset: We analyzed the Amniota dataset from [50] containing 248 genes across 16 amniota taxa in order to resolve the position of turtles within the amniotes, especially relative to birds and crocodiles. We re-analyzed Fig. 5 Comparison of ASTRAL-III, MP-EST, STELAR and SuperTriplets on 500-taxon dataset. We show the average RF rates with standard errors over 10 replicates both the amino acid (AA) and DNA gene trees, obtained from the authors, using ASTRAL, MP-EST, STELAR and SuperTriplets. The sister relationship of crocodiles and birds (forming Archosaurs), and placement of the turtles as the sister clade to Archosaurs are considered reliable and were supported by previous studies [51][52][53].
For amino acid dataset, all the methods placed birds and crocodiles as sister groups (forming Archosaurs) and turtles as sister to Archosaurs clade (see Fig. 6). The unrooted version of STELAR estimated tree is identical to ASTRAL. MP-EST also returned a highly similar tree except for different resolutions within the turtles. SuperTriplets produced an unresolved tree with several multifurcations, leaving bird/turtle/crocodile relationship uncertain. It also did not resolve the Squamates clade.
STELAR and MP-EST on the DNA gene trees produced the same tree which resolved bird/turtle/crocodile relationship as (birds,(turtles,crocodiles)) and thus did not form Archosaurs. However, ASTRAL on the DNA gene trees resolved the relationships as (turtles,(birds,crocodiles)). Therefore, maximizing triplet agreement and quartet agreement both recovered the (birds,(turtles,crocodiles)) relationships on AA gene trees, but differed on the DNA gene trees. Similar to the AA dataset, SuperTriplets did not resolve the bird/turtle/crocodile relationship. We estimated the quartet scores and the triplet scores of these different tree topologies with respect to the corresponding gene trees (see Tables 2 and 3). As ASTRAL estimated trees need to be interpreted as unrooted trees, we root the ASTRAL trees using the correct outgroup (Protopterus annectens) in order to compute the triplet scores. As expected, ASTRAL achieved the highest quartet agreement scores and STELAR achieved the highest triplet agreement scores.
Mammalian dataset: We re-analyzed the mammalian dataset from [48] containing 447 genes across 37 mammals after removing 21 mislabeled genes (confirmed by the authors), and two other outlier genes. The placement of tree shrews (Tupaia belangeri) is of great importance here as previous studies found that the phylogenetic placement of tree shrews is unstable as they can be classified as a sister group to Primates or to Glires [48,[54][55][56]. Our analyses using ASTRAL, MP-EST and STELAR recovered the same topology which placed tree shrews as sister to Glires (see Fig. 7). This is consistent to the tree estimated by CA-ML (reported in [48]). However, previous studies with MP-EST, using multi-locus bootstrapping [57], on the same dataset recovered a tree which placed tree shrews as the sister to Primates [10,56]. SuperTriplets produced a highly similar tree except that it resulted into an unresolved tree with several multi-furcations. It recovered the major clades such as Primates, Glires, Cetartiodactyla, Eulipotyphla, Chiroptera. However, SuperTriplets left the relationships within Primates/Glires/Scandentia unresolved. This is in line with the tree estimated by SuperTriplets on the OrthoMaM database [58], which was reported in [22]. We investigated the quartet and triplet scores of the two alternative hypotheses (regarding the placement of tree shrew) and found that the tree that suggests (tree shrew, Glires) clade has higher quartet and triplet scores (25526915 and 3042547, respectively) compared to the one that suggests (tree shrew, Primates) relationship which yields 25511503 and 3041747 quartet and triplet scores respectively. Therefore, our analyses strongly support the placement of tree shrews as sister to Glires.

Running time
We performed the experiments on small-to moderatescale dataset (11 ∼ 37 taxa) on a Linux machine with 7.5 GiB RAM and an i5 1.6 GHz processor. For the 500-taxon dataset, we used a Linux machine with 15.6 GiB RAM and an i7 3.60 GHz processor. Our experiments show that SuperTriplets is usually significantly faster than the other methods, and MP-EST is computationally more demanding than the others. For 11-, 15-, and 37-taxon datasets, SuperTriplets took a fraction of a second to run on a single replicate. STELAR and ASTRAL-III are substantially faster than MP-EST. The running time of MP-EST is more sensitive to the number of species than the number of genes, and grows rapidly as we increase the number of taxa [29]. On  There is no significant difference in running time between ASTRAL-III and STELAR on relatively smaller datasets. However, on the larger dataset with 500 taxa, ASTRAL-III is much faster than STELAR. ASTRAL-III took around 18 min whereas STELAR took about 2 h. Interestingly, SuperTriplets -which is much faster than ASTRAL-III and STELAR on relatively smaller datasets -is not the fastest on 500-taxon dataset, taking around 26 min. ASTRAL has evolved over the years (ASTRAL [10], ASTRAL-II [47], ASTRAL-III [43], etc.) with efficient heuristics to explore the tree space and highly optimized implementation. Although STELAR is not as fast as ASTRAL at its current form, it is sufficiently fast to analyze large dataset with hundreds of taxa. We will continue to improve STELAR to make it fast enough to run on ultra large dataset with thousands of taxa.

Discussion
Extensive experimental studies on simulated dataset with varying model conditions suggest that STELAR, which is an efficient DP-based algorithm for solving the triplet consistency problem, can produce substantially better trees than SuperTriplets (which is a triplet-based supertree method and tries to find a median supertree with respect to a triplet dissimilarity measure), and MP-EST (which tries to maximize a pseudo-likelihood measure based on the triplet distribution of the gene trees). Thus, within the scope of the experiments conducted in this study, maximum triplet agreement has been shown to be a better optimization criterion than the triplet frequency based maximum pseudo-likelihood used in MP-EST, and the triplet dissimilarity based maximum parsimony criterion used in SuperTriplets, and as good as the maximum quartet agreement criterion used in ASTRAL. Experiments on Amniota and mammalian biological dataset suggest that STELAR can produce reliable trees on real biological dataset.
Most of the existing methods have assumed that the gene tree discordance is due to only a particular biological reason, for example, ILS. Therefore, both orthologs and paralogs are not being included in many studies such as 1KP project [59], where many genes were discarded that evolved with duplications and losses. Furthermore, existing methods try to optimize a single optimization criterion (e.g., quartet consistency [10], and pseudo-likelihood [11]). Although these criteria are statistically consistent, the performance of various methods may vary on practical model conditions with limited numbers of genes and in the presence of gene tree estimation error. These factors underscore the need for modeling species tree estimation as a multi-objective optimization problem, where multiple explanations for observed data can be conjectured, and multiple optimization criteria are simultaneously optimized. Therefore, developing new scalable and statistically consistent estimators of species trees is of great importance. We believe this study advances the state-of-the-art in summary methods, and STELAR would be considered as a useful tool in phylogenomics.
STELAR can consistently estimate the species tree topology and is fast enough to handle large numbers of genes and species, making it suitable for large scale phylogenomic analyses. However, STELAR can be expanded in several directions. Future work will need to investigate how the estimation of branch lengths (in coalescent unit) can be incorporated in STELAR. STELAR, at this stage, cannot handle unrooted and non-binary gene trees. The CTC problem for unrooted gene trees can be formulated as follows. The input is a set G = {gt 1 , gt 2 , gt 3 , . . . , gt k } of k unrooted gene trees, and the output is a species tree ST that maximizes the triplet consistency score with respect to a set G = {gt 1 , gt 2 , gt 3 , . . . , gt k } where gt i is a rooted version of gt i . That means, the idea is to find the optimal rooted versions of the unrooted gene trees so that the triplet agreement is maximized. Similar approach was previously applied [41] for estimating species trees from a set of unrooted gene trees by minimizing "extra lineages" resulting from deep coalescence. Similarly, the problem can be extended for nonbinary gene trees as well. Thus, future works on formalizing the CTC problem for unrooted and non-binary gene trees and developing appropriate algorithms will be of great importance. The current version of STE-LAR is reasonably fast and can handle hundreds of taxa. However, more efficient algorithms and implementation are required to make STELAR scalable to much larger dataset with thousands of taxa. Furthermore, while we have studied various practical and challenging model conditions with varying numbers of genes, and varying amounts of ILS, and gene tree estimation error, future work should investigate the relative performance of STE-LAR and the widely used existing methods on various other challenging model conditions, for example, model conditions with missing taxa (i.e., incomplete gene trees). This study investigated relatively long sequences (250 ∼2000 bp); subsequent studies should investigate the relative performance of STELAR and other methods on very short sequences, since recombination-free loci can be very short [60].