Proof of Lemma ?? (Link between rooted and unrooted trees)
Let T_{1} and T_{2} be two rooted trees and \(T^{\prime }_{1}\) and \(T^{\prime }_{2}\) be the corresponding unrooted trees, i.e. \(V(T^{\prime }_{1}) = V(T_{1}) \cup \{R\}\), \(V(T^{\prime }_{2}) = V(T_{2}) \cup \{R\}\), \(E(T^{\prime }_{1}) = E(T_{1}) \cup \{(r(T_{1}),R)\}\) and \(E(T^{\prime }_{2}) = E(T_{2}) \cup \{(r(T_{2}),R)\}\).
We first show that any bad bipartition of \(T^{\prime }_{1}\), i.e. any bad edge of \(T^{\prime }_{1}\), corresponds to a bad clade of T_{1} (a clade which is not present in T_{2}). Let \(e^{\prime }_{1}\) be a bad edge of \(T^{\prime }_{1}\). Then \(e^{\prime }_{1}\) must be a nonterminal edge of \(T^{\prime }_{1}\), thus different from (r(T_{1}),R), and therefore it has a corresponding edge e_{1}=(x_{1},y_{1}) in T_{1}. Then, for one of the two nodes adjacent to \(e^{\prime }_{1}\), which we denote \(y^{\prime }_{1}\), we have \(L(T'_{1y'_{1}}) = L(T_{1y_{1}})=C\). Since \(e^{\prime }_{1}\) is a bad edge of \(T^{\prime }_{1}\), C must be a bad clade of T_{1}, i.e. not present in T_{2}. Indeed, otherwise C would be a nontrivial clade of T_{2} rooted at an internal node y_{2} adjacent to an edge e_{2}=(x_{2},y_{2}), and thus also equal to \(L(T'_{2y'_{2}})\) for some edge \(e^{\prime }_{2}=(x^{\prime }_{2},y^{\prime }_{2})\) of \(T^{\prime }_{2}\). This contradicts the fact that \(e^{\prime }_{1}\) is a bad edge. Therefore, each bad bipartition of \(T^{\prime }_{1}\) corresponds to a bad clade of T_{1}. Moreover, two distinct bad bipartitions of \(T^{\prime }_{1}\) correspond to two different bad edges of \(T^{\prime }_{1}\), with the corresponding edges of T_{1} associated to two distinct clades. Thus we have \(\mathcal {B}(T'_{1}) \leq \mathcal {C}(T_{1})\).
Conversely, a bad clade C of T_{1} corresponds to an internal node y_{1} of T_{1}. Let e_{1}=(x_{1},y_{1}) be the edge of T_{1} where x_{1} is the parent of y_{1}. Then the corresponding edge \(e^{\prime }_{1}\) in \(T^{\prime }_{1}\) is a bad edge. Moreover, two distinct clades of T_{1} correspond to two distinct edges of \(T^{\prime }_{1}\). It follows that \(\mathcal {C}(T_{1}) \leq \mathcal {B}(T'_{1})\). Combining this result with the result above, we deduce that \(\mathcal {C}(T_{1}) = \mathcal {B}(T'_{1})\). As T_{2} and \(T^{\prime }_{2}\) can be considered similarly, the result follows.
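The correspondence between bad clades and bad bipartitions can be checked on a small example. The following is only a minimal sketch under our own assumptions: rooted trees are encoded as nested tuples with string leaves, every internal node has at least two children, and the names `clades`, `unroot`, `bipartitions` and `ROOT_LEAF` are hypothetical helpers, not notation from the paper. The bipartitions are computed directly from the unrooted adjacency structure, without reusing the clades.

```python
from itertools import count

ROOT_LEAF = "R"  # hypothetical name for the extra leaf R attached to r(T)

def clades(tree):
    """Nontrivial clades of a rooted tree: leaf sets below internal non-root nodes."""
    found = set()
    def visit(node, is_root):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child, False) for child in node))
        if not is_root:
            found.add(below)
        return below
    visit(tree, True)
    return found

def unroot(tree):
    """Adjacency map of T' = T plus the leaf R joined to the root of T."""
    adj, ids = {}, count()
    def visit(node):
        name = node if isinstance(node, str) else ("internal", next(ids))
        adj[name] = set()
        if not isinstance(node, str):
            for child in node:
                child_name = visit(child)
                adj[name].add(child_name)
                adj[child_name].add(name)
        return name
    root = visit(tree)
    adj[ROOT_LEAF] = {root}
    adj[root].add(ROOT_LEAF)
    return adj

def bipartitions(adj):
    """Leaf bipartitions induced by the nonterminal edges of an unrooted tree."""
    all_leaves = frozenset(v for v in adj if len(adj[v]) == 1)
    def side(start, blocked):
        # Leaves reachable from `start` without passing through `blocked`.
        seen, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w != blocked and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return frozenset(seen) & all_leaves
    result = set()
    for x in adj:
        for y in adj[x]:
            if len(adj[x]) > 1 and len(adj[y]) > 1:  # nonterminal edge {x, y}
                part = side(y, x)
                result.add(frozenset([part, all_leaves - part]))
    return result

T1 = ((("a", "b"), "c"), ("d", "e"))
T2 = ((("a", "c"), "b"), ("d", "e"))
bad_clades = clades(T1) - clades(T2)
bad_bipartitions = bipartitions(unroot(T1)) - bipartitions(unroot(T2))
```

In this example the only bad clade of T1 is {a, b}, and the only bad bipartition of the unrooted version is the one separating {a, b} from the remaining leaves together with R, so both counts agree.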
Proof of Lemma ?? (Edit distance):
The nonnegativity and identity conditions are obvious. For the symmetry condition, notice that we can reverse every edit operation in an optimal sequence from T_{1} to T_{2} to obtain a sequence from T_{2} to T_{1} with the same number of events, and vice versa (extensions and contractions are inverses of each other, and any flip can be reversed by a flip). We thus have δ(T_{2},T_{1})≤δ(T_{1},T_{2}) and δ(T_{1},T_{2})≤δ(T_{2},T_{1}), and equality follows.
Finally, we prove the triangle inequality: for three trees T_{1}, T_{2} and T_{3}, to transform T_{1} into T_{2}, we may take any edit sequence from T_{1} to T_{3}, followed by any edit sequence from T_{3} to T_{2}. It follows that δ(T_{1},T_{2})≤δ(T_{1},T_{3})+δ(T_{3},T_{2}).
Proof of Lemma ?? (Pairs of maximal bad subtrees):
As \(\bigcup _{i} Y_{i} = \mathcal {L}\), \(\{e'_{i}\}_{1 \leq i \leq k}\) are the only terminal edges of any subtree S^{′} of T^{′} containing the set \(\{e'_{i}\}_{1 \leq i \leq k}\) as terminal edges. As T^{′} is a tree, for any 1≤i≠j≤k, there is only one possible path from \(x^{\prime }_{i}\) to \(x^{\prime }_{j}\). Uniqueness follows.
Suppose that such a subtree S^{′} is not a bad subtree. Then it contains an internal good edge e^{′}=(x^{′},y^{′}). In other words, there is a nontrivial bipartition of {Y_{i}}_{1≤i≤k} which is also a bipartition in S. This contradicts the fact that S is a bad subtree of T. Finally, as all terminal edges of S^{′} are good edges of T^{′}, it follows that S^{′} is a maximal bad subtree of T^{′}.
Proof of Lemma ?? (Contract nonmixed bad edges):
We first introduce a definition that will be of use later in the proof. For two rooted trees S_{1} and S_{2}, define the union of S_{1} and S_{2} as the tree obtained by identifying their roots, i.e. by removing the root of S_{2} and making all its children now children of the root of S_{1}.
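As a small illustration of this definition, here is a minimal sketch in which a rooted tree is a dictionary with a `label` and a list of `children`. This encoding and the helper names `union` and `leaf` are our own assumptions, used only for illustration.

```python
def union(s1, s2):
    """Union of two rooted trees: the root of s2 is removed and its
    children become additional children of the root of s1."""
    return {"label": s1["label"], "children": s1["children"] + s2["children"]}

def leaf(name):
    """Hypothetical helper: a leaf is a node without children."""
    return {"label": name, "children": []}

s1 = {"label": "Spe", "children": [leaf("a"), leaf("b")]}
s2 = {"label": "Spe", "children": [leaf("c")]}
merged = union(s1, s2)
```

Here `merged` has a single root labeled Spe whose children are a, b and c; the inputs `s1` and `s2` are left unchanged.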
Let e={u,v} be a nonmixed bad edge and assume, without loss of generality, that both u and v have the label Spe (recall that Λ={Spe,Dup}). Notice that any sequence of operations turning T into T^{′}, at some point, must contract the {u,v} edge, as otherwise, the (bad) bipartition corresponding to {u,v} would remain in the transformed tree and we would not obtain T^{′} (noting that extensions cannot remove bipartitions). We now prove the Lemma by induction over δ(T,T^{′}). As a base case, suppose that δ(T,T^{′})=1. Then {u,v} must be the only bad edge of T and the single operation is to contract it, proving the base case.
Now assume that for any tree \(\tilde {T}\) satisfying \(\delta (\tilde {T}, T') < \delta (T, T')\), contracting any nonmixed bad edge of \(\tilde {T}\) reduces its distance to T^{′} by 1. Let Q=(q_{1},…,q_{l}) be an optimal sequence of operations transforming T into T^{′} (here each q_{i} denotes either a contraction, extension or flip). Let q_{j} be the event that contracts {u,v}. If q_{1}=q_{j}, then we are done, so assume otherwise. We make the assumption that whenever there is a contraction involving u prior to q_{j}, the contracted node is still called u. Furthermore, we assume that if an extension prior to q_{j} splits the neighbors of u, the node v is still a neighbor of u after the operation. All the same assumptions hold for v. This just changes the names we give to nodes and does not alter the scenario, but observe that this means that {u,v} is in every tree obtained before the j-th operation.
For each i∈{1,…,l}, let T_{i} be the tree obtained after applying q_{1},…,q_{i} on T, and define T_{0}=T. Furthermore, for i∈{0,1,…,j−1}, denote by \(T^{u}_{i}\) and \(T^{v}_{i}\) the two trees obtained from T_{i} by removing the edge {u,v}, where u is in \(T^{u}_{i}\) and v is in \(T^{v}_{i}\). Define \(T^u = T^{u}_{0}\) and \(T^v = T^{v}_{0}\). We will assign u and v as the respective roots of each \(T^{u}_{i}\) and \(T^{v}_{i}\). Notice that for each i∈{1,…,j−1}, q_{i} only modifies either the subtree \(T^{u}_{i-1}\) or \(T^{v}_{i-1}\). Therefore, if events q_{i} and q_{i+1} modify \(T^{u}_{i-1}\) and \(T^{v}_{i}\), respectively, we could apply q_{i+1} before q_{i} and T_{i+1} would still be the same tree. This lets us reorder events so that all events affecting T^{u} (prior to q_{j}) occur before those affecting T^{v}. That is, there is some h such that q_{1},…,q_{h} only affect the T^{u} subtree and q_{h+1},…,q_{j−1} only affect the T^{v} subtree, so that \(T^{u}_h = T^{u}_{h+1} = \ldots = T^{u}_{j-1}\) and \(T^v = T^{v}_1 = \ldots = T^{v}_{h}\).
Suppose first that u is labeled Spe in T_{h}, and thus also in T_{j−1}. Then v is also labeled Spe in T_{j−1} (and also in T_{h} since v was untouched until q_{h+1}). Let \(\hat {T}\) be the tree obtained after contracting {u,v} in T, and let z be the resulting node. Observe that if we interpret z as u, then we may apply the events q_{1},…,q_{h} on \(\hat {T}\), since these events only affected the T^{u} subtrees. To be formal, we “reproduce” q_{1} through q_{h} on \(\hat {T}\) by applying the events \(Q^{\prime } = (q^{\prime }_{1}, \ldots, q^{\prime }_{h})\) on \(\hat {T}\), defining \(\hat {T}_{i}\) as the tree obtained after the i-th event of Q^{′}, where each \(q^{\prime }_{i}\) in Q^{′} is defined as follows:

if q_{i} contracts {x,y} in T_{i−1}, then \(q^{\prime }_{i}\) contracts {x,y} in \(\hat {T}_{i-1}\) if x,y≠u, otherwise if, say, x=u, then \(q^{\prime }_{i}\) contracts {z,y} (and calls the resulting node z);

if q_{i} flips x in T_{i−1}, then \(q^{\prime }_{i}\) flips x in \(\hat {T}_{i-1}\) if x≠u, or flips z otherwise;

if q_{i} is an extension and splits the neighborhood of x, then \(q^{\prime }_{i}\) does the same if x≠u (replacing u by z if needed). If x=u, then let X be the set of neighbors of v in T_{i−1}, excluding u. If Ch(u) is split into A and B by q_{i}, where v∈B, then \(q^{\prime }_{i}\) splits the neighbors A∪B∪X of z into A and B∪X (and z is the neighbor of B∪X and the newly created node).
One can verify that the following invariant holds for each \(\hat {T}_{i}, i \in \{1, \ldots, h\}\): if we take T_{i} and contract the edge {u,v}, ignoring the labels and keeping the label of u, then we obtain \(\hat {T}_{i}\) (the invariant is also true for T and \(\hat {T}\)).
The resulting tree \(\hat {T}_{h}\) obtained from applying \(q^{\prime }_{1}, \ldots, q^{\prime }_{h}\) on \(\hat {T}\) will therefore contain z as a Spe node, and will be the union of \(T^{u}_{h}\) and \(T^{v}_{0}\). From this point, in a similar fashion, we may interpret z as v and apply q_{h+1},…,q_{j−1} on \(\hat {T}_{h}\), resulting in a tree that is the union of \(T^{u}_h = T^{u}_{j-1}\) and \(T^{v}_{j-1}\). The corresponding events are the same as above; we omit the formal details. Since T_{j} is obtained from T_{j−1} by contracting {u,v}, this means that \(\hat {T}_{j-1} = T_{j}\), which we have attained with j events but contracting {u,v} first, which proves this case.
Suppose instead that u is labeled Dup in T_{h}. Then v is a Dup node in T_{j−1}. We may further assume that v is a Spe node in T_{h+1},…,T_{j−2}, since whenever we flip v into a Dup, we may assume by induction that {u,v} gets contracted. Therefore, q_{j−1} flips v from Spe to Dup, and for the first time. We may then do the following: first apply the events q_{h+1},…,q_{j−2} on \(\hat {T}\), interpreting z as v. The resulting tree \(\hat {T}'\) contains z as a Spe node, and is the union of \(T^{v}_{j-2}\) and \(T^{u}_{0}\). We may now apply q_{1},…,q_{h} on \(\hat {T}'\) by interpreting z as u, resulting in a tree \(\hat {T}^{\prime \prime }\) that contains z as a Dup node and is the union of \(T^{u}_{h} = T^{u}_{j-1}\) and \(T^{v}_{j-1}\). We have thus attained T_{j}, but this time without the q_{j−1} flip on v, contradicting the optimality of Q. This concludes the proof.
Proof of Lemma ?? (Upper bound δ):
Methodology 1 performs e contractions and e^{′} extensions. As for the number of flips, we have to flip at most all the nodes belonging to the smallest label group, which means at most half the nodes in each tree, and thus at most n flips in total.
Proof of Lemma ?? (Compare Meth.1 and Meth.2):
We denote by Cont(T) the minimum length of a sequence of operations contracting T, and by \(l(\mathcal {P})\) the length of a sequence \(\mathcal {P}\) of edit operations (Fig. 7).
Let \(\mathcal {P}_{2}\) be an optimal sequence contracting S to S_{∗} and \(\mathcal {P}^{\prime }_{2}\) be an optimal sequence contracting S^{′} to \(S^{\prime }_{*}\). As each operation is reversible, \(\mathcal {P}^{\prime }_{2}\) leads to a corresponding sequence \(\mathcal {P}^{\prime \prime }_{2}\) of the same length between \(S^{\prime }_{*}\) and S^{′}. Thus, \(\mathcal {P}_{2}\), concatenated with a possible flip operation transforming S_{∗} to \(S^{\prime }_{*}\), concatenated with \(\mathcal {P}^{\prime \prime }_{2}\) is a sequence from S to S^{′} following Methodology 1, and thus M_{1}(S,S^{′})≤M_{2}(S,S^{′}) (R1).
Conversely, let \(\mathcal {P}\) be an optimal sequence following Methodology 1. Then this sequence can be subdivided into a sequence \(\mathcal {P}_{1}\) from S to a star tree S_{1}, and \(\mathcal {P}^{\prime }_{1}\) from S_{1} to S^{′}. As each operation is reversible, \(\mathcal {P}^{\prime }_{1}\) leads to a corresponding sequence \(\mathcal {P}^{\prime \prime }_{1}\) of the same length between S^{′} and S_{1}. In other words, \(M_{1}(S,S^{\prime }) = l(\mathcal {P}_{1}) + l(\mathcal {P}^{\prime }_{1}) = l(\mathcal {P}_{1}) + l(\mathcal {P}^{\prime \prime }_{1}) \geq Cont(S) + Cont(S^{\prime })\).

1. If \(S_{*} = S^{\prime }_{*}\), then M_{2}(S,S^{′})=Cont(S)+Cont(S^{′}) and thus M_{1}(S,S^{′})≥M_{2}(S,S^{′}), and the result follows from (R1).

2. Otherwise, S_{∗} and \(S^{\prime }_{*}\) are different and M_{2}(S,S^{′})=Cont(S)+Cont(S^{′})+1. Thus M_{1}(S,S^{′})≥Cont(S)+Cont(S^{′})=M_{2}(S,S^{′})−1, and thus M_{2}(S,S^{′})≤M_{1}(S,S^{′})+1.
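The two cases and (R1) can be summarized in a single chain of inequalities. The display below is only our restatement of the bounds derived above, with no new assumption beyond the notation of the proof.

```latex
\[
  \mathrm{Cont}(S) + \mathrm{Cont}(S')
  \;\le\; M_{1}(S,S')
  \;\le\; M_{2}(S,S')
  \;\le\; \mathrm{Cont}(S) + \mathrm{Cont}(S') + 1 ,
\]
\[
  \text{hence}\quad 0 \;\le\; M_{2}(S,S') - M_{1}(S,S') \;\le\; 1 .
\]
```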
Proof of Lemma ?? (Optimal path contracting a mixed tree):
We first show that at least ⌈diam(T)/2⌉−1 flips are needed, by induction over the diameter of T. When diam(T)=2, T is a star tree and 0=diam(T)/2−1 flips are needed. For the induction step, we assume that any tree T^{′} with diam(T^{′})<diam(T) requires at least ⌈diam(T^{′})/2⌉−1 flips. Take any optimal sequence of events S, and observe that in S, when we flip a node v of T, by Lemma ?? we may assume that S contracts all the incident edges to v until we obtain another mixed tree. Let T_{1},T_{2},…,T_{k} be the sequence of mixed trees encountered when applying S, i.e. each T_{i} is obtained after flipping a node and contracting its incident edges. Define T_{0}=T. Let i be the smallest index such that diam(T_{i})<diam(T). Then in T_{i−1}, there was a longest chain P=(u_{1},…,u_{l}) of length diam(T). The flip-and-contract operations from T_{i−1} to T_{i} can reduce the length of P by at most 2 since we flip one node and only its incident edges, of which there are at most two on P. Hence diam(T_{i})≥diam(T)−2. We deduce by induction that the number of required flips is at least 1+⌈(diam(T)−2)/2⌉−1=⌈diam(T)/2⌉−1.
We now turn to the converse bound ϕ(T)≤⌈diam(T)/2⌉−1. Fix any node v of T, and suppose that we run the following procedure: as long as T is not a star tree, flip v and contract its incident internal edges. Since each flip-and-contract iteration reduces the distance from v to any leaf by 1 (except for the leaves adjacent to v), ecc_{T}(v) is reduced by 1 each round. We stop when ecc_{T}(v)=1, in which case only terminal edges remain, and in the end, this means that ecc_{T}(v)−1 flips are needed.
To see why this proves our bound, we show that there always exists a node with eccentricity ⌈diam(T)/2⌉. Consider a longest chain P of T with nodes w_{1},…,w_{k}. Observe that diam(T)=k−1 (recall that distances are counted in terms of edges). Consider a midpoint node w:=w_{⌈k/2⌉} on P. We claim that ecc_{T}(w)=⌈diam(T)/2⌉. It is easy to check that w has distance at most ⌈diam(T)/2⌉ and at least ⌊diam(T)/2⌋ to the leaves w_{1} and w_{k} on P. Assume for contradiction that w is at distance at least ⌈diam(T)/2⌉+1 from some leaf l of T not in P. Then either we can form a chain from w_{1} to w and then to l, or a chain from w_{k} to w and then to l. This chain has length at least ⌊diam(T)/2⌋+⌈diam(T)/2⌉+1>diam(T), a contradiction. This shows that ecc_{T}(w)=⌈diam(T)/2⌉ and concludes the proof.
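The midpoint argument and the flip-and-contract procedure can be checked on a small tree. The sketch below is a minimal illustration under our own assumptions: the tree is given as an adjacency dictionary, labels are ignored (each round of `contract_rounds` stands for one flip of v followed by the contraction of its incident internal edges, as in a mixed tree), and all function names are hypothetical.

```python
from collections import deque
from math import ceil

def distances(adj, source):
    """Breadth-first distances (in edges) from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentricity(adj, v):
    return max(distances(adj, v).values())

def longest_chain(adj):
    """Nodes of a longest path, found with two breadth-first passes."""
    d0 = distances(adj, next(iter(adj)))
    end_a = max(d0, key=d0.get)
    da = distances(adj, end_a)
    end_b = max(da, key=da.get)
    path = [end_b]
    while path[-1] != end_a:  # walk back towards end_a
        v = path[-1]
        path.append(next(w for w in adj[v] if da[w] == da[v] - 1))
    return path

def contract_rounds(adj, v):
    """Rounds of 'contract every internal edge incident to v' until a star remains."""
    adj = {x: set(ns) for x, ns in adj.items()}  # work on a copy
    rounds = 0
    while any(len(adj[w]) > 1 for w in adj[v]):
        for w in [w for w in adj[v] if len(adj[w]) > 1]:
            for x in adj.pop(w):  # merge the internal neighbor w into v
                if x != v:
                    adj[x].discard(w)
                    adj[x].add(v)
                    adj[v].add(x)
            adj[v].discard(w)
        rounds += 1
    return rounds

# A path 1-2-3-4-5-6 with extra leaves 7 (on node 3) and 8 (on node 4).
adj = {1: {2}, 2: {1, 3}, 3: {2, 4, 7}, 4: {3, 5, 8},
       5: {4, 6}, 6: {5}, 7: {3}, 8: {4}}
chain = longest_chain(adj)
diam = len(chain) - 1                  # k - 1 edges on a chain of k nodes
mid = chain[ceil(len(chain) / 2) - 1]  # w_{ceil(k/2)}, 0-based index
```

On this tree the diameter is 5, the midpoint has eccentricity ⌈5/2⌉ = 3, and the procedure started at the midpoint takes 2 rounds, matching ecc_{T}(v)−1.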
Proof of Theorem ?? (Upper bound Meth.2):
Consider a given instance (T,T^{′}). Take any leaf of T and assign it as the root, and do the same for T^{′}. Although we have assumed roots of degree at least two so far, we use this rooting only for our analysis in order to fix a parent-child relationship between nodes. Let Q be an optimal sequence of operations turning T into T^{′}. We may assume that Q first contracts every nonmixed edge, and our algorithm does the same. Therefore, we suppose that T and T^{′} contain no nonmixed edges. Assume for our purposes that whenever a contraction takes place in Q between a node u and a child v, the u node stays in the tree and v gets removed (here the notion of a child is in the rooted sense with respect to our rooting above). Also assume that when there is an extension splitting a node u, then the newly created node becomes a child of u and u retains the same parent. It is easily checked that this only alters the names of nodes and not the sequence itself.
Call an internal node v of T a good child if the edge between v and its parent is good. Note that v has a unique corresponding node in T^{′} which we denote v^{′} (i.e. v^{′} is the root of the same clade as the subtree rooted at v). Further, call v a bad-good child if v is a good child, but either the label of v differs from that of v^{′}, or v is incident to at least one bad edge (yes, children are capable of being both bad and good). Note that every bad subtree of T is rooted at a bad-good child, and observe that here we say that a bad-good child v that is incident to only good edges is a particular case of a bad subtree (i.e. v just has the wrong label).
We already know that δ(T,T^{′}) is at least the number of bad edges in T and T^{′}. Let Q^{′} be the set of operations of Q that are either flips, or contractions of good edges. We argue that the size of Q^{′} is at least the number of bad-good children in T. To see this, let v be a bad-good child. Assume first that v is not incident to any bad edge. If we never flip v nor remove it by contracting its parent edge, then Q cannot transform T into T^{′}, as v and its underlying clade remain present in every tree from T to T^{′}, but with the wrong label (because a contraction not removing v cannot remove the v clade, and extensions can create clades but not remove them). So we may assume that v gets flipped or that its parent edge gets contracted. A flip must be in Q^{′} and, observing that at any point the parent edge of v must be good, a contraction removing v must also be in Q^{′}. Assume instead that v is incident to at least one bad edge {v,w}, with w a child of v. If v is never flipped nor removed owing to a contraction of its parent edge, then at some point w must be flipped so that the {v,w} edge gets contracted. Otherwise, if v gets removed, then its parent edge was contracted, again implying the contraction of a good edge. Either case implies an operation in Q^{′}. Importantly, observe that the operations in Q^{′} identified above are all distinct, since each one implies a flip or a removal of a node in a different bad subtree of T.
Now, let T_{1},…,T_{k} be the bad subtrees of T and T^{′}, and for each i∈{1,…,k}, let t_{i} be the number of bad edges in T_{i}. Further denote \(b = \sum _{i=1}^k t_{i}\). Since bad subtrees form pairs, our arguments above imply that Q^{′} has at least k/2 operations (because the size of Q^{′} is at least the number of bad subtrees in T, which is half the total number of bad subtrees). The contraction of bad edges plus the operations of Q^{′} show that Q has at least \(\sum _{i = 1}^k t_i + k/2 = b + k/2\) operations. Our algorithm contracts b edges in total. To count the number of flips, take any bad subtree T_{i}. Then t_{i}≥diam(T_{i})−2 and the number of flips we perform is at most ⌈diam(T_{i})/2⌉−1=⌈(diam(T_{i})−2)/2⌉≤t_{i}/2+1. Note that this also holds when T_{i} contains no bad edge. Therefore, the number of operations that we perform is at most \(b + \sum _{i=1}^k (t_i/2 + 1) = 3b/2 + k\). Our approximation ratio is therefore \(\frac {3b/2 + k}{b + k/2} \leq \frac {2b + k}{b + k/2} = 2\).
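For readers who want the intermediate steps of the counting argument, the display below spells them out. It is only our expansion of the inequalities stated above, using t_{i}≥diam(T_{i})−2 and the monotonicity of the ceiling function; the symbol \(f_{i}\), denoting the number of flips performed on T_{i}, is introduced here for convenience.

```latex
\begin{align*}
  f_{i} &\le \left\lceil \frac{\operatorname{diam}(T_{i})}{2} \right\rceil - 1
         = \left\lceil \frac{\operatorname{diam}(T_{i}) - 2}{2} \right\rceil
         \le \left\lceil \frac{t_{i}}{2} \right\rceil
         \le \frac{t_{i}}{2} + 1, \\
  b + \sum_{i=1}^{k} f_{i}
        &\le b + \sum_{i=1}^{k} \left( \frac{t_{i}}{2} + 1 \right)
         = \frac{3b}{2} + k
         \le 2b + k
         = 2 \left( b + \frac{k}{2} \right).
\end{align*}
```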