Volume 15 Supplement 6

## Proceedings of the Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics

# Sets of medians in the non-geodesic pseudometric space of unsigned genomes with breakpoints

Arash Jamshidpey^{1}, Aryo Jamshidpey^{2} and David Sankoff^{1}

*BMC Genomics* 2014, **15(Suppl 6)**:S3

https://doi.org/10.1186/1471-2164-15-S6-S3

© Jamshidpey et al.; licensee BioMed Central Ltd. 2014

**Published: **17 October 2014

## Abstract

### Background

The breakpoint median in the set *S*_{n} of permutations on *n* terms is known to have some unusual behavior, especially if the input genomes are maximally different from each other. The mathematical study of the set of medians is complicated by the facts that breakpoint distance is not a metric but a pseudometric, and that it does not define a geodesic space.

### Results

We introduce the notion of partial geodesic, or geodesic patch between two permutations, and show that if two permutations are medians, then every permutation on a geodesic patch between them is also a median. We also prove the conjecture that the input permutations themselves are medians.

### Keywords

breakpoint distance, pseudometric, non-geodesic space, random genomes

## Background

Among the common measures of gene order difference between two genomes, the edit distances, such as reversal distance or double-cut-and-join distance, contrast with the breakpoint distance in that the former are defined in a geodesic space while the latter is not. Another characteristic of breakpoint distance that it does not share with most other genomic distances is that it is a pseudometric rather than a metric.

A problem in computational comparative genomics that has been extensively studied under many definitions of genomic distance is the gene order median problem [1], the archetypical instance of the gene order small phylogeny problem. The median genome is meant, in the first instance, to embody the information in common among *k* ≥ 3 given genomes, and second, to estimate the ancestral genome of these *k* genomes. We have shown that the second goal becomes unattainable as *n → ∞*, where *n* is the length of the genomes, if there are more than 0.5*n* mutational steps changing the gene order [2]. Moreover, we have conjectured, and demonstrated in simulation studies, that where there is little or nothing in common among the *k* input genomes, the median tends to reflect only one (actually, any one) of them, with no incorporation of information from the other *k −* 1 [3].

In the present paper, we investigate this conjecture mathematically in the context of a wider study of medians for the breakpoint distance between unsigned linear unichromosomal genomes, although the methods and results are equally valid for genomes with signed and/or circular chromosomes, as well as those with *χ* > 1 chromosomes, where *χ* is a fixed parameter. Our approach involves first a rigorous treatment of the pseudometric character of the breakpoint distance. Then, given the non-geodesic nature of the space, we define a weaker concept, the geodesic patch, which we use later, given two or more medians, to locate further medians. We also prove the conjecture that for *k* genomes containing no gene order information among them, the normalized (divided by *n*) median score tends to *k −* 1, with high probability.

## Results

### From pseudometric to metric

We denote by $S_n$ the set of all permutations of length $n$. Each permutation represents a unichromosomal linear genome where the numbers all represent different genes. For a permutation $\pi := \pi_1 \dots \pi_n$ we define the set of adjacencies of $\pi$, written $\mathcal{A}_{\pi}$, to be all the unordered pairs $\{\pi_i, \pi_{i+1}\} = \{\pi_{i+1}, \pi_i\}$ for $i = 1, \dots, n-1$. For $I \subseteq S_n$ we denote by $\mathcal{A}_I := \mathcal{A}_I^{(n)}$ the set of all common adjacencies of the elements of $I$. Then $\mathcal{A}_{S_n} = \varnothing$, and we also write $\mathcal{A}_{\varnothing}$ for the set of all pairs $\{i, j\}$, $i \ne j$. For any $I, J \subseteq S_n$, $\mathcal{A}_{I \cup J} = \mathcal{A}_I \cap \mathcal{A}_J$. It will sometimes be convenient to write $\mathcal{A}_I$, the set of common adjacencies in $I = \{x_1, \dots, x_k\}$, as $\mathcal{A}_{x_1, \dots, x_k}$. For example $\mathcal{A}_{x,y,z}$ represents the set of adjacencies common to permutations $x$, $y$ and $z$.

For $x, y \in S_n$ we define the breakpoint distance (bp distance) between $x$ and $y$ by

$$d^{(n)}(x, y) := n - 1 - |\mathcal{A}_{x,y}|.$$

This is not a metric on $S_n$ but rather a pseudometric because of nonreflexiveness: there are cases where $d^{(n)}(x, y) = 0$ but $x \ne y$, namely $x = \pi_1 \dots \pi_n$ and $y = \pi_n \dots \pi_1$, for any $x \in S_n$. In these cases, the permutations $x$ and $y$ are said to be equivalent, denoted by $x \sim y$. The equivalence class containing $\pi$ is represented by $[\pi]$ and contains exactly two permutations, $\pi_1 \dots \pi_n$ and $\pi_n \dots \pi_1$. The number of classes is thus $n!/2$. For any $\pi$, we denote the other element of $[\pi]$ by $\bar{\pi}$. The bp distance, a metric on the set of all equivalence classes of $S_n$, denoted by $\hat{S}_n := S_n/\sim$, is defined by

$$d^{(n)}([x], [y]) := d^{(n)}(x, y) = n - 1 - |\mathcal{A}_{x,y}|.$$

Where there is no risk of ambiguity, we can simplify the notation by using *x* and *y* instead of [*x*] and [*y*], and/or drop the superscript *n*.
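The definition can be made concrete with a minimal Python sketch. It assumes the bp distance is $n - 1$ minus the number of common adjacencies, and that a genome is encoded as a tuple of distinct gene labels; the function names `adjacencies` and `bp_distance` are illustrative choices, not taken from the paper.

```python
def adjacencies(perm):
    """Return the set of unordered adjacent pairs of a permutation."""
    return {frozenset(pair) for pair in zip(perm, perm[1:])}


def bp_distance(x, y):
    """Breakpoint distance: n - 1 minus the number of common adjacencies."""
    if len(x) != len(y):
        raise ValueError("permutations must have the same length")
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


identity = (1, 2, 3, 4, 5)
reversed_identity = identity[::-1]
scrambled = (1, 3, 5, 2, 4)

# A permutation and its reversal differ as tuples but are at distance 0,
# which is why the distance is only a pseudometric on S_n.
print(bp_distance(identity, reversed_identity))  # 0
# No adjacency in common, so the distance is the maximum n - 1 = 4.
print(bp_distance(identity, scrambled))          # 4
```

Working with unordered pairs (here `frozenset`) is what makes a permutation and its reversal indistinguishable, mirroring the passage from $S_n$ to the set of equivalence classes.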

The bp distance between two permutations attains its maximum value $n - 1$ when they have no common adjacencies. Bp distance is symmetric on $S_n$ and hence on $\hat{S}_n$. By construction, it is reflexive on $\hat{S}_n$. To verify the triangle inequality, consider three permutations $x, y, z$. Since $\mathcal{A}_{x,y} \cap \mathcal{A}_{y,z} = \mathcal{A}_{x,y,z}$, we have

$$d(x, y) + d(y, z) = 2(n-1) - |\mathcal{A}_{x,y}| - |\mathcal{A}_{y,z}| = 2(n-1) - |\mathcal{A}_{x,y} \cup \mathcal{A}_{y,z}| - |\mathcal{A}_{x,y,z}|.$$

But $|\mathcal{A}_{x,y} \cup \mathcal{A}_{y,z}| = |\mathcal{A}_{y} \cap (\mathcal{A}_{x} \cup \mathcal{A}_{z})| \le n - 1$ and $|\mathcal{A}_{x,y,z}| \le |\mathcal{A}_{x,z}|$, so that $d(x, y) + d(y, z) \ge n - 1 - |\mathcal{A}_{x,z}| = d(x, z)$, and hence the triangle inequality holds.

We say a pseudometric (or a metric) $\tilde{\rho}$ is right invariant on a group *G* if for any $x, y, z \in G$, $\tilde{\rho}(x, y) = \tilde{\rho}(xz, yz)$. The definition of left invariance is similar. A pseudometric (metric) which is both right and left invariant is called invariant. Bp distance is an invariant pseudometric on *S*_{n}.

**Definition 1** *Given a set* $\{x_1, \dots, x_k\} \subseteq S$ *and a pseudometric* $\rho$ *on* $S$*, a median for the set is* $\mu \in S$ *such that* $\sum_{i=1}^{k} \rho(\mu, x_i)$ *is minimal*.

### Defining the geodesic patch

A discrete metric space (*S, ρ*) is a geodesic space if for any two points *x, y* ∈ *S* there exists a finite subset of *S* containing *x, y* that is isometric with the discrete line segment [0, 1*, ..., ρ*(*x, y*)]. Any subset of *S* with this property, and there may be several, is called a geodesic between *x* and *y*. For example, all connected graphs are geodesic spaces. In a geodesic space the medians of two points *x* and *y* consist of all the points located on geodesics between *x* and *y*.

A geodesic patch between $x$ and $y$ is a maximal subset of $S$ containing $x, y$ which is isometric to a subsegment (not necessarily contiguous) of the line segment $[0, 1, \dots, \rho(x, y)]$. For any two points $x, y$ in an arbitrary metric space $(S, \rho)$ there exists at least one geodesic patch between them because $\{x, y\}$ is isometric to $\{0, \rho(x, y)\}$. In addition, any geodesic is a geodesic patch. Any point $z$ on a geodesic patch between $x, y$ satisfies:

$$\rho(x, z) + \rho(z, y) = \rho(x, y).$$

Conversely, any point $z$ satisfying this equality for $x$ and $y$ must lie on a geodesic patch between them. We denote the set of all permutations lying on geodesic patches connecting $x, y \in S_n$ by $\overline{[x, y]}$, as in Figure 1.

$(\hat{S}_n, d)$ is not a geodesic space. For example there is no geodesic connecting the identity permutation *id* and $\pi := 1\; 2\; x_1\; x_2 \dots x_{n-4}\; (n-1)\; n$ when $x_1 x_2 \dots x_{n-4}$ is a non-identical permutation on $\{3, \dots, n-2\}$. The smallest change to *id* is to cut one of its adjacencies, say $\{i, i+1\}$, and rejoin the two segments in one of the three possible ways: 1 to *n*, 1 to *i* + 1 or *n* to *i*. Now if we cut the adjacencies {1, 2} or {*n −* 1*, n*} in *id*, the distance of the new permutation to both *id* and *π* increases. If on the other hand we cut one of the other adjacencies in *id*, all the ways of rejoining, which increase the distance to *id*, either increase or leave unchanged the distance to *π*, since {1*, n*}, {1*, i* + 1} and {*n, i*} are not adjacencies in $\mathcal{A}_{\pi}$. Therefore there is no geodesic connecting *id* to *π*.
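The argument above can be checked by brute force on a small instance. The following sketch assumes the bp distance $n - 1 - |\mathcal{A}_{x,y}|$ and collects every permutation $z$ with $d(x,z) + d(z,y) = d(x,y)$, that is, every point lying on some geodesic patch between $x$ and $y$. The instance $n = 6$, $\pi = 1\,2\,4\,3\,5\,6$ and the helper name `patch_points` are illustrative choices, not taken from the paper.

```python
from itertools import permutations


def adjacencies(perm):
    """Set of unordered adjacent pairs of a permutation."""
    return {frozenset(pair) for pair in zip(perm, perm[1:])}


def bp_distance(x, y):
    """Breakpoint distance: n - 1 minus the number of common adjacencies."""
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


def patch_points(x, y):
    """All permutations z of 1..n with d(x, z) + d(z, y) = d(x, y)."""
    n = len(x)
    d_xy = bp_distance(x, y)
    return [z for z in permutations(range(1, n + 1))
            if bp_distance(x, z) + bp_distance(z, y) == d_xy]


identity = (1, 2, 3, 4, 5, 6)
pi = (1, 2, 4, 3, 5, 6)  # middle block "4 3" is a non-identical permutation of {3, 4}

points = patch_points(identity, pi)
# Points strictly between the two endpoints; a geodesic would need at least one.
intermediate = [z for z in points
                if 0 < bp_distance(identity, z) < bp_distance(identity, pi)]
print(bp_distance(identity, pi), len(points), intermediate)
```

For this instance the only patch points are the two endpoints and their reversals, so no permutation sits at distance 1 from both, in line with the claim that no geodesic exists.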

is a geodesic between *id* and *π*. Note *d*(*id, π*) = 5, the maximum possible distance in $\hat{S}_6$.

### The median value and medians of permutations with maximum pairwise distances

In this section we investigate the bp median problem in the case of *k* permutations with maximum pairwise distances. As we shall see later, this situation is very similar to the case of *k* uniformly random permutations. Let (*S, ρ*) be a pseudometric space.

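The total distance of a point $x \in S$ to a finite subset $\varnothing \ne B \subseteq S$ is defined to be

$$\rho(x, B) := \sum_{y \in B} \rho(x, y).$$

The median value of $B$, $m^{S,\rho}(B)$, is the infimum of the total distance when the infimum is over all the points $x \in S$, that is

$$m^{S,\rho}(B) := \inf_{x \in S} \rho(x, B). \qquad (8)$$

Let $\varnothing \ne B \subseteq S$. We define a multiplicity function $n_B$ from $B$ to $\mathbb{N}$ and write $n_B(x) = n_x$. We call $A = (B, n_B)$ a set with multiplicities. We define the total distance of a point $x \in S$ to $A$ to be

$$\rho(x, A) := \sum_{y \in B} n_y \, \rho(x, y).$$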

The definition of median value in Equation (8) can be extended in an analogous way to the median value of a set with multiplicities *A*. When *S* is finite then the total distance function takes its minimum on *S* and "inf" turns into "min" in the above formulation. The points of the space *S* that minimize the total distance to *A* are called the median points or medians of *A* and the set of all these medians is called the median set of *A*, denoted by $M^{S,\rho}(A)$.
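For very small $n$ the median value and the median set can be found by exhaustive search. The sketch below is only an illustration under the assumption that the bp distance is $n - 1 - |\mathcal{A}_{x,y}|$; the names `total_distance` and `median_set` and the three example genomes are hypothetical. Repeating a genome in the input list plays the role of a multiplicity.

```python
from itertools import permutations


def adjacencies(perm):
    """Set of unordered adjacent pairs of a permutation."""
    return {frozenset(pair) for pair in zip(perm, perm[1:])}


def bp_distance(x, y):
    """Breakpoint distance: n - 1 minus the number of common adjacencies."""
    return len(x) - 1 - len(adjacencies(x) & adjacencies(y))


def total_distance(x, genomes):
    """Sum of bp distances from x to every genome in the list."""
    return sum(bp_distance(x, g) for g in genomes)


def median_set(genomes):
    """Exhaustive search over S_n; feasible only for small n."""
    n = len(genomes[0])
    scores = {p: total_distance(p, genomes)
              for p in permutations(range(1, n + 1))}
    value = min(scores.values())
    return value, sorted(p for p, s in scores.items() if s == value)


genomes = [(1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (2, 1, 3, 4, 5)]
value, medians = median_set(genomes)
print(value)    # 2
print(medians)  # [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1)]
```

The two medians returned form a single equivalence class, which illustrates why the median set is more naturally described on classes than on permutations.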

Let $B$ and $A = (B, n_B)$ be a subset and a subset with multiplicities of $S_n$. We define $[B]$ to be the set of all permutation classes of $S_n$ that have at least one of their permutations in $B$. That is

$$[B] := \{[x] : x \in B\}.$$

Two subsets $B, B' \subseteq S_n$ are said to be equivalent, denoted by $B \sim B'$, if $[B] = [B']$. Also we define $[n_B]$ to be a function from $[B]$ to $\mathbb{N}$ with

$$[n_B]([x]) := \sum_{y \in B,\; y \sim x} n_y.$$

The definition of $[A]$ is straightforward:

$$[A] := ([B], [n_B]),$$

and we say two nonempty subsets of *S*_{n} with multiplicities, namely *A* and *A′*, are equivalent, denoted by *A ~ A′*, if [*A*] = [*A′*]. In fact [*A*] is the equivalence class containing *A*. We call [*A*] a subset of $\hat{S}_n$ with multiplicities. We use the notations "[ ]" and "~" for all the above concepts without restriction.

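For $A \sim A'$ and $x \sim x'$, we have

$$d(x, A) = d(x', A') = d([x], [A]),$$

where we use $d$ as both a metric on $\hat{S}_n$ and a pseudometric on $S_n$. Therefore we can conclude that

$$m^{S_n, d}(A) = m^{\hat{S}_n, d}([A]) \quad \text{and} \quad [M^{S_n, d}(A)] = M^{\hat{S}_n, d}([A]).$$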

Henceforward, we will simplify by replacing the notation $m^{S_n, d}(A)$ and $M^{S_n, d}(A)$ by $m_n(A)$ and $M_n(A)$, respectively. Also for a subset $[A]$ of $\hat{S}_n$ with multiplicities, we will use the notation $m_n([A])$ and $M_n([A])$ instead of $m^{\hat{S}_n, d}([A])$ and $M^{\hat{S}_n, d}([A])$ respectively. Where there is no ambiguity we will suppress the subscript $n$.

**Proposition 1** *Suppose* $X := \{x_1, \dots, x_k\} \subset \hat{S}_n$ *such that* $d(x_i, x_j) = n - 1$ *for any* $i \ne j$, $1 \le i, j \le k$. *Then the bp median value of* $X$ *is* $(k-1)(n-1)$. *Moreover,* $m^*$ *is a median of* $X$, $m^* \in M(X)$, *if and only if* $\mathcal{A}_{m^*} \subset \bigcup_{i=1}^{k} \mathcal{A}_{x_i}$.

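*Proof* Let $\pi \in \hat{S}_n$ be an arbitrary permutation class. Since $\mathcal{A}_{\pi, x_i} \subset \mathcal{A}_{x_i}$ and $\mathcal{A}_{\pi, x_j} \subset \mathcal{A}_{x_j}$, and since $\mathcal{A}_{x_i} \cap \mathcal{A}_{x_j} = \varnothing$ because $d(x_i, x_j) = n - 1$, for any $1 \le i \ne j \le k$ we have $\mathcal{A}_{\pi, x_i} \cap \mathcal{A}_{\pi, x_j} = \varnothing$. Also

$$d(\pi, X) = \sum_{i=1}^{k} \left(n - 1 - |\mathcal{A}_{\pi, x_i}|\right) = k(n-1) - \Big|\bigcup_{i=1}^{k} \mathcal{A}_{\pi, x_i}\Big| \ge k(n-1) - |\mathcal{A}_{\pi}| = (k-1)(n-1).$$

The equality holds letting $\pi = x_i$ for any $1 \le i \le k$. This proves the first part of the proposition. For the second part, $m^* \in M(X)$ is equivalent to the total distance of $m^*$ to $X$ being $(k-1)(n-1)$, and this is equivalent to $\sum_{i=1}^{k} |\mathcal{A}_{m^*, x_i}| = n - 1$, that is, to $\bigcup_{i=1}^{k} \mathcal{A}_{m^*, x_i} = \mathcal{A}_{m^*}$. Since $\bigcup_{i=1}^{k} \mathcal{A}_{m^*, x_i}$ can be written as $\mathcal{A}_{m^*} \cap \left(\bigcup_{i=1}^{k} \mathcal{A}_{x_i}\right)$, this holds if and only if $\mathcal{A}_{m^*} \subset \bigcup_{i=1}^{k} \mathcal{A}_{x_i}$. This finishes the proof of the equivalence in the proposition.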
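Proposition 1 lends itself to a brute-force sanity check on a small case. The sketch below uses three genomes of length 7 with pairwise disjoint adjacency sets, chosen by hand for illustration (they are not from the paper), and compares the exhaustively computed median set with the set predicted by the adjacency criterion. The function name `check_proposition_1` is hypothetical.

```python
from itertools import permutations


def adjacencies(perm):
    """Frozen set of unordered adjacent pairs of a permutation."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def check_proposition_1(genomes):
    """Return (median value, whether medians match the adjacency criterion)."""
    n = len(genomes[0])
    adj_inputs = [adjacencies(g) for g in genomes]
    # Precondition of the proposition: no two inputs share an adjacency.
    assert all(not (a & b)
               for i, a in enumerate(adj_inputs) for b in adj_inputs[i + 1:])
    union = frozenset().union(*adj_inputs)

    totals = {}
    for p in permutations(range(1, n + 1)):
        adj_p = adjacencies(p)
        totals[p] = sum(n - 1 - len(adj_p & a) for a in adj_inputs)

    value = min(totals.values())
    medians = {p for p, t in totals.items() if t == value}
    predicted = {p for p in totals if adjacencies(p) <= union}
    return value, medians == predicted


genomes = [(1, 2, 3, 4, 5, 6, 7), (1, 3, 5, 7, 2, 4, 6), (3, 6, 2, 5, 1, 4, 7)]
value, criterion_matches = check_proposition_1(genomes)
# Expected from the proposition: (k - 1)(n - 1) = 2 * 6 = 12, criterion holds.
print(value, criterion_matches)
```

The search enumerates all 5040 permutations, so it is meant only as a check of the statement, not as a way to compute medians in practice.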

**Lemma 1** *Let* $x, y, z$ *be three permutation classes in* $\hat{S}_n$ *that are pairwise at a maximum distance* $n - 1$ *from each other. Then for any* $w \in \overline{[x, y]}$ *we have* $d(w, z) = n - 1$.

*Proof* Since $w \in \overline{[x, y]}$, we have $\mathcal{A}_w \subset \mathcal{A}_x \cup \mathcal{A}_y$ (see Lemma 2 below). Also we know that $\mathcal{A}_z \cap (\mathcal{A}_x \cup \mathcal{A}_y) = \varnothing$. Hence $\mathcal{A}_{w,z} = \varnothing$, which gives the result.

The above lemma simply indicates that for any two points $x_i, x_j$ in the set $X$ in the proposition above, $\overline{[x_i, x_j]} \subset M(X)$, since the total distance of each point in $\overline{[x_i, x_j]}$ to $X$ is $(k-1)(n-1)$.

**Corollary 1** *Suppose* $X := \{x_1, \dots, x_k\} \subset \hat{S}_n$ *such that* $d(x_i, x_j) = n - 1$ *for any* $i \ne j$. *Then* $\bigcup_{i,j} \overline{[x_i, x_j]} \subset M(X)$.

What more can we say about the median positions? The notion of "accessibility" will help us to keep track of some other medians of the set $X$ that are not in $\bigcup_{i,j} \overline{[x_i, x_j]}$. Before defining this concept, we first need more information about the properties of $\overline{[x, y]}$ for $x, y \in \hat{S}_n$.

**Lemma 2** *Let* $x, y \in \hat{S}_n$. *Then* $z \in \overline{[x, y]}$ *if and only if* $\mathcal{A}_{x,y} \subset \mathcal{A}_z \subset \mathcal{A}_x \cup \mathcal{A}_y$.

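*Proof* We know $z \in \overline{[x, y]}$ if and only if $d(x, z) + d(z, y) = d(x, y)$. On the other hand, since $\mathcal{A}_{x,z} \cap \mathcal{A}_{y,z} = \mathcal{A}_{x,y,z}$, we can write $|\mathcal{A}_z|$ as follows

$$n - 1 = |\mathcal{A}_z| = |\mathcal{A}_z \backslash (\mathcal{A}_x \cup \mathcal{A}_y)| + |\mathcal{A}_{x,z}| + |\mathcal{A}_{y,z}| - |\mathcal{A}_{x,y,z}|. \qquad (23)$$

Suppose first that $z \in \overline{[x, y]}$. Then

$$d(x, y) = d(x, z) + d(z, y) = n - 1 + |\mathcal{A}_z \backslash (\mathcal{A}_x \cup \mathcal{A}_y)| - |\mathcal{A}_{x,y,z}| \ge n - 1 - |\mathcal{A}_{x,y,z}| \ge n - 1 - |\mathcal{A}_{x,y}| = d(x, y). \qquad (26)$$

Both inequalities must therefore be equalities. This results in $|\mathcal{A}_{x,y}| = |\mathcal{A}_{x,y,z}|$ and hence in $\mathcal{A}_{x,y} \subset \mathcal{A}_z$. Otherwise the second inequality in (26) will be strict, which is impossible. On the other hand the first inequality in (26) shows $\mathcal{A}_z \backslash (\mathcal{A}_x \cup \mathcal{A}_y) = \varnothing$, which gives $\mathcal{A}_z \subset \mathcal{A}_x \cup \mathcal{A}_y$.

Conversely, suppose $\mathcal{A}_{x,y} \subset \mathcal{A}_z \subset \mathcal{A}_x \cup \mathcal{A}_y$. Then

$$d(x, z) + d(z, y) = 2(n-1) - |\mathcal{A}_{x,z}| - |\mathcal{A}_{y,z}| = n - 1 - |\mathcal{A}_{x,y,z}|.$$

This is true because of $\mathcal{A}_z \subset \mathcal{A}_x \cup \mathcal{A}_y$ and Equation (23). But since $\mathcal{A}_{x,y} \subset \mathcal{A}_z \subset \mathcal{A}_x \cup \mathcal{A}_y$ we have $|\mathcal{A}_{x,y}| = |\mathcal{A}_{x,y,z}|$ and we can replace $|\mathcal{A}_{x,y,z}|$ by $|\mathcal{A}_{x,y}|$ on the right hand side of the last equality, which gives $d(x, z) + d(z, y) = d(x, y)$. This finishes the proof of the converse.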
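The characterization in Lemma 2 can be verified exhaustively for small $n$. In the sketch below, $x$ is fixed to the identity, on the assumption (stated here, not in the paper) that relabelling the genes maps any $x$ to the identity without changing which adjacencies are shared; $y$ and $z$ range over all of $S_n$. The function name `lemma2_holds` is an illustrative choice.

```python
from itertools import permutations


def adjacencies(perm):
    """Frozen set of unordered adjacent pairs of a permutation."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def lemma2_holds(n):
    """Compare the distance condition with the adjacency sandwich condition."""
    perms = list(permutations(range(1, n + 1)))
    adj = {p: adjacencies(p) for p in perms}
    x = perms[0]  # permutations() yields the identity first
    for y in perms:
        d_xy = n - 1 - len(adj[x] & adj[y])
        for z in perms:
            d_xz = n - 1 - len(adj[x] & adj[z])
            d_zy = n - 1 - len(adj[z] & adj[y])
            on_patch = (d_xz + d_zy == d_xy)
            sandwiched = (adj[x] & adj[y]) <= adj[z] <= (adj[x] | adj[y])
            if on_patch != sandwiched:
                return False
    return True


print(lemma2_holds(4), lemma2_holds(5))
```

The adjacency form of the condition is much cheaper to test than the distance form when many candidate points $z$ must be screened against the same pair $x, y$.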

**Definition 2** *Let* $X := \{x_1, \dots, x_k\}$ *be a subset of* $\hat{S}_n$. *We say a permutation class* $z \in \hat{S}_n$ *is 1-accessible from* $X$ *if there exist an* $m \in \mathbb{N}$, *a finite sequence* $y_1, \dots, y_m$ *where* $y_i \in X$, *and* $z_1, \dots, z_m$, *where* $z_i \in \hat{S}_n$, *such that* $z_1 = y_1$, $z_m = z$ *and* $z_{i+1} \in \overline{[z_i, y_{i+1}]}$ *for* $i = 1, \dots, m-1$. *See Figure 2.*

*We denote the set of all 1-accessible points of* $X$ *by* $Z(X)$. *We define* $Z_0(X) := X$. *Also for* $r \in \mathbb{N} \cup \{0\}$, *by induction, we define* $Z_{r+1}(X)$ *to be* $Z(Z_r(X))$ *and we call it the set of all* $(r+1)$*-accessible permutation classes. That is* $Z_1(X) = Z(X)$, $Z_2(X) = Z(Z(X))$ *and so on. It is clear that* $Z_{r+1}(X)$ *includes* $Z_r(X)$ *and also* $\bigcup_{x, y \in Z_r(X)} \overline{[x, y]}$. *A permutation class* $z$ *is said to be accessible from* $X$ *if there exists* $r \in \mathbb{N}$ *such that* $z \in Z_r(X)$. *We denote the set of all accessible points by* $\overline{Z}(X) = \bigcup_{r \in \mathbb{N} \cup \{0\}} Z_r(X)$.

Note that $Z(\overline{Z}(X)) = \overline{Z}(X)$. This holds because for any 1-accessible permutation class $z$ from $\overline{Z}(X)$, there must exist $m \in \mathbb{N}$, $r_0 \in \mathbb{N} \cup \{0\}$, $y_1, \dots, y_m \in Z_{r_0}(X)$ (the $y_i$'s must be in $\overline{Z}(X)$ and the sets $Z_r(X)$ are nested, thus there must be such an $r_0$) and $z_1, \dots, z_m$ where $z_i \in \hat{S}_n$ such that $z_1 = y_1$, $z_m = z$ and $z_{i+1} \in \overline{[z_i, y_{i+1}]}$. Therefore $z \in Z_{r_0+1}(X) \subset \overline{Z}(X)$. We can then conclude that $\overline{Z}(\overline{Z}(X)) = \overline{Z}(X)$.

**Proposition 2** *Suppose* $X := \{x_1, \dots, x_k\} \subset \hat{S}_n$ *such that* $d(x_i, x_j) = n - 1$ *for any* $i \ne j$. *Then for any permutation class* $z \in \overline{Z}(X)$ *the total distance* $d(z, X)$ *between* $z$ *and* $X$ *is* $(k-1)(n-1)$ *and hence* $\overline{Z}(X) \subset M(X)$. *Furthermore if* $m_1, m_2 \in M(X)$ *then* $\overline{[m_1, m_2]} \subset M(X)$.

*Proof* Suppose $m_1, m_2 \in M(X)$ and $m^* \in \overline{[m_1, m_2]}$. By Lemma 2 and Proposition 1 we have $\mathcal{A}_{m^*} \subset \mathcal{A}_{m_1} \cup \mathcal{A}_{m_2} \subset \bigcup_{i=1}^{k} \mathcal{A}_{x_i}$. Applying Proposition 1 again, we have $m^* \in M(X)$. Now it suffices to show that for any $r \in \mathbb{N} \cup \{0\}$, $Z_r(X) \subset M(X)$. We prove this by induction. For $r = 0$ this follows from Corollary 1. Suppose $Z_r(X) \subset M(X)$. By definition we have $Z_{r+1}(X) = Z(Z_r(X))$. That is, for $z \in Z_{r+1}(X)$ there exist an $m \in \mathbb{N}$, $y_1, \dots, y_m \in Z_r(X)$ and $z_1, \dots, z_m$, where $z_i \in \hat{S}_n$, such that $z_1 = y_1$, $z_m = z$ and $z_{i+1} \in \overline{[z_i, y_{i+1}]}$. In particular $z_2 \in \overline{[y_1, y_2]}$, and by the fact we proved above $z_2 \in M(X)$, since $y_1, \dots, y_m \in Z_r(X) \subset M(X)$. Continuing this we conclude that $z_1, z_2, \dots, z_m = z \in M(X)$. Hence $Z_{r+1}(X) \subset M(X)$. This finishes the proof.
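As an illustration of Definition 2 and Proposition 2, the set of 1-accessible classes can be computed by a graph search: start from the inputs and repeatedly step from a reached class $z$ to any class on a patch between $z$ and some input $y$. The sketch below makes several assumptions that are not from the paper: patch membership is tested with the adjacency criterion of Lemma 2, classes are represented by the lexicographically smaller of a permutation and its reversal, and the three input genomes are a hand-built decomposition of the 15 gene pairs on 6 genes into three paths. The names `canon`, `one_accessible` and `median_classes` are hypothetical.

```python
from itertools import permutations


def adjacencies(perm):
    """Frozen set of unordered adjacent pairs of a permutation."""
    return frozenset(frozenset(pair) for pair in zip(perm, perm[1:]))


def canon(perm):
    """Canonical representative of the class {perm, reversed perm}."""
    perm = tuple(perm)
    return min(perm, perm[::-1])


def one_accessible(genomes):
    """Classes reachable by chains z_{i+1} in patch(z_i, y_{i+1}), y in genomes."""
    n = len(genomes[0])
    classes = {canon(p) for p in permutations(range(1, n + 1))}
    adj = {c: adjacencies(c) for c in classes}
    inputs = [canon(g) for g in genomes]
    reached, frontier = set(inputs), list(inputs)
    while frontier:
        z = frontier.pop()
        for y in inputs:
            low, high = adj[z] & adj[y], adj[z] | adj[y]
            for w in classes - reached:
                if low <= adj[w] <= high:  # Lemma 2 criterion
                    reached.add(w)
                    frontier.append(w)
    return reached, adj


def median_classes(genomes, adj):
    """Median value and median classes by exhaustive search."""
    n = len(genomes[0])
    adj_inputs = [adjacencies(g) for g in genomes]
    totals = {c: sum(n - 1 - len(a & b) for b in adj_inputs)
              for c, a in adj.items()}
    value = min(totals.values())
    return value, {c for c, t in totals.items() if t == value}


X = [(1, 2, 6, 3, 5, 4), (2, 3, 1, 4, 6, 5), (3, 4, 2, 5, 1, 6)]
Z, adj = one_accessible(X)
value, M = median_classes(X, adj)
print(value, len(Z), len(M), Z <= M)
```

Because the three inputs together use every possible adjacency, Proposition 1 implies that every class is a median here; the search then shows how much of that median set is already 1-accessible, without settling Conjecture 1 in either direction.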

**Conjecture 1** *Every median point of X is accessible from X, that is* $M\left(X\right)=\overline{Z}\left(X\right)$.

### The median value and medians of *k* random permutations

In this section we study the median value and median points of *k* independent random permutation classes uniformly chosen from $\hat{S}_n$. This is equivalent to studying the same problem for *k* random permutations sampled from *S*_{n}. All the results of this section carry over to permutations without modification.

We make use of the fact that the bp distance of two independent random permutations tends to be close to its maximum value, *n −* 1. Xu et al. [4] showed that if we fix a reference linear permutation *id* and pick a random permutation *x* uniformly, the expectation and the variance of $|\mathcal{A}_{id,x}^{(n)}|$ are both very close to 2 for large enough *n*. Because of the symmetry of the group *S*_{n} and the fact that bp distance is an invariant pseudometric, the same results hold for two random permutations *x* and *y*. We first summarize the results we need from [4].
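A quick simulation gives a feel for this result. The sketch below draws pairs of uniform random permutations and reports the sample mean and variance of the number of common adjacencies; the function name `simulate_common_adjacencies`, the seed, and the sizes $n = 200$ and 2000 trials are arbitrary illustrative choices, and the output is a Monte Carlo estimate rather than the exact values derived in [4].

```python
import random
from statistics import mean, variance


def adjacencies(perm):
    """Set of unordered adjacent pairs, each stored as a sorted tuple."""
    return {(a, b) if a < b else (b, a) for a, b in zip(perm, perm[1:])}


def simulate_common_adjacencies(n, trials, seed=0):
    """Sample mean and variance of |A_{x,y}| for random x, y in S_n."""
    rng = random.Random(seed)
    genes = list(range(1, n + 1))
    counts = []
    for _ in range(trials):
        x = rng.sample(genes, n)  # a uniform random permutation
        y = rng.sample(genes, n)
        counts.append(len(adjacencies(x) & adjacencies(y)))
    return mean(counts), variance(counts)


avg, var = simulate_common_adjacencies(n=200, trials=2000)
print(f"mean={avg:.2f} variance={var:.2f}")
```

Both estimates should land near 2 for sizes like these, which is the sense in which two random genomes are almost at the maximum distance $n - 1$.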

Let $\tilde{\nu}_n$ be the uniform measure on $S_n$. Let $\Pi : S_n \to \hat{S}_n$ be the natural surjective map sending each permutation onto its corresponding permutation class. We define

$$\nu_n := \tilde{\nu}_n \circ \Pi^{-1}$$

to be the push-forward measure of $\tilde{\nu}_n$ induced by the map $\Pi$. It is clear that $\nu_n$ is the uniform measure on $\hat{S}_n$. The following proposition is a reformulation of Theorems 6 and 7 in [4].

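**Proposition 3** *[Xu-Alain-Sankoff] Let* $x$ *and* $y$ *be two independent random permutation classes (irpc) chosen uniformly from* $\hat{S}_n$. *Then*

$$\mathbb{E}\left[|\mathcal{A}_{x,y}^{(n)}|\right] \to 2 \quad \text{and} \quad \mathrm{Var}\left[|\mathcal{A}_{x,y}^{(n)}|\right] \to 2 \quad \text{as } n \to \infty.$$

We denote the deviation of the bp distance from its maximum value for $x, y$ by

$$\varepsilon_n(x, y) := n - 1 - d^{(n)}(x, y) = |\mathcal{A}_{x,y}^{(n)}|.$$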

**Corollary 2** *Suppose* $x$ *and* $y$ *are two irpc's sampled from the uniform measure* $\nu_n$ *and* $a_n$ *is an arbitrary sequence of real numbers diverging to* $+\infty$. *Then* $\frac{\varepsilon_n(x, y)}{a_n}$ *converges to zero asymptotically* $\nu_n^{*2}$*-almost surely (a.a.s.), that is, for any* $\delta > 0$,

$$\nu_n^{*2}\left(\frac{\varepsilon_n(x, y)}{a_n} \ge \delta\right) \to 0 \quad \text{as } n \to \infty.$$

*Proof* The proof is straightforward from [4] and Chebyshev's inequality.

We now turn to $k$ irpc's. Let $[A]$ be a subset of $\hat{S}_n$ with multiplicities and with $k$ elements. Define

$$e_n([A]) := (k-1)(n-1) - m_n([A]).$$

**Theorem 1** *Let* $X^{(n)} := \{x_1^{(n)}, x_2^{(n)}, \dots, x_k^{(n)}\}$ *be a set of k irpc's in* $\hat{S}_n$ *sampled from the measure* $\nu_n^{*k}$. *Then their breakpoint median value* $m_n^* := m_n(X^{(n)})$ *tends to be close to its maximum after a convenient rescaling with high probability, that is, for any arbitrary sequence* $a_n \to \infty$ *as* $n \to \infty$, $\frac{e_n^*}{a_n} \to 0$ *in* $\nu_n^{*k}$*-probability, where* $e_n^* := e_n(X^{(n)})$.

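*Proof* Let $\pi$ be an arbitrary point of $S_n$. Let $\mathcal{A}_{\pi \backslash X} := \mathcal{A}_{\pi} \backslash \bigcup_{i=1}^{k} \mathcal{A}_{x_i}$ be the set of adjacencies of $\pi$ that occur in none of the $x_i$. We have

$$d(\pi, X) = k(n-1) - \sum_{i=1}^{k} |\mathcal{A}_{\pi, x_i}| \ge k(n-1) - \Big|\bigcup_{i=1}^{k} \mathcal{A}_{\pi, x_i}\Big| - \sum_{i < j} |\mathcal{A}_{x_i, x_j}| \ge (k-1)(n-1) + |\mathcal{A}_{\pi \backslash X}| - \alpha_n,$$

where $\alpha_n := \sum_{i \ne j} \varepsilon_n(x_i, x_j)$. On the other hand $m_n(X^{(n)}) \le (k-1)(n-1)$. The reason is the same as has already been discussed in the proof of Proposition 1: the total distance of any $x_i$ to $X^{(n)}$ is at most $(k-1)(n-1)$. Therefore, subtracting $(k-1)(n-1)$, we have

$$-\alpha_n \le m_n(X^{(n)}) - (k-1)(n-1) \le 0, \quad \text{that is,} \quad 0 \le e_n^* \le \alpha_n.$$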

Dividing by ${a}_{n}$ and letting *n* go to *∞* the result follows from the last corollary.
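The bounds used in this proof are cheap to compute even when the median itself is not. The sketch below, written under the assumption that the correction term sums the common adjacencies over unordered pairs of inputs (each pair counted once, which still gives a valid bound), brackets the median value of $k$ random genomes between $(k-1)(n-1) - \alpha_n$ and the best total distance achieved by an input genome. The function name `median_value_bracket` and the parameter values are illustrative.

```python
import random
from itertools import combinations


def adjacencies(perm):
    """Set of unordered adjacent pairs, each stored as a sorted tuple."""
    return {(a, b) if a < b else (b, a) for a, b in zip(perm, perm[1:])}


def median_value_bracket(n, k, seed=0):
    """Lower and upper bounds on the bp median value of k random genomes."""
    rng = random.Random(seed)
    genomes = [rng.sample(range(1, n + 1), n) for _ in range(k)]
    adj = [adjacencies(g) for g in genomes]

    # alpha: common adjacencies summed over unordered pairs of inputs.
    alpha = sum(len(a & b) for a, b in combinations(adj, 2))
    top = (k - 1) * (n - 1)

    # Total distance from each input genome to the whole set.
    input_scores = [
        sum(n - 1 - len(adj[i] & adj[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return top - alpha, min(input_scores), top, alpha


lower, upper, top, alpha = median_value_bracket(n=1000, k=5)
print(f"{lower} <= median value <= {upper} (maximum {top}, alpha {alpha})")
```

Since $\alpha_n$ stays of order $k^2$ while $(k-1)(n-1)$ grows linearly in $n$, the bracket is narrow relative to the median value, which is the content of the theorem after rescaling.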

**Theorem 2** *Let* $X^{(n)} := \{x_1^{(n)}, x_2^{(n)}, \dots, x_k^{(n)}\}$ *be a set of k irpc's in* $\hat{S}_n$ *sampled from the measure* $\nu_n^{*k}$. *Then for any permutation class* $z^{(n)} \in \overline{Z}(X^{(n)})$ *the total distance of* $z^{(n)}$ *to* $X^{(n)}$ *is close to* $(k-1)(n-1)$ *with high probability after a convenient rescaling. More explicitly, for any arbitrary sequence of real numbers* $a_n$ *diverging to* $\infty$,

$$\frac{d(z^{(n)}, X^{(n)}) - (k-1)(n-1)}{a_n} \to 0 \quad \text{in } \nu_n^{*k}\text{-probability}.$$

*Therefore*

$$\frac{d(z^{(n)}, X^{(n)}) - m_n^*}{a_n} \to 0 \quad \text{in } \nu_n^{*k}\text{-probability}.$$

*Furthermore if* $m_1^{(n)}, m_2^{(n)} \in M_n(X^{(n)})$ *then for any* $\tilde{m}^{(n)} \in \overline{[m_1^{(n)}, m_2^{(n)}]}$

$$\frac{d(\tilde{m}^{(n)}, X^{(n)}) - m_n^*}{a_n} \to 0 \quad \text{in } \nu_n^{*k}\text{-probability}.$$

*Proof* The structure of the proof is similar to the proof of Proposition 1. Suppose $o \in \hat{S}_n$ with $\mathcal{A}_o \subset \bigcup_{i=1}^{k} \mathcal{A}_{x_i}$. Let $\alpha_n$ be as defined in the proof of Theorem 1. Then by the same discussion we have

$$(k-1)(n-1) - \alpha_n \le d(o, X) \le (k-1)(n-1).$$

It suffices to show that $z := z^{(n)} \in \overline{Z}(X)$ has the same property, that is $\mathcal{A}_z \subset \bigcup_{i=1}^{k} \mathcal{A}_{x_i}$. But this is clear by induction, using Lemma 2. For the second part of the theorem let $m_{1,n}^*, m_{2,n}^* \in M(X)$. Suppose $m^* \in \overline{[m_{1,n}^*, m_{2,n}^*]}$. By the proof of Theorem 1, $\frac{|\mathcal{A}_{m_{i,n}^* \backslash X}|}{a_n} \to 0$ in probability for *i* = 1, 2. On the other hand we have $\mathcal{A}_{m^* \backslash X} \subset \mathcal{A}_{m_{1,n}^* \backslash X} \cup \mathcal{A}_{m_{2,n}^* \backslash X}$, so that

$$|\mathcal{A}_{m^* \backslash X}| \le |\mathcal{A}_{m_{1,n}^* \backslash X}| + |\mathcal{A}_{m_{2,n}^* \backslash X}|.$$

The statement follows from the last inequality.

## Conclusions

We have shown that the median value for a set of random permutations tends to be close to its extreme value with high probability. Also it has been shown that every permutation accessible from a set of random permutations can be considered as a median of that set asymptotically almost surely, and conjectured that the converse is true, that every median is accessible from the original set in this way.

Further work is needed to characterize the existence and size of non-trivial geodesic patches, in order to assess how extensive the set of medians is.

## Declarations

### Acknowledgements

Research supported in part by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC). DS holds the Canada Research Chair in Mathematical Genomics.

**Declarations**

The publication charges for this article were funded by the Canada Research Chair in Mathematical Genomics, and by the University of Ottawa.

This article has been published as part of *BMC Genomics* Volume 15 Supplement 6, 2014: Proceedings of the Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S6.

## References

1. Tannier E, Zheng C, Sankoff D: Multichromosomal median and halving problems under different genomic distances. BMC Bioinformatics. 2009, 10: 120. 10.1186/1471-2105-10-120.
2. Jamshidpey A, Sankoff D: Phase change for the accuracy of the median value in estimating divergence time. BMC Bioinformatics. 2013, 14: S15:S7. 10.1186/1471-2105-14-157.
3. Haghighi M, Sankoff D: Medians seek the corners, and other conjectures. BMC Bioinformatics. 2012, 13: S19:S5. 10.1186/1471-2105-13-195.
4. Xu AW, Alain B, Sankoff D: Poisson adjacency distributions in genome comparison: multichromosomal, circular, signed and unsigned cases. Bioinformatics. 2008, 24: i146-i152. 10.1093/bioinformatics/btn295.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.