- Research article
- Open Access
Comprehensive comparative analysis of kinesins in photosynthetic eukaryotes
BMC Genomics volume 7, Article number: 18 (2006)
Kinesins, a superfamily of molecular motors, use microtubules as tracks and transport diverse cellular cargoes. All kinesins contain a highly conserved ~350 amino acid motor domain. Previous analysis of the completed genome sequence of one flowering plant (Arabidopsis) has resulted in identification of 61 kinesins. The recent completion of genome sequencing of several photosynthetic and non-photosynthetic eukaryotes that belong to divergent lineages offers a unique opportunity to conduct a comprehensive comparative analysis of kinesins in plant and non-plant systems and infer their evolutionary relationships.
We used the kinesin motor domain to identify kinesins in the completed genome sequences of 19 species, including 13 newly sequenced genomes. Among the newly analyzed genomes, six represent photosynthetic eukaryotes. A total of 529 kinesins was used to perform comprehensive analysis of kinesins and to construct gene trees using the Bayesian and parsimony approaches. The previously recognized 14 families of kinesins are resolved as distinct lineages in our inferred gene tree. At least three of the 14 kinesin families are not represented in flowering plants. Chlamydomonas, a green alga that is part of the lineage that includes land plants, has at least nine of the 14 known kinesin families. Seven of ten families present in flowering plants are represented in Chlamydomonas, indicating that these families were retained in both the flowering-plant and green algae lineages.
The increase in the number of kinesins in flowering plants is due to vast expansion of the Kinesin-14 and Kinesin-7 families. The Kinesin-14 family, which typically contains a C-terminal motor, has many plant kinesins that have the motor domain at the N terminus, in the middle, or the C terminus. Several domains in kinesins are present exclusively either in plant or animal lineages. Addition of novel domains to kinesins in lineage-specific groups contributed to the functional diversification of kinesins. Results from our gene-tree analyses indicate that there was tremendous lineage-specific duplication and diversification of kinesins in eukaryotes. Since the functions of only a few plant kinesins are reported in the literature, this comprehensive comparative analysis will be useful in designing functional studies with photosynthetic eukaryotes.
Cytoskeletal networks (microtubules [MTs], actin and intermediate filaments) play important roles in many fundamental processes in eukaryotes including cell growth, cell division and development of organisms [1, 2]. Understanding cytoskeleton organization, dynamics and functions is an active area of research in biology. Molecular motors that organize and remodel cytoskeleton and transport various cellular components (e.g, vesicles, organelles, chromosomes, RNA and protein complexes) play fundamental roles in all aspects of cell and developmental biology of eukaryotes [1, 2]. High throughput genomic sequencing projects have greatly facilitated the identification of the full complement of molecular motors in several phylogenetically diverse species ranging from simple unicellular to complex multicellular organisms [1, 2].
Molecular motors that function on cytoskeletal networks belong to three groups: kinesins, dyneins and myosins. These motors utilize energy derived from ATP hydrolysis and transport cargo unidirectionally on one of the filamentous cytoskeletal tracks (MTs or F-actin) in the cell. Both kinesins and dyneins use MTs as tracks for motility whereas myosins use actin filaments (F-actin) [3, 4]. Kinesins constitute a superfamily of MT motor proteins ubiquitous in all eukaryotic organisms [1, 2, 5, 6]. In the mid 1980s, the first kinesin was discovered in squid giant axons as a "novel force generating protein" involved in vesicular transport . Since then, an explosion of research has centered upon the continual discovery, classification and functional characterization of the kinesin superfamily. Members of the kinesin superfamily have a highly conserved motor domain of ~350 amino acid residues, which contains ATPase and MT binding activities, located at the N terminus, C terminus or internally [1, 8]. A short neck region that often contains family-specific features and is adjacent to the motor domain works in concert with the catalytic core to produce movement [8, 9]. In addition, many kinesins have a less conserved coiled-coil region that is important for dimerization and a non-conserved tail domain that is thought to interact with specific cargo. All kinesins bind MTs and perform a variety of force-generating tasks such as movement of chromosomes, vesicles, organelles and RNA protein complexes, spindle formation and elongation, activation of protein kinases, movement of loosely bound rafts of soluble cargo, and MT polymerization and dynamics [5, 9–13].
Since the motor-domain sequence is conserved in all kinesins, it has been used to search completed eukaryotic genome sequences for encoded kinesins. Based on phylogenetic analyses of known kinesins using the conserved motor domain sequences, fourteen families designated as Kinesin-1 to Kinesin-14 are recognized. Members of most families have an N-terminal motor domain whereas one family (Kinesin-13) has an internal motor and one family (Kinesin-14) has a C-terminal motor. Kinesins move unidirectionally on MTs. Kinesins with the N-terminal motor show plus-end motility whereas the C-terminal motors move toward the minus ends of MTs [1, 15–17].
While a "complete" inventory of Arabidopsis kinesins has been reported, functional studies of plant kinesins are limited to a few loci [18, 19]. Several plant kinesins have been shown to function in mitosis, meiosis and/or cytokinesis [20–28]. KCBP, a C-terminal minus-end-directed calmodulin-binding kinesin of the Kinesin-14 family, is involved in trichome morphogenesis and cell division [29–31]. This kinesin is negatively regulated by calmodulin as well as another novel calcium-binding protein (KIC) with a single EF hand [32, 33]. An internal kinesin of the Kinesin-13 family in Arabidopsis is also involved in trichome morphogenesis. AtFRA1, an N-terminal kinesin family member, is involved in oriented deposition of cellulose microfibrils; in mutants, the cellulose microfibrils in secondary walls of fibers are deposited aberrantly and are less organized than in the wild type. Two Arabidopsis kinesins are targeted to mitochondria whereas another kinesin interacts with a geminivirus replication protein [36, 37]. An interesting prospect of MT and microfilament crosstalk has recently been exemplified by studies with a plant-specific kinesin (GhKCH1) from cotton. This member of the Kinesin-14 family has a calponin homology domain, which appears to be important in mediating dynamic interaction between actin filaments and MTs. The motility properties of only a few plant kinesins have been analyzed [39–41].
Thus far, genome-wide analysis of kinesin-encoding genes in plants has been performed with only one plant species (Arabidopsis thaliana), which uncovered 61 kinesins. Arabidopsis has more kinesins than human, mouse and other completed genomes [42, 43]. Recently, genome sequences of six phylogenetically divergent photosynthetic eukaryotes (two cultivars of rice, poplar, a green alga, a red alga and a diatom) have been completed. In addition, genomes of seven other non-plant systems including Giardia, which may represent the deepest known branch in the eukaryotic lineage ([44–47]; but see [48, 49]), have also been sequenced. The availability of these genome sequences offers opportunities to address a number of important questions related to kinesin evolution and function. These include: i) do other plants, like Arabidopsis, have a large repertoire of kinesins? ii) are there any kinesin families that are specific to plants or a particular lineage? iii) how many kinesin families are represented in all eukaryotes? iv) what is the evolutionary relationship among plant kinesins and between plant and non-plant kinesins? v) what is the full complement of kinesins in early-derived simple unicellular photosynthetic eukaryotes as well as in organisms that represent early diverging eukaryotic lineages? vi) how have these kinesins contributed to evolution of kinesins in the most recent complex multicellular flowering plants? vii) do plant kinesins have any domains that are unique to plants? and viii) what is the contribution of gene duplications and losses to kinesin evolution? To address these questions, we have mined 529 kinesin sequences from 19 phylogenetically diverse species. We have performed comprehensive analyses with this data set and inferred gene trees using Bayesian and parsimony methods. Our gene-tree analyses included 249 sequences from photosynthetic eukaryotes and 280 from non-photosynthetic systems.
Many of these sequences were not included in any previous analyses. Although flowering plants have the largest number of kinesins, three or four of the 14 kinesin families are not represented in flowering plants whereas three of them may not be present in any photosynthetic eukaryote. Results presented here also indicate that flowering plants have the most kinesins primarily due to expansion of the Kinesin-7 and Kinesin-14 families. Our gene-tree analysis revealed seven of the ten families found in flowering plants are represented in a simple unicellular chlorophyte alga. Ten of the 14 families are represented in Giardia lamblia ([44–47]; but see [48, 49]), an early derived eukaryote, suggesting that most families were already present early in the evolution of extant eukaryotes. Plant kinesins have several domains that are not shared with non-plant systems suggesting functional specificity and diversification in plants.
Results and Discussion
Kinesins in photosynthetic and non-photosynthetic eukaryotes
In this study we have analyzed genome sequences of 19 eukaryotic species, which represent almost all major lineages (opisthokonts, amoebozoa, plants, alveolates, heterokonts, discicristates and excavates) of eukaryotes, for kinesins. Species were selected so as to include most of the eight major lineages in the consensus phylogenetic tree presented by Baldauf. Inclusion of representative members of non-plant groups is expected to help us identify plant-specific kinesins. Of the eight major eukaryotic lineages, only one lineage (cercozoa) was not sampled in our analysis because none of the species in this lineage has been fully sequenced. Among the 19 species analyzed, seven represent phylogenetically divergent photosynthetic eukaryotes that belong to monocots (two rice cultivars, Oryza sativa ssp. japonica cv. nipponbare and Oryza sativa ssp. indica cv. 93-11), dicots (Arabidopsis thaliana and Populus trichocarpa), a chlorophyte alga (Chlamydomonas reinhardtii), a red alga (Cyanidioschyzon merolae) and a diatom (Thalassiosira pseudonana). So far, kinesins have been analyzed in only one plant system (Arabidopsis), whereas the genome sequences of six other photosynthetic eukaryotes have been completed recently. The red (C. merolae) and green (C. reinhardtii) algae were included in our analysis because they are inferred to be early derived members of the lineages that gave rise to modern heterokonts and embryophyta, respectively. Inclusion of these species allows for the analysis of evolutionary relationships between kinesins of algae and flowering plants. The 12 non-plant species sampled include opisthokonts, amoebozoa, alveolates, heterokonts, discicristates and excavates. We have included members (Giardia, Leishmania, Plasmodium) of three extant lineages that diverged before the plant-animal split.
In addition, Giardia is thought to be a member of the earliest extant branch on the eukaryotic tree based on the phylogeny inferred from several different genes, as well as a proteome-based eukaryotic phylogeny [44–47]. It would be interesting to see how many kinesin families were present before the divergence of plants, animals and fungi as it is likely that these families would represent a "basic set" of kinesin motors . We have also included Dictyostelium discoideum, which is believed to have diverged after the plant-animal split but before the divergence of fungi from animals .
As detailed in the methods section, we have used a variety of approaches to systematically analyze the completed genome sequences of 19 species to identify the kinesins. A total of 529 kinesins were identified and used in our phylogenetic analysis. Table 1 shows the number of kinesins in each of these species and lists the databases used in our searches. The details of the kinesins including gene IDs, gene organization, domain and family information for each species, except Arabidopsis, are presented in Tables 2 to 7 and Additional files 1 to 12. The details of Arabidopsis kinesins were reported previously. The number of kinesins varies considerably among species. In general, flowering plants have the highest number of kinesins (Fig. 1). Arabidopsis still has the largest repertoire of kinesins amongst the completed plant genomes, with P. trichocarpa next. Oryza sativa ssp. japonica and O. sativa ssp. indica have 41 and 45 kinesins, respectively. Some changes in this number may occur as refinement of newly sequenced genomes proceeds. Not only does Arabidopsis have the most kinesins of all plants; it has the most of all 19 species analyzed (Fig. 1). It is surprising to see the large difference in kinesin number between the two species of early-derived unicellular photosynthetic eukaryotes (C. reinhardtii and C. merolae). The green alga, C. reinhardtii, has about five times the number of kinesins as the red alga, C. merolae, which has the fewest kinesins (only five) of all species sampled. Dictyostelium discoideum, the social soil amoeba, has 13 kinesins. This is consistent with what one would expect from a free-living amoeba that must search for its food and be active in cytoskeletal motility, in addition to being able to shift from a unicellular state to a multicellular fruiting body by coordinated aggregation of individual cells. Another interesting species with many kinesins is the intracellular immune system pathogen L. major.
This parasite has the most kinesins among non-photosynthetic eukaryotes (Fig. 1). Leishmania has a considerably larger repertoire of kinesins than the amoeboid parasite, many of which appear to have evolved by multiple gene duplications (see below). Even though cis-splicing machinery exists in this parasite, its 54 kinesins are all translated from single-exon genes (Chris Peacock, pers. comm.). Why this parasite has so many kinesins is an interesting question. How many of these kinesins are actually functional remains to be seen. Whether or not these kinesins function in facilitating Leishmania-host-cell interaction is currently unknown.
Construction and analyses of kinesin gene trees
We used 529 kinesin motor domain sequences in our gene-tree analysis. The alignment of amino acid sequences was performed using DIALIGN-T . DIALIGN-T represents an improvement over DIALIGN 2.2.1  in that it is less liable to favor short sequence fragments of high similarity over longer fragments of lower similarity . In contrast to programs such as Clustal X , which perform global alignments, DIALIGN finds regions of local similarity without necessarily aligning the entire sequences with one another . Because DIALIGN does not perform alignments following a guide tree, the alignments produced were expected to be relatively robust to artifacts that may be introduced when aligning divergent sequences in a pairwise manner . DIALIGN has been shown to perform well relative to other alignment programs in aligning conserved domains within rapidly evolving (with respect to both indels and substitutions) regions [53, 58, 59]. Four kinesin trees were constructed with motor domain amino acid sequences using both the parsimony and Bayesian approaches. For each approach, amino-acid-characters and amino-acid-plus-gap-characters were used. The DIALIGN-T alignment (see Additional file 13), data matrices (see Additional files 14 to 17) used to construct the tree and two parsimony and one Bayesian trees (see Additional files 18 to 20) are available online.
The abbreviated unrooted Bayesian tree for the amino-acid-plus-gap-characters analysis is presented in Fig. 2 as our best estimate of the relationships within the kinesin gene family. The expanded view of kinesin subfamilies is presented in Figures 3 to 10. This analysis was favored over those that did not incorporate gap characters because it incorporated all available characters from the motor domain and is, therefore, favored by the total-evidence criterion [60, 61]. Furthermore, inclusion of the gap characters increased both the number of clades resolved (439 → 453 Bayesian; 288 → 296 parsimony) and average branch support (90.3% → 92.0% Bayesian; 87.3% → 87.5% parsimony) in both the Bayesian and parsimony analyses, which is consistent with the general patterns found by Simmons et al. The Bayesian analysis was favored over the parsimony analysis because the inferred Bayesian tree is more resolved than the parsimony tree and the additional resolution is largely congruent with previous analyses with respect to resolution of the 14 kinesin families. The other three gene trees from both Bayesian and parsimony analyses are available as supplemental data (see Additional files 18 to 20).
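The effect of adding gap characters can be tabulated directly from the four pairs of values quoted above. The following is only an illustrative sketch: the variable and function names are hypothetical, and the numbers are copied from the text rather than recomputed from the trees.

```python
# Values quoted in the text, as (without gap characters, with gap characters).
clades_resolved = {"Bayesian": (439, 453), "parsimony": (288, 296)}
avg_support_pct = {"Bayesian": (90.3, 92.0), "parsimony": (87.3, 87.5)}


def gain(pair):
    """Return the increase from the first to the second value of a pair,
    rounded to one decimal place to suppress floating-point noise."""
    before, after = pair
    return round(after - before, 1)


# Increase in resolved clades and in average branch support per method.
clade_gain = {method: gain(pair) for method, pair in clades_resolved.items()}
support_gain = {method: gain(pair) for method, pair in avg_support_pct.items()}

for method in clades_resolved:
    print(f"{method}: +{clade_gain[method]} clades, "
          f"+{support_gain[method]} percentage points of average support")
```

Under both methods the gain is positive, although it is larger for the Bayesian analysis than for the parsimony analysis.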
Each branch of the Bayesian tree in Fig. 2 indicates the posterior probability from the amino-acid-plus-gap-characters analysis at the upper left of the branch. To allow for inspection of the support for each branch provided by the other analyses, the posterior probability from the amino-acid-characters-only analysis is at the upper right of each branch, the parsimony jackknife support from the amino-acid-plus-gap-characters analysis is at the lower left, and the parsimony jackknife support from the amino-acid-characters-only analysis is at the lower right. If a branch was unresolved in one of these other three analyses, it is indicated by "-" at the respective location. If a branch was contradicted in one of these other three analyses, it is indicated by underlined red font at the respective location with the single highest posterior probability or jackknife support value for the contradicting clade(s) shown.
Conflict between parsimony and Bayesian trees
There were three cases of well supported, conflicting resolution in which both Bayesian trees were contradicted by both parsimony trees. First, in the parsimony trees Pt 00151235 was well supported as nested within the second Kinesin-10 clade, whereas it was well supported as nested within the Kinesin-14 clade in the Bayesian trees. Aside from this single sequence, the Kinesin-10 and Kinesin-14 clades were largely congruent in all four trees (Figs. 8, 10). The parsimony resolution is supported by the Kinesin-10-specific domains that Pt 00151235 has, whereas the sequence lacks the Kinesin-14-specific domains. Therefore, we favor the parsimony resolution of Pt 00151235 as a member of the Kinesin-10 family (Table 4). Second, in the parsimony trees Tp 121289 was well supported as the sister group of Tp 163717, whereas the clades of (Tp 121289, Ps 127973) and (Tp 163717, Lm F22.0560) were well supported in the Bayesian trees with the former clade resolved with the Kinesin-7 family (Figs. 2, 7). There are no additional domains of Tp 121289 or Tp 163717 to distinguish between these alternative resolutions. Third, in the parsimony trees Ps 128382 was unresolved in the main polytomy, whereas it was well supported as nested within the Kinesin-3 clade in the Bayesian trees (Fig. 4). The Bayesian resolution is supported by the Kinesin-3-specific domain that Ps 128382 has.
To test for long-branch attraction  in the parsimony analyses between Pt 00151235 and the second Kinesin-10 clade, the amino-acid-plus-gap-characters parsimony jackknife analysis was repeated after eliminating all nine other sequences in the second Kinesin-10 clade . However, this explanation was not supported because Pt 00151235 was unresolved in the main polytomy rather than moving to within the Kinesin-14 clade. Likewise, to test for long-branch attraction in the parsimony analyses in the second case of conflicting resolution, the parsimony jackknife analysis was repeated after eliminating Tp 163717. However, this explanation was not supported because both Tp 121289 and Ps 127973 were unresolved in the main polytomy rather than being resolved as sister groups. Therefore, we were unable to discard either of the two alternative hypotheses regarding the relationships of Tp 121289 and Tp 163717. Because Ps 128382 was unresolved in the main polytomy on the parsimony trees, it was not possible to apply this test to it.
Many new kinesins are not grouped into recognized kinesin families
All 14 currently recognized kinesin families are represented in our inferred gene tree (Fig. 2). All members in each family are presented in Figs. 3 to 10. However, 11% of the sequences (58 of 529 sequences) were not resolved among the lineages corresponding to previously recognized families (Fig. 2). Most of these kinesins are from Leishmania, Giardia, Phytophthora, Chlamydomonas and Thalassiosira. These sequences may represent novel kinesin families and/or early-derived members of the 14 recognized kinesin families that are not resolved as such in our inferred gene tree. Several clades that are not part of the known families contained members from two eukaryotic groups, indicating that they are not unique to one of the eight main eukaryotic lineages. Ten kinesins from flowering plants were not resolved into any of the 14 kinesin families; instead, they formed a distinct clade (Figs. 2 and 3, plant-specific ungrouped) with strong support values from all four analyses. Interestingly, the members of this group also have an armadillo domain that is not present in any invertebrate or vertebrate kinesins (see domain analysis section below).
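As a quick arithmetic check on the proportion of ungrouped sequences, the counts given above can be worked through as follows. This is a minimal sketch; the variable names are hypothetical and the only inputs are the two counts stated in the text.

```python
# Counts stated in the text.
total_kinesins = 529      # motor domain sequences in the gene tree
ungrouped_kinesins = 58   # not resolved within any of the 14 recognized families

# Sequences that fall within one of the recognized families.
grouped_kinesins = total_kinesins - ungrouped_kinesins

# Percentage of sequences left outside the recognized families.
ungrouped_pct = 100 * ungrouped_kinesins / total_kinesins

print(f"{grouped_kinesins} grouped, {ungrouped_kinesins} ungrouped "
      f"({ungrouped_pct:.1f}% of {total_kinesins})")
```

The unrounded value is just under 11%, which agrees with the figure reported.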
At least three of the 14 kinesin families are absent in flowering plants
The distribution of kinesin families in the 19 species sampled is shown in Fig. 1. Some families (e.g., Kinesin-5, Kinesin-13 and Kinesin-14) are present in almost all of the main eukaryotic lineages. Although flowering plants have the largest number of kinesins, at least three of the 14 families are conspicuously absent in flowering plants. It is unclear whether three or four (Kinesin-2, Kinesin-3, Kinesin-9 and/or Kinesin-11) of these 14 families are absent in flowering plants because of the entirely unresolved flowering-plant clade (Figs. 2 and 3). Members of Kinesin-2 form either homo- or heterodimers and are present in ciliated and flagellated cells and function in organelle-intraflagellar transport [2, 9]. Interestingly, the flagellated unicellular photosynthetic eukaryote Chlamydomonas has one Kinesin-2, which is also involved in intraflagellar transport. Since flowering plants lack cilia or flagella in their life cycle, this family of kinesins appears to have been lost in this lineage. It would be interesting to see if the land plants that have ciliated/flagellated cells in their life cycle (e.g., bryophytes, pteridophytes and gymnosperms) retained this family of kinesins. Unfortunately, genomes of plants that belong to these groups have not been sequenced to address this. Members of the Kinesin-3 family are involved in organelle transport. The Kinesin-3 family is expanded in animals with seven members in humans, the largest of any family. Interestingly, 17 of the 54 Leishmania kinesins are grouped within the Kinesin-3 family and this grouping is strongly supported by both Bayesian analyses. It appears that lineage-specific duplication of genes contributed to expansion of this family. The fork-head-associated (FHA) domain present in vertebrate members of this family is absent in all Kinesin-3 members of Leishmania, suggesting that the acquisition of this domain occurred after the divergence of the Leishmania and animal lineages.
The Kinesin-1 family, which is also involved in transport of vesicles, is underrepresented in plants. It was previously speculated  that there might be a higher plant Kinesin-1/KHC family member but it was not conclusive. It appears that Arabidopsis and O. sativa have one Kinesin-1/KHC gene and the diatom has two, whereas there may not be a Kinesin-1 in Chlamydomonas. Overall, cargo-transporting kinesins are either absent (Kinesin-2 and Kinesin-3) or underrepresented (Kinesin-1) in flowering plants. The cargo-transport functions of some of these kinesins are either not needed in flowering plants or performed by members of other families of kinesins or cargo transporting myosins, which are expanded in plants [2, 66]. Although Kinesin-9 family members are absent in flowering plants, they are present in two photosynthetic eukaryotes (four in Chlamydomonas and two in diatoms). It appears that this family is lost in flowering plants. The functions of the Kinesin-9 family are unknown . Members of the Kinesin-11 family function in signal transduction and contain a few highly divergent kinesins and are absent not only in plants but in many other lineages [67, 68].
Forty indica kinesins have orthologs in japonica and three of these (IBCD021045, IBCD025572 and IBCD012736 in kinesin 14 family, see Figure 10A&B) are duplicated in indica only. The duplications found in indica may have occurred recently or japonica sequence may not be complete. In addition, two kinesins in indica (IBCD031186 in kinesin 7 family, see Figure 7; IBCD014642 in kinesin 14 family, see Figure 10A) have no counterparts in japonica whereas one kinesin in japonica (SBCC002748, see Figure 9) has no counterpart in indica, suggesting that either a specific kinesin is lost in one cultivar or it is due to differences in unsequenced gaps in the genome of these two rice cultivars.
Two families (Kinesin-7 and Kinesin-14) are vastly expanded in plants
Kinesin-14, the C-terminal motor family, and the Kinesin-7 family have greatly expanded in plants through gene duplication. Kinesin-14 is a diverse family containing eukaryotic representatives from almost all major eukaryotic groups; therefore, the members of this family are likely to play important evolutionarily conserved cellular roles. Kinesin-14 family members show minus-end motility and perform multiple functions in cell division and organelle transport [9, 18]. Kinesin-14 is the largest family of kinesins in flowering plants (about 35% of Arabidopsis kinesins fall in this group). Flowering plant kinesins in this family contain several domains that are not present in non-plant kinesins. In the Kinesin-14 family, there are several subfamilies in which the motor domain is located in the middle or at the N or C terminus. It is interesting to observe the many plant-specific kinesin duplications (Fig. 10). Based on the inferred phylogenetic relationships among the plant species sampled [50, 69–71], we infer a minimum of nine duplications in the top clade of flowering-plant lineage (see 10A, internal motor-1 and -2) prior to the divergence of monocots from dicots, seven duplications within the dicot lineage, and one duplication within the monocot lineage (see Fig. 10A). Because the green and red algae Kinesin-14 sequences are resolved as sister to the upper Kinesin-14 flowering-plant clade, we also infer that there was a plant-specific duplication prior to the divergence of the red and green algae. One of the two copies was then lost in both the red and green algae lineages, yet retained in the flowering-plant lineage.
The first subfamily of the Kinesin-14 family has plant-specific kinesins that are shared by both the green and the red algae (Fig. 10A). Members of flowering plants in this group have a calponin-homology (CH) domain that is not present in green or red algae, suggesting that this domain was gained in the flowering-plant lineage (see domain analysis section). The second group that is well supported by all four analyses only includes kinesins from flowering plants (Fig. 10A). Interestingly, members of these two topmost groups have an internal motor domain instead of the C-terminal motor domain for which this family is named. Another group that is restricted to the plant kingdom is the N-terminal flowering-plant-specific group in the Kinesin-14 family (see Fig. 10B). The N-terminal and internal motor groups in the family could have arisen in flowering plants by domain shuffling of a C-terminal motor. The members of the seventh group with a C-terminal motor contain a myosin tail homology 4 (MyTH4) region and talin-like region (also known as Band 4.1 or FERM) that are not present in any non-plant kinesins. The last large subfamily of the Kinesin-14 family consists of true C-terminal motors in plants, diatoms, animals and protozoans (Fig. 10B). From this analysis it appears that the Kinesin-14 family is composed of multiple subfamilies instead of the two previously reported Kinesin-14A and Kinesin-14B families. Based on the location of the motor and the presence of other domains, there could be up to eight subfamilies within the Kinesin-14 family (five of which are likely to be plant-specific). There appear to be several dicot- and monocot-specific duplications in plant kinesins of this family. Several of the subfamilies resulted primarily from the emergence of novel kinesins in the plant lineage. The members of this family with the C-terminal motor domain have been shown to be minus-end motors [8, 72].
Although plant motors with a C-terminal domain translocate toward the minus-end of MTs [39–41], it is not known if the internal and N-terminal plant motors are also minus-end motors. The functions of several Arabidopsis Kinesin-14 family members that contain the motor domain at the C terminus (e.g., KatA/ATK1 [At 4g21270], ATK5 [At 4g05190], KCBP [At 5g65930]), N terminus (GRIMP/KCA1 [At 5g10470] and KCA2 [At 5g65460]) or in the middle (KatD [At 5g27000]) have been analyzed. Several of these (ATK1, ATK5, KCBP) are involved in some aspect of cell division [20, 23, 24, 28, 29, 31], suggesting that the plant kinesins of this family play important roles in cell division. The cotton homolog of Arabidopsis KatD localizes to cortical MTs and microfilaments and interacts directly with F-actin. GRIMP/KCA1 interacts with a geminivirus replication protein and localizes to segregating chromosomes and spindle poles. Both GRIMP/KCA1 and KCA2 interact with a cyclin-dependent kinase (CDKA;1), which controls cell cycle progression, and localize to MTs and the phragmoplast, suggesting a role for these also in cell division. The fact that plants have unique MT arrays such as the preprophase band and phragmoplast that play critical roles in plant cell division, lack centrosomes to organize MTs to establish a bipolar spindle, and have no (or few) dyneins [74, 75], which are also minus-end motors, suggests that plants would require novel kinesins to perform these plant-specific roles and to cover the functions performed by dyneins in animals. In addition, several reports indicate that plants transport macromolecules (e.g., RNA and proteins) and viruses from cell to cell through plasmodesmata [1, 76]. Such transport activities may also require kinesins. Hence, it is possible that the expansion of kinesins in plants reflects the need for plant-specific motors in flowering plants.
Kinesin-7/CENP-E is the second largest kinesin family in plants and one clade (plant-specific clade in Fig. 7) contains kinesins from only photosynthetic eukaryotes with a green algal kinesin as a sister group to those from the flowering plants. The flowering-plant-specific clade is strongly supported by both Bayesian and parsimony analyses. Hence, multiple members in a species in this clade (e.g., seven in Arabidopsis) are inferred to have arisen by at least six gene duplications in the flowering-plant lineage prior to the divergence of monocots from dicots, five in the dicot lineage, and one in the monocot lineage. The functions of two members of this clade have been reported. At 1g18370 (also called NACK1/HINKEL) [25, 26] and At 3g43210 (also called STUD/TETRASPORE/NACK2) [25, 27] encode functionally related motors. Loss-of-function mutants of these kinesins revealed their role in cytokinesis [25–27, 77]. Interestingly, NACK1 activates a MAP kinase (MAPK). The second clade in this family also contains flowering plants, amoebozoa and heterokonts, but not opisthokonts (vertebrates, invertebrates and fungi). Two Arabidopsis members of this group are targeted to mitochondria, implying an unknown function for these kinesins in this organelle. The third group contains kinesins from flowering plants and amoebozoa. Members of these groups are inferred to be more closely related to one another than to the small, animal subfamily. Because of extensive duplication in the CENP-E family in plants, the members of this family may have been recruited to perform plant-specific functions. In animals, members of this family function in capturing kinetochore MTs. Studies with some members of plant kinesins that belong to Kinesin-7 indicate their role in cytokinesis and some unknown function in mitochondria [36, 77]. Overall, our analyses indicate that there was tremendous diversification in the Kinesin-14 and Kinesin-7 families in flowering plants.
Seven of the ten kinesin families in flowering plants are present in Chlamydomonas
Chlamydomonas, a member of chlorophyte algae, represents the sister group of the flowering plants given our taxon sampling [46, 50]. Hence, the analysis of kinesin families in this species should provide some insights into evolution of kinesins in flowering plants. Chlamydomonas, despite the fact that it is unicellular, has 23 kinesins (Table 3). Twenty of these were grouped into nine recognized families whereas the remaining three are ungrouped (Table 3). Kinesin-1, -3, -6, -10 and -11 families are absent in Chlamydomonas. Of the ten kinesin families present in flowering plants, seven are present in Chlamydomonas (Fig. 2). Three families (Kinesin-1, -6 and -10) of flowering plants are inferred to have been lost in the Chlamydomonas lineage. Two families (Kinesin-2 and Kinesin-9) that are present in Chlamydomonas were lost in the flowering-plant lineage. One of these families (Kinesin-2) is involved in intraflagellar transport. As mentioned above, the absence of flagella/cilia in flowering plants may have resulted in the loss of this family.
The red alga (C. merolae) has only five kinesins that belong to four families (Table 2) whereas the diatom (T. pseudonana) has 25 kinesins (about the same number as in the green alga) that fall into nine known kinesin families with four kinesins unresolved (Table 7). Although the green alga and the diatom each have nine families, the diatom, unlike Chlamydomonas, has Kinesin-1 and Kinesin-6 but may not have members of the Kinesin-2 or Kinesin-8 families. Remarkably, Kinesin-14 is the largest family in all photosynthetic eukaryotes. Among the 14 recognized families, only four (Kinesin-5, -7, -12 and -14) were shown to be present in all photosynthetic eukaryotes (Fig. 11). The absence of myosins and dyneins in the red alga (C. merolae) suggests that kinesins play important roles in this species [48, 78].
Giardia, an early-derived eukaryote, has ten of the fourteen kinesin families
In Giardia there are 24 kinesins (see Additional file 12). This is more than half the number of kinesins found in humans. However, Giardia has no recognizable myosin, suggesting that kinesins perform most of the transport functions. Sixteen of the 24 kinesins in Giardia were resolved into ten known families whereas the remaining eight were unresolved (see Additional file 12). If Giardia is indeed part of the earliest derived extant lineage of the eukaryotes and therefore existed prior to the plant-animal split [44, 46, 47], the ten families with representatives in Giardia are inferred to represent the basic set of kinesin families in early eukaryotes. The families that are not represented in Giardia are Kinesin-6, -10, -11 and -12. Hence, these families may have emerged later in eukaryotic evolution through gene duplication. Four of the ungrouped kinesins in Giardia did not group with kinesins from other species whereas the remaining four grouped either with Leishmania or Chlamydomonas kinesins (see Fig. 2). Many of the domains found in flowering plants and animals are not present in Giardia (Fig. 11).
Other kinesin families in plants
All flowering-plant kinesins from the Kinesin-4 family form a well-supported clade as the sister group of a Chlamydomonas sequence (Fig. 5). Two Chlamydomonas kinesins in this family did not group with flowering plants, though this resolution was not supported in the parsimony analyses (Fig. 5). A member of this family is involved in cell wall deposition. Kinesin-5 family motors function in cell division and spindle formation. All flowering-plant kinesins of the Kinesin-5/BIMC family form a well-supported clade sister to the single Chlamydomonas sequence (Fig. 5). Plant members of this family, like their animal counterparts, are likely to function in cell division. Members of the Kinesin-6 family that function in cytokinesis in animals are not inferred to have undergone any gene duplications in plants (in which both copies have been retained as functional genes). There is only one kinesin of this family in each of the flowering plant species analyzed here and none in the green or red algae. Since cytokinesis in plants is quite different from that in animals, it appears that members of other kinesin families perform this function. Kinesin-8 members are found in plants, fungi and animals. Oryza and Arabidopsis have two Kinesin-8 genes whereas Populus has one (Fig. 6). The non-plant members function in nuclear migration and mitochondrial transport. The function of plant members of this family remains unknown. Plant kinesins associated with the Kinesin-10 family were resolved as two separate clades from the main polytomy (Fig. 8), indicating that two copies of Kinesin-10 were present in flowering plants prior to the divergence of monocots from dicots.
The Kinesin-12 family members function in organelle transport . This family includes kinesins from both plants and animals. There are multiple members of this family in each flowering plant species. The flowering plant members formed two distinct clades, one as a sister group to Chlamydomonas, with the red algae sequence as sister to both (Fig. 9). Based on this resolution, we infer at least one gene duplication after the divergence of red and green algae from one another yet prior to the divergence of green algae from flowering plants, three duplications in the flowering-plant lineage prior to the divergence of monocots from dicots, and two duplications within the dicot lineage. Two plant members of this group localize to a plant-specific structure called the phragmoplast . Members of the Kinesin-13 family, most of which have internal motors, transport vesicles and have MT depolymerizing activity [81, 82]. Plant members of this family also form a distinct clade with the Chlamydomonas kinesin as a sister group (Fig. 9). These internal-motor plant kinesins are distinct from the other internal-motor plant kinesins (found only in plants) in the Kinesin-14 family.
Domain analysis was performed on all eukaryotes used in this study as described in the Methods section. The most prevalent domain is the coiled-coil region; almost every kinesin sequence analyzed has a coiled-coil prediction based on the SMART algorithm (see Tables 2 to 7 and Additional files 1 to 12). Among the kinesins analyzed here, about 30 known functional domains (not including the motor domain and coiled-coil region) are found. Fig. 11 shows the list of functional domains and their presence in various species. A schematic diagram of kinesins depicting all the domains in the green alga, red alga and diatom is shown in Fig. 12, whereas domain figures of Populus and one species of Oryza are shown in Figs. 13 and 14, respectively. Domains in Arabidopsis kinesins were reported previously. Various known domains in non-plant systems are indicated in Additional files 1 to 12. Interestingly, not a single domain is present in all kinesins. Instead, most domains are restricted to a particular lineage (Fig. 11), suggesting that most of these were gained later in evolution and have novel functions. This is also supported by the fact that in Giardia most of these domains (except fork head associated and helix-turn-helix domains) are absent. Some domains such as myosin tail homology domain 4 (MyTH4) and band 4.1 (also called talin-like region or FERM) are restricted to green algae and flowering plants (Fig. 11). Although MyTH4 and band 4.1 domains are present in several animal proteins including some myosins, they are not found in non-plant kinesins. Interestingly, in Arabidopsis the MyTH4 and band 4.1 domains are present in one kinesin and are not present in any other protein encoded in the genome. We have previously shown that MyTH4 and talin-like regions are involved in binding to MTs, suggesting that they may be involved in cross-linking and/or bundling MTs.
It was recently shown that MyTH4 and band 4.1 in myosins also bind MTs; hence these domains are likely to function in cross-linking actin and MT cytoskeleton and/or transfer of cargo between two different cytoskeletal elements.
Calponin homology (CH) and kinesin-related (KR) domains are found in flowering plants but not in green and red algae or heterotrophs (Fig. 11). There are several kinesins with one CH domain and a KR domain in each flowering plant analyzed here (see Figs. 13 and 14). All CH domain kinesins belong to the Kinesin-14 family and were resolved as the first clade in this family (see Fig. 10A). The only other protein family that has the CH domain is fimbrin. Plant fimbrins have four copies of the CH domain and bind F-actin. The CH domain is a protein module of about 110 residues found in cytoskeletal and signal transduction proteins either as a single copy or multiple copies in tandem. Proteins with a tandem pair of CH domains cross-link F-actin, bundle actin or connect intermediate filaments to the cytoskeleton. Proteins with a single copy are involved in signal transduction. Although plant kinesins have only one CH domain, recently it was shown that a kinesin with this domain interacts with F-actin, suggesting that the kinesins with this domain may be involved in interaction between actin and MT cytoskeleton. The function of the KR domain is not known. However, the kinesins with this domain associate with the phragmoplast [21, 22] and belong to the Kinesin-12 family (Fig. 9). Several flowering-plant kinesins and one non-plant (Leishmania) kinesin have armadillo/betacatenin-like (ARM) repeats that are known to mediate protein-protein interactions. Diverse proteins contain ARM repeats that form a superhelix of helices and function in intracellular signaling and cytoskeletal regulation. Although none of the vertebrate and invertebrate kinesins have an ARM, a Kinesin-2 family-associated protein called KAP3 in animals contains the ARM repeat [2, 88].
Animal kinesins have some domains that are not found in photosynthetic eukaryotes. These include fork-head associated (FHA), pleckstrin homology (PH), CAP-Gly domains and WD-40 repeats. The FHA domain is known to interact with phosphothreonine in proteins. CAP-Gly is a glycine-rich domain of about 40 amino acids that is found in cytoskeleton-associated proteins (CAPs). The WD-40 repeats are also short (about 40 residues) motifs that often terminate in a Trp-Asp (W-D) dipeptide and facilitate the formation of multi-protein complexes. Two of these domains (FHA and PH) are present in several protozoans (Fig. 11), suggesting that these domains may have been present in kinesins in the most recent common ancestor of all extant eukaryotes.
The only domain common to both plants and animals (both invertebrates and vertebrates) is the helix-hairpin-helix (HHH) DNA binding motif in the Kinesin-10 family that functions in chromosome segregation (Fig. 11). A member of this family has been shown to bind DNA, and it is likely that others with this domain also bind DNA and function in chromosome segregation. However, the HHH domain is not present in fungi or the green or red algae, suggesting that plants and animals may have acquired this domain independently. Overall, the domain distribution in kinesins suggests several domains were added to plant and animal kinesins after the plant-animal split. Interestingly, there are several domains present only in either Leishmania or Phytophthora sojae, which suggests tremendous diversification of kinesins in lower eukaryotes that may have to do with their unique life cycle and cell biology. In the diatom, most kinesins are short (Table 7 and Figure 12), which could be due to poor quality of the gene models.
Genome duplication in flowering plants may have contributed to the expansions of kinesins in this group
Whole genome duplications are believed to be a driving force for genome evolution in angiosperms since many modern diploids appear to be ancient polyploids [90, 91]. In Arabidopsis, the duplicated segments represent about 58% of the genome and several kinesins are present in the duplicated regions [43, 92]. In Oryza, 18 pairs of duplicated segments cover 65.7% of the mapped super-scaffolds. To visually represent the number of O. sativa ssp. japonica kinesins that fall within these segmentally duplicated regions, an approximate chromosome map was generated according to the genomic map presented by Guyot and Keller. Figure 15 depicts the distribution of kinesins across the 12 Oryza chromosomes. Roughly 26 of the 41 japonica kinesins are within duplicated segments. Chromosome 3 has the most kinesin genes (6), whereas chromosome 10 has none. The remaining chromosomes contain one or more kinesin-encoding genes. The duplicated region in the long arm of chromosome 1 contains three kinesins (OSBCC02630, OSBCC02748, OSBCC03463); the corresponding duplicated block on chromosome 5 has only two (OSBCC18779, OSBCC19161), which could be suggestive of a gene loss event. The duplicated block on the short arm of chromosome 2 also may have experienced a gene loss event as it is bereft of any kinesins, whereas its counterpart on the long arm of chromosome 6 contains a single kinesin (OSBCC22107).
Intron/exon organization of kinesin genes
Information on the presence of introns in kinesins of all species analyzed here is presented in Tables 2 to 7 and Additional files 1 to 12. All kinesins in Chlamydomonas and flowering plants have many introns whereas introns are absent in kinesin genes of red alga, Giardia and Leishmania. Because of the large number of introns in kinesin genes of most species, the diversity of kinesin motors may increase by alternative splicing of kinesin pre-mRNAs. Although the extent of alternative splicing of kinesin pre-mRNAs in plants is not known, there are examples in animals where alternative splicing of some kinesins results in generation of isoforms with different domains and with distinct functions [95, 96].
Flowering plants have the largest number of kinesins among all species yet sequenced. Gene duplication and functional diversification of specific families (e.g., Kinesin-14 and Kinesin-7) appear to have contributed to the high number of kinesins in flowering plants. Addition of novel domains to kinesins in lineage-specific groups contributed partly to the functional diversification of kinesins. The Kinesin-14 family, which typically contains a C-terminal motor, has many plant kinesins that have the motor domain in the middle or at the N terminus as well as at the C terminus. The presence of most kinesin families of flowering plants in Chlamydomonas indicates that these families were retained in both lineages. Since plants have no or few dyneins, it appears that the kinesin family of MT motors has expanded in plants. Despite the large number of kinesins in flowering plants, three or four of the 14 recognized families are absent. The vast expansion of some kinesin families in flowering plants suggests that they are likely to perform plant-specific functions. Many kinesins in Leishmania, Giardia and Chlamydomonas were not resolved with known kinesin families and may represent novel kinesin families and/or early-derived members of the 14 recognized kinesin families that are not resolved as such in our inferred gene tree. Lineage-specific domain architecture in the plant and opisthokont lineages and absence of these domains in kinesins of other eukaryotes suggest acquisition of these domains more recently. The gene-tree analysis presented here is important for understanding kinesin evolution and should provide a framework to study cellular roles of kinesins. The challenge ahead is to elucidate the functions of individual kinesins and their regulation.
Identification and analyses of kinesins in recently completed genome sequences
All BLAST searches were conducted by using three distinct motor domain sequences from the Kinesin-1 (human KHC, N-terminal motor), Kinesin-13 (mouse KIF2, internal motor domain) and Kinesin-14 (AtKCBP, C-terminal motor domain) families. Unless otherwise noted, BLAST searches were done using all three motor domains as queries. In all BLASTP searches we used an E value cut off of 1. With this cut off value, all database searches yielded kinesins and many unrelated proteins. We then performed domain analysis on all hits as described below in section II. All proteins with kinesin motor domain are retained whereas the rest are eliminated.
i) Oryza sativa ssp. japonica
All of the available genome sequences of this subspecies were extensively analyzed for kinesins as described below.
a) Searches at NCBI and Bioverse
BLASTP searches at  using the "Oryza sativa" (ssp. japonica) protein database were performed. Sequences with each motor-domain search were concatenated into a single file and duplicates were removed. The same query sequences were used in BLASTP searches against the nr database at NCBI . After the output files were parsed and compared with the original PlantBlast searches, a preliminary total of 41 kinesins were identified. To identify the kinesins that may not have been annotated in the genome using the gene-prediction programs, TBLASTN searches were performed against the PlantBlast "Oryza sativa" DNA sequence database. This search resulted in identification of one new kinesin that was not part of the preliminary total obtained via BLASTP searches.
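The pooling step described above (hits from each of the three motor-domain queries concatenated, then duplicates removed) can be sketched as follows. This is a minimal illustration, not the scripts used in the study; the function name `merge_hits` and the accession strings are hypothetical placeholders.

```python
def merge_hits(hit_lists):
    """Concatenate per-query hit lists and drop repeated accessions.

    The first occurrence of each accession is kept, so the merged list
    preserves the order in which hits were first seen.
    """
    seen = set()
    merged = []
    for hits in hit_lists:
        for accession in hits:
            if accession not in seen:
                seen.add(accession)
                merged.append(accession)
    return merged


# Hypothetical accessions standing in for the hits returned by the
# Kinesin-1, Kinesin-13 and Kinesin-14 motor-domain queries.
khc_hits = ["protA", "protB", "protC"]
kif2_hits = ["protB", "protD"]
kcbp_hits = ["protC", "protA", "protE"]

print(merge_hits([khc_hits, kif2_hits, kcbp_hits]))
```

Because most kinesins are recovered by more than one query, the merged list is usually much shorter than the sum of the three hit lists.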
Analysis of Bioverse Oryza database at  using the keyword, "kinesin" yielded 72 hits. The amino acid sequences from these 72 hits were extracted for BLASTP searches against the nr and PlantBlast databases at NCBI and against each other. This analysis resulted in identification of two new additional kinesins (bringing the total to 44), whereas the rest corresponded either to previous kinesin predictions or were not kinesins.
b) Searches at Oryza genome databases
The protein predictions for ssp. japonica were downloaded from the rice genome database. Recently, the analysis of two subspecies of Oryza (indica and japonica) was refined. The Syngenta predictions were refined by using a new program, BGF (Beijing Gene Finding), developed by the Gene Finding Team at BGI for gene identification in eukaryotic genomic DNA sequences. It is based on Dynamic Programming and the HSMM (Hidden Semi-Markov Model) algorithm with a special emphasis on Oryza genomes. BLASTP searches were performed. The output files were parsed, concatenated and duplicates were removed. Fifty-five possible kinesins were found and their full-length amino acid sequences were retrieved. FGENESH (another gene finding program) predictions were also downloaded from  and used for BLAST searches. The predicted kinesins here were blasted against the BGF Syngenta predictions and no new kinesins were found. The kinesins obtained from NCBI were used in a BLASTP search against the BGF Syngenta predictions. All sequences from NCBI were accounted for by the BGF Syngenta sequences. Closer inspection of the output file revealed that two submissions made by independent investigators to GenBank, AAF78897 (817aa) and CAE05519 (1094aa), are duplicates, with AAF78897 being a truncated version of CAE05519. AAF78897 was eliminated from the original NCBI list because the CAE05519 sequence is similar to the BGF predicted OSSBC014640 (1109aa) in sequence length. Also, predictions XP_450031.1 and XP_450032.1 are duplicates. XP_450032.1 (971aa) is the truncated version of XP_450031.1 (1035aa), which better corresponds to OSSBCC029113 (1045aa). Hence, XP_450032.1 was eliminated. XP_483647.1 and XP_483646.1 (986aa and 1003aa, respectively) are also duplicates, but we eliminated XP_483646.1 from the NCBI list because the 986 amino acids of XP_483647.1 were a better match with OSSBC028926 (965aa). Consequently, the original list of 44 NCBI kinesins has been decreased to 41.
This analysis showed that all 41 kinesin sequences derived from NCBI searches were referenced to the BGF Syngenta predicted japonica sequences. Therefore, BGF Syngenta kinesins were used in order to foster continuity for gene-tree analyses. Analysis of all 55 BGF kinesins using Interproscan  resulted in (see Section II) a total of 41 kinesins.
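The three duplicate pairs resolved above can be reproduced with a simple rule: of two redundant NCBI entries, keep the one whose length is closest to the corresponding BGF prediction. The sketch below assumes that length agreement alone drives the choice, which is a simplification of the "better match" judgment described in the text; the function name `keep_closest` is hypothetical, while the accessions and lengths are those quoted above.

```python
def keep_closest(candidates, reference_length):
    """Return the candidate whose length is nearest to the reference length.

    candidates maps accession -> protein length in amino acids.
    """
    return min(candidates, key=lambda acc: abs(candidates[acc] - reference_length))


# (duplicate pair, length of the matching BGF prediction), as quoted in the text.
pairs = [
    ({"AAF78897": 817, "CAE05519": 1094}, 1109),
    ({"XP_450032.1": 971, "XP_450031.1": 1035}, 1045),
    ({"XP_483647.1": 986, "XP_483646.1": 1003}, 965),
]

kept = [keep_closest(candidates, ref) for candidates, ref in pairs]
print(kept)

# Three entries are dropped from the 44 NCBI kinesins.
print(44 - len(pairs))
```

Under this rule the retained entries are CAE05519, XP_450031.1 and XP_483647.1, which agrees with the eliminations reported in the text and with the reduction from 44 to 41.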
ii) Oryza sativa ssp. indica
For the subspecies indica cv. 93-11, BGF and FGENESH protein databases were downloaded from the ftp site at . BLASTP searches were performed as above using similar criteria. After parsing the output files, the 54 FGENESH predictions were reciprocally blasted against the 55 BGF indica sequences and it was found that all FGENESH predictions were included in the BGF predictions. Analysis of these sequences using Interproscan as described in Section II yielded a total of forty-five indica kinesins.
iii) Arabidopsis thaliana
iv) Populus trichocarpa
A predicted protein database is not yet available for P. trichocarpa. This database presented a unique problem as the gene/protein predictions were done using four (FGENESH, EUGENE, GRAIL, GENEWISE) eukaryotic gene-prediction programs. Protein predictions at  from each program were searched using the keyword, "kinesin". This analysis yielded 115 putative kinesins (36 FGENESH, 50 EUGENE, 5 GRAIL/GENEWISE, 24 EST_EXT FGENESH). Duplicate removal and domain analyses produced a set of 52 unique sequences (see Section II).
v) Cyanidioschyzon merolae
The C. merolae annotated coding sequences and translated ORF databases at [105, 106] were used for BLASTP searches. This search yielded five kinesins, which were extracted using the search function at  and inputting the locus accessions.
vi) Chlamydomonas reinhardtii
BLASTP searches were performed against the Version 2 protein models database at . The hits were parsed of duplicates and 35 sequences were extracted from the database. This analysis yielded 23 kinesins (see Section II).
vii) Thalassiosira pseudonana
T. pseudonana release 1.0 predicted proteins database was downloaded from JGI and used for BLASTP searches. Twenty-seven kinesins were extracted, but only 22 were used in our gene tree analyses (see Section II).
viii) Ciona intestinalis
C. intestinalis release 1.0 predicted-proteins database was downloaded from JGI and analyzed in the manner described above. Thirty-five potential kinesins were found, but the last hit was only 93 amino acids long and was discarded.
ix) Phanerochaete chrysosporium
Due to the lack of a predicted protein database for P. chrysosporium, potential kinesin sequences were acquired by means of advanced-keyword searches at JGI with the query, "kinesin". Eleven hits were found and downloaded in FASTA format but only eight were used in gene-tree analyses (see Section II).
x) Phytophthora sojae
The P. sojae predicted protein database was downloaded from  and used for BLASTP searches. Interproscan analysis of 56 hits led to 43 possible kinesins.
xi) Giardia lamblia
A translated ORF database for G. lamblia was downloaded from . BLASTP searches of this database returned 24 hits. These sequences were extracted from the database and analyzed using Interproscan.
xii) Homo sapiens
The IDs of 36 kinesins obtained from the Kinesin HomePage  were used to acquire the protein sequences using batch entrez at NCBI . Sequences were run through Interproscan for domain analysis and only 32 sequences were kept (see Section II).
Since this number is smaller than what was found in previously published studies, the CELERA protein database was downloaded from  and used for BLASTP searches. Forty-four putative kinesins were obtained and analyzed by Interproscan.
Sequences harboring motor domains that were less than 290 amino acids were discarded. The remaining 26 sequences were blasted against the 32 NCBI kinesins to remove duplicates. Eight unique CELERA sequences were appended to the 32 NCBI sequences for a working total of 40 human kinesins.
xiii) Drosophila melanogaster
BLASTP searches using HsKHC at  were performed. Sequences were downloaded and reciprocally blasted against each other. Five sequences (CG1453-PA, -PB, -PC, -PD and -PE) are replicates, thus only CG1453-PA was retained for analysis. Likewise, CG8183-A and CG8183-B are duplicates and CG8183-A was kept. Also, CG9913-A and CG9913-B are duplicates and only CG9913-A was kept. After Interproscan analysis, 25 possible kinesins were found.
xiv) Caenorhabditis elegans
BLASTP searches were performed at . Nineteen sequences were retrieved and run through Interproscan. Sequence F22F4.3, which has a short motor domain (248 amino acids), was removed and the corresponding GI|7499692 from NCBI, which has a longer predicted protein, was used instead.
xv) Dictyostelium discoideum
Protein sequences were downloaded from  and used for BLASTP searches. Thirteen possible kinesin hits were retrieved and analyzed by Interproscan.
xvi) Plasmodium falciparum
P. falciparum predicted protein databases (Pfa3D7_WholeGenome_Annotated_PEP_2004.11.23 and Pfa3D7_WholeGenome_Automatic_PEP_2004.11.23) were downloaded from  and used in BLASTP searches. Nine sequences were found from the annotated PEP database, whereas 25 sequences were recovered from the automated PEP database.
The 25 automated PEP sequences were extracted and reciprocally blasted to eliminate duplicates. A cross blast was performed between the two databases that resulted in 9 kinesins.
xvii) Leishmania major
The L. major amino acid database was downloaded from  and used for BLASTP searches. Fifty-five sequences were retrieved and reciprocally blasted to search for duplicates. The final number of kinesins in this species is 54 after one duplicate was removed.
xviii) Saccharomyces cerevisiae
BLASTP searches at  recovered six kinesins.
xix) Schizosaccharomyces pombe
BLASTP searches at  resulted in nine kinesins.
Additional searches of six frame translations of the genome sequences have not yielded any new kinesins.
Analysis of domains and retrieval of motor domain sequences
a) Interproscan analyses
Interproscan was downloaded to perform batch analyses of the full-length sequences of the putative kinesins from all species. Start and end positions of motor domains were obtained by SMART predictions. The 55 full-length BGF-japonica sequences were scanned; inspection of OsSBCC020712 (indica ortholog, OsIBCD019766), a 192 amino acid protein, provided the impetus for establishing the criteria necessary to generate a working list of kinesins for gene-tree analyses. Global alignments of all japonica kinesins to AtKCBP were examined and any sequences missing conserved domains such as ATP binding sites or with motor domains that were generally shorter than 290 amino acids were discarded. Adherence to these criteria reduced the 54 potential japonica kinesins to 41. Following these criteria, ssp. indica sequences were reduced to 45. The 115 P. trichocarpa hits (FGENESH, EUGENE, EST_EXT FGENESH) were scanned and filtered of any sequences that did not contain motor domains. Of these, 52 hits (21 FGENESH, 21 EUGENE, 10 EST_EXT FGENESH) contained the motor domain. The remaining sequences that did not contain the motor domain were eliminated, as were those that contained truncated motor domains (FGENESH1_pg.C_scaffold_70000181 and FGENESH1_pg.C_LG_I000890). Thirty-five C. reinhardtii sequences were scanned and all sequences with motor domains shorter than 290 amino acids were discarded to yield 23. Some T. pseudonana proteins were extremely short and consisted only of truncated motor domains. Consequently, the working number of kinesins was lowered to 22. Thirty-four C. intestinalis sequences were scanned and the number of kinesins was reduced to 29 due to truncated motor domains or very short sequences.
Scans of the 11 P. chrysosporium sequences reduced the number of kinesins in this species to eight. Homo sapiens sequences GI6225915 and GI3978240 are duplicates. Sequence GI19923949, with a short motor domain (232aa), was excluded. Twenty-six D. melanogaster kinesins were scanned and only 25 were truly predicted kinesins. The 55 L. major sequences were scanned and one sequence (LmjF25.2410) was removed because it had no motor domain.
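The length criterion applied above can be illustrated with a short filter. This is a sketch under stated assumptions: it checks only for the presence of a motor domain and the 290 amino acid threshold, not the conserved features (such as ATP binding sites) that were also inspected in the global alignments. The coordinates and the record named `example_full_length` are hypothetical; only the motor-domain lengths for GI19923949 (232 aa) and F22F4.3 (248 aa) and the absence of a motor domain in LmjF25.2410 come from the text.

```python
MIN_MOTOR_LENGTH = 290  # threshold in amino acids, as stated in the text


def motor_length(start, end):
    """Length of a domain given 1-based inclusive start and end positions."""
    return end - start + 1


def passes_filter(motor_range):
    """True if a motor domain was predicted and is at least 290 aa long."""
    if motor_range is None:  # no motor domain predicted
        return False
    start, end = motor_range
    return motor_length(start, end) >= MIN_MOTOR_LENGTH


# Hypothetical coordinates chosen to reproduce the lengths quoted in the text.
candidates = {
    "GI19923949": (10, 241),           # 232 aa motor domain
    "F22F4.3": (5, 252),               # 248 aa motor domain
    "LmjF25.2410": None,               # no motor domain found
    "example_full_length": (8, 340),   # 333 aa, hypothetical record
}

working_list = [name for name, rng in candidates.items() if passes_filter(rng)]
print(working_list)
```

Only the hypothetical full-length record survives, mirroring how the short or motor-less sequences named in the text were removed from the working lists.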
b) Extraction of motor domain
Motor-domain sequences were extracted from their entire protein sequences by using the EMBOSS seqret utility with residue ranges obtained from SMART predictions. Concatenation of all motor-domain files yielded a final number of 529 kinesin sequences for gene-tree analysis. All coiled-coil predictions were obtained from SMART predictions at . Ten sequences had one or two ambiguous amino acids, for a total of 12. Prior to alignment, all 12 internal stop codons (from eight sequences with one to three internal stops each) were changed to amino acid ambiguities ("X").
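The extraction and stop-codon masking described above can be sketched in a few lines. The study used the EMBOSS seqret utility for extraction; the plain-Python slicing below is a stand-in for illustration only. The function names, the toy peptide, and the use of "*" as the stop symbol are assumptions.

```python
def extract_region(protein, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(protein)):
        raise ValueError("range falls outside the sequence")
    return protein[start - 1:end]


def mask_internal_stops(sequence, stop_symbol="*"):
    """Replace stop symbols with the amino acid ambiguity code 'X'."""
    return sequence.replace(stop_symbol, "X")


# Toy 15-residue peptide with one internal stop; not a real kinesin.
toy_protein = "MSTAK*GVRPLNEKE"

region = extract_region(toy_protein, 3, 10)
print(region)
print(mask_internal_stops(region))
```

Masking with "X" keeps the sequence length, and therefore the SMART coordinates, unchanged, so the alignment program treats the position as unknown instead of splitting the sequence.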
Alignment of amino acid kinesin-motor-domain sequences was performed using DIALIGN-T 0.1.2  with the default settings (length of a low-scoring region = 4; maximum fragment length that is allowed to contain regions of low quality = 40). Amino acids from individual sequences that DIALIGN did not align (5,126) were replaced with ambiguities. The DIALIGN output file was 6,046 positions long. Of these, 2,584 positions included aligned amino acid(s) and 834 (32%) of those positions were parsimony-informative.
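For readers unfamiliar with the term, the sketch below uses the standard definition of a parsimony-informative character: at least two states must each occur in at least two sequences. Treating "X", "-" and "?" as missing data is an assumption of this illustration, and the function name is hypothetical. The last line simply rechecks the proportion reported above.

```python
from collections import Counter


def is_parsimony_informative(column, missing="X-?"):
    """True if at least two states each occur in two or more sequences.

    column is a string holding one aligned position across all sequences;
    characters listed in `missing` are ignored.
    """
    counts = Counter(state for state in column if state not in missing)
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2


print(is_parsimony_informative("AAGG"))   # two states, each seen twice
print(is_parsimony_informative("AAAG"))   # G is a singleton
print(is_parsimony_informative("AAXX"))   # only one state after masking

# 834 of the 2,584 aligned positions were parsimony-informative.
print(round(100 * 834 / 2584))
```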
SeqState 1.2 was used to implement simple indel coding for gaps that were flanked by aligned residues at both the amino and carboxy termini. Of the 1,190 non-terminal gap characters, 668 were parsimony-informative. Fifty-one percent of the cells were either missing data or inapplicables for the parsimony-informative gap characters, as were 60% of the cells for the parsimony-informative amino acid characters. To examine the effect of incorporating gap characters into the study, gene-tree analyses were performed both with and without the gap characters. Gene-tree analyses were performed using amino acid characters. Although amino acid characters have problems with convergence [122, 123] and composite coding that nucleotide characters are not subject to, they are expected to perform relatively better when high genetic distances occur among closely related terminals included in the analysis, as is the case here. This expectation is based on silent substitutions undergoing saturation (i.e., multiple hits along individual branches). Gene-tree inference was performed using both parsimony and Bayesian MCMC approaches. Parsimony tree searches were performed using PAUP* 4.0b10 with all characters assigned equal weights. Jackknife analyses were performed using 1,000 replicates with each replicate consisting of one tree-bisection-reconnection heuristic search and only one tree held. Following Farris et al., the deletion probability for each character was set at 36.7879% and "Jac" resampling was emulated, resulting in support values roughly equivalent to those provided by the bootstrap.
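The deletion probability of 36.7879% equals e^-1, which is the limiting probability that a given character is left out of a bootstrap resample of n characters, (1 - 1/n)^n. The sketch below shows that relationship and one jackknife replicate in which each character is independently deleted with that probability. It is an illustration only: the actual resampling was done in PAUP*, and the function name, the random seed, and the use of the 834 parsimony-informative amino acid positions as the character count are assumptions.

```python
import math
import random

DELETION_PROBABILITY = math.exp(-1)  # about 0.367879


def jackknife_columns(n_characters, rng):
    """Indices of characters retained in one jackknife replicate.

    Each character is deleted independently with probability e^-1.
    """
    return [i for i in range(n_characters) if rng.random() >= DELETION_PROBABILITY]


n = 834  # parsimony-informative amino acid positions reported in the text

# Probability that a character is absent from a bootstrap resample of size n.
bootstrap_omission = (1 - 1 / n) ** n
print(round(100 * DELETION_PROBABILITY, 4))
print(round(bootstrap_omission, 4))

rng = random.Random(0)  # fixed seed so the replicate is repeatable
retained = jackknife_columns(n, rng)
print(len(retained), "of", n, "characters retained in this replicate")
```

On average about 63% of the characters are retained per replicate, which is why the resulting support values are roughly comparable to bootstrap values.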
Bayesian tree searches were performed using MrBayes 3.1  with a mixed amino acid model. All analyses were performed with four chains per analysis and trees sampled every 100 generations. A preliminary analysis of over 3.6 million generations for the amino-acid-characters-only data matrix was performed using MrBayes 3.0b4. To speed convergence in the final analyses, the MAP tree topology found in this preliminary analysis was specified as the initial user tree with 20 permutations .
For the amino-acid-characters-only data matrix, three independent runs were performed for between 1,001,000 and 1,628,500 generations each. All three runs asymptotically approached the same stationarity within the first 200,000 generations, and the remaining 30,919 trees were used to infer the posterior probabilities for individual clades. The analysis that reached 1,628,500 generations was performed in parallel on two dual-processor 1.8 GHz Power Mac G5 computers and ran for approximately 6 weeks. The other two independent runs were executed for seven weeks on a cluster of dual-processor 1 GHz IBM PCs. The data matrix that included gap characters was analyzed using a mixed model with the binary Felsenstein (1981)-type model applied to the gap-characters partition with the ascertainment bias set to variable, as suggested by Ronquist et al., . The no-common-mechanism model  was not applied to the gap characters because MrBayes 3.1 crashed when applying this model together with a parametric model in a mixed-model analysis. Seven independent runs were performed for between 272,500 and 1,118,100 generations each. All seven runs asymptotically approached the same stationarity, six of which did so within the first 150,000 generations, and the seventh within the first 450,000 generations because of a late stepwise increase in likelihoods. The remaining 28,000 trees were used to infer the posterior probabilities for individual clades. The analysis that reached 1,118,100 generations was executed in parallel for approximately five weeks on two dual processor 1.8 GHz Power Mac G5 computers, whereas the other six independent runs were executed for seven weeks on the previously mentioned cluster.
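The number of trees contributed by a single run after burn-in follows from the run length, the burn-in length and the sampling interval of 100 generations. The sketch below applies this to the shortest and longest amino-acid-only runs with the 200,000-generation burn-in mentioned above. It is a back-of-the-envelope aid, not a reconstruction of the pooled totals: the length of the third run is not given in the text, and whether the generation-0 tree is counted is a convention this sketch ignores. The function name is hypothetical.

```python
def post_burnin_samples(total_generations, burnin_generations, sample_every=100):
    """Number of sampled trees remaining after discarding the burn-in."""
    if burnin_generations > total_generations:
        raise ValueError("burn-in exceeds run length")
    return (total_generations - burnin_generations) // sample_every


# Longest and shortest amino-acid-only runs, 200,000-generation burn-in.
print(post_burnin_samples(1_628_500, 200_000))
print(post_burnin_samples(1_001_000, 200_000))
```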
Majority-rule consensus trees for the Bayesian analyses were calculated using PAUP*. Note that although Bayesian analyses appear to be more efficient than parsimony analyses, they can also produce inflated support values [134–136]. Also of concern for Bayesian analyses of these data matrices, which include high proportions of cells with missing data and inapplicables, is that smaller clades may receive high support despite ambiguous resolution of "wildcard" terminals. This may account, in part, for the greater resolution in the Bayesian trees (439 and 453 clades resolved in the amino-acid-only and amino-acid-plus-gap-character analyses, respectively) than in the parsimony trees (288 and 296 clades resolved), given the many completely unresolved terminals in both the Bayesian (36 and 20 terminals) and parsimony (107 and 112 terminals) trees.
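How a majority-rule consensus turns a sample of trees into clade support values can be sketched as follows. This is not the PAUP* implementation; it is a minimal illustration in which each sampled tree is represented, by assumption, as a collection of clades, each clade being the set of terminals it contains (for unrooted trees, one side of each bipartition). The function names and the toy data are hypothetical.

```python
from collections import Counter


def clade_support(trees):
    """Return the fraction of sampled trees in which each clade appears."""
    counts = Counter()
    n_trees = 0
    for tree in trees:
        n_trees += 1
        counts.update(set(tree))  # count each clade at most once per tree
    return {clade: count / n_trees for clade, count in counts.items()}


def majority_rule(trees, threshold=0.5):
    """Keep only the clades found in more than `threshold` of the trees."""
    support = clade_support(trees)
    return {clade: p for clade, p in support.items() if p > threshold}


# Toy sample of four trees on terminals A-D (illustrative only).
sample = [
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AB"), frozenset("ABD")],
    [frozenset("AC"), frozenset("ABC")],
]
consensus = majority_rule(sample)
print(consensus[frozenset("AB")])  # 0.75
```

In a Bayesian analysis, the fraction computed for each clade over the post-stationarity sample is the estimate of its posterior probability.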
The rooting of the kinesin family used by Goodson et al., Kim and Endow, and Reddy and Day is arbitrary. Outgroup sequences should be selected such that all members of the ingroup are more closely related to one another than any one of them is to the outgroups (i.e., the ingroup should be monophyletic relative to the outgroup). Although ScSMY1 is "a highly divergent kinesin protein", this does not satisfy the criterion for selecting an outgroup. See Lawrence et al. for an alternative rooting, wherein ScSMY1 is nested within the Kinesin-I family. Following Hirokawa, Miki et al., Iwabe and Miyata, Schoch et al., and Abdel-Ghany et al., our gene trees are presented as unrooted. All gene trees (except Fig. 2) were drawn using a combination of automatic and manual methods. The use of TreeGraph 1.0b8 greatly facilitated the making of complex tree figures. Although TreeGraph allows one to specify node labels, ours were entered manually using an external drawing program. TreeGraph is available for download.
Analysis of gene structure and expression data
Gene-structure information was obtained by performing searches with gene identifiers at the appropriate web pages. For G. lamblia, the complete contig assembly was downloaded and used as a database for BLASTN searches using the 24 G. lamblia sequences in nucleotide format as queries. All 24 sequences returned contig hits that were 100% identical with no gaps, indicating that there are no introns in G. lamblia kinesins. For the eight human Celera sequences, a Celera transcript database was downloaded and searched with the Celera protein IDs for corresponding transcripts that have been annotated with exon number.
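The intron test applied to G. lamblia (a coding sequence that aligns to a genomic contig at 100% identity without gaps is taken to be uninterrupted) can be expressed as a filter over BLASTN results. The sketch below assumes the hits were saved in the standard 12-column tabular BLAST format and adds a full-length requirement that is not stated in the text; the function name, sequence IDs, and hit lines are hypothetical.

```python
import csv
import io

# Column order of the standard 12-column tabular BLAST output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def intronless_queries(tabular_text, query_lengths):
    """Return IDs of queries with a gap-free, 100%-identical, full-length hit."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    found = set()
    for row in reader:
        qlen = query_lengths[row["qseqid"]]
        full_length = int(row["qstart"]) == 1 and int(row["qend"]) == qlen
        perfect = float(row["pident"]) == 100.0 and int(row["gapopen"]) == 0
        if full_length and perfect:
            found.add(row["qseqid"])
    return found


# Hypothetical hits: kinA aligns end to end, kinB only over its first half.
hits = (
    "kinA\tcontig1\t100.00\t300\t0\t0\t1\t300\t5000\t5299\t0.0\t595\n"
    "kinB\tcontig2\t100.00\t150\t0\t0\t1\t150\t100\t249\t1e-80\t298\n"
)
print(intronless_queries(hits, {"kinA": 300, "kinB": 300}))  # {'kinA'}
```

A query that matches only in pieces, as kinB does here, would be a candidate for an intron-containing gene or an assembly gap and would need to be inspected manually.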
Expression data were collected by performing BLAST searches against appropriate databases (O. sativa ssp. japonica, H. sapiens EST database, L. major EST database). A full sequence file containing japonica cDNAs was downloaded and used for BLASTP searches using the 41 BGF Syngenta kinesins as queries. No cDNA data were available for indica sequences. For the NCBI human kinesins, EST data were determined by performing a TBLASTN search against the human EST database using the 32 NCBI sequences. Leishmania major expression data were obtained by performing a TBLASTN search using the 54 full-length sequences against a database of 2,184 EST sequences. No hits were found.
Mapping of kinesins in O. sativa ssp. japonica chromosomes
Chromosomal duplications in ssp. japonica on the genomic map were based on Guyot and Keller. Chromosomes were rescaled appropriately, and kinesins were mapped according to their base-pair positions.
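Placing a gene on a rescaled chromosome amounts to a linear conversion from base-pair coordinates to drawing units. The sketch below shows one possible reading of that step; the function name, chromosome length, and drawn length are hypothetical and are not taken from the study.

```python
def scale_position(position_bp, chromosome_length_bp, drawn_length):
    """Convert a base-pair position to a coordinate on a drawn chromosome."""
    if not 0 <= position_bp <= chromosome_length_bp:
        raise ValueError("position lies outside the chromosome")
    return position_bp / chromosome_length_bp * drawn_length


# A gene at 10 Mb on a hypothetical 40 Mb chromosome drawn 200 units long.
print(scale_position(10_000_000, 40_000_000, 200.0))  # 50.0
```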
References
Reddy ASN: Molecular motors and their functions in plants. Int Rev Cytol & Cell Bio. 2001, 204: 97-178.
Vale RD: The molecular motor toolbox for intracellular transport. Cell. 2003, 112: 467-480. 10.1016/S0092-8674(03)00111-9.
Hirokawa N: Kinesin and dynein superfamily proteins and the mechanism of organelle transport. Science. 1998, 279: 519-526. 10.1126/science.279.5350.519.
Mermall V, Post PL, Mooseker MS: Unconventional myosins in cell movement, membrane traffic, and signal transduction. Science. 1998, 279: 527-533. 10.1126/science.279.5350.527.
Goldstein LSB, Philip AV: The road less traveled: Emerging principles of kinesin motor utilization. Annu Rev Cell Dev Biol. 1999, 15: 141-183. 10.1146/annurev.cellbio.15.1.141.
Miki H, Setou M, Kaneshiro K, Hirokawa N: All kinesin superfamily protein, KIF, genes in mouse and human. Proc Natl Acad Sci USA. 2001, 98: 7004-7011. 10.1073/pnas.111145398.
Vale RD, Reese TS, Sheetz MP: Identification of a novel force-generating protein, kinesin, involved in microtubule-based motility. Cell. 1985, 42: 39-50. 10.1016/S0092-8674(85)80099-4.
Vale RD, Fletterick RJ: The design plan of kinesin motors. Annu Rev Cell Dev Biol. 1997, 13: 745-777. 10.1146/annurev.cellbio.13.1.745.
Miki H, Okada Y, Hirokawa N: Analysis of the kinesin superfamily: insights into structure and function. Trends Cell Biol. 2005
Leopold PL, McDowall AW, Pfister KK, Bloom GS, Brady ST: Association of kinesin with characterized membrane-bounded organelles. Cell Motil Cytoskeleton. 1992, 23: 19-33. 10.1002/cm.970230104.
Sawin KE, LeGuellec K, Philippe M, Mitchison TJ: Mitotic spindle organization by a plus-end-directed microtubule motor. Nature. 1992, 359: 540-543. 10.1038/359540a0.
Barton NR, Goldstein LSB: Going mobile: Microtubule motors and chromosome segregation. Proc Natl Acad Sci USA. 1996, 93: 1735-1742. 10.1073/pnas.93.5.1735.
Carson JH, Worboys K, Ainger K, Barbarese E: Translocation of myelin basic protein mRNA in oligodendrocytes requires microtubules and kinesin. Cell Motil Cytoskeleton. 1997, 38: 318-328. 10.1002/(SICI)1097-0169(1997)38:4<318::AID-CM2>3.0.CO;2-#.
Lawrence CJ, Dawe RK, Christie KR, Cleveland DW, Dawson SC, Endow SA, Goldstein LS, Goodson HV, Hirokawa N, Howard J, Malmberg RL, McIntosh JR, Miki H, Mitchison TJ, Okada Y, Reddy AS, Saxton WM, Schliwa M, Scholey JM, Vale RD, Walczak CE, Wordeman L: A standardized kinesin nomenclature. J Cell Biol. 2004, 167: 19-22. 10.1083/jcb.200408113.
McDonald HB, Steward RJ, Goldstein LSB: The kinesin-like ncd protein of Drosophila is a minus end-directed microtubule motor. Cell. 1990, 63: 1159-1165. 10.1016/0092-8674(90)90412-8.
Walker RA, Salmon ED, Endow SA: The Drosophila claret segregation protein is a minus-end directed motor molecule. Nature. 1990, 347: 780-782. 10.1038/347780a0.
Endow SA, Kang SJ, Satterwhite LL, Rose MD, Skeen VP, Salmon ED: Yeast Kar3 is a minus-end microtubule motor protein that destabilizes microtubules preferentially at the minus ends. EMBO J. 1994, 13: 2708-2713.
Reddy ASN: Molecular motors in plant cells. Molecular Motors. Edited by: Schliwa M. 2003, Weinheim: Wiley-VCH, 433-469.
Lee YR, Liu B: Cytoskeletal motors in Arabidopsis. Sixty-one kinesins and seventeen myosins. Plant Physiol. 2004, 136: 3877-3883. 10.1104/pp.104.052621.
Liu B, Cyr RJ, Palevitz BA: A kinesin-like protein, KatAp, in the cells of Arabidopsis and other plants. Plant Cell. 1996, 8: 119-132. 10.1105/tpc.8.1.119.
Lee Y-RJ, Liu B: Identification of a phragmoplast-associated kinesin-related protein in higher plants. Curr Biol. 2000, 10: 797-800. 10.1016/S0960-9822(00)00564-9.
Lee YR, Giang HM, Liu B: A novel plant kinesin-related protein specifically associates with the phragmoplast organelles. Plant Cell. 2001, 13: 2427-2439. 10.1105/tpc.13.11.2427.
Chen C, Marcus A, Li W, Hu Y, Calzada JP, Grossniklaus U, Cyr RJ, Ma H: The Arabidopsis ATK1 gene is required for spindle morphogenesis in male meiosis. Development. 2002, 129: 2401-2409. 10.1242/dev.00114.
Marcus AI, Li W, Ma H, Cyr RJ: A kinesin mutant with an atypical bipolar spindle undergoes normal mitosis. Mol Biol Cell. 2003, 14: 1717-1726. 10.1091/mbc.E02-09-0586.
Nishihama R, Soyano T, Ishikawa M, Araki S, Tanaka H, Asada T, Irie K, Ito M, Terada M, Banno H, Yamazaki Y, Machida Y: Expansion of the cell plate in plant cytokinesis requires a kinesin-like protein/MAPKKK complex. Cell. 2002, 109: 87-99. 10.1016/S0092-8674(02)00691-8.
Strompen G, El Kasmi F, Richter S, Lukowitz W, Assaad FF, Jurgens G, Mayer U: The Arabidopsis HINKEL gene encodes a kinesin-related protein involved in cytokinesis and is expressed in a cell cycle-dependent manner. Curr Biol. 2002, 12: 153-158. 10.1016/S0960-9822(01)00655-8.
Yang CY, Spielman M, Coles JP, Li Y, Ghelani S, Bourdon V, Brown RC, Lemmon BE, Scott RJ, Dickinson HG: TETRASPORE encodes a kinesin required for male meiotic cytokinesis in Arabidopsis. Plant J. 2003, 34: 229-240. 10.1046/j.1365-313X.2003.01713.x.
Ambrose JC, Li W, Marcus A, Ma H, Cyr R: A Minus-End Directed Kinesin with +TIP Activity Is Involved in Spindle Morphogenesis. Mol Biol Cell. 2005, 16: 1584-1592. 10.1091/mbc.E04-10-0935.
Bowser J, Reddy AS: Localization of a kinesin-like calmodulin-binding protein in dividing cells of Arabidopsis and tobacco. Plant J. 1997, 12: 1429-1437. 10.1046/j.1365-313x.1997.12061429.x.
Oppenheimer DG, Pollock MA, Vacik J, Szymanski DB, Ericson B, Feldmann K, Marks D: Essential role of a kinesin-like protein in Arabidopsis trichome morphogenesis. Proc Natl Acad Sci USA. 1997, 94: 6261-6266. 10.1073/pnas.94.12.6261.
Voss JW, Safadi F, Reddy ASN, Hepler PK: The kinesin-like calmodulin binding protein is differentially involved in cell division. Plant Cell. 2000, 12: 979-990. 10.1105/tpc.12.6.979.
Reddy ASN, Day IS: The role of the cytoskeleton and a molecular motor in trichome morphogenesis. Trends Plant Sci. 2000, 5: 503-505. 10.1016/S1360-1385(00)01792-1.
Reddy VS, Day IS, Thomas T, Reddy AS: KIC, a novel Ca2+ binding protein with one EF-hand motif, interacts with a microtubule motor protein and regulates trichome morphogenesis. Plant Cell. 2004, 16: 185-200. 10.1105/tpc.016600.
Lu L, Lee YR, Pan R, Maloof JN, Liu B: An internal motor Kinesin is associated with the Golgi apparatus and plays a role in trichome morphogenesis in Arabidopsis. Mol Biol Cell. 2005, 16: 811-823. 10.1091/mbc.E04-05-0400.
Zhong R, Burk DH, Morrison WH, Ye ZH: A kinesin-like protein is essential for oriented deposition of cellulose microfibrils and cell wall strength. Plant Cell. 2002, 14: 3101-3117. 10.1105/tpc.005801.
Itoh R, Fujiwara M, Yoshida S: Kinesin-related proteins with a mitochondrial targeting signal. Plant Physiol. 2001, 127: 724-726. 10.1104/pp.127.3.724.
Kong LJ, Hanley-Bowdoin L: A geminivirus replication protein interacts with a protein kinase and a motor protein that display different expression patterns during plant development and infection. Plant Cell. 2002, 14: 1817-1832. 10.1105/tpc.003681.
Preuss ML, Kovar DR, Lee YR, Staiger CJ, Delmer DP, Liu B: A plant-specific kinesin binds to actin microfilaments and interacts with cortical microtubules in cotton fibers. Plant Physiol. 2004, 136: 3945-3955. 10.1104/pp.104.052340.
Song H, Golovkin M, Reddy ASN, Endow SA: In vitro motility of AtKCBP, a calmodulin-binding kinesin-like protein of Arabidopsis. Proc Natl Acad Sci USA. 1997, 94: 322-327. 10.1073/pnas.94.1.322.
Marcus AI, Ambrose JC, Blickley L, Hancock WO, Cyr RJ: Arabidopsis thaliana protein, ATK1, is a minus-end directed kinesin that exhibits non-processive movement. Cell Motil Cytoskeleton. 2002, 52: 144-150. 10.1002/cm.10045.
Asada T, Shibaoka H: Isolation of polypeptides with microtubule-translocating activity from phragmoplasts of tobacco BY-2 cells. J Cell Sci. 1994, 107: 2249-2257.
Lawrence CJ, Malmberg RL, Muszynski MG, Dawe RK: Maximum likelihood methods reveal conservation of function among closely related kinesin families. J Mol Evol. 2002, 54: 42-53.
Reddy ASN, Day IS: Kinesins in the Arabidopsis genome: A comparative analysis among eukaryotes. BMC Genomics. 2001, 2: 2.1-2.13. 10.1186/1471-2164-2-2.
Sogin ML: Early evolution and the origin of eukaryotes. Curr Opin Genet Dev. 1991, 1: 457-463. 10.1016/S0959-437X(05)80192-3.
Hashimoto T, Nakamura Y, Nakamura F, Shirakura T, Adachi J, Goto N, Okamoto K, Hasegawa M: Protein phylogeny gives a robust estimation for early divergences of eukaryotes: phylogenetic place of a mitochondria-lacking protozoan, Giardia lamblia. Mol Biol Evol. 1994, 11: 65-71.
Eichinger L, Pachebat JA, Glockner G, Rajandream MA, Sucgang R, Berriman M, Song J, Olsen R, Szafranski K, Xu Q, Tunggal B, Kummerfeld S, Madera M, Konfortov BA, Rivero F, Bankier AT, Lehmann R, Hamlin N, Davies R, Gaudet P, Fey P, Pilcher K, Chen G, Saunders D, Sodergren E, Davis P, Kerhornou A, Nie X, Hall N, Anjard C, Hemphill L, Bason N, Farbrother P, Desany B, Just E, Morio T, Rost R, Churcher C, Cooper J, Haydock S, van Driessche N, Cronin A, Goodhead I, Muzny D, Mourier T, Pain A, Lu M, Harper D, Lindsay R, Hauser H, James K, Quiles M, Madan Babu M, Saito T, Buchrieser C, Wardroper A, Felder M, Thangavelu M, Johnson D, Knights A, Loulseged H, Mungall K, Oliver K, Price C, Quail MA, Urushihara H, Hernandez J, Rabbinowitsch E, Steffen D, Sanders M, Ma J, Kohara Y, Sharp S, Simmonds M, Spiegler S, Tivey A, Sugano S, White B, Walker D, Woodward J, Winckler T, Tanaka Y, Shaulsky G, Schleicher M, Weinstock G, Rosenthal A, Cox EC, Chisholm RL, Gibbs R, Loomis WF, Platzer M, Kay RR, Williams J, Dear PH, Noegel AA, Barrell B, Kuspa A: The genome of the social amoeba Dictyostelium discoideum. Nature. 2005, 435: 43-57. 10.1038/nature03481.
Arisue N, Hasegawa M, Hashimoto T: Root of the Eukaryota tree as inferred from combined maximum likelihood analyses of multiple molecular sequence data. Mol Biol Evol. 2005, 22: 409-420. 10.1093/molbev/msi023.
Richards TA, Cavalier-Smith T: Myosin domain evolution and the primary divergence of eukaryotes. Nature. 2005, 436: 1113-1118. 10.1038/nature03949.
He D, Wen JF, Chen WQ, Lu SQ, Xin de D: Identification, characteristic and phylogenetic analysis of type II DNA topoisomerase gene in Giardia lamblia. Cell Res. 2005, 15: 474-482. 10.1038/sj.cr.7290316.
Baldauf SL: The deep roots of eukaryotes. Science. 2003, 300: 1703-1706. 10.1126/science.1085544.
Falkowski PG, Katz ME, Knoll AM, Quigg A, Raven JA, Schofield O, Taylor FJR: The evolution of modern eukaryotic phytoplankton. Science. 2004, 305: 354-360. 10.1126/science.1095964.
Iwabe N, Miyata T: Kinesin-related genes from diplomonad, sponge, amphioxus, and cyclostomes: divergence pattern of kinesin family and evolution of giardial membrane-bounded organella. Mol Biol Evol. 2002, 19: 1524-1533.
Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B: DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment. BMC Bioinformatics. 2005, 6: 66. 10.1186/1471-2105-6-66.
Morgenstern B: DIALIGN: multiple DNA and protein sequence alignment at BiBiServ. Nucleic Acids Res. 2004, 32: W33-36. 10.1093/nar/gnh029.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.
Morgenstern B, Frech K, Dress A, Werner T: DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics. 1998, 14: 290-294. 10.1093/bioinformatics/14.3.290.
Lake JA: The order of sequence alignment can bias the selection of tree topology. Mol Biol Evol. 1991, 8: 378-385.
Thompson JD, Plewniak F, Poch O: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 1999, 27: 2682-2690. 10.1093/nar/27.13.2682.
Lassmann T, Sonnhammer EL: Quality assessment of multiple alignment programs. FEBS Lett. 2002, 529: 126-130. 10.1016/S0014-5793(02)03189-7.
Kluge AG: A concern for evidence and a phylogenetic hypothesis for relationships among Epicrates (Boidae, Serpentes). Systematic Zoology. 1989, 38: 7-25.
Nixon KC, Carpenter JM: On simultaneous analysis. Cladistics. 1996, 12: 221-242. 10.1111/j.1096-0031.1996.tb00010.x.
Simmons MP, Ochoterena H, Carr TG: Incorporation, relative homoplasy, and effect of gap characters in sequence-based phylogenetic analyses. Syst Biol. 2001, 50: 454-462. 10.1080/106351501300318049.
Felsenstein J: Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zool. 1978, 27: 401-410.
Siddall ME, Whiting MF: Long-branch abstractions. Cladistics. 1999, 15: 9-24. 10.1111/j.1096-0031.1999.tb00391.x.
Rosenbaum JL, Witman GB: Intraflagellar transport. Nat Rev Mol Cell Biol. 2002, 3: 813-825. 10.1038/nrm952.
Reddy ASN, Day IS: Analysis of the myosins encoded in the recently completed Arabidopsis thaliana genome sequence. Genome Biol. 2001, 2: 24.21-24.17. 10.1186/gb-2001-2-7-research0024.
Lillie SH, Brown SS: Immunofluorescence localization of the unconventional myosin, Myo2p, and the putative kinesin-related protein, Smy1p, to the same regions of polarized growth in Saccharomyces cerevisiae. J Cell Biol. 1994, 125: 825-842. 10.1083/jcb.125.4.825.
Lillie SH, Brown SS: Smy1p, a kinesin-related protein that does not require microtubules. J Cell Biol. 1998, 140: 873-883. 10.1083/jcb.140.4.873.
Soltis DE, Soltis PS, Chase MW, Mort ME, Albach DC, Zanis M, Savolainen V, Hahn WH, Hoot SB, Fay MF, Axtell M, Swensen SM, Nixon KC, Farris JS: Angiosperm phylogeny inferred from a combined data set of 18S rDNA, rbcL, and atpB sequences. Botanical J Linnean Soc. 2000, 133: 381-461. 10.1006/bojl.2000.0380.
Nishiyama T, Wolf PG, Kugita M, Sinclair RB, Sugita M, Sugiura C, Wakasugi T, Yamada K, Yoshinaga K, Yamaguchi K, Ueda K, Hasebe M: Chloroplast phylogeny indicates that bryophytes are monophyletic. Mol Biol Evol. 2004, 21: 1813-1819. 10.1093/molbev/msh203.
Bachvaroff TR, Sanchez Puerta MV, Delwiche CF: Chlorophyll c-containing plastid relationships based on analyses of a multigene data set with all four chromalveolate lineages. Mol Biol Evol. 2005, 22: 1772-1782. 10.1093/molbev/msi172.
Reddy ASN: Molecular motors in plant cells. Encyclopedia of Molecular Cell Biology and Molecular Medicine. Edited by: Meyers RA. 2005, Weinheim: Wiley-VCH, 8: 461-494.
Vanstraelen M, Torres Acosta JA, De Veylder L, Inze D, Geelen D: A plant-specific subclass of C-terminal kinesins contains a conserved a-type cyclin-dependent kinase site implicated in folding and dimerization. Plant Physiol. 2004, 135: 1417-1429. 10.1104/pp.104.044818.
Lawrence CJ, Morris NR, Meagher RB, Dawe RK: Dyneins have run their course in plant lineage. Traffic. 2001, 2: 362-363. 10.1034/j.1600-0854.2001.25020508.x.
King SM: Dyneins motor on in plants. Traffic. 2002, 3: 930-931. 10.1034/j.1600-0854.2002.31208.x.
Kim JY: Regulation of short-distance transport of RNA and protein. Curr Opin Plant Biol. 2005, 8: 45-52. 10.1016/j.pbi.2004.11.005.
Tanaka H, Ishikawa M, Kitamura S, Takahashi Y, Soyano T, Machida C, Machida Y: The AtNACK1/HINKEL and STUD/TETRASPORE/AtNACK2 genes, which encode functionally redundant kinesins, are essential for cytokinesis in Arabidopsis. Genes Cells. 2004, 9: 1199-1211. 10.1111/j.1365-2443.2004.00798.x.
Matsuzaki M, Misumi O, Shin IT, Maruyama S, Takahara M, Miyagishima SY, Mori T, Nishida K, Yagisawa F, Yoshida Y, Nishimura Y, Nakao S, Kobayashi T, Momoyama Y, Higashiyama T, Minoda A, Sano M, Nomoto H, Oishi K, Hayashi H, Ohta F, Nishizaka S, Haga S, Miura S, Morishita T, Kabeya Y, Terasawa K, Suzuki Y, Ishii Y, Asakawa S, Takano H, Ohta N, Kuroiwa H, Tanaka K, Shimizu N, Sugano S, Sato N, Nozaki H, Ogasawara N, Kohara Y, Kuroiwa T: Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature. 2004, 428: 653-657. 10.1038/nature02398.
Enos AP, Morris NR: Mutation of a gene that encodes a kinesin-like protein blocks nuclear division in A. nidulans. Cell. 1990, 60: 1019-1027. 10.1016/0092-8674(90)90350-N.
Asada T, Kuriyama R, Shibaoka H: TKRP125, a kinesin-related protein involved in the centrosome-independent organization of the cytokinetic apparatus in tobacco BY-2 cells. J Cell Sci. 1997, 110: 179-189.
Desai A, Verma S, Mitchison TJ, Walczak CE: Kin I kinesins are microtubule-destabilizing enzymes. Cell. 1999, 96: 69-78. 10.1016/S0092-8674(00)80960-5.
Homma N, Takei Y, Tanaka Y, Nakata T, Terada S, Kikkawa M, Noda Y, Hirokawa N: Kinesin superfamily protein 2A (KIF2A) functions in suppression of collateral branch extension. Cell. 2003, 114: 229-239. 10.1016/S0092-8674(03)00522-1.
Simple Modular Architecture Research Tool.
Narasimhulu SB, Kao Y-L, Reddy ASN: Interaction of Arabidopsis kinesin-like calmodulin-binding protein with tubulin subunits: Modulation by Ca2+-calmodulin. Plant J. 1997, 12: 1139-1149. 10.1046/j.1365-313X.1997.12051139.x.
Weber KL, Sokac AM, Berg JS, Cheney RE, Bement WM: A microtubule-binding myosin required for nuclear anchoring and spindle assembly. Nature. 2004, 431: 325-329. 10.1038/nature02834.
Banuelos S, Saraste M, Carugo KD: Structural comparisons of calponin homology domains: implications for actin binding. Structure. 1998, 6: 1419-1431. 10.1016/S0969-2126(98)00141-5.
Leinweber BD, Leavis PC, Grabarek Z, Wang CL, Morgan KG: Extracellular regulated kinase (ERK) interaction with actin and the calponin homology (CH) domain of actin-binding proteins. Biochem J. 1999, 344: 117-123. 10.1042/0264-6021:3440117.
Yamazaki H, Nakata T, Okada Y, Hirokawa N: Cloning and characterization of KAP3: a novel kinesin superfamily-associated protein of KIF3A/3B. Proc Natl Acad Sci USA. 1996, 93: 8443-8448. 10.1073/pnas.93.16.8443.
Tokai N, Fujimoto-Nishiyama A, Toyoshima Y, Yonemura S, Tsukita S, Inoue J, Yamamoto T: Kid, a novel kinesin-like DNA binding protein, is localized to chromosomes and the mitotic spindle. EMBO J. 1996, 15: 457-467.
Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA. 2004, 101: 9903-9908. 10.1073/pnas.0307901101.
Paterson AH, Bowers JE, Burow MD, Draye X, Elsik CG, Jiang CX, Katsar CS, Lan TH, Lin YR, Ming R, Wright RJ: Comparative genomics of plant chromosomes. Plant Cell. 2000, 12: 1523-1540. 10.1105/tpc.12.9.1523.
Initiative TAG: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, Zhang J, Zhang Y, Li R, Xu Z, Li S, Li X, Zheng H, Cong L, Lin L, Yin J, Geng J, Li G, Shi J, Liu J, Lv H, Li J, Wang J, Deng Y, Ran L, Shi X, Wang X, Wu Q, Li C, Ren X, Wang J, Wang X, Li D, Liu D, Zhang X, Ji Z, Zhao W, Sun Y, Zhang Z, Bao J, Han Y, Dong L, Ji J, Chen P, Wu S, Liu J, Xiao Y, Bu D, Tan J, Yang L, Ye C, Zhang J, Xu J, Zhou Y, Yu Y, Zhang B, Zhuang S, Wei H, Liu B, Lei M, Yu H, Li Y, Xu H, Wei S, He X, Fang L, Zhang Z, Zhang Y, Huang X, Su Z, Tong W, Li J, Tong Z, Li S, Ye J, Wang L, Fang L, Lei T, Chen C, Chen H, Xu Z, Li H, Huang H, Zhang F, Xu H, Li N, Zhao C, Li S, Dong L, Huang Y, Li L, Xi Y, Qi Q, Li W, Zhang B, Hu W, Zhang Y, Tian X, Jiao Y, Liang X, Jin J, Gao L, Zheng W, Hao B, Liu S, Wang W, Yuan L, Cao M, McDermott J, Samudrala R, Wang J, Wong GK, Yang H: The Genomes of Oryza sativa: a history of duplications. PLoS Biol. 2005, 3: e38. 10.1371/journal.pbio.0030038.
Guyot R, Keller B: Ancestral genome duplication in rice. Genome. 2004, 47: 610-614. 10.1139/g04-016.
Gong TW, Winnicki RS, Kohrman DC, Lomax MI: A novel mouse kinesin of the UNC-104/KIF1 subfamily encoded by the Kif1b gene. Gene. 1999, 239: 117-127. 10.1016/S0378-1119(99)00370-4.
Zhao C, Takita J, Tanaka Y, Setou M, Nakagawa T, Takeda S, Yang HW, Terada S, Nakata T, Takei Y, Saito M, Tsuji S, Hayashi Y, Hirokawa N: Charcot-Marie-Tooth disease type 2A caused by mutation in a microtubule motor KIF1Bbeta. Cell. 2001, 105: 587-597. 10.1016/S0092-8674(01)00363-4.
BLAST Plant Genomes. [http://www.ncbi.nlm.nih.gov/BLAST/Genome/PlantBlast.shtml?10]
NCBI BLAST. [http://www.ncbi.nlm.nih.gov/BLAST/]
Beijing Genomics Institute. [http://rise.genomics.org.cn]
Zhao W, Wang J, He X, Huang X, Jiao Y, Dai M, Wei S, Fu J, Chen Y, Ren X, Zhang Y, Ni P, Zhang J, Li S, Wang J, Wong GK, Zhao H, Yu J, Yang H, Wang J: BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics. Nucleic Acids Res. 2004, 32: D377-382. 10.1093/nar/gkh085.
Zdobnov EM, Apweiler R: InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.
The Arabidopsis Information Resource (TAIR). [http://www.arabidopsis.org]
The Joint Genome Institute Genome Portal. [http://genome.jgi-psf.org/]
Felsenstein J: Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981, 17: 368-376. 10.1007/BF01734359.
Cyanidioschyzon merolae Genome Project. [http://merolae.biol.s.u-tokyo.ac.jp/]
The Kinesin HomePage. [http://www.proweb.org/kinesin/]
NCBI Batch Entrez. [http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Protein]
Celera Discovery System. [http://publication.celera.com/humanpub/index.jsp]
PlasmoDB: The Plasmodium Genome Resource. [http://plasmodb.org/PlasmoDB.shtml]
The Sanger Institute: The Leishmania major Genome Project. [http://www.sanger.ac.uk/Projects/L_major/]
Saccharomyces Genome Database. [http://www.yeastgenome.org/]
S. Pombe: GeneDB. [http://www.genedb.org/genedb/pombe/index.jsp]
Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16: 276-277. 10.1016/S0168-9525(00)02024-2.
Muller K: SeqState: Primer Design and Sequence Statistics for Phylogenetic DNA Datasets. Appl Bioinformatics. 2005, 4: 65-69. 10.2165/00822942-200504010-00008.
Simmons MP, Ochoterena H: Gaps as characters in sequence-based phylogenetic analyses. Syst Biol. 2000, 49: 369-381. 10.1080/10635159950173889.
Simmons MP: A fundamental problem with amino-acid-sequence characters for phylogenetic analyses. Cladistics. 2000, 16: 274-282. 10.1111/j.1096-0031.2000.tb00283.x.
Simmons MP, Ochoterena H, Freudenstein JV: Conflict between amino acid and nucleotide characters. Cladistics. 2002, 18: 200-206. 10.1111/j.1096-0031.2002.tb00148.x.
Simmons MP, Freudenstein JV: Artifacts of coding amino acids and other composite characters for phylogenetic analysis. Cladistics. 2002, 18: 354-365. 10.1111/j.1096-0031.2002.tb00156.x.
Simmons MP, Ochoterena H, Freudenstein JV: Amino acid vs. nucleotide characters: challenging preconceived notions. Mol Phylogenet Evol. 2002, 24: 78-90. 10.1016/S1055-7903(02)00202-6.
Yang Z, Rannala B: Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method. Mol Biol Evol. 1997, 14: 717-724.
Swofford DL: PAUP*: Phylogenetic analysis using parsimony. (*and other methods). 2001, Sunderland, MA: Sinauer Associates
Farris JS, Albert VA, Kallersjo M, Lipscomb D, Kluge AG: Parsimony jackknifing outperforms neighbor-joining. Cladistics. 1996, 12: 99-124. 10.1111/j.1096-0031.1996.tb00196.x.
Felsenstein J: Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985, 39: 783-791.
Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.
Ronquist F, Huelsenbeck JP, van der Mark P: MrBayes 3.1 manual. 2005, downloaded 5/17/2005. [http://mrbayes.csit.fsu.edu/manual.php]
Tuffley C, Steel M: Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull Math Biol. 1997, 59: 581-607. 10.1016/S0092-8240(97)00001-3.
Simmons MP, Miya M: Efficiently resolving the basal clades of a phylogenetic tree using Bayesian and parsimony approaches: a case study using mitogenomic data from 100 higher teleost fishes. Mol Phylogenet Evol. 2004, 31: 351-362. 10.1016/j.ympev.2003.08.004.
Suzuki Y, Glazko GV, Nei M: Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA. 2002, 99: 16138-16143. 10.1073/pnas.212646199.
Cummings MP, Handley SA, Myers DS, Reed DL, Rokas A, Winka K: Comparing bootstrap and posterior probability values in the four-taxon case. Syst Biol. 2003, 52: 477-487.
Simmons MP, Pickett KM, Miya M: How meaningful are Bayesian posterior probabilities?. Mol Biol Evol. 2004, 21: 188-199. 10.1093/molbev/msh014.
Goloboff PA, Pol D: Parsimony and Bayesian phylogenetics. Parsimony, phylogeny, and genomics. Edited by: Albert VA. 2005, Oxford University Press, 148-159.
Goodson HV, Kang SJ, Endow SA: Molecular phylogeny of the kinesin family of microtubule motor proteins. J Cell Sci. 1994, 107: 1875-1884.
Kim AJ, Endow SA: A kinesin family tree. J Cell Sci. 2000, 113: 3681-3682.
Nixon KC, Carpenter JM: On outgroups. Cladistics. 1993, 9: 413-426. 10.1111/j.1096-0031.1993.tb00234.x.
Schoch CL, Aist JR, Yoder OC, Gillian Turgeon B: A complete inventory of fungal kinesins in representative filamentous ascomycetes. Fungal Genet Biol. 2003, 39: 1-15. 10.1016/S1087-1845(03)00022-7.
Abdel-Ghany S, Day IS, Simmons M, Kugrens P, Reddy ASN: Origin and evolution of kinesin-like calmodulin-binding protein. Plant Physiol. 2005, 138: 1711-1722. 10.1104/pp.105.060913.
Müller J, Müller K: TreeGraph: automated drawing of complex tree figures using an extensible tree description format. Mol Ecol Notes. 2004, 4: 786-788. 10.1111/j.1471-8286.2004.00813.x.
Goldstein LS, Gunawardena S: Flying through the drosophila cytoskeletal genome. J Cell Biol. 2000, 150: F63-68. 10.1083/jcb.150.2.F63.
Siddiqui SS: Metazoan Motor Models: Kinesin Superfamily in C. elegans. Traffic. 2002, 3: 20-28. 10.1034/j.1600-0854.2002.30104.x.
Kollmar M, Glockner G: Identification and phylogenetic analysis of Dictyostelium discoideum kinesin proteins. BMC Genomics. 2003, 4: 47. 10.1186/1471-2164-4-47.
Acknowledgements
We thank Kai Müller for assistance with SeqState; Jim Cox and Pat Reeves for computing time; and Simon Tavener and Zube for access to, and assistance with, the CSU Math Department Cluster. We thank Dr. Irene Day for her comments on the manuscript. This work was supported by a grant from the National Science Foundation to ASNR.
Authors' contributions
ASNR conceived of the study and coordinated the work. ASNR, DNR and MPS participated in the design of the study. DNR performed all database searches, acquired the sequence data and prepared all figures and tables. MPS and DNR performed the alignments and phylogenetic analyses. ASNR, DNR and MPS participated in data analysis and interpretation, and writing of the manuscript. All authors read and approved the final manuscript.