Reconstructing the phylogeny of 21 completely sequenced arthropod species based on their motor proteins
© Odronitz et al. 2009
Received: 06 March 2008
Accepted: 21 April 2009
Published: 21 April 2009
Motor proteins have been studied extensively and form large superfamilies. They are involved in diverse processes such as cell division, cellular transport, neuronal transport, and muscle contraction, to name a few. Vertebrates contain up to 60 myosins and about the same number of kinesins, which are spread over more than a dozen distinct classes.
Here, we present the comparative genomic analysis of the motor protein repertoire of 21 completely sequenced arthropod species using the owl limpet Lottia gigantea as outgroup. Arthropods contain up to 17 myosins grouped into 13 classes. The myosins are in almost all cases clear paralogs, and thus the evolution of the arthropod myosin inventory is mainly determined by gene losses. Arthropod species contain up to 29 kinesins spread over 13 classes. In contrast to the myosins, the evolution of the arthropod kinesin inventory is not only determined by gene losses but also by many subtaxon-specific and species-specific gene duplications. All arthropods contain each of the subunits of the cytoplasmic dynein/dynactin complex. Except for the dynein light chains and the p150 dynactin subunit they contain single gene copies of the other subunits. Especially the roadblock light chain repertoire is very species-specific.
All 21 completely sequenced arthropods, including the twelve sequenced Drosophila species, contain a species-specific set of motor proteins. The phylogenetic analysis of all genes as well as the protein repertoire placed Daphnia pulex closest to the root of the Arthropoda. The louse Pediculus humanus corporis is the closest relative to Daphnia followed by the group of the honeybee Apis mellifera and the jewel wasp Nasonia vitripennis. After this group the rust-red flour beetle Tribolium castaneum and the silkworm Bombyx mori diverged very closely from the lineage leading to the Drosophila species.
Nearly every eukaryotic cell hosts specialized proteins that are responsible for intracellular transport. These molecular motors are highly conserved among the different eukaryotic species and have evolved slowly over time [1, 2]. This property makes them appropriate candidates for evolutionary studies. The three superfamilies of transporting motor proteins are the myosins, kinesins, and dyneins. Attached to the cytoskeletal networks (microtubules and actin), they transport all kinds of organelles and vesicles, and organize and remodel the cytoskeleton and developmental processes in eukaryotes. The energy for their unidirectional cargo transport on one of the filamentous cytoskeletal tracks is derived from ATP hydrolysis. Of the three superfamilies, only members of the kinesin superfamily are found in all eukaryotes, whereas members of the dynein and myosin superfamilies are lacking in particular eukaryotic lineages.
The members of the actin-based myosin family have their origin early in eukaryotic evolution. Based on the latest analysis, the myosins are grouped into 35 classes. Myosins consist of three regions: the motor (or head) domain, a neck domain, and the tail, which comprises all C-terminal domains as well as domains N-terminal to the motor domain. The motor domain is highly conserved and contains both the ATP and actin binding sites, where the force generation resides. This energy-transducing motor domain is coupled to a regulatory neck region (helical region), which is able to bind calmodulin or calmodulin-like light chains. Most myosins have tail domains linked to the neck region. In contrast to the head domains, the tail domains show high variability in sequence and length, reflecting their functional diversity. The functions range from cytokinesis, organellar transport, and cell polarization to signal transduction [8–10]. Some of the myosin classes also contain large domains at the N-terminus of the motor domains.
The second family of molecular motor proteins comprises the kinesins (members also known as KRPs, KLPs, or KIFs). The members of this superfamily are microtubule-based and facilitate movement in both directions (either plus or minus end-directed). For their movement along the microtubules they utilize ATP, as do the other motor proteins. The classical kinesin forms a tetramer of two kinesin heavy chains (KHCs) and two kinesin light chains (KLCs). As in myosins, the head domain is well conserved and responsible for the movement, whereas the stalk and tail domains play fundamental roles in the interaction with other subunits of the holoenzyme or with cargo molecules such as proteins, lipids or nucleic acids. The region between the head and the stalk is family-specific and determines the direction of movement. Kinesins bind a variety of cargoes and perform tasks such as vesicle and organelle transport, spindle formation and elongation, chromosome segregation, and microtubule organization [15, 16].
The members of the dynein superfamily are minus end-directed motor proteins. Thus, they are responsible for the retrograde transport of cargoes along microtubules. They are involved in many processes such as spindle formation, chromosome segregation, and the transport of a variety of cargoes including viruses, RNAs, signaling molecules, and organelles. Dyneins are multi-subunit protein complexes with two or three heavy chains (DHCs), light chains, light-intermediate chains, and intermediate chains. Supported by an activator protein complex called dynactin, which consists of 11 subunits, dynein is able to move and bind to membranes or further cargoes [20–22].
The genome of Drosophila melanogaster was the third eukaryotic genome to be completely sequenced. Since then, the number of sequenced organisms has increased rapidly. Of the Arthropoda phylum, the genomes of the mosquitos Anopheles gambiae and Aedes aegypti, the silkworm Bombyx mori [26, 27], the beetle Tribolium castaneum, the waterflea Daphnia pulex (this special series in BMC journals), and eleven of the Drosophila species group [29, 30] have been published. The draft genome sequences of Culex pipiens quinquefasciatus, Nasonia vitripennis, and Pediculus humanus corporis have been finished recently. The phylogenetic relationship of the twelve sequenced Drosophila species has been described in detail.
Here, we present the analysis of the phylogenetic relationship of 21 completely sequenced arthropods based on the sequences and inventory of their motor proteins.
The arthropod motor protein genes were identified by TBLASTN searches against the corresponding genome data of the different species. Species that lacked certain orthologs in the first round were searched again with the presumed orthologs of the other species. In this iterative process, all motor proteins have been identified or their absence in certain species has been confirmed. The species analyzed were the mosquitos Aedes aegypti (Aea), Culex pipiens quinquefasciatus (Cpq), and Anopheles gambiae (Ang), the silkworm Bombyx mori (Bm_b), the honeybee Apis mellifera (Am), the jewel wasp Nasonia vitripennis (Nav), the waterflea Daphnia pulex (Dap), the rust-red flour beetle Tribolium castaneum (Tic), the body louse Pediculus humanus corporis (Pdc), twelve Drosophila species (Drosophila ananassae (Da), Drosophila erecta (Der), Drosophila grimshawi (Dg), Drosophila melanogaster (Dm), Drosophila mojavensis (Dmo), Drosophila persimilis (Dp), Drosophila pseudoobscura (Drp), Drosophila sechellia (Dse), Drosophila simulans (Dss_a), Drosophila virilis (Dv), Drosophila willistoni (Dw) and Drosophila yakuba (Dy)), and the mollusc Lottia gigantea (Lg), which we used as outgroup. The sequences were assigned by manual inspection of the genomic DNA sequences. Exons have been confirmed by the identification of flanking consensus intron-exon splice junction donor and acceptor sequences. The genomic sequences of Drosophila virilis, Apis mellifera, and especially Bombyx mori contain several gaps. Many of the gaps have been filled by analyzing EST data.
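The splice-junction check mentioned above can be illustrated with a minimal sketch. It assumes the canonical GT...AG intron rule (donor GT at the 5' end of the intron, acceptor AG at its 3' end) and 0-based, end-exclusive exon coordinates on the sense strand. The function name and the toy sequence are hypothetical and are not taken from the annotation procedure used in this study, which relied on manual inspection.

```python
def introns_are_canonical(genomic, exons):
    """Return True if every intron between consecutive exons is GT...AG.

    genomic: genomic DNA string (sense strand).
    exons:   list of (start, end) tuples, 0-based, end-exclusive,
             sorted by position.
    """
    seq = genomic.upper()
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = seq[left_end:right_start]
        # An intron must at least hold the donor and acceptor dinucleotides.
        if len(intron) < 4:
            return False
        if not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True


# Toy gene (illustrative only): exon, intron "GTAAGTTTTCAG", exon.
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
good_exons = [(0, 6), (18, 24)]
shifted_exons = [(0, 7), (18, 24)]  # first exon border off by one base

print(introns_are_canonical(toy, good_exons))
print(introns_are_canonical(toy, shifted_exons))
```

A check of this kind only flags exon borders that are inconsistent with the consensus; non-canonical introns exist, so a failed check calls for manual inspection rather than automatic rejection.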
First, we calculated the phylogenetic tree of each of the protein families. When inspecting the phylogenetic tree of each protein family, it can be stated that three clades and their internal topologies are constant: the Drosophila clade, a clade of Apis mellifera and Nasonia vitripennis, and the clade of Aedes aegypti, Culex pipiens quinquefasciatus, and Anopheles gambiae. Only in the tree of the LC8 proteins (see Additional File 1) is the clade of Anopheles, Aedes and Culex placed within the Drosophila clade. All other species were placed at varying branches. The discrepancy among the phylogenetic trees based on the dynein and dynactin subunits was higher than among those based on myosins and kinesins (see Additional File 1). The trees calculated from myosins and kinesins disagree only in the positions of Bombyx mori, Tribolium castaneum and Pediculus humanus corporis.
The phylogenetic tree inferred from the occurrence of classes/variants has a limited resolution and agrees only in some respects with the maximum likelihood tree: the Drosophila species form a clade; Drosophila pseudoobscura and Drosophila persimilis are monophyletic; Drosophila virilis, Drosophila mojavensis and Drosophila grimshawi are monophyletic; and Culex, Aedes and Anopheles are monophyletic.
Most of the myosins that we discuss here were identified and annotated during the annotation of over 2000 myosins from more than 300 organisms. Since then, the genome sequences of the arthropod species Culex pipiens quinquefasciatus and Pediculus humanus corporis have been finished, as well as that of the mollusc Lottia gigantea, which we used as outgroup. All myosins have been grouped into 35 classes. The arthropods encode members of 13 of these classes, namely classes I, II, III, V, VI, VII, IX, XV, XVIII, XIX, XX, XXI, and XXII. It has been found that the Drosophila melanogaster NinaC protein, which had previously been classified as a class-III myosin, is part of the new class-XXI. Most arthropod genomes contain a true ortholog of the mammalian class-III myosins. Although both class-III and class-XXI myosins have an N-terminal kinase domain, the phylogenetic tree of the motor domain sequences clearly shows that the two classes are distinct. Daphnia pulex contains the largest diversity of myosins, while the Drosophila species seem to have lost several classes, namely the members of class-III, class-IX, and class-XIX. Most of the Drosophila species have also lost their class-XXII myosin. Class-XXII myosins have two tandem repeats of MyTH4 and FERM domains like the class-VII myosins, but they lack the N-terminal SH3-like domain as well as the SH3 domain in the C-terminal tail. The specific function of a member of the class-XXII myosins has not been analyzed yet.
Of the kinesin superfamily, the arthropods have members of all 14 specified classes except for class-X. Class-IX kinesins have only been identified in Apis mellifera and Pediculus humanus corporis. However, the function of class-IX kinesins is not clear yet. In addition to the kinesins that could be classified, each of the analyzed arthropod species contains two or more kinesin homologs that could not be assigned to any of the known classes. Two of these orphan kinesins have been identified in all arthropod species except Daphnia, but some arthropods contain further species-specific kinesins. Notably, Drosophila willistoni contains two further kinesins of which homologs have not been identified in any of the other sequenced arthropod genomes. Compared to the myosin repertoire, the kinesin inventory of the arthropods is far more varied. Although the analyzed arthropods have members of almost all classes, there are prominent differences in the subclass composition. Even the Drosophila species have different sets of kinesins. Thus, it is likely that the evolution of the kinesin diversity in arthropods is strongly determined by taxon- and species-specific gene losses and gene duplication events.
The arthropods contain a highly variable set of cytoplasmic dynein subunits. The dynein motor protein complex is built of dynein heavy chains, intermediate chains, light-intermediate chains, and the LC8, Roadblock, and TcTex light chains. All arthropods encode one dynein intermediate chain and one dynein light-intermediate chain. In addition, the closely related species Drosophila pseudoobscura and Drosophila persimilis contain another dynein light-intermediate chain. Of the light chains, the arthropods share one of each of the different types: the LC8, the Roadblock, and the TcTex light chains. All arthropods contain different numbers of further homologs of these light chains. Thus, they can build very specific cytoplasmic dynein complexes. For example, if all members of the Roadblock light chain family are also members of the cytoplasmic dynein complex, the Drosophila species could build up to nine different cytoplasmic dynein complexes just by exchanging light chains of the Roadblock family. These different Roadblock light chains might bind different cargoes, and by tissue-specific or developmentally regulated expression of these Roadblock genes the Drosophila species might be able to fine-tune their dynein-mediated transport processes. Thus, there are far more possibilities to adjust cargo binding by combining different light chains than by using the dynein activator complex, dynactin. The arthropods contain one of each of the eleven dynactin subunits. Alternative splice forms have not been identified. Only the Drosophila species contain a further homolog of the p150 (Glued) subunit, which has not been identified and characterized yet.
It has been observed that, given heterogeneous evolutionary rates, the results of the maximum likelihood method are statistically more robust than those produced by neighbour joining. Therefore, we conclude that Apis, Nasonia, and Pediculus are not monophyletic, but that Pediculus is more closely related to Daphnia. The class occurrence tree shows that the classification system we used for the protein families does not contradict the findings of the sequence-based phylogenetic inference.
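Statements about monophyly, such as the one above, can be checked mechanically on a rooted tree. The following minimal sketch represents a tree as nested tuples and tests whether a set of taxa corresponds exactly to one subtree. The toy tree is an assumption for illustration: it was assembled by hand from the branching order described in the text, with all Drosophila species collapsed into the single leaf "Dm", and it is not the published tree file. The function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels below a node (leaf = str, node = tuple)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree contains exactly the taxa in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy rooted tree (assumption, see lead-in); abbreviations as used in the text.
toy_tree = (
    "Dap",
    ("Pdc",
     (("Am", "Nav"),
      ("Tic",
       ("Bm",
        ((("Aea", "Cpq"), "Ang"), "Dm"))))),
)

print(is_monophyletic(toy_tree, {"Am", "Nav"}))         # Apis + Nasonia
print(is_monophyletic(toy_tree, {"Am", "Nav", "Pdc"}))  # plus Pediculus
print(is_monophyletic(toy_tree, {"Aea", "Cpq", "Ang"}))  # the mosquitos
```

The check only applies to rooted trees; for an unrooted tree, the outgroup would first have to be used to fix the root.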
Our study suggests the following phylogeny: The Drosophila clade is composed of the Drosophila simulans/Drosophila sechellia clade, which forms a clade with Drosophila melanogaster. This clade together with the Drosophila yakuba/Drosophila erecta clade forms the melanogaster subgroup. This subgroup together with Drosophila ananassae forms the melanogaster group. The melanogaster group is most closely related to the obscura group, a clade that consists of Drosophila pseudoobscura and Drosophila persimilis. The closest relative to the obscura group is Drosophila willistoni. All of the aforementioned species form the subgenus Sophophora. Its sister subgenus is Drosophila, consisting of the clade of Drosophila virilis/Drosophila mojavensis and Drosophila grimshawi (taxonomy as in ). The phylogeny of the Drosophila clade is in exact agreement with what has been found in an analysis based on the complete genome sequences of the twelve species.
The closest relatives to the Drosophila clade are Aedes aegypti and Culex pipiens, forming one clade, and Anopheles gambiae. All these species belong to the Diptera. The placement of the remaining species analyzed here is mainly in accordance with an analysis of 128 arthropod species that was based on 275 morphological variables as well as 18S and 28S rDNA data. In accordance with this study, the Lepidoptera, to which Bombyx mori belongs, are the closest relatives to the Diptera, forming the Mecopteroidea. Also in agreement with the morphological data, the Hymenoptera (Nasonia vitripennis/Apis mellifera) are basal to the Mecopteroidea, together forming the Holometabola, and the Phthiraptera (Pediculus humanus corporis) are basal to the Holometabola. The main difference between our study and the analysis of the morphological data is the placement of Tribolium castaneum, a Coleoptera species. Our study placed Tribolium closer to the Mecopteroidea, while the other study placed the Coleoptera outside the Hymenoptera and Mecopteroidea. Daphnia pulex, a Crustacea species, diverged earlier than all the Hexapoda species.
In this analysis, we were able to resolve the phylogenetic relationship of 21 completely sequenced arthropod species based on their motor proteins. A large number of sequences were used that have been checked manually. We have systematically analyzed the protein inventory of all species as well as the domain composition of all members of the four protein families in Daphnia pulex. When inferring phylogenetic trees from the sequence data, variations in evolutionary speed were accounted for by using a phylogenomics approach. This analysis produced a phylogenetic tree that is highly resolved and that has statistically well-supported branchings. Our findings are in accordance with results from studies based on whole genome and rDNA sequences as well as morphological variables. We can conclude that of all arthropods analyzed, Daphnia pulex is the most basal one. Pediculus humanus corporis is the closest relative to Daphnia, followed by the clade of Apis mellifera and Nasonia vitripennis. Next, Tribolium castaneum and Bombyx mori diverged, followed by the mosquito species and the Drosophila clade.
The genes for Aea, Ang, Am, Bm, Cpq, Da, Der, Dg, Dm, Dmo, Drp, Dp, Dse, Dss, Dv, Dy, Dw, Nav, Pdc, and Tic have been obtained by TBLASTN searches against the insects section of the NCBI wgs database. The Dap sequences have been obtained by TBLASTN searches against the 8.7× coverage Dappu v1.1 draft genome sequence assembly (September, 2006) provided by the DOE Joint Genome Institute and the Daphnia Genomics Consortium. All hits were manually analysed at the genomic DNA level. The correct coding sequences were identified with the help of the multiple sequence alignments of the corresponding proteins. In this process, the sequence alignments of all proteins contained in our in-house version of CyMoBase have been used. As the number of protein sequences increased (especially the number of sequences in classes with few representatives), many of the initially predicted sequences were reanalysed to correctly identify all exon borders. Where possible, EST data available from the NCBI EST database have been analysed to help in the annotation process. All sequence-related data (names, corresponding species, GenBank IDs, alternative names, corresponding publications, domain predictions, and sequences) and references to genome sequencing centers are available through CyMoBase [42, 43].
The phylogenetic trees based on protein sequences were generated using two different methods: (1) neighbour joining using the GONNET substitution matrix with bootstrapping (1,000 replicates) using ClustalW 2.0, and (2) maximum likelihood (ML) using a JTT model with estimated proportion of invariable sites and bootstrapping (1,000 replicates) using PHYML.
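Bootstrapping, as used with both tree-building methods, resamples the columns of an alignment with replacement to create pseudo-replicate alignments, from each of which a tree is then inferred. The minimal sketch below shows only this resampling step; it is not a description of how ClustalW or PHYML implement it internally. The function name, the toy alignment, and the fixed random seed are illustrative assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping species label -> aligned sequence
               (all sequences must have the same length).
    rng:       random.Random instance, so replicates are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; the same indices are
    # applied to every species so that columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Toy alignment (illustrative residues, not real motor protein sequences).
toy = {"Dm": "MKLVA", "Am": "MRLVS", "Dap": "MKIVA"}
rng = random.Random(0)
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates))
```

Each replicate has the same dimensions as the original alignment; the bootstrap support of a branch is the fraction of replicate trees in which that branch appears.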
The sequence data used for the analyses were multiple sequence alignments consisting either of single homologous sequences from each species or of multiple concatenated homologous sequences from each species (phylogenomics approach). For comparison, the multiple sequence alignments were used either including columns with gaps or with all gap-containing columns removed.
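The two alignment treatments described above, concatenation of several homologs per species and removal of gap-containing columns, can be sketched as follows. This is a minimal illustration under stated assumptions: alignments are dictionaries of equal-length strings, "-" denotes a gap, and a species missing from one alignment is padded with gaps (a common convention that the text does not specify). Function names and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments, species):
    """Join several per-protein alignments into one row per species."""
    rows = {sp: [] for sp in species}
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for sp in species:
            # Assumption: a species lacking this protein is padded with gaps.
            rows[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


def remove_gapped_columns(alignment):
    """Drop every alignment column in which at least one species has a gap."""
    names = list(alignment)
    kept = [col for col in zip(*(alignment[n] for n in names)) if "-" not in col]
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


# Toy alignments (illustrative residues only).
myosin = {"Dm": "MKL-A", "Am": "MRLVA", "Dap": "MKLVA"}
kinesin = {"Dm": "GSK", "Am": "GTK"}  # no Dap sequence in this toy example

combined = concatenate_alignments([myosin, kinesin], ["Dm", "Am", "Dap"])
ungapped = remove_gapped_columns(combined)
print(combined)
print(ungapped)
```

Note that gap padding combined with gap removal discards every column of a protein that is absent from any one species, which is one reason to compare both treatments.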
The class occurrence tree was generated using Bayesian inference with a binary model using MrBayes 3.1.2. For each species, the existence/non-existence of a protein class/variant was used as a binary character as depicted in Figure 7. Using this encoding, each species is represented by a series of binary characters, one for each protein class/variant. Constant rates were used; gamma-distributed rates gave very similar results. The tree was generated using 1,000,000 generations and a burn-in of 500,000 generations, since at that point the average standard deviation of split frequencies had fallen below 0.011.
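The binary encoding described above can be sketched in a few lines. The example below turns a per-species inventory of protein classes/variants into one string of 0/1 characters per species, with a fixed character order shared by all species. The species and class labels are abstract placeholders chosen for illustration; they are not the values shown in Figure 7, and the function name is hypothetical.

```python
def encode_presence_absence(inventory):
    """Encode class/variant occurrence as binary character strings.

    inventory: dict mapping species label -> set of class/variant labels
               present in that species.
    Returns (characters, matrix): the sorted list of all labels and a dict
    mapping each species to a string with '1' (present) or '0' (absent)
    for every label, in that order.
    """
    characters = sorted(set().union(*inventory.values()))
    matrix = {
        sp: "".join("1" if c in present else "0" for c in characters)
        for sp, present in inventory.items()
    }
    return characters, matrix


# Placeholder inventory (not real data).
toy_inventory = {
    "sp1": {"classA", "classB", "classC"},
    "sp2": {"classA", "classC"},
    "sp3": {"classB"},
}

characters, matrix = encode_presence_absence(toy_inventory)
print(characters)
for species, row in matrix.items():
    print(species, row)
```

A matrix of this form is the kind of input a binary-character model expects; how it is written to a file depends on the inference program and is not shown here.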
Protein domains were predicted using the SMART [48, 49] and Pfam [50, 51] web servers. The prediction of protein motifs (coiled coils, leucine zippers, etc.) is mainly based on the results of the predict-protein server [52, 53]. The IQ-motifs and N-terminal domains of the myosins were predicted manually based on the homology to similar domains of other myosins included in the multiple sequence alignment of the myosins. The recognition motifs included in the SMART and Pfam databases are too restrictive, as the motifs have been created based on the small datasets available some years ago.
This work has been funded by grant I80798 of the VolkswagenStiftung and grants KO 2251/3-1 and KO 2251/6-1 of the Deutsche Forschungsgemeinschaft.
The sequencing and portions of the analyses were performed at the DOE Joint Genome Institute under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Los Alamos National Laboratory under Contract No. W-7405-ENG-36 and in collaboration with the Daphnia Genomics Consortium (DGC). Additional analyses were performed by wFleaBase, developed at the Genome Informatics Lab of Indiana University with support to Don Gilbert from the National Science Foundation and the National Institutes of Health. Coordination infrastructure for the DGC is provided by The Center for Genomics and Bioinformatics at Indiana University, which is supported in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. Our work benefits from, and contributes to, the Daphnia Genomics Consortium.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.