- Research article
- Open Access
Comparative genomics of ParaHox clusters of teleost fishes: gene cluster breakup and the retention of gene sets following whole genome duplications
© Siegel et al; licensee BioMed Central Ltd. 2007
- Received: 13 April 2007
- Accepted: 06 September 2007
- Published: 06 September 2007
The evolutionary lineage leading to the teleost fish underwent a whole genome duplication, termed FSGD or 3R, in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). As a result of the FSGD, additional copies of genes are present in fish compared to tetrapods, whose lineage did not experience the 3R genome duplication. Interestingly, we find that, despite this additional genome duplication, the number of ParaHox genes in extant teleost fishes does not differ from that in mammals; the genes are, however, distributed over twice as many paralogous regions in fish genomes.
We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seem to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motifs of the genes belonging to the ParaHox paralogons.
There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since, with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined. The high degree of evolutionary conservation of this gene cluster's architecture in particular – but possibly of clusters of genes more generally – might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulatory network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator that is not located immediately next to its corresponding gene but further away, since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non-regulatory) genes associated with these homeobox gene clusters.
- Gene Clock
- Genome Duplication
- Whole Genome Duplication
- Relative Rate Test
- ParaHox Gene
The cichlid fish
Cichlids are one of the most diverse and species-rich families of fishes. With an estimated number of more than 3,000 species they alone represent more than ten percent of all fish species. The family Cichlidae belongs to the teleosts that, with more than 26,500 species, are the most diverse lineage of all vertebrates. Cichlids have a Gondwanan distribution and are found in India, Madagascar, South and Central America and Africa. They developed a stunning variety of coloration patterns, body shapes, behaviors and trophic as well as ecological specializations within a few million years (see [2–8]). Their unparalleled diversity made the cichlid species flocks a textbook example for parallel adaptive radiations and explosive speciation.
The evolutionary success of the cichlids has been attributed to morphological and behavioral patterns, although the relative importance of different mechanisms – as there is surely more than one – is still debated. One plausible factor that is at least partly responsible for the cichlids' unique diversity is the complexity of their breeding system and social behavior. Cichlids evolved a variety of brood care strategies and mating systems, and it is likely that female choice with respect to male coloration played an important role during cichlid evolution [2, 5, 9–11]. Another possible reason for their evolutionary success is the particular architecture of the cichlids' jaw apparatus. They possess two sets of jaws, one oral and one pharyngeal jaw derived from the fifth gill arch. These jaws evolved independently from each other and allow for an immense variety of possible feeding types leading to different diets. Therefore, many different niches could be colonized by cichlids. There is a large amount of behavioral and morphological divergence between different cichlid species in the East African lakes. Yet, rather surprising parallelisms have evolved in species flocks of the different lakes [3, 5, 8], indicating that the genetic "predisposition" for the modification of these traits might have been already present in the genome of the common ancestor of all the East African cichlid species. We assume that a substantial part of the necessary modifications of the cichlids' genome takes place in the regulatory elements of only a few important genes. To test this hypothesis it would be important to identify those genes of relevance in speciation. As part of this overall research effort we focus here on the ParaHox genes, a sister-cluster of the Hox genes that are crucial in development [13, 14].
Here, we report on an investigation of the genomics of the ParaHox C and D paralogons of the cichlid Astatotilapia burtoni and present the results of a comparison of some of its genomic features with those of other vertebrate ParaHox clusters.
Genome duplication, Hox- and ParaHox clusters in vertebrates
It has been suggested that gene or genome duplications might be important evolutionary mechanisms resulting in new copies of genes, which are then free to accumulate mutations and to evolve new or additional functions. Changes in regulatory elements of duplicated gene copies could, for example, cause neofunctionalization, i.e., the gain of a new function, or subfunctionalization, i.e., the subdivision of the original functions of the duplicated gene between the daughter genes. Genes under relaxed selection can arise after the duplication of single genes, large chromosomal fragments or even whole genomes [ and references therein]. For each of these three possibilities, different effects are characteristic: the preservation or disruption of regulatory control, the genomic context, the potential for dosage imbalance and, of course, the size of the duplicated fragment.
Duplications of genes as a consequence of the activity of transposable elements, unequal crossing-over and other mechanisms occur frequently in the course of vertebrate evolution. The duplication of whole genomes, however, is a rare event in animals, although there are quite a few polyploid species in some taxonomic groups such as frogs, salamanders and several fish lineages (such as salmonids, cyprinids and catfish). In plants polyploidy is a rather common phenomenon [24–26].
Several studies have proposed the existence of two rounds of whole genome duplications during vertebrate evolution (2R hypothesis) [[14, 27] and references therein]. More recent analyses revealed that in the lineage leading to the ray-finned fish, an additional genome duplication event, the fish-specific genome duplication (3R or FSGD), has occurred [28–33]. The 1R and 2R can be roughly dated to 430–750 mya [27, 34] in the lineage of the Gnathostomata. However, the phylogenetic relationships of the agnathan lineages to one another and to the vertebrates, as well as the timing of 1R and 2R, are not yet fully resolved. The FSGD [36, 37] took place in the lineage of ray-finned fish, after the separation of gars but before the origin of the Osteoglossomorpha, around 320 mya [27, 38].
Among the first to be discovered, and still among the most prominent examples of genes duplicated through whole genome duplications, are the Hox clusters. The number of Hox gene clusters and their genomic architecture in vertebrate genomes are an excellent illustration of the vertebrate genomic history of two rounds of genome duplications (1R, 2R), as well as an additional fish-specific genome duplication (3R/FSGD). One cluster is found in the genome of the cephalochordate Branchiostoma, and one cluster is assumed to be the ancestral state. Two rounds of genome duplication led to four copies in sharks and tetrapods, and another round of genome duplication along with reciprocal losses of genes led to a total of seven Hox clusters in teleost fish [33, 39]. Therefore it might be expected that the genes of the ParaHox clusters, just as those of the Hox clusters, should reflect the history of the last two genome duplications in fish as well.
The present study defines a ParaHox paralogon as the ParaHox gene(s) of a cluster together with the respective 3' adjoining genes (see Figure 1). Therefore, genes located towards the 5' end of the whole paralogon are referred to as 5' of a gene X, and genes located more towards the 3' end of a paralogon are referred to as 3' of a particular gene X, irrespective of the orientation of gene X.
To investigate the evolution of the vertebrate ParaHox paralogons C and D we shotgun sequenced a BAC clone from a BAC library of the East African cichlid fish Astatotilapia burtoni that contained the C1 ParaHox paralogon, i.e., the ParaHox gene gsh2 and its 3' adjoining genes. The obtained BAC contig (GenBank accession number EF526075) was then further analyzed and compared to the sequences of two other BAC clones of Astatotilapia burtoni, 20D21 (DQ386647) and 26M7 (DQ386648), containing the D1 and D2 paralogons.
Sequence assembly and analysis
Sequences for the C2 and the D ParaHox gene loci and their 3' adjoining genes were also retrieved from the aforementioned databases and aligned by hand. The locations of the different genes in the respective genome assemblies are summarized in Additional File 1.
Identification and characterization of Astatotilapia burtoni ParaHox paralogon containing BAC clones
The BAC library was screened for the C1 ParaHox paralogon gene kita as described previously. A PCR screen for the presence of the ParaHox gene gsh2 was subsequently performed to identify BAC clones covering the entire C1 ParaHox paralogon. The kita and gsh2 positive clone 99M12, which was determined to have an insert length of 154 kb, was chosen for further investigation. The BAC clone was shotgun sequenced and BAC contigs were assembled into a scaffold and a complete sequence as described earlier.
Using cDNAs as well as annotated and predicted genes of Homo sapiens, Mus musculus and Danio rerio available on NCBI, we deduced the coding sequences of Takifugu rubripes, Tetraodon nigroviridis, Oryzias latipes, Gasterosteus aculeatus and Astatotilapia burtoni. We were able to assemble the complete coding sequences of four of the five genes located on the BAC clone 99M12 of A. burtoni. The only incompletely assembled gene is kdrb, where approximately 200 bp of the coding sequence are missing.
From the beginning of the gene gsh2 to the end of clock, the sequence of the clone 99M12 spans 133.56 kb. This length was used for comparisons of the lengths of the C1 ParaHox paralogons of the different organisms used in this study (Homo sapiens, Danio rerio, Takifugu rubripes, Tetraodon nigroviridis, Oryzias latipes, Gasterosteus aculeatus and Astatotilapia burtoni) because the real length of the inserted gaps is unknown at present.
The ancestral ParaHox complex is fragmented in teleost fish. Therefore, we use the expression ParaHox "paralogon" instead of "cluster" since, especially in the case of the C2 ParaHox paralogon, not a single ParaHox gene is still present, and in the case of the D2 ParaHox paralogon, the data we investigated did not include the ParaHox cdx1b gene. Possibly the most interesting finding is that the ParaHox complex of teleost fish, even after another round of whole genome duplication (WGD), the FSGD, and subsequent deletion of genes, contains exactly the same number and the same orthologous set of ParaHox genes as the four mammalian ParaHox clusters, which did not experience the FSGD. This is all the more surprising since in teleosts all six ParaHox genes are distributed across seven instead of four paralogons, and there is not a single complete ParaHox cluster left in the fish lineage. As outlined above, the ParaHox paralogous genomic regions remain identifiable, and we wish to emphasize that the paralogous relationship of the RTKs and other genes 3' of the remnants of those ParaHox clusters remains intact. This is because the remaining genes of the ParaHox clusters and the 3' adjoining RTKs, as well as the genes clock and clock3 that lie directly 3' of the RTKs on the C1 and C2 ParaHox clusters respectively, clearly form paralogous genomic regions.
The C1, C2, D1 and D2 ParaHox paralogons
Using sequence orthology to Astatotilapia burtoni we were able to determine the C and D ParaHox paralogons of Homo sapiens, Mus musculus, Danio rerio, Takifugu rubripes, Tetraodon nigroviridis, Oryzias latipes and Gasterosteus aculeatus (Additional File 1).
We were unable to find the gene clock in the G. aculeatus genome except for two very short blast hits. Since each of the genes of the C1 ParaHox paralogon of this species lies on different contigs within one scaffold, it seems likely that this gene was not correctly assembled in the current release of the stickleback genome. The entire D2 ParaHox paralogon of D. rerio and a major portion of the expected paralogon of G. aculeatus could not be located in the current releases of public genomic databases. The flt4 gene of M. musculus was relocated to another chromosome and the flt4 of H. sapiens was relocated to a location 30 Mb 5' of cdx1. Therefore it was excluded from the following analyses. The cdx1b of Tetraodon nigroviridis is relocated as well and the cdx1b of Oryzias latipes is reversed. All other examined organisms kept the orientations and positions unaltered in reference to the more 5' genes, but the distance to those is always very large (Figure 5).
The orientation and the order of the genes of the C ParaHox paralogons are conserved in all vertebrate species examined (Figures 3 and 4), implying that the orientation and order of these genes have remained unchanged for more than 450 my. The genes gsh2, pdgfrα and kita, with its paralogous gene kitb, all have a 5' – 3' orientation. The genes kdrb and clock, as well as its paralogous gene clock3, show a 3' – 5' orientation (Figures 3 and 4, [42, 45]). The genomic architecture of the D1 and D2 ParaHox paralogons is less conserved. The orientation of the genes has stayed the same in all but one species included in this study. Yet, in four genomes the position of a gene compared to pdgfrβa/b and csf1ra/b has changed (Figure 5). Furthermore, the csf1rb gene seems to have been lost in Danio rerio. The gene content of all other paralogons examined in this study was completely conserved.
The presumed ancestral condition of the C ParaHox paralogon, which can still be found in mammalian genomes [43, 45], is also conserved in all teleostean a-copies of this ParaHox paralogon (Figures 1a and 3). In all organisms examined here, the b-copy of the C ParaHox complex has lost genes, namely gsh2, pdgfrα and kdrb. The remaining genes of this paralogon nevertheless retained their orientation. A similar scenario can be seen in the D ParaHox paralogon. Here only the b-copy of the gene flt4, a 2R-paralog of kdrb, was lost. We found no trace of clock-like genes 3' of the RTKs, so we cannot say whether both clock copies were deleted or whether the clock precursor came to be located in the C ParaHox paralogon only after the precursor of the C and the D ParaHox paralogons was duplicated. The latter scenario would imply that there never was a clock-like gene in the D1 and D2 ParaHox paralogons.
It seems quite remarkable that this gene complex maintained both its gene order and its gene orientation (with the exception of two genes) over very long evolutionary time spans. Only the lengths of the introns and intergenic regions differ between the species examined. Possible reasons for this conservation might be related to the presence of promoter, enhancer or inhibitor motifs in that complex that influence more than just one gene, so that an inversion or a relocation of one gene would possibly disrupt the regulated expression of this and other nearby genes. If such a co-regulation exists, it might explain the maintenance of the gene order and gene orientation over the course of vertebrate evolution. Chiou et al. showed an example of the important role of clustering in the regulation of the expression of biosynthetic genes in Aspergillus parasiticus.
Another possibility might be that a regulator is not located immediately next to its corresponding gene but at a distance, and that other genes exist between regulator and corresponding gene. A relocation or inversion in such a case is expected to lead to disruption of the interactions. It has already been shown that regulatory genes or regions lying in a gene cluster are able to control the expression of genes outside of this cluster. Nevertheless, the selective pressures leading to the maintenance of gene clusters are still poorly understood.
In both the C and the D paralogons, only the a-copy retained the ParaHox gene. In the b-copy it was either lost (C2 copy) or relocated (D2 copy, Figure 5). So the b-paralogon in both cases lost more genes than the a-paralogon. Therefore, when comparing the C and the D ParaHox paralogons it is apparent that the a-copies of the ParaHox paralogons C and D of the teleosts are more conserved and show a higher degree of synteny with the mammalian ParaHox paralogons than the b-copies. Interestingly, this finding is similar to the pattern previously found in the Hox clusters. This finding implies that one copy of the paralogon pair evolved faster than the other. That this is usually the b-copy is explained most easily by the fact that the more conserved (a) copy is much more likely to be discovered and named first.
Table: Nonparametric Relative Rate Tests of the C and D ParaHox paralogons (columns: Nucleotide Sequence, first and second codon position; Amino Acid Sequence).
A comparison of the C1 ParaHox paralogons of Homo sapiens, Astatotilapia burtoni, Danio rerio, Takifugu rubripes, Tetraodon nigroviridis and Gasterosteus aculeatus (Figures 3 and 4) showed that the cichlid sequence is of an intermediate length. It is considerably shorter than that of H. sapiens (10%), D. rerio (39%) and O. latipes (34%) but longer than that of T. nigroviridis (150%), T. rubripes (141%) and G. aculeatus (135%) (see Additional file 2). In O. latipes only fragments of clock, the last gene of the paralogon, could be found. Because of the seemingly incomplete assembly in this genomic region, 34% might not be the final result.
While the C1 paralogons of teleosts show a similar genome size to cluster size relationship as the mammalian clusters, the C2 clusters are much more condensed but also show the same trend, namely that the cluster size is linked to overall genome size (Figure 7). The D1 paralogons are also much more condensed than the C1 paralogons, but they too display a linear relationship between genome and cluster size, including the mammalian sequences. An obvious deviation from the previously described pattern can be seen in the D2 paralogons of the pufferfishes. Relative to their very compact genome size, the D2 paralogon is surprisingly large. A possible reason for this could be that the maximal condensation of this cluster has already been reached and a further condensation might be detrimental in terms of selection. We can only speculate on this, but the minimum absolute size of the ParaHox paralogons might be determined by the necessary spatial relationships of the individual transcriptional units within these gene complexes: regulatory regions might need to be kept at a minimal distance in the intergenic regions of adjacent genes in order to maintain the proper function of these genes. The Astatotilapia burtoni D2 ParaHox paralogon could not be included in this comparison because the gene cdx1b is not on the investigated BAC clone.
Gene cluster breakup and gene retention after genome duplications
The FSGD provides the opportunity to study genomes following a whole genome duplication event [ and references therein]. For the Hox gene clusters of teleosts, it has been observed before [ and references therein] that, although all fish genomes studied so far vary in the gene content and even in the number of Hox gene clusters, the total number of Hox genes contained in their genomes is about the same as in the genomes of tetrapods, which did not experience this WGD. It has been suggested that the Hox gene clusters in particular are typically maintained more or less intact, because they are likely to be strongly regulated by sequential activation and cluster completeness is necessitated by correct interdigitated gene control.
What seems remarkable as well is that the evolutionary forces keeping Hox gene number rather constant seem to be stronger than those that maintain the cohesion and physical linkage on chromosomes of individual clusters following a WGD. Mulley et al. noted that the ParaHox cluster stayed intact in ancestral fish lineages such as Amia and Polypterus, yet noted the fragmentation of the ParaHox clusters in teleosts, which happened due to gene loss and not because of transpositions or inversions. The FSGD duplicated all genomic regions including the clustered sets of homeobox genes such as Hox, ParaHox and NK. The selection pressures that maintained those clusters intact in part of the metazoans seem to be relaxed, as for many of these gene clusters several genes seem to have been lost, despite the fact that these often apparently co-regulated arrays of genes seem to share enhancers and are regulated in an interdigitated fashion (Figure 1). Mulley et al. proposed that the maintenance of a gene cluster is based on interdigitated and/or shared enhancers. The FSGD duplicated not only the genes but also the enhancers and therefore might have released the need for a tight clustering. Our analysis of the ParaHox clusters in teleosts supports this idea in so far as the ParaHox clusters are broken up. Yet, the total number of six ParaHox genes is maintained in post-duplication teleost genomes. If the comparison is extended to a larger paralogon than the set of ParaHox genes alone – as was done in this study – it becomes clear that in larger genomic regions the constancy of gene numbers does not persist. Our analysis shows that some, although not all, additional duplicated genes flanking the ParaHox clusters were retained following a WGD (Figure 1). This might imply that different selective forces might be acting, such as increased tolerance to more gene product due to the doubled number of genes, or functional changes (sub-, neofunctionalization) of those genes.
This finding might argue that although differential gene loss on different chromosomal regions is permitted following a WGD through genetic redundancy of cis-regulatory elements, the overall constancy of gene number is strongly selected for by balancing selection, at least for transcription factors such as ParaHox genes. Balancing selection might be acting on trans-regulatory mechanisms to counteract possibly negative effects of dosage differences. Moreover, possibly weaker selective forces against duplicate genes might permit the retention, after a WGD, of genes outside of gene clusters that are probably not co-regulated. It seems plausible that these different selective forces might have to do not only with the arrangement of genes in clusters, but also with the kind of gene that is duplicated (e.g., regulatory genes vs. housekeeping genes). Again, selection might act more strongly in bringing about the loss of interdigitated genes within clusters following a WGD to maintain a modal gene number per genome of these clustered homeobox genes in order to reduce potentially negative changes in dosage following a WGD. The fact that the number of ParaHox genes before and after the FSGD remained unchanged indicates a possibly strong regulatory gene dose restriction that would select for the rapid loss of "superfluous" genes. With the exception of the gene cdx1, no gene of the ancestral ParaHox cluster was retained in two copies. Possibly one of the two cdx1 genes may be compensating for the loss of the cdx2 gene, hence the retention of two cdx1 genes (Figure 1b).
Recently Negre and Ruiz have discovered a surprising diversity of Hox gene cluster architectures in different species of Drosophila. Since breaks and inversions were found not too infrequently, they argue that the integrity and organization of Hox clusters is not the strongest target of selection. Rather they argue that functional constraints on individual Hox genes might be acting more forcefully on genomes so that functional sets of homeobox genes are maintained in the genome, which are not necessarily physically linked with unbroken colinearity. Other studies showed that an intact cluster is only important for temporal and not for spatial colinearity. In Drosophila, where development is so rapid that almost all the Hox genes are activated at the same time, the cluster is permitted to be interrupted [ and references therein]. Similar reasoning might explain the situation we describe for "dissolved" ParaHox paralogons. Their genomically fixed gene content and orientation in teleost genomes, together with their dispersed distribution over seven instead of four chromosomal regions, would support the hypothesis that overall gene content is more strongly selected for than the integrity of gene clusters.
We demonstrated the orthologous relationship of the genes of the C and D ParaHox paralogons (Figure 2). Relative rate tests revealed that with the exception of one gene the a-copy always evolves more slowly than the b-copy, the exception being the ParaHox gene cdx1, where the b-copy evolves significantly slower. The relative rate tests also show that the C paralogons evolve more slowly than the D paralogons.
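The relative rate comparison summarized above can be illustrated with a small sketch. The text does not state which nonparametric test was applied, so the example below assumes Tajima's one-degree-of-freedom test, a widely used nonparametric relative rate test; the function names and the toy sequences in the usage note are hypothetical and are not taken from the study. A helper is included for restricting a coding alignment to first and second codon positions, as in the nucleotide comparison.

```python
from math import erfc, sqrt


def first_second_codon_positions(cds):
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


def tajima_relative_rate(seq_a, seq_b, outgroup, alphabet="ACGT"):
    """Tajima-style relative rate test on three aligned sequences.

    Assumption: this is one common nonparametric test, not necessarily
    the one used in the paper. Returns (m_a, m_b, chi2, p_value), where
    m_a / m_b count sites at which only lineage a / b differs from the
    outgroup.
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    valid = set(alphabet)
    m_a = m_b = 0
    for a, b, o in zip(seq_a.upper(), seq_b.upper(), outgroup.upper()):
        if not {a, b, o} <= valid:
            continue  # skip gaps and ambiguous characters
        if a != b:
            if b == o:
                m_a += 1  # change unique to lineage a
            elif a == o:
                m_b += 1  # change unique to lineage b
    total = m_a + m_b
    if total == 0:
        return m_a, m_b, 0.0, 1.0
    chi2 = (m_a - m_b) ** 2 / total
    # Survival function of a chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return m_a, m_b, chi2, p_value
```

As a usage sketch, `tajima_relative_rate(a_copy, b_copy, outgroup_seq)` with an a-copy, a b-copy and an unduplicated outgroup ortholog would report a larger `m_b` than `m_a` if the b-copy evolves faster. For amino acid alignments, the `alphabet` argument would need to be set to the 20 amino acid letters.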
An mVista analysis of the D clusters was performed in an earlier study. We found a number of conserved genomic regions in the C1 ParaHox paralogon that were located in intergenic regions. One conserved sequence block, located at the position 119–130 kb on the A. burtoni BAC clone 99M12, was confirmed by BLAST search to be another gene, the transmembrane protein HPT-1. We also found evidence that the ParaHox paralogon of the pufferfishes is apparently close to the maximal possible reduction in size.
Despite having undergone an additional genome duplication, the total number of ParaHox genes in the genome of teleost fish is maintained at six genes that are distributed over seven chromosomal regions instead of four as in the genomes of tetrapods. Other genes that are physically linked with the ParaHox genes in the same paralogon were also reduced in number following the FSGD. However, while typically ten of these are found in tetrapods, 14 are maintained in teleost fish genomes. We discuss possible selective reasons for keeping modal numbers of homeobox genes constant throughout hundreds of millions of years of evolution while permitting the differential loss of ParaHox genes on some ParaHox paralogons.
Future research should include the description of possible binding sites in the conserved elements and functional studies of those putative regulatory elements found by in silico analyses.
BAC Library screening & Shotgun Sequencing
We previously constructed a BAC library of the East African haplochromine cichlid fish Astatotilapia burtoni. This library was screened for kita positive clones with the kita specific primer set Burt-Kit-F-474/Burt-Kit_R-672 as described previously. Using universal primers (gsh2_Ex1_For (AGAYCCCAGRAGATACCACT) and gsh2_Ex2.3_R (GTGCGCGCTCCTCTGGGTG)) designed on known teleost sequences, we confirmed the presence of the gsh2 gene on the BAC clone. The BAC plasmids of the recovered clones were extracted using the Large-Construct Kit (Qiagen) according to the manufacturer's manual and then sheared by sonication. The fraction of 2–3 kb was recovered from an agarose gel, blunt-end-ligated into the pUC18 vector (Roche) and later electro-transformed into "Electro Max DH10B T1 Phage Resistant Cells" (Invitrogen). The subclones were grown in standard LB-medium (0.5 mg/ml ampicillin). The plasmid DNA was recovered using standard methods. The clones were sequenced directly using a standard M13F/M13R primer set on an ABI3100 automatic DNA sequencer (Applied Biosystems).
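The forward gsh2 primer quoted above contains the IUPAC ambiguity codes Y (C or T) and R (A or G), so it stands for a small pool of concrete oligonucleotides. The following sketch, which is not part of the original protocol, expands a degenerate primer into its variants and into a regular expression for locating candidate binding sites in a template sequence; the function names and the template in the test are hypothetical.

```python
import re
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(variant) for variant in product(*choices)]


def primer_regex(primer):
    """Compile a regular expression matching any variant of the primer."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


# Primer sequences as given in the text.
GSH2_EX1_FOR = "AGAYCCCAGRAGATACCACT"
GSH2_EX2_3_R = "GTGCGCGCTCCTCTGGGTG"
```

With these definitions, `expand_primer(GSH2_EX1_FOR)` yields four variants, while the reverse primer contains no ambiguity codes and expands to itself. Note that the sketch only searches the strand it is given; a reverse primer would have to be reverse-complemented before searching the forward strand.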
The obtained sequences were quality trimmed by hand and checked for vector sequences using Sequencher 4.2 (Gene Codes Corporation). The same software was used for the contig assembly at the setting "dirty data", with a sequence similarity of 85% and an overlap of 20 bp. The full sequence of the genome of E. coli from the GenBank database was added to the analyses, so that all E. coli contaminated reads were filtered out of the assembly. Gaps between contigs were closed with gap spanning primers, designed with Primer3. For further analyses the remaining gaps were each filled with a spacer of 33 N's.
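Filling the remaining gaps with fixed-length runs of N can be sketched as follows. This is a minimal illustration rather than the procedure actually used; only the spacer length of 33 comes from the text, while the function names and the toy contigs in the usage note are hypothetical.

```python
import re

GAP_LENGTH = 33  # spacer length stated in the text


def join_contigs(contigs, gap_length=GAP_LENGTH):
    """Concatenate ordered contigs, separating them by runs of N."""
    spacer = "N" * gap_length
    return spacer.join(contigs)


def gap_coordinates(scaffold, gap_length=GAP_LENGTH):
    """Return 0-based, half-open (start, end) intervals of spacer gaps.

    Only N-runs of exactly gap_length are reported, so shorter stretches
    of ambiguous bases inside a contig are not mistaken for spacers.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"N+", scaffold)
        if m.end() - m.start() == gap_length
    ]


def ungapped_length(scaffold, gap_length=GAP_LENGTH):
    """Scaffold length with the placeholder spacers subtracted."""
    return len(scaffold) - gap_length * len(gap_coordinates(scaffold, gap_length))
```

For example, `join_contigs(["ACGT", "GGCC", "TTAA"])` gives a 78 bp scaffold with spacers at positions 4–37 and 41–74. Because the true gap sizes are unknown, any length measured across such a scaffold is a placeholder-dependent value rather than the real genomic distance.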
The contigs of the BAC clone 99M12 were checked for corresponding forward/reverse clones in other contigs and a contig map was drawn. To check this map, the contigs were assembled into one single sequence according to the contig map. Using the tool bl2seq (align two sequences) (GenBank database), the contigs were BLAST-searched against chromosome 11 of Tetraodon nigroviridis containing its C1 ParaHox cluster. The contig map was then corrected using the information from the bl2seq analysis.
The orthology of the sequenced genes was determined by sequence comparisons with the available genomes of Takifugu rubripes (version 4.0), Tetraodon nigroviridis (version 1–64), Oryzias latipes (version 200506) and Gasterosteus aculeatus (version 41); already annotated genes from Danio rerio, Mus musculus and Homo sapiens were taken from GenBank. The annotations of the Homo sapiens, Mus musculus and Danio rerio sequences provided in the GenBank database were used to help identify the intron/exon structure of the respective genes in Takifugu rubripes, Tetraodon nigroviridis and Oryzias latipes. In some cases the Danio rerio sequence could not be included in the analyses due to apparent misassemblies.
Phylogenetic and Sequence Analyses
For phylogenetic analyses, Nexus files were processed via PAUP* to eliminate positions that could not be aligned. The appropriate models of molecular evolution were estimated using the program modelgenerator. Maximum likelihood trees and bootstrapping (1000 replicates) were calculated in PHYML. Bayesian Inference was performed in MrBayes 3.1 [70, 71] (1,000,000 generations/5000 burnin).
Vista Plots were obtained via the mVista option on the Vista homepage. The alignment program used was LAGAN (Global multiple alignment of finished sequences). For Homo sapiens the human/primate-specific RepeatMasker and for Mus musculus the mouse/rat/rodent-specific RepeatMasker was used. For all other sequences the fugu-specific RepeatMasker was used as a stand-in.
Support from the Deutsche Forschungsgemeinschaft (DFG), from the European Community, and the Landesstiftung Baden-Württemberg GmbH is gratefully acknowledged.
- Nelson JS: Fishes of the world. 2006, Hoboken, New Jersey, John Wiley & Sons.
- Fryer G, Iles TD: The Cichlid Fishes of the Great Lakes of Africa: Their Biology and Evolution. Oliver and Boyd, Edinburgh. 1972.
- Kocher TD: Adaptive evolution and explosive speciation: the cichlid fish model. Nature Reviews Genetics. 2004, 5 (4): 288-298. 10.1038/nrg1316.
- Kornfield I, Smith PF: African Cichlid Fishes: Model Systems for Evolutionary Biology. Annual Reviews in Ecology and Systematics. 2000, 31: 163-196. 10.1146/annurev.ecolsys.31.1.163.
- Meyer A: Phylogenetic relationships and evolutionary processes in East African cichlids. Trends in Ecology and Evolution. 1993, 8: 279-284. 10.1016/0169-5347(93)90255-N.
- Salzburger W, Meyer A: The species flocks of East African cichlid fishes: recent advances in molecular phylogenetics and population genetics. Naturwissenschaften. 2004, 91 (6): 277-290. 10.1007/s00114-004-0528-6.
- Schluter D: The ecology of adaptive radiation. Oxford University Press, New York. 2000.
- Stiassny MLJ, Meyer A: Cichlids of the Rift Lakes. Scientific American. 1999, February: 64-69.
- Turner GF, Burrows MT: A model of sympatric speciation by sexual selection. Proceedings Biological sciences / The Royal Society B. 1995, 260: 287-292. 10.1098/rspb.1995.0093.
- Barlow GW: The cichlid fishes. Nature's grand experiment in evolution. 2000, Cambridge, Perseus Publishing.
- Salzburger W, Niederstatter H, Brandstatter A, Berger B, Parson W, Snoeks J, Sturmbauer C: Colour-assortative mating among populations of Tropheus moorii, a cichlid fish from Lake Tanganyika, East Africa. Proc Biol Sci. 2006, 273 (1584): 257-266. 10.1098/rspb.2005.3321.
- Liem KF: Adaptive significance of intra- and interspecific differences in the feeding repertoires of cichlid fishes. American Zoologist. 1980, 20: 295-314.
- Garcia-Fernandez J: Hox, ParaHox, ProtoHox: facts and guesses. Heredity. 2005, 94 (2): 145-152. 10.1038/sj.hdy.6800621.
- Holland PW, Garcia-Fernandez J, Williams NA, Sidow A: Gene duplications and the origins of vertebrate development. Dev Suppl. 1994, 125-133.
- Ohno S: Evolution by Gene Duplication. Springer-Verlag, New York. 1970.
- Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999, 151 (4): 1531-1545.
- Meyer A, Schartl M: Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol. 1999, 11 (6): 699-704. 10.1016/S0955-0674(99)00039-3.
- Durand D, Hoberman R: Diagnosing duplications--can it be done?. Trends in genetics. 2006, 22 (3): 156-164. 10.1016/j.tig.2006.01.002.
- Long M: Evolution of novel genes. Curr Opin Genet Dev. 2001, 11 (6): 673-680. 10.1016/S0959-437X(00)00252-5.
- Tymowska J, Fischberg M, Tinsley RC: The karyotype of the tetraploid species Xenopus vestitus Laurent (Anura: pipidae). Cytogenetics and cell genetics. 1977, 19 (6): 344-354.
- Beetschen JC: [5 generations of polyploid individuals in the salamander, Pleurodeles watlii Michah]. C R Seances Soc Biol Fil. 1967, 161 (4): 930-936.
- Danzmann RG, Cairney M, Davidson WS, Ferguson MM, Gharbi K, Guyomard R, Holm LE, Leder E, Okamoto N, Ozaki A, Rexroad CE, Sakamoto T, Taggart JB, Woram RA: A comparative analysis of the rainbow trout genome with 2 other species of fish (Arctic charr and Atlantic salmon) within the tetraploid derivative Salmonidae family (subfamily: Salmoninae). Genome. 2005, 48 (6): 1037-1051. 10.1139/g05-067.
- Leggatt RA, Iwama GK: Occurrence of polyploidy in the fishes. Reviews in Fish Biology and Fisheries. 2004, 13: 237–246.
- Hanson RE, Islam-Faridi MN, Percival EA, Crane CF, Ji Y, McKnight TD, Stelly DM, Price HJ: Distribution of 5S and 18S-28S rDNA loci in a tetraploid cotton (Gossypium hirsutum L.) and its putative diploid ancestors. Chromosoma. 1996, 105 (1): 55-61.
- Islam N, Tsujimoto H, Hirano H: Proteome analysis of diploid, tetraploid and hexaploid wheat: towards understanding genome interaction in protein expression. Proteomics. 2003, 3 (4): 549-557. 10.1002/pmic.200390068.
- Patterson JT, Larson SR, Johnson PG: Genome relationships in polyploid Poa pratensis and other Poa species inferred from phylogenetic analysis of nuclear and chloroplast DNA sequences. Genome. 2005, 48 (1): 76-87. 10.1139/g04-102.
- Vandepoele K, De Vos W, Taylor JS, Meyer A, Van de Peer Y: Major events in the genome evolution of vertebrates: Paranome age and size differs considerably between ray-finned fishes and land vertebrates. Proc Natl Acad Sci USA. 2004, 101 (6): 1638-1643. 10.1073/pnas.0307968100.
- Blomme T, Vandepoele K, De Bodt S, Simillion C, Maere S, Van de Peer Y: The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biol. 2006, 7 (5): R43. 10.1186/gb-2006-7-5-r43.
- Amores A, Suzuki T, Yan YL, Pomeroy J, Singer A, Amemiya C, Postlethwait JH: Developmental roles of pufferfish Hox clusters and genome evolution in ray-fin fish. Genome Res. 2004, 14 (1): 1-10. 10.1101/gr.1717804.
- Hoegg S, Brinkmann H, Taylor JS, Meyer A: Phylogenetic timing of the fish-specific genome duplication correlates with the diversification of teleost fish. Journal of molecular evolution. 2004, 59 (2): 190-203. 10.1007/s00239-004-2613-z.
- Taylor JS, Braasch I, Frickey T, Meyer A, Van de Peer Y: Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome Res. 2003, 13: 382-390. 10.1101/gr.640303.
- Taylor JS, Van de Peer Y, Braasch I, Meyer A: Comparative genomics provides evidence for an ancient genome duplication event in fish. Phil Trans R Soc Lond B Biol Sci. 2001, 356 (1414): 1661-1679. 10.1098/rstb.2001.0975.
- Meyer A, Van de Peer Y: From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 2005, 27 (9): 937-945. 10.1002/bies.20293.
- Gu X, Wang Y, Gu J: Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nature Genetics. 2002, 31: 205-209. 10.1038/ng902.
- Delsuc F, Brinkmann H, Chourrout D, Philippe H: Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature. 2006, 439 (7079): 965-968. 10.1038/nature04336.
- Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, Westerfield M, Ekker M, Postlethwait JH: Zebrafish hox clusters and vertebrate genome evolution. Science. 1998, 282 (5394): 1711-1714. 10.1126/science.282.5394.1711.
- Wittbrodt J, Meyer A, Schartl M: More genes in fish?. BioEssays. 1998, 20: 511-515. 10.1002/(SICI)1521-1878(199806)20:6<511::AID-BIES10>3.0.CO;2-3.
- Christoffels A, Koh EG, Chia JM, Brenner S, Aparicio S, Venkatesh B: Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol. 2004, 21 (6): 1146-1151. 10.1093/molbev/msh114.
- Hoegg S, Meyer A: Hox clusters as models for vertebrate genome evolution. Trends in genetics. 2005, 21 (8): 421-424. 10.1016/j.tig.2005.06.004.
- Garcia-Fernandez J, Holland PW: Archetypal organization of the amphioxus Hox gene cluster. Nature. 1994, 370 (6490): 563-566. 10.1038/370563a0.
- Chourrout D, Delsuc F, Chourrout P, Edvardsen RB, Rentzsch F, Renfer E, Jensen MF, Zhu B, de Jong P, Steele RE, Technau U: Minimal ProtoHox cluster inferred from bilaterian and cnidarian Hox complements. Nature. 2006, 442 (7103): 684-687. 10.1038/nature04863.
- Mulley JF, Chiu CH, Holland PW: Breakup of a homeobox cluster after genome duplication in teleosts. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103 (27): 10369-10372. 10.1073/pnas.0600341103.
- Ferrier DE, Dewar K, Cook A, Chang JL, Hill-Force A, Amemiya C: The chordate ParaHox cluster. Curr Biol. 2005, 15 (20): R820-2. 10.1016/j.cub.2005.10.014.
- Minguillon C, Garcia-Fernandez J: Genesis and evolution of the Evx and Mox genes and the extended Hox and ParaHox gene clusters. Genome Biol. 2003, 4 (2): R12. 10.1186/gb-2003-4-2-r12.
- Prohaska SJ, Stadler PF: Evolution of the vertebrate parahox clusters. J Exp Zoolog B Mol Dev Evol. 2006.
- Braasch I, Salzburger W, Meyer A: Asymmetric evolution in two fish-specifically duplicated receptor tyrosine kinase paralogons involved in teleost coloration. Mol Biol Evol. 2006, 23 (6): 1192-1202. 10.1093/molbev/msk003.
- Parichy DM, Turner JM: Temporal and cellular requirements for Fms signaling during zebrafish adult pigment pattern development. Development. 2003, 130 (5): 817-833. 10.1242/dev.00307.
- Parichy DM, Ransom DG, Paw B, Zon LI, Johnson SL: An orthologue of the kit-related gene fms is required for development of neural crest-derived xanthophores and a subpopulation of adult melanocytes in the zebrafish, Danio rerio. Development. 2000, 127 (14): 3031-3044.
- Parichy DM, Rawls JF, Pratt SJ, Whitfield TT, Johnson SL: Zebrafish sparse corresponds to an orthologue of c-kit and is required for the morphogenesis of a subpopulation of melanocytes, but is not essential for hematopoiesis or primordial germ cell development. Development. 1999, 126 (15): 3425-3436.
- Lang M, Miyake T, Braasch I, Tinnemore D, Siegel N, Salzburger W, Amemiya CT, Meyer A: A BAC library of the East African haplochromine cichlid fish Astatotilapia burtoni. J Exp Zoolog B Mol Dev Evol. 2006, 306 (1): 35-44. 10.1002/jez.b.21068.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
- Blair JE, Hedges SB: Molecular phylogeny and divergence times of deuterostome animals. Mol Biol Evol. 2005, 22 (11): 2275-2284. 10.1093/molbev/msi225.
- Chiou CH, Miller M, Wilson DL, Trail F, Linz JE: Chromosomal location plays a role in regulation of aflatoxin gene expression in Aspergillus parasiticus. Appl Environ Microbiol. 2002, 68 (1): 306-315. 10.1128/AEM.68.1.306-315.2002.
- Price MS, Yu J, Nierman WC, Kim HS, Pritchard B, Jacobus CA, Bhatnagar D, Cleveland TE, Payne GA: The aflatoxin pathway regulator AflR induces gene transcription inside and outside of the aflatoxin biosynthetic cluster. FEMS Microbiol Lett. 2006, 255 (2): 275-279. 10.1111/j.1574-6968.2005.00084.x.
- Wagner GP, Takahashi K, Lynch V, Prohaska SJ, Fried C, Stadler PF, Amemiya C: Molecular evolution of duplicated ray finned fish HoxA clusters: increased synonymous substitution rate and asymmetrical co-divergence of coding and non-coding sequences. J Mol Evol. 2005, 60 (5): 665-676. 10.1007/s00239-004-0252-z.
- Gregory TR: Animal Genome Size Database. http://www.genomesize.com. 2005.
- Pearson JC, Lemons D, McGinnis W: Modulating Hox gene functions during animal body patterning. Nat Rev Genet. 2005, 6 (12): 893-904.
- Garcia-Fernandez J: The genesis and evolution of homeobox gene clusters. Nat Rev Genet. 2005, 6 (12): 881-892.
- Negre B, Casillas S, Suzanne M, Sanchez-Herrero E, Akam M, Nefedov M, Barbadilla A, de Jong P, Ruiz A: Conservation of regulatory sequences and gene expression patterns in the disintegrating Drosophila Hox gene complex. Genome Res. 2005, 15 (5): 692-700. 10.1101/gr.3468605.
- Monteiro AS, Ferrier DE: Hox genes are not always Colinear. Int J Biol Sci. 2006, 2 (3): 95-103.
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Bioinformatics Methods and Protocols: Methods in Molecular Biology Humana Press Totowa NJ. 2000, 132: 365-386.
- JGI Database. [http://www.jgi.doe.gov/]
- Genoscope. [http://www.genoscope.cns.fr/externe/tetranew/]
- Medaka Genome Project. [http://dolphin.lab.nig.ac.jp/medaka/]
- Ensembl Genome Browser. [http://www.ensembl.org/index.html]
- Rogers JS, Swofford DL: A fast method for approximating maximum likelihoods of phylogenetic trees from nucleotide sequences. Systematic biology. 1998, 47 (1): 77-89. 10.1080/106351598261049.
- Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO: Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evolutionary Biology. 2006, 6: 29. 10.1186/1471-2148-6-29.
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic biology. 2003, 52 (5): 696-704. 10.1080/10635150390235520.
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17 (8): 754-755. 10.1093/bioinformatics/17.8.754.
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19 (12): 1572-1574. 10.1093/bioinformatics/btg180.
- Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I: VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000, 16 (11): 1046-1047. 10.1093/bioinformatics/16.11.1046.
- Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome research. 2003, 13 (4): 721-731. 10.1101/gr.926603.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.