The increasing number of sequenced genomes provides molecular explanations for both the unity and the diversity of living organisms. The more divergent two organisms are, the fewer genes they share. This explains why annotating a genome with genes of known function from other organisms leaves a large number of predicted genes with no predicted function. For some prokaryotes, the proportion of genes with no predicted function reaches 65%, whereas it falls to 20% for the closely related vertebrate genomes [1–3].
The majority of genes with no assigned function are those involved in the recent evolutionary success of the taxonomic group considered. This is true both for prokaryotes that have developed original metabolisms allowing growth in special environments and for vertebrate species that have developed original solutions in response to environmental pressures. Comparison of mammalian proteins shows that host defense ligands and receptors make up the group of proteins that diverge the most rapidly. According to the «red queen model», pathogen pressure is, at short time scales, the most drastic pressure on the evolution of vertebrate species.
At the genomic level, together with the mutation or modification of regulatory elements, three driving forces are instrumental in diversification. The first is the emergence of new domain architectures through domain accretion and shuffling, the second is the deletion of genes, and the third is the expansion of a gene family by either gene duplication or retroposition. Lineage specific expansion (LSE) is the proliferation of a given gene family in a given lineage. Describing an LSE requires the comparison of sister lineages. Using predicted proteomes, Lespinet et al. have recently performed a systematic comparative analysis of LSEs in the following eukaryotic genomes: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana. They reached the conclusion that «LSE seems to be one of the most important sources of structural and regulatory diversity in crown-group eukaryotes, which was critical for the tremendous exploration of the morphospace seen in these organisms». A good example of an LSE is the expansion of immunoglobulin genes in gnathostomes compared to other chordates. LSEs are also found when comparing the different orders of mammals, as exemplified by the expansion of the alpha interferons [7, 8].
Vertebrate immunoglobulins (Ig) are built from modules of one hundred amino acids. These modules are defined by a common 3-D structure, by conserved disulfide bridges and by conserved amino acid positions. They share the same 3-D structure with the Fibronectin type III repeats (FNIII), but the conserved amino acid positions differ between the two groups of domains [9, 10]. Genes coding for such modules were already present in the genomes of invertebrates. The originality of the gnathostomes is the invention of rearranging antigen receptors through the insertion of a transposable element in a gene coding for one of these Ig modules [3, 12]. During the further diversification of the vertebrates, the different lineages (for example, chondrichthyans and osteichthyans) have developed the system in different ways, but the main difference lies in the maturation of the immune response, which takes place mainly in the lymphoid organs. The cytokines that regulate the maturation of the immune response, from antigen detection to clonal expansion of the cell with the best affinity, mostly belong to the helical cytokine (HC) family. They include interferons, most interleukins, LIF, CNTF, G-CSF, GM-CSF and thrombopoietin. These helical cytokines share no similarity at the level of primary amino acid sequence, but they are all structured around a similar four alpha helix bundle. They share this common 3-D structure with some hormones that are, for this reason, structurally described as helical cytokines: Growth Hormone (GH), Prolactin (PRL), Erythropoietin (EPO) and Leptin. These helical cytokines all bind to the extracellular binding domains of their cognate receptors (helical cytokine receptors: HCR), which all contain a 200 amino acid domain (D200) that is the identification mark of the HCR gene family.
These D200 domains are composed of two subdomains of 100 amino acids (SD100A and SD100B), both structured like the basic Ig domains, with two β sheets of 3 and 4 strands, respectively (C type). Conserved amino acid positions clearly distinguish these D200 domains from the Ig superfamily and from the FNIII family [9, 10].
Whereas the Ig and FNIII families have expanded in invertebrates, only a single invertebrate gene with a D200 has been described: the dome gene in Drosophila [15–17]. The HCR family is therefore an interesting example of a vertebrate specific LSE. Like other families of receptors involved in host defense, it mostly consists of highly diverging receptors (28% amino acid identity between the human and chicken IFNAR2 proteins) [4, 18]. Together with the difficulty of predicting genes from genomic sequences, this explains why the comparison of the predicted human and Fugu proteomes did not allow the identification of the complete repertoire of HCRs in Fugu. Depending on their conserved amino acid residues, HCRs can be divided into two classes: Class I and Class II. Class II consists of the Tissue Factor, the receptors for interferons and the receptors for IL10 and its related cytokines (IL19, IL20, IL22, IL24 and IL26). Class I consists of all other HCRs [9, 10]. Their cognate ligands have been called class I and class II helical cytokines. Genes for class I HCRs (HCRI) have been described in the major vertebrate groups, including fish, birds and mammals, but class II HCRs (HCRII) have only been described in birds and mammals [10, 18–21]. The question therefore remains open as to whether the HCRII expansion is amniote specific. The recent efforts to sequence fish genomes offer an interesting opportunity to answer this question.
Interestingly, the intron/exon structure of the vertebrate HCR genes is strictly conserved across the whole family. Like the exons coding for the Ig and FNIII modules, the exons coding for each SD100 are bordered by phase 1 introns. What is specific to D200s is that each SD100A is encoded by two exons separated by an internal phase 2 intron falling at the level of the third β strand, and each SD100B is encoded by two exons separated by an internal phase 0 intron falling at the level of the fourth β strand [22–24]. The intron/exon structure can thus be used as a criterion for the identification of homologs in distant species.
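As an illustration only, the intron phase criterion described above can be expressed as a short check on a gene model. The sketch below is not the procedure used in this study. It assumes the usual convention that an intron is of phase 0, 1 or 2 when it interrupts the reading frame after 0, 1 or 2 nucleotides of a codon, so that the phase of each intron is the cumulative coding length upstream of it modulo 3. The function names, the signature constant and the exon lengths in the example are hypothetical.

```python
from itertools import accumulate

# Intron phases expected around one D200, as described in the text:
# phase 1 | SD100A exon | phase 2 | SD100A exon | phase 1 |
# SD100B exon | phase 0 | SD100B exon | phase 1
D200_SIGNATURE = (1, 2, 1, 0, 1)


def intron_phases(exon_cds_lengths):
    """Return the phase of the intron that follows each exon except the last.

    exon_cds_lengths: coding lengths (in nucleotides) of consecutive exons,
    starting at the first coding nucleotide of the gene.
    """
    cumulative = list(accumulate(exon_cds_lengths))
    return [total % 3 for total in cumulative[:-1]]


def find_d200_signature(phases, signature=D200_SIGNATURE):
    """Return the indices of the introns at which the signature starts."""
    n = len(signature)
    return [
        i
        for i in range(len(phases) - n + 1)
        if tuple(phases[i:i + n]) == signature
    ]


# Hypothetical gene model: one upstream exon, four exons encoding a D200,
# and one downstream exon. The lengths are invented for the example.
exon_lengths = [64, 160, 149, 164, 139, 200]
phases = intron_phases(exon_lengths)
print(phases)                       # [1, 2, 1, 0, 1]
print(find_d200_signature(phases))  # [0]
```

A match at intron index i means that the four exons following that intron have the phase pattern expected for a D200. Such a match is only a necessary condition: a candidate would still have to be confirmed by the conserved amino acid positions of the encoded subdomains.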
We decided to use the genomic data from Tetraodon nigroviridis to search for the genes coding for the class II HCRs (HCRII) and their ligands. The main interest of T. nigroviridis lies in both its completely sequenced, compact genome and the ease with which it can be maintained in the laboratory and used for experiments. We report here the complete description of the T. nigroviridis class II HCR repertoire and show that its diversification from common ancestral elements occurred independently in the fish and mammalian lineages. We have also characterized two ligands for these receptors.