Genome-wide identification of nuclear receptor (NR) superfamily genes in the copepod Tigriopus japonicus

Background Nuclear receptors (NRs) are a large superfamily of proteins defined by a DNA-binding domain (DBD) and a ligand-binding domain (LBD). They function as transcriptional regulators to control expression of genes involved in development, homeostasis, and metabolism. The number of NRs differs from species to species, because of gene duplications and/or lineage-specific gene losses during metazoan evolution. Many NRs in arthropods interact with the ecdysteroid hormone and are involved in ecdysone-mediated signaling in arthropods. The nuclear receptor superfamily complement has been reported in several arthropods, including crustaceans, but not in copepods. We identified the entire NR repertoire of the copepod Tigriopus japonicus, which is an important marine model species for ecotoxicology and environmental genomics. Results Using whole genome and transcriptome sequences, we identified a total of 31 nuclear receptors in the genome of T. japonicus. Nomenclature of the nuclear receptors was determined based on the sequence similarities of the DNA-binding domain (DBD) and ligand-binding domain (LBD). The 7 subfamilies of NRs separate into five major clades (subfamilies NR1, NR2, NR3, NR4, and NR5/6). Although the repertoire of NR members in, T. japonicus was similar to that reported for other arthropods, there was an expansion of the NR1 subfamily in Tigriopus japonicus. The twelve unique nuclear receptors identified in T. japonicus are members of NR1L. This expansion may be a unique lineage-specific feature of crustaceans. Interestingly, E78 and HR83, which are present in other arthropods, were absent from the genomes of T. japonicus and two congeneric copepod species (T. japonicus and Tigriopus californicus), suggesting copepod lineage-specific gene loss. Conclusions We identified all NR receptors present in the copepod, T. japonicus. Knowledge of the copepod nuclear receptor repertoire will contribute to a better understanding of copepod- and crustacean-specific NR evolution. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-993) contains supplementary material, which is available to authorized users.


Background
Nuclear receptors (NRs) are a group of proteins defined by the presence of two functional domains: a highly conserved DNA-binding domain (DBD) and a less conserved ligand-binding domain (LBD). They function as transcriptional regulators to control the expression of specific genes involved in development, homeostasis, and metabolism [1]. Nuclear receptors are a large superfamily specific to metazoans, and seven subfamilies (NR0 to NR6) are currently recognized based on structural and functional data [2]. The superfamily includes receptors for hydrophobic molecules such as steroid hormones, retinoic acids, thyroid hormones, and fatty acids [3]. Two large subfamilies are NR1 and NR2. The subfamily NR1 includes the thyroid hormone receptors (TRs), retinoic acid receptors (RARs), vitamin D receptors (VDRs), and peroxisome proliferator-activated receptors (PPARs). NR2 subfamily contains retinoid X receptors (RXRs) and hepatocyte nuclear factor 4 (HNF4). NR3 subfamily comprises receptors for sex and adrenal steroid hormones. While steroid hormone receptors (SRs) such as the androgen receptor (AR), estrogen receptor (ER), glucocorticoid (GR), mineralocorticoid (MR), and progestagen receptor (PR) belong to NR3, the insect SR, ecdysone receptor (EcR) is in the NR1 clade [4]. Some NRs are referred to as orphan nuclear receptors because their endogenous ligands remain unknown [5]. Subfamily NR4 contains orphan receptors such as nerve growth factor-induced clone B (NGFIB), nuclear receptor related 1 (NURR1), and insect HR38 [6][7][8]. Subfamily NR5 includes Fushi Tarazu-factor 1 (FTZ-F1) and HR39, which are involved in ecdysoneregulated response pathways [9][10][11]. NR6 contains insect HR4, which is homologous to the vertebrate orphan receptor germ cell nuclear factor (GCNF). The last subfamily, NR0, includes two groups, A and B. NR0A group members lack a LBD and have been found only in insects, while members of NR0B, DAX1 and SHP, have been identified only in vertebrates [1]. Because nuclear receptors are dispersed in genomes and functionally well characterized, they are considered a good model to study gene or genome duplication [12]. Genome sequencing and comparative genomic studies have enabled identification of all nuclear receptor family members in some organisms. The number of NRs identified in sequenced genomes varies from species to species, as some nuclear receptors have undergone several gene duplication events during metazoan evolution [12] while others have been lost due to gene loss. There are an estimated 21 NRs in Drosophila melanogaster, 25 in Daphnia pulex, 21 in Tribolium castaneum, 48 in Homo sapiens, and 284 in Caenohabditis elegans [13][14][15][16][17]. An interesting feature of nuclear receptors in arthropods is that several nuclear receptors interact with the ecdysteroid hormone and are involved in ecdysone-mediated signaling, which regulates development, growth, and molting during embryogenesis [10,18]. Molting and metamorphosis in arthropods are remarkable developmental processes trigger by ecdysteroid hormones and signal transduction by members of the nuclear receptor superfamily. During the onset of metamorphosis, ecdysone regulates expression of several members of the NR superfamily [19,20]. NRs, therefore, are considered to be responsible for the evolution of insect metamorphosis by responding to ecdysone and juvenile hormone [20]. Insect maturation is regulated by the steroid hormone 20-hydroxyecdysone (20E), which binds to a heterodimer of EcR and ultraspiracle (USP) [21]. In addition, many nuclear receptors such as DHR3, DHR4, DHR39, E75, E78, and FTZ-F1 function as direct targets of 20E-EcR-USP [10,[22][23][24]. DHR38 functions as a second ecdysteroid receptor by dimerizing with USP or RXR, while DHR78 inhibits 20E-induced reporter gene transcription by binding to a subset of ECR-USP binding sites in cultured cells [25,26]. ERR directs a metabolic switch that supports developmental growth [27].
The copepod Tigriopus is a marine model species for ecotoxicology and environmental genomic studies [18]. Copepoda is the second largest crustacean taxa and one of the dominant taxa in aquatic zooplankton communities, representing 70% of the ocean's zooplankton biomass [28,29]. The nuclear receptor superfamily complement of some arthropods, including crustaceans, has been reported, but not that of any copepods. In this study, we identified the complete nuclear receptor repertoire of the copepod T. japonicus based on whole genome sequence and transcriptome data [30].

Identification of nuclear receptors in T. japonicus
Using whole genome sequences and RNA-seq sequences of T. japonicus, we performed a BLAST analysis against the non-redundant (NR) database of NCBI to obtain contigs containing the DBD and/or LBD domains of nuclear receptors. We found 67 contigs containing full or partial nuclear receptor DBD sequences. Of these 67 contigs, 42 had both DBD and LBD sequences of nuclear receptors. Full-length sequences of the putative nuclear receptors were obtained from partial sequences using RACE (Rapid Amplification of cDNA Ends) methods. Finally, full-length mRNAs from 36 contigs were confirmed by RT-PCR analysis (Additional file 1: Figure S1; Additional file 1: Table S1). Further analysis of a conserved domain database allowed us to determine the DBD and LBD regions of each nuclear receptor [31] and revealed that four receptors (NR1D TJ-E75, NR1F TJ-HR3, NR2F TJ-SVP, and NR3B TJ-ERR) had more than two isoform structures (Figure 1, Additional file 1: Figure S2). TJ-E75A and TJ-E75C shared both the DBD and LBD, while TJ-E75B shared only the LBD (Figure 1, Additional file 1: Figure S2A). In the case of TJ-HR3, two isoforms shared both the DBD and LBD (Figure 1, Additional file 1: Figure S2B). The two isoforms of TJ-SVP have an independent DBD and LBD in a consecutive tail to head structure ( Figure 1, Additional file 1: Figure S2C). Two TJ-ERR isoforms shared both a DBD and LBD ( Figure 1, Additional file 1: Figure S2D). Nucleotide sequences of each isoform were confirmed by RT-PCR validation (Additional file 1: Figure S1). We thus found a total of 31 nuclear receptors in the T. japonicus genome; the sequences of these receptors have been deposited in GenBank (Table 1).

Phylogenetic analysis and nomenclature
Phylogenetic analysis of nuclear receptors of T. japonicus and sequence comparisons with other organisms (C. elegans, D. pulex, D. melanogaster, and H. sapiens) were performed to classify the nuclear receptors found in T.
japonicus and to investigate their evolutionary history with maximum likelihood method. The phylogenetic tree segregated the nuclear receptors of T. japonicus into five major clades (subfamily NR1, NR2, NR3, NR4, and NR5/6) ( Figure 2). Seven subfamilies with 16 groups were evident (Table 1). Of the 31 nuclear receptors we identified, 16 receptors belonged to five groups (D, F, H, J, and L) of subfamily NR1, seven NRs to five groups (A, B, D, E, and F) of subfamily NR2, two NRs to two groups (A and B) of subfamily NR5, and three NRs to group A of NR0 (Table 1). Subfamilies NR3, NR4, and NR6 contained one member in each subfamily. Overall repertoire of NR subfamily groups in T. japonicus followed that seen in other arthropods, which is distinct from that in H. sapiens as an example of a chordate, although NR sets also vary between different vertebrate species [32] ( Figure 3). Non-SRs such as TR, RAR, and PPAR in NR1 and SRs such as ER and members in NR3C have not been detected in arthropods, as represented by D. melanogaster and D. pulex [17,33]. Similarly, the NR2E1 group in the NR2E subfamily is present in H. sapiens but not arthropods or crustaceans, whereas the opposite pattern has been observed for members of the NR2E2 and NR2E4 groups.
This kind of gene expansion in NR1L has also been observed in other copepods; 12 NR1L members have been identified in Tigriopus californicus (Prof. Ronald S. Burton, T. californicus Transcriptome Shotgun Assembly; http:// www.ncbi.nlm.nih.gov/nuccore/?term=txid6832%5BOrgan ism%3Anoexp%5D+AND+TSA) and eight NR1L members have been identified in the copepod Paracyclopina nana (Prof. Jae-Seong Lee, unpublished data). These NR1L members are considered unique to crustaceans, as the NR1L group has been identified in Daphnia species and three copepods, but not in arthropods. As described above, T. japonicus had a similar set of nuclear receptors to those present in other arthropods ( Figure 3). However, we were not able to detect E78 (NR1E) in the genome of T. japonicus. HR83 in the E5 group of subfamily NR2 was also not detected in T. japonicus, as reported for D. pulex, suggesting a crustacean-specific gene loss of these two genes. To clarify whether this gene loss was copepodspecific or not, we attempted to find E78 and HR83 in the genomes of other copepods (T. californicus and P. nana) but were not able to find any matches.

Discussion
NRs are a diverse class of transcription factors that regulate hormonal and non-hormonal signalling processes in metazoans, and different numbers of NRs have been identified in different species [13][14][15][16][17]. Nuclear receptors have a conserved DBD and a moderately conserved LBD, which makes identification of nuclear receptors in the genome relatively easy and allows robust phylogenetic reconstruction at the superfamily level. Using whole genome and transcriptome sequence databases, we identified a total of 36 nuclear receptors, including isoforms, in the genome of the copepod T. japonicus.

Homologous nuclear receptors involved in developmental processes
NRs have been reported to be good phylogenetic markers, as they give robust results because of their structural conservation [12]. Representatives of all seven subfamilies of nuclear receptors present in metazoans were identified in T. japonicus. Of 31 NRs in T. japonicus, 19 members were homologous to those in other species, while 12 receptors were unique to T. japonicus (Figure 2). Homologous nuclear receptors have been identified in D.
melanogaster and D. pulex [17,33]. Developmental functions of nuclear receptors in arthropods have been well studied in Drosophila [33]. Comparative functional studies of homologous NRs in other arthropods and expression studies of T. japonicus specific nuclear receptors might help clarify the involvement of nuclear   Figure 3 Simplified phylogenetic distribution of nuclear receptors in chordates and arthropods. This diagram was updated based on schematic features of the NR superfamily suggested by a previous study [34]. Both arthropod names and the official nomenclature names of chordates are given for each nuclear receptor. A colored box indicates the presence of a homolog, while the character X indicates the absence of the NR from the genome of that species. receptors in development and ecdysone signaling in T. japonicus.

Chordates
Typical evolutionary patterns of NRs in crustaceans, arthropods, and chordates Arthropod nuclear receptor subfamilies show different evolutionary patterns from those of chordates [12,36]. Nuclear receptor members in groups A, B, and C of subfamily NR1 are present in H. sapiens, but not in arthropods. The same is true for SRs in the subfamilies NR3A and NR3C. There are two major SR lineages, namely the ER lineage and the AR/PR/GR/MR lineage that descended from a single ancestral receptor by genome duplication [37]. Despite the fact that SRs play important physiological roles in vertebrates and a few invertebrates (mollusks and annelids), homologues of these receptors have not been detected in arthropods. Many non-steroid receptors such as RXR, FTZ-F1, COUP-FT, and HNF4 appear to have been highly conserved through evolution, even though their developmental roles may be different in different metazoans [36,38]. However, it appears that Ecdysozoa species do not have homologous members to NR3A in subfamily 3. ERR from Drosophila was identified as the gene most closely related to the vertebrate ER gene in a non-chordate species [39]. ERRs share a common ancestor with ERs based on phylogenetic analyses, and like TR and RAR, are present in Urbilateria [40]. SRs such as ER have well defined roles in mammalian reproduction. However, the ER may originally play a role in the development and regulation of the nervous system, which controls species-specific behavior and endocrine homeostasis in birds [40]. No ER has been found in arthropods ( Figure 3).

Expansion of NR1L members in crustaceans
Although the subfamily composition of all NR members in T. japonicus is similar to that of other arthropods, there are more NR1 subfamily members in the copepod than in D. melanogaster and D. pulex. In T. japonicus, 16 of 31 (52%) nuclear receptors are members of the NR1 subfamily. More than half (52%) of all nuclear receptors in another crustacean, D. pulex, are NR1 members [17]. Considering that 38% of NRs in D. melanogaster and 24% in T. castaneum are NR1 members [13,16], crustaceans have more NR1 members than other arthropods. The unique NR1Ls that we identified support expansion of the NR1 subfamily in crustaceans. Daphnia has unique HR97/NR1Ls in subfamily NR1 (Figure 4) [17,35]. We also detected 12 unique nuclear receptors in T. japonicus, which all clustered in subfamily NR1L based on phylogenetic analysis. Although we named these receptors 'HR97/ NR1L' following the nomenclature for Daphnia species, the genetic distance between T. japonicus NR1L members and Daphnia NR1L subfamily members was high, implying that these receptors in Daphnia may be an independent lineage. More sequence information from copepod NRs needs to be obtained to study copepod-specific NR1L gene expansion.
Previously, lineage-specific expansions of 10 members of NR1H and three members of NR2E were demonstrated in the cephalochordate Branchiostoma floridae   All copepod NR1La subfamilies. 3 All copepod NR1Lb subfamilies. [41]. Moreover, a massive duplication of HNF4 (>250 HNF4-like NRs) occurred in C. elegans [42]. Also a lineage specific expansion in a Lophotrochozoan has also been identified recently in the NR1 subfamily (11 members) of the pacific oyster [43]. NR complements can therefore be shaped by lineage-specific loss and/or tandem duplication events. In most crustaceans, nuclear receptors involved in ecdysone signaling are well studied, but the entire NR superfamily has only been studied in D. pulex and T. japonicus thus far. The expansion of NR1L members identified in T. japonicus and copepods (T. californicus and P. nana) highlights the usefulness of studying copepods to gain a better understanding of comparative evolutionary diversification of NRs. Data from more diverse crustaceans would further increase our understanding of this expansion.

Copepod-specific NR evolution
Although the overall pattern of nuclear receptor members present in T. japonicus is highly similar to that in other arthropods, we failed to detect a receptor homologous to E78 in T. japonicus, which was unexpected. In Drosophila, E75 and E78, orthologous of Rev-Erb, are derived from duplication events that did not occur in the vertebrate lineage [33]. Unlike other duplicated pairs of genes such as HR51 and HR83, HR39 and FTZ-F1, and DSF and TLL, the pair E75 and E78 was detected in C. elegans, indicating that this duplication event occurred before nematodes branched off from the arthropod lineage [33]. In addition, the E75-E78 pair was also identified in crustaceans [17]. Based on these observations, we expected that T. japonicus as a crustacean would have E75 and E78. However, we were not able to find E78, which suggests loss of this gene in this copepod lineage, because E78 is also absent from the genomes of T. californicus and P. nana. Further phylogenetic studies of E75 and E78 in more crustaceans and copepods species as well as other related organisms are required to resolve this issue.
Vertebrates do not have HR83 due to a gene loss event [42]. However, the HR83 gene has been identified in nematodes (NHR-239 in C. elegans and C. briggsae), an echinoderm (Strongylocentrotus purpuratus), and a hemichordate (Saccoglossus kowalevskii) [44]. In the red flour beetle T. castaneum, HR83 was the NR with the most divergent DBD and LBD domains, indicating decreased constraint on this gene in T. castaneum [4]. Particularly, the absence of an HR83 homologue in the genomes of D. pulex, T. japonicus, and other copepods (T. californicus and P. nana) suggests that the HR83 gene was present in the urbilaterian ancestor to the Arthropods but was subsequently selectively lost from the crustacean lineage. Loss of NHR-239/HR83 appears to be unique to crustaceans among invertebrates.

Conclusions
To our knowledge, this is the first report of all members of the nuclear receptor superfamily in a copepod species. T. japonicus has 31 nuclear receptors in the genome, 12 of which are copepod-unique receptors. The overall pattern of nuclear receptor members in T. japonicus was highly similar to that in other arthropods, although there appears to have been copepod-specific expansion of NR1 subfamily members. The absence of E78 and HR83 from T. japonicus and two other copepods suggests the possibility that this is a copepod-specific gene loss.

Animal culture and maintenance
The copepod T. japonicus was maintained and reared in 0.2 μm-filtered seawater adjusted to 25°C with a photoperiod of 12 h:12 h light/dark and a salinity of 30 practical salinity units (psu). Copepods were fed with green algae (Chlorella sp.; approximately 6 × 10 4 cells/mL). Species identification was based on morphological characteristics and sequence identity of the universal barcode marker, the mitochondrial cytochrome oxidase I (COI) gene [45].

RNA extraction and cDNA synthesis
Approximately 300 adult copepods of both sexes were homogenized in three volumes of TRIZOL® reagent (Molecular Research Center, Inc., Cincinnati, OH, USA) with a tissue grinder and stored at -80°C until use. Total RNA was isolated from tissues according to the manufacturer's instructions. Genomic DNA was removed using DNase I (Sigma, St. Louis, Mo, USA). Quantity of total RNA was measured at 230, 260, and 280 nm with a spectrophotometer (Ultrospec 2100 pro, Amersham Bioscience, Freiburg, Germany). To check for genomic DNA contamination, we loaded total RNA on a 1% agarose gel that contained ethidium bromide (EtBr) and visualized the gel after electrophoresis using a UV transilluminator (Wealtec Corp., Sparks, NV, USA). To verify total RNA quality, we loaded total RNA in a 1% formaldehyde/agarose gel stained with EtBr and checked 18/28S ribosomal RNAs integrity and band ratio. Single-strand cDNA was synthesized from total RNA using an oligo(dT) 20 primer for reverse transcription (SuperScript™ III RT kit, Invitrogen, Carlsbad, CA, USA).

Sequence retrieval and RACE of nuclear receptors
To obtain partial sequences of nuclear receptor cDNAs, we searched the T. japonicus genomic DNA database and RNA-Seq database.  25.56 giga base pairs (Gb) and total coverage was approximately 116 X. In assembly, NGS_Cell (ver. 4.01 beta 59916; CLC Bio., Boston, MA, USA) software was employed and a total of 60,979 scaffolds were constructed (174,022,895 bp scaffold length; average 2,854 bp; 6,355 bp N50). RNA-seq was accomplished using entire developmental stages including nauplius, copepodid, and adult. Trinity software was employed for RNA-seq de novo assembly [46]. A total of 59,983 mRNAs were annotated (78,311,238 bp mRNA length; 2,139 bp N50).
Contigs coding for nuclear receptor proteins obtained in this study were subjected to a BLAST analysis against the GenBank non-redundant (NR; including all Gen-Bank, EMBL, DDBJ, and PDB sequence except EST, STS, GSS, or HTGS) amino acid sequence database to confirm sequence identities. All isolated genes were subjected to 5′-and 3′-Rapid Amplification of cDNA Ends (RACE) to obtain full-length transcripts according to the manufacturer's protocol (Invitrogen). Primers were designed after comparing the exon/intron boundaries with genomic DNA using GENRUNNER software (Hastings Software, Inc., NY, USA), and confirmed with the Primer 3 program (Whitehead Institute for Biomedical Research, Cambridge, MA, USA). A series of RACE reactions were performed with target primers under the following conditions: 94°C/4 min; 40 cycles of 98°C/25s, 55°C/30s, 72°C/ 60s; and 72°C/10 min. Final PCR products were excised from 1% agarose/TBE gels, cloned into pCR2.1 TA vectors (Invitrogen), and sequenced with an ABI PRISM 3700 DNA analyzer (Bionics Co., Seoul, South Korea). All T.
japonicus gene information has been registered in Gen-Bank, and accession numbers of each gene are provided in Table 1.

Annotation and phylogenetic analysis of nuclear receptors
Nomenclature of the nuclear receptors found in T. japonicus was determined based on sequence similarity and the results of phylogenetic analysis using the system recommended by the Nuclear Receptors Nomenclature Committee [47]. To investigate evolutionary relationships of T. japonicus NRs to other NRs, nuclear receptors identified in T. japonicus were subjected to phylogenetic analysis. We searched for NRs annotated in other species (Caenorhabditis elegans, Daphnia pulex, Drosophila melanogaster, Danio rerio, Gallus gallus, Homo sapiens, and Xenopus laevis) in NCBI (www.ncbi.nlm.nih.gov/projects/genome/ seq/BlastGen/BlastGen.cgi?taxid=7070) by performing BLAST searches with each potential T. japonicus NR hit. Combined DBD-LBD amino acid sequences from T. japonicus and other species were aligned using MEGA software (ver. 6.0) with default parameters [48]. To establish the best-fit substitution model for phylogenetic analysis, the model with the lowest Bayesian Information Criterion (BIC) [49] and Akaike Information Criterion (AICc) [50,51] scores was estimated using maximum likelihood (ML) analysis. According to the results of model test, WAG + G + I + F model was identified as the best-fit model and used in subsequent phylogenetic analyses. MrBayes (ver. 3.1.2) was used to reconstruct phylogenetic trees based on Bayesian inference [52]. The Markov chain Monte Carlo (MCMC) process was conducted with four chains and run for 5,000,000 generations. Sampling frequency was every 100 generations. After analysis, the first 5,000 trees were deleted as part of the burn-in process and a consensus tree of the remaining trees was constructed and visualized using Sea-View ver. 4.2.1. Nodal support is reported as Bayesian posterior probabilities (maximum = 1.00). In addition, we estimated the genetic distance of the T. japonicus NR1L subfamily from the other NR1 subfamilies using a Dayhoff matrix-based model [53]. Rate variation among sites was modeled with a gamma distribution (shape parameter = 4). The analysis involved 49 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 60 positions in the final dataset. Evolutionary analyses were conducted in MEGA software (ver. 6.0) [48]. Nomenclature of each T. japonicus NR was based on the name of the homologous D. pulex and/or D. melanogaster NR after comparison of the evolutionary distance of the T. japonicus NR to that of the other ingroup members. To more specify the evolutionary relationship of the NR1L members, additional phylogenetic analysis was performed using only NR1 members in different taxa based on the same method described above (Additional file 1: Table S5).

Availability of supporting data
Additional file 1. Supplementary tables and figures.

Ethics statement
The copepod T. japonicus was reared in accordance with the guidelines of the Animal Welfare Ethics Committee of Sungkyunkwan University.