Evolution of coding and non-coding genes in HOX clusters of a marsupial
© Yu et al.; licensee BioMed Central Ltd. 2012
Received: 30 November 2011
Accepted: 22 May 2012
Published: 18 June 2012
The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals.
Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOX A11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters.
This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predates the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.
Keywords: Marsupial, HOX cluster, MicroRNAs, Long non-coding RNAs
The origin, evolution, function and regulation of HOX genes are amongst the most intriguing questions in developmental biology and evolutionary genetics. Their highly conserved clustered arrangement on chromosomes, their spatio-temporal expression and their patterning result in each distinctive body plan during embryogenesis and organogenesis in bilaterian animals[1, 2]. HOX genes are expressed as early as the pre-somite stage of gastrulation in the posterior primitive streak of the epiblast, a region that gives rise mainly to the lateral plate and extraembryonic mesoderm in chicken and mouse embryos[3–5]. The dynamic expression of HOX genes in the ectoderm, mesoderm and endoderm during gastrulation suggests that HOX genes are key regulators of regional patterning along the antero-posterior (A-P) axis[2–4, 6]. HOX genes confer positional information for proper organ development and are expressed in ordered patterns that control the segmentation of the hindbrain and axial skeleton along the A-P axis, while mis-expression or mutation leads to the conversion of one structure into another (homeotic transformation). Limb development and regeneration depend on pattern formation along three axes, the A-P, dorsal-ventral (D-V) and proximal-distal (P-D) axes, where HOX A and HOX D, especially groups 9–13, are responsible for positional information along the A-P and P-D axes[8, 9]. De-regulation of the HOX network results in cancers including breast, bladder, prostate and kidney, as well as abnormal expression during proliferation, differentiation, apoptosis and signal transduction[1, 10].
In all vertebrates, HOX genes are comprised of two exons, in which exon 2 includes the highly conserved 180 bp of homeobox region, and a variable length of intron, from less than 200 bp to several kilobase pairs. The homeodomain encoded by a homeobox consists of 60 highly conserved amino acids and forms an N-terminal extended structure followed by three alpha helices. The homeodomain binds target DNA sequences at its N-terminal arm and the third helix from the minor and major groove of DNA, respectively. Orthologues of every HOX gene, including the homeodomain and flanking regions, are highly conserved among species. However, within species, the most conserved region between paralogues is restricted to the homeodomain. HOX genes are clustered on different chromosomes and are believed to have evolved from a single ancestral HOX gene by tandem duplications and sequence divergence[1, 11]. There are four HOX clusters, denoted A, B, C and D, produced by two successive whole genome duplication events followed by subsequent divergence[12, 13]. Paralogues within each cluster are designated 13 to 1 based on gene 5′-3′ transcribing orientation although there are only 11 paralogues at most found so far in vertebrates.
The low density of interspersed repeats in the human HOX clusters suggests that cis-regulatory elements are important in the tight control of HOX gene expression. Global enhancer sequences located outside the clusters regulate HOX D temporal co-linearity. Non-coding RNAs known to be involved in the regulation of HOX gene expression[16, 17] include the highly conserved microRNAs, such as miR-196 and miR-10. The long non-coding RNAs HOTAIR[21, 22] and HOTAIRM1 are known only in the mouse and human.
The comparison of HOX genes between vertebrates and invertebrates has highlighted conserved features of HOX gene expression regulation and evolution. Comparisons of DNA sequences between evolutionarily distantly-related genomes are highly efficient ways to identify conserved (and novel) functional regions, especially non-coding RNAs, and to discover how they regulate HOX gene expression[24, 25]. However, some conserved functional features show lineage-specific distributions and will be missed if the taxa chosen are too distant in evolutionary terms. Similarly, if they are too close, differences can be missed. Marsupials fill the mammalian “gap” because they are a distinct lineage that diverged from eutherian mammals 130–160 Ma ago[26–29], but they are still mammals. There is a high ratio of conservation signal to random noise in comparisons between therian mammal (marsupial and eutherian) genomes, suggesting that there are localized regions under evolutionary constraint. The divergence time between these groups is sufficient for non-functional sequences to have diverged while important genes are sufficiently conserved to enable their clear identification. Comparative genomics between eutherians and marsupials is therefore invaluable for predicting new and novel mammalian-specific motifs participating in HOX gene expression and regulation during mammalian evolution.
In this study, we used the tammar wallaby (Macropus eugenii), a macropodid marsupial of the kangaroo family, as our model. We screened BAC clones, characterized all 39 tammar HOX genes, and carried out genome mapping and deep sequencing. Comparative genomic analyses identified the known HOX coding genes and non-coding regulatory regions including regulatory elements and non-coding RNAs. Importantly, we uncovered a new potential microRNA in the tammar HOX cluster.
Sequencing and assembly
Annotation of HOX clusters
The abundance of repetitive DNA elements is extremely low in the core of tammar HOX clusters, in agreement with the previous findings in gnathostome HOX clusters. Using RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker), repeat elements including short interspersed repeat elements (SINEs), long interspersed repeat elements (LINEs), long terminal repeats (LTRs) and other DNA elements were investigated in each tammar HOX cluster (Additional file 3). Strikingly, no Alu (short interspersed repeat element of about 300 bp, comprising 10.75% of the human genome), ERVL (long terminal repeats), TcMar-Tigger or satellite sequences were found in any tammar HOX locus, resembling the human HOX clusters.
Tammar HOX gene expression in adult tissues
Some anterior HOX genes (HOX1 to −3) were expressed in the forebrain, midbrain and hindbrain in tammar, similar to the expression patterns of human HOXA genes, but very few HOX genes were expressed in hypothalamus, pituitary and pancreas. Interestingly, almost all HOX genes were expressed in cerebellum, suggesting that HOX genes continue to participate in coordinating motor activity and communication as they do during development[34, 35]. Anterior (1–3) and central (4–8) HOX genes of clusters A, B and D were expressed in the spleen, which has important roles in replenishing red blood cells and in activating the immune response. In the tammar gastrointestinal tract, weak expression was found in intestine while much stronger expression was observed in stomach and caecum, showing tissue-specific expression patterns. Anterior and central HOX genes of clusters A and B, but not C or D, were expressed in liver and heart. In tammar lung tissue, almost no posterior HOX genes were expressed. Skeletal muscle had broad expression of HOX genes (HOX1-11). HOX gene expression in reproductive tissues was similar to that in the developing tissues, displaying ongoing proliferation, differentiation, and degeneration of multiple cell types. HOX genes were strongly expressed in the mammary gland, kidney, adrenal, testis and ovary, but had a restricted expression in epididymis and uterus. Overall, HOX genes had tissue-specific expression patterns, maintaining high expression in some tissues, while in other tissues they were down-regulated or switched off.
Functional and conserved non-coding sequences in the kangaroo HOX clusters
Known long non-coding RNAs are conserved in the kangaroo HOX clusters
Long non-coding RNAs (lncRNAs) play critical roles in transcription regulation, epigenetic gene regulation and diseases. They are rapidly evolving genes, and are expected to be poorly conserved at the sequence level[37–39]. However, by comparative genomic analysis and RT-PCR amplification, we found conserved orthologues of all three known mammalian lncRNAs: HOTAIRM1, HOXA11AS and HOTAIR (sequences provided in Additional file 8).
The kangaroo HOX clusters encode conserved microRNAs
mVISTA plots showed that numerous non-coding regions, possibly representing microRNAs, were highly conserved (Additional files 4, 5, 6, 7). We examined the presence of known microRNAs, miR-196a1, miR-196a2, miR-196b, miR-10a and miR-10b, previously described in the human, mouse and zebrafish HOX clusters. As expected, we found 5 known conserved miRNAs in tammar HOX clusters (summary in Figure 2, sequences provided in Additional file 8, and genomic sequence alignments in Additional files 4, 5, 6, 7). We examined tammar microRNA deep sequencing libraries from different tissues and cells to determine the expression profile of each of these miRNAs. We found that miR-10a and miR-10b were strongly expressed in the testis. They are also expressed in fibroblast cells of the tammar.
Comparative genomic analysis of the marsupial HOX clusters uncovered a new microRNA and confirmed the presence of numerous known mammalian RNAs. There was a strikingly high level of conservation of coding sequences between this member of the kangaroo family and that of eutherian mammals.
Marsupial HOX gene clusters are compact and uninterrupted by large repeat domains. In the tammar, the lengths of all clusters were remarkably similar to those found in human (tammar HOX A-D: 113 kb, 207 kb, 144 kb and 110 kb; human HOX A-D: 112 kb, 205 kb, 137 kb and 112 kb, retrieved from the UCSC genome browser GRCh37/hg19). Similar patterns are also found in frog, chicken and mouse (Additional files 4, 5, 6, 7), demonstrating that the HOX gene clusters are highly conserved and compact across vertebrate lineages. However, Amphioxus, which is viewed as an “archetypal” genus in the chordate lineage, carries a HOX cluster length of about 448 kb. In invertebrates, HOX clusters are often more than 1 Mb, as is found in the sea urchin. Thus the vertebrate HOX clusters are more compact than the ancient and invertebrate HOX clusters.
All 39 tammar HOX genes had conserved gene structures (Additional file 11) and chromosomal arrangement (Figure 2), consistent with the theory that two rounds of genome duplications occurred after the vertebrate–invertebrate divergence but before bony fishes and tetrapods split[12, 13, 44]. In adults, HOX genes continue to be expressed and thereby retain developmental plasticity in certain tissues or maintain homeostasis. However, there has been much less work on gene expression in adult tissues compared to developing tissues[45, 46]. We showed that HOX gene expression in adult marsupial tissues was tissue-specific and differentially expressed (Figure 3). Interestingly, almost all HOX genes were expressed in the cerebellum, suggesting that HOX genes continue to participate in coordinating motor activity and communication in adults, as they do during development.
Using the tammar HOX genomic sequences as a reference for phylogenetic footprinting, we were able to identify a large number of conserved non-coding genomic sequences which may act as transcription factor binding sites in promoters, regulatory motifs involved in chromatin remodeling or non-coding RNAs that modulate gene expression post-transcriptionally[25, 47]. Long non-coding RNAs play diverse roles in biological processes but are thought to be under different evolutionary constraints and are expected to have low sequence conservation compared to protein-coding sequences, which has hampered the study of long non-coding RNA in vertebrates. We not only found these lncRNA orthologues in the tammar HOX genomic sequences, but also confirmed that they were expressed in certain tissues. For example, human HOTAIRM1 is expressed specifically in myeloid cells to regulate HOXA1 and HOXA4 expression in NB4 cells (an acute promyelocytic leukaemia cell line). Tammar HOTAIRM1 was also expressed in bone marrow, suggesting it has a conserved role in myelopoiesis across all mammals. In addition, HOTAIRM1 appears to be restricted to mammals and so must have evolved during the mammalian radiation. A recently discovered long non-coding RNA, HOTAIR[21, 22], acts as a trans-regulator to regulate HOX D but not HOX C gene expression during limb development and participates in reprogramming chromatin states to promote cancer metastasis. Tammar HOTAIR was also found in the tammar HOX genomic sequence, and was expressed at the early head-fold stage of the tammar embryo, just before limb buds develop, suggesting that it may have a role in the regulation of limb development; limbs are especially important structures for the kangaroos. In addition, the 5′ flanking sequence of HOTAIR was conserved, suggesting that it has the same or a similar transcriptional regulation mechanism (Figure 5 and Additional file 6).
Thus, contrary to expectation, mammalian lncRNAs do show a reasonable level of sequence conservation.
MicroRNAs are highly conserved, in contrast to long non-coding RNAs, and play important roles in animal development by controlling translation or stability of mRNAs. They are normally RNAs of about 22 nucleotides that bind to complementary sequences in the 3′UTR to repress gene activities. Using the tammar as a reference and searching the microRNA database we were able to identify four known HOX microRNAs (miR-196a, miR-196b, miR-10a and miR-10b), and most significantly, we uncovered one new potential microRNA, meu-miR-6313, in the tammar, which was expressed in testis and fibroblasts. The precursor sequence was used to search the human, mouse, and frog genomes and was not present (Figure 9). We also searched the opossum and Tasmanian devil genome sequences using the precursor sequence plus 1 kb of flanking sequence. While the flanking sequences were conserved in these two other marsupial species, we did not find the sequence immediately around the precursor, suggesting that it is a recent insertion in tammar. In silico analysis as well as in vitro and in vivo experiments have shown that the miRNAs miR-10 and miR-196 target several HOX genes, such as HOXA5/7/9, HOXB1/6/7/8, HOXC8, HOXD8, HOXA1/3/7, HOXB3 and HOXD10[18–20, 50, 51]. In this study, we also predicted targets of miRNAs, and found the targets of miR-10a, miR-10b, miR-414 and miR-466 in the HOX clusters (Additional file 9). We also found numerous new targets whose microRNA precursor genes were located outside the HOX clusters in the tammar genome (Additional file 10). These novel microRNAs have a typical secondary hairpin structure and targets in the HOX clusters. These miRNAs may participate in HOX gene expression and regulation to control the kangaroo type body plan and hopping mode of locomotion. Thus, using the tammar HOX as the reference genome, the examination of the marsupial HOX gene clusters has uncovered new and known non-coding RNAs of mammals.
Annotation and comparative genomic analysis of tammar HOX genes demonstrated a high degree of evolutionary conservation. As expected, 39 marsupial HOX genes were mapped to four different chromosomal loci. The tammar HOX clusters had a low concentration of repetitive elements and were compact as in other vertebrate HOX clusters. The protein-coding regions and their UTRs also showed high conservation but there was a novel potentially functional miRNA, meu-miR-6313, within a HOX cluster. Interestingly, the long non-coding RNAs (HOTAIR, HOTAIRM1 and HOXA11AS) and microRNAs (miR-196a2, miR-196b, miR-10a and miR-10b) were highly conserved in this marsupial. These lncRNAs and miRNAs may control the HOX genes to influence phenotypic differences in the body plan, as they do in other mammals. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predates the marsupial-eutherian divergence up to 160 Ma ago.
Animals, tissues and cells
Tammar wallabies originating from Kangaroo Island, South Australia, were held in the University of Melbourne marsupial breeding colony in Melbourne, Victoria. All sampling techniques and collection of tissues conformed to Australian National Health and Medical Research Council (2004) guidelines and were approved by The University of Melbourne Animal Experimentation & Ethics Committees.
Tissues (forebrain, midbrain, hindbrain, cerebellum, hypothalamus, pituitary, pancreas, spleen, stomach, intestine, caecum, heart, liver, lung, muscle, kidney and adrenal) were collected from five adults. Bone marrow, mammary glands, uterus and ovary were collected from three adult females. Prostates, epididymides and testes were collected from two adult males. HOX gene expression was examined using all tissues listed above except bone marrow. Bone marrow, whole embryos (day 20 of the 26.5 day gestation, n = 2) and endometrium (collected from three additional pregnant females) were used to examine lncRNA expression. All tissues were collected under RNase-free conditions. All collected tissues for molecular analysis were snap frozen in liquid nitrogen and stored at −80°C until use.
Tammar primary cells were prepared from a day 10 post partum pouch young testis. Briefly, the primary cells were cultivated in 50% DMEM (containing 10% fetal bovine serum) (Invitrogen, Melbourne, Australia) and 50% AminoMax (Gibco, Carlsbad, USA) containing 15% fetal calf serum.
Probe preparation and BAC library screening
The six frame translation of the tammar genome (assembly 1.0) was searched for homeobox domains using a profile hidden Markov model (Pfam accession PF00046.21) and the HMMer software (version 2.3.2). An E-value threshold of 10⁻⁴ was used. Predicted homeobox domain sequences of at least 80 aa and the related DNA were extracted from the tammar genome. The domain classes of these sequences were then classified using HoxPred. At the same time, tammar HOX partial sequences were also obtained by searching the tammar trace archives with human exon 1 and exon 2 of 39 HOX genes using BLASTN. Gene specific primers were designed to amplify probes and to confirm the identity of isolated BACs. All primers, their annealing temperatures and the product sizes are listed in Additional file 12.
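The six frame translation step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on HMMer and a Pfam profile); the function names and the toy input are assumptions made for illustration, and only the standard genetic code is hard-coded.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, not tammar data.
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

In a real search, each translated frame would be passed to the profile HMM search and hits filtered by the stated E-value threshold.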
The tammar BAC library (Me_KBa) with an average insert size of 166 kb was constructed by M. Luo at AGI (Arizona Genomics Institute, Tucson, AZ, USA). Radioactively 32P-labelled PCR probes from the 5′ and 3′ ends (HOX A to HOX D) were used to screen the BAC library. Resulting positive BACs for each HOX cluster were further confirmed with all corresponding HOX genes by PCR.
When screening the BAC library, at least two probes from the 5′ end and 3′ end were selected and 5 positive clones were identified: 205I5, 9G11, 168N24, 6P18 and 214D22. BAC clone 205I5 covered HOX A cluster genes (HOXA2 to HOXA13); BAC clone 9G11 covered the HOX B cluster (HOXB1 to HOXB9); BAC clone 168N24 covered the HOX B cluster (HOXB4 to HOXB13); BAC clone 6P18 contained all HOX C cluster genes and clone 214D22 covered the HOX D cluster (HOXD1 to HOXD12).
BAC DNA preparation, sequencing and assembly
Positive BAC bacteria were cultured overnight in LB medium containing 12 μg/ml chloramphenicol at 37°C. BAC DNA was extracted according to manufacturer’s instructions of Maxipreps DNA purification system (Promega, Sydney, Australia). The quality was assessed by gel electrophoresis in 0.8% agarose gel and NanoDrop ND-1000 Spectrophotometer (Wilmington, USA) with the ratio of A260/A280 at over 1.8. The amount of DNA was also measured by NanoDrop ND-1000 Spectrophotometer. BAC samples were sequenced with GS-FLX method at the Australian Genome Research Facility Ltd (AGRF, Brisbane, Australia).
The Roche 454 reads of the tammar were extracted and de novo assembled with the program CAP3. There were 202 contigs from BAC 205I5 in the HOXA cluster, 85 contigs from 168N24 and 2613 contigs from 9G11 in the HOXB cluster, 405 contigs from 6P18 in the HOXC cluster and 89 contigs from 214D22 in the HOXD cluster. The contigs were then aligned to the genomic sequences of human, tammar, opossum and platypus, and any gaps between the new contigs from the BAC sequencing were filled using the tammar genome sequence where sequence was available. Based on these genomic sequences, gene structures of all HOX genes and full HOX scaffolds were identified.
microRNA sequencing and in silico analysis
The recently published marsupial genome paper provided deep sequencing information and additional sequencing of the tammar microRNAs was performed on an Illumina GAII platform. Briefly, 40 μg Trizol extracted total RNA from tammar brain, liver, testis, and pouch young fibroblast cells grown in culture was electrophoresed on a 15% denaturing polyacrylamide gel with γ-[32P]-ATP end labeled 19-mer, 24-mer and 33-mer oligonucleotides. The bands corresponding to the miRNA fraction (19–24nt) were excised and ligated to an adenylated 3′ adapter (IDT, Inc.). The 3′ ligated RNA was electrophoresed on a 15% polyacrylamide gel and the bands corresponding to miRNA were excised. A 5′ ligation reaction and subsequent polyacrylamide gel purification followed by reverse transcription and PCR was performed in preparation for Illumina sequencing. Sequencing was performed on an Illumina GAII according to the manufacturer’s protocol.
Mapping of miRNAs to the HOX genomic sequences was performed using Bowtie, allowing for at most 1 mismatch. Potential hairpin locations were first identified using the SRNALOOP program (http://arep.med.harvard.edu/miRNA/pgmlicense.html). They were further refined by manual inspection of the hairpin loop using an interactive instance of the RNAfold program (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). Target prediction was done using the miRanda tool with default parameters. The novel microRNAs and the complete HOX genes were used as the query and target sequences, respectively.
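To make the "at most 1 mismatch" criterion concrete, the following is a naive Python sketch of mismatch-tolerant read placement. It is a brute-force scan written for illustration only, not Bowtie's index-based algorithm; the function names and toy sequences are assumptions.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=1):
    """Return (position, mismatches) for every placement of the read on the
    forward strand of the reference with at most max_mismatches mismatches."""
    n = len(read)
    hits = []
    for pos in range(len(reference) - n + 1):
        mismatches = hamming(read, reference[pos:pos + n])
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    reference = "AAAACCCCGGGGTTTT"  # toy reference, not tammar sequence
    print(map_read("CCCCGGGG", reference))  # exact placement
    print(map_read("CCCCGGGA", reference))  # one mismatch tolerated
    print(map_read("CCCCGGTA", reference))  # two mismatches, rejected
```

A real mapper would also search the reverse strand and use an index rather than scanning every position.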
Phylogenetic footprinting analyses
For interspecies DNA sequence comparison, tammar or human genomic sequence acted as a reference in four species (Human, Mouse, Tammar and Frog). Genomic sequences containing HOX A, HOX B, HOX C and HOX D clusters from Human (HOX A, chr7: 27098056–27210689; HOX B, chr17: 43960868–44165742; HOX C, chr12: 52605461–52742874; HOX D, chr2: 176656359–176768195; released in Feb 2009), Mouse (HOX A, chr6: 52104079–52216539; HOX B, chr11: 96024912–96229585; HOX C, chr15: 102757899–102892969; HOX D, chr2: 74497085–74613489; released in July 2007) and Frog (Xenopus tropicalis) (HOX A, scaffold_56: 1381000–1485000; HOX B, scaffold_334: 483000–620000; HOX C, scaffold_226: 269568–557892; HOX D, scaffold_163: 534804–660354; released in Aug. 2005) were retrieved from UCSC website (http://genome.ucsc.edu/).
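As a quick consistency check, the human coordinates listed above can be converted into span lengths and compared with the human cluster lengths quoted in the Discussion. The sketch below assumes the coordinates are inclusive and 1-based (an assumption, since the convention is not stated), writes the ranges with ASCII hyphens, and uses a hypothetical helper name.

```python
# Human HOX cluster regions as listed in the text.
HUMAN_HOX = {
    "HOX A": "chr7:27098056-27210689",
    "HOX B": "chr17:43960868-44165742",
    "HOX C": "chr12:52605461-52742874",
    "HOX D": "chr2:176656359-176768195",
}


def span_length(region):
    """Length in bp of a 'chrom:start-end' region, assuming inclusive coordinates."""
    _chrom, coords = region.split(":")
    start, end = (int(value) for value in coords.split("-"))
    return end - start + 1


if __name__ == "__main__":
    for cluster, region in HUMAN_HOX.items():
        print(f"{cluster}: {span_length(region) / 1000:.1f} kb")
```

The resulting spans agree to within about 1 kb with the human cluster lengths reported in the Discussion.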
Alignment of each HOX cluster from these species and tammar was performed using the LAGAN algorithm available on the mVISTA website with default parameters. The sequence from tammar was set as reference. The conserved tammar microRNAs were found in HOX genomic sequences by alignment of human/mouse microRNAs and further confirmed by deep sequencing and miRNA mapping. Tammar specific and new conserved microRNAs were identified by deep sequencing and miRNA mapping. Annotation of tammar long non-coding RNAs (lincRNAs) was performed according to human/mouse lincRNAs and confirmed by RT-PCR (primers in Additional file 12).
RNAs were isolated from various tissues with TRI Reagent solution (Ambion, Scoresby, Australia) following the manufacturer's instructions. The quality and integrity of the RNA was assessed by gel electrophoresis in 1% agarose gel and the quantity was measured with a NanoDrop ND-1000 Spectrophotometer (Wilmington, USA). Total RNA was digested and purified with DNA-free™ DNase (Ambion, Scoresby, Australia) to remove contaminating genomic DNA prior to cDNA synthesis. To ensure that there was no genomic DNA contamination, the quality of the RNA was assessed by PCR with primers in one exon.
Approximately 2 μg of total RNA was used as template for each reverse transcription reaction with the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen, Melbourne, Australia), using 1 μl of Oligo(dT)20 (50 μM). The quality of the first strand synthesis reaction was examined by PCR amplification of 18S standards.
About 20 ng of cDNA was used as a template for gene amplification with HOX gene specific primers (all sequences and annealing temperatures of primers are listed in Additional file 12). PCR cycling conditions were: 35 cycles of 30 s, 95°C; 30 s, 47–62°C; 30 s, 72°C, in a 25 μl reaction with GoTaq Green Master Mix (Promega, Sydney, Australia) and 0.4 μM of both forward and reverse primers.
Comparative analysis of long non-coding RNAs
To perform comparative analyses of long non-coding RNAs, the following human genomic sequences were employed to outline sequence similarity and evolution in the UCSC genome browser (http://genome.ucsc.edu/): HOX C12-HOTAIR-HOX C11 (chr12: 54,348,714–54,370,201), HOX A1-HOTAIRM1-HOX A2 (chr7: 27,132,617–27,142,393) and HOX A13-HOX A11AS-HOX A11 (chr7: 27,220,777–27,239,725).
To search for the long non-coding RNAs, we retrieved the genomic sequences between the nearest upstream HOX gene and the corresponding downstream HOX gene in multiple eutherian mammals including chimpanzee, rhesus, mouse, rat, dog and elephant. The “Infernal” program (http://infernal.janelia.org/) was employed to search each genome sequence with default parameters. Briefly, we used the secondary RNA structure of each exon in human lncRNAs to produce a *.sto (Stockholm format) file. The secondary structure was predicted by the online program RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). The cmbuild program of Infernal was then used to build a model from the above secondary structure. The cmcalibrate program of Infernal was used to determine expectation value scores (E-values) for more sensitive searches and appropriate HMM filter score cutoffs for faster searches. Cmsearch was then used to search genomic sequences downloaded from NCBI or Ensembl. Among the cmsearch hits, the hit with the lowest E-value below 0.01 was given priority.
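The hit selection rule described above (prefer the lowest E-value, and only consider hits below 0.01) can be sketched in Python as follows. The `Hit` record and the example values are hypothetical; the sketch does not parse real Infernal output.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """A simplified search hit (hypothetical fields, not Infernal's format)."""
    target: str
    start: int
    end: int
    evalue: float


def best_hit(hits: Sequence[Hit], max_evalue: float = 0.01) -> Optional[Hit]:
    """Return the hit with the lowest E-value below max_evalue, or None."""
    candidates = [hit for hit in hits if hit.evalue < max_evalue]
    if not candidates:
        return None
    return min(candidates, key=lambda hit: hit.evalue)


if __name__ == "__main__":
    # Made-up hits for illustration only.
    hits = [
        Hit("scaffold_a", 1200, 1450, 3e-5),
        Hit("scaffold_b", 880, 1130, 2e-9),
        Hit("scaffold_c", 40, 290, 0.2),
    ]
    print(best_hit(hits))
```

Returning `None` when no hit passes the cutoff keeps "no confident orthologue found" distinct from a weak match.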
Phylogenetic trees (Figures 6, 7, 8) of lincRNAs were constructed with the MEGA 5.05 program. Briefly, the MUSCLE protocol was used to align the DNA sequence of each single corresponding exon of predicted lincRNAs with known exons in humans. When constructing trees, a maximum likelihood strategy was employed with default parameters.
Based on HoxPred, homeodomain regions plus 20 amino acids adjacent to their upstream and downstream region are enough to classify Hox proteins in their groups of homology. We therefore chose these sequences to perform phylogenetic analysis of HOX genes (Figure 3). The sequences were aligned with MUSCLE, and a neighbor-joining tree was built with JTT distance and bootstrap analyses by using the SeaView package.
miRNA pipeline, miRNA and hairpin annotation
In order to computationally explore the cause and effects of miRNA in the HOX cluster of the tammar wallaby we followed a process inspired by. Our miRNA pipeline has three main goals: separating valid sequences from noise and degradation products, identifying miRNA targets, and identifying miRNA genes. The targets and genes from our pipeline can then be compared against known features from miRBase (http://www.mirbase.org/) to determine which are confirmed and which are novel.
Each sequenced library was pre-processed to remove both 3′ and 5′ adapters and was then size selected to remove reads with fewer than 15 or more than 32 bases. Next, the reads were aligned against the HOX cluster allowing for no mismatches, and all valid alignments for each read were reported. The same reads were aligned against the genome, except that one mismatch was allowed to compensate for the draft nature of the tammar genome.
To separate valid miRNA from degraded product and sequencing noise, each read was required to align at least once within an annotated miRNA gene or hairpin region. The construction of this annotation is detailed in a later section. The novel miRNA gene in HOX was identified during the annotation stage detailed in a later section. The novel miRNA targets were required to meet the following conditions: 1) a valid read aligned to the HOX cluster, 2) the location of the aligned read did not overlap with a previously annotated target.
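The pre-processing step can be sketched in Python as below. This is a minimal illustration assuming exact adapter matches; the adapter sequences and function names are hypothetical and are not the adapters used in this study. Only the 15 to 32 base size window comes from the text.

```python
MIN_LEN, MAX_LEN = 15, 32  # size window stated in the text


def trim_adapters(read, adapter5, adapter3):
    """Strip an exact 5' adapter prefix, then cut at the first exact 3' adapter match."""
    if adapter5 and read.startswith(adapter5):
        read = read[len(adapter5):]
    pos = read.find(adapter3) if adapter3 else -1
    if pos != -1:
        read = read[:pos]
    return read


def preprocess(reads, adapter5, adapter3):
    """Trim adapters and keep only inserts of MIN_LEN to MAX_LEN bases."""
    kept = []
    for read in reads:
        insert = trim_adapters(read, adapter5, adapter3)
        if MIN_LEN <= len(insert) <= MAX_LEN:
            kept.append(insert)
    return kept


if __name__ == "__main__":
    adapter5, adapter3 = "GTTC", "TGGAATTC"  # hypothetical adapters
    reads = [
        adapter5 + "ACGTACGTACGTACGTACGT" + adapter3,  # 20-base insert, kept
        adapter5 + "ACGTACGT" + adapter3,              # 8-base insert, dropped
        "A" * 40,                                       # 40 bases, dropped
    ]
    print(preprocess(reads, adapter5, adapter3))
```

Production trimmers additionally tolerate sequencing errors and partial adapters, which this exact-match sketch does not.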
The main requirement of the miRNA pipeline previously presented is that each read must have aligned within an annotated miRNA gene or hairpin at least once in the genome. The miRNA gene annotations generally come from an external gene annotation pipeline such as ENSEMBL (http://asia.ensembl.org/info/docs/genebuild/genome_annotation.html). Since the tammar genome is quite new, and highly fragmented this annotation is incomplete. To augment it, the hairpin sequences in miRBase are aligned to the genome using BLAST. The locations where the known hairpins align are considered equivalent to a miRNA gene.
To capture novel miRNA genes and hairpins, a simple pipeline of commonly available tools was created. Many published tools which identify new microRNA genes use sequence and structure based alignments to find the best candidates. Unfortunately these tools do not scale well and are too slow to use on large genomes and large microRNA datasets. Therefore we implemented a custom version of the strategy mentioned above. First, all miRNAs were mapped to the genome. Next, each aligned sequence plus 100 bp flanking windows were put into SRNALOOP, a hairpin prediction tool. Regions containing valid hairpins which did not overlap with a previously known miRNA gene or miRBase annotation were recorded.
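The window construction and overlap filter described above can be sketched in Python as follows. Hairpin prediction itself (SRNALOOP) is not reproduced; as a simplification, the overlap filter is applied here to the flanked windows rather than to predicted hairpins. Coordinates are assumed to be 0-based half-open intervals on a single contig, and all names are hypothetical.

```python
FLANK = 100  # flanking window size stated in the text


def flanked_window(start, end, contig_length, flank=FLANK):
    """Extend an aligned interval by `flank` bp on each side, clipped to the contig."""
    return max(0, start - flank), min(contig_length, end + flank)


def overlaps(a, b):
    """True if two half-open intervals (start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def candidate_windows(alignments, known_regions, contig_length):
    """Return flanked windows that do not overlap any known miRNA annotation."""
    windows = []
    for start, end in alignments:
        window = flanked_window(start, end, contig_length)
        if not any(overlaps(window, known) for known in known_regions):
            windows.append(window)
    return windows


if __name__ == "__main__":
    alignments = [(500, 522), (2000, 2022)]  # toy read placements
    known = [(550, 640)]                     # toy known miRNA gene
    print(candidate_windows(alignments, known, contig_length=5000))
```

Each surviving window would then be passed to the hairpin predictor and inspected for a valid stem-loop.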
miRNA target annotation
miRNA targets were annotated in a two-step process. First the valid miRNA were mapped against the HOX cluster allowing for no mismatches. Then the mature miRNA from miRBase release 18 were mapped against the HOX cluster, allowing for 1 mismatch. A target was considered confirmed if a valid miRNA from our pool co-located with a miRNA from miRBase. Otherwise the aligned sequence was considered to be novel.
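The confirmed versus novel decision can be sketched in Python as below. "Co-located" is interpreted here as interval overlap on the HOX cluster sequence, which is an assumption, and the function name and coordinates are hypothetical.

```python
def classify_targets(read_loci, mirbase_loci):
    """Label each locus from our reads 'confirmed' if it overlaps a miRBase
    mature miRNA locus, otherwise 'novel'. Loci are half-open (start, end)."""
    labels = {}
    for locus in read_loci:
        co_located = any(
            locus[0] < known[1] and known[0] < locus[1] for known in mirbase_loci
        )
        labels[locus] = "confirmed" if co_located else "novel"
    return labels


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    ours = [(100, 122), (300, 321)]
    mirbase = [(98, 120)]
    print(classify_targets(ours, mirbase))
```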
Our definition of a valid miRNA required each sequence to be associated with at least one miRNA gene, or hairpin structure somewhere in the genome. All of the putative novel miRNA targets in HOX were associated with a hairpin [table XYZ]. However, none of these hairpins were found within an annotated gene. This could be due to a poor annotation, the draft status of the genome, or it is simply a false signal. Each of these will be further validated in future research.
Anthony T Papenfuss and Marilyn B Renfree are joint senior authors
- ERVL: Endogenous retrovirus L
- HOTAIR: HOX antisense intergenic RNA
- HOTAIRM1: HOX antisense intergenic RNA myeloid 1
- HOX A11AS: HOX A11 antisense
- LINEs: Long interspersed repeat elements
- lncRNAs: Long non-coding RNAs
- LTRs: Long terminal repeats
- MIR: Mammalian-wide interspersed repeats
- SINEs: Short interspersed repeat elements
We thank members of the tammar research team for assistance with the collection of the samples. We thank Ms. Bonnie Dopheide for assistance with the FISH experiment. We also thank Prof A Fujiyama and Dr. Y Kuroki for kindly providing the BAC library for screening the incomplete HOX A1 and HOX D13 genes. This study was supported by the Australian Research Council (ARC) Centre of Excellence in Kangaroo Genomics; an ARC Federation Fellowship to MBR, a National Health and Medical Research Council (NHMRC) R.D. Wright Fellowship to AJP and an NHMRC Career Development Fellowship to ATP.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.