Strain-specific copy number variation in the intelectin locus on the 129 mouse chromosome 1
© Lu et al; licensee BioMed Central Ltd. 2011
Received: 28 October 2010
Accepted: 16 February 2011
Published: 16 February 2011
C57BL/6J mice possess a single intelectin (Itln) gene on chromosome 1. The function of intelectins is not well understood, but roles have been postulated in insulin sensitivity, bacterial recognition, intestinal lactoferrin uptake and response to parasites and allergens. In contrast to C57BL/6J mice, there is evidence for expansion of the Itln locus in other strains and at least one additional mouse Itln gene product has been described. The aim of this study was to sequence and characterise the Itln locus in the 129S7 strain, to determine the nature of the chromosomal expansion and to inform possible future gene deletion strategies.
Six 129S7 BAC clones were sequenced and assembled to generate 600 kbp of chromosomal sequence, including the entire Itln locus of approximately 500 kbp. The locus contained six distinct Itln genes, two CD244 genes and several Itln- and CD244-related pseudogenes. It was approximately 433 kbp larger than the corresponding C57BL/6J locus. The expansion of the Itln locus appears to have occurred through multiple duplications of a segment consisting of a full-length Itln gene, a CD244 (pseudo)gene and an Itln pseudogene fragment. Strong evidence for tissue-specific distribution of Itln variants was found, indicating that Itln duplication contributes more than a simple gene dosage effect.
We have characterised the Itln locus in 129S7 mice to reveal six Itln genes with distinct sequence and expression characteristics. Since C57BL/6J mice possess only a single Itln gene, this is likely to contribute to functional differences between C57BL/6J and other mouse strains.
Intelectins are glycoproteins with an approximate subunit size of 37 kDa that have been described in mammals, fish and amphibians. The genome sequence of the sea squirt Ciona intestinalis shows the presence of closely-related genes, indicating that intelectins arose early in chordate evolution. In mammals, intelectins have so far been described in humans [2–4], mice [5, 6], sheep, cattle and pigs, although interestingly, the dog genome apparently lacks any intelectin genes.
The three-dimensional structures of intelectins have not yet been elucidated. Much of the predicted protein sequence is highly conserved across species, including eight cysteine residues and a fibrinogen-like domain. There has been significant interest in the functions of intelectins, which appear to be many and varied. A Ca2+-dependent galactose-specific lectin activity was first detected in Xenopus intelectins, such as XL35, which subsequently led to the naming of the family as "intelectins" (intestinal lectins) following the detection of a closely-related gene in mouse small intestinal Paneth cells by Komiya et al. Due to the cellular localisation of expression of this gene, currently denoted Itln1 or Itlna, a role in innate immunity was postulated.
Human and mouse intelectins were shown to bind lactoferrin [3, 13], leading to the alternative nomenclature of "intestinal lactoferrin receptor" (Lfr). Additionally, transcripts identical to human intelectin-1 (Itln1) were detected in omental fat tissue, and termed omentin by Yang et al. These authors and others have investigated the metabolic significance of Itln1, which is present in blood plasma and is thought to function as an adipocytokine. Circulating levels are inversely correlated with body mass index and with occurrence of type-1 diabetes mellitus. A single nucleotide polymorphism in Itln1 has also been implicated in susceptibility to Crohn's disease, and a closely-linked coding SNP was found to be associated with incidence of asthma.
Intelectin expression in the gut and lung mucosa is known to be highly up-regulated in the immune response to parasitic infections and Th2-cytokine dominated allergic responses [19–22]. Whilst Itln1 is expressed in Paneth cells at a high level in normal mouse small intestine, infection of BALB/c mice with the small intestinal dwelling helminth Trichinella spiralis induced de novo expression of an additional gene termed intelectin-2 (Itln2) in goblet cells, which was secreted into the mucus layer. This gene, alternatively termed Itlnb, is absent from the reference C57BL/6J genome, and it was suggested that lack of Itln2/Itlnb may be partially responsible for the delayed expulsion of T. spiralis in C57BL/6J mice compared to BALB/c.
In a subsequent genome-wide screen for gene copy number variation (CNV), Graubert et al. highlighted the Itln locus on distal chromosome 1 as a hotspot of gene duplication in most mouse strains analysed. Gene duplication represents an evolutionary mechanism whereby populations can adapt to increase the effective dosage of an advantageous gene in the face of a particular selection pressure, such as the burden of intestinal parasites. Once fixed, duplicated genes (paralogs) can evolve distinct functions. In the context of responses to parasites, a notable example of gene duplication in the mouse is at the mast cell proteinase locus on chromosome 14. Mast cell proteinases contribute anti-parasite effector mechanisms, and the gene Mcpt1 has been shown to enhance expulsion of T. spiralis.
In this study, we aimed to sequence the Itln locus in a non-C57BL/6J strain, in order to characterise the nature and extent of duplication of mouse Itln genes. We focussed on the 129S7 mouse, as a BAC library was available for this strain, which is commonly used in gene deletion studies, and this information would enable the design of future specific Itln knockout strategies.
Sequencing of 129S7 Itln locus
Analysis of the 129S7 Itln locus
In addition to the flanking genes (Refbp2, Ly9 and Slamf7), the ~500 kbp Itln locus is predicted to be made up of 5 full-length Itln variants (Itln1-2,4-6), one truncated Itln (Itln3), 6 pseudo-Itln, one CD244 variant (CD244_SE), one full-length CD244 and 5 pseudo-CD244 (Figure 1 & Additional file 2). The 6 Itln genes can be further classified into two categories based on their sizes of either ~11 or ~17 kbp, with Itln1 and Itln6 having additional LINE/L1 retrotransposable elements inserted into their respective intron 7. Variations also exist in the organisation of the pseudogenes on each of the duplicated segments, with some having different exons removed.
Although all 8 exons of the Itln genes can be mapped onto the 6 variants and their predicted mRNA sequences are highly conserved (94 to 97% sequence identity), they are predicted to be expressed and probably regulated differently. Both Itln3 and Itln4 transcripts are found to contain an early stop codon. While this is most likely to result in a truncated Itln3, the presence of potential ribosomal -1 frameshift sites identified further upstream of the stop codon at positions 175 and 177 is predicted to allow the translation of the full-length Itln4 protein to proceed. Sequence conservation among the 5 predicted full-length Itln proteins ranges from 91 to 96%. Not surprisingly, similar conservation is also observed when comparing the variants with the Itln1 [GenBank:NP_034714] of C57BL/6J, with the Itln1 from both strains being identical to each other at the protein level. However, higher degrees of variation are found among the 5 Itln variants and the predicted Celera homologues, with the protein sequence identity ranging from 71 to 100% (Additional file 1). Two of Celera's Itln variants may also contain sites for ribosomal -1 frameshifting. The Itln1 and Itln2 genes of 129S7 are found to be identical to Celera's Itlna [GenBank:NP_034714] and Itlnb [GenBank:NP_001007553] respectively. The observed variation in the remaining Itln variants is probably the result of a mixed Itln population derived from the 5 mouse strains.
CD244_SE was initially predicted to be a CD244 pseudogene due to the lack of the terminal exon 9. However, closer inspection identified a potential internal splicing site in the predicted exon 3. Comparison of all the probable coding sequences derived from this internal splicing site has led to the prediction of a protein which shares 65% identity with the soluble form of rat CD244. This novel protein is made up of 4 exons which are highly similar (~90% conservation) to those of CD244, with part of exon 3 spliced to exon 5. On the other hand, it is also interesting to note that the CD244 protein sequences of 129S7 and Celera [GenBank:XP_001003781] are almost identical, with 99% sequence identity, but they share only 89% sequence identity with that of C57BL/6J [GenBank:NP_061199].
All the pseudogenes contain exons that share at least 70% sequence identity with their full-length counterparts. None of them contains the full set of exons and multiple stop codons are often spread across them.
Southern blot analysis
Itln CNVs in mouse
Relative to the single Itln locus in the C57BL/6J reference mouse, an approximately 62 kbp segment containing the Itln gene has apparently been duplicated several times in all the 16 strains investigated. Although the log2 ratio of the CAST/EiJ strain falls close to the threshold (Additional file 4, Figure S3a), the consistently higher than reference coverage across the Itln locus (Figure 5) suggests that the segment may actually be duplicated once. The log2 ratio for the 129S1/SvImJ strain vs C57BL/6J varies between 1.8 and 2.7 over the Itln locus (Additional file 4, Figure S3a), well above the threshold for calling gene amplification, and suggests a 4 to 6-fold gain of the locus compared to C57BL/6J. Similar gain is also observed in the other strains (results not shown).
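As a rough illustration of how a log2 coverage ratio relates to fold gain, the short sketch below converts the values quoted above. The function names are hypothetical and this is not the CNV-seq implementation; the default threshold of 0.6 is the one stated in the Methods. The values of 1.8 and 2.7 work out at roughly 3.5- and 6.5-fold, broadly in line with the 4 to 6-fold gain quoted.

```python
def log2_ratio_to_fold(log2_ratio: float) -> float:
    """Convert a log2 coverage ratio (test strain vs reference) to a fold change."""
    return 2.0 ** log2_ratio


def exceeds_gain_threshold(log2_ratio: float, threshold: float = 0.6) -> bool:
    """Return True when the log2 ratio is above the gain-calling threshold."""
    return log2_ratio > threshold


if __name__ == "__main__":
    # Values quoted in the text for 129S1/SvImJ vs C57BL/6J, plus the threshold itself.
    for ratio in (0.6, 1.8, 2.7):
        fold = log2_ratio_to_fold(ratio)
        print(f"log2 ratio {ratio:.1f} -> {fold:.2f}-fold, gain: {exceeds_gain_threshold(ratio)}")
```

A log2 ratio of exactly 1.0 would correspond to a doubling of coverage relative to the reference.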
Evidence for expression of mouse Itln variants
Itln variants identified in various mouse tissues.
Detection of putative Nkx3.1 transcription factor binding sites in Itln promoter regions
In silico prediction of Nkx3.1 binding sites in 129S7 Itln promoters.
However, the "TAAGTG" motif is only conserved between the Itln1 promoter of C57BL/6J and 129S7. A single bp variation results in a "TAAATG" binding motif on the promoters of Itln2 and Itln6 which may in turn affect the binding affinity for Nkx3.1.
We present here the sequence of the Itln locus on mouse chromosome 1 in a non-C57BL/6J mouse strain (129S7). Multiple duplications of Itln and CD244 genes/pseudogenes have resulted in the expansion of the locus in comparison to that found in the reference C57BL/6J mouse. Nearly half of this expanded Itln locus comprises repeat sequence elements. It is likely that the presence of flanking ERV and LINE elements was involved in the duplications and inversion of the Itln locus; and that the events may be relatively recent and active [29, 39].
The present data allow us to unequivocally define the genetic architecture of the Itln locus in the 129S7 genome. It bears considerable similarity to that obtained in the Celera mouse genome but with reduced homology over the coding regions. This is most likely because the Celera sequence is a composite of 5 different strains, compounded by the difficulty of resolving repetitive sequences, and is hence not necessarily representative of any one strain. It has been estimated that as much as 57% of highly identical segmental duplications in the mouse genome were potentially misassembled and that segmental duplications/CNVs make up 1.7-2.0% of the mouse genome. We have overcome this pitfall by a combined strategy of tiling path sequencing of BAC clones and Southern verification.
Furthermore, our Southern blot data suggested that other non-C57BL/6J strains also have a similar structure. In support of this, not only have a number of previous genome-wide studies on the structural variations of the mouse genome identified this locus as a hotspot of recombination and CNV, but the ENSEMBL annotated Itln protein family ENSFM00250000003313 also reports variations in the copy number of the Itln gene in different animal species. The majority of the 17 mouse strains show evidence of expansion at this locus too, raising the question as to whether the Itln locus has been deleted in the C57BL/6J and related strains, or alternatively expanded in a common ancestor to the 129S7 and other non-C57BL/6J strains.
It was not known whether CNV at the mouse Itln locus confers a simple dosage effect, or alternatively, whether each variant plays a different functional role. We found that in contrast to the C57BL/6J mouse strain, which expresses a single Itln1 gene, 129S2 and 129P2 mice exhibited site-specific expression of at least three different Itln variants along the gastrointestinal tract. Such differential tissue expression of Itln variants has also been observed in the channel catfish. This highly selective expression pattern and the minor amino acid sequence differences between the Itln variants suggest that these duplicated genes may be of functional significance at the different sites of expression. It has been suggested that intelectins play a role in modifying mucus properties, and therefore site-selective expression of variants may be important in the same way that mucosal trefoil factor variant expression tends to be co-ordinated with specific mucin types in different regions of the gastrointestinal tract. Similarly, sheep intelectin variants have been found to exhibit site-specific expression differences between airways, gastric stomach (abomasum) and small intestine. However, the consequences to C57BL/6J mice, if any, of lack of expression of Itln2 and Itln6 in the GI tract remain to be established.
It has been shown that a functional Nkx3.1 transcription factor binding site is located in the mouse Itln1 promoter. Nkx3.1 is highly expressed in the prostate, where it is believed to function as a tumour repressor gene, and regulates Itln1 expression in prostate epithelial cells. Haploinsufficiency in Nkx3.1, where only a single functional allele of the gene is present, was found to result in reduced expression of a range of dosage-sensitive genes in the mouse prostate, including Itln1, expression of which was essentially lost in Nkx3.1+/- mice. Importantly, Itln1 expression was itself shown to suppress prostate cell growth, and thus Itln1 appears to be an effector of prostate cancer repression. It was of interest therefore to determine the occurrence of Nkx3.1 binding sites in all six Itln genes described here. The Pr5 region, which was found to contain the Nkx3.1 binding sequence TAAGTG, was present in the promoter regions of four of the six Itln genes but the binding region was mutated to TAAATG in all but Itln1, suggesting that although Itln1 probably remains under the transcriptional control of Nkx3.1 in 129S7 mice, the same is not necessarily true for the other Itln variants.
Importantly, an adipocytokine role has been ascribed to human Itln1 (omentin-1), which can be detected in blood plasma and serum. Estimates for normal human serum Itln1 concentration vary from 20 ng/ml to over 100 ng/ml. Recombinant Itln1 has been shown to stimulate insulin-dependent glucose uptake by adipocytes in vitro, and its expression in visceral fat and in the circulation is negatively correlated with body mass index and waist circumference. Weight loss and aerobic exercise were associated with significant increases in serum Itln1 levels and a corresponding reduction in cardiometabolic risk factors.
With reference to the comparative metabolic significance of Itln expression in mice, Orozco et al. investigated the influence of CNV on metabolic traits using a combined genomic-metabolomic approach in the C57BL/6J (B6) and C3H/HeJ (C3H) strains, and in B6xC3H crosses. Hotspots of copy number variation in chromosomes 1, 4 and 17 were associated with metabolic traits, and specifically, CNV in Itln1 (Itlna) was linked with weight, triglycerides, adiposity, glucose and insulin level. Itln mRNA expression level, as detected by Agilent microarray analysis, was significantly elevated in the high copy number genotype (C3H), when assayed in adipose, brain and liver tissue. The detailed characterisation of the Itln locus described here provides new specific candidates for further investigation of the role of Itln genes in mouse models of metabolic disorders.
In addition to Itln, the CD244 gene sequence is also amplified within the 129S7 locus. Specifically, this study has identified a putative CD244 variant, analogous to the secreted CD244 variant previously described in the rat. However, we found no evidence of expression of the corresponding transcript by RT-PCR in mouse small intestine (data not shown), so it is not clear whether it is actively transcribed in this mouse strain. CD244 (also known as 2B4) is a member of the SLAM family of cell surface receptors, which are present as a gene cluster adjacent to the Itln locus on mouse and human chromosome 1. A range of lymphocytes, including natural killer (NK) cells and CD8+ T-cell subsets, have been shown to express CD244, which is the high affinity counter-receptor for CD48. While engagement of human NK cell CD244 with CD48 results in enhanced killing of CD48-expressing cells, the situation appears more complex in mice, where both activating and inhibitory roles have been demonstrated. It is known that C57BL/6J mice possess a single CD244 gene, expressed as long and short splice variants, which are distinct from CD244 cDNAs amplified from other mouse strains. The potential expression of a soluble CD244 variant suggested by this study may introduce an additional level of regulation of lymphocyte interactions in non-C57BL/6J-related strains, and may therefore contribute to immunological differences between C57BL/6J and other strains.
To conclude, we have determined the sequence for essentially the complete Itln locus in the 129S7 mouse, as a prototype for non-C57BL/6J mouse strains. This has elucidated the nature of the copy number variation occurring at this locus, arising from tandem duplication of Itln and CD244 genes. Individual Itln genes showed strong tissue expression specificity while most duplicated CD244 genes were non-functional.
Identification and sequencing of Itln-containing BAC clones
The Itln locus of the 129/Sv mouse was characterised by sequencing BAC clones derived from the AB2.2 ES cell line of the 129S7/SvEvBrd-Hprtb-m2 substrain. Since the reference C57BL/6J mouse genome contains only a single copy of the Itln gene (Additional file 2), two BAC clones flanking the gene were first identified from the BAC end sequences aligned to the ENSEMBL Mouse Assembly (ENSEMBL release 46, NCBI m36). A tiling path of candidate BAC clones moving towards each other from the two ends of the locus (Figure 1) was then built by selecting BAC clones with end sequences that matched the sequenced ones. Two iterations of blast search were done to identify these matching clones: an initial identity cut-off of 90% was applied, followed by quality trimming of the filtered sequences and a final round of blast with a 98% identity cut-off. PCRs to check for the presence of Itln were also performed to further ensure that the correct candidates were picked. A total of six overlapping BAC clones (bMQ411i17, bMQ453f04, bMQ285e14, bMQ_239m09, bMQ312m04 and bMQ302g15) spanning the entire Itln locus were purchased from Geneservice (UK). BAC clones were grown in LB broth containing chloramphenicol (12.5 μg/ml) and DNA was purified using a proprietary kit (NucleoBond® BAC 100, Macherey-Nagel). Four shotgun libraries with an insert size of about 2 kbp were constructed from the four respective flanking clones (bMQ411i17, bMQ453f04, bMQ312m04 and bMQ302g15) and sequenced by GATC Biotech (Konstanz, Germany) using the ABI Big Dye Terminator Mix v3.0 in an ABI 3730 sequencing machine. Gap closures for these four clones were achieved by primer walking. 36 bp paired-end next-generation sequencing using the Illumina Genome Analyzer II was carried out by ARK-Genomics (Roslin, UK) on the two remaining middle clones (bMQ285e14 and bMQ_239m09).
Sequences from each of the four shotgun libraries were assembled using Phred (v 0.020425.c)/Phrap (v1.080812)/Consed (v15.0) into their respective scaffolds. Parameters for the assembly were set to high stringency: vector scanning, 32 bp trimming of all 5' ends, minimum Phred score of 20, minimum length of matching word increased to 30, and contig-merging stringency set to the highest level. Contaminating contigs from E. coli were removed by blasting them against the NCBI's E. coli genomes. All assembled contigs were also manually inspected to correct for errors due to duplicated segments and repeats, with special attention paid to regions having higher than 97% cross_match (v 1.080812) identity. They were ordered into a single scaffold for each of the libraries with the aid of the forward/reverse pairing data and locations of the predicted coding sequences.
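For readers less familiar with Phred scores, the sketch below shows the standard relationship Q = -10 log10(P) between a quality score Q and the base-calling error probability P, so that a minimum score of 20 corresponds to an error rate of at most 1 in 100. The end-trimming helper is a simplified, hypothetical illustration of quality trimming and is not the algorithm used by Phred/Phrap.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability: Q = -10 log10(P)."""
    return -10.0 * math.log10(p)


def trim_low_quality_ends(quals, min_q=20):
    """Return (start, end) indices of the read after removing bases with
    quality below min_q from both ends (simplified illustration only)."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return start, end


if __name__ == "__main__":
    print(phred_to_error_probability(20))  # error probability for Q20
    print(trim_low_quality_ends([5, 10, 30, 40, 12, 35, 8]))
```

Note that the helper only trims the ends; a low-quality base flanked by good bases is retained.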
A strategy involving three iterations of mapping against reference sequences, de novo sequence assembly and a final scaffold construction was employed to assemble the remaining two BAC clones. The 4.8 million 36 bp Illumina paired-end reads were adaptor- and quality-trimmed to 32 bp with the fastx toolkit (v0.0.11) to ensure only reads with phred scores higher than 15 were kept. These processed reads were first mapped against the contaminating E. coli str. K-12 substr. DH10B [GenBank:NC_010473] reference genome and pBACe3.6 [GenBank:U80929] cloning vector using MAQ (v0.7.1). The 2.4 million clean unmapped reads next underwent a second round of mapping, again with MAQ, to remove sequences overlapping with the known flanking upstream BAC clones. Reads that were unmapped, or that contained indels or more than one mismatch and had a mapping quality score below 20, amounted to about 1.2 million. They were finally MAQ mapped, allowing 2 mismatches, against the corresponding Itln locus of Celera's mouse assembly. Again the unmapped reads were extracted and input into the de novo assembly program Velvet (v0.7.54) to obtain a set of contigs that should cover the gaps in the MAQ mapped consensus sequence. In addition, de novo assembly using Velvet on the 2.4 million reads extracted from the first mapping exercise was also done to facilitate the resolution of errors in the MAQ consensus sequence. The Velvet optimiser script was used and no scaffolding was done. Contigs with exceptionally high or low k-mer coverage values were also discarded. The scaffold for these two BAC clones was constructed by merging together the final MAQ consensus sequence and the two sets of Velvet contigs using the SeqMan module of Lasergene (v8.1). Merging was only allowed where end sequences overlapped for at least 15 bp with 99% identity. Manual correction was performed where necessary. The final single scaffold of the six BAC clones was also assembled with SeqMan.
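The merging criterion described above (end overlap of at least 15 bp at 99% identity) can be sketched as a simple suffix/prefix check. This is a minimal illustration under stated assumptions: the function names are hypothetical, the comparison is ungapped, and SeqMan's own alignment procedure is not reproduced. For overlaps shorter than 100 bp, a 99% cut-off effectively requires an exact match.

```python
def best_end_overlap(left, right, min_len=15, min_identity=0.99):
    """Find the longest ungapped overlap between the 3' end of `left` and the
    5' end of `right` that meets both thresholds.

    Returns (overlap_length, identity) or None if no overlap qualifies."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_len - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


def merge_if_overlapping(left, right, min_len=15, min_identity=0.99):
    """Merge two contigs when their ends overlap sufficiently, else return None.
    The overlapping bases are taken from `left`."""
    hit = best_end_overlap(left, right, min_len, min_identity)
    if hit is None:
        return None
    overlap_length, _ = hit
    return left + right[overlap_length:]


if __name__ == "__main__":
    shared = "GACCATGGCATCGATCGGAT"  # hypothetical 20 bp shared end
    contig_a = "TTTTTCCCCC" + shared
    contig_b = shared + "AAAAAGGGGG"
    print(merge_if_overlapping(contig_a, contig_b))
```

Searching from the longest candidate overlap downwards means the first qualifying overlap is also the longest one.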
Spidey (v1.40) and Blast2seq (v2.2.20) were used initially to search and predict the gene organisation of the 6 known genes, namely Refbp2, Itln1, Itln2, CD244, Ly9 and Slamf7, on the new 129S7 Itln locus. The reference mRNA sequences and the intron/exon boundaries used were obtained from both NCBI Refseq and ENSEMBL gene transcripts (Additional file 2). Splicing junctions were manually corrected to reflect the exact acceptor/donor sites. Pseudogenes were only annotated when their exons shared at least 70% identity with the coding sequences. In addition, the exon-exon junctions of the predicted transcripts were predicted with the RNASPL program of the Softberry web server to check for alternative splicing. Potential frame-shifting in the transcripts was checked with the web tool KnotInFrame. Repeat elements on the locus were identified using the program RepeatMasker (v3.2.7).
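The 70% identity criterion for annotating pseudogene exons can be illustrated with a small percent-identity calculation over a pairwise alignment. This is only a sketch: the function names are hypothetical, and the choice of denominator (all aligned columns, counting gap columns as mismatches) is an assumption, since the text does not state how identity was computed.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch (an assumption made for this illustration)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_pseudogene_threshold(aligned_exon, aligned_reference, threshold=70.0):
    """Return True when the exon reaches the identity threshold."""
    return percent_identity(aligned_exon, aligned_reference) >= threshold


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from the Itln locus.
    print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))
```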
Dotplot and blastz analyses between the Itln locus of the 129S7 strain and itself and that of the Celera mouse assembly were carried out using dotter (v3.1) and zPicture to gain insights into the structural organisation of the locus. Intelectins of other animals were obtained from the ENSEMBL protein family ENSFM00250000003313. ClustalW (v1.83) was used to align the mRNA and the translated protein sequences of the different intelectins while MUSCLE was the program of choice for the multiple sequence alignment of the genomic and the intronic sequences. For the phylogenetic analysis, the alignment of intron 5 sequences of the full length and pseudo-Itln genes from 129S7, C57BL/6J and rat was manually adjusted before being analysed with the MEGA (v5.0 beta) software suite. The tree was constructed using the neighbour-joining methodology with the branch distances computed based on the Kimura-2-parameter model. All ambiguous positions were removed for each sequence pair. The final consensus tree was inferred from 1000 bootstrap replicates. The presence of the Itln CNV in 17 other mouse strains (129P2/OlaHsd, 129S1/SvImJ, 129S5/SvEvBrd, A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6NJ, CAST/EiJ, CBA/J, DBA/2J, LP/J, NOD/ShiLtJ, NZO/HiLtJ, PWK/PhJ, Spret/EiJ and WSB/EiJ) was detected by using CNV-seq with the threshold of the log2 ratio set to 0.6. Furthermore, the relatively high sequencing coverage, averaging 20.1 fold, of the 17 genomes allows the use of a small sliding window size of about 4 kb and a p-value of 10^-5 to increase the CNV resolution (results not shown). The zoom-in coverage plots of the identified Itln CNV were drawn with the plotrix library of R. Sequencing data of these mice were obtained from the Wellcome Trust Sanger Institute (ftp://ftp.sanger.ac.uk/pub/mouse_genomes/).
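The Kimura 2-parameter distance used for the branch lengths has a closed form: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions respectively. The sketch below computes it for one pair of aligned sequences, skipping gaps and ambiguous bases per pair in the spirit of the pairwise removal described above. It is an illustration with hypothetical names, not the MEGA implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a gap or a non-ACGT symbol are
    skipped (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Hypothetical 10 bp alignment with a single transition (A<->G).
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A neighbour-joining tree would then be built from the matrix of such pairwise distances.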
Nkx3.1 transcription factor binding sites
The potential binding sites for the Nkx3.1 transcription factor on the 5 kbp promoters (upstream of the transcription start site) of the six 129S7 Itln variants and that of C57BL/6J were predicted using three different web tools, namely a JASPAR scan of individual promoters applying a 75% relative profile score threshold; zPicture pairwise alignment between C57BL/6J and 129S7 promoters followed by an rVista search for conserved TAA[G/A]T[A/G][A/C/T] binding sites; and MEME motif discovery. Results from the three predictions were next compared to identify binding sites that fall within evolutionarily conserved regions shared between C57BL/6J and 129S7. Where no such region exists between C57BL/6J and 129S7, sites with the highest similarity to the C57BL/6J TAAGTG motif were picked.
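The degenerate consensus TAA[G/A]T[A/G][A/C/T] quoted above translates directly into a regular expression. The sketch below scans both strands of a promoter sequence for matches. It is a simple pattern search with hypothetical function names and example sequences, not a reimplementation of JASPAR, rVista or MEME, and it does not score matches or assess conservation.

```python
import re

# Lookahead so that overlapping matches are also reported.
NKX31_PATTERN = re.compile(r"(?=(TAA[GA]T[AG][ACT]))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_candidate_sites(promoter):
    """Return sorted (start, strand, matched_sequence) tuples.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    covered by the match, whichever strand the motif lies on."""
    seq = promoter.upper()
    hits = []
    for m in NKX31_PATTERN.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    rc = reverse_complement(seq)
    for m in NKX31_PATTERN.finditer(rc):
        site = m.group(1)
        start = len(seq) - (m.start() + len(site))
        hits.append((start, "-", site))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical promoter fragments carrying the two motif variants.
    print(find_candidate_sites("CCCTAAGTGCCC"))
    print(find_candidate_sites("GGGTAAATGAGG"))
```

Both the TAAGTG and TAAATG variants discussed in the text satisfy the degenerate pattern, so distinguishing them requires inspecting the matched sequence itself.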
Two probes (Additional file 3) were designed to sit on the 5' and 3' ends of the Itln genes. The probe sequences were selected based on the conserved regions of the 6 Itln variants. They were blasted against the reference mouse genome to make sure that they do not hybridise to other parts of the genome. Southern blots were done according to the standard protocols of Southern. Briefly, genomic DNA from 7 mouse strains (C57BL/6J, 129P2, 129S1/SvImJ, A/J, DBA/2J, 129X1/SvJ and 129S7) was digested overnight at 37°C with either Nco I or Stu I for hybridising with the 5' or 3' probe respectively. DNA was then separated on a 0.8% agarose gel overnight at 20 V before being transferred to a nylon membrane for hybridisation with the radioactive probes.
Evidence for expression of Itln variants
Samples of the following tissues were collected from healthy uninfected mice of the 129S2 and 129P2 strains: trachea, stomach, duodenum, jejunum, ileum, caecum and colon. The samples were collected into RNAlater and subsequently RNA was extracted, reverse transcribed and then amplified by PCR using common primers for all six Itln variants (ITLN_all_F: TCAGCTAGCAACTCTCAGCTCCT; ITLN_all_R: ACACTAGCCACCAGGGTCCA; 35 cycles, annealing temp: 57°C). PCR products were sequenced and the results analysed for evidence of a predominant Itln variant, or evidence of a mixture. Additionally, PCR products were digested with restriction enzymes Hha I, Mbo II and Hae III (see Additional file 5 for specificities). The PCR products from 129S2 colon and 129P2 trachea were subjected to TOPO cloning. Positive clones from 129S2 colon (48) and 129P2 trachea (4) were individually sequenced.
We thank Dr Richard Talbot, ARK-Genomics, Roslin, UK, for performing Illumina sequencing and Dr Andrew Law, Roslin Institute, for helpful discussion on the sequence analysis. This work was funded by the Biotechnology and Biological Sciences Research Council (Award ref: BB/E009069).
- Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM, et al: The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science. 2002, 298: 2157-2167. 10.1126/science.1080049.PubMedView ArticleGoogle Scholar
- Tsuji S, Uehori J, Matsumoto M, Suzuki Y, Matsuhisa A, Toyoshima K, Seya T: Human intelectin is a novel soluble lectin that recognizes galactofuranose in carbohydrate chains of bacterial cell wall. Journal of Biological Chemistry. 2001, 276: 23456-23463. 10.1074/jbc.M103162200.PubMedView ArticleGoogle Scholar
- Suzuki YA, Shin K, Lonnerdal B: Molecular cloning and functional expression of a human intestinal lactoferrin receptor. Biochemistry. 2001, 40: 15771-15779. 10.1021/bi0155899.PubMedView ArticleGoogle Scholar
- Lee JK, Schnee J, Pang M, Wolfert M, Baum LG, Moremen KW, Pierce M: Human homologs of the Xenopus oocyte cortical granule lectin XL35. Glycobiology. 2001, 11: 65-73. 10.1093/glycob/11.1.65.PubMedView ArticleGoogle Scholar
- Komiya T, Tanigawa Y, Hirohashi S: Cloning of the novel gene intelectin, which is expressed in intestinal paneth cells in mice. Biochemical and Biophysical Research Communications. 1998, 251: 759-762. 10.1006/bbrc.1998.9513.PubMedView ArticleGoogle Scholar
- Pemberton A, Knight P, Gamble J, Colledge W, Lee J, Pierce M, Miller H: Innate BALB/c enteric epithelial responses to Trichinella spiralis: inducible expression of a novel goblet cell lectin, intelectin-2, and its natural deletion in C57BL/10 mice. Journal of Immunology. 2004, 173: 1894-1901.View ArticleGoogle Scholar
- French AT, Knight PA, Smith WD, Pate JA, Miller HR, Pemberton AD: Expression of three intelectins in sheep and response to a Th2 environment. Vet Res. 2009, 40: 53-10.1051/vetres/2009035.PubMed CentralPubMedView ArticleGoogle Scholar
- Blease SC, French AT, Knight PA, Gally DL, Pemberton AD: Bovine intelectins: cDNA sequencing and expression in the bovine intestine. Res Vet Sci. 2009, 86: 254-256. 10.1016/j.rvsc.2008.06.002.PubMedView ArticleGoogle Scholar
- Liao Y, Lopez V, Shafizadeh TB, Halsted CH, Lonnerdal B: Cloning of a pig homologue of the human lactoferrin receptor: Expression and localization during intestinal maturation in piglets. Comparative Biochemistry and Physiology Part A, Molecular & Integrative Physiology. 2007, 148: 584-590.View ArticleGoogle Scholar
- Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ, Zody MC, et al: Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005, 438: 803-819. 10.1038/nature04338.PubMedView ArticleGoogle Scholar
- Lee JK, Baum LG, Moremen K, Pierce M: The X-lectins: a new family with homology to the Xenopus laevis oocyte lectin XL-35. Glycoconj J. 2004, 21: 443-450. 10.1007/s10719-004-5534-6.PubMedView ArticleGoogle Scholar
- Lee JK, Buckhaults P, Wilkes C, Teilhet M, King ML, Moremen KW, Pierce M: Cloning and expression of a Xenopus laevis oocyte lectin and characterization of its mRNA levels during early development. Glycobiology. 1997, 7: 367-372. 10.1093/glycob/7.3.367.PubMedView ArticleGoogle Scholar
- Suzuki YA, Lonnerdal B: Baculovirus expression of mouse lactoferrin receptor and tissue distribution in the mouse. Biometals. 2004, 17: 301-309. 10.1023/B:BIOM.0000027709.42733.e4.PubMedView ArticleGoogle Scholar
- Yang RZ, Lee MJ, Hu H, Pray J, Wu HB, Hansen BC, Shuldiner AR, Fried SK, McLenithan JC, Gong DW: Identification of omentin as a novel depot-specific adipokine in human adipose tissue: possible role in modulating insulin action. Am J Physiol Endocrinol Metab. 2006, 290: E1253-1261. 10.1152/ajpendo.00572.2004.PubMedView ArticleGoogle Scholar
- Schaffler A, Neumeier M, Herfarth H, Furst A, Scholmerich J, Buchler C: Genomic structure of human omentin, a new adipocytokine expressed in omental adipose tissue. Biochim Biophys Acta. 2005, 1732: 96-102.
- Tan BK, Pua S, Syed F, Lewandowski KC, O'Hare JP, Randeva HS: Decreased plasma omentin-1 levels in Type 1 diabetes mellitus. Diabet Med. 2008, 25: 1254-1255. 10.1111/j.1464-5491.2008.02568.x.
- Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, Brant SR, Silverberg MS, Taylor KD, Barmada MM, et al: Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet. 2008, 40: 955-962. 10.1038/ng.175.
- Pemberton AD, Rose-Zerilli MJ, Holloway JW, Gray RD, Holgate ST: A single nucleotide polymorphism in intelectin-1 is associated with increased asthma risk. Journal of Allergy and Clinical Immunology. 2008, 122: 1033-1034. 10.1016/j.jaci.2008.08.037.
- Pemberton AD, Knight PA, Wright SH, Miller HR: Proteomic analysis of mouse jejunal epithelium and its response to infection with the intestinal nematode, Trichinella spiralis. Proteomics. 2004, 4: 1101-1108. 10.1002/pmic.200300658.
- Datta R, deSchoolmeester ML, Hedeler C, Paton N, Brass AM, Else KJ: Identification of novel genes in intestinal tissue which are regulated post infection with an intestinal nematode parasite. Infection and Immunity. 2005, 73: 4025-4033. 10.1128/IAI.73.7.4025-4033.2005.
- Kuperman DA, Lewis CC, Woodruff PG, Rodriguez MW, Yang YH, Dolganov GM, Fahy JV, Erle DJ: Dissecting asthma using focused transgenic modeling and functional genomics. J Allergy Clin Immunol. 2005, 116: 305-311. 10.1016/j.jaci.2005.03.024.
- Gu N, Kang G, Jin C, Xu Y, Zhang Z, Erle DJ, Zhen G: Intelectin is required for IL-13-induced monocyte chemotactic protein-1 and -3 expression in lung epithelial cells and promotes allergic airway inflammation. Am J Physiol Lung Cell Mol Physiol. 298: L290-296. 10.1152/ajplung.90612.2008.
- Graubert TA, Cahan P, Edwin D, Selzer RR, Richmond TA, Eis PS, Shannon WD, Li X, McLeod HL, Cheverud JM, Ley TJ: A High-Resolution Map of Segmental DNA Copy Number Variation in the Mouse Genome. PLoS Genet. 2007, 3: e3. 10.1371/journal.pgen.0030003.
- Zhang J: Evolution by gene duplication: an update. Trends Ecol Evol. 2003, 18: 292-298. 10.1016/S0169-5347(03)00033-8.
- Gimelbrant AA, Chess A: An epigenetic state associated with areas of gene duplication. Genome Res. 2006, 16: 723-729. 10.1101/gr.5023706.
- Knight PA, Wright SH, Lawrence CE, Paterson YY, Miller HR: Delayed expulsion of the nematode Trichinella spiralis in mice lacking the mucosal mast cell-specific granule chymase, mouse mast cell protease-1. Journal of Experimental Medicine. 2000, 192: 1849-1856. 10.1084/jem.192.12.1849.
- Adams DJ, Quail MA, Cox T, van der Weyden L, Gorick BD, Su Q, Chan WI, Davies R, Bonfield JK, Law F, et al: A genome-wide, end-sequenced 129Sv BAC library resource for targeting vector construction. Genomics. 2005, 86: 753-758. 10.1016/j.ygeno.2005.08.003.
- Mural RJ, Adams MD, Myers EW, Smith HO, Miklos GL, Wides R, Halpern A, Li PW, Sutton GG, Nadeau J, et al: A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science. 2002, 296: 1661-1671. 10.1126/science.1069193.
- Jern P, Coffin JM: Effects of retroviruses on host genome function. Annu Rev Genet. 2008, 42: 709-732. 10.1146/annurev.genet.42.110807.091501.
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Kumaresan PR, Stepp SE, Bennett M, Kumar V, Mathew PA: Molecular cloning of transmembrane and soluble forms of a novel rat natural killer cell receptor related to 2B4. Immunogenetics. 2000, 51: 306-313. 10.1007/s002510050624.
- Cutler G, Marshall LA, Chin N, Baribault H, Kassner PD: Significant gene content variation characterizes the genomes of inbred mouse strains. Genome Res. 2007, 17: 1743-1754. 10.1101/gr.6754607.
- Egan CM, Sridhar S, Wigler M, Hall IM: Recurrent DNA copy number variation in the laboratory mouse. Nat Genet. 2007, 39: 1384-1389. 10.1038/ng.2007.19.
- Cahan P, Li Y, Izumi M, Graubert TA: The impact of copy number variation on local gene expression in mouse hematopoietic stem and progenitor cells. Nat Genet. 2009, 41: 430-437. 10.1038/ng.350.
- Quinlan AR, Clark RA, Sokolova S, Leibowitz ML, Zhang Y, Hurles ME, Mell JC, Hall IM: Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. Genome Res. 2010, 20: 623-635. 10.1101/gr.102970.109.
- Xie C, Tammi MT: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics. 2009, 10: 80. 10.1186/1471-2105-10-80.
- Mogal AP, van der Meer R, Crooke PS, Abdulkadir SA: Haploinsufficient prostate tumor suppression by Nkx3.1: a role for chromatin accessibility in dosage-sensitive gene regulation. J Biol Chem. 2007, 282: 25790-25800. 10.1074/jbc.M702438200.
- Steadman DJ, Giuffrida D, Gelmann EP: DNA-binding sequence of the human prostate-specific homeodomain protein NKX3.1. Nucleic Acids Res. 2000, 28: 2389-2395. 10.1093/nar/28.12.2389.
- Belancio VP, Roy-Engel AM, Pochampally RR, Deininger P: Somatic expression of LINE-1 elements in human tissues. Nucleic Acids Res. 2010, 38: 3909-3922. 10.1093/nar/gkq132.
- Bailey JA, Church DM, Ventura M, Rocchi M, Eichler EE: Analysis of segmental duplications and genome assembly in the mouse. Genome Res. 2004, 14: 789-801. 10.1101/gr.2238404.
- Takano T, Sha Z, Peatman E, Terhune J, Liu H, Kucuktas H, Li P, Edholm ES, Wilson M, Liu Z: The two channel catfish intelectin genes exhibit highly differential patterns of tissue expression and regulation after infection with Edwardsiella ictaluri. Dev Comp Immunol. 2008, 32: 697-705. 10.1016/j.dci.2007.10.008.
- Longman RJ, Douthwaite J, Sylvester PA, Poulsom R, Corfield AP, Thomas MG, Wright NA: Coordinated localisation of mucins and trefoil peptides in the ulcer associated cell lineage and the gastrointestinal mucosa. Gut. 2000, 47: 792-800. 10.1136/gut.47.6.792.
- Bhatia-Gaur R, Donjacour AA, Sciavolino PJ, Kim M, Desai N, Young P, Norton CR, Gridley T, Cardiff RD, Cunha GR, et al: Roles for Nkx3.1 in prostate development and cancer. Genes Dev. 1999, 13: 966-977. 10.1101/gad.13.8.966.
- Magee JA, Abdulkadir SA, Milbrandt J: Haploinsufficiency at the Nkx3.1 locus. A paradigm for stochastic, dosage-sensitive gene regulation during tumor initiation. Cancer Cell. 2003, 3: 273-283. 10.1016/S1535-6108(03)00047-3.
- Pan HY, Guo L, Li Q: Changes of serum omentin-1 levels in normal subjects and in patients with impaired glucose regulation and with newly diagnosed and untreated type 2 diabetes. Diabetes Res Clin Pract. 2010, 88: 29-33. 10.1016/j.diabres.2010.01.013.
- Moreno-Navarrete JM, Catalan V, Ortega F, Gomez-Ambrosi J, Ricart W, Fruhbeck G, Fernandez-Real JM: Circulating omentin concentration increases after weight loss. Nutr Metab (Lond). 2010, 7: 27. 10.1186/1743-7075-7-27.
- Saremi A, Asghari M, Ghorbani A: Effects of aerobic training on serum omentin-1 and cardiometabolic risk factors in overweight and obese men. J Sports Sci. 2010, 1-6.
- Orozco LD, Cokus SJ, Ghazalpour A, Ingram-Drake L, Wang S, van Nas A, Che N, Araujo JA, Pellegrini M, Lusis AJ: Copy number variation influences gene expression and metabolic traits in mice. Hum Mol Genet. 2009, 18: 4118-4129. 10.1093/hmg/ddp360.
- Boles KS, Stepp SE, Bennett M, Kumar V, Mathew PA: 2B4 (CD244) and CS1: novel members of the CD2 subset of the immunoglobulin superfamily molecules expressed on natural killer cells and other leukocytes. Immunol Rev. 2001, 181: 234-249. 10.1034/j.1600-065X.2001.1810120.x.
- Nakajima H, Cella M, Langen H, Friedlein A, Colonna M: Activating interactions in human NK cell recognition: the role of 2B4-CD48. Eur J Immunol. 1999, 29: 1676-1683. 10.1002/(SICI)1521-4141(199905)29:05<1676::AID-IMMU1676>3.0.CO;2-Y.
- Chlewicki LK, Velikovsky CA, Balakrishnan V, Mariuzza RA, Kumar V: Molecular basis of the dual functions of 2B4 (CD244). J Immunol. 2008, 180: 8159-8167.
- Kumaresan PR, Huynh VT, Mathew PA: Polymorphism in the 2B4 gene of inbred mouse strains. Immunogenetics. 2000, 51: 758-761. 10.1007/s002510000198.
- Ensembl Release 46. [http://aug2007.archive.ensembl.org/]
- Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing. Genome Res. 1998, 8: 195-202.
- NCBI microbial genomes BLAST. [http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi]
- FASTX-Toolkit. [http://hannonlab.cshl.edu/fastx_toolkit/]
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18: 1851-1858. 10.1101/gr.078212.108.
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829. 10.1101/gr.074492.107.
- Wheelan SJ, Church DM, Ostell JM: Spidey: a tool for mRNA-to-genomic alignments. Genome Res. 2001, 11: 1952-1957.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
- Softberry - RNASPL. [http://linux1.softberry.com/berry.phtml?topic=rnaspl&group=programs&subgroup=gfind]
- Theis C, Reeder J, Giegerich R: KnotInFrame: prediction of -1 ribosomal frameshift events. Nucleic Acids Res. 2008, 36: 6013-6020. 10.1093/nar/gkn578.
- RepeatMasker Open 3.0. [http://www.repeatmasker.org]
- Sonnhammer EL, Durbin R: A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene. 1995, 167: GC1-10. 10.1016/0378-1119(95)00714-8.
- Ovcharenko I, Loots GG, Hardison RC, Miller W, Stubbs L: zPicture: dynamic alignment and visualization tool for analyzing conservation profiles. Genome Res. 2004, 14: 472-477. 10.1101/gr.2129504.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
- Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform. 2008, 9: 299-306. 10.1093/bib/bbn017.
- Lemon J: Plotrix, a package in the red light district of R. R-News. 2006, 6: 8-12.
- Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A: JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010, 38: D105-110. 10.1093/nar/gkp950.
- Loots GG, Ovcharenko I: rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. 2004, 32: W217-221. 10.1093/nar/gkh383.
- Bailey TL, Boden M, Whitington T, Machanick P: The value of position-specific priors in motif discovery using MEME. BMC Bioinformatics. 2010, 11: 179. 10.1186/1471-2105-11-179.
- Southern EM: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol. 1975, 98: 503-517. 10.1016/S0022-2836(75)80083-0.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.