Quack leptin

Background: A LEP transcript up-regulated in lungs of ducks (Anas platyrhynchos) infected by avian influenza A virus was recently described in the Nature Genetics manuscript that reported the duck genome. In vertebrates, the LEP gene symbol is reserved for leptin, the key regulator of energy balance in mammals. Results: Launching an extensive search for this gene in the genome data that were submitted to the public databases along with the duck genome manuscript, and extending this search to all avian genomes in the whole-genome shotgun-sequencing database, we were able to report the first identification of coding sequences capable of encoding the full leptin protein precursor in wild birds. Gene structure, synteny and sequence similarity (up to 54% identity and 68% similarity) to reptilian leptin evident in falcons (Falco peregrinus and F. cherrug), tits (Pseudopodoces humilis), finches (Taeniopygia guttata) and doves (Columba livia) confirmed that the bird leptin was a true ortholog of its mammalian form. Nevertheless, in duck, as in other domestic fowls, the LEP gene was not identifiable. Conclusion: Lack of the LEP gene in poultry suggests that birds that have lost it are particularly suited to domestication. Identification of an intact avian gene for leptin in wild birds might explain in part the evolutionary conservation of its receptor in leptin-less fowls.


Background
The duck (Anas platyrhynchos) genome and transcriptome were recently reported in Nature Genetics [1] as part of an investigation of immune-related genes implicated in the response to infection by avian influenza virus A. Using deep sequencing, the authors compared the lung transcriptomes of control and H5N1-infected ducks and used the gene symbol LEP to describe a transcript that was upregulated in the infected ducks. In vertebrates, this gene symbol is reserved for leptin, the key regulator of energy balance in mammals; however, the avian ortholog has never been established.

Leptin in poultry research
An entry in the Gene database [Gene ID: 373955] is set aside for the chicken (Gallus gallus) leptin gene. The lack of a nucleotide sequence for this entry reflects its complex history, having been cloned and its sequence then retracted [2][3][4][5]. After removing the Bos taurus sequences that contaminated the first submission of the chicken genome project and the EST database [6], it was finally established that no close ortholog of mammalian leptin is present in this genome. However, the obvious importance of identifying a master gene that controls appetite and fattening in poultry promoted cloning of mammalian-like leptins in turkey (Meleagris gallopavo, [GenBank: AAC32381], 95% identity to mouse leptin) and duck ([GenBank: AAT38807], 99% identity to mouse leptin). In the turkey genome housed in the Ensembl database, there are neither annotations for LEP nor murine-like leptin sequences in its build; hence, like chickens, turkeys lack leptin.

Synteny confirms leptin in birds
The typical structure of the leptin gene includes three exons: a non-coding exon followed by a large intron 1, a second exon harboring the translation-initiation codon close to the splice acceptor site, and a large third exon that encodes most of the protein (e.g. human Gene ID: 3952, fugu, Takifugu rubripes, Figure 1a). Based on a short contig (1482 bp) of a whole-genome shotgun sequencing (WGS) project, a partial gene capable of encoding the third exon of a leptin-like protein has recently been annotated in Taeniopygia guttata [Gene ID: 101233729], suggesting that leptin is expressed in the zebra finch. A BLASTP search using the putative 115-aa polypeptide encoded by this exon against the NR database indicated that the reptilian green sea turtle (Chelonia mydas) leptin was its closest ortholog, with 34% identity and 52% similarity (Figure 1c), while mouse leptin was more distant with 31% identity and 47% similarity. We hypothesized that if leptin is indeed present in birds, it would have been revealed in other avian WGS projects. Indeed, TBLASTN against the WGS database with taxid restricted to Aves revealed that the gene may be present in falcons (Falco peregrinus and F. cherrug; [GenBank: AKMT01018335 and AKMU01055767], respectively), tits (Pseudopodoces humilis, [GenBank: ANZD01014665, ANZD01014667]) and doves (Columba livia, [GenBank: AKCR01028475]). The falcon sequences were 99.6% syntenic [7] and we therefore assembled the contigs of both species together (15,541 bp, also including [GenBank: AKMT01018336 and AKMU01055766], Figure 1a) using GAP4 and GAP5 software [8], incorporating additional reads in critical regions from the Sequence Read Archive Nucleotide BLAST (Figure 1b). This revealed coding exons fitting the typical leptin gene structure and capable of encoding a full-length leptin-like 166-aa precursor with 52% identity and 69% similarity (for F. cherrug) to the turtle leptin (Figure 1c). Moreover, the 3′-neighboring gene of the falcon leptin showed 56% identity and 68% similarity to rat RNA-binding motif protein 28 (RBM28, [Gene ID: 312182]). Local LEP-RBM28 synteny is conserved and observed in fish (e.g. fugu, [Gene ID: 101064097]) and mammals (Figure 1a), thus strongly indicating that these sequences are orthologous to the mammalian leptin.

Figure 1. (a) Gene structures are drawn to the scale shown by the bar below. Black and gray boxes represent translated and untranslated regions (UTRs) of exons, respectively. When the exact transcription termination site is not characterized, large gray arrowheads at the 3′ UTRs indicate the direction of transcription, which is generally indicated by small arrowheads on the intron delineations. Gene identification and exon numbers are given above and below the gene depictions, respectively. Exon numbers for falcon RBM28 follow their numbering in the orthologous rat gene. (b) Identification of errors in the genome submission of falcon based on alignment with individual reads from the Sequence Read Archive (SRA). Reads were located by BLASTN search of the SRA database, downloaded with their quality information (FASTQ format), and assembled using GAP5 software [8]. The relevant protein sequence is added above the contig editor output for a region of low coverage within the second LEP exon. The contig editor shows quality values by gray scale, and discrepancies between the sequences and the consensus are highlighted by a base symbol. The cutoff option was not turned on and therefore low-quality (dark gray) bases that were manually trimmed are not displayed. Individual reads and the mapping template (AKMU01055767) are identified on the left. A base substitution and a 4-base deletion (A****) are denoted on the mapping template, which is the first read below the consensus line.
The tit contigs were GC-rich (68%) with highly repetitive GC elements and we were unable to combine them; nevertheless, both coding exons corresponding to the typical leptin structure were observable. These exons were capable of encoding a full-length leptin-like 161-aa precursor with 36% identity and 57% similarity to the turtle leptin (Figure 1c). Further search of the WGS database revealed similarity to single exons: the previously annotated exon 3 for zebra finch and a novel match to exon 2 for dove. We extended the detected dove contig with reads [SRA: SRR511892.31385855, SRR511913.3134902] and found the initiation codon of a typical leptin exon 2 structure, capable of encoding the N-terminal 48 aa of a leptin-like precursor with 56% identity and 72% similarity to turtle leptin (Figure 1c).
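The identity and similarity percentages quoted above can be illustrated with a minimal sketch that scores the two rows of a pairwise alignment. It is not the BLAST scoring used in this work: the residue groups that define "similar" are a simplified assumption, the function names are hypothetical, and the percentages are taken over the full alignment length (gaps included), which is how BLAST-style reports are usually expressed.

```python
from typing import Tuple

# Assumed, simplified groups of "similar" amino acids. BLAST instead counts
# positions with a positive substitution-matrix score, so the values produced
# here will not match BLAST output exactly.
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("NQ"),    # amide
    set("ST"),    # hydroxyl
    set("AG"),    # small
]


def is_similar(a: str, b: str) -> bool:
    """Return True if both residues fall in the same assumed group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln_a: str, aln_b: str) -> Tuple[float, float]:
    """Percent identity and similarity for two aligned rows ('-' marks a gap).

    Identical positions also count as similar. Gap columns count towards the
    alignment length but never towards identity or similarity.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("alignment is empty")
    identical = similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1
        elif is_similar(a, b):
            similar += 1
    length = len(aln_a)
    return 100.0 * identical / length, 100.0 * similar / length
```

Called on two toy aligned fragments (invented for illustration, not leptin sequences) such as `identity_and_similarity("MKV-LD", "MRVALE")`, the function counts three identical and two further similar columns out of six.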

Leptin remains unidentifiable in domestic fowls
Examination of the recently submitted duck genome annotations revealed no gene with LEP as its symbol and no gene annotated as leptin. Moreover, BLASTN search of the WGS database using "duck leptin" [GenBank: AAT38807] or any of the novel leptin-like bird proteins described here indicated no significant similarity to leptin in this genome submission. Thus, we conclude that this gene may also be missing in duck. It is expected of the editorial process of a high-ranking journal to ensure that, when seeking a fast impact, genome publications do not turn into lists of unverified gene symbols that no one actually reads. It is further recommended that authors who deposited erroneous sequences of murine-like leptins for birds in sequence databases [GenBank: AAC32380, AAC60368, AAL35557, AAT38807, O42164, O93416] caution users of the possibility of sequence contamination. It should also be noted that 11 GenBank mRNA submissions of fish leptins with >98% identity to the mouse transcript should be similarly annotated [GenBank: DQ784814-6, AY497007, AY547279, AY547322, AY551335-9].
Moreover, a large volume of misinformation may have been generated as these murine-like leptins were the basis for studies without prior knowledge of leptin's activity in the targeted species, including reports of the expression of the erroneous leptin gene product at the mRNA and protein levels (e.g. [9][10][11][12][13][14][15][16]). These leptins were reported to attenuate appetite, or affect other parameters related to the control of energy balance when administered to chickens [17][18][19][20], chick embryos [21][22][23], ovarian [24] and hepatoma [25] cells in culture or skeletal bones in an ex vivo model system [26].

Recent findings
While this work was under consideration, three reports describing avian leptins were submitted and published [27][28][29], including an indication, based on a single RNA-seq read, of a leptin-like transcript in the duck [27]. We used the sequence information from this read and from the read at the opposite end of the same fragment to design a pair of PCR primers that bridged the sequence gap between these reads. The PCR protocol applied was adapted for amplification of GC-rich leptin sequences [28]. DNA sequencing of the resulting PCR product confirmed the existence in the duck genome of a leptin-like sequence orthologous to the last exon of other avian leptins (Figure 2). However, analysis of the genomic raw deep-sequencing data [BioProject: PRJNA46621] was hampered by the existence of similar repetitive sequence structures, and we were able to extend this sequence only towards the 5′ end. We detected no reads that could extend the 3′ end with sequence coding for a valid cysteine knot motif, which is typical of all leptins [30]. Furthermore, analysis of the raw RNA-seq data [BioProjects: PRJNA194464, PRJNA188394] revealed transcription matching the repetitive sequence structures, but no additional reads for the duck leptin-like sequence described here could be identified (data not shown). Detection of leptin syntenic genes such as miR129-1 favors the possibility that the leptin gene may also exist in ducks [29]. Hence, the existence of a fully functional leptin gene in the duck remains an open question.
Further BLASTN and TBLASTN searches of the WGS database using the novel avian leptin sequences revealed indications of the existence of leptin in additional bird species. These include woodpecker, eagle and quails (Figure 2). Protein motifs typical of leptin were identified and annotated, including the leader peptide, 4-helix bundle structure and cysteine knot (Figure 2). While the leptin gene of woodpecker was apparent on an unplaced genomic scaffold [GenBank: JJRU01076739], the gene of golden eagle was much more obscure. The eagle's first coding exon (exon 2) was intact in a WGS contig [GenBank: JDSB01143511]. However, de novo assembly of genomic raw deep-sequencing data [BioProject: PRJNA222866] was unable to extend the sequence of the last-exon-like structure [GenBank: JDSB01163119]. Yet, all the putative motifs encoded by the highly similar (89% identity and 91% similarity) orthologous falcon leptin gene were assembled to form disordered palindromic and repetitive contigs that also contained leptin's syntenic gene RBM28. Such structures were also typical for the duck (data not shown). Bobwhite quail was the first galliform with a partial exon 3-like sequence observable in a contig assembly of the WGS effort for this quail (Figure 2, [GenBank: AWGU01372785]). We used this sequence as a template for a BLAST search of the deep-sequencing data deposited for the Japanese quail in the SRA database and the related WGS assembly. The leptin gene was not identifiable in the latter; however, we were able to download and assemble the matching SRA sequence reads (Figure 2), which correspond to an intact exon 3 structure. We repeated the sequence searches against the chicken genome and confirmed that even this galliform LEP-like sequence is not detectable in Gallus gallus, in agreement with the observation that administration of a leptin antagonist had no effect on appetite and body growth of layer chickens [31]. We could not associate any ESTs or RNA-seq reads with the quails' leptin-like genes; moreover, the role of the leptin signaling pathway may differ in galliforms [28]. This hypothesis may also be related to the finding that the hunger hormone ghrelin [32], which is predominantly synthesized in the gastrointestinal tract in chickens and mammals, has been reported to have an opposite effect on appetite in chickens compared to mammals [33,34]. Hence, galliforms provide a unique model system to decipher an alternative control mechanism of energy homeostasis, and we intend to study this further in the Japanese quail.

Figure 2. Dashes indicate gaps introduced by the alignment program. Identical and similar amino-acid residues in at least three or six sequences are indicated by a black and gray background, respectively. White boxes indicate non-conservative amino-acid changes between the proteins. The signal peptide and structural elements, helices and loops [28], are denoted above the alignment. The two conserved cysteines forming a lasso knot [30] are indicated by black arrowheads. The duck genomic sequence was confirmed using previously described procedures [28]; DNA was extracted from frozen mallard duck purchased from a local husbandry (Levin, Kfar Baruch, Israel) and the nucleotide sequence was determined by capillary sequencing of the 81 bp product amplified using PCR primers (F, 5′-CAGCTTTTCCAGCGCGTC-3′; R, 5′-GAGGTTCTCCAGGTCGCTTA-3′).
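As a small illustration of why the duck amplicon calls for a GC-rich PCR protocol, the sketch below computes basic properties of the two primers listed in the Figure 2 legend. The helper names are hypothetical, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is only a rough rule of thumb for short oligonucleotides; no melting temperatures are reported in this work and the actual cycling conditions follow [28].

```python
# Primer sequences as given in the Figure 2 legend (5' to 3').
FORWARD = "CAGCTTTTCCAGCGCGTC"
REVERSE = "GAGGTTCTCCAGGTCGCTTA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule.

    Assumption: the rule is only a coarse estimate for short primers and
    ignores salt and primer concentration.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, primer in (("F", FORWARD), ("R", REVERSE)):
        print(name, len(primer), round(gc_percent(primer), 1), wallace_tm(primer))
    # Sequence of the reverse primer's binding site as read on the forward strand.
    print(reverse_complement(REVERSE))
```

The reverse complement of the reverse primer is the stretch to look for on the forward strand when locating the 3′ boundary of the 81 bp product in an assembled contig.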

Conclusions
The absence of a leptin gene in genomes related to domestic fowls seems incompatible with the presence of the leptin receptor gene, which has been cloned in chicken [35], turkey [36] and duck [1]. Herein we report the first identification of coding sequences capable of encoding the full leptin protein precursor in birds. Identification of an intact avian gene for leptin might explain in part the evolutionary conservation of its receptor in Aves. The loss of leptin in the lineage of domestic fowls suggests that relaxing the control of appetite made these birds particularly suited to domestication.

Comparative sequence analysis
For the characterization of leptin genes not yet annotated in the avian genome assemblies, sequence homology searches were carried out in different publicly available databases (NCBI: NR, WGS, SRA; and Ensembl) using the BLAST family of programs. Relevant sequence entries were downloaded with their quality information (FASTQ format) and reassembled using the GAP5 software [8]. The amino acid sequences were aligned using CLUSTALW (http://www.genome.jp/tools/clustalw/) with the default parameters and the GONNET matrix, and colored using the BOXSHADE program (http://www.ch.embnet.org/software/BOX_form.html).
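To make the role of the FASTQ quality information concrete, the following minimal sketch parses four-line FASTQ records and trims low-quality bases from the 3′ end of a read. It is not the procedure used here, where reads were assembled in GAP5 and trimmed manually; the function names, the Phred+33 quality encoding and the quality threshold of 20 are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        seq, qual = seq.strip(), qual.strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header)
        # Assumption: qualities are Phred+33 encoded (Sanger / Illumina 1.8+).
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:].split()[0], seq, scores


def trim_3prime(seq: str, scores: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Drop trailing bases whose Phred score is below min_q (assumed cutoff)."""
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], scores[:end]
```

With an invented record such as `["@read1 demo", "ACGTAC", "+", "IIII#!"]`, the parser yields the read with its decoded scores, and `trim_3prime` removes the two trailing low-quality bases before the read would be passed to an assembler.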

Sequence data accessions
The annotated sequences are available in GenBank under accessions HG425120-3.