Large-scale sequencing based on full-length-enriched cDNA libraries in pigs: contribution to annotation of the pig genome draft sequence
© Uenishi et al.; licensee BioMed Central Ltd. 2012
Received: 12 December 2011
Accepted: 9 August 2012
Published: 15 November 2012
Along with the draft sequencing of the pig genome, which has been completed by an international consortium, collection of the nucleotide sequences of genes expressed in various tissues and determination of entire cDNA sequences are necessary for investigations of gene function. The sequences of expressed genes are also useful for genome annotation, which is important for isolating the genes responsible for particular traits.
We performed a large-scale expressed sequence tag (EST) analysis in pigs by using 32 full-length-enriched cDNA libraries derived from 28 kinds of tissues and cells, including seven tissues (brain, cerebellum, colon, hypothalamus, inguinal lymph node, ovary, and spleen) derived from pigs that were cloned from a sow subjected to genome sequencing. We obtained more than 330,000 EST reads from the 5′-ends of the cDNA clones. Comparison with human and bovine gene catalogs revealed that the ESTs corresponded to at least 15,000 genes. cDNA clones representing contigs and singlets generated by assembly of the EST reads were subjected to full-length determination of inserts. We have finished sequencing 31,079 cDNA clones corresponding to more than 12,000 genes. Mapping of the sequences of these cDNA clones on the draft sequence of the pig genome has indicated that the clones are derived from about 15,000 independent loci on the pig genome.
ESTs and cDNA sequences derived from full-length-enriched libraries are valuable for annotation of the draft sequence of the pig genome. This information will also contribute to the exploration of promoter sequences on the genome and to molecular biology-based analyses in pigs.
Keywords: Sus scrofa; Full-length cDNA; Sequencing; Genome annotation
The pig is the world's most frequently consumed meat animal, and its genetic improvement, particularly in terms of productivity and meat quality, is of interest to livestock science. To date, intensive genetic improvement of livestock animals has been conducted by using classical selection and mating, but genomic information is required for further improvement. Improvements in aspects of the rearing management of pigs, such as feeding and hygiene control, have to be based on knowledge obtained from physiological studies. Moreover, the pig is unique among livestock in that it is very useful in biomedical research because of the structural and size similarities of its organs (particularly cardiovascular and dermal) to those of humans[2–4]. Improvement in the breeding and rearing of pigs with the help of molecular genetics and physiology, as well as the use of pigs as biomedical model animals, requires fundamental information on pig molecular biology, particularly in terms of the genome and genes.
Recently, the International Swine Genome Sequencing Consortium (SGSC) completed its draft sequencing of the pig genome; these sequences will form the basis of further investigations of pig molecular biology[5–7]. Sequencing of the pig genome will accelerate the development of genetic markers to improve breeds and populations and give basic information on the genes encoded on the genome. However, genes cannot be precisely localized on the genome solely from information on the genome sequence. Determination of precise sequences, structures, and locations requires information on the sequences of expressed genes per se. The locations and structures of genes on the pig genome are now being explored by using automated systems or by manual inspection by annotators using the Otterlace system of the Wellcome Trust Sanger Institute to add information to databases such as Vertebrate Genome Annotation (VEGA)[10–12]. The sequences of expressed genes are also useful for genome annotation, which is important for isolating the genes responsible for particular traits.
Expressed sequence tag (EST) analyses have been conducted by many research groups in pigs and other organisms. More than 1,600,000 pig ESTs have been accumulated and registered in the public nucleotide databases, and several attempts at transcriptome analysis using next-generation DNA sequencing (NGS) have been made[13–15]. Most of the cDNA libraries constructed by using traditional methods do not cover the transcription start sites, because the limitations of cloning techniques can cause incomplete synthesis of full-length cDNA. On the other hand, the sequences of transcripts obtained by using NGS alone are reconstructed by the compilation of short reads and do not directly reflect the actual structure of the mRNA; this may be problematic in considering the alternative splicing products that are actually expressed in the tissues. As far as possible, it is therefore important to clone full-length mRNA transcripts in order to collect gene expression data and use these data in further analyses of expressed genes and of genome annotation in pigs[17–19]. cDNA clones carrying full-length transcripts have additional benefits—they can be used for protein production in vitro and for exploring promoter sequences on the genome.
So far we have conducted EST analysis and sequencing of entire mRNA transcripts in pigs by using full-length-enriched cDNA libraries. Here, we outline the data we have collected and the advantages of their use, particularly in genome annotation of the draft sequence of the pig genome.
Results and discussion
Pig ESTs based on full-length-enriched cDNA libraries
[Table: Pig cDNA libraries, ESTs, and completely sequenced cDNA clones. Table body not recovered; surviving labels: Method of library construction; Mapped on Sscrofa10.2; Brain (frontal lobe); Immature dendritic cells; Inguinal lymph node; Mesenteric lymph node; Peripheral blood lymphocytes.]
Genes and chromosomal locations corresponding to assemblies generated from pig ESTs
[Table: Correspondence to mammalian genes and estimated efficiencies of cloning of start codons of EST assemblies. Table body not recovered; surviving labels: Unique Gene ID (without HomoloGene ID); Unique HomoloGene ID; Assemblies matched to protein sequences; Assemblies estimated to include start codons.]
Among the human genes matched to the assemblies, 12,937 corresponded to 12,911 unique NCBI HomoloGene IDs, which are indices of homologs among genomes of different eukaryote species. This covered about two-thirds of all HomoloGene groups in humans (18,431 IDs; release 65). Furthermore, about 2000 additional genes were included in the ESTs, as estimated from the numbers of genes without HomoloGene IDs (Table 2). In total, we estimated that more than 15,000 different genes were included in the ESTs thus obtained. However, the number of genes included in the EST assemblies might in fact be higher because of gene duplication occurring specifically in the Sus genus.
Mapping of pig EST assemblies on pig chromosomes
Generation of the collection of pig cDNA clones, and complete sequencing of their inserts
The pig cDNA libraries used for the EST analysis were full-length-enriched libraries, which are ideal for determining the entire sequences of transcripts functioning as protein-encoding mRNA. In parallel with the EST analysis described above, we selected cDNA clones for the sequencing of entire inserts. The cDNA clones located at the most 5′ position in contigs generated by the assembly were selected for sequencing of the entire inserts, because we considered that they were the best candidates for clones carrying entire transcripts. As the EST analysis progressed, if cDNA clones located upstream of the clones already selected in the contigs appeared, we added these clones to the pipeline for sequencing of the entire inserts. On the other hand, among the singlets that did not join the contigs, there were many clones corresponding to human genes that had no counterparts among the clones selected from the contigs. We also selected these clones to ensure that the cDNA collection included a broad variety of genes. In total, we selected 42,047 clones as candidates for complete sequencing (31,545 clones from contigs and 10,502 clones from singlets).
[Table: Distributions of lengths of cDNA clones completely sequenced by the primer walking and transposon shotgun sequencing methods. Table body not recovered; surviving labels: Sequenced using universal primers (footnote a); Transposon shotgun sequencing.]
Genes corresponding to cDNA clones
[Table: Mammalian genes corresponding to sequenced cDNA clones. Table body not recovered; surviving labels: Genes corresponding to sequenced cDNA clones; Sequenced cDNA clones containing full-length CDSs; Unique Gene ID (without HomoloGene ID); Unique HomoloGene ID.]
[Table residue, caption not recovered; surviving labels: Unmapped cDNA clones; cDNA clones (with corresponding loci on pig genome); Unique Gene ID (with corresponding loci on pig genome); Loci with Gene ID; Gene ID with cDNA clones unmapped on pig genome (footnote a); Unique Gene ID (without HomoloGene ID); Unique HomoloGene ID.]
Distribution of cDNA clones on the draft sequence of the pig genome
Mapping of the pig cDNA clones sequenced in this study on pig chromosomes
Similar to the human genome, the pig genome is estimated to include 20,000 to 25,000 genes, because the genome size of pig is comparable to that of human. Our sequencing of the cDNA clones therefore covered slightly more than half of the entire gene set of the pig. The reason why thousands of genes were not included in our cDNA resources may be that the libraries were constructed with tissues of animals that were healthy and not subject to stressors such as infection or starvation. Genes that are highly expressed only during acute responses to pathogens or nutritional exhaustion might be difficult to clone in such libraries. In addition, if a gene is rarely expressed in a particular tissue (e.g., if its expression frequency among all the transcripts is less than 0.006%), then the probability that it will fail to be detected in the cloning of 10,000 transcripts will be more than 50%. To increase the number of cloned genes it would be necessary to normalize the libraries or use tissues derived from animals stimulated by particular stressors.
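The detection-failure figure quoted above can be checked with a short calculation. The sketch below assumes that each sequenced clone is an independent draw and that a transcript with relative frequency f is picked with probability f on every draw; the function name is illustrative and not part of the original analysis.

```python
def prob_missed(freq: float, n_clones: int) -> float:
    """Probability that a transcript with relative frequency `freq`
    is absent from `n_clones` independently sampled clones."""
    return (1.0 - freq) ** n_clones


# 0.006% written as a fraction of all transcripts
p = prob_missed(0.00006, 10_000)
print(f"P(missed in 10,000 clones) = {p:.3f}")

# Frequency at which the chance of being missed is exactly 50%
f_half = 1.0 - 0.5 ** (1.0 / 10_000)
print(f"50% threshold frequency = {f_half * 100:.4f}%")
```

Under these assumptions the probability of missing a transcript present at 0.006% is roughly 0.55, and the 50% point lies near 0.007%, consistent with the statement in the text.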
Genes encoded on the genome may be duplicated specifically in particular species but not in others. To detect duplication specifically occurring in the pig genome, we extracted the cDNA sequences with the longest open reading frames (ORFs) from among the clones that we sequenced here for the 12,441 putative loci on the autosomes and sex chromosomes in Sscrofa10.2. The extracted cDNA sequences were compared with human and cattle protein sequences deduced from the NCBI RefSeq (release 49). We estimated that 635 human protein sequences and 709 cattle protein sequences matched more than one putative locus on Sscrofa10.2. Conversely, the total numbers of loci estimated to be duplicated on Sscrofa10.2 in comparison with humans and cattle were 1358 and 1505, respectively. However, most of the potentially duplicated loci encoded shorter ORFs than their counterparts, implying that the loci were only pseudogenes or that they had arisen from the remaining sequencing errors in the genome sequence. Further refinement of the draft sequences of the pig genome will elucidate the duplication of genes occurring in the Sus genus.
The cDNAs thus analyzed were synthesized by reverse transcription using a poly-dT primer; therefore, most of the clones showed canonical mRNA features and had ORFs. Among the 13,894 loci mapped on the chromosomes and scaffolds, the representative (the longest) cDNA clones for 4144 loci showed no correspondence to the genes of humans, cattle, dogs, or mice. However, most of the cDNA clones with no obvious correspondence to the genes of other animals had apparent ORFs, and only 160 clones did not have ORFs for sequences more than 30 amino acids long. The average insert length of these 160 clones was 665 bp, whereas the average for all of the clones was much longer (1631 bp). These “non-coding” transcripts may be transcribed randomly and may have no function, although they may have a certain regulatory function on other protein-coding transcripts.
The draft sequence of the pig genome in its latest build (Sscrofa10.2) corresponds to the bacterial artificial chromosome (BAC) clones covering 98% of the physical map of the entire chromosome[5, 6]. In our analysis, about 2% of the unique human genes corresponding to the cDNA sequences were not mapped to any chromosomes or unplaced scaffolds, showing that our estimate of the coverage of the whole genome by the draft sequence was correct. Refinement of the draft sequence of the pig genome will reveal the precise locations on the pig genome of the loci generating those transcripts that we cloned but that were not mapped, or that we mapped only on unplaced scaffolds.
The cDNA libraries that we used included seven libraries constructed by using animals that were cloned from the sow used for draft sequencing of the pig genome (Duroc 2–14)[5–7]. A total of 4010 cDNA clones from four libraries were completely sequenced. Among them, 3729 were mapped to 2993 loci on the pig genome (Table 6B). These cDNA clones will be valuable for genome annotation of the draft sequence generated by the SGSC. In addition, comparison of the cDNA sequences of Duroc 2–14 clones with the draft sequences of the pig genome can be used to estimate the sequencing accuracy of the draft sequence and the frequency of RNA editing, although polymorphisms among the chromosomes of Duroc 2–14 hinder precise discrimination of such errors and edited bases. We roughly estimated such base changes by aligning cDNA sequences derived from Duroc 2–14 clones with the genome sequences. The best-aligned region of each cDNA sequence was extracted, and the adenosine (A) to guanosine (G) base changes, which reflect the most representative type of RNA editing, A to inosine (I), were counted. To simplify the estimation, we investigated only A-to-G changes flanked by 5-base matches on both sides. Among the cDNAs of Duroc 2–14 clone pigs, 124 carried only A-to-G base changes (142 changes in total). In contrast, 91 cDNAs carried only G-to-A base changes (97 changes in total). Therefore, we estimated that about one-fourth of the inconsistency between G and A in the cDNAs and the draft sequence, respectively, is caused by RNA editing in the pig. Alignment of the 3′-UTR sequences of the cDNAs of Duroc 2–14 clone pigs showed differences of less than 0.4% from the draft genome sequence (data not shown). The differences thus detected included polymorphisms between different chromosomes and bases subjected to RNA editing. Furthermore, more than half (1883) of the 3166 aligned 3′-UTRs matched the draft genome sequence completely. We therefore estimated that the actual error rate in the draft sequences of the pig genome was much less than 0.4%; the draft sequence was thus reliable.
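The counting rule described above (a base change flanked by 5-base matches on both sides) can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes a gap-free aligned segment supplied as two equal-length strings, treats the genome base as the reference, and uses a hypothetical function name and toy sequences.

```python
def count_flanked_changes(genome: str, cdna: str,
                          ref: str = "A", alt: str = "G",
                          flank: int = 5) -> int:
    """Count positions where the genome has `ref` and the cDNA has `alt`,
    with `flank` identical bases immediately on each side.
    Assumes both strings cover the same gap-free aligned segment."""
    if len(genome) != len(cdna):
        raise ValueError("aligned sequences must have equal length")
    genome, cdna = genome.upper(), cdna.upper()
    count = 0
    for i in range(flank, len(genome) - flank):
        if genome[i] != ref or cdna[i] != alt:
            continue
        left_ok = genome[i - flank:i] == cdna[i - flank:i]
        right_ok = genome[i + 1:i + 1 + flank] == cdna[i + 1:i + 1 + flank]
        if left_ok and right_ok:
            count += 1
    return count


# Toy example (invented sequences): one A-to-G change at position 5
genome_seg = "CCGTTACGGATCCA"
cdna_seg = "CCGTTGCGGATCCA"
a_to_g = count_flanked_changes(genome_seg, cdna_seg)
g_to_a = count_flanked_changes(genome_seg, cdna_seg, ref="G", alt="A")
print(a_to_g, g_to_a)
```

Counting the reverse direction (G-to-A) with the same rule gives a background against which the A-to-G count can be compared, which is how the 142 and 97 totals in the text are contrasted.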
Coverage of coding sequences by cDNA clones on the pig genome
We expected that the collection of pig cDNA clones that we sequenced would include sequences covering the entire CDSs of pig genes. To estimate the numbers of cDNA clones covering entire CDSs, we investigated the coverage of those protein sequences of humans, mice, cattle, and dogs that showed the greatest similarity to the amino acid sequences deduced from the cDNAs. We also examined the distribution of the cDNA clones considered to cover entire CDSs on the pig chromosomes in the draft genome sequence (Sscrofa10.2). Among the cDNA clones sequenced completely, 14,616 were estimated to contain entire CDSs in their inserts. We estimated that these clones corresponded to 6466 different loci on the pig chromosomes (Table5).
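The Methods define a cDNA as containing a full-length CDS when its alignment to the most similar reference protein is trimmed by fewer than 10 amino acids at both ends. A minimal sketch of that test is given below; it assumes 1-based, inclusive alignment coordinates on the protein (as reported in typical BLAST output), and the function and parameter names are illustrative.

```python
def covers_full_cds(subject_start: int, subject_end: int,
                    protein_length: int, max_trim: int = 10) -> bool:
    """Return True if the aligned part of the reference protein leaves
    fewer than `max_trim` residues uncovered at each end.
    Coordinates are assumed 1-based and inclusive."""
    n_missing = subject_start - 1              # residues missing at the N-terminus
    c_missing = protein_length - subject_end   # residues missing at the C-terminus
    return n_missing < max_trim and c_missing < max_trim


print(covers_full_cds(1, 350, 350))    # entire protein aligned
print(covers_full_cds(8, 345, 350))    # 7 and 5 residues trimmed
print(covers_full_cds(25, 350, 350))   # 24 residues missing at the N-terminus
```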
Usefulness of the pig cDNA collection in genome annotation and other applications
The cDNA clones sequenced here were derived from libraries by methods that preferentially cloned intact RNA transcripts. About three-fourths of the EST assemblies showing considerable similarity to known genes carried the beginning of CDSs, and we estimated that about half of the cDNA clones that were completely sequenced contained entire CDSs. An outline of the pig genome sequence is currently available, and use of the sequences of these expressed genes should help in precisely identifying the locations of genes on the genome and in determining the exon–intron structures of the genes. Along with the progress made in draft sequencing of the pig genome, automated annotation of the pig genome sequence has been conducted by the pipelines in Pre-Ensembl/Ensembl and publicized through the Pre-Ensembl/Ensembl database[35, 36]. In the automated pipelines, about 30,000 pig cDNA clones were utilized; most of these were derived from our pig cDNA sequencing project. Our data resources on pig-expressed genes have greatly contributed to prediction of the structures of genes on the draft sequence of the pig genome. In addition, the use of pig cDNA sequences that have been completely sequenced accelerates the process of manual refinement of automated genome annotation. Until now, many projects for full-length cDNA sequencing have been conducted in parallel with genome sequencing in eukaryotic species, and the results of these studies have contributed to our knowledge of gene locations and structures in target species such as humans and mice[9, 38]. In fact, the pig cDNA sequences presented here have contributed greatly to the process of annotation of immune-related genes in the draft sequence of the pig genome by the Immune Response Annotation Group. We expect that additional efforts to annotate other groups of pig genes will be accelerated by the use of our pig cDNA sequences.
One of the characteristics of the ESTs and cDNA sequences presented here is that the majority of the sequences were derived from intact mRNA with transcription start sites. This has great merit for exploring promoter sequences on the genome sequence. The consensus sequences bound by transcription factors in the promoter sequences are generally well conserved among species; however, there are many variations in the binding-site sequences of transcription factors, and precise determination of the genomic region of the promoter sequence of each gene is essential for clarifying the efficiency of transcription in cells in response to stimuli. Extraction of the upstream regions of the EST assemblies and cDNA sequences presented here, combined with direct evidence from, for example, ChIP-Seq studies, which will be accelerated by using the cDNA sequences for transcription factors in pigs, will enable the construction of a valuable database for understanding transcriptional regulation of pig genes. Notably, we were able to completely sequence 1340 pig cDNA clones associated with “nucleic acid binding transcription factor activity,” as classified according to Gene Ontology (GO:0001071) (Figure 3D).
Another advantage of this collection of cDNA sequences is its usefulness for investigating alternative splicing events in pig genes. We mapped 29,430 cDNA clones to 13,894 different loci on the pig genome—that is, on average more than two different cDNA clones were derived from a single locus. Future studies should include a detailed exploration of splicing variants by using the cDNA sequences we have sequenced, together with the pig gene sequences presented by other groups.
The cDNA sequences and the ESTs themselves will also be useful in other studies, such as in gene expression analysis and in detecting polymorphisms in pig genes. A number of polymorphisms have been reported in mRNA sequences (particularly in CDSs), and it should be emphasized that there are many polymorphisms in CDSs that affect the functions of the molecules encoded by the genes that carry the polymorphisms. Our explorations of polymorphisms using the cDNA sequences and ESTs presented here have been useful in characterizing the genetic features of pig breeds and populations[18, 42]. We have also investigated polymorphisms in genes encoding pattern-recognition receptors[43–46] and have demonstrated that some of the polymorphisms observed so far in commercial pig and wild boar populations truly affect the ligand-recognition ability of the molecules encoded by the genes[47–49]. In addition, many ongoing studies are revealing the potential associations of gene polymorphisms with economically important traits in pigs. The use of pig gene sequences, including those presented here, will help greatly in promoting the exploration of polymorphisms that may be candidates for markers for selecting or breeding pigs with distinguished traits. Furthermore, the gene sequences can be used directly to design probes for microarrays. We have developed oligomer microarrays by using sequences derived mainly from the ESTs and cDNA sequences presented here, and we have successfully elucidated the characteristics of changes in gene expression in pig subcutaneous preadipocytes. Designing microarray probes with full-length cDNA sequences has benefits in terms of reliability, because the probes are highly specific to the target genes and there is clear evidence of correspondence between the probes and fully annotated genes. 
Full-length cDNA sequences will even be valuable in transcriptome analysis with NGS, which will become the mainstream method of expression analysis; these sequences will be useful in determining which short NGS reads belong to gene sequences that truly exert functions in organisms.
Here, we demonstrated our attempts to collect pig-expressed genes by EST analysis and sequencing of entire cDNA clones using full-length-enriched cDNA libraries. We have so far accumulated 330,707 ESTs and 31,079 cDNA sequences. The ESTs and cDNA clones thus sequenced were respectively mapped to 40,666 and 13,894 different loci on the latest pig genome sequence Sscrofa10.2; they corresponded to more than 15,000 and 12,000 different genes of other species, respectively. The cDNA resource presented here is valuable for annotation of the draft sequence of the pig genome and for exploring promoter sequences on the genome. It will also be valuable for molecular biology–based analyses in pigs, for example for analyses of protein production in vitro.
Construction of cDNA libraries
Tissues for construction of the cDNA libraries were prepared from 10 crossbred [(Landrace × Large White) × Duroc] pigs, which are representative of those used for the Japanese pork market, and a Meishan animal, a breed representative of those used in China. The pigs were housed at the National Institute of Livestock and Grassland Science (Tsukuba, Ibaraki, Japan)[18, 19]. We also used tissues from two Landrace and one Berkshire breed pig and one NIBS miniature pig. In addition, we used a pig cloned from an animal of the Duroc breed (2–14) that was subjected to genome sequencing by the SGSC[5–7].
Using the collected tissues and cell populations, cDNA libraries were constructed by one of the following three methods. About two-thirds of the libraries (23) were constructed by using the oligo-capping method. Fifteen of the oligo-capped cDNA libraries were constructed by using Gateway-compatible pCMVFL3 vector (Invitrogen, Carlsbad, CA, USA), whereas the vector for the rest was pME18SFL3 (Toyobo, Osaka, Japan). Five cDNA libraries were constructed by using another method for constructing full-length cDNA libraries, namely the vector-capping method. In total, 28 libraries were generated by methods using the 5′ cap structure, which is characteristic of intact mRNA. To compile the remaining four libraries, because only small amounts of RNA could be prepared from the tissues or cell populations, we used the SMART method and pDNR-LIB vector (Clontech, Palo Alto, CA, USA); this method selectively clones cDNAs that are synthesized as far as the 5′-end of the mRNA molecule. All of the cDNAs were cloned into the vector unidirectionally. The library construction methods used for each tissue or cell population are shown in Table 1.
EST analysis and clustering/assembling
The cDNA libraries thus constructed were subjected to EST analysis by single-pass sequencing from the 5′-ends of the respective clones. The EST reads obtained underwent basecalling using Phred; the vector sequences were screened by using the crossmatch program in the Phrap package[54, 55]. Repetitive sequences and low-complexity regions (e.g., poly(A) tracts) in the chromatograms thus generated were screened by using RepeatMasker with RepBase and in-house-generated Perl scripts. Clustering and assembly of sequences were performed with the TGICL package with CAP3. Chromatograms that were not included in the contigs or that did not have regions containing more than 100 bases with Phred quality values ≥10 were discarded.
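The read-quality criterion above can be illustrated with a short filter. The sketch assumes that a "region" means a contiguous run of bases and that per-base Phred values are already available as a list of integers; both points are interpretations, and the function names are hypothetical rather than taken from the in-house scripts.

```python
def longest_run_at_or_above(quals, min_q=10):
    """Length of the longest contiguous run of quality values >= min_q."""
    best = run = 0
    for q in quals:
        run = run + 1 if q >= min_q else 0
        best = max(best, run)
    return best


def passes_quality_filter(quals, min_q=10, min_len=100):
    """Keep a read only if it has a run of MORE than `min_len` bases
    with Phred quality >= `min_q` (assumed reading of the criterion)."""
    return longest_run_at_or_above(quals, min_q) > min_len


good_read = [20] * 150
broken_read = [20] * 60 + [5] + [20] * 60   # no run longer than 60
print(passes_quality_filter(good_read), passes_quality_filter(broken_read))
```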
Sequencing of entire inserts of cDNA clones
cDNA clones located at the most 5′ position in contigs generated by the assembling were selected for sequencing of the inserts. We also chose clones in singlets that did not join the contigs, provided that the clones corresponded to human genes with no counterparts among the clones selected from the contigs.
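The selection of the most 5′ clone per contig can be sketched as below. The data layout is an assumption made for illustration: each contig is given as a list of (clone ID, start offset) pairs in the transcript orientation, so that the smallest offset marks the most upstream read. Clones already in the sequencing pipeline are skipped, which mirrors the incremental selection described in the Results.

```python
def select_candidates(contigs, already_selected=frozenset()):
    """For each contig, pick the clone whose 5'-end read starts furthest
    upstream (smallest offset), unless it was selected previously.
    `contigs` maps contig ID -> list of (clone_id, start_offset)."""
    picks = {}
    for contig_id, members in contigs.items():
        clone_id, _ = min(members, key=lambda m: m[1])
        if clone_id not in already_selected:
            picks[contig_id] = clone_id
    return picks


# Hypothetical assembly output
contigs = {
    "contig1": [("cloneX", 40), ("cloneY", 0), ("cloneZ", 12)],
    "contig2": [("cloneP", 5)],
}
print(select_candidates(contigs))
print(select_candidates(contigs, already_selected={"cloneP"}))
```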
First, we sequenced the selected clones from the 5′-end with a primer annealing to the vector sequence to confirm that the correct clones had been selected. We also sequenced from the 3′-end with a primer annealing to the vector sequence and with a T25V primer. The chromatograms that were obtained from the EST analysis and generated by sequencing from the 5′- and 3′-ends were subjected to basecalling with Phred and assembly by using Phrap. The contigs thus generated were screened with in-house Perl scripts to check for low-quality regions (Phred quality values ≤ 25), and they were manually inspected for sequencing errors by using the Consed program. Regions with low-quality or ambiguous bases were re-sequenced with primers designed by using the Consed program with the “autofinish” option (primer walking [PW] method). The procedure of inspection and sequencing of the remaining low-quality regions was performed twice or until no low-quality or ambiguous bases were observed. In addition to the PW method, we adopted another approach based on transposon shotgun sequencing (TPS), particularly for cDNA clones that could be difficult to sequence with the PW method[30–32, 61]. A large majority of the cDNA clones constructed by using the Gateway-compatible cloning vector (pCMVFL3; Toyobo, Osaka, Japan) were sequenced by using a combination of TPS and insert transfer with Gateway technology to reduce the number of shotgun clones with transposons in the vector sequence. TPS was conducted with pooled DNA from two to 12 cDNA clones, as described previously[30, 31].
The EST assemblies and cDNA clones were used in similarity analyses after interspersed repetitive sequences and low-complexity regions (such as polynucleotides and microsatellites) had been masked with the RepeatMasker program. EST assemblies and cDNA clones were mapped on the pig genome sequence by using a BLAST similarity search against the latest pig genome sequence, Sscrofa10.2. The best alignment of the respective query sequences, with a BLAST similarity score above 100 and identity above 90%, was anchored on the genome sequence. Overlapping alignments of cDNA sequences on the pig genome in opposite directions were regarded as different loci. The region of the locus was extended from the anchored alignment to both ends by using other alignments of the same query sequence meeting the following criteria: (1) the direction of the alignment was identical to that of the anchored alignment; and (2) the distance of the alignment from the region of the locus (a gap that represents a possible intron) was less than 1 Mb. The loci were regarded as identical if the locus regions after extension overlapped and were mapped in the same direction. Correspondence of the pig cDNA sequences to genes was investigated by BLAST similarity search using the mRNA or protein sequences in the NCBI RefSeq databases of humans, mice, cattle, dogs, and pigs. The similarity was considered as positive if the BLAST score was more than 50. The presence of a full-length CDS in each pig cDNA sequence was estimated by BLAST similarity analysis, using those protein sequences showing the highest similarity in the NCBI RefSeq database. Two sequences were aligned without any filtering and masking. If the cDNA sequence was aligned with the specified protein sequence trimmed at both ends by fewer than 10 amino acids, then we considered that the cDNA contained a full-length CDS. EST assemblies and cDNA clones were classified according to the Gene Ontology terms by using the ontology file.
Classification according to Gene Ontology was conducted by using the similarity of cDNA clones to the mRNA sequences of human genes in the NCBI RefSeq and the correspondence between genes and Gene Ontology terms provided in NCBI Gene.
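The locus-definition rules described in this section (anchor the best alignment, extend with same-direction alignments of the same query within 1 Mb, then merge overlapping same-direction regions) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: alignments are assumed to be pre-parsed into simple records, extension is restricted to the anchor's chromosome or scaffold, the score and identity thresholds are applied only to the anchor, and all names are hypothetical.

```python
from dataclasses import dataclass

MAX_GAP = 1_000_000  # maximum distance (possible intron) allowed for extension


@dataclass(frozen=True)
class Hit:
    query: str
    chrom: str
    start: int
    end: int
    strand: str
    score: float
    identity: float  # percent


def locus_for_query(hits):
    """Anchor the best hit (score > 100, identity > 90%) and extend the
    region with same-strand hits of the same query lying within MAX_GAP."""
    good = [h for h in hits if h.score > 100 and h.identity > 90.0]
    if not good:
        return None
    anchor = max(good, key=lambda h: h.score)
    start, end = anchor.start, anchor.end
    rest = [h for h in hits if h is not anchor
            and h.chrom == anchor.chrom and h.strand == anchor.strand]
    changed = True
    while changed:  # repeat until no further hit can be attached
        changed = False
        for h in list(rest):
            gap = max(h.start - end, start - h.end, 0)
            if gap < MAX_GAP:
                start, end = min(start, h.start), max(end, h.end)
                rest.remove(h)
                changed = True
    return (anchor.chrom, start, end, anchor.strand)


def merge_loci(regions):
    """Treat overlapping regions on the same chromosome and strand as one locus."""
    merged = []
    for chrom, start, end, strand in sorted(regions, key=lambda r: (r[0], r[3], r[1])):
        if (merged and merged[-1][0] == chrom and merged[-1][3] == strand
                and start <= merged[-1][2]):
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end), strand)
        else:
            merged.append((chrom, start, end, strand))
    return merged


# Invented alignments for two clones
hits_a = [
    Hit("cloneA", "chr1", 1000, 1500, "+", 400, 99.0),
    Hit("cloneA", "chr1", 5000, 5300, "+", 250, 98.5),
    Hit("cloneA", "chr1", 3_000_000, 3_000_200, "+", 120, 95.0),  # too far away
    Hit("cloneA", "chr1", 2000, 2200, "-", 110, 96.0),            # opposite strand
]
hits_b = [Hit("cloneB", "chr1", 5200, 6000, "+", 300, 97.0)]
regions = [locus_for_query(hits_a), locus_for_query(hits_b)]
print(merge_loci(regions))
```

Keeping the strand in the merge key reflects the rule that overlapping alignments in opposite directions are counted as different loci.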
We thank Maiko Tanaka-Matsuda, Toshie Iioka, Takako Suzuki, and Ikuyo Nakamori (JATAFF) for their technical assistance. This work was supported by the Integrated Research Project for Insect and Animal Using Genome Technology from the Ministry of Agriculture, Forestry and Fisheries, Japan, and Grants-in-Aid from the Japan Racing Association.
- Dekkers JCM, Mathur PK, Knol EF: Genetic improvement of the pig. The Genetics of the Pig. Edited by: Rothschild MF, Ruvinsky A. 2011, CAB International, Wallingford, 390-425. 2View ArticleGoogle Scholar
- Vodicka P, Smetana K, Dvorankova B, Emerick T, Xu YZ, Ourednik J, Ourednik V, Motlik J: The miniature pig as an animal model in biomedical research. Ann N Y Acad Sci. 2005, 1049: 161-171. 10.1196/annals.1334.015.View ArticlePubMedGoogle Scholar
- Kuzmuk KN, Schook LB: Pigs as a model for biomedilcal sciences. The Genetics of the Pig. Edited by: Rothschild MF, Ruvinsky A. 2011, CAB International, Wallingford, 426-444. 2View ArticleGoogle Scholar
- Lunney JK: Advances in swine biomedical model genomics. Int J Biol Sci. 2007, 3 (3): 179-184.PubMed CentralView ArticlePubMedGoogle Scholar
- Archibald AL, Bolund L, Churcher C, Fredholm M, Groenen MA, Harlizius B, Lee KT, Milan D, Rogers J, Rothschild MF, et al: Pig genome sequence–analysis and publication strategy. BMC Genomics. 2010, 11: 438-10.1186/1471-2164-11-438.PubMed CentralView ArticlePubMedGoogle Scholar
- International Swine Genome Sequencing Consortium: Draft sequence of the pig genome. in preparation
- Schook LB, Beever JE, Rogers J, Humphray S, Archibald A, Chardon P, Milan D, Rohrer G, Eversole K: Swine Genome Sequencing Consortium (SGSC): a strategic roadmap for sequencing the pig genome. Comp Func Genomics. 2005, 6: 251-255. 10.1002/cfg.479.View ArticleGoogle Scholar
- Groenen MAM, Schook LB, Archibald AL: Pig genomics. The Genetics of the Pig. Edited by: Rothschild MF, Ruvinsky A. 2011, CAB International, Wallingford, 179-199. 2View ArticleGoogle Scholar
- Ota T, Suzuki Y, Nishikawa T, Otsuki T, Sugiyama T, Irie R, Wakamatsu A, Hayashi K, Sato H, Nagai K, et al: Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet. 2004, 36 (1): 40-45. 10.1038/ng1285.View ArticlePubMedGoogle Scholar
- Immune Response Annotation Group: Structural and functional annotation of the porcine immunome. in preparation
- Loveland JE, Gilbert JG, Griffiths E, Harrow JL: Community gene annotation in practice. Database: the journal of biological databases and curation 2012. 2012, bas009-Google Scholar
- Searle SM, Gilbert J, Iyer V, Clamp M: The otter annotation system. Genome Res. 2004, 14 (5): 963-970. 10.1101/gr.1864804.PubMed CentralView ArticlePubMedGoogle Scholar
- Fahrenkrug SC, Smith TP, Freking BA, Cho J, White J, Vallet J, Wise T, Rohrer G, Pertea G, Sultana R, et al: Porcine gene discovery by normalized cDNA-library sequencing and EST cluster assembly. Mamm Genome. 2002, 13 (8): 475-478. 10.1007/s00335-001-2072-4.View ArticlePubMedGoogle Scholar
- Gorodkin J, Cirera S, Hedegaard J, Gilchrist MJ, Panitz F, Jorgensen C, Scheibye-Knudsen K, Arvin T, Lumholdt S, Sawera M, et al: Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags. Genome Biol. 2007, 8 (4): R45-10.1186/gb-2007-8-4-r45.PubMed CentralView ArticlePubMedGoogle Scholar
- Isom SC, Spollen WG, Blake SM, Bauer BK, Springer GK, Prather RS: Transcriptional profiling of day 12 porcine embryonic disc and trophectoderm samples using ultra-deep sequencing technologies. Mol Reprod Dev. 2010, 77 (9): 812-819. 10.1002/mrd.21226.
- Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang H, Landherr L, Tomsho LP, Hu Y, Carlson JE, et al: Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics. 2009, 10: 347. 10.1186/1471-2164-10-347.
- Kim TH, Kim NS, Lim D, Lee KT, Oh JH, Park HS, Jang GW, Kim HY, Jeon M, Choi BH, et al: Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue. BMC Genomics. 2006, 7: 36. 10.1186/1471-2164-7-36.
- Uenishi H, Eguchi T, Suzuki K, Sawazaki T, Toki D, Shinkai H, Okumura N, Hamasima N, Awata T: PEDE (Pig EST Data Explorer): construction of a database for ESTs derived from porcine full-length cDNA libraries. Nucleic Acids Res. 2004, 32 (Database issue): D484-D488. 10.1093/nar/gkh037.
- Uenishi H, Eguchi-Ogawa T, Shinkai H, Okumura N, Suzuki K, Toki D, Hamasima N, Awata T: PEDE (Pig EST Data Explorer) has been expanded into Pig Expression Data Explorer, including 10 147 porcine full-length cDNA sequences. Nucleic Acids Res. 2007, 35 (Database issue): D650-D653. 10.1093/nar/gkl954.
- Kato S, Oshikawa M, Ohtoko K: Full-length transcriptome analysis using a bias-free cDNA library prepared with the vector-capping method. Methods Mol Biol. 2011, 729: 53-70. 10.1007/978-1-61779-065-2_4.
- Suzuki Y, Yoshitomo-Nakagawa K, Maruyama K, Suyama A, Sugano S: Construction and characterization of a full length-enriched and a 5'-end-enriched cDNA library. Gene. 1997, 200 (1–2): 149-156.
- Zhu YY, Machleder EM, Chenchik A, Li R, Siebert PD: Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. BioTechniques. 2001, 30 (4): 892-897.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2011, 39 (Database issue): D38-D51. 10.1093/nar/gkq1172.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Okayama H, Berg P: High-efficiency cloning of full-length cDNA. Mol Cell Biol. 1982, 2 (2): 161-170.
- Pig (Sus scrofa) - Pre-ensembl. http://pre.ensembl.org/Sus_scrofa/
- Craig JM, Bickmore WA: Chromosome bands–flavours to savour. BioEssays. 1993, 15 (5): 349-354. 10.1002/bies.950150510.
- Wiemann S, Weil B, Wellenreuther R, Gassenhuber J, Glassl S, Ansorge W, Bocher M, Blocker H, Bauersachs S, Blum H, et al: Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs. Genome Res. 2001, 11 (3): 422-435. 10.1101/gr.GR1547R.
- Butterfield YS, Marra MA, Asano JK, Chan SY, Guin R, Krzywinski MI, Lee SS, MacDonald KW, Mathewson CA, Olson TE, et al: An efficient strategy for large-scale high-throughput transposon-mediated sequencing of cDNA clones. Nucleic Acids Res. 2002, 30 (11): 2460-2468. 10.1093/nar/30.11.2460.
- Morozumi T, Toki D, Eguchi-Ogawa T, Uenishi H: A rapid and cost-effective method for sequencing pooled cDNA clones by using a combination of transposon insertion and Gateway technology. BioTechniques. 2011, 51 (3): 195-197.
- Shevchenko Y, Bouffard GG, Butterfield YS, Blakesley RW, Hartley JL, Young AC, Marra MA, Jones SJ, Touchman JW, Green ED: Systematic sequencing of cDNA clones using the transposon Tn5. Nucleic Acids Res. 2002, 30 (11): 2469-2477. 10.1093/nar/30.11.2469.
- Farajollahi S, Maas S: Molecular diversity through RNA editing: a balancing act. Trends Genet. 2010, 26 (5): 221-230. 10.1016/j.tig.2010.02.001.
- Nishikura K: Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem. 2010, 79: 321-349. 10.1146/annurev-biochem-060208-105251.
- Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SM, Clamp M: The Ensembl automatic gene annotation system. Genome Res. 2004, 14 (5): 942-950. 10.1101/gr.1858004.
- Potter SC, Clarke L, Curwen V, Keenan S, Mongin E, Searle SM, Stabenau A, Storey R, Clamp M: The Ensembl analysis pipeline. Genome Res. 2004, 14 (5): 934-941. 10.1101/gr.1859804.
- Ensembl gene annotation project - Sus scrofa (Pig). http://www.ensembl.org/info/docs/genebuild/2012_04_sus_scrofa_genebuild.pdf
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309 (5740): 1559-1563.
- Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius R, Watahiki A, Nakamura M, Arakawa T, et al: Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A. 2003, 100 (26): 15776-15781. 10.1073/pnas.2136655100.
- Dermitzakis ET, Clark AG: Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002, 19 (7): 1114-1121. 10.1093/oxfordjournals.molbev.a004169.
- Ma W, Wong WH: The analysis of ChIP-Seq data. Methods Enzymol. 2011, 497: 51-73.
- Matsumoto T, Okumura N, Uenishi H, Hayashi T, Hamasima N, Awata T: Population structure of pigs determined by single nucleotide polymorphisms observed in assembled expressed sequence tags. Anim Sci J. 2012, 83 (1): 14-22. 10.1111/j.1740-0929.2011.00920.x.
- Kojima-Shibata C, Shinkai H, Morozumi T, Jozaki K, Toki D, Matsumoto T, Kadowaki H, Suzuki E, Uenishi H: Differences in distribution of single nucleotide polymorphisms among intracellular pattern recognition receptors in pigs. Immunogenetics. 2009, 61 (2): 153-160. 10.1007/s00251-008-0350-y.
- Morozumi T, Uenishi H: Polymorphism distribution and structural conservation in RNA-sensing Toll-like receptors 3, 7, and 8 in pigs. Biochim Biophys Acta. 2009, 1790 (4): 267-274. 10.1016/j.bbagen.2009.01.002.
- Shinkai H, Tanaka M, Morozumi T, Eguchi-Ogawa T, Okumura N, Muneta Y, Awata T, Uenishi H: Biased distribution of single nucleotide polymorphisms (SNPs) in porcine Toll-like receptor 1 (TLR1), TLR2, TLR4, TLR5, and TLR6 genes. Immunogenetics. 2006, 58 (4): 324-330. 10.1007/s00251-005-0068-z.
- Uenishi H, Shinkai H: Porcine Toll-like receptors: the front line of pathogen monitoring and possible implications for disease resistance. Dev Comp Immunol. 2009, 33 (3): 353-361. 10.1016/j.dci.2008.06.001.
- Jozaki K, Shinkai H, Tanaka-Matsuda M, Morozumi T, Matsumoto T, Toki D, Okumura N, Eguchi-Ogawa T, Kojima-Shibata C, Kadowaki H, et al: Influence of polymorphisms in porcine NOD2 on ligand recognition. Mol Immunol. 2009, 47 (2–3): 247-252.
- Shinkai H, Okumura N, Suzuki R, Muneta Y, Uenishi H: Toll-Like receptor 4 polymorphism impairing lipopolysaccharide signaling in Sus scrofa, and its restricted distribution among Japanese wild boar populations. DNA Cell Biol. 2012, 31 (4): 575-581. 10.1089/dna.2011.1319.
- Shinkai H, Suzuki R, Akiba M, Okumura N, Uenishi H: Porcine Toll-like receptors: recognition of Salmonella enterica serovar Choleraesuis and influence of polymorphisms. Mol Immunol. 2011, 48 (9–10): 1114-1120.
- Rothschild MF, Hu ZL, Jiang Z: Advances in QTL mapping in pigs. Int J Biol Sci. 2007, 3 (3): 192-197.
- Matsumoto T, Nakajima I, Eguchi-Ogawa T, Nagamura Y, Hamasima N, Uenishi H: Changes in gene expression in a porcine preadipocyte cell line during differentiation. Anim Genet. in press
- Kano M, Toyoshi T, Iwasaki S, Kato M, Shimizu M, Ota T: QT PRODACT: usability of miniature pigs in safety pharmacology studies: assessment for drug-induced QT interval prolongation. J Pharmacol Sci. 2005, 99 (5): 501-511. 10.1254/jphs.QT-C13.
- Hartley JL, Temple GF, Brasch MA: DNA cloning using in vitro site-specific recombination. Genome Res. 2000, 10 (11): 1788-1795. 10.1101/gr.143000.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8 (3): 175-185.
- RepeatMasker Open-3.0. http://www.repeatmasker.org/
- Jurka J: Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000, 16 (9): 418-420. 10.1016/S0168-9525(00)02093-X.
- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, et al: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003, 19 (5): 651-652. 10.1093/bioinformatics/btg034.
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.
- Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing. Genome Res. 1998, 8 (3): 195-202.
- Devine SE, Chissoe SL, Eby Y, Wilson RK, Boeke JD: A transposon-based strategy for sequencing repetitive DNA in eukaryotic genomes. Genome Res. 1997, 7 (5): 551-563.
- Ontology file downloads. http://www.geneontology.org/GO.downloads.ontology.shtml
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.