Pig genome sequence - analysis and publication strategy
© Archibald et al; licensee BioMed Central Ltd. 2010
Received: 15 April 2010
Accepted: 19 July 2010
Published: 19 July 2010
The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing.
Assemblies of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30× genome coverage. The WGS sequence data, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow that was the source of the BAC library from which clones were preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication.
In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for the analysis of the pig genome sequence and for the application and publication of the results.
The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. A Data Release Workshop convened in Toronto in May 2009 by Genome Canada and other funding agencies affirmed and extended the commitments to prepublication release of large data sets in the life sciences, which were originally developed in the context of the Human Genome Project. The Toronto Statement places obligations on the producers of such data sets, including genome sequence data, in respect of prepublication release of the data and confirms the principle that allows the data producers to publish the first global analyses of the data set. The data producers are encouraged to produce a citable statement or "marker paper" in which they describe the data set and their intentions in respect of analysis and publication. In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for the analysis of the pig genome sequence and for the application and publication of the results. These plans were presented to participants in the Pig Genome III conference held at the Wellcome Trust Sanger Institute, 2-4 November 2009.
Pig genome sequence data
The sequence data from which a draft pig genome sequence will be assembled comprise hierarchical shotgun sequence data providing 4-6× genome coverage from BAC clones representing a minimal tile path across the genome, plus > 30× genome coverage in whole genome shotgun sequence (WGS) data generated using Sanger (capillary) and next-generation (Illumina) technologies. The minimal tile path was identified from a high quality physical (BAC contig) map and provides coverage of 98.3% of this physical map. As of 5 July 2010 the total length of the BAC-derived sequence contigs, prior to the removal of sequence redundancy between overlapping BAC clones, was 3.01 Gbp, of which 156.3 Mbp was at finished quality. These sequence data were generated from 16,707 BAC clones, of which 15,895 have been subjected to one round of automated pre-finishing.
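The figures quoted above can be turned into a few summary proportions. The short sketch below is illustrative only: the variable names are our own, and the derived values (fraction at finished quality, fraction of clones pre-finished, mean contig sequence per clone) are simple ratios of the numbers in the text rather than statistics reported by the consortium. Because the 3.01 Gbp total still includes redundancy between overlapping clones, the per-clone mean should be read as a rough indication only.

```python
# Figures quoted in the text (status as of 5 July 2010).
total_bac_contig_bp = 3.01e9   # BAC-derived contig length, redundancy NOT removed
finished_bp = 156.3e6          # portion of that sequence at finished quality
clones_sequenced = 16_707      # BAC clones contributing sequence
clones_prefinished = 15_895    # clones given one round of automated pre-finishing


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Derived ratios (illustrative, not reported in the source).
finished_pct = percent(finished_bp, total_bac_contig_bp)
prefinished_pct = percent(clones_prefinished, clones_sequenced)
mean_bp_per_clone = total_bac_contig_bp / clones_sequenced

print(f"Finished-quality sequence: {finished_pct:.1f}% of BAC-derived contigs")
print(f"Clones pre-finished:       {prefinished_pct:.1f}% of sequenced clones")
print(f"Mean contig sequence:      {mean_bp_per_clone / 1e3:.0f} kb per clone")
```

On these numbers, only about one twentieth of the BAC-derived sequence was at finished quality at that date, while the large majority of clones had been through automated pre-finishing.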
Prepublication data release
In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. Assemblies of the genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. This assembly was constructed entirely from the BAC-derived sequence data.
Swine Genome Sequencing Consortium genome sequence analysis groups
The target for the next assembly is to incorporate all the available sequence data for Duroc 2-14, including BAC clone sequences, WGS Sanger reads and next-generation short sequence reads. Contig and scaffold order and orientation will be tested against other genome maps, in particular the high resolution radiation hybrid maps.
Structural variation, segmental duplication, copy number variation
The reference genome sequence will be analysed for evidence of segmental duplications. Comparative Genomic Hybridisation data and paired-end and mate-pair re-sequence data from other pigs will be used to identify structural and copy number variation.
Repetitive DNA, transposable elements
Retroviruses and related repetitive sequences in Sus scrofa and related species will be characterized.
Speciation, wild and related suids and selection
Sequence and 60 K SNP genotype data from wild boar and related species will be explored to address the origins of domestic pigs. Comparative sequence analyses of domesticated and wild boar genome sequences are expected to reveal signatures of artificial and natural selection.
Natural and artificial selection will have shaped the pig genome sequence. Comparison of the pig genome sequence with the sequences of other mammals is expected to reveal genes that are evolving more rapidly in the pig and artiodactyl lineages.
Genome rearrangements and conserved synteny compared to other suids and other mammals.
RNA-seq data from a range of tissues from Duroc 2-14 or her clones will be analysed to identify genes that show differential allelic expression and potentially imprinted genes.
Re-sequence data and the WGS sequence data from Duroc 2-14 will be examined for putative SNPs and small indels, including those for which Duroc 2-14 is heterozygous.
The genome sequence will be explored for putative ncRNA sequences and microRNA encoding loci.
The Ensembl automated pipeline will be used to establish a Gene Build for the pig genome that will be compared with builds generated by other systems including NCBI.
Development of a proteome will be initiated.
The immune gene analysis group will manually annotate pig genes predicted/known to have roles in the immune system. The repertoire of pig immune genes will be examined for evidence of pig-lineage specific features.
The reproduction gene analysis group will manually annotate pig genes predicted/known to have roles in reproductive functions and seek to identify pig-lineage specific features.
The obesity gene analysis group will manually annotate pig genes predicted/known to have roles in obesity and seek to identify pig-lineage specific features.
Approximately 5% of the genes in the Sscrofa9 Gene Build are predicted to have olfactory functions. These genes will be manually annotated and examined for pig-specific characteristics. In addition, the neuropeptide and prohormone gene families will be annotated.
The pig research community is engaged in efforts to manually annotate genes identified/predicted by the Ensembl analysis pipeline. The otterlace system will be used to enable this community annotation activity.
The use of genomic information to enhance the utilization of the pig in xenotransplantation and as a model for cardiovascular disease, cancer and obesity will be addressed. How genomic information supports the further development of transgenic pigs for creating essential animal models will also be discussed.
The Swine Genome Sequencing and Swine HAPMAP consortia respectively propose to develop two summary papers for publication describing a) the sequencing and analysis of the pig genome and b) genetic variation and haplotype structures across a range of pig breeds and related Sus species. In addition, the consortia propose to develop a series of companion papers describing results from the analysis groups and/or results from other research projects that have been enabled by the publication of a draft sequence of the pig genome. The consortia would be pleased to hear from research groups with plans for manuscripts that could be included within the list of companion papers. Please address correspondence to either Alan Archibald firstname.lastname@example.org or Larry Schook email@example.com.
The value of the pig genome sequence lies not only in shaping the continued use of pigs in agriculture and medical research but also in the realm of evolution and domestication (natural and artificial selection). The pig is an economically important species not only as a major source of meat-based protein but also increasingly as a model for biomedical research. For example, the pig has value as a model of a spectrum of human diseases that may be modelled less well in rodents, including obesity, arthritis and cardiovascular disease.
The domestic pig (Sus scrofa) is a eutherian mammal and a member of the order Cetartiodactyla, a clade distinct from rodents and primates that last shared a common ancestor with humans between 79 and 87 million years ago. The domestic pig belongs to the family Suidae, which consists of multiple species, all found in Asia, Europe and Africa. The availability of this wide variety of pig species, which diverged over a period of around 2 to 15 million years, provides a rich resource to study genomic changes in relation to speciation. A well characterised pig genome sequence forms a template for the study of within and between species genetic variation. Our analysis of the pig genome sequence will be set in the context of parallel research on the genomes of closely related and contemporary suids (e.g. Sus verrucosus, Sus celebensis and Sus barbatus) and on within breed genetic variation using the 60 K pig SNP chip and by re-sequencing.
The pig genome sequencing project has been conducted in an open international collaborative manner in the spirit of the Bermuda and Fort Lauderdale agreements. In accordance with the more recent Toronto Statement the sequence data have been released in advance of publication. In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for the analysis of the pig genome sequence and for the application and publication of the results.
The Swine Genome Sequencing Consortium is grateful to the following for funding support for the pig genome sequencing project: the USDA National Institute of Food and Agriculture, formerly the Cooperative State Research, Education and Extension Service; the Agence Nationale de la Recherche; European Union SABRE; the Institute for Pig Genetics, Netherlands; INRA Genescope, France; Iowa Pork Producers Association; Iowa State University; Korean National Livestock Research Institute; National Institute of Agrobiological Sciences, Japan; National Pork Board, U.S.; North Carolina Pork Council; North Carolina Agricultural Research Service; North Carolina State University; the University of Illinois; the "Pigs and Health" programme of the Danish Advanced Technology Foundation, Denmark; the Wellcome Trust Sanger Institute; The Roslin Institute, University of Edinburgh and Biotechnology and Biological Sciences Research Council, U.K.; the University of Illinois Livestock Genome Sequencing Initiative.
- Schook LB, Beever JE, Rogers J, Humphray S, Archibald A, Chardon P, Milan D, Rohrer G, Eversole K: Swine Genome Sequencing Consortium (SGSC): a strategic roadmap for sequencing the pig genome. Comp Funct Genom. 2005, 6: 251-255. 10.1002/cfg.479.
- Toronto International Data Release Workshop Authors: Prepublication data release. Nature. 2009, 461: 168-170. 10.1038/461168a.
- Humphray SJ, Scott CE, Clark R, Marron B, Plumb R, Bender C, Camm N, Davis J, Jenks A, Noon A, Patel M, Sehra H, Yang F, Rogatcheva MB, Milan D, Chardon P, Rohrer G, Nonneman D, de Jong P, Meyers SN, Archibald A, Beever JE, Schook LB, Rogers J: A high utility integrated map of the pig genome. Genome Biol. 2007, 8 (7): R139. 10.1186/gb-2007-8-7-r139.
- Ramos AM, Crooijmans RPMA, Amaral AJ, Archibald AL, Beever JE, Bendixen C, Dehais P, Affara NA, Hansen MS, Hedegaard J, Hu Z-L, Kerstens HH, Law AS, Megens HJ, Milan D, Nonneman DJ, Rohrer GA, Rothschild MF, Smith TPL, Schnabel RD, Van Tassell CP, Clark R, Churcher C, Taylor JF, Wiedmann RT, Schook LB, Groenen MAM: Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS ONE. 2009, 4 (8): e6524. 10.1371/journal.pone.0006524.
- Rohrer G, Beever JE, Rothschild MF, Schook L, Gibbs R, Weinstock G: Porcine genomic sequencing initiative. (NIH White Paper). 2002, [http://www.animalgenome.org/pigs/community/PigWhitePaper/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.