Large-scale sequencing based on full-length-enriched cDNA libraries in pigs: contribution to annotation of the pig genome draft sequence

Table 2 Correspondence to mammalian genes and estimated efficiencies of cloning of start codons of EST assemblies

Species	Unique Gene ID (without HomoloGene ID)	Unique HomoloGene ID	Assemblies matched to protein sequences			Assemblies estimated to include start codons		
			Total	Contigs	Singlets	Total	Contigs	Singlets
Human	13,691 (754)	12,911	64,011	12,056	51,955	47,229	9,635	37,594
Mouse	12,955 (730)	12,137	63,444	12,028	51,416	45,539	9,588	35,951
Cattle	13,445 (1,935)	11,341	63,718	12,035	51,683	47,118	9,634	37,484
Dog	12,293 (763)	11,410	62,815	11,871	50,944	37,193	8,090	29,103
Pig	14,275		63,169	11,917	51,252	46,063	9,396	36,667

Numbers of genes that had unique NCBI Gene IDs and corresponded to contigs and singlets generated by assembly of expressed sequence tags (ESTs) are indicated. Also shown are the numbers that had unique Gene IDs in the NCBI HomoloGene database (a database of orthologs among species) and corresponded to the contigs and singlets generated. Numbers in parentheses indicate numbers of gene IDs that had no corresponding HomoloGene IDs. HomoloGene IDs in pigs are not indicated, because there is no HomoloGene ID database for pig genes.
An EST assembly was estimated to contain a start codon if the length of assembly sequence upstream of the match (BLAST score >50) was greater than the distance between the start base of the coding sequence and the matched region of the corresponding gene. Numbers of assemblies (contigs and singlets) corresponding to protein sequences in humans, mice, cattle, dogs, and pigs are also shown.
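
The start-codon criterion in the legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Match` record, its field names, and the choice to measure both lengths in nucleotides are hypothetical. The second part computes, for two rows of the table, the fraction of matched assemblies estimated to include start codons, which is one plausible reading of the "estimated efficiency" in the table title.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical summary of one BLAST match (field names are illustrative)."""
    upstream_in_assembly: int  # bases in the EST assembly before the matched region
    cds_start_to_match: int    # bases from the first CDS base to the matched region in the gene
    score: float               # BLAST score of the match


def includes_start_codon(m: Match, min_score: float = 50.0) -> bool:
    """Apply the legend's rule: score > 50 and a strictly longer upstream region."""
    if m.score <= min_score:
        return False
    # The legend says "greater than", so equality does not count here.
    return m.upstream_in_assembly > m.cds_start_to_match


def start_codon_fraction(with_start: int, matched: int) -> float:
    """Fraction of matched assemblies estimated to include a start codon."""
    return with_start / matched


# Totals copied from the Human and Dog rows of Table 2:
# (assemblies matched to protein sequences, assemblies estimated to include start codons)
table_totals = {
    "Human": (64011, 47229),
    "Dog": (62815, 37193),
}

fractions = {
    species: start_codon_fraction(with_start, matched)
    for species, (matched, with_start) in table_totals.items()
}

if __name__ == "__main__":
    for species, frac in fractions.items():
        print(f"{species}: {frac:.3f}")
```

Comparing the fraction across reference species shows how strongly the estimate depends on the protein set used for matching, for example the lower value obtained against dog sequences than against human sequences.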

ISSN: 1471-2164