Table 7 Estimating annotation quality using protein features and sequence data

From: Comparison of RefSeq protein-coding regions in human and vertebrate genomes

| Organism | Domain score outlier | +,-,truncated domain | Length outlier | Conserved downstream Methionine | No protein in main cluster | Negative sum-of-logs score | Core proteins |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *human | 0.000 | 0.021 | 0.017 | 0.028 | 0.012 | 10.256 | 7419 |
| *housemouse | 0.002 | 0.033 | 0.036 | 0.051 | 0.049 | 8.135 | 6811 |
| *dog | 0.005 | 0.068 | 0.115 | 0.051 | 0.030 | 7.268 | 6495 |
| *African savanna elephant | 0.006 | 0.096 | 0.163 | 0.038 | 0.042 | 6.816 | 5685 |
| *domestic guinea pig | 0.005 | 0.093 | 0.158 | 0.047 | 0.045 | 6.812 | 5699 |
| *cow | 0.008 | 0.082 | 0.101 | 0.041 | 0.060 | 6.795 | 6394 |
| pygmy chimpanzee | 0.003 | 0.100 | 0.146 | 0.081 | 0.044 | 6.764 | 6584 |
| *horse | 0.006 | 0.105 | 0.149 | 0.054 | 0.042 | 6.699 | 5838 |
| rabbit | 0.004 | 0.122 | 0.189 | 0.050 | 0.052 | 6.646 | 4159 |
| chimpanzee | 0.005 | 0.108 | 0.138 | 0.068 | 0.048 | 6.600 | 6455 |
| *norway rat | 0.008 | 0.077 | 0.084 | 0.061 | 0.080 | 6.600 | 5851 |
| giant panda | 0.005 | 0.139 | 0.224 | 0.063 | 0.035 | 6.501 | 5765 |
| olive baboon | 0.006 | 0.115 | 0.156 | 0.077 | 0.048 | 6.416 | 5872 |
| *bolivian squirrel monkey | 0.007 | 0.108 | 0.152 | 0.086 | 0.053 | 6.284 | 6035 |
| white-tufted-ear marmoset | 0.005 | 0.126 | 0.207 | 0.083 | 0.055 | 6.197 | 5777 |
| domestic cat | 0.005 | 0.193 | 0.254 | 0.070 | 0.041 | 6.178 | 6007 |
| northern white-cheeked gibbon | 0.006 | 0.148 | 0.195 | 0.077 | 0.056 | 6.151 | 5730 |
| western gorilla | 0.007 | 0.167 | 0.197 | 0.078 | 0.057 | 5.958 | 5722 |
| small-eared galago | 0.012 | 0.138 | 0.197 | 0.076 | 0.046 | 5.955 | 6128 |
| gray short-tailed opossum | 0.006 | 0.214 | 0.370 | 0.058 | 0.041 | 5.924 | 4500 |
| chinese hamster | 0.006 | 0.190 | 0.265 | 0.066 | 0.061 | 5.892 | 4901 |
| sheep | 0.007 | 0.232 | 0.292 | 0.070 | 0.041 | 5.871 | 5339 |
| *chicken | 0.013 | 0.175 | 0.228 | 0.072 | 0.051 | 5.729 | 3354 |
| rhesus macaque | 0.011 | 0.169 | 0.216 | 0.083 | 0.072 | 5.639 | 5466 |
| pig | 0.025 | 0.192 | 0.230 | 0.061 | 0.042 | 5.551 | 4616 |
| sumatran orangutan | 0.014 | 0.169 | 0.222 | 0.081 | 0.071 | 5.509 | 5050 |
| *green anole | 0.018 | 0.215 | 0.296 | 0.068 | 0.048 | 5.431 | 2797 |
| nile tilapia | 0.011 | 0.392 | 0.583 | 0.071 | 0.061 | 4.961 | 1296 |
| tasmanian devil | 0.016 | 0.337 | 0.478 | 0.082 | 0.054 | 4.933 | 3457 |
| *zebrafish | 0.025 | 0.344 | 0.464 | 0.062 | 0.068 | 4.773 | 1210 |
| torafugu | 0.017 | 0.448 | 0.601 | 0.060 | 0.063 | 4.765 | 1130 |
| turkey | 0.023 | 0.425 | 0.520 | 0.077 | 0.055 | 4.659 | 1821 |
| western clawed frog | 0.023 | 0.394 | 0.484 | 0.073 | 0.070 | 4.644 | 2268 |
| platypus | 0.031 | 0.435 | 0.586 | 0.068 | 0.067 | 4.451 | 1439 |
| Average | 0.010 | 0.187 | 0.256 | 0.066 | 0.052 | 6.111 | 4796 |
| Standard deviation | 0.008 | 0.120 | 0.160 | 0.014 | 0.014 | 1.114 | 1826 |
| Correlation Contig N50 | −0.290 | −0.337 | −0.362 | −0.489 | −0.434 | 0.719 | 0.319 |
| Correlation EST count | −0.164 | −0.299 | −0.342 | −0.555 | −0.395 | 0.684 | 0.263 |

  1. Columns 2–7 provide the fraction of genes from each species that are in the orthology dataset and have: no conserved protein; an outlier domain score; a missing, extra, or truncated domain; an outlier length; a conserved downstream or upstream Met; or proteins not found in the largest cluster. Column 8 provides the negative sum-of-logs score. The last column provides the number of genes in each species that are members of a “core” set, defined as ortholog sets with members from at least 17 species that each have conserved splicing to at least 3 proteins from reference species (reference species are marked with *).
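
The summary rows of the table can be checked from the per-species values. The following is a minimal sketch in Python that recomputes the average and standard deviation of the "Core proteins" column. The variable names are illustrative, and the use of the sample (n − 1) standard deviation is an assumption, since the table does not state which estimator was used.

```python
from statistics import mean, stdev

# "Core proteins" column of Table 7, in row order (human ... platypus).
core_proteins = [
    7419, 6811, 6495, 5685, 5699, 6394, 6584, 5838, 4159, 6455,
    5851, 5765, 5872, 6035, 5777, 6007, 5730, 5722, 6128, 4500,
    4901, 5339, 3354, 5466, 4616, 5050, 2797, 1296, 3457, 1210,
    1130, 1821, 2268, 1439,
]

avg = mean(core_proteins)

# statistics.stdev uses the n - 1 (sample) denominator; that the table
# was computed the same way is an assumption, not stated in the source.
sd = stdev(core_proteins)

print(f"species: {len(core_proteins)}")
print(f"mean core proteins: {avg:.0f}")
print(f"sample standard deviation: {sd:.0f}")
```

The mean agrees with the "Average" row (4796), and the spread can be compared with the "Standard deviation" row. The two correlation rows cannot be recomputed this way because the contig N50 and EST count values are not part of this table.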