From: Disclosing ambiguous gene aliases by automatic literature profiling
| | Initial dataset | Dataset with PubMed abstracts | Dataset fulfilling the algorithm’s requirements* | Final dataset (ambiguous aliases excluded) |
---|---|---|---|---|
EntrezGene official symbols | 100 | 73 | 68** | 68 |
Aliases | 425 | 256 | 223 | 165 |
Abstracts in text corpus | - | 13355 | 12088 | 9005 |
Unique PubMed IDs in text corpus | - | 11022 | 10312 | 7523 |
Redundancy in text corpus (%) | - | 21 | 16.6 | 19.7 |