Evaluating annotations of an Agilent expression chip suggests that many features cannot be interpreted

Gertz, E Michael; Sengupta, Kundan; Difilippantonio, Michael J; Ried, Thomas; Schäffer, Alejandro A

doi:10.1186/1471-2164-10-566

Issues about annotations and probe content.

Stu Matlow, Agilent Technologies

8 December 2009

The article “Evaluating Annotations of an Agilent Expression Chip Suggests that Many Features Cannot Be Interpreted” by Gertz et al. raises important issues about annotations and probe content that are relevant to all microarray platforms. An important lesson from this exercise is that although annotations can change with updates to the reference databases, probe IDs and sequences remain constant for any array design. When researchers publish on significant genes, it is important that they not limit their identification to the genes themselves, but also include the probe IDs or probe sequences used to measure those genes.
The article evaluated the Agilent Whole Human Genome Microarray and concluded that a portion of the content on the microarray does not provide useful information. The authors evaluated Agilent annotations, but focused on their own annotations. Importantly, the annotation file provided by Agilent is updated regularly (about every 9 months), reflecting the rapid changes in understanding of the genomic landscape. The version used for this study was two years out of date; the most recent version was released in April 2009. Meanwhile, the annotations performed by the authors relied on certain assumptions that were not shared during probe selection.
Key assumptions applied during the authors’ annotations were that a probe is valid only if it is associated with RefSeq RNA transcripts and that probe sequences should align back to the human reference genome. In fact, Agilent uses RefSeq, Ensembl, UniGene, EST databases, and TIGR during array design and annotation, and Agilent gene expression arrays are transcript-based rather than genome-based. Therefore, by intent, probes align to annotated transcripts that may differ from RefSeq or the human reference genome. Based on current annotations, 92.6% of array probes map to transcripts from the above-mentioned databases. Additionally, Agilent performs sequence mappings using BLAT, wherein the probes and the transcripts are both searched against the same version of the UCSC genome. In contrast, the authors used a version of the transcript sequences that was inconsistent with the version of the genome they used, further complicating interpretation of their results.
The paper highlights an important consideration: older content on microarrays can include outdated probes that were designed against genes later removed from the databases. This issue is not platform-specific, but rather one that reflects the need to update both microarray content and annotations on a regular basis. Agilent has always performed regular updates to the annotation files for all gene expression arrays so that any changes in the databases are reflected in the newest annotation file; in addition, Agilent is preparing the launch of a new Gene Expression Microarray based on the most current transcript databases.
This paper serves as a strong reminder that, as the definition of the transcriptome continues to evolve, publications citing discoveries involving genes and/or transcripts should include specific details regarding the probes or sequences used to identify the discoveries. Using a gene name alone, or even a transcript name, may prove misleading to researchers if that gene or transcript identification changes.
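As an illustration of why probe IDs make a more durable key than gene names, the following is a minimal Python sketch, not a description of any Agilent tool. The probe IDs, gene symbols, the two annotation tables, and the function name `reannotate` are all hypothetical and chosen only for the example. It looks up a list of reported probes in an older and a newer annotation table and flags each probe whose gene assignment has changed or disappeared.

```python
# Hypothetical annotation tables mapping probe ID -> gene symbol.
# None means the probe has no gene assignment in that annotation release.
# Every ID and symbol here is invented for illustration.
annotation_old = {
    "PROBE_0001": "GENE_A",
    "PROBE_0002": "GENE_B",
    "PROBE_0003": "GENE_C",
}
annotation_new = {
    "PROBE_0001": "GENE_A",
    "PROBE_0002": "GENE_B2",  # symbol changed between releases
    "PROBE_0003": None,       # underlying record was withdrawn
}


def reannotate(probe_ids, old, new):
    """Return (probe_id, old_symbol, new_symbol, status) for each probe."""
    rows = []
    for pid in probe_ids:
        before = old.get(pid)
        after = new.get(pid)
        if after is None:
            status = "unannotated"
        elif before == after:
            status = "unchanged"
        else:
            status = "renamed"
        rows.append((pid, before, after, status))
    return rows


# Probes reported in a (hypothetical) publication.
reported = ["PROBE_0001", "PROBE_0002", "PROBE_0003"]
for row in reannotate(reported, annotation_old, annotation_new):
    print(row)
```

Because the lookup is keyed on the probe ID, a result published against the older table can still be traced after the gene symbol changes, whereas a result reported only by gene name could not be.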
--Vinayak Kulkarni, Ph.D., Informatics Scientist, Agilent Technologies, Inc.
--Sharoni Jacobs, Ph.D., Applications/Workflow Manager, Genomics, Agilent Technologies, Inc.

Competing interests

We work for Agilent Technologies, Inc.

Archived Comments for: Evaluating annotations of an Agilent expression chip suggests that many features cannot be interpreted

BMC Genomics