Identification of the references and reference characteristics
To date, data entry of references from 1998 through 2001 has been completed and is now in its second iteration. A series of charts describing the dataset is available at http://www.prostategenomics.org/datamining/chrom-sorter_pc/summaries.html. Literature citations across four categories (Ethnicity, Age, Method and Chromosome) are summarized graphically in two ways: first by counting the number of times a region is identified, and second by summing the citation index scores to estimate the relative "significance" or importance of the region. To view the results on the web, the user chooses two categories from the menu and clicks the "Show Result" button. The following is a brief description of all the charts currently available online.
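The two summaries can be illustrated with a minimal sketch. The entries, field names and scores below are hypothetical and are not taken from the dataset; the sketch only assumes that each reference entry carries its category values and a citation index score.

```python
from collections import defaultdict

# Hypothetical reference entries; field names and values are illustrative only.
ENTRIES = [
    {"chromosome": "8", "ethnicity": "Caucasian", "citation_index": 12.0},
    {"chromosome": "1", "ethnicity": "Caucasian", "citation_index": 30.0},
    {"chromosome": "8", "ethnicity": "Japanese", "citation_index": 5.0},
    {"chromosome": "18", "ethnicity": "Japanese", "citation_index": 20.0},
]


def summarize(entries, categories):
    """Return (reference counts, combined citation index scores).

    Both dictionaries are keyed by a tuple of the chosen category values.
    """
    counts = defaultdict(int)
    scores = defaultdict(float)
    for entry in entries:
        key = tuple(entry[c] for c in categories)
        counts[key] += 1                          # summary 1: count references
        scores[key] += entry["citation_index"]    # summary 2: sum citation scores
    return dict(counts), dict(scores)


def ranked(table):
    """Keys sorted by descending value; ties are broken by key."""
    return sorted(table, key=lambda k: (-table[k], k))


counts, scores = summarize(ENTRIES, ["chromosome"])
print(ranked(counts))   # order by reference count
print(ranked(scores))   # order by combined citation index score
```

With these made-up entries the two rankings differ, which is why both views are worth showing. Passing two category names, for example `["ethnicity", "chromosome"]`, gives the kind of two-category breakdown the web interface offers.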
Identification of chromosomes implicated by multiple experimental methods
As is evident from the reference count chart, chromosome 8 has the most references and citations, followed by chromosome 1 (figures 2, 3). Chromosome 7 has the third highest reference count, followed by chromosomes 10, 16, 13 and Y, respectively. In the citation index chart, chromosome 18 has the third highest value, followed by chromosomes 13, 16 and 10, respectively.
Prostate cancer chromosomal regions based on ethnicity
Caucasians are by far the most analyzed ethnic group with respect to prostate cancer, followed by Scandinavians and African Americans (figures 4, 5). Japanese patients were the fourth most studied ethnic group, followed distantly by Ashkenazi Jews and Asian/Pacific Islanders. Results for both publications and citations mirrored each other, with Caucasians having a much more significant citation index score than reference count.
The general results are similar when the data are analyzed with respect to ethnicity: chromosome 1 has both the highest reference count and the highest combined citation index score. In both charts, chromosome 8 is second, but the difference between first and second is much more apparent in the combined citation index score chart. In both charts, chromosomes 1 and 8 are clearly the most studied, and the remaining chromosomes have comparatively low counts or scores. Caucasians are the most commonly studied ethnic group, and most of their references are associated with chromosome 1. Scandinavians, a subgroup of Caucasians, also have their highest association with chromosome 1. Ashkenazi Jews, another Caucasian subgroup, had an equal number of citations on chromosomes 1 and 8, but chromosome 8 had the highest citation index score. African Americans are studied in references related to chromosomes 1, 4, 5, 8, 13, 16, 20 and X; this group has its highest reference count on chromosome 1 and its highest citation index score on chromosome 8. African Americans also had a relatively high combined citation index score on chromosome 5. Japanese patients had their highest reference count at chromosome 8, but their highest citation index score at chromosome 18. This group had references at chromosomes 7, 8, 9, 13, 17, 18 and Y. The Asian/Pacific Islanders had only one reference at chromosome 20 and Y. Interestingly, these were not chromosomes associated with Japanese patients in this dataset.
Age related chromosomal regions in prostate cancer
In studies of prostate cancer, age is often identified as an important associated feature. For this reason, we examined the research articles for indications of age-related findings, which were present in 185 referenced data entries. In each case where age demographics were supplied and associated with a specific chromosomal region, these results were recorded. Because this produced a broad and highly variable range of ages, we grouped ages for ease of visualization and arbitrarily assigned samples to four age categories (< 59 years, 59–65 years, 65–70 years, > 70 years). Using these categories, all of the references with age-related data could be placed in a specific age category.
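The grouping step can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the dataset: it assumes each entry reports a single representative age, that the first category covers patients younger than 59, and that an age of exactly 65 falls in the 59–65 category (the text leaves that boundary open). The function name and the example ages are hypothetical.

```python
from collections import Counter


def age_category(age):
    """Map a reported age to one of four categories.

    Boundary handling is an assumption: 59 and 65 go to "59-65",
    66 through 70 go to "65-70".
    """
    if age < 59:
        return "< 59"
    if age <= 65:
        return "59-65"
    if age <= 70:
        return "65-70"
    return "> 70"


# Hypothetical reported ages, one per age-related data entry.
reported_ages = [55, 59, 64, 65, 66, 70, 71, 78]
per_category = Counter(age_category(a) for a in reported_ages)
print(per_category)
```

Fixing the boundary rule in one function keeps every entry in exactly one category, so reference counts per age group always add up to the number of age-related entries.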
Chromosome 8 had the highest number of age-related references, followed closely by chromosome 1 (figure 6). Chromosomes Y and X shared the third highest number of age-related references. On the combined citation index score chart, chromosome 1 has the highest age-related score, meaning the largest number of references that studied specific age groups (figure 7). It is followed by chromosome 8 and chromosome X.
Both charts demonstrate that patients around 65 years of age had the highest reference count and combined citation index score, most of which were on chromosome 1, followed by chromosome 8. Patients under 65 had the second highest reference count and combined citation index score. The majority of their associations were at chromosome 8, closely followed by chromosome 1, on both charts. Patients under 60 came in at a distant third place, with an equal reference count at chromosomes 13 and Y, closely followed by chromosome 1. Patients between 66 and 70 years of age (subgroup 65–70) had references on chromosomes 8, 20 and X. On the citation index score chart, the X chromosome had the highest score for this subgroup, distantly followed by chromosomes 1 and 20. Patients over 70 years of age had the most references at chromosomes 7 and 8, followed closely by chromosome 10. Their highest combined citation index score, however, was at chromosome 8, followed by chromosomes 20 and 7, respectively.
Caucasian patients around 65 years of age are the most studied group in the dataset when age and ethnicity are combined. Caucasians under 65 years of age have the second highest reference count, but only the fourth highest citation index score. Caucasians under age 60 have the second highest citation index score, followed by Scandinavians over age 70. The most studied African Americans were under 65. The most studied Japanese patients were under 72. The most studied Ashkenazi Jewish patients were over 65.
Method by reference count and combined citation index score
This chart indicates the total number of references, positive or negative, associated with the nine standardized experimental methods of analysis (figure 8). Comparative Genomic Hybridization (CGH) is the most frequently used method in our dataset. In-Situ Hybridization (ISH/FISH) is the second most popular method. Loss of Heterozygosity (LOH) methods follow closely behind with the third highest number of references. Familial mapping has the fourth highest combined citation index score. Karyotyping is fifth.