GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies

Table 1 Genotype mismatches between the GWAS and 1000 Genomes datasets

Study GWAS	1000 Genomes	Type	Fwd-Plus (incorrect conversion)	Top-Plus (incorrect conversion)	Plus-Plus (correct conversion)
T/C	C/T	FLIP	0	0	0
T/C	A/G	CSF	5,048	9,875	301
T/C	G/A	FLIP & CSF	8,556	27,648	1,840
T/A	*/*	AMBIG	432	432	432
*/*	−/−	NAR	3,344	3,344	3,344
Matches (%)			62,793 (78.3) (81.7)†	38,875 (48.5)	74,256 (92.6) (96.7)†

FLIP: swap the two alleles (A1 becomes A2 and vice versa).
CSF: complementary strand flip.
AMBIG: strand-ambiguous (A/T or C/G) SNPs in the study GWAS.
NAR: not available in the reference.
*/*: any genotype.
−/−: missing genotype.
Fwd: Forward/Reverse.
Top: TOP/BOT.
Plus: Plus (+)/Minus (−).
†: percentages of matched genotypes after excluding the NAR genotype counts.
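The categories defined above can be checked mechanically. The following Python sketch is our illustration, not GACT's code: it labels a study SNP by comparing its A1/A2 alleles with the reference alleles. The function names `complement` and `classify`, and the extra labels `MATCH` and `MISMATCH`, are hypothetical. It assumes single-nucleotide A/C/G/T alleles and represents a SNP missing from the reference as `None` (NAR).

```python
from typing import Optional, Tuple

# Watson-Crick complements for single-nucleotide alleles (assumption: A/C/G/T only).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

Alleles = Tuple[str, str]  # (A1, A2)


def complement(alleles: Alleles) -> Alleles:
    """Return the alleles on the opposite strand, keeping the A1/A2 order."""
    return (COMPLEMENT[alleles[0]], COMPLEMENT[alleles[1]])


def classify(study: Alleles, ref: Optional[Alleles]) -> str:
    """Hypothetical classifier following the Type column of Table 1."""
    if ref is None:
        return "NAR"  # SNP not available in the reference
    if complement(study) == (study[1], study[0]):
        # A/T or C/G: the strand flip equals the allele swap, so the
        # strand cannot be resolved from the alleles alone.
        return "AMBIG"
    if study == ref:
        return "MATCH"
    if (study[1], study[0]) == ref:
        return "FLIP"
    comp = complement(study)
    if comp == ref:
        return "CSF"
    if (comp[1], comp[0]) == ref:
        return "FLIP & CSF"
    return "MISMATCH"  # alleles cannot be reconciled (label is an assumption)
```

For example, `classify(("T", "C"), ("G", "A"))` returns `"FLIP & CSF"`, as in the third row of Table 1. Checking AMBIG before any allele comparison is consistent with the AMBIG count being identical (432) in all three columns, since no allele definition can resolve the strand of such SNPs.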
Both the “Study GWAS” (the 3,096 Ashkenazi Jewish samples) and “1000 Genomes” columns show example alleles in A1/A2 order. The “Type” column indicates the change required to match the study SNP to the reference. The last three columns give the numbers of genotype mismatches on chromosome 1 (80,173 SNPs in total). For the “Fwd-Plus” and “Top-Plus” columns, we first generated two versions of the same GWAS data, one using the “Fwd” and one using the “Top” allele definition, and compared each with the “Plus” definition of the 1000 Genomes data; the “Plus-Plus” column gives the numbers after we converted the GWAS data to “Plus” using GACT. The last row shows the numbers (percentages) of correct genotype matches (e.g., “T/C” in both datasets) between the GWAS and 1000 Genomes data; the unmarked percentage includes, and the percentage marked † excludes, the SNPs unique to our GWAS data (NAR). Similar ratios were observed on the other chromosomes.
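As an arithmetic check on the last row, the following Python sketch recomputes the reported percentages from the chromosome 1 counts. The unmarked value divides by all 80,173 SNPs; the † value divides by the 76,829 SNPs that remain after removing the 3,344 NAR SNPs. The names `match_rates` and `MATCHES` are illustrative only.

```python
from typing import Tuple

TOTAL_SNPS = 80_173  # SNPs on chromosome 1
NAR = 3_344          # SNPs not available in the reference

# Correct genotype matches per comparison (last row of Table 1).
MATCHES = {"Fwd-Plus": 62_793, "Top-Plus": 38_875, "Plus-Plus": 74_256}


def match_rates(n_match: int, total: int = TOTAL_SNPS, nar: int = NAR) -> Tuple[float, float]:
    """Return (% including NAR SNPs, % excluding NAR SNPs), to one decimal."""
    including = round(100 * n_match / total, 1)
    excluding = round(100 * n_match / (total - nar), 1)
    return including, excluding


for column, n in MATCHES.items():
    incl, excl = match_rates(n)
    print(f"{column}: {incl}% including NAR, {excl}% excluding NAR")
```

This reproduces 78.3% and 81.7% for Fwd-Plus, 48.5% for Top-Plus, and 92.6% and 96.7% for Plus-Plus.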

ISSN: 1471-2164