Table 1 Real genomic datasets tested in this study

From: HGTector: an automated method facilitating genome-wide discovery of putative horizontal gene transfers

| Category | Selfgroup | Closegroup | No. of genes | Date of BLAST | Max. no. of hits | List of input genomes (organism name and NCBI accession no.) |
|---|---|---|---|---|---|---|
| Alphaproteobacteria | SFG Rickettsia | Rickettsiales | 8484 | Jan. 2013 | 200 | R. akari str. Hartford [GenBank:NC_009881] |
| | | | | | | R. felis URRWXCal2 [GenBank:NC_007109] |
| | | | | | | R. massiliae MTU5 [GenBank:NC_009900] |
| | | | | | | R. slovaca 13-B [GenBank:CP002428] |
| | | | | | | R. rickettsii str. ‘Sheila Smith’ [GenBank:NC_009882] |
| | | | | | | R. africae ESF-5 [GenBank:NC_012633] |
| | | | | | | R. conorii str. Malish 7 [GenBank:NC_003103]¹ |
| Firmicutes | Streptococcus | Bacilli | 11906 | Nov. 2013 | 100 | S. anginosus C238 [GenBank:NC_022239] |
| | | | | | | S. gallolyticus UCN34 [GenBank:NC_013798] |
| | | | | | | S. intermedius B196 [GenBank:NC_022246] |
| | | | | | | S. mutans LJ23 [GenBank:NC_017768] |
| | | | | | | S. pneumoniae A026 [GenBank:NC_022655] |
| | | | | | | S. suis JS14 [GenBank:NC_017618] |
| Epsilonproteobacteria | Helicobacter | Campylobacterales | 10531 | Mar. 2013, Nov. 2013² | 200 | H. acinonychis Sheeba [GenBank:NC_008229] |
| | | | | | | H. bizzozeronii CIII-1 [GenBank:NC_015674] |
| | | | | | | H. cinaedi PAGU611 [GenBank:NC_017761] |
| | | | | | | H. felis ATCC 49179 [GenBank:NC_014810] |
| | | | | | | H. mustelae 12198 [GenBank:NC_013949] |
| | | | | | | H. hepaticus ATCC 51449 [GenBank:NC_004917] |
| Gammaproteobacteria | Erwinia | Enterobacteriales | 19013 | Mar. 2013 | 200 | E. amylovora ATCC 49946 [GenBank:NC_013971] |
| | | | | | | E. billingiae Eb661 [GenBank:NC_014306] |
| | | | | | | E. sp. Ejp617 [GenBank:NC_017445] |
| | | | | | | E. pyrifoliae DSM 12163 [GenBank:NC_017390] |
| | | | | | | E. tasmaniensis Et1/99 [GenBank:NC_010694] |
| Actinobacteria | Mycobacterium africanum | Mycobacterium | 3830 | Oct. 2013 | 100 | M. africanum GM041182 [GenBank:NC_015758] |
| Unicellular red algae | Galdieria sulphuraria | Eukaryota | 7174 | Dec. 2013 | 50 | G. sulphuraria [GenBank:ASM34128v1]³ |
| Higher animal | Homo sapiens | Animalia | 22516⁴ | Nov. 2013 | 1000 | H. sapiens [GenBank:GCF_000001405.13] |
  1. The genomes used in this study are identical to those used in [37].
  2. Two independent analyses were conducted on different dates, and similar outcomes were obtained. The more recent result was reported.
  3. The genome used in this study is identical to that used in [79].
  4. For genes with multiple isoforms, the longest CDS was extracted using an in-house Perl script and used for the analysis.
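
The isoform filtering mentioned in the last footnote can be illustrated with a minimal Python sketch. This is not the authors' in-house Perl script, which is not shown in the source; the function name `longest_cds_per_gene`, the `(gene_id, isoform_id, cds_sequence)` input format, and the tie-breaking rule (keep the first isoform seen) are all assumptions made for illustration.

```python
def longest_cds_per_gene(records):
    """Pick the longest CDS for each gene.

    records: iterable of (gene_id, isoform_id, cds_sequence) tuples
             (a hypothetical input format, not the authors' actual one).
    Returns a dict mapping gene_id -> (isoform_id, cds_sequence).
    """
    best = {}
    for gene_id, isoform_id, seq in records:
        current = best.get(gene_id)
        # A strictly longer CDS replaces the stored one;
        # on ties, the first isoform encountered is kept (assumed rule).
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (isoform_id, seq)
    return best


# Toy example with made-up gene and isoform identifiers.
records = [
    ("geneA", "iso1", "ATGAAATAG"),
    ("geneA", "iso2", "ATGAAACCCTAG"),
    ("geneB", "iso1", "ATGTAG"),
]
selected = longest_cds_per_gene(records)
```

Keeping a single representative CDS per gene means each gene contributes one query to the BLAST search, so genes with many isoforms are not counted more than once.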