Table 1 File formats used in genomics

From: Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data

| Data type | Format | Implementation |
| --- | --- | --- |
| Feature annotations (e.g. genes, transcripts, exons, origins of replication) | BED, extended BED* | Plastid |
| | BigBed | Plastid + kentUtils [46] |
| | GTF2* | Plastid |
| | GFF3* | Plastid |
| | PSL* | Plastid |
| Read alignments | bowtie | Plastid |
| | BAM | Plastid + Pysam [27] |
| Reduced count data | bedGraph | Plastid |
| | BigWig | Plastid + kentUtils [46] |
| | wiggle (fixedStep) | Plastid |
| | wiggle (variableStep) | Plastid |
| Sequence | FASTA | via Biopython [20] |
| | twobit | via twobitreader [21] |
For each category of genomics data, many file formats exist. Plastid includes readers for each format that standardize the representation of data for each type, so that the meaning of each data type is separated from its format on disk.

*tabix compression for these formats is supported via Pysam [27].
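
To illustrate what separating a data type's meaning from its on-disk format can look like, here is a minimal sketch in plain Python. It does not use Plastid's actual API: the `Feature` class and the functions `parse_bed_line` and `parse_gtf_line` are hypothetical names introduced only for this example. The sketch relies on two facts about the formats themselves: BED coordinates are 0-based and half-open, while GTF coordinates are 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical format-independent record (not a Plastid class)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str
    name: str


def parse_bed_line(line: str) -> Feature:
    """Read one tab-delimited BED line; BED is already 0-based, half-open."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else ""
    strand = fields[5] if len(fields) > 5 else "."
    return Feature(chrom, start, end, strand, name)


def parse_gtf_line(line: str) -> Feature:
    """Read one tab-delimited GTF line; GTF is 1-based, inclusive."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
    name = ""
    # Attributes look like: gene_id "g1"; transcript_id "tx1";
    for attr in fields[8].split(";"):
        attr = attr.strip()
        if attr.startswith("transcript_id"):
            name = attr.split(" ", 1)[1].strip('"')
    # Shift the start by one to convert to 0-based, half-open coordinates.
    return Feature(chrom, start - 1, end, strand, name)


# The same exon written in two formats yields the same record.
bed = "chrI\t99\t200\ttx1\t0\t+"
gtf = 'chrI\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";'
same_record = parse_bed_line(bed) == parse_gtf_line(gtf)
```

Once both readers return the same record type, downstream analysis code only needs to handle `Feature` objects and never has to know which file format the data came from.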