
Table 1 File formats used in genomics

From: Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data

| Data type | Format | Implementation |
| --- | --- | --- |
| Feature annotations (e.g. genes, transcripts, exons, origins of replication) | BED, extended BED* | Plastid |
| | BigBed | Plastid + kentUtils [46] |
| | GTF2* | Plastid |
| | GFF3* | Plastid |
| | PSL* | Plastid |
| Read alignments | bowtie | Plastid |
| | BAM | Plastid + Pysam [27] |
| Reduced count data | bedGraph | Plastid |
| | BigWig | Plastid + kentUtils [46] |
| | wiggle (fixedStep) | Plastid |
| | wiggle (variableStep) | Plastid |
| Sequence | FASTA | via Biopython [20] |
| | twobit | via twobitreader [21] |

For each category of genomics data, many file formats exist. Plastid includes readers for each format that standardize the representation of data for each type, so that the meaning of each data type is separated from its format on disk.

*tabix compression for these formats is supported via Pysam [27].
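
The idea of separating a data type's meaning from its on-disk format can be illustrated with a minimal sketch. This is not Plastid's API: the `Feature` class and the `read_bed` and `read_gff3` functions are hypothetical names invented for this example, and the parsers handle only the simplest single-line records. The sketch shows two readers converting differently encoded files (BED uses 0-based, half-open coordinates; GFF3 uses 1-based, inclusive coordinates) into one shared representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical format-independent feature (not a Plastid class)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str
    name: str


def read_bed(lines):
    """Yield Features from tab-delimited BED lines (0-based, half-open)."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.rstrip("\n").split("\t")
        name = cols[3] if len(cols) > 3 else ""
        strand = cols[5] if len(cols) > 5 else "."
        yield Feature(cols[0], int(cols[1]), int(cols[2]), strand, name)


def read_gff3(lines):
    """Yield Features from GFF3 lines (1-based, inclusive coordinates)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        # Convert the 1-based inclusive start to a 0-based start; the end is unchanged.
        yield Feature(cols[0], int(cols[3]) - 1, int(cols[4]), cols[6], attrs.get("ID", ""))


# The same gene written in two formats yields the same object.
bed_lines = ["chrI\t100\t200\tgeneA\t0\t+\n"]
gff3_lines = ["##gff-version 3\n", "chrI\texample\tgene\t101\t200\t.\t+\t.\tID=geneA\n"]

bed_features = list(read_bed(bed_lines))
gff3_features = list(read_gff3(gff3_lines))
```

Downstream code written against `Feature` does not need to know which file format the data came from, which is the design choice the table note describes.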