Call us : +2348178812480 E-mail : elearning@newgateuniversityminna.edu.ng
Newgate University Minna - Elearning Platform
DNA File Formats for DNA Data Storage and Bioinformatics

Introduction

DNA file formats are specialized file types used to store, encode, and process DNA sequences. These formats are crucial for DNA data storage, genome analysis, and bioinformatics applications.

 

1 Common DNA File Formats

1. FASTA (.fasta, .fa)

Description: Stores nucleotide (DNA/RNA) or protein sequences in a simple text format.
Structure:

  • First line: Header (starts with > followed by the sequence identifier and an optional description).
  • Following lines: The sequence itself (A, T, C, G for DNA, or amino-acid letters for protein), which may wrap across several lines.

Example:

>Human_Gene1
ATGCGTACGTAGCTAGCTAGCTAGCTAGC

Uses:
Storing assembled genomes and reference sequences
Storing and sharing DNA sequences
Input to bioinformatics tools (BLAST, ClustalW)
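
The header/sequence structure described above can be read with a few lines of code. The following is a minimal sketch, not a full parser: the function name parse_fasta is hypothetical, and it assumes the whole file fits in memory and that every record starts with a ">" line.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap, so join them onto the current record.
            records[header] += line.upper()
    return records


example = ">Human_Gene1\nATGCGTACGTAGCTAGCTAGCTAGCTAGC\n"
records = parse_fasta(example)
print(records["Human_Gene1"])
```

For real projects an established library such as Biopython (Bio.SeqIO.parse with the "fasta" format) is usually a better choice than a hand-written reader.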

 

2. FASTQ (.fastq, .fq)

Description: Stores raw sequencing data, including quality scores.
Structure:

  • Line 1: Identifier (@ followed by sequence ID).
  • Line 2: DNA sequence.
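  • Line 3: + separator, optionally followed by the same identifier as line 1.
  • Line 4: Quality scores (ASCII-encoded Phred scores), one character per base, so this line has the same length as line 2.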

Example:

@SEQ_ID
GATTTGGGGTTTCCCAGTCACGAC
+
!''*((((***+))%%%++)(%%%

Uses:
Next-Generation Sequencing (NGS) data storage
Read quality analysis
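
To make the quality line concrete, the sketch below reads one four-line record and converts each quality character to a Phred score. It assumes the common Phred+33 encoding (score = ASCII code minus 33); the function names are hypothetical, and the quality string is 24 characters long so that it matches the 24-base read.

```python
from typing import List, Tuple


def parse_fastq_record(lines: List[str]) -> Tuple[str, str, List[int]]:
    """Return (identifier, sequence, Phred scores) for one 4-line record."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, seq, sep, qual = (line.strip() for line in lines)
    if not header.startswith("@") or not sep.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, so '!' (ASCII 33) means score 0.
    scores = [ord(ch) - 33 for ch in qual]
    return header[1:], seq, scores


def phred_to_error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


record = [
    "@SEQ_ID",
    "GATTTGGGGTTTCCCAGTCACGAC",
    "+",
    "!''*((((***+))%%%++)(%%%",
]
seq_id, seq, scores = parse_fastq_record(record)
print(seq_id, scores[:4])
```

A score of 10 therefore means roughly a 1 in 10 chance that the base call is wrong, and a score of 30 roughly 1 in 1000.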

 

3. GenBank (.gb, .gbk)

Description: Stores DNA sequences along with annotations, including gene names, features, and references.
Structure:

  • LOCUS: Sequence name, length, type
  • DEFINITION: Brief description
  • FEATURES: Gene annotations
  • ORIGIN: DNA sequence

Example (simplified):

LOCUS       SCU49845     5028 bp    DNA
DEFINITION  Yeast mitochondrion gene.
FEATURES             Location/Qualifiers
     gene            1..5028
                     /gene="COX1"
ORIGIN
     ATGCGTACGTAGCTAGCTAGCTAGC

Uses:
Storing annotated genetic data
Sequence databases (NCBI GenBank; DDBJ uses a closely related flat-file layout, while EMBL has its own format)
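
Because each top-level keyword (LOCUS, DEFINITION, FEATURES, ORIGIN) starts in the first column and continuation lines are indented, a record can be split into sections quite simply. The sketch below works on the simplified record shown above; split_genbank_sections and extract_sequence are hypothetical helper names, and a real GenBank file has more fields than this handles.

```python
from typing import Dict, List


def split_genbank_sections(text: str) -> Dict[str, List[str]]:
    """Group the lines of one record under their top-level keyword."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("//"):
            continue  # skip blank lines and the end-of-record marker
        if not line[0].isspace():
            # A keyword line starts in column 1, e.g. "LOCUS  ...".
            keyword, _, rest = line.partition(" ")
            current = keyword
            sections[current] = [rest.strip()]
        elif current is not None:
            sections[current].append(line.strip())
    return sections


def extract_sequence(sections: Dict[str, List[str]]) -> str:
    """Join the ORIGIN lines, dropping position numbers and spaces."""
    return "".join(
        ch for line in sections.get("ORIGIN", []) for ch in line if ch.isalpha()
    ).upper()


record = """\
LOCUS       SCU49845     5028 bp    DNA
DEFINITION  Yeast mitochondrion gene.
FEATURES             Location/Qualifiers
     gene            1..5028
                     /gene="COX1"
ORIGIN
     ATGCGTACGTAGCTAGCTAGCTAGC
//
"""
sections = split_genbank_sections(record)
print(sections["DEFINITION"][0])
print(extract_sequence(sections))
```

For complete records, Biopython's Bio.SeqIO.parse with the "genbank" format handles the full feature table and is the more robust option.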

 

4. GFF/GTF (.gff, .gtf)

Description: Tab-delimited annotation formats that record where genes and other features lie on a sequence.
Structure:

  • Nine columns: sequence ID (e.g. chromosome), source, feature type, start, end, score, strand, phase, attributes.

Example (GFF3 format):

chr1  Ensembl  gene  1000  5000  .  +  .  ID=Gene1;Name=COX1

Uses:
Gene annotations in genomic research
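
The column layout can be illustrated by splitting one GFF3 line into named fields. This is a minimal sketch: GffFeature and parse_gff3_line are hypothetical names, the fields are assumed to be separated by tab characters as the format requires, and the attribute column is assumed to hold simple key=value pairs.

```python
from typing import Dict, NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GffFeature:
    """Split one tab-delimited GFF3 feature line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GFF3 feature line has nine tab-separated columns")
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(pair.split("=", 1) for pair in fields[8].split(";") if pair)
    return GffFeature(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),
        fields[5], fields[6], fields[7], attributes,
    )


line = "chr1\tEnsembl\tgene\t1000\t5000\t.\t+\t.\tID=Gene1;Name=COX1"
feature = parse_gff3_line(line)
print(feature.attributes["Name"], feature.end - feature.start + 1)
```

Because both coordinates are inclusive, the feature length is end minus start plus one.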

 

5. SAM/BAM (.sam, .bam)

Description: Stores alignments of sequencing reads to a reference genome.
SAM is the tab-delimited text form; BAM is its compressed binary equivalent.

Uses:
DNA sequence alignment from high-throughput sequencing
Storing large genomic datasets efficiently
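
As an illustration of the text form, the sketch below splits one SAM alignment line into fields. The field order (read name, flag, reference name, position, mapping quality, CIGAR, mate fields, sequence, quality) comes from the SAM specification rather than from the notes above, and parse_sam_line is a hypothetical helper name. BAM files are binary and need a library such as pysam or the samtools command-line tool.

```python
from typing import Any, Dict


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Extract the main mandatory fields from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    flag = int(fields[1])
    return {
        "qname": fields[0],        # read name
        "flag": flag,              # bitwise flag
        "rname": fields[2],        # reference sequence name
        "pos": int(fields[3]),     # 1-based leftmost mapping position
        "mapq": int(fields[4]),    # mapping quality
        "cigar": fields[5],        # alignment description
        "seq": fields[9],
        "qual": fields[10],
        "is_unmapped": bool(flag & 0x4),   # flag bit 0x4: read unmapped
        "is_reverse": bool(flag & 0x10),   # flag bit 0x10: reverse strand
    }


line = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
alignment = parse_sam_line(line)
print(alignment["rname"], alignment["pos"], alignment["cigar"])
```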

 

6. VCF (.vcf)

Description: Stores genetic variants (SNPs, insertions, deletions, and other mutations) relative to a reference genome.
Uses:
Storing human genetic variation data
Population genetics studies
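
A VCF data line can be read in much the same way as the other tab-delimited formats. The sketch below assumes the eight fixed columns defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), which the notes above do not list; parse_vcf_line is a hypothetical helper name and per-sample genotype columns are ignored.

```python
from typing import Any, Dict, Optional


def parse_vcf_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse the eight fixed columns of one VCF data line."""
    if line.startswith("#"):
        return None  # header and metadata lines start with '#'
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict: Dict[str, Any] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # INFO entries are either key=value pairs or bare flags.
        info_dict[key] = value if sep else True
    alts = alt.split(",")
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position on the reference
        "id": vid,
        "ref": ref,
        "alts": alts,
        "qual": qual,
        "filter": flt,
        "info": info_dict,
        # A SNP replaces one reference base with one alternate base.
        "is_snp": len(ref) == 1 and all(len(a) == 1 for a in alts),
    }


line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
variant = parse_vcf_line(line)
print(variant["chrom"], variant["pos"], variant["ref"], variant["alts"])
```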

 

7. DNA Data Storage-Specific Formats

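DNA Fountain – An encoding scheme for digital DNA storage based on fountain codes; it describes how data are mapped onto DNA strands rather than a file format in the usual sense.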
Twist Bioscience Format – Custom format for synthetic DNA storage.
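
To show the basic idea behind storing digital data in DNA, the sketch below maps every two bits of a byte to one base. This is a deliberately simple illustration and is not DNA Fountain or any vendor format: the mapping (00=A, 01=C, 10=G, 11=T) and the function names are assumptions made for the example, and practical schemes add error correction and avoid sequences that are hard to synthesize, such as long runs of the same base.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Decode a sequence produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of four")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


encoded = bytes_to_dna(b"Hi")
print(encoded)                # CAGACGGC
print(dna_to_bytes(encoded))  # b'Hi'
```

The encoded strand could then be written to a FASTA file for synthesis, and sequencing reads returned as FASTQ would be decoded back to bytes.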

 

Each DNA file format serves a distinct purpose: FASTA holds plain sequences, FASTQ holds raw sequencing reads with quality scores, GenBank and GFF/GTF hold annotations, SAM/BAM hold alignments, and VCF holds genomic variants.

