FASTA and BLAST in Bioinformatics

FASTA and BLAST are two fundamental tools used for DNA and protein sequence comparison in bioinformatics.

1. FASTA Format and FASTA Algorithm

A. What is FASTA?

FASTA is both:

A file format for storing nucleotide or protein sequences.
An alignment algorithm for sequence similarity searching.

B. FASTA File Format

A FASTA file consists of:

A header line (starts with > followed by a sequence identifier and an optional description).
The sequence itself (DNA, RNA, or protein), which may span one or more lines.

Example of a FASTA File:

>sequence1 Human beta-globin gene
ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAAC

>sequence1 Human beta-globin gene → Header line (identifier sequence1, followed by a description)
DNA sequence on the line below it
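As a rough illustration of the layout described above, the following Python sketch reads FASTA records from text. The function name parse_fasta is a hypothetical helper, not part of any library. It assumes that every header line starts with > and that a sequence may be split across several lines.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file.

    Hypothetical helper for illustration only.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = """>sequence1 Human beta-globin gene
ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAAC
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    # The first word of the header is conventionally the identifier.
    print(header.split()[0], len(sequence))
```

For a file on disk, the same function could be fed an open file handle, since iterating over a text file yields its lines.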

Common FASTA File Extensions

.fasta or .fa → Generic FASTA file (nucleotide or protein)
.ffn → FASTA file with nucleotide sequences of gene coding regions
.faa → FASTA file with protein (amino acid) sequences

C. FASTA Algorithm

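Purpose: Finds similar sequences using a heuristic approach instead of exhaustively aligning the query against every database sequence.
Process:

Identifies short exact matches (words of length ktup) between the query and each database sequence.
Groups matches that lie on the same diagonal and rescores the best regions using a scoring system (matches and mismatches, or a substitution matrix for proteins).
Joins compatible regions, allowing gaps, and computes a local alignment using dynamic programming restricted to a band around the best region.

Use Case: Works well for both nucleotide and protein sequences but is generally slower than BLAST.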
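The steps above can be sketched in simplified form. This is not the FASTA program itself: the function names, the scoring values (match=2, mismatch=-1, gap=-2), the band width, and the example sequences are all assumptions chosen for illustration, and the intermediate rescoring and joining of regions is omitted.

```python
from collections import defaultdict


def ktuple_diagonals(query, subject, k=2):
    """Count exact k-tuple matches on each diagonal (subject_pos - query_pos)."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    counts = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            counts[j - i] += 1
    return dict(counts)


def banded_local_score(query, subject, diagonal, band=3,
                       match=2, mismatch=-1, gap=-2):
    """Local alignment score using only cells within `band` of `diagonal`.

    Cells outside the band are left at 0, which never beats the
    local-alignment floor of 0, so they do not affect the result.
    """
    best = 0
    prev = [0] * (len(subject) + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        lo = max(1, i + diagonal - band)
        hi = min(len(subject), i + diagonal + band)
        for j in range(lo, hi + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Illustrative sequences (assumed, not from any database).
query = "ACGTACGT"
subject = "TTACGTACGTTT"

counts = ktuple_diagonals(query, subject, k=2)
best_diagonal = max(counts, key=counts.get)  # diagonal with the most word hits
score = banded_local_score(query, subject, best_diagonal)
print(best_diagonal, score)
```

Restricting the dynamic programming to a band is what keeps this step cheap: only cells near the diagonal suggested by the word matches are filled in, rather than the full matrix.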

2. BLAST (Basic Local Alignment Search Tool)

A. What is BLAST?

BLAST is a rapid sequence comparison tool that finds regions of local similarity between sequences. It is faster than FASTA and widely used for database searches.

B. Types of BLAST

BLAST Type	Query Sequence	Database Type	Use Case
BLASTn	DNA	DNA	Finding similar nucleotide sequences
BLASTp	Protein	Protein	Identifying similar protein sequences
BLASTx	DNA	Protein	Translates the DNA query in all six reading frames and searches a protein database
tBLASTn	Protein	DNA	Searches a protein query against a DNA database translated in all six reading frames
tBLASTx	DNA	DNA	Translates both query and database in all six reading frames and compares them as proteins
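The table above can be expressed as a small lookup. The helper choose_blast_program and its compare_as_protein flag are hypothetical names introduced here for illustration. The returned strings are the lowercase program names used by the NCBI BLAST+ command-line tools.

```python
def choose_blast_program(query_type, database_type, compare_as_protein=False):
    """Pick a BLAST program from the query and database types.

    Hypothetical helper. `compare_as_protein` only matters when both the
    query and the database are DNA: it selects tblastx instead of blastn.
    """
    q = query_type.lower()
    d = database_type.lower()
    if q == "dna" and d == "dna":
        return "tblastx" if compare_as_protein else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("dna", "protein"): "blastx",
        ("protein", "dna"): "tblastn",
    }
    try:
        return table[(q, d)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type!r}, {database_type!r}")


print(choose_blast_program("DNA", "Protein"))
print(choose_blast_program("DNA", "DNA", compare_as_protein=True))
```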

C. How BLAST Works

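Word Matching → Breaks the query into short k-mers (words) and scans the database for matching words (seeds).
Word Extension → Extends each seed in both directions without gaps, stopping when the score falls too far below the best score seen so far.
Gapped Alignment → If an ungapped extension scores high enough, BLAST performs a gapped alignment around it to improve the alignment.
Scoring and Filtering → Calculates a score and an E-value (the number of hits with that score expected by chance) for each alignment and discards hits that fail the cutoff.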
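The first two stages (word matching and ungapped extension) can be sketched for nucleotide sequences as follows. This is a simplified illustration, not the BLAST implementation: the function names, the scores (match=1, mismatch=-2), the drop-off threshold x_drop, and the example sequences are assumptions. Neighborhood words for proteins, gapped extension, and E-value statistics are left out.

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where an exact word matches."""
    words = {}
    for i in range(len(query) - word_size + 1):
        words.setdefault(query[i:i + word_size], []).append(i)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in words.get(subject[j:j + word_size], ()):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, qpos, spos, word_size=3,
                    match=1, mismatch=-2, x_drop=3):
    """Extend a seed left and right without gaps.

    Extension in each direction stops once the running score has fallen
    x_drop or more below the best score seen in that direction.
    Returns (score, query_start, query_end) with query_end exclusive.
    """
    def extend(qi, si, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = extend(qpos + word_size, spos + word_size, 1)
    left_score, left_len = extend(qpos - 1, spos - 1, -1)
    total = word_size * match + left_score + right_score
    return total, qpos - left_len, qpos + word_size + right_len


# Illustrative sequences (assumed).
query = "GATTACAGG"
subject = "CCGATTACATT"

seeds = find_seeds(query, subject)
q0, s0 = seeds[0]
score, q_start, q_end = extend_ungapped(query, subject, q0, s0)
print(seeds[0], score, query[q_start:q_end])
```

The drop-off rule is the reason extension is fast: a seed that runs into mismatches is abandoned after a few positions instead of being aligned to the end of the sequence.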

3. Differences Between FASTA and BLAST

Feature	FASTA	BLAST
Speed	Slower	Faster (optimized for large databases)
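Sensitivity	Often more sensitive, especially for nucleotide searches	Somewhat less sensitive in some cases but faster
Alignment Type	Local (global alignment programs are also available in the FASTA package)	Local
Algorithm	Finds word matches, scores diagonal regions, then aligns with banded dynamic programming	Finds word matches, extends them without gaps, then performs gapped alignment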
Best Use Case	Small-scale sequence comparison	Large-scale database searches

4. Practical Applications

FASTA → Best for detailed sequence alignment, especially for small datasets.
BLAST → Best for rapid database searches, finding homologous genes or proteins.
