DNA Local Sequence Alignment
What is Local Sequence Alignment?
Local sequence alignment identifies regions of similarity between two DNA sequences rather than aligning them from start to end. It is useful for detecting conserved domains, motifs, and regions of functional significance even in distantly related sequences.
It is commonly used when:
- The sequences are of different lengths
- A short subsequence must be located within a larger sequence
- The goal is to find functional motifs or conserved regions
1. Smith-Waterman Algorithm (Local Alignment)
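The Smith-Waterman algorithm is the standard exact method for local sequence alignment. It follows the same dynamic programming principles as the Needleman-Wunsch global algorithm, with two differences: every cell score is floored at zero, so an alignment can start at any position, and traceback begins at the highest-scoring cell rather than the bottom-right corner, so an alignment can end at any position. As a result, unaligned residues at either end of the sequences incur no penalty.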
A. Steps of the Algorithm
Step 1: Initialization
- Create a scoring matrix where rows and columns represent the sequences.
- Initialize the first row and first column with zero (in global alignment these cells instead hold cumulative gap penalties).
Step 2: Scoring the Matrix
- Compute the score at each position based on:
  - Match (+1 or +2, depending on the scoring scheme)
  - Mismatch (-1 or -2, depending on the scoring scheme)
  - Gap penalty (-2 or lower)
- No negative scores: if a score drops below 0, it is set to 0, so that a new local alignment can start at any cell.
Formula:
$$
S(i,j) = \max \begin{cases}
S(i-1,j-1) + \text{match/mismatch score} \\
S(i-1,j) + \text{gap penalty} \\
S(i,j-1) + \text{gap penalty} \\
0
\end{cases}
$$
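As a worked illustration of the recurrence, the two evaluations below assume match = +2, mismatch = -1, and gap penalty = -2. The neighboring cell values are made up for the example. In the first case the residues match and all three neighbors hold 2. In the second the residues mismatch and all three neighbors hold 0, so the zero floor applies.

```latex
\begin{aligned}
\text{match:}    \quad S(i,j) &= \max\{\,2 + 2,\; 2 - 2,\; 2 - 2,\; 0\,\} = 4 \\
\text{mismatch:} \quad S(i,j) &= \max\{\,0 - 1,\; 0 - 2,\; 0 - 2,\; 0\,\} = 0
\end{aligned}
```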
Step 3: Traceback
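- Begin at the highest-scoring cell in the matrix.
- At each step, move to the neighboring cell (diagonal, up, or left) from which the current score was derived.
- Stop when a cell with score 0 is reached; the path traversed defines the local alignment region.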
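The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `smith_waterman` and its default scores (+2, -1, -2) are assumptions chosen for the example, the gap penalty is linear (no separate open and extend costs), and only one best-scoring alignment is returned.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Return (aligned_seq1, aligned_seq2, score) for one best local alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1

    # Step 1: matrix of zeros; the first row and column stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best_score, best_pos = 0, (0, 0)

    # Step 2: fill the matrix, flooring every cell at zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + sub,  # diagonal: align the two residues
                H[i - 1][j] + gap,      # up: gap in seq2
                H[i][j - 1] + gap,      # left: gap in seq1
            )
            if H[i][j] > best_score:
                best_score, best_pos = H[i][j], (i, j)

    # Step 3: trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out1, out2 = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1

    # The traceback collected residues in reverse order.
    return "".join(reversed(out1)), "".join(reversed(out2)), best_score


aligned1, aligned2, score = smith_waterman("GATTACA", "TTAC")
print(aligned1, aligned2, score)  # TTAC TTAC 8
```

When several cells share the maximum score, this sketch keeps the first one encountered; a fuller implementation would report all of them.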
2. Example of Local Alignment
Example Sequences
Sequence 1: GATTACA
Sequence 2: TTAC
Scoring System
- Match = +2
- Mismatch = -1
- Gap penalty = -2
Alignment Output
GATTACA
  ||||
  TTAC
- The best-matching region "TTAC" is aligned locally.
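- The unaligned flanking residues of Sequence 1 ("GA" at the start and "A" at the end) are excluded from the alignment and do not affect its score of 8 (four matches at +2 each).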
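Because this particular alignment contains no gaps, its position and score can be cross-checked without dynamic programming by scoring Sequence 2 against every equal-length window of Sequence 1. The sketch below does that; it is a simple ungapped sliding-window check, not Smith-Waterman, and the helper name `ungapped_score` is hypothetical.

```python
def ungapped_score(a, b, match=2, mismatch=-1):
    """Score two equal-length strings column by column, with no gaps."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


seq1, seq2 = "GATTACA", "TTAC"

# Score seq2 against every window of seq1 that has the same length.
window_scores = [
    ungapped_score(seq1[offset:offset + len(seq2)], seq2)
    for offset in range(len(seq1) - len(seq2) + 1)
]
best_offset = max(range(len(window_scores)), key=window_scores.__getitem__)

# Mark matching columns with "|" and mismatching columns with ".".
window = seq1[best_offset:best_offset + len(seq2)]
markers = "".join("|" if x == y else "." for x, y in zip(window, seq2))

print(window_scores)  # [-4, -1, 8, -1]
print(seq1)
print(" " * best_offset + markers)
print(" " * best_offset + seq2)
```

The best window starts at offset 2 with a score of 8, which agrees with the alignment shown above.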
3. Python Implementation using Biopython
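from Bio import pairwise2
from Bio.pairwise2 import format_alignment

seq1 = "GATTACA"
seq2 = "TTAC"

# localxx: local alignment, match = 1, no mismatch or gap penalties
alignments = pairwise2.align.localxx(seq1, seq2)

# Print each best-scoring alignment that was returned
for alignment in alignments:
    print(format_alignment(*alignment))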
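localxx performs local alignment with the simplest scoring: each identical pair scores 1, and mismatches and gaps are not penalized. To apply the scoring system from Section 2 instead, pairwise2.align.localms(seq1, seq2, 2, -1, -2, -2) takes the match score, mismatch score, gap-open penalty, and gap-extension penalty explicitly. Note that Bio.pairwise2 is deprecated in recent Biopython releases in favor of Bio.Align.PairwiseAligner.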
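For newer Biopython versions, the same alignment can be sketched with Bio.Align.PairwiseAligner, here configured with the scoring system from Section 2. This is a minimal example that assumes Biopython is installed; setting the gap-open and gap-extend scores to the same value is an assumption made to mimic a single linear gap penalty of -2.

```python
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "local"          # Smith-Waterman style local alignment
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -2     # same value for open and extend,
aligner.extend_gap_score = -2   # i.e. a linear gap penalty (assumption)

alignments = aligner.align("GATTACA", "TTAC")
best = alignments[0]

print(best.score)  # 8.0
print(best)        # text rendering of the aligned region
```

The exact text layout printed for `best` varies between Biopython versions, so only the score is stated here.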
4. Tools for Local Alignment
- BLAST (Basic Local Alignment Search Tool)
  - Fast heuristic search of sequence databases for regions of local similarity.
- HMMER
  - Uses profile hidden Markov models to detect conserved domains and remote homologs.
- EMBOSS Water
  - Implements Smith-Waterman for pairwise local alignment.
5. Applications of Local Sequence Alignment
- Finding conserved functional regions (e.g., transcription factor binding sites).
- Detecting mutations or variations in DNA sequences.
- Searching for homologous sequences in large databases.
- Analyzing viral and bacterial genomes to find common regions.