DNA Local Sequence Alignment

What is Local Sequence Alignment?

Local sequence alignment identifies regions of similarity between two DNA sequences rather than aligning them from start to end. It is useful for detecting conserved domains, motifs, and regions of functional significance even in distantly related sequences.

It is commonly used when:
Sequences are of different lengths
Searching for similar subsequences within a larger sequence
Finding functional motifs or conserved regions

1. Smith-Waterman Algorithm (Local Alignment)

The Smith-Waterman algorithm is the most commonly used method for local sequence alignment. It follows dynamic programming principles but differs from the Needleman-Wunsch algorithm by allowing gaps without penalties at sequence ends and discarding negative scores.

A. Steps of the Algorithm

Step 1: Initialization

Create a scoring matrix where rows and columns represent the sequences.
Initialize the first row and first column with zero (unlike global alignment).

Step 2: Scoring the Matrix

Compute the score at each position based on:

Match (+1 or +2, depending on scoring matrix)
Mismatch (-1 or -2, depending on scoring matrix)
Gap penalty (-2 or lower)
No negative scores → If a score drops below 0, it is set to 0 (to allow local alignments).

Formula:

S(i,j)=max⁡{S(i−1,j−1)+match/mismatch scoreS(i−1,j)+gap penaltyS(i,j−1)+gap penalty0S(i,j) = \max \begin{cases} S(i-1,j-1) + \text{match/mismatch score} \\ S(i-1,j) + \text{gap penalty} \\ S(i,j-1) + \text{gap penalty} \\ 0 \end{cases}

Step 3: Traceback

Begin at the highest-scoring cell in the matrix.
Traceback stops when a cell with score 0 is reached (this defines the local alignment region).

2. Example of Local Alignment

Example Sequences

Sequence 1: GATTACA
Sequence 2: TTAC

Scoring System

Match = +2
Mismatch = -1
Gap penalty = -2

Alignment Output

GATTACA

||||

TTAC

The best-matching region "TTAC" is aligned locally.
The beginning and end mismatches are ignored.

3. Python Implementation using Biopython

from Bio import pairwise2

from Bio.pairwise2 import format_alignment

seq1 = "GATTACA"

seq2 = "TTAC"

alignments = pairwise2.align.localxx(seq1, seq2)

for alignment in alignments:

print(format_alignment(*alignment))

localxx performs local alignment with simple match scoring.

4. Tools for Local Alignment

BLAST (Basic Local Alignment Search Tool)

Used for database searches to find similar sequences.

HMMER

Detects conserved motifs in sequences.

EMBOSS Water

Implements Smith-Waterman for pairwise local alignment.

5. Applications of Local Sequence Alignment

Finding conserved functional regions (e.g., transcription factor binding sites).
Detecting mutations or variations in DNA sequences.
Searching for homologous sequences in large databases.
Analyzing viral and bacterial genomes to find common regions.

No content has been added to this book yet.

DNA Local Sequence Alignment

Info

Contact Us