DNA Global Sequence Alignment
What is Global Sequence Alignment?
Global sequence alignment is a method used in bioinformatics to compare two entire DNA sequences from start to end, even if they are of different lengths. It ensures that the sequences are aligned across their full lengths, introducing gaps when necessary.
It is commonly used to compare closely related sequences, such as homologous genes from different species.
1. Needleman-Wunsch Algorithm (Global Alignment)
The Needleman-Wunsch algorithm is the standard algorithm for global sequence alignment. It uses dynamic programming to find an alignment with the highest possible score under a given scoring system.
A. Steps of the Algorithm
Step 1: Initialization
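- Create a scoring matrix with one row per base of the first sequence and one column per base of the second, plus an extra leading row and column for the empty prefix.
- Initialize the first row and column with cumulative gap penalties: the top-left cell is 0, cell (i,0) is i × gap penalty, and cell (0,j) is j × gap penalty.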
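The initialization step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `init_matrix` is hypothetical, and the gap penalty of -2 is assumed from the scoring values used in this document.

```python
def init_matrix(seq1, seq2, gap=-2):
    """Build a (len(seq1)+1) x (len(seq2)+1) matrix with its borders filled in."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column: a prefix of seq1 aligned against gaps only
    for i in range(1, rows):
        score[i][0] = i * gap
    # First row: a prefix of seq2 aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap
    return score


matrix = init_matrix("AGCTG", "AGTTG")
print(matrix[0])                    # first row
print([row[0] for row in matrix])   # first column
```

Both printed lists are [0, -2, -4, -6, -8, -10], since each additional base aligned against a gap costs another -2.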
Step 2: Scoring the Matrix
- Use a substitution matrix (for DNA, a simple match/mismatch scheme is common).
- Compute scores based on:
  - Match (+1)
  - Mismatch (-1)
  - Gap penalty (-2)
Formula:
$$
S(i,j) = \max \begin{cases}
S(i-1,j-1) + \text{match/mismatch score} \\
S(i-1,j) + \text{gap penalty} \\
S(i,j-1) + \text{gap penalty}
\end{cases}
$$
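The recurrence translates directly into a nested loop over the matrix. The sketch below assumes the scoring values listed above (match +1, mismatch -1, gap -2); the function name `fill_matrix` is hypothetical and chosen only for illustration.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Fill the Needleman-Wunsch scoring matrix for two sequences."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    # Borders: prefixes aligned against gaps only
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Interior: best of diagonal (match/mismatch), up (gap), left (gap)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score


score = fill_matrix("AGCTG", "AGTTG")
print(score[-1][-1])  # optimal global alignment score (bottom-right cell)
```

For AGCTG and AGTTG the bottom-right cell is 3: four matches and one mismatch.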
Step 3: Traceback
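- Start from the bottom-right cell of the matrix and move back to the top-left. At each cell, step to the neighbor (diagonal, up, or left) whose score produced the current value: a diagonal step aligns two bases, while a step up or left inserts a gap. Reading the recorded steps in reverse gives an optimal alignment.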
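The three steps can be combined into one short function. This is a sketch under stated assumptions: the name `needleman_wunsch` and its default scores (+1, -1, -2) are illustrative, and when several alignments share the optimal score it returns only one of them, preferring diagonal moves.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (aligned_seq1, aligned_seq2, score) for one optimal global alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right cell to the top-left cell
    top, bottom = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                # Diagonal move: align two bases
                top.append(seq1[i - 1])
                bottom.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            # Move up: gap in the second sequence
            top.append(seq1[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Move left: gap in the first sequence
            top.append("-")
            bottom.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


print(needleman_wunsch("AGCTG", "AGTTG"))
print(needleman_wunsch("AGCTG", "AGTG"))
```

The first call returns ('AGCTG', 'AGTTG', 3). The second returns ('AGCTG', 'AG-TG', 2), showing a gap being introduced when the sequences differ in length.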
2. Example of Global Alignment
Example Sequences
Sequence 1: AGCTG
Sequence 2: AGTTG
Scoring System
- Match = +1
- Mismatch = -1
- Gap penalty = -2
Alignment Output
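AGCTG
||.||
AGTTG
- Score: 4 matches (+4) and 1 mismatch (-1) give a total of 3.
- No gap is inserted. Both sequences have five bases, and aligning C with T as a mismatch (-1) costs less than the two gaps (-4) that would be needed to avoid it.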
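To check any candidate alignment against the scoring system, sum the score of each column. The helper below is a hypothetical sketch (the name `alignment_score` is not from any library) that assumes the match, mismatch, and gap values listed above and uses "-" as the gap character.

```python
def alignment_score(aligned1, aligned2, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned sequences column by column."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            total += gap        # one base aligned against a gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("AGCTG", "AGTTG"))    # ungapped: 4 matches, 1 mismatch
print(alignment_score("AGC-TG", "AG-TTG"))  # gapped: 4 matches, 2 gaps
```

The ungapped alignment of AGCTG and AGTTG scores 3, while the gapped alternative scores 0, which is why a global aligner using these penalties prefers the single mismatch.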
3. Python Implementation using Biopython
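from Bio import pairwise2
from Bio.pairwise2 import format_alignment
seq1 = "AGCTG"
seq2 = "AGTTG"
# Global alignment: identical characters score 1, everything else scores 0
alignments = pairwise2.align.globalxx(seq1, seq2)
# Print each optimal alignment together with its score
for alignment in alignments:
    print(format_alignment(*alignment))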
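globalxx performs global alignment with a simple scoring scheme: identical characters score 1, mismatches score 0, and gaps carry no penalty. Because this differs from the +1/-1/-2 scheme used above, its output for these sequences can include gapped alignments. To apply the scheme above, use pairwise2.align.globalms(seq1, seq2, 1, -1, -2, -2), where the four numbers are the match score, mismatch score, gap-open penalty, and gap-extend penalty. Note that Bio.pairwise2 is deprecated in recent Biopython releases in favor of Bio.Align.PairwiseAligner.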
4. Tools for Global Alignment
- Biopython (Python library for sequence alignment)
- EMBOSS Needle (Online Needleman-Wunsch alignment)
- Clustal Omega (Multiple sequence alignment tool)
5. Applications of Global Alignment
- Comparing homologous genes across species
- Detecting evolutionary relationships
- Analyzing full-length sequences
- Finding conserved regions in DNA