DNA Sequence Retrieval from Genomic Databases
Introduction
DNA sequence retrieval is the process of accessing and extracting nucleotide sequences from genomic databases for research, medical, and biotechnological applications. Sequences can be retrieved from publicly available databases by keyword or accession search, by similarity search, or through a genome browser.
1 Major Genomic Databases for DNA Sequence Retrieval
A. Publicly Available Databases
1 NCBI GenBank – A comprehensive, freely accessible DNA sequence repository.
2 EMBL-EBI (European Nucleotide Archive - ENA) – Europe's primary nucleotide sequence archive.
3 DDBJ (DNA Data Bank of Japan) – The Japanese counterpart of GenBank and ENA.
4 Ensembl – A genome browser providing annotated reference genomes.
5 UCSC Genome Browser – A tool for exploring and visualizing genome sequences.
B. Specialized Genomic Databases
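RefSeq – Provides curated, non-redundant reference sequences (genomic, transcript, and protein) maintained by NCBI.
GISAID – A database for viral genomic sequences, including influenza viruses and SARS-CoV-2.
dbSNP – Contains information on short genetic variations such as SNPs.
Human Genome Project data – The human reference genome sequence, now distributed through GenBank, Ensembl, and the UCSC Genome Browser.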
2 Methods for DNA Sequence Retrieval
A. Using NCBI Entrez Search
Step 1: Access NCBI GenBank (https://www.ncbi.nlm.nih.gov/genbank/)
Step 2: Use keywords, accession numbers, or gene names in the search bar.
Step 3: Filter results based on organism, genome type, and sequence length.
Step 4: Retrieve sequences in FASTA or GenBank formats for further analysis.
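The same retrieval can be scripted through NCBI's E-utilities web service instead of the search page. The sketch below is a minimal illustration using only the Python standard library: the helper names build_efetch_url and fetch_sequence are made up for this example, and NM_000546 is used only as a sample accession. It assumes the nuccore (nucleotide) database and plain-text output.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for one nucleotide accession (hypothetical helper).

    rettype="fasta" requests FASTA; rettype="gb" requests a GenBank flat file.
    """
    params = {
        "db": "nuccore",      # assumption: the nucleotide database
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_sequence(accession, rettype="fasta"):
    """Download the record as text. Requires network access."""
    with urlopen(build_efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


# Example (not run here because it needs a network connection):
# print(fetch_sequence("NM_000546")[:200])
```

For repeated requests, NCBI asks clients to stay within its rate limits (about three requests per second without an API key), so a real script would add a short pause between calls.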
B. BLAST (Basic Local Alignment Search Tool)
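Function: Finds regions of local similarity between a query sequence and the sequences in a database.
Types:
- BLASTn – Nucleotide query against a nucleotide database.
- BLASTp – Protein query against a protein database.
- tBLASTn – Protein query against a nucleotide database translated in all six reading frames.
- tBLASTx – Translated nucleotide query against a translated nucleotide database.
Step 1: Go to NCBI BLAST (https://blast.ncbi.nlm.nih.gov/)
Step 2: Paste or upload a query DNA sequence.
Step 3: Select target database (e.g., GenBank, RefSeq).
Step 4: Analyze results using percent identity, query coverage, bit score, and E-value.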
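To make the idea of a local similarity score concrete, the sketch below computes the best local alignment score between two short DNA strings. It is not BLAST: BLAST uses a faster seed-and-extend heuristic, whereas this example uses the exact Smith-Waterman dynamic programming method. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Uses Smith-Waterman with a linear gap penalty. Only the previous
    row of the scoring matrix is kept, so memory use is O(len(subject)).
    """
    cols = len(subject) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            same = query[i - 1] == subject[j - 1]
            diagonal = prev[j - 1] + (match if same else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core "ACGT" gives four matches at +2 each.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8
```

A higher score means a longer or cleaner shared region; BLAST reports related statistics (bit score and E-value) that additionally account for database size.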
C. Using Ensembl Genome Browser
Provides gene annotations, genome comparisons, and sequence retrieval.
Users can search by gene names, chromosomal coordinates, or accession numbers.
Download sequences in FASTA or EMBL format, and feature annotations in GFF format.
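Ensembl also offers a REST service for programmatic retrieval. The sketch below builds request URLs for its sequence/id and sequence/region endpoints and requests FASTA output through a Content-Type header. The helper names are hypothetical, and the stable ID and coordinates in the comments are sample values only.

```python
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def build_id_url(stable_id):
    """URL for the sequence of one Ensembl stable ID (hypothetical helper)."""
    return f"{ENSEMBL_REST}/sequence/id/{stable_id}"


def build_region_url(species, chrom, start, end, strand=1):
    """URL for a genomic region, e.g. human X:1000000..1000100 on strand 1."""
    return f"{ENSEMBL_REST}/sequence/region/{species}/{chrom}:{start}..{end}:{strand}"


def fetch_fasta(url):
    """Request the sequence as FASTA text. Requires network access."""
    request = Request(url, headers={"Content-Type": "text/x-fasta"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Examples (not run here because they need a network connection):
# print(fetch_fasta(build_id_url("ENSG00000157764")))
# print(fetch_fasta(build_region_url("human", "X", 1000000, 1000100)))
```

Searching by gene name corresponds to looking up the stable ID first; searching by chromosomal coordinates maps directly onto the region URL.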
D. UCSC Genome Browser Retrieval
Visualizes and extracts DNA sequences from different reference genomes.
Users can zoom in on specific genomic regions to extract target DNA sequences.
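Extracting a region comes down to coordinate arithmetic. Browser position strings such as chr1:1,000-2,000 are 1-based and include both endpoints, while Python slices are 0-based and exclude the end. The sketch below shows the conversion on a sequence already held in memory; the function names are hypothetical, and the toy sequence stands in for a downloaded chromosome.

```python
import re

# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_position(position):
    """Parse 'chr1:1,000-2,000' into (chrom, start, end), 1-based inclusive."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates: {position!r}")
    return chrom, start, end


def extract_region(chrom_sequence, position):
    """Slice a region out of the sequence of the chromosome named in position.

    The caller is responsible for passing the matching chromosome sequence.
    """
    _, start, end = parse_position(position)
    # 1-based inclusive [start, end] becomes the 0-based slice [start-1, end).
    return chrom_sequence[start - 1:end]


def reverse_complement(sequence):
    """Return the reverse complement, as needed for minus-strand features."""
    return sequence.translate(COMPLEMENT)[::-1]


toy_chromosome = "ACGTACGT"
print(extract_region(toy_chromosome, "chr1:3-6"))  # GTAC
```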
3 File Formats for Retrieved DNA Sequences
FASTA (.fasta/.fa) – Stores nucleotide or protein sequences in plain text; each record is a ">" header line followed by sequence lines.
GenBank (.gb/.gbk) – Includes sequence data plus annotations.
GFF (.gff/.gff3) – Contains genomic feature annotations.
VCF (.vcf) – Stores genetic variation data such as SNPs.
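As an illustration of how simple the FASTA layout is, the sketch below reads FASTA text into a dictionary keyed by sequence ID. It is a minimal reader written for this example, not a replacement for an established parser; it assumes the ID is the first whitespace-delimited word of each header line and that IDs are unique.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {sequence_id: sequence}.

    Sequence lines belonging to one record are joined into a single string.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""  # ID is the first word
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">seq1 example record\nACGT\nTTAA\n>seq2\nGGCC\n"
print(parse_fasta(example))  # {'seq1': 'ACGTTTAA', 'seq2': 'GGCC'}
```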
4 Applications of DNA Sequence Retrieval
Genetic Research – Studying genes, mutations, and evolution.
Medical Diagnostics – Identifying disease-related genetic variations.
Forensics – DNA fingerprinting and criminal investigations.
Biotechnology – Genetic engineering, CRISPR applications.
Drug Development – Identifying genetic drug targets.