Introduction to DNA and Protein Databases
1. DNA Databases
DNA databases store and manage nucleotide sequences, genomic data, and related annotations. They are essential for bioinformatics, genomics, and evolutionary studies.
Types of DNA Databases
Primary Databases (Raw Sequence Data)
- GenBank (NCBI - USA): Publicly available DNA sequence repository.
- ENA (EMBL-EBI - Europe): European Nucleotide Archive, maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute.
- DDBJ (Japan): DNA Data Bank of Japan.
(These three collaborate under the International Nucleotide Sequence Database Collaboration - INSDC.)
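As a rough illustration of how a record from a primary nucleotide database might be retrieved and read, the sketch below builds a request URL for NCBI's E-utilities efetch endpoint and parses FASTA-formatted text. The accession "XX000000.1" is a made-up placeholder, the function names are hypothetical, and no network request is made here; this is a minimal sketch, not a complete client.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL asking for a nucleotide record in FASTA text form."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession to fetch (placeholder in this sketch)
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {h: "".join(parts) for h, parts in records.items()}


# "XX000000.1" is a made-up accession used only to show the URL shape.
url = efetch_fasta_url("XX000000.1")
demo = parse_fasta(">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n")
```

The text returned by such a request could be passed straight to parse_fasta; the same parser works on FASTA files downloaded from any of the three INSDC databases.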
Secondary Databases (Processed Data)
- RefSeq: Curated, non-redundant collection of reference sequences (genomic, transcript, and protein).
- Ensembl: Genome annotation and comparative genomics resource.
- UCSC Genome Browser: Visualizes genome sequences and annotations.
Specialized DNA Databases
- dbSNP: Repository for Single Nucleotide Polymorphisms (SNPs).
- miRBase: Stores microRNA sequences.
- TCGA (The Cancer Genome Atlas): Catalogs genomic alterations across cancer types.
2. Protein Databases
Protein databases contain information about amino acid sequences, protein structures, functions, and interactions.
Types of Protein Databases
Primary Protein Sequence Databases
- UniProtKB (UniProt Knowledgebase): Comprehensive protein sequence and annotation database, divided into two sections:
- UniProtKB/Swiss-Prot: Manually curated, reviewed entries.
- UniProtKB/TrEMBL: Automatically annotated, unreviewed entries.
- PIR (Protein Information Resource): Historical protein sequence database whose PIR-PSD collection was integrated into UniProt.
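The Swiss-Prot/TrEMBL distinction is visible in UniProtKB FASTA headers, which begin with "sp" for Swiss-Prot entries and "tr" for TrEMBL entries, followed by the accession and entry name separated by "|". The sketch below shows one way to read that prefix; the function name and the returned dictionary layout are assumptions made for illustration.

```python
import re

# UniProtKB FASTA headers look like: >db|Accession|EntryName Description ...
# where db is "sp" (Swiss-Prot) or "tr" (TrEMBL).
HEADER_RE = re.compile(r"^>?(sp|tr)\|([^|]+)\|(\S+)")

SECTION = {
    "sp": "Swiss-Prot (reviewed)",
    "tr": "TrEMBL (unreviewed)",
}


def classify_uniprot_header(header):
    """Return section, accession, and entry name, or None if not a UniProtKB header."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    db, accession, entry_name = match.groups()
    return {
        "section": SECTION[db],
        "accession": accession,
        "entry_name": entry_name,
    }


reviewed = classify_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
# The TrEMBL-style header below is made up purely to exercise the "tr" branch.
unreviewed = classify_uniprot_header(">tr|A0A000XXX0|A0A000XXX0_TEST Made-up protein")
```

Filtering a downloaded FASTA file on the "section" value is a simple way to keep only manually reviewed entries.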
Structural Protein Databases
- PDB (Protein Data Bank): Stores 3D structures of proteins, nucleic acids, and complexes.
- SCOP (Structural Classification of Proteins): Classifies proteins by structural and evolutionary relationships.
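To make the idea of stored 3D structures concrete, the sketch below reads atomic coordinates from a single ATOM record in the legacy fixed-column PDB file format. The function name and the sample line are illustrative; real entries also carry many other record types, and newer entries are distributed in mmCIF format, which this sketch does not handle.

```python
def parse_atom_line(line):
    """Extract atom name, residue, chain, and x/y/z from a PDB ATOM/HETATM line.

    The PDB format is column-based, so fields are taken by position
    (0-based Python slices of the documented 1-based columns).
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "residue": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],               # column 22: chain identifier
        "res_seq": int(line[22:26]),     # columns 23-26: residue number
        "xyz": (
            float(line[30:38]),          # columns 31-38: x in angstroms
            float(line[38:46]),          # columns 39-46: y
            float(line[46:54]),          # columns 47-54: z
        ),
    }


# An illustrative ATOM record laid out in the fixed-column format.
SAMPLE_LINE = "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N"
atom = parse_atom_line(SAMPLE_LINE)
```

Splitting on whitespace is deliberately avoided, because adjacent fields in PDB files can run together without a separating space.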
Functional & Interaction Databases
- Pfam: Collection of protein families and domains.
- InterPro: Integrates signatures from member databases (including Pfam) to classify proteins into families and predict domains.
- STRING: Protein-protein interaction networks.
Specialized Protein Databases
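- KEGG (Kyoto Encyclopedia of Genes and Genomes): Pathway database linking genes and proteins to metabolic and signaling pathways.
- BRENDA: Enzyme database covering functional and kinetic data.
- MEROPS: Database of peptidases (proteases) and their inhibitors.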
3. Applications of DNA and Protein Databases
Genomic Research – Understanding genetic variations and evolution.
Drug Discovery – Identifying drug targets and protein functions.
Molecular Diagnostics – Detecting disease-related mutations.
Personalized Medicine – Tailoring treatments based on genomic data.
Proteomics & Systems Biology – Analyzing protein interactions and pathways.
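As a toy illustration of the mutation-detection use case, the sketch below compares a sample sequence against a reference of the same length and reports single-base differences. The function name and the sequences are invented for the example; real diagnostic pipelines rely on sequence alignment and dedicated variant callers rather than position-by-position comparison.

```python
def point_differences(reference, sample):
    """List (position, ref_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences must already be the same
    length; insertions and deletions are outside this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Invented sequences used only to demonstrate the comparison.
diffs = point_differences("ACGTAC", "ACCTAT")
```

Each reported position could then be looked up in a variant database such as dbSNP to check whether the change is already known.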