Databases in Bioinformatics
1. What is a Database?
A database is an organized collection of data that allows users to store, retrieve, and manage information efficiently. In bioinformatics, databases store biological data such as DNA sequences, protein structures, gene expression profiles, and metabolic pathways.
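As a minimal sketch of the store, retrieve, and manage operations described above, the following uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names, accessions, and sequences are invented placeholders for illustration, not records or schemas from any real bioinformatics database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a few toy sequence records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        "accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
    )
    # Hypothetical records: accessions and sequences are made up.
    rows = [
        ("DEMO0001", "Escherichia coli", "ATGGCGTACGTTAG"),
        ("DEMO0002", "Homo sapiens", "ATGCCCGGGTTTAAATGA"),
    ]
    # Store: insert the records.
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def fetch_sequence(conn, accession):
    """Retrieve: return the sequence for an accession, or None if absent."""
    row = conn.execute(
        "SELECT sequence FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


def update_organism(conn, accession, organism):
    """Manage: correct the organism field of an existing record."""
    conn.execute(
        "UPDATE sequences SET organism = ? WHERE accession = ?",
        (organism, accession),
    )
    conn.commit()


if __name__ == "__main__":
    demo = build_demo_db()
    print(fetch_sequence(demo, "DEMO0001"))
```

Real biological databases are far larger and use richer schemas, but the same three operations underlie them.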
2. Types of Bioinformatics Databases
A. Primary Databases (Archival Databases)
- Contain raw experimental data without significant modification.
- Examples:
  - GenBank (NCBI) – Stores nucleotide sequences
  - European Nucleotide Archive (ENA) – Stores DNA/RNA sequences
  - Protein Data Bank (PDB) – Stores 3D protein structures
B. Secondary Databases (Curated Databases)
- Contain analyzed or processed data derived from primary databases.
- Examples:
  - UniProt – Annotated protein sequences
  - RefSeq – Curated collection of DNA, RNA, and protein sequences
  - Swiss-Prot – Manually reviewed section of UniProtKB (protein sequences)
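To illustrate how a secondary entry can be derived from a primary one, here is a small sketch that computes a few descriptive fields from a raw nucleotide record. The function name `annotate`, the record layout, and the chosen fields are assumptions made for this example; real curation involves expert review and far richer annotation.

```python
def annotate(raw_record):
    """Derive a simple annotated entry from a raw sequence record.

    raw_record is a hypothetical dict with 'accession' and 'sequence' keys.
    """
    seq = raw_record["sequence"].upper()
    gc_count = seq.count("G") + seq.count("C")
    return {
        "accession": raw_record["accession"],
        "length": len(seq),
        # Fraction of G and C bases; 0.0 for an empty sequence.
        "gc_content": round(gc_count / len(seq), 3) if seq else 0.0,
        # Crude flag only: a real gene finder does much more than this.
        "starts_with_atg": seq.startswith("ATG"),
    }


if __name__ == "__main__":
    raw = {"accession": "DEMO0001", "sequence": "ATGGCGTACGTTAG"}
    print(annotate(raw))
```

The raw record is left unchanged and the derived fields live in a separate entry, which mirrors the split between archival and curated databases.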
C. Specialized Databases
- Focus on specific types of biological data.
- Examples:
  - KEGG (Kyoto Encyclopedia of Genes and Genomes) – Metabolic pathways
  - Pfam – Protein families and domains
  - OMIM (Online Mendelian Inheritance in Man) – Genetic disorders
3. Key Features of Bioinformatics Databases
- Large-scale data storage – Handles huge volumes of genomic and proteomic data.
- Efficient data retrieval – Uses indexing and search tools (e.g., keyword queries and sequence-similarity searches) for quick access.
- Data annotation – Adds biological meaning to raw data.
- Cross-referencing – Links different databases for comprehensive research.
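The retrieval and cross-referencing features above can be sketched with plain Python dictionaries. The two toy "databases", their accessions, and the `protein_xref` field are hypothetical and stand in for the identifier links that real databases maintain between each other.

```python
from collections import defaultdict

# Hypothetical records from two toy databases; all identifiers are invented.
NUCLEOTIDE_DB = {
    "NUC_DEMO1": {"gene": "geneA", "protein_xref": "PROT_DEMO1"},
    "NUC_DEMO2": {"gene": "geneB", "protein_xref": "PROT_DEMO2"},
    "NUC_DEMO3": {"gene": "geneA", "protein_xref": "PROT_DEMO1"},
}
PROTEIN_DB = {
    "PROT_DEMO1": {"name": "Demo protein A", "length": 120},
    "PROT_DEMO2": {"name": "Demo protein B", "length": 310},
}


def build_gene_index(nucleotide_db):
    """Index accessions by gene name so lookups avoid scanning every record."""
    index = defaultdict(list)
    for accession, record in nucleotide_db.items():
        index[record["gene"]].append(accession)
    return index


def proteins_for_gene(gene, gene_index, nucleotide_db, protein_db):
    """Follow cross-references from nucleotide records to protein names."""
    names = []
    for accession in gene_index.get(gene, []):
        xref = nucleotide_db[accession]["protein_xref"]
        protein = protein_db.get(xref)
        # Skip dangling links and avoid listing the same protein twice.
        if protein and protein["name"] not in names:
            names.append(protein["name"])
    return names


if __name__ == "__main__":
    idx = build_gene_index(NUCLEOTIDE_DB)
    print(proteins_for_gene("geneA", idx, NUCLEOTIDE_DB, PROTEIN_DB))
```

The index is built once and reused for every query, which is the basic idea behind fast retrieval; the cross-reference field lets a query that starts in one database finish in another.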
4. Uses of Bioinformatics Databases
- Genome sequencing and annotation
- Protein structure prediction
- Drug discovery and development
- Evolutionary and phylogenetic analysis
- Medical and clinical applications (e.g., disease gene identification)
5. Challenges in Bioinformatics Databases
- Data overload – Rapid increase in biological data.
- Data integration – Databases use different formats and identifiers, which makes linking them difficult.
- Data accuracy and curation – Ensuring reliable and updated information.
- Storage and retrieval speed – Managing large datasets efficiently.
6. Popular Bioinformatics Database Repositories
| Database | Type | Use |
|----------|------|-----|
| GenBank | Primary | Nucleotide sequences |
| UniProt | Secondary | Protein sequences |
| PDB | Primary | 3D protein structures |
| KEGG | Specialized | Metabolic pathways |
| OMIM | Specialized | Genetic disorders |
| Ensembl | Secondary | Genome annotation |