Databases in Bioinformatics

1. What is a Database?

A database is an organized collection of data that allows users to store, retrieve, and manage information efficiently. In bioinformatics, databases store biological data such as DNA sequences, protein structures, gene expression profiles, and metabolic pathways.
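The three operations named above (store, retrieve, manage) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and the DEMO accessions are assumptions made up for this example; they do not reflect the schema or contents of any real bioinformatics database.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    " accession TEXT PRIMARY KEY,"
    " organism TEXT NOT NULL,"
    " sequence TEXT NOT NULL)"
)

# Store: placeholder records, not real database entries.
records = [
    ("DEMO0001", "Escherichia coli", "ATGGCGTACGTT"),
    ("DEMO0002", "Homo sapiens", "ATGCCCGGGTTTAAA"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)


def fetch_sequence(accession):
    """Retrieve: return the sequence for an accession, or None if absent."""
    row = conn.execute(
        "SELECT sequence FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


def fetch_organism(accession):
    """Retrieve the organism name for an accession, or None if absent."""
    row = conn.execute(
        "SELECT organism FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


# Manage: update an existing record in place.
conn.execute(
    "UPDATE sequences SET organism = ? WHERE accession = ?",
    ("Escherichia coli K-12", "DEMO0001"),
)
```

Declaring the accession as the primary key lets the lookup use an index instead of scanning every row, which is the basic idea behind efficient retrieval in larger systems.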

2. Types of Bioinformatics Databases

A. Primary Databases (Archival Databases)

Contain experimental data submitted directly by researchers and archived with little or no modification.
Examples:

GenBank (NCBI) – Stores nucleotide sequences
European Nucleotide Archive (ENA) – Stores DNA/RNA sequences
Protein Data Bank (PDB) – Stores 3D structures of proteins, nucleic acids, and their complexes
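Records in primary databases are usually retrieved by accession number. As a rough sketch, the code below builds a request URL for the NCBI E-utilities efetch endpoint, which serves GenBank nucleotide records. The helper name build_efetch_url is hypothetical, and the accession U00096.3 is used only as an example input. The code constructs the URL but does not send the request; check NCBI's usage guidelines before querying the service from a script.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="fasta"):
    """Build (but do not send) an efetch request for a nucleotide record.

    build_efetch_url is a hypothetical helper written for this example.
    """
    params = {
        "db": "nuccore",      # NCBI nucleotide database
        "id": accession,      # accession number of the record
        "rettype": rettype,   # e.g. "fasta" or "gb" (GenBank flat file)
        "retmode": "text",    # plain-text response
    }
    return EFETCH_URL + "?" + urlencode(params)


url = build_efetch_url("U00096.3")
print(url)
```

The resulting string can be passed to any HTTP client, for example urllib.request.urlopen, to download the record.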

B. Secondary Databases (Curated Databases)

Contain analyzed or processed data derived from primary databases.
Examples:

UniProt – Protein sequences with functional annotation
RefSeq – Curated, non-redundant collection of genomic DNA, transcript (RNA), and protein sequences
Swiss-Prot – Manually reviewed section of the UniProt Knowledgebase (UniProtKB)

C. Specialized Databases

Focus on specific types of biological data.
Examples:

KEGG (Kyoto Encyclopedia of Genes and Genomes) – Metabolic pathways
Pfam – Protein families and domains
OMIM (Online Mendelian Inheritance in Man) – Human genes and genetic disorders

3. Key Features of Bioinformatics Databases

Large-scale data storage – Handles huge volumes of genomic and proteomic data.
Efficient data retrieval – Uses indexing and search tools, such as keyword queries and sequence similarity searches (e.g., BLAST), for quick access.
Data annotation – Adds biological meaning to raw data.
Cross-referencing – Links different databases for comprehensive research.
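To make the idea of efficient retrieval more concrete, here is a toy sketch of a k-mer index: every substring of length k is mapped to the records that contain it, so a query only needs dictionary lookups rather than a scan of every sequence. This is a simplified illustration, not how any particular database or search tool is implemented. The function names and the DEMO records are invented for the example.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map each k-mer to the set of record IDs whose sequence contains it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def find_candidates(index, query, k):
    """Return IDs of records sharing at least one k-mer with the query."""
    hits = set()
    for i in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        hits |= index.get(query[i:i + k], set())
    return hits


# Placeholder records, not real database entries.
sequences = {
    "DEMO0001": "ATGGCGTACGTT",
    "DEMO0002": "ATGCCCGGGTTTAAA",
}
index = build_kmer_index(sequences, k=4)
print(find_candidates(index, "GGGTTT", k=4))
```

The index costs extra memory and must be rebuilt or updated when records change, which is the usual trade-off for faster lookups.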

4. Uses of Bioinformatics Databases

Genome sequencing and annotation
Protein structure prediction
Drug discovery and development
Evolutionary and phylogenetic analysis
Medical and clinical applications (e.g., disease gene identification)

5. Challenges in Bioinformatics Databases

Data overload – Biological data is accumulating rapidly, driven largely by high-throughput sequencing.
Data integration – Need for linking different databases.
Data accuracy and curation – Ensuring reliable and updated information.
Storage and retrieval speed – Managing large datasets efficiently.
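The data integration challenge can be sketched as a join between two sources that identify their entries differently. In the toy example below, protein records carry a gene identifier that links them to pathway records, and entries whose link cannot be resolved are reported instead of being silently dropped. All identifiers, names, and the integrate function are hypothetical.

```python
# Toy records from two hypothetical sources, keyed by different identifiers.
protein_entries = {
    "PROT_A": {"name": "demo kinase", "gene_id": "GENE_1"},
    "PROT_B": {"name": "demo transporter", "gene_id": "GENE_2"},
    "PROT_C": {"name": "demo hydrolase", "gene_id": "GENE_9"},
}
pathway_entries = {
    "GENE_1": ["pathway_X"],
    "GENE_2": ["pathway_X", "pathway_Y"],
}


def integrate(proteins, pathways):
    """Attach pathway lists to protein records via their shared gene ID.

    Returns the merged records and the IDs of proteins that could not be linked.
    """
    merged, unlinked = {}, []
    for prot_id, entry in proteins.items():
        gene_id = entry["gene_id"]
        if gene_id in pathways:
            merged[prot_id] = {**entry, "pathways": pathways[gene_id]}
        else:
            unlinked.append(prot_id)
    return merged, unlinked


merged, unlinked = integrate(protein_entries, pathway_entries)
print(merged)
print("Unlinked:", unlinked)
```

Tracking the unlinked entries matters in practice because missing or outdated cross-references are a common source of gaps when combining databases.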

6. Popular Bioinformatics Database Repositories

Database	Type	Use
GenBank	Primary	Nucleotide sequences
UniProt	Secondary	Protein sequences
PDB	Primary	3D structures of proteins and nucleic acids
KEGG	Specialized	Metabolic pathways
OMIM	Specialized	Genetic disorders
Ensembl	Secondary	Genome annotation
