week 10

10.6 Database Recovery

Database recovery restores the database to a correct, consistent state after a failure. Types of failures:

Failure Type

Description and Recovery Approach

Transaction Failure

A single transaction aborts (logic error, integrity violation, deadlock). Recovery: ROLLBACK the failed transaction using the log.

System Failure

Power outage, OS crash — main memory lost but disk data intact. Recovery: Use transaction logs to REDO committed transactions and UNDO uncommitted ones (ARIES algorithm).

Media Failure

Disk crash — data permanently lost. Recovery: Restore from last full backup + apply transaction log to roll forward to point of failure.

Communication Failure

Network failure in distributed databases. Recovery: Distributed transaction protocols (2-Phase Commit).

 

10.6.1 Write-Ahead Logging (WAL)

The Write-Ahead Log (WAL) or transaction log records every change made to the database BEFORE the change is written to disk. This is the foundation of most recovery systems.

 

1.    BEFORE IMAGE (BFIM) — the value before the change

2.    AFTER IMAGE (AFIM) — the value after the change

3.    Transaction ID, Operation type, Timestamp

 

PRINCIPLE: A transaction's log records must be written to stable storage BEFORE the actual data pages are written to disk. This ensures that even if the system crashes mid-write, the log allows recovery.

 

10.6.2 Checkpoints

A checkpoint is a periodic snapshot of the database state written to disk, along with a list of active transactions at that point. Checkpoints reduce the amount of log that must be replayed during recovery.

 

Recovery Analogy: Think of checkpoints like a save point in a video game. If your game crashes, you don't restart from the beginning — you restart from the last save point. The transaction log is like the game's action history that lets you replay up to where you were when the crash happened.

 

10.6.3 ARIES Algorithm (Overview)

ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the recovery algorithm used by IBM DB2 and many other commercial DBMS. It operates in three phases:

1.    ANALYSIS PASS — Scan the log forward from the last checkpoint to determine which transactions were active at the time of failure and which pages were dirty (modified but not yet written to disk).

2.    REDO PASS — Redo all operations from the oldest dirty page to the end of the log, restoring the database to the exact state it was in at the moment of crash.

3.    UNDO PASS — Undo all operations of transactions that were active (uncommitted) at the time of the crash, ensuring atomicity.