week 9

Site: Newgate University Minna - Elearning Platform
Course: Data Management
Book: week 9
Printed by: Guest user
Date: Sunday, 14 June 2026, 4:01 AM

Description

Teaching and Learning Methods to be Employed

1.    Illustrated lecture on query processing pipeline

2.    Lab: Using EXPLAIN in MySQL to analyse query execution plans

3.    Index creation and performance comparison exercise

4.    Collection of Assignment 3

 

Learning Outcomes / Objectives

By the end of this week, students should be able to:

1.    Describe the steps in query processing from SQL statement to query result.

2.    Explain the role of the query optimiser in a DBMS.

3.    Understand relational algebra as the basis for query evaluation.

4.    Describe the structure and benefits of database indexes.

5.    Use MySQL's EXPLAIN statement to interpret a query execution plan.

6.    Apply basic query optimisation techniques.

9.1 Overview of Query Processing

When you submit an SQL query, the DBMS does not simply 'read' and execute it directly. It goes through a sophisticated multi-stage process to determine the most efficient way to retrieve the requested data. This is query processing.

 

THE QUERY PROCESSING PIPELINE:

Stage

Description

1. Parsing and Translation

SQL query is parsed for syntax errors and translated into an internal algebraic representation (relational algebra expression tree).

2. Query Optimisation

The query optimiser generates multiple equivalent execution plans and selects the one with the lowest estimated cost (I/O, CPU, memory).

3. Code Generation

The chosen plan is compiled into executable code.

4. Query Execution

The execution engine runs the plan, accessing data from disk/memory as needed.

5. Result Return

The final result set is returned to the user or application.

9.2 Relational Algebra

Relational algebra is the formal mathematical language underlying SQL. The query optimiser works with relational algebra expressions, not SQL directly. Key operators:

 

Relational Algebra Operator

SQL Equivalent

Description

Selection (σ)

WHERE clause

Filter rows based on a condition. E.g., σ(Level=300)(STUDENT)

Projection (π)

SELECT columns

Keep only specified columns. E.g., π(Name, MatricNo)(STUDENT)

Join ()

JOIN

Combine related tuples from two relations based on a condition

Union ()

UNION

Combine results of two compatible queries

Intersection (∩)

INTERSECT

Rows that appear in BOTH query results

Difference (−)

EXCEPT / MINUS

Rows in first result but NOT in second

Cartesian Product (×)

CROSS JOIN

All combinations of tuples from two relations

Rename (ρ)

AS (alias)

Rename a relation or attribute

 

Relational Algebra Example: Find the names of all 300-level Computer Science students:  Relational Algebra: π(FirstName, LastName)(σ(Level=300 AND DeptCode='CSC')(STUDENT))  SQL: SELECT FirstName, LastName FROM STUDENT WHERE Level=300 AND DeptCode='CSC';  The query optimiser evaluates whether it is more efficient to first filter (SELECT), then project (columns), or vice versa. Since selection reduces the number of rows, it is usually applied first.

9.3 Query Optimisation

The query optimiser is one of the most sophisticated components of a DBMS. It generates a query execution plan — the step-by-step instructions for retrieving data — and chooses the plan with the lowest estimated cost.

 

COST FACTORS the optimiser considers:

1.    Number of disk I/O operations (the most expensive operation)

2.    Amount of data transferred over the network

3.    CPU processing cost

4.    Size of intermediate results

5.    Available indexes

6.    Statistics about data distribution (maintained in the catalogue)

 

OPTIMISATION STRATEGIES:

1.    Perform selections as early as possible (reduces rows early)

2.    Perform projections as early as possible (reduces columns)

3.    Reorder joins to start with the smallest result sets

4.    Use indexes to avoid full table scans

5.    Use join methods appropriate to data size (Nested Loop, Hash Join, Sort-Merge Join)

 

9.4 Indexes

An index is an auxiliary data structure that speeds up data retrieval at the cost of additional storage and slower write operations. Without an index, the DBMS must perform a FULL TABLE SCAN — reading every row to find matches.

 

ANALOGY: An index in a database is like the index at the back of a textbook. Instead of reading the entire book to find where 'normalisation' is discussed, you look it up in the index and go directly to page 145. This is the difference between O(n) and O(log n) performance.

 

Index Type

Description and Use Case

B-Tree Index (default)

Balanced tree structure. Good for range queries (BETWEEN, >, <) and equality searches. Most common type in MySQL.

Hash Index

Uses a hash table. Very fast for equality searches (=) only. Not useful for range queries. Used in MEMORY tables.

Full-Text Index

For text searching (MATCH...AGAINST). Useful for product descriptions, article content.

Composite Index

Index on multiple columns. Order matters: INDEX(LastName, FirstName) helps queries on LastName alone or (LastName, FirstName) together, but NOT FirstName alone.

Unique Index

Ensures no duplicate values. MySQL creates one automatically for PRIMARY KEY and UNIQUE constraints.

Clustered Index

Rows are physically stored in index order. Each table has at most one clustered index (the PRIMARY KEY in MySQL InnoDB).

 

SQL COMMANDS FOR INDEXES:

-- Create an index on LastName for faster name searches

CREATE INDEX idx_lastname ON STUDENT(LastName);

 

-- Create a composite index for queries filtering by State and AccountType

CREATE INDEX idx_state_accttype ON CUSTOMER(State, AccountType);

 

-- Drop an index

DROP INDEX idx_lastname ON STUDENT;

 

-- Show all indexes on a table

SHOW INDEX FROM STUDENT;

9.5 Using EXPLAIN to Analyse Queries

MySQL's EXPLAIN statement shows how MySQL plans to execute a query — what indexes it will use, how many rows it estimates to scan, and the join method.

 

EXPLAIN SELECT s.FirstName, s.LastName, d.DeptName

FROM STUDENT s INNER JOIN DEPARTMENT d ON s.DeptCode = d.DeptCode

WHERE s.Level = 300;

 

KEY EXPLAIN OUTPUT COLUMNS:

Column

What It Tells You

type

Access method: ALL (full scan — bad!), index, range, ref, eq_ref, const (fast!)

key

Which index MySQL chose to use (NULL means no index used)

rows

Estimated number of rows MySQL will examine

Extra

Additional info: 'Using filesort' (bad), 'Using index' (good — covering index)

filtered

Percentage of rows that pass the WHERE filter after the access method

 

Performance Example: A Nigerian bank's TRANSACTION table has 50 million rows. Without an index, SELECT * FROM TRANSACTION WHERE AccountNumber='1234567890' scans all 50M rows (type=ALL). With an index on AccountNumber, MySQL jumps directly to the matching rows (type=ref, rows=15). The index makes this query thousands of times faster.

 

Reading List / References

1.    Ramakrishnan, R. & Gehrke, J. (2003). Database Management Systems, Chapters 12-15: Query Optimisation. McGraw-Hill.

2.    Silberschatz et al. (2020). Database System Concepts, Chapters 15 & 16: Query Processing and Optimisation. McGraw-Hill.

3.    MySQL 8.0 Reference Manual. Chapter 8: Optimisation. dev.mysql.com/doc/refman/8.0/en/optimization.html

 

Activities

Self-Assessment Quiz: 1. Name the five stages in the query processing pipeline. 2. What is the role of the query optimiser? What factors does it consider? 3. What is a B-Tree index? When should you create one on a column? 4. You run EXPLAIN on a query and see 'type=ALL' and 'rows=5000000'. What does this mean and how would you fix it?