week 9
| Site: | Newgate University Minna - Elearning Platform |
| Course: | Data Management |
| Book: | week 9 |
| Printed by: | Guest user |
| Date: | Sunday, 14 June 2026, 4:01 AM |
Description
Teaching and Learning Methods to be Employed
1. Illustrated lecture on query processing pipeline
2. Lab: Using EXPLAIN in MySQL to analyse query execution plans
3. Index creation and performance comparison exercise
4. Collection of Assignment 3
Learning Outcomes / Objectives
By the end of this week, students should be able to:
1. Describe the steps in query processing from SQL statement to query result.
2. Explain the role of the query optimiser in a DBMS.
3. Understand relational algebra as the basis for query evaluation.
4. Describe the structure and benefits of database indexes.
5. Use MySQL's EXPLAIN statement to interpret a query execution plan.
6. Apply basic query optimisation techniques.
9.1 Overview of Query Processing
When you submit an SQL query, the DBMS does not simply 'read' and execute it directly. It goes through a sophisticated multi-stage process to determine the most efficient way to retrieve the requested data. This is query processing.
THE QUERY PROCESSING PIPELINE:
|
Stage |
Description |
|
1. Parsing and Translation |
SQL query is parsed for syntax errors and translated into an internal algebraic representation (relational algebra expression tree). |
|
2. Query Optimisation |
The query optimiser generates multiple equivalent execution plans and selects the one with the lowest estimated cost (I/O, CPU, memory). |
|
3. Code Generation |
The chosen plan is compiled into executable code. |
|
4. Query Execution |
The execution engine runs the plan, accessing data from disk/memory as needed. |
|
5. Result Return |
The final result set is returned to the user or application. |
9.2 Relational Algebra
Relational algebra is the formal mathematical language underlying SQL. The query optimiser works with relational algebra expressions, not SQL directly. Key operators:
|
Relational Algebra Operator |
SQL Equivalent |
Description |
|
Selection (σ) |
WHERE clause |
Filter rows based on a condition. E.g., σ(Level=300)(STUDENT) |
|
Projection (π) |
SELECT columns |
Keep only specified columns. E.g., π(Name, MatricNo)(STUDENT) |
|
Join (⋈) |
JOIN |
Combine related tuples from two relations based on a condition |
|
Union (∪) |
UNION |
Combine results of two compatible queries |
|
Intersection (∩) |
INTERSECT |
Rows that appear in BOTH query results |
|
Difference (−) |
EXCEPT / MINUS |
Rows in first result but NOT in second |
|
Cartesian Product (×) |
CROSS JOIN |
All combinations of tuples from two relations |
|
Rename (ρ) |
AS (alias) |
Rename a relation or attribute |
|
Relational Algebra Example: Find the names of all 300-level Computer Science students: Relational Algebra: π(FirstName, LastName)(σ(Level=300 AND DeptCode='CSC')(STUDENT)) SQL: SELECT FirstName, LastName FROM STUDENT WHERE Level=300 AND DeptCode='CSC'; The query optimiser evaluates whether it is more efficient to first filter (SELECT), then project (columns), or vice versa. Since selection reduces the number of rows, it is usually applied first. |
9.3 Query Optimisation
The query optimiser is one of the most sophisticated components of a DBMS. It generates a query execution plan — the step-by-step instructions for retrieving data — and chooses the plan with the lowest estimated cost.
COST FACTORS the optimiser considers:
1. Number of disk I/O operations (the most expensive operation)
2. Amount of data transferred over the network
3. CPU processing cost
4. Size of intermediate results
5. Available indexes
6. Statistics about data distribution (maintained in the catalogue)
OPTIMISATION STRATEGIES:
1. Perform selections as early as possible (reduces rows early)
2. Perform projections as early as possible (reduces columns)
3. Reorder joins to start with the smallest result sets
4. Use indexes to avoid full table scans
5. Use join methods appropriate to data size (Nested Loop, Hash Join, Sort-Merge Join)
9.4 Indexes
An index is an auxiliary data structure that speeds up data retrieval at the cost of additional storage and slower write operations. Without an index, the DBMS must perform a FULL TABLE SCAN — reading every row to find matches.
ANALOGY: An index in a database is like the index at the back of a textbook. Instead of reading the entire book to find where 'normalisation' is discussed, you look it up in the index and go directly to page 145. This is the difference between O(n) and O(log n) performance.
|
Index Type |
Description and Use Case |
|
B-Tree Index (default) |
Balanced tree structure. Good for range queries (BETWEEN, >, <) and equality searches. Most common type in MySQL. |
|
Hash Index |
Uses a hash table. Very fast for equality searches (=) only. Not useful for range queries. Used in MEMORY tables. |
|
Full-Text Index |
For text searching (MATCH...AGAINST). Useful for product descriptions, article content. |
|
Composite Index |
Index on multiple columns. Order matters: INDEX(LastName, FirstName) helps queries on LastName alone or (LastName, FirstName) together, but NOT FirstName alone. |
|
Unique Index |
Ensures no duplicate values. MySQL creates one automatically for PRIMARY KEY and UNIQUE constraints. |
|
Clustered Index |
Rows are physically stored in index order. Each table has at most one clustered index (the PRIMARY KEY in MySQL InnoDB). |
SQL COMMANDS FOR INDEXES:
-- Create an index on LastName for faster name searches
CREATE INDEX idx_lastname ON STUDENT(LastName);
-- Create a composite index for queries filtering by State and AccountType
CREATE INDEX idx_state_accttype ON CUSTOMER(State, AccountType);
-- Drop an index
DROP INDEX idx_lastname ON STUDENT;
-- Show all indexes on a table
SHOW INDEX FROM STUDENT;
9.5 Using EXPLAIN to Analyse Queries
MySQL's EXPLAIN statement shows how MySQL plans to execute a query — what indexes it will use, how many rows it estimates to scan, and the join method.
EXPLAIN SELECT s.FirstName, s.LastName, d.DeptName
FROM STUDENT s INNER JOIN DEPARTMENT d ON s.DeptCode = d.DeptCode
WHERE s.Level = 300;
KEY EXPLAIN OUTPUT COLUMNS:
|
Column |
What It Tells You |
|
type |
Access method: ALL (full scan — bad!), index, range, ref, eq_ref, const (fast!) |
|
key |
Which index MySQL chose to use (NULL means no index used) |
|
rows |
Estimated number of rows MySQL will examine |
|
Extra |
Additional info: 'Using filesort' (bad), 'Using index' (good — covering index) |
|
filtered |
Percentage of rows that pass the WHERE filter after the access method |
|
Performance Example: A Nigerian bank's TRANSACTION table has 50 million rows. Without an index, SELECT * FROM TRANSACTION WHERE AccountNumber='1234567890' scans all 50M rows (type=ALL). With an index on AccountNumber, MySQL jumps directly to the matching rows (type=ref, rows=15). The index makes this query thousands of times faster. |
Reading List / References
1. Ramakrishnan, R. & Gehrke, J. (2003). Database Management Systems, Chapters 12-15: Query Optimisation. McGraw-Hill.
2. Silberschatz et al. (2020). Database System Concepts, Chapters 15 & 16: Query Processing and Optimisation. McGraw-Hill.
3. MySQL 8.0 Reference Manual. Chapter 8: Optimisation. dev.mysql.com/doc/refman/8.0/en/optimization.html
Activities
|
Self-Assessment Quiz: 1. Name the five stages in the query processing pipeline. 2. What is the role of the query optimiser? What factors does it consider? 3. What is a B-Tree index? When should you create one on a column? 4. You run EXPLAIN on a query and see 'type=ALL' and 'rows=5000000'. What does this mean and how would you fix it? |