Skip to main content

Databases

Database Fundamentals

Flat File vs Relational Database

A flat file stores data in a single table (e.g., a spreadsheet or CSV file). Each row is a record and each column is a field. Flat files are simple and suitable for small, single-user applications, but they suffer from data redundancy, inconsistency, and limited querying capability.

A relational database organizes data into multiple related tables. Each table has a fixed structure (defined number of columns with specific data types), and relationships between tables are maintained through primary keys and foreign keys. Relational databases enforce data integrity, minimize redundancy, and support complex queries through SQL.

AspectFlat FileRelational Database
StructureSingle tableMultiple related tables
RedundancyHigh (data repeated across rows)Low (normalization eliminates redundancy)
Data integrityNot enforcedEnforced by constraints
Concurrent accessPoor (file locking issues)Built-in concurrency control
Query capabilityLimited (full table scan)SQL with indexing, joins, subqueries
ScalabilityPoorGood (with proper design and indexing)
SecurityFile-level onlyGranular (table, column, row-level)

Why Use a Database?

Data independence: the structure of data can be changed without modifying applications that use it. Data integrity: constraints prevent invalid data from being stored. Concurrent access: multiple users can read and write simultaneously without corrupting data. Security: access can be controlled at a granular level. Backup and recovery: databases provide transaction logging and point-in-time recovery.

Worked Example: Choosing Between a Flat File and a Database

A small bakery tracks daily sales in a spreadsheet. They have one sheet with columns: date, product, quantity, price, total. They now want to: (1) track customer loyalty points, (2) generate monthly revenue reports by product category, (3) allow two cashiers to enter data simultaneously. Should they switch to a relational database?

Solution

Yes, a relational database is now appropriate. Here is why each requirement justifies the switch:

  1. Customer loyalty points: This requires a Customer table linked to sales via a foreign key. A flat file would duplicate customer data in every row (redundancy). A relational database stores each customer once and references them by customerID.

  2. Monthly revenue reports by category: This requires joining sales with product categories and using aggregate functions (SUM, GROUP BY). SQL handles this efficiently; a flat file would need manual filtering and formula work for each report.

  3. Concurrent access: Two cashiers entering data simultaneously in a spreadsheet causes file locking and potential data loss. Relational databases have built-in concurrency control (locking and isolation) that handles simultaneous writes safely.

Entity-Relationship Diagrams (ERDs)

An ERD is a visual representation of the data model. It shows entities (things of interest), their attributes, and the relationships between them.

Entities and Attributes

An entity represents a real-world object or concept (e.g., Student, Course, Order). An entity instance is a single occurrence (e.g., a specific student named "Alice").

An attribute is a property of an entity (e.g., Student has studentID, name, dateOfBirth, email).

Key attributes:

  • Underlined in ERD notation indicates a primary key.
  • Attributes that are unique for each instance are candidate keys.
  • Multi-valued attributes (e.g., a student's phone numbers) are indicated with double lines in some notations, but in relational databases these are typically moved to a separate table.

Relationships

Relationships describe how entities are associated with each other. They have a degree (the number of entities involved) and a cardinality (the number of instances on each side).

One-to-One (1:1): Each instance of entity A is associated with exactly one instance of entity B, and vice versa. Example: Person and Passport. A person has at most one passport; a passport belongs to exactly one person. In a relational database, the foreign key can be placed in either table.

One-to-Many (1:M): Each instance of entity A can be associated with many instances of entity B, but each instance of B is associated with exactly one instance of A. Example: Teacher and Class. A teacher can teach many classes, but each class has one teacher. The foreign key goes in the "many" table (Class table contains teacherID).

Many-to-Many (M:N): Each instance of entity A can be associated with many instances of entity B, and vice versa. Example: Student and Course. A student can enroll in many courses, and a course can have many students. This cannot be directly represented in a relational database and requires a junction (link/associative) table.

Worked Example: School Database ERD

Entities: Student, Teacher, Course, Enrollment

RelationshipTypeForeign Key Location
Teacher-Course1:MCourse table has teacherID
Student-Enrollment1:MEnrollment table has studentID
Course-Enrollment1:MEnrollment table has courseID
Student-CourseM:NVia Enrollment (junction table)

Worked Example: Creating an ERD for a Hospital

Design an ERD for a hospital system with the following requirements:

  • Each Patient has a unique patientID, name, dateOfBirth, and address
  • Each Doctor has a unique doctorID, name, and speciality
  • A Doctor can treat many Patients; a Patient can be treated by many Doctors
  • Each Appointment has a date, time, and diagnosis
  • A Patient can have many Appointments; each Appointment belongs to one Patient
  • Each Appointment is with one Doctor; a Doctor can have many Appointments
Solution

Entities: Patient, Doctor, Appointment

Attributes:

  • Patient: patientID (PK), name, dateOfBirth, address
  • Doctor: doctorID (PK), name, speciality
  • Appointment: appointmentID (PK), date, time, diagnosis, patientID (FK), doctorID (FK)

Relationships:

RelationshipTypeForeign Key Location
Doctor-Appointment1:MAppointment has doctorID
Patient-Appointment1:MAppointment has patientID
Doctor-PatientM:NVia Appointment (junction table)

The Appointment table serves as both the junction table for the M:N Doctor-Patient relationship and as an entity in its own right with attributes (date, time, diagnosis). This is a common pattern where the junction table carries additional data.

Worked Example: Identifying Relationships in a Library System

A library has Books and Members. A member can borrow many books over time; a book can be borrowed by many members over time (but only by one member at a time). Each borrow record has a borrow date and a return date. Design the ERD.

Solution

Entities: Book, Member, Loan

Relationships:

RelationshipTypeNotes
Member-Loan1:MA member can have many loans; each loan has one member
Book-Loan1:MA book can be loaned many times; each loan has one book
Member-BookM:NVia the Loan junction table

The Loan table is the junction table for the M:N Member-Book relationship, and it also stores attributes (borrowDate, returnDate) that belong to the relationship itself, not to either entity.

Common Pitfalls -- ERDs

  • Placing the foreign key in the wrong table: The FK always goes in the "many" side of a 1:M relationship. If Teacher teaches many Classes, the FK (teacherID) goes in Class, not in Teacher.
  • Forgetting the junction table for M:N: Many-to-many relationships cannot be directly stored in a relational database. You must create a junction table with foreign keys pointing to both entities.
  • Overlooking relationship attributes: If a relationship has its own attributes (e.g., grade in Student-Course, borrowDate in Member-Book), these belong in the junction table, not in either entity.

Keys

Key TypeDefinitionExample
SuperkeyA set of attributes that uniquely identifies each tuple (may include extras){studentID, name}
Candidate keyA minimal superkey (no proper subset is also a superkey)\{studentID}, \{email}
Primary keyThe candidate key chosen by the database designer to be the main identifier\{studentID}
Foreign keyAn attribute that references the primary key of another tablecourse.teacherID references Teacher.teacherID
Composite keyA primary key consisting of two or more attributes{studentID, courseID} in Enrollment
Alternate keyA candidate key not chosen as the primary key\{email} (if studentID is the primary key)

Primary key requirements: Uniqueness (no two rows have the same primary key value), non-null (every row must have a primary key), immutable (the value should not change over time).

Foreign key constraints: A foreign key value must either be null or match a primary key value in the referenced table. This enforces referential integrity.

Worked Example: Identifying Keys

For the following Book table, identify all superkeys, candidate keys, the primary key, and any alternate keys.

bookIDISBNtitleauthor
1978-013468599Clean CodeRobert Martin
2978-020161622PragmaticProgAndrew Hunt
3978-013235088Clean CoderRobert Martin

Assume: bookID is unique, ISBN is unique, no two books share the same title and author combination.

Solution

Superkeys (all attribute sets that uniquely identify each row):

  • \{bookID}
  • \{ISBN}
  • {bookID, ISBN}
  • {bookID, title}
  • {bookID, author}
  • {ISBN, title}
  • {bookID, ISBN, title}
  • {bookID, ISBN, author}
  • {bookID, ISBN, title, author}
  • {title, author}
  • {bookID, title, author}
  • {ISBN, title, author}
  • ...and more (any set containing a candidate key is a superkey)

Candidate keys (minimal superkeys -- no proper subset is also a superkey):

  • \{bookID} -- minimal, uniquely identifies each row
  • \{ISBN} -- minimal, uniquely identifies each row
  • {title, author} -- minimal (title alone is not unique; author alone is not unique)

Primary key: \{bookID} -- chosen by the designer (surrogate key, simplest)

Alternate keys: \{ISBN} and {title, author} -- candidate keys not chosen as primary key

Note: {bookID, title} is a superkey but NOT a candidate key because its proper subset \{bookID} is also a superkey.

Common Pitfalls -- Keys

  • Confusing superkeys with candidate keys: Every candidate key is a superkey, but not every superkey is a candidate key. A candidate key must be minimal -- no proper subset can also be a superkey.
  • Using natural keys as primary keys: An email address might seem like a good primary key, but if a user changes their email, every foreign key reference must be updated. Surrogate keys (auto- incremented integers) avoid this problem.
  • Forgetting that composite keys require all attributes: A composite primary key {studentID, courseID} means that BOTH attributes together are needed to uniquely identify a row. Neither alone is sufficient.

Normalization

Normalization is the systematic process of organizing data to minimize redundancy and avoid insertion, update, and deletion anomalies. The process involves decomposing tables into smaller, well-structured tables linked by relationships.

Functional Dependencies

A functional dependency XYX \rightarrow Y means that the value of attribute YY is uniquely determined by the value of attribute XX. If two rows have the same value for XX, they must have the same value for YY.

Example: In a table with {studentID, name, dateOfBirth}, the dependency studentID \rightarrow name holds because each student has exactly one name.

Partial dependency: XYX \rightarrow Y where XX is a proper subset of a candidate key. This violates 2NF.

Transitive dependency: XYX \rightarrow Y and YZY \rightarrow Z (where YY is not a candidate key), therefore XZX \rightarrow Z transitively. This violates 3NF.

First Normal Form (1NF)

A table is in 1NF if and only if:

  1. All attributes contain atomic (indivisible) values. No repeating groups, no arrays, no comma-separated lists within a single cell.
  2. Each row is unique (there is a primary key).

Violation example:

OrderIDCustomerItems
1001AliceLaptop, Mouse, Cable
1002BobMonitor

The Items column contains multiple values, violating the atomicity rule.

1NF conversion: Create a separate row for each item:

OrderIDCustomerItem
1001AliceLaptop
1001AliceMouse
1001AliceCable
1002BobMonitor

Primary key: {OrderID, Item}

Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and has no partial dependencies. Every non-key attribute must depend on the entire primary key, not just part of it. This is only relevant when the primary key is composite.

Violation example (unnormalized table):

studentIDcourseIDstudentNamecourseTitleteacherName
1101AliceMathMr. Smith
1102AlicePhysicsMs. Jones
2101BobMathMr. Smith

Primary key: {studentID, courseID}

Partial dependencies:

  • studentName depends only on studentID (not on courseID)
  • courseTitle depends only on courseID (not on studentID)
  • teacherName depends only on courseID (not on studentID)

These are partial dependencies because studentName is determined by a proper subset of the primary key (studentID alone).

2NF conversion: Decompose into three tables:

Student (PK: studentID):

studentIDstudentName
1Alice
2Bob

Course (PK: courseID):

courseIDcourseTitleteacherName
101MathMr. Smith
102PhysicsMs. Jones

Enrollment (PK: {studentID, courseID}):

studentIDcourseID
1101
1102
2101

Third Normal Form (3NF)

A table is in 3NF if it is in 2NF and has no transitive dependencies. Every non-key attribute must depend directly on the primary key, not through another non-key attribute.

Violation example:

In the Course table above:

courseIDcourseTitleteacherNameteacherEmail
101MathMr. Smithsmith@school.edu
102PhysicsMs. Jonesjones@school.edu

Primary key: courseID

Transitive dependency: courseID \rightarrow teacherName \rightarrow teacherEmail. The teacherEmail depends on teacherName, not directly on courseID. If Mr. Smith teaches a new course, we must update his email in every row where he appears (update anomaly). If Mr. Smith has no courses, we cannot store his email (insertion anomaly). If we delete all of Mr. Smith's courses, we lose his email (deletion anomaly).

3NF conversion: Decompose Course into two tables:

Course (PK: courseID):

courseIDcourseTitleteacherID
101MathT1
102PhysicsT2

Teacher (PK: teacherID):

teacherIDteacherNameteacherEmail
T1Mr. Smithsmith@school.edu
T2Ms. Jonesjones@school.edu

Summary of Normal Forms

Normal FormRequirementAnomaly Eliminated
1NFAtomic values, unique rowsMulti-valued cells
2NF1NF + no partial dependenciesPartial redundancy
3NF2NF + no transitive dependenciesTransitive redundancy

Worked Example: Full Normalization

Starting table: HospitalRecord

patientIDpatientNamedoctorIDdoctorNamedepartmentappointmentDate
P001JohnD001Dr. LeeCardiology2025-03-01
P001JohnD001Dr. LeeCardiology2025-03-15
P002MaryD002Dr. ChenNeurology2025-03-02
P003TomD001Dr. LeeCardiology2025-03-03

Step 1: 1NF. Values are already atomic. But there are repeating groups (same patient appears multiple times). The primary key must be {patientID, appointmentDate}.

Actually, {patientID, appointmentDate} uniquely identifies each row. But patientName depends only on patientID, and doctorName depends only on doctorID.

Step 2: 2NF. Remove partial dependencies.

Patient (PK: patientID):

patientIDpatientName
P001John
P002Mary
P003Tom

Appointment (PK: {patientID, appointmentDate}, FK: patientID, doctorID):

patientIDappointmentDatedoctorID
P0012025-03-01D001
P0012025-03-15D001
P0022025-03-02D002
P0032025-03-03D001

Doctor (PK: doctorID):

doctorIDdoctorNamedepartment
D001Dr. LeeCardiology
D002Dr. ChenNeurology

Step 3: 3NF. Check for transitive dependencies in Doctor. doctorID \rightarrow doctorName, doctorID \rightarrow department. There is no non-key attribute depending on another non-key attribute here, so Doctor is already in 3NF. Check Appointment: no non-key attributes besides the foreign keys, which depend on the full primary key. Check Patient: patientName depends on patientID (the only key). All tables are in 3NF.

Worked Example: Normalize a Video Rental Table

Normalize the following unnormalized table to 3NF:

rentalIDcustomerNamecustomerEmailmovieTitlegenredailyRaterentalDatereturnDate
R001Alicealice@email.comInceptionSci-Fi3.502025-03-012025-03-05
R002Alicealice@email.comTitanicDrama2.502025-03-102025-03-12
R003Bobbob@email.comInceptionSci-Fi3.502025-03-02NULL
Solution

Step 1: Identify 1NF violations. All values are atomic. The primary key candidate is rentalID. The table is already in 1NF.

Step 2: Identify 2NF violations (partial dependencies). Since the primary key is rentalID (a single attribute, not composite), there cannot be partial dependencies. The table is already in 2NF.

Step 3: Identify 3NF violations (transitive dependencies).

  • rentalID \rightarrow customerName \rightarrow customerEmail: customerEmail depends on customerName (transitive)
  • rentalID \rightarrow movieTitle \rightarrow genre, dailyRate: genre and dailyRate depend on movieTitle (transitive)

Decompose into 4 tables:

Customer (PK: customerID):

customerIDcustomerNamecustomerEmail
C001Alicealice@email.com
C002Bobbob@email.com

Movie (PK: movieID):

movieIDmovieTitlegenredailyRate
M001InceptionSci-Fi3.50
M002TitanicDrama2.50

Rental (PK: rentalID, FK: customerID, movieID):

rentalIDcustomerIDmovieIDrentalDatereturnDate
R001C001M0012025-03-012025-03-05
R002C001M0022025-03-102025-03-12
R003C002M0012025-03-02NULL

Benefits of normalization:

  • Update anomaly fixed: changing Alice's email now requires updating only one row in Customer.
  • Insertion anomaly fixed: we can add a new movie without it being rented yet.
  • Deletion anomaly fixed: deleting a rental does not lose customer or movie information.

Common Pitfalls -- Normalization

  • 2NF is irrelevant with a single-column primary key: Partial dependencies can only exist when the primary key is composite. If the PK is a single attribute, every non-key attribute depends on the entire PK by definition.
  • Confusing partial and transitive dependencies: A partial dependency is when a non-key attribute depends on only a part of a composite primary key. A transitive dependency is when a non-key attribute depends on another non-key attribute.
  • Losing data during decomposition: When splitting tables, every attribute must appear in exactly one resulting table. Forgetting to include an attribute means data is lost.

SQL Fundamentals

Data Definition Language (DDL)

DDL defines and modifies the structure of database objects (tables, views, indexes).

CREATE TABLE:

CREATE TABLE Student (
studentID INTEGER PRIMARY KEY,
firstName VARCHAR(50) NOT NULL,
lastName VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
dateOfBirth DATE,
enrollmentDate DATE DEFAULT CURRENT_DATE,
gpa FLOAT CHECK (gpa >= 0.0 AND gpa <= 4.0)
);

Constraints:

ConstraintPurpose
PRIMARY KEYUniquely identifies each row; implicitly NOT NULL
FOREIGN KEYReferences a primary key in another table
NOT NULLColumn cannot contain NULL values
UNIQUEAll values in the column must be distinct
CHECKValues must satisfy a Boolean condition
DEFAULTDefault value when no value is specified

ALTER TABLE:

ALTER TABLE Student ADD COLUMN phone VARCHAR(20);
ALTER TABLE Student DROP COLUMN phone;
ALTER TABLE Student MODIFY COLUMN email VARCHAR(200);

DROP TABLE:

DROP TABLE Student;

Data Manipulation Language (DML)

INSERT:

INSERT INTO Student (studentID, firstName, lastName, email, dateOfBirth)
VALUES (1, 'Alice', 'Smith', 'alice@school.edu', '2005-06-15');

INSERT INTO Student (studentID, firstName, lastName)
VALUES (2, 'Bob', 'Jones');

UPDATE:

UPDATE Student SET gpa = 3.8 WHERE studentID = 1;

```sql
UPDATE Student SET email = 'newemail@school.edu'
WHERE lastName = 'Jones' AND firstName = 'Bob';

**DELETE:**

```sql
DELETE FROM Student WHERE studentID = 2;
warning

DELETE FROM Student without WHERE deletes all rows but keeps the table structure. DROP TABLE Student deletes both the data and the table structure. Be certain which one you intend.

Queries: SELECT

The SELECT statement retrieves data from one or more tables.

Basic SELECT:

SELECT firstName, lastName, gpa FROM Student;
SELECT * FROM Student;

WHERE clause:

SELECT firstName, lastName, gpa
FROM Student
WHERE gpa >= 3.5 AND enrollmentYear = 2023;

Comparison operators in WHERE: =, <>, <, >, $\le$, $\ge$, BETWEEN, LIKE, IN, IS NULL, IS NOT NULL.

ORDER BY:

SELECT firstName, lastName, gpa
FROM Student
ORDER BY gpa DESC, lastName ASC;

DISTINCT:

SELECT DISTINCT department FROM Teacher;

LIMIT:

SELECT firstName, lastName, gpa
FROM Student
ORDER BY gpa DESC
LIMIT 10;

Worked Example: Writing SQL Queries

Given these tables:

Student (studentID, firstName, lastName, gradeLevel, gpa):

studentIDfirstNamelastNamegradeLevelgpa
1AliceSmith113.8
2BobJones103.2
3CarolWilliams113.9
4DaveBrown102.8
5EveDavis123.5

Write SQL queries for each task:

  1. Find all students in grade 11 with GPA greater than 3.5, ordered by GPA descending.
  2. Count the number of students in each grade level.
  3. Find the student(s) with the highest GPA.
Solution

Query 1: Grade 11 students with GPA > 3.5, ordered by GPA descending

SELECT firstName, lastName, gpa
FROM Student
WHERE gradeLevel = 11 AND gpa > 3.5
ORDER BY gpa DESC;

Result: Carol Williams (3.9), Alice Smith (3.8)

Query 2: Count of students per grade level

SELECT gradeLevel, COUNT(*) AS numStudents
FROM Student
GROUP BY gradeLevel
ORDER BY gradeLevel;

Result:

gradeLevelnumStudents
102
112
121

Query 3: Student(s) with the highest GPA

SELECT firstName, lastName, gpa
FROM Student
WHERE gpa = (SELECT MAX(gpa) FROM Student);

Result: Carol Williams (3.9)

Alternative using subquery comparison:

SELECT firstName, lastName, gpa
FROM Student
WHERE gpa >= ALL (SELECT gpa FROM Student);

Aggregate Functions and GROUP BY

Aggregate functions: operate on a set of values and return a single value.

FunctionDescriptionExample
COUNTNumber of rowsCOUNT(*), COUNT(email)
SUMSum of numeric valuesSUM(gpa)
AVGArithmetic meanAVG(gpa)
MINMinimum valueMIN(gpa)
MAXMaximum valueMAX(gpa)

GROUP BY: Groups rows that have the same values in specified columns. Aggregate functions are applied to each group.

SELECT department, COUNT(*) AS numTeachers, AVG(salary) AS avgSalary
FROM Teacher
GROUP BY department;

HAVING: Filters groups after GROUP BY. HAVING operates on aggregated values; WHERE operates on individual rows before grouping.

SELECT department, COUNT(*) AS numTeachers
FROM Teacher
GROUP BY department
HAVING COUNT(*) >= 3;

Joins

Joins combine rows from two or more tables based on a related column.

INNER JOIN: Returns only rows where there is a match in both tables.

SELECT Student.firstName, Student.lastName, Course.courseName, Enrollment.grade
FROM Student
INNER JOIN Enrollment ON Student.studentID = Enrollment.studentID
INNER JOIN Course ON Enrollment.courseID = Course.courseID;

LEFT JOIN (LEFT OUTER JOIN): Returns all rows from the left table, and matched rows from the right table. Unmatched rows from the right table are filled with NULL.

SELECT Student.firstName, Student.lastName, Course.courseName
FROM Student
LEFT JOIN Enrollment ON Student.studentID = Enrollment.studentID
LEFT JOIN Course ON Enrollment.courseID = Course.courseID;

This returns all students, including those who are not enrolled in any course (their courseName will be NULL).

RIGHT JOIN (RIGHT OUTER JOIN): Returns all rows from the right table, and matched rows from the left table.

FULL OUTER JOIN: Returns all rows from both tables. Unmatched rows are filled with NULL.

SELECT Student.firstName, Course.courseName
FROM Student
FULL OUTER JOIN Enrollment ON Student.studentID = Enrollment.studentID
FULL OUTER JOIN Course ON Enrollment.courseID = Course.courseID;

Self-join: A table joined with itself. Useful for hierarchical data or finding relationships within the same entity.

SELECT e1.firstName AS employee, e2.firstName AS manager
FROM Employee e1
INNER JOIN Employee e2 ON e1.managerID = e2.employeeID;

Comparison of Join Types

Join TypeLeft UnmatchedRight UnmatchedUse Case
INNER JOINExcludedExcludedOnly matching records needed
LEFT JOINIncluded (NULL)ExcludedAll records from left table
RIGHT JOINExcludedIncluded (NULL)All records from right table
FULL OUTER JOINIncluded (NULL)Included (NULL)All records from both tables

Worked Example: Tracing a JOIN Query

Given these tables:

Student (studentID, firstName):

studentIDfirstName
1Alice
2Bob
3Carol

Enrollment (studentID, courseID, grade):

studentIDcourseIDgrade
1101A
1102B
2101C

What is the output of the following query?

SELECT Student.firstName, Enrollment.courseID, Enrollment.grade
FROM Student
LEFT JOIN Enrollment ON Student.studentID = Enrollment.studentID
ORDER BY Student.firstName, Enrollment.courseID;
Solution

A LEFT JOIN returns all rows from Student, matched with Enrollment where possible. Carol (studentID 3) has no enrollments, so her columns from Enrollment are NULL.

Step-by-step matching:

Student.firstNameEnrollment.courseIDEnrollment.gradeMatch?
Alice101AstudentID = 1
Alice102BstudentID = 1
Bob101CstudentID = 2
CarolNULLNULLNo match

Result:

firstNamecourseIDgrade
Alice101A
Alice102B
Bob101C
CarolNULLNULL

Carol appears in the result even though she has no enrollments -- this is the key difference from an INNER JOIN, which would exclude her.

Worked Example: Aggregate Query with HAVING

Given the Enrollment table:

studentIDcourseIDgrade
1101A
1102B
2101C
3103A
3104B
3105A

Write a query to find students enrolled in more than 1 course, showing the studentID and the count of courses.

Solution
SELECT studentID, COUNT(*) AS numCourses
FROM Enrollment
GROUP BY studentID
HAVING COUNT(*) > 1;

Result:

studentIDnumCourses
12
33

Student 2 is excluded because they are enrolled in only 1 course. HAVING filters after grouping; WHERE cannot be used here because COUNT(*) is an aggregate function.

Subqueries

A subquery is a query nested inside another query. It can appear in SELECT, FROM, WHERE, and HAVING clauses.

Subquery in WHERE (comparison):

SELECT firstName, lastName, gpa
FROM Student
WHERE gpa > (SELECT AVG(gpa) FROM Student);

Subquery with IN:

SELECT firstName, lastName
FROM Student
WHERE studentID IN (
SELECT studentID FROM Enrollment WHERE courseID = 101
);

Subquery with EXISTS:

SELECT firstName, lastName
FROM Student s
WHERE EXISTS (
SELECT 1 FROM Enrollment e
WHERE e.studentID = s.studentID AND e.grade = 'F'
);

EXISTS returns TRUE if the subquery returns at least one row. It is efficient because it stops searching as soon as a match is found.

Correlated subquery: The inner query references a column from the outer query. It is evaluated once for each row of the outer query.

SELECT firstName, lastName, gpa
FROM Student s
WHERE gpa > (
SELECT AVG(gpa) FROM Student
WHERE department = s.department
);

This finds students whose GPA exceeds the average GPA of their own department.

Worked Example: Subquery vs JOIN

Find the names of all students who have received a grade of 'A' in at least one course. Write this using (a) a subquery with IN, and (b) a JOIN.

Given:

Student: studentID 1 (Alice), 2 (Bob), 3 (Carol)

Enrollment: (1, 101, A), (2, 101, B), (3, 102, A)

Solution

(a) Subquery with IN:

SELECT firstName, lastName
FROM Student
WHERE studentID IN (
SELECT studentID FROM Enrollment WHERE grade = 'A'
);

The subquery returns {1, 3}. The outer query finds Alice and Carol.

(b) INNER JOIN with DISTINCT:

SELECT DISTINCT Student.firstName, Student.lastName
FROM Student
INNER JOIN Enrollment ON Student.studentID = Enrollment.studentID
WHERE Enrollment.grade = 'A';

DISTINCT is needed because a student could have multiple A grades, which would produce duplicate rows without it.

Both approaches return: Alice, Carol. The JOIN approach is typically more efficient for large datasets because it avoids executing the subquery separately.

Views

A view is a virtual table based on the result of a SELECT query. It does not store data itself; it presents data from underlying tables.

CREATE VIEW HonorRoll AS
SELECT firstName, lastName, gpa
FROM Student
WHERE gpa >= 3.5
ORDER BY gpa DESC;

Updatable views: A view is updatable if it meets certain conditions: it is based on a single table, it does not contain DISTINCT, GROUP BY, HAVING, or aggregate functions, and it includes the primary key of the underlying table.

Non-updatable views: Views that involve joins, aggregations, DISTINCT, or GROUP BY are typically read-only. To modify the data, you must update the underlying tables directly.

Use cases for views: Simplifying complex queries (users query the view instead of writing a complex join each time), security (restrict access to specific columns or rows), data independence (the underlying table structure can change without affecting the view definition).

Transaction Management

A transaction is a logical unit of work that consists of one or more SQL operations. Transactions ensure that the database remains in a consistent state even when failures occur.

ACID Properties

PropertyDescription
AtomicityA transaction is all-or-nothing. Either all operations complete successfully, or none do.
ConsistencyA transaction transforms the database from one valid state to another valid state.
IsolationConcurrent transactions do not interfere with each other. Intermediate states are invisible.
DurabilityOnce a transaction is committed, its effects are permanent even in the event of a system crash.

Atomicity example: A bank transfer deducts USD 100 from account A and adds USD 100 to account B. If the deduction succeeds but the addition fails (e.g., due to a system crash), atomicity ensures that the deduction is rolled back. Both operations must succeed, or neither does.

Isolation levels (from weakest to strongest): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation levels provide stronger consistency guarantees but reduce concurrency.

Transaction Control

BEGIN TRANSACTION;

UPDATE Account SET balance = balance - 100 WHERE accountID = 'A';
UPDATE Account SET balance = balance + 100 WHERE accountID = 'B';

COMMIT;

If an error occurs before COMMIT:

ROLLBACK;

Worked Example: Transaction with ROLLBACK

A bank transfer deducts 100fromaccountAandadds100 from account A and adds 100 to account B. Trace what happens if the second UPDATE fails.

BEGIN TRANSACTION;

UPDATE Account SET balance = balance - 100 WHERE accountID = 'A';
-- balance of A goes from 500 to 400

UPDATE Account SET balance = balance + 100 WHERE accountID = 'B';
-- ERROR: account B does not exist!

-- What happens now?
Solution

Because the second UPDATE fails, the transaction must be rolled back:

ROLLBACK;

After ROLLBACK:

accountIDbalanceState
A500Restored to 500
B--(never existed)

The ACID property Atomicity guarantees that the database returns to its original state. Without ROLLBACK, account A would have lost $100 with no corresponding gain anywhere -- the money would disappear.

If both statements had succeeded:

COMMIT;

The changes would be permanent (Durability).

Common Pitfalls -- Transactions

  • Forgetting BEGIN TRANSACTION: Without explicitly starting a transaction, each SQL statement auto-commits individually. A multi-step operation like a bank transfer would not be atomic.
  • Not handling errors: If an error occurs mid-transaction and ROLLBACK is not called, the transaction remains open. Some databases keep locks held, blocking other users indefinitely.
  • Long-running transactions: Transactions that stay open for a long time hold locks and reduce concurrency. Keep transactions as short as possible.

Referential Integrity

Referential integrity ensures that relationships between tables remain consistent. Specifically, every foreign key value must either be null or match a primary key value in the referenced table.

Cascading Operations

When a referenced row is deleted or updated, the database can automatically propagate the change to related rows:

ActionON DELETE CASCADEON DELETE SET NULLON DELETE RESTRICT
EffectDelete child rowsSet FK to NULLBlock the delete
Use caseDelete order, remove order itemsEmployee leaves, clear departmentCannot delete a teacher who has classes
CREATE TABLE Enrollment (
studentID INTEGER NOT NULL,
courseID INTEGER NOT NULL,
grade VARCHAR(2),
PRIMARY KEY (studentID, courseID),
FOREIGN KEY (studentID) REFERENCES Student(studentID)
ON DELETE CASCADE,
FOREIGN KEY (courseID) REFERENCES Course(courseID)
ON DELETE RESTRICT
);

Deleting a student cascades to delete their enrollment records. Deleting a course that has enrollments is restricted (blocked).

Worked Example: Cascading Deletes

Given the following tables with the Enrollment table defined as above:

Student: studentID 1 (Alice), 2 (Bob) Course: courseID 101 (Math), 102 (Physics) Enrollment: (1, 101), (1, 102), (2, 101)

What happens when you execute DELETE FROM Student WHERE studentID = 1? What happens if you then try DELETE FROM Course WHERE courseID = 101?

Solution

DELETE FROM Student WHERE studentID = 1: Due to ON DELETE CASCADE on the studentID foreign key, deleting Alice also deletes all her enrollment records. After this operation:

Student: studentID 2 (Bob) Course: courseID 101 (Math), 102 (Physics) Enrollment: (2, 101) -- Alice's two enrollment rows are deleted

DELETE FROM Course WHERE courseID = 101: This is blocked due to ON DELETE RESTRICT on the courseID foreign key. Bob still has an enrollment in course 101, so the database refuses to delete the course. The DELETE statement fails with a referential integrity error.

To delete the course, you must first delete (or reassign) all enrollments that reference it:

DELETE FROM Enrollment WHERE courseID = 101;
DELETE FROM Course WHERE courseID = 101;

Data Dictionaries

A data dictionary (also called a system catalog) is a database about the database. It stores metadata: definitions of tables, columns, data types, constraints, indexes, relationships, and users. Every relational database management system (RDBMS) maintains a data dictionary automatically.

Contents of a data dictionary:

  • Table names and descriptions
  • Column names, data types, sizes, and constraints
  • Primary key and foreign key definitions
  • Index definitions
  • View definitions
  • Stored procedure and trigger definitions
  • User permissions and access rights
  • Integrity constraints

The data dictionary is queried using system-specific SQL commands. For example, in SQLite: PRAGMA table_info(Student); in PostgreSQL: SELECT * FROM information_schema.columns WHERE table_name = 'Student'.

Worked Example: Creating a Data Dictionary Entry

For the following CREATE TABLE statement, produce a data dictionary entry for each column:

CREATE TABLE Employee (
empID INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(200) UNIQUE,
salary DECIMAL(10,2) CHECK (salary > 0),
deptID INTEGER,
FOREIGN KEY (deptID) REFERENCES Department(deptID)
);
Solution
Column NameData TypeSizeConstraintsFK Reference
empIDINTEGER--PRIMARY KEY--
nameVARCHAR100NOT NULL--
emailVARCHAR200UNIQUE--
salaryDECIMAL10,2CHECK (salary > 0)--
deptIDINTEGER----Department(deptID)

Big Data

Big data refers to datasets that are too large, too complex, or generated too rapidly to be processed by traditional database systems. The IB syllabus identifies four characteristics known as the four Vs.

The Four Vs

CharacteristicDescriptionExample
VolumeThe sheer scale of data, typically terabytes to petabytesSocial media posts, sensor data, transaction logs
VelocityThe speed at which data is generated and must be processedStock market feeds, IoT sensors, clickstream data
VarietyThe diversity of data types: structured, semi-structured, and unstructuredText, images, video, JSON, XML, audio
VeracityThe quality, accuracy, and trustworthiness of dataNoisy sensor readings, contradictory social media

Structured vs Semi-Structured vs Unstructured Data

TypeDescriptionExamples
StructuredOrganized into rows and columns with a fixed schemaRelational database tables, CSV
Semi-structuredHas some organizational properties but not a rigid schemaJSON, XML, HTML, log files
UnstructuredNo predefined structureImages, video, audio, free text

Traditional relational databases handle structured data well. Big data systems (like Hadoop, NoSQL databases, data lakes) are designed to handle all three types at scale.

Distributed Processing

Big data is typically processed using distributed systems where data is partitioned across multiple machines and processed in parallel. Key frameworks include:

  • MapReduce: A programming model that divides work into map (process individual data items) and reduce (combine results) phases.
  • Hadoop: An open-source framework for distributed storage (HDFS) and processing (MapReduce).
  • NoSQL databases: Non-relational databases (e.g., MongoDB, Cassandra, Redis) that handle unstructured data and scale horizontally.

Common Pitfalls

Database Design

Not normalizing sufficiently: Storing all data in a single table leads to redundancy and anomalies. A student's name appearing in every enrollment row means that changing the student's name requires updating every row (update anomaly).

Over-normalizing: Excessive normalization (beyond 3NF) can lead to too many tables, requiring complex joins that degrade query performance. Sometimes deliberate denormalization is used for performance-critical read-heavy applications.

Missing foreign key constraints: Without foreign key constraints, the database cannot enforce referential integrity. Orphaned records (enrollment records pointing to non-existent students) can accumulate.

Poor primary key design: Using natural keys (e.g., email addresses) as primary keys can cause problems if the key value changes. Surrogate keys (auto-incremented integers) are generally preferred for stability.

SQL

SELECT * in production:** Retrieving all columns when only a few are needed wastes bandwidth and memory. Always specify the columns you need.

Missing WHERE clause in UPDATE/DELETE: UPDATE Student SET gpa = 4.0 without a WHERE clause sets every student's GPA to 4.0. Always verify the WHERE clause before executing UPDATE or DELETE.

Confusing WHERE and HAVING: WHERE filters rows before grouping; HAVING filters groups after grouping. Using WHERE with an aggregate function causes a syntax error.

NULL comparisons: WHERE column = NULL does not work. NULL is not a value; it represents the absence of a value. Use WHERE column IS NULL or WHERE column IS NOT NULL.

Cartesian product: A join without an ON clause (or with incorrect join conditions) produces a Cartesian product: every row from one table combined with every row from the other table. For tables with 1000 rows each, this produces 1,000,000 rows.

String comparison issues: SQL string comparisons are typically case-sensitive in many RDBMS. WHERE name = 'alice' will not match 'Alice'. Use functions like LOWER() for case-insensitive comparisons, or set the column collation appropriately.

Problem Set

Problem 1: ERD Design for a Music Library

A music library has Artists and Albums. An artist can produce many albums; each album has exactly one artist. An album has many Tracks; each track belongs to one album. Identify all entities, relationships with cardinalities, and where foreign keys should be placed.

Solution

Entities: Artist, Album, Track

Relationships:

RelationshipCardinalityForeign Key Location
Artist-Album1:MAlbum table has artistID (FK)
Album-Track1:MTrack table has albumID (FK)

Attributes:

  • Artist: artistID (PK), name, genre
  • Album: albumID (PK), title, releaseYear, artistID (FK)
  • Track: trackID (PK), title, duration, albumID (FK)

If you get this wrong, revise: Entity-Relationship Diagrams


Problem 2: Partial vs Transitive Dependencies

Given a table with columns {orderID, productID, productName, customerID, customerName, quantity, unitPrice}, and the primary key is {orderID, productID}:

  • Is productName a partial dependency? If so, on what?
  • Is customerName a partial dependency? If so, on what?
Solution
  • productName depends only on productID (a proper subset of the PK {orderID, productID}). Yes, productName is a partial dependency.
  • customerName depends only on customerID. But customerID is NOT part of the primary key, so this is not a partial dependency -- it is a transitive dependency: orderID \rightarrow customerID \rightarrow customerName.

Partial dependencies are specifically about non-key attributes depending on a proper subset of the primary key.

If you get this wrong, revise: Functional Dependencies and Second Normal Form


Problem 3: Full Normalization to 3NF

Normalize the following table to 3NF. Show each step.

empIDempNamedeptIDdeptNameprojIDprojNamehoursWorked
E01AliceD01MarketingP01Campaign20
E01AliceD01MarketingP02Research15
E02BobD02SalesP01Campaign30
Solution

1NF: Already atomic values. PK: {empID, projID}.

2NF: Partial dependencies exist:

  • empName depends only on empID
  • deptID and deptName depend only on empID
  • projName depends only on projID

Decompose:

Employee (PK: empID):

empIDempNamedeptID
E01AliceD01
E02BobD02

Project (PK: projID):

projIDprojName
P01Campaign
P02Research

Assignment (PK: {empID, projID}, FK: empID, projID):

empIDprojIDhoursWorked
E01P0120
E01P0215
E02P0130

3NF: Check Employee: empID \rightarrow deptID \rightarrow deptName is a transitive dependency. Decompose:

Employee (PK: empID, FK: deptID):

empIDempNamedeptID
E01AliceD01
E02BobD02

Department (PK: deptID):

deptIDdeptName
D01Marketing
D02Sales

Final tables: Employee, Department, Project, Assignment -- all in 3NF.

If you get this wrong, revise: Normalization


Problem 4: CREATE TABLE with Constraints

Write a CREATE TABLE statement for the Enrollment table with appropriate constraints.

Solution
CREATE TABLE Enrollment (
studentID INTEGER NOT NULL,
courseID INTEGER NOT NULL,
grade VARCHAR(2) CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),
enrollmentDate DATE DEFAULT CURRENT_DATE,
PRIMARY KEY (studentID, courseID),
FOREIGN KEY (studentID) REFERENCES Student(studentID)
ON DELETE CASCADE,
FOREIGN KEY (courseID) REFERENCES Course(courseID)
ON DELETE RESTRICT
);

Key decisions:

  • Composite primary key {studentID, courseID} -- a student can enroll in a course only once
  • CHECK constraint on grade to limit values to valid letter grades
  • ON DELETE CASCADE for student -- deleting a student removes their enrollments
  • ON DELETE RESTRICT for course -- cannot delete a course that has enrollments

If you get this wrong, revise: DDL and Cascading Operations


Problem 5: Top N Students Query

Given the Student table from the worked example above, write a query that returns the names and GPAs of the top 3 students overall.

Solution
SELECT firstName, lastName, gpa
FROM Student
ORDER BY gpa DESC
LIMIT 3;

Result:

firstNamelastNamegpa
CarolWilliams3.9
AliceSmith3.8
EveDavis3.5

If you get this wrong, revise: SELECT Queries


Problem 6: WHERE vs HAVING

What is the difference between WHERE and HAVING? Give an example where using WHERE with an aggregate function would cause an error.

Solution

WHERE filters individual rows before GROUP BY. It cannot reference aggregate functions.

HAVING filters groups after GROUP BY. It can reference aggregate functions.

Incorrect (causes syntax error):

SELECT gradeLevel, COUNT(*) AS numStudents
FROM Student
WHERE COUNT(*) > 1
GROUP BY gradeLevel;

Correct:

SELECT gradeLevel, COUNT(*) AS numStudents
FROM Student
GROUP BY gradeLevel
HAVING COUNT(*) > 1;

The WHERE clause is evaluated before rows are grouped, so aggregate functions like COUNT(*) are not yet defined. HAVING is evaluated after grouping, when aggregates are available.

If you get this wrong, revise: GROUP BY and HAVING


Problem 7: Finding Unenrolled Students

Write a query to find all students who are NOT enrolled in any course.

Solution
SELECT firstName, lastName
FROM Student
WHERE studentID NOT IN (
SELECT studentID FROM Enrollment
);

Alternative using LEFT JOIN:

SELECT Student.firstName, Student.lastName
FROM Student
LEFT JOIN Enrollment ON Student.studentID = Enrollment.studentID
WHERE Enrollment.studentID IS NULL;

Alternative using NOT EXISTS:

SELECT firstName, lastName
FROM Student s
WHERE NOT EXISTS (
SELECT 1 FROM Enrollment e
WHERE e.studentID = s.studentID
);

All three approaches return the same result. NOT EXISTS is typically the most efficient for large datasets because it stops searching as soon as a match is found.

If you get this wrong, revise: Subqueries and Joins


Problem 8: Dangerous UPDATE Statement

What does the following SQL statement do? Is it dangerous?

UPDATE Student SET gpa = 4.0;
Solution

This sets the GPA of every student in the table to 4.0. It is extremely dangerous because there is no WHERE clause to restrict which rows are updated.

The safe version:

UPDATE Student SET gpa = 4.0 WHERE studentID = 3;

Always double-check UPDATE and DELETE statements for a WHERE clause before executing them. In production systems, you can wrap destructive operations in a transaction and use SELECT first to verify which rows will be affected.

If you get this wrong, revise: DML and Common Pitfalls - SQL


Problem 9: NULL Comparison

Explain why WHERE column = NULL does not work and provide the correct syntax.

Solution

NULL represents the absence of a value, not a value itself. The expression column = NULL evaluates to NULL (not TRUE or FALSE), so no rows are ever matched. NULL is not equal to anything, not even itself: NULL = NULL is NULL.

Correct syntax:

-- Find rows where column IS NULL
SELECT * FROM Student WHERE email IS NULL;

-- Find rows where column has a value
SELECT * FROM Student WHERE email IS NOT NULL;

If you get this wrong, revise: Common Pitfalls - SQL


Problem 10: ACID Properties

Explain each of the four ACID properties with a bank transfer example.

Solution

Atomicity: Transferring $100 from A to B involves two operations (debit A, credit B). Either both succeed or neither does. If the system crashes after debiting A, the transaction is rolled back and A's balance is restored.

Consistency: After the transfer, the total balance across all accounts must remain the same. The transfer moves money but does not create or destroy it. The database transitions from one valid state to another.

Isolation: If Alice transfers $100 to Bob at the same time that Carol checks Bob's balance, Carol should see either the balance before the transfer or after -- never a partially completed state where A has been debited but B has not yet been credited.

Durability: Once the COMMIT succeeds, the transfer is permanent. Even if the power fails immediately after, the updated balances will be recovered from the transaction log when the system restarts.

If you get this wrong, revise: ACID Properties


Problem 11: Cartesian Product from Missing ON Clause

What is the result of the following join without an ON clause?

SELECT Student.firstName, Course.courseName
FROM Student
CROSS JOIN Course;

If Student has 5 rows and Course has 3 rows, how many rows are in the result?

Solution

A CROSS JOIN (cartesian product) combines every row from Student with every row from Course. With 5 students and 3 courses, the result has 5×3=155 \times 3 = 15 rows.

Each student appears paired with every course, regardless of whether they are enrolled. This is rarely useful in practice and usually indicates a missing ON clause. A missing ON clause in an INNER JOIN or LEFT JOIN is an error that produces an unintended cartesian product.

If you get this wrong, revise: Joins


Problem 12: Classifying Data Types

Classify each type of data as structured, semi-structured, or unstructured:

  1. A CSV file of employee salaries
  2. A collection of JPEG photographs
  3. A JSON file from a weather API
  4. A patient's free-text medical notes
  5. An XML configuration file
  6. A relational database table
Solution
  1. Structured -- CSV has a fixed schema (columns) and consistent data types.
  2. Unstructured -- JPEG images have no tabular or hierarchical structure.
  3. Semi-structured -- JSON has organizational properties (key-value pairs) but not a rigid schema.
  4. Unstructured -- Free text has no predefined structure or schema.
  5. Semi-structured -- XML has tags providing structure but the schema can vary.
  6. Structured -- Relational tables have fixed columns, data types, and constraints.

If you get this wrong, revise: Big Data


Problem 13: Correlated Subquery

A company has an Employee table with columns empID, name, salary, and departmentID. Write a query that finds employees who earn more than the average salary in their own department. Explain why this requires a correlated subquery.

Solution
SELECT e1.name, e1.salary, e1.departmentID
FROM Employee e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM Employee e2
WHERE e2.departmentID = e1.departmentID
);

This requires a correlated subquery because the average must be computed per department, and the department changes for each row in the outer query. The inner query references e1.departmentID from the outer query, so it is re-evaluated for each employee.

This cannot be written as a simple subquery because a non-correlated subquery returns a single value (the overall average), not the department-specific average.

If you get this wrong, revise: Subqueries


Problem 14: View Creation and Use Cases

A school wants to create a view that shows only the names and grades of students in course 101. Write the CREATE VIEW statement and explain one advantage and one limitation of this view.

Solution
CREATE VIEW Course101Students AS
SELECT Student.firstName, Student.lastName, Enrollment.grade
FROM Student
INNER JOIN Enrollment ON Student.studentID = Enrollment.studentID
WHERE Enrollment.courseID = 101;

Advantage: Security -- teachers of other courses cannot see the full Student table. They only see the subset of data relevant to course 101 through this view.

Limitation: This view is not updatable because it involves a JOIN. You cannot use INSERT, UPDATE, or DELETE through this view. To modify student grades, you must update the Enrollment table directly.

If you get this wrong, revise: Views


Problem 15: IB Exam-Style Comprehensive Question

A school uses the following database schema:

CREATE TABLE Teacher (
teacherID INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department VARCHAR(50)
);

CREATE TABLE Club (
clubID INTEGER PRIMARY KEY,
clubName VARCHAR(100) NOT NULL,
teacherID INTEGER,
FOREIGN KEY (teacherID) REFERENCES Teacher(teacherID) ON DELETE SET NULL
);

CREATE TABLE Student (
studentID INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL,
gradeLevel INTEGER
);

CREATE TABLE Membership (
studentID INTEGER NOT NULL,
clubID INTEGER NOT NULL,
joinDate DATE,
PRIMARY KEY (studentID, clubID),
FOREIGN KEY (studentID) REFERENCES Student(studentID) ON DELETE CASCADE,
FOREIGN KEY (clubID) REFERENCES Club(clubID) ON DELETE CASCADE
);

Answer all parts:

(a) Explain the effect of ON DELETE SET NULL on the Club table's teacherID foreign key. (b) Write a query to find the names of all students who are in at least 2 clubs. (c) Write a query to find the name of each club and the number of members it has, showing only clubs with more than 5 members. (d) If teacherID 3 is deleted from the Teacher table, what happens to any clubs where teacherID = 3?

Solution

(a) If a teacher is deleted, the teacherID column in any Club that referenced that teacher is set to NULL. The club itself is NOT deleted; it simply no longer has a supervising teacher. This allows a club to continue existing after its teacher leaves.

(b)

SELECT Student.name
FROM Student
INNER JOIN Membership ON Student.studentID = Membership.studentID
GROUP BY Student.studentID, Student.name
HAVING COUNT(*) >= 2;

(c)

SELECT Club.clubName, COUNT(*) AS numMembers
FROM Club
INNER JOIN Membership ON Club.clubID = Membership.clubID
GROUP BY Club.clubID, Club.clubName
HAVING COUNT(*) > 5;

(d) The teacherID column in any Club row where teacherID = 3 is set to NULL. The club record is preserved; only the reference to the teacher is cleared. If ON DELETE CASCADE had been used instead, the clubs themselves would have been deleted.

If you get this wrong, revise: Cascading Operations, Joins, and Aggregate Functions