Membership
Main Menu
Forum Boards
Stats
- 20 tutorials
- 74,660 members
- 732,635 forum posts
- 13 blog posts
Tutorials
Data Joins Unions
Views: 25290
The Data
After weeks of intensive data analysis and data modelling we present our redesigned database to the headmaster. (OK, it took a few seconds but if you were doing this on a corporate scale it is going to take that timescale, identifying all the processes and data elements and how they relate to one another. Get this bit right and life is a lot easier down the line.)
We have immediately identified three main items of data (entities), namely pupil, house and subject and these will be our main tables. We give each entity a unique identifier (ID). From the spreadsheet you might have counted four entities but a pupil's house-master depends on the house to which the pupil belongs, and is therefore an item in the house table.
house pupil subject ============ =========== ========== houseID pupilID subjectID house_name pupil_name subject house_master
The Data Relationships
We look at each pair of entities and consider how they are related.
House - subject
We will assume there is no relationship between house and subject. Our examination of the processes showed no requirement. If Beetsem-Daley had wanted, say, a league table of house results for each subject then there would be. (As a professional, it's my duty to make him aware of the potential though. No harm in generating extra work).
House - pupil
A house has many pupils. A pupil belongs to one house. Here we have a simple one-to-many relationship between the two. We handle this by adding the houseID to the pupil record so we know to which house each pupil belongs. (Computers handle numbers much more efficiently than text so we will will use these numeric ids to link our data.)
Pupil - subject
A pupil can choose many subjects. A subject can be chosen by many pupils. Here we have a many-to-many relationship. We handle these by creating another entity (in this case we'll call it choice) to link pupil and subject. Each choice will contain the pupilID and the subjectID of a chosen subject. Actually, a subject can be chosen by none, one or many pupils. This doesn't affect our table design but it will affect how we query the data (more of which later).
The Database Schema
Our final design, showing the entity relationships, looks like this
house pupil choice subject ============ =========== ========== ========== houseID --+ pupilID ---+ id +---- subjectID house_name | pupil_name +---< pupilID | subject house_master +--< houseID subjectID >--+
Notice that our many-to-many relationship between pupil-subject has now become two simple one-to-many relationships between pupil-choice and subject-choice. In our original spreadsheet we could see which subjects were chosen by a pupil. Now, by looking at the same relationship in the opposite direction, we can easily find which pupils chose a particular subject.
Had we been required to store who teaches each subject then there would also be a teacher entity so that the teacher's name (and other teacher details) would be stored in only one place. In this case the house-master item would have been a teacherID linked to the teacher table and each subject would have a "taught_by" item, also linked to the teacher table.
Now we can start querying our database using joins.
