14:08 Objectives of Normalization
“Codd specifically complained about SQL.”
The reason for first order normalization is to allow data to be queried and manipulated via a “universal data sub-language” (Example: SQL). Codd listed four objectives for normalization beyond first order. These were listed in his book “Further Normalization of the Data Base Relational Model.”
To free the collection of relations from undesirable insertion, update and deletion dependencies.
To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs.
To make the relational model more informative to users.
To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
16:27 Reasons To Normalize
The first reason you’d want to normalize is to minimize redesign when adding to the schema. Normalized schemas allow for new types of data. Existing tables and structures do not need to change.
“I totally just called Bill Gates ‘Gatesy’!”
Update errors or anomalies occur when you have the same information in multiple rows. Updates to one relational area require changes in multiple rows. For example if you have an order form with customer information (Name, Address, Company, etc) on each order from that customer if they change address you have to change it in all the rows.
Flattened tables may not allow for partial information to be inserted. This may result in being unable to store some information.
When deleting data you may have to remove more than intended. Removing some information requires removing unrelated information.
21:00 First Normal Form (1NF)
“They’re descriptions of different levels of normalization.”
The first normal form starts the normalization of data by requiring no repeating elements or groups of elements. Remove duplicate columns from the same table. Create separate tables for each group of related data. Identify each set of related data with a primary key.
The values in each field of the database must be atomic. They cannot be broken down into individual parts. This can be complicated because the atomicity of a value may be business dependent. A person’s name can be broken into first, last, middle. A string can have sub-strings, a fixed-point can be an integer and a fraction, etc.
Chris Date updated Codd’s definition to say that a table is first normal if it is isomorphic to another relationship. There’s no top-to-bottom ordering to the rows. Each row has a unique primary key so it is not like a spreadsheet where you can have subsequent rows associated with each other. There’s no left-to-right ordering to the columns. There are no duplicate rows. Every row-and-column intersection contains exactly one value from the applicable domain. All columns are regular. Rows have no hidden components such as row IDs, object IDs, or hidden time-stamps. Breaking it down to functional units of information.
28:05 Second Normal Form (2NF)
The second normal form continues to remove duplicate information by not allowing partial dependencies. Separate data that applies to multiple rows into different tables. Use foreign keys to create relationships between tables.
All columns that are not the primary key are dependent on the primary key. Primary key is a unique identifier for each row in a table. In order to find a particular bit of information in a row you must know it’s primary key.
Each column should describe what the primary key identifies. This groups data into logical dependencies or table with associated information. You have customer information separate from product information and join them based on their primary keys.
30:40 Third Normal Form (3NF)
The third normal form adds to the second by not allowing non-transitive dependencies on anything but the primary key. If it can exist with out the primary key then it doesn’t belong in that table. Remove dependencies between columns.
Dictionary definition for Transitive: “if it applies between successive members of a sequence, it must also apply between any two members taken in order”
“Otherwise you don’t really have a primary key.”
Basically the relationship between elements has to be the same across the table and to the primary key. What this means is that all the columns are dependent only the primary key and no other columns in the table.
33:04 Boyce-Codd Normal Form (BCNF or 3.5NF)
Boyce-Codd normal form adds to third normal form by requiring that all determinants are candidate keys. Every dependency must either be trivial or contain a super key. Trivial functional dependency would mean one is a subset of the other. A super key is a unique identifier or set of identifiers to the entire schema. A candidate key is the minimal information needed to create a super key. When you do this all redundancy based on functional dependency has been removed.
Informally expressed as: “Each attribute must represent a fact about the key, the whole key, and nothing but the key.”
This form was created by Raymond Boyce and Edgar Codd in 1974 to address certain anomalies not covered by 3NF. Most third normal form tables will meet criteria for boyce-codd. A 3NF with no overlapping candidate keys will meet criteria. Multiple overlapping candidate keys may cause it to not depending on the functional dependencies.
35:55 Beyond Third Normal Form
“I remember thinking in class, ‘When am I ever gonna use this’ and it turns out…never!”
The previous normal forms have been concerned with functional dependencies fourth normal (4NF) focuses on multi-valued dependency. A dependency is when information in a table determines other information in the same table. A multi-valued dependency occurs when one or more rows in a table imply the presence of more rows in the same table. For any non-trivial multi-valued dependency there must be a super key involved.
Fifth Normal Form (5NF) isolates semantically related relationships. A table is in fifth normal form if and only if all non-trivial join dependencies are implied by the candidate keys. This can only be the case if each join dependency is a super key. It’s rare for a 4NF to not meet criteria for 5NF.
A table is in sixth normal form (6NF) if and only if it all of it’s join dependencies are trivial. The idea is to break the data down into irreducible components. This can become important when adding in historical data or changes over time. It is used in some data warehouses where the benefits of less storage space outweigh the cost to retrieval by having so many tables.