1. Normalized and denormalized databases
Welcome back! Now that we have a grasp on normalization, let's talk about why we would want to normalize a database.
2. Back to our book store example
You should be familiar with these two schemas by now. They're both storing fictional company data on the sales of books in bulk to stores across the US and Canada.
On the left, you have the star schema with denormalized dimension tables.
On the right, you have the snowflake schema with normalized dimension tables.
The normalized database looks way more complicated. And it is in some ways.
For example, let's say you wanted to get the quantity of all books by Octavia E. Butler sold in Vancouver in Q4 of 2018.
3. Denormalized query
Based on the denormalized schema, you can run the following query to accomplish this.
It's composed of 3 joins, which makes sense based on the three dimension tables in the star schema.
4. Normalized query
What would the query look like on the normalized schema?
A lot longer. It doesn't even fit one slide!
5. Normalized query (continued)
There's a total of 8 inner joins.
This makes sense based on the snowflake schema diagram. The normalized snowflake schema has considerably more tables. This means more joins, which means slower queries.
So why would we want to normalize a database?
6. Normalization saves space
Normalization saves space.
This isn't intuitive seeing how normalized databases have more tables.
Let's take a look at the store table in our denormalized database. Here we see a lot of repeated information in bold - such as USA, California, New York, and Brooklyn.
This type of denormalized structure enables a lot of data redundancy.
7. Normalization saves space
If we normalize that previous schema, we get this:
We see that although we are using more tables, there is no data redundancy. The string, Brooklyn, is only stored once. And the state records are stored separately because many cities share the same state, and country. We don't need to repeat that information, instead, we can have one record holding the string California.
Here we see how normalization eliminates data redundancy.
8. Normalization ensures better data integrity
Normalization ensures better data integrity through its design.
First, it enforces data consistency. Data entry can get messy, and at times people will fill out fields differently. For example, when referring to California, someone might enter the initials "CA". Since the states are already entered in a table, we can ensure naming conventions through referential integrity.
Secondly, because duplicates are reduced, modification of any data becomes safer and simpler. Say in the previous example, you wanted to update the spelling of a state - you wouldn't have to find each record referring to the state, instead, you could make that change in the states table by altering one record. From there, you can be confident that the new spelling will be enacted for all stores in that state.
Lastly, since tables are smaller and organized more by object, its easier to alter the database schema. You can extend a smaller table without having to alter a larger table holding all the vital data.
9. Database normalization
To recap, here are the pros and cons of normalization. Now normalization seems appealing, especially for database maintenance.
However, normalization requires a lot more joins making queries more complicated, which can make indexing and reading of data slower.
Deciding between normalization and denormalization comes down to how read- or write- intensive your database is going to be.
10. Remember OLTP and OLAP?
Remember OLTP and OLAP? Can you guess which prefers normalization?
Take a pause and think about it.
Did you get it right?
OLTP is write-intensive meaning we're updating and writing often. Normalization makes sense because we want to add data quickly and consistently.
OLAP is read-intensive because we're running analytics on the data. This means we want to prioritize quicker read queries. Remember how much more joins the normalized query had over the denormalized query? OLAP should avoid that.
11. Let's practice!
Let's see how much you've learned!