Get startedGet started for free

The importance of data normalization

1. The importance of data normalization

This chapter will serve as an introduction to an important topic in database design. Data normalization is performed on the tables within a database to protect from data anomalies and ensure the integrity of data.

2. Example 1: redundant data

To introduce this topic, let's begin with an example. One important point to keep in mind is that duplication of data in different tables can be problematic. A loan table for the SBA was described in Chapter 2. Imagine that we defined the loan table with a FOREIGN KEY referencing the borrower table. This table definition also includes a variable length character column for the name of the bank as well as other columns. A bank table exists in this version of the SBA database which includes a bank name.

3. Example 1: redundant data

It is expected that banks represented in the loan table will have a corresponding record in the bank table. One problem with such a table structure arises when two different bank entities have the same name. This would make it difficult to determine which bank is referenced by the name in the loan table. Another problem could arise if one bank is acquired by a larger bank. We would like all records referencing the acquired bank to have the name of the acquiring bank. It would be quite easy to fail to update some loan records to include the acquiring bank's name leading to inaccuracies in the loan data.

4. Example 1: redundant data

This structure can be fixed to avoid such problems. One fix is to only reference a bank entity through a FOREIGN KEY. Such a change results in a new definition of the loan table. Using a FOREIGN KEY that references the id column of the bank table provides the following benefits. Banks can have the same name but will have distinct records in the bank table. A change to the bank name due to one bank being acquired by another can be reflected through an update to one record in the bank table.

5. Example 2: consolidating records

Here is another example. In this course, we have defined both an applicant and borrower table.

6. Example 2: consolidating records

However, given the definition of these objects, an applicant can become a borrower once a loan is approved. When the loan of Pam Jones is approved a record is added to the borrower table. Such a table structure results in data duplication.

7. Example 2: consolidating records

One way to handle this scenario would be to delete the applicant record once the applicant is approved and becomes a borrower.

8. Example 2: consolidating records

This reduces data duplication. However, we cannot be certain that the applicant record will be deleted. A server failure when this transaction takes place could leave the duplicated data in place.

9. Example 2: consolidating records

This potential scenario can be avoided. Because the data on an applicant and borrower can represent the same entity, a single table can be defined named borrower.

10. Example 2: consolidating records

A boolean column approved can be included in this table to distinguish an applicant from a borrower. A NULL value represents an applicant. A true value represents a borrower with a valid loan. And a false value represents an applicant denied a loan.

11. Why normalize data?

To clarify why we are exploring this topic, data normalization has the following benefits. Data duplication is reduced. Data consistency is increased so that entities do not have conflicting data in different tables. Data is organized so that data objects better map to table records. For instance, all data about a particular borrower is contained in the borrower table.

12. Let's practice!

Now that you have been introduced to data normalization, let's take this opportunity to practice using this knowledge.