Shaping tables
1. Shaping tables
In this second lesson, you'll learn about database normalization and table operations.2. Database normalization
Database normalization is a set of rules and processes for data modeling. You could say it's similar to organizing or designing a database. Normalization has a few key goals. First, we want to remove redundant data. That is, when the same information is stored in multiple places. Reducing redundancy can shrink down file sizes and avoids data inconsistency. The final goal is to build out a design which reflects the real world as it is, including how business entities inter-relate. As a result, the data is separated over several tables, and are connected through relationships: a link from one table to another. Relationships are defined in Power BI.3. Data shaping in Power Query
Power Query includes several data shaping operations to get closer to a normalized data model. We'll cover four key techniques for shaping data: splitting columns, extracting values from columns, merging together queries, and appending queries to one another. Note that all of these are always performed in Power Query. These are the most common practices, but not the only ones possible.4. 1. Column splitting
Column splitting breaks one column out into multiple columns based on split criteria. It could be after a delimiter, such as a comma or tilde. For serial numbers, we might split after a number of characters, for instance, the first 5 characters representing the brand and the next 3, the product. In the example on the slide you notice "3A" got split into "3" and "A".5. 2. Column extraction
Column extraction takes columns from one table and breaks them out into multiple tables. When extracting a column we can define keys for the new tables. A key is a column or set of columns which make a particular row unique in a table. On the original table, we retain that key column so we can create a relationship between the tables. In Power BI, you can define these relationships in the Model view. With column extraction, we retain just the distinct values of these breakout tables, which shrinks the total data model size and reduces redundancy.6. 3. Query appending
You can append a query to combine the contents of two or more tables into a single table. It matches rows based on the column names, so watch out for differing headers. It is equivalent to a UNION ALL statement in SQL.7. 4. Query merging
Query merging is the last option of how you can shape data in Power BI. There are different ways to join tables together using keys, and it is very similar to joining tables in SQL. Don't worry if you're not that familiar with this. It is covered in detail in other DataCamp courses, but it's good to know this option exists.8. Let's practice!
I'll see you in the instruction demo!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.