Data integration
1. Data integration
In this video, let's understand the concept of data integration.2. Data integration
You have seen how conditional statements and expression-based transformations are used to create custom columns. The next step is to collect and blend data, which requires understanding data integration.3. What is data integration?
Data integration can be understood in three stages. First, you collect the required columns from various tables, and then you merge some of these columns to simplify your dataset. The next stage is to add context to your data to enhance its quality. At this stage, you can also filter out irrelevant information. In the last stage, you simply reorder the columns and sort the data to derive insights from it.4. Use case - Air travel
Let's understand data integration using an example, considering the air travel use case. You are provided with two tables, one showing the flight information - departure and price. The second table shows the meal preferences of the customers on these flights.5. Append columns
The first step in data integration is to append these columns into one table.6. Meal preference
Next, we see that the meal table lists customers' first and second meal preferences. We need to provide one meal choice per customer, not all preferences. Let's merge these columns into one.7. Meal preference
The column Meal1 shows the customer's first preference, so this becomes our primary column, and the other one becomes the secondary column.8. Merge columns
Using the merge operation, you can create a meal column that defaults to values from the primary column. If the primary value is missing, a value from the secondary column will be chosen.9. Context
Now, this data is collected daily to study trends. It's essential to add the flight date to the table. For context, all flights are from May 7th.10. Constant value column
This can be done by adding a constant value column "Date" with the value "7th May".11. Constant value column
Now that you have enhanced the data, it is a good practice to show the date and ID columns at the beginning of the table.12. Reorder column sequence
This means you are required to reorder the columns for easier data interpretation.13. Reorder column sequence
The date column is moved to the first position.14. Reorder column sequence
The price column can be moved to last position for better inference.15. Reorder column sequence
Since the columns are reordered, sorting the rows is important to attain meaningful insights from the data.16. Sorting
You can sort the data by providing various criteria and order of sorting, ascending or descending. Here, the first criterion is to sort the flights according to their departure city in ascending order.17. Sorting
The sorted table shows detailed information on flights from Berlin in the first half of the table.18. Sorting - add criteria
Further, you can add the second criterion to show the expensive flights at the top. This can be done by sorting by price column in descending order.19. Sorting - add criteria
Two sets of criteria now sort the table. Looking at the table, you can understand which flight was the most expensive, from Berlin.20. Let's practice!
Now that you have understood the data integration concepts let's do some exercises.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.