Get startedGet started for free

Data integration

1. Data integration

In this video, let's understand the concept of data integration.

2. Data integration

You have seen how conditional statements and expression-based transformations are used to create custom columns. The next step is to collect and blend data, which requires understanding data integration.

3. What is data integration?

Data integration can be understood in three stages. First, you collect the required columns from various tables, and then you merge some of these columns to simplify your dataset. The next stage is to add context to your data to enhance its quality. At this stage, you can also filter out irrelevant information. In the last stage, you simply reorder the columns and sort the data to derive insights from it.

4. Use case - Air travel

Let's understand data integration using an example, considering the air travel use case. You are provided with two tables, one showing the flight information - departure and price. The second table shows the meal preferences of the customers on these flights.

5. Append columns

The first step in data integration is to append these columns into one table.

6. Meal preference

Next, we see that the meal table lists customers' first and second meal preferences. We need to provide one meal choice per customer, not all preferences. Let's merge these columns into one.

7. Meal preference

The column Meal1 shows the customer's first preference, so this becomes our primary column, and the other one becomes the secondary column.

8. Merge columns

Using the merge operation, you can create a meal column that defaults to values from the primary column. If the primary value is missing, a value from the secondary column will be chosen.

9. Context

Now, this data is collected daily to study trends. It's essential to add the flight date to the table. For context, all flights are from May 7th.

10. Constant value column

This can be done by adding a constant value column "Date" with the value "7th May".

11. Constant value column

Now that you have enhanced the data, it is a good practice to show the date and ID columns at the beginning of the table.

12. Reorder column sequence

This means you are required to reorder the columns for easier data interpretation.

13. Reorder column sequence

The date column is moved to the first position.

14. Reorder column sequence

The price column can be moved to last position for better inference.

15. Reorder column sequence

Since the columns are reordered, sorting the rows is important to attain meaningful insights from the data.

16. Sorting

You can sort the data by providing various criteria and order of sorting, ascending or descending. Here, the first criterion is to sort the flights according to their departure city in ascending order.

17. Sorting

The sorted table shows detailed information on flights from Berlin in the first half of the table.

18. Sorting - add criteria

Further, you can add the second criterion to show the expensive flights at the top. This can be done by sorting by price column in descending order.

19. Sorting - add criteria

Two sets of criteria now sort the table. Looking at the table, you can understand which flight was the most expensive, from Berlin.

20. Let's practice!

Now that you have understood the data integration concepts let's do some exercises.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.