Integrate and analyze student data
1. Integrate and analyze student data
Let's see data integration in action using the student data. There are two tables: the first contains student details, such as name, hours spent studying, and extracurricular activities. The second table contains scores and grades.

The first task is to append the columns from both tables into one table. Search for the Column Appender node and connect both tables to it. In the node dialog, you can choose whether to retain the existing RowIDs or generate new ones. Here, we use the existing RowIDs. Execute the node; the output table now contains the columns from both tables.

Two of these columns hold report card preferences: the first shows preferences for printed report cards, and the second shows preferences for digital report cards. The logical next step is to merge these columns into one that shows each student's report card preference. Drag the Column Merger node from the node panel. In the node dialog, specify the primary column as Report Card 1 and the secondary column as Report Card 2. You can then decide whether the merged column should replace the primary or the secondary column. Alternatively, you can append the merged column as a new one and provide its name. Execute the node and open the output table. The merged column is appended at the end.

The student data is collected from various schools to create a central repository, so it is important to add each student's school name to their row. You know that this data is from St Xavier school, so let's add that information. Because the value is the same for every row, use the Constant Value Column node to add the school name to each row. In the node configuration, select the append option, since we want to add a new column, and provide the school's name. Execute the node and notice that the output has a new column appended at the end.

Next, we can eliminate the irrelevant columns. Use the Column Filter node to select the required columns. It is good practice to keep columns that denote similar levels of information together.
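For readers who think in code, the column steps described so far (append, merge, add a constant, filter) can be approximated in plain Python. This is only a rough sketch, not how the nodes are implemented: the sample rows, the student names, the `Study Hours` column, the merged column name `Report Card`, and all helper function names are assumptions made for illustration. It also assumes that the merge takes the primary value and falls back to the secondary one when the primary is missing.

```python
# Hypothetical sample rows keyed by row ID (names and values are made up).
details = {
    "Row0": {"Name": "Asha", "Study Hours": 12,
             "Report Card 1": "Printed", "Report Card 2": None},
    "Row1": {"Name": "Ben", "Study Hours": 8,
             "Report Card 1": None, "Report Card 2": "Digital"},
}
scores = {
    "Row0": {"Subject 1": 91, "Grade": "A"},
    "Row1": {"Subject 1": 78, "Grade": "B"},
}


def append_columns(left, right):
    """Combine the columns of rows that share a row ID (existing IDs are kept)."""
    return {row_id: {**left[row_id], **right[row_id]} for row_id in left}


def merge_columns(table, primary, secondary, new_name):
    """Append a column holding the primary value, or the secondary if it is missing."""
    merged = {}
    for row_id, row in table.items():
        value = row[primary] if row[primary] is not None else row[secondary]
        merged[row_id] = {**row, new_name: value}
    return merged


def add_constant_column(table, name, value):
    """Append a column that holds the same value in every row."""
    return {row_id: {**row, name: value} for row_id, row in table.items()}


def filter_columns(table, keep):
    """Keep only the listed columns, in the listed order."""
    return {row_id: {col: row[col] for col in keep} for row_id, row in table.items()}


table = append_columns(details, scores)
table = merge_columns(table, "Report Card 1", "Report Card 2", "Report Card")
table = add_constant_column(table, "School", "St Xavier")
result = filter_columns(table, ["Name", "Subject 1", "Grade", "Report Card", "School"])

for row_id, row in result.items():
    print(row_id, row)
```

Each helper returns a new table instead of modifying its input, which loosely mirrors how each node in a workflow produces its own output table.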
The school name column, for example, can be moved to the beginning. Use the Column Resorter node to reorder the columns. In the node dialog, select the column to be moved. In the bottom right, use the arrows to move the column up or down in the sequence.

Further, let's sort the table to get some insights. Use the Sorter node to sort the table by one or more criteria. In the node dialog, select the grade column as the first criterion. Execute the node and notice that the table is now sorted by grade. Next, we want to know which students earned A grades and scored the highest marks in subject 1. Click the Add sorting criterion button to add a second criterion. Specify the column as subject 1 and the order as descending. Execute the node and observe the output table. The first row shows the student who got an A grade and earned the highest score in subject 1.
2. Let's practice!
Now that you understand the data integration concepts, let's do some exercises.
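As a recap before the exercises, the reordering and two-criteria sort from the lesson can be sketched in plain Python. The rows, student names, and the `move_to_front` helper are hypothetical, and the sketch assumes the grade column is sorted in ascending order so that A grades come first; the lesson only states that subject 1 is sorted in descending order.

```python
# Hypothetical rows; only the School value and the idea of a grade and
# a subject 1 score come from the lesson.
rows = [
    {"Name": "Asha", "Grade": "B", "Subject 1": 78, "School": "St Xavier"},
    {"Name": "Ben", "Grade": "A", "Subject 1": 85, "School": "St Xavier"},
    {"Name": "Chitra", "Grade": "A", "Subject 1": 93, "School": "St Xavier"},
    {"Name": "Dev", "Grade": "C", "Subject 1": 60, "School": "St Xavier"},
]


def move_to_front(row, column):
    """Return a copy of the row with the given column moved to the first position."""
    rest = {key: value for key, value in row.items() if key != column}
    return {column: row[column], **rest}


# Reorder: put the school name first in every row.
reordered = [move_to_front(row, "School") for row in rows]

# Sort: Grade ascending (assumed), then Subject 1 descending.
# Negating the numeric score reverses its order within each grade.
sorted_rows = sorted(reordered, key=lambda row: (row["Grade"], -row["Subject 1"]))

for row in sorted_rows:
    print(row)
```

With this sample data, the first row after sorting is the student with an A grade and the highest subject 1 score, matching what the lesson observes in the output table.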