Advanced Data Manipulation Techniques
1. Advanced data manipulation techniques
Welcome to the final video of this chapter. Now, we'll explore advanced data manipulation techniques that will allow us to perform complex transformations and analyses on our datasets.2. Column removal and addition
Changing table structures is a common step in data preparation. We might want to drop unnecessary columns using the .drop() method, or we may want to add pre-computed or categorical columns. You have seen how to add columns previously in Chapter 1 using the .addColumns() method, but remember that new columns must have the same number of rows as the table they're being added to. In the example below, we first remove the TempID and Notes columns that are no longer needed using the .drop() method. This creates a new table, cleaned, and the original dataTable is unchanged. We then use addColumns to add two new columns to the table, dataTable. Again, this creates a new table called enhanced.3. Row filtering with .dropWhere()
While .where() keeps rows matching a condition, .dropWhere() does the opposite - it removes matching rows. This is particularly useful for data cleaning tasks like removing outliers or invalid entries. The Selection mechanism we learned earlier is used to specify which rows to remove. In this example, we first create a selection condition with both lower and upper bounds for the Value column, and anything outside these bounds will be our outliers. Then, we pass that selection condition into dropWhere() to perform the operation where we drop any of our outliers outside the bounds.4. Row counting
Another important method which we have already covered briefly is the rowCount() method, which returns the number of rows in a table. This is very useful to look at after performing some row filtering or data cleaning. In this example, we can compare the row counts between the original table and the cleaned table, by calling rowCount on both tables and printing the result, and we can see that five rows have been dropped.5. Boolean filtering
We have seen Boolean filtering operations before, but to recap quickly, they let us create complex conditions. We can combine multiple Selection objects using the and(), or(), and not() methods. In the first example, we use the and condition to ensure that both the Department is equal to Technology, AND the Salary is greater than 100,000. In our short second example, we use the not method to get the inverse of this selection condition.6. Transformation with .map()
The map() method updates values in a column by applying a function to each one. It's a cleaner and shorter alternative to using a loop. We can use a lambda expression to define the function and keep our code easy to read. In the first example, we are transforming the Name column in the table, setting all values in that column to be upper case. In this second example, we first transform the Price column by applying a modifier across all the values in that column, multiplying them all by 0.9, and then we name and add the column to the table.7. Summary
This slide shows a short summary of some of the methods that we have seen in this lesson. The drop, dropWhere, addColumns, and map methods are extremely useful when working with data.8. Let's practice!
Well done on finishing this chapter and covering these advanced methods of data manipulation. Let's finish off with some exercises.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.