1. Data types and data merging
In this lesson, we will talk about various techniques to manipulate data using pandas.
2. Common data types
Each column in a pandas DataFrame has a specific data type. Some of the common data types are strings (which are represented as objects), numbers, boolean values (which are True/False) and dates.
3. Data type of a column
You can use the dtype attribute if you are interested in knowing the data type of a single column.
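As a rough illustration of the dtype attribute, here is a minimal sketch. The DataFrame name marketing and its values are made up for this example and only stand in for the course dataset; the 'converted' column is stored as an object on purpose to mirror the situation described in this lesson.

```python
import pandas as pd

# Hypothetical stand-in for the marketing dataset (names and values are illustrative).
marketing = pd.DataFrame({
    "user_id": ["a1", "a2", "a3"],
    # Force object storage to mimic a column that was read in as a generic object.
    "converted": pd.Series([True, False, True], dtype="object"),
})

# dtype is an attribute of a single column (a Series), so no parentheses are needed.
converted_dtype = marketing["converted"].dtype
print(converted_dtype)
```

To inspect every column at once, the DataFrame itself has a dtypes attribute (marketing.dtypes).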
4. Changing the data type of a column
To change the data type of a column, you can use the astype() method. For example, you saw on the earlier slide that the converted column is stored as an object. It contains True and False values, so it's more appropriate to store it as a boolean. You can use the astype() method along with the argument 'bool' as shown here to change its data type.
If you check the data type of the 'converted' column again, you will see that it's now 'bool'.
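The conversion might look like the sketch below. The small marketing DataFrame is a hypothetical stand-in for the course data, built so that 'converted' starts out as an object column.

```python
import pandas as pd

# Hypothetical stand-in data: 'converted' holds True/False but is stored as object.
marketing = pd.DataFrame({
    "converted": pd.Series([True, False, True], dtype="object"),
})

# astype() returns a new Series, so assign the result back to the column.
marketing["converted"] = marketing["converted"].astype("bool")

# Checking the data type again should now report a boolean column.
print(marketing["converted"].dtype)
```

One thing to keep in mind: casting an object column to 'bool' turns missing values (NaN) into True, so it is worth checking for missing values before converting.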
5. Creating new boolean columns
The marketing_channel column captures the channel a user saw a marketing asset on. Say you want to have a column that identifies if a particular marketing asset was a house ad or not.
You can use NumPy's where() function to create this new boolean column. The first argument is an expression that checks whether the value in the marketing_channel column is 'House Ads', the second argument is the value you want to assign if the expression is true, and the third argument is the value you want to assign if the expression is false.
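A minimal sketch of this three-argument pattern follows. The sample channel values and the new column name is_house_ads are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the marketing_channel column.
marketing = pd.DataFrame({
    "marketing_channel": ["House Ads", "Email", "Facebook", "House Ads"],
})

# np.where(condition, value_if_true, value_if_false), evaluated row by row.
marketing["is_house_ads"] = np.where(
    marketing["marketing_channel"] == "House Ads",  # condition
    True,                                           # value when the condition holds
    False,                                          # value otherwise
)

print(marketing)
```

Because the two outcomes here are simply True and False, the comparison on its own (marketing["marketing_channel"] == "House Ads") would give the same column; np.where() becomes more useful when the two values are something other than booleans.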
6. Mapping values to existing columns
Due to the way pandas stores data, in a large dataset, it can be computationally inefficient to store columns of strings. In such cases, it can speed things up to instead store these values as numbers.
To create a column with channel codes, build a dictionary that maps the channels to numerical codes. Then, use the map() method on the channel column along with this dictionary, as shown here.
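The dictionary-plus-map() approach could be sketched as follows. The channel names, the numeric codes, and the column name channel_code are illustrative assumptions rather than the exact values from the course.

```python
import pandas as pd

# Hypothetical sample of the channel column.
marketing = pd.DataFrame({
    "marketing_channel": ["House Ads", "Email", "Facebook"],
})

# Dictionary mapping each channel name to an arbitrary numeric code.
channel_dict = {
    "House Ads": 1,
    "Instagram": 2,
    "Facebook": 3,
    "Email": 4,
    "Push": 5,
}

# map() looks up each value of the column in the dictionary.
marketing["channel_code"] = marketing["marketing_channel"].map(channel_dict)

print(marketing)
```

Any channel that is missing from the dictionary is mapped to NaN, so it helps to make sure the dictionary covers every value in the column.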
7. Date columns
Often, you will have date columns that are improperly read as objects by pandas. However, as you will see in the following lessons, having date columns properly imported as the datetime datatype has several advantages.
You have two options to ensure that certain columns are treated as dates. First, when importing the dataset using the read_csv() function, you can pass a list of column names to the parse_dates argument to ensure that these columns are correctly interpreted as date columns.
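Here is a small sketch of the parse_dates option. To keep it self-contained, it reads from an in-memory string instead of a file on disk; the column name date_served and the rows are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path to the dataset file.
csv_text = (
    "user_id,date_served,converted\n"
    "a1,2018-01-01,True\n"
    "a2,2018-01-02,False\n"
)

# parse_dates takes a list of column names that should be read as dates.
marketing = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_served"])

# The column should now have a datetime dtype instead of object.
print(marketing["date_served"].dtype)
```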
8. Date columns
Another option is to use pandas' to_datetime() function to convert a specific column.
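In pandas, this conversion is done with pd.to_datetime(). The sketch below assumes a hypothetical date_subscribed column that was read in as plain strings.

```python
import pandas as pd

# Hypothetical column of dates stored as strings.
marketing = pd.DataFrame({
    "date_subscribed": ["2018-01-01", "2018-01-02", "2018-01-03"],
})

# pd.to_datetime() returns a new Series, so assign it back to the column.
marketing["date_subscribed"] = pd.to_datetime(marketing["date_subscribed"])

print(marketing["date_subscribed"].dtype)
```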
9. Date columns
Once the dates in the column are properly imported, you can use various date attributes to extract relevant information.
For example, to obtain the day of the week, you can use the dayofweek attribute along with the dt accessor on the date column. This will result in a numerical value where 0 maps to Monday, 1 to Tuesday, and so on.
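A short sketch of the dt accessor is shown below. The dates and the column names date_served and day_served are assumptions picked for illustration; 2018-01-01 fell on a Monday, which makes the mapping easy to check.

```python
import pandas as pd

# Hypothetical dates: a Monday, a Tuesday, and a Sunday.
marketing = pd.DataFrame({
    "date_served": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-07"]),
})

# The dt accessor exposes date attributes; dayofweek counts from Monday = 0.
marketing["day_served"] = marketing["date_served"].dt.dayofweek

print(marketing)
```

The dt accessor only works on columns that already have a datetime dtype, which is why converting the column first matters.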
10. Let's Practice!
It's time for you to practice these concepts.