
Working with data types

1. Working With Data Types

Now that we've reviewed pandas techniques for exploring data and dropping missing values, let's think about the other steps we need to take to prepare data for modeling.

2. Why are types important?

One of these steps is to think about the data types present in your dataset, because we'll likely have to convert some of these columns to other types later on. Recall that you can check the column types of a DataFrame with the info method. pandas data types are similar to native Python types, but there are a few subtle differences. The most common types are object, int64, and float64. pandas uses the object type for a column that consists of string values or contains a mixture of types. int64 and float64 correspond to Python's integer and float types, where the 64 is the number of bits of memory allotted to store each value. datetime64 is another common data type, which stores date and time data. This special type unlocks extra functionality for working with time series data, such as datetime indexing, adding timezone information, and selecting a sampling frequency. For this course, though, we'll stick to objects, integers, and floats. Before any preprocessing can begin, we have to understand the data types of our features. Sometimes, when importing a dataset, pandas infers an incorrect or inappropriate data type for a column, which then needs to be converted.
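To make the common types concrete, here is a minimal sketch that builds a small DataFrame from made-up data (the column names and values are hypothetical) and inspects its types with info and dtypes. It also checks the size of one int64 and one float64 value to show that "64" means 64 bits, or 8 bytes. Depending on your pandas version and settings, an all-string column such as `name` may be reported with a dedicated string dtype instead of object; the mixed column stays object either way.

```python
import pandas as pd

# Hypothetical data: one column per common dtype discussed above
df = pd.DataFrame({
    "name": ["a", "b", "c"],       # strings -> object (or a string dtype in newer pandas)
    "mixed": [1, "two", 3.0],      # a mixture of types -> object
    "count": [1, 2, 3],            # whole numbers -> int64
    "price": [1.5, 2.25, 3.0],     # decimal numbers -> float64
    "when": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),  # datetime64
})

# info() prints each column's dtype, its non-null count, and memory usage
df.info()

# dtypes returns the same per-column type information as a Series
print(df.dtypes)

# The "64" is the number of bits per value: 64 bits = 8 bytes
print(df["count"].dtype.itemsize)
print(df["price"].dtype.itemsize)
```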

3. Converting column types

Let's take a look at how to convert a column's type if pandas has inferred it incorrectly. Here we have a simple dataset with a few columns. If we call the dot-info method, we can see that the type of column C is object. However, if we look at the DataFrame itself, we can see that C contains float values: numbers with decimal points. If we want to preprocess and model this data, we'll have to convert the column's type.
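The slide's dataset isn't reproduced in this transcript, so the following is a hypothetical stand-in: column C holds decimal numbers stored as strings, which is one common way a numeric-looking column ends up typed as object. The object dtype is set explicitly here so the example behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's dataset:
# column C holds decimal numbers stored as strings, so its dtype is object
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["x", "y", "z"],
    "C": pd.Series(["1.1", "1.2", "1.3"], dtype="object"),
})

df.info()       # column C is listed as object
print(df["C"])  # yet its values look like floats
```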

4. Converting column types

The pandas astype method converts a column to a specified data type. astype returns a converted copy rather than changing the column in place, so we need to assign the result back to the column to overwrite the original data type, as shown here. Before converting a column, be extra careful that every value it contains can be converted to the new data type; otherwise the conversion can fail or produce unexpected values.
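A minimal sketch of the conversion, again on hypothetical data: the result of astype is assigned back to column C, after which C is float64. The second half shows why care is needed, since a single unparseable string makes astype raise a ValueError. Locating the offending values with `pd.to_numeric(..., errors="coerce")` is an extra technique not covered in the lesson, shown here as one option rather than the course's method.

```python
import pandas as pd

# Hypothetical data: column C holds decimal numbers stored as strings (object dtype)
df = pd.DataFrame({
    "A": [1, 2, 3],
    "C": pd.Series(["1.1", "1.2", "1.3"], dtype="object"),
})

# astype returns a converted copy, so assign it back to overwrite column C
df["C"] = df["C"].astype("float")
print(df.dtypes)  # C is now float64

# Being careful: a value that cannot be parsed makes astype raise ValueError
bad = pd.Series(["1.1", "oops", "1.3"], dtype="object")
conversion_failed = False
try:
    bad.astype("float")
except ValueError as err:
    conversion_failed = True
    print("Conversion failed:", err)

# One way to find the offending values first: to_numeric with errors="coerce"
# turns anything unparseable into NaN, which isna() then flags
unparseable = bad[pd.to_numeric(bad, errors="coerce").isna()]
print(unparseable)
```

Checking with to_numeric before calling astype lets you decide whether to fix, drop, or impute the bad entries instead of having the conversion stop partway through a preprocessing script.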

5. Let's practice!

Alright, now it's your turn to do some type conversion.