DataFrames and their methods
1. DataFrames and their methods
In just 3 lines of code, you're already looking at your data.
2. Where we left off
That data is represented in Python as a pandas DataFrame.
3. Anatomy of a pandas DataFrame
For you, this should look similar to spreadsheet data. At the top of the DataFrame we have our column names. Note that each column holds a single data type. The price column here is numeric, so we can expect to perform mathematical operations on it later. The color column is populated with text entries. Each row in this DataFrame is a specific observation of a fruit's name, color, and price in US dollars.
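To make the columns, data types, and rows described above concrete, here is a minimal sketch. It assumes a small made-up fruit table built directly with pd.DataFrame rather than loaded from the course's file, so the names, colors, and prices are invented for illustration. Selecting a column with square brackets and a row with .iloc are extras used here only to peek at the pieces.

```python
import pandas as pd

# Hypothetical fruit data, invented for illustration (not the course's file).
fruit = pd.DataFrame({
    "name": ["mango", "apple", "kiwi", "banana"],
    "color": ["orange", "red", "green", "yellow"],
    "price": [1.50, 0.75, 0.60, 0.25],
})

# Column names sit at the top of the DataFrame.
column_names = list(fruit.columns)
print(column_names)

# Each column holds a single data type; price is numeric.
print(fruit.dtypes)

# Because price is numeric, arithmetic applies to the whole column at once.
price_in_cents = fruit["price"] * 100
print(price_in_cents)

# One row is one observation: a fruit's name, color, and price.
first_row = fruit.iloc[0]
print(first_row)
```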
Finally, on the left is the DataFrame's index. The index is a powerful component of the pandas DataFrame, but it is beyond the scope of this course. Moving forward, I'll point out explicitly when we're performing an action to avoid working with the index.
10. DataFrame methods
Just as we used the dot to access functions in the pandas package, like pd.read_excel(), we use the dot to access methods associated with DataFrames. DataFrame methods are like functions, but they are accessed from the DataFrame object itself, with the dot. Let's take a closer look at each of these common DataFrame methods.
11. The .head() method
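As a rough sketch of the kind of code this slide shows, assuming a made-up 8-row fruit table built with pd.DataFrame in place of the course's file (all values are invented for illustration):

```python
import pandas as pd

# Hypothetical 8-row fruit table, invented for illustration.
# The course loads its data from a file instead.
fruit = pd.DataFrame({
    "name": ["mango", "apple", "kiwi", "banana",
             "grape", "cherry", "lemon", "orange"],
    "color": ["orange", "red", "green", "yellow",
              "purple", "red", "yellow", "orange"],
    "price": [1.50, 0.75, 0.60, 0.25, 2.00, 3.00, 0.50, 0.80],
})

# pd.DataFrame is a function reached through the pandas package with a dot;
# .head() is a method reached through the DataFrame object with a dot.
first_rows = fruit.head()

# Wrapping the call in print() displays the result in the console.
print(first_rows)
```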
The .head() method allows us to look at the first few rows of our DataFrame. It's very useful when you have hundreds or thousands of rows of data but only want to look at the first few. By default, this method displays the first 5 rows of our DataFrame. You can see on the left, in our last line of code, that we accessed this method by writing fruit.head, followed by a set of parentheses. We place fruit.head() inside the print function so our results will display in the console.
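The optional row count can be sketched the same way, again assuming a made-up fruit table rather than the course's data:

```python
import pandas as pd

# Hypothetical 8-row fruit table, invented for illustration.
fruit = pd.DataFrame({
    "name": ["mango", "apple", "kiwi", "banana",
             "grape", "cherry", "lemon", "orange"],
    "color": ["orange", "red", "green", "yellow",
              "purple", "red", "yellow", "orange"],
    "price": [1.50, 0.75, 0.60, 0.25, 2.00, 3.00, 0.50, 0.80],
})

# Passing a number to .head() changes how many rows come back.
first_two = fruit.head(2)
print(first_two)
```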
We can pass an optional argument to the .head() method if we wish to display a different number of rows. Here, we pass a 2 in order to display just the first two rows of data.
13. The .info() method
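A minimal sketch of an .info() call, assuming a made-up 8-row fruit table. Its shape matches the one narrated here, but the values are invented, and the exact label printed for the text columns can vary between pandas versions.

```python
import pandas as pd

# Hypothetical 8-row fruit table, invented for illustration.
fruit = pd.DataFrame({
    "name": ["mango", "apple", "kiwi", "banana",
             "grape", "cherry", "lemon", "orange"],
    "color": ["orange", "red", "green", "yellow",
              "purple", "red", "yellow", "orange"],
    "price": [1.50, 0.75, 0.60, 0.25, 2.00, 3.00, 0.50, 0.80],
})

# .info() prints its summary directly, so no print() call is needed.
# The summary lists the number of entries, each column name, and each data type.
fruit.info()
```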
The .info() method provides us with details on the number of entries, or rows, in our DataFrame, the total number of columns, the name of each column, and the data type of each column. Here on the right, we can see that our DataFrame has 8 rows and 3 columns: 2 columns with an object, or text, data type, and one column with a float64, or numerical, data type. int64 is another common numerical data type. In short, int64 represents whole numbers, and float64 represents numbers with decimal places.
14. The .describe() method
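Here is a sketch of a .describe() call on a made-up fruit table. Because the prices are invented, the mean and max it prints will not match the figures narrated for the course's data.

```python
import pandas as pd

# Hypothetical 8-row fruit table, invented for illustration.
fruit = pd.DataFrame({
    "name": ["mango", "apple", "kiwi", "banana",
             "grape", "cherry", "lemon", "orange"],
    "color": ["orange", "red", "green", "yellow",
              "purple", "red", "yellow", "orange"],
    "price": [1.50, 0.75, 0.60, 0.25, 2.00, 3.00, 0.50, 0.80],
})

# By default, .describe() summarizes only the numerical columns,
# so the text columns name and color are left out.
summary = fruit.describe()
print(summary)
```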
The .describe() method provides us with summary statistics for every numerical column in our DataFrame. Here, we see the mean, or average, price for fruit in our data is around 2.28, and the max price is 5.27.
15. The .sort_values() method
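A sketch of sorting by a column, assuming a made-up fruit table. The drop=True argument to .reset_index() is our assumption about how the index is kept tidy; the course's exact call may differ.

```python
import pandas as pd

# Hypothetical 8-row fruit table, invented for illustration.
fruit = pd.DataFrame({
    "name": ["mango", "apple", "kiwi", "banana",
             "grape", "cherry", "lemon", "orange"],
    "color": ["orange", "red", "green", "yellow",
              "purple", "red", "yellow", "orange"],
    "price": [1.50, 0.75, 0.60, 0.25, 2.00, 3.00, 0.50, 0.80],
})

# Sort the rows alphabetically by the name column.
fruit = fruit.sort_values("name")

# Renumber the index 0, 1, 2, ... and discard the old labels (drop=True).
fruit = fruit.reset_index(drop=True)
print(fruit)
```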
Finally, the .sort_values() method allows us to rearrange the rows in our DataFrame based on a column. Here, we've used .sort_values() to alphabetize our DataFrame according to the name column. In the code, you will also notice we've used the .reset_index() method. This is done so that the index stays in order (0, 1, 2, and so on) after the rows are rearranged. In the exercises, this will be done for you.
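Sorting from highest to lowest can be sketched the same way, assuming a made-up fruit table and, as before, drop=True as our own choice for resetting the index. Note how fruit is redefined at each step.

```python
import pandas as pd

# Hypothetical 8-row fruit table, invented for illustration.
# First, fruit is the freshly built data.
fruit = pd.DataFrame({
    "name": ["mango", "apple", "kiwi", "banana",
             "grape", "cherry", "lemon", "orange"],
    "color": ["orange", "red", "green", "yellow",
              "purple", "red", "yellow", "orange"],
    "price": [1.50, 0.75, 0.60, 0.25, 2.00, 3.00, 0.50, 0.80],
})

# Then, fruit becomes the same data sorted by price, highest first.
fruit = fruit.sort_values("price", ascending=False)

# Finally, fruit becomes the sorted data with a tidy 0, 1, 2, ... index.
fruit = fruit.reset_index(drop=True)
print(fruit)
```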
We can also sort values in descending order by passing ascending=False to the .sort_values() method. This code chunk outputs a DataFrame of the most expensive fruits in our dataset. Also, note how we keep redefining fruit. First, fruit equals the data we load in from our file, then, fruit equals the data sorted by price, and so on.
17. Your turn!
Now it's your turn to put some methods to work.