1. Getting data from multiple worksheets
You can now create dataframes from a single spreadsheet, or from parts of one. It's fairly common for Excel files to contain multiple sheets of data though. In this lesson, you'll learn how to specify sheets to get data from, as well as a workflow to combine multiple sheets into a single dataframe.
2. Selecting Sheets to Load
Although the terms "spreadsheet" and "Excel file" are often used interchangeably, an Excel file can be a workbook with multiple sheets, containing data as well as charts and metadata. We'll focus on sheets with tabular data.
By default, read Excel pulls data from the first sheet in a workbook only.
This behavior can be changed with the sheet name keyword argument.
Sheet name accepts the name of the desired sheet as a string, an integer indicating the desired sheet's position in the workbook,
or a list containing any combination of sheet names and position numbers to read.
Two things to note are that sheets are zero-indexed, and that if you load multiple sheets at once, any other arguments passed to read Excel apply to all sheets. For example, if you set nrows to 50, the first 50 rows of each sheet listed in sheet name will be loaded. If sheets need different parameters, load them with separate read Excel calls.
3. Selecting Sheets to Load
The New Developer Survey file actually contains two sheets: one with responses from 2016, another with responses from 2017. We already got 2016's data, since it's the first sheet. Let's get 2017's.
4. Loading Select Sheets
To get the second sheet containing 2017's responses, we can either pass its position index, 1, to sheet name,
or the actual sheet name, 2017.
Either method gets the same results, as shown when we check if the two dataframes are equal.
5. Loading All Sheets
What if we wanted to get all sheets in a workbook at once? We could list them all, but the easier way is to pass None to sheet name.
Let's do this for the survey data and check the data type of the result.
Instead of getting a dataframe, we got an ordered dictionary!
Iterating through the items reveals that the keys are sheet names, and the values are dataframes for each sheet.
6. Putting It All Together
We have one dataframe per sheet, which is great if sheets have different columns or describe different subjects. For the survey data, each sheet has the same columns, just for different years. Let's combine them into one dataframe.
First, we create an empty dataframe, all responses.
Then, we loop through the items in the survey responses dictionary. Remember, each value is a dataframe corresponding to a worksheet, and each key is a sheet name. For each dataframe, we add a column, Year, containing the sheet name, so we know which dataset a record came from.
Finally, we add the dataframe to all responses by passing a list of dataframes to combine to the concat function.
We can check unique values in the year column to confirm that all responses has both years.
In this case, we only combined two sheets, so we could have concatenated them outside of the loop, but looping has the advantage of working for any number of sheets.
7. Let's practice!
Now it's your turn to practice working with multiple sheets. Good luck!