Load a portion of a spreadsheet
Spreadsheets meant to be read by people often have multiple tables, e.g., a small business might keep an inventory workbook with tables for different product types on a single sheet. Even tabular data may have header rows of metadata, like the New Developer Survey data here. While the metadata is useful, we don't want it in a dataframe. You'll use read_excel()
's skiprows
keyword to get just the data. You'll also create a string to pass to usecols
to get only columns AD and AW through BA, about future job goals.
pandas
has been imported as pd
.
This exercise is part of the course
Streamlined Data Ingestion with pandas
Exercise instructions
- Create a single string,
col_string
, specifying thatpandas
should load columnAD
and the rangeAW
throughBA
. - Load
fcc_survey_headers.xlsx'
, settingskiprows
andusecols
to skip the first two rows of metadata and get only the columns incol_string
. - View the selected column names in the resulting dataframe.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create string of lettered columns to load
col_string = ____
# Load data with skiprows and usecols set
survey_responses = ____("fcc_survey_headers.xlsx",
____,
____)
# View the names of the columns selected
print(survey_responses.columns)