Writing an iterator to load data in chunks (2)
In the previous exercise, you used read_csv()
to read in DataFrame chunks from a large dataset. In this exercise, you will read in a file using a bigger DataFrame chunk size and then process the data from the first chunk.
To process the data, you will create another DataFrame composed of only the rows from a specific country. You will then zip together two of the columns from the new DataFrame, 'Total Population'
and 'Urban population (% of total)'
. Finally, you will create a list of tuples from the zip object, where each tuple is composed of a value from each of the two columns mentioned.
You're going to use the data from 'ind_pop_data.csv'
, available in your current directory. pandas
has been imported as pd
.
This exercise is part of the course
Python Toolbox
Exercise instructions
- Use
pd.read_csv()
to read in the file in'ind_pop_data.csv'
in chunks of size1000
. Assign the result tourb_pop_reader
. - Get the first DataFrame chunk from the iterable
urb_pop_reader
and assign this todf_urb_pop
. - Select only the rows of
df_urb_pop
that have a'CountryCode'
of'CEB'
. To do this, compare whetherdf_urb_pop['CountryCode']
is equal to'CEB'
within the square brackets indf_urb_pop[____]
. - Using
zip()
, zip together the'Total Population'
and'Urban population (% of total)'
columns ofdf_pop_ceb
. Assign the resulting zip object topops
.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv(____, ____)
# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(____)
# Check out the head of the DataFrame
print(df_urb_pop.head())
# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[____]
# Zip DataFrame columns of interest: pops
pops = zip(____, ____)
# Turn zip object into list: pops_list
pops_list = list(pops)
# Print pops_list
print(pops_list)