Writing an iterator to load data in chunks (2)
In the previous exercise, you used read_csv() to read in DataFrame chunks from a large dataset. In this exercise, you will read in a file using a bigger DataFrame chunk size and then process the data from the first chunk.
To process the data, you will create another DataFrame composed of only the rows from a specific country. You will then zip together two of the columns from the new DataFrame, 'Total Population' and 'Urban population (% of total)'. Finally, you will create a list of tuples from the zip object, where each tuple is composed of a value from each of the two columns mentioned.
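As a minimal sketch of the zip-then-list step described above, the snippet below pairs up two plain Python lists. The names and numbers are hypothetical stand-ins for the two DataFrame columns and are not taken from 'ind_pop_data.csv'.

```python
# Illustrative values only; they are NOT taken from 'ind_pop_data.csv'.
total_population = [1000, 2000, 3000]
urban_share_pct = [40.0, 45.5, 50.0]

# zip() returns a lazy iterator of pairs; nothing is built until it is consumed.
pairs = zip(total_population, urban_share_pct)

# list() consumes the iterator and produces a list of tuples,
# one tuple per position, with one value from each input.
pairs_list = list(pairs)
print(pairs_list)

# A zip object can be consumed only once; a second pass yields nothing.
leftover = list(pairs)
print(leftover)
```

The same pattern applies when the two inputs are DataFrame columns, since iterating over a column yields its values in row order.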
You're going to use the data from 'ind_pop_data.csv', available in your current directory. pandas has been imported as pd.
This exercise is part of the course
Python Toolbox
Exercise instructions
- Use pd.read_csv() to read in the file 'ind_pop_data.csv' in chunks of size 1000. Assign the result to urb_pop_reader.
- Get the first DataFrame chunk from the iterable urb_pop_reader and assign this to df_urb_pop.
- Select only the rows of df_urb_pop that have a 'CountryCode' of 'CEB'. To do this, compare whether df_urb_pop['CountryCode'] is equal to 'CEB' within the square brackets in df_urb_pop[____]. Assign the result to df_pop_ceb.
- Using zip(), zip together the 'Total Population' and 'Urban population (% of total)' columns of df_pop_ceb. Assign the resulting zip object to pops.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv(____, ____)
# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(____)
# Check out the head of the DataFrame
print(df_urb_pop.head())
# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[____]
# Zip DataFrame columns of interest: pops
pops = zip(____, ____)
# Turn zip object into list: pops_list
pops_list = list(pops)
# Print pops_list
print(pops_list)
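
The whole workflow can be tried without the course file. The sketch below is one possible approach, not the official solution: it assumes a tiny in-memory CSV whose rows are made up for illustration, and it uses only the three column names mentioned in this exercise. The names csv_text, reader, first_chunk, and ceb_rows are hypothetical, and the chunk size is 3 instead of 1000 so that the chunk boundary is visible.

```python
import io

import pandas as pd

# Synthetic stand-in for 'ind_pop_data.csv'. The rows are invented for
# illustration; only the column names come from the exercise text.
csv_text = """CountryCode,Total Population,Urban population (% of total)
ARB,100,30.0
CEB,200,40.0
CEB,210,41.0
ARB,110,31.0
CEB,220,42.0
"""

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file into a single DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=3)

# next() pulls the first chunk only (the first 3 data rows here).
first_chunk = next(reader)

# A boolean mask inside the square brackets keeps only the matching rows.
ceb_rows = first_chunk[first_chunk['CountryCode'] == 'CEB']

# Pair the two columns row by row and materialize the pairs as tuples.
pops = zip(ceb_rows['Total Population'],
           ceb_rows['Urban population (% of total)'])
pops_list = list(pops)
print(pops_list)
```

Only the two 'CEB' rows that fall inside the first chunk end up in pops_list; the third 'CEB' row sits in the next chunk and is never read here, which is why the result depends on the chunk size.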