Writing an iterator to load data in chunks (1)
Another way to handle data that is too large to fit in memory is to read the file in chunks, with each chunk loaded as a DataFrame of a fixed number of rows, say, 100. For example, with the pandas package (imported as pd), you can call pd.read_csv(filename, chunksize=100). This creates an iterable reader object, which means that you can use next() on it to retrieve one chunk at a time.
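As a rough illustration of how the reader behaves, here is a minimal sketch. It uses a small in-memory CSV built with io.StringIO in place of a real file; the column names id and value and the 250-row size are made up for the example and are not part of the exercise data.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV (250 data rows) standing in for a large file on disk
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(250))

# With chunksize set, read_csv returns an iterable reader rather than one DataFrame
reader = pd.read_csv(io.StringIO(csv_text), chunksize=100)

first = next(reader)   # first 100 rows
second = next(reader)  # next 100 rows
third = next(reader)   # the remaining 50 rows

print(first.shape, second.shape, third.shape)
```

Each call to next() returns the following chunk as an ordinary DataFrame, and the last chunk is simply shorter when the row count is not a multiple of the chunk size. Once the chunks are exhausted, a further next() call raises StopIteration.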
In this exercise, you will read a file in small DataFrame chunks with read_csv(). You're going to use the World Bank Indicators data 'ind_pop.csv', available in your current directory, to look at the urban population indicator for numerous countries and years.
This exercise is part of the course Python Toolbox.
Exercise instructions
- Use pd.read_csv() to read in 'ind_pop.csv' in chunks of size 10. Assign the result to df_reader.
- Print the first two chunks from df_reader.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the pandas package
import pandas as pd
# Initialize reader object: df_reader
df_reader = ____(____, ____)
# Print two chunks
print(____)
print(____)