
Writing a generator to load data in chunks (2)

In the previous exercise, you processed a file line by line for a given number of lines. But what if you want to do this for the entire file?

This is a case where generators are useful. Generators let you evaluate data lazily: instead of loading an entire dataset into memory at once, you yield one chunk of data at a time and compute each value only when it is needed. This makes generators a natural fit for very large datasets.
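
As a quick illustration (not part of the exercise, and the function name lazy_squares is just a hypothetical example), here is a minimal sketch contrasting eager evaluation with lazy evaluation:

# Eager: builds the entire list in memory at once.
squares_list = [n ** 2 for n in range(10)]

# Lazy: a generator function computes each value only when asked for it.
def lazy_squares(limit):
    """Yield squares one at a time instead of building a full list."""
    for n in range(limit):
        yield n ** 2

gen = lazy_squares(10)
print(next(gen))  # 0 -- only the first value has been computed so far
print(next(gen))  # 1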

In this exercise, you will define a generator function read_large_file() that returns a generator object that yields a single line from a file each time next() is called on it. The CSV file 'world_dev_ind.csv' is in your current directory for your use.

Note that when you open a connection to a file, the resulting file object already behaves like a generator: it is an iterator that yields lines from the file lazily. So out in the wild, you won't have to explicitly create generator objects in cases such as this. However, for pedagogical reasons, we are having you practice how to do this here with the read_large_file() function. Go for it!
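
To see this for yourself, here is a quick sketch (assuming 'world_dev_ind.csv' is in your current directory, as stated above) that calls next() on the file object directly:

# The file object returned by open() is itself a lazy iterator over lines,
# so next() works on it without any wrapper function.
with open('world_dev_ind.csv') as file:
    print(next(file))  # first line of the file
    print(next(file))  # second line of the file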


Exercise instructions

  • In the function read_large_file(), read a line from file_object by using the method readline(). Assign the result to data.
  • In the function read_large_file(), yield the line of data that was read from the file.
  • In the context manager, create a generator object gen_file by calling your generator function read_large_file() and passing file to it.
  • Print the first three lines produced by the generator object gen_file using next().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = ____

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        ____
        
# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Create a generator object for the file: gen_file
    gen_file = ____

    # Print the first three lines of the file
    print(____)
    print(____)
    print(____)
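
For reference, here is one way the blanks can be completed. This is a sketch of the intended solution rather than the only valid one, and it assumes 'world_dev_ind.csv' is present in the working directory.

# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = file_object.readline()

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        yield data


# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Create a generator object for the file: gen_file
    gen_file = read_large_file(file)

    # Print the first three lines of the file
    print(next(gen_file))
    print(next(gen_file))
    print(next(gen_file))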

This exercise is part of the course Python Toolbox.


Continue to build your modern Data Science skills by learning about iterators and list comprehensions.

This chapter will allow you to apply your newly acquired skills toward wrangling and extracting meaningful information from a real-world dataset: the World Bank's World Development Indicators. You'll have the chance to write your own functions and list comprehensions as you work with iterators and generators to solidify your Python chops.

