Get startedGet started for free

Using pandas to import flat files as DataFrames (2)

In the last exercise, you were able to import flat files into a pandas DataFrame. As a bonus, it is then straightforward to retrieve the corresponding numpy array using the method .to_numpy(). You'll now have a chance to do this using the MNIST dataset, which is available as digits.csv.

There are a number of arguments that pd.read_csv() takes that you'll find useful for this exercise:

  • nrows allows you to specify how many rows to read from the file. For example, nrows=10 will only import the first 10 rows.
  • header accepts row numbers to use as the column labels and marks the start of the data. If the file does not contain a header row, you can set header=None, and pandas will automatically assign integer column labels starting from 0 (e.g., 0, 1, 2, …).

This exercise is part of the course

Introduction to Importing Data in Python

View Course

Exercise instructions

  • Import the first 5 rows of the file into a DataFrame using the function pd.read_csv() and assign the result to data. You'll need to use the arguments nrows and header. Note that there is no header row in this file.
  • Build a numpy array from the resulting DataFrame in data and assign to data_array.
  • Execute print(type(data_array)) to print the datatype of data_array.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Assign the filename: file
file = 'digits.csv'

# Read the first 5 rows of the file into a DataFrame: data
data = ____(____, ____, ____)

# Build a numpy array from the DataFrame: data_array
data_array = ____

# Print the datatype of data_array to the shell
print(type(data_array))
Edit and Run Code