1. Learn
  2. /
  3. Courses
  4. /
  5. Introduction to Importing Data in Python

Connected

Exercise

Using pandas to import flat files as DataFrames (2)

In the last exercise, you were able to import flat files into a pandas DataFrame. As a bonus, it is then straightforward to retrieve the corresponding numpy array using the method .to_numpy(). You'll now have a chance to do this using the MNIST dataset, which is available as digits.csv.

There are a number of arguments that pd.read_csv() takes that you'll find useful for this exercise:

  • nrows allows you to specify how many rows to read from the file. For example, nrows=10 will only import the first 10 rows.
  • header accepts row numbers to use as the column labels and marks the start of the data. If the file does not contain a header row, you can set header=None, and pandas will automatically assign integer column labels starting from 0 (e.g., 0, 1, 2, …).

Instructions

100 XP
  • Import the first 5 rows of the file into a DataFrame using the function pd.read_csv() and assign the result to data. You'll need to use the arguments nrows and header. Note that there is no header row in this file.
  • Build a numpy array from the resulting DataFrame in data and assign to data_array.
  • Execute print(type(data_array)) to print the datatype of data_array.