Using pandas to import flat files as DataFrames (2)

In the last exercise, you imported a flat file into a pandas DataFrame. As a bonus, it is then straightforward to retrieve the corresponding NumPy array using the DataFrame method .to_numpy(). You'll now have a chance to do this using the MNIST dataset, which is available as digits.csv.

There are a number of arguments that pd.read_csv() takes that you'll find useful for this exercise:

nrows allows you to specify how many rows to read from the file. For example, nrows=10 will only import the first 10 rows.
header accepts the row number (or numbers) to use as the column labels; the data starts on the row after it. If the file does not contain a header row, you can set header=None, and pandas will automatically assign integer column labels starting from 0 (e.g., 0, 1, 2, …).
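To see how these two arguments interact, here is a minimal sketch. It assumes a small made-up CSV held in memory (csv_text is hypothetical and is not the contents of digits.csv) and reads it twice: once with header=None and once with the default header behavior.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV with no header row (a stand-in, not digits.csv).
csv_text = "5,0,3\n0,5,9\n7,3,2\n4,4,8\n"

# header=None: every line is treated as data and columns are labelled 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None, nrows=2)

# Default header: the first line is consumed as column labels,
# so nrows=2 returns the second and third lines of the file.
with_header = pd.read_csv(io.StringIO(csv_text), nrows=2)

print(list(no_header.columns))    # [0, 1, 2]
print(list(with_header.columns))  # ['5', '0', '3']
print(no_header.shape, with_header.shape)  # (2, 3) (2, 3)
```

Note that nrows counts data rows only. When the first line is used as a header, it does not count toward nrows, which is why forgetting header=None on a header-less file silently drops the first record.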

Import the first 5 rows of the file into a DataFrame using the function pd.read_csv() and assign the result to data. You'll need to use the arguments nrows and header. Note that there is no header row in this file.
Build a NumPy array from the resulting DataFrame in data and assign it to data_array.
Execute print(type(data_array)) to print the type of data_array.
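The three steps above can be sketched as follows. This is one possible approach, not the official solution: because digits.csv is not available here, the sketch builds a small hypothetical stand-in (fake_file, 8 header-less rows of 4 integers) and reads from that instead. With the real file, you would pass its filename to pd.read_csv() in place of fake_file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for digits.csv: 8 header-less rows of 4 integers each.
rows = [",".join(str((r + c) % 10) for c in range(4)) for r in range(8)]
fake_file = io.StringIO("\n".join(rows))

# Step 1: read only the first 5 rows and tell pandas there is no header row.
data = pd.read_csv(fake_file, nrows=5, header=None)

# Step 2: build a NumPy array from the DataFrame.
data_array = data.to_numpy()

# Step 3: print the type of the result.
print(type(data_array))  # <class 'numpy.ndarray'>
print(data_array.shape)  # (5, 4)
print(isinstance(data_array, np.ndarray))  # True
```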
