Using pandas to import flat files as DataFrames (2)
In the last exercise, you were able to import flat files into a pandas DataFrame. As a bonus, it is then straightforward to retrieve the corresponding NumPy array using the method .to_numpy(). You'll now have a chance to do this using the MNIST dataset, which is available as digits.csv.
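As a quick illustration of the DataFrame-to-array step, the sketch below converts a small DataFrame with DataFrame.to_numpy(). The column names "a" and "b" and their values are hypothetical stand-ins, not taken from digits.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame (not the MNIST data): 3 rows, 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# DataFrame.to_numpy() returns the values as a NumPy ndarray
arr = df.to_numpy()

print(type(arr))                   # <class 'numpy.ndarray'>
print(isinstance(arr, np.ndarray)) # True
print(arr.shape)                   # (3, 2): rows x columns are preserved
```

The row and column layout of the DataFrame carries over to the array; only the labels are dropped.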
There are a number of arguments that pd.read_csv() takes that you'll find useful for this exercise:
- nrows allows you to specify how many rows to read from the file. For example, nrows=10 will only import the first 10 rows.
- header accepts row numbers to use as the column labels and marks the start of the data. If the file does not contain a header row, you can set header=None, and pandas will automatically assign integer column labels starting from 0 (e.g., 0, 1, 2, …).
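To see how nrows and header interact, here is a minimal sketch that reads a small header-less CSV from an in-memory string via io.StringIO instead of a file. The csv_text contents are made up for illustration and only imitate the shape of a file like digits.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for a header-less CSV: every line is a data row
csv_text = "1,0,0,255\n7,0,128,0\n2,64,0,0\n3,0,0,32\n9,255,255,0\n"

# nrows=3 reads only the first 3 rows; header=None tells pandas there is no
# header line, so columns are labelled 0, 1, 2, ... automatically
df = pd.read_csv(io.StringIO(csv_text), nrows=3, header=None)
print(df.shape)          # (3, 4)
print(list(df.columns))  # [0, 1, 2, 3]

# Without header=None, pandas treats the first line as column labels,
# so the row starting with 1 is lost as data and the first data row starts with 7
df_wrong = pd.read_csv(io.StringIO(csv_text), nrows=3)
print(df_wrong.iloc[0, 0])  # 7
```

This is why header=None matters for a file with no header row: otherwise the first record silently becomes the column labels.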
This exercise is part of the course Introduction to Importing Data in Python.
Exercise instructions
- Import the first 5 rows of the file into a DataFrame using the function pd.read_csv() and assign the result to data. You'll need to use the arguments nrows and header. Note that there is no header row in this file.
- Build a numpy array from the resulting DataFrame in data and assign it to data_array.
- Execute print(type(data_array)) to print the datatype of data_array.
Have a go at this exercise by completing this sample code.
# Import pandas
import pandas as pd
# Assign the filename: file
file = 'digits.csv'
# Read the first 5 rows of the file into a DataFrame: data
data = ____(____, ____, ____)
# Build a numpy array from the DataFrame: data_array
data_array = ____
# Print the datatype of data_array to the shell
print(type(data_array))