Using pandas to import flat files as DataFrames (2)
In the last exercise, you were able to import flat files into a pandas DataFrame. As a bonus, it is then straightforward to retrieve the corresponding NumPy array using the method .to_numpy(). You'll now have a chance to do this using the MNIST dataset, which is available as digits.csv.
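As a minimal sketch of that conversion, the snippet below builds a small DataFrame by hand and calls .to_numpy() on it. The column names and values are made up for illustration. The result is a plain NumPy ndarray whose rows and columns mirror the DataFrame's.

```python
import numpy as np
import pandas as pd

# A tiny hand-made DataFrame standing in for imported data (illustrative values only)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# .to_numpy() returns the DataFrame's values as a NumPy array
arr = df.to_numpy()

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.shape)   # (3, 2)
print(arr)
```

Note that the column labels are not carried over into the array; only the values are.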
Several arguments of pd.read_csv() will be useful for this exercise:
- nrows specifies how many rows to read from the file. For example, nrows=10 imports only the first 10 rows.
- header accepts the row number (or list of row numbers) to use as the column labels; it also marks the start of the data. If the file does not contain a header row, set header=None, and pandas will automatically assign integer column labels starting from 0 (0, 1, 2, …).
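The sketch below shows both arguments on a small in-memory CSV rather than on digits.csv; the io.StringIO wrapper and the sample values are assumptions made so the snippet runs on its own. It also shows what happens when header=None is left out for a file that has no header row.

```python
import io

import pandas as pd

# A small headerless CSV held in memory (made-up values), standing in for a file on disk
csv_text = "1,10,100\n2,20,200\n3,30,300\n4,40,400\n5,50,500\n"

# header=None: treat the first line as data; nrows=3: read only 3 rows
df = pd.read_csv(io.StringIO(csv_text), header=None, nrows=3)
print(df)
print(list(df.columns))          # [0, 1, 2]

# Without header=None, pandas uses the first line as column labels,
# so the row 1,10,100 becomes the header instead of a data row
df_default = pd.read_csv(io.StringIO(csv_text), nrows=3)
print(list(df_default.columns))  # ['1', '10', '100']
print(df_default.iloc[0].tolist())  # [2, 20, 200]
```

With the default behaviour, the first data row is silently lost and the labels become strings, which is why header=None matters for headerless files.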
This exercise is part of the course Introduction to Importing Data in Python.
Exercise instructions
- Import the first 5 rows of the file into a DataFrame using the function pd.read_csv() and assign the result to data. You'll need to use the arguments nrows and header. Note that there is no header row in this file.
- Build a NumPy array from the resulting DataFrame in data and assign it to data_array.
- Execute print(type(data_array)) to print the datatype of data_array.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas as pd
import pandas as pd
# Assign the filename: file
file = 'digits.csv'
# Read the first 5 rows of the file into a DataFrame: data
data = ____(____, ____, ____)
# Build a numpy array from the DataFrame: data_array
data_array = ____
# Print the datatype of data_array to the shell
print(type(data_array))
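One way to fill in the blanks above is sketched below so that it runs without the course's data. It writes a small hypothetical stand-in for digits.csv to a temporary directory (the random values, the 8×4 shape, and the tempfile setup are assumptions, not the real MNIST file) and then follows the three instruction steps.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for digits.csv: a small headerless CSV of integers.
# The real file's shape and contents are not reproduced here.
rng = np.random.default_rng(0)
fake_digits = rng.integers(0, 256, size=(8, 4))

with tempfile.TemporaryDirectory() as tmpdir:
    file = os.path.join(tmpdir, "digits.csv")
    # Write without index or header so the file has no header row, like digits.csv
    pd.DataFrame(fake_digits).to_csv(file, index=False, header=False)

    # Read the first 5 rows; header=None because the file has no header row
    data = pd.read_csv(file, nrows=5, header=None)

# Build a NumPy array from the DataFrame
data_array = data.to_numpy()

# Print the datatype of data_array
print(type(data_array))  # <class 'numpy.ndarray'>
```

Because data is fully read into memory inside the with block, the DataFrame remains usable after the temporary directory is deleted.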