Get startedGet started for free

Importing flat files using NumPy

1. Importing flat files using NumPy

Okay so you now know how to use Python’s built-in open function to open text files. What if you now want to import a flat file and assign it to a variable? If all the data are numerical, you can use the package numpy to import the data as a numpy array. Why would we want to do this?

2. Why NumPy?

First off, numpy arrays are the Python standard for storing numerical data. They are efficient, fast and clean.

3. Why NumPy?

Secondly, numpy arrays are often essential for other packages, such as scikit-learn, a popular Machine Learning package for Python. Numpy itself has a number of built-in functions that make it far easier and more efficient for us to import data as arrays. Enter the NumPy functions loadtxt and genfromtxt.

4. Importing flat files using NumPy

To use either of these we first need to import NumPy. We then call loadtxt and pass it the filename as the first argument, along with the delimiter as the 2nd argument. Note that the default delimiter is any white space so we’ll usually need to specify it explicitly.

5. Customizing your NumPy import

There are a number of additional arguments you may wish to specify. If, for example, your data consists of numerics and your header has strings in it, such as in the MNIST digits data, you will want to skip the first row by calling loadtxt with the argument skiprows equals 1; if you want only the 1st and 3rd columns of the data,

6. Customizing your NumPy import

you’ll want to set usecols equals the list containing ints 0 and 2.

7. Customizing your NumPy import

You can also import different datatypes into NumPy arrays: for example, setting the argument dtype equals 'str' will ensure that all entries are imported as strings. loadtxt is great for basic cases, but tends to break down when we have

8. Mixed datatypes

mixed datatypes, for example,

9. Mixed datatypes

columns consisting of floats AND columns consisting of strings, such as we saw in the Titanic dataset.

10. Let's practice!

Now it's your turn to have fun with loadtxt. You'll also gain hands-on experience with other functions that can handle mixed datatypes. In the next video we’ll see that, although NumPy arrays can handle data of mixed types, the natural place for such data really is the dataframe.