
Input data

1. Input data

In the previous chapter, we learned how to perform core TensorFlow operations. In this chapter, we will work towards training a linear model with TensorFlow.

2. Using data in TensorFlow

So far, we've only generated data using functions like ones and random uniform; however, when we train a machine learning model, we will want to import data from an external source. This may include numeric, image, or text data. Beyond simply importing the data, numeric data will need to be assigned a type, and text and image data will need to be converted to a usable format.

3. Importing data for use in TensorFlow

External datasets can be imported using TensorFlow. While this is useful for complex data pipelines, it is unnecessarily complicated for what we do in this chapter. For that reason, we will use simpler options to import data. We will then convert the data into a NumPy array, which we can use in TensorFlow without further modification.

4. How to import and convert data

Let's start by importing numpy under the alias np and pandas under the alias pd. We will then read housing transaction data from kc_housing.csv using the pandas method read csv and assign it to a dataframe called housing. When you are ready to train a model, you will want to convert the data into a numpy array by passing the pandas dataframe, housing, to np array. We will focus on loading data from csv files in this chapter, but you can also use pandas to load data from other formats, such as json, html, and excel.
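As a minimal sketch of that workflow (assuming kc_housing.csv sits in the working directory and keeping the names used above):

import numpy as np
import pandas as pd

# Read the housing transaction data from a csv file into a pandas DataFrame.
housing = pd.read_csv('kc_housing.csv')

# Convert the DataFrame to a NumPy array for use in TensorFlow.
housing = np.array(housing)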

5. Parameters of read_csv()

Let's take a closer look at the read csv method of pandas, since you will use it frequently to import data. In the code block, we filled in the only required parameter, which was the filepath or buffer. Note that you could instead supply a URL, rather than a filepath, to load your data. Another important parameter is sep, which is the delimiter that separates columns in your dataset. By default, this is a comma; however, other common choices are semicolons and tabs. Note that if you use whitespace as a delimiter, you will need to set the delim whitespace parameter to true. Finally, if you are working with datasets that contain non-ASCII characters, you can specify the appropriate encoding so that your characters are parsed correctly.
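The calls below sketch those parameters; the semicolon, whitespace, and latin-1 choices are illustrative assumptions rather than properties of the actual housing file.

import pandas as pd

# The filepath (or a URL) is the only required argument.
housing = pd.read_csv('kc_housing.csv')

# Hypothetical variants: a semicolon delimiter and a non-ASCII encoding.
housing_semicolon = pd.read_csv('kc_housing.csv', sep=';')
housing_latin = pd.read_csv('kc_housing.csv', encoding='latin-1')

# For whitespace-delimited files, set delim_whitespace to True.
housing_whitespace = pd.read_csv('kc_housing.csv', delim_whitespace=True)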

6. Using mixed type datasets

We will end this lesson by talking about how to transform imported data for use in TensorFlow. We will use housing data from King County, Washington as an example. Notice that the dataset contains columns with different types. One column contains house prices in a floating point format. Another column is a boolean variable, which can be either true (1) or false (0); in this case, a 1 indicates that a property is located on the waterfront.
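A quick way to see this for yourself is to inspect the columns; the column names price and waterfront are assumed from the King County dataset.

# Inspect a couple of columns; names assume the King County housing data.
print(housing['price'].head())       # house prices stored as floating point
print(housing['waterfront'].head())  # 0/1 indicator for waterfront properties
print(housing.dtypes)                # dtype of every column in the DataFrame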

7. Setting the data type

Let's say we want to perform TensorFlow operations that require price to be a 32-bit floating point number and waterfront to be a boolean. We can do this in two ways. The first approach uses the array method from numpy. We select the relevant column in the DataFrame, provide it as the first argument to array, and then provide the datatype as the second argument.
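Here is a minimal sketch of the NumPy approach, assuming the DataFrame columns are named price and waterfront:

import numpy as np

# Use np.array() to set price to a 32-bit float and waterfront to a boolean.
price = np.array(housing['price'], np.float32)
waterfront = np.array(housing['waterfront'], np.bool_)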

8. Setting the data type

The second approach uses the cast operation from TensorFlow. Again, we supply the data first and the data type second. While either tf cast or np array will work, waterfront will be a tf dot Tensor type under the former option and a numpy array under the latter.
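And a sketch of the equivalent tf.cast() approach, under the same column-name assumption:

import tensorflow as tf

# Use tf.cast() to set the types; the results are tf.Tensor objects.
price = tf.cast(housing['price'], tf.float32)
waterfront = tf.cast(housing['waterfront'], tf.bool)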

9. Let's practice!

You now know how to load data and set its data type. Let's put that to use in some exercises!
