
Importing Data with the rxImport Function

Let's go ahead and import some data into the native xdf format. Declare the paths to the files you want to read and write, then use the rxImport command to import the airline data.

This exercise is part of the course

Big Data Analysis with Revolution R Enterprise


Exercise instructions

The first step in this problem is to declare the file paths that point to where the data is stored on the server.

The file.path function constructs a path to a file. In this exercise, we use it to define the location of the csv file that we would like to import. As arguments, we pass file.path the directory where the data lives, followed by the basename of the file. We can use rxGetOption("sampleDataDir") to get the appropriate directory.

In the video, we used the full dataset associated with 2007. In this example, we will just use a small subsample of approximately 2.5% of those observations, which is available in the file "2007_subset.csv". In the exercise, complete the file.path call so that it points to this csv file.
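As a minimal sketch of how file.path assembles a path, the snippet below uses only base R. The directory shown is a hypothetical stand-in for whatever rxGetOption("sampleDataDir") would return on the server, so the printed path is illustrative only.

```r
# Hypothetical directory (an assumption for illustration); on the server
# this value would come from rxGetOption("sampleDataDir").
dataDir <- file.path("path", "to", "sampleData")

# file.path joins its arguments with the platform file separator.
csvPath <- file.path(dataDir, "2007_subset.csv")

print(csvPath)
print(basename(csvPath))  # the file name without its directory
print(dirname(csvPath))   # the directory without the file name
```

Building the path from a directory and a basename, rather than pasting strings together by hand, keeps the code independent of where the sample data happens to be installed.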

Next, let's import the data. The rxImport function has the following syntax:

  • rxImport(inData, outFile, overwrite = TRUE)

The inData argument is the file we want to convert, so it should be assigned the csv file path. The outFile argument corresponds to the imported file we want to create, so it should be defined as the xdf file path.

If we set overwrite to TRUE, any existing data in the output file will be overwritten by the results of the import. Take extra care when setting this argument to TRUE!
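One cautious habit, sketched here with base R only, is to check whether the output file already exists before running an import with overwrite = TRUE. The file name and the use of tempdir() are assumptions for illustration; the empty file created below merely simulates the result of an earlier import and is not a real xdf file.

```r
# Hypothetical output location in a temporary directory, so nothing
# real is touched.
xdfPath <- file.path(tempdir(), "overwrite_demo.xdf")
unlink(xdfPath)  # start from a clean state

existsBefore <- file.exists(xdfPath)

# Simulate an earlier import by creating an empty placeholder file.
file.create(xdfPath)
existsAfter <- file.exists(xdfPath)

if (existsAfter) {
  message("Output file already exists and would be replaced: ", xdfPath)
}

unlink(xdfPath)  # clean up the placeholder
```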

Once you have run the rxImport() command, you can run list.files() to make sure your xdf file has been created!
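If the working directory holds many files, list.files() can be narrowed with its pattern argument. The sketch below uses base R only and a temporary directory with empty placeholder files (assumptions for illustration, not real xdf data) to show the idea.

```r
# Create a throwaway directory with two placeholder files.
demoDir <- tempfile("xdf_demo_")
dir.create(demoDir)
file.create(file.path(demoDir, "2007_subset.xdf"))  # stands in for an imported file
file.create(file.path(demoDir, "notes.txt"))

# List only the files whose names end in ".xdf".
xdfFiles <- list.files(demoDir, pattern = "\\.xdf$")
print(xdfFiles)

unlink(demoDir, recursive = TRUE)  # clean up
```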

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Declare the file paths for the csv and xdf files
myAirlineCsv <- file.path(rxGetOption("___"), "2007_subset.csv")
myAirlineXdf <- "2007_subset.xdf"

# Use rxImport to import the data into xdf format
rxImport(___ = myAirlineCsv, outFile = ___, overwrite = TRUE)
list.files()