
Creating New Variables using rxDataStep

Use the rxDataStep() function to compute a new variable, airspeed. Air speed is the distance traveled divided by the time elapsed.

  • rxDataStep() is a function for transforming variables.

This exercise is part of the course

Big Data Analysis with Revolution R Enterprise

Exercise instructions

The rxDataStep function is structured as follows:

  • rxDataStep(inData, outFile, varsToKeep, transforms, append, overwrite)

Let's go over each of these arguments:

  • inData - specifies the dataset from which we extract variables.
  • outFile - specifies the output dataset that we want to create (this can be the same as inData; see append and overwrite below).
  • varsToKeep - the variables to be read from the inData file.
  • transforms - a list of the (simple) transformations to compute.
  • append - either "none" to create a new file, "rows" to append rows to an existing file, or "cols" to append columns to an existing file.
  • overwrite - Logical value to specify whether the output should be overwritten if it already exists.
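
To make the transforms argument concrete, here is a minimal sketch in base R that mimics what a transformation does on a small in-memory data frame. The data frame trips and its columns dist, hours, and speed are made up for illustration and are not part of the airline data. The commented-out call shows how the same step might look with rxDataStep(), assuming the RevoScaleR package is loaded.

```r
# Toy data: hypothetical column names, not the airline data set.
trips <- data.frame(dist  = c(300, 450, 1200),
                    hours = c(1.0, 1.5, 3.0))

# Base R analogue of transforms = list(speed = dist / hours):
# the expression is evaluated using the columns of the data frame,
# and the result is stored as a new column.
trips <- within(trips, speed <- dist / hours)

print(trips$speed)

# Assuming RevoScaleR is available, the same step on a data frame might read:
# trips <- rxDataStep(inData = trips,
#                     transforms = list(speed = dist / hours))
```

Because the new column is written next to the existing ones, this corresponds roughly to appending columns rather than rows.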

Use rxDataStep() to create a variable airspeed. Note that the sample code names it airSpeed, and variable names in R are case-sensitive.

Once you have done this, use rxGetInfo(), rxSummary(), and rxHistogram() to get some information about the new variable. When using rxGetInfo(), use the varsToKeep argument to extract information about the new variable only. For rxSummary() and rxHistogram(), remember that you specify a formula as the first argument.
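
As a reminder of what a formula looks like when there is no response variable, here is a small base R sketch. The data frame trips and its column speed are hypothetical and stand in for the real data. The commented-out calls show how the formula might be passed to rxSummary() and rxHistogram(), assuming the RevoScaleR package is loaded.

```r
# Toy data with a hypothetical column name, not the airline data set.
trips <- data.frame(speed = c(300, 300, 400))

# A one-sided formula has nothing on the left of the tilde.
f <- ~ speed
print(class(f))
print(all.vars(f))

# Base R analogue: summarize the variable named in the formula.
print(summary(trips[[all.vars(f)]]))

# Assuming RevoScaleR is available, the calls might read:
# rxSummary(~ speed, data = trips)
# rxHistogram(~ speed, data = trips)
```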

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

## Calculate an additional variable: airspeed (distance traveled / time in the air).
rxDataStep(inData = ___,
           outFile = ___,
           varsToKeep = c("___", "___"),
           transforms = list(airSpeed = ___ / ___),
           append = "___",
           overwrite = ___)

# Get Variable Information for airspeed
rxGetInfo(data = myAirlineXdf, 
          getVarInfo = TRUE,
          varsToKeep = "___")

# Summary for the airspeed variable
rxSummary(___, 
          data = myAirlineXdf)

# Construct a histogram for airspeed
# We can use the xAxisMinMax argument to limit the X-axis.
rxHistogram(___, 
            ___ = myAirlineXdf
            )

rxHistogram(___, 
            ___ = myAirlineXdf,
            xNumTicks = 10,
            numBreaks = 1500,
            xAxisMinMax = c(0,12)
            )