Creating New Variables using rxDataStep
Use the rxDataStep() function to compute a new variable, airspeed. Airspeed is the distance traveled divided by the time elapsed.
- rxDataStep() transforms variables in a dataset and writes the result to an output dataset.
This exercise is part of the course
Big Data Analysis with Revolution R Enterprise
Exercise instructions
The rxDataStep function is structured as follows:
- rxDataStep(inData, outFile, varsToKeep, transforms, append, overwrite)
Let's go over each of these arguments:
- inData - specifies the dataset from which we extract variables.
- outFile - specifies the output dataset that we want to create. This can be the same as inData (see append and overwrite below).
- varsToKeep - the variables to be read from the inData file.
- transforms - a list of the (simple) transformations to compute.
- append - either "none" to create a new file, "rows" to append rows to an existing file, or "cols" to append columns to an existing file.
- overwrite - a logical value specifying whether the output should be overwritten if it already exists.
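To make the roles of these arguments more concrete, here is a minimal base R sketch of the same idea. It is an analogy, not RevoScaleR itself: the data frame `flights` and its columns `dist_miles` and `minutes_aloft` are hypothetical stand-ins, column selection plays roughly the part of varsToKeep, `transform()` plays roughly the part of transforms, and adding the result back to the original data resembles append = "cols".

```r
# Hypothetical in-memory stand-in for the input dataset (not the course data).
flights <- data.frame(
  carrier       = c("AA", "UA", "DL"),
  dist_miles    = c(300, 600, 1200),
  minutes_aloft = c(60, 100, 160)
)

# Analogous to varsToKeep: read only the columns the computation needs.
kept <- flights[, c("dist_miles", "minutes_aloft")]

# Analogous to transforms: an expression evaluated on the kept columns.
computed <- transform(kept, speed = dist_miles / minutes_aloft)

# Analogous to append = "cols": add the new column to the existing data.
flights$speed <- computed$speed

print(flights)
```

The original columns are left untouched and only the new column is added, which is the behavior the exercise asks for when the output dataset is the same as the input.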
Use rxDataStep() to create the variable airSpeed.
Once you have done this, use rxGetInfo(), rxSummary(), and rxHistogram() to get some information about the new variable. When using rxGetInfo(), use the varsToKeep argument to extract information about the new variable only. For rxSummary() and rxHistogram(), remember that the formula is the first argument.
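The formula passed to rxSummary() and rxHistogram() is typically one-sided, with the variable of interest to the right of the tilde. As a small base R sketch of what such a formula is (the data frame and the column name `speed` are hypothetical), the following builds one and uses it to pull the column out for a summary:

```r
# Hypothetical data; `speed` stands in for the newly created variable.
flights <- data.frame(speed = c(5, 6, 7.5, 8))

# A one-sided formula: nothing on the left of the tilde.
f <- ~ speed

# all.vars() lists the variables the formula refers to.
vars <- all.vars(f)

# model.frame() evaluates the formula against the data and returns the column(s).
mf <- model.frame(f, data = flights)

speed_summary <- summary(mf$speed)
print(speed_summary)
```

The formula only names the variable; the data argument says where to look it up, which is why both are needed in the calls in the sample code.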
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
## Calculate an additional variable: airspeed (distance traveled / time in the air).
rxDataStep(inData = ___,
           outFile = ___,
           varsToKeep = c("___", "___"),
           transforms = list(airSpeed = ___ / ___),
           append = "___",
           overwrite = ___)
# Get Variable Information for airspeed
rxGetInfo(data = myAirlineXdf,
          getVarInfo = TRUE,
          varsToKeep = "___")
# Summary for the airspeed variable
rxSummary(___,
          data = myAirlineXdf)
# Construct a histogram for airspeed
# We can use the xAxisMinMax argument to limit the X-axis.
rxHistogram(___,
            ___ = myAirlineXdf
)
rxHistogram(___,
            ___ = myAirlineXdf,
            xNumTicks = 10,
            numBreaks = 1500,
            xAxisMinMax = c(0, 12)
)