Reading raw data and turning it into a data structure
As mentioned before, part of what makes iotools fast is that it separates reading data from the hard drive from converting the binary data into a data.frame or matrix. Data in their binary format are copied from the hard drive into memory as raw objects. These raw objects are then passed to optimized functions that turn them into data.frame or matrix objects.
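To make the idea of a raw object concrete, here is a minimal base R sketch of the same two-step pattern. It does not use iotools: readBin() and read.csv() stand in for the optimized functions, and the temporary file and its contents are hypothetical sample data, not the course data set.

```r
# Hypothetical sample file: a header line plus two data rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2", "3,4"), tmp)

# Step 1: copy the file's bytes from disk into memory. Nothing is parsed yet.
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
class(bytes)   # the bytes are held in a raw vector
head(bytes)    # individual bytes, printed in hexadecimal

# Step 2: parse the in-memory bytes into a data.frame.
parsed <- read.csv(text = rawToChar(bytes))
str(parsed)

# Clean up the temporary file.
unlink(tmp)
```

Because the parsing step works only on data already in memory, it can be timed, repeated, or swapped for a different parser without touching the disk again.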
In this exercise, you'll learn how to separate reading data from the disk (using the readAsRaw() function), and then convert the raw binary data into a matrix or data.frame (using the mstrsplit() and dstrsplit() functions).
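The three functions named above can be tried on a small file before working with the course data. The following is a sketch under stated assumptions: the temporary file, its three columns, and the variable names are hypothetical, and the iotools package is assumed to be installed.

```r
library(iotools)

# Hypothetical stand-in for a CSV on disk: a header line plus three integer rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8,9"), tmp)

# Step 1: read the file into memory as a raw vector, without parsing it.
raw_bytes <- readAsRaw(tmp)

# Step 2a: parse the raw bytes into an integer matrix.
# skip = 1 drops the header line, which cannot be stored as integers.
m <- mstrsplit(raw_bytes, sep = ",", type = "integer", skip = 1)
dim(m)

# Step 2b: parse the same raw bytes into a data.frame.
# col_types needs one entry per column in the file.
d <- dstrsplit(raw_bytes, col_types = rep("integer", 3), sep = ",", skip = 1)
str(d)

# Clean up the temporary file.
unlink(tmp)
```

Note that the file is read from disk only once. Both the matrix and the data.frame are built from the same in-memory raw vector.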
This is a part of the course “Scalable Data Processing in R”
Exercise instructions
- Read "mortgage-sample.csv" as a raw vector.
- Convert the raw vector contents to a matrix of integers.
- Convert the raw vector contents to a data.frame with 16 integer columns.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Read mortgage-sample.csv as a raw vector
raw_file_content <- ___("mortgage-sample.csv")
# Convert the raw vector contents to a matrix
mort_mat <- ___(___, sep = ",", type = ___, skip = 1)
# Look at the first 6 rows
head(mort_mat)
# Convert the raw file contents to a data.frame
mort_df <- ___(___, sep = ",", col_types = rep("integer", 16), skip = 1)
# Look at the first 6 rows
head(mort_df)
Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.
We'll use the iotools package that can process both numeric and string data, and introduce the concept of chunk-wise processing.