Comparing read times of CSV and RDS files
One of the most common tasks we perform is reading in data from CSV files. However, for large CSV files this can be slow.
One neat trick is to read in the data and save it as an R binary file (rds) using saveRDS().
To read in the rds file, we use readRDS().
Note: Since rds is R's native format for storing single objects, you have not introduced any third-party dependencies that may change in the future.
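The save-once, read-many workflow described above can be sketched as follows. This is a minimal illustration, not part of the course material: the small data frame, its column names, and the temporary file paths are all hypothetical stand-ins for a real CSV file.

```r
# Hypothetical stand-in for a data set that normally arrives as a CSV file
df <- data.frame(
  id     = 1:5,
  title  = c("a", "b", "c", "d", "e"),
  rating = c(7.1, 6.4, 8.0, 5.5, 9.2)
)

# Temporary paths so the sketch leaves no files behind in the working directory
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(df, csv_path, row.names = FALSE)

# One-off conversion: parse the CSV once, then store the result as rds
from_csv <- read.csv(csv_path)
saveRDS(from_csv, rds_path)

# Later sessions can skip CSV parsing and restore the object directly
from_rds <- readRDS(rds_path)

# Check that the restored object matches what was saved
identical(from_csv, from_rds)
```

Because readRDS() returns the object rather than creating a variable, the result has to be assigned to a name of your choosing.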
To benchmark the two approaches, you can use system.time().
This function returns the time taken to evaluate any R expression. For example, to time how long it takes to calculate the square root of the numbers from one to ten million, you would write the following:
system.time(sqrt(1:1e7))
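The value returned by system.time() can also be stored and inspected, which helps when comparing several timings. The sketch below assumes only base R; the variable name timing is illustrative.

```r
# Store the timing instead of just printing it
timing <- system.time(sqrt(1:1e7))

# The result is a named numeric vector of class "proc_time"
class(timing)
names(timing)

# "elapsed" is the wall-clock time in seconds; it is usually the figure
# of interest when comparing how long two approaches take
timing[["elapsed"]]
```

The exact numbers vary between machines and between runs, so a single timing is best read as a rough indication.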
This exercise is part of the course Writing Efficient R Code.
Exercise instructions
The files "movies.csv" and "movies.rds" both contain identical data frames with information on 45,000 movies.
- Using the system.time() function, how long does it take to read in the CSV file using read.csv("movies.csv")?
- Repeat for the rds file, "movies.rds", using readRDS().
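If the movies files are not to hand, the same comparison can be tried on synthetic data. The sketch below is an assumption-laden stand-in: the data frame movies_demo, its columns, and the temporary file paths are invented for illustration, and only the row count of 45,000 is taken from the exercise.

```r
set.seed(1)
n <- 45000  # row count from the exercise; the columns below are invented

# Hypothetical data frame standing in for the movies data
movies_demo <- data.frame(
  id     = seq_len(n),
  year   = sample(1900:2020, n, replace = TRUE),
  rating = round(runif(n, 1, 10), 1),
  title  = paste0("movie_", seq_len(n))
)

# Write the same data in both formats to temporary files
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(movies_demo, csv_path, row.names = FALSE)
saveRDS(movies_demo, rds_path)

# Time each read; the results are discarded because only the timing matters
csv_time <- system.time(read.csv(csv_path))
rds_time <- system.time(readRDS(rds_path))

# Wall-clock seconds for each approach
csv_time[["elapsed"]]
rds_time[["elapsed"]]
```

Timings depend on hardware, disk caching, and the shape of the data, so the numbers from this sketch will not match those for the real files.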
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# How long does it take to read movies from CSV?
system.time(read.csv(___))
# How long does it take to read movies from RDS?
___