Reading chunks in as a data.frame

In the previous example, we read each chunk into the processing function as a matrix using mstrsplit(). This works well for rectangular data in which every element has the same type, since a matrix can hold only one type. When the column types differ, we might prefer to read the data in as a data.frame. This can be done either by reading a chunk in as a matrix and then converting it to a data.frame, or by using the dstrsplit() function.
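
The two routes can be compared on a tiny example. This is only a sketch: it assumes the iotools package is installed, and the two-row chunk and its column types are made up for illustration rather than taken from the course data.

```r
library(iotools)  # assumed installed; provides mstrsplit() and dstrsplit()

# A made-up chunk: the raw bytes of two comma-separated rows with mixed types
chunk <- charToRaw("1,2.5,a\n2,3.5,b\n")

# Route 1: parse into a character matrix, then convert to a data.frame.
# Every column starts out as character and would still need type conversion.
m <- mstrsplit(chunk, sep = ",")
df_from_matrix <- as.data.frame(m, stringsAsFactors = FALSE)

# Route 2: parse straight into a data.frame, declaring one type per column
df_direct <- dstrsplit(chunk,
                       col_types = c("integer", "numeric", "character"),
                       sep = ",")

str(df_from_matrix)
str(df_direct)
```

The second route avoids the extra conversion step because the column types are declared once, at parse time.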

This exercise is part of the course

Scalable Data Processing in R

Exercise instructions

  • In the function make_msa_table(), read each chunk as a data frame.
  • Call chunk.apply() to read in the data as chunks.
  • Get the total counts for each column by adding all the rows.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Define the function to apply to each chunk
make_msa_table <- function(chunk) {
    # Read each chunk as a data frame
    x <- ___(chunk, col_types = rep("integer", length(col_names)), sep = ",")
    # Set the column names of the data frame that's been read
    colnames(x) <- col_names
    # Create new column, msa_pretty, with a string description of where the borrower lives
    x$msa_pretty <- msa_map[x$msa + 1]
    # Create a table from the msa_pretty column
    table(x$msa_pretty)
}

# Create a file connection to mortgage-sample.csv
fc <- file("mortgage-sample.csv", "rb")

# Read the first line to get rid of the header
readLines(fc, n = 1)

# Read the data in chunks
counts <- ___(fc, ___, CH.MAX.SIZE = 1e5)

# Close the file connection
close(fc)

# Aggregate the counts as before
___
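
The same read, tabulate, and aggregate pattern can be tried end to end without the course files. The sketch below is one possible shape, not the reference solution: it assumes iotools is installed, writes a made-up two-column CSV to a temporary file in place of mortgage-sample.csv, and uses a hypothetical helper count_msa() and hypothetical msa_map labels. Fixing the factor levels is an extra precaution so that every chunk produces a table with the same columns.

```r
library(iotools)  # assumed installed; provides dstrsplit() and chunk.apply()

# Made-up stand-in for the mortgage file: an id column and a 0/1 msa flag
set.seed(1)
n <- 50000
msa <- sample(0:1, n, replace = TRUE)
path <- tempfile(fileext = ".csv")
writeLines(c("id,msa", paste(seq_len(n), msa, sep = ",")), path)

col_names <- c("id", "msa")
msa_map <- c("rural", "city")  # hypothetical labels for msa = 0 and msa = 1

# Hypothetical helper: parse one chunk as a data frame and tabulate msa
count_msa <- function(chunk) {
  x <- dstrsplit(chunk, col_types = rep("integer", length(col_names)), sep = ",")
  colnames(x) <- col_names
  # Fixed factor levels keep the table columns aligned across chunks
  table(factor(msa_map[x$msa + 1], levels = msa_map))
}

fc <- file(path, "rb")
invisible(readLines(fc, n = 1))  # discard the header line
# Results are merged with rbind by default, giving one row per chunk
counts <- chunk.apply(fc, count_msa, CH.MAX.SIZE = 1e5)
close(fc)

# Add all the rows to get the total count for each column
totals <- colSums(counts)
print(totals)
```

Because each chunk contributes one row to counts, summing down the columns gives the same totals as tabulating the whole file at once.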