Reading chunks in as a data.frame

In the previous example, we read each chunk into the processing function as a matrix using mstrsplit(). This works well for rectangular data in which every column has the same element type, because a matrix can hold only one type. When the columns have different types, you may want to read the data in as a data.frame instead. You can do this either by reading a chunk in as a matrix and then converting it to a data.frame, or by using the dstrsplit() function, which parses a chunk directly into a data.frame.
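As a rough illustration of the dstrsplit() route, the sketch below parses a small in-memory chunk into a data.frame. It assumes the iotools package is installed. The sample data and the column names id, rate, and label are made up for illustration and are not part of the exercise.

```r
library(iotools)

# Hypothetical chunk: raw bytes with newline-separated rows,
# similar to what chunk.apply() passes to its processing function
chunk <- charToRaw("1,2.5,a\n2,3.5,b\n")

# col_types gives one type per column, so mixed types are preserved
df <- dstrsplit(chunk,
                col_types = c("integer", "numeric", "character"),
                sep = ",")

# Assign illustrative column names and inspect the result
colnames(df) <- c("id", "rate", "label")
str(df)
```

Because each column keeps its own type, integer, numeric, and character data can sit side by side, which a matrix from mstrsplit() cannot do.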

This exercise is part of the course Scalable Data Processing in R.

Exercise instructions

In the function make_msa_table(), read each chunk in as a data frame.
Call chunk.apply() to read the data in chunks.
Get the total counts for each column by summing over all the rows.
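The general pattern behind these steps can be sketched on toy data. This is a minimal example under stated assumptions, not the exercise solution: the temporary file, its three integer columns, the helper name col_totals, and the small CH.MAX.SIZE value are all hypothetical. It relies on the default merge function of chunk.apply(), rbind, so each chunk contributes one row to the result.

```r
library(iotools)

# Hypothetical toy file: 1000 rows, three integer columns, no header
path <- tempfile(fileext = ".csv")
writeLines(rep(c("1,0,2", "0,1,1"), 500), path)

# Processing function (illustrative name): parse the chunk as a
# data frame and return the per-column totals for that chunk
col_totals <- function(chunk) {
  x <- dstrsplit(chunk, col_types = rep("integer", 3), sep = ",")
  colSums(x)
}

# Open a binary connection and process the file chunk by chunk;
# the small CH.MAX.SIZE is only there to make the toy file span several chunks
fc <- file(path, "rb")
per_chunk <- chunk.apply(fc, col_totals, CH.MAX.SIZE = 1024)
close(fc)

# Each row of per_chunk holds one chunk's totals, so summing over
# the rows gives the totals for the whole file
totals <- colSums(per_chunk)
print(totals)

# Clean up the temporary file
unlink(path)
```

Summing the per-chunk results works here because addition is associative: the totals do not depend on where the chunk boundaries fall.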

Hands-on interactive exercise

Try this exercise by completing the sample code below.

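# Load iotools, which provides dstrsplit() and chunk.apply()
library(iotools)

# Note: col_names and msa_map are assumed to be defined already in the workspace

# Define the function to apply to each chunk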
make_msa_table <- function(chunk) {
    # Read each chunk as a data frame
    x <- ___(chunk, col_types = rep("integer", length(col_names)), sep = ",")
    # Set the column names of the data frame that's been read
    colnames(x) <- col_names
    # Create new column, msa_pretty, with a string description of where the borrower lives
    x$msa_pretty <- msa_map[x$msa + 1]
    # Create a table from the msa_pretty column
    table(x$msa_pretty)
}

# Create a file connection to mortgage-sample.csv
fc <- file("mortgage-sample.csv", "rb")

# Read the first line to get rid of the header
readLines(fc, n = 1)

# Read the data in chunks
counts <- ___(fc, ___, CH.MAX.SIZE = 1e5)

# Close the file connection
close(fc)

# Aggregate the counts as before
___
