Reading chunks as a data.frame

In the previous example, the processing function read each chunk as a matrix using mstrsplit(). That works well for rectangular data in which every column has the same element type, since a matrix can hold only one type. When that is not the case, it may be better to read the data as a data.frame. You can either read a chunk as a matrix and then convert it to a data.frame, or use the dstrsplit() function.
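The two approaches can be contrasted on a tiny in-memory chunk. This is a minimal sketch that assumes the iotools package is installed; the three lines of data and the column names id, n and value are hypothetical, chosen only to mix a text column with numeric ones.

```r
library(iotools)

# Hypothetical chunk: three CSV lines whose columns have different types
lines <- c("A,1,2.5", "B,2,3.5", "C,3,4.5")

# mstrsplit() returns a matrix, so every column shares one type (character here)
m <- mstrsplit(lines, sep = ",", type = "character")

# Option 1: read as a matrix, then convert each column to its proper type by hand
df1 <- data.frame(id = m[, 1],
                  n = as.integer(m[, 2]),
                  value = as.numeric(m[, 3]),
                  stringsAsFactors = FALSE)

# Option 2: dstrsplit() parses directly into a data.frame with one type per column
df2 <- dstrsplit(lines, col_types = c("character", "integer", "numeric"), sep = ",")
colnames(df2) <- c("id", "n", "value")
```

Option 2 avoids building an intermediate all-character matrix and converting each column afterwards.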

This exercise is part of the course Scalable Data Processing in R.

Exercise instructions

In the make_msa_table() function, read each chunk as a data.frame.
Call chunk.apply() to read the data in chunks.
Get the total count for each column by summing over all rows.
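The final aggregation step can be illustrated in base R alone. In this sketch, two hypothetical per-chunk tables stand in for the results that make_msa_table() would return; they are stacked with rbind(), which is chunk.apply()'s default merge function (CH.MERGE = rbind), and then summed column by column.

```r
# Two hypothetical per-chunk results: counts of borrowers by location
chunk1 <- table(c("rural", "urban", "urban"))
chunk2 <- table(c("urban", "rural", "rural", "urban"))

# Stacking the tables gives a matrix with one row per chunk, one column per category
counts <- rbind(chunk1, chunk2)

# Summing over all rows yields the total count for each column (category)
totals <- colSums(counts)
totals
```

Because rbind() combines its arguments by position rather than by name, this relies on every chunk producing the same categories in the same order. Building the table from a factor with fixed levels, for example table(factor(labels, levels = c("rural", "urban"))), keeps the columns aligned even when a chunk lacks a category.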


library(iotools)

# Assumes col_names (the CSV's column names) and msa_map (labels for the
# msa codes) were already defined in an earlier exercise of the course

# Define the function to apply to each chunk
make_msa_table <- function(chunk) {
    # Read each chunk as a data frame
    x <- dstrsplit(chunk, col_types = rep("integer", length(col_names)), sep = ",")
    # Set the column names of the data frame that's been read
    colnames(x) <- col_names
    # Create new column, msa_pretty, with a string description of where the borrower lives
    x$msa_pretty <- msa_map[x$msa + 1]
    # Create a table from the msa_pretty column
    table(x$msa_pretty)
}

# Create a file connection to mortgage-sample.csv
fc <- file("mortgage-sample.csv", "rb")

# Read the first line to get rid of the header
readLines(fc, n = 1)

# Read the data in chunks; by default chunk.apply() merges the per-chunk tables with rbind()
counts <- chunk.apply(fc, make_msa_table, CH.MAX.SIZE = 1e5)

# Close the file connection
close(fc)

# Aggregate the counts as before: sum each column over all chunks
colSums(counts)
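The exercise code depends on mortgage-sample.csv and on col_names and msa_map from earlier exercises. The following self-contained sketch runs the same pipeline against a small temporary CSV instead. It assumes iotools is installed, and the two-column layout (msa, year), the data values and the msa_map labels are all hypothetical stand-ins. As a small variation on the exercise, it builds each table from a factor with fixed levels so that every chunk's table has the same columns.

```r
library(iotools)

# Hypothetical stand-ins for objects the exercise assumes already exist
col_names <- c("msa", "year")
msa_map <- c("rural", "urban")   # msa code 0 -> "rural", 1 -> "urban"

# Write a small hypothetical CSV (with a header line) to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("msa,year", "0,2010", "1,2010", "1,2011", "0,2011", "1,2012"), path)

make_msa_table <- function(chunk) {
  # Parse the raw chunk straight into a data.frame of integer columns
  x <- dstrsplit(chunk, col_types = rep("integer", length(col_names)), sep = ",")
  colnames(x) <- col_names
  # Fixed factor levels keep every per-chunk table the same shape,
  # so rbind() lines up the columns even if a chunk lacks a category
  table(factor(msa_map[x$msa + 1], levels = msa_map))
}

fc <- file(path, "rb")
invisible(readLines(fc, n = 1))   # skip the header line
counts <- chunk.apply(fc, make_msa_table, CH.MAX.SIZE = 1e5)
close(fc)

# Total count per category across all chunks
colSums(counts)
```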
