Reading a .tsv file
In the previous exercise, you may have noticed this message when you called read_csv():
Parsed with column specification:
cols(
  weight = col_integer(),
  feed = col_character()
)
This message appears every time a readr function is called to import data without an explicit column specification. In fact, all readr functions that import data have the argument col_types, which allows for custom column specifications. The message shows how the col_types argument was specified by default. Notice that the column specification is created by cols(), and in it are the names of the columns and some col_*() functions that tell R to import a certain column as the type named by *. For example, the weight column was imported as integer and the feed column as character, by default. One special col_*() function is col_skip(), which tells R to skip that column when importing the data.
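As a minimal sketch of a custom column specification, the snippet below writes a small, made-up CSV file (the file, its values, and the id column are assumptions for illustration, not the course data) and imports it with explicit col_*() functions, dropping one column with col_skip().

```r
library(readr)

# Hypothetical sample data written to a temporary file (not the course data)
path <- tempfile(fileext = ".csv")
writeLines(c("weight,feed,id", "179,horsebean,1", "160,linseed,2"), path)

# Spell out the column types instead of relying on readr's guesses;
# col_skip() leaves the id column out of the result
chicks <- read_csv(
  path,
  col_types = cols(
    weight = col_integer(),
    feed   = col_character(),
    id     = col_skip()
  )
)

# Only the columns that were not skipped remain
names(chicks)
```

Because col_types is supplied, readr has no need to guess the types here, and the id column never appears in the imported data frame.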
All readr functions that import data also have the argument col_names, which is used when you want to name the columns differently from what's in the data file. This argument takes either TRUE, FALSE, or a character vector of column names. If set to TRUE, the first row of the input will be used as the column names. If FALSE, column names will be generated automatically: X1, X2, X3, etc. If col_names is a character vector, the values will be used as the names of the columns and the first row of the input will be read into the first row of the output data frame.
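The three forms of col_names can be compared side by side. This is only a sketch on a made-up file; the file contents and the names grams and diet are assumptions for illustration. Passing col_types = cols() keeps the default type guessing while silencing the column specification message.

```r
library(readr)

# Hypothetical sample data with a header row (not the course data)
path <- tempfile(fileext = ".csv")
writeLines(c("weight,feed", "179,horsebean", "160,linseed"), path)

# col_names = TRUE (the default): the first row supplies the column names
with_header <- read_csv(path, col_names = TRUE, col_types = cols())

# col_names = FALSE: names are generated as X1, X2, ... and the
# header row is read as the first row of data
no_header <- read_csv(path, col_names = FALSE, col_types = cols())

# Character vector: these names are used, and the header row is
# again read as the first row of data
custom <- read_csv(path, col_names = c("grams", "diet"), col_types = cols())

names(with_header)
names(no_header)
names(custom)
```

Note that in the last two calls the original header ends up in the data, so those forms are most useful for files that have no header row.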
These arguments are useful when you know you want to import certain columns of the data as certain types with certain names. The readr package does a great job guessing what each column type and name should be, but it's important to know that you can also customize this further with the col_names and col_types arguments.
In this exercise, you’ll import a set of data on professors’ salaries called Salaries.tsv with read_tsv(), another readr function that imports files with tab-separated values. This time, you’ll also provide custom column specifications when you're reading in the data.
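To show how read_tsv() combines with both arguments, here is a small sketch on a made-up tab-separated file with no header row. The file and its values are assumptions for illustration and are unrelated to Salaries.tsv.

```r
library(readr)

# Hypothetical tab-separated data without a header row
path <- tempfile(fileext = ".tsv")
writeLines(c("Ada\tmath\t12", "Grace\tcs\t15"), path)

# Autogenerate the names X1, X2, X3, then refer to those names
# in cols() to set types and to skip the middle column
staff <- read_tsv(
  path,
  col_names = FALSE,
  col_types = cols(
    X1 = col_character(),
    X2 = col_skip(),
    X3 = col_integer()
  )
)

head(staff)
```

The autogenerated names are what the cols() specification matches against, which is why X2 can be skipped even though the file itself contains no column names.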
This exercise is part of the course
Reading Data into R with readr
Exercise instructions
In this exercise and all following, the readr package will be preloaded in your workspace so you don't need to load it yourself with library(readr).
- Use the read_tsv() function to read in the Salaries.tsv file. Set col_names so that R autogenerates the column names, and supply a custom cols() specification that skips columns X2, X3, and X4. Store the result in an object called salaries.
- View the head() of salaries.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
## readr is loaded
# Import data: salaries
salaries <- read_tsv(___, col_names = ___,
                     col_types = cols(
                       X2 = ___,
                       X3 = ___,
                       X4 = ___
                     ))
# View first six rows of salaries
head(salaries)