readxl (2)

1. readxl (2)

Next to the path and sheet arguments that we discussed in the previous video,

2. read_excel()

there are also other arguments you can specify in the read_excel function. Have a look at this documentation; can you tell the meaning of these arguments? path and sheets have no more secrets for you.

3. read_excel() - col_names

Then, there's col_names, which can be three things: TRUE, the default, which means what the first row of the Excel sheet contains the column names. It can also be FALSE, in which case R comes up with its own names. Finally, you can also set it to a character vector specifying the column names. Excel can contain different data types, such as text, numerics and dates. How this converts to R classes is controlled in the col_types argument:

4. read_excel() - col_types

By default it is NULL, which has R guess the data types of the different columns. But you can also manually specify the col_types. Suppose you want to import all the columns of the first sheet as text. This command will do the trick: The column classes we can see in the printout reveal that indeed, the pop_data variable contains two columns that are both character now, perfect. Other keywords to enforce a type are "numeric", "date" and "blank". "numeric" and "date" are straightforward, but what about "blank"?

5. read_excel() - col_types

Well, if you use blank, you're simply ignoring that column. If we adapt the previous call, like this, you'll see that the population column is missing. This "blank" keyword can come in handy when you have an Excel file with a lot of columns and you only need some of them in R.

6. read_excel() - skip

Next, there's the skip argument. It specifies the number of rows in your excel sheet R has to skip before actually importing the data. Let's say the first two rows in the first sheet of cities-dot-xlsx are not necessary for our analysis. To ignore them, we can simply set the skip argument to 2 and read_excel will not import these lines. we'll also have to set the col_names argument, because the first row with the column names is skipped as well. If we try out this code, indeed 2 rows were skipped; the row containing the column names and the first observations. That leaves us with the last three observations.

7. Wrap-up

Actually, the excel_sheets and read_excel are the only functions that are available in the readxl package. These two functions and the customization possibilities are all you need to get started with your excel data in R. On top of all that, readxl is extremely fast. You might have recognized many of the arguments of the readr package of the previous chapter. Hadley Wickham made reading data from different data sources pretty consistent. This can only make your job of importing data easier, right?

8. Let's practice!

Get some practice with it in the exercises!

This exercise is part of the course

Introduction to Importing Data in R

BeginnerSkill Level

4.8+

Start Course for Free

A lot of data comes in the form of flat files: simple tabular text files. Learn how to import the common formats of flat file data with base R functions.

Exercise 1: Introduction & read.csv Exercise 2: read.csv Exercise 3: Reading from a path Exercise 4: read.delim & read.table Exercise 5: read.delim Exercise 6: Assigning column names Exercise 7: Column classes Exercise 8: read.table Exercise 9: Final Thoughts

In addition to base R, there are dedicated packages to easily and efficiently import flat file data. We'll talk about two such packages: readr and data.table.

Exercise 1: readr: read_csv & read_tsv Exercise 2: read_csv Exercise 3: read_tsv Exercise 4: readr: read_delim Exercise 5: read_delim Exercise 6: skip and n_max Exercise 7: col_types Exercise 8: col_types with collectors Exercise 9: data.table: fread Exercise 10: fread Exercise 11: fread: more advanced use Exercise 12: Dedicated classes

Excel is a widely used data analysis tool. If you prefer to do your analyses in R, though, you'll need an understanding of how to import .csv data into R. This chapter will show you how to use readxl to do so.

Exercise 1: readxl (1)Exercise 2: List the sheets of an Excel file Exercise 3: Import an Excel sheet Exercise 4: Reading a workbook Exercise 5: readxl (2)

Current Exercise

Exercise 6: The col_names argument Exercise 7: The skip argument

Beyond importing data from Excel, you can take things one step further with XLConnect. Learn all about it and bridge the gap between R and Excel.

Exercise 1: Reading sheets Exercise 2: Connect to a workbook Exercise 3: List and read Excel sheets Exercise 4: Customize readWorksheet Exercise 5: Adapting sheets Exercise 6: Add worksheet Exercise 7: Populate worksheet Exercise 8: Renaming sheets Exercise 9: Removing sheets