1. Reading sheets
Now imagine this situation.
2. Scenario
You're working in a big company that uses Excel for all there analysis work. You are forced to work on and deliver these Excel files, but you want to work on the data through R, so that you can work in a reproducible way? I'm happy to tell you that there is a package that does just that:
3. XLConnect
XLConnect, written and maintained by Martin Studer. He created one of the most comprehensive packages for working with Excel files through R.
You can think of XLConnect as a bridge between Excel and R. That means you can do practically any action you could do within Excel but you do it from inside R. Editing Excel sheets, formatting data and adapting entire calculation sheets, you name it. XLConnect has a function for it. XLConnect works with xls and xlsx files, and has easy-to-use functions.
To get this all working so smoothly, XLConnect depends on Java. This is totally abstracted away for us end-users, but installing the package can have its difficulties.
4. Installation
If you're starting from a reasonably clean computing environment, this traditional install-dot-packages command will work fine. From the messaging, you'll see that it also installs the XLConnectJars package containing Java files and class definitions that XLConnect depends on. If it wasn't installed already, the rJava package will also be installed, providing a low-level R to Java interface that XLConnect uses.
If something goes wrong during installation, it's possible that you first have to install the Java Development Kit, or JDK, from Oracle's web site. If things still don't work out, I suggest you google the errors you're getting: there's quite some people using this package so help is never far away.
With our package installed and not to forget, loaded,
5. loadWorkbook()
let's take the first step: loading a workbook into R. You do this with the loadWorkbook function, by simply passing the name of the excel file you want to interface to. Assuming that our cities-dot-xlsx file is still in the current working directory, this call works.
If you have a look at the structure of book, we see that it is a so-called workbook object. This object is the actual "bridge" between R and Excel I talked about earlier.
After building a workbook object in R, you can use it to get information on the Excel file it links to. To get the names of the different sheets,
6. getSheets()
for example, you can use getSheets.
The result is exactly the same to the excel_sheets function from readxl: a character vector containing the two sheet names.
Apart from sheet information,
7. readWorksheet()
you can also read the actual data from the sheets, like readxl's read_excel function and gdata's read dot xls function. Suppose we want to import the data from the year_2000 sheet as a data frame. As the first argument to readWorksheet, we pass the workbook object, book in our case. The second argument, sheet, is the name or index of the sheet you want to import from.
Works just like before. The cool thing here is, that you can easily specify from which row and which column to start reading information.
8. readWorksheet()
Say you only want the population information for Berlin and Madrid. You can simply set startRow to 3, endRow to 4 and startCol to 2. Because you skipped the first row, the column names are also skipped, so you should set header to FALSE.
9. Let's practice!
Before we take the next step, let's train a bit with some exercises.