Get startedGet started for free

SQL Queries from inside R

1. SQL Queries from inside R

So, connecting to and importing from a database is something you're familiar with by now.

2. dbReadTable()

However, with dbReadTable,

3. dbReadTable()

you're importing an entire table. For the company example that had little data, that's not a problem, but what if you're dealing with a database that contains tables with millions of records? You'd have to import the entire table into R before you can do an analysis that might only need a fraction of this data.

4. dbReadTable()

If there was a way to perform a lot of the data selection work

5. dbReadTable()

on the database side,

6. dbReadTable()

you only have to import those elements that you actually need inside R. This makes much more sense, doesn't it?

7. Selective importing

I am glad to tell you that this is possible, all from inside R! Remember that relational databases typically use SQL as the language for querying? Well, you'll be writing so called SQL Queries to retrieve data based on specific criteria. You can send these queries through R functions specified by the DBI package and implemented by an R package that depends on the database you're using. As we're working with MySQL databases, that's RMySQL here. Writing SQL queries is a entire topic of its own, so I will only treat some basic examples so that you got the idea.

8. company

Let's have a look at the company database again, that contained informations on sales of telecom products made by different employees.

9. company

Suppose that we want to have the names of the employees that started after the first of september in 2012. How to go about this?

10. Load package and connect

In any case, we have to start with loading the DBI package, and creating a connection to the company database. This is an actual database, so you can try the code in this video yourself!

11. Example 1

To solve the task at hand, you can read the entire employees table, and then subset it using the subset function. But there is another way. Have a look at this call of the dbGetQuery function. The result is exactly the same. The strange syntax inside the string here, is actually a very common SQL query, that uses three SQL keywords: SELECT, FROM, and WHERE. The SELECT keyword specifies which column to select, and corresponds to the select argument in the subset function. The FROM keyword specifies which table you want to get data from, and corresponds to the first argument in the subset function. Finally, the WHERE keyword specifies a condition that a record in the table has to meet. If you read this sentence out loud with some additions here and there, it sounds pretty natural: "Select the name column from the employees table, where the started_at field is greater than the first of september in 2012." The first approach and the second approach might seem similar, but the conceptual difference is huge.

12. Example 1

In the first case,

13. Example 1

you're importing the entire employees table,

14. Example 1

and then do some subsetting in R. In the second case,

15. Example 1

you're sending a SQL query to the database,

16. Example 1

this query is run on the database side, and

17. Example 1

only the results are imported into R. If you're dealing with huge tables, the second approach is way more efficient. Let's try another example.

18. company

Suppose you want to select the products that imply a contract,

19. company

so where contract is one. For these products, we're interested in all variables. The old approach of reading the entire table and then subsetting

20. Example 2

would look like this. The new approach of sending an SQL query to the database and fetching the result, looks like this. The results are again exactly the same. Notice how the star after the SELECT keyword specifies to keep all columns from the products table. Also notice that in the SQL query, you have to use a single equals sign to specify a condition for the WHERE keyword, instead of a double equals sign, like you're used to in R.

21. Let's practice!

After this crash course on SQL syntax, let's see how you perform in some exercises. Good luck!