
foreign

1. foreign

You already know how to import your data using Wickham's haven package, great!

2. foreign

However, I also told you that there is an alternative, the foreign package, written by the R Core Team. Although it's somewhat less consistent in naming and use, it's a very comprehensive tool that can work with all kinds of foreign data formats. Apart from importing SAS, STATA and SPSS files, it can also handle more exotic formats, such as those from Systat and Weka. It's also able to export data to various formats. I'll only discuss importing SAS, STATA and SPSS data though. Before I get to it, let me install and load the foreign package.
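As a minimal sketch, installing and loading could look like this. The requireNamespace() check is my own addition so the package is only installed when it is not there yet; foreign normally ships with R as a recommended package.

```r
# Install foreign only if it is not available yet, then attach it.
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}
library(foreign)
```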

3. SAS

Let's start with SAS. Here the first drawback of foreign in comparison to haven emerges: foreign cannot import single SAS data files, such as .sas7bdat files. With foreign, only so-called SAS libraries can be read. These libraries are usually in the XPORT transport format. If you are really looking for an alternative to haven here, you can check out a package called sas7bdat.
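For SAS libraries in the XPORT format, foreign provides read.xport(). The sketch below assumes a hypothetical file called sales.xpt, so the path and the object name are placeholders and the import is skipped when the file does not exist.

```r
library(foreign)

# Hypothetical path to a SAS XPORT library; replace it with your own file.
xport_path <- "sales.xpt"

if (file.exists(xport_path)) {
  # A library holding one dataset gives a data frame,
  # a library holding several gives a list of data frames.
  sales <- read.xport(xport_path)
  str(sales)
} else {
  message("No file found at ", xport_path, ", nothing imported.")
}
```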

4. STATA

When it comes to STATA, foreign can read .dta files of Stata versions 5 to 12. You can do this with the read.dta() function. As you saw before, the R core team packages, such as utils and this foreign package, use dots in their function names, while Wickham's packages use underscores. Have a look at this simplified usage of the read.dta() function. As you probably expected, you first have to define a file path. This could be a local file or a URL.

5. read.dta()

Basically, this is already sufficient to import a data set, as this call to import the US airlines punctuality data set shows.
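To try this without the course file, here is a sketch that first writes a small made-up data frame to a temporary .dta file with write.dta() and then imports it again. The flights data and its values are invented stand-ins, not the actual airline data set.

```r
library(foreign)

# Made-up stand-in for the airline data; the values are illustrative only.
flights <- data.frame(
  airline = factor(c("American", "Delta", "United")),
  delay   = c(12.5, 8.0, 15.2)
)

# Write a Stata file to a temporary path so there is something to import.
dta_path <- tempfile(fileext = ".dta")
write.dta(flights, dta_path)

# The file path alone is enough; all other arguments keep their defaults.
imported <- read.dta(dta_path)
str(imported)
```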

6. read.dta()

Have a look at the structure of this data frame. The Airline variable is already a factor. This is because the convert.factors argument of the read.dta() function is TRUE by default, which automatically creates factors from labelled STATA values. This is something you had to do manually in the haven package with as_factor(), remember?

7. read.dta() - convert.factors

Let's see what happens if we set convert.factors to FALSE. The Airline column is now integer. Is the information on airlines lost then? Not at all. Notice all the information that is stored in the data frame's attributes. From the version attribute, for example, you can tell that we're dealing with a STATA 7 file. The label.table attribute contains a mapping between the integer airline codes and their actual names. To work with the dataset easily, you'll want to stick to the default value of convert.factors though, which is TRUE.
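The same experiment can be sketched on a small made-up file. Again, the flights data is an invented stand-in, written with write.dta() only so there is a Stata file to read back.

```r
library(foreign)

# Made-up stand-in data, written to a temporary Stata file.
flights <- data.frame(
  airline = factor(c("American", "Delta", "United")),
  delay   = c(12.5, 8.0, 15.2)
)
dta_path <- tempfile(fileext = ".dta")
write.dta(flights, dta_path)

# Keep the underlying codes instead of building factors.
raw <- read.dta(dta_path, convert.factors = FALSE)
str(raw$airline)

# The labels are not lost; they are kept in the attributes.
attr(raw, "version")      # Stata format version of the file
attr(raw, "label.table")  # mapping between codes and labels
```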

8. read.dta() - more arguments

Similar to convert.factors, there's also convert.dates to specify whether you want STATA time and date information to be converted to R Date and POSIXct objects. As this is something you'll typically want to do, the default here is TRUE. Finally, I also wanted to mention the missing.type argument. If you're familiar with STATA 8 and later, you'll know that there is support for different types of missing values, 27 of them to be precise. In R, there's only one type of missing value, NA. If you set the missing.type argument to FALSE, all these different missing values are converted to NA. If it's set to TRUE, a list with information on how different values for different variables are missing is included in the attributes of the returned data frame.
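Here is a small sketch of missing.type. It assumes a made-up delays data frame, written as a Stata 8 file through the version argument of write.dta(), because the different missing types only exist from version 8 onwards.

```r
library(foreign)

# Made-up data with one missing value, written as a Stata 8 format file.
delays <- data.frame(delay = c(12.5, NA, 15.2))
dta_path <- tempfile(fileext = ".dta")
write.dta(delays, dta_path, version = 8L)

# Default: every kind of Stata missing value simply becomes NA.
plain <- read.dta(dta_path)
is.na(plain$delay)

# missing.type = TRUE additionally stores the kind of missingness
# per variable in the "missing" attribute.
typed <- read.dta(dta_path, missing.type = TRUE)
attr(typed, "missing")
```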

9. SPSS

Importing SPSS files with foreign works in much the same way. This time though, you'll need the read.spss() function. Not very surprising, is it? Have another look at a trimmed down version of its usage. As usual, the file path comes first. For the rest, all argument names are different when you compare them to the read.dta() function from before. This is what I meant by not really consistent. use.value.labels, which is TRUE by default, specifies whether variables that are labelled vectors in SPSS should be converted to R factors. This argument is thus similar to the convert.factors argument from read.dta(). The to.data.frame argument tells R whether or not to return the SPSS data as a data frame. Strangely, it's FALSE by default, which has foreign build a list containing all the different columns. But you already know that a data frame is simply a special kind of list, so the difference is not that big. Next to these two arguments, there are many more, such as trim.factor.names, trim_values and use.missings. Their purpose is often similar to what you've seen for the read.dta() function, but not always. Foreign aims at a specific treatment of different types of data files. This does not benefit consistency, but it provides full control over how data files are actually imported. To learn more about importing data with foreign, you can always consult the documentation. But save that for later,

10. Let's practice!

first head over to the interactive exercises to get some practice. There you'll see how different arguments influence the way the data is imported.
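If you want to try read.spss() right away, here is a sketch that uses electric.sav, an example file that ships with the foreign package itself. The object names are my own, and the file is not the data set used in this course.

```r
library(foreign)

# Example SPSS file that is installed together with the foreign package.
sav_path <- system.file("files", "electric.sav", package = "foreign")

# Default: to.data.frame = FALSE, so the result is a plain list of columns.
as_list <- read.spss(sav_path)
class(as_list)

# Ask for a data frame explicitly.
as_df <- read.spss(sav_path, to.data.frame = TRUE)
str(as_df)

# Keep the raw codes instead of converting labelled variables to factors.
codes <- read.spss(sav_path, to.data.frame = TRUE, use.value.labels = FALSE)
str(codes)
```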
