
Advanced file reading

1. Advanced file reading

In this lesson, you will look at some of the more advanced features of data.table's fread().

2. Reading big integers using integer64 type

R can only represent whole numbers up to 2^31 - 1 = 2147483647 as type "integer". read.csv() automatically coerces numbers larger than this to numeric type, which might not be appropriate in some cases. By default, data.table instead sets the type of columns with such large integer values to "integer64", using the bit64 package. You can still override this default with numeric or character types using the colClasses argument.
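
As a rough illustration, here is a minimal sketch using made-up inline data (the column names and values are hypothetical, and the bit64 package is assumed to be installed). It reads a column of large integers with the default settings, and then overrides the type with colClasses.

```r
library(data.table)

# Hypothetical inline data: the id values are larger than 2^31 - 1
csv <- "id,value\n1234567890123,1.5\n9876543210987,2.5\n"

# Default behaviour: the large integers are read as integer64
dt_default <- fread(text = csv)
class(dt_default$id)

# Override the default with colClasses to get character or numeric instead
dt_chr <- fread(text = csv, colClasses = c(id = "character"))
dt_num <- fread(text = csv, colClasses = c(id = "numeric"))
class(dt_chr$id)
class(dt_num$id)
```

Reading identifiers as character is often the safer choice when you never need to do arithmetic on them.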

3. Specifying column class types with colClasses

If you don't want to rely on fread()'s default column type guessing, you can use the colClasses argument to override the column types. colClasses can be a named or unnamed vector of column classes, similar to read.csv().

4. Specifying column class types with colClasses

If the vector is named, each class is assigned to the column with the matching name before parsing. If it is unnamed, the first column is parsed using the first class, the second column using the second class, and so on.
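
The two forms could look something like this. It is a small sketch on made-up inline data, and the column names a, b and c are only for illustration.

```r
library(data.table)

# Hypothetical inline data with three columns
csv <- "a,b,c\n1,2,x\n3,4,y\n"

# Unnamed vector: classes are matched to columns by position
dt_pos <- fread(text = csv,
                colClasses = c("numeric", "character", "character"))
sapply(dt_pos, class)

# Named vector: only column b is overridden, the rest are guessed by fread()
dt_named <- fread(text = csv, colClasses = c(b = "character"))
sapply(dt_named, class)
```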

5. Specifying column class types with colClasses

In addition, you can provide a named list of vectors, where the names correspond to the column classes and the values correspond to the column names or numbers. This is particularly useful when there are many columns but only a few column types. In this example, instead of specifying "numeric" four times for the first four columns, you can specify numeric = 1:4 to parse the first four columns as numeric type.
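
A minimal sketch of the list form, again on made-up inline data (the five column names are hypothetical):

```r
library(data.table)

# Hypothetical inline data: four number columns followed by a text column
csv <- "w,x,y,z,label\n1,2,3,4,a\n5,6,7,8,b\n"

# Names are the classes; values are column numbers or column names
dt <- fread(text = csv,
            colClasses = list(numeric = 1:4, character = "label"))
sapply(dt, class)
```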

6. The fill argument

When a file has rows with missing columns, it is not always possible to parse it unambiguously. The fill argument can be used in these cases to explicitly direct fread() to fill the missing entries. In the first example, you can see that fread() has some trouble reading the data correctly. This is because fill is set to FALSE by default.

7. The fill argument

When you set fill to TRUE, fread() can parse the data properly. Missing entries in character columns are filled with empty strings, and missing entries in integer, logical and numeric columns are filled with NA.
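
To try this yourself, here is a small sketch with made-up inline data in which the second row is missing its last field. The column names are hypothetical, and sep is set explicitly only to keep the example unambiguous.

```r
library(data.table)

# Hypothetical inline data: the second row has no value for score
csv <- "id,name,score\n1,x,10\n2,y\n3,z,30\n"

# fill = TRUE pads the short row, so all three rows are kept
dt <- fread(text = csv, sep = ",", fill = TRUE)
dt

# The padded entry in the integer column is NA
is.na(dt$score)
```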

8. The na.strings argument

Not all files encode missing values in the same way. You can use the na.strings argument to parse all such values as NA. na.strings accepts a character vector of values that are replaced with NA. Since this is done while parsing, it is both memory efficient and fast.
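
For example, a file might mark missing values as "N/A" in one column and "missing" in another. The sketch below uses made-up inline data with those two hypothetical markers.

```r
library(data.table)

# Hypothetical inline data with two different missing-value markers
csv <- "id,score,grade\n1,10,A\n2,missing,B\n3,30,N/A\n"

# Every value listed in na.strings is read as NA while parsing
dt <- fread(text = csv, na.strings = c("N/A", "missing"))
dt

# score can now be parsed as a number column instead of character
sapply(dt, class)
```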

9. Let's practice!

Now it's your turn to import more exotic files using fread().
