Limitations of bigmemory
1. Limitations of bigmemory
2. Where can you use bigmemory?
The bigmemory package is useful when your data are represented as a dense, numeric matrix and you can store the entire matrix on your hard drive. It is also compatible with optimized, low-level linear algebra libraries written in C, like Intel's Math Kernel Library, so you can use bigmemory directly in your C and C++ programs for better performance. If your data aren't numeric (for example, if you have string variables), or if you need a greater range of numeric types (like 1-bit integers), then you might consider trying the ff package. It is similar to bigmemory but includes structures similar to a data frame.
3. Understanding disk access
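As a concrete reference point, here is a minimal sketch of a dense, numeric, file-backed big matrix and a few element lookups. The file names, the matrix dimensions, and the use of `tempdir()` as the backing directory are assumptions chosen for illustration only.

```r
library(bigmemory)

# Hypothetical scratch directory for the backing and descriptor files.
backing_dir <- tempdir()

# A dense 1000 x 3 matrix of doubles whose contents live in a file on disk.
x <- filebacked.big.matrix(
  nrow = 1000, ncol = 3, type = "double", init = 0,
  backingfile = "example.bin",
  descriptorfile = "example.desc",
  backingpath = backing_dir
)

# Elements are read and written by index, wherever they sit in the matrix.
x[1, 1] <- 3.14
x[1000, 3] <- 42
first_value <- x[1, 1]
last_column <- x[c(1, 1000), 3]

# The descriptor file lets the same on-disk matrix be attached again later
# without re-importing the data.
y <- attach.big.matrix(file.path(backing_dir, "example.desc"))
reattached_value <- y[1000, 3]
```

The subsetting syntax is the same as for an ordinary R matrix; what differs is where the values come from.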
After creating a big matrix object you can instantly access any of its elements. If the elements are already stored in RAM (that is, cached), they are returned immediately. If they are not in RAM, they are first moved from the disk to RAM and then returned. Since a big matrix can access any of its elements equally quickly, we call it a random access data structure.
4. Disadvantages of random access
However, this ability to quickly access any element comes with its own challenges. For example, if we want to add columns or rows, we have to create an entirely new big matrix object of the appropriate size, copy the old big matrix elements to their appropriate positions, and add the new values. It also means we need enough disk space to hold the entire matrix in one big block. In practice, many of the larger data sets you encounter won't require random access, and storing all of the data in a single file may not be feasible. It may be sufficient to retrieve contiguous "chunks" of rows from different locations, process them, and move on to the next chunk.
5. Let's practice!
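To recap the cost of growing a big matrix, the three steps described earlier (create a new matrix, copy the old elements, add the new values) can be sketched as follows. The tiny dimensions and the values in `new_col` are assumptions for illustration; a real big matrix would usually be file-backed and far larger.

```r
library(bigmemory)

# A small big matrix standing in for a large one.
old <- big.matrix(nrow = 5, ncol = 2, type = "double", init = 1)

# Hypothetical values for the column we want to append.
new_col <- seq_len(nrow(old)) * 10

# 1. Allocate an entirely new big matrix with room for the extra column.
grown <- big.matrix(nrow = nrow(old), ncol = ncol(old) + 1, type = "double")

# 2. Copy the old elements across, one column at a time, so that only a
#    single column has to be held in RAM during the copy.
for (j in seq_len(ncol(old))) {
  grown[, j] <- old[, j]
}

# 3. Add the new values in the last column.
grown[, ncol(old) + 1] <- new_col
```

Until `old` is discarded, both copies exist at once, which is why appending to a big matrix needs space for the old and the new matrix together.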
In the next chapter we'll take a look at tools for processing data in contiguous chunks. But first, we are going to finish up this chapter with one more exercise.