R Performance Tuning – Tips to Improve R Speed & Memory

1. R Performance Tuning

In this R tutorial, we discuss R performance tuning techniques. We look at the factors that degrade the performance of R programs and cover tips for improving speed and memory use in R.


2. R Performance Tuning Techniques

2.1. Writing fast R code

In this section, we discuss the factors that slow down R code and then cover how to write R code that runs faster.
a) Is R slow?

  • R programs can be slow, but well-written R programs are usually fast enough.
  • Speed was not the primary design criterion.
  • R was designed to make programming easier.
  • Slow programs are often the result of bad programming practices or of not understanding how R works.
  • There are various options for calling C or C++ functions from R.

b) Write R code that runs faster
R is a popular statistical software environment, known for its enormous number of packages. R’s syntax is very flexible, which makes it convenient, but this convenience comes at the cost of performance. R can indeed be slow compared to many other scripting languages, but a few tricks can make our R code run faster.
i) Use a matrix instead of a data frame whenever possible. Data frames add overhead in many operations, so use one only when necessary (for example, when the columns have different types).
ii) Use double(n) to create a numeric vector of length n instead of rep(0, n), and likewise integer(n), character(n), and logical(n) for other types.
iii) Split big data objects (e.g., a big data frame or matrix) into smaller ones, and operate on these smaller objects.
iv) Use foreach(i=1:n) %dopar% {} from the foreach package, with a registered parallel backend, to do parallel computing if applicable. If a loop is not parallelizable, foreach(i=1:n) %do% {} runs the same body sequentially, although it is usually not faster than a plain for loop.
v) Use vector and matrix operations if possible. The *apply functions (apply, lapply, sapply, vapply) are also helpful for expressing such operations without explicit loops.
vi) Avoid changing the type and size of an object in R. Although we use R objects as if they were typeless, they do have types. Changing the type or size of an R object forces R to reallocate memory, which is inefficient.
vii) Avoid creating too many objects in each working environment. Not having enough memory can not only make your code run slower but also make it fail when it has to allocate big vectors. One way to avoid this is to write small functions and run them, instead of running everything directly in the working environment, because objects local to a function can be garbage-collected once the function returns.
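
Tips ii) and vi) can be illustrated with a small sketch. The function names grow() and prealloc() are hypothetical, chosen only for this example, and the timings will vary from machine to machine, so none are quoted here.

```r
n <- 1e4

# Growing a vector: each c() call allocates a new, longer vector
# and copies the old contents into it.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating with double(n): the size and type are fixed up front,
# so the loop only fills in existing slots.
prealloc <- function(n) {
  out <- double(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

t_grow <- system.time(a <- grow(n))      # timing for the growing version
t_pre  <- system.time(b <- prealloc(n))  # timing for the preallocated version

identical(a, b)  # both approaches compute the same values
```

Comparing t_grow and t_pre on your own machine shows how much the repeated reallocation costs; the gap typically widens as n grows.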

2.2. The Dreaded for loop

In R, many questions arise about how to accomplish various tasks without for loops. There seems to be a feeling that programmers should avoid these loops at all costs. Those who pose the queries usually have the goal of speeding up their code.
a) Vectorization for Speedup
Sometimes, we can use vectorization instead of looping. For example, if x and y are vectors of equal lengths, you can write this:
z <- x + y
This is not only more compact but, more importantly, faster than an explicit loop over the elements. Understanding the nuts and bolts of vectorization in R helps us write shorter, simpler, safer, and faster code in the first place.
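
For comparison, here is a minimal sketch of the explicit loop next to the vectorized form. The sample values of x and y are made up for illustration.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

# Explicit loop: add the vectors one element at a time.
z_loop <- double(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

# Vectorized form: a single call that loops internally in compiled code.
z <- x + y

identical(z, z_loop)  # the two forms give the same result
```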

2.3. Functional programming and memory issues

a) What is functional programming?
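R is, at its heart, a functional programming (FP) language. This means that it provides many tools for the creation and manipulation of functions. In particular, R has what’s known as first-class functions: functions can be assigned to variables, passed as arguments to other functions, and returned as results.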
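
A short sketch of first-class functions follows. The names square, apply_twice, and make_power are hypothetical and exist only for this illustration.

```r
# A function stored in a variable like any other value.
square <- function(x) x^2

# A function that takes another function as an argument.
apply_twice <- function(f, x) f(f(x))

# A function that builds and returns a new function.
make_power <- function(p) function(x) x^p
cube <- make_power(3)

twice <- apply_twice(square, 3)  # square(square(3))
cubes <- sapply(1:3, cube)       # apply cube to each element
```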
b) Memory issues 
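Most R objects are immutable, or unchangeable: an operation that appears to modify an object is implemented as a function that returns a new value, which is then re-assigned to the same name. This trait can have performance implications, because it may involve copying the object.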
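
As a rough illustration of this re-assignment, the sketch below writes an element assignment in two ways: with the usual syntax, and as an explicit call to the replacement function `[<-` whose result is assigned back. The variable names are arbitrary.

```r
# Usual syntax for changing one element.
x <- c(10, 20, 30)
x[2] <- 99

# The same operation spelled out: call the replacement function `[<-`
# and re-assign its return value to the same name.
y <- c(10, 20, 30)
y <- `[<-`(y, 2, value = 99)

identical(x, y)  # both forms produce the same vector
```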

2.4. Copy-on-change issues

We will discuss an important feature of R that makes it safer to work with data. Suppose we create a simple numeric vector x1:
x1 <- c(1, 2, 3)
Then, we assign the value of x1 to x2:
x2 <- x1
Now, x1 and x2 have exactly the same value. What if we modify an element in one of the two vectors? Will both vectors change?

x1[1] <- 0
x1
## [1] 0 2 3
x2
## [1] 1 2 3

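The output shows that when x1 is changed, x2 remains unchanged. You might guess that the assignment copies the value immediately, so that the new variable points to its own copy of the data. In fact, R postpones the copy: x1 and x2 share the same data until one of them is modified, and only then is a copy made. This behavior is known as copy-on-modify.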
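
One way to observe when the copy happens is tracemem(), which prints a message whenever the traced object is duplicated. It only works if R was built with memory profiling, so the sketch below guards the call with capabilities("profmem"). The helper set_first_zero is hypothetical and shows that the same rule protects a caller's data from changes made inside a function.

```r
x1 <- c(1, 2, 3)
x2 <- x1  # no copy is made at this point

if (capabilities("profmem")) {
  tracemem(x1)     # start reporting duplications of this vector
  x1[1] <- 0       # the modification triggers the copy; a message is printed
  untracemem(x1)
} else {
  x1[1] <- 0       # same modification, without the trace
}

# Modifying an argument inside a function changes only the local copy.
set_first_zero <- function(v) {
  v[1] <- 0
  v
}
y <- set_first_zero(x2)

x2  # still the original values
```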

2.5. Using Rprof() to Find Slow Spots in Your Code

If our R code is running unnecessarily slowly, a handy tool for finding the culprit is Rprof():
a) Monitoring with Rprof()
We call Rprof() to start the monitor, run our code, and then call Rprof(NULL) to stop the monitoring.
b) Profiling R code
Profiling R code gives you the chance to identify bottlenecks and pieces of code that need to be implemented more efficiently. Sometimes the fix is as small as changing one line of code from
x = data.frame(a = variable1, b = variable2)
to
x = c(variable1, variable2)
Such a change can give a big reduction in running time when the line is called many times during the execution of the function.
c) Using R code profiling functions

  • Rprof() is a function in the utils package, which is part of base R and loaded by default.
  • To profile our code, we call this function and specify its parameters, including the name of the log file that will be written.
  • You can also turn profiling on and off at different points in your code.
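
A minimal profiling session might look like the sketch below. The function slow_bind is a hypothetical, deliberately inefficient workload written for this example, and the log is written to a temporary file; the functions that show up in the summary will depend on your R version and machine.

```r
# Hypothetical slow workload: grows a data frame one row at a time.
slow_bind <- function(n) {
  out <- NULL
  for (i in seq_len(n)) {
    out <- rbind(out, data.frame(a = i, b = i^2))
  }
  out
}

log_file <- tempfile(fileext = ".out")

Rprof(log_file, interval = 0.01)  # start profiling, sampling every 10 ms
result <- slow_bind(2000)
Rprof(NULL)                       # stop profiling

# Summarize the log: by.self lists time spent in each function itself.
prof <- summaryRprof(log_file)
head(prof$by.self)
```

The functions at the top of prof$by.self are the first candidates for optimization.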

2.6. Byte Code Compilation

a) The Compiler Interface
We can use the compiler either explicitly, by calling certain functions to carry out compilations, or implicitly, by enabling compilation to occur automatically at certain points.
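
Both styles can be sketched with the {compiler} package that ships with R: cmpfun() compiles a function explicitly, and enableJIT() switches automatic compilation on or off and returns the previous level. The function sum_squares is a hypothetical example workload.

```r
library(compiler)

# Hypothetical loop-heavy function to compile.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# Explicit use: compile the function to byte code by hand.
sum_squares_c <- cmpfun(sum_squares)

# Implicit use: set the JIT level (0 = off, 3 = compile closures and loops).
previous_level <- enableJIT(3)

r1 <- sum_squares(1000)    # may be compiled automatically by the JIT
r2 <- sum_squares_c(1000)  # already compiled by cmpfun()

enableJIT(previous_level)  # restore the earlier setting
```

Compilation changes only how the function is executed, not what it returns, so r1 and r2 are equal.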
b) What is JIT?
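JIT (just-in-time) compilation is a method of improving the runtime performance of computer programs by compiling code while the program is running, rather than ahead of time.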
c) JIT in R
There are two R packages that offer just-in-time compilation to R users: the {jit} package and the {compiler} package. The {compiler} package ships with R, and its JIT has been enabled by default since R 3.4.0.
d) The {jit} package
The {jit} package, created by Stephen Milborrow, provides just-in-time compilation of R loops and arithmetic expressions in loops, enabling such code to run much faster.
This was all on R Performance tuning.

3. Conclusion

We have studied the performance of the R language from several angles. R can be slow, but there are many ways to speed up R code by following good practices. We can use vectorization instead of loops to improve speed and memory use. Tools such as Rprof() and byte code compilation are also available for R performance tuning, as discussed in this tutorial.
If you know any other techniques for R performance tuning, do share them with us in the comment section.

2 Responses

  1. Mau says:

    Working with matrix instead of dataframe. Great. However, I have a dataframe with 9,000,000 obs and 275 var. If I try to transform it via as.matrix, rstudio tells me that the matrix will be about 15gb. Thus matrices faster but heavier?

    • DataFlair Team says:

      Hi Mau,
      Mathematical operations on a dataframe often require converting it into a matrix. Furthermore, the larger the dataframe, the larger the converted matrix will be. Also, matrices are much faster because their data is homogeneous, unlike a dataframe.
      Hope, it helps!
