R Performance Tuning | Learn Tips to Improve Speed & Memory of R Programs
In this R tutorial, we will discuss performance tuning in R. We will explore the factors that degrade the performance of R programs and cover tips to improve it.
R Performance Tuning Techniques
1. Writing Fast R Code
In this section of R performance tuning, we will discuss the factors that slow down R code and how to write R code that runs fast.
1.1 Is R slow?
- Some R programs can be slow, so to speed up execution, they must be well optimized.
- Speed was never the main focus of R's development.
- It was designed primarily to make programming much easier.
- R provides interfaces for calling compiled C and C++ code (for example, .C() and .Call() in base R, or the Rcpp package), which can be used for the speed-critical parts of a program.
1.2 Write R code that runs faster
R is popular statistical software, largely thanks to its enormous number of packages. R's syntax is very flexible, which makes it convenient at some cost in performance. R can be slow compared with many other scripting languages, but a few tricks can make R code run faster:
- Use a matrix instead of a data frame whenever possible. A matrix holds a single data type, so indexing and arithmetic on it are much cheaper than on a data frame, which is a list of columns with extra bookkeeping. Use a data frame only when you need columns of different types.
- Use double(n) to create a numeric vector of n zeros instead of rep(0, n); likewise, use integer(n), logical(n), or character(n) for other types.
- Split a big data object (e.g., a big data frame or matrix) into smaller ones and operate on the smaller objects.
- Use vector and matrix operations whenever possible.
- Avoid changing the type and size of an object. Doing so forces R to reallocate memory and copy the data, which is inefficient.
- Avoid creating too many objects in the working environment. Running short of memory slows code down and, with very large objects, can make a program impossible to run. One way to limit this is to write small functions and call them instead of running everything directly in the global environment, since objects created inside a function can be garbage-collected once it returns.
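A small timing sketch of three of the tips above (preallocating with double(n), not growing objects, and preferring matrices for element access). It uses base R only; grow(), prealloc(), and sum_products() are hypothetical names chosen for illustration, and timings vary by machine, so treat the system.time() numbers as relative rather than definitive.

```r
# Growing a vector one element at a time: each c() call allocates a new,
# longer vector and copies the old contents into it.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating with double(n) reserves memory once; the loop only fills slots.
prealloc <- function(n) {
  out <- double(n)                 # same result as rep(0, n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

stopifnot(identical(grow(1000), prealloc(1000)))
system.time(grow(1e4))
system.time(prealloc(1e4))

# Element access in a loop: matrix indexing is cheap, while each df[i, j]
# goes through the data frame method `[.data.frame`.
sum_products <- function(d) {
  s <- 0
  for (i in seq_len(nrow(d))) s <- s + d[i, 1] * d[i, 2]
  s
}
m  <- matrix(runif(2e4), ncol = 2)   # 10,000 rows of random numbers
df <- as.data.frame(m)               # the same data as a data frame
system.time(sum_products(m))
system.time(sum_products(df))
```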
2. The Dreaded for Loop
A common question in R is how to accomplish a task without a for loop. Programmers generally try to avoid explicit loops whenever a vectorized alternative exists.
Vectorization for Speedup
Sometimes, we can use vectorization instead of looping. For example, if x and y are vectors of equal lengths, you can write this:
z <- x + y
This is not only more compact but, more importantly, faster than an element-by-element loop. With a proper understanding of how vectorization works in R, we can often replace a for loop entirely and write shorter, simpler, and much faster code in the first place.
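To make the comparison concrete, here is a sketch that computes z both ways and checks that the results match. add_loop() is a hypothetical helper, the vector length is arbitrary, and the timings depend on the machine; the point is that x + y is a single call whose loop runs in compiled C code.

```r
x <- runif(1e6)
y <- runif(1e6)

# Element-by-element loop: the loop body is evaluated once per element.
add_loop <- function(x, y) {
  z <- double(length(x))              # preallocate the result
  for (i in seq_along(x)) z[i] <- x[i] + y[i]
  z
}

z_loop <- add_loop(x, y)
z_vec  <- x + y                       # one call to `+` for the whole vector
stopifnot(identical(z_loop, z_vec))

system.time(add_loop(x, y))
system.time(x + y)
```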
3. Functional Programming and Memory Issues
What is Functional Programming?
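R is a functional programming (FP) language. This means that it provides many tools for creating and manipulating functions. In particular, R has what are known as first-class functions: functions are ordinary objects that can be assigned to variables, passed as arguments, and returned from other functions.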
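A short sketch of what first-class functions allow in R. The names square(), apply_twice(), and make_adder() are hypothetical helpers for illustration; vapply() and Map() are base R higher-order functions.

```r
square <- function(x) x^2

# Functions can be passed as arguments...
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)            # 81

# ...returned from other functions (make_adder returns a closure)...
make_adder <- function(n) function(x) x + n
add5 <- make_adder(5)
add5(10)                          # 15

# ...and handed to higher-order functions such as vapply() and Map().
vapply(1:3, square, numeric(1))   # 1 4 9
Map(function(a, b) a * b, 1:3, 4:6)
```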
Memory issues
Most R objects are immutable, or unchangeable. Thus, R operations are implemented as functions that reassign their result to the given object, a trait that can have performance implications.
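As an illustration of that reassignment, the replacement syntax x[3] <- 8 is evaluated by R as a call to the replacement function `[<-` whose result is assigned back to x. The sketch writes both forms out and checks that they agree. In practice R can skip the copy when no other name refers to the object, so the cost shows up mainly with large or shared objects.

```r
x <- c(5, 12, 13)
x[3] <- 8                        # reads like an in-place update...

y <- c(5, 12, 13)
y <- `[<-`(y, 3, value = 8)      # ...but is evaluated as a call to `[<-`, then reassigned

stopifnot(identical(x, y))
x                                # 5 12 8
```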
4. Copy-on-Change Issues
We will discuss an important feature of R that makes it safer to work with data. Suppose we create a simple numeric vector x1:
x1 <- c(1, 2, 3)
Then, we assign the value of x1 to x2:
x2 <- x1
Now, x1 and x2 have exactly the same value. What if we modify an element in one of the two vectors? Will both vectors change?
x1[1] <- 0
x1
## [1] 0 2 3
x2
## [1] 1 2 3
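The output shows that when x1 is changed, x2 remains unchanged. You might guess that the assignment x2 <- x1 immediately copies the data. In fact, R delays the copy: right after the assignment, x1 and x2 refer to the same data in memory, and R makes a copy only when one of them is modified. This copy-on-change (often called copy-on-modify) behavior keeps the two variables independent while avoiding unnecessary copies, but modifying a large shared object can trigger an expensive copy.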
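To see when R actually copies the data, base R's tracemem() prints a message each time a traced object is duplicated. This sketch assumes R was built with memory profiling enabled (as standard CRAN builds typically are), which is why it checks capabilities("profmem") first; the memory addresses printed differ on every run.

```r
x1 <- c(1, 2, 3)
x2 <- x1                               # no copy yet: both names refer to the same vector

can_trace <- capabilities("profmem")   # TRUE if this R build supports tracemem()
if (can_trace) tracemem(x1)            # start reporting duplications of x1

x1[1] <- 0                             # modifying shared data triggers the copy; tracemem reports it here

if (can_trace) untracemem(x1)          # stop tracing
x1                                     # 0 2 3
x2                                     # 1 2 3
```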
5. Using Rprof() to Find Slow Spots in your Code
If our R code is running unnecessarily slowly, a handy tool for finding the culprit is Rprof().
5.1 Monitoring with Rprof()
We will call Rprof() to get the monitor started, run our code, and then call Rprof() with a NULL argument to stop the monitoring.
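A minimal sketch of that start/stop pattern, using only functions from the utils package. The workload (repeatedly sorting random numbers) is a hypothetical stand-in for your own code, and the log is written to a temporary file; summaryRprof() then parses the log into tables of time spent per function.

```r
prof_file <- tempfile(fileext = ".out")   # log file for the sampling profiler

Rprof(prof_file)                          # start monitoring
for (k in 1:10) s <- sort(runif(1e6))     # example workload: the code being profiled
Rprof(NULL)                               # stop monitoring

prof <- summaryRprof(prof_file)           # parse the log
head(prof$by.self)                        # time spent in each function's own code
```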
5.2 Profiling R code
Profiling R code lets you identify bottlenecks: pieces of code that need a more efficient implementation. Sometimes the fix is a single line. In one case, a function was sped up substantially just by changing one line of code from
x = data.frame(a = variable1, b = variable2)
to
x = c(variable1, variable2)
This big reduction in run time happened because that line was called many times during the execution of the function, and creating a data frame is far more expensive than concatenating two vectors with c().
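A rough way to see why that one-line change mattered is to time both versions inside a loop. variable1, variable2, and the repetition count are hypothetical stand-ins for the original function's inputs. Note that c() returns a single plain vector rather than a two-column data frame, so the swap only makes sense where the code did not actually need a data frame.

```r
variable1 <- runif(10)   # hypothetical inputs
variable2 <- runif(10)

# Builds a data frame on every iteration: data.frame() does argument
# checking and constructs names and row names each time.
with_df <- function(reps) {
  for (i in seq_len(reps)) x <- data.frame(a = variable1, b = variable2)
  x
}

# Only concatenates two numeric vectors on every iteration.
with_c <- function(reps) {
  for (i in seq_len(reps)) x <- c(variable1, variable2)
  x
}

system.time(with_df(5000))
system.time(with_c(5000))
```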
5.3 Using R code profiling functions
Using Rprof():
- Rprof() is part of the utils package, which ships with R and is attached by default.
- To profile code, call Rprof() and specify its parameters, including the path of the log file to write (the filename argument, "Rprof.out" by default) and the sampling interval in seconds (interval, 0.02 by default).
- You can turn profiling on and off at any point in your code, so that only the section of interest is profiled.
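One way to make the on/off pattern harder to get wrong is to wrap it in a small function. profile_section() below is a hypothetical helper, not part of utils. It relies on on.exit() so that profiling is switched off even if the profiled code fails, and on R's lazy evaluation, so the expression only runs after Rprof() has started.

```r
# Hypothetical helper: profile only the expression passed in.
profile_section <- function(expr, filename = tempfile(fileext = ".out"), interval = 0.01) {
  Rprof(filename = filename, interval = interval)   # profiling on
  on.exit(Rprof(NULL))                              # profiling off when the helper exits, even on error
  force(expr)                                       # evaluating the argument runs the profiled code
  filename
}

setup <- runif(1e6)                                 # runs before profiling starts, so it is not recorded
log_file <- profile_section(for (k in 1:10) sort(runif(1e6)))
summaryRprof(log_file)$by.total                     # time including calls to other functions
```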
6. Byte Code Compilation
6.1 The Compiler Interface
The compiler package can be used either explicitly, by calling functions such as cmpfun() to compile a function, or implicitly, by enabling just-in-time compilation with enableJIT() so that compilation happens automatically at certain points.
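A minimal sketch of both routes with the compiler package, which ships with R. loop_sum() is a hypothetical example function. Because R 3.4.0 and later already JIT-compile functions by default, the explicit cmpfun() call usually adds little speed in current R; it is shown here to illustrate the interface.

```r
library(compiler)

# A deliberately loop-heavy example function.
loop_sum <- function(n) {
  s <- 0
  for (i in seq_len(n)) s <- s + i
  s
}

# Explicit: cmpfun() returns a byte-compiled copy of the function.
loop_sum_c <- cmpfun(loop_sum)
stopifnot(loop_sum(100) == loop_sum_c(100))   # same result from both versions

# Implicit: enableJIT() sets the JIT level (0 = off, 1 to 3 = increasingly
# aggressive) and returns the previous level so it can be restored.
old_level <- enableJIT(0)   # switch automatic compilation off
# ... code here runs without automatic compilation ...
enableJIT(old_level)        # restore the previous setting
```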
6.2 What is JIT?
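Just-in-time (JIT) compilation is a method for improving the runtime performance of programs: code is compiled while the program is running, shortly before it is executed (in R, into byte code), rather than ahead of time.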
6.3 JIT in R
Historically, two R packages have offered just-in-time compilation to R users: the {jit} package and the {compiler} package. The {compiler} package ships with R, and since R 3.4.0 its JIT compiler is enabled by default.
6.4 The {jit} package
This package was created by Stephen Milborrow. It provides just-in-time compilation of R loops and arithmetic expressions, which allows such code to run faster. It only takes effect when running under Ra, a modified version of R by the same author.
Summary
We have studied the performance of the R language from several angles. R can be slow, but there are many ways to speed it up by following the practices above. We can also use vectorization instead of loops to speed up R code. Tools such as Rprof() and byte-code compilation, which we discussed in this tutorial, are also available for R performance tuning.
If you are happy with DataFlair, do not forget to share your positive feedback on Google.
Working with a matrix instead of a data frame: great. However, I have a data frame with 9,000,000 observations and 275 variables. If I try to convert it with as.matrix(), RStudio tells me that the matrix will be about 15 GB. So are matrices faster but heavier?
Hi Mau,
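Matrices are faster for many operations because all their data is of one type, whereas a data frame is a list of columns with extra overhead. They are not lighter, though. as.matrix() makes a complete second copy of the data, and the result can even be larger than the data frame, for example when integer columns are coerced to double or when factor or character columns force the whole matrix to character. For data of this size, it is often better to convert only the numeric columns you actually need, or to work on the data frame column by column.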
Hope it helps!