R Data Types – Become an expert in its implementation!

Moving ahead in our R DataFlair Tutorial Series, today we will learn all about R data types: vectors, matrices, lists, data frames and factors. We will also explore expressions, assignment and arithmetic operations, along with grouped expressions, conditional execution and repeated execution in R.

So, let’s start with the tutorial.

What are R Data Types?

One of the essential features of R is its robust ability to handle and process complicated statistical operations with an optimized strategy. R handles complex computations using:

  • Vector – The basic data structure of R; all of its elements are of the same type.
  • Matrices – A matrix is a two-dimensional array of numbers or other values of a single type. We can perform operations such as addition and multiplication on matrices in R.
  • Lists – Lists store collections of objects that may differ in type and length.
  • Data Frames – Generated by combining multiple vectors of equal length such that each vector becomes a separate column.

1. Vectors in R

A vector is a basic data structure in R that contains elements of the same type. The type can be logical, integer, double, character, complex or raw. With the help of the typeof() function, you can find out a vector’s data type.

One more significant property of the R vector is its length. The function length() determines the number of elements in the vector.

> c(1,2,3)

Output:

[1] 1 2 3

> length(c("aa", "bb", "cc", "dd", "ee", "ff"))

Output:

[1] 6
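
The typeof() and length() functions described above can be tried on a few small vectors. The following is only a minimal sketch; the variable names are made up for illustration and do not come from the tutorial.

```r
# Illustrative vectors, one per basic type (names are hypothetical)
flags  <- c(TRUE, FALSE, TRUE)   # logical
counts <- c(1L, 2L, 3L)          # integer (note the L suffix)
sizes  <- c(1.5, 2, 3)           # double
tags   <- c("aa", "bb")          # character

typeof(flags)    # "logical"
typeof(counts)   # "integer"
typeof(sizes)    # "double"
typeof(tags)     # "character"
length(tags)     # 2

# Mixing types in c() coerces every element to one common type
mixed <- c(1, "a", TRUE)
typeof(mixed)    # "character"
```

The last two lines show why a vector always has a single type: c() silently converts its arguments rather than storing mixed values.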

1.1 Indexing Vectors in R

To access individual elements and subsets in R, we use the indexing technique. It can access, extract and replace part of an object. It is possible to print or refer to the subset of a vector by appending an index vector, enclosed in square brackets, to the vector name.

There are four forms of vector indexing used to extract a subset of vectors:

  • Positive Integers Vector – Indicates selected elements.
  • Negative Integers Vector – Indicates rejected elements.
  • Character Strings Vector – Used only for vectors with named elements.
  • Logical Values Vector – They are the result of evaluated conditions.

a. The vector of positive integers:

These are the set of integers which show the elements of the vector to be selected. These elements are then concatenated in the specified order.

temp = c(5,10,2,3,1,11) #Author DataFlair
temp[c(1,5,6,9)]
temp[2:5]

Output:

[1]  5  1 11 NA
[1] 10  2  3  1

The index 9 points past the six elements of temp, so R returns NA for that position.

b. The vector of negative integers:

These are the set of integers which show the elements of the vector that are to be excluded from concatenation.

For example: Select all but the nth element (here n = 2)

temp[-2]

Output:

[1]  5  2  3  1 11

This will give a vector of all elements except the 2nd element.

c. The vector of logical values:

The vector of logical values should have the same length as the vector being subset, and it is usually the result of an evaluated condition. Elements matched with TRUE (T) are included in the result and elements matched with FALSE (F) are excluded.

For example – Select elements for which the logical condition is True.

> temp[temp < 5]

Output:

[1] 2 3 1
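
The fourth form listed earlier, indexing by character strings, only applies to vectors with named elements. As a rough sketch, assuming a hypothetical named vector temps (not part of the tutorial), it might look like this:

```r
# Hypothetical named vector of temperatures
temps <- c(mon = 5, tue = 10, wed = 2)

temps["tue"]              # select one element by name
temps[c("wed", "mon")]    # select several, in the order requested

# Indexing on the left-hand side replaces elements
temps["wed"] <- 4
temps[temps > 4]          # logical indexing keeps the names: mon and tue
```

The same square-bracket syntax is used for extracting and for replacing, which is what makes indexing useful for editing part of an object in place.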

Want to learn more? Explore our article on R Vector Functions

2. Matrices in R

A matrix is a two-dimensional structure that holds homogeneous data in a tabular format. We can perform arithmetic operations on some elements of the matrix or on the whole matrix itself in R.

Let us now see how to convert a single dimension vector into a two-dimensional array using the matrix() function:

matrix(temp, nrow = 2)

Output:

     [,1] [,2] [,3]
[1,]    5    2    1
[2,]   10    3   11

Matrices can represent the binding of two or more vectors of equal length. If we have the X and Y coordinates for five quadrants within a grid, we can use the cbind() (combine by columns) or rbind() (combine by rows) functions to combine them into a single matrix.

We will first create our matrix ‘comb’. Then we will use the colnames() and rownames() functions to inspect the column names and set the row names as follows:

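quadX <- c(16.92, 24.03, 7.61, 15.49, 11.77)    #DataFlair
quadY <- c(8.37, 12.93, 16.65, 12.2, 13.12)
comb <- cbind(quadX, quadY)       # column names are taken from the vector names
colnames(comb)
rownames(comb) <- LETTERS[1:5]    # label the rows A to E
comb

Output:

[1] "quadX" "quadY"
  quadX quadY
A 16.92  8.37
B 24.03 12.93
C  7.61 16.65
D 15.49 12.20
E 11.77 13.12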

By default, the matrix() function fills a matrix by columns. The optional argument byrow = TRUE causes filling by row instead.
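
To make the difference between the two filling orders concrete, here is a small sketch that reuses the six values of temp from the vector examples; the names vals, by_col and by_row are chosen for illustration only.

```r
vals <- c(5, 10, 2, 3, 1, 11)   # same values as temp above

by_col <- matrix(vals, nrow = 2)                 # default: fill column by column
by_row <- matrix(vals, nrow = 2, byrow = TRUE)   # fill row by row

by_col[1, ]   # 5 2 1
by_row[1, ]   # 5 10 2
```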

2.1 Indexing Matrices in R

Like vectors, matrices can be indexed with vectors of positive integers, negative integers, character strings, and logical values. The difference is that matrices have two dimensions (height and width), so indexing requires a pair of indices, whereas vectors have one dimension (length), so each element is indexed by a single number. Matrix indexing takes the form [row.indices, col.indices].

For example: Let’s consider the ‘comb’ matrix as below:

print(comb)    #Author DataFlair
comb[2,1]

Output:

  quadX quadY
A 16.92  8.37
B 24.03 12.93
C  7.61 16.65
D 15.49 12.20
E 11.77 13.12
[1] 24.03
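
Beyond picking a single cell, the [row.indices, col.indices] form accepts the same kinds of index vectors as ordinary vectors do. The sketch below rebuilds comb so that it runs on its own; leaving one index empty selects every row or every column.

```r
# Rebuild the comb matrix from the earlier example
quadX <- c(16.92, 24.03, 7.61, 15.49, 11.77)
quadY <- c(8.37, 12.93, 16.65, 12.2, 13.12)
comb <- cbind(quadX, quadY)
rownames(comb) <- LETTERS[1:5]

comb[2, ]                      # whole second row (both coordinates of B)
comb[, "quadY"]                # whole column, selected by name
comb[-1, 1]                    # first column without the first row
comb[comb[, "quadX"] > 15, ]   # rows whose quadX exceeds 15: A, B and D
```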

Explore the whole concept of R Matrices Operations

3. Lists in R

Lists are R data types that store a collection of objects of differing lengths and types. We create them using the list() function.

For example – We can create many isolated vectors, like temperature, shade, and site names, to represent data from a single experiment and group them to make the components of a list object. Assuming the vectors SITE, X, Y, TEMPERATURE and SHADE already exist:

> EXPERIMENT <- list(SITE = SITE, COORDINATES = paste(X, Y, sep = ","),
+                    TEMPERATURE = TEMPERATURE, SHADE = SHADE)

List created in the above example consists of four components:

  • SITE is a character vector.
  • A character vector named COORDINATES, which is a vector of XY coordinates for sites A, B, C, D, and E.
  • TEMPERATURE is a numeric vector.
  • A factor named SHADE.
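
To picture what such a list looks like, here is a self-contained sketch. The tutorial does not give the underlying data, so every value below is invented purely for illustration. Note that a + at the start of a continued line in an R console transcript is the continuation prompt, not part of the code.

```r
# All values below are made up for illustration; they are not from the tutorial
SITE <- c("A", "B", "C", "D", "E")
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9)
SHADE <- factor(c("no", "no", "no", "full", "full"))

EXPERIMENT <- list(SITE = SITE, COORDINATES = paste(X, Y, sep = ","),
                   TEMPERATURE = TEMPERATURE, SHADE = SHADE)

str(EXPERIMENT)           # compact overview of the four components
EXPERIMENT$TEMPERATURE    # access a component by name
EXPERIMENT[["SITE"]][2]   # second site label, "B"
```

str() is a convenient way to see the type and length of each component at a glance, and $ or [[ ]] extracts a single component.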

Don’t forget to check the tutorial on R Lists

4. Data Frames and Datasets in R

A single variable is rarely collected in isolation. Data is mostly collected as sets of variables, reflecting the investigation of patterns between different variables. So, datasets are best organized into matrices of variables of the same length yet not necessarily of the same type. Here data frames come to the rescue, as they store a list of vectors of the same length, yet possibly different types, in a rectangular structure.

We can create data frames by combining many vectors together in a manner such that each vector becomes a separate column. In this way, the data frame is like a matrix in which each column can represent a different vector type.

The sequence and number of observations in the vectors must be the same for each vector in the data frame to represent a dataset.

The first, second and third entries in each vector, for example, must represent the observations collected from first, second and third sampling units respectively.
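
As a minimal sketch of the idea, the data.frame() function can combine equal-length vectors of different types into columns. The vectors and their values here are hypothetical and serve only as an example.

```r
# Hypothetical observations from five sampling units
site        <- c("A", "B", "C", "D", "E")
temperature <- c(36.1, 30.6, 31.0, 36.3, 39.9)
shade       <- factor(c("no", "no", "no", "full", "full"))

survey <- data.frame(site, temperature, shade)

str(survey)                        # one column per vector, each keeping its own type
nrow(survey)                       # 5 observations
survey$temperature[2]              # second observation of temperature: 30.6
survey[survey$shade == "full", ]   # rows for the fully shaded sites
```

Because row i of every column belongs to the same sampling unit, filtering rows (as in the last line) keeps the observations aligned across all variables.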

For a better understanding, you must explore our tutorial on R Data Frame

5. Factors in R

Factors in R represent categorical variables. Factors are mostly used in statistical modeling, as categorical variables can easily be entered into statistical models. They differ from continuous variables: storing the data as a factor ensures that the modelling functions treat the data correctly.

Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. The factor() function is used to create a factor. Its only required argument is a vector of values, whose distinct values become the levels of the factor. Character as well as numeric variables can be converted into factors, but the levels of a factor are always character values. Using the levels() function, you can check the levels.

We will create a factor in R as follows –

data = c(1,2,2,3,3,2,3,3,1,4,5,6,1)   #DataFlair
fac_data = factor(data)
fac_data

Output:

 [1] 1 2 2 3 3 2 3 3 1 4 5 6 1
Levels: 1 2 3 4 5 6
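
The split between integer codes and character levels can be inspected directly. The sketch below reuses the data vector from above; the shade factor at the end is a hypothetical extra example.

```r
data <- c(1, 2, 2, 3, 3, 2, 3, 3, 1, 4, 5, 6, 1)
fac_data <- factor(data)

levels(fac_data)       # the distinct values, as character strings
as.integer(fac_data)   # the underlying integer codes
table(fac_data)        # how often each level occurs

# A character vector works the same way (values are hypothetical)
shade <- factor(c("no", "full", "no"))
levels(shade)          # "full" "no"  (sorted alphabetically by default)
```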

Programming in R

R ships with a large library of built-in functions, and the collection of add-on tools continues to grow at an incredible rate, yet programs often need to perform a task for which no function exists.

Since R is itself a programming language, we can extend its functionality to accommodate more procedures. How easily this can be done depends on the complexity of the procedure and the level of R proficiency of the user.

Some of the basic building blocks of R programs are:

  • Expressions – Commands entered at the R command prompt.
  • Assignment – Assigns a name to an object.
  • Arithmetic Operations – Operations such as addition and multiplication performed on numeric values.

For a clear understanding, I would recommend you to visit our easy to learn article on R Functions

Expressions, Assignment and Arithmetic Operations in R

An expression is a command that is entered in the R command prompt, is evaluated by R, printed to the current output device (usually the screen), and then discarded.

> 2 + 3

To assign a name to a new object that may be the result of an evaluated expression or any other object, we use the assignment operation in R. R interprets the assignment operator <- as ‘evaluate the expression on the right-hand side and assign it the name supplied on the left-hand side’. The object on the left-hand side is created if it does not already exist; otherwise the object’s contents are replaced.

You can view the contents of the object by entering the name of the object in the command prompt, as shown in the following code:

2 + 3      #Author DataFlair
add <- 2 + 3
add
sub <- add - 1
sub

Output:

[1] 5
[1] 5
[1] 4

We can concatenate objects to create objects with multiple entries using the c() function as:

> c(2,3,4)
> c(add, sub)

Output:

[1] 2 3 4
[1] 5 4

The second call concatenates add and sub into a single vector.

There are several special operators alongside the typical addition, subtraction, multiplication, and division operators. The simplest of these are the quotient or integer divide operator (%/%) and the remainder or modulus operator (%%).

For example:

7/3       #DataFlair
7%/%3
7%%3

Output:

[1] 2.333333
[1] 2
[1] 1
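
The quotient and remainder fit together: multiplying the quotient by the divisor and adding the remainder gives back the original number. A short sketch with illustrative variable names, which also shows that arithmetic operators work element by element on vectors:

```r
x <- 7
y <- 3

q <- x %/% y     # integer quotient: 2
r <- x %% y      # remainder: 1
q * y + r        # rebuilds x: 7

# Arithmetic operators work element by element on vectors
c(7, 8, 9) %% 3  # 1 2 0
c(1, 2, 3) * 2   # 2 4 6
```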

You must definitely check the Numeric and Character Functions in R

Grouped Expressions in R

You can issue many commands on a single line by separating each command with a semicolon (;). When doing so, the evaluation of commands is in order from left to right:

> A <- 1; B <- 2; C <- A + B
> C

Output:

[1] 3

When we group a series of commands together between braces, the whole group is evaluated as a single expression, and the value of the last evaluated command within the group is returned.

Grouped expressions are useful for wrapping up sets of commands that work together to produce a single result, and they can be further nested within braces as part of a larger grouped expression.
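
A small sketch of both points, using made-up variable names: the braces return the value of their last command, and one group can sit inside another.

```r
# The braces group three commands; the value of the last one is returned
total <- {
  a <- 2
  b <- 3
  a + b
}
total   # 5

# Groups can be nested inside a larger grouped expression
result <- {
  inner <- { 10 * 2 }
  inner + 1
}
result  # 21
```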

Conditional Execution in R – if and if else in R

When the sequence of tasks to run is determined by whether a condition is met (TRUE) or not (FALSE), it is conditional execution. This is useful when we are writing code that needs to accommodate more than one set of circumstances. In R, the conditional execution has the following forms:

  • if(condition) true.task
  • if(condition) true.task else false.task
  • ifelse(condition, true.task, false.task)

If the condition evaluates to TRUE, the true.task statement is evaluated; otherwise the false.task statement is evaluated. If the condition of an if statement cannot be coerced into a logical yes/no answer, we get an error.

For example:

x <- 3    #DataFlair
if( any(x <= 0) ) y <- log(1+x) else y <- log(x)     
print(y)

Output:

[1] 1.098612

Here, if the value of x is <= 0, y will be log(1+x); otherwise it will be log(x). Since x is 3, y is log(3).
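
The third form, ifelse(), differs from if in that it works on a whole vector at once. The following sketch uses a hypothetical input vector to contrast the two:

```r
x <- c(-2, 0, 3)   # hypothetical input values

# if() needs a single TRUE/FALSE, so reduce the vector with any() first
if (any(x <= 0)) message("some values are not positive")

# ifelse() is vectorised: it picks a result for each element separately
sign_label <- ifelse(x > 0, "positive", "not positive")
sign_label   # "not positive" "not positive" "positive"
```

This is why the tutorial example wraps its condition in any(): it collapses an element-wise comparison into the single logical value that if expects.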

Repeated Execution – Looping in R

Looping means repeated execution of sets of commands. Following are the most commonly used loops in R:

  • for loop – Perform something for all items in a vector or list.
  • while loop – Perform something while a logical statement is true.

1. for Loop in R

A ‘for loop’ loops through a vector of integers (a counter) iteratively, each time executing the set of commands. It takes on the general form of:

> for (counter in sequence) task

Here counter is a loop variable, whose value increments according to the integer vector defined by the sequence. The task is a single expression or grouped expression that uses the incrementing variable to perform a specific operation on a sequence of objects.

For example – Consider the following snippet that counts to six:

for (index in 1:6) print(index)

Output:

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
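
When the task needs more than one command, it can be written as a grouped expression in braces. The sketch below, with illustrative names, sums the values of the earlier temp vector one element at a time; in practice sum() would do the same job, so this is only meant to show the loop mechanics.

```r
values <- c(5, 10, 2, 3, 1, 11)   # same numbers as the temp vector
running_total <- 0

# seq_along() yields 1, 2, ..., length(values)
for (i in seq_along(values)) {
  running_total <- running_total + values[i]
}
running_total   # 32

# The loop variable can also take the elements themselves
for (v in c("a", "b")) print(v)
```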

2. while Loop in R

A ‘while loop’ executes a set of commands repeatedly while a condition is TRUE and exits when the condition evaluates to FALSE. It takes the general form as below:

> while (condition) task

Here, the task is a single expression or grouped expression that performs a specific operation as long as the condition evaluates to TRUE.

For example:

x <- 100    #Author DataFlair
while (x > 0) {
  a <- runif(1, 1, 10)
  #do something
  x <- x - a
}
print(a)
print(x)

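Because runif() draws random numbers, the printed values of a and x change from run to run. The loop stops as soon as x is no longer greater than 0.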

Let us consider a situation where a procedure needs to generate a temporary object. To ensure that no existing objects are overwritten, a simple solution is to append a number to the object name.

We can use a while loop to generate a unique name after assessing whether an object name already exists in the current R environment. The first three commands in the following code generate a couple of existing names and confirm their existence:

TEMP <- NULL     #Author DataFlair
TEMP1 <- NULL
ls()
j <- NULL
NAME <- "TEMP"
# Try TEMP, TEMP1, TEMP2, ... until a name is found that is not yet in use
while (exists(Nm <- paste(NAME, j, sep = ""))) {
  if (is.null(j)) j <- 1 else j <- j + 1
}
assign(Nm, c(1, 3, 3))

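With TEMP and TEMP1 already defined, the loop settles on the name TEMP2 (assuming no object of that name exists yet), and the vector is stored under that name.

The exists() function assesses whether an object of the given name already exists and the assign() function makes the first argument an object name and assigns it the value of the second argument.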

Summary

Here we come to the end of our tutorial of R data types. In this article, we discussed the concept of data types in R, learned about matrices, vectors, lists, datasets, data frames and factors in R.

Now, you must go through our next tutorial on OLS Regression in R

Still, if you have any doubts related to the R data types tutorial, do let us know by leaving a comment below. We will be happy to solve them.

1 Response

  1. Jo says:

    Could you show what the output of the list EXPERIMENT would look like? I know it’s explained line by line but I can’t picture it. The SITE element is described as a two character vector, but there are 5 sites. Also, would is the purpose of the plus sign before element SHADE is introduced?
    thanks
