R Data Types – Vectors, Matrices, Lists, and Data Frames

1. R Data Types – Objective

This R tutorial covers R data types. We begin with an introduction to data types in R and then look at R vectors, matrices, lists, data frames, and data sets. The tutorial also covers indexing of vectors and matrices, arithmetic operators, loops, and conditional execution (if and if-else expressions) in R.

So, let's start the R Data Types tutorial.

2. What are R Data Types?

One of the key features of R is that it can handle complex statistical operations in an easy and optimized way.
R handles complex computations using:

  • Vectors – The basic data structure of R; a vector contains elements of the same type.
  • Matrices – A matrix is a rectangular array of numbers or other objects of a single type. We can perform operations such as addition and multiplication on matrices in R.
  • Lists – Lists store collections of objects that may differ in type and length.
  • Data Frames – Generated by combining multiple vectors of equal length such that each vector becomes a separate column.

3. Vectors in R

In R, a vector is a basic data structure that contains elements of the same type. The type can be logical, integer, double, character, complex, or raw.
The function typeof() reports the data type of a vector.
Another significant property of a vector is its length. The function length() returns the number of elements in the vector.

> c(2, 3, 5)

[1] 2 3 5

> length(c("aa", "bb", "cc", "dd", "ee"))

[1] 5
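As a minimal sketch of how typeof() and length() behave, the snippet below builds a few small vectors. The vector names are illustrative and not part of the original examples.

```r
# Hypothetical example vectors; the names are illustrative only.
num_vec <- c(2, 3, 5)                        # numeric literals are stored as double
int_vec <- c(2L, 3L, 5L)                     # the L suffix creates integers
chr_vec <- c("aa", "bb", "cc", "dd", "ee")   # character vector
lgl_vec <- c(TRUE, FALSE, TRUE)              # logical vector

typeof(num_vec)   # "double"
typeof(int_vec)   # "integer"
typeof(chr_vec)   # "character"
typeof(lgl_vec)   # "logical"
length(chr_vec)   # 5

# A vector holds a single type, so mixing types coerces every element
# to the most flexible type present (here, character).
mixed <- c(1, "a", TRUE)
typeof(mixed)     # "character"
```

Because a vector can hold only one type, checking typeof() after combining values is a quick way to spot unintended coercion.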

i. Indexing Vectors in R

To access individual elements and subsets in R, we use indexing. Indexing can access, extract, and replace part of an object.
We can print or refer to a subset of a vector by appending an index vector, enclosed in square brackets, to the vector name.
There are four forms of vector indexing used to extract a subset of a vector:

  • Vector of positive integers – Indicates the elements to select
  • Vector of negative integers – Indicates the elements to exclude
  • Vector of character strings – Used only for vectors with named elements
  • Vector of logical values – Usually the result of an evaluated condition

a. The vector of positive integers:
A set of positive integers indicates the positions of the elements to select. The selected elements are concatenated in the specified order. In the examples here, TEMPERATURE is a numeric vector whose elements are named Q1, Q2, and so on.
For example: To select the nth element (here the 2nd)

> TEMPERATURE[2]

  Q2
30.6

It can also select specific sets of elements or a range of elements, as below:

> TEMPERATURE[c(1, 5, 6, 9)]
> TEMPERATURE[2:5]

b. The vector of negative integers:
A set of negative integers indicates the positions of the elements to exclude from the result.
For example: Select all but the nth element (here the 2nd)

> TEMPERATURE[-2]

This gives a vector of all elements except the 2nd element.

c. The vector of character strings:
We can index with a vector of character strings only when the vector has named elements. The vector of element names selects the elements to be concatenated.
For example: Select the named elements

> TEMPERATURE[c("Q1", "Q4")]

This gives the values of the elements named Q1 and Q4, here the 1st and 4th elements of the vector.

d. The vector of logical values:
A vector of logical values should be the same length as the vector being indexed and is usually the result of an evaluated condition. A value of TRUE (T) includes the corresponding element and a value of FALSE (F) excludes it.
For example: Select elements for which the logical condition is true

> TEMPERATURE[TEMPERATURE < 15]

This displays the elements whose value is less than 15.
We can also combine conditions, as below, where SHADE is a vector of the same length as TEMPERATURE:

> TEMPERATURE[TEMPERATURE < 34 & SHADE == "no"]
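The four indexing forms can be tried together in one runnable sketch. The values of TEMPERATURE and SHADE below are hypothetical, chosen only so that the element named Q2 equals 30.6 as in the output shown earlier; the tutorial does not give the real data.

```r
# Hypothetical data: ten named temperature readings and a matching shade label.
TEMPERATURE <- c(Q1 = 36.1, Q2 = 30.6, Q3 = 31.0, Q4 = 36.3, Q5 = 39.9,
                 Q6 = 6.5,  Q7 = 11.2, Q8 = 12.8, Q9 = 9.7,  Q10 = 15.9)
SHADE <- rep(c("no", "full"), each = 5)   # first five "no", last five "full"

# a. Positive integers select elements by position.
TEMPERATURE[2]                # Q2 (30.6)
TEMPERATURE[c(1, 5, 6, 9)]    # Q1, Q5, Q6, Q9
TEMPERATURE[2:5]              # Q2 through Q5

# b. Negative integers exclude elements by position.
TEMPERATURE[-2]               # everything except Q2 (9 elements)

# c. Character strings select elements by name.
TEMPERATURE[c("Q1", "Q4")]    # Q1 and Q4

# d. Logical values select elements where the condition is TRUE.
TEMPERATURE[TEMPERATURE < 15]                  # Q6, Q7, Q8, Q9
TEMPERATURE[TEMPERATURE < 34 & SHADE == "no"]  # Q2, Q3

# Indexing can also replace part of a vector.
TEMPERATURE["Q10"] <- 16.0
```

Note that == tests equality, while a single = would be treated as assignment or argument naming.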


4. Matrices in R

A matrix is a two-dimensional structure that holds homogeneous data (elements of a single type) in a tabular format of rows and columns. We can perform arithmetic operations on selected elements of a matrix or on the whole matrix in R.
Let us see how to convert a one-dimensional vector into a two-dimensional array using the matrix() function:

> matrix(TEMPERATURE, nrow = 5)

We use the rownames() and colnames() functions to get or set row and column names. For the XY matrix built with cbind() in the example that follows:

> colnames(XY)
> rownames(XY) <- LETTERS[1:5]

Matrices can represent the binding of two or more vectors of equal length. If we have the X and Y coordinates for five quadrats within a grid, we can use cbind() (combine by columns) or rbind() (combine by rows) functions to combine them into a single matrix, as follows:

> X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
> Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
> XY <- cbind(X, Y)

By default, matrix() fills a matrix by columns. The optional argument byrow = TRUE causes filling by row.
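The following sketch puts these pieces together: the two fill orders of matrix(), column and row binding, and naming rows. The names m_col, m_row, and YX are illustrative; X and Y are the coordinate vectors from the example above.

```r
# Fill order: column-wise by default, row-wise with byrow = TRUE.
m_col <- matrix(1:6, nrow = 2)                 # rows: 1 3 5 / 2 4 6
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows: 1 2 3 / 4 5 6

# Bind two vectors of equal length into a matrix.
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY <- cbind(X, Y)    # 5 rows, 2 columns; column names come from the vector names
YX <- rbind(X, Y)    # 2 rows, 5 columns

# Set row names and read back the column names.
rownames(XY) <- LETTERS[1:5]
colnames(XY)         # "X" "Y"
dim(XY)              # 5 2

# Arithmetic applies element by element to the whole matrix.
XY * 2
```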

i. Indexing Matrices in R

Like vectors, matrices can be indexed with vectors of positive integers, negative integers, character strings, and logical values.
The difference is that a vector has one dimension (length), so a single number indexes each element, whereas a matrix has two dimensions (rows and columns), so indexing requires a pair of indices.
Matrix indexing takes the form [row.indices, col.indices].
For example: Let's consider the XY matrix as below:

      X     Y
A 16.92  8.37
B 24.03 12.93
C  7.61 16.65
D 15.49 12.20
E 11.77 13.12
attr(,"description")
[1] "coordinates of quadrats"
Below are a few examples of matrix indexing:

> XY[3, 2]

[1] 16.65
The above command selects the element at row 3 and column 2.
> XY[3, ] – displays the entire third row.
> XY[, 2] – displays the entire second column.
> XY[, -2] – displays all columns except the second.
> XY["A", 1:2] – displays columns 1 through 2 for row A.
> XY[, "X"] – displays the column named "X".
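A runnable version of these indexing forms is sketched below. It rebuilds XY from the X and Y vectors given earlier; the logical-index line and the drop = FALSE variant are additions for illustration and do not appear in the original examples.

```r
# Rebuild the XY matrix from the tutorial's coordinate vectors.
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY <- cbind(X, Y)
rownames(XY) <- LETTERS[1:5]

XY[3, 2]        # single element: 16.65
XY[3, ]         # entire third row, returned as a named vector
XY[, 2]         # entire second column
XY[, -2]        # all columns except the second
XY["A", 1:2]    # columns 1 through 2 for row A
XY[, "X"]       # the column named "X"

# When only one row or column remains, R simplifies the result to a vector.
# drop = FALSE keeps the matrix shape.
XY[, -2, drop = FALSE]   # still a 5 x 1 matrix

# Logical index on rows: keep quadrats whose X coordinate exceeds 15.
XY[XY[, "X"] > 15, ]     # rows A, B, and D
```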

5. Lists in R

Lists are an R data type that stores collections of objects of differing lengths and types. We create them with the list() function.
For example, we can create several separate vectors, such as temperature, shade, and site names, to represent data from a single experiment, and then group them as components of a list object, as follows:

> EXPERIMENT <- list(SITE = SITE, COORDINATES = paste(X,
+     Y, sep = ","), TEMPERATURE = TEMPERATURE,
+     SHADE = SHADE)

The + at the start of the second and third lines is R's continuation prompt, not part of the command.

The list created in the above example consists of four components:

  • SITE, which is a character vector of site names.
  • COORDINATES, which is a character vector of XY coordinates for sites A, B, C, D, and E.
  • TEMPERATURE, which is a numeric vector.
  • SHADE, which is a factor.
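To make the structure of EXPERIMENT concrete, here is a small sketch. X and Y are the coordinate vectors from the matrix section, but the SITE, TEMPERATURE, and SHADE values are hypothetical, since the tutorial never shows them.

```r
# Component vectors; SITE, TEMPERATURE, and SHADE values are hypothetical.
SITE <- c("A", "B", "C", "D", "E")
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9)
SHADE <- factor(c("no", "no", "full", "full", "no"))

EXPERIMENT <- list(SITE = SITE,
                   COORDINATES = paste(X, Y, sep = ","),
                   TEMPERATURE = TEMPERATURE,
                   SHADE = SHADE)

str(EXPERIMENT)               # compact overview of the four components
names(EXPERIMENT)             # "SITE" "COORDINATES" "TEMPERATURE" "SHADE"

# $ and [[ ]] extract a component; [ ] returns a smaller list.
EXPERIMENT$COORDINATES[1]     # "16.92,8.37"
EXPERIMENT[["TEMPERATURE"]]   # the numeric vector itself
EXPERIMENT["SITE"]            # a list of length 1 containing SITE
```

The distinction between [[ ]] and [ ] matters in practice: the first gives the stored object, the second gives a list wrapping it.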


6. Data Frames and DataSets in R

Variables are rarely collected in isolation. Data is mostly collected in sets of variables that reflect investigations of patterns between different variables. So, data sets are best organized into tables of variables of the same length yet not necessarily of the same type. Here data frames come to the rescue, as they store a list of vectors of the same length, yet possibly different types, in a rectangular structure.
We create data frames by combining vectors so that each vector becomes a separate column. In this way, a data frame is like a matrix in which each column can be of a different type.
For a data frame to represent a data set, every vector must contain the same number of observations in the same order.
The first, second, and third entries in each vector, for example, must represent the observations collected from the first, second, and third sampling units respectively.
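A minimal sketch of building a data frame from vectors follows. The data values and the name DATA are hypothetical; stringsAsFactors = FALSE is set explicitly because the default for that argument differs between R versions.

```r
# Hypothetical observations from five sampling units, one vector per variable.
SITE <- c("A", "B", "C", "D", "E")
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9)
SHADE <- factor(c("no", "no", "full", "full", "no"))

# Each vector becomes a column; row i holds the observations for unit i.
DATA <- data.frame(SITE, TEMPERATURE, SHADE, stringsAsFactors = FALSE)

str(DATA)                 # one line per column with its type
nrow(DATA)                # 5 observations
ncol(DATA)                # 3 variables
sapply(DATA, class)       # "character", "numeric", "factor"

# Columns are accessed like list components, rows like matrix rows.
DATA$TEMPERATURE                  # the numeric column
DATA[DATA$SHADE == "no", "SITE"]  # "A" "B" "E"
```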

7. Programming in R

R has a large number of built-in functions and add-on packages, and they continue to grow at an incredible rate. Yet a program sometimes needs to perform a task for which no function exists.
Since R is itself a programming language, we can extend its functionality with our own procedures. How easy this is depends on the complexity of the procedure and the R proficiency of the user.
User-created functions are built from the following basic elements:

  • Expressions – Commands entered at the R command prompt.
  • Assignment – Assigns a name to an object.
  • Arithmetic Operations – Operations applied to numeric values.


8. Expressions, Assignment, and Arithmetic Operations in R

An expression is a command that is entered at the R command prompt, evaluated by R, printed to the current output device (usually the screen), and then discarded.

> 2 + 3

[1] 5
This is an expression whose evaluated output is 5.
To assign a name to a new object, which may be the result of an evaluated expression or any other object, we use the assignment operation. R interprets the assignment operator <- as ‘evaluate the expression on the right-hand side and assign it the name supplied on the left-hand side’. The object on the left-hand side is created if it does not already exist; otherwise its contents are replaced.
You can view the contents of the object by entering the name of the object at the command prompt, as shown in the following code:

> VAR1 <- 2 + 3    # assigns the value of the expression to the object VAR1
> VAR1             # prints the contents of the object VAR1

[1] 5
We can spread a single command over many lines. If a command is not complete by the end of a line, that is, if a carriage return is entered before R considers the command syntax complete, the following line begins with the prompt + to show that the command is incomplete.
For example:

> VAR2 <-    # incomplete assignment
+ 2 + 3      # assignment completed

> VAR2       # prints the contents of VAR2
[1] 5
When the contents of a vector are numeric, we can apply standard arithmetic operations, as in:

> VAR2 - 1

This evaluates VAR2 - 1 and prints 4.
We can concatenate objects to create objects with multiple entries using the c() function as:

> c(1, 2, 6)

This concatenates 1, 2, and 6 and displays the result as:
[1] 1 2 6

> c(VAR1, VAR2)

This concatenates VAR1 and VAR2 and displays the result as:
[1] 5 5
R also provides special operators alongside the typical addition, subtraction, multiplication, and division operators. The simplest of these are the quotient or integer divide operator (%/%) and the remainder or modulus operator (%%). For example:

> 7/3

[1] 2.333333

> 7%/%3

[1] 2

> 7%%3

[1] 1
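The quotient and remainder always satisfy x = (x %/% y) * y + (x %% y). The short sketch below checks that relationship and shows that both operators work element by element on vectors; the variable names are illustrative.

```r
x <- 7
y <- 3

q <- x %/% y        # integer quotient: 2
r <- x %% y         # remainder: 1
q * y + r == x      # TRUE: quotient and remainder reconstruct x

# Both operators are vectorized.
c(7, 8, 9) %/% 3    # 2 2 3
c(7, 8, 9) %% 3     # 1 2 0

# With a negative dividend, %/% rounds toward minus infinity and
# the remainder takes the sign of the divisor.
(-7) %/% 3          # -3
(-7) %% 3           # 2
```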

9. Grouped Expressions in R

You can issue many commands on a single line by separating each command with a semicolon (;). When doing so, evaluation of commands is in order from left to right:

> A <- 1; B <- 2; C <- A + B
> C

The above grouped expression gives 3 as output.
When we group a series of commands between braces, the whole group is evaluated as a single expression, and the value of the last evaluated command within the group is returned.
Grouped expressions are useful for wrapping up sets of commands that work together to produce a single result, and they can be nested within braces as part of a larger grouped expression.
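As a small illustration of brace grouping, the sketch below assigns the value of a grouped expression to a name and then nests one group inside another. All variable names are illustrative.

```r
# The value of a braced group is the value of its last expression.
result <- {
  a <- 1
  b <- 2
  a + b            # last expression: its value becomes the value of the group
}
result             # 3

# Groups can be nested inside a larger grouped expression.
nested <- {
  inner <- {
    u <- 10
    u * 2          # inner group evaluates to 20
  }
  inner + 1        # outer group evaluates to 21
}
nested             # 21
```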


10. Conditional Execution in R – if and if else in R

When the sequence of tasks depends on whether a condition is met (TRUE) or not (FALSE), it is conditional execution. This is useful when we are writing code that needs to accommodate more than one set of circumstances. In R, conditional execution has the following forms:
if (condition) true.task
if (condition) true.task else false.task
ifelse(condition, true.value, false.value)
If the condition is TRUE, true.task is evaluated; otherwise false.task is evaluated. In the if forms, the condition must be coercible to a single logical TRUE/FALSE answer; otherwise we get an error. The ifelse() function is vectorized: it tests each element of condition and returns true.value or false.value for that element.
For example:

> if (any(x <= 0)) y <- log(1 + x) else y <- log(x)

Here, if any value of x is <= 0, y will be log(1 + x); otherwise it will be log(x).
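The sketch below contrasts if/else, which needs a single TRUE/FALSE condition, with the vectorized ifelse() function. The input values and the names x2, labels, and outcome are hypothetical.

```r
# if / else: any() collapses the element-wise test to a single TRUE/FALSE.
x <- c(0.5, 2, 10)                # hypothetical input, all values positive
if (any(x <= 0)) {
  y <- log(1 + x)
} else {
  y <- log(x)                     # this branch runs because no value is <= 0
}

# ifelse(): evaluates the condition for every element and picks a value each time.
x2 <- c(-1, 0, 4)
labels <- ifelse(x2 > 0, "positive", "non-positive")
labels                            # "non-positive" "non-positive" "positive"

# A condition that cannot be resolved to TRUE or FALSE (such as NA) is an error in if().
outcome <- tryCatch(
  if (NA) "yes" else "no",
  error = function(e) "error"     # the error is caught and replaced by a string
)
outcome                           # "error"
```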

11. Repeated Execution – Looping in R

Looping means repeated execution of a set of commands.
The most commonly used loops in R are:

  • For loop – do something for all items in a vector or list.
  • While loop – do something while a logical statement is true.

i. For Loop in R

A for loop iterates through the elements of a vector (often a vector of integers acting as a counter), executing the set of commands each time. It takes the general form:
> for (counter in sequence) task

  • Here counter is the loop variable, which takes each value of the vector defined by sequence in turn.
  • task is a single expression or grouped expression that uses the loop variable to perform a specific operation on a sequence of objects.

For example, consider the following snippet that counts to six:

> for (i in 1:6) print(i)

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
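Two further sketches of the for loop follow: one iterates directly over the values of a vector, and one uses an integer counter to fill a result vector. The names values, total, and squares are illustrative.

```r
# Loop over the values of a vector and accumulate a running total.
values <- c(4, 8, 15)
total <- 0
for (v in values) {
  total <- total + v
}
total                 # 27

# Loop over an integer counter and store one result per iteration.
squares <- numeric(6)           # pre-allocate the result vector
for (i in 1:6) {
  squares[i] <- i^2
}
squares               # 1 4 9 16 25 36
```

Pre-allocating the result vector, as with numeric(6), avoids growing it on every iteration.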

ii. While Loop in R

A while loop executes a set of commands repeatedly while a condition is TRUE and exits when the condition evaluates to FALSE. It takes the general form as below:

> while (condition) task

Here, task is a single expression or grouped expression that performs a specific operation as long as condition evaluates to TRUE.
For example:

# loop so long as x > 0
x <- 100
while (x > 0) {
  a <- runif(1, 1, 10)
  # do something
  x <- x - a
}

Let us consider a situation where a procedure needs to generate a temporary object. To ensure that no existing objects are overwritten, a simple solution is to append a number to the object name.
We can use a while loop to generate a unique name after checking whether an object name already exists in the current R environment. The first three commands below create a couple of objects and confirm their existence:

> TEMP <- NULL
> TEMP1 <- NULL
> ls()
> j <- NULL
> NAME <- "TEMP"
> while (exists(Nm <- paste(NAME, j, sep = ""))) {
+     ifelse(is.null(j), j <- 1, j <- j + 1)
+ }
> assign(Nm, c(1, 3, 3))

The exists() function checks whether an object of the given name already exists, and the assign() function treats its first argument as an object name and assigns it the value of the second argument.
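The same idea can be sketched in a way that does not touch the global workspace. This variant is an assumption-laden rewrite, not the original code: it works inside a fresh environment (the name env is hypothetical), uses a plain counter starting at 0 instead of NULL, and passes inherits = FALSE so that only env itself is searched.

```r
# A scratch environment that already holds two objects named TEMP and TEMP1.
env <- new.env()
assign("TEMP", NULL, envir = env)
assign("TEMP1", NULL, envir = env)

NAME <- "TEMP"
j <- 0
Nm <- NAME                         # first candidate is the bare name

# Keep appending the next number until the candidate name is unused.
while (exists(Nm, envir = env, inherits = FALSE)) {
  j <- j + 1
  Nm <- paste0(NAME, j)
}
Nm                                 # "TEMP2"

# Create the object under the unique name.
assign(Nm, c(1, 3, 3), envir = env)
ls(env)                            # "TEMP" "TEMP1" "TEMP2"
get(Nm, envir = env)               # 1 3 3
```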
So, this was all in the R Data Types tutorial. We hope you liked it.

12. Conclusion – R Data Types

Hence, in this R Data Types tutorial, we discussed the meaning of data types in R. Moreover, we learned about vectors, matrices, lists, data sets, and data frames in R. If you have any query related to R data types, do let us know by leaving a comment. We will be happy to answer it.