R Data Frame Introduction & Operations with Examples


1. Objective

In this R tutorial, we explain the R Data Frame in detail. We will learn what an R Data Frame is and what its characteristics are. We will also learn the operations that can be performed on a data frame in R, such as creating a data frame, printing it, getting its structure, and extracting columns and rows, with the help of examples.


1. What is Data Frame in R?

Let us first look at where the concept of a data frame comes from. It originates in the statistical software used in empirical research and generally refers to tabular data: a data structure representing cases (rows), each of which consists of a number of observations or measurements (columns).

A data frame is used for storing data tables. It is a list of vectors of equal length.

For example:

The following variable df is a data frame containing three variables n, s, and b.

n = c(2, 3, 5)
s = c("a", "b", "c")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b)         # df is a data frame

A data frame resembles a two-dimensional array (a matrix). Unlike a matrix, however, the columns of a data frame can hold different types of data: one column might be numeric, another a factor, and a third character. All columns must have the same length.
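As a quick check of the mixed-type claim, the sketch below rebuilds the small df from the example above and asks R for the class of each column. The `stringsAsFactors = FALSE` argument is an addition not present in the original example; it is assumed here so that the character column stays character on R versions older than 4.0.

```r
n <- c(2, 3, 5)
s <- c("a", "b", "c")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b, stringsAsFactors = FALSE)

# Class of each column: the three columns hold three different types
col_classes <- sapply(df, class)
print(col_classes)

# A data frame is stored internally as a list of equal-length vectors
print(is.list(df))
print(nrow(df))
```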

2. Characteristics of an R Data Frame

Now that we have discussed what an R Data Frame is, let’s look at its characteristics.

  • The column names should be non-empty.
  • The row names should be unique.
  • The data stored in a data frame can be of numeric, factor or character type.
  • Each column should contain the same number of data items.
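Two of these characteristics can be checked directly. The following is a minimal sketch with made-up column names (x and y are illustrative only): it uses `try()` to capture the errors that base R raises when column lengths do not fit together or when row names are duplicated.

```r
# Columns of length 3 and 2 cannot be combined into one data frame
unequal <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
print(inherits(unequal, "try-error"))

# Assigning duplicate row names to a data frame is rejected as well
df <- data.frame(x = 1:3)
dup <- try(suppressWarnings(row.names(df) <- c("a", "a", "b")), silent = TRUE)
print(inherits(dup, "try-error"))
```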

3. R Data Frame Operations

In this section, we will perform various operations on a data frame in R. Let’s discuss these operations one by one.

a) Create Data Frame

emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
stringsAsFactors = FALSE
)
  • Print the data frame.
print(emp.data)

After executing the above code, it will produce the following result −

emp_id    emp_name     salary     start_date
1         Ricky        643.30     2012-01-01
2         Danish       515.20     2013-09-23
3         Mini         671.00     2014-11-15
4         Ryan         729.00     2014-05-11
5         Gary         943.25     2015-03-27

b) Get the Structure of the Data Frame

The structure of the data frame can be seen by using the str() function.

  • Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
stringsAsFactors = FALSE
)
  • Get the structure of the data frame.
str(emp.data)

When we execute the above code, it produces the following result −

'data.frame':   5 obs. of  4 variables:
$ emp_id    : int  1 2 3 4 5
$ emp_name  : chr  "Ricky" "Danish" "Mini" "Ryan" ...
$ salary    : num  643 515 671 729 943
$ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...
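Besides str(), a few other base R functions report parts of the same information. The sketch below is one possible way to inspect the data frame; the variable names frame_dim and col_names are illustrative only.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Ricky", "Danish", "Mini", "Ryan", "Gary"),
  salary = c(643.3, 515.2, 671.0, 729.0, 943.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

frame_dim <- dim(emp.data)       # number of rows, then number of columns
col_names <- names(emp.data)     # column names, in order
print(frame_dim)
print(col_names)
print(summary(emp.data$salary))  # summary statistics for one numeric column
```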

c) Extract data from Data Frame

Use the name of a column to extract that specific column from the data frame.

  • Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")),
stringsAsFactors = FALSE
)

Extract Specific columns.

result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

When we execute the above code, it produces the following result −

emp.data.emp_name emp.data.salary
1          Ricky         643.30
2          Danish        515.20
3          Mini          671.00
4          Ryan          729.00
5          Gary          943.25
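The column names in the result above carry the emp.data prefix because the columns were passed to data.frame() as expressions. As an alternative sketch, bracket indexing by name keeps the original column names, and double brackets return a single column as a plain vector. The names result_named and salaries are illustrative only.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Ricky", "Danish", "Mini", "Ryan", "Gary"),
  salary = c(643.3, 515.2, 671.0, 729.0, 943.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Selecting columns by name returns a data frame with the original names
result_named <- emp.data[, c("emp_name", "salary")]
print(names(result_named))
print(is.data.frame(result_named))

# Double brackets return the column itself as a plain vector
salaries <- emp.data[["salary"]]
print(is.numeric(salaries))
```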

Extract the first two rows with all columns

  • Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
stringsAsFactors = FALSE
)
  • Extract first two rows.
result <- emp.data[1:2,]
print(result)

When we execute the above code, it produces the following result −

emp_id    emp_name   salary    start_date
   1      Ricky      643.3     2012-01-01
   2      Danish     515.2     2013-09-23
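For the common case of taking the leading rows, base R also provides head(). The sketch below, which uses a shortened version of emp.data to stay brief, compares it with the bracket form used above.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Ricky", "Danish", "Mini", "Ryan", "Gary"),
  salary = c(643.3, 515.2, 671.0, 729.0, 943.25),
  stringsAsFactors = FALSE
)

# head(x, 2) returns the first two rows with all columns
first_two <- head(emp.data, 2)
print(first_two)

# Compare with the bracket-indexing form
same_rows <- isTRUE(all.equal(first_two, emp.data[1:2, ]))
print(same_rows)
```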

Extract 3rd and 5th row with 2nd and 4th column of the below data

  • Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)

Extract 3rd and 5th row with 2nd and 4th column.

result <- emp.data[c(3,5),c(2,4)]
print(result)

When we execute the above code, it produces the following result −

         emp_name start_date
3         Mini    2014-11-15
5         Gary    2015-03-27
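Column positions such as 2 and 4 can shift if columns are later added or reordered. A possible variation, shown here only as a sketch, selects the same columns by name and compares the result with the positional form.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Ricky", "Danish", "Mini", "Ryan", "Gary"),
  salary = c(643.3, 515.2, 671.0, 729.0, 943.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Rows 3 and 5, columns chosen by position and by name
by_position <- emp.data[c(3, 5), c(2, 4)]
by_name <- emp.data[c(3, 5), c("emp_name", "start_date")]
print(by_name)

# Both forms select the same cells
same_cells <- isTRUE(all.equal(by_position, by_name))
print(same_cells)
```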

d) Expand Data Frame

A data frame can be expanded by adding columns and rows.

Add Column

Add the column vector using a new column name.

  • Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15", "2014-05-11","2015-03-27")),
stringsAsFactors = FALSE
)
  • Add the “dept” column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

When we execute the above code, it produces the following result −

emp_id   emp_name   salary    start_date    dept
1        Ricky      643.30    2012-01-01    IT
2        Danish     515.20    2013-09-23    Operations
3        Mini       671.00    2014-11-15    IT
4        Ryan       729.00    2014-05-11    HR
5        Gary       943.25    2015-03-27    Finance
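As an alternative to the `$` assignment, a column can also be attached with cbind(). The sketch below uses a shortened emp.data; the names emp.wide and short are illustrative, and `stringsAsFactors = FALSE` is passed on the assumption that the new column should stay character on R versions older than 4.0. It also shows that a column whose length does not fit the number of rows is rejected.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Ricky", "Danish", "Mini", "Ryan", "Gary"),
  salary = c(643.3, 515.2, 671.0, 729.0, 943.25),
  stringsAsFactors = FALSE
)

# cbind() returns a new data frame with the extra column appended
emp.wide <- cbind(emp.data,
                  dept = c("IT", "Operations", "IT", "HR", "Finance"),
                  stringsAsFactors = FALSE)
print(names(emp.wide))

# A column of length 2 cannot be assigned to a data frame with 5 rows
short <- try(emp.data$dept <- c("IT", "HR"), silent = TRUE)
print(inherits(short, "try-error"))
```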

Add Row

  • Create the first data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
dept = c("IT","Operations","IT","HR","Finance"),
stringsAsFactors = FALSE
)
  • Create the second data frame
emp.newdata <- data.frame(
emp_id = c (6:8),
emp_name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Finance"),
stringsAsFactors = FALSE
)
  • Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

When we execute the above code, it produces the following result −

emp_id     emp_name      salary      start_date         dept
1            Ricky       643.30      2012-01-01         IT
2            Danish      515.20      2013-09-23         Operations
3            Mini        671.00      2014-11-15         IT
4            Ryan        729.00      2014-05-11         HR
5            Gary        943.25      2015-03-27         Finance
6            Rasmi       578.00      2013-05-21         IT
7            Pranab      722.50      2013-07-30         Operations
8            Tusar       632.80      2014-06-17         Finance
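rbind() works here because both data frames have the same column names. The following sketch uses two-column versions of the frames above to show the row count after binding, and what happens when the column names differ. The names old_rows, new_rows, and bad_rows, as well as the mismatched columns id and name, are illustrative only.

```r
old_rows <- data.frame(
  emp_id = 1:5,
  emp_name = c("Ricky", "Danish", "Mini", "Ryan", "Gary"),
  stringsAsFactors = FALSE
)
new_rows <- data.frame(
  emp_id = 6:8,
  emp_name = c("Rasmi", "Pranab", "Tusar"),
  stringsAsFactors = FALSE
)

# Matching column names: the rows are stacked
combined <- rbind(old_rows, new_rows)
print(nrow(combined))

# Different column names: rbind() stops with an error
bad_rows <- data.frame(id = 6L, name = "Rasmi", stringsAsFactors = FALSE)
mismatch <- try(rbind(old_rows, bad_rows), silent = TRUE)
print(inherits(mismatch, "try-error"))
```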

4. Conclusion

We have learned about the data frame and its characteristics in detail. We have also discussed the different operations on a data frame, with examples showing how to create it, inspect its structure, extract data from it, and expand it.
