R Data Reshaping | 4 Major Functions to Organise your Data
In this tutorial, we will discuss data reshaping in R. We will also cover data frame concepts, since data reshaping in R operates on data frames, and look at the properties of a data frame that help in understanding how reshaping works.
Introduction to R Data Reshaping
R data reshaping is all about changing the way data is organized into rows and columns. Most data processing in R takes its input as a data frame. Extracting data from the rows and columns of a data frame is easy, but there are situations when we need the data frame in a format different from the one in which we received it. R has many functions to split, merge and change rows to columns in a data frame.
Why Use the Reshape Package in R?
The data obtained from an experiment or study is often laid out differently from the format that analytic functions expect. Usually, the data from a study has one or more columns that identify a row, followed by a number of columns that hold the measured values. The columns that identify the row can be thought of as the composite key of a database table.
Reshaping is easier to follow with some knowledge of matrices in R, so it is worth going through the R Matrix tutorial first.
Joining Columns and Rows in a Data Frame
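We can combine vectors column by column using the cbind() function. Note that cbind() on plain vectors returns a matrix; to get a data frame, pass the vectors to data.frame() or include a data frame among the arguments.
We use the cbind() function to combine vectors, matrices or data frames by columns.
cbind(x1, x2, ...)
x1, x2: vectors, matrices or data frames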
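As a minimal sketch of the column binding described above, the following base R snippet uses two hypothetical vectors, id and score (these names are illustrative and not part of the original tutorial). It shows that cbind() on plain vectors gives a matrix, while binding onto a data frame gives a data frame.

```r
# Two hypothetical vectors of equal length (illustrative data only)
id    <- c(1, 2, 3)
score <- c(90, 85, 77)

# cbind() on plain vectors returns a matrix with one column per vector
m <- cbind(id, score)
print(class(m))

# data.frame() builds a data frame from the same vectors
df <- data.frame(id, score)

# cbind() with a data frame as first argument appends a new column
df2 <- cbind(df, passed = score >= 80)
print(df2)
```

The vectors must have compatible lengths; shorter vectors are recycled when their length divides the longer one.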
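Also, we can stack two data frames on top of each other using the rbind() function. The data frames must have the same column names.
We use the rbind() function to combine vectors, matrices or data frames by rows.
rbind(x1, x2, ...)
x1, x2: vectors, matrices or data frames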
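The row binding described above can be sketched in base R as follows. The data frames first and second and their contents are hypothetical, chosen only to illustrate the call.

```r
# Two hypothetical data frames with identical column names
first  <- data.frame(id = c(1, 2), city = c("Pune", "Delhi"),
                     stringsAsFactors = FALSE)
second <- data.frame(id = c(3, 4), city = c("Agra", "Kochi"),
                     stringsAsFactors = FALSE)

# rbind() appends the rows of 'second' below the rows of 'first'
all_rows <- rbind(first, second)
print(all_rows)
```

If the column names differ between the two data frames, rbind() stops with an error instead of guessing how to align them.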
We use the melt() function in R to convert an object into a molten data frame. It takes input in wide format and stacks multiple columns into a single column of data. The melt() function has the following arguments –
melt(data, ..., na.rm = FALSE, value.name = "value")
- data – The input data that is to be melted.
- ... – Further arguments passed to or from other methods.
- na.rm – Whether NA values should be removed from the data set, which converts explicit missings into implicit missings.
- value.name – The name of the variable used to store the values.
In the following example, we take the mtcars data and apply the melt() function with the id variables ‘gear’ and ‘carb’ and the measured variables ‘mpg’, ‘cyl’, ‘disp’ and ‘hp’.
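library(reshape2)  # dcast() and the value.name argument of melt() come from reshape2
library(datasets)
str(mtcars)
# gear and carb are kept as id columns; the four measure.vars are stacked into 'value'
molted <- melt(mtcars, id.vars = c("gear", "carb"), measure.vars = c("mpg", "cyl", "disp", "hp"))
str(molted)
molted[sample(nrow(molted), 10), ]  # inspect 10 randomly chosen rows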
Once you have a molten dataset, it is ready to be cast, or reshaped. The dcast() function casts the molten data frame back into wide format, aggregating the values wherever several rows share the same combination of id variables.
dcast() takes three main arguments –
- data – The molten data frame.
- formula – The formula specifies how the data is to be cast. It has the form x_variable ~ y_variable, and there can be multiple variables on either side.
- fun.aggregate – The aggregation function to apply when the casting formula maps more than one value to a cell (for example length(), mean() or sum()).
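The three arguments above can be put together as in the sketch below. It assumes the reshape2 package is installed and that dcast() is taken from it; the formula gear + carb ~ variable is one illustrative choice, not necessarily the one the original tutorial used.

```r
library(reshape2)  # assumption: melt() and dcast() from reshape2

# Melt mtcars, keeping gear and carb as id variables
molted <- melt(mtcars, id.vars = c("gear", "carb"),
               measure.vars = c("mpg", "cyl", "disp", "hp"))

# One row per gear/carb combination, one column per measured variable;
# each cell holds the mean of all values that fall into it
wide <- dcast(molted, gear + carb ~ variable, fun.aggregate = mean)
print(wide)
```

Because several cars share the same gear and carb values, the cast result is an aggregated summary and not a row-for-row copy of mtcars.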
What if we use only one of the variables gear or carb in dcast()?
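One way to explore this question is sketched below, again assuming reshape2. Using only gear on the left of the formula collapses all carb values together, so each row summarises every car with that number of gears.

```r
library(reshape2)  # assumption: melt() and dcast() from reshape2

molted <- melt(mtcars, id.vars = c("gear", "carb"),
               measure.vars = c("mpg", "cyl", "disp", "hp"))

# Only gear identifies a row; carb is ignored and its rows are averaged together
by_gear <- dcast(molted, gear ~ variable, fun.aggregate = mean)
print(by_gear)
```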
We can also perform a transpose operation on this as follows –
> dcast(molted,variable~gear,mean) #Author DataFlair
We can also use . (dot), which stands for no variable –
> dcast(molted,variable~.,mean) #Author DataFlair
We can also perform –
> dcast(molted,carb~.,mean) #DataFlair
Margins, that is, summaries across rows and columns (totals when the aggregation function is sum()), can be added by setting the ‘margins’ argument to TRUE.
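A small sketch of margins, assuming reshape2: here length() is used as the aggregation function so that each cell counts cars and the margins are genuine totals. The choice of mpg as the single measured variable is only for illustration.

```r
library(reshape2)  # assumption: melt() and dcast() from reshape2

# Melt a single measured variable so that each car contributes one row
molted_mpg <- melt(mtcars, id.vars = c("gear", "carb"), measure.vars = "mpg")

# Count cars per gear/carb cell; margins = TRUE adds an "(all)" row and column
counts <- dcast(molted_mpg, gear ~ carb, fun.aggregate = length,
                value.var = "value", margins = TRUE)
print(counts)
```

The bottom-right cell of the result is the grand total, which here equals the number of rows in mtcars.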
Merging Data Frames in R
The merge() function is used to merge two data frames in R. By default, the data frames are merged on the columns whose names they have in common; the by argument names the key columns explicitly.
To merge two data frames (datasets) horizontally, we use the merge function. Mostly, we use it to join two data frames by one or more common key variables (i.e., an inner join).
# merge two data frames by ID
total <- merge(dataframeA, dataframeB, by = "ID")
# merge two data frames by ID and Country
total <- merge(dataframeA, dataframeB, by = c("ID", "Country"))
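To make the merge calls concrete, here is a minimal base R sketch with two hypothetical data frames; the column values are invented for illustration. It also shows the all = TRUE argument of merge(), which keeps unmatched rows instead of dropping them.

```r
# Hypothetical inputs sharing the key column ID
dataframeA <- data.frame(ID = c(1, 2, 3),
                         Country = c("IN", "US", "UK"),
                         stringsAsFactors = FALSE)
dataframeB <- data.frame(ID = c(2, 3, 4),
                         Sales = c(250, 300, 120))

# Default behaviour is an inner join: only IDs present in both are kept
total <- merge(dataframeA, dataframeB, by = "ID")
print(total)

# all = TRUE keeps every ID from both sides and fills the gaps with NA
total_all <- merge(dataframeA, dataframeB, by = "ID", all = TRUE)
print(total_all)
```

Use all.x = TRUE or all.y = TRUE instead when only one side's unmatched rows should be kept.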
That was all for this R data reshaping tutorial. Hope you liked the article.
We have studied how to reshape data in R in detail, and we have also looked at the data frame and its properties. The data frame underlies every reshaping operation covered here, since each function takes data frames as input and returns them as output.