R Factor | Factors & Factor Function in R Programming

1. R Factor – Objective

Today, in this R tutorial, we give a complete introduction to the R factor. First, we discuss what a factor in R is. Next, we look at the factor function in R programming. Finally, we cover the R functions most often used with factors, namely tapply(), split() and by(), along with their usage and arguments.

So, let’s start the R Factor tutorial.

2. What is an R Factor?

First, we discuss what exactly a factor is. After that, we proceed to the factor functions. In R, factors are variables that take a limited number of different values. Hence, such variables are often called categorical variables. A factor is a data object that categorizes data and stores it as levels. It can be created from both strings and integers, and it is useful for columns that have a limited number of unique values.
A factor is stored as a vector of integer codes, with a label associated with each unique integer. For this reason, we need to be careful when treating factors like strings. A factor contains a predefined set of values called levels. By default, factor() sorts the levels in increasing order, which for strings means alphabetical order.
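
The points above can be sketched in a few lines of R. This is a minimal illustration, assuming made-up shirt-size data; the variable names (sizes, f, g) are hypothetical and not part of the original tutorial.

```r
# Hypothetical data: shirt sizes recorded as strings.
sizes <- c("medium", "small", "large", "small", "medium")
f <- factor(sizes)

levels(f)      # "large" "medium" "small"  (sorted alphabetically by default)
as.integer(f)  # 2 3 1 3 2  (integer codes that index into levels(f))
nlevels(f)     # 3

# Caution: a factor built from number-like strings holds codes, not the numbers.
g <- factor(c("10", "20", "10"))
as.integer(g)                # 1 2 1   (the codes)
as.numeric(as.character(g))  # 10 20 10 (the values, recovered via the labels)
```

Converting through as.character() first is the usual way to get the original values back, because as.integer() on a factor returns only the internal codes.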

3. Factor function in R Programming

Now we will discuss the R factor function in detail.
Functions that we apply to factors are termed factor functions. We use the function factor() to encode a vector as a factor. If the argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S, there is also a function ordered().
The factor() command is used to create and change factors in R.
The functions most often used with factors are tapply(), split() and by().
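
As a rough sketch of how factor() and ordered() behave, consider the following. The ratings vector and the chosen level order are assumptions made for illustration only.

```r
# Hypothetical data: survey ratings.
ratings <- c("low", "high", "medium", "low")

# Explicit levels fix the order instead of the default sorted order.
r1 <- factor(ratings, levels = c("low", "medium", "high"))
levels(r1)      # "low" "medium" "high"
is.ordered(r1)  # FALSE: the levels have a fixed order but are not comparable

# ordered = TRUE makes the levels comparable with <, >, etc.
r2 <- factor(ratings, levels = c("low", "medium", "high"), ordered = TRUE)
is.ordered(r2)  # TRUE
r2 < "high"     # TRUE FALSE TRUE TRUE

# ordered() is the S-compatible shortcut for the same thing.
r3 <- ordered(ratings, levels = c("low", "medium", "high"))
is.ordered(r3)  # TRUE
```

Without the levels argument, the levels would be sorted alphabetically ("high", "low", "medium"), which is rarely the intended ranking for data like this.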

i. tapply()

Apply a Function over a Ragged Array
tapply() applies a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors.
tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

  • X – An atomic object, typically a vector.
  • INDEX – A list of one or more factors, each of the same length as X. The elements are coerced to factors by as.factor.
  • FUN – The function to be applied, or NULL. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted. If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array that tapply normally produces.
  • ... – Optional arguments to FUN; see the Note section of the tapply help page.
  • default – Only used when the result is simplified to an array: the value with which that array is initialized. Before R 3.4.0, this was hard-coded to array()'s default, NA. If it is NA (the default), the missing value of the answer type, e.g. NA_real_, is chosen (as.raw(0) for "raw"). In a numerical case, it may be set to FUN(integer(0)), e.g. to 0 or 0L in the case of FUN = sum.
  • simplify – Logical; if FALSE, tapply always returns an array of mode "list"; in other words, a list with a dim attribute. If TRUE (the default) and FUN always returns a scalar, tapply returns an array with the mode of the scalar.
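
The arguments above can be tried out with a small sketch. The score, group and sex vectors are made-up data, and the default argument assumes R 3.4.0 or later.

```r
# Hypothetical data: six scores, each belonging to a group and a sex.
score <- c(80, 90, 70, 60, 85, 95)
group <- factor(c("a", "b", "a", "c", "b", "c"))
sex   <- factor(c("f", "m", "f", "m", "f", "m"))

# One factor: mean score per group (a = 75, b = 87.5, c = 77.5).
means <- tapply(score, group, mean)

# Two factors: a group-by-sex table of sums. Combinations with no data
# (here a/m and c/f) are filled with `default` instead of NA.
totals <- tapply(score, list(group, sex), sum, default = 0)
totals["a", "m"]  # 0
totals["c", "m"]  # 155

# simplify = FALSE always gives a list-mode array, one element per group.
ranges <- tapply(score, group, range, simplify = FALSE)
ranges[["a"]]     # 70 80

# FUN = NULL returns the group index of each element of X.
idx <- tapply(score, group)
idx               # 1 2 1 3 2 3
```

Setting default = 0 is sensible here because the sum over an empty group is 0; for a function such as mean, leaving the default NA is usually the better choice.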

ii. split()

Divide into Groups and Reassemble
split() divides the data in the vector x into the groups defined by f. The replacement forms replace the values corresponding to such a division. unsplit() reverses the effect of split().

split(x, f, drop = FALSE, ...)
# S3 method for default
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)
split(x, f, drop = FALSE, ...) <- value
unsplit(value, f, drop = FALSE)


  • x – A vector or data frame containing the values to be divided into groups.
  • f – A 'factor' in the sense that as.factor(f) defines the grouping, or a list of such factors, in which case their interaction is used for the grouping.
  • drop – Logical indicating whether levels that do not occur should be dropped (if f is a factor or a list).
  • sep – Character string, passed to interaction in the case where f is a list.
  • lex.order – Logical, passed to interaction when f is a list.
  • ... – Further potential arguments passed to methods.
  • value – A list of vectors or data frames compatible with a splitting of x. Recycling applies if the lengths do not match.
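
A short sketch of split(), its replacement form and unsplit() follows. The data are made up, and the extra unused level "d" is added only to show the effect of drop.

```r
# Hypothetical data: six scores in three groups, plus an unused level "d".
score <- c(80, 90, 70, 60, 85, 95)
group <- factor(c("a", "b", "a", "c", "b", "c"),
                levels = c("a", "b", "c", "d"))

parts <- split(score, group)
names(parts)   # "a" "b" "c" "d"  (the unused level gives an empty group)
parts$a        # 80 70

# drop = TRUE removes levels that do not occur in the data.
kept <- split(score, group, drop = TRUE)
names(kept)    # "a" "b" "c"

# unsplit() puts the pieces back in their original positions.
restored <- unsplit(parts, group)

# Replacement form: centre each group on its own mean.
centred <- score
split(centred, group) <- lapply(split(centred, group),
                                function(v) v - mean(v))
centred        # 5.0 2.5 -5.0 -17.5 -2.5 17.5

# When f is a list of factors, sep joins the level names of each combination.
sex <- factor(c("f", "m", "f", "m", "f", "m"))
combos <- split(score, list(droplevels(group), sex), drop = TRUE, sep = "_")
```

In the last call the groups are named after the combinations that actually occur, such as "a_f" and "c_m".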

iii. by()

Apply a Function to a Data Frame Split by Factors
by() is an object-oriented wrapper for tapply() applied to data frames.
by(data, INDICES, FUN, ..., simplify = TRUE)

  • data – An R object, normally a data frame, possibly a matrix.
  • INDICES – A factor or a list of factors, each of length nrow(data).
  • FUN – A function to be applied to (usually data-frame) subsets of data.
  • ... – Further arguments to FUN.
  • simplify – Logical: see tapply.
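
As a minimal sketch of by(), assume a small made-up data frame df with a grouping factor and two numeric columns. The column names and the helper argument col are hypothetical.

```r
# Hypothetical data frame: one grouping factor and two numeric columns.
df <- data.frame(
  group = factor(c("a", "b", "a", "c", "b", "c")),
  score = c(80, 90, 70, 60, 85, 95),
  hours = c(5, 8, 4, 3, 7, 9)
)

# FUN receives the rows of each group as a data frame.
res <- by(df, df$group, function(d) mean(d$score))
res[["a"]]    # 75

# Arguments after FUN are passed on to it (here: col).
res_hours <- by(df, df$group, function(d, col) mean(d[[col]]), col = "hours")
res_hours[["c"]]  # 6

# FUN may also return more than one value per group.
res_cols <- by(df[, c("score", "hours")], df$group, colMeans)
res_cols[["b"]]   # score = 87.5, hours = 7.5
```

Unlike tapply(), which works on a single vector, by() hands the whole subset of rows to FUN, so the function can use several columns at once.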

So, that was all for the R Factor tutorial. We hope you liked it.

4. Conclusion – R Factor

Hence, in this R Factor tutorial, we saw that a factor is a data object that categorizes data and stores it as levels, and that the function factor() encodes a vector as a factor. We also discussed the functions most often used with factors, tapply(), split() and by(), along with their arguments. Still, if you have any query regarding R Factor, ask in the comment section.