This R tutorial is an introduction to factors in R. We first explain what a factor is, then describe the factor() function. After that, we cover the functions most commonly used with factors, tapply(), split() and by(), along with their usage and arguments.
2. What is an R Factor?
In R, a factor is a variable that takes a limited number of distinct values. Such variables are often called categorical variables. A factor is a data object that categorizes data and stores it as levels. Factors can be created from both character and integer vectors, and they are useful for columns that contain a limited number of unique values.
Internally, a factor is stored as a vector of integer codes, with a label associated with each unique integer. For this reason, we need to be careful when treating factors like strings. A factor contains a predefined set of values called levels. By default, R sorts the levels, which for character data means alphabetical order.
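The relationship between labels, levels, and integer codes can be sketched as follows. The `sizes` and `g` vectors are made-up illustration data, not part of the original tutorial.

```r
# Hypothetical data: shirt sizes recorded as character strings
sizes <- c("medium", "small", "large", "small", "medium")
f <- factor(sizes)

levels(f)      # the distinct values, sorted: "large" "medium" "small"
as.integer(f)  # the underlying integer codes: 2 3 1 3 2
nlevels(f)     # 3

# A factor built from numbers: as.numeric() returns the codes,
# not the original values, so convert through the labels first.
g <- factor(c(10, 20, 10, 30))
as.numeric(g)                # 1 2 1 3 (the codes)
as.numeric(as.character(g))  # 10 20 10 30 (the original values)
```

The last two lines show why care is needed when treating a factor like a plain vector: what is stored is the integer code, and the label is only looked up from it.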
3. Factor function in R Programming
Now we will discuss the factor() function in detail.
We use the function factor() to encode a vector as a factor. If the argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S, there is also a function ordered().
The factor() command is used to create and change factors in R.
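As a minimal sketch of these points, the example below creates an ordered factor with explicit levels and builds the same object with ordered(). The `sizes` vector and the chosen level order are assumptions made for illustration.

```r
# Hypothetical data
sizes <- c("medium", "small", "large", "small")

# Explicit levels replace the default sorted order;
# ordered = TRUE tells R that the levels have a ranking.
f <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)

levels(f)      # "small" "medium" "large"
is.ordered(f)  # TRUE
f < "large"    # TRUE TRUE FALSE TRUE

# ordered() is shorthand for factor(..., ordered = TRUE)
g <- ordered(sizes, levels = c("small", "medium", "large"))
identical(f, g)  # TRUE
```

Comparison operators such as `<` are only meaningful for ordered factors; for an unordered factor R returns NA with a warning.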
The functions most often used with factors are tapply(), split() and by().
Apply a Function over a Ragged Array
tapply() applies a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors.
tapply(X, INDEX, FUN = NULL, …, default = NA, simplify = TRUE)
- X – An atomic object, typically a vector.
- … – Optional arguments to FUN. They are not divided into cells, so FUN should not expect additional arguments with the same length as X.
- FUN – The function to be applied, or NULL. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted. If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array that tapply normally produces.
- simplify – Logical; if FALSE, tapply always returns an array of mode "list"; in other words, a list with a dim attribute. If TRUE (the default) and FUN always returns a scalar, tapply returns an array with the mode of the scalar.
- default – Used only when the result is simplified to an array: the value with which the array is initialized. Before R 3.4.0, this was hard-coded to array()'s default, NA. If it is NA (the default), the missing value of the answer type, e.g. NA_real_, is chosen (as.raw(0) for "raw"). In a numerical case, it may be set, e.g., to FUN(integer(0)), which for FUN = sum gives 0 or 0L.
- INDEX – A list of one or more factors, each of the same length as X. The elements are coerced to factors by as.factor.
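The arguments above can be tried out with a small sketch. The `scores`, `group` and `sex` vectors are invented for illustration, and the `default` argument assumes R 3.4.0 or later.

```r
# Hypothetical data: six scores, each tagged with two factors
scores <- c(80, 90, 70, 60, 85, 75)
group  <- factor(c("a", "b", "a", "c", "b", "a"))
sex    <- factor(c("f", "m", "m", "f", "f", "m"))

# One factor: the mean score per group
means <- tapply(scores, group, mean)
means  # a = 75, b = 87.5, c = 60

# Two factors: one cell per combination of levels.
# The combination (c, m) has no data, so it is filled with default = 0.
sums <- tapply(scores, list(group, sex), sum, default = 0)
sums

# FUN = NULL: returns the cell index of each element of scores
idx <- tapply(scores, group)
idx  # 1 2 1 3 2 1

# simplify = FALSE: always returns a list with a dim attribute
as_list <- tapply(scores, group, range, simplify = FALSE)
as_list[["a"]]  # 70 80
```

The groups have different sizes (three values in a, two in b, one in c), which is what makes the array "ragged".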
Divide into Groups and Reassemble
split divides the data in the vector x into the groups defined by f. The replacement forms replace values corresponding to such a division. unsplit reverses the effect of a split.
split(x, f, drop = FALSE, …)
# S3 method for default
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, …)
split(x, f, drop = FALSE, …) <- value
unsplit(value, f, drop = FALSE)
- x – vector or data frame containing the values to be divided into groups.
- drop – logical indicating if levels that do not occur should be dropped (if f is a factor or a list).
- sep – character string, passed to interaction in the case where f is a list.
- … – Further potential arguments passed to methods.
- value – a list of vectors or data frames compatible with a splitting of x. Recycling applies if the lengths do not match.
- f – a 'factor' in the sense that as.factor(f) defines the grouping, or a list of such factors, in which case their interaction is used for the grouping.
- lex.order – logical, passed to interaction when f is a list.
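The following sketch runs split(), its replacement form, and unsplit() on a small example. The vector `x` and the factor `f` (which deliberately has an unused level "d") are illustration data only.

```r
# Hypothetical data; level "d" never occurs in f
x <- c(5, 3, 8, 1, 9, 2)
f <- factor(c("a", "b", "a", "c", "b", "a"), levels = c("a", "b", "c", "d"))

# Default: one list element per level, including the empty group "d"
parts <- split(x, f)
names(parts)  # "a" "b" "c" "d"
parts$a       # 5 8 2

# drop = TRUE removes levels that do not occur
parts_used <- split(x, f, drop = TRUE)
names(parts_used)  # "a" "b" "c"

# Replacement form: subtract each group's mean, writing back in place
centered <- x
split(centered, f, drop = TRUE) <- lapply(parts_used, function(v) v - mean(v))
centered  # 0 -3 3 0 3 -3

# unsplit() puts the pieces back in their original positions
restored <- unsplit(parts, f)
all.equal(restored, x)  # TRUE
```

The replacement form is a compact way to transform values group by group while keeping the original element order.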
Apply a Function to a Data Frame Split by Factors
The function by() is an object-oriented wrapper for tapply, applied to data frames.
by(data, INDICES, FUN, …, simplify = TRUE)
- INDICES – a factor or a list of factors, each of length nrow(data).
- data – an R object, normally a data frame, possibly a matrix.
- simplify – logical: see tapply
- … – further arguments to FUN
- FUN – a function to be applied to (usually data-frame) subsets of data.
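A minimal sketch of by() on a data frame is shown below. The data frame `df` and its columns are made up for illustration.

```r
# Hypothetical data frame with one grouping factor and two numeric columns
df <- data.frame(
  group = factor(c("a", "b", "a", "b", "a")),
  x     = c(1, 2, 3, 4, 8),
  y     = c(10, 20, 30, 40, 50)
)

# FUN receives the subset of rows for each level of the factor
res <- by(df, df$group, function(d) mean(d$x))
res  # a: 4, b: 3

# FUN may also return more than one value per group
res2 <- by(df, df$group, function(d) colMeans(d[, c("x", "y")]))
res2[["a"]]  # x = 4, y = 30
```

Unlike tapply(), which passes a vector to FUN, by() passes a whole data-frame subset, so FUN can use several columns at once.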
In summary, a factor is a data object that categorizes data and stores it as levels. We use the function factor() to encode a vector as a factor. We have also discussed the functions commonly used with factors, tapply(), split() and by(), along with their uses.