ANOVA in R – Common Statistical ANOVA Models

1. Objective – ANOVA in R

In this tutorial, we discuss ANOVA in R. We look at the ANOVA model in R, cover one-way and two-way ANOVA, and go through the syntax of ANOVA models in R. Finally, we discuss classical ANOVA and the ANOVA table in R.

So, let’s start ANOVA in R Tutorial.

2. What is ANOVA Model in R?

ANOVA is a model that is seldom sweet and almost always confusing. Analysis of variance (ANOVA) is a statistical technique used for investigating data by comparing the means of subsets of the data.

anova.glm

This is the analysis of deviance for generalized linear model fits.
It computes an analysis of deviance table for one or more generalized linear model fits.

Keywords

models, regression

Usage

# S3 method for glm
anova(object, ..., dispersion = NULL, test = NULL)

Arguments

a. object, ...

These are objects of class glm, typically the result of a call to glm, or a list of objects for the "glmlist" method.

b. dispersion

This is the dispersion parameter for the fitting family. By default, it is obtained from the object(s).

c. test

This is a character string that should (partially) match one of "Chisq", "LRT", "Rao", "F" or "Cp".
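As a minimal sketch of the usage above, the following fits a Poisson GLM to the built-in InsectSprays data and asks for a chi-squared test. The choice of data set and family, and the names fit and dev.tab, are assumptions made only for illustration.

```r
# Fit a Poisson GLM (data set and family are illustrative choices)
fit <- glm(count ~ spray, family = poisson, data = InsectSprays)

# Analysis of deviance table; the spray term is tested with a chi-squared test
dev.tab <- anova(fit, test = "Chisq")
print(dev.tab)
```

The table has one row for the null model and one row for each term added to it.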

3. What is ANOVA in R?

ANOVA in R can be explained in the following two ways: one-way and two-way ANOVA.

i. 1-way ANOVA in R

Here we use InsectSprays, a data set that is built into R. Six different insect sprays were tested, and we want to see if there was a difference in the number of insects found in the field after each spraying.

> data(InsectSprays)
> attach(InsectSprays)
> str(InsectSprays)

'data.frame': 72 obs. of 2 variables:
$ count: num 10 7 20 14 14 12 10 23 17 20 ...
$ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

a. Descriptive statistics

1. Mean, variance, number of elements in each cell

2. Visualise the data – boxplot; look at distribution, look for outliers

We are going to use tapply() here:

The tapply() function is a very helpful shortcut in processing data. It takes a function and applies it to each subset of the response variable defined by each level of the factor.
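The descriptive statistics listed above can be sketched with tapply() as follows. The variable names cell.mean, cell.var and cell.n are made up for this example, and with() is used so that the code does not depend on attach().

```r
data(InsectSprays)

# Mean, variance and number of observations of count for each spray
cell.mean <- with(InsectSprays, tapply(count, spray, mean))
cell.var  <- with(InsectSprays, tapply(count, spray, var))
cell.n    <- with(InsectSprays, tapply(count, spray, length))

print(cell.mean)
print(cell.var)
print(cell.n)

# Boxplot to look at the distribution and outliers within each spray
boxplot(count ~ spray, data = InsectSprays,
        xlab = "Spray", ylab = "Insect count")
```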

b. Run 1-way ANOVA

1. oneway.test()

Use, for example:

> oneway.test(count~spray)

One-way analysis of means (not assuming equal variances)
data: count and spray

F = 36.0654, num df = 5.000, denom df = 30.043, p-value = 7.999e-12

By default, equal variances are not assumed, i.e. Welch’s correction is applied. This explains why the denom df, which would be k*(n-1) = 66 under equal variances, is not a whole number in the output.
To change this, set the var.equal option to TRUE.
oneway.test() corrects for non-homogeneity, but doesn’t give much information.
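To see the effect of the var.equal option, here is a small sketch that runs the test both ways on InsectSprays. The names welch and pooled are made up for this example.

```r
# Welch's version (the default): equal variances are not assumed
welch <- oneway.test(count ~ spray, data = InsectSprays)

# Classical version: equal variances are assumed
pooled <- oneway.test(count ~ spray, data = InsectSprays, var.equal = TRUE)

# Compare the degrees of freedom used by the two tests
print(welch$parameter)
print(pooled$parameter)
```

With var.equal = TRUE, the denominator df is the whole number k*(n-1), here 6*(12-1) = 66.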

2. Run an ANOVA using aov( )

We use this function, store the output, and then use extraction functions to extract what we need.

> aov.out = aov(count ~ spray, data=InsectSprays)
> summary(aov.out)
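As a sketch of what "extraction functions" can look like in practice, the following pulls a few pieces out of the stored fit. Which pieces are useful depends on the analysis; the names aov.tab and tuk are made up for this example.

```r
aov.out <- aov(count ~ spray, data = InsectSprays)

# The ANOVA table itself, as a data frame
aov.tab <- summary(aov.out)[[1]]
print(aov.tab[["Df"]])

# Estimated coefficients of the underlying linear model
print(coef(aov.out))

# Tukey's honest significant differences for all pairs of sprays
tuk <- TukeyHSD(aov.out)
print(tuk)
```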

ii. Two-way ANOVA in R

Two-way Analysis of Variance

We use it to compare the means of populations that are classified in two different ways. We can use lm() to fit two-way ANOVA models in R.

For example, the command:

> lm(Response ~ FactorA + FactorB)

fits a two-way ANOVA model without interactions. In contrast, the command

> lm(Response ~ FactorA + FactorB + FactorA:FactorB)

includes an interaction term (the shorthand Response ~ FactorA * FactorB fits the same model). Here both FactorA and FactorB are categorical variables, while Response is quantitative.
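To make this concrete, here is a sketch using the built-in warpbreaks data, where breaks plays the role of Response and the factors wool and tension play the roles of FactorA and FactorB. The choice of data set is an assumption made only for illustration.

```r
# Two-way ANOVA without interaction
fit.add <- lm(breaks ~ wool + tension, data = warpbreaks)

# Two-way ANOVA with the wool-by-tension interaction
fit.int <- lm(breaks ~ wool + tension + wool:tension, data = warpbreaks)

# ANOVA tables for both models
print(anova(fit.add))
print(anova(fit.int))

# F test of the interaction term: compare the two nested models
print(anova(fit.add, fit.int))
```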

4. Classical ANOVA in R

Generally, we start with a simple additive fixed effects model. For this model, we use the built-in function aov:

aov(Y ~ A + B, data=d)

Now, to cross these factors, or more generally to interact two variables, we use either of

aov(Y ~ A * B, data=d)
aov(Y ~ A + B + A:B, data=d)

So far so familiar. Now assume that B is nested within A:

aov(Y ~ A/B, data=d)
aov(Y ~ A + B %in% A, data=d)
aov(Y ~ A + A:B, data=d)

So, nesting amounts to adding one main effect and one interaction.
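One way to check which terms a formula expands to, without needing any data, is to inspect it with terms(). The helper term.labels below is a made-up convenience function for this sketch.

```r
# Hypothetical helper: list the terms a model formula expands to
term.labels <- function(f) attr(terms(f), "term.labels")

# Crossing: both spellings give the two main effects plus the interaction
crossed.short <- term.labels(Y ~ A * B)
crossed.long  <- term.labels(Y ~ A + B + A:B)
print(crossed.short)
print(identical(crossed.short, crossed.long))

# Nesting: A/B gives one main effect (A) plus the interaction A:B
nested.short <- term.labels(Y ~ A / B)
nested.long  <- term.labels(Y ~ A + A:B)
print(nested.short)
print(identical(nested.short, nested.long))
```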

i. Random Effects in Classical ANOVA

aov can also deal with random effects, provided everything is balanced. Assume A is the only random effect, e.g. a subject indicator:

aov(Y ~ Error(A), data=d)

Now assume A is random, but B is fixed and B is nested within A:

aov(Y ~ B + Error(A/B), data=d)

Or B and X are crossed (interacted) within levels of random A:

aov(Y ~ (B*X) + Error(A/(B*X)), data=d)

Or B and X within random A are categorized by (non-nested) G and H:

aov(Y ~ (B*X*G*H) + Error(A/(B*X)) + (G*H), data=d)

This Error business can get confusing and the balance requirements tiresome. Thus, for random effects models, it’s usually easier to move to lme4.
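For a runnable example of an Error() term, here is a sketch on the built-in npk data, a balanced experiment in which block can be treated as the random stratum. Using npk here is an assumption made for illustration; it is not one of the models above.

```r
# Fixed effects N, P and K, with block as the error stratum
fit.err <- aov(yield ~ N * P * K + Error(block), data = npk)

# The summary has one ANOVA table per error stratum
print(summary(fit.err))
```

Because an Error() term was used, the result is a multi-stratum fit (class "aovlist") rather than a single aov object.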


5. ANOVA Table in R

Let us say we have collected data, and our X values have been entered in R as a vector called data.X and our Y values as a vector called data.Y.

To find the ANOVA values for the data, we can go through the following steps:

  • First, fit the data to a model: > data.lm = lm(data.Y~data.X)
  • Next, get R to produce an ANOVA table by typing: > anova(data.lm)
  • As a result, we should have an ANOVA table!
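The steps above can be sketched end to end as follows. Since the tutorial does not provide data, the built-in cars data set stands in for data.X and data.Y here; that choice, and the name data.aov, are assumptions made for illustration.

```r
# Stand-in data: speed as X and stopping distance as Y
data.X <- cars$speed
data.Y <- cars$dist

# Step 1: fit the data to a model
data.lm <- lm(data.Y ~ data.X)

# Step 2: produce the ANOVA table
data.aov <- anova(data.lm)
print(data.aov)
```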

a. Fitted Values

To get the fitted values of the model, type:

> data.fit = fitted(data.lm)

This gives us a vector called “data.fit” that contains the fitted values of data.lm.

b. Residuals

To get the residuals of the model, type:

> data.res = resid(data.lm)

We now have a vector of the residuals.
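As a quick check on how fitted values and residuals relate, the sketch below confirms that they add up to the observed Y values. The built-in cars data set is again used as a stand-in, which is an assumption made for illustration.

```r
# Stand-in data and model
data.X <- cars$speed
data.Y <- cars$dist
data.lm <- lm(data.Y ~ data.X)

data.fit <- fitted(data.lm)  # fitted values
data.res <- resid(data.lm)   # residuals

# Observed value = fitted value + residual, for every observation
same <- all.equal(unname(data.fit + data.res), data.Y)
print(same)
```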

c. Hypothesis testing

  • If we have already found the ANOVA table for our data, we can calculate our test statistic from the numbers given.
  • If we want to find the F-quantile given by F(.95; 3, 24), we can type:

> qf(.95, 3, 24)

  • If we want to find the t-quantile given by t(.975; 19), we would type:

> qt(.975, 19)

Note that qt() takes a single degrees-of-freedom argument; a third positional argument would be read as a non-centrality parameter.
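Here is a small sketch of how such quantiles are used as critical values. The t quantile is computed with 19 degrees of freedom, assuming that is what the text intends, and the test statistic 2.84 is borrowed from the p-value example in this tutorial. The names f.crit, t.crit and f.stat are made up.

```r
# Critical value of the F distribution with 3 and 24 degrees of freedom
f.crit <- qf(0.95, df1 = 3, df2 = 24)

# Critical value of the t distribution with 19 degrees of freedom
t.crit <- qt(0.975, df = 19)

print(f.crit)
print(t.crit)

# Decision rule at the 5% level: reject if the statistic exceeds the critical value
f.stat <- 2.84
print(f.stat > f.crit)
```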

d. P – values

If we want to get the p-value for an F statistic of, say, 2.84, with degrees of freedom 3 and 24, we need the upper-tail probability, so we would type:

> pf(2.84, 3, 24, lower.tail = FALSE)
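Since pf() returns the lower-tail probability by default, the p-value of an F test is the complement of that. The sketch below shows two equivalent ways of getting it; the names p.upper and p.alt are made up.

```r
# Upper-tail probability: the p-value of an F statistic of 2.84 on (3, 24) df
p.upper <- pf(2.84, df1 = 3, df2 = 24, lower.tail = FALSE)

# Equivalent: one minus the default lower-tail probability
p.alt <- 1 - pf(2.84, df1 = 3, df2 = 24)

print(p.upper)
print(all.equal(p.upper, p.alt))
```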

e. Normal Q-Q plot

We use data.lm to make a normal probability plot of the standardized residuals of our data.
We have already fit our data to a model, but now we need the standardized residuals:

> data.stdres = rstandard(data.lm)

To make the plot, type:

> qqnorm(data.stdres)

Then, to see the line, type:

> qqline(data.stdres)
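Putting the pieces together, a complete Q-Q plot can be sketched as below. The built-in cars data set is used as a stand-in for the model, which is an assumption made for illustration.

```r
# Stand-in model
data.lm <- lm(dist ~ speed, data = cars)

# Standardized residuals of the fitted model
data.stdres <- rstandard(data.lm)

# Normal Q-Q plot with a reference line
qqnorm(data.stdres, main = "Normal Q-Q plot of standardized residuals")
qqline(data.stdres)
```

If the points fall close to the line, the normality assumption for the residuals looks reasonable.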

So, this was all about ANOVA in R. We hope you liked our explanation.

6. Conclusion – ANOVA in R

We have studied ANOVA in R, along with its different types and the properties of the ANOVA model in R. ANOVA is useful for investigating data by comparing the means of subsets of the data. If you still have any confusion regarding ANOVA in R, ask in the comment tab.
