T-Tests in R – Welch t-test and its uses


1. Objective

In this tutorial, we will learn what t-tests are and why we use them in R. We will also learn how to perform t-tests in R, look at the main types of t-test (one-sample, paired, independent-samples and Welch), and review their uses.


2. Introduction

The t-test is one of the most common tests in statistics. It is used to determine whether the means of two groups are equal to each other. The classical (Student’s) version assumes that both groups are sampled from normal distributions with equal variances. The null hypothesis is that the two means are equal, and the alternative is that they are not. Under the null hypothesis, we can calculate a t-statistic that follows a t-distribution with n1 + n2 – 2 degrees of freedom. Welch’s t-test is a modification of the t-test that adjusts the number of degrees of freedom when the variances are thought not to be equal to each other.
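To make the t-statistic and its n1 + n2 – 2 degrees of freedom concrete, here is a minimal sketch that computes the pooled-variance statistic by hand and compares it with the built-in test. The vectors g1 and g2 are made-up illustration data, not part of this tutorial's examples.

```r
# Hypothetical example data (g1 and g2 are illustrative names only)
set.seed(1)
g1 <- rnorm(20, mean = 10, sd = 2)
g2 <- rnorm(25, mean = 11, sd = 2)

n1 <- length(g1)
n2 <- length(g2)

# Pooled variance estimate (valid under the equal-variance assumption)
sp2 <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)

# t-statistic and its degrees of freedom
t_manual <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2

# Two-sided p-value from the t-distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Built-in equal-variance t-test for comparison
res <- t.test(g1, g2, var.equal = TRUE)
c(manual_t = t_manual, builtin_t = unname(res$statistic))
c(manual_p = p_manual, builtin_p = res$p.value)
```

The manual and built-in values should agree up to floating-point rounding.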

R’s t.test() function provides a variety of t-tests:

# independent 2-group t-test

t.test(y~x) # where y is numeric and x is a binary factor

# independent 2-group t-test

t.test(y1,y2) # where y1 and y2 are numeric

# paired t-test

t.test(y1,y2,paired=TRUE) # where y1 & y2 are numeric

# one sample t-test

t.test(y,mu=3) # Ho: mu=3   

3. How to perform t-tests in R

We can use the var.equal = TRUE option to specify equal variances and a pooled variance estimate.

To specify a one-tailed test, use the option alternative="less" or alternative="greater". The default is alternative="two.sided".
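As a rough sketch of how these options change the result, the example below runs the same comparison as a two-sided and as a one-sided test. The vectors a and b are made-up data used only for illustration.

```r
# Hypothetical data: b is simulated with a higher mean than a
set.seed(42)
a <- rnorm(30, mean = 5, sd = 1)
b <- rnorm(30, mean = 6, sd = 1)

# Two-sided test with a pooled variance estimate
two_sided <- t.test(a, b, var.equal = TRUE)

# One-sided test; the alternative hypothesis is mean(a) < mean(b)
one_sided <- t.test(a, b, var.equal = TRUE, alternative = "less")

two_sided$p.value
one_sided$p.value
```

When the observed difference lies in the hypothesized direction, the one-sided p-value is half of the two-sided one; the t-statistic and degrees of freedom are the same in both calls.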

a. One-Sample T-Tests in R

In R, we use the syntax t.test(y, mu = 0) to conduct a one-sample test, where

y: is the name of our variable of interest and

mu: is set equal to the mean specified by the null hypothesis.

For Example:

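If we wanted to test whether the volume of a shipment of lumber was less than usual (μ0 = 39000 cubic feet), we would run the code below. Note that this call uses the default two-sided alternative; adding alternative="less" would test the one-sided hypothesis directly.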

set.seed(0)

treeVolume <- c(rnorm(75, mean = 36500, sd = 2000))
t.test(treeVolume, mu = 39000) # Ho: mu = 39000

One Sample t-test

data:  treeVolume
t = -12.2883, df = 74, p-value < 2.2e-16

alternative hypothesis: true mean is not equal to 39000
95 percent confidence interval:
36033.60 36861.38

sample estimates:
mean of x
36447.49
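Since the question is whether the volume is less than usual, a one-sided version may be the more natural test. The following sketch reuses the simulated shipment, computes the one-sample t-statistic by hand, and compares it with t.test() called with alternative = "less". The names mu0, t_manual and p_less are introduced here for illustration.

```r
# Same simulated shipment as in the example above
set.seed(0)
treeVolume <- rnorm(75, mean = 36500, sd = 2000)
mu0 <- 39000  # mean under the null hypothesis

# One-sample t-statistic: (sample mean - mu0) / standard error
n <- length(treeVolume)
t_manual <- (mean(treeVolume) - mu0) / (sd(treeVolume) / sqrt(n))

# One-sided p-value: lower-tail probability with n - 1 degrees of freedom
p_less <- pt(t_manual, df = n - 1)

# Built-in one-sided test for comparison
res_less <- t.test(treeVolume, mu = mu0, alternative = "less")
c(manual_t = t_manual, builtin_t = unname(res_less$statistic))
c(manual_p = p_less, builtin_p = res_less$p.value)
```

With alternative = "less", the reported confidence interval is also one-sided: its lower bound is -Inf.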

b. Paired sample T-Tests in R

We need two vectors of data, y1 and y2, of equal length to conduct a paired-samples test. We then run the test using the syntax t.test(y1, y2, paired=TRUE).

For instance, let’s say that we work at a large health clinic and we’re testing a new drug, Procardia, that’s meant to reduce hypertension. We find 1000 individuals with a high systolic blood pressure (mean = 145 mmHg, SD = 9 mmHg), we give them Procardia for a month, and then measure their blood pressure again. We find that the mean systolic blood pressure has decreased to 138 mmHg with a standard deviation of 8 mmHg.

Here, we would conduct a t-test using:

set.seed(2820)

preTreat <- c(rnorm(1000, mean = 145, sd = 9))
postTreat <- c(rnorm(1000, mean = 138, sd = 8))

t.test(preTreat, postTreat, paired = TRUE)

Paired t-test

data:  preTreat and postTreat
t = 19.7514, df = 999, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.703959 8.183011
sample estimates:
mean of the differences
              7.443485

Again, we see a statistically significant difference in means (t = 19.7514, p-value < 2.2e-16).
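A paired t-test is equivalent to a one-sample t-test on the within-pair differences, tested against a mean of 0. The sketch below checks this on the same simulated data; the names paired_res and diff_res are illustrative.

```r
# Same simulated measurements as in the example above
set.seed(2820)
preTreat <- rnorm(1000, mean = 145, sd = 9)
postTreat <- rnorm(1000, mean = 138, sd = 8)

# Paired test on the two vectors
paired_res <- t.test(preTreat, postTreat, paired = TRUE)

# One-sample test on the differences, H0: mean difference = 0
diff_res <- t.test(preTreat - postTreat, mu = 0)

# Both calls should give the same statistic, df and p-value
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
all.equal(unname(paired_res$parameter), unname(diff_res$parameter))
all.equal(paired_res$p.value, diff_res$p.value)
```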

c. Independent Samples

The independent-samples test can take one of three forms, depending on the structure of your data and the equality of their variances. The general form of the test is t.test(y1, y2, paired=FALSE). By default, R assumes that the variances of y1 and y2 are unequal, thus defaulting to Welch’s test. To assume equal variances and use the pooled estimate instead, we set var.equal=TRUE.

In the three examples shown here, we’ll test the hypothesis that Clevelanders and New Yorkers spend different amounts monthly eating out.

Independent-samples t-test where y1 and y2 are numeric:

set.seed(0)

ClevelandSpending <- rnorm(50, mean = 250, sd = 75)
NYSpending <- rnorm(50, mean = 300, sd = 80)

t.test(ClevelandSpending, NYSpending, var.equal = TRUE)

      Two Sample t-test

data:  ClevelandSpending and NYSpending
t = -3.6361, df = 98, p-value = 0.0004433
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-77.1608 -22.6745
sample estimates:
mean of x mean of y
251.7948  301.7125

Where y1 is numeric and y2 is a binary grouping variable:

spending <- c(ClevelandSpending, NYSpending)

city <- c(rep("Cleveland", 50), rep("New York", 50))

t.test(spending ~ city, var.equal = TRUE)

    Two Sample t-test

data:  spending by city
t = -3.6361, df = 98, p-value = 0.0004433
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-77.1608 -22.6745
sample estimates:
mean in group Cleveland  mean in group New York
              251.7948                301.7125

With equal variances not assumed:

t.test(ClevelandSpending, NYSpending, var.equal = FALSE)

Welch Two Sample t-test

data:  ClevelandSpending and NYSpending
t = -3.6361, df = 97.999, p-value = 0.0004433
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-77.1608 -22.6745
sample estimates:
mean of x mean of y
251.7948  301.7125
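The fractional degrees of freedom in the Welch output come from the Welch–Satterthwaite approximation. As a sketch, the statistic and degrees of freedom can be reproduced by hand from the sample variances; the helper names v1, v2, t_welch and df_welch are introduced here for illustration.

```r
# Same simulated spending data as in the examples above
set.seed(0)
ClevelandSpending <- rnorm(50, mean = 250, sd = 75)
NYSpending <- rnorm(50, mean = 300, sd = 80)

n1 <- length(ClevelandSpending)
n2 <- length(NYSpending)

# Per-group squared standard errors (no pooling)
v1 <- var(ClevelandSpending) / n1
v2 <- var(NYSpending) / n2

# Welch t-statistic
t_welch <- (mean(ClevelandSpending) - mean(NYSpending)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Built-in Welch test (the default when var.equal is not set)
res_welch <- t.test(ClevelandSpending, NYSpending)
c(manual_t = t_welch, builtin_t = unname(res_welch$statistic))
c(manual_df = df_welch, builtin_df = unname(res_welch$parameter))
```

When the two sample variances and sample sizes are nearly equal, as here, the Welch degrees of freedom end up very close to n1 + n2 – 2, which is why the two tests give almost identical results on these data.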

In each case, we see that the results really don’t differ substantially: our simulated data show that in any case, New Yorkers spend more each month at restaurants than Clevelanders do. However, should you want to test for equality of variances in your data prior to running an independent-samples t-test, R offers an easy way to do so with the var.test() function:

var.test(ClevelandSpending, NYSpending)

F test to compare two variances

data:  ClevelandSpending and NYSpending
F = 1.0047, num df = 49, denom df = 49, p-value = 0.9869
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.5701676 1.7705463
sample estimates:
ratio of variances
         1.004743

4. Uses of T-Tests

a. What is the t-test in R used for?

The t-test is a statistical test that compares the means of two populations. The two-sample form is commonly used with small sample sizes, to test the difference between the sample means when the variances of the two normal distributions are not known.

b. What is Welch’s t-test used for?

In statistics, Welch’s t-test is a two-sample location test. We use it to test the hypothesis that two populations have equal means. It is an adaptation of Student’s t-test that is more reliable when the two samples have unequal variances and unequal sample sizes.
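To illustrate the difference, the sketch below compares the two tests on made-up data in which the smaller sample also has the much larger variance. The vectors small_noisy and large_tight are hypothetical and chosen only to make the contrast visible.

```r
# Hypothetical data: unequal sample sizes and very unequal variances
set.seed(123)
small_noisy <- rnorm(10, mean = 0, sd = 10)
large_tight <- rnorm(100, mean = 0, sd = 1)

# Student's test pools the variances
student <- t.test(small_noisy, large_tight, var.equal = TRUE)

# Welch's test (R's default) keeps the variances separate
welch <- t.test(small_noisy, large_tight)

student$parameter  # df = 10 + 100 - 2 = 108
welch$parameter    # Welch-adjusted df, never larger than 108

student$p.value
welch$p.value
```

In a setting like this, the pooled estimate is dominated by the large, low-variance sample, so Student's test can understate the uncertainty in the difference; Welch's test compensates through a larger standard error and fewer degrees of freedom.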

c. What is a one sample t-test used for?

We use it to test whether the mean of a single sample differs from a hypothesized value.

d. Why do we use the t-test for research?

The t-test is one type of inferential statistic. We use it to determine whether there is a statistically significant difference between the means of two groups. Like other parametric tests, it assumes that the dependent variable is approximately normally distributed.

5. Conclusion

We have studied the t-test in depth. We have learned how to perform the different t-tests in R and what each of them is used for.
