A One-Way Between-Subjects ANOVA compares the means of more than two independent groups, for example groups 1, 2, and 3. If your data only has two groups, such as Male/Female or Present/Absent, you should consider the Independent-Samples t-Test instead.
It is a parametric test and is only suitable for parametric data. To check whether your data is parametric, please see the dedicated guide: Parametric or Not Guide (PDF)
If your data is non-parametric, you should consider using a Kruskal-Wallis Test instead.
In order to run a one-way ANOVA, you will need to install and load the “afex” and “emmeans” packages. The “rstatix” package is also recommended for this tutorial, as it allows you to run post-hoc tests in one line.
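If you have not used these packages before, a minimal setup sketch looks like the following (install.packages() only needs to be run once, while library() needs to be run in every new session):
install.packages(c("afex", "emmeans", "rstatix"))  # one-time installation
library(afex)     # provides aov_car() for running the ANOVA
library(emmeans)  # estimated marginal means support
library(rstatix)  # provides emmeans_test() for one-line post-hoc tests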
The aov_car() function used to conduct ANOVAs requires a continuous dependent variable and categorical grouping variable. It uses a formula notation to indicate the dependent and independent variables.
The data needs to be structured using a grouping variable: a categorical variable indicating which scores belong to which group. This kind of structure is called long format; please see our guide on long vs wide formats for more information on the two types of data structure.
Additionally, aov_car() requires an ID variable that identifies each subject or observation in your data. This ID variable is then used in the formula’s error term. The ID variable matters most when you have within-subjects factors, but the function requires it in every design, so it’s good to get used to having an ID in your data.
Data:
If your data does not have an ID variable, it is quite easy to create one. The dataset that this guide uses (group_differences.xlsx) has 73 participants and 2 variables with no ID variable. You can use the row.names() function to obtain the number of each row, then store that into an ID variable:
group_differences$ID <- row.names(group_differences)
group_differences
## # A tibble: 73 × 3
##    Group Score ID
##    <dbl> <dbl> <chr>
##  1     1    58 1
##  2     1    48 2
##  3     1    57 3
##  4     1    43 4
##  5     1    42 5
##  6     1    43 6
##  7     1    59 7
##  8     1    52 8
##  9     1    60 9
## 10     1    58 10
## # ℹ 63 more rows
The first argument should be a formula that takes the following structure:
dependent variable ~ independent variable + Error(ID)
The aov_car() function needs the grouping variable to be a factor, which is R’s data type for categorical variables: a factor stores the set of categories (levels) alongside the values that indicate them. In the main code further below, we convert the categorical variable “Group” from numeric to factor using the factor() function; a quick illustration of what factor() does is shown first.
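This purely illustrative sketch leaves the original column unchanged, since the ANOVA formula applies factor() for us:
factor(group_differences$Group)           # the same values, now treated as categories
levels(factor(group_differences$Group))   # the category labels detected in the data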
Additionally, the aov_car() function requires an error term to be defined in the formula. This is done by adding an Error() term to the formula’s independent variables. For between-subjects ANOVAs, only the participant ID needs to go inside Error().
The second argument to aov_car() should be the data frame.
aov_car(formula = Score ~ factor(Group) + Error(ID), data = group_differences)
## Converting to factor: Group
## Contrasts set to contr.sum for the following variables: Group
## Anova Table (Type 3 tests)
##
## Response: Score
##          Effect    df   MSE      F  ges p.value
## 1 factor(Group) 2, 70 47.01 4.39 * .112    .016
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
In ANOVA, it is quite important to report effect sizes for each main effect and interaction. The aov_car() function gives generalized eta squared (ges) by default, but the most widely reported effect size for ANOVA is partial eta squared. To obtain it, store your aov_car() output in an object and run the nice() function on it, setting the es (effect size) argument to “pes” (partial eta squared).
anova_model <- aov_car(formula = Score ~ factor(Group) + Error(ID), data = group_differences)
## Converting to factor: Group
## Contrasts set to contr.sum for the following variables: Group
nice(anova_model, es = "pes")
## Anova Table (Type 3 tests)
##
## Response: Score
##          Effect    df   MSE      F  pes p.value
## 1 factor(Group) 2, 70 47.01 4.39 * .112    .016
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
In order to obtain a table of pairwise comparisons (post-hoc tests) for all the levels of the grouping variable, we will use the emmeans_test() function from the “rstatix” package.
“emmeans” stands for estimated marginal means, a concept that is quite important in linear models such as ANOVA. If you want to learn more about it, please check the following guide: Estimated Marginal Means handout.
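If you would like to see the estimated marginal means themselves, they can be inspected directly with the emmeans package loaded earlier. This is entirely optional; a minimal sketch, assuming the anova_model object created in the previous step:
group_means <- emmeans(anova_model, specs = ~ Group)  # one estimated mean per group
group_means
pairs(group_means, adjust = "bonferroni")             # Bonferroni-adjusted pairwise comparisons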
As with aov_car(), emmeans_test() takes a data frame and a formula as its first two arguments (note that emmeans_test() expects the data frame first). Please note that this function prefers the grouping variable not to be a factor, so we will not wrap it in factor() in the formula.
In addition, the emmeans_test() function has a third argument. The p.adjust.method argument tells the function which correction for multiple comparisons to use. We will use the Bonferroni method for this guide, but the function offers other corrections as well.
The fourth argument, detailed = TRUE, is optional and asks for more detailed output, including the estimates and confidence intervals shown below.
IMPORTANT: Don’t forget to report the p-values from the p.adj column, NOT the ones from the p column. If you can’t see some of the columns, as in the example below, this is because the table (tibble) is too wide to display. In R, you will be able to use an arrow button to scroll through the table; alternatively, you can print every column directly, as sketched after the output below.
emmeans_test(data = group_differences, formula = Score ~ Group, p.adjust.method = "bonferroni", detailed = TRUE)
## # A tibble: 3 × 14
##   term  .y.   group1 group2 null.value estimate    se    df conf.low conf.high
## * <chr> <chr> <chr>  <chr>       <dbl>    <dbl> <dbl> <dbl>    <dbl>     <dbl>
## 1 Group Score 1      2               0     1.25  1.92    70    -2.59     5.09
## 2 Group Score 1      3               0    -4.47  1.97    70    -8.40    -0.542
## 3 Group Score 2      3               0    -5.72  2.02    70    -9.76    -1.68
## # ℹ 4 more variables: statistic <dbl>, p <dbl>, p.adj <dbl>, p.adj.signif <chr>
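If you prefer to see every column at once rather than scrolling, one option is to store the result in an object and print it with no width limit (the posthoc_results name below is just an example):
posthoc_results <- emmeans_test(data = group_differences, formula = Score ~ Group, p.adjust.method = "bonferroni", detailed = TRUE)
print(posthoc_results, width = Inf)  # print all 14 columns in the console
as.data.frame(posthoc_results)       # or convert to a regular data frame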
We use both the ANOVA table (from the nice() function output) and the post-hoc table to report the analysis. For obtaining descriptive statistics for separate groups, please see our guide on Descriptive Statistics.
The ANOVA table shows the overall test results, including the F-statistic (F), the degrees of freedom (df), the two-tailed significance or p-value (p.value), and the effect size, partial eta squared (pes).
The post-hoc table shows the results of the post hoc tests between all possible pairs of groups in the grouping variable. The most important columns are the mean difference (estimate), the confidence interval (conf.low and conf.high), and the Bonferroni-adjusted p-value (p.adj).
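If you just need the group means and standard deviations for the write-up (the M and SD values used in the example report below), a minimal base-R sketch using aggregate():
aggregate(Score ~ Group, data = group_differences, FUN = mean)  # mean score per group
aggregate(Score ~ Group, data = group_differences, FUN = sd)    # standard deviation per group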
Test scores of three groups (1, 2, and 3) were compared. A One-Way Between-Subjects ANOVA indicated a significant effect of group on test score, F(2, 70) = 4.39, p = .016, ηp² = .112.
In addition, if your ANOVA is significant, you must also report your post-hoc results:
On average, Group 1 (M = 49.67, SD = 6.70) scored higher than Group 2 (M = 48.42, SD = 5.22), but lower than Group 3 (M = 54.14, SD = 8.44). Post hoc comparisons were conducted using the Bonferroni correction. The difference between Group 1 and Group 2, d̄ = 1.25, 95% CI [-3.47, 5.97], was not statistically significant (p = .999). The difference between Group 1 and Group 3, d̄ = -4.47, 95% CI [-9.30, 0.36], was not statistically significant (p = .079). However, the difference between Group 2 and Group 3, d̄ = -5.72, 95% CI [-10.68, -0.76], was statistically significant (p = .018).