The palmerpenguins
data contains size measurements for
three penguin species observed on three islands in the Palmer
Archipelago, Antarctica. These data were collected from 2007 - 2009 by
Dr. Kristen Gorman with the Palmer
Station Long Term Ecological Research Program, part of the US Long Term Ecological Research
Network. The data were imported directly from the Environmental Data
Initiative (EDI) Data Portal, and are available for use by CC0
license (“No Rights Reserved”) in accordance with the Palmer Station Data
Policy.
More information: https://allisonhorst.github.io/palmerpenguins/articles/intro.html
Source: https://github.com/allisonhorst/palmerpenguins/blob/HEAD/vignettes/intro.Rmd
Here we will use this dataset and the relevant R package to
demonstrate the use of ANOVA to compare means. First we need to load the
libraries we will need. All packages can be easily installed through
CRAN. Package report
exists in remotes
and can
be installed using:
remotes::install_github(“easystats/report”)
# R
library(palmerpenguins)
library(tidyverse)
library(car)
library(multcomp)
library(report)
library(FSA)
Out of the dataset, we keep the information for species and flipper length.
# R
str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
## $ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
## $ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
## $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
## $ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
## $ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
## $ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
dat <- penguins %>%
dplyr::select(species, flipper_length_mm)
summary(dat)
## species flipper_length_mm
## Adelie :152 Min. :172.0
## Chinstrap: 68 1st Qu.:190.0
## Gentoo :124 Median :197.0
## Mean :200.9
## 3rd Qu.:213.0
## Max. :231.0
## NA's :2
ggplot(dat) +
aes(x = species, y = flipper_length_mm, color = species) +
geom_jitter() +
theme(legend.position = "none")
We use an ANOVA to help us answer the question: “Is the length of the flippers different between the three species of penguins?
# R
res_aov <- aov(flipper_length_mm ~ species, data = dat)
First we can test normality of flipper length visually.
# R
par(mfrow = c(1, 2)) # combine plots
# histogram
hist(res_aov$residuals)
# QQ-plot
qqPlot(res_aov$residuals, id = FALSE)
We use residuals and not the “raw” values directly. This is done because if you test the normality assumption on the raw data, it must be tested for each group separately as the ANOVA requires normality in each group. The residuals are the distance between the actual value of Y and the mean value of Y for a specific value of X, so the grouping variable is induced in the computation of the residuals. So, normality can be checked using all residuals, which is easier.
We statistically test normality of the residuals with the
Shapiro-Wilk test thanks to the shapiro.test()
function.
# R
shapiro.test(res_aov$residuals)
##
## Shapiro-Wilk normality test
##
## data: res_aov$residuals
## W = 0.99452, p-value = 0.2609
With a p-value = 0.2609, we reject Η1 and assume that residuals follow a normal distribution.
Since residuals follow a normal distribution, we now check whether the variances are equal across species or not. The result will have an impact on whether we use the ANOVA (equal variances) or the Welch ANOVA (non equal variances). First we apply a visual test via a boxplot
# R
boxplot(flipper_length_mm ~ species, data = dat)
And a statistical test (Levene’s test).
# R
leveneTest(flipper_length_mm ~ species, data = dat)
The p-value (here Pr(>F)) is very large and we assume that variances for flipper length in each species are equal.
A nice ggplot boxplot.
# R
ggplot(dat) +
aes(x = species, y = flipper_length_mm) +
geom_boxplot()
The ANOVA can help us to answer the research question “Is the length of the flippers different between the 3 species of penguins?”. ANOVA in R can be done in several ways, of which two are presented below:
1st method:
# R
oneway.test(flipper_length_mm ~ species, data = dat, var.equal = TRUE)
##
## One-way analysis of means
##
## data: flipper_length_mm and species
## F = 594.8, num df = 2, denom df = 339, p-value < 2.2e-16
2nd method:
# R
res_aov <- aov(flipper_length_mm ~ species, data = dat)
summary(res_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 52473 26237 594.8 <2e-16 ***
## Residuals 339 14953 44
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
report(res_aov)
## The ANOVA (formula: flipper_length_mm ~ species) suggests that:
##
## - The main effect of species is statistically significant and large (F(2, 339)
## = 594.80, p < .001; Eta2 = 0.78, 95% CI [0.75, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
In all the results of ANOVA, we see that the p-value is very small, leading us to reject the null hypothesis and conclude that at least one of the penguin species has significant differences in flipper length from the others. Welch ANOVA is used when variances are not equal.
# R
oneway.test(flipper_length_mm ~ species, data = dat, var.equal = FALSE)
##
## One-way analysis of means (not assuming equal variances)
##
## data: flipper_length_mm and species
## F = 614.01, num df = 2.00, denom df = 172.76, p-value < 2.2e-16
In order to see which group(s) is(are) different from the others, we need to compare groups 2 by 2. Post-hoc tests are a family of statistical tests so there are several of them. We will use one of the most common, Tukey HSD.
# R
post_test <- glht(res_aov, linfct = mcp(species = "Tukey"))
summary(post_test)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = flipper_length_mm ~ species, data = dat)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Chinstrap - Adelie == 0 5.8699 0.9699 6.052 6.76e-09 ***
## Gentoo - Adelie == 0 27.2333 0.8067 33.760 < 1e-09 ***
## Gentoo - Chinstrap == 0 21.3635 1.0036 21.286 < 1e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
All comparisons have shown statistically significant differences in mean flipper length.
Out of the dataset, we keep the information for species and bill length.
# R
dat <- penguins %>%
dplyr::select(species, bill_length_mm)
ggplot(dat) +
aes(x = species, y = bill_length_mm, color = species) +
geom_jitter() +
theme(legend.position = "none")
Visually!
# R
res_aov <- aov(bill_length_mm ~ species, data = dat)
par(mfrow = c(1, 2)) # combine plots
# histogram
hist(res_aov$residuals)
# QQ-plot
qqPlot(res_aov$residuals, id = FALSE)
Statistically with the Shapiro-Wilk test.
# R
shapiro.test(res_aov$residuals)
##
## Shapiro-Wilk normality test
##
## data: res_aov$residuals
## W = 0.98903, p-value = 0.01131
With a p-value = 0.01131, we reject Η0 and assume that residuals do not follow a normal distribution. For this reason, we cannot apply ANOVA in this case.
We apply the non-parametric Kruscal-Wallis test. This test allows to compare three or more groups in terms of a quantitative variable. It is the extension to the Mann-Whitney test which allows to compare 2 groups under the non-normality assumption.
# R
kruskal.test(bill_length_mm ~ species, data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: bill_length_mm by species
## Kruskal-Wallis chi-squared = 244.14, df = 2, p-value < 2.2e-16
We reject the null hypothesis and we conclude that at least one species is different in terms of bill length (p-value < 0.001). The Post-hoc is done with the Dunn test, to compare groups by two.
# R
dunnTest(bill_length_mm ~ species, data = dat, method = "holm")
## Comparison Z P.unadj P.adj
## 1 Adelie - Chinstrap -12.753511 2.980163e-37 5.960326e-37
## 2 Adelie - Gentoo -13.135630 2.057716e-39 6.173147e-39
## 3 Chinstrap - Gentoo 1.767498 7.714481e-02 7.714481e-02
We see that Adelie is significantly different than both Gentoo and Chinstrap, while the difference between Gentoo and Chinstrap is non significant (P.adj=0.071). We can verify this in the next two graphs.
# R
# Boxplots
ggplot(dat) +
aes(x = species, y = bill_length_mm) +
geom_boxplot()
# Histogram example: bill length by species
ggplot(data = penguins, aes(x = bill_length_mm)) +
geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
scale_fill_manual(values = c("darkorange","darkorchid","cyan4"))
The two-way ANOVA (analysis of variance) is a statistical method that allows to evaluate the simultaneous effect of two categorical variables on a quantitative continuous variable. The two-way ANOVA is an extension of the one-way ANOVA since it allows to evaluate the effects on a numerical response of two categorical variables instead of one. The advantage of a two-way ANOVA over a one-way ANOVA is that we test the relationship between two variables, while taking into account the effect of a third variable. Moreover, it also allows to include the possible interaction of the two categorical variables on the response to evaluate whether or not they act jointly on the response variable.
Here, we would like to know if flipper length depends on species and/or sex.
# R
boxplot(penguins$flipper_length_mm ~ penguins$sex * penguins$species, las=1.2, col = c("darkorange","darkorchid"))
We will answer the following research questions: (a) Controlling for the species, is flipper length significantly different between the two sexes? (b) Controlling for the sex, is flipper length significantly different for at least one species? (c) Is the relationship between species and flipper length different between female and male penguins?
In order to avoid flawed conclusions, it is recommended to first check whether the interaction is significant or not, and depending on the results, include it or not. If the interaction is not significant, it is safe to remove it from the final model. On the contrary, if the interaction is significant, it should be included in the final model which will be used to interpret results. We thus start with a model which includes the two main effects (i.e., sex and species) and the interaction:
# R
factorial <- aov(penguins$flipper_length_mm ~ penguins$sex * penguins$species)
summary(factorial)
## Df Sum Sq Mean Sq F value Pr(>F)
## penguins$sex 1 4246 4246 132.777 < 2e-16 ***
## penguins$species 2 50185 25093 784.583 < 2e-16 ***
## penguins$sex:penguins$species 2 329 165 5.144 0.00631 **
## Residuals 327 10458 32
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 11 observations deleted due to missingness
The p-values are displayed in the last column of the output above (Pr(>F)). From these p-values, we conclude that, at the 5% significance level: (a) controlling for the species, flipper length is significantly different between the two sexes, (b) controlling for the sex, flipper length is significantly different for at least one species, and the interaction between sex and species (displayed at the line sex:species in the output above) is significant.
For this reason we do not need to run the two-way ANOVA for independent variables, but here we will do it as a demonstration. We just need to replace the “*” between the two variables with a “+”.
# R
additive <- aov(penguins$flipper_length_mm ~ penguins$sex + penguins$species)
summary(additive)
## Df Sum Sq Mean Sq F value Pr(>F)
## penguins$sex 1 4246 4246 129.5 <2e-16 ***
## penguins$species 2 50185 25093 765.3 <2e-16 ***
## Residuals 329 10787 33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 11 observations deleted due to missingness