1 The palmerpenguins dataset

The palmerpenguins data contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica. These data were collected from 2007 - 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network. The data were imported directly from the Environmental Data Initiative (EDI) Data Portal, and are available for use by CC0 license (“No Rights Reserved”) in accordance with the Palmer Station Data Policy.

More information: https://allisonhorst.github.io/palmerpenguins/articles/intro.html

Source: https://github.com/allisonhorst/palmerpenguins/blob/HEAD/vignettes/intro.Rmd

Here we will use this dataset and the relevant R package to demonstrate the use of ANOVA to compare means. First we need to load the libraries we will need. All packages can be easily installed through CRAN. Package report exists in remotes and can be installed using: remotes::install_github(“easystats/report”)

# R
library(palmerpenguins)
library(tidyverse)
library(car)
library(multcomp)
library(report)
library(FSA)

2 Comparing means of penguin species in flipper length

2.1 Data preparation

Out of the dataset, we keep the information for species and flipper length.

# R
str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
dat <- penguins %>%
  dplyr::select(species, flipper_length_mm)
summary(dat)
##       species    flipper_length_mm
##  Adelie   :152   Min.   :172.0    
##  Chinstrap: 68   1st Qu.:190.0    
##  Gentoo   :124   Median :197.0    
##                  Mean   :200.9    
##                  3rd Qu.:213.0    
##                  Max.   :231.0    
##                  NA's   :2
ggplot(dat) +
  aes(x = species, y = flipper_length_mm, color = species) +
  geom_jitter() +
  theme(legend.position = "none")

2.2 Running ANOVA and creating an object

We use an ANOVA to help us answer the question: “Is the length of the flippers different between the three species of penguins?

# R
res_aov <- aov(flipper_length_mm ~ species, data = dat)

2.3 Testing normality

First we can test normality of flipper length visually.

# R
par(mfrow = c(1, 2)) # combine plots
# histogram
hist(res_aov$residuals)
# QQ-plot
qqPlot(res_aov$residuals, id = FALSE)

We use residuals and not the “raw” values directly. This is done because if you test the normality assumption on the raw data, it must be tested for each group separately as the ANOVA requires normality in each group. The residuals are the distance between the actual value of Y and the mean value of Y for a specific value of X, so the grouping variable is induced in the computation of the residuals. So, normality can be checked using all residuals, which is easier.

We statistically test normality of the residuals with the Shapiro-Wilk test thanks to the shapiro.test() function.

# R
shapiro.test(res_aov$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  res_aov$residuals
## W = 0.99452, p-value = 0.2609

With a p-value = 0.2609, we reject Η1 and assume that residuals follow a normal distribution.

2.4 Testing Equality of variances - homogeneity

Since residuals follow a normal distribution, we now check whether the variances are equal across species or not. The result will have an impact on whether we use the ANOVA (equal variances) or the Welch ANOVA (non equal variances). First we apply a visual test via a boxplot

# R
boxplot(flipper_length_mm ~ species, data = dat)

And a statistical test (Levene’s test).

# R
leveneTest(flipper_length_mm ~ species,  data = dat)

The p-value (here Pr(>F)) is very large and we assume that variances for flipper length in each species are equal.

A nice ggplot boxplot.

# R
ggplot(dat) +
  aes(x = species, y = flipper_length_mm) +
  geom_boxplot()

2.5 ANOVA

The ANOVA can help us to answer the research question “Is the length of the flippers different between the 3 species of penguins?”. ANOVA in R can be done in several ways, of which two are presented below:

1st method:

# R
oneway.test(flipper_length_mm ~ species, data = dat, var.equal = TRUE)
## 
##  One-way analysis of means
## 
## data:  flipper_length_mm and species
## F = 594.8, num df = 2, denom df = 339, p-value < 2.2e-16

2nd method:

# R
res_aov <- aov(flipper_length_mm ~ species, data = dat)
summary(res_aov)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## species       2  52473   26237   594.8 <2e-16 ***
## Residuals   339  14953      44                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
report(res_aov)
## The ANOVA (formula: flipper_length_mm ~ species) suggests that:
## 
##   - The main effect of species is statistically significant and large (F(2, 339)
## = 594.80, p < .001; Eta2 = 0.78, 95% CI [0.75, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

In all the results of ANOVA, we see that the p-value is very small, leading us to reject the null hypothesis and conclude that at least one of the penguin species has significant differences in flipper length from the others. Welch ANOVA is used when variances are not equal.

# R
oneway.test(flipper_length_mm ~ species, data = dat, var.equal = FALSE)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  flipper_length_mm and species
## F = 614.01, num df = 2.00, denom df = 172.76, p-value < 2.2e-16

2.6 Post-hoc test

In order to see which group(s) is(are) different from the others, we need to compare groups 2 by 2. Post-hoc tests are a family of statistical tests so there are several of them. We will use one of the most common, Tukey HSD.

# R
post_test <- glht(res_aov, linfct = mcp(species = "Tukey"))
summary(post_test)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = flipper_length_mm ~ species, data = dat)
## 
## Linear Hypotheses:
##                         Estimate Std. Error t value Pr(>|t|)    
## Chinstrap - Adelie == 0   5.8699     0.9699   6.052 6.76e-09 ***
## Gentoo - Adelie == 0     27.2333     0.8067  33.760  < 1e-09 ***
## Gentoo - Chinstrap == 0  21.3635     1.0036  21.286  < 1e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

All comparisons have shown statistically significant differences in mean flipper length.

3 Comparing means of penguin species in bill length

3.1 Data preparation

Out of the dataset, we keep the information for species and bill length.

# R
dat <- penguins %>%
  dplyr::select(species, bill_length_mm)
ggplot(dat) +
  aes(x = species, y = bill_length_mm, color = species) +
  geom_jitter() +
  theme(legend.position = "none")

3.2 Testing normality

Visually!

# R
res_aov <- aov(bill_length_mm ~ species, data = dat)
par(mfrow = c(1, 2)) # combine plots
# histogram
hist(res_aov$residuals)
# QQ-plot
qqPlot(res_aov$residuals, id = FALSE)

Statistically with the Shapiro-Wilk test.

# R
shapiro.test(res_aov$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  res_aov$residuals
## W = 0.98903, p-value = 0.01131

With a p-value = 0.01131, we reject Η0 and assume that residuals do not follow a normal distribution. For this reason, we cannot apply ANOVA in this case.

3.3 Comparing means with Kruscal-Wallis test and Post-hoc with Dunn test

We apply the non-parametric Kruscal-Wallis test. This test allows to compare three or more groups in terms of a quantitative variable. It is the extension to the Mann-Whitney test which allows to compare 2 groups under the non-normality assumption.

# R
kruskal.test(bill_length_mm ~ species, data = dat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  bill_length_mm by species
## Kruskal-Wallis chi-squared = 244.14, df = 2, p-value < 2.2e-16

We reject the null hypothesis and we conclude that at least one species is different in terms of bill length (p-value < 0.001). The Post-hoc is done with the Dunn test, to compare groups by two.

# R
dunnTest(bill_length_mm ~ species, data = dat, method = "holm")
##           Comparison          Z      P.unadj        P.adj
## 1 Adelie - Chinstrap -12.753511 2.980163e-37 5.960326e-37
## 2    Adelie - Gentoo -13.135630 2.057716e-39 6.173147e-39
## 3 Chinstrap - Gentoo   1.767498 7.714481e-02 7.714481e-02

We see that Adelie is significantly different than both Gentoo and Chinstrap, while the difference between Gentoo and Chinstrap is non significant (P.adj=0.071). We can verify this in the next two graphs.

# R
# Boxplots
ggplot(dat) +
  aes(x = species, y = bill_length_mm) +
  geom_boxplot()

# Histogram example: bill length by species
ggplot(data = penguins, aes(x = bill_length_mm)) +
  geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c("darkorange","darkorchid","cyan4"))

4 Two way for flipper length and age

The two-way ANOVA (analysis of variance) is a statistical method that allows to evaluate the simultaneous effect of two categorical variables on a quantitative continuous variable. The two-way ANOVA is an extension of the one-way ANOVA since it allows to evaluate the effects on a numerical response of two categorical variables instead of one. The advantage of a two-way ANOVA over a one-way ANOVA is that we test the relationship between two variables, while taking into account the effect of a third variable. Moreover, it also allows to include the possible interaction of the two categorical variables on the response to evaluate whether or not they act jointly on the response variable.

Here, we would like to know if flipper length depends on species and/or sex.

# R
boxplot(penguins$flipper_length_mm ~ penguins$sex * penguins$species, las=1.2, col = c("darkorange","darkorchid"))

We will answer the following research questions: (a) Controlling for the species, is flipper length significantly different between the two sexes? (b) Controlling for the sex, is flipper length significantly different for at least one species? (c) Is the relationship between species and flipper length different between female and male penguins?

In order to avoid flawed conclusions, it is recommended to first check whether the interaction is significant or not, and depending on the results, include it or not. If the interaction is not significant, it is safe to remove it from the final model. On the contrary, if the interaction is significant, it should be included in the final model which will be used to interpret results. We thus start with a model which includes the two main effects (i.e., sex and species) and the interaction:

# R
factorial <- aov(penguins$flipper_length_mm ~ penguins$sex * penguins$species)
summary(factorial)
##                                Df Sum Sq Mean Sq F value  Pr(>F)    
## penguins$sex                    1   4246    4246 132.777 < 2e-16 ***
## penguins$species                2  50185   25093 784.583 < 2e-16 ***
## penguins$sex:penguins$species   2    329     165   5.144 0.00631 ** 
## Residuals                     327  10458      32                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 11 observations deleted due to missingness

The p-values are displayed in the last column of the output above (Pr(>F)). From these p-values, we conclude that, at the 5% significance level: (a) controlling for the species, flipper length is significantly different between the two sexes, (b) controlling for the sex, flipper length is significantly different for at least one species, and the interaction between sex and species (displayed at the line sex:species in the output above) is significant.

For this reason we do not need to run the two-way ANOVA for independent variables, but here we will do it as a demonstration. We just need to replace the “*” between the two variables with a “+”.

# R
additive <- aov(penguins$flipper_length_mm ~ penguins$sex + penguins$species)
summary(additive)
##                   Df Sum Sq Mean Sq F value Pr(>F)    
## penguins$sex       1   4246    4246   129.5 <2e-16 ***
## penguins$species   2  50185   25093   765.3 <2e-16 ***
## Residuals        329  10787      33                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 11 observations deleted due to missingness