ANOVA

1 Objectives

  • Use R to perform analysis of variance (ANOVA) to compare the means of multiple groups;
  • Perform Tukey-Kramer tests to look at unplanned contrasts between all pairs of groups;
  • Use Kruskal-Wallis tests to test for differences among groups without assuming Gaussianity;
  • Transform data to meet assumptions of ANOVA.

2 Start a Script

For this lab or project, begin by:

  • Starting a new R script;
  • Creating a good header section and table of contents;
  • Saving the script file with an informative name;
  • Setting your working directory.

Aim to make the script a future reference for doing things in R!

3 Introduction

Perhaps more than any other tool, the Analysis of Variance (ANOVA) revolutionised the idea of objectivity in using data to produce evidence to support claims for certain experimental designs. The famous statistician and biologist R. A. Fisher invented ANOVA while working at Rothamsted Research, intending it as a practical tool for analysing agricultural experiments. Today, despite many innovations and competing approaches, it remains at the foundation of the basic practice of statistics.

4 Why ANOVA?

Two-sample t-tests evaluate whether the mean of a numeric variable differs between two groups or experimental conditions, which can be encoded by a categorical variable. We can conceptualise this test as one that evaluates whether the numeric variable is related to the categorical variable, reflected as a difference between the means of different groups. What happens if we need to evaluate differences among the means of more than two groups? The ‘obvious’ thing to do might seem to be to test each pair of means using a t-test. However, this procedure is statistically flawed: the more tests we do, the more likely we are to make a Type I error (i.e., a false positive). This is known as the problem of multiple comparisons. ANOVA is a statistical test that allows us to evaluate whether there is a difference among the means of more than two groups, while controlling for the problem of multiple comparisons.
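
To get a feel for how quickly the problem grows, we can calculate the chance of at least one false positive across several tests (a simplified illustration, assuming the tests are independent):

# Family-wise error rate for k independent tests at alpha = 0.05
alpha <- 0.05
k <- choose(4, 2)     # e.g. all 6 pairwise t-tests among 4 groups
1 - (1 - alpha)^k     # probability of at least one Type I error, about 0.26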

There are several reasons to use a “1-way ANOVA” experimental design. The scenario usually involves:

  • One numeric continuous dependent variable of interest;
  • A factor that contains 2 or more levels, often with a control;
  • When there are just two levels, the one-way ANOVA is conceptually equivalent to the t-test.

An example might be something like a classic field trial, where crop pest damage is measured (the numeric continuous dependent variable) and the factor compares pest treatment with 3 levels: a control level (no pesticide), an organic pesticide, and a chemical pesticide. The basic question is whether there is an overall difference in the numeric dependent variable amongst the factor levels; however, several other kinds of question can also be answered:

  • An overall test of difference in means among the factor levels;
  • Comparison of difference of each factor level with the control or other reference factor level;
  • Post-hoc tests of difference between specific factor levels, e.g. pairwise tests for a factor with levels A, B, and C might test all possible comparisons A:B, A:C, and B:C;
  • Examination of the “sources of variation” observed in the dependent variable, e.g. what proportion of total variation can be accounted for by the factor.

The test statistic for ANOVA is the F ratio, which is the ratio of the variance in the dependent variable between the groups to the variance within the groups. We will (very briefly) look at this calculation with the aim of gaining a practical understanding of what is going on. However, the main focus of this lab is on how to use the R function aov() to perform ANOVA and how to interpret the output.
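
If you want to see where the F ratio comes from, here is a minimal sketch of the calculation “by hand”, purely for intuition; it assumes the long-format chicken data frame data_long that we build in the next section:

# Sketch of the F ratio calculation (assumes data_long from the next section)
grand_mean   <- mean(data_long$Weight)
fitted_means <- ave(data_long$Weight, data_long$Sire)    # each row's group mean

ss_between <- sum((fitted_means - grand_mean)^2)          # variation among group means
ss_within  <- sum((data_long$Weight - fitted_means)^2)    # variation within groups

df_between <- nlevels(data_long$Sire) - 1
df_within  <- nrow(data_long) - nlevels(data_long$Sire)

(ss_between / df_between) / (ss_within / df_within)       # the F ratio (about 1.87 here)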

5 ANOVA Assumptions

The ANOVA is a parametric test, which means that it makes assumptions about the data. The key assumptions for you to be aware of are:

  • Independence of observations - the groups being compared should be independent of each other. This means that the data collected in one group should not influence the data collected in another. In experimental designs, this is often achieved through random assignment;
  • Normality - the distribution of the residuals (differences between observed and predicted values) should be approximately normally distributed for each group. This assumption is particularly important when the sample sizes are small. For larger sample sizes, the Central Limit Theorem suggests that this assumption can be somewhat relaxed due to the normalizing effect of large sample sizes;
  • Homogeneity of variances (homoscedasticity) - the variances among the groups should be approximately equal. This assumption, known as homoscedasticity, ensures that each group contributes equally to the overall analysis. The Levene’s test or Bartlett’s test can be used to check this assumption;
  • Scale of measurement - the dependent variable should be measured at least at the interval level (i.e., the data should be quantitative and continuous). Ratio-level data also meet this criterion;
  • Random sampling - the data should be collected from a randomly selected portion of the total population. Random sampling helps to generalize the findings from the sample to the population.

5.1 Testing ANOVA assumptions

Let’s see how we can check these assumptions using some example data. The data come from an experiment in animal genetics looking at the weight of male chickens (8-week-old weight in grams), where weight is the continuous dependent variable. The factor is sire identity: each measured young male chick was sired by one of 5 sires, so sire is a factor with 5 levels A, B, C, D, and E. The data are in wide format, where each column is a different sire and each row is a different chick. We will convert the data to long format, where each row is a different chick and there is a column for the sire identity. This is the format that we need for the ANOVA.

# Load package for data wrangling 
library(reshape2)
# Create a data frame in wide format
data_wide <- data.frame(A = c(687, 691, 793, 675, 700, 753, 704, 717),
                        B = c(618, 680, 592, 683, 631, 691, 694, 732),
                        C = c(618, 687, 763, 747, 687, 737, 731, 603),
                        D = c(600, 657, 669, 606, 718, 693, 669, 648),
                        E = c(717, 658, 674, 611, 678, 788, 650, 690))

# Convert to long format and rename columns
data_long <- melt(data_wide, variable.name = "Sire", value.name = "Weight")
No id variables; using all as measure variables
# View the long format data with new column names
head(data_long)
  Sire Weight
1    A    687
2    A    691
3    A    793
4    A    675
5    A    700
6    A    753

To ensure that your data meet the key assumptions for conducting a one-way ANOVA, you can perform several tests in R. To check for normality, you can use the Shapiro-Wilk test, which is available in the stats package. To check for homogeneity of variances, you can use Levene’s test, which is available in the car package. Remember - we have to check the residuals, and to do this we have to fit the model first. We will use the aov() function to fit the model:

# Load necessary packages
library(reshape2)
library(car)
library(ggplot2)

# Fit the model
model <- aov(Weight ~ Sire, data = data_long)

# Check the residuals
residuals <- residuals(model)

# Graph to examine normality assumption of residuals
par(mfrow = c(1,2))
hist(residuals,
     main = "Normal")

# Look at residuals with qqPlot()
qqPlot(residuals,
       main = "Normal")

[1] 38 24

At a glance, there are no serious issues with the assumption of normal residual distribution. But we can test for this more formally with a Shapiro-Wilk test:

# Shapiro-Wilk test for normality
shapiro.test(residuals)

    Shapiro-Wilk normality test

data:  residuals
W = 0.99182, p-value = 0.9913

The null hypothesis of the Shapiro-Wilk test is that the data are normally distributed. There is no evidence that the residuals of our ANOVA model deviate from a Gaussian distribution (Shapiro-Wilk: W = 0.99, n = 40, P = 0.99). We can also check for homogeneity of variances using the Bartlett test:

# Bartlett test for homogeneity of variances
bartlett.test(Weight ~ Sire, data = data_long)

    Bartlett test of homogeneity of variances

data:  Weight by Sire
Bartlett's K-squared = 1.6868, df = 4, p-value = 0.7931

The null hypothesis of the Bartlett test is that the variances are equal across groups. We find no evidence that variance in offspring weight differs between sires (Bartlett test: K-squared = 1.69, df = 4, P = 0.79).
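
For comparison, Levene’s test (mentioned in the assumptions above, and available via the leveneTest() function from the car package we loaded earlier) tests the same assumption and is less sensitive to departures from normality. A quick sketch:

# Levene's test for homogeneity of variances (car package)
leveneTest(Weight ~ Sire, data = data_long)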

6 Visualising ANOVA

The classic way to visualise a one-way ANOVA is with a boxplot, with some way to show the central tendency of the data separately for each factor level. For continuous variables, boxplots do this perfectly. For count variables, barplots are sometimes used with the bar height set to the mean, along with some form of error bar. Here we will use a boxplot.

# Basic boxplot
ggplot(data_long, aes(x = Sire, y = Weight)) +
  geom_boxplot() +
  ggtitle("Is this plot good enough?") +
  xlab("Sire") +
  ylab("Weight") +
  theme_minimal()

So, it looks like sire identity could have an effect on mean male offspring weight. But, is this graph good enough? Can we make it better? Let’s critique it:

  • The y axis does not indicate the unit of measurement (important);
  • Neither axis title is capitalised (important);
  • Adding on the points might add interesting detail (optional);
  • Reference line for control (we do not really have a control here) or of the “grand mean” might be useful (optional).

Let’s make a better graph:

# Better boxplot
ggplot(data_long, aes(x = Sire, y = Weight)) +
  # Boxplot
  geom_boxplot(outlier.shape = NA) +  # Exclude outliers for now
  # Add horizontal line for grand mean
  geom_hline(yintercept = mean(data_long$Weight), linetype = "dashed", 
             linewidth = 1.5, colour = "red") +
  # Add jittered points
  geom_jitter(position = position_jitter(0.1), color = "blue", size = 2.5) +
  # Plot labels
  ggtitle("Effect of Sire on 8-week weight") +
  xlab("Sire") +
  ylab("Weight (g)") +
  # Theme
  theme_minimal() +
  theme(legend.position = "none")  # Remove legend if not needed

7 ANOVA in R

The basic application of ANOVA in R is the aov() function. There are actually a lot of alternative ways to perform the exact same test in R. We will look at a few things in this section:

  • Carry out one-way ANOVA;
  • Look at how to examine contrasts (differences between the control or a reference factor level, and each of the others) and post-hoc testing (e.g., all pairwise comparisons between factor levels);
  • Examine what happens in the ANOVA in a little more detail.

ANOVA is not quite as simple in R as one might hope. Doing ANOVA takes at least two steps. First, we fit the ANOVA model to the data using a function such as aov() or lm(). This step carries out a bunch of intermediate calculations. Second, we use the results of the first step to do the ANOVA calculations and place them in an ANOVA table, using summary() on an aov fit or anova() on an lm fit. The function name lm() stands for “linear model”; this is actually a very powerful function that allows a variety of calculations. One-way ANOVA is a type of linear model.
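
For example, the lm() + anova() route below should produce the same ANOVA table as the aov() + summary() route we use in the next subsection:

# lm() + anova() route - same ANOVA table as aov() + summary()
anova(lm(Weight ~ Sire, data = data_long))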

7.1 Basic output

The basic output of the aov() function is a table of results. It is a little hard to read, but it contains all the information we need to interpret the ANOVA. This table is called an ANOVA table:

# Fit the model
model <- aov(Weight ~ Sire, data = data_long)

# View the ANOVA table
summary(model)
            Df Sum Sq Mean Sq F value Pr(>F)
Sire         4  17426    4356   1.872  0.137
Residuals   35  81442    2327               

The output is formatted in a classic “ANOVA Table” style. There are 2 rows - one for the “main effect” of the sire factor, one for the residual error. The test statistic is the F value (1.87), the P-value column is named “Pr(>F)” (0.14), and there are 4 degrees of freedom for this test for the factor (the factor degrees of freedom is the number of factor levels minus 1 = 5 factor levels - 1 = 4; the residual degrees of freedom is the total number of observations minus the number of factor levels = 40 - 5 = 35).

Here we can see that the overall effect of sire does not significantly explain variation in weight (one-way ANOVA: F = 1.87, df = 4,35, P = 0.14).
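
The degrees of freedom can also be read straight off the data, which is a useful sanity check:

# Sanity check of the degrees of freedom
nlevels(data_long$Sire) - 1                    # factor df: 5 levels - 1 = 4
nrow(data_long) - nlevels(data_long$Sire)      # residual df: 40 - 5 = 35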

7.2 Contrasts

The ANOVA table tells us that there is no significant effect of sire on offspring weight. But we might still want to know which sires differ from which others. To do this, we can use contrasts and post-hoc testing. Contrasts are planned comparisons between factor levels (here, each sire against a reference level); post-hoc tests are multiple comparisons between factor levels made after the overall test. For our experiment, let’s say that sire C is our reference level, against which we would like to statistically compare offspring weight for the other sires.

# Make Sire C the reference level
data_long$Sire <- relevel(data_long$Sire, ref="C")

# Fit linear model
model_2 <- lm(formula = Weight ~ Sire, data = data_long)

# View the ANOVA table
summary(model_2)

Call:
lm(formula = Weight ~ Sire, data = data_long)

Residuals:
    Min      1Q  Median      3Q     Max 
-93.625 -29.312  -2.875  33.906 104.750 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   696.63      17.05  40.846   <2e-16 ***
SireA          18.38      24.12   0.762    0.451    
SireB         -31.50      24.12  -1.306    0.200    
SireD         -39.13      24.12  -1.622    0.114    
SireE         -13.38      24.12  -0.555    0.583    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 48.24 on 35 degrees of freedom
Multiple R-squared:  0.1763,    Adjusted R-squared:  0.08211 
F-statistic: 1.872 on 4 and 35 DF,  p-value: 0.1373

Notice how the output format has changed. This is because the summary() function (and many other functions) “behaves differently” depending on the class() of the object we pass to it (here an lm object; before, an aov object). One big difference we see is the table of contrasts. Now there are 5 rows: one for the intercept coefficient, which here is the mean Weight for the reference level, sire “C” (the test of whether it differs from zero is not of interest to us), and four rows comparing each other factor level to that reference mean.

Here, the Estimate column is an estimate of the AMOUNT of difference in mean weight for that sire relative to the reference mean. E.g., sire B offspring weight is estimated to be 31.5 grams lower (the negative sign indicates less than) than offspring weight for sire C. However, this observed sample difference is not statistically significant (P = 0.20).

Notice that the overall F value and p-value for the one-way ANOVA are also present at the bottom of the output; they are exactly the same as those produced by the aov() function.
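
If you want to convince yourself where the contrast estimates come from, they are simply differences between each sire’s mean and the reference (sire C) mean:

# Contrast estimates are differences from the reference (sire C) mean
group_means <- tapply(data_long$Weight, data_long$Sire, mean)
group_means["B"] - group_means["C"]    # matches the SireB estimate of -31.5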

7.3 Post-hoc testing

Post-hoc tests are often interesting in research, and the one-way ANOVA is a good example, where an overall question of “is there a difference?” can be enhanced by asking whether there are particular or specific differences, say between pairs of means. The phrase post hoc implies that these questions can sometimes be an afterthought. Thus, consideration should be given as to whether these specific questions NEED to be asked.

There are many ways to perform post-hoc tests in R. A one-way ANOVA can tell us that at least one group has a different mean from another group, but it does not inform us which group means are different from which other group means. A Tukey-Kramer test lets us test the null hypothesis of no difference between the population means for all pairs of groups. The Tukey-Kramer test (also known as the Tukey Honestly Significant Difference test, or Tukey’s HSD) is implemented in R in the function TukeyHSD(). We will use the results of the ANOVA done with lm() as above, which we stored in the variable model_2. To do a Tukey-Kramer test on these data, we first apply the function aov() to model_2 and then apply the function TukeyHSD() to the result. We can do this in a single command:

# Perform Tukey HSD test
TukeyHSD(aov(model_2))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = model_2)

$Sire
       diff        lwr      upr     p adj
A-C  18.375  -50.96883 87.71883 0.9397600
B-C -31.500 -100.84383 37.84383 0.6893101
D-C -39.125 -108.46883 30.21883 0.4937665
E-C -13.375  -82.71883 55.96883 0.9806452
B-A -49.875 -119.21883 19.46883 0.2565333
D-A -57.500 -126.84383 11.84383 0.1436866
E-A -31.750 -101.09383 37.59383 0.6830523
D-B  -7.625  -76.96883 61.71883 0.9977281
E-B  18.125  -51.21883 87.46883 0.9425341
E-D  25.750  -43.59383 95.09383 0.8216369

The key part of this output is the table. It shows:

  • The difference between the means of groups;
  • The 95% confidence interval for the difference between the corresponding population means (“lwr” and “upr” correspond to the lower and upper bounds of that confidence interval for the difference in means);
  • The p-value from a test of the null hypothesis of no difference between the means (the column headed “p adj”). In the case of the chicken weight data, p is greater than 0.05 for all pairs, so we fail to reject the null hypothesis in every case.

The output is a table of results, which is a little hard to read. We can improve the statistical output by using the agricolae package, which is not loaded by default when you start R.

# Load the package
library(agricolae)

# Perform Tukey HSD test
HSD.test(model, "Sire", console = TRUE)

Study: model ~ "Sire"

HSD Test for Weight 

Mean Square Error:  2326.921 

Sire,  means

   Weight      std r       se Min Max    Q25   Q50    Q75
A 715.000 39.34826 8 17.05477 675 793 690.00 702.0 726.00
B 665.125 46.67345 8 17.05477 592 732 627.75 681.5 691.75
C 696.625 59.62726 8 17.05477 603 763 669.75 709.0 739.50
D 657.500 40.06067 8 17.05477 600 718 637.50 663.0 675.00
E 683.250 52.41796 8 17.05477 611 788 656.00 676.0 696.75

Alpha: 0.05 ; DF Error: 35 
Critical Value of Studentized Range: 4.065949 

Minimun Significant Difference: 69.34383 

Treatments with the same letter are not significantly different.

   Weight groups
A 715.000      a
C 696.625      a
E 683.250      a
B 665.125      a
D 657.500      a

This gives us a much neater output with convenient grouping letters to indicate where differences occur - different letters indicate that the group means are different. We can also plot the post-hoc analysis:

# Plot the results
plot(TukeyHSD(model))

The plot shows the difference between each pair of means, with the 95% confidence intervals for each difference. The vertical line is the “null” difference of zero. If the confidence interval crosses the line, then the difference is not statistically significant. If the confidence interval does not cross the line, then the difference is statistically significant.

8 Two-way ANOVA

The two-way ANOVA is a statistical method that allows us to evaluate the simultaneous effect of two categorical variables on a quantitative continuous variable. It is an extension of the one-way ANOVA, since it evaluates the effects of two categorical variables on a numerical response instead of one. The advantage of a two-way ANOVA over a one-way ANOVA is that we test the relationship between one factor and the response while taking into account the effect of the other factor. Moreover, it also allows us to include the possible interaction of the two categorical variables, to evaluate whether or not they act jointly on the response variable.

8.1 Data

To illustrate how to perform a two-way ANOVA in R, we use the penguins data set, available from the palmerpenguins package. This data set contains information about 344 penguins from three species (Adelie, Chinstrap and Gentoo) collected from three islands in the Palmer Archipelago, Antarctica. We do not need to import the data set, but we do have to load the package and then call the data:

# Load the package
library(palmerpenguins)
# Call the data
dat <- penguins # rename data set
str(dat) # structure of data set
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

The data set contains 344 rows and 8 columns. Each row corresponds to a penguin and each column corresponds to a variable. The variables we will focus on are:

  • species: the species of the penguin (Adelie, Chinstrap or Gentoo)
  • sex: sex of the penguin (female and male)
  • body_mass_g: body mass of the penguin (in grams)

body_mass_g is the quantitative continuous variable and will be the dependent variable, whereas species and sex are both qualitative variables. These last two variables will be our independent variables, also referred to as factors. Make sure that they are read as factors by R:

# Convert to factor
dat$species <- as.factor(dat$species)
class(dat$species)
[1] "factor"
dat$sex <- as.factor(dat$sex)
class(dat$sex)
[1] "factor"

8.2 Aim of the analysis

As mentioned above, a two-way ANOVA is used to evaluate simultaneously the effect of two categorical variables on one quantitative continuous variable. It is referred to as a two-way ANOVA because we are comparing groups that are formed by two independent categorical variables. Here, we would like to know if body mass depends on species and/or sex. In particular, we are interested in:

  • Measuring and testing the relationship between species and body mass,
  • Measuring and testing the relationship between sex and body mass, and
  • Potentially checking whether the relationship between species and body mass is different for females and males (which is equivalent to checking whether the relationship between sex and body mass depends on the species).

The first two relationships are referred to as main effects, while the third is known as the interaction effect. The main effects test whether at least one group differs from another (while controlling for the other independent variable). The interaction effect, on the other hand, tests whether the relationship between two variables differs depending on the level of a third variable. In other words, if the relationship between the response and the first categorical variable does not depend on the level of the second categorical variable, then there is no interaction between the two variables. If, on the contrary, that relationship changes - the effect of the first variable becoming stronger or weaker - then there is an interaction.

When performing a two-way ANOVA, testing the interaction effect is not mandatory. However, omitting an interaction effect may lead to erroneous conclusions if the interaction effect is present.
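
Before fitting anything, an interaction plot is a useful informal check of whether an interaction might be present: if the lines for the two sexes are roughly parallel across species, there is little visual evidence of an interaction. A minimal sketch (dropping rows with missing values first):

# Interaction plot of mean body mass by species and sex
dat_complete <- na.omit(dat[, c("species", "sex", "body_mass_g")])
interaction.plot(x.factor = dat_complete$species,
                 trace.factor = dat_complete$sex,
                 response = dat_complete$body_mass_g,
                 xlab = "Species", ylab = "Mean body mass (g)",
                 trace.label = "Sex")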

8.3 Assumptions

Most statistical tests require some assumptions for the results to be valid, and a two-way ANOVA is not an exception.

Assumptions of a two-way ANOVA are similar to those of a one-way ANOVA. To summarise:

  • Variable type: the dependent variable must be quantitative continuous, while the two independent variables must be categorical (with at least two levels).
  • Independence: the observations should be independent between groups and within each group.
  • Normality:
    • For small samples, data should follow approximately a normal distribution
    • For large samples (usually \(n \ge 30\) in each group/sample), normality is not required (thanks to the central limit theorem)
  • Equality of variances: variances should be equal across groups.
  • Outliers: There should be no significant outliers in any group.

Now that we have seen the underlying assumptions of the two-way ANOVA, we review them specifically for our data set before applying the test and interpreting the results.

8.3.1 Variable type

The dependent variable body mass is quantitative continuous, while both independent variables sex and species are qualitative variables (with at least 2 levels). Therefore, this assumption is met. If your dependent variable is quantitative discrete, this is count data, which, strictly speaking, should be analysed using a generalized linear model, not an ANOVA.

8.3.2 Independence

Independence is usually checked based on the design of the experiment and how data have been collected.1

To keep it simple, observations are usually:

  • independent if each experimental unit (here a penguin) has been measured only once and the observations are collected from a representative and randomly selected portion of the population, or
  • dependent if each experimental unit has been measured at least twice (as it is often the case in the medical field for example, with two measurements on the same subjects; one before and one after the treatment).

In our case, body mass has been measured only once on each penguin, and on a representative and random sample of the population, so the independence assumption is met.

Note that if your data correspond to observations made several times on the same experimental units (for example, if one of the factors is a treatment (A or B) and the second factor is time (day 1, day 2 and day 3), and measurements are made at each time point on the same subjects), a two-way mixed ANOVA should be used.

8.3.3 Normality

We have a large sample in all subgroups (each combination of the levels of the two factors, called a cell):

table(dat$species, dat$sex)
           
            female male
  Adelie        73   73
  Chinstrap     34   34
  Gentoo        58   61

so normality does not need to be checked.

For completeness, we still show how to verify normality, as if we had small samples.

There are several methods to test the normality assumption, the most common being:

  • a QQ-plot by group or on the residuals;
  • a histogram by group or on the residuals;
  • a normality test by group or on the residuals.

The easiest/shortest way is to verify the normality with a QQ-plot on the residuals. To draw this plot, we first need to save the model:

# save model
mod <- aov(body_mass_g ~ sex * species,
  data = dat
)

This piece of code will be explained in more detail later. Now we can draw the QQ-plot on the residuals. We show two ways to do so, first with the plot() function and second with the qqPlot() function from the {car} package:

# method 1
plot(mod, which = 2)

# method 2
library(car)

qqPlot(mod$residuals,
  id = FALSE # remove point identification
)

Code for method 1 is slightly shorter, but it misses the confidence interval around the reference line.

If points follow the straight line and fall within the confidence band, we can assume normality. This is the case here. If you prefer to verify the normality based on a histogram of the residuals, here is the code:

# histogram
hist(mod$residuals)

The histogram of the residuals shows a Gaussian distribution, which is in line with the conclusion from the QQ-plot. Although the QQ-plot and histogram are largely enough to verify normality, if you want to test it more formally, the Shapiro-Wilk test can be applied to the residuals as well:

# normality test
shapiro.test(mod$residuals)

    Shapiro-Wilk normality test

data:  mod$residuals
W = 0.99776, p-value = 0.9367

We do not reject the null hypothesis that the residuals follow a normal distribution (\(p\)-value = 0.937).

From the QQ-plot, histogram and Shapiro-Wilk test, we conclude that we do not reject the null hypothesis of normality of the residuals. The normality assumption is thus verified, so we can now check the equality of the variances.

Note that if the normality assumption is not met, many transformations can be applied to the dependent variable to improve it, the most common ones being the logarithmic (log() function in R) and the Box-Cox transformations. If the normality assumption is still not met on the transformed data, the non-parametric version of the two-way ANOVA, the Scheirer–Ray–Hare test, can be used. Alternatively, a permutation test can also be used.
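
As a sketch of what that might look like (not needed here, since the assumptions are met for these data), you could refit the model on a log-transformed response and recheck the residuals in the same way:

# Sketch only: refit on log-transformed body mass and recheck normality
mod_log <- aov(log(body_mass_g) ~ sex * species, data = dat)
shapiro.test(mod_log$residuals)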

8.3.4 Homogeneity of variances

Equality of variances, also referred to as homogeneity of variances or homoscedasticity, can be verified visually with the plot() function:

plot(mod, which = 3)

The spread of the residuals is roughly constant and the red smooth line is horizontal and flat, so it looks like the constant variance assumption is satisfied here.

The diagnostic plot above is sufficient, but if you prefer, it can also be tested more formally with Levene’s test (also from the {car} package):2

leveneTest(mod)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   5  1.3908 0.2272
      327               

We do not reject the null hypothesis that the variances are equal (\(p\)-value = 0.227).

Both the visual and formal approaches give the same conclusion; we do not reject the hypothesis of homogeneity of the variances.

Note that, as for the normality, the logarithmic and Box-Cox transformations may improve homogeneity of the residuals.

8.3.5 Outliers

The easiest and most common way to detect outliers is visually, using boxplots by group. For females and males:

library(ggplot2)

# boxplots by sex
ggplot(dat) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

For the three species:

# boxplots by species
ggplot(dat) +
  aes(x = species, y = body_mass_g) +
  geom_boxplot()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

There are two outliers for the species Chinstrap. These points are, nonetheless, not extreme enough to bias results. Therefore, we consider that the assumption of no significant outliers is met.
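
If you want to see the actual values flagged by the boxplot rule, one option (an illustrative sketch) is boxplot.stats(), here applied to the Chinstrap penguins:

# Values flagged as outliers by the boxplot rule for Chinstrap penguins
chinstrap_mass <- dat$body_mass_g[dat$species == "Chinstrap"]
boxplot.stats(chinstrap_mass)$out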

8.4 Two-way ANOVA in R

We have shown that all assumptions are met, so we can now proceed to the implementation of the two-way ANOVA in R. This will allow us to answer the following research questions:

  • Controlling for the species, is body mass significantly different between the two sexes?
  • Controlling for the sex, is body mass significantly different for at least one species?
  • Is the relationship between species and body mass different between female and male penguins?

Including an interaction effect in a two-way ANOVA is not compulsory. However, in order to avoid flawed conclusions, it is recommended to first check whether the interaction is significant or not, and depending on the results, include it or not. If the interaction is not significant, it is safe to remove it from the final model. On the contrary, if the interaction is significant, it should be included in the final model which will be used to interpret results. We thus start with a model which includes the two main effects (i.e., sex and species) and the interaction:

# Two-way ANOVA with interaction
# save model
mod <- aov(body_mass_g ~ sex * species,
  data = dat
)

# print results
summary(mod)
             Df    Sum Sq  Mean Sq F value   Pr(>F)    
sex           1  38878897 38878897 406.145  < 2e-16 ***
species       2 143401584 71700792 749.016  < 2e-16 ***
sex:species   2   1676557   838278   8.757 0.000197 ***
Residuals   327  31302628    95727                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
11 observations deleted due to missingness

Similar to a one-way ANOVA, the principle of a two-way ANOVA is based on the total dispersion of the data, and its decomposition into four components:

  1. the share attributable to the first factor
  2. the share attributable to the second factor
  3. the share attributable to the interaction of the 2 factors
  4. the unexplained, or residual portion.

The sum of squares (column Sum Sq) shows these four components. The two-way ANOVA consists of using a statistical test to determine whether each of the dispersion components (attributable to the 2 factors studied and to their interaction) is significantly greater than the residual component. If this is the case, we conclude that the effect considered (factor A, factor B or the interaction) is significant.
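
A quick way to see each component’s share of the total variation is to divide the Sum Sq column by its total (a simple eta-squared-style summary; species accounts for roughly two-thirds here):

# Share of total variation attributable to each component
ss <- summary(mod)[[1]][["Sum Sq"]]
round(ss / sum(ss), 3)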

We see that species explains a large part of the variability in body mass; it is the most important factor in explaining this variability.

The \(p\)-values are displayed in the last column of the output above (Pr(>F)). From these \(p\)-values, we conclude that, at the 5% significance level:

  • controlling for the species, body mass is significantly different between the two sexes;
  • controlling for the sex, body mass is significantly different for at least one species;
  • the interaction between sex and species (displayed at the line sex:species in the output above) is significant.

So from the significant interaction effect, we have just seen that the relationship between body mass and species is different between males and females. Since it is significant, we have to keep it in the model and we should interpret results from that model.

If, on the contrary, the interaction was not significant (that is, if the \(p\)-value \(\ge\) 0.05), we would have removed this interaction effect from the model. For illustrative purposes, below is the code for a two-way ANOVA without the interaction, referred to as an additive model:

# Two-way ANOVA without interaction
aov(body_mass_g ~ sex + species,
  data = dat
)

8.5 Pairwise comparisons

Because the two main effects were significant, we concluded that:

  • controlling for the species, body mass is different between females and males;
  • controlling for the sex, body mass is different for at least one species.

If body mass is different between the two sexes, given that there are exactly two sexes, it must be because body mass is significantly different between females and males. If one wants to know which sex has the highest body mass, it can be deduced from the means and/or boxplots by subgroup. Here, it is clear that males have a significantly higher body mass than females.
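
For example, the group means by sex make this clear (aggregate() with a formula drops the rows with missing values by default):

# Mean body mass by sex
aggregate(body_mass_g ~ sex, data = dat, FUN = mean)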

However, it is not so straightforward for the species. Let me explain why it is not as easy as for the sexes.

There are three species (Adelie, Chinstrap and Gentoo), so there are 3 pairs of species:

  1. Adelie and Chinstrap
  2. Adelie and Gentoo
  3. Chinstrap and Gentoo

If body mass is significantly different for at least one species, it could be that:

  • body mass is significantly different between Adelie and Chinstrap but not significantly different between Adelie and Gentoo, and not significantly different between Chinstrap and Gentoo, or
  • body mass is significantly different between Adelie and Gentoo but not significantly different between Adelie and Chinstrap, and not significantly different between Chinstrap and Gentoo, or
  • body mass is significantly different between Chinstrap and Gentoo but not significantly different between Adelie and Chinstrap, and not significantly different between Adelie and Gentoo.

Or, it could also be that:

  • body mass is significantly different between Adelie and Chinstrap, and between Adelie and Gentoo, but not significantly different between Chinstrap and Gentoo, or
  • body mass is significantly different between Adelie and Chinstrap, and between Chinstrap and Gentoo, but not significantly different between Adelie and Gentoo, or
  • body mass is significantly different between Chinstrap and Gentoo, and between Adelie and Gentoo, but not significantly different between Adelie and Chinstrap.

Last, it could also be that body mass is significantly different between all species. We cannot, at this stage, know precisely which species is different from which in terms of body mass. To know this, we need to compare each pair of species using post-hoc tests (also known as pairwise comparisons). There are several post-hoc tests, the most common ones being the Tukey HSD, which tests all possible pairs of groups, and Dunnett’s test, which compares all groups to a reference group. As mentioned earlier, these tests should not be done on the sex variable because it has only two levels. In this lab we will only look at the Tukey HSD.

As for the one-way ANOVA, the Tukey HSD test can be done in R as follows:

# method 1
TukeyHSD(mod,
  which = "species"
)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = body_mass_g ~ sex * species, data = dat)

$species
                       diff       lwr       upr     p adj
Chinstrap-Adelie   26.92385  -80.0258  133.8735 0.8241288
Gentoo-Adelie    1377.65816 1287.6926 1467.6237 0.0000000
Gentoo-Chinstrap 1350.73431 1239.9964 1461.4722 0.0000000

or using the {multcomp} package:

# method 2
library(multcomp)
res_tukey <- glht(
  aov(body_mass_g ~ sex + species,
    data = dat
  ),
  linfct = mcp(species = "Tukey")
)

summary(res_tukey)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = body_mass_g ~ sex + species, data = dat)

Linear Hypotheses:
                        Estimate Std. Error t value Pr(>|t|)    
Chinstrap - Adelie == 0    26.92      46.48   0.579     0.83    
Gentoo - Adelie == 0     1377.86      39.10  35.236   <1e-05 ***
Gentoo - Chinstrap == 0  1350.93      48.13  28.067   <1e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Note that when using the second method, it is the model without the interaction that needs to be specified into the glht() function, even if the interaction is significant. Moreover, do not forget to replace mod and species in my code with the name of your model and the name of your independent variable.

Both methods give the same results, that is:

  • body mass is not significantly different between Chinstrap and Adelie (adjusted \(p\)-value = 0.83),
  • body mass is significantly different between Gentoo and Adelie (adjusted \(p\)-value < 0.001), and
  • body mass is significantly different between Gentoo and Chinstrap (adjusted \(p\)-value < 0.001).

Remember that it is the adjusted \(p\)-values that are reported, to prevent the issue of multiple testing which occurs when comparing several pairs of groups.

If you would like to compare all combinations of groups, it can be done with the TukeyHSD() function and specifying the interaction in the which argument:

# all combinations of sex and species
TukeyHSD(mod,
  which = "sex:species"
)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = body_mass_g ~ sex * species, data = dat)

$`sex:species`
                                     diff       lwr       upr     p adj
male:Adelie-female:Adelie        674.6575  527.8486  821.4664 0.0000000
female:Chinstrap-female:Adelie   158.3703  -25.7874  342.5279 0.1376213
male:Chinstrap-female:Adelie     570.1350  385.9773  754.2926 0.0000000
female:Gentoo-female:Adelie     1310.9058 1154.8934 1466.9181 0.0000000
male:Gentoo-female:Adelie       2116.0004 1962.1408 2269.8601 0.0000000
female:Chinstrap-male:Adelie    -516.2873 -700.4449 -332.1296 0.0000000
male:Chinstrap-male:Adelie      -104.5226 -288.6802   79.6351 0.5812048
female:Gentoo-male:Adelie        636.2482  480.2359  792.2606 0.0000000
male:Gentoo-male:Adelie         1441.3429 1287.4832 1595.2026 0.0000000
male:Chinstrap-female:Chinstrap  411.7647  196.6479  626.8815 0.0000012
female:Gentoo-female:Chinstrap  1152.5355  960.9603 1344.1107 0.0000000
male:Gentoo-female:Chinstrap    1957.6302 1767.8040 2147.4564 0.0000000
female:Gentoo-male:Chinstrap     740.7708  549.1956  932.3460 0.0000000
male:Gentoo-male:Chinstrap      1545.8655 1356.0392 1735.6917 0.0000000
male:Gentoo-female:Gentoo        805.0947  642.4300  967.7594 0.0000000

Or with the HSD.test() function from the {agricolae} package, which denotes subgroups that are not significantly different from each other with the same letter:

library(agricolae)

HSD.test(mod,
  trt = c("sex", "species"),
  console = TRUE # print results
)

Study: mod ~ c("sex", "species")

HSD Test for body_mass_g 

Mean Square Error:  95726.69 

sex:species,  means

                 body_mass_g      std  r       se  Min  Max     Q25  Q50
female:Adelie       3368.836 269.3801 73 36.21222 2850 3900 3175.00 3400
female:Chinstrap    3527.206 285.3339 34 53.06120 2700 4150 3362.50 3550
female:Gentoo       4679.741 281.5783 58 40.62586 3950 5200 4462.50 4700
male:Adelie         4043.493 346.8116 73 36.21222 3325 4775 3800.00 4000
male:Chinstrap      3938.971 362.1376 34 53.06120 3250 4800 3731.25 3950
male:Gentoo         5484.836 313.1586 61 39.61427 4750 6300 5300.00 5500
                     Q75
female:Adelie    3550.00
female:Chinstrap 3693.75
female:Gentoo    4875.00
male:Adelie      4300.00
male:Chinstrap   4100.00
male:Gentoo      5700.00

Alpha: 0.05 ; DF Error: 327 
Critical Value of Studentized Range: 4.054126 

Groups according to probability of means differences and alpha level( 0.05 )

Treatments with the same letter are not significantly different.

                 body_mass_g groups
male:Gentoo         5484.836      a
female:Gentoo       4679.741      b
male:Adelie         4043.493      c
male:Chinstrap      3938.971      c
female:Chinstrap    3527.206      d
female:Adelie       3368.836      d

If you have many groups to compare, plotting them might be easier to interpret:

# set axis margins so labels do not get cut off
par(mar=c(4.1, 13.5, 4.1, 2.1))

# create confidence interval for each comparison
plot(TukeyHSD(mod, which = "sex:species"),
     las = 2 # rotate x-axis ticks
     )

From the outputs and plot above, we conclude that all combinations of sex and species are significantly different, except between female Chinstrap and female Adelie (\(p\)-value = 0.138) and male Chinstrap and male Adelie (\(p\)-value = 0.581).

9 Non-parametric alternative

A Kruskal-Wallis test is a non-parametric analogue of a one-way ANOVA. It does not assume that the numeric variable has a Gaussian distribution for each group. It is implemented in R in the function kruskal.test(). The Kruskal-Wallis test is a rank-based test, and it is therefore not affected by outliers in the same way as a one-way ANOVA. It is also more robust to unequal variances than a one-way ANOVA. However, the Kruskal-Wallis test is less powerful than a one-way ANOVA when the data do in fact come from Gaussian distributions with equal variances. The input for this function is the same as we used for lm() above: a model formula and the name of the data frame to be used:

# Perform Kruskal-Wallis test
kruskal.test(Weight ~ Sire, data = data_long)

    Kruskal-Wallis rank sum test

data:  Weight by Sire
Kruskal-Wallis chi-squared = 7.648, df = 4, p-value = 0.1054

You can see from the output that the Kruskal-Wallis test leads to the same conclusion: we fail to reject the null hypothesis of no difference among the groups (p = 0.105).

10 Transforming Data

Up until now, the data we’ve examined have conformed, at least roughly, to the assumptions of the statistical models we’ve been studying. This is all very handy, but perhaps a little unrealistic. The real world being the messy place it is, biological data often don’t conform to the distributional assumptions of t-tests and ANOVA:

  • The residuals may not be normally distributed;
  • Variances may be unequal among groups.

The same kinds of problems with these distributional assumptions can also arise when working in a regression (i.e. non-normal residuals, non-constant variance). Furthermore, we might run into additional problems if there is some kind of non-linearity in the relationship between the response and predictor (numeric) variables.

Most biological data are unlikely to conform perfectly to all the assumptions, but experience has shown (fortunately) that t-tests, ANOVAs and regressions are generally quite robust—they perform reasonably well with data that deviate to some extent from the assumptions of the tests. However, in some cases residuals are clearly very far from normal, or variances change a lot across groups.

In these cases, steps may need to be taken to deal with the problem. This section deals with one way of tackling the analysis of data that don’t fit the assumptions: data transformation. We will mostly focus on the ANOVA / t-test setting, but keep in mind that the ideas are equally applicable to regression analysis.

10.1 Log transformations

Log transformation affects the data in two ways:

  • A log-transformation stretches out the left hand side (smaller values) of the distribution and squashes in the right hand side (larger values). This is obviously useful where the data set has a long tail to the right;
  • The ‘squashing’ effect of a log-transformation is more pronounced at higher values. This means a log-transformation may also deal with another common problem in biological data - samples with larger means having larger variances.

If we are carrying out an ANOVA and the scale-location plot exhibits a positive relationship—i.e., groups with larger means have larger variances—then a log transformation could be appropriate. You can log transform the data using the log() function in R. For example, to log transform the Weight variable in the chicken data, you would use:

# Log transform the data
data_long$logWeight <- log(data_long$Weight)
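
After transforming, the assumptions should be rechecked on the new scale, for example homogeneity of variances for the log-transformed weights (a quick sketch, purely for illustration since the untransformed chicken data already met the assumptions):

# Recheck homogeneity of variances on the log scale
bartlett.test(logWeight ~ Sire, data = data_long)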

10.2 Square root transformations

Taking the square root of the data is often appropriate where the data are whole-number counts (though the log transformation may also work here). This typically occurs where your data are counts of organisms (e.g., algal cells in fields of view under a microscope). In R the square root of a set of data can be taken using the sqrt() function. Note, however, that there is no corresponding square function; taking squares is done using the ^ operator with the number 2 on the right (e.g., if the variable is called x, use x^2). To do this with our chicken data, we would use:

# Square root transform the data
data_long$sqrtWeight <- sqrt(data_long$Weight)

10.3 Not all data can be transformed

There are some situations where no amount of transformation of the data will get round the fact that the data are problematic. Three in particular are worth noting…

  • Multimodal distributions - these may in fact have only one actual mode, but nonetheless have two or more clear ‘peaks’ (N.B. not to be confused with distributions that are ‘spiky’ just because there are few data);
  • Dissimilar distribution shapes - if the two (or more) samples have very different distributional problems, e.g. one is strongly right-skewed and the other strongly left-skewed, then no single transformation will be able to help: whatever corrects one sample will distort the other;
  • Samples with many exactly equal values - with results that are small integer numbers (e.g. counts of numbers of eggs in birds’ nests) then there will be many identical values. If the non-normality results from lots of identical values forming a particular peak, for example, this cannot be corrected by transformation since equal values will still be equal even when transformed.

11 Activities

11.1 Cuckoo eggs

The European cuckoo does not look after its own eggs, but instead lays them in the nests of birds of other species. Previous studies showed that cuckoos have sometimes evolved to lay eggs that are coloured similarly to those of the host bird species. Is the same true of egg size? Do cuckoos lay eggs similar in size to the eggs of their hosts? The data file cuckooeggs.csv contains data on the lengths of cuckoo eggs laid in the nests of a variety of host species. Here we compare the mean size of cuckoo eggs found in the nests of different host species. These are your tasks:

  • Read the data into R and check the data for errors;
  • Plot a multiple histogram showing cuckoo egg lengths by host species;
  • Calculate a table that shows the mean and standard deviation of length of cuckoo eggs for each host species;
  • Look at the graph and the table. For these data, would ANOVA be a valid method to test for differences between host species in the lengths of cuckoo eggs in their nests?
  • Use ANOVA to test for a difference between host species in the mean size of the cuckoo eggs in their nests. What is your conclusion?
  • Assuming that ANOVA rejected the null hypotheses of no mean differences, use a Tukey-Kramer test to decide which pairs of host species are significantly different from each other in cuckoo egg mean length. What is your conclusion?
💡 Click here to view a solution
library(tidyverse)
library(agricolae)

# Load the data
cuckoo_data <- read.csv("data/cuckooeggs.csv")

# Check the data for errors
head(cuckoo_data)
   host_species egg_length
1 Hedge Sparrow      20.85
2 Hedge Sparrow      21.65
3 Hedge Sparrow      22.05
4 Hedge Sparrow      22.85
5 Hedge Sparrow      23.05
6 Hedge Sparrow      23.05
str(cuckoo_data)
'data.frame':   120 obs. of  2 variables:
 $ host_species: chr  "Hedge Sparrow" "Hedge Sparrow" "Hedge Sparrow" "Hedge Sparrow" ...
 $ egg_length  : num  20.9 21.6 22.1 22.9 23.1 ...
# Plotting a multiple histogram
ggplot(cuckoo_data, aes(x = egg_length, fill = host_species)) +
  geom_histogram(position = "dodge", binwidth = 0.5) +
  theme_minimal() +
  labs(title = "Cuckoo Egg Lengths by Host Species", x = "Egg Length", y = "Count")

# Calculate mean and standard deviation by host species
summary_table <- cuckoo_data %>%
  group_by(host_species) %>%
  summarise(mean_length = mean(egg_length),
            sd_length = sd(egg_length))

# Print the summary table
print(summary_table)
# A tibble: 6 × 3
  host_species  mean_length sd_length
  <chr>               <dbl>     <dbl>
1 Hedge Sparrow        23.1     1.07 
2 Meadow Pipit         22.3     0.921
3 Pied Wagtail         22.9     1.07 
4 Robin                22.6     0.685
5 Tree Pipit           23.1     0.901
6 Wren                 21.1     0.744
# ANOVA test
anova_result <- lm(egg_length ~ host_species, data = cuckoo_data)
summary(anova_result)

Call:
lm(formula = egg_length ~ host_species, data = cuckoo_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.64889 -0.44889 -0.04889  0.55111  2.15111 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)              23.12143    0.24301  95.147  < 2e-16 ***
host_speciesMeadow Pipit -0.82254    0.27825  -2.956  0.00379 ** 
host_speciesPied Wagtail -0.21810    0.33789  -0.645  0.51992    
host_speciesRobin        -0.54643    0.33275  -1.642  0.10332    
host_speciesTree Pipit   -0.03143    0.33789  -0.093  0.92606    
host_speciesWren         -1.99143    0.33789  -5.894 3.91e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9093 on 114 degrees of freedom
Multiple R-squared:  0.313, Adjusted R-squared:  0.2829 
F-statistic: 10.39 on 5 and 114 DF,  p-value: 3.152e-08
# Perform Tukey HSD test
TukeyHSD(aov(anova_result))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = anova_result)

$host_species
                                  diff          lwr         upr     p adj
Meadow Pipit-Hedge Sparrow -0.82253968 -1.629133605 -0.01594576 0.0428621
Pied Wagtail-Hedge Sparrow -0.21809524 -1.197559436  0.76136896 0.9872190
Robin-Hedge Sparrow        -0.54642857 -1.511003196  0.41814605 0.5726153
Tree Pipit-Hedge Sparrow   -0.03142857 -1.010892769  0.94803563 0.9999990
Wren-Hedge Sparrow         -1.99142857 -2.970892769 -1.01196437 0.0000006
Pied Wagtail-Meadow Pipit   0.60444444 -0.181375330  1.39026422 0.2324603
Robin-Meadow Pipit          0.27611111 -0.491069969  1.04329219 0.9021876
Tree Pipit-Meadow Pipit     0.79111111  0.005291337  1.57693089 0.0474619
Wren-Meadow Pipit          -1.16888889 -1.954708663 -0.38306911 0.0004861
Robin-Pied Wagtail         -0.32833333 -1.275604766  0.61893810 0.9155004
Tree Pipit-Pied Wagtail     0.18666667 -0.775762072  1.14909541 0.9932186
Wren-Pied Wagtail          -1.77333333 -2.735762072 -0.81090459 0.0000070
Tree Pipit-Robin            0.51500000 -0.432271433  1.46227143 0.6159630
Wren-Robin                 -1.44500000 -2.392271433 -0.49772857 0.0003183
Wren-Tree Pipit            -1.96000000 -2.922428738 -0.99757126 0.0000006
# Tukey HSD test with agricolae
HSD.test(anova_result, "host_species", console = TRUE)

Study: anova_result ~ "host_species"

HSD Test for egg_length 

Mean Square Error:  0.8267399 

host_species,  means

              egg_length       std  r        se   Min   Max   Q25   Q50   Q75
Hedge Sparrow   23.12143 1.0687365 14 0.2430079 20.85 25.05 22.90 23.05 23.85
Meadow Pipit    22.29889 0.9206278 45 0.1355433 19.65 24.45 22.05 22.25 22.85
Pied Wagtail    22.90333 1.0676186 15 0.2347680 21.05 24.85 21.95 23.05 23.75
Robin           22.57500 0.6845923 16 0.2273131 21.05 23.85 22.05 22.55 23.05
Tree Pipit      23.09000 0.9014274 15 0.2347680 21.05 24.05 22.55 23.25 23.75
Wren            21.13000 0.7437357 15 0.2347680 19.85 22.25 20.85 21.05 21.75

Alpha: 0.05 ; DF Error: 114 
Critical Value of Studentized Range: 4.099489 

Groups according to probability of means differences and alpha level( 0.05 )

Treatments with the same letter are not significantly different.

              egg_length groups
Hedge Sparrow   23.12143      a
Tree Pipit      23.09000      a
Pied Wagtail    22.90333     ab
Robin           22.57500     ab
Meadow Pipit    22.29889      b
Wren            21.13000      c

11.2 Malaria

The pollen of the maize (corn) plant is a source of food to larval mosquitoes of the species Anopheles arabiensis, the main vector of malaria in Ethiopia. The production of maize has increased substantially in certain areas of Ethiopia recently, and over the same time period, malaria has entered into new areas where it was previously rare. This raises the question: is the increase in maize cultivation partly responsible for the increase in malaria?

One line of evidence is to look for an association between maize production and malaria incidence at different geographically dispersed sites. The data set malaria vs maize.csv contains information on several high-altitude sites in Ethiopia, with information about the level of cultivation of maize (low, medium or high in the variable maize_yield) and the rate of malaria per 10,000 people (incidence_rate_per_ten_thousand). These are your tasks:

  • Read the data into R and check the data for errors;
  • Plot a multiple histogram to show the relationship between level of maize production and the incidence of malaria;
  • ANOVA is a logical choice of method to test differences in the mean rate of malaria between sites differing in level of maize production. Calculate the standard deviation of the incidence rate for each level of maize yield. Do these data seem to conform to the assumptions of ANOVA? Describe any violations of assumptions you identify;
  • Compute the log of the incidence rate and redraw the multiple histograms for different levels of maize yield. Calculate the standard deviation of the log incidence rate for each level of maize yield. Does the log-transformed data better meet the assumptions of ANOVA than did the untransformed data?
  • Test for an association between maize yield and malaria incidence.
💡 Click here to view a solution
library(tidyverse)
library(agricolae)

# Read the data
data <- read.csv("data/malaria_maize.csv")

# Plot multiple histograms
ggplot(data, aes(x = incidence_rate_per_ten_thousand, fill = maize_yield)) +
  geom_histogram(position = "dodge", binwidth = 10) +
  labs(title = "Histogram of Malaria Incidence Rate by Maize Yield Level",
       x = "Incidence Rate per Ten Thousand",
       y = "Frequency",
       fill = "Maize Yield Level")

# Standard Deviation of Incidence Rate by Maize Yield Level
std_dev_by_yield <- data %>%
  group_by(maize_yield) %>%
  summarize(StdDev_Incidence_Rate = sd(incidence_rate_per_ten_thousand))

print(std_dev_by_yield)

# Checking for ANOVA assumptions: Homogeneity of variances
# (Bartlett's test is commonly used for this)
bartlett_test <- bartlett.test(incidence_rate_per_ten_thousand ~ maize_yield, data = data)
print(bartlett_test)

# Log Transformation and Redrawing Histograms
data$log_incidence_rate <- log(data$incidence_rate_per_ten_thousand)

ggplot(data, aes(x = log_incidence_rate, fill = maize_yield)) +
  geom_histogram(position = "dodge", binwidth = 0.5) +
  labs(title = "Histogram of Log-Transformed Malaria Incidence Rate by Maize Yield Level",
       x = "Log of Incidence Rate",
       y = "Frequency",
       fill = "Maize Yield Level")

# Standard Deviation of Log Incidence Rate by Maize Yield Level
std_dev_log_by_yield <- data %>%
  group_by(maize_yield) %>%
  summarize(StdDev_Log_Incidence_Rate = sd(log_incidence_rate))

print(std_dev_log_by_yield)

# 4. Test for Association
# Using ANOVA to test for differences in means between groups
anova_result <- aov(incidence_rate_per_ten_thousand ~ maize_yield, data = data)
summary(anova_result)

# Using Tukey HSD from agricolae
HSD.test(anova_result, "maize_yield", console = TRUE)

Footnotes

  1. If you really want to test the independence, you can do so visually with a plot of the residuals vs. fitted values. This plot can be done in R with plot(mod, which = 1), where mod corresponds to the name of your model. Or you can do so with the Durbin-Watson test. In R, it can be done with the durbinWatsonTest() function from the car package.

  2. Note that the Bartlett’s and Fligner-Killeen tests are also appropriate to test the assumption of equal variances.↩︎