Overview

This file walks you through visualizing the results of an acceptability judgment experiment. Some of the code is the same as in Part 1 of the tutorial, so please refer to that file if you get stuck.

Install and import required packages

# install.packages("tidyverse")  # run once if the package is not installed yet
library(tidyverse)

Import and clean the data

data = read.csv('./fakedata_2.csv')

Let’s take a look at the dataset:

glimpse(data)
## Rows: 512
## Columns: 8
## $ Movement    <chr> "WH", "WH", "WH", "WH", "WH", "WH", "WH", "WH", "WH", "WH"…
## $ Island_Type <chr> "whe", "whe", "whe", "whe", "whe", "whe", "whe", "whe", "w…
## $ Island      <chr> "non", "non", "non", "non", "non", "non", "non", "non", "i…
## $ Distance    <chr> "sh", "sh", "sh", "sh", "sh", "sh", "sh", "sh", "sh", "sh"…
## $ Item        <int> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4…
## $ Sentence    <chr> "Who thinks that Paul stole the necklace?", "Who thinks th…
## $ Subj_id     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Score       <int> 6, 2, 3, 7, 2, 4, 7, 2, 5, 2, 3, 5, 5, 2, 4, 4, 2, 3, 2, 7…

There are 4 conditions in this experiment, formed by crossing Island (isl vs. non) with Distance (sh vs. lg).

Review Q: Conduct z-score conversion on raw acceptability scores

#write your answers here

Answers

data = data %>% group_by(Subj_id) %>% mutate(Z_score = (Score - mean(Score)) / sd(Score))
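
As a quick sanity check on the conversion, each participant's z-scores should have a mean of 0 and a standard deviation of 1. The sketch below applies the same group_by/mutate pattern to a small made-up data frame (toy is hypothetical and is not part of fakedata_2.csv).

```r
library(dplyr)

# Hypothetical toy ratings: two participants who use the 1-7 scale differently
toy <- data.frame(
  Subj_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  Score   = c(1, 2, 3, 4, 4, 5, 6, 7)
)

# Same conversion as in the answer above, followed by ungroup()
toy_z <- toy %>%
  group_by(Subj_id) %>%
  mutate(Z_score = (Score - mean(Score)) / sd(Score)) %>%
  ungroup()

# Per-participant check: the mean should be (numerically) 0 and the SD 1
check <- toy_z %>%
  group_by(Subj_id) %>%
  summarize(Mean_z = mean(Z_score), SD_z = sd(Z_score))
print(check)
```

Although participant 2 gives higher raw scores than participant 1, both end up with the same z-scores, which is the point of the conversion: it removes differences in how individuals use the rating scale.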

Make a summary table that contains the average acceptability (in z-score) for each of the 4 conditions

To make a summary, the group_by and summarize functions come in handy. Group the data by the two factors that define the conditions (Island and Distance); then, inside summarize, calculate the mean acceptability for each group using mean() and name the column “Mean”.

data_summary = data %>% group_by(Island, Distance) %>% summarize(Mean = mean(Z_score))
data_summary
## # A tibble: 4 × 3
## # Groups:   Island [2]
##   Island Distance    Mean
##   <chr>  <chr>      <dbl>
## 1 isl    lg       -0.608 
## 2 isl    sh       -0.0708
## 3 non    lg        0.117 
## 4 non    sh        0.561
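
When summarize is called after grouping by more than one variable, it drops only the last grouping variable, which is why the output above is still grouped by Island. If an ungrouped table is preferred, the .groups argument of summarize can be set explicitly. A minimal sketch on made-up data (toy and its values are hypothetical; only the column names follow the tutorial dataset):

```r
library(dplyr)

# Hypothetical toy data with the same column names as the tutorial dataset
toy <- data.frame(
  Island   = rep(c("isl", "non"), each = 4),
  Distance = rep(c("lg", "lg", "sh", "sh"), times = 2),
  Z_score  = c(-1.0, -0.5, 0.0, 0.2, 0.1, 0.3, 0.6, 0.8)
)

toy_summary <- toy %>%
  group_by(Island, Distance) %>%
  summarize(Mean = mean(Z_score), .groups = "drop")  # return an ungrouped tibble

print(toy_summary)
print(is_grouped_df(toy_summary))  # check whether any grouping is left
```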

Exercise: Calculate the standard deviation for each condition

Create a similar summary table with the mean AND standard deviation for each condition.

#write your answers here

Answers

data_summary = data %>% group_by(Island, Distance) %>% summarize(Mean = mean(Z_score), SD = sd(Z_score))

Exercise: Add standard error of each condition to the summary table

Standard error is different from standard deviation (see https://towardsdatascience.com/standard-deviation-vs-standard-error-5210e3bc9c04), and we’ll need it for each condition when we plot the acceptability scores with error bars.

The formula is pretty simple: standard deviation / square root of the number of subjects.
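
To make the formula concrete: the standard error of a mean is the standard deviation divided by the square root of the number of units behind that mean (in this tutorial, the number of subjects). A small base-R sketch with made-up numbers (scores and the other variable names are hypothetical):

```r
# Hypothetical z-scores from 16 participants in one condition
scores <- c(-0.9, -0.4, 0.1, 0.3, -1.2, 0.5, -0.2, 0.0,
            -0.7, 0.4, -0.3, 0.2, -1.0, 0.6, -0.5, 0.1)

n         <- length(scores)       # number of participants
sd_scores <- sd(scores)           # spread of the individual scores
se_scores <- sd_scores / sqrt(n)  # uncertainty of the mean

print(c(SD = sd_scores, SE = se_scores))
```

With 16 participants the standard error is one quarter of the standard deviation, because sqrt(16) is 4.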

Write the code below to get a summary dataset consisting of the mean z-score, standard deviation, and standard error of each condition.

#write your answers here

Answers

data_summary = data %>% group_by(Island, Distance) %>% 
  summarize(Mean = mean(Z_score),
            SD = sd(Z_score),
            # divide SD by the square root of the number of distinct subjects
            SE = SD/sqrt(n_distinct(data$Subj_id)))
data_summary
## # A tibble: 4 × 5
## # Groups:   Island [2]
##   Island Distance    Mean    SD    SE
##   <chr>  <chr>      <dbl> <dbl> <dbl>
## 1 isl    lg       -0.608  0.684 0.171
## 2 isl    sh       -0.0708 0.658 0.164
## 3 non    lg        0.117  1.12  0.281
## 4 non    sh        0.561  1.02  0.254

Plot the data

Let’s make a simple plot to see what happens.

data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  geom_path(aes(group = Island))

The plot looks okay, but it’s not easy to tell what’s going on there. We will make a number of edits to the plot to make it look nicer.

Exercise: Make the first round of edits

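Do the following to improve the plot: distinguish the two Island groups by line type, extend the y-axis so that it covers -1 to 1, switch to the classic theme, and put sh before lg on the x-axis. You should be able to find the hints in the previous tutorial.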

#write your answers here

Answers

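data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  # linetype is mapped to Island itself so that the legend labels match the lines
  geom_path(aes(group = Island, linetype = Island))+
  expand_limits(y = c(-1, 1))+
  theme_classic()+
  # reverse the alphabetical order (lg, sh) so that sh comes first
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))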

Improve axis labels and the legend, and add error bars

Right now, it’s not clear what the axis labels and the legend stand for (what’s “sh”? what’s “isl”?). The labels are also rather small, so we want to fix that as well. In addition, single points do not show how variable the data are, which is why we want to add error bars to the plot.

First, let’s relabel the x-axis and legend. To do so, we will make a separate dataset and combine it with a part of the summary dataset.

data_summary
## # A tibble: 4 × 5
## # Groups:   Island [2]
##   Island Distance    Mean    SD    SE
##   <chr>  <chr>      <dbl> <dbl> <dbl>
## 1 isl    lg       -0.608  0.684 0.171
## 2 isl    sh       -0.0708 0.658 0.164
## 3 non    lg        0.117  1.12  0.281
## 4 non    sh        0.561  1.02  0.254
columns = data.frame(Island=c('whether_island', 'whether_island', 'non-island', 'non-island'),
                     Distance=c('long','short','long','short'))
data_summary = cbind(columns, data_summary[,c(3:5)])
data_summary
##           Island Distance        Mean        SD        SE
## 1 whether_island     long -0.60802871 0.6842833 0.1710708
## 2 whether_island    short -0.07079371 0.6576115 0.1644029
## 3     non-island     long  0.11745032 1.1247666 0.2811916
## 4     non-island    short  0.56137210 1.0151507 0.2537877
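data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  # linetype is mapped to Island itself so that the legend labels match the lines
  geom_path(aes(group = Island, linetype = Island))+
  expand_limits(y = c(-1, 1))+
  theme_classic()+
  # reverse the alphabetical order (long, short) so that short comes first
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))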
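
The cbind approach above depends on the rows of columns being typed in exactly the same order as the rows of data_summary. As an alternative, the labels can be attached by value with factor(), which pairs each old code with its new label explicitly. This is only a sketch: summary_tbl is a hypothetical stand-in that holds the rounded means from the table printed above.

```r
# Means copied (rounded) from the summary table printed above
summary_tbl <- data.frame(
  Island   = c("isl", "isl", "non", "non"),
  Distance = c("lg", "sh", "lg", "sh"),
  Mean     = c(-0.608, -0.0708, 0.117, 0.561)
)

# Relabel by value: levels lists the old codes, labels the new names.
# The order of the levels also fixes the plotting order (short before long).
summary_tbl$Island <- factor(summary_tbl$Island,
                             levels = c("non", "isl"),
                             labels = c("non-island", "whether_island"))
summary_tbl$Distance <- factor(summary_tbl$Distance,
                               levels = c("sh", "lg"),
                               labels = c("short", "long"))
print(summary_tbl)
```

Because ggplot2 orders a discrete axis by factor levels, a Distance column stored this way would already be plotted with short before long, without the scale_x_discrete(limits = ...) line.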

Second, let’s add an error bar to each data point. It’s pretty simple; we will use the geom_errorbar function.

data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  # linetype is mapped to Island itself so that the legend labels match the lines
  geom_path(aes(group = Island, linetype = Island))+
  expand_limits(y = c(-1, 1))+
  # each bar runs from one standard error below the mean to one above it
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean+SE), width=0.1)+
  theme_classic()+
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))
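
Each error bar spans Mean - SE to Mean + SE. As a quick check that the bars fit inside the y-axis range requested with expand_limits, the endpoints can be computed directly. The sketch below uses the rounded Mean and SE values from the summary table above; the variable names are illustrative only.

```r
# Rounded values copied from the summary table printed earlier
Mean <- c(-0.608, -0.0708, 0.117, 0.561)
SE   <- c(0.171, 0.164, 0.281, 0.254)

lower <- Mean - SE  # what geom_errorbar receives as ymin
upper <- Mean + SE  # what geom_errorbar receives as ymax

print(data.frame(Mean, SE, lower, upper))

# TRUE if every bar lies within the y-axis limits of -1 to 1
print(all(lower >= -1 & upper <= 1))
```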

Exercise: Make the second round of edits

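We’re almost there! Let’s make a few more edits to the plot to make it camera-ready: label the y-axis “mean z-score”, remove the legend title and the x-axis title, move the legend inside the plot area near the top right, and enlarge the legend and axis text.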

#write your answers here

Answers

data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  # linetype is mapped to Island itself so that the legend labels match the lines
  geom_path(aes(group = Island, linetype = Island))+
  expand_limits(y = c(-1, 1))+
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean+SE), width=0.1)+
  theme_classic()+
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))+
  ylab('mean z-score')+
  # From ggplot2 3.5.0 a numeric legend.position is deprecated; newer versions
  # use legend.position = "inside" with legend.position.inside = c(0.85, 0.85)
  theme(legend.title = element_blank(),
        legend.position = c(0.85, 0.85),
        legend.text = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 15))
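
Once the plot is camera-ready, it can be written to a file with ggsave. The sketch below is only an illustration: the stand-in plot p, the file name, and the size and resolution are all hypothetical choices, and in practice p would be the finished plot from the answer above.

```r
library(ggplot2)

# Hypothetical stand-in plot; in the tutorial this would be the final plot
toy <- data.frame(Distance = c("short", "long"), Mean = c(0.5, 0.1))
p <- ggplot(toy, aes(x = Distance, y = Mean)) +
  geom_point() +
  theme_classic()

# Hypothetical output path and size; adjust to the target venue's requirements
out_file <- file.path(tempdir(), "island_plot.png")
ggsave(out_file, plot = p, width = 5, height = 4, dpi = 300)
```

Width and height are in inches by default, so these settings would give a 1500 x 1200 pixel image at 300 dpi.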