This file walks you through visualizing the results of an acceptability judgment experiment. Some of the code is the same as in Part 1 of the tutorial, so please refer to that file if you get stuck.
library(tidyverse)
data = read.csv('./fakedata_2.csv')
Let’s take a look at the dataset:
glimpse(data)
## Rows: 512
## Columns: 8
## $ Movement <chr> "WH", "WH", "WH", "WH", "WH", "WH", "WH", "WH", "WH", "WH"…
## $ Island_Type <chr> "whe", "whe", "whe", "whe", "whe", "whe", "whe", "whe", "w…
## $ Island <chr> "non", "non", "non", "non", "non", "non", "non", "non", "i…
## $ Distance <chr> "sh", "sh", "sh", "sh", "sh", "sh", "sh", "sh", "sh", "sh"…
## $ Item <int> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4…
## $ Sentence <chr> "Who thinks that Paul stole the necklace?", "Who thinks th…
## $ Subj_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Score <int> 6, 2, 3, 7, 2, 4, 7, 2, 5, 2, 3, 5, 5, 2, 4, 4, 2, 3, 2, 7…
There are 4 conditions in this experiment:
Island = “isl”, Distance = “lg”: This is the condition where there is a movement of a wh-word (“what”) across a chunk of words starting with “whether”, which works as an island (a structure from which no movement is possible) in English. This condition is predicted to have the lowest acceptability.
Island = “isl”, Distance = “sh”: This is the condition where there is a movement of a wh-word, but NOT across the island containing “whether”.
Island = “non”, Distance = “lg”: This is the condition where there is a movement of a wh-word across a chunk of words starting with “that”, which DOES NOT work as an island in English.
Island = “non”, Distance = “sh”: This is the condition where there is a movement of a wh-word, but NOT across the chunk containing “that”. This condition is predicted to have the highest acceptability.
#write your answers here
# z-score the ratings within each participant, then drop the grouping
data = data %>% group_by(Subj_id) %>% mutate(Z_score = (Score - mean(Score)) / sd(Score)) %>% ungroup()
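To see what this per-participant z-scoring does, here is a minimal sketch on made-up ratings. The data frame `toy` and its values are hypothetical and are not part of fakedata_2.csv; the point is only that two participants who use different parts of the rating scale end up on a common scale.

```r
library(dplyr)

# Hypothetical ratings: subject 1 uses the low end of the scale, subject 2 the high end
toy = data.frame(
  Subj_id = c(1, 1, 1, 2, 2, 2),
  Score   = c(1, 2, 3, 5, 6, 7)
)

# Same transformation as above: center and scale within each subject
toy_z = toy %>%
  group_by(Subj_id) %>%
  mutate(Z_score = (Score - mean(Score)) / sd(Score)) %>%
  ungroup()

toy_z$Z_score
## [1] -1  0  1 -1  0  1
```

Both participants get the same z-scores even though their raw ratings differ, because each rating is expressed relative to that participant’s own mean and standard deviation.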
To make a summary, the group_by and summarize functions come in handy. Inside the summarize function, calculate the mean acceptability for each group using mean() and name the column “Mean”.
data_summary = data %>% group_by(Island, Distance) %>% summarize(Mean = mean(Z_score))
data_summary
## # A tibble: 4 × 3
## # Groups: Island [2]
## Island Distance Mean
## <chr> <chr> <dbl>
## 1 isl lg -0.608
## 2 isl sh -0.0708
## 3 non lg 0.117
## 4 non sh 0.561
Create a similar summary table with the mean AND standard deviation for each condition.
#write your answers here
data_summary = data %>% group_by(Island, Distance) %>% summarize(Mean = mean(Z_score), SD = sd(Z_score))
Standard error is different from standard deviation (see https://towardsdatascience.com/standard-deviation-vs-standard-error-5210e3bc9c04), and we’ll need it for each condition when we plot the acceptability scores with error bars.
The formula is pretty simple: Standard deviation / sqrt(number of subjects).
Write the code below to get a summary dataset consisting of the mean z-score, standard deviation, and standard error of each condition.
#write your answers here
data_summary = data %>% group_by(Island, Distance) %>%
  summarize(Mean = mean(Z_score),
            SD = sd(Z_score),
            # SE = SD divided by the square root of the number of distinct subjects
            SE = SD/sqrt(length(levels(as.factor(data$Subj_id)))))
data_summary
## # A tibble: 4 × 5
## # Groups: Island [2]
## Island Distance Mean SD SE
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 isl lg -0.608 0.684 0.171
## 2 isl sh -0.0708 0.658 0.164
## 3 non lg 0.117 1.12 0.281
## 4 non sh 0.561 1.02 0.254
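As a quick sanity check on the SE column, the relationship between SD and SE can be worked backwards from the printed numbers. The sketch below uses the rounded values for the isl/lg row; the variable names are made up for illustration, and the number of subjects is inferred from the table rather than stated in the dataset description.

```r
# SD and SE for the isl/lg condition, copied (rounded) from the summary above
sd_isl_lg = 0.684
se_isl_lg = 0.171

# If SE = SD / sqrt(n), then n = (SD / SE)^2
n_implied = (sd_isl_lg / se_isl_lg)^2
round(n_implied)

# Recomputing the SE from the SD with that n should give back the SE column
sd_isl_lg / sqrt(round(n_implied))
```

Dividing by the number of subjects itself, rather than by its square root, would give a much smaller value that does not match the table.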
Let’s make a simple plot to see what happens.
data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  geom_path(aes(group = Island))
The plot looks okay, but it’s not easy to tell what’s going on there. We will make a number of edits to the plot to make it look nicer.
Do the following to improve the plot. You should be able to find the hints in the previous tutorial.
#write your answers here
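data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  # one line per Island level, told apart by line type
  geom_path(aes(group = Island, linetype = Island))+
  # make the y-axis cover at least -1 to 1
  expand_limits(y = c(-1, 1))+
  theme_classic()+
  # reverse the default alphabetical order so "sh" comes before "lg"
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))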
Right now, it’s not clear what the axis labels and the legend stand for (what’s “sh”? what’s “isl”?). The labels are kind of small so we want to fix that as well. In addition, it’s not clear how variable the data can be with those single points, which is why we might want to add error bars to the plot.
First, let’s relabel the x-axis and legend. To do so, we will make a separate dataset and combine it with a part of the summary dataset.
data_summary
## # A tibble: 4 × 5
## # Groups: Island [2]
## Island Distance Mean SD SE
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 isl lg -0.608 0.684 0.171
## 2 isl sh -0.0708 0.658 0.164
## 3 non lg 0.117 1.12 0.281
## 4 non sh 0.561 1.02 0.254
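# new labels, listed in the same row order as data_summary
columns = data.frame(Island=c('whether_island', 'whether_island', 'non-island', 'non-island'),
                     Distance=c('long','short','long','short'))
# keep only Mean, SD, and SE (columns 3 to 5) and attach the new labels
data_summary = cbind(columns, data_summary[,c(3:5)])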
data_summary
## Island Distance Mean SD SE
## 1 whether_island long -0.60802871 0.6842833 0.1710708
## 2 whether_island short -0.07079371 0.6576115 0.1644029
## 3 non-island long 0.11745032 1.1247666 0.2811916
## 4 non-island short 0.56137210 1.0151507 0.2537877
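The cbind approach relies on the new labels being typed in exactly the same row order as the summary table. A sketch of one alternative is to relabel by value with a named lookup vector, so the row order no longer matters. The names `toy_summary`, `island_labels`, and `distance_labels` are hypothetical, and the means are rounded copies of the values printed above.

```r
library(dplyr)

# Stand-in for the summary table (means rounded from the output above)
toy_summary = data.frame(
  Island   = c("isl", "isl", "non", "non"),
  Distance = c("lg", "sh", "lg", "sh"),
  Mean     = c(-0.608, -0.0708, 0.117, 0.561)
)

# Lookup vectors: names are the old codes, values are the new labels
island_labels   = c(isl = "whether_island", non = "non-island")
distance_labels = c(lg = "long", sh = "short")

# Replace each code by looking it up; unname() drops the leftover vector names
relabeled = toy_summary %>%
  mutate(Island   = unname(island_labels[Island]),
         Distance = unname(distance_labels[Distance]))

relabeled
```

Either approach produces the same labels here; the lookup version is simply less sensitive to how the rows happen to be sorted.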
data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  # one line per Island level, told apart by line type
  geom_path(aes(group = Island, linetype = Island))+
  expand_limits(y = c(-1, 1))+
  theme_classic()+
  # show "short" before "long" on the x-axis
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))
Second, let’s add an error bar to each data point. It’s pretty simple; we will use the geom_errorbar function.
data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  geom_path(aes(group = Island, linetype = Island))+
  expand_limits(y = c(-1, 1))+
  # error bars extend one standard error above and below each mean
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean+SE), width=0.1)+
  theme_classic()+
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))
We’re almost there! Let’s make a few more edits to the plot to make it camera-ready.
#write your answers here
data_summary %>% ggplot(aes(x=Distance, y=Mean))+
  geom_point()+
  geom_path(aes(group = Island, linetype = Island))+
  expand_limits(y = c(-1, 1))+
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean+SE), width=0.1)+
  theme_classic()+
  scale_x_discrete(limits = rev(levels(as.factor(data_summary$Distance))))+
  # relabel the y-axis
  ylab('mean z-score')+
  # drop the legend and x-axis titles, and enlarge the remaining text
  theme(legend.title = element_blank(),
        # place the legend inside the panel (ggplot2 >= 3.5.0 prefers legend.position.inside for this)
        legend.position = c(0.85, 0.85),
        legend.text = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 15))