import pandas as pd
sample = pd.read_csv('./sampledata.csv')
sample.head(3)
| | Movement | Island_Type | Island | Distance | Item | Sentence | Subj_id | List | Score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WH | whe | non | sh | 1 | Who thinks that Paul stole the necklace? | 1 | 1 | 6 |
| 1 | WH | whe | non | sh | 2 | Who thinks that Matt chased the bus? | 1 | 1 | 2 |
| 2 | WH | whe | non | sh | 3 | Who thinks that Tom sold the television? | 1 | 1 | 3 |
I want to create new columns `Score_mean` and `Score_sd`, which show the mean and standard deviation of a particular item’s scores given by different subjects. Think `group_by` followed by `mutate` in R. Here’s one way to accomplish it.
sample = sample.assign(
    Score_mean=sample.groupby(['Sentence'])['Score'].transform('mean'),
    Score_sd=sample.groupby(['Sentence'])['Score'].transform('std'))
sample.head(3)
| | Movement | Island_Type | Island | Distance | Item | Sentence | Subj_id | List | Score | Score_mean | Score_sd |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | WH | whe | non | sh | 1 | Who thinks that Paul stole the necklace? | 1 | 1 | 6 | 4.875 | 1.543805 |
| 1 | WH | whe | non | sh | 2 | Who thinks that Matt chased the bus? | 1 | 1 | 2 | 4.250 | 1.914854 |
| 2 | WH | whe | non | sh | 3 | Who thinks that Tom sold the television? | 1 | 1 | 3 | 4.250 | 1.693123 |
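
If it helps to see why `transform` (rather than `agg`) is the counterpart of `mutate`, here is a minimal sketch on a tiny made-up frame. The `toy` data and its values are hypothetical and are not taken from `sampledata.csv`; only the column names mirror the ones above.

```python
import pandas as pd

# Hypothetical toy data: two sentences, each rated by three subjects.
toy = pd.DataFrame({
    "Sentence": ["A", "A", "A", "B", "B", "B"],
    "Subj_id": [1, 2, 3, 1, 2, 3],
    "Score": [6, 4, 5, 2, 3, 7],
})

grouped = toy.groupby("Sentence")["Score"]

# agg collapses each group to a single row (roughly summarise in R).
summary = grouped.agg(["mean", "std"])

# transform broadcasts the group statistic back onto every original row
# (roughly mutate in R), so the result has the same length as toy.
toy = toy.assign(
    Score_mean=grouped.transform("mean"),
    Score_sd=grouped.transform("std"),
)

print(summary)
print(toy)
```

Here `summary` has one row per sentence, while `toy` keeps all six rows with the group statistics repeated within each sentence. Note that pandas' `std` uses the sample standard deviation (`ddof=1`) by default, which matches `sd()` in R.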