Maho Takahashi

Linguistics PhD

mtakahas[at]ucsd[dot]edu

How to mutate the dataframe to add item mean and sd to each row

import pandas as pd
sample = pd.read_csv('./sampledata.csv')
sample.head(3)
   Movement  Island_Type  Island  Distance  Item                                  Sentence  Subj_id  List  Score
0        WH          whe     non        sh     1  Who thinks that Paul stole the necklace?        1     1      6
1        WH          whe     non        sh     2      Who thinks that Matt chased the bus?        1     1      2
2        WH          whe     non        sh     3  Who thinks that Tom sold the television?        1     1      3

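I want to create new columns Score_mean and Score_sd, which hold the mean and standard deviation of the scores that different subjects gave to each item. Think group_by followed by mutate in R (dplyr). Here’s one way to accomplish it: group the rows by Sentence and use transform, which returns one value per original row rather than one per group, so the result can be assigned straight back as a column.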

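# transform() broadcasts each group's statistic back to every row of that group.
# 'std' is the sample standard deviation (pandas default, ddof=1).
sample = sample.assign(
    Score_mean = sample.groupby(['Sentence'])['Score'].transform('mean'),
    Score_sd = sample.groupby(['Sentence'])['Score'].transform('std'))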
sample.head(3)
   Movement  Island_Type  Island  Distance  Item                                  Sentence  Subj_id  List  Score  Score_mean  Score_sd
0        WH          whe     non        sh     1  Who thinks that Paul stole the necklace?        1     1      6       4.875  1.543805
1        WH          whe     non        sh     2      Who thinks that Matt chased the bus?        1     1      2       4.250  1.914854
2        WH          whe     non        sh     3  Who thinks that Tom sold the television?        1     1      3       4.250  1.693123
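
If you want to try this without sampledata.csv, here is a minimal sketch on a made-up dataframe. The toy data, the sentence labels "A" and "B", and the names toy and by_sentence are my own assumptions for illustration, not part of the real dataset. It also builds the grouped object once and reuses it for both statistics, which should give the same result as calling groupby twice.

```python
import pandas as pd

# Hypothetical toy data: two sentences, each rated by three subjects.
toy = pd.DataFrame({
    "Sentence": ["A", "A", "A", "B", "B", "B"],
    "Subj_id": [1, 2, 3, 1, 2, 3],
    "Score": [6, 4, 5, 2, 2, 5],
})

# Group the scores by sentence once and reuse the grouped object.
by_sentence = toy.groupby("Sentence")["Score"]

# transform() returns a Series aligned with the original rows,
# so every row receives the mean and sd of its own sentence.
toy = toy.assign(
    Score_mean=by_sentence.transform("mean"),
    Score_sd=by_sentence.transform("std"),  # sample sd (ddof=1)
)

print(toy)
```

The dataframe keeps all six rows, and rows that share a sentence share the same Score_mean and Score_sd, which is what group_by plus mutate would give you in R.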