Maho Takahashi

Linguistics PhD

mtakahas[at]ucsd[dot]edu

How to group data, assign values to them, and ungroup them

import pandas as pd
sample = pd.read_csv('./sampledata.csv')
sample.head(3)
   Movement  Island_Type  Island  Distance  Item  Sentence                                  Subj_id  List  Score
0  WH        whe          non     sh        1     Who thinks that Paul stole the necklace?  1        1     6
1  WH        whe          non     sh        2     Who thinks that Matt chased the bus?      1        1     2
2  WH        whe          non     sh        3     Who thinks that Tom sold the television?  1        1     3

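The following code groups the data by two conditions (Island, Distance) and adds two columns holding the mean and the standard deviation of the acceptability scores in each group. Because transform returns one value for every original row, the result is already "ungrouped": the data frame keeps its original shape and no separate ungrouping step is needed.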

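# Group once by the two conditions and reuse the grouped Score column
scores_by_group = sample.groupby(['Island','Distance'])['Score']
# transform broadcasts each group's statistic back to every row of that group
sample['mean_response'] = scores_by_group.transform('mean')
sample['sd_answer_z'] = scores_by_group.transform('std')
sample.iloc[:5, [2, 3, -3, -2, -1]]  # show only the relevant columns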
Island Distance Score mean_response sd_answer_z
0 non sh 6 4.585938 1.709295
1 non sh 2 4.585938 1.709295
2 non sh 3 4.585938 1.709295
3 non sh 7 4.585938 1.709295
4 non sh 2 4.585938 1.709295
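
To make the difference between transform and a plain aggregation concrete, here is a minimal sketch on a made-up four-row data frame. The toy values are hypothetical; only the column names mirror the sample data above.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the sample dataset above
toy = pd.DataFrame({
    "Island":   ["non", "non", "isl", "isl"],
    "Distance": ["sh", "sh", "lg", "lg"],
    "Score":    [6, 2, 3, 1],
})

scores = toy.groupby(["Island", "Distance"])["Score"]

# transform returns one value per original row, aligned on the original index,
# so it can be assigned straight back as a new column
toy["mean_response"] = scores.transform("mean")

# mean (an aggregation) returns one value per group instead
group_means = scores.mean()

print(toy)
print(group_means)
```

The transformed column has as many entries as the data frame has rows, while the aggregated result has one entry per (Island, Distance) combination.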

How to aggregate data

Next, I aggregate the data by the same two conditions to build a summary dataset with one row per group, showing the mean, the standard deviation, and the standard error of the mean of the acceptability scores.

data_summary = sample.groupby(['Island','Distance'])['Score'].agg(['mean','std','sem']).reset_index()
data_summary
Island Distance mean std sem
0 isl lg 2.593750 1.180134 0.104310
1 isl sh 3.531250 1.128985 0.099789
2 non lg 3.890625 1.920607 0.169759
3 non sh 4.585938 1.709295 0.151082
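
As a sanity check on what 'sem' reports, the sketch below recomputes it by hand as the sample standard deviation divided by the square root of the group size. The toy scores are made up for illustration, and the extra 'count' column and the sem_by_hand name are my additions, not part of the code above.

```python
import pandas as pd

# Hypothetical toy data with four observations per group
toy = pd.DataFrame({
    "Island":   ["non"] * 4 + ["isl"] * 4,
    "Distance": ["sh"] * 4 + ["lg"] * 4,
    "Score":    [6, 2, 3, 7, 1, 2, 4, 3],
})

summary = (
    toy.groupby(["Island", "Distance"])["Score"]
       .agg(["mean", "std", "sem", "count"])
       .reset_index()  # turn the group keys back into ordinary columns
)

# Standard error of the mean: sample standard deviation over sqrt(n)
summary["sem_by_hand"] = summary["std"] / summary["count"] ** 0.5

print(summary)
```

The same relation holds in the summary table above: in every row, std divided by sem is about 11.31, which is consistent with 128 observations per group.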