Maho Takahashi

Linguistics PhD

mtakahas[at]ucsd[dot]edu

How to assign subject IDs

import pandas as pd
sample = pd.read_csv('./sampledata1.csv')
sample.head(3)
   Unnamed: 0  Movement  Island_Type  Island  Distance  Item  Sentence                                  Subj_id  List  Score
0  0           WH        whe          non     sh        1     Who thinks that Paul stole the necklace?  31WPPC   1     6
1  1           WH        whe          non     sh        2     Who thinks that Matt chased the bus?      31WPPC   1     2
2  2           WH        whe          non     sh        3     Who thinks that Tom sold the television?  31WPPC   1     3
sample['Subj_id'].unique()
array(['31WPPC', 'MLOT0C', 'QUCYBY', '3HM9R4', 'TNZ93A', 'RE7119',
       'IKH3NF', '0R04SW', 'S7VOS9', 'JO1B7Q', '0HY4IC', 'MNSV2I',
       'IOEK50', 'LXP23M', '7NXUBG', '4EQFWR'], dtype=object)

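Let’s simplify the subject IDs in this dataset by replacing each alphanumeric ID with a number. One way to do this is to group the rows by Subj_id and call ngroup(), which numbers the groups starting from 0. With sort=False, the groups are numbered in the order they first appear in the data rather than in sorted order.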

# Number each subject in order of first appearance (sort=False); numbering starts at 0
sample['Subj_id'] = sample.groupby('Subj_id', sort=False).ngroup()
sample['Subj_id'] = sample['Subj_id'] + 1  # add 1 to each ID if you do not want the first ID to be 0
sample['Subj_id'].unique()
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])
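
The assignment above overwrites the original IDs, so there is no record of which number belongs to which subject. If you need that record, one option is to write the numbers to a new column and build a lookup table. The sketch below uses a small made-up data frame rather than sampledata1.csv; the names toy, Subj_num, Subj_num_alt, and id_map are illustrative assumptions, not part of the original dataset. It also shows pd.factorize, which numbers values by first appearance in the same way.

```python
import pandas as pd

# Toy data standing in for the real file (assumption: only the columns needed here).
toy = pd.DataFrame({
    "Subj_id": ["31WPPC", "31WPPC", "MLOT0C", "QUCYBY", "MLOT0C"],
    "Score": [6, 2, 3, 5, 4],
})

# Write the numeric IDs to a new column so the original IDs are kept.
# sort=False numbers subjects in order of first appearance; +1 makes the first ID 1.
toy["Subj_num"] = toy.groupby("Subj_id", sort=False).ngroup() + 1

# Lookup table from original ID to numeric ID (hypothetical name: id_map).
id_map = (
    toy.drop_duplicates("Subj_id")
       .set_index("Subj_id")["Subj_num"]
       .to_dict()
)

# Alternative: pd.factorize also assigns codes in order of first appearance.
codes, uniques = pd.factorize(toy["Subj_id"])
toy["Subj_num_alt"] = codes + 1

print(toy)
print(id_map)
```

Keeping id_map around (or saving it to a file) makes it possible to trace a numeric ID back to the original participant code later.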