Maho Takahashi

Linguistics PhD

mtakahas[at]ucsd[dot]edu

How to spread data

import pandas as pd

Imagine we have the following data about participants’ demographic and linguistic background.

sample = pd.read_csv('./sample_ppt_data.csv')
sample
   SubjID      Question    Answer
0       1     input_age        30
1       1  input_gender      male
2       1   second_lang       NaN
3       2     input_age        21
4       2  input_gender       NaN
5       2   second_lang   Spanish
6       3     input_age        42
7       3  input_gender      male
8       3   second_lang  Mandarin

The following code reshapes the data from long to wide format (akin to the spread function in R) so that each value in the Question column gets its own column.

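Note the aggfunc parameter: pivot_table always aggregates, and its default aggregation (the mean) does not apply to text answers such as "male" or "Spanish". With aggfunc='first', each cell takes the first non-missing answer for that participant and question, and cells with no answer stay NaN.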

sample = sample.pivot_table(index=['SubjID'], columns='Question', values='Answer', aggfunc='first').reset_index()
sample
Question  SubjID input_age input_gender second_lang
0              1        30         male         NaN
1              2        21          NaN     Spanish
2              3        42         male    Mandarin
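
For readers without the CSV file, here is a minimal self-contained sketch of the same reshaping. The inline DataFrame is a hand-typed stand-in for sample_ppt_data.csv (an assumption, with every answer stored as text), and the names wide and wide_alt are illustrative only. It also shows two optional extras that are not part of the original code: clearing the leftover "Question" label on the columns, and using pivot, which does no aggregation and should work whenever each SubjID/Question pair appears at most once.

```python
import numpy as np
import pandas as pd

# Hypothetical inline copy of the sample data, so no CSV file is needed.
# Answers are stored as text, with NaN marking a missing answer.
sample = pd.DataFrame({
    "SubjID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Question": ["input_age", "input_gender", "second_lang"] * 3,
    "Answer": ["30", "male", np.nan, "21", np.nan, "Spanish",
               "42", "male", "Mandarin"],
})

# Long to wide: one row per participant, one column per question.
# aggfunc="first" keeps the first non-missing answer in each cell.
wide = sample.pivot_table(
    index="SubjID", columns="Question", values="Answer", aggfunc="first"
).reset_index()

# Optional: remove the "Question" label left over on the column index.
wide.columns.name = None

# Alternative without aggregation; pivot raises an error if a
# SubjID/Question pair is duplicated.
wide_alt = sample.pivot(
    index="SubjID", columns="Question", values="Answer"
).reset_index()
wide_alt.columns.name = None

print(wide)
```

Note that input_age comes out as text in this sketch; something like pd.to_numeric(wide["input_age"]) would be needed before treating it as a number.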