How to spread data

import pandas as pd

Imagine we have the following data about participants’ demographic and linguistic background.

sample = pd.read_csv('./sample_ppt_data.csv')
sample

The following code reshapes the data from long to wide format (akin to the spread function in R), so that each value in the Question column gets its own column.

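Note the aggfunc parameter: pivot_table aggregates with 'mean' by default, which does not work for text answers such as 'male'. With aggfunc='first', the first non-missing Answer for each SubjID and Question pair is kept, and questions a participant did not answer show up as NaN.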

sample = sample.pivot_table(index=['SubjID'], columns='Question', values='Answer', aggfunc='first').reset_index()
sample

Question	SubjID	input_age	input_gender	second_lang
0	1	30	male	NaN
1	2	21	NaN	Spanish
2	3	42	male	Mandarin
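
Since sample_ppt_data.csv is not shown here, the reshaping can be tried on a small stand-in. The sketch below is a minimal, self-contained version of the same step; the rows in long_df are made up to mirror the output above and are an assumption, not the actual contents of the CSV file. The names long_df and wide_df are likewise only illustrative.

```python
import pandas as pd

# Hypothetical long-format data: one row per participant and question.
# These values are invented to resemble the output shown above.
long_df = pd.DataFrame({
    "SubjID": [1, 1, 2, 2, 3, 3, 3],
    "Question": ["input_age", "input_gender",
                 "input_age", "second_lang",
                 "input_age", "input_gender", "second_lang"],
    "Answer": ["30", "male", "21", "Spanish", "42", "male", "Mandarin"],
})

# Spread the Question values into their own columns.
# aggfunc='first' keeps the first non-missing Answer per (SubjID, Question).
wide_df = long_df.pivot_table(
    index="SubjID", columns="Question", values="Answer", aggfunc="first"
).reset_index()

# Optional: drop the leftover "Question" label on the column index.
wide_df.columns.name = None

print(wide_df)
```

Participants without an answer to a question (here, SubjID 1 for second_lang and SubjID 2 for input_gender) end up with NaN in that column. Note that the Answer values stay as text, so a column such as input_age may still need converting to a numeric type afterwards.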