Linguistics PhD
import pandas as pd
Imagine we have the following data about participants’ demographic and linguistic background.
sample = pd.read_csv('./sample_ppt_data.csv')
sample
| | SubjID | Question | Answer |
---|---|---|---|
0 | 1 | input_age | 30 |
1 | 1 | input_gender | male |
2 | 1 | second_lang | NaN |
3 | 2 | input_age | 21 |
4 | 2 | input_gender | NaN |
5 | 2 | second_lang | Spanish |
6 | 3 | input_age | 42 |
7 | 3 | input_gender | male |
8 | 3 | second_lang | Mandarin |
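The following code reshapes the data from long to wide format (akin to the spread function in R's tidyr package) so that each distinct value in the Question column gets its own column.
Note the aggfunc parameter: pivot_table aggregates with the mean by default, which does not suit text answers. Passing 'first' instead keeps the first non-missing Answer for each SubjID and Question pair and leaves NaN where no answer was given.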
sample = sample.pivot_table(index=['SubjID'], columns='Question', values='Answer', aggfunc='first').reset_index()
sample
Question | SubjID | input_age | input_gender | second_lang |
---|---|---|---|---|
0 | 1 | 30 | male | NaN |
1 | 2 | 21 | NaN | Spanish |
2 | 3 | 42 | male | Mandarin |
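For readers without the CSV file, here is a minimal self-contained sketch of the same reshaping step. The frame `long_df` is a hand-typed copy of the table above, and the names `long_df` and `wide_df` are illustrative only. It assumes the answers arrive as text, which is what happens when numbers and words share one column. The last two steps (clearing the leftover column-axis label and converting age to a number) are optional tidying and not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the long-format data shown above,
# so the example runs without sample_ppt_data.csv.
long_df = pd.DataFrame({
    "SubjID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Question": ["input_age", "input_gender", "second_lang"] * 3,
    "Answer": ["30", "male", np.nan, "21", np.nan, "Spanish", "42", "male", "Mandarin"],
})

# Long to wide: one row per participant, one column per question.
# aggfunc="first" keeps the first non-missing answer in each cell.
wide_df = (
    long_df
    .pivot_table(index="SubjID", columns="Question", values="Answer", aggfunc="first")
    .reset_index()
)

# Optional tidying: drop the leftover "Question" label on the column axis.
wide_df.columns.name = None

# Answers were stored as text, so convert age to a numeric dtype explicitly.
wide_df["input_age"] = pd.to_numeric(wide_df["input_age"])

print(wide_df)
```

Because every SubjID and Question pair occurs only once in this data, 'first' simply passes the single answer through. If a participant had answered the same question twice, only the first non-missing answer would survive, so it is worth checking for duplicates before relying on it.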