Maho Takahashi

Linguistics PhD

mtakahas[at]ucsd[dot]edu

Create a new column based on existing ones

import pandas as pd
import numpy as np
df = pd.DataFrame(["STD, City    State",
                   "33, Kolkata    West Bengal",
                   "44, Chennai    Tamil Nadu",
                   "40, Hyderabad    Telengana",
                   "80, Bangalore    Karnataka"], columns=['row'])
df
   row
0  STD, City State
1  33, Kolkata West Bengal
2  44, Chennai Tamil Nadu
3  40, Hyderabad Telengana
4  80, Bangalore Karnataka

Split a column into two

df = df['row'].str.split(',', expand=True)
df
   0    1
0  STD  City State
1  33   Kolkata West Bengal
2  44   Chennai Tamil Nadu
3  40   Hyderabad Telengana
4  80   Bangalore Karnataka
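Note that splitting on ',' alone leaves a leading space in column 1 and keeps the city and state together. A minimal sketch of one way to separate all three fields, assuming the city and state are always divided by at least two spaces (as in the sample data); the names `raw`, `parts`, and `tidy` are illustrative only:

```python
import pandas as pd

raw = pd.DataFrame(["STD, City    State",
                    "33, Kolkata    West Bengal",
                    "44, Chennai    Tamil Nadu",
                    "40, Hyderabad    Telengana",
                    "80, Bangalore    Karnataka"], columns=['row'])

# Split on a comma (plus any spaces after it) or on a run of 2+ spaces.
# A single space, as in "West Bengal", is left alone.
parts = raw['row'].str.split(r',\s*|\s{2,}', expand=True)

# The first row holds the real column names, so promote it to the header.
parts.columns = parts.iloc[0].tolist()
tidy = parts.iloc[1:].reset_index(drop=True)

# The STD codes are still strings at this point; convert them to integers.
tidy = tidy.astype({'STD': int})
print(tidy)
```

Promoting the first row to the header also removes the 'STD' string from the data, so the numeric column can be compared or shifted without extra cleanup.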

Make a column based on an existing one, with its values shifted by one row

df[2] = df[0].shift(-1)  # shifted up: each row gets the value from the row below
df[3] = df[0].shift(1)   # shifted down: each row gets the value from the row above
df
   0    1                    2    3
0  STD  City State           33   NaN
1  33   Kolkata West Bengal  44   STD
2  44   Chennai Tamil Nadu   40   33
3  40   Hyderabad Telengana  80   44
4  80   Bangalore Karnataka  NaN  40
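The row that has no neighbour to borrow from ends up as NaN. A small sketch on a numeric column, assuming the STD codes have already been converted to integers; the frame `demo` and its column names are made up for illustration. It shows the `fill_value` argument of `shift` and a typical use of a shifted column, the change from the previous row:

```python
import pandas as pd

std = pd.Series([33, 44, 40, 80], name='STD')

demo = pd.DataFrame({
    'STD': std,
    'next': std.shift(-1),                       # value from the row below; last row is NaN
    'prev': std.shift(1),                        # value from the row above; first row is NaN
    'prev_filled': std.shift(1, fill_value=0),   # same, but the gap is filled with 0
})

# Difference between each row and the one before it (NaN for the first row).
demo['change'] = demo['STD'] - demo['prev']
print(demo)
```

Because NaN is a float, `next` and `prev` become float columns, while `prev_filled` stays integer.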

Make a column based on a condition on another column

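# NaN is replaced with 0 before the comparison, so the missing value in the last row is labelled 'low'
df[4] = np.where(df[2].fillna(0).astype(int) < 40, 'low', 'high')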
df
   0    1                    2    3    4
0  STD  City State           33   NaN  low
1  33   Kolkata West Bengal  44   STD  high
2  44   Chennai Tamil Nadu   40   33   high
3  40   Hyderabad Telengana  80   44   high
4  80   Bangalore Karnataka  NaN  40   low
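If a missing value should not fall into either bucket, one option is to give it its own label. A hedged sketch using `pd.to_numeric` and `np.select`; the 'missing' label and the names `values`, `numeric`, and `labels` are assumptions made for this example:

```python
import numpy as np
import pandas as pd

# Same values as the shifted-up column: strings plus one NaN.
values = pd.Series(['33', '44', '40', '80', np.nan])

# Convert to numbers; NaN stays NaN instead of being replaced with 0.
numeric = pd.to_numeric(values, errors='coerce')

# Conditions are checked in order; the first one that matches decides the label.
labels = np.select(
    [numeric.isna().to_numpy(), (numeric < 40).to_numpy()],
    ['missing', 'low'],
    default='high',
)
print(labels)
```

`np.where` handles a single two-way condition, while `np.select` extends the same idea to several conditions with a default.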