Maho Takahashi

Linguistics PhD

mtakahas[at]ucsd[dot]edu

How to reformat and rename values in a pandas DataFrame

import pandas as pd
df = pd.DataFrame({'Discount':[10, 8, 20, 15, 10],
                   'Product':[' UMbreLla', '  maTress', 'BeDmintoN ', 'Shuttle', 'jaCket  '],
                   'Updated_Price':[880, 1250, 1450, 1550, 400],
                   'Date':['10/2/2011', '10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011']})
df
   Discount     Product  Updated_Price       Date
0        10    UMbreLla            880  10/2/2011
1         8     maTress           1250  10/2/2011
2        20  BeDmintoN            1450  11/2/2011
3        15     Shuttle           1550  12/2/2011
4        10    jaCket              400  13/2/2011
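
The printed table makes the stray whitespace hard to see. As a minimal sketch (the names `products`, `raw`, and `needs_strip` are introduced here for illustration only), `repr()` shows the padding explicitly, and comparing each value with its stripped version flags the rows that need cleaning:

```python
import pandas as pd

# Same product names as in the dataframe above
products = pd.Series([' UMbreLla', '  maTress', 'BeDmintoN ', 'Shuttle', 'jaCket  '])

# repr() wraps each string in quotes, so leading/trailing spaces become visible
raw = [repr(p) for p in products]
print(raw)

# True where stripping would change the value
needs_strip = products != products.str.strip()
print(needs_strip.tolist())
```

Only 'Shuttle' is already free of surrounding whitespace.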

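Here are two ways to (i) strip whitespace from the product names and (ii) capitalize their first letter. Note that capitalize() also lowercases the remaining letters. Each example starts from the original df defined above.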

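# Example 1: build a cleaned list, then swap it in as a new column
ls = df['Product'].tolist()
new_ls = [item.strip().capitalize() for item in ls]  # strip whitespace, then capitalize
df = df.drop(['Product'], axis=1)      # remove the original column
df.insert(1, 'Product_new', new_ls)    # insert the cleaned values at position 1
df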
   Discount  Product_new  Updated_Price       Date
0        10     Umbrella            880  10/2/2011
1         8      Matress           1250  10/2/2011
2        20    Bedminton           1450  11/2/2011
3        15      Shuttle           1550  12/2/2011
4        10       Jacket            400  13/2/2011
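
# Example 2: clean the column in place
# (run this on the original df; after Example 1 the 'Product' column no longer exists)
df['Product'] = df['Product'].apply(lambda x: x.strip().capitalize())
df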
   Discount    Product  Updated_Price       Date
0        10   Umbrella            880  10/2/2011
1         8    Matress           1250  10/2/2011
2        20  Bedminton           1450  11/2/2011
3        15    Shuttle           1550  12/2/2011
4        10     Jacket            400  13/2/2011
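
A third option, not in the original examples, is the vectorized `.str` accessor, which chains the same two string methods without a list comprehension or a lambda. This is a small sketch on a hypothetical copy named `df_demo` so it does not interfere with `df`:

```python
import pandas as pd

# Hypothetical two-column copy of the data, for illustration only
df_demo = pd.DataFrame({'Discount': [10, 8, 20, 15, 10],
                        'Product': [' UMbreLla', '  maTress', 'BeDmintoN ', 'Shuttle', 'jaCket  ']})

# .str.strip() removes surrounding whitespace; .str.capitalize() uppercases
# the first letter and lowercases the rest, element by element
df_demo['Product'] = df_demo['Product'].str.strip().str.capitalize()
print(df_demo['Product'].tolist())
```

Unlike `apply` with a plain string method, the `.str` methods pass missing values (NaN) through instead of raising an error.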

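The code below renames values with a regular expression: the pattern ^Be matches "Be" at the start of a string and replaces it with "Ba", which turns Bedminton into Badminton. With regex=True, df.replace checks every string column in the dataframe, not only the product column. The output shown continues from the Example 1 dataframe.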

df.replace(to_replace=r'^Be', value='Ba', regex=True, inplace=True)
df
   Discount  Product_new  Updated_Price       Date
0        10     Umbrella            880  10/2/2011
1         8      Matress           1250  10/2/2011
2        20    Badminton           1450  11/2/2011
3        15      Shuttle           1550  12/2/2011
4        10       Jacket            400  13/2/2011
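
Because a frame-wide `df.replace` touches every string column, it can be safer to restrict the regex to one column. The sketch below assumes a hypothetical extra column `Brand` (not part of the original data) to show the difference; `whole` and `targeted` are illustrative names:

```python
import pandas as pd

# 'Brand' is a made-up column, added only to show the side effect
df_demo = pd.DataFrame({'Product_new': ['Umbrella', 'Matress', 'Bedminton', 'Shuttle', 'Jacket'],
                        'Brand': ['Best', 'Acme', 'Bell', 'Acme', 'Best']})

# Frame-wide replace: every string column matching ^Be is rewritten
whole = df_demo.replace(to_replace=r'^Be', value='Ba', regex=True)
print(whole['Brand'].tolist())

# Column-targeted replace: only 'Product_new' is changed
targeted = df_demo.copy()
targeted['Product_new'] = targeted['Product_new'].str.replace(r'^Be', 'Ba', regex=True)
print(targeted)
```

The targeted version also avoids `inplace=True`, so the original frame stays available for comparison.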