Maho Takahashi

Linguistics PhD

mtakahas[at]ucsd[dot]edu

how to iterate over rows without df.iterrows()

Using iterrows() in pandas is often discouraged for performance reasons: for instance, iterrows() converts each row into an (index, Series) pair, which slows down execution.
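To make that overhead concrete, here is a small sketch showing what iterrows() hands back for each row. The toy frame and its column names are made up for illustration and are not the data used later in this post.

```python
import pandas as pd

# Hypothetical toy frame with one integer column and one float column.
toy = pd.DataFrame({'n_users': [1, 2], 'price': [9.5, 20.0]})

# iterrows() yields one (index, Series) pair per row.
idx, row = next(toy.iterrows())

print(idx)                 # index label of the first row
print(type(row).__name__)  # Series
print(row.dtype)           # float64: the integer was upcast to fit in one Series
```

Building a fresh Series for every row (and possibly changing dtypes along the way) is part of why this pattern tends to be slow on large frames.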

I recently ran into a situation where I needed to (i) check whether the value in a certain column (“subscription_type”) is NaN for each row, (ii) if so, grab the parent subscriber id from another column of that row (“is_plus_1_of”), and (iii) replace the NaN with the subscription type of the parent subscriber. Here’s how I managed to do this without relying on iterrows(): I wrote a function and called it on each row with apply().

import pandas as pd
import numpy as np
df = pd.DataFrame({'subscriber_id':['21daf8cd', '3393eee6', 'e0c9b302', '1f0c4dbc', '7c49a7e8'],
                   'subscription_type':[np.nan, 'standard', np.nan, 'pro', 'standard'],
                   'is_plus_1_of':['1f0c4dbc', np.nan, '3393eee6', np.nan, np.nan],
                   'account_create_date':['12/2/2023', '12/4/2023', '12/8/2023', '12/10/2023', '12/15/2023']})
def mark_plus1_type(row):
  # Rows that already have a subscription type keep it as-is.
  if not pd.isna(row['subscription_type']):
    return row['subscription_type']
  # Otherwise, find the parent subscriber's row and return their subscription type.
  parent = df[df['subscriber_id'] == row['is_plus_1_of']]
  return parent['subscription_type'].iloc[0]

df['subscription_type'] = df.apply(mark_plus1_type, axis=1)
df
   subscriber_id  subscription_type  is_plus_1_of  account_create_date
0       21daf8cd                pro      1f0c4dbc            12/2/2023
1       3393eee6           standard           NaN            12/4/2023
2       e0c9b302           standard      3393eee6            12/8/2023
3       1f0c4dbc                pro           NaN           12/10/2023
4       7c49a7e8           standard           NaN           12/15/2023

You can see now that the subscription types of the first and third rows reflect those of their parent subscribers.
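Since apply() with axis=1 still calls a Python function once per row, a fully vectorized version may be worth trying on larger data. The following is a minimal sketch of one possible alternative, not the approach used above: it builds a lookup Series (the names type_by_id and parent_type are mine) and assumes every subscriber_id is unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subscriber_id': ['21daf8cd', '3393eee6', 'e0c9b302', '1f0c4dbc', '7c49a7e8'],
    'subscription_type': [np.nan, 'standard', np.nan, 'pro', 'standard'],
    'is_plus_1_of': ['1f0c4dbc', np.nan, '3393eee6', np.nan, np.nan],
})

# Lookup table: subscriber_id -> subscription_type (assumes unique ids).
type_by_id = df.set_index('subscriber_id')['subscription_type']

# Parent's subscription type for every row; rows without a parent get NaN.
parent_type = df['is_plus_1_of'].map(type_by_id)

# Fill only the missing subscription types with the parent's type.
df['subscription_type'] = df['subscription_type'].fillna(parent_type)

print(df['subscription_type'].tolist())
# ['pro', 'standard', 'standard', 'pro', 'standard']
```

One behavioral difference to keep in mind: if a plus-one's parent id is missing from the table, this version leaves the value as NaN, whereas the apply() version would raise an IndexError at .iloc[0].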