How to iterate over rows without df.iterrows()

Using iterrows() in pandas is often considered a bad idea for performance reasons: for instance, iterrows() converts each row into an (index, Series) pair, which slows down execution.
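To make the (index, Series) point concrete, here is a small sketch. The frame `demo` and its columns `a` and `b` are made up for illustration and are not part of the data used later in this post.

```python
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column.
demo = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})

# iterrows() yields one (index, Series) pair per row.
idx, row = next(demo.iterrows())

print(type(row).__name__)  # Series
print(demo['a'].dtype)     # int64
print(row.dtype)           # float64: the integer was upcast to fit one Series
```

Building a new Series for every row is where much of the overhead comes from, and, as the dtype change suggests, the row no longer preserves the original column types.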

I recently ran into a situation where I needed to (i) check, for each row, whether the value in a certain column (“subscription_type”) is NaN, (ii) if so, grab the parent subscriber ID from another column of that row (“is_plus_1_of”), and (iii) replace the NaN with the subscription type of the parent subscriber. Here’s how I managed to do this without relying on iterrows(): I wrote a function and called it for each row with apply().

import pandas as pd
import numpy as np

df = pd.DataFrame({'subscriber_id':['21daf8cd', '3393eee6', 'e0c9b302', '1f0c4dbc', '7c49a7e8'],
                   'subscription_type':[np.nan, 'standard', np.nan, 'pro', 'standard'],
                   'is_plus_1_of':['1f0c4dbc', np.nan, '3393eee6', np.nan, np.nan],
                   'account_create_date':['12/2/2023', '12/4/2023', '12/8/2023', '12/10/2023', '12/15/2023']})

def mark_plus1_type(row):
    # Rows that already have a subscription type are returned unchanged.
    if pd.notna(row['subscription_type']):
        return row['subscription_type']
    # Otherwise, find the parent subscriber's row and return its subscription type.
    is_parent = df['subscriber_id'] == row['is_plus_1_of']
    return df.loc[is_parent, 'subscription_type'].iloc[0]

df['subscription_type'] = df.apply(mark_plus1_type, axis=1)
df

	subscriber_id	subscription_type	is_plus_1_of	account_create_date
0	21daf8cd	pro	1f0c4dbc	12/2/2023
1	3393eee6	standard	NaN	12/4/2023
2	e0c9b302	standard	3393eee6	12/8/2023
3	1f0c4dbc	pro	NaN	12/10/2023
4	7c49a7e8	standard	NaN	12/15/2023

You can see now that the subscription types of the first and third rows reflect those of their parent subscribers.
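Note that apply() with axis=1 still calls the Python function once per row. If that becomes a bottleneck, one possible alternative is a column-wise lookup with map() and fillna(). The sketch below is not what I used above; it rebuilds the same data under a new name, `subs`, and assumes that every subscriber_id is unique (map() with a Series needs a unique index).

```python
import numpy as np
import pandas as pd

# Same data as above, rebuilt under a different name so the block runs on its own.
subs = pd.DataFrame({
    'subscriber_id': ['21daf8cd', '3393eee6', 'e0c9b302', '1f0c4dbc', '7c49a7e8'],
    'subscription_type': [np.nan, 'standard', np.nan, 'pro', 'standard'],
    'is_plus_1_of': ['1f0c4dbc', np.nan, '3393eee6', np.nan, np.nan],
    'account_create_date': ['12/2/2023', '12/4/2023', '12/8/2023', '12/10/2023', '12/15/2023'],
})

# Lookup table: subscriber_id -> subscription_type (assumes unique subscriber IDs).
type_by_id = subs.set_index('subscriber_id')['subscription_type']

# For every row, the parent's subscription type (NaN where there is no parent).
parent_type = subs['is_plus_1_of'].map(type_by_id)

# Fill only the missing subscription types with the parent's type.
subs['subscription_type'] = subs['subscription_type'].fillna(parent_type)

print(subs['subscription_type'].tolist())
# ['pro', 'standard', 'standard', 'pro', 'standard']
```

On this small example the result matches the apply() version. Unlike the apply() version, a row whose parent is missing simply stays NaN instead of raising an error.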