Best way to loop through a filtered pandas DataFrame

Time:02-12

I need to loop through a pandas DataFrame, but first I have to filter it. I need to look at how many "old_id"s are attached to each new ID.

I wrote this code and it works fine, but it doesn't scale well.

d = {}

for new_id in new_id_list:
    d[new_id] = df[df['new_id_col'] == new_id]['old_id'].nunique()

How can I make this more efficient?

CodePudding user response:

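It looks like you're looking for groupby with nunique. This computes the number of unique "old_id"s for each value of "new_id_col" in a single grouped operation, instead of filtering the whole DataFrame once per ID: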

out = df.groupby('new_id_col')['old_id'].nunique().to_dict()
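One difference worth noting: the loop in the question only reports IDs from new_id_list, while the groupby reports every ID present in the DataFrame. Below is a minimal sketch of how the two could be reconciled, assuming new_id_list has no duplicates. The toy data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "new_id_col": ["a", "a", "a", "b", "b", "c"],
    "old_id":     [1,   1,   2,   3,   4,   5],
})
new_id_list = ["a", "b", "z"]  # "z" does not occur in df

# Original approach: one boolean filter over the whole frame per ID.
loop_result = {
    new_id: df[df["new_id_col"] == new_id]["old_id"].nunique()
    for new_id in new_id_list
}

# Grouped approach: count unique old_id values per new_id_col once,
# then keep only the requested IDs. IDs missing from df get 0,
# which matches what the loop produces for them.
counts = df.groupby("new_id_col")["old_id"].nunique()
out = counts.reindex(new_id_list, fill_value=0).to_dict()
```

If every ID in the DataFrame is wanted, the reindex step can be dropped and the one-liner above is enough.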