I need to loop through a pandas DataFrame, but first I have to filter it. I need to look at how many "old_id"s are attached to each new ID.
I wrote this code and it works fine, but it doesn't scale well:
d = dict()
for new_id in new_id_list:
    # Filter the DataFrame down to this new ID, then count its distinct old IDs
    d[new_id] = df[df['new_id_col'] == new_id]['old_id'].nunique()
How can I make this more efficient?
CodePudding user response:
Looks like you're looking for groupby + nunique. This gives the number of unique "old_id"s per "new_id_col":
out = df.groupby('new_id_col')['old_id'].nunique().to_dict()
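For example, a minimal sketch on a small DataFrame (the names df, new_id_col and old_id come from the question; the sample values are invented):

import pandas as pd

# Hypothetical sample data: 'A' maps to two distinct old IDs, 'B' to one
df = pd.DataFrame({
    'new_id_col': ['A', 'A', 'A', 'B', 'B'],
    'old_id':     [1,   1,   2,   3,   3],
})

# One pass over the data: count distinct old IDs per new ID
out = df.groupby('new_id_col')['old_id'].nunique().to_dict()
print(out)  # {'A': 2, 'B': 1}

This also scales better than the loop, which re-filters the whole DataFrame once per new ID, whereas groupby makes a single pass over the data.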