Home > Enterprise >  Return True if multiple rows in a dataframe share the same value in particular column(s)
Return True if multiple rows in a dataframe share the same value in particular column(s)

Time:07-27

I have a dataframe containing four columns: Employee ID, Employee Location, Manager ID and Manager Location. I would like to write a python script that searches for all Employee IDs that share the same Manager ID and checks to see if those employees are all co-located (in other words, does Employee Location match for all Employee IDs with the same Manager ID). Based on this, I simply want to append a new column that returns True or False.

I've seen some older posts suggesting the use of groupby to group similar rows based on a column, but I want to retain all rows in the dataframe. Is there a way of doing this without doing any sort of aggregation to the dataset?

Edit: example in excel

CodePudding user response:

Groupby is the natural choice - you can undo the aggregation later with a join if necessary.

More specifically, I would start by making a Series, indexed by manager_id, specifying whether all the employees of that manager are in the same location:

c = df.groupby("manager_id")["emp_location"].nunique().eq(1)

This counts the amount of unique locations per manager and then checks if they're equal to 1, i.e., there's only one location and so all employees are at the same place. You can join this back onto the original dataframe to obtain the result you want:

c.rename("same_location", inplace=True)
df.join(c, on="manager_id")
  • Related