I have a dataframe containing the employee name, employee email, manager name and manager email. I need to filter this dataframe so that it keeps only the rows whose manager email also appears in the employee email column, to make sure that each manager is also in the database as an employee.
For example, given this dataframe:
Employee Name  Employee E-mail    Manager Name  Manager E-mail
Pedro          [email protected]  Paul          [email protected]
Paul           N/A                Carlos        [email protected]
Richard        [email protected]  Josh          [email protected]
Carlos         [email protected]  Peter         #
Maria          #                  Bob           N/A
Josh           [email protected]  Carlos        [email protected]
This would return the following dataframe:
Employee Name  Employee E-mail    Manager Name  Manager E-mail
Richard        [email protected]  Josh          [email protected]
Josh           [email protected]  Carlos        [email protected]
What would be the best way to do it?
CodePudding user response:
IIUC, you can use masks and boolean indexing:
# is the employee email valid? you can use a different pattern, e.g. r'@company\.com'
# (na=False treats missing values as invalid)
m1 = df['Employee E-mail'].str.contains('@', na=False)
# is the manager email valid?
m2 = df['Manager E-mail'].str.contains('@', na=False)
# is the manager also an employee?
m3 = df['Manager E-mail'].isin(df['Employee E-mail'])
# keep only the rows where all conditions are True
df2 = df.loc[m1 & m2 & m3]
output:
   Employee Name  Employee E-mail    Manager Name  Manager E-mail
2  Richard        [email protected]  Josh          [email protected]
5  Josh           [email protected]  Carlos        [email protected]
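
To try the approach end to end, here is a minimal, self-contained sketch. The addresses in the post are redacted, so the `@example.com` values below are made-up placeholders chosen to match the pattern of the example, and the helper name `managers_who_are_employees` is hypothetical. It also assumes that "valid" simply means "contains an `@`".

```python
import pandas as pd

# Hypothetical data: the real addresses are redacted in the post,
# so these @example.com values are placeholders.
df = pd.DataFrame({
    "Employee Name": ["Pedro", "Paul", "Richard", "Carlos", "Maria", "Josh"],
    "Employee E-mail": ["pedro@example.com", "N/A", "richard@example.com",
                        "carlos@example.com", "#", "josh@example.com"],
    "Manager Name": ["Paul", "Carlos", "Josh", "Peter", "Bob", "Carlos"],
    "Manager E-mail": ["paul@example.com", "carlos@example.com",
                       "josh@example.com", "#", "N/A", "carlos@example.com"],
})


def managers_who_are_employees(frame, emp_col="Employee E-mail",
                               mgr_col="Manager E-mail"):
    """Keep rows whose employee and manager emails both contain '@'
    and whose manager email also occurs in the employee email column."""
    emp_ok = frame[emp_col].str.contains("@", na=False)
    mgr_ok = frame[mgr_col].str.contains("@", na=False)
    # Compare only against valid employee emails, so placeholders
    # such as '#' or 'N/A' can never count as a match.
    mgr_is_emp = frame[mgr_col].isin(frame.loc[emp_ok, emp_col])
    return frame.loc[emp_ok & mgr_ok & mgr_is_emp]


result = managers_who_are_employees(df)
print(result)
```

With this placeholder data, only the Richard and Josh rows remain. Pedro is dropped because Paul has no valid employee email, so Paul's manager address is not found among the employee addresses.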