I am pretty novel with programming and I've been stuck for a few days trying to figure out how to solve a problem. Let's say I have a Data Frame with the following info:
- Employee ID
- Manager ID
- Boolean (This column would be the result of a pd.isin(manager_df) from another DataFrame.
- Top Manager ID (This is the head of the business)
Top manager_df = This is a data frame with the Top manager's ID and the business they belong to and lead. This is a csv catalog that I want to keep updated so I do not need to change/fix lines of code with the managers.
What I am trying here is to get to figure out the top manager of every employee. There would be cases in which the Manager would be the Top and other cases in which there would be several managers in between both, the worker and the Top Manager.
For a few cases the Top Manager is the same as the managers I used the following code:
df_managers.loc[df_managers["Boolean"] == True, "Top Manager"] = df_managers["Worker's Manager WD ID"]
But it doesn´t work, I know if I pass it a string with the Employee ID of the Top manager that reports to it would work. But in this case, I´d like to know if there is a way of doing this way so if this catalog of manager_df is changed we do not need to fix the script. What I try to avoid is this:
# Having to change it manually.
df_managers.loc[df_managers["Manager ID"] == "123", "Top Manager"] = "123"
df_managers.loc[df_managers["Manager ID"] == "1234567", "Top Manager"] = "1234567"
So this would be one of the questions I have, but as I said before there are many cases in which the direct manager is not the Top Manager, so I was wondering if there is a way of reaching this top manager through the managers of the managers iterating somehow and checking if the name is in the manager_df with a pd.isin(). But I don't really know if that is possible.
The way we were doing it before was using the xlookup method in Excel, so I looked for a way of emulating it in python. However, it doesn't work out that well and it only reaches the manager of the manager and not the top managers in the cases where there are several. And I need to create a copy of the main data frame to compare both.
# xlook up in python
def xlookup(lookup_value, lookup_array, return_array, if_not_found:str = ''):
match_value = return_array.loc[lookup_array == lookup_value]
if match_value.empty:
return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found
else:
return match_value.tolist()[0]
# This looks for the Manager ID in the Employee ID column of the other Dataframe to bring in the Manager ID to whom that employee reports.
copy_df["Top Manager"] = copy_df["Manager ID"].apply(xlookup, args = (main_df["Employee ID"],main_df["Manager ID"] ))
As I said, this doesn't look very practical and doesn't really work out since I will probably need a loop that I don't know how to build with this. I was wondering if there is a way with pandas loc or iloc, or even NumPy to solve this issue.
I leave here a toy df in case you want to give a try:
import pandas as pd
# In the case of the managers of the top managers, they report to the holding so their
managers would not be considered as our employees, since they are the head of the businesses.
df_top_managers = pd.DataFrame.from_dict(
{
'Employee ID': ['1007', '1008'],
'Manager ID': ['1009', '1009']
}
)
# The employee '1009' doesn't belong to our company so we put it there instead of having a NaN.
# The NaN in the top managers is what I want to fill up with the right ID of the top managers.
# The heads of the business are '1007' and'1008'.
df = pd.DataFrame.from_dict(
{
'Employee ID': ['1001', '1002', '1003', '1004','1005','1006','1007','1008'],
'Manager ID': ['1005', '1003', '1008', '1006','1007','1008','1009','1009'],
'Boolean': ['False', 'False', 'True', 'False','True','True','True','True'],
'Top Manager': ['NaN', 'NaN', 'NaN', 'NaN','NaN','NaN','NaN','NaN']
}
)
print(df)
My expected output would be something like this
# There might be several managers in between the Employee ID and the Top Manager, this is just a small example.
Employee ID Manager ID Boolean Top Manager (expected output)
1001 1005 False 1007
1002 1003 False 1008
1003 1008 True 1008
1004 1006 False 1007
1005 1007 True 1007
1006 1008 True 1007
1007 1009 True 1009 (1007)
1008 1009 True 1009 (1008)
I am aware it is a bit confusing and I have tried to explain it in the best possible way, but if there is any doubt about it, please feel free to ask!
Thank you very much again,
CodePudding user response:
Using df_top_managers
and df
shared in the question, if one wants to populate the column Top Manager
of df
with the Manager ID
in the dataframe df_top_managers
for a given Employee ID
, one can use .map
as follows
df['Top Manager'] = df['Employee ID'].map(df_top_managers.set_index('Employee ID')['Manager ID'])
[Out]:
Employee ID Manager ID Boolean Top Manager
0 1001 1005 False NaN
1 1002 1003 False NaN
2 1003 1008 False NaN
3 1004 1007 False NaN
4 1005 1007 False NaN
5 1006 1008 False NaN
6 1007 1009 True 1009
7 1008 1009 True 1009