Given a dictionary
coaching_hours_per_level = {1:30, 2: 55, 3:80, 4:115}
and a dataframe (shown here in dict form):
df1 = {'skill_0': {'jay': 1, 'roy': 4, 'axel': 5, 'billy': 1, 'charlie': 2},
'skill_1': {'jay': 5, 'roy': 3, 'axel': 2, 'billy': 5, 'charlie': 1},
'skill_2': {'jay': 4, 'roy': 1, 'axel': 2, 'billy': 1, 'charlie': 4},
'skill_3': {'jay': 1, 'roy': 3, 'axel': 5, 'billy': 4, 'charlie': 3},
'skill_4': {'jay': 3, 'roy': 4, 'axel': 2, 'billy': 3, 'charlie': 4},
'skill_5': {'jay': 5, 'roy': 2, 'axel': 4, 'billy': 2, 'charlie': 4},
'skill_6': {'jay': 5, 'roy': 5, 'axel': 2, 'billy': 5, 'charlie': 1},
'skill_7': {'jay': 3, 'roy': 3, 'axel': 4, 'billy': 2, 'charlie': 1},
'skill_8': {'jay': 1, 'roy': 4, 'axel': 2, 'billy': 1, 'charlie': 2},
'skill_9': {'jay': 4, 'roy': 3, 'axel': 4, 'billy': 2, 'charlie': 1}}
My target is:
target = {'skill_0': {'jim': 3},
'skill_1': {'jim': 5},
'skill_2': {'jim': 1},
'skill_3': {'jim': 2},
'skill_4': {'jim': 1},
'skill_5': {'jim': 2},
'skill_6': {'jim': 3},
'skill_7': {'jim': 5},
'skill_8': {'jim': 3},
'skill_9': {'jim': 3}}
What I want to do is work out how many hours of coaching a person needs to catch up to the target level of a given skill. E.g., for Jay in skill_0, Jay has to move up 2 levels (which is 30 + 55, a total of 85h). If the skill is already at the target level or above, the result should be 0.
I've tried np.where as shown below, and it works for getting just the level difference:
np.where(df1>=target.values, 0, target.values-df1)
But when I try to look up the dictionary to get the total coaching hours needed, np.where no longer seems to vectorize, even when I simply try to access a value in the dict:
np.where(df1>=target.values, 0, coaching_hours_per_level[target.values + 1])
CodePudding user response:
You can build an hour matrix to indicate how much time it takes to go from level x to level y.
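To make the meaning of each cell concrete, here is a minimal plain-Python sketch of the quantity the matrix holds. It assumes, as in the question's example, that the hours stored under level k are the cost of moving from level k to level k + 1. The helper name `hours_between` is hypothetical and only used for illustration.

```python
coaching_hours_per_level = {1: 30, 2: 55, 3: 80, 4: 115}


def hours_between(current_level, target_level):
    """Hypothetical helper: total hours to move up from current_level to target_level."""
    # range() is empty when target_level <= current_level, so the sum is 0
    return sum(
        coaching_hours_per_level[level]
        for level in range(current_level, target_level)
    )


# Rows are the current level (1..5), columns are the target level (1..5)
levels = range(1, 6)
table = [[hours_between(x, y) for y in levels] for x in levels]
for row in table:
    print(row)
# [0, 30, 85, 165, 280]
# [0, 0, 55, 135, 250]
# [0, 0, 0, 80, 195]
# [0, 0, 0, 0, 115]
# [0, 0, 0, 0, 0]
```

Everything on or below the diagonal is 0 because no coaching is needed to stay at a level or to move down.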
First some sample data:
current = {
"skill_0": {"jay": 1, "roy": 4, "axel": 5, "billy": 1, "charlie": 2},
"skill_1": {"jay": 5, "roy": 3, "axel": 2, "billy": 5, "charlie": 1},
"skill_2": {"jay": 4, "roy": 1, "axel": 2, "billy": 1, "charlie": 4},
"skill_3": {"jay": 1, "roy": 3, "axel": 5, "billy": 4, "charlie": 3},
"skill_4": {"jay": 3, "roy": 4, "axel": 2, "billy": 3, "charlie": 4},
"skill_5": {"jay": 5, "roy": 2, "axel": 4, "billy": 2, "charlie": 4},
"skill_6": {"jay": 5, "roy": 5, "axel": 2, "billy": 5, "charlie": 1},
"skill_7": {"jay": 3, "roy": 3, "axel": 4, "billy": 2, "charlie": 1},
"skill_8": {"jay": 1, "roy": 4, "axel": 2, "billy": 1, "charlie": 2},
"skill_9": {"jay": 4, "roy": 3, "axel": 4, "billy": 2, "charlie": 1},
}
# We will up the challenge a bit by saying not everyone
# wants to level up every skill
target = {
"skill_0": {"jay": 3, "charlie": 5},
"skill_1": {"jay": 5, "charlie": 5},
"skill_2": {"jay": 1, "charlie": 1},
"skill_3": {"jay": 2, "charlie": 1},
"skill_4": {"jay": 1, "charlie": 1},
"skill_5": {"jay": 2},
"skill_6": {"jay": 3},
"skill_7": {"jay": 5},
"skill_8": {"jay": 3},
"skill_9": {"jay": 3},
}
The algorithm:
import numpy as np
import pandas as pd

coaching_hours_per_level = {1: 30, 2: 55, 3: 80, 4: 115}
hours = [0] + list(coaching_hours_per_level.values())
# The value in hours_matrix[i, j] is the total time it takes
# to go from level (i + 1) to level (j + 1). Notice that
# hours_matrix[i, j] = 0 if i >= j -- no time is needed to
# stay at the same level or to down-level.
hours_matrix = np.triu(
    np.tile(hours, (len(hours), 1)),
    k=1,
).cumsum(axis=1)
# Now line up the data
result = (
    pd.concat(
        [pd.DataFrame(current).unstack(), pd.DataFrame(target).unstack()],
        axis=1,
        keys=["current", "target"],
    )
    .dropna()
    .astype("int")
)
# And the final step is just taking data from hours_matrix
result["hours"] = hours_matrix[result["current"] - 1, result["target"] - 1]
Result:
                 current  target  hours
skill_0 jay            1       3     85
        charlie        2       5    250
skill_1 jay            5       5      0
        charlie        1       5    280
skill_2 jay            4       1      0
        charlie        4       1      0
skill_3 jay            1       2     30
        charlie        3       1      0
skill_4 jay            3       1      0
        charlie        4       1      0
skill_5 jay            5       2      0
skill_6 jay            5       3      0
skill_7 jay            3       5    195
skill_8 jay            1       3     85
skill_9 jay            4       3      0
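For the layout in the original question (people as rows, one shared target row for jim), a different technique may be simpler: index a cumulative-hours array instead of the dict. A dict cannot be indexed with a NumPy array, which is why the np.where attempt fails, but an array can. The sketch below uses a two-skill, three-person subset of the question's data; the names `cumulative` and `hours_needed` are illustrative, and it assumes the levels run 1..5 with no gaps.

```python
import numpy as np
import pandas as pd

coaching_hours_per_level = {1: 30, 2: 55, 3: 80, 4: 115}

# cumulative[k] = hours to get from level 1 to level k + 1
per_level = [coaching_hours_per_level[k] for k in sorted(coaching_hours_per_level)]
cumulative = np.concatenate(([0], np.cumsum(per_level)))

# Subset of the question's data: people as rows, skills as columns
df1 = pd.DataFrame({
    "skill_0": {"jay": 1, "roy": 4, "axel": 5},
    "skill_1": {"jay": 5, "roy": 3, "axel": 2},
})
target = pd.DataFrame({"skill_0": {"jim": 3}, "skill_1": {"jim": 5}})

current_levels = df1.to_numpy()
target_levels = target.to_numpy()  # shape (1, n_skills), broadcasts over the rows

# Hours from level c to level t is cumulative[t - 1] - cumulative[c - 1];
# negative values mean the person is already above target, so clip them to 0
diff = cumulative[target_levels - 1] - cumulative[current_levels - 1]
hours_needed = pd.DataFrame(
    np.clip(diff, 0, None), index=df1.index, columns=df1.columns
)
print(hours_needed)
```

Here jay needs 85 hours for skill_0 (level 1 to 3) and 0 for skill_1, matching the example in the question.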