Imagine I have an order for specialists in some coding languages with multiple criterion in JSON format:
request = {'languages_required': {'Python': 4,
'Java': 2},
'other_requests': []
}
languages_required means that the candidate must have a skill in the language and the number is the minimum level of this language.
The format of candidates dataframe is long:
df = pd.DataFrame({'candidate': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'd'],
'language': ['Python', 'Java', 'Scala', 'Python', 'R', 'Python', 'Java', 'Python', 'Scala', 'Java'],
'skill': [5, 4, 4, 6, 8, 1, 3, 5, 2, 2]})
That gives:
candidate language skill
0 a Python 5
1 a Java 4
2 a Scala 4
3 b Python 6
4 b R 8
5 c Python 1
6 c Java 3
7 d Python 5
8 d Scala 2
9 d Java 2
What I need to do is to keep the candidates and their skills in required languages that meet the requirements from the request, that is:
- Have skills in both mentioned languages
- Skills in these languages are equal or higher than values in the dictionary
So the desired output would be:
candidate language skill
0 a Python 5
1 a Java 4
7 d Python 5
9 d Java 2
I am able to filter the candidates with the languages based on keys() of the dictionary:
lang_mask = df[df['language'].isin(request['languages_required'].keys())]\
.groupby('candidate')['language']\
.apply(lambda x: set(request['languages_required']).issubset(x))
...but struggle with adding the 'is higher than' per language condition. I would really appreciate some help.
CodePudding user response:
You need call first condition in one step and then second in another step:
df = df[df['language'].map(request['languages_required']).le(df['skill'])]
df = df[df.groupby('candidate')['language'].transform(lambda x: set(request['languages_required']).issubset(x))]
print (df)
candidate language skill
0 a Python 5
1 a Java 4
7 d Python 5
9 d Java 2
Or one row solution:
df = (df[df['language'].map(request['languages_required']).le(df['skill'])]
.pipe(lambda x: x[x.groupby('candidate')['language'].transform(lambda x: set(request['languages_required']).issubset(x))]))
print (df)
candidate language skill
0 a Python 5
1 a Java 4
7 d Python 5
9 d Java 2