I have two data frames, one contains information about employees and the other contains information about projects. They look like this:
df_employees.head(5)
id name salary
0 0469 Aurore 18100
1 4c13 Malinda 16100
2 67f2 Cephus 20500
3 7b36 Doctor 16000
4 5c65 Paloma 27100
df_projects.head(5)
employee_id project_id start_date end_date
0 8411 03245e5933eb 2017-03-16 2017-03-30
1 b67d 033af173480e 2017-02-27 2017-04-06
2 5ca8 033af173480e 2017-02-27 2017-04-06
3 bb0c 03fa66f3a3e8 2020-06-12 None
4 6d22 03fa66f3a3e8 2020-06-12 None
Now I am trying to find the 3 lowest paid employees who have done at least 10 projects. For this, I started by merging both datasets, as follows:
import pandas as pd
df_mer = pd.merge(df_employees, df_projects, left_on= 'id', right_on= 'employee_id')
print(df_mer.head(10))
id name salary employee_id project_id start_date end_date
0 0469 Aurore 18100 0469 0bd005702cc1 2020-06-17 2020-08-02
1 0469 Aurore 18100 0469 8b2ce1e01e29 2018-02-05 2018-02-21
2 0469 Aurore 18100 0469 d580afd6f249 2020-07-04 2020-07-28
3 4c13 Malinda 16100 4c13 05b299e8ecd9 2018-02-06 2018-03-13
4 4c13 Malinda 16100 4c13 42c7826cb8e1 2019-02-08 2019-05-21
5 4c13 Malinda 16100 4c13 64e01cf58730 2018-04-28 2018-07-10
6 4c13 Malinda 16100 4c13 7854dcf39808 2019-12-05 None
7 4c13 Malinda 16100 4c13 8505058db062 2018-04-11 2018-05-29
8 4c13 Malinda 16100 4c13 908c863bccdd 2019-02-04 2019-05-14
9 4c13 Malinda 16100 4c13 c7bb3ababa5c 2018-02-18 2018-10-12
Then I wrote this command to obtain the number of projects done by each employee:
vn = df_mer['name'].value_counts()
vn.head(5)
Arielle 15
Cornell 10
Devonta 10
Phylicia 10
Abigail 9
Now I would like to find the 3 lowest paid employees who have done at least 10 projects, but I have no idea as to how to do it. If someone could give me a hint I would be very grateful. Thank you.
CodePudding user response:
Could you try using nsmallest for the first df. Something like
df_employees.nsmallest(3,'salary')
and group by employee and counting the projects on the second
df_projects.groupby('employee_id')['project_id'].count()
Filter the results having above 10 and merging these back together ?
CodePudding user response:
vn = df_mer['name'].value_counts()
a=vn[vn > 10].index.tolist()
b=df_mer[df_mer['name'].isin(a)].drop_duplicates(subset='name',keep='first')
c=b.nsmallest(3, 'salary')
3_lowest_paid_employees=c.name