Home > Blockchain >  Pandas: Sort a subset under certain conditions
Pandas: Sort a subset under certain conditions

Time:04-24

I have two data frames, one contains information about employees and the other contains information about projects. They look like this:

df_employees.head(5)

      id    name    salary
 0  0469    Aurore  18100
 1  4c13    Malinda 16100
 2  67f2    Cephus  20500
 3  7b36    Doctor  16000
 4  5c65    Paloma  27100

df_projects.head(5)

    employee_id project_id  start_date  end_date
 0  8411    03245e5933eb    2017-03-16  2017-03-30
 1  b67d    033af173480e    2017-02-27  2017-04-06
 2  5ca8    033af173480e    2017-02-27  2017-04-06
 3  bb0c    03fa66f3a3e8    2020-06-12  None
 4  6d22    03fa66f3a3e8    2020-06-12  None

Now I am trying to find the 3 lowest paid employees who have done at least 10 projects. For this, I started by merging both datasets, as follows:

import pandas as pd

df_mer = pd.merge(df_employees, df_projects, left_on= 'id', right_on= 'employee_id')

print(df_mer.head(10))

     id     name  salary employee_id    project_id  start_date    end_date
 0  0469   Aurore   18100        0469  0bd005702cc1  2020-06-17  2020-08-02
 1  0469   Aurore   18100        0469  8b2ce1e01e29  2018-02-05  2018-02-21
 2  0469   Aurore   18100        0469  d580afd6f249  2020-07-04  2020-07-28
 3  4c13  Malinda   16100        4c13  05b299e8ecd9  2018-02-06  2018-03-13
 4  4c13  Malinda   16100        4c13  42c7826cb8e1  2019-02-08  2019-05-21
 5  4c13  Malinda   16100        4c13  64e01cf58730  2018-04-28  2018-07-10
 6  4c13  Malinda   16100        4c13  7854dcf39808  2019-12-05        None
 7  4c13  Malinda   16100        4c13  8505058db062  2018-04-11  2018-05-29
 8  4c13  Malinda   16100        4c13  908c863bccdd  2019-02-04  2019-05-14
 9  4c13  Malinda   16100        4c13  c7bb3ababa5c  2018-02-18  2018-10-12

Then I wrote this command to obtain the number of projects done by each employee:

vn = df_mer['name'].value_counts()
vn.head(5)
Arielle     15
Cornell     10
Devonta     10
Phylicia    10
Abigail      9

Now I would like to find the 3 lowest paid employees who have done at least 10 projects, but I have no idea as to how to do it. If someone could give me a hint I would be very grateful. Thank you.

CodePudding user response:

Could you try using nsmallest for the first df. Something like

df_employees.nsmallest(3,'salary')

and group by employee and counting the projects on the second

df_projects.groupby('employee_id')['project_id'].count()

Filter the results having above 10 and merging these back together ?

CodePudding user response:

vn = df_mer['name'].value_counts()
a=vn[vn > 10].index.tolist()
b=df_mer[df_mer['name'].isin(a)].drop_duplicates(subset='name',keep='first')
c=b.nsmallest(3, 'salary')

3_lowest_paid_employees=c.name
  • Related