Imagine I have a dataframe with employee IDs, their Hire Dates, the contract type (can be Employee or Contractor) and the company they where hired. Each employee may have as many rows for the same, or different companies and the same or different contract types.
ID Hire Date Contract Type Company
10000 2000.01.01 Employee Abc
10000 2001.01.01 Contractor Zxc
10000 2000.01.01 Employee Abc
10000 2000.01.01 Contractor Abc
10000 2002.01.01 Employee Cde
10000 2002.01.01 Employee Abc
10001 1999.03.11 Employee Zxc
10002 1989.01.01 Employee Abc
10002 1989.01.01 Contractor Cde
10002 1988.01.01 Contractor Zxc
10002 1999.01.01 Employee Abc
Per each ID, and each contract type they have, I need to identify the earliest hire date and assign that as their Primary Assignment (if the ID has 2 rows with the same contract type and same hire date in the same company just take the first value it appears and set to "Yes") for each unique Company they are hired, this way resulting on this dataframe:
ID Hire Date Contract Type Company Primary Assignment
10000 2000.01.01 Employee Abc Yes
10000 2001.01.01 Contractor Zxc Yes
10000 2000.01.01 Employee Abc No
10000 2000.01.01 Contractor Abc Yes
10000 2002.01.01 Employee Cde Yes
10000 2002.01.01 Employee Abc No
10001 1999.03.11 Employee Zxc Yes
10002 1989.01.01 Employee Abc Yes
10002 1989.01.01 Contractor Cde Yes
10002 1988.01.01 Contractor Zxc Yes
10002 1999.01.01 Employee Abc No
What would be the best way to achieve it?
CodePudding user response:
You can use groupby.idxmin
:
date = pd.to_datetime(df['Hire Date'])
idx = date.groupby([df['ID'], df['Contract Type'], df['Company']]).idxmin()
df['Primary Assignment'] = np.where(df.index.isin(idx), 'Yes', 'No')
Alternatively, with groupby.transform
and a mask if you want the flexibility to chose the value to keep:
date = pd.to_datetime(df['Hire Date'])
m1 = date.groupby([df['ID'], df['Contract Type'], df['Company']]).transform('min').eq(date)
m2 = ~df[['ID', 'Contract Type', 'Company']].duplicated()
df['Primary Assignment'] = np.where(m1&m2, 'Yes', 'No')
Output:
ID Hire Date Contract Type Company Primary Assignment
0 10000 2000.01.01 Employee Abc Yes
1 10000 2001.01.01 Contractor Zxc Yes
2 10000 2000.01.01 Employee Abc No
3 10000 2000.01.01 Contractor Abc Yes
4 10000 2002.01.01 Employee Cde Yes
5 10000 2002.01.01 Employee Abc No
6 10001 1999.03.11 Employee Zxc Yes
7 10002 1989.01.01 Employee Abc Yes
8 10002 1989.01.01 Contractor Cde Yes
9 10002 1988.01.01 Contractor Zxc Yes
10 10002 1999.01.01 Employee Abc No