Given a dataframe like this:
Client | Priority |
---|---|
Client 1 | Low |
Client 1 | Medium |
Client 2 | Low |
Client 2 | High |
Client 3 | Low |
I want to group by client and select the highest priority (Low, Medium, High) for each client. It should look something like this:
Client | Priority |
---|---|
Client 1 | Medium |
Client 2 | High |
Client 3 | Low |
CodePudding user response:
convert priority
into a categorical column before grouping:
dtype = pd.CategoricalDtype(['High', 'Medium', 'Low'], ordered = True)
df['Priority'] = df.Priority.astype(dtype)
df.sort_values(['Client', 'Priority']).groupby('Client', sort = False).head(1)
Client Priority
1 Client 1 Medium
3 Client 2 High
4 Client 3 Low
if you do not want to convert to category, you can map the positions instead:
(df.assign(temp = df.Priority.map({'Low':3,'Medium':2, 'High':1}))
.sort_values(['Client', 'temp'])
.drop(columns='temp')
.groupby('Client')
.head(1)
)
Client Priority
1 Client 1 Medium
3 Client 2 High
4 Client 3 Low