I have the following dataframe:
client_id location_id region_name location_name
1 123 Florida location_ABC
6 123 Florida(P) location_ABC
6 845 Miami(P) location_THE
1 386 Boston location_WOP
6 386 Boston(P) location_WOP
What I'm trying to do is:
- If a location_id has more than one client_id, I'll keep the row with client_id == 1.
- If a location_id has only one client_id, I'll keep that single row, whatever its client_id is.
If I only needed the first rule, it would be as simple as df[df['client_id'] == 1]. But I cannot figure out how to do filtering that has to look at several rows at the same time (for example, checking whether a location_id has more than one client_id).
So, in this scenario, the resulting data frame would be:
client_id location_id region_name location_name
1 123 Florida location_ABC
6 845 Miami(P) location_THE
1 386 Boston location_WOP
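For reference, the two rules can also be written down literally: count the distinct client_ids per location_id with groupby(...).transform('nunique'), which broadcasts the count back to every row, and combine it with the client_id test in a boolean mask. This is a minimal sketch, not the approach from the answer below; it assumes every location_id shared by several clients actually contains a client_id == 1 row (a shared location without one would be dropped entirely by this mask).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "client_id": [1, 6, 6, 1, 6],
    "location_id": [123, 123, 845, 386, 386],
    "region_name": ["Florida", "Florida(P)", "Miami(P)", "Boston", "Boston(P)"],
    "location_name": ["location_ABC", "location_ABC", "location_THE",
                      "location_WOP", "location_WOP"],
})

# Number of distinct client_ids per location_id, repeated on every row of the group
n_clients = df.groupby("location_id")["client_id"].transform("nunique")

# Rule 1: locations shared by several clients keep only client_id == 1
# Rule 2: locations with a single client keep their row as-is
# Assumption: every shared location has a client_id == 1 row
mask = ((n_clients > 1) & (df["client_id"] == 1)) | (n_clients == 1)
result = df[mask]
print(result)
```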
Any ideas?
Answer:
You can use idxmax with a custom groupby on a boolean Series that marks the rows equal to your preferred id, then slice with loc:
out = df.loc[df['client_id'].eq(1).groupby(df['location_id'], sort=False).idxmax()]
Output:
client_id location_id region_name location_name
0 1 123 Florida location_ABC
2 6 845 Miami(P) location_THE
3 1 386 Boston location_WOP
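To see why the one-liner works, it can be split into steps. Within each location_id group, idxmax returns the index label of the first maximum, so a group that contains a client_id == 1 row returns that row, and a group without one (all values equal) falls back to its first row. This sketch casts the flags to 0/1 integers, which idxmax orders the same way as False/True; that cast is an addition for illustration and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "client_id": [1, 6, 6, 1, 6],
    "location_id": [123, 123, 845, 386, 386],
    "region_name": ["Florida", "Florida(P)", "Miami(P)", "Boston", "Boston(P)"],
    "location_name": ["location_ABC", "location_ABC", "location_THE",
                      "location_WOP", "location_WOP"],
})

# Step 1: flag preferred rows (client_id == 1) as 1, all others as 0
is_preferred = df["client_id"].eq(1).astype(int)

# Step 2: per location_id, take the label of the first maximum.
# sort=False keeps the groups in order of first appearance.
winners = is_preferred.groupby(df["location_id"], sort=False).idxmax()
print(winners)  # location_id 123 -> 0, 845 -> 2, 386 -> 3

# Step 3: select the winning rows by index label
out = df.loc[winners]
print(out)
```

Because idxmax returns exactly one label per group, this keeps exactly one row per location_id even if a location had several client_id == 1 rows. It relies on the index labels being unique (as with the default RangeIndex) so that df.loc picks out single rows.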