Filter and apply condition between multiple rows

I have the following dataframe:

client_id   location_id      region_name    location_name
1                123          Florida        location_ABC
6                123          Florida(P)     location_ABC
6                845          Miami(P)       location_THE
1                386          Boston         location_WOP
6                386          Boston(P)      location_WOP

What I'm trying to do is:

If a location_id has more than one client_id, I'll pick the row with client_id == 1.
If a location_id has only one client_id, I'll pick that row, whatever it is.

If there were only one rule, it would be as simple as df[df['client_id'] == 1]. But I cannot figure out how to perform this kind of filtering, which requires looking at several rows at once (for example, checking whether a location_id has more than one client_id).

So, in this scenario, the resulting data frame would be:

client_id   location_id      region_name    location_name
1                123          Florida        location_ABC
6                845          Miami(P)       location_THE
1                386          Boston         location_WOP

Any ideas?

CodePudding user response:

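You can build a boolean Series that is True where client_id equals your preferred id, group it by location_id, and use idxmax to get the index label of the first True in each group (or of the group's first row when nothing matches), then select those rows with loc: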

out = df.loc[df['client_id'].eq(1).groupby(df['location_id'], sort=False).idxmax()]

output:

   client_id  location_id region_name location_name
0          1          123     Florida  location_ABC
2          6          845    Miami(P)  location_THE
3          1          386      Boston  location_WOP
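
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the idxmax approach step by step. The intermediate names (is_preferred, first_hit, n_clients, keep, alt) are mine, not from the answer. The second half is an alternative I'm adding for comparison, assuming the rule is "keep the row if its location has a single client_id, or if client_id is 1"; it uses transform("nunique") to count clients per location, which is the check the question asks about.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "client_id": [1, 6, 6, 1, 6],
    "location_id": [123, 123, 845, 386, 386],
    "region_name": ["Florida", "Florida(P)", "Miami(P)", "Boston", "Boston(P)"],
    "location_name": ["location_ABC", "location_ABC", "location_THE",
                      "location_WOP", "location_WOP"],
})

# True where the row belongs to the preferred client.
is_preferred = df["client_id"].eq(1)

# Per location_id, idxmax returns the index label of the first True.
# If a group has no True at all, it returns the label of the group's first row.
# sort=False keeps the groups in order of first appearance.
first_hit = is_preferred.groupby(df["location_id"], sort=False).idxmax()

out = df.loc[first_hit]
print(out)

# Alternative (not from the answer): an explicit boolean mask.
# Count distinct client_ids per location and broadcast the count to every row.
n_clients = df.groupby("location_id")["client_id"].transform("nunique")
keep = n_clients.eq(1) | df["client_id"].eq(1)
alt = df[keep]
print(alt)
```

On this sample both versions select the rows with index 0, 2 and 3. They can differ on other data: if a location has several clients and none of them is 1, the idxmax version keeps that location's first row, while the mask version drops the location entirely. The question does not say which behaviour is wanted in that case.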