Filter and apply multiple conditions between multiple rows

I have the following dataframe:

id  location     method
1      456        Phone
1      456        OS
6      456        OS
6      943        Specialist

What I'm trying to do is implement the following logic:

If there's only one record for a given combination of location and method, I'll do nothing. That's the scenario for the first and last rows.
If there's more than one record for a given combination of location and method, I want to keep only those where id == 1.

So, the resulting dataframe would be:

id  location     method
1      456        Phone
1      456        OS
6      943        Specialist

If I only needed to filter by the id column, I would have this solution: df.loc[df['id'].eq(1).groupby(df['location'], sort=False).idxmax()] (Reference: Filter and apply condition between multiple rows)

But I cannot figure out how to combine this filter with the "method" column. Any ideas?

CodePudding user response:

A possible solution:

(df.sort_values(by='id')
 .groupby(['location', 'method']).first()
 .reset_index().sort_index(axis=1))

Output:

   id  location      method
0   1       456          OS
1   1       456       Phone
2   6       943  Specialist
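
Note that taking the first row after sorting keeps the smallest id in each group, which equals 1 here only because 1 is the lowest id in the data. As a minimal sketch of the rule as literally stated in the question (leave single-record groups alone, otherwise keep only id == 1), one option is to look at each row's group size. The names group_size and result are illustrative and not from the original post:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 1, 6, 6],
        "location": [456, 456, 456, 943],
        "method": ["Phone", "OS", "OS", "Specialist"],
    }
)

# number of rows sharing each (location, method) pair, aligned to df's rows
group_size = df.groupby(["location", "method"])["id"].transform("size")

# keep rows that are alone in their group, or whose id is 1
result = df[(group_size == 1) | (df["id"] == 1)]
print(result)
```

Unlike the sorting approach, this keeps the original row order, and it would drop every row of a duplicated group that contains no id == 1. Whether that is the desired behavior depends on the data.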

CodePudding user response:

Firstly, you can use groupby on multiple columns (location and method). Then on the "grouped" dataframes, you can select one of the rows as you need (here I sort by "id", and select the first one).

import pandas as pd

# generate the dataframe
df = pd.DataFrame(
    [
        [1, 456, "Phone"],
        [1, 456, "OS"],
        [6, 456, "OS"],
        [6, 943, "Specialist"],
    ],
    columns=["id", "location", "method"],
)

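# groupby, sort each group by id, and select its first row
# ("id" is selected explicitly so the result does not depend on whether
# apply passes the grouping columns, which differs across pandas versions)
n_df = df.groupby(["location", "method"])[["id"]].apply(
    lambda x: x.sort_values("id").iloc[0]
)

# move the group keys back into columns and restore the original column order
n_df = n_df.reset_index()[["id", "location", "method"]]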

n_df
   id  location      method
0   1       456          OS
1   1       456       Phone
2   6       943  Specialist
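
The same "smallest id per group" selection can also be sketched without apply, by sorting on id and then dropping repeated (location, method) pairs. This is an alternative rather than the answer's own method, and deduped is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 456, "Phone"],
        [1, 456, "OS"],
        [6, 456, "OS"],
        [6, 943, "Specialist"],
    ],
    columns=["id", "location", "method"],
)

# sort so the smallest id comes first, keep the first row seen for each
# (location, method) pair, then restore the original row order
deduped = (
    df.sort_values("id", kind="stable")
    .drop_duplicates(subset=["location", "method"], keep="first")
    .sort_index()
)
print(deduped)
```

Here the surviving rows keep their original index labels and order. Add .reset_index(drop=True) if a fresh 0..n-1 index is preferred.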