Home > OS >  Pandas - Grouping by column value and detecting values
Pandas - Grouping by column value and detecting values

Time:06-16

I have a data frame as such:

data = [['123', 'Yes', 'No', 'No'], ['123', 'No', 'Yes', 'No'],['1234', 'No', 'Yes', 'No']]
df = pd.DataFrame(data, columns=['ID', 'Object_1', 'Object_2', 'Object_3'])
ID Object_1 Object_2 Object_3
123 Yes No No
123 No Yes No
1234 No Yes No

I want to group by the ID column though the values for Object_1, Object_2 and Object_3 may be different. If the value 'Yes' exists, I would like that remain in the final grouped dataframe.

Desired output would be a dataframe with the following values:

ID Object_1 Object_2 Object_3
123 Yes Yes No
1234 No Yes No

CodePudding user response:

You can take advantage of the fact that Yes is lexicographically sorted after No:

df.groupby('ID', as_index=False).max()

Output:

     ID Object_1 Object_2 Object_3
0   123      Yes      Yes       No
1  1234       No      Yes       No

more robust/generic way

You can use an ordered Categorical type to handle any values, even more than two (e.g, No/Maybe/Yes):

# convert to Categorical
df.update(df.filter(like='Object')
            .apply(pd.Categorical,
                   categories=['No', 'Yes'],
                   ordered=True))

# get max per group
df.groupby('ID', as_index=False).max()
  • Related