select rows that are equal in a column based on another Boolean column-CodePudding

I have the following data frame:

dataset = {
      'ID':[1,1,1,1,2,2,2,2],
     'Animal': ["cat","bat","cat","bat","monkey","monkey","bird","bird"],
    'Bool':[True,False,False,False,True,False,False,False],}

df = pd.DataFrame(dataset)
df.groupby(["ID"])

I would like to select for each grouped subset by ID, the row that has the column df["Animal"] equal to df["Animal"] from another row where df["Bool"] is True. The results would be stored in df["IS_Same"] as shown in the expected output.

Expected output:

  ID  Animal   Bool  Is_Same
0   1     cat   True    False
1   1     bat  False    False
2   1     cat  False     True
3   1     bat  False    False
4   2  monkey   True    False
5   2  monkey  False     True
6   2    bird  False    False
7   2    bird  False    False

Is there a way to do this easily without using a loop ?

CodePudding user response：

Here is one option - looking forward and backward within the group

import pandas as pd
dataset = {
      'ID':[1,1,1,1,2,2,2,2],
     'Animal': ["cat","bat","cat","bat","monkey","monkey","bird","bird"],
    'Bool':[True,False,False,False,True,False,False,False],}

df = pd.DataFrame(dataset)
df['Is_Same'] = df.groupby(['ID','Animal'])['Bool'].shift() | df.groupby(['ID','Animal'])['Bool'].shift(-1)

CodePudding user response：

Let us transform Bool with any per ID and Animal

df['Same'] = ~df['Bool'] & df.groupby(['ID', 'Animal'])['Bool'].transform('any')

   ID  Animal   Bool   Same
0   1     cat   True  False
1   1     bat  False  False
2   1     cat  False   True
3   1     bat  False  False
4   2  monkey   True  False
5   2  monkey  False   True
6   2    bird  False  False
7   2    bird  False  False

CodePudding user response：

You can try groupby and compare Animal column to the coresponding first True Animal

out = (df.groupby(["ID"])
       .apply(lambda g: g.assign(Is_Same=g['Animal'].eq(g.loc[g['Bool'], 'Animal'].item())))
       .pipe(lambda df: df.assign(Is_Same=df['Is_Same'].mask(df['Bool'], False))))

print(out)

   ID  Animal   Bool  Is_Same
0   1     cat   True    False
1   1     bat  False    False
2   1     cat  False     True
3   1     bat  False    False
4   2  monkey   True    False
5   2  monkey  False     True
6   2    bird  False    False
7   2    bird  False    False

CodePudding user response：

Here is a simple and comprehensive solution using group_by which generalize to any number of data in the group.

df = pd.DataFrame(dataset)
df['Is_same'] = False

df_grouped =  = df.groupby(['ID', 'Animal'])
for group_name, df_group in df_grouped:
    for row_index, row in df_group.iterrows():
        if(any(df_group[~df_group.index.isin([row_index])].Bool)):
            df.loc[row_index, 'Is_same'] = True

This gives us the expected output

   ID  Animal   Bool  Is_same
0   1     cat   True    False
1   1     bat  False    False
2   1     cat  False     True
3   1     bat  False    False
4   2  monkey   True    False
5   2  monkey  False     True
6   2    bird  False    False
7   2    bird  False    False