I have the following data frame:
dataset = {
'ID':[1,1,1,1,2,2,2,2],
'Animal': ["cat","bat","cat","bat","monkey","monkey","bird","bird"],
'Bool':[True,False,False,False,True,False,False,False],}
df = pd.DataFrame(dataset)
df.groupby(["ID"])
I would like to select for each grouped subset by ID, the row that has the column df["Animal"]
equal to df["Animal"]
from another row where df["Bool"]
is True
.
The results would be stored in df["IS_Same"]
as shown in the expected output.
Expected output:
ID Animal Bool Is_Same
0 1 cat True False
1 1 bat False False
2 1 cat False True
3 1 bat False False
4 2 monkey True False
5 2 monkey False True
6 2 bird False False
7 2 bird False False
Is there a way to do this easily without using a loop ?
CodePudding user response:
Here is one option - looking forward and backward within the group
import pandas as pd
dataset = {
'ID':[1,1,1,1,2,2,2,2],
'Animal': ["cat","bat","cat","bat","monkey","monkey","bird","bird"],
'Bool':[True,False,False,False,True,False,False,False],}
df = pd.DataFrame(dataset)
df['Is_Same'] = df.groupby(['ID','Animal'])['Bool'].shift() | df.groupby(['ID','Animal'])['Bool'].shift(-1)
CodePudding user response:
Let us transform
Bool
with any
per ID
and Animal
df['Same'] = ~df['Bool'] & df.groupby(['ID', 'Animal'])['Bool'].transform('any')
ID Animal Bool Same
0 1 cat True False
1 1 bat False False
2 1 cat False True
3 1 bat False False
4 2 monkey True False
5 2 monkey False True
6 2 bird False False
7 2 bird False False
CodePudding user response:
You can try groupby
and compare Animal
column to the coresponding first True
Animal
out = (df.groupby(["ID"])
.apply(lambda g: g.assign(Is_Same=g['Animal'].eq(g.loc[g['Bool'], 'Animal'].item())))
.pipe(lambda df: df.assign(Is_Same=df['Is_Same'].mask(df['Bool'], False))))
print(out)
ID Animal Bool Is_Same
0 1 cat True False
1 1 bat False False
2 1 cat False True
3 1 bat False False
4 2 monkey True False
5 2 monkey False True
6 2 bird False False
7 2 bird False False
CodePudding user response:
Here is a simple and comprehensive solution using group_by
which generalize to any number of data in the group.
df = pd.DataFrame(dataset)
df['Is_same'] = False
df_grouped = = df.groupby(['ID', 'Animal'])
for group_name, df_group in df_grouped:
for row_index, row in df_group.iterrows():
if(any(df_group[~df_group.index.isin([row_index])].Bool)):
df.loc[row_index, 'Is_same'] = True
This gives us the expected output
ID Animal Bool Is_same
0 1 cat True False
1 1 bat False False
2 1 cat False True
3 1 bat False False
4 2 monkey True False
5 2 monkey False True
6 2 bird False False
7 2 bird False False