I have searched multiple threads and cannot seem to figure this out. Any help would be appreciated. Suppose I have the following data set (I have simplified it for the sake of this question)
I want to group together all rows that contain the same value in COL1 then search in COL2 for the string "red" for those specific rows. If at least one of the rows in that group contains "red", then I want to keep all of those rows. Thus, for this dataset, the output should look like this:
Any help would be greatly appreciated. I am working in python. Thank you!
CodePudding user response:
df[df['col1'].isin(df[df['col2'] == 'red']['col1'])]
col1 col2
0 1 red
1 1 yellow
2 1 green
7 3 red
8 3 pink
9 3 green
CodePudding user response:
Using Groupby and checking if any rows in COL2 in group have color red
df[df.groupby("COL1").COL2.transform(lambda x: x.eq("red").any())]
Output
COL1 COL2
0 1 red
1 1 yellow
2 1 green
7 3 red
8 3 pink
9 3 green
Explanation
mask = df.groupby("COL1").COL2.transform(lambda x: x.eq("red").any())
mask is True if any items in group in COL2 have color red
0 True
1 True
2 True
3 False
4 False
5 False
6 False
7 True
8 True
9 True
CodePudding user response:
Do you mean 'red' has value 1 and 3 in Col1, therefore you would like to keep all rows with value 1 and 3 in Col1? You can try this:
df[df['Col1'].isin(df['Col1'][df['Col2']=='red'])]
To explain, I'm using a filter to extract the relevant rows:
filter = df['Col1'][df['Col2']=='red']
df1 = df[df['Col1'].isin(filter)]
print(df1)
Output
Col1 Col2
0 1 red
1 1 yellow
2 1 green
6 3 red
7 3 pink
8 3 green