Home > database >  How to keep multiple rows if they have the same value within a column AND at least one of them conta
How to keep multiple rows if they have the same value within a column AND at least one of them conta

Time:08-03

I have searched multiple threads and cannot seem to figure this out. Any help would be appreciated. Suppose I have the following data set (I have simplified it for the sake of this question)

enter image description here

I want to group together all rows that contain the same value in COL1 then search in COL2 for the string "red" for those specific rows. If at least one of the rows in that group contains "red", then I want to keep all of those rows. Thus, for this dataset, the output should look like this:

enter image description here

Any help would be greatly appreciated. I am working in python. Thank you!

CodePudding user response:

df[df['col1'].isin(df[df['col2'] == 'red']['col1'])]


    col1    col2
0   1   red
1   1   yellow
2   1   green
7   3   red
8   3   pink
9   3   green

CodePudding user response:

Using Groupby and checking if any rows in COL2 in group have color red

df[df.groupby("COL1").COL2.transform(lambda x: x.eq("red").any())]

Output

    COL1    COL2
0   1   red
1   1   yellow
2   1   green
7   3   red
8   3   pink
9   3   green

Explanation

mask = df.groupby("COL1").COL2.transform(lambda x: x.eq("red").any())

mask is True if any items in group in COL2 have color red

0     True
1     True
2     True
3    False
4    False
5    False
6    False
7     True
8     True
9     True

CodePudding user response:

Do you mean 'red' has value 1 and 3 in Col1, therefore you would like to keep all rows with value 1 and 3 in Col1? You can try this:

df[df['Col1'].isin(df['Col1'][df['Col2']=='red'])]

To explain, I'm using a filter to extract the relevant rows:

filter = df['Col1'][df['Col2']=='red']
df1 = df[df['Col1'].isin(filter)]
print(df1)

Output

  Col1    Col2
0    1     red
1    1  yellow
2    1   green
6    3     red
7    3   pink
8    3   green
  • Related