Home > Mobile >  Filter out rows that are the same in one column but have multiple values in another columns respecti
Filter out rows that are the same in one column but have multiple values in another columns respecti

Time:04-18

To say I have a dataframe with three columns like:

index A B C
1 foo One 1
2 foo Two 2
3 foo Three 3
4 bar One 2
5 bar One 1
6 num Two 3
7 num Three 3

In this case, how may I filter out the rows that have the same value in column B but more than one respective value in column C by using Python Pandas?

The rows that I need is 1, 2, 4, 5, 6 because "One" in column B has two corresponding values (1 and 2) in column C and "Two" in column B has two corresponding values as well. Eventually I want to group them by column A if possible.

CodePudding user response:

Not an optimised solution but will get your work done:

import pandas as pd


# create dataframe
df = pd.DataFrame([['foo','One',1],['foo','Two',2],['foo','Three',3],['bar','One',2], ['bar','One',1],['num','Two',3],['num','Three',3]], index = range(1,8), columns = ['A','B','C'])

# get the unique values present in column B
values = list(df['B'].unique())

result = pd.DataFrame()
# iterate through the unique values and for each unique value check the corresponding values in C
for val in values:
    unique_values = list(df[df['B'] == val]['C'].unique())
    # if the unique values in column C is more than 1, it satisfies your condition and hence can be added into your result dataFrame.
    if len(unique_values) > 1:
        result = result.append(df[df['B'] == val])

print(result)

The result is the rows 1, 2, 4, 5, 6.

Always show your work in the question.

CodePudding user response:

You can try groupby B column then filter by the value_counts of C column.

out = df.groupby('B').filter(lambda group: len(group['C'].value_counts()) > 1)
print(out)

   index    A    B  C
0      1  foo  One  1
1      2  foo  Two  2
3      4  bar  One  2
4      5  bar  One  1
5      6  num  Two  3
  • Related