Home > front end >  Compare dataframe row-wise in Python
Compare dataframe row-wise in Python

Time:10-17

I have following dataframe in python and would like to select rows where col2 has 'MSH' and 'MDR' for the identical col1 values and store it in a dataframe.

col1  col2  col3
C1234 MSH   fever
C1234 MDR   fieber
C4566 MSF   malaria

The expected output is:

col1  col2  col3
C1234 MSH   fever
C1234 MDR   fieber

I have created a dictionary using below line of code. However, is there a way to do it on a dataframe level.

terms = {k: [g['col2'].tolist(), g['col3'].tolist()] for k,g in df.groupby('col1')}

Any help is highly appreciated.

CodePudding user response:

Create sets and compare if greater or equal values per groups - then are selected groups also if contains another value(s):

print (df)
    col1 col2     col3
0  C1234  MSH    fever
1  C1234  MDR   fieber
2  C1234  MSF  malaria
3  C4566  MSH    fever
4  C4566  MDR   fieber
5  C4567  MDR  malaria

s = set(['MSH','MDR'])
df1 = df[df.groupby('col1')['col2'].transform(lambda x: set(x) >= s)]
print (df1)
    col1 col2     col3
0  C1234  MSH    fever
1  C1234  MDR   fieber
2  C1234  MSF  malaria
3  C4566  MSH    fever
4  C4566  MDR   fieber

Or compare if unique values per groups has only values from set by equal:

s = set(['MSH','MDR'])
df2 = df[df.groupby('col1')['col2'].transform(lambda x: set(x) == s)]
print (df2)
    col1 col2    col3
3  C4566  MSH   fever
4  C4566  MDR  fieber
  • Related