I have following dataframe in python and would like to select rows where col2 has 'MSH' and 'MDR' for the identical col1 values and store it in a dataframe.
col1 col2 col3
C1234 MSH fever
C1234 MDR fieber
C4566 MSF malaria
The expected output is:
col1 col2 col3
C1234 MSH fever
C1234 MDR fieber
I have created a dictionary using below line of code. However, is there a way to do it on a dataframe level.
terms = {k: [g['col2'].tolist(), g['col3'].tolist()] for k,g in df.groupby('col1')}
Any help is highly appreciated.
CodePudding user response:
Create sets and compare if greater or equal values per groups - then are selected groups also if contains another value(s):
print (df)
col1 col2 col3
0 C1234 MSH fever
1 C1234 MDR fieber
2 C1234 MSF malaria
3 C4566 MSH fever
4 C4566 MDR fieber
5 C4567 MDR malaria
s = set(['MSH','MDR'])
df1 = df[df.groupby('col1')['col2'].transform(lambda x: set(x) >= s)]
print (df1)
col1 col2 col3
0 C1234 MSH fever
1 C1234 MDR fieber
2 C1234 MSF malaria
3 C4566 MSH fever
4 C4566 MDR fieber
Or compare if unique values per groups has only values from set by equal:
s = set(['MSH','MDR'])
df2 = df[df.groupby('col1')['col2'].transform(lambda x: set(x) == s)]
print (df2)
col1 col2 col3
3 C4566 MSH fever
4 C4566 MDR fieber