How do I filter pandas dataframe based on whether the groups contain a certain column value?-CodePudding

I have the following data:

df = pd.DataFrame({
    'encounter' : [1, 1, 1, 2, 3, 3],
    'project_id' : ['A','A','A','B','C','C'],
    'datetime' : ['2017-01-18','2017-01-18','2017-01-18','2019-01-18','2020-01-18','2020-01-18'],
    'diagnosis' : ['F12','A11','B11', 'C11', 'F12', 'B22']
})

Each encounter is unique (and has a corresponding unique project_id and datetime) and denotes a clinician diagnosing a patient with 1 or more diagnoses. I'm trying to find all the groups that contain a particular diagnosis e.g. F12.

I don't want to just filter for F12; I want to groupby encounter ( /- project_id and datetime?) and filter for groups containing F12, so I can also see what other diagnoses are common with F12.

I'm unsure how to go about this - I've tried setting multi-indexes/different groupby approaches etc. but I'm not getting anywhere. For the above data, my desired output would be the same df excluding Row 3 as below:

Index	encounter	project_id	datetime	diagnosis
0	1	A	2017-01-18	F12
1	1	A	2017-01-18	A11
2	1	A	2017-01-18	B11
4	3	C	2020-01-18	F12
5	3	C	2020-01-18	B22

CodePudding user response：

Filter diagnosis for F12 and get matched encounter values and then again filter original column encounter by Series.isin in boolean indexing:

df = df[df['encounter'].isin(df.loc[df['diagnosis'].eq('F12'), 'encounter'])]
print (df)
   encounter project_id    datetime diagnosis
0          1          A  2017-01-18       F12
1          1          A  2017-01-18       A11
2          1          A  2017-01-18       B11
4          3          C  2020-01-18       F12
5          3          C  2020-01-18       B22

CodePudding user response：

You can use GroupBy.transform to check if any value is F12 in the group and perform boolean indexing:

df[df['diagnosis'].eq('F12').groupby(df['encounter']).transform('any')]

Or, alternatively with filter:

df.groupby('encounter').filter(lambda d: d['diagnosis'].eq('F12').any())

output:

   encounter project_id    datetime diagnosis
0          1          A  2017-01-18       F12
1          1          A  2017-01-18       A11
2          1          A  2017-01-18       B11
4          3          C  2020-01-18       F12
5          3          C  2020-01-18       B22

CodePudding user response：

I interpreted as you need all encounters which is associated in any way to diagnosis 'F12' in the problem. See if this works.

df[df['project_id'].isin(df.loc[df['diagnosis'].eq('F12'), 'project_id']) | df['datetime'].isin(df.loc[df['diagnosis'].eq('F12'), 'datetime'])]