Home > other >  How do I filter pandas dataframe based on whether the groups contain a certain column value?
How do I filter pandas dataframe based on whether the groups contain a certain column value?

Time:04-13

I have the following data:

df = pd.DataFrame({
    'encounter' : [1, 1, 1, 2, 3, 3],
    'project_id' : ['A','A','A','B','C','C'],
    'datetime' : ['2017-01-18','2017-01-18','2017-01-18','2019-01-18','2020-01-18','2020-01-18'],
    'diagnosis' : ['F12','A11','B11', 'C11', 'F12', 'B22']
})

Each encounter is unique (and has a corresponding unique project_id and datetime) and denotes a clinician diagnosing a patient with 1 or more diagnoses. I'm trying to find all the groups that contain a particular diagnosis e.g. F12.

I don't want to just filter for F12; I want to groupby encounter ( /- project_id and datetime?) and filter for groups containing F12, so I can also see what other diagnoses are common with F12.

I'm unsure how to go about this - I've tried setting multi-indexes/different groupby approaches etc. but I'm not getting anywhere. For the above data, my desired output would be the same df excluding Row 3 as below:

Index encounter project_id datetime diagnosis
0 1 A 2017-01-18 F12
1 1 A 2017-01-18 A11
2 1 A 2017-01-18 B11
4 3 C 2020-01-18 F12
5 3 C 2020-01-18 B22

CodePudding user response:

Filter diagnosis for F12 and get matched encounter values and then again filter original column encounter by Series.isin in boolean indexing:

df = df[df['encounter'].isin(df.loc[df['diagnosis'].eq('F12'), 'encounter'])]
print (df)
   encounter project_id    datetime diagnosis
0          1          A  2017-01-18       F12
1          1          A  2017-01-18       A11
2          1          A  2017-01-18       B11
4          3          C  2020-01-18       F12
5          3          C  2020-01-18       B22

CodePudding user response:

You can use GroupBy.transform to check if any value is F12 in the group and perform boolean indexing:

df[df['diagnosis'].eq('F12').groupby(df['encounter']).transform('any')]

Or, alternatively with filter:

df.groupby('encounter').filter(lambda d: d['diagnosis'].eq('F12').any())

output:

   encounter project_id    datetime diagnosis
0          1          A  2017-01-18       F12
1          1          A  2017-01-18       A11
2          1          A  2017-01-18       B11
4          3          C  2020-01-18       F12
5          3          C  2020-01-18       B22

CodePudding user response:

I interpreted as you need all encounters which is associated in any way to diagnosis 'F12' in the problem. See if this works.

df[df['project_id'].isin(df.loc[df['diagnosis'].eq('F12'), 'project_id']) | df['datetime'].isin(df.loc[df['diagnosis'].eq('F12'), 'datetime'])] 
  • Related