I have the following data:
df = pd.DataFrame({
'encounter' : [1, 1, 1, 2, 3, 3],
'project_id' : ['A','A','A','B','C','C'],
'datetime' : ['2017-01-18','2017-01-18','2017-01-18','2019-01-18','2020-01-18','2020-01-18'],
'diagnosis' : ['F12','A11','B11', 'C11', 'F12', 'B22']
})
Each encounter is unique (and has a corresponding unique project_id
and datetime
) and denotes a clinician diagnosing a patient with 1 or more diagnoses. I'm trying to find all the groups that contain a particular diagnosis e.g. F12.
I don't want to just filter for F12; I want to groupby encounter ( /- project_id
and datetime
?) and filter for groups containing F12, so I can also see what other diagnoses are common with F12.
I'm unsure how to go about this - I've tried setting multi-indexes/different groupby approaches etc. but I'm not getting anywhere. For the above data, my desired output would be the same df excluding Row 3 as below:
Index | encounter | project_id | datetime | diagnosis |
---|---|---|---|---|
0 | 1 | A | 2017-01-18 | F12 |
1 | 1 | A | 2017-01-18 | A11 |
2 | 1 | A | 2017-01-18 | B11 |
4 | 3 | C | 2020-01-18 | F12 |
5 | 3 | C | 2020-01-18 | B22 |
CodePudding user response:
Filter diagnosis
for F12
and get matched encounter
values and then again filter original column encounter
by Series.isin
in boolean indexing
:
df = df[df['encounter'].isin(df.loc[df['diagnosis'].eq('F12'), 'encounter'])]
print (df)
encounter project_id datetime diagnosis
0 1 A 2017-01-18 F12
1 1 A 2017-01-18 A11
2 1 A 2017-01-18 B11
4 3 C 2020-01-18 F12
5 3 C 2020-01-18 B22
CodePudding user response:
You can use GroupBy.transform
to check if any value is F12 in the group and perform boolean indexing:
df[df['diagnosis'].eq('F12').groupby(df['encounter']).transform('any')]
Or, alternatively with filter
:
df.groupby('encounter').filter(lambda d: d['diagnosis'].eq('F12').any())
output:
encounter project_id datetime diagnosis
0 1 A 2017-01-18 F12
1 1 A 2017-01-18 A11
2 1 A 2017-01-18 B11
4 3 C 2020-01-18 F12
5 3 C 2020-01-18 B22
CodePudding user response:
I interpreted as you need all encounters which is associated in any way to diagnosis 'F12' in the problem. See if this works.
df[df['project_id'].isin(df.loc[df['diagnosis'].eq('F12'), 'project_id']) | df['datetime'].isin(df.loc[df['diagnosis'].eq('F12'), 'datetime'])]