As in my previous question, I am trying to group by a particular admitting grouper code, but this time I also want to check several other conditions before setting the category.
For instance, I have the following data frame:
MemberID | AdmittingCode | LOS | Episode |
---|---|---|---|
1 | a | 5 | 3 |
2 | a | 10 | 6 |
1 | b | 2 | 3 |
2 | b | 1 | 6 |
Now, instead of only comparing against the mean to set the category, I want to check whether the mean is 2 times the median and whether the Episode count is less than 5. By this I mean:
For AdmittingCode a, the mean LOS is 7.5 ((5 + 10) / 2). Initially, the logic was to set the category to 0 for MemberID 1 with AdmittingCode a, since its LOS (5) is less than 7.5, and to 1 for MemberID 2 with AdmittingCode a, since its LOS (10) is more than 7.5, using the following code:
m = df.groupby('AdmittingCode')['LOS'].transform('mean').lt(df['LOS'])
df['LOSCategory'] = m.astype(int)
But now I would like to check two more conditions before setting the category column:
- I would like to check whether the mean is not 2 times the median
- I would also like to check whether the Episode count is less than 5
If both of the above conditions are met, I want the category to be set to 0; otherwise it should be 1 (including cases where only one of the two conditions is met).
Note: Here, the mean and the median are computed per AdmittingCode, so a has a different mean from b, as in the previous question.
With this logic, for AdmittingCode a the mean is 7.5 and the median is also 7.5. For MemberID 1 with AdmittingCode a, LOS 5 and Episode count 3, the category is set to 0, because the mean is not 2 times the median and the Episode count is less than 5.
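To make the per-group numbers above easy to check, here is a minimal sketch that rebuilds the example data frame and computes the mean and median of LOS for each AdmittingCode. The result column names `mean_LOS` and `median_LOS` are only illustrative and are not part of the original data.

```python
import pandas as pd

# Example data from the table above
df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})

# Per-AdmittingCode mean and median of LOS (output names are illustrative)
stats = df.groupby("AdmittingCode")["LOS"].agg(mean_LOS="mean", median_LOS="median")
print(stats)
```

With only two rows per group, the mean and the median coincide: 7.5 for a and 1.5 for b, so the "mean is 2 times the median" check does not trigger for either group.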
Finally, the following data frame is expected:
MemberID | AdmittingCode | LOS | Episode | LOSCategory |
---|---|---|---|---|
1 | a | 5 | 3 | 0 |
2 | a | 10 | 6 | 1 |
1 | b | 2 | 3 | 0 |
2 | b | 1 | 6 | 1 |
CodePudding user response:
>>> df['LOSCategory'] = (df.groupby("AdmittingCode")
...     .apply(lambda x: x.apply(lambda xx: (x.LOS.mean() >= x.LOS.median()*2) | (xx.Episode >= 5), axis=1))
...     .astype(int)
...     .to_list()
... )
>>> df
   MemberID AdmittingCode  LOS  Episode  LOSCategory
0         1             a    5        3            0
1         2             a   10        6            1
2         1             b    2        3            0
3         2             b    1        6            1
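Note that `.to_list()` assigns the values positionally, in group order, so the snippet above relies on the rows already being sorted by AdmittingCode. A possible vectorized alternative, sketched here under the same interpretation as above (category 1 when the group mean is at least 2 times the group median, or when Episode is 5 or more), uses `transform` so that the result stays aligned with the original index. The variable names are illustrative.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})

# Broadcast each group's mean and median back onto its rows
grouped_los = df.groupby("AdmittingCode")["LOS"]
mean_los = grouped_los.transform("mean")
median_los = grouped_los.transform("median")

# 1 if either check fails: mean >= 2 * median, or Episode >= 5; otherwise 0
failed = mean_los.ge(2 * median_los) | df["Episode"].ge(5)
df["LOSCategory"] = failed.astype(int)
```

Because `transform` returns a Series with the same index as `df`, this should give the same categories even if the rows are not sorted by AdmittingCode.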
CodePudding user response:
You need to write a function func that returns the desired result.
LOSCategory = df.apply(
    lambda row: func(row['MemberID'], row['AdmittingCode'], row['LOS'], row['Episode']),
    axis=1)
df['LOSCategory'] = LOSCategory
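The answer leaves `func` undefined. One way it could look is sketched below, assuming the rules from the question (0 when the group mean is less than 2 times the group median and Episode is less than 5, otherwise 1). The `group_stats` lookup table and the body of `func` are hypothetical, since a single row does not carry its group's mean and median.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})

# Hypothetical lookup table: per-AdmittingCode mean and median of LOS,
# computed once so func does not regroup the frame for every row
group_stats = df.groupby("AdmittingCode")["LOS"].agg(["mean", "median"])

def func(member_id, admitting_code, los, episode):
    """Return 0 when both checks pass, otherwise 1.

    member_id and los are unused here; they are kept only to match
    the call signature used in the answer.
    """
    mean_los = group_stats.loc[admitting_code, "mean"]
    median_los = group_stats.loc[admitting_code, "median"]
    mean_ok = mean_los < 2 * median_los
    episode_ok = episode < 5
    return 0 if (mean_ok and episode_ok) else 1

df["LOSCategory"] = df.apply(
    lambda row: func(row["MemberID"], row["AdmittingCode"], row["LOS"], row["Episode"]),
    axis=1,
)
```

Row-wise `apply` is easy to read but slower than a `transform`-based approach on large frames, so this is best suited to small data or to rules that are hard to vectorize.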