Pandas groupby a column and check multiple conditions to create a new categorical column

Similar to a previous question of mine, I am trying to group by a particular admitting grouper code, this time also checking several other conditions before setting the category.

For instance, I have the following data frame:

MemberID	AdmittingCode	LOS	Episode
1	a	5	3
2	a	10	6
1	b	2	3
2	b	1	6
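
For anyone who wants to reproduce this, the table above can be rebuilt as a DataFrame roughly like this. It is a minimal sketch that assumes all four columns are plain integers and strings, as they appear in the table:

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})
print(df)
```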

Now, instead of only comparing each LOS against the mean to set the category, I want to compare the mean against 2 times the median and also check whether the episode count is less than 5. By this I mean:

For AdmittingCode a, the mean is 7.5 ((5 + 10) / 2). Initially, the logic was to set Category to 0 for MemberID 1 with AdmittingCode a, since its LOS (5) is less than 7.5, and to 1 for MemberID 2 with AdmittingCode a, because its LOS (10) is more than 7.5, using the following code:

m = df.groupby('AdmittingCode')['LOS'].transform('mean').lt(df['LOS'])
df['LOSCategory'] = m.astype(int)
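
To make the starting point concrete, here is that earlier mean-only rule run end to end on the sample data. This is only a sketch; the DataFrame construction is an assumption based on the table above.

```python
import pandas as pd

df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})

# transform('mean') broadcasts each group's mean back onto its rows,
# so the comparison below is row-aligned with df.
group_mean = df.groupby("AdmittingCode")["LOS"].transform("mean")

# True where the group mean is less than the row's LOS, i.e. LOS is above the mean.
m = group_mean.lt(df["LOS"])
df["LOSCategory"] = m.astype(int)
print(df)
```

With this rule the group means are 7.5 for a and 1.5 for b, so the categories come out as 0, 1, 1, 0.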

But now I would like to check two more conditions before setting the category column:

I would like to check whether the mean is not 2 times the median
I would also like to check whether the Episode count is less than 5

If both of the above conditions are met, I want the category to be set to 0; otherwise it should be 1 (including cases where only one of the two conditions is met).

Note: the mean and the median are computed per AdmittingCode, so a has a different mean from b, as in the previous question.

With this logic, for AdmittingCode a both the mean and the median are 7.5. For MemberID 1 with AdmittingCode a, LOS 5 and Episode count 3, the category would be set to 0: the mean is not 2 times the median, and the episode count is less than 5.
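
The per-code figures used in that walk-through can be checked with a short sketch like the one below. It assumes that "the mean is 2 times the median" is read as mean >= 2 * median; the column name mean_is_double_median is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})

# One row of statistics per AdmittingCode.
stats = df.groupby("AdmittingCode")["LOS"].agg(["mean", "median"])

# Assumed reading of the rule: the mean is at least twice the median.
stats["mean_is_double_median"] = stats["mean"] >= 2 * stats["median"]
print(stats)
```

For this sample both groups have equal mean and median (7.5 for a, 1.5 for b), so the flag is False everywhere and the category is decided by the Episode check alone.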

Finally, the following data frame is expected:

MemberID	AdmittingCode	LOS	Episode	LOSCategory
1	a	5	3	0
2	a	10	6	1
1	b	2	3	0
2	b	1	6	1

CodePudding user response：

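>>> # 1 if the group mean is at least twice the group median, or if Episode >= 5
>>> # (to_list() assigns by position, so df must already be ordered by AdmittingCode)
>>> df['LOSCategory'] = (
...     df.groupby("AdmittingCode")
...       .apply(lambda x: x.apply(
...           lambda xx: (x.LOS.mean() >= x.LOS.median() * 2) | (xx.Episode >= 5),
...           axis=1))
...       .astype(int)
...       .to_list()
... )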


>>> df

   MemberID  AdmittingCode  LOS  Episode  LOSCategory
0          1             a    5        3            0
1          2             a   10        6            1
2          1             b    2        3            0
3          2             b    1        6            1
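
An alternative to the nested apply is to use transform, which keeps the result aligned to the original index instead of relying on row order. This is a sketch of a different technique from the answer above, not a restatement of it, and it makes the same assumption that "2 times the median" means mean >= 2 * median. The names los, mean_is_double and high_episode are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})

los = df.groupby("AdmittingCode")["LOS"]

# Group mean and median broadcast back to every row of the group.
mean_is_double = los.transform("mean") >= 2 * los.transform("median")

# Second condition, checked per row.
high_episode = df["Episode"] >= 5

# Category is 1 if either condition fails the "keep as 0" test, else 0.
df["LOSCategory"] = (mean_is_double | high_episode).astype(int)
print(df)
```

Because both boolean Series share the index of df, the assignment works even if the rows are not sorted by AdmittingCode.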

CodePudding user response：

You need to write a function func that returns the desired result, then apply it to each row:

# func is a user-defined function (not shown here) that returns 0 or 1 for one row
LOSCategory = df.apply(
    lambda row: func(row['MemberID'], row['AdmittingCode'], row['LOS'], row['Episode']),
    axis=1)
df['LOSCategory'] = LOSCategory
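
The answer leaves func undefined. One possible shape for it is sketched below; this body is hypothetical and not from the answer. It assumes the per-code mean and median are precomputed in a helper table called stats (an illustrative name), and that "2 times the median" means mean >= 2 * median.

```python
import pandas as pd

df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
    "Episode": [3, 6, 3, 6],
})

# Per-code LOS statistics, computed once so func does not regroup on every row.
stats = df.groupby("AdmittingCode")["LOS"].agg(["mean", "median"])


def func(member_id, admitting_code, los, episode):
    """Hypothetical rule: return 0 only if mean < 2 * median and episode < 5."""
    # member_id and los are unused; they are kept to match the call in the answer.
    mean = stats.loc[admitting_code, "mean"]
    median = stats.loc[admitting_code, "median"]
    mean_not_double = mean < 2 * median
    few_episodes = episode < 5
    return 0 if (mean_not_double and few_episodes) else 1


df["LOSCategory"] = df.apply(
    lambda row: func(row["MemberID"], row["AdmittingCode"], row["LOS"], row["Episode"]),
    axis=1,
)
print(df)
```

A row-wise apply like this is easy to read but tends to be slower than column-wise operations on large frames, so it is best suited to rules that are awkward to vectorize.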