I am trying to assign a category to each row by grouping members that share the same admitting code and comparing each row's length of stay (LOS) with the mean LOS of its group.
By this I mean, I have the following data frame:
MemberID | AdmittingCode | LOS |
---|---|---|
1 | a | 5 |
2 | a | 10 |
1 | b | 2 |
2 | b | 1 |
I want to group the data frame by admitting code and take the mean of LOS for each admitting code. If a row's LOS is less than the mean for its admitting code, the category should be set to '0'; otherwise it should be '1'.
For example, admitting code 'a' has LOS values of 5 and 10, so the mean is 7.5. The row with MemberID 1, AdmittingCode 'a' and LOS 5 is therefore set to category 0. Applying the same logic to every row gives the following data frame:
MemberID | AdmittingCode | LOS | LOSCategory |
---|---|---|---|
1 | a | 5 | 0 |
2 | a | 10 | 1 |
1 | b | 2 | 1 |
2 | b | 1 | 0 |
CodePudding user response:
Use GroupBy.transform with mean and compare the result with the original column:
# Broadcast each group's mean LOS back to its rows, then flag rows whose LOS exceeds that mean
m = df.groupby('AdmittingCode')['LOS'].transform('mean').lt(df['LOS'])
df['LOSCategory'] = m.astype(int)
print(df)
   MemberID AdmittingCode  LOS  LOSCategory
0         1             a    5            0
1         2             a   10            1
2         1             b    2            1
3         2             b    1            0
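For anyone who wants to reproduce this, here is a minimal self-contained sketch that builds the sample data from the question and applies the same transform-and-compare step. The intermediate name `group_mean` is only an illustrative choice and is not part of the original snippet.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MemberID": [1, 2, 1, 2],
    "AdmittingCode": ["a", "a", "b", "b"],
    "LOS": [5, 10, 2, 1],
})

# transform("mean") returns a Series aligned with df's index,
# holding the mean LOS of each row's AdmittingCode group.
group_mean = df.groupby("AdmittingCode")["LOS"].transform("mean")

# True where the group mean is strictly less than the row's LOS.
m = group_mean.lt(df["LOS"])
df["LOSCategory"] = m.astype(int)
print(df)
```

Because `transform` keeps the original row order and index, the comparison lines up row by row without any merge or join.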
Or, if you need the categories as the strings '1' and '0', use either of the following:
df['LOSCategory'] = m.astype(int).astype(str)
# alternative with NumPy (requires: import numpy as np)
df['LOSCategory'] = np.where(m, '1', '0')
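One edge case is worth checking: the question says a LOS less than the mean gets '0' and anything else gets '1', while the strict comparison with `lt` puts a row whose LOS equals the group mean into category '0'. The sample data contains no such tie, so both readings give the same result there. The sketch below uses hypothetical data (group 'c' is invented for illustration) to show the difference, with `ge` as one possible way to send ties to '1'.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row's LOS equals the group mean (4.0).
df = pd.DataFrame({
    "MemberID": [1, 2, 3],
    "AdmittingCode": ["c", "c", "c"],
    "LOS": [2, 4, 6],
})
group_mean = df.groupby("AdmittingCode")["LOS"].transform("mean")

strict = group_mean.lt(df["LOS"])     # mean < LOS: a tie maps to '0'
inclusive = df["LOS"].ge(group_mean)  # LOS >= mean: a tie maps to '1'

df["LOSCategory_strict"] = np.where(strict, "1", "0")
df["LOSCategory_inclusive"] = np.where(inclusive, "1", "0")
print(df)
```

Which variant is right depends on how ties should be treated in your data.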