I have a pandas data frame like given below
Id1 YEAR CLAIM_STATUS no_of_claims
1 2019-01 4 1
1 2019-01 5 1
1 2019-02 4 1
1 2019-02 5 1
1 2019-03 4 6
1 2019-03 5 2
1 2019-04 5 1
1 2019-04 6 1
1 2019-05 5 2
1 2019-06 4 1
1 2019-06 5 1
Here claim_STATUS status 4 means claim is rejected I need to find percentage of claims rejected for each Id1 at each year where percentage of claims rejected is calculated like
claim-status-4/(claim-status-4 claim-status-5 claim-status-6)
in above table claim rejected percentage is (1/(1 6 3))*100 = 10%
I need to convert above dataframe into output given below:
Id1 YEAR_MO % of claims rejected
1 2019-01 50%
1 2019-02 50%
1 2019-03 75%
1 2019-04 0%
1 2019-05 0%
1 2019-06 50%
CodePudding user response:
Test column claim_STATUS
and assign back, aggregate sum
and last divide columns:
print (df.assign(CLAIM_STATUS = df['CLAIM_STATUS'].eq(4).mul(df['no_of_claims'])))
Id1 YEAR CLAIM_STATUS no_of_claims
0 1 2019-01 1 1
1 1 2019-01 0 1
2 1 2019-02 1 1
3 1 2019-02 0 1
4 1 2019-03 6 6
5 1 2019-03 0 2
6 1 2019-04 0 1
7 1 2019-04 0 1
8 1 2019-05 0 2
9 1 2019-06 1 1
10 1 2019-06 0 1
df = (df.assign(CLAIM_STATUS = df['CLAIM_STATUS'].eq(4).mul(df['no_of_claims']))
.groupby(['Id1','YEAR'], as_index=False).sum())
print (df)
Id1 YEAR CLAIM_STATUS no_of_claims
0 1 2019-01 1 2
1 1 2019-02 1 2
2 1 2019-03 6 8
3 1 2019-04 0 2
4 1 2019-05 0 2
5 1 2019-06 1 2
#used pop for divide and drop columns
df['% of claims rejected'] = df.pop('CLAIM_STATUS').div(df.pop('no_of_claims')).mul(100)
print (df)
Id1 YEAR % of claims rejected
0 1 2019-01 50.0
1 1 2019-02 50.0
2 1 2019-03 75.0
3 1 2019-04 0.0
4 1 2019-05 0.0
5 1 2019-06 50.0
EDIT:
For better debugging si possible use:
df = (df.assign(new = df['CLAIM_STATUS'].eq(4).mul(df['no_of_claims']))
.groupby(['Id1','YEAR'], as_index=False).sum())
print (df)
Id1 YEAR CLAIM_STATUS no_of_claims new
0 1 2019-01 9 2 1
1 1 2019-02 9 2 1
2 1 2019-03 9 8 6
3 1 2019-04 11 2 0
4 1 2019-05 5 2 0
5 1 2019-06 9 2 1
df['% of claims rejected'] = df['new'].div(df['no_of_claims']).mul(100)
print (df)
Id1 YEAR CLAIM_STATUS no_of_claims new % of claims rejected
0 1 2019-01 9 2 1 50.0
1 1 2019-02 9 2 1 50.0
2 1 2019-03 9 8 6 75.0
3 1 2019-04 11 2 0 0.0
4 1 2019-05 5 2 0 0.0
5 1 2019-06 9 2 1 50.0
df = df.drop(['no_of_claims','CLAIM_STATUS','new'], axis=1)
print (df)
Id1 YEAR % of claims rejected
0 1 2019-01 50.0
1 1 2019-02 50.0
2 1 2019-03 75.0
3 1 2019-04 0.0
4 1 2019-05 0.0
5 1 2019-06 50.0