Home > Software engineering >  Finding percentage of rejection in pandas dataframe
Finding percentage of rejection in pandas dataframe

Time:05-06

I have a pandas data frame like given below

Id1     YEAR    CLAIM_STATUS   no_of_claims
 1  2019-01       4               1
 1  2019-01       5               1
 1  2019-02       4               1
 1  2019-02       5               1
 1  2019-03       4               6
 1  2019-03       5               2
 1  2019-04       5               1
 1  2019-04       6               1
 1  2019-05       5               2
 1  2019-06       4               1
 1  2019-06       5               1

Here claim_STATUS status 4 means claim is rejected I need to find percentage of claims rejected for each Id1 at each year where percentage of claims rejected is calculated like

claim-status-4/(claim-status-4 claim-status-5 claim-status-6)

in above table claim rejected percentage is (1/(1 6 3))*100 = 10%

I need to convert above dataframe into output given below:

Id1   YEAR_MO    % of claims rejected
1   2019-01         50%
1   2019-02         50%
1   2019-03         75%
1   2019-04         0%
1   2019-05         0%
1   2019-06         50%

CodePudding user response:

Test column claim_STATUS and assign back, aggregate sum and last divide columns:

print (df.assign(CLAIM_STATUS = df['CLAIM_STATUS'].eq(4).mul(df['no_of_claims'])))
    Id1     YEAR  CLAIM_STATUS  no_of_claims
0     1  2019-01             1             1
1     1  2019-01             0             1
2     1  2019-02             1             1
3     1  2019-02             0             1
4     1  2019-03             6             6
5     1  2019-03             0             2
6     1  2019-04             0             1
7     1  2019-04             0             1
8     1  2019-05             0             2
9     1  2019-06             1             1
10    1  2019-06             0             1

df = (df.assign(CLAIM_STATUS = df['CLAIM_STATUS'].eq(4).mul(df['no_of_claims']))
        .groupby(['Id1','YEAR'], as_index=False).sum())

print (df)
   Id1     YEAR  CLAIM_STATUS  no_of_claims
0    1  2019-01             1             2
1    1  2019-02             1             2
2    1  2019-03             6             8
3    1  2019-04             0             2
4    1  2019-05             0             2
5    1  2019-06             1             2

#used pop for divide and drop columns
df['% of claims rejected'] = df.pop('CLAIM_STATUS').div(df.pop('no_of_claims')).mul(100)
print (df)
   Id1     YEAR  % of claims rejected
0    1  2019-01                  50.0
1    1  2019-02                  50.0
2    1  2019-03                  75.0
3    1  2019-04                   0.0
4    1  2019-05                   0.0
5    1  2019-06                  50.0

EDIT:

For better debugging si possible use:

df = (df.assign(new = df['CLAIM_STATUS'].eq(4).mul(df['no_of_claims']))
        .groupby(['Id1','YEAR'], as_index=False).sum())

print (df)
   Id1     YEAR  CLAIM_STATUS  no_of_claims  new
0    1  2019-01             9             2    1
1    1  2019-02             9             2    1
2    1  2019-03             9             8    6
3    1  2019-04            11             2    0
4    1  2019-05             5             2    0
5    1  2019-06             9             2    1

df['% of claims rejected'] = df['new'].div(df['no_of_claims']).mul(100)
print (df)
   Id1     YEAR  CLAIM_STATUS  no_of_claims  new  % of claims rejected
0    1  2019-01             9             2    1                  50.0
1    1  2019-02             9             2    1                  50.0
2    1  2019-03             9             8    6                  75.0
3    1  2019-04            11             2    0                   0.0
4    1  2019-05             5             2    0                   0.0
5    1  2019-06             9             2    1                  50.0


df = df.drop(['no_of_claims','CLAIM_STATUS','new'], axis=1)
print (df)
   Id1     YEAR  % of claims rejected
0    1  2019-01                  50.0
1    1  2019-02                  50.0
2    1  2019-03                  75.0
3    1  2019-04                   0.0
4    1  2019-05                   0.0
5    1  2019-06                  50.0
  • Related