I am attempting to add a new variable value_% to the df which is a calculation using values from existing variables such that:
value_% = (value/TOTAL (right))*100
current:
Date Band value
03:Q1 <620 72.12
03:Q2 <620 71.41
03:Q3 <620 78.15
03:Q4 <620 90.06
desired:
Date Band value value_%
03:Q1 <620 72.12 7.34
03:Q2 <620 71.41 7.04
03:Q3 <620 78.15 7.35
03:Q4 <620 90.06 8.76
And is there a way i can repeat this for all 'Band' types in the df. Note the data is initially in wide form but i have converted it to long format.
df = pd.read_csv('https://raw.githubusercontent.com/GeorgeRobbin/GeorgeRobbin.github.io/main/Credit_Score.csv')
df = pd.melt(df3, id_vars=["Date"], var_name=("Band"))
CodePudding user response:
It might be best to compute the percentage fro df3
before the melt:
out = (df3
.set_index('Date')
.pipe(lambda d: d.div(d['TOTAL '], axis=0).mul(100))
.reset_index()
#.assign(**{'TOTAL ': df3['TOTAL ']}) # uncomment to keep original TOTAL
.melt(id_vars=["Date"], var_name=("Band"))
)
Output:
Date Band value
0 03:Q1 <620 7.449948
1 03:Q2 <620 7.038242
2 03:Q3 <620 7.346996
3 03:Q4 <620 8.759848
4 04:Q1 <620 12.072168
.. ... ... ...
427 19:Q4 TOTAL 100.000000
428 20:Q1 TOTAL 100.000000
429 20:Q2 TOTAL 100.000000
430 20:Q3 TOTAL 100.000000
431 20:Q4 TOTAL 100.000000
[432 rows x 3 columns]