How to calculate the ratio per columns in python?-CodePudding

I'm trying to calculate the ratio by columns in python.

import pandas as pd
import numpy as np

data={
    'category': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'value 1': [1, 1, 2, 5, 3, 4, 4, 8, 7],
    'value 2': [4, 2, 8, 5, 7, 9, 3, 4, 2]
}
data=pd.DataFrame(data)
data.set_index('category')
#        value 1    value 2
#category       
#       A      1          4
#       B      1          2
#       C      2          8
#       D      5          5
#       E      3          7
#       F      4          9
#       G      4          3
#       H      8          4
#       I      7          2

The expected results is as below:

#The sum of value 1: 35, value 2: 44
#The values in the first columns were diveded by 35, and the second columns were divded by 44
#            value 1    value 2
#category       
#       A      0.028      0.090
#       B      0.028      0.045
#       C      0.057      0.181
#       D      0.142      0.113
#       E      0.085      0.159
#       F      0.114      0.204
#       G      0.114      0.068
#       H      0.228      0.090
#       I      0.2        0.045

I tried to run the below code, but it returned NaN values:

data=data.apply(lambda x:x/data.sum())
data

I think that there are simpler methods for this job, but I cannot search the proper keywords..
How can I calculate the ratio in each column?

CodePudding user response：

The issue is that you did not make set_index permanent.

What I usually do to ensure I do correct things is using pipelines

data=pd.DataFrame(data)
dataf =(
 data
  .set_index('category')
  .transform(lambda d: d/d.sum())
)

print(dataf)

By piping commands, you get what you want. Note: I used transform instead of apply for speed.

They are easy to read, and less prune to mistake. Using inplace=True is discouraged in Pandas as the effects could be unpredictable.