I'm trying to calculate the ratio by columns in python.
import pandas as pd
import numpy as np
data={
'category': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
'value 1': [1, 1, 2, 5, 3, 4, 4, 8, 7],
'value 2': [4, 2, 8, 5, 7, 9, 3, 4, 2]
}
data=pd.DataFrame(data)
data.set_index('category')
# value 1 value 2
#category
# A 1 4
# B 1 2
# C 2 8
# D 5 5
# E 3 7
# F 4 9
# G 4 3
# H 8 4
# I 7 2
The expected results is as below:
#The sum of value 1: 35, value 2: 44
#The values in the first columns were diveded by 35, and the second columns were divded by 44
# value 1 value 2
#category
# A 0.028 0.090
# B 0.028 0.045
# C 0.057 0.181
# D 0.142 0.113
# E 0.085 0.159
# F 0.114 0.204
# G 0.114 0.068
# H 0.228 0.090
# I 0.2 0.045
I tried to run the below code, but it returned NaN values:
data=data.apply(lambda x:x/data.sum())
data
I think that there are simpler methods for this job, but I cannot search the proper keywords..
How can I calculate the ratio in each column?
CodePudding user response:
The issue is that you did not make set_index
permanent.
What I usually do to ensure I do correct things is using pipelines
data=pd.DataFrame(data)
dataf =(
data
.set_index('category')
.transform(lambda d: d/d.sum())
)
print(dataf)
By piping commands, you get what you want. Note: I used transform
instead of apply
for speed.
They are easy to read, and less prune to mistake. Using inplace=True
is discouraged in Pandas as the effects could be unpredictable.