I want to group and aggregate and then calculate a ratio based on values in a certain column.
In R it's pretty straight forward.
df = data.frame(a = c('a', 'a', 'b', 'b'),
b = c('x', 'y', 'x', 'y'),
value = 1:4)
df %>%
group_by(a) %>%
summarise(calc = value[b == 'x']/value[b == 'y']) ## (1/2) and (3/4)
In python I tried
df = pd.DataFrame({'a': ['a', 'a', 'b', 'b'],
'b': ['x', 'y', 'x', 'y'],
'value': [1, 2, 3, 4]})
df.groupby('a').agg(df[df['b'] == 'x'] / df[df['b'] == 'y'])
But its throwing errors
CodePudding user response:
You can try this:
import pandas as pd
import numpy as np
cond1 = lambda x: x['value'].loc[x['b'].eq('x')].to_numpy()
cond2 = lambda x: x['value'].loc[x['b'].eq('y')].to_numpy()
(df.groupby('a').apply(lambda x: (cond1(x) / cond2(x))[0])
.reset_index(name = 'result'))
a result
0 a 0.50
1 b 0.75
Or in a slightly different form we could do:
(df.groupby('a').apply(lambda x: np.divide(cond1(x), cond2(x)))
.reset_index(name = 'result')
.explode('result'))
a result
0 a 0.5
1 b 0.75
For this case, you can use a pivot
:
df.pivot(index='a',columns='b',values='value').pipe(lambda df: df.x/df.y)
Out[9]:
a
a 0.50
b 0.75
dtype: float64
For this specific use case, you do not need a groupby, as there is no aggregation really happening here:
temp = df.set_index('a')
b_x = temp.loc[temp.b.eq('x'), 'value']
b_y = temp.loc[temp.b.eq('y'), 'value']
b_x/b_y
Out[23]:
a
a 0.50
b 0.75
Name: value, dtype: float64
CodePudding user response:
You could do:
df.pivot('a', 'b', 'value').assign(calc = lambda x: x.x/x.y).reset_index()
b a x y calc
0 a 1 2 0.50
1 b 3 4 0.75