pandas group by and calculate result using values from the same column (R equivalent included)-CodePudding

I want to group and aggregate and then calculate a ratio based on values in a certain column.

In R it's pretty straight forward.

df = data.frame(a = c('a', 'a', 'b', 'b'), 
                b = c('x', 'y', 'x', 'y'), 
                value = 1:4)

df %>% 
  group_by(a) %>%
  summarise(calc = value[b == 'x']/value[b == 'y']) ## (1/2) and (3/4)

In python I tried

df = pd.DataFrame({'a': ['a', 'a', 'b', 'b'],
                   'b': ['x', 'y', 'x', 'y'],
                   'value': [1, 2, 3, 4]})

df.groupby('a').agg(df[df['b'] == 'x'] / df[df['b'] == 'y'])

But its throwing errors

CodePudding user response：

You can try this:

import pandas as pd
import numpy as np

cond1 = lambda x: x['value'].loc[x['b'].eq('x')].to_numpy()
cond2 = lambda x: x['value'].loc[x['b'].eq('y')].to_numpy()

(df.groupby('a').apply(lambda x: (cond1(x) / cond2(x))[0])
 .reset_index(name = 'result'))

   a  result
0  a    0.50
1  b    0.75

Or in a slightly different form we could do:

(df.groupby('a').apply(lambda x: np.divide(cond1(x), cond2(x)))
 .reset_index(name = 'result')
 .explode('result'))

   a result
0  a    0.5
1  b   0.75

For this case, you can use a pivot:

df.pivot(index='a',columns='b',values='value').pipe(lambda df: df.x/df.y)
Out[9]: 
a
a    0.50
b    0.75
dtype: float64

For this specific use case, you do not need a groupby, as there is no aggregation really happening here:

temp = df.set_index('a')
b_x = temp.loc[temp.b.eq('x'), 'value']
b_y = temp.loc[temp.b.eq('y'), 'value']
b_x/b_y
Out[23]: 
a
a    0.50
b    0.75
Name: value, dtype: float64

CodePudding user response：

You could do:

df.pivot('a', 'b', 'value').assign(calc = lambda x: x.x/x.y).reset_index()
 
b  a  x  y  calc
0  a  1  2  0.50
1  b  3  4  0.75