Home > Back-end >  pandas group by and calculate result using values from the same column (R equivalent included)
pandas group by and calculate result using values from the same column (R equivalent included)

Time:11-06

I want to group and aggregate and then calculate a ratio based on values in a certain column.

In R it's pretty straight forward.

df = data.frame(a = c('a', 'a', 'b', 'b'), 
                b = c('x', 'y', 'x', 'y'), 
                value = 1:4)

df %>% 
  group_by(a) %>%
  summarise(calc = value[b == 'x']/value[b == 'y']) ## (1/2) and (3/4)

In python I tried

df = pd.DataFrame({'a': ['a', 'a', 'b', 'b'],
                   'b': ['x', 'y', 'x', 'y'],
                   'value': [1, 2, 3, 4]})

df.groupby('a').agg(df[df['b'] == 'x'] / df[df['b'] == 'y'])

But its throwing errors

CodePudding user response:

You can try this:

import pandas as pd
import numpy as np

cond1 = lambda x: x['value'].loc[x['b'].eq('x')].to_numpy()
cond2 = lambda x: x['value'].loc[x['b'].eq('y')].to_numpy()

(df.groupby('a').apply(lambda x: (cond1(x) / cond2(x))[0])
 .reset_index(name = 'result'))

   a  result
0  a    0.50
1  b    0.75

Or in a slightly different form we could do:

(df.groupby('a').apply(lambda x: np.divide(cond1(x), cond2(x)))
 .reset_index(name = 'result')
 .explode('result'))

   a result
0  a    0.5
1  b   0.75

For this case, you can use a pivot:

df.pivot(index='a',columns='b',values='value').pipe(lambda df: df.x/df.y)
Out[9]: 
a
a    0.50
b    0.75
dtype: float64

For this specific use case, you do not need a groupby, as there is no aggregation really happening here:

temp = df.set_index('a')
b_x = temp.loc[temp.b.eq('x'), 'value']
b_y = temp.loc[temp.b.eq('y'), 'value']
b_x/b_y
Out[23]: 
a
a    0.50
b    0.75
Name: value, dtype: float64

CodePudding user response:

You could do:

df.pivot('a', 'b', 'value').assign(calc = lambda x: x.x/x.y).reset_index()
 
b  a  x  y  calc
0  a  1  2  0.50
1  b  3  4  0.75
  • Related