Note that this question is not asking whether pandas can apply functions to more than one column during aggregation. Here is an example:
The data frame:
A x y
foo 0 0
foo 1 1
foo 2 2
foo 3 3
bar 0 2
bar 2 3
bar 4 4
bar 6 5
I want to group this table by column A and compute the linear regression y = k*x + b on each group. So we want to achieve this:
A k b
foo 1.0 0.0
bar 0.5 2.0
I tried grouping by column A and using the aggregate method:
def f(s):
    # placeholder: this is where the regression would be computed
    pass

grouped = table.groupby('A')
grouped.aggregate(f)
However, I found that this method splits the table into Series and feeds each Series to the function f separately, so f cannot access two columns at the same time.
So, how can I write such an "aggregation" function that acts on multiple columns in a split-apply-combine style?
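The behaviour described in the question can be checked with a small sketch. It assumes a single grouping key and no extra arguments passed to aggregate. The names record_input and seen are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Data frame from the question.
table = pd.DataFrame({
    'A': ['foo'] * 4 + ['bar'] * 4,
    'x': [0, 1, 2, 3, 0, 2, 4, 6],
    'y': [0, 1, 2, 3, 2, 3, 4, 5],
})

seen = []  # hypothetical: type name of every object handed to the function

def record_input(values):
    # Remember what aggregate passed in, then return a scalar,
    # which is what aggregate expects from the function.
    seen.append(type(values).__name__)
    return values.sum()

result = table.groupby('A').aggregate(record_input)

# Inspect which kinds of objects the function received.
print(set(seen))
print(result)
```

Under these assumptions every call should receive one column of one group as a Series, which is why a regression that needs both x and y cannot be written this way.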
CodePudding user response:
Use groupby.apply with scipy.stats.linregress:
import pandas as pd
from scipy.stats import linregress

out = (df.groupby('A', as_index=False)
         .apply(lambda g: pd.Series(linregress(g['x'], g['y'])[:2],
                                    index=['k', 'b']))
       )
NB: the first two values returned by linregress (slope and intercept) are your k and b.
Output:
A k b
0 bar 0.5 2.0
1 foo 1.0 0.0
Solution with custom function:
import pandas as pd
from scipy.stats import linregress

def f(x):
    # x is the sub-frame for one group, so both columns are available
    t = linregress(x['x'], x['y'])
    return pd.Series({'k': t.slope, 'b': t.intercept})

df = df.groupby('A', as_index=False).apply(f)
print(df)
A k b
0 bar 0.5 2.0
1 foo 1.0 0.0
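If SciPy is not available, a similar result can probably be obtained with numpy.polyfit. This is a swapped-in technique, not what the answers above use, and only a minimal sketch: the function name fit_line is hypothetical, and selecting ['x', 'y'] before apply is an assumption made here so that each sub-frame contains only the two numeric columns.

```python
import numpy as np
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    'A': ['foo'] * 4 + ['bar'] * 4,
    'x': [0, 1, 2, 3, 0, 2, 4, 6],
    'y': [0, 1, 2, 3, 2, 3, 4, 5],
})

def fit_line(group):
    # Degree-1 least-squares fit. polyfit returns the coefficients
    # with the highest power first, i.e. slope then intercept.
    k, b = np.polyfit(group['x'], group['y'], 1)
    return pd.Series({'k': k, 'b': b})

# apply passes the whole sub-frame of each group to fit_line;
# reset_index turns the group key back into an ordinary column.
out = df.groupby('A')[['x', 'y']].apply(fit_line).reset_index()
print(out.round(6))
```

The values should agree with the k and b shown above up to floating-point rounding.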
CodePudding user response:
If you need to process multiple columns together, use GroupBy.apply:
def f(x):
    # x is a DataFrame holding the rows of one group
    print(x)

grouped = table.groupby('A').apply(f)