if cat_vars:
    df["static_cat"] = (
        df.groupby("group_col")
        .agg({i: "first" for i in cat_vars})
        .values.tolist()
    )
Error:
packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
926 func = _maybe_mangle_lambdas(func)
927
--> 928 result, how = self._aggregate(func, *args, **kwargs)
929 if how is None:
930 return result
packages\pandas\core\base.py in _aggregate(self, arg, *args, **kwargs)
355 obj.columns.intersection(keys)
356 ) != len(keys):
--> 357 raise SpecificationError("nested renamer is not supported")
358
359 from pandas.core.reshape.concat import concat
SpecificationError: nested renamer is not supported
A similar question is solved here, but I want it to be dynamic, i.e. the code should adapt to the elements in cat_vars.
For example, if cat_vars = [var1, var2], I can pass agg(var1="first", var2="first") to solve the problem. But what if it has three variables?
I really appreciate any help you can provide.
CodePudding user response:
Data:
import pandas as pd

df = pd.DataFrame({'group_col': [1, 1, 2, 2, 3],
                   'var1': range(5),
                   'var2': list('abcde')})
cat_vars = ['var1', 'var2']
If you need only one aggregate function, a simpler way is:
df1 = df.groupby("group_col")[cat_vars].first()
Or use named aggregation by unpacking a dictionary:
df1 = df.groupby("group_col").agg(**{i:(i, "first") for i in cat_vars})
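To address the "what if it has 3 vars?" part directly, here is a minimal sketch of the same dictionary unpacking with three names. The var3 column and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical sample data: var3 is added only for illustration.
df = pd.DataFrame({
    "group_col": [1, 1, 2, 2, 3],
    "var1": range(5),
    "var2": list("abcde"),
    "var3": [10.0, 20.0, 30.0, 40.0, 50.0],
})
cat_vars = ["var1", "var2", "var3"]

# One (source column, function) pair per name; the output column
# keeps the same name as the source column.
named = {name: (name, "first") for name in cat_vars}

# Unpacking the dict passes each entry as a keyword argument,
# so the call adapts to however many names cat_vars holds.
df1 = df.groupby("group_col").agg(**named)
print(df1)
```

Nothing in the call depends on the length of cat_vars, so adding or removing names only changes the list.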
Your solution seems to work too:
df1 = df.groupby("group_col").agg({i: "first" for i in cat_vars})
print (df1)
           var1 var2
group_col
1             0    a
2             2    c
3             4    e
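Since the dictionary form works on this sample data, one possible cause of the error (a guess, not confirmed by the question) is that cat_vars contains a name that is not a column of df. The traceback above shows the check comparing obj.columns.intersection(keys) with len(keys) right before the exception is raised. A rough sketch for finding such names before aggregating, where var3 is a hypothetical name that is deliberately missing:

```python
import pandas as pd

df = pd.DataFrame({
    "group_col": [1, 1, 2, 2, 3],
    "var1": range(5),
    "var2": list("abcde"),
})
# Hypothetical list: "var3" is intentionally not a column of df.
cat_vars = ["var1", "var2", "var3"]

# Split the requested names into those that exist in df and those that do not.
missing = [name for name in cat_vars if name not in df.columns]
present = [name for name in cat_vars if name in df.columns]
print("missing:", missing)

# Aggregate only the columns that actually exist.
df1 = df.groupby("group_col").agg({name: "first" for name in present})
print(df1)
```

If missing comes back empty for your real data, the cause is something else and this check can be dropped.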
EDIT:
To add the results as new columns, use:
df = pd.DataFrame({'group_col': [1, 1, 2, 2, 3],
                   'var1': range(5),
                   'var2': list('abcde')})
cat_vars = ['var1', 'var2']
df2 = df.join(df.groupby("group_col")[cat_vars].transform('first').add_prefix('new_'))
print (df2)
   group_col  var1 var2  new_var1 new_var2
0          1     0    a         0        a
1          1     1    b         0        a
2          2     2    c         2        c
3          2     3    d         2        c
4          3     4    e         4        e
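If the goal is the single static_cat column from the question, holding one list per row, a sketch along the same lines could look like this. It assumes that each row should carry the first values of its own group, which the question does not state explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    "group_col": [1, 1, 2, 2, 3],
    "var1": range(5),
    "var2": list("abcde"),
})
cat_vars = ["var1", "var2"]

# transform keeps the original row count, so every row receives
# the first value of its group for each column in cat_vars.
first_vals = df.groupby("group_col")[cat_vars].transform("first")

# One list per row, in the order given by cat_vars.
df["static_cat"] = first_vals.values.tolist()
print(df)
```

Using transform instead of agg matters here: agg returns one row per group, so its result would not line up with the rows of df when assigned as a column.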