I have a dataframe:
val1 val2 val3
a b 10
a b 2
b a 3
f k 5
f k 2
when i do df.groupby(["val1", "val2"])["val3"].mean().reset_index()
I get:
val1 val2 val3
a b 6
b a 3
f k 3.5
but i don't want to take into account order of val1 and val2. so desired result is:
val1 val2 val3
a b 5
f k 3.5
How to do that?
CodePudding user response:
nm = ["val1", "val2"]
grp = df[nm].apply(lambda x: tuple(sorted(list(x))), axis=1)
s = df.val3.groupby(grp).mean()
s.index = pd.MultiIndex.from_tuples(s.index, name=nm)
s.reset_index()
# val1 val2 val3
# 0 a b 5.0
# 1 f k 3.5
CodePudding user response:
Another solution, with frozenset
:
x = (
df.groupby(df[["val1", "val2"]].apply(frozenset, axis=1))
.agg({"val1": "first", "val2": "first", "val3": "mean"})
.reset_index(drop=True)
)
print(x.to_markdown())
Prints:
val1 | val2 | val3 | |
---|---|---|---|
0 | a | b | 5 |
1 | f | k | 3.5 |