How to groupby two columns, not considering order of values there?-CodePudding

I have a dataframe:

val1   val2   val3
a       b      10
a       b      2
b       a      3
f       k      5
f       k      2

when i do df.groupby(["val1", "val2"])["val3"].mean().reset_index() I get:

val1   val2   val3
a       b      6
b       a      3
f       k      3.5

but i don't want to take into account order of val1 and val2. so desired result is:

val1   val2   val3
a       b      5
f       k      3.5

How to do that?

CodePudding user response：

nm = ["val1", "val2"]
grp = df[nm].apply(lambda x: tuple(sorted(list(x))), axis=1)
s = df.val3.groupby(grp).mean()
s.index = pd.MultiIndex.from_tuples(s.index, name=nm)
s.reset_index()
#   val1 val2  val3
# 0    a    b   5.0
# 1    f    k   3.5

CodePudding user response：

Another solution, with frozenset:

x = (
    df.groupby(df[["val1", "val2"]].apply(frozenset, axis=1))
    .agg({"val1": "first", "val2": "first", "val3": "mean"})
    .reset_index(drop=True)
)
print(x.to_markdown())

Prints:

	val1	val2	val3
0	a	b	5
1	f	k	3.5