Home > Software design >  group rows based on partial strings from two columns and sum values
group rows based on partial strings from two columns and sum values

Time:12-03

df = pd.DataFrame({'c1':['Ax','Ay','Bx','By'], 'c2':['Ay','Ax','By','Bx'], 'c3':[1,2,3,4]})

   c1  c2  c3
0  Ax  Ay   1
1  Ay  Ax   2
2  Bx  By   3
3  By  Bx   4

I'd like to sum the c3 values by aggregating the same xy combinations from the c1 and c2 columns.

The expected output is

  c1  c2  c3
0  x   y   4   #[Ax Ay]   [Bx By]
1  y   x   6   #[Ay Ax]   [By Bx]

CodePudding user response:

You can select values in c1 and c2 without first letters and aggregate sum:

df = df.groupby([df.c1.str[1:], df.c2.str[1:]]).sum().reset_index()
print (df)
  c1 c2  c3
0  x  y   4
1  y  x   6
  • Related