Group rows based on a column and sum their values

import pandas as pd

df = pd.DataFrame({'c1':['Ax','Bx','Ay','By'], 'c2':[1,2,3,4]})

   c1  c2
0  Ax  1
1  Bx  2
2  Ay  3
3  By  4

I'd like to group xs and ys in c1 and sum their respective c2 values.

Desired output:

   c1  c2
0  Cx  3
1  Cy  7

CodePudding user response：

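For example, group by the last character of c1 and sum c2:

df.groupby(df['c1'].str[-1])['c2'].sum()

This gives the sums under the group keys x and y. To get the Cx and Cy labels, use the following code: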

df.groupby('C' + df['c1'].str[-1])['c2'].sum().reset_index()

result:

    c1  c2
0   Cx  3
1   Cy  7

CodePudding user response：

You may do:

out = df.groupby(df.c1.str[-1])['c2'].sum().reset_index()
out['c1'] = 'C' + out['c1']

print(out):

   c1  c2
0  Cx   3
1  Cy   7

CodePudding user response：

Groupby is very flexible so let's group by the last character of the "c1" column and then sum the "c2" values:

>>> (df.groupby(df.c1.str[-1])["c2"]
       .sum().reset_index()
       .assign(c1=lambda fr: fr.c1.radd("C")))

   c1  c2
0  Cx   3
1  Cy   7

Without the assign at the end, the resulting "c1" values are the group keys, i.e., "x" and "y". The radd call ("reversed add") computes "C" + value, so it prepends the character "C" to each of them.
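To make the two steps visible separately, here is a minimal self-contained sketch. It assumes pandas is imported as pd and rebuilds the question's df; the names sums and labeled are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['Ax', 'Bx', 'Ay', 'By'], 'c2': [1, 2, 3, 4]})

# Group by the last character of c1; the group keys are "x" and "y".
sums = df.groupby(df.c1.str[-1])["c2"].sum().reset_index()

# radd is "reversed add": it computes "C" + value, so "C" ends up in front.
labeled = sums.assign(c1=sums.c1.radd("C"))

print(sums.c1.tolist())     # group keys before relabeling
print(labeled.c1.tolist())  # labels after prepending "C"
```

Selecting "c2" before calling sum keeps the string column out of the aggregation, so reset_index only has to insert the group keys.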

CodePudding user response：

Here is an approach using pandas.Series.str.replace with GroupBy.sum:

out = (
        df
          .assign(c1= df["c1"].str.replace("[A-Z]", "C", regex=True))
          .groupby("c1", as_index=False).sum(numeric_only=True)
       )

Output:

print(out)

   c1  c2
0  Cx   3
1  Cy   7
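One thing to keep in mind with the regex approach: the pattern "[A-Z]" replaces every uppercase letter, not only the first one. That is fine for the data in the question, where the suffix is lowercase. The sketch below uses hypothetical labels (not from the question) to show the difference when the pattern is anchored with ^.

```python
import pandas as pd

# Hypothetical labels whose suffix is also uppercase (not from the question).
labels = pd.Series(["AX", "BX", "AY"])

# Unanchored: every uppercase letter becomes "C", so the suffix is lost too.
unanchored = labels.str.replace("[A-Z]", "C", regex=True)

# Anchored with ^: only the leading letter is replaced.
anchored = labels.str.replace("^[A-Z]", "C", regex=True)

print(unanchored.tolist())
print(anchored.tolist())
```

If the suffixes can contain uppercase letters, the anchored pattern is probably the safer choice.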