I have a dataframe that looks like the one below. Note that it has 500 total_customer columns, each differentiated by a date suffix.
total_customer   total_customer_2021-03-31   total_customer_date X
1                10                          4
3                14                          3
Now I want to sum, row-wise, the columns whose names become identical once the date suffix is removed. The expected output is:
total_customer
15
20
I can't do this manually because I have 500 column pairs, so I need an efficient approach. The column order is also unpredictable. What do you recommend? Thanks!
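To make "removing the date suffix" concrete, here is a minimal sketch that maps column names to their base name. It assumes every suffix is a trailing run of digits, hyphens and underscores (such as `_2021-03-31`); the name `total_customer_2021-06-30` is hypothetical and stands in for the `total_customer_date X` placeholder.

```python
import pandas as pd

# Column names from the question; "total_customer_2021-06-30" is a hypothetical
# concrete date standing in for the "total_customer_date X" placeholder.
cols = pd.Index(["total_customer", "total_customer_2021-03-31", "total_customer_2021-06-30"])

# Strip a trailing run of digits, underscores and hyphens (assumed date suffix).
# regex=True matters: since pandas 2.0, str.replace treats the pattern literally by default.
base = cols.str.replace(r"[0-9_-]+$", "", regex=True)
print(list(base))
```

Because the mapping depends only on each name, not on its position, the column order does not matter. Note the assumption cuts both ways: a name such as `total_customer_v2` would also lose its trailing `2`.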
Answer:
If there are different groups of columns, strip the date suffix with a regex and use DataFrame.groupby with the aggregate sum:
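# regex=True is required in pandas >= 2.0; transposing avoids groupby(axis=1), deprecated in pandas 2.1
df1 = df.T.groupby(df.columns.str.replace(r'[0-9_-]+$', '', regex=True)).sum().T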
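A self-contained sketch of the groupby idea, written with `regex=True` and a transpose so it runs on current pandas. The columns are deliberately shuffled, and a second group `total_orders` is a hypothetical addition to show that several groups are summed at once; the `total_customer_2021-06-30` name again stands in for the `date X` placeholder.

```python
import pandas as pd

# Sample data from the question, shuffled, plus a hypothetical "total_orders" group.
df = pd.DataFrame({
    "total_customer_2021-03-31": [10, 14],
    "total_orders_2021-03-31": [5, 6],
    "total_customer": [1, 3],
    "total_orders": [2, 2],
    "total_customer_2021-06-30": [4, 3],
})

# Map each column to its base name by stripping the assumed trailing date suffix.
base = df.columns.str.replace(r"[0-9_-]+$", "", regex=True)

# After transposing, the original columns are rows; group them by base name,
# sum each group, then transpose back so the groups become columns again.
df1 = df.T.groupby(base).sum().T
print(df1)
```

The result has one column per base name (`total_customer` with 15 and 20, `total_orders` with 7 and 8), regardless of the original column order.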
Or, if you only need to sum all the columns, use DataFrame.sum, as suggested in the comments:
df1 = df.sum(axis=1).to_frame(name='total_customer')
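A minimal runnable sketch of the plain row-wise sum on the question's three columns, assuming (as above) a hypothetical date in place of the `date X` placeholder:

```python
import pandas as pd

# Every column here belongs to the single "total_customer" group.
df = pd.DataFrame({
    "total_customer": [1, 3],
    "total_customer_2021-03-31": [10, 14],
    "total_customer_2021-06-30": [4, 3],
})

# Sum across columns for each row, then wrap the resulting Series in a DataFrame.
df1 = df.sum(axis=1).to_frame(name="total_customer")
print(df1)
```

This reproduces the expected 15 and 20, but it only fits when every column shares one base name: a second group such as `total_orders` would be folded into the same total, which is when the groupby version is needed.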