How to sum same columns (differentiated by suffix) in pandas?

Time:10-12

I have a dataframe that looks like this:

total_customers     total_customer_2021-03-31  total_purchases    total_purchases_2021-03-31
1                   10                          4                  6
3                   14                          3                  2

Now I want to sum, row-wise, the columns whose names are the same except for the suffix. That is, the expected output is:

total_customers      total_purchases   
11                   10                          
17                   5                          

I cannot do this manually because I have 100 column pairs, so I need an efficient way to do it. The order of the columns is not predictable either. What do you recommend? Thanks!

CodePudding user response:

We need an Index of column names in which each pair of columns shares the same name; then we can groupby and sum on axis=1:

cols = pd.Index(['total_customers', 'total_customers',
                 'total_purchases', 'total_purchases'])

result_df = df.groupby(cols, axis=1).sum()

With the shown example, we can use str.replace to replace an optional s, followed by an underscore, followed by the date (four digits-two digits-two digits), with a single s. This pattern may need to be modified depending on the actual column names:

cols = df.columns.str.replace(r's?_\d{4}-\d{2}-\d{2}$', 's', regex=True)
result_df = df.groupby(cols, axis=1).sum()

result_df:

   total_customers  total_purchases
0               11               10
1               17                5
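As far as I know, passing axis=1 to DataFrame.groupby is deprecated in recent pandas releases (2.1 and later), so the call above may emit a FutureWarning or stop working on newer versions. A minimal sketch of an equivalent that avoids axis=1, assuming the same sample data and the same regex: rename the columns to the normalized names, transpose, group on the index, and transpose back. The name `renamed` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2],
})

# Same normalization as above: strip the date suffix, keep a single trailing "s".
cols = df.columns.str.replace(r's?_\d{4}-\d{2}-\d{2}$', 's', regex=True)

# Work on a copy so the original column names are preserved.
renamed = df.copy()
renamed.columns = cols  # duplicate column names are allowed

# Transpose so the duplicated names become the index, sum per name, transpose back.
result_df = renamed.T.groupby(level=0).sum().T
print(result_df)
```

Transposing is cheap here because all columns are numeric; with mixed dtypes the transpose would upcast to object, so this sketch assumes numeric-only columns.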

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2]
})
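Since the pattern may need adjusting for the real column names, it can help to check that the normalized names actually pair up before summing. A rough sketch, assuming every group is supposed to contain exactly two columns (the names `mapping` and `unpaired` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2],
})

cols = df.columns.str.replace(r's?_\d{4}-\d{2}-\d{2}$', 's', regex=True)

# Original name -> normalized name, handy for eyeballing the regex result.
mapping = dict(zip(df.columns, cols))
for original, normalized in mapping.items():
    print(f'{original!r} -> {normalized!r}')

# Count how many original columns landed on each normalized name.
counts = cols.value_counts()

# Assumption: each group should have exactly 2 members (base + dated column).
unpaired = counts[counts != 2]
if unpaired.empty:
    print('All columns pair up.')
else:
    print('Groups that do not have exactly two columns:')
    print(unpaired)
```

With 100 pairs, any name that shows up in `unpaired` is a hint that the regex missed a naming variant.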

CodePudding user response:

Assuming your dataframe is called df, you can add each pair of columns explicitly:

sum_customers = df["total_customers"] + df["total_customer_2021-03-31"]
sum_purchases = df["total_purchases"] + df["total_purchases_2021-03-31"]
df_total = pd.DataFrame({"total_customers": sum_customers, "total_purchases": sum_purchases})

That gives you the output you want for these two pairs.

CodePudding user response:

import pandas as pd

data = {"total_customers": [1, 3], "total_customer_2021-03-31": [10, 14], "total_purchases": [4, 3], "total_purchases_2021-03-31": [6, 2]}

df = pd.DataFrame(data=data)
final_df = pd.DataFrame()

final_df["total_customers"] = df.filter(regex='^total_customer').sum(axis=1)
final_df["total_purchases"] = df.filter(regex='^total_purchase').sum(axis=1)

output

final_df

    total_customers   total_purchases
0   11                10
1   17                5
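To avoid writing one line per pair, the same filter idea can be driven by a dictionary. This is only a sketch: the `groups` mapping (output name to regex) is hypothetical and would have to be built for your real column names, and a bare prefix pattern will also pick up any unrelated column that happens to start the same way.

```python
import pandas as pd

df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2],
})

# Hypothetical mapping: output column name -> regex selecting its source columns.
groups = {
    'total_customers': r'^total_customer',
    'total_purchases': r'^total_purchase',
}

# For each group, select the matching columns and sum them row-wise.
final_df = pd.DataFrame({
    name: df.filter(regex=pattern).sum(axis=1)
    for name, pattern in groups.items()
})
print(final_df)
```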

CodePudding user response:

Using @HenryEcker's sample data, and building off of the example in the docs, you can create a function and groupby on the column axis:

def get_column(column):
    if column.startswith('total_customer'):
        return 'total_customers'
    return 'total_purchases'

df.groupby(get_column, axis=1).sum()

   total_customers  total_purchases
0               11               10
1               17                5
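The get_column function above hard-codes two groups, which does not scale to 100 pairs. One possible generalization, borrowing the date-suffix regex from the earlier answer (an assumption about how your real columns are named), is to let the function strip the suffix. The sketch also groups on the transposed frame, since axis=1 in groupby is deprecated in recent pandas as far as I know.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2],
})

# Optional "s", underscore, then a YYYY-MM-DD date at the end of the name.
DATE_SUFFIX = re.compile(r's?_\d{4}-\d{2}-\d{2}$')

def get_column(column):
    """Return the group name: the column name with any date suffix replaced by 's'."""
    return DATE_SUFFIX.sub('s', column)

# After transposing, groupby calls get_column on each index label (the old column names).
result = df.T.groupby(get_column).sum().T
print(result)
```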

CodePudding user response:

I shortened the column names while coding, just for your information:

data = {"total_c" : [1,3], "total_c_2021" :[10,14],
    "total_p": [4,3], "total_p_2021": [6,2]}


df = pd.DataFrame(data)
df["total_customers"] = df["total_c"] + df["total_c_2021"]
df["total_purchases"] = df["total_p"] + df["total_p_2021"]

If you don't want to see the other columns, you can drop them:

df = df.loc[:, ['total_customers', 'total_purchases']]

Update: I may have found a starting point for your solution. I don't know your actual column names, but the following code can be adapted if your column names follow a pattern (patterned dates, names, etc.). Could you also rename the columns with a loop?

df['total_customer'] = df[[col for col in df.columns if col.startswith('total_c')]].sum(axis=1)

With some alterations, this example might also be helpful for you.
