Home > Software engineering >  How to create a new column based on the filtering and calculation in pandas?
How to create a new column based on the filtering and calculation in pandas?

Time:07-26

I have a dataset like:

ymd revenue
2022-07-17 2
2022-07-17 3
2022-07-18 4
2022-07-18 5

I want to convert into this format:

ymd rev_sum
2022-07-17 5
2022-07-18 9

I've tried

data["rev_sum"] = sum(
    data[data["ymd"] == "2022-07-17"]["revenue"]
)
data["rev_sum"] = sum(
    data[data["ymd"] == "2022-07-18"]["revenue"]
)

But this one updates columns with the last date.

I've also tried :

data.loc[data["ymd"] == "2022-07-17"]["rev_sum"] = data.loc[data["ymd"] == "2022-07-17"]["revenue"].sum()

This time I got 'Try using .loc[row_indexer,col_indexer] = value instead' error message.

What should I do? Thank you for your help!

CodePudding user response:

You can use pd.pivot_table with aggfunc='sum' on column='revenue'.

df = pd.pivot_table(df, index=['ymd'], values='revenue', 
                    aggfunc='sum').reset_index()

print(df)

          ymd  revenue
0  2022-07-17        5
1  2022-07-18        9

CodePudding user response:

import pandas as pd

df = pd.DataFrame({'ymd': {0: '2022-07-17', 
                           1: '2022-07-17', 
                           2: '2022-07-18',  
                           3: '2022-07-18'},
                   'revenue': {0: 2, 1: 3, 2: 4, 3: 5}})

#to sum values by date
sum_rev = df.groupby('ymd').agg({'revenue':'sum'})

print(sum_rev)

            revenue
ymd                
2022-07-17        5
2022-07-18        9

#if you want to create a column in existing df, use:
df['sum_rev'] = df.groupby('ymd')['revenue'].transform(sum)

print(df)
          ymd  revenue  rev_sum
0  2022-07-17        2        5
1  2022-07-17        3        5
2  2022-07-18        4        9
3  2022-07-18        5        9
  • Related