The Excel function SUMIFS supports calculation based on multiple criteria, including day inequalities, as follows
values_to_sum, criteria_range_n, condition_n, .., criteria_range_n, condition_n
Example
Input - tips per person per day, multiple entries per person per day allowed
date person tip
02/03/2022 X 10
05/03/2022 X 30
05/03/2022 Y 20
08/03/2022 X 12
08/03/2022 X 8
Output - sum per selected person per day
date X_sum_per_day
01/03/2022 0
02/03/2022 10
03/03/2022 0
04/03/2022 0
05/03/2022 30
06/03/2022 0
07/03/2022 0
08/03/2022 20
09/03/2022 0
10/03/2022 0
Can this be implemented in pandas and calculated as series for an input range of days? Cumulative would be presumably just application of cumsum() but the initial sum based on multiple criteria is tricky, especially if to be concise.
Code
import pandas as pd
df = pd.DataFrame({'date': ['02-03-2022 00:00:00',
'05-03-2022 00:00:00',
'05-03-2022 00:00:00',
'08-03-2022 00:00:00',
'08-03-2022 00:00:00'],
'person': ['X', 'X', 'Y', 'X', 'X'],
'tip': [10, 30, 20, 12, 8]},
index = [0, 1, 2, 3, 4])
df2 = pd.DataFrame({'date':pd.date_range(start='2022-03-01', end='2022-03-10')})
temp = df[df['person'] == 'X'].groupby(['date']).sum().reset_index()
df2['X_sum'] = df2['date'].map(temp.set_index('date')['tip']).fillna(0)
The above seems kinda hacky and not as simple to reason about as Excel SUMIFS. Additional conditions would also be a hassle (e.g. sum where country = X, company = Y, person = Z).
Any idea for alternative implementation?
CodePudding user response:
IIUC, you want to filter the person X
then groupby
day and sum the tips, finally reindex
the missing days:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
out = (df[df['person'].eq('X')]
.groupby('date')['tip'].sum()
.reindex(pd.date_range(start='2022-03-01', end='2022-03-10'),
fill_value=0)
.reset_index()
)
output:
index tip
0 2022-03-01 0
1 2022-03-02 10
2 2022-03-03 0
3 2022-03-04 0
4 2022-03-05 30
5 2022-03-06 0
6 2022-03-07 0
7 2022-03-08 20
8 2022-03-09 0
9 2022-03-10 0