I have one dataframe containing a daily employee list, and another containing a series of sales.
daily_employee_df:
| EE_ID| Date |
| -----| ----------|
| 101| 20220904 |
| 102| 20220904 |
| 106| 20220904 |
| 102| 20220905 |
| 103| 20220905 |
| 104| 20220905 |
all_sales_df:
| Sale_ID | Date | Sale_Amt| EEs_Present |
| ------- | --------|---------|----------------|
| 0001| 20220904| 100.04| [101, 102, 106]|
| 0002| 20220905| 998.06| [102, 103, 104]|
What is an efficient way to sum the Sale_Amt values each employee was present for on each day and add that sum to daily_employee_df? I'm dealing with thousands of sales each day.
I was able to get the number of sales for each employee and day using the following:
daily_employee_df['EE_Sales'] = daily_employee_df.apply(lambda x: len(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]), axis = 1)
But I have not been able to sum the sale total in a similar way.
Thanks for any help!
CodePudding user response:
Very close - you can use sum() and add the column you're summing at the end with ['Sale_Amt']
Count of sales (already done in the question):
daily_employee_df['EE_Sales_Count'] = daily_employee_df.apply(lambda x: len(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]), axis = 1)
Sum of sales:
daily_employee_df['EE_Sales_Sum'] = daily_employee_df.apply(lambda x: sum(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]['Sale_Amt']), axis = 1)