Home > Blockchain >  How to create a lambda function to sum dataframe values based on criteria and presence in a list
How to create a lambda function to sum dataframe values based on criteria and presence in a list

Time:09-28

I have one dataframe containing a daily employee list, and another containing a series of sales.

daily_employee_df:

| EE_ID| Date      |
| -----| ----------|
|   101| 20220904  |
|   102| 20220904  |
|   106| 20220904  |
|   102| 20220905  |
|   103| 20220905  |
|   104| 20220905  |

all_sales_df:

| Sale_ID | Date    | Sale_Amt| EEs_Present    |
| ------- | --------|---------|----------------|
|     0001| 20220904|   100.04| [101, 102, 106]|
|     0002| 20220905|   998.06| [102, 103, 104]|

What is an efficient way to sum the Sale_Amt values each employee was present for on each day and add that sum to daily_employee_df? I'm dealing with thousands of sales each day.

I was able to get the number of sales for each employee and day using the following:

daily_employee_df['EE_Sales']  = daily_employee_df.apply(lambda x: len(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]), axis = 1)

But I have not been able to sum the sale total in a similar way.

Thanks for any help!

CodePudding user response:

Very close - you can use sum() and add the column you're summing at the end with ['Sale_Amt']

Count of sales (already done in the question):

daily_employee_df['EE_Sales_Count']  = daily_employee_df.apply(lambda x: len(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]), axis = 1)

Sum of sales:

daily_employee_df['EE_Sales_Sum']  = daily_employee_df.apply(lambda x: sum(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]['Sale_Amt']), axis = 1)
  • Related