Python - Sum values for all dates prior to a specific date-CodePudding

I currently have two dataFrames that look like this:

Df3 - which is the output dataFrame:

         |                     CompanyNm      |    CpID      |  Date     |
:--------|:-----------------------------------|:-------------|:----------|
0        |                      {Converting)  | {C}          | 2020-02-14|
1        |                     {Converting)   | {C}          | 2020-02-17|
2        |                    {Converting)    | {C}          | 2020-02-18|
3        |                      {Converting)  | {C}          | 2020-02-19|
4        |                      {Converting)  | {C}          | 2020-02-20|
...      |                               ...  |         ...  |       ... |
521473   |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316         | 2022-02-08|
521474   |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316         | 2022-02-09|
521475   |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316         | 2022-02-10|
521476   |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316         | 2022-02-11|
521477   |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316         | 2022-02-14|

Df4:

      | BuySellNo   |  CustID      | InvoiceDt | DueDt     |   SalesAmt |
:-----|:------------|:-------------|:----------|:----------|------------|
0     | 100         |  1047        | 2017-07-10| 2017-11-04|   46137.30 |
1     | 1000        |  1145        | 2017-10-12| 2018-04-03|  362133.15 |
2     | 10000       |  1047        | 2019-12-27| 2020-02-25|   28445.90 |
3     | 10001       |  2007        | 2019-12-27| 2020-02-25|   28445.90 |
4     | 10002       |  2304        | 2019-12-26| 2020-01-03|   27558.62 |
...   |          ...|           ...|        ...|        ...|        ... |
21362 | ZD229       |  1064        | 2017-07-24| 2017-10-11|       0.00 |
21363 | ZD230       |  1064        | 2017-06-14| 2017-10-12|    1400.39 |
21364 | ZD231       |  1064        | 2017-06-13| 2017-10-11|    1967.19 |
21365 | ZD232       |  1063        | 2017-06-28| 2017-06-28|     128.26 |
21366 | ZD233       |  1064        | 2017-06-14| 2017-10-12|    3146.06 |

I am trying to add a new column to Df3, which is the sum of all Sales (SalesAmt) where the invoicing date (InvoiceDt) is prior to the date column in Df3.

df3['Total sales'] = df4[['SalesAmt']].sum(axis=1).where(df3['Date'] <= df4['InvoiceDt'], 0)

I get the following error in this case:

ValueError: Can only compare identically-labeled Series objects

Any idea how to fix this? Or a more efficient way to obtain the wanted result?

Thanks a lot,

CodePudding user response：

EDIT: I misunderstood your problem definition earlier. Now corrected it:

def func(g):
    mask = (df4['CustID'] == g.name[0]) & (df4['InvoiceDt'] <= g.name[1])
    return df4[mask]['SalesAmt'].sum()
df3.groupby(['CpID','Date']).apply(func)