I currently have two dataFrames that look like this:
Df3 - which is the output dataFrame:
| CompanyNm | CpID | Date |
:--------|:-----------------------------------|:-------------|:----------|
0 | {Converting) | {C} | 2020-02-14|
1 | {Converting) | {C} | 2020-02-17|
2 | {Converting) | {C} | 2020-02-18|
3 | {Converting) | {C} | 2020-02-19|
4 | {Converting) | {C} | 2020-02-20|
... | ... | ... | ... |
521473 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-08|
521474 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-09|
521475 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-10|
521476 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-11|
521477 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-14|
Df4:
| BuySellNo | CustID | InvoiceDt | DueDt | SalesAmt |
:-----|:------------|:-------------|:----------|:----------|------------|
0 | 100 | 1047 | 2017-07-10| 2017-11-04| 46137.30 |
1 | 1000 | 1145 | 2017-10-12| 2018-04-03| 362133.15 |
2 | 10000 | 1047 | 2019-12-27| 2020-02-25| 28445.90 |
3 | 10001 | 2007 | 2019-12-27| 2020-02-25| 28445.90 |
4 | 10002 | 2304 | 2019-12-26| 2020-01-03| 27558.62 |
... | ...| ...| ...| ...| ... |
21362 | ZD229 | 1064 | 2017-07-24| 2017-10-11| 0.00 |
21363 | ZD230 | 1064 | 2017-06-14| 2017-10-12| 1400.39 |
21364 | ZD231 | 1064 | 2017-06-13| 2017-10-11| 1967.19 |
21365 | ZD232 | 1063 | 2017-06-28| 2017-06-28| 128.26 |
21366 | ZD233 | 1064 | 2017-06-14| 2017-10-12| 3146.06 |
I am trying to add a new column to Df3, which is the sum of all Sales (SalesAmt) where the invoicing date (InvoiceDt) is prior to the date column in Df3.
df3['Total sales'] = df4[['SalesAmt']].sum(axis=1).where(df3['Date'] <= df4['InvoiceDt'], 0)
I get the following error in this case:
ValueError: Can only compare identically-labeled Series objects
Any idea how to fix this? Or a more efficient way to obtain the wanted result?
Thanks a lot,
CodePudding user response:
EDIT: I misunderstood your problem definition earlier. Now corrected it:
def func(g):
mask = (df4['CustID'] == g.name[0]) & (df4['InvoiceDt'] <= g.name[1])
return df4[mask]['SalesAmt'].sum()
df3.groupby(['CpID','Date']).apply(func)