I have two dataframes that I need to merge. The first table contains multiple rows of the same ID, but they each have their own time stamp. The second table also has those same ID's, but they each also have their own timestamps.
FIRST TABLE:
ID | Date | Description |
---|---|---|
1 | 2/1/2021 | asdfadsf |
1 | 5/1/2021 | asdfasf |
1 | 7/1/2021 | freafasf |
2 | 4/1/2021 | afaertafa |
2 | 6/1/2021 | vadfgasa |
3 | 3/1/2021 | vadsgvadv |
3 | 8/1/2021 | cafdsfafa |
4 | 5/1/2021 | aewrqafdg |
SECOND TABLE:
ID | Date | Num Sold |
---|---|---|
1 | 12/18/2020 | 1 |
1 | 1/15/2021 | 3 |
1 | 3/23/2021 | 13 |
1 | 4/15/2021 | 5 |
1 | 4/28/2021 | 32 |
1 | 5/10/2021 | 6 |
1 | 6/15/2021 | 3 |
2 | 2/1/2021 | 3 |
2 | 3/1/2021 | 5 |
2 | 3/15/2021 | 6 |
2 | 5/12/2021 | 13 |
2 | 5/11/2021 | 2 |
2 | 5/21/2021 | 32 |
3 | 1/1/2021 | 14 |
3 | 2/1/2021 | 5 |
3 | 4/1/2021 | 23 |
3 | 5/1/2021 | 54 |
4 | 2/1/2021 | 12 |
4 | 3/1/2021 | 6 |
4 | 4/1/2021 | 23 |
The goal is to perform a left join on the ID were the Num Sold before the date on the first table is summed and added as a column. So ID "1" for Date "2/1/2021" from the first table would take the sum of all the ID "1" Num Sold on the second table that happened on or before "2/1/2021". So the output for this would be 4. For ID "1" for Date "5/1/2021" on the first table would take the sum of all the ID "1" Num Sold on the second table that happened on or before "2/1/2021". The output of this would be 54.
The output dataframe would look like this.
ID | Date | Description | Num Sold |
---|---|---|---|
1 | 2/1/2021 | asdfadsf | 4 |
1 | 5/1/2021 | asdfasf | 54 |
1 | 7/1/2021 | freafasf | 63 |
2 | 4/1/2021 | afaertafa | 14 |
2 | 6/1/2021 | vadfgasa | 61 |
3 | 3/1/2021 | vadsgvadv | 19 |
3 | 8/1/2021 | cafdsfafa | 96 |
4 | 5/1/2021 | aewrqafdg | 41 |
CodePudding user response:
def get_sold(id,date):
return df2[(df2['ID']==id) & (df2['Date'].le(date))]['Num Sold'].sum()
for i,v in df1.iterrows():
df1.loc[i,'Total Sold'] = get_sold(v.ID,v.Date)