Home > Enterprise >  Merge two dataframes while taking dates into account
Merge two dataframes while taking dates into account

Time:10-19

I have two dataframes that I need to merge. The first table contains multiple rows of the same ID, but they each have their own time stamp. The second table also has those same ID's, but they each also have their own timestamps.

FIRST TABLE:

ID Date Description
1 2/1/2021 asdfadsf
1 5/1/2021 asdfasf
1 7/1/2021 freafasf
2 4/1/2021 afaertafa
2 6/1/2021 vadfgasa
3 3/1/2021 vadsgvadv
3 8/1/2021 cafdsfafa
4 5/1/2021 aewrqafdg

SECOND TABLE:

ID Date Num Sold
1 12/18/2020 1
1 1/15/2021 3
1 3/23/2021 13
1 4/15/2021 5
1 4/28/2021 32
1 5/10/2021 6
1 6/15/2021 3
2 2/1/2021 3
2 3/1/2021 5
2 3/15/2021 6
2 5/12/2021 13
2 5/11/2021 2
2 5/21/2021 32
3 1/1/2021 14
3 2/1/2021 5
3 4/1/2021 23
3 5/1/2021 54
4 2/1/2021 12
4 3/1/2021 6
4 4/1/2021 23

The goal is to perform a left join on the ID were the Num Sold before the date on the first table is summed and added as a column. So ID "1" for Date "2/1/2021" from the first table would take the sum of all the ID "1" Num Sold on the second table that happened on or before "2/1/2021". So the output for this would be 4. For ID "1" for Date "5/1/2021" on the first table would take the sum of all the ID "1" Num Sold on the second table that happened on or before "2/1/2021". The output of this would be 54.

The output dataframe would look like this.

ID Date Description Num Sold
1 2/1/2021 asdfadsf 4
1 5/1/2021 asdfasf 54
1 7/1/2021 freafasf 63
2 4/1/2021 afaertafa 14
2 6/1/2021 vadfgasa 61
3 3/1/2021 vadsgvadv 19
3 8/1/2021 cafdsfafa 96
4 5/1/2021 aewrqafdg 41

CodePudding user response:

def get_sold(id,date):
    return df2[(df2['ID']==id) & (df2['Date'].le(date))]['Num Sold'].sum()

for i,v in df1.iterrows():
    df1.loc[i,'Total Sold'] = get_sold(v.ID,v.Date)

enter image description here

  • Related