Home > Software engineering >  Joining two dataframes without exactly match value
Joining two dataframes without exactly match value

Time:06-20

Ticker & Date are index

Dataframe-A

Ticker Date Renue
AAPL 2022-01-01 10M
AAPL 2022-04-01 10M
MSFT 2022-01-01 10M
MSFT 2022-04-01 10M

Dataframe-B

Ticker Date Price
AAPL 2022-01-02 11
AAPL 2022-01-03 12
AAPL 2022-04-01 15
AAPL 2022-04-02 16
MSFT 2022-01-01 20
MSFT 2022-01-02 21
MSFT 2022-01-03 22
MSFT 2022-04-02 26
MSFT 2022-04-03 26

Join the Dataframe by Ticker and Date

But AAPL without 2022-01-01 Price, use 2022-01-02

Result

Ticker Date Renue Price
AAPL 2022-01-01 10M 11
AAPL 2022-04-01 10M 15
MSFT 2022-01-01 10M 20
MSFT 2022-04-01 10M 26

How to join the DatFrame to produce the result Dataframe?

Thanks a lot!

CodePudding user response:

You can use merge as given below,

>>> data1 = pd.DataFrame({"col":range(11, 17),    # Create first 
pandas DataFrame
...                       "x1":range(12, 1, - 2),
...                       "x2":["a", "b", "c", "d", "e", "f"],
...                       "x3":range(17, 11, - 1)})
>>> print(data1)
col  x1 x2  x3
0   11  12  a  17
1   12  10  b  16
2   13   8  c  15
3   14   6  d  14
4   15   4  e  13
5   16   2  f  12
>>> data2 = pd.DataFrame({"col":range(15, 19),    # Create second 
pandas DataFrame
...                       "y1":["l", "k", "j", "h"],
...                       "y2":["x", "y", "y", "y"],
...                       "y3":range(18, 10, - 2)})
>>> print(data2)                                  # Print second 
pandas DataFrame
   col y1 y2  y3
0   15  l  x  18
1   16  k  y  16
2   17  j  y  14
3   18  h  y  12

>>> data1
  col  x1 x2  x3
0   11  12  a  17
1   12  10  b  16
2   13   8  c  15
3   14   6  d  14
4   15   4  e  13
5   16   2  f  12
>>> data2
  col y1 y2  y3
0   15  l  x  18
1   16  k  y  16
2   17  j  y  14
3   18  h  y  12
>>> data_join = pd.merge(data1,                   # Inner join
...                       data2,
...                       on = "col")
>>> data_join
   col  x1 x2  x3 y1 y2  y3
0   15   4  e  13  l  x  18
1   16   2  f  12  k  y  16

Taken from here.

CodePudding user response:

here is one way to do it

it is case of less than or equal join. We first merge the two DFs, on Ticker, and then filter(query) where the date among two matches or DF1 date is less than the second DF date. We then groupby Ticket, and date and take the first row for each group. Finally, reset index and drop the date from right column

df1.merge(df2, on='Ticker', 
          how='left', 
          suffixes=('','_y')
         ).query('Date<=Date_y').groupby(
                ['Ticker','Date']).first().reset_index().drop(columns='Date_y')
    Ticker  Date    Renue   Price
0   AAPL    2022-01-01  10M     11
1   AAPL    2022-04-01  10M     15
2   MSFT    2022-01-01  10M     20
3   MSFT    2022-04-01  10M     26
  • Related