Home > Software design >  Python - From dataframes A and B, use DateTime of A to find a value in B, based on conditions
Python - From dataframes A and B, use DateTime of A to find a value in B, based on conditions

Time:10-25

Have two dataframes, and I am attempting to 'merge' with conditions.

The idea being if df_B's DateTime is greater than ST's DateTime by less than 5 mins, then add the value of df_B 'MC' into a new column in df_A. If not, its a blank.

    df_A = {'SS': ['2022/10/23 9:58:08', '2022/10/23 9:58:08', '2022/10/23 23:07:17', 
                 '2022/10/23 23:07:17', '2022/10/24 2:06:48', '2022/10/24 2:06:48',
                 '2022/10/24 5:31:44', '2022/10/24 5:31:44'],
            'SC': [6764, 6764, 6778, 6778, 6782, 6782, 6787, 6787] }

    df_B = {'DateTime': ['2022/10/23 6:05:51', '2022/10/23 6:06:51', '2022/10/23 7:20:51', 
                 '2022/10/23 7:21:51', '2022/10/23 7:51:51', '2022/10/23 7:52:51', 
                 '2022/10/24 5:31:58', '2022/10/24 5:35:58'],
            'MC': [871.71, 871.77, 871.78, 871.77, 871.78, 871.77, 866.90, 866.91] }


    df_A = pd.DataFrame(df_A)
    df_A['SS'] = pd.to_datetime(df_A['SS'])

    df_B = pd.DataFrame(df_B)
    df_B['DateTime'] = pd.to_datetime(df_B['DateTime']) 

Desired Result

I have mostly tried pd.merge_asof() and converting the datatimes to indexes and then left/right_index True and direction 'forward'.

df = pd.merge_asof(left=df_A, right=df_B, left_index=True, right_index=True, 
     direction='forward')

The problem is the merge inserts an MC values in every row based on the direction. And very rarely (or never) id get an exact datetime match). Kinda given up using merge.

Should I just parse df_A datetime, check if df_B's datetime its greater than df_A datetime (but reject if greater than 5 mins), and then find the corresponding MC value?

For this method, I would normally use the lambda function with apply.

  df_A.apply(lambda x: (x['SS'], x['SC']), axis=1)

So how to incorporate df_B into the above, or should call a function with the method, the function do the processing?

Hope makes sense.

tks

CodePudding user response:

I get that the rows of df_A and df_B are aligned and both dataframes have equal number of rows.

In this case:

  1. Putting the 'on' columns into indexes is not necessary to merge dataframes. Instead you can use 'left_on' and 'right_on'.
  2. Use 'tolerance' along with the 'forward' direction for the required condition.
  3. Reset indexes into "index" columns in order to align the rows by "index" before the merge.

BTW, make sure before the merge that

Both DataFrames must be sorted by the key.

https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html#pandas.merge_asof

df_A.reset_index(inplace=True)
df_B.reset_index(inplace=True)

df = pd.merge_asof(df_A, df_B, left_on="SS", right_on="DateTime", by="index", direction="forward", tolerance=pd.Timedelta("5m"))

   index                  SS    SC            DateTime      MC
0      0 2022-10-23 09:58:08  6764                 NaT     NaN
1      1 2022-10-23 09:58:08  6764                 NaT     NaN
2      2 2022-10-23 23:07:17  6778                 NaT     NaN
3      3 2022-10-23 23:07:17  6778                 NaT     NaN
4      4 2022-10-24 02:06:48  6782                 NaT     NaN
5      5 2022-10-24 02:06:48  6782                 NaT     NaN
6      6 2022-10-24 05:31:44  6787 2022-10-24 05:31:58  866.90
7      7 2022-10-24 05:31:44  6787 2022-10-24 05:35:58  866.91
  • Related