Have two dataframes, and I am attempting to 'merge' with conditions.
The idea being if df_B's DateTime is greater than ST's DateTime by less than 5 mins, then add the value of df_B 'MC' into a new column in df_A. If not, its a blank.
df_A = {'SS': ['2022/10/23 9:58:08', '2022/10/23 9:58:08', '2022/10/23 23:07:17',
'2022/10/23 23:07:17', '2022/10/24 2:06:48', '2022/10/24 2:06:48',
'2022/10/24 5:31:44', '2022/10/24 5:31:44'],
'SC': [6764, 6764, 6778, 6778, 6782, 6782, 6787, 6787] }
df_B = {'DateTime': ['2022/10/23 6:05:51', '2022/10/23 6:06:51', '2022/10/23 7:20:51',
'2022/10/23 7:21:51', '2022/10/23 7:51:51', '2022/10/23 7:52:51',
'2022/10/24 5:31:58', '2022/10/24 5:35:58'],
'MC': [871.71, 871.77, 871.78, 871.77, 871.78, 871.77, 866.90, 866.91] }
df_A = pd.DataFrame(df_A)
df_A['SS'] = pd.to_datetime(df_A['SS'])
df_B = pd.DataFrame(df_B)
df_B['DateTime'] = pd.to_datetime(df_B['DateTime'])
I have mostly tried pd.merge_asof() and converting the datatimes to indexes and then left/right_index True and direction 'forward'.
df = pd.merge_asof(left=df_A, right=df_B, left_index=True, right_index=True,
direction='forward')
The problem is the merge inserts an MC values in every row based on the direction. And very rarely (or never) id get an exact datetime match). Kinda given up using merge.
Should I just parse df_A datetime, check if df_B's datetime its greater than df_A datetime (but reject if greater than 5 mins), and then find the corresponding MC value?
For this method, I would normally use the lambda function with apply.
df_A.apply(lambda x: (x['SS'], x['SC']), axis=1)
So how to incorporate df_B into the above, or should call a function with the method, the function do the processing?
Hope makes sense.
tks
CodePudding user response:
I get that the rows of df_A and df_B are aligned and both dataframes have equal number of rows.
In this case:
- Putting the 'on' columns into indexes is not necessary to merge dataframes. Instead you can use 'left_on' and 'right_on'.
- Use 'tolerance' along with the 'forward' direction for the required condition.
- Reset indexes into "index" columns in order to align the rows by "index" before the merge.
BTW, make sure before the merge that
Both DataFrames must be sorted by the key.
https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html#pandas.merge_asof
df_A.reset_index(inplace=True)
df_B.reset_index(inplace=True)
df = pd.merge_asof(df_A, df_B, left_on="SS", right_on="DateTime", by="index", direction="forward", tolerance=pd.Timedelta("5m"))
index SS SC DateTime MC
0 0 2022-10-23 09:58:08 6764 NaT NaN
1 1 2022-10-23 09:58:08 6764 NaT NaN
2 2 2022-10-23 23:07:17 6778 NaT NaN
3 3 2022-10-23 23:07:17 6778 NaT NaN
4 4 2022-10-24 02:06:48 6782 NaT NaN
5 5 2022-10-24 02:06:48 6782 NaT NaN
6 6 2022-10-24 05:31:44 6787 2022-10-24 05:31:58 866.90
7 7 2022-10-24 05:31:44 6787 2022-10-24 05:35:58 866.91