Pandas: merge_asof with conditions from other columns-CodePudding

I have a toy example as follows

I would like to merge the actions column in rules to the original df. Merging conditions are the following.

(value >= lower) & (value < upper)
date in df must merge with the nearest previous date in rules

The expected output is shown in the above figure. Here is the df and rules

df = pd.DataFrame({"date": ["2022-05-15", "2022-05-20", "2022-05-25", "2022-05-30"],
                   "values": [10, 20, 30, 80]})
df["date"] = pd.to_datetime(df["date"])

rules = pd.DataFrame({"lower": [0, 25, 50, 75, 0],
                      "upper": [25, 50, 75, float("inf"), 25],
                      "actions": [5, 10, 15, 20, 8],
                      "date": ["2022-01-01", "2022-01-01", "2022-01-01", "2022-01-01", "2022-05-18"]})
rules["date"] = pd.to_datetime(rules["date"])

May I have suggestions about effective method to do this?

CodePudding user response：

One option is with conditional_join from pyjanitor, and after the merge, you can do a groupby to get the minimum rows:

# install from dev 
# pip install git https://github.com/pyjanitor-devs/pyjanitor.git
import janitor
import pandas as pd

(df
.astype({'values':float})
.conditional_join(
    rules.astype({'lower':float}), 
    # pass the conditions as a variable arguments of tuples 
    ('values', 'lower', '>='), 
    ('values', 'upper', '<'), 
    ('date', 'date', '>'),
    # select required columns with df_columns, and right_columns
    df_columns = ['date','values'], 
    right_columns={'actions':'actions', 'date':'date_right'})
# get the difference and keep the smallest days
.assign(dff = lambda df: df.date.sub(df.date_right))
.sort_values(['date', 'dff'])
.drop(columns = ['dff', 'date_right'])
.groupby('date', sort = False, as_index = False)
.nth(0)
)

        date  values  actions
0 2022-05-15    10.0        5
3 2022-05-20    20.0        8
4 2022-05-25    30.0       10
5 2022-05-30    80.0       20