Assign new column in DataFrame based on if value is in a certain value range

I have two DataFrames as follows:

import numpy as np
import pandas as pd
df_discount = pd.DataFrame(data={'Graduation' : np.arange(0,1000,100), 'Discount %' : np.arange(0,50,5)})
df_values = pd.DataFrame(data={'Sum' : [20,801,972,1061,1251]})

My goal is to add a new column df_values['New Sum'] that applies the matching discount to df_values['Sum'], based on df_discount['Graduation']: a Sum gets the discount of the highest Graduation that it is greater than or equal to.

Examples: Sum 801 should get a discount of 40% resulting in 480.6, Sum 1061 gets 45% resulting in 583.55.

I know I could write a function with if/else conditions that returns the values. However, is there a better way to do this when there are very many different conditions?

CodePudding user response：

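You could try pd.merge_asof(). With direction="backward" it matches each Sum to the last Graduation that is less than or equal to it (both frames have to be sorted on their key columns):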

df_discount = pd.DataFrame({
    'Graduation': np.arange(0, 1000, 100), 'Discount %': np.arange(0, 50, 5)
})
df_values = pd.DataFrame({'Sum': [20, 100, 101, 350, 801, 972, 1061, 1251]})

df_values = (
    pd.merge_asof(
        df_values, df_discount,
        left_on="Sum", right_on="Graduation",
        direction="backward"
    )
    .assign(New_Sum=lambda df: df["Sum"] * (1 - df["Discount %"] / 100))
    .drop(columns=["Graduation", "Discount %"])
)

Result (without the last .drop(columns=...) to see what's happening):

    Sum  Graduation  Discount %  New_Sum
0    20           0           0    20.00
1   100         100           5    95.00
2   101         100           5    95.95
3   350         300          15   297.50
4   801         800          40   480.60
5   972         900          45   534.60
6  1061         900          45   583.55
7  1251         900          45   688.05
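
If it helps to see what the backward match is doing, here is a rough sketch of the same lookup with np.searchsorted. This is a different technique from merge_asof, not what it does internally, and it assumes Graduation is sorted ascending and that no Sum is below the first Graduation. The names idx, matched_discount and new_sum are only for illustration.

```python
import numpy as np
import pandas as pd

df_discount = pd.DataFrame({
    "Graduation": np.arange(0, 1000, 100),
    "Discount %": np.arange(0, 50, 5),
})
df_values = pd.DataFrame({"Sum": [20, 100, 101, 350, 801, 972, 1061, 1251]})

# side="right" counts how many Graduation values are <= each Sum;
# subtracting 1 turns that count into the position of the last such value.
# Assumption: Graduation is sorted and every Sum >= the first Graduation.
idx = np.searchsorted(
    df_discount["Graduation"].to_numpy(),
    df_values["Sum"].to_numpy(),
    side="right",
) - 1

# Look up the discount at those positions and apply it.
matched_discount = df_discount["Discount %"].to_numpy()[idx]
new_sum = df_values["Sum"] * (1 - matched_discount / 100)
```

The values in new_sum should line up with the New_Sum column printed above.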

CodePudding user response：

With pandas.cut() you can simplify this a lot.

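# np.inf gives the last bin an open upper edge: [900, inf)
graduation = np.append(np.arange(0, 1000, 100), np.inf)
discount = np.arange(0, 50, 5)
# right=False makes the bins left-closed, e.g. [100, 200), so Sum >= Graduation gets that discount
df_values['Discount %'] = pd.cut(df_values['Sum'],
                                 graduation,
                                 labels=discount,
                                 right=False)

# the labels come back as a categorical; convert to int before doing arithmetic
df_values['Discount %'] = df_values['Discount %'].astype(int)
df_values['New Sum'] = df_values['Sum'] * (1 - df_values['Discount %'] / 100)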

    Sum  Discount %  New Sum
0    20           0    20.00
1   801          40   480.60
2   972          45   534.60
3  1061          45   583.55
4  1251          45   688.05
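
One thing to watch with pd.cut() is which side of each bin is closed, because the question asks for Sum >= Graduation. The sketch below uses a few made-up test values (sums is hypothetical, not from the question) and assumes np.inf as the upper edge of the last bin, to compare the default right=True with right=False.

```python
import numpy as np
import pandas as pd

# Hypothetical test values that sit on and around the bin edges.
sums = pd.Series([0, 20, 100, 801, 1251], name="Sum")
edges = np.append(np.arange(0, 1000, 100), np.inf)  # 0, 100, ..., 900, inf
discount = np.arange(0, 50, 5)

# right=False: bins are [0, 100), [100, 200), ..., [900, inf)
left_closed = pd.cut(sums, edges, labels=discount, right=False).astype(int)

# default right=True: bins are (0, 100], (100, 200], ..., (900, inf]
right_closed = pd.cut(sums, edges, labels=discount)

print(pd.DataFrame({"Sum": sums, "right=False": left_closed, "right=True": right_closed}))
```

With right=False a Sum of exactly 100 lands in the 5% bin, while with the default it lands in the 0% bin and a Sum of 0 is not assigned to any bin at all.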

CodePudding user response：

You can use pandas.DataFrame.mask, which replaces values wherever a condition is True. For that to work, though, the Sum column has to be in the first DataFrame.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mask.html
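
A minimal sketch of how mask could be applied here, as one possible reading of this suggestion. It uses Series.mask in a loop over the thresholds instead of moving Sum into the first DataFrame, and the helper name rate is made up for the example. It assumes the Graduation values are in ascending order.

```python
import numpy as np
import pandas as pd

df_discount = pd.DataFrame({
    "Graduation": np.arange(0, 1000, 100),
    "Discount %": np.arange(0, 50, 5),
})
df_values = pd.DataFrame({"Sum": [20, 801, 972, 1061, 1251]})

# Start with no discount, then walk the thresholds in ascending order.
# Each pass overwrites the rate wherever Sum >= that Graduation,
# so the highest threshold a Sum reaches is the one that sticks.
rate = pd.Series(0, index=df_values.index)
for graduation, pct in zip(df_discount["Graduation"], df_discount["Discount %"]):
    rate = rate.mask(df_values["Sum"] >= graduation, pct)

df_values["New Sum"] = df_values["Sum"] * (1 - rate / 100)
```

This does one pass over the column per threshold, so with very many thresholds the merge_asof() or cut() approaches are probably the better fit.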