Home > database >  Assign new column in DataFrame based on if value is in a certain value range
Assign new column in DataFrame based on if value is in a certain value range

Time:08-04

I have two DataFrames as follows:

df_discount = pd.DataFrame(data={'Graduation' : np.arange(0,1000,100), 'Discount %' : np.arange(0,50,5)})
df_values = pd.DataFrame(data={'Sum' : [20,801,972,1061,1251]})

enter image description here enter image description here

Now my goal is to get a new column df_values['New Sum'] for my df_values that applies the corresponding discount to df_values['Sum'] based on the value of df_discount['Graduation']. If the Sum is >= the Graduation the corresponding discount is applied.

Examples: Sum 801 should get a discount of 40% resulting in 480.6, Sum 1061 gets 45% resulting in 583.55.

I know I could write a funtion with if else conditions and the returning values. However, is there a better way to do this if you have very many different conditions?

CodePudding user response:

You could try if pd.merge_asof() works for you:

df_discount = pd.DataFrame({
    'Graduation': np.arange(0, 1000, 100), 'Discount %': np.arange(0, 50, 5)
})
df_values = pd.DataFrame({'Sum': [20, 100, 101, 350, 801, 972, 1061, 1251]})

df_values = (
    pd.merge_asof(
        df_values, df_discount,
        left_on="Sum", right_on="Graduation",
        direction="backward"
    )
    .assign(New_Sum=lambda df: df["Sum"] * (1 - df["Discount %"] / 100))
    .drop(columns=["Graduation", "Discount %"])
)

Result (without the last .drop(columns=...) to see what's happening):

    Sum  Graduation  Discount %  New_Sum
0    20           0           0    20.00
1   100         100           5    95.00
2   101         100           5    95.95
3   350         300          15   297.50
4   801         800          40   480.60
5   972         900          45   534.60
6  1061         900          45   583.55
7  1251         900          45   688.05

CodePudding user response:

With pandas.cut() you can simplify this a lot.

graduation = np.append(np.arange(0,1000,100), np.nan)
discount = np.arange(0,50,5)
df_values['Discount %'] = pd.cut(df_values['Sum'],
                                 graduation,
                                 labels=discount)

df_values['Discount %'] = df_values['Discount %'].astype(int)
df_values['New Sum'] = df_values['Sum'] * (1-df_values['Discount %']/100)

    Sum  Discount %  New Sum
0    20           0    20.00
1   801          40   480.60
2   972          45   534.60
3  1061          45   583.55
4  1251          45   688.05

CodePudding user response:

You can use pandas.DataFrame.mask. Basically if your condition is true it replaces the value. But for that your sum column has to be inside first dataframe.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mask.html

  • Related