Calculating New Rows based on Multiple Criteria using pandas

Time:11-03

I have a DataFrame in which I'm trying to populate a new column based on a calculation. Here is an example:

import pandas as pd 
import numpy as np
df = pd.DataFrame(
        {"Quantity" :[4721,1647],
         "Total" : [236.05,82.35]},
        index = [1,2])

df["CPS Gross"]= (df["Total"]/df["Quantity"])

conditions = [df["CPS Gross"] == 0.05]
values = [0.03]

df["CPS Calc"] = np.select(conditions,values)

print(df)

Can someone explain why the second row fails the condition while the first row passes, even though both should work out to 0.05?

Any help would be much appreciated

CodePudding user response:

I cannot comment on your original post, so I'll leave an answer instead.

The second row fails the condition because floats cannot represent most decimal fractions exactly. Although the exact mathematical result of 82.35/1647 is 0.05, Python evaluates it to 0.049999999999999996, which is not equal to 0.05. That is why 82.35/1647 == 0.05 is False: in floating-point arithmetic, it really is False.

One solution is to compare with a tolerance, for example using math.isclose(). Instead of checking whether two values are exactly equal, it checks whether they are within a given distance of each other (the tolerance).
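To see this for yourself, here is a small sketch (not part of the original answer) that inspects the values involved. It uses decimal.Decimal only to display the exact value a float stores; the 0.1 + 0.2 line is the usual textbook illustration, added here as an assumption about what is helpful to show.

```python
from decimal import Decimal

# Classic illustration: 0.1, 0.2 and 0.3 have no exact binary representation.
classic_equal = (0.1 + 0.2 == 0.3)
print(classic_equal)      # False
print(0.1 + 0.2)          # 0.30000000000000004

# Decimal(float) reveals the exact value that the float 0.05 actually stores.
stored = Decimal(0.05)
print(stored)

# The two quotients from the question, shown with full precision
# alongside the result of the exact equality test.
for total, quantity in [(236.05, 4721), (82.35, 1647)]:
    quotient = total / quantity
    print(repr(quotient), quotient == 0.05)
```

Whether a particular division lands exactly on the float nearest to 0.05 depends on how the inputs happen to round, which is why one row can pass an exact test while another fails.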

from math import isclose

...

# Compare each value to 0.05 within an absolute tolerance instead of using ==
conditions = [df["CPS Gross"].apply(lambda x: isclose(x, 0.05, abs_tol=1e-8))]
values = [0.03]

df["CPS Calc"] = np.select(conditions, values)

If you run this, you will see that both rows are now selected.
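As a possible alternative to apply with math.isclose, NumPy has a vectorized np.isclose that works on the whole column at once. The sketch below rebuilds the DataFrame from the question; the tolerance values (rtol=0, atol=1e-8) and the NaN default are assumptions, so adjust them to your data.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    {"Quantity": [4721, 1647], "Total": [236.05, 82.35]},
    index=[1, 2],
)
df["CPS Gross"] = df["Total"] / df["Quantity"]

# Vectorized tolerance check: True where the value is within atol of 0.05.
close_to_target = np.isclose(df["CPS Gross"], 0.05, rtol=0, atol=1e-8)

# np.select fills non-matching rows with `default` (0 if not given).
df["CPS Calc"] = np.select([close_to_target], [0.03], default=np.nan)
print(df)
```

This avoids a Python-level function call per row, which may matter on large frames.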

CodePudding user response:

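Since you only have one condition, you can replace np.select with np.where, which takes a single condition and scalar values directly:

df["CPS Calc"] = np.where(df["CPS Gross"]==0.05,0.03,np.nan)

The last argument is the value used when the condition is not true (I assumed NaN, but it can be any value). Note that an exact == comparison here is still subject to the floating-point issue described in the other answer.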
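Because np.where with == runs into the same float comparison problem, one way to combine the two answers is to feed np.where a tolerance-based condition. This is only a sketch: the tolerance of 1e-8 is an assumption, and the third row (index 3) is a hypothetical non-matching row added to show the default value.

```python
import numpy as np
import pandas as pd

# Data from the question plus one hypothetical non-matching row (index 3).
df = pd.DataFrame(
    {"Quantity": [4721, 1647, 1000], "Total": [236.05, 82.35, 70.0]},
    index=[1, 2, 3],
)
df["CPS Gross"] = df["Total"] / df["Quantity"]

tolerance = 1e-8  # assumed tolerance; choose one that suits your data

# True where CPS Gross is within `tolerance` of 0.05.
is_target = (df["CPS Gross"] - 0.05).abs() < tolerance

# Matching rows get 0.03, all others get NaN.
df["CPS Calc"] = np.where(is_target, 0.03, np.nan)
print(df)
```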
