Python to check for a particular condition in a dataframe-CodePudding

I am trying to work on a requirement where I have to fill values inside a dataframe as NaN if it goes beyond a certain value.

    s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021':[11,12,34,42], 'qty':[45,22,12,42],'price':[22,33,44,55]}
    p=pd.DataFrame(data=s)
    k=(p.qty p.price) # Not sure if this is the right way as per the requirement.

The condition is that if column 2018 or 19 or 20 or 21 has a value greater than k, then fill that value as NaN.

Say if k=3, the fourth row in 2018 with value 4 will be NaN. k value will be different for all columns, hence it needs to be calculated column wise and accordingly value has to be NaN.

How would I be able to do this?

CodePudding user response：

It is actually very simple. What you need is to learn more about the logical statements in pandas dataframes. To solve your problem, you can try code below:

s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021':[11,12,34,42], 'qty':[45,22,12,42],'price':[22,33,44,55]}
p=pd.DataFrame(data=s)
k = 4
p[p<k]

Output

	2018	2019	2020	2021	qty	price
0	nan	nan	4	11	45	22
1	nan	nan	6	12	22	33
2	nan	4	8	34	12	44
3	4	5	9	42	42	55

Note that k = (p.qty p.price) will return a numpy array, not a scalar value.

CodePudding user response：

Once you figure out exactly what k = (p.qty p.price) is, you can update it. However, I think the way you want to solve this is using the gt() operator on a column by column basis. Here's my solution.

    import pandas as pd
    s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021': [11,12,34,42], 'qty':[1,2,3,4], 'price':[1,2,3,4]}
    p=pd.DataFrame(data=s)
    k = (p.qty * p.price)
    needed = p[['qty', 'price']]
    p = p.where(p.gt(k, axis=0), None)
    p[['qty','price']] = needed
    print(p)

This Outputs:

   2018  2019  2020  2021  qty  price
0   NaN   2.0   4.0    11    1      1
1   NaN   NaN   6.0    12    2      2
2   NaN   NaN   NaN    34    3      3
3   NaN   NaN   NaN    42    4      4

A few notes. I save and re-add the final columns. However, if you do not need those you can remove lines with the word needed. The line with the bulk of the code is p = p.where(p.gt(k, axis=0), None). In this current example, my comparisons are on the column level. So, '2019' : 2,3,4,5 gets compared to k : 1,4,9,16. Showing 2 > 1, but 3,4,5 are all less than 4,9,16, resulting in True, False, False, False. DataFrame.where(cond, other) replaces the False values with None which is python's standard for null.