Home > Net >  Python to check for a particular condition in a dataframe
Python to check for a particular condition in a dataframe

Time:02-22

I am trying to work on a requirement where I have to fill values inside a dataframe as NaN if it goes beyond a certain value.

    s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021':[11,12,34,42], 'qty':[45,22,12,42],'price':[22,33,44,55]}
    p=pd.DataFrame(data=s)
    k=(p.qty p.price) # Not sure if this is the right way as per the requirement.

The condition is that if column 2018 or 19 or 20 or 21 has a value greater than k, then fill that value as NaN.

Say if k=3, the fourth row in 2018 with value 4 will be NaN. k value will be different for all columns, hence it needs to be calculated column wise and accordingly value has to be NaN.

How would I be able to do this?

CodePudding user response:

It is actually very simple. What you need is to learn more about the logical statements in pandas dataframes. To solve your problem, you can try code below:

s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021':[11,12,34,42], 'qty':[45,22,12,42],'price':[22,33,44,55]}
p=pd.DataFrame(data=s)
k = 4
p[p<k]

Output

2018 2019 2020 2021 qty price
0 nan nan 4 11 45 22
1 nan nan 6 12 22 33
2 nan 4 8 34 12 44
3 4 5 9 42 42 55

Note that k = (p.qty p.price) will return a numpy array, not a scalar value.

CodePudding user response:

Once you figure out exactly what k = (p.qty p.price) is, you can update it. However, I think the way you want to solve this is using the gt() operator on a column by column basis. Here's my solution.

    import pandas as pd
    s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021': [11,12,34,42], 'qty':[1,2,3,4], 'price':[1,2,3,4]}
    p=pd.DataFrame(data=s)
    k = (p.qty * p.price)
    needed = p[['qty', 'price']]
    p = p.where(p.gt(k, axis=0), None)
    p[['qty','price']] = needed
    print(p)

This Outputs:

   2018  2019  2020  2021  qty  price
0   NaN   2.0   4.0    11    1      1
1   NaN   NaN   6.0    12    2      2
2   NaN   NaN   NaN    34    3      3
3   NaN   NaN   NaN    42    4      4

A few notes. I save and re-add the final columns. However, if you do not need those you can remove lines with the word needed. The line with the bulk of the code is p = p.where(p.gt(k, axis=0), None). In this current example, my comparisons are on the column level. So, '2019' : 2,3,4,5 gets compared to k : 1,4,9,16. Showing 2 > 1, but 3,4,5 are all less than 4,9,16, resulting in True, False, False, False. DataFrame.where(cond, other) replaces the False values with None which is python's standard for null.

  • Related