I am trying to work on a requirement where I have to fill values inside a dataframe as NaN if it goes beyond a certain value.
s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021':[11,12,34,42], 'qty':[45,22,12,42],'price':[22,33,44,55]}
p=pd.DataFrame(data=s)
k=(p.qty p.price) # Not sure if this is the right way as per the requirement.
The condition is that if column 2018 or 19 or 20 or 21 has a value greater than k, then fill that value as NaN.
Say if k=3, the fourth row in 2018 with value 4 will be NaN. k value will be different for all columns, hence it needs to be calculated column wise and accordingly value has to be NaN.
How would I be able to do this?
CodePudding user response:
It is actually very simple. What you need is to learn more about the logical statements in pandas dataframes. To solve your problem, you can try code below:
s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021':[11,12,34,42], 'qty':[45,22,12,42],'price':[22,33,44,55]}
p=pd.DataFrame(data=s)
k = 4
p[p<k]
Output
2018 | 2019 | 2020 | 2021 | qty | price | |
---|---|---|---|---|---|---|
0 | nan | nan | 4 | 11 | 45 | 22 |
1 | nan | nan | 6 | 12 | 22 | 33 |
2 | nan | 4 | 8 | 34 | 12 | 44 |
3 | 4 | 5 | 9 | 42 | 42 | 55 |
Note that k = (p.qty p.price)
will return a numpy array, not a scalar value.
CodePudding user response:
Once you figure out exactly what k = (p.qty p.price) is, you can update it. However, I think the way you want to solve this is using the gt() operator on a column by column basis. Here's my solution.
import pandas as pd
s={'2018':[1,2,3,4],'2019':[2,3,4,5],'2020':[4,6,8,9],'2021': [11,12,34,42], 'qty':[1,2,3,4], 'price':[1,2,3,4]}
p=pd.DataFrame(data=s)
k = (p.qty * p.price)
needed = p[['qty', 'price']]
p = p.where(p.gt(k, axis=0), None)
p[['qty','price']] = needed
print(p)
This Outputs:
2018 2019 2020 2021 qty price
0 NaN 2.0 4.0 11 1 1
1 NaN NaN 6.0 12 2 2
2 NaN NaN NaN 34 3 3
3 NaN NaN NaN 42 4 4
A few notes. I save and re-add the final columns. However, if you do not need those you can remove lines with the word needed. The line with the bulk of the code is p = p.where(p.gt(k, axis=0), None). In this current example, my comparisons are on the column level. So, '2019' : 2,3,4,5 gets compared to k : 1,4,9,16. Showing 2 > 1, but 3,4,5 are all less than 4,9,16, resulting in True, False, False, False. DataFrame.where(cond, other) replaces the False values with None which is python's standard for null.