IF Comparison between elements of a data frame

Time:03-18

I have a data frame as follows:

return  Upper  lower
    50     70     20
    10     15      3

I'm trying to count how many times the return falls between the upper and lower values. I tried to create another column that flags whether the condition is true.

for val in data['return']:
    
    if data['return'] <  data['upper'] or data['return']> data['lower']:
         data['Predicted'] = 1
    else:
        data['Predicted'] = 0

where data['Predicted'] should be the new column.

However, I get the error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have tried changing the operator to |, but it didn't work. I'm new to Python and am unsure how best to solve this.

For context, my goal is to calculate how many times the prediction was right. I am not sure if this method is the best way.

CodePudding user response:

how many times the return is in-between the upper and lower

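It seems you need the AND operator rather than OR, since the return has to be below the upper bound and above the lower bound at the same time. Instead of iterating over the rows, you could use between, which includes both bounds by default: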

data['predicted'] = data['return'].between(data['lower'], data['Upper']).astype(int)

Output:

   return  Upper  lower  predicted
0      50     70     20          1
1      10     15      3          1
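
Since the stated goal is a count, the boolean mask can also be summed directly. The following is a minimal sketch using the sample data from the question; the names in_range, hits and hit_rate are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({"return": [50, 10], "Upper": [70, 15], "lower": [20, 3]})

# between() is inclusive on both ends by default.
in_range = data["return"].between(data["lower"], data["Upper"])

# True counts as 1, so summing the mask gives the number of hits.
hits = int(in_range.sum())
hit_rate = hits / len(data)
print(hits, hit_rate)
```

If the bounds themselves should not count as hits, recent pandas versions (1.3 and later, as far as I know) accept inclusive="neither" in between.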

The error happens because data['return'], data['upper'], etc. are Series objects, so the comparisons yield boolean Series. An if-statement expects a single True/False value, so it can't evaluate a whole Series.
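
To make this concrete, here is a small sketch (again with the question's sample data) that reproduces the error and then combines the two conditions element-wise with &. Switching to | or & inside the original loop most likely still failed because the result was still passed to an if-statement; note also that inline comparisons need parentheses around them when combined with & or |.

```python
import pandas as pd

data = pd.DataFrame({"return": [50, 10], "Upper": [70, 15], "lower": [20, 3]})

# Each comparison returns a boolean Series, one value per row.
below_upper = data["return"] < data["Upper"]
above_lower = data["return"] > data["lower"]

# Python's `and`/`or`/`if` need a single bool, so this raises ValueError.
try:
    if below_upper and above_lower:
        pass
    raised = False
except ValueError:
    raised = True

# `&` combines the two Series element-wise, row by row.
data["Predicted"] = (below_upper & above_lower).astype(int)
print(raised, data["Predicted"].tolist())
```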

CodePudding user response:

Another option:

import numpy as np
import pandas as pd

x = pd.DataFrame({'upper': [3, 4, 4], 'lower': [1, 1, 2], 'return': [2, 3, 5]})
# Start with 0 everywhere, then set 1 where return is strictly between the bounds.
x['pred'] = 0
x.loc[np.logical_and(x['return'] < x['upper'], x['return'] > x['lower']), 'pred'] = 1

I like this solution because the same pattern (build a boolean mask, then assign through .loc) can be used for other conditional-assignment problems.
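
As an illustration of how the mask pattern carries over, the sketch below reuses the same frame to label each row's position relative to the band. The position column is a hypothetical addition, not part of the question. The last line shows np.where as a one-step way to build the 0/1 column from the same kind of mask.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"upper": [3, 4, 4], "lower": [1, 1, 2], "return": [2, 3, 5]})

# Hypothetical extra column: label where each return sits relative to the band.
x["position"] = "inside"
x.loc[x["return"] >= x["upper"], "position"] = "above"
x.loc[x["return"] <= x["lower"], "position"] = "below"

# np.where builds the 0/1 column in one step from a combined mask.
x["pred"] = np.where((x["return"] < x["upper"]) & (x["return"] > x["lower"]), 1, 0)
print(x)
```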
