Using numpy.where function with multiple conditions but getting ValueError

Time:05-15

So I have a dataframe with multiple columns with numbers in them. It looks like this:

H    C    T    P    R
300  200  500  0.3
500  400  300  0.2

I'm trying to perform operations on columns H, C, T, P and fill in column R.

For example,

df['R'] = numpy.where(df['H'] > df['T'] and df['P'] > 0,
                      df['C'] / df['T'] - 1, 0)

I would like the operation to be performed row by row, only when both of these conditions hold:

  1. when the value of 'H' in nth row > the value of 'T' in nth row
  2. when the value in 'P' in nth row is greater than 0

But if I run the code, I get "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

What do I need to fix? Perhaps I need to use row['column name']? Any help is appreciated!

CodePudding user response:

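You should use the bitwise & operator, with parentheses around each comparison, rather than and. Python's and tries to reduce each Series to a single True/False value, which is what raises the ValueError; & combines the two comparisons element by element. The parentheses are required because & binds more tightly than >.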

df['R'] = numpy.where((df['H'] > df['T']) & (df['P'] > 0),
                      df['C'] / df['T'] - 1, 0)
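
For reference, here is a minimal runnable sketch of that fix, assuming the two sample rows from the question. The name mask is only illustrative and is not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({'H': [300, 500], 'C': [200, 400],
                   'T': [500, 300], 'P': [0.3, 0.2]})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df['H'] > df['T']) & (df['P'] > 0)

# Where mask is True take C / T - 1, otherwise 0.
df['R'] = np.where(mask, df['C'] / df['T'] - 1, 0)
print(df)
```

With this data only the second row satisfies both conditions, so the first row gets 0 and the second gets 400 / 300 - 1.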

CodePudding user response:

Try this.

import pandas as pd
d = {'H': [300, 500], 'C': [200, 400], 'T': [500, 300], 'P': [0.3, 0.2]}
df = pd.DataFrame(d)
df

Create a function:

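def calc(row):
    # Use column labels; positional access such as row[0] is deprecated
    # for a label-indexed Series in recent pandas versions.
    # Both conditions must hold, otherwise return 0 (never None).
    if row['H'] > row['T'] and row['P'] > 0:
        return row['C'] / row['T'] - 1
    return 0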

Then apply the function, by row

df['R'] = df.apply(calc, axis=1)
df
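
As a quick sanity check, the sketch below compares a row-wise function with the vectorized numpy.where expression on the sample data. The names calc_by_label, R_apply and R_where are made up for this illustration. On large frames the vectorized form is usually much faster, since apply with axis=1 calls a Python function once per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'H': [300, 500], 'C': [200, 400],
                   'T': [500, 300], 'P': [0.3, 0.2]})

def calc_by_label(row):
    # Hypothetical helper: row-wise version of the same rule.
    if row['H'] > row['T'] and row['P'] > 0:
        return row['C'] / row['T'] - 1
    return 0

# Row-wise result.
df['R_apply'] = df.apply(calc_by_label, axis=1)

# Vectorized result.
df['R_where'] = np.where((df['H'] > df['T']) & (df['P'] > 0),
                         df['C'] / df['T'] - 1, 0)

# Both columns should hold the same values.
print(np.allclose(df['R_apply'], df['R_where']))
```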