I am trying to change some values in a DataFrame using pandas. Having already tried df.replace, df.mask, and df.where, I have concluded that it must be a logical mistake, since it keeps raising the same error:
ValueError: The truth value of a Series is ambiguous.
I am trying to normalize a column in a dataset, hence the function rather than a single line. I need to understand why my logic is wrong; it seems to be such a dumb mistake.
This is my function:
def overweight_normalizer():
    if df[df["overweight"] > 25]:
        df.where(df["overweight"] > 25, 1)
    elif df[df["overweight"] < 25]:
        df.where(df["overweight"] < 25, 0)
CodePudding user response:
df[df["overweight"] > 25] is a DataFrame object, so it doesn't make sense to evaluate it by itself in an if-condition, which requires a single truth value. The error is saying that you're attempting to take the boolean value of a pandas DataFrame object.
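To make the point above concrete, here is a minimal sketch of what happens when a filtered DataFrame is used directly in an if-statement, along with two explicit ways to reduce it to a single boolean. The sample values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({"overweight": [20, 30, 25, 40]})

# Boolean indexing returns a DataFrame holding the matching rows.
filtered = df[df["overweight"] > 25]

raised = False
message = ""
try:
    if filtered:  # pandas refuses to collapse many values into one bool
        pass
except ValueError as exc:
    raised = True
    message = str(exc)

# Explicit alternatives that each produce one boolean:
has_rows = not filtered.empty                    # did any row match?
any_over = bool((df["overweight"] > 25).any())   # is any value above 25?
```

Which reduction is appropriate depends on the question being asked. For changing values row by row, though, no if-statement is needed at all.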
Now it seems you want to assign 1 if the "overweight" value is greater than 25 and 0 otherwise. In that case, instead of the function, you can use np.where:
np.where(df["overweight"] > 25, 1, 0)
or even
df["overweight"].gt(25).astype(int)
Here df["overweight"] > 25 or df["overweight"].gt(25) creates a boolean Series (that matches the length of df), and np.where selects a value at each index depending on whether the condition is True there.
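As a small sketch of that element-wise selection, the snippet below builds the mask, passes it to np.where, and writes the result back to the column. The sample values are assumed for illustration. Note that a value of exactly 25 maps to 0 here, a case the original if/elif left unhandled.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"overweight": [2, 39, 15, 45, 9]})

# Boolean Series: True where the value is above 25.
mask = df["overweight"] > 25

# Pick 1 where the mask is True and 0 elsewhere, one result per row.
flags = np.where(mask, 1, 0)

# np.where returns a new array; assign it back to store the result.
df["overweight"] = flags
```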
If you need this inside the function along with other work, you can write:
def overweight_normalizer():
    x = df["overweight"].gt(25).astype(int)
    # do something else
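One possible way to flesh out that function, sketched under the assumption that the DataFrame should be passed in and a modified copy returned instead of relying on a global df. The parameter names frame and threshold are hypothetical and not from the original post.

```python
import pandas as pd


def overweight_normalizer(frame: pd.DataFrame, threshold: float = 25) -> pd.DataFrame:
    """Return a copy of frame with 'overweight' recoded to 1 (above threshold) or 0."""
    result = frame.copy()  # leave the caller's DataFrame untouched
    result["overweight"] = result["overweight"].gt(threshold).astype(int)
    return result


# Hypothetical sample data for illustration.
raw = pd.DataFrame({"overweight": [2, 39, 15, 45, 9]})
normalized = overweight_normalizer(raw)
```

Returning a copy keeps the raw values available for other steps; assigning to the input in place would also work if that is not a concern.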
CodePudding user response:
df[df["overweight"] > 25] is not a valid condition.
Try this:
import pandas as pd

def overweight_normalizer():
    df = pd.DataFrame({'overweight': [2, 39, 15, 45, 9]})
    # 1 if the value is above 25, otherwise 0
    df["overweight"] = [1 if i > 25 else 0 for i in df["overweight"]]
    return df

overweight_normalizer()
Output:
   overweight
0           0
1           1
2           0
3           1
4           0
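As for why the where calls in the question would not have produced this result even without the if-statement, the sketch below illustrates two points about where: it keeps values where the condition is True and replaces the rest, and it returns a new object without modifying the original. The data is the same sample used in this answer; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"overweight": [2, 39, 15, 45, 9]})
cond = df["overweight"] > 25

# where keeps values where cond is True and replaces the others with 1.
kept = df["overweight"].where(cond, 1)

# mask is the inverse: it replaces values where cond is True.
flipped = df["overweight"].mask(cond, 1)

# Neither call changed df; the results must be assigned to take effect.
unchanged = df["overweight"].tolist()
```

So `df.where(df["overweight"] > 25, 1)` replaces the values that are not above 25, which is the opposite of what the question intended, and its result is discarded unless it is assigned.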