import numpy as np
import pandas as pd

d = {'col1': [1.1, 2.1, "ERR", 0.1], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)
print(df)
df["t"] = np.where(df["col1"] == "ERR", "T", "F")
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1)
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1.1)
print(df)
# works fine
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1)
# TypeError: can't multiply sequence by non-int of type 'float'
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1.1)
I can't understand why multiplying by 1 works while multiplying by 1.1 raises a TypeError.
CodePudding user response:
The problem is that the column mixes numeric and non-numeric values. One possible solution is to convert the column to numeric first:
s = pd.to_numeric(df["col1"], errors='coerce')
df["test"] = np.where(df["col1"] == "ERR", s, s*1)
df["test"] = np.where(df["col1"] == "ERR", s, s*1.1)
print(df)
col1 col2 t test
0 1.1 3 F 1.21
1 2.1 4 F 2.31
2 ERR 5 T NaN
3 0.1 6 F 0.11
print(s)
0 1.1
1 2.1
2 NaN
3 0.1
Name: col1, dtype: float64
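Note that with this approach the "ERR" row ends up as NaN in the new column. If the original label should be kept instead, one possible variation (an assumption on my part, not something the answer above does) is to use the NaN positions of s as the condition and fall back to the original value there:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.1, 2.1, "ERR", 0.1], "col2": [3, 4, 5, 6]})

# Non-numeric entries such as "ERR" become NaN; the rest become floats.
s = pd.to_numeric(df["col1"], errors="coerce")

# Assumed variation: where the conversion failed, keep the original value,
# otherwise use the scaled number. The multiplication now runs on floats only.
df["test"] = np.where(s.isna(), df["col1"], s * 1.1)
print(df)
```

The resulting column has object dtype again, because it holds both floats and the string "ERR".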
CodePudding user response:
The issue is not with np.where. The issue lies here:
df["col1"]*1.1
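The dtype of the column is object, because it mixes floats with the string "ERR", and you are trying to multiply the whole column by a float. That fails as soon as the multiplication reaches the string; it only works when every value in the column is numeric.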
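A minimal sketch of why np.where does not protect the "ERR" row: Python evaluates all the arguments before np.where is called, so the multiplication runs over the entire column first. The variable names below are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1.1, 2.1, "ERR", 0.1], "col2": [3, 4, 5, 6]})

# Mixing floats and a string forces the generic object dtype.
col1_dtype = df["col1"].dtype
print(col1_dtype)

# This is the expression passed as the third argument to np.where.
# It is evaluated for every row, including the "ERR" one.
raised = False
try:
    df["col1"] * 1.1
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```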
CodePudding user response:
The error occurs because running df["col1"]*1.1 tries to multiply 'ERR' by 1.1, which is impossible. You don't get the error with 1 because integer*string is a valid operation in Python: for example, 2*'ABC' gives 'ABCABC' (which is still unwanted in your case and only works as you expect with 1).
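The same behaviour can be reproduced in plain Python without pandas, as a quick sketch:

```python
# An int times a str repeats the string.
repeated = 2 * "ABC"
unchanged = 1 * "ERR"   # repeating once leaves the string as it was
print(repeated, unchanged)

# A float times a str is not defined and raises TypeError.
message = ""
try:
    1.1 * "ERR"
except TypeError as exc:
    message = str(exc)
print(message)
```

This is why the version with *1 appears to work: the string is simply repeated once.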
If you want to keep the column intact but multiply the non-ERR values, you can select only the numeric rows:
df.loc[df['col1'].ne('ERR'), 'col1'] *= 1.1
output:
col1 col2
0 1.21 3
1 2.31 4
2 ERR 5
3 0.11 6
NB. For a more generic way to identify the rows, if you have a list of values that should be skipped:
m = ~df['col1'].isin(['ERR', 'INVALID'])
or to skip all non-number values:
m = pd.to_numeric(df['col1'], errors='coerce').notna()
then update:
df.loc[m, 'col1'] *= 1.1
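Put together as one runnable sketch, using the to_numeric mask and the sample data from the question (the update is written as an explicit assignment here, which should be equivalent to the *= form above):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1.1, 2.1, "ERR", 0.1], "col2": [3, 4, 5, 6]})

# True where the value can be parsed as a number, False for "ERR" and the like.
m = pd.to_numeric(df["col1"], errors="coerce").notna()

# Scale only the numeric rows; rows where m is False are left untouched.
df.loc[m, "col1"] = df.loc[m, "col1"] * 1.1
print(df)
```

The column keeps its object dtype, since it still contains the string "ERR" next to the scaled floats.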