np.where float Multiplication Issue

import numpy as np
import pandas as pd

d = {'col1': [1.1, 2.1, "ERR", 0.1], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)
print(df)
df["t"] = np.where(df["col1"] == "ERR", "T", "F")
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1)    # works
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1.1)  # raises TypeError
print(df)

# works fine
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1)
# TypeError: can't multiply sequence by non-int of type 'float'
df["test"] = np.where(df["col1"] == "ERR", df["col1"], df["col1"]*1.1)

I don't understand why this happens.

Answer:

The problem is that the column mixes numeric and non-numeric values. One possible solution is to convert the column to numeric first:

# Convert col1 to float64; non-numeric values such as 'ERR' become NaN
s = pd.to_numeric(df["col1"], errors='coerce')
df["test"] = np.where(df["col1"] == "ERR", s, s*1)
df["test"] = np.where(df["col1"] == "ERR", s, s*1.1)
print(df)
  col1  col2  t  test
0  1.1     3  F  1.21
1  2.1     4  F  2.31
2  ERR     5  T   NaN
3  0.1     6  F  0.11


print(s)
0    1.1
1    2.1
2    NaN
3    0.1
Name: col1, dtype: float64
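Since `s` is already float64 with `NaN` in place of 'ERR', the `np.where` call is arguably optional here: arithmetic on `NaN` yields `NaN`. A minimal sketch, assuming the same `df` as in the question, that compares the two forms:

```python
import numpy as np
import pandas as pd

d = {'col1': [1.1, 2.1, "ERR", 0.1], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)

# Coerce non-numeric entries such as 'ERR' to NaN; the result is float64
s = pd.to_numeric(df["col1"], errors='coerce')

# np.where version from the answer
with_where = np.where(df["col1"] == "ERR", s, s * 1.1)

# Plain multiplication: NaN * 1.1 is still NaN, so the mask changes nothing
without_where = s * 1.1

print(np.allclose(with_where, without_where, equal_nan=True))  # True
```

The `np.where` form is still useful if you want something other than `NaN` in the 'ERR' rows.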

Answer:

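The issue is not with np.where itself. np.where receives its arguments already evaluated, so the multiplication below runs on every row, including the 'ERR' row: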

df["col1"]*1.1

The column has dtype object because it holds the string 'ERR', so the multiplication is applied to each element in turn, and 'ERR' * 1.1 raises the TypeError. Multiplying by a float only works reliably on a numeric column.
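A minimal sketch reproducing the failure with the question's data: the product is computed before np.where applies its mask, and converting the column to a numeric dtype first avoids the error.

```python
import numpy as np
import pandas as pd

d = {'col1': [1.1, 2.1, "ERR", 0.1], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)

# One string in the column makes the whole column dtype object
print(df["col1"].dtype)  # object

raised = False
try:
    # df["col1"] * 1.1 is evaluated before np.where applies the mask,
    # so the 'ERR' row is multiplied as well
    np.where(df["col1"] == "ERR", df["col1"], df["col1"] * 1.1)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# With a numeric dtype ('ERR' coerced to NaN) the product succeeds
numeric = pd.to_numeric(df["col1"], errors="coerce")
print(numeric.dtype)  # float64
print(numeric * 1.1)
```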

Answer:

The error occurs because df["col1"]*1.1 tries to multiply 'ERR' by 1.1, which Python does not allow. You don't get the error with 1 because multiplying a string by an integer is valid in Python (string repetition): for example, 2*'ABC' gives 'ABCABC'. That is still not what you want, and it only appears to work because the factor is 1.
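The int/float asymmetry is plain Python semantics, which you can check directly. A short sketch; the last lines show what an integer factor other than 1 would silently do to the 'ERR' row of an object column:

```python
import pandas as pd

# Python defines str * int as repetition...
print(2 * 'ABC')    # ABCABC
print('ERR' * 1)    # ERR  (so *1 appears to "work")

# ...but str * float is undefined
try:
    'ERR' * 1.1
except TypeError as exc:
    print(exc)  # can't multiply sequence by non-int of type 'float'

# On an object column, a factor of 2 turns 'ERR' into 'ERRERR' without any error
col = pd.Series([1.1, 2.1, "ERR", 0.1])
print((col * 2).tolist())  # [2.2, 4.2, 'ERRERR', 0.2]
```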

If you want to keep 'ERR' in the column but multiply the other values, select only the non-'ERR' rows:

df.loc[df['col1'].ne('ERR'), 'col1'] *= 1.1

output:

   col1  col2
0  1.21     3
1  2.31     4
2   ERR     5
3  0.11     6

NB. For a more generic way to select the rows, if you have a list of values that should be skipped:

m = ~df['col1'].isin(['ERR', 'INVALID'])

or, to skip all non-numeric values:

m = pd.to_numeric(df['col1'], errors='coerce').notna()

then update:

df.loc[m, 'col1'] *= 1.1
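Putting the generic mask and the update together, here is a self-contained sketch. The extra 'INVALID' row is a hypothetical value added only to show that the `to_numeric` mask skips any non-numeric entry, not just 'ERR':

```python
import pandas as pd

# 'INVALID' is a hypothetical second placeholder, not from the question
d = {'col1': [1.1, 2.1, "ERR", 0.1, "INVALID"], 'col2': [3, 4, 5, 6, 7]}
df = pd.DataFrame(data=d)

# True for rows whose value can be parsed as a number
m = pd.to_numeric(df['col1'], errors='coerce').notna()

# Multiply only the numeric rows; the placeholder strings are left untouched
df.loc[m, 'col1'] *= 1.1
print(df)
```

One caveat of this sketch: a number stored as a string (for example '3.5') passes the `to_numeric` mask, but multiplying the string itself still raises TypeError, so such a column would need converting first.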