I am getting ValueError: invalid literal for int() with base 10 with np.where function

Time:03-29

I want to replace the 'not available' value in a df column with 0 and convert the rest of the values to integers.

Unique values in the column are:

['30', 'not available', '45', '60', '40', '90', '21', '5','75','29', '8', '10']

I run the following code to change values to integers:

df[col] = np.where(df[col] == 'not available',0,df[col].astype(int))

I expected the above to turn all values into integers, yet I get a ValueError:

ValueError: invalid literal for int() with base 10: 'not available'

Any suggestions as to why the code does not work?

CodePudding user response:

Try with to_numeric instead:

df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)

>>> df[col]
0     30.0
1      0.0
2     45.0
3     60.0
4     40.0
5     90.0
6     21.0
7      5.0
8     75.0
9     29.0
10     8.0
11    10.0
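
Note that the result above is float, because the intermediate NaN forces a float dtype. If integers are required, as the question asks, one possible sketch is to chain astype(int) after fillna. The column name "col1" below is an assumption used only for illustration:

```python
import pandas as pd

# Sample data taken from the question; "col1" is an assumed column name.
df = pd.DataFrame({"col1": ['30', 'not available', '45', '60', '40', '90',
                            '21', '5', '75', '29', '8', '10']})
col = "col1"

# Non-numeric strings become NaN, NaN becomes 0, then the floats are cast to int.
df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0).astype(int)
print(df[col].tolist())
```

The cast is safe here only because fillna(0) has already removed every NaN; astype(int) raises if any NaN is left in the column.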

Alternatively, to convert only "not available" to 0 and turn any other non-numeric strings into NaN:

df[col] = pd.to_numeric(df[col].replace("not available", 0), errors="coerce")
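
The difference between the two variants only shows up when the column contains some other non-numeric string. A small sketch, assuming a hypothetical extra value 'unknown' that is not in the original data:

```python
import pandas as pd

# 'unknown' is a hypothetical value added for illustration only.
s = pd.Series(['30', 'not available', 'unknown'], dtype=object)

# Variant 1: every non-numeric string ends up as 0.
all_to_zero = pd.to_numeric(s, errors="coerce").fillna(0)

# Variant 2: only 'not available' becomes 0; other bad strings stay NaN.
only_placeholder = pd.to_numeric(s.replace("not available", 0), errors="coerce")

print(all_to_zero.tolist())
print(only_placeholder.tolist())
```

The second variant may be preferable when unexpected strings should stay visible as missing values rather than being silently turned into 0.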

CodePudding user response:

Before NumPy can run

df[col] = np.where(df[col] == 'not available',0,df[col].astype(int))

Python must first evaluate all three arguments:

df[col] == 'not available'
0
df[col].astype(int)

The last one tries to convert every value in the column to int, and it fails because 'not available' cannot be parsed as an integer. You can avoid this problem by using pandas.Series.apply with a lambda containing a conditional expression, as follows:

import pandas as pd
df = pd.DataFrame({"col1":['30', 'not available', '45', '60', '40', '90', '21', '5','75','29', '8', '10']})
col = "col1"
df[col] = df[col].apply(lambda x:0 if x=='not available' else int(x))
print(df)

output

    col1
0     30
1      0
2     45
3     60
4     40
5     90
6     21
7      5
8     75
9     29
10     8
11    10

This way, int is applied only to values that are not equal to 'not available'.
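
To see the eager evaluation in isolation, here is a minimal sketch using a shortened version of the question's data. It also shows one vectorized alternative, which is an assumption on my part and not part of the answer above: replace the placeholder with the string '0' before casting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ['30', 'not available', '45']})
col = "col1"

# All three arguments are built before np.where is called, so
# astype(int) still sees 'not available' and raises.
try:
    np.where(df[col] == 'not available', 0, df[col].astype(int))
    raised = False
except ValueError:
    raised = True
print(raised)

# Swapping the placeholder for '0' first lets the whole column be cast.
fixed = df[col].replace('not available', '0').astype(int)
print(fixed.tolist())
```

Unlike the apply version, this stays vectorized, but it assumes 'not available' is the only non-numeric value in the column.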
