I want to change the 'not available' value in a df column to 0 and convert the rest of the values to integers.
Unique values in the column are:
['30', 'not available', '45', '60', '40', '90', '21', '5','75','29', '8', '10']
I run the following code to change values to integers:
df[col] = np.where(df[col] == 'not available',0,df[col].astype(int))
I expected the above to turn all values into integers, yet I get a ValueError:
ValueError: invalid literal for int() with base 10: 'not available'
Any suggestion why the code does not work?
CodePudding user response:
Try pd.to_numeric instead:
df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)
>>> df[col]
0 30.0
1 0.0
2 45.0
3 60.0
4 40.0
5 90.0
6 21.0
7 5.0
8 75.0
9 29.0
10 8.0
11 10.0
Alternatively, to convert only "not available" to 0 and turn any other non-numeric strings into NaN:
df[col] = pd.to_numeric(df[col].replace("not available", 0), errors="coerce")
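The first variant returns floats, as the output above shows, because the intermediate NaN forces a float dtype. If an integer column is needed, a cast can be appended after the fill. A minimal sketch, assuming the column holds only digit strings plus 'not available' (the column name col1 and the shortened sample data are just for illustration):

```python
import pandas as pd

# Illustrative sample; "col1" is an assumed column name
df = pd.DataFrame({"col1": ["30", "not available", "45", "8"]})
col = "col1"

# Unparseable strings become NaN, NaN becomes 0, then floats are cast to int
df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0).astype(int)
print(df[col].tolist())
```

The cast is only safe after fillna, since a column that still contains NaN cannot be converted to a plain integer dtype.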
CodePudding user response:
Before doing
df[col] = np.where(df[col] == 'not available',0,df[col].astype(int))
Python first has to evaluate each of its arguments:
df[col] == 'not available'
0
df[col].astype(int)
The last of these attempts to convert every value in the column to int and fails, because 'not available' cannot be parsed as an integer.
You can avoid this problem by using pandas.Series.apply with a lambda containing a conditional expression, as follows:
import pandas as pd
df = pd.DataFrame({"col1":['30', 'not available', '45', '60', '40', '90', '21', '5','75','29', '8', '10']})
col = "col1"
df[col] = df[col].apply(lambda x:0 if x=='not available' else int(x))
print(df)
Output:
col1
0 30
1 0
2 45
3 60
4 40
5 90
6 21
7 5
8 75
9 29
10 8
11 10
This way, int is applied only to records that are not equal to 'not available'.
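The eager evaluation described above can be checked directly, and the same fix can also be written without apply by replacing the placeholder before casting. A rough sketch, assuming a shortened version of the sample data (the names s, failed and result are only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(["30", "not available", "45"])

# np.where is an ordinary function call: every argument is evaluated first,
# so astype(int) runs on the whole column and raises before np.where starts.
try:
    np.where(s == "not available", 0, s.astype(int))
    failed = False
except ValueError:
    failed = True

# Vectorized alternative: swap the placeholder for "0", then cast everything
result = s.replace("not available", "0").astype(int)
print(failed, result.tolist())
```

Whether apply or replace reads better is a matter of taste; replace avoids a Python-level function call per row, which may matter on large columns.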