I have the following Pandas dataframe:
import numpy as np
import pandas as pd

foo = {
    'Sales': [200, 'bar', 400, 500],
    'Expenses': [70, 90, 'baz', 170],
    'Other': [2.5, 'spam', 70, 101.25],
}
df = pd.DataFrame(foo)
Sales  Expenses  Other
200    70        2.5
bar    90        spam
400    baz       70
500    170       101.25
I'd like to replace the non-numeric values with NaN. I do so as follows:
df['Other'] = df['Other'].replace(r'[^0-9\.]', np.nan, regex=True)
This gets me:
Sales  Expenses  Other
200    70        2.5
bar    90        NaN
400    baz       70
500    170       101.25
The decimals are not handled. I would expect [^0-9\.] to handle the decimal, but it doesn't. The following (without the escaped decimal) results in the same output:
df['Other'] = df['Other'].replace('[^0-9]', np.nan, regex=True)
Sales  Expenses  Other
200    70        2.5
bar    90        NaN
400    baz       70
500    170       101.25
How do I treat the decimals?
Thanks!
CodePudding user response:
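Regex replacement only applies to string values, so numeric cells such as 2.5 are skipped and both of your patterns give the same result. You can cast all values to strings first using .astype(str). Keep the dot inside the character class, otherwise decimals such as 2.5 match the pattern and are replaced with NaN as well:
df['Other'].astype(str).replace(r'[^0-9.]', np.nan, regex=True)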
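Note that after .astype(str) the surviving values are strings, not numbers. If the end goal is numeric columns, a regex may not be needed at all. Here is a minimal sketch, assuming the same dataframe as in the question, that uses pd.to_numeric with errors='coerce' to turn anything unparseable into NaN. The name cleaned is just an illustrative variable, not something from the question.

```python
import pandas as pd

foo = {
    'Sales': [200, 'bar', 400, 500],
    'Expenses': [70, 90, 'baz', 170],
    'Other': [2.5, 'spam', 70, 101.25],
}
df = pd.DataFrame(foo)

# Apply pd.to_numeric column by column; values that cannot be parsed
# as numbers (e.g. 'bar', 'baz', 'spam') become NaN, decimals are kept.
cleaned = df.apply(pd.to_numeric, errors='coerce')

print(cleaned)
print(cleaned.dtypes)
```

For a single column, df['Other'] = pd.to_numeric(df['Other'], errors='coerce') should do the same thing. Because NaN is a float, the affected columns end up with a float dtype.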