RegEx negation to handle decimal values in Pandas dataframe using .replace()

Time:03-05

I have the following Pandas dataframe:

import numpy as np
import pandas as pd

foo = {
    'Sales' : [200, 'bar', 400, 500],
    'Expenses' : [70, 90, 'baz', 170],
    'Other' : [2.5, 'spam', 70, 101.25]
}

df = pd.DataFrame(foo)

Sales   Expenses    Other
200     70          2.5
bar     90          spam
400     baz         70
500     170         101.25

I'd like to replace the non-numeric values with NaN. I do so as follows:

df['Other'] = df['Other'].replace(r'[^0-9\.]', np.nan, regex=True)

This gets me:

Sales   Expenses    Other
200     70          2.5
bar     90          NaN
400     baz         70
500     170         101.25

The decimals are not handled. I would expect [^0-9\.] to handle the decimal, but it doesn't. The following (without the escaped decimal) results in the same output:

df['Other'] = df['Other'].replace('[^0-9]', np.nan, regex=True)

Sales   Expenses    Other
200     70          2.5
bar     90          NaN
400     baz         70
500     170         101.25

How do I treat the decimals?

Thanks!

CodePudding user response:

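Regex replacement only applies to string values, so numeric cells such as 2.5 are never tested against the pattern. That is why both patterns give the same output. You can cast all values to strings first using .astype(str). Once the values are strings, the character class has to allow the decimal point, otherwise 2.5 and 101.25 would be replaced as well:

df['Other'].astype(str).replace('[^0-9.]', np.nan, regex=True)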
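
If the end goal is a numeric column rather than a column of cleaned strings, an alternative that avoids regex altogether is pd.to_numeric with errors='coerce'. This is a different technique from the .replace() approach above, shown here as a minimal sketch on the data from the question; the names other_numeric and df_numeric are only illustrative.

```python
import pandas as pd

foo = {
    'Sales': [200, 'bar', 400, 500],
    'Expenses': [70, 90, 'baz', 170],
    'Other': [2.5, 'spam', 70, 101.25],
}
df = pd.DataFrame(foo)

# errors='coerce' turns any value that cannot be parsed as a number into NaN
# and returns a float column, so decimals such as 2.5 are kept as numbers.
other_numeric = pd.to_numeric(df['Other'], errors='coerce')

# The same conversion applied to every column of the frame.
df_numeric = df.apply(pd.to_numeric, errors='coerce')
```

Unlike the string cast, this leaves the column with a float dtype, so no second conversion back to numbers is needed.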