How to remove strange encoding from pandas df

I have the following df:

import pandas as pd

df = pd.DataFrame({"name" : ["a", "b", "c"], "value" : ['1\xa0412', 4, 2]})

I would like to replace '1\xa0412' with 1. I tried this:

df['value'] = df['value'].str.replace(r'\\.*', '', regex=True)

But it does not work. How can I solve this, please?

CodePudding user response：

Try using the unidecode library to transliterate the non-ASCII characters to ASCII first, and then do the replacement. It worked for me on a similar problem.
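
For what it's worth, the regex in the question never matches because '\xa0' is a single character (a non-breaking space, U+00A0), not a backslash followed by text. Below is a minimal sketch of the "normalize first, then clean" idea. Note that it does not use unidecode: it swaps in pandas' built-in Unicode normalization (NFKC) as a stand-in, and it assumes the part you want is whatever comes before the first whitespace.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "value": ["1\xa0412", 4, 2]})

# '\xa0' is one character (U+00A0), so the string contains no backslash at all
assert "\\" not in df.loc[0, "value"]

# Cast to str first: .str methods would return NaN for the integer entries
as_text = df["value"].astype(str)

# NFKC normalization (used here instead of unidecode) maps the
# non-breaking space to an ordinary space
normalized = as_text.str.normalize("NFKC")

# Keep only the part before the first whitespace
df["value"] = normalized.str.split().str[0]
print(df)
```

The values are still strings after this step, so a numeric dtype needs a separate conversion.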

CodePudding user response：

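Try going through repr, which turns the non-breaking space into the literal text \xa0 so that the regex can match it: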

df.value = df.value.apply(repr).str.replace(r"(\\.*)|\'", r"", regex=True)

Result:

    name    value
0   a       1
1   b       4
2   c       2

But be careful: the value column is still of dtype object, and its entries are now strings. If you want another dtype, you have to convert the column.
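
One possible way to do the cleaning and the conversion in one go is sketched below. It assumes every entry starts with at least one digit and that only that leading run of digits should be kept; the variable name leading_digits is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "value": ["1\xa0412", 4, 2]})

# Cast everything to str, then keep only the leading run of digits
leading_digits = df["value"].astype(str).str.extract(r"^(\d+)", expand=False)

# Convert the cleaned strings to integers
df["value"] = leading_digits.astype(int)

print(df.dtypes)
```

If some entries might not start with a digit, extract returns NaN for them and astype(int) will raise, so pd.to_numeric with errors="coerce" may be the safer choice in that case.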