Maybe someone can help me. I would like to create a function that converts object (string) columns to float. I tried to find a solution, but I always get errors:
import pandas as pd

# sample dataframe
d = {'price':['−$13.79', '−$ 13234.79', '$ 132834.79', 'R$ 75.900,00', 'R$ 69.375,12', '- $ 2344.92']}
df = pd.DataFrame(data=d)
I tried this code; at first I just wanted to find a working solution.
df['price'] = (df.price.str.replace("−$", "-").str.replace(r'\w \$\s ', '').str.replace('.', '')\
.str.replace(',', '').astype(float)) / 100
The idea was to convert −$ to - (for negative values), and then replace $ with ''.
But as a result I get:
ValueError: could not convert string to float: '−$1379'
CodePudding user response:
You can extract the digits on one side, determine whether there is a leading minus on the other, and then combine the two:
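import numpy as np

# -1 if the value starts with a minus ("−" or "-"), else 1; the /100 turns cents into units
factor = np.where(df['price'].str.match(r'[−-]'), -1, 1)/100
# keep only the digits, then apply the sign and the scale
out = (pd.to_numeric(df['price'].str.replace(r'\D', '', regex=True), errors='coerce')
         .mul(factor)
      )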
output:
0 -13.79
1 -13234.79
2 132834.79
3 75900.00
4 69375.12
5 -2344.92
Name: price, dtype: float64
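Since the question asks for a reusable function, the approach above could be wrapped roughly like this. It is only a minimal sketch: it assumes every value ends in exactly two decimal digits (as in the sample data), and the names `prices_to_float` and `price_float` are made up for illustration.

```python
import numpy as np
import pandas as pd


def prices_to_float(prices: pd.Series) -> pd.Series:
    """Convert price strings such as 'R$ 75.900,00' to floats.

    Assumption: every value ends in exactly two decimal digits.
    """
    text = prices.astype(str).str.strip()
    # A leading Unicode minus (U+2212) or ASCII hyphen marks a negative value.
    sign = np.where(text.str.match("[\u2212-]"), -1, 1)
    # Drop every non-digit, so '13.79' and '75.900,00' both become cents.
    cents = pd.to_numeric(text.str.replace(r"\D", "", regex=True), errors="coerce")
    return cents * sign / 100


df = pd.DataFrame({"price": ["\u2212$13.79", "\u2212$ 13234.79", "$ 132834.79",
                             "R$ 75.900,00", "R$ 69.375,12", "- $ 2344.92"]})
df["price_float"] = prices_to_float(df["price"])
print(df)
```

Values that contain no digits at all come back as NaN because of `errors="coerce"`.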
CodePudding user response:
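Can you use re? Like this:
import re
# normalize the Unicode minus, then keep only digits and "-"; the value is then in cents
df['price'] = df['price'].apply(lambda s: float(re.sub(r'[^-0-9]', '', s.replace('−', '-'))) / 100)
I'm just removing, by regex, all the symbols that are not 0-9 or "-".
BTW, the division by 100 is only needed because the decimal separator is stripped as well, which leaves the amount in cents.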
CodePudding user response:
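# turn the Unicode minus into "-", then drop "R", "$", whitespace, dots and commas
cleaned = df["price"].str.replace("−", "-", regex=False).str.replace(r"[R$\s.,]", "", regex=True)
df["price2"] = pd.to_numeric(cleaned) / 100
df["price3"] = cleaned.astype(float) / 100
df
A few notes:
Outside a character class, an unescaped dot in a regex matches any character, not a literal ".".
The "−" symbol in some of your values is not the ASCII minus "-". It appears to be the Unicode minus sign (U+2212), which float() does not accept.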
I would use something like https://regex101.com for debugging.
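The notes above can also be checked quickly in plain Python. The sketch below uses only the standard library and assumes the odd sign in the question is the Unicode minus U+2212. The `$` check at the end is not one of the notes above, but it is a related pitfall: unescaped, `$` anchors the end of the string, which matters whenever `str.replace` treats its pattern as a regex.

```python
import re
import unicodedata

MINUS = "\u2212"              # assumed to be the sign used in '−$13.79'
sample = MINUS + "$13.79"

# The two signs look alike but are different code points.
names = (unicodedata.name(MINUS), unicodedata.name("-"))
print(names)

# float() only accepts the ASCII hyphen-minus as a sign.
try:
    float(MINUS + "1379")
    accepted = True
except ValueError:
    accepted = False
print(accepted)

# An unescaped dot matches any character; an escaped dot matches only '.'.
any_char = re.sub(r".", "", "75.900,00")       # removes everything
literal_dot = re.sub(r"\.", "", "75.900,00")   # removes only the dot
print(repr(any_char), repr(literal_dot))

# Unescaped '$' means "end of string", so MINUS + '$' finds nothing to replace here.
anchored = re.sub(MINUS + "$", "-", sample)
escaped = re.sub(MINUS + r"\$", "-", sample)
print(anchored, escaped)
```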