I have downloaded a dataset with my financial results from my stockbroker. One column contains the results for each trade:
import pandas as pd
df = pd.DataFrame({'Profit': ['-$315.00', '$605.00', '$680.00', '-$140.00 ']})
I want to change the type of this column from object to float64. To do that, I first remove the '$' sign and replace ',' with '.':
# regex=False treats '$' as a literal character rather than the end-of-string anchor
df['Profit'] = df['Profit'].str.replace('$', '', regex=False)
df['Profit'] = df['Profit'].str.replace(',', '.', regex=False)
Then, when I try to convert the column from object to float64, I get the error: "ValueError: could not convert string to float: '-1\xa0000.00 '"
Here is the code I use for the conversion:
df['Profit'] = df['Profit'].astype('float64')
I also tried rounding the DataFrame first, but the same error occurs. Here is the additional code I used:
df = df.round(decimals=0)
df['Profit'] = df['Profit'].astype('float64')
When I run these lines on synthetic data like the example above, they work, but they fail on my CSV data, which looks identical (I copied and pasted it).
How can I convert this object column to float64 without getting the error "ValueError: could not convert string to float: '-1\xa0000.00 '"?
CodePudding user response:
'\xa0' is a non-breaking space (used as a thousands separator here), which your replace calls do not remove.
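Because a non-breaking space looks like an ordinary space on screen, it can help to list the unexpected characters before cleaning. The following is only a minimal sketch, assuming the column holds strings (missing values are dropped first) and that digits, dots, and minus signs are the only characters you want to keep; the variable name `unexpected` is made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question; '\xa0' is the non-breaking space.
df = pd.DataFrame({'Profit': ['-$1\xa0000.00', '$605.00', '$680.00', '-$140.00 ']})

# Join all values and collect every distinct character that is not
# a digit, a dot, or a minus sign.
all_text = ''.join(df['Profit'].dropna().astype(str))
unexpected = sorted(set(all_text) - set('0123456789.-'))

# Printing the list shows each character's repr, so invisible
# characters such as '\xa0' appear as escape sequences.
print(unexpected)
```

Anything that shows up in this list has to be removed or replaced before `astype(float)` can succeed, apart from leading or trailing plain spaces, which the conversion tolerates.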
A safer approach might be to remove every character that is not a digit, a dot, or a minus sign:
df = pd.DataFrame({'Profit': ['-$1\xa0000.00', '$605.00', '$680.00', '-$140.00 ']})
# raw string avoids invalid-escape warnings for \d
df['Profit'].str.replace(r'[^\d.-]', '', regex=True).astype(float)
To remove only '$' and '\xa0':
df['Profit'].str.replace('[$\xa0]', '', regex=True).astype(float)
Output:
0   -1000.0
1     605.0
2     680.0
3    -140.0
Name: Profit, dtype: float64
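If the real CSV may contain other surprises, one possible variation is to convert with `pd.to_numeric(..., errors='coerce')` and then inspect the rows that failed, instead of letting `astype` raise on the first bad value. This is a sketch under assumptions: the `'n/a'` entry and the `Profit_num` column name are invented here for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical data: the 'n/a' row stands in for any value that cannot be parsed.
df = pd.DataFrame({'Profit': ['-$1\xa0000.00', '$605.00', 'n/a', '-$140.00 ']})

# Keep only digits, dots, and minus signs.
cleaned = df['Profit'].str.replace(r'[^\d.-]', '', regex=True)

# errors='coerce' turns strings that still cannot be parsed into NaN
# instead of raising ValueError.
df['Profit_num'] = pd.to_numeric(cleaned, errors='coerce')

# Rows that failed to convert, shown with their original text for inspection.
bad_rows = df[df['Profit_num'].isna()]
print(bad_rows)
```

Keeping the original column next to the converted one makes it easy to see which raw strings still need handling.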