I have a pandas DataFrame that I've read from a file with pd.read_csv(), and I'm having trouble converting a column of string values to float.
Firstly, I'm not entirely sure why pandas is even reading the column as strings to begin with, since all the values are numeric. The problem seems to be with the hyphen-minus sign on the negative numbers. There are other threads on this topic that mention how an em-dash can mess things up (here, for example).
However, when I try converting the hyphen type, it still gives me an error. For example,
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").astype(float)
doesn't change anything; all the values already start with the plain '-' hyphen, so it's not actually replacing anything. It still gives me the error:
ValueError: could not convert string to float: '-'
I've tried replacing all of the hyphens with a numeric value to see if that would work, and then I'm able to convert to float (example: df['Verified_m'] = df['Verified_m'].str.replace("-", "0").astype(float)). But I'd like to retain the negative values in the dataset. Does anyone know what's wrong with my hyphens?
CodePudding user response:
Try this:
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").str.replace(r'^-$', '0', regex=True).astype(float)
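After converting any en dashes (U+2013) to hyphens, it converts a lone - to zero. The error message shows that the failing value is the string '-' on its own, with no digits, so at least one cell holds only a hyphen (probably a placeholder for a missing value). That is also why pandas read the column as strings, and why the negative numbers themselves are fine.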
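If a lone - actually means "no value" rather than zero, it may be better to turn it into NaN. Here is a minimal sketch comparing the two options; the sample values are made up for illustration, and only the column name Verified_m comes from the question:

```python
import pandas as pd

# Hypothetical sample data: an en dash negative, a hyphen negative,
# a lone "-" placeholder, and a positive number.
df = pd.DataFrame({"Verified_m": ["\u20131.5", "-2.0", "-", "3.25"]})

# Normalize en dashes (U+2013) to plain ASCII hyphens.
cleaned = df["Verified_m"].str.replace("\u2013", "-", regex=False)

# Option 1 (same idea as the one-liner above): a lone "-" becomes 0.
as_zero = cleaned.str.replace(r"^-$", "0", regex=True).astype(float)

# Option 2: anything that is not a valid number, including a lone "-",
# becomes NaN instead of raising ValueError.
as_nan = pd.to_numeric(cleaned, errors="coerce")

print(as_zero.tolist())
print(as_nan.tolist())
```

Negative values survive in both cases. If the lone hyphens are always missing-value markers, passing na_values="-" to pd.read_csv() should also let pandas parse the column as float directly, assuming nothing else in the column is non-numeric.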