I have a DataFrame with lat/long values whose types should be float. However, some rows contain values like -74.128815°, with a '°' character at the end of the string.
id | Lat | Long |
---|---|---|
1 | 4.807 | -75.684 |
2 | 4.5405 | -75.6658 |
3 | -74.128815° | |
4 | 5.35002 | -72.4002 |
5 | 4.6774° | -75.693 |
I want to keep all float values as they are, but strip the '°' from the values that contain it (and then convert them to float), so that in the end I have this:
id | Lat | Long |
---|---|---|
1 | 4.807 | -75.684 |
2 | 4.5405 | -75.6658 |
3 | -74.128815 | |
4 | 5.35002 | -72.4002 |
5 | 4.6774 | -75.693 |
The DataFrame is named df. I have tried
df[df['Lat'].str.contains('°')]
which raises ValueError: Cannot mask with non-boolean array containing NA / NaN values.
I have also tried df['Lat'] = np.where(df['Lat'].str.contains('°'), df['Lat'][:-1], df['Lat'])
which raises ValueError: operands could not be broadcast together
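For what it's worth, both errors come from the mixed types in the column rather than from the '°' itself: .str.contains returns NaN for the rows that are already floats, so the result cannot be used as a boolean mask unless na=False is passed, and df['Lat'][:-1] drops the last row of the Series rather than the last character of each string (.str[:-1] does the latter). A rough sketch reproducing this, with the sample frame reconstructed from the tables above and the blank cell assumed to be NaN:
import numpy as np
import pandas as pd

# reconstructed sample data (assumption: the blank cell is NaN)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Lat': [4.807, 4.5405, '-74.128815°', 5.35002, '4.6774°'],
    'Long': [-75.684, -75.6658, np.nan, -72.4002, -75.693],
})

# na=False turns the NaN produced for the float rows into False,
# so the result is a valid boolean mask
mask = df['Lat'].str.contains('°', na=False)

# .str[:-1] drops the last character of each string (NaN for float rows),
# unlike df['Lat'][:-1], which drops the last row of the Series
df['Lat'] = pd.to_numeric(np.where(mask, df['Lat'].str[:-1], df['Lat']))
The answers below reach the same result more directly.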
CodePudding user response:
You can replace the invalid character with an empty string and then just use pd.to_numeric:
import pandas as pd

degree_sign = u'\N{DEGREE SIGN}'  # or degree_sign = "°"
# strip the degree sign from both columns, then coerce them to numeric
df[['Lat', 'Long']].replace(degree_sign, '', regex=True)\
    .apply(pd.to_numeric, errors='coerce')
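Note that the expression above only returns the cleaned columns; if the result should be written back into df, a small usage sketch (assuming the columns are named Lat and Long as in the question):
df[['Lat', 'Long']] = df[['Lat', 'Long']].replace(degree_sign, '', regex=True)\
    .apply(pd.to_numeric, errors='coerce')
print(df.dtypes)  # Lat and Long should now be float64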
CodePudding user response:
Using regex, replace the trailing non-digit character(s) ('\D+$') in the columns, then convert to float:
df[['Lat', 'Long']].replace(r'\D+$', '', regex=True).astype(float)
       Lat        Long
0  4.80700  -75.684000
1  4.54050  -75.665800
2      NaN  -74.128815
3  5.35002  -72.400200
4  4.67740  -75.693000
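A practical difference between the two answers: astype(float) raises if any cell still contains text that the replacement did not remove, while pd.to_numeric(errors='coerce') turns such cells into NaN. A tiny illustration (the leftover value here is made up):
import pandas as pd

s = pd.Series(['4.807', 'unknown'])   # 'unknown' is a hypothetical leftover non-numeric value
# s.astype(float)                     # raises ValueError: could not convert string to float
pd.to_numeric(s, errors='coerce')     # gives [4.807, NaN] instead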