Replace some string values in a pandas dataframe column whose values should be float

Time:06-10

I have a dataframe with lat/long values, whose types should be float. However, some rows contain values like -74.128815°, with a '°' character at the end of the string.

id  Lat      Long
1   4.807    -75.684
2   4.5405   -75.6658
3            -74.128815°
4   5.35002  -72.4002
5   4.6774°  -75.693

I want to keep all float values as they are but strip the '°' from the values that contain it (and then convert them to float), so that in the end I have this:

id  Lat      Long
1   4.807    -75.684
2   4.5405   -75.6658
3            -74.128815
4   5.35002  -72.4002
5   4.6774   -75.693

The DataFrame is named df. I have tried df[df['Lat'].str.contains('°')], which raises ValueError: Cannot mask with non-boolean array containing NA / NaN values

I've also tried df['Lat'] = np.where(df['Lat'].str.contains('°'), df['Lat'][:-1], df['Lat']), which raises ValueError: operands could not be broadcast together

CodePudding user response:

You can replace the invalid character with an empty string, and then just use pd.to_numeric:

degree_sign = u'\N{DEGREE SIGN}' # or degree_sign = "°"
df[['Lat', 'Long']].replace(degree_sign, '', regex=True)\
                   .apply(pd.to_numeric, errors='coerce')
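
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The sample frame is rebuilt from the question, and the empty Lat in row 3 is assumed to be NaN; the names `cols` and `cleaned` are just illustrative. Note that `replace` returns a new frame, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the empty Lat in row 3 is assumed to be NaN.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Lat": [4.807, 4.5405, np.nan, 5.35002, "4.6774°"],
    "Long": [-75.684, -75.6658, "-74.128815°", -72.4002, -75.693],
})

degree_sign = "\N{DEGREE SIGN}"
cols = ["Lat", "Long"]

# Strip the degree sign from string entries, then parse everything as numbers.
# errors="coerce" turns anything that still cannot be parsed into NaN.
cleaned = df[cols].replace(degree_sign, "", regex=True).apply(pd.to_numeric, errors="coerce")

# replace() does not modify df in place, so write the cleaned columns back.
df[cols] = cleaned
print(df)
```

One thing to keep in mind: with errors='coerce', any other malformed value silently becomes NaN, so it may be worth checking how many NaNs appear before and after.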

CodePudding user response:

Using a regex, replace the trailing non-digit character(s) ('\D+') in the columns, then convert to float:

df[['Lat', 'Long']].replace('\\D+$', '', regex = True).astype(float)

       Lat       Long
0  4.80700 -75.684000
1  4.54050 -75.665800
2      NaN -74.128815
3  5.35002 -72.400200
4  4.67740 -75.693000
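
As for the two errors in the question, they most likely come from the column mixing floats and strings: `.str.contains` returns NaN for the non-string entries, so the mask is not purely boolean, and `df['Lat'][:-1]` drops the last row rather than the last character, so the shapes no longer match. A rough sketch of the original attempt with both issues addressed, assuming the sample data from the question (the helper name `strip_trailing_degree` is made up for illustration):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the empty Lat in row 3 is assumed to be NaN.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Lat": [4.807, 4.5405, np.nan, 5.35002, "4.6774°"],
    "Long": [-75.684, -75.6658, "-74.128815°", -72.4002, -75.693],
})

def strip_trailing_degree(col):
    """Hypothetical helper: drop a trailing '°' from string entries, then cast to float."""
    # na=False turns the NaN results for non-string entries into False,
    # which avoids the "Cannot mask with non-boolean array" error.
    has_degree = col.str.contains("°", na=False)
    # .str[:-1] slices each string; col[:-1] would drop the last row instead,
    # which is why np.where could not broadcast the shapes.
    stripped = col.where(~has_degree, col.str[:-1])
    return stripped.astype(float)

for name in ["Lat", "Long"]:
    df[name] = strip_trailing_degree(df[name])

print(df.dtypes)
```

This only removes a single trailing character from entries that contain '°', so it assumes the sign always sits at the very end of the string, as in the question.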