Convert a column's values to numeric with European delimiters

Consider this sample data frame df_1:

index    value_1
1        -3.570,00
2         552,76
3        -1,01
4        -100.234,01

where the values are signed numbers, stored as strings, that use European delimiters/separators:

comma (,) as the decimal separator
dot/point/period (.) as the thousands separator

I want to convert the values of this column to float. If I try the instruction from here

# tag 1
df_1['value_1'] = df_1['value_1'].apply(pd.to_numeric)

I get the error message

ValueError: Unable to parse string "<...>" at position <...>

I could use the instruction from here

# tag 2
df_1['value_1'] = df_1['value_1'].apply(lambda x: x.replace('.',''))
df_1['value_1'] = df_1['value_1'].apply(lambda x: x.replace(',','.'))

prior to # tag 1. However, I then get this warning:

C:\Users\userName\AppData\Local\Temp\ipykernel_11992\3059588848.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
...

Even though this workaround does the job, is there a more canonical way to achieve this without getting any warnings?
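A SettingWithCopyWarning is usually triggered not by the replacement itself but by df_1 being a slice of another DataFrame (for example, the result of a boolean filter). A minimal sketch, assuming that is the case here: the source frame df_raw and the "n/a" filter are hypothetical. It takes an explicit .copy() of the slice and performs the two replacements with vectorized string methods instead of apply. regex=False keeps '.' from being treated as a regex wildcard.

```python
import pandas as pd

# Hypothetical source frame; df_1 is derived from it by filtering,
# which is a common way to end up with a SettingWithCopyWarning.
df_raw = pd.DataFrame(
    {"value_1": ["-3.570,00", "552,76", "-1,01", "-100.234,01", "n/a"]}
)

# .copy() makes df_1 an independent DataFrame rather than a possible
# view of df_raw, so assigning to its columns later does not warn.
df_1 = df_raw[df_raw["value_1"] != "n/a"].copy()

# Vectorized equivalent of the two apply() calls: drop the thousands
# separator, then turn the decimal comma into a decimal point.
cleaned = (
    df_1["value_1"]
    .str.replace(".", "", regex=False)
    .str.replace(",", ".", regex=False)
)
df_1["value_1"] = pd.to_numeric(cleaned)
print(df_1)
```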

Answer:

Check out the locale module from the Python standard library (see its documentation).

Example:

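import locale

# Pick a locale that uses ',' for decimals and '.' for thousands. Locale names are
# platform-specific: 'eu' is used here; on Linux/macOS e.g. 'de_DE.UTF-8' may be needed.
locale.setlocale(locale.LC_NUMERIC, 'eu')

# locale.atof strips the locale's thousands separator and parses its decimal point
df_1['value_1'] = df_1['value_1'].apply(locale.atof)
print(df_1)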

Output:

     value_1
0   -3570.00
1     552.76
2      -1.01
3 -100234.01
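locale.setlocale changes the locale for the whole process, which can affect other code that formats or parses numbers. A minimal sketch of a variant that switches LC_NUMERIC only for the conversion and then restores the previous setting. The helper name to_float_with_locale and the list of candidate locale names are assumptions; which locales are installed depends on the system, and the chosen locale is assumed to define '.' as its thousands separator.

```python
import locale

import pandas as pd

# Hypothetical candidates for a locale with ',' as decimal point and '.'
# as thousands separator; which of these exist depends on the platform.
CANDIDATES = ["de_DE.UTF-8", "de_DE.utf8", "German_Germany.1252", "eu"]


def to_float_with_locale(series, candidates=CANDIDATES):
    """Parse locale-formatted strings with locale.atof, then restore LC_NUMERIC."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # no locale argument: just query
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            break
        except locale.Error:
            continue  # not installed here, try the next name
    else:
        raise locale.Error("no suitable LC_NUMERIC locale is installed")
    try:
        return series.map(locale.atof)
    finally:
        # setlocale is process-wide, so put the previous setting back
        locale.setlocale(locale.LC_NUMERIC, previous)


s = pd.Series(["-3.570,00", "552,76", "-1,01", "-100.234,01"])
try:
    print(to_float_with_locale(s))
except locale.Error as exc:
    print("skipped:", exc)
```

The try/finally is the design point: even if atof fails on a malformed value, the process-wide locale is restored.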

Answer:

If you're reading the data from a CSV file, you can use the decimal and thousands parameters of pd.read_csv:

df = pd.read_csv(..., decimal=',', thousands='.')

From the documentation:

thousands : str, optional
    Thousands separator.

decimal : str, default '.'
    Character to recognize as decimal point (e.g. use ',' for European data).
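A self-contained sketch of this approach, assuming a semicolon-separated file (a common layout for European CSVs, since the comma already serves as the decimal mark); the inline data is a stand-in for a real file and mirrors the sample from the question.

```python
import io

import pandas as pd

# Stand-in for a semicolon-separated file with European number formatting;
# the separator, the column layout and the inline data are assumptions.
csv_text = """index;value_1
1;-3.570,00
2;552,76
3;-1,01
4;-100.234,01
"""

df_1 = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",          # field separator, must differ from decimal=','
    decimal=",",      # ',' marks the decimal point
    thousands=".",    # '.' groups thousands and is dropped while parsing
    index_col="index",
)
print(df_1.dtypes)
print(df_1)
```

Because the conversion happens during parsing, no string replacement or column assignment is needed afterwards.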

(Due credit to atomh33ls for posting almost exactly this on another question.)