Consider this sample data frame df_1:
index value_1
1 -3.570,00
2 552,76
3 -1,01
4 -100.234,01
where the float values are signed and European delimiters/separators have been used:
- comma , for decimal
- dot/point/period . for thousands
I want to convert the values of this column to float. If I try the instruction from here
# tag 1
df_1['value_1'] = df_1['value_1'].apply(pd.to_numeric)
I get the error message
ValueError: Unable to parse string "<...>" at position <...>
I could use the instruction from here
# tag 2
df_1['value_1'] = df_1['value_1'].apply(lambda x: x.replace('.',''))
df_1['value_1'] = df_1['value_1'].apply(lambda x: x.replace(',','.'))
prior to # tag 1; however, I get the message:
C:\Users\userName\AppData\Local\Temp\ipykernel_11992\3059588848.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead ...
Even though this workaround does the job, I wonder whether there is a more canonical way to achieve this without triggering any warnings.
CodePudding user response:
Check out the locale module and its documentation.
Example:
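import locale
# Locale names are platform-dependent; 'eu' is the name used in this answer.
# On other systems a different name (for example 'de_DE.UTF-8', if installed) may be needed.
locale.setlocale(locale.LC_NUMERIC, 'eu')
# locale.atof parses each string according to the active LC_NUMERIC settings
df['value_1'] = df['value_1'].apply(locale.atof)
print(df)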
Output:
value_1
0 -3570.00
1 552.76
2 -1.01
3 -100234.01
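If changing the process-wide locale is not an option, the replacements from the question can also be written with vectorized string methods. The sketch below is only an illustration under assumptions: the question does not show how df_1 was created, so it assumes df_1 was filtered out of a larger frame (here a made-up frame called source with a made-up group column), which is the usual trigger for SettingWithCopyWarning. Taking an explicit .copy() makes df_1 an independent frame, so the later assignment should no longer warn.

```python
import pandas as pd

# Hypothetical parent frame; the names "source" and "group" are illustrative only.
source = pd.DataFrame(
    {
        "group": ["a", "a", "a", "a", "b"],
        "value_1": ["-3.570,00", "552,76", "-1,01", "-100.234,01", "7,50"],
    }
)

# Assumption: df_1 is a filtered subset. The explicit copy detaches it from "source".
df_1 = source[source["group"] == "a"].copy()

# Drop the thousands separator, then turn the decimal comma into a dot.
# regex=False treats "." as a literal character instead of a regex wildcard.
cleaned = (
    df_1["value_1"]
    .str.replace(".", "", regex=False)
    .str.replace(",", ".", regex=False)
)

# Convert the cleaned strings to floats and write them back into the copy.
df_1["value_1"] = pd.to_numeric(cleaned)
print(df_1)
```

The order of the two replacements matters: removing the dots first avoids deleting the decimal point that the second replacement introduces.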
CodePudding user response:
If you're reading from CSV, you can use the decimal and thousands parameters:
df = pd.read_csv(..., decimal=',', thousands='.')
From the documentation:
thousands : str, optional
Thousands separator.
decimal : str, default ‘.’
Character to recognize as decimal point (e.g. use ‘,’ for European data).
Due credit to atomh33ls for posting almost exactly this on another question.
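To make the read_csv call above concrete, here is a small self-contained sketch. The file contents, the semicolon separator, and the use of io.StringIO in place of a real file are assumptions for illustration; a field separator other than a comma is needed because the comma is already the decimal mark.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file; ";" separates the fields.
csv_text = (
    "index;value_1\n"
    "1;-3.570,00\n"
    "2;552,76\n"
    "3;-1,01\n"
    "4;-100.234,01\n"
)

# decimal="," and thousands="." let the parser read the numbers as floats directly.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    index_col="index",
    decimal=",",
    thousands=".",
)
print(df.dtypes)
print(df)
```

Because the column is parsed as float on load, no later reassignment is needed.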