I have a dataset df
with two columns ID
and Value
. Both are of Dtype "object". However, I would like to convert the column Value
to Dtype "double" with a dot as decimal separator. The problem is that the values of this column contain noise due to the presence of too many commas (e.g. 0,1,,) - or after replacement too many dots (e.g. 0.1..). As a result, when I try to convert the Dtype to double, I get the error message: could not convert string to float: '0.2.'
Example code:
#required packages
import pandas as pd
import numpy as np
# initialize list of lists
data = [[1, '0,1'], [2, '0,2,'], [3, '0,01,,']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['ID', 'Value'])
#replace comma with dot as separator
df = df.replace(',', '.', regex=True)
#examine dtype per column
df.info()
#convert dtype from object to double
df = df.astype({'Value': np.double}) #this is where the error message appears
The preferred outcome is to have the values within the column Value
as 0.1
, 0.2
and 0.01
respectively.
How can I get rid of the redundant commas or, after replacement, dots in the values of the column Values
?
CodePudding user response:
One option: use string functions to convert and strip the values. For example:
#required packages
import pandas as pd
import numpy as np
# initialize list of lists
data = [[1, '0,1'], [2, '0,2,'], [3, '0,01,,']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['ID', 'Value'])
#replace comma with dot as separator
df['Value'] = df['Value'].str.replace(',', '.', 1).str.rstrip(',')
#examine dtype per column
df.info()
#convert dtype from object to double
df = df.astype({'Value': np.double})
print("------ df:")
print(df)
prints:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 3 non-null int64
1 Value 3 non-null object
dtypes: int64(1), object(1)
memory usage: 176.0 bytes
----- df:
ID Value
0 1 0.10
1 2 0.20
2 3 0.01