I have a dataframe where every column has numeric values like 5,12; 3,14; 12,01...
in object dtype.
I want to iterate through the table to convert the dtype to float.
Therefore, I made a list of all column names to replace the ',' with '.' of every value and then convert it into the right type.
My code looks like this:
for x in columnList:
x.replace(',' , '.')
x.astype(float)
Data:
Timestamp Ins_W/m2 GenPowerW1 GenPowerW2 GenPowerW3
2020-01-01 5,12 3,14 12,1
2020-01-02 6,84 16,4 12,1
.
.
.
Unfortunately, I always get an AttributeError. I hope someone can give me a hint on how to fix it.
CodePudding user response:
You need to iterate over each of the columns, converting each column to strings (with Series.str
) to allow replacement and then converting those values to floats. To convert empty cells to NaN
we first replace them with the string 'NaN'
:
df = pd.DataFrame({
'Timestamp': ['2020-01-01', '2020-01-02'],
'Ins_W/m2': ['5,12', '6,84'],
'GenPowerW1': ['3,14', ''],
'GenPowerW2': ['12,1', '16,4'],
'GenPowerW3': ['', '12,1']
})
df
# Timestamp Ins_W/m2 GenPowerW1 GenPowerW2 GenPowerW3
# 0 2020-01-01 5,12 3,14 12,1
# 1 2020-01-02 6,84 16,4 12,1
columnList = ['Ins_W/m2', 'GenPowerW1', 'GenPowerW2', 'GenPowerW3']
for col in columnList :
df[col] = df[col].str.replace(',', '.').replace('', 'NaN').astype(float)
df
# Timestamp Ins_W/m2 GenPowerW1 GenPowerW2 GenPowerW3
# 0 2020-01-01 5.12 3.14 12.1 NaN
# 1 2020-01-02 6.84 NaN 16.4 12.1
df['GenPowerW1']
# 0 3.14
# 1 NaN
# Name: GenPowerW1, dtype: float64
CodePudding user response:
Perhaps you could try something like this:
df = pd.DataFrame([["1,12", "3,14", ""], ["12,1", "234,1", "21,1"]], columns=["INS", "GEN_POWER1", "GEN_POWER2"])
df[["INS", "GEN_POWER1", "GEN_POWER2"]] = df[["INS", "GEN_POWER1", "GEN_POWER2"]].apply(lambda x: x.str.replace(",", "."))
df[["INS", "GEN_POWER1", "GEN_POWER2"]] = df[["INS", "GEN_POWER1", "GEN_POWER2"]].apply(pd.to_numeric, errors='coerce')
This applies the lambda function to each column that you need, and then converts each of those columns to a float using pd.to_numeric. The output looks like this:
INS GEN_POWER1 GEN_POWER2
0 1.12 3.14 NaN
1 12.10 234.10 21.1