Home > database >  How to iterate through a dataframe to format values?
How to iterate through a dataframe to format values?

Time:09-02

I have a dataframe where every column has numeric values like 5,12; 3,14; 12,01... in object dtype. I want to iterate through the table to convert the dtype to float. Therefore, I made a list of all column names to replace the ',' with '.' of every value and then convert it into the right type.

My code looks like this:

for x in columnList:
     x.replace(',' , '.') 
     x.astype(float)

Data:

Timestamp    Ins_W/m2  GenPowerW1 GenPowerW2 GenPowerW3
2020-01-01      5,12     3,14         12,1
2020-01-02      6,84                  16,4       12,1
.
.
.

Unfortunately, I always get an AttributeError. I hope someone can give me a hint on how to fix it.

CodePudding user response:

You need to iterate over each of the columns, converting each column to strings (with Series.str) to allow replacement and then converting those values to floats. To convert empty cells to NaN we first replace them with the string 'NaN':

df = pd.DataFrame({
  'Timestamp': ['2020-01-01', '2020-01-02'],
  'Ins_W/m2': ['5,12', '6,84'],
  'GenPowerW1': ['3,14', ''],
  'GenPowerW2': ['12,1', '16,4'],
  'GenPowerW3': ['', '12,1']
})
df
#      Timestamp Ins_W/m2 GenPowerW1 GenPowerW2 GenPowerW3
#  0  2020-01-01     5,12       3,14       12,1
#  1  2020-01-02     6,84                  16,4       12,1
columnList = ['Ins_W/m2', 'GenPowerW1', 'GenPowerW2', 'GenPowerW3']
for col in columnList :
    df[col] = df[col].str.replace(',', '.').replace('', 'NaN').astype(float)
df
#      Timestamp  Ins_W/m2  GenPowerW1  GenPowerW2  GenPowerW3
#  0  2020-01-01      5.12        3.14        12.1         NaN
#  1  2020-01-02      6.84         NaN        16.4        12.1
df['GenPowerW1']
#  0    3.14
#  1     NaN
#  Name: GenPowerW1, dtype: float64

CodePudding user response:

Perhaps you could try something like this:

df = pd.DataFrame([["1,12", "3,14", ""], ["12,1", "234,1", "21,1"]], columns=["INS", "GEN_POWER1", "GEN_POWER2"])

df[["INS", "GEN_POWER1", "GEN_POWER2"]] = df[["INS", "GEN_POWER1", "GEN_POWER2"]].apply(lambda x: x.str.replace(",", "."))

df[["INS", "GEN_POWER1", "GEN_POWER2"]] = df[["INS", "GEN_POWER1", "GEN_POWER2"]].apply(pd.to_numeric, errors='coerce')

This applies the lambda function to each column that you need, and then converts each of those columns to a float using pd.to_numeric. The output looks like this:

     INS  GEN_POWER1  GEN_POWER2
0   1.12        3.14         NaN
1  12.10      234.10        21.1
  • Related