For loop on column in dataframe

Time:03-02

I am trying to calculate an equation per row in a dataframe and assign the value to a new column:

def exercise_02():
    df_region = df.groupby(by = "region").sum()
    for i in range(len(df_region)):
        i == 0
        df_region["w_avg"] = df1["2018_x"][i] * df1["2018_y"][i] / df1["2018_y"][i]
        i = i + 1
    result = df_region
    return result

When I run only this, I get the following output: [screenshot not available]

As you can see, the column w_avg has been created, but it contains the same value in every row.

I tried to fix this by adding [i] after the column name inside the loop:

def exercise_02():
    df_region = df.groupby(by = "region").sum()
    for i in range(len(df_region)):
        i == 0
        df_region["w_avg"][i] = df1["2018_x"][i] * df1["2018_y"][i] / df1["2018_y"][i]
        i = i + 1
    result = df_region
    return result

But instead, I get this error message:

 if tolerance is not None:

KeyError: 'w_avg'

Do you have any idea what I'm doing wrong? Thank you!
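Both symptoms described above can be reproduced in isolation. The sketch below uses a small hypothetical df_region (made-up regions and numbers, since the real data is not shown). It shows that assigning a scalar to a column fills every row, that df_region["w_avg"][i] = ... raises KeyError because it first reads a column that does not exist yet, and how a loop could write single cells with .loc if a loop is really wanted.

```python
import pandas as pd

# Hypothetical stand-in for the grouped frame; the question's real data is not shown.
df_region = pd.DataFrame(
    {"2018_x": [10.0, 20.0, 30.0], "2018_y": [1.0, 2.0, 3.0]},
    index=pd.Index(["east", "north", "west"], name="region"),
)

# Symptom 1: assigning a scalar to a column broadcasts it to every row, so
# each loop iteration overwrites the whole column and only the value from
# the last iteration survives.
broadcast = df_region.copy()
for i in range(len(broadcast)):
    broadcast["w_avg"] = broadcast["2018_x"].iloc[i]
print(broadcast["w_avg"].tolist())  # [30.0, 30.0, 30.0]

# Symptom 2: df_region["w_avg"][i] = ... first *reads* the column "w_avg".
# The column does not exist yet, so that read raises KeyError.
raised_key_error = False
try:
    df_region["w_avg"][0] = 1.0
except KeyError:
    raised_key_error = True
print(raised_key_error)  # True

# If a loop is really wanted: create the column first, then write single
# cells with .loc[row_label, column] instead of chained indexing.
looped = df_region.copy()
looped["w_avg"] = float("nan")
for label in looped.index:
    x = looped.loc[label, "2018_x"]
    y = looped.loc[label, "2018_y"]
    looped.loc[label, "w_avg"] = x * y / y
print(looped["w_avg"].tolist())  # [10.0, 20.0, 30.0]
```

The .loc loop works, but it is slower and more verbose than a single column-wise assignment.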

Answer:

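The great thing about DataFrames is that you do not need to loop. Operations on whole columns are vectorised: arithmetic between two columns is applied element-wise, row by row. If you change your (homework?) function to the following, each row of df_region gets its own w_avg value: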

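def exercise_02():
    # Sum every numeric column per region; "region" becomes the index
    df_region = df.groupby(by="region").sum()
    # Column arithmetic is element-wise, so each row gets its own value
    df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]
    return df_region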
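A minimal runnable sketch of the vectorised version, assuming a hypothetical df with the columns region, 2018_x and 2018_y (the numbers are made up). The last part assumes, without confirmation from the question, that w_avg is meant as an average of 2018_x weighted by 2018_y; in that case the products are formed per original row before summing.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west", "north"],
        "2018_x": [1.0, 3.0, 2.0, 4.0, 5.0],
        "2018_y": [10.0, 30.0, 20.0, 20.0, 50.0],
    }
)

# Sum each numeric column within every region; "region" becomes the
# (sorted) index: east, north, west.
df_region = df.groupby(by="region").sum()

# One vectorised statement computes a separate value for every row.
df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]
print(df_region)

# Assumption: if w_avg is meant as an average of 2018_x weighted by 2018_y,
# multiply per original row first, then divide the summed products by the summed weights.
weighted = (df["2018_x"] * df["2018_y"]).groupby(df["region"]).sum() / df.groupby("region")["2018_y"].sum()
print(weighted)
```

Note that with the formula as written, w_avg simply equals 2018_x wherever 2018_y is nonzero, because multiplying and dividing by 2018_y cancel out.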

However, I notice that you are mixing several dataframes. If they do not have the same length or the same index, you will run into length or index-alignment problems. That really depends on the relationship between df_region, df, and df1.
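To illustrate the index problem, here is a small sketch with two hypothetical frames: df_region indexed by region labels (as after a groupby) and df1 with a default integer index. Assigning a column from another frame aligns on index labels rather than on row position, which can silently produce NaN; assigning a plain array goes by position and requires equal lengths.

```python
import pandas as pd

# Hypothetical frames: df_region is indexed by region labels (as after a
# groupby), while df1 keeps the default integer index 0..n-1.
df_region = pd.DataFrame(
    {"2018_x": [4.0, 5.0, 6.0]},
    index=pd.Index(["east", "north", "west"], name="region"),
)
df1 = pd.DataFrame({"2018_x": [7.0, 8.0, 9.0]})

# Assigning a Series aligns on index labels, not on position. None of
# df1's labels (0, 1, 2) match "east"/"north"/"west", so every row is NaN.
df_region["from_df1"] = df1["2018_x"]

# .to_numpy() drops the index, so values are placed by position. This is
# only meaningful if both frames have the same length and row order.
df_region["by_position"] = df1["2018_x"].to_numpy()

# With a different length, positional assignment fails outright.
df2 = pd.DataFrame({"2018_x": [1.0, 2.0, 3.0, 4.0]})
length_error = ""
try:
    df_region["too_long"] = df2["2018_x"].to_numpy()
except ValueError as exc:
    length_error = str(exc)
print(length_error)
```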

I mention this because you (perhaps accidentally) used variables from outside the scope of your function: it takes no parameters, yet it reads df and df1. So something is missing from your question that you either need to work out yourself or share with the community before the question can be fully answered.
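One way to avoid relying on globals is to pass the input frame in as a parameter. The sketch below is a hypothetical refactor, not the original assignment's required signature, and the sample data is made up.

```python
import pandas as pd


def exercise_02(df: pd.DataFrame) -> pd.DataFrame:
    """Per-region sums of df plus a w_avg column.

    df is passed in as an argument instead of being read from a global
    variable, so the function depends only on its input.
    """
    df_region = df.groupby(by="region").sum()
    df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]
    return df_region


# Hypothetical usage with made-up numbers.
sample = pd.DataFrame(
    {"region": ["a", "b", "a"], "2018_x": [1.0, 2.0, 3.0], "2018_y": [2.0, 4.0, 6.0]}
)
result = exercise_02(sample)
print(result)
```

Because groupby(...).sum() returns a new frame, the caller's sample is left unchanged.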
