I am trying to calculate an equation per row in a dataframe and assign the value to a new column:
def exercise_02():
    df_region = df.groupby(by = "region").sum()
    for i in range(len(df_region)):
        i == 0
        df_region["w_avg"] = df1["2018_x"][i] * df1["2018_y"][i] / df1["2018_y"][i]
        i = i + 1
    result = df_region
    return result
When I run just this, the column w_avg is created, but it contains the same value in every row.
I tried to solve this by adding [i] after the column name inside the loop:
def exercise_02():
    df_region = df.groupby(by = "region").sum()
    for i in range(len(df_region)):
        i == 0
        df_region["w_avg"][i] = df1["2018_x"][i] * df1["2018_y"][i] / df1["2018_y"][i]
        i = i + 1
    result = df_region
    return result
But instead, I get this error message:
if tolerance is not None:
KeyError: 'w_avg'
Do you have any idea what I'm doing wrong? Thank you!
CodePudding user response:
The great thing about DataFrames is that you do not need to loop: operations on whole columns are vectorised. You could change your (homework?) function to:
def exercise_02():
    df_region = df.groupby(by = "region").sum()
    df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]
    return df_region
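To see the vectorised assignment in action, here is a minimal self-contained sketch. The sample data is made up; only the column names region, 2018_x and 2018_y come from the question. It also illustrates a likely reason the loop version filled the column with one value, assuming df1["2018_x"][i] yields a single number: assigning a scalar to a column broadcasts it to every row.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "2018_x": [1.0, 3.0, 10.0, 20.0],
    "2018_y": [2.0, 2.0, 5.0, 5.0],
})

df_region = df.groupby(by="region").sum()

# Vectorised: the expression is evaluated element-wise over whole columns.
df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]

# Scalar assignment: one number is broadcast to every row of the column.
# "same_everywhere" is a hypothetical column name used only for illustration.
df_region["same_everywhere"] = df_region["2018_x"].iloc[0]

print(df_region)
```

Each pass of the original loop overwrote the whole column in this second way, so only the value from the last iteration survived.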
However, I notice that you are mixing up multiple dataframes. If they do not have the same size and index, you will run into length or index-alignment issues. Whether that happens really depends on the relation between df_region, df, and df1.
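As a rough illustration of the alignment problem, the sketch below multiplies columns from two frames whose indexes only partly overlap. Both frames and their values are hypothetical; the real df_region and df1 may be related differently. pandas aligns on index labels rather than on position, so labels present in only one frame produce NaN.

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels.
df_region = pd.DataFrame({"2018_x": [4.0, 30.0]}, index=["north", "south"])
df1 = pd.DataFrame({"2018_y": [4.0, 10.0]}, index=["north", "east"])

# Arithmetic aligns on labels: only "north" exists in both frames.
product = df_region["2018_x"] * df1["2018_y"]
print(product)

# A quick check before combining columns from different frames.
same_index = df_region.index.equals(df1.index)
print("indexes match:", same_index)
```

If the check reports a mismatch, it is usually worth working out how the frames relate before combining their columns.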
The reason I mention the latter is that you more or less accidentally(?) used variables from outside the scope of your function. Your function has no parameters, but uses df and df1 inside its scope. So there is definitely something missing from your question that you need to either work out yourself or convey to the community before it can be completely answered.
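One way to avoid relying on outside variables is to pass the dataframe in as a parameter. The version below is only a sketch: it assumes that a single input frame with the columns region, 2018_x and 2018_y is all the function needs, which the question does not confirm, and the sample data is made up.

```python
import pandas as pd


def exercise_02(df: pd.DataFrame) -> pd.DataFrame:
    """Sum the numeric columns per region and add the w_avg column."""
    df_region = df.groupby(by="region").sum()
    # Same expression as in the answer above, computed on the grouped frame.
    df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]
    return df_region


# Made-up input; the function no longer depends on any global variable.
sample = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "2018_x": [1.0, 3.0, 10.0, 20.0],
    "2018_y": [2.0, 2.0, 5.0, 5.0],
})
result = exercise_02(sample)
print(result)
```

With the input made explicit, it is easier to see which frame each column comes from.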