I need to compare each value in my Rainfall column (113,839 values) with the mean rainfall of its category, Location (44 distinct values). If a value is higher than the mean for its location, it should be replaced by that mean. My for loop does not work:
df_rainfall = pd.DataFrame(weather_train_data_total.groupby(['Location'])['Rainfall'].mean())
for column in weather_train_data_total[['Location']]:
    result = weather_train_data_total[column]
    print(result)
    if result.equals(df_rainfall['Location']):
        result = df_rainfall['Rainfall']
CodePudding user response:
Without data, it's always tricky to help but you can try to adapt this:
# calculate and assign the average value for each group
df["mean_val"] = df.groupby("Location")["Rainfall"].transform("mean")
# identify rows in which the value is above the average
relevant_rows = df["mean_val"] < df["Rainfall"]
# replace these values with their corresponding average
df.loc[relevant_rows, "Rainfall"] = df.loc[relevant_rows, "mean_val"]
df
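Since no sample data was posted, here is a small self-contained sketch of the same idea that you can run and check by hand. The toy frame and its location names are made up for illustration, and `group_mean` is just a local name I chose. It uses `Series.mask` instead of a helper column, which is a variation on the approach above rather than a different method:

```python
import pandas as pd

# Hypothetical toy data standing in for weather_train_data_total
df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Albury", "Sydney", "Sydney"],
    "Rainfall": [0.0, 2.0, 10.0, 1.0, 3.0],
})

# For every row, the mean Rainfall of that row's Location group
group_mean = df.groupby("Location")["Rainfall"].transform("mean")

# Replace values above their group mean with that mean; other rows are untouched
df["Rainfall"] = df["Rainfall"].mask(df["Rainfall"] > group_mean, group_mean)

print(df)
```

With this toy data the Albury mean is 4.0 and the Sydney mean is 2.0, so only the 10.0 and the 3.0 get replaced. Missing Rainfall values stay missing, because the comparison is False for NaN.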