i have a dataset that i already filter like this
in this selection of this dataframe, i want to replace the value of "max
" and "critical"
column, because the "max" column is wrong, it should be showing the maximum value from pollutant value on that day ('pm10', 'so2', 'co', 'o3', 'no2') and the critical column should be showing the name of the maximum poluttant on that day
the desired result is :
tanggal stasiun pm10 so2 co o3 no2 max critical categori
3515 2020-12-01 DKI1 (Bunderan HI) 22 17 4 19 8 22 PM10 BAIK
3516 2020-12-02 DKI1 (Bunderan HI) 25 18 4 28 7 28 o3 BAIK
3518 2020-12-04 DKI1 (Bunderan HI) 39 29 8 52 17 52 o3 SEDANG
3520 2020-12-06 DKI1 (Bunderan HI) 31 22 7 30 9 31 pm10 BAIK
3521 2020-12-07 DKI1 (Bunderan HI) 25 22 6 18 9 25 pm10 BAIK
CodePudding user response:
first is necessary select column for processing - e.g. by first and last name in DataFrame.loc
:
df1 = df.loc[:, 'pm10':'no2']
Or remove max
and get numeric columns in DataFrame.select_dtypes
:
df1 = df.drop(['max'], axis=1).select_dtypes(np.number)
But because there are non numeric columns first convert them to numeric:
#for integers
df1 = df.loc[:, 'pm10':'no2'].astype(int)
#or for numeric if some bad values (strings)
df1 = df.loc[:, 'pm10':'no2'].apply(pd.to_numeric, errors='coerce')
And then assign max
and DataFrame.idxmax
:
df['max'] = df1.max(axis=1)
df['critical'] = df1.idxmax(axis=1)
print (df)
tanggal stasiun pm10 so2 co o3 no2 max critical \
3515 2020-12-01 DKI1 (Bunderan HI) 22 17 4 19 8 22 pm10
3516 2020-12-02 DKI1 (Bunderan HI) 25 18 4 28 7 28 o3
3518 2020-12-04 DKI1 (Bunderan HI) 39 29 8 52 17 52 o3
3520 2020-12-06 DKI1 (Bunderan HI) 31 22 7 30 9 31 pm10
3521 2020-12-07 DKI1 (Bunderan HI) 25 22 6 18 9 25 pm10
categori
3515 BAIK
3516 BAIK
3518 SEDANG
3520 BAIK
3521 BAIK