I have the following data frame regarding the quality of air (each row has values of various air indicators measured at certain day) :
my_data = pd.DataFrame.from_dict({
'so2': [ 5.6, 4.6, 6.1, 10.7, 2.6],
'co': [ 9.1, 8.2, 9.1, 18.1, 3.7],
'o3': [2.9, 11.0, 4.1, 2.1, 18.7],
'pm25': [124.0, 114.0, 104.0, 190.0, 46.0],
'no2': [29.8, 22.4, 26.1, 41.6, 11.5]
})
I would like to transform these values into categories - colours (from green to red, 4 categories). I have another Data Frame containing the intervals for each colour.
quality_standards_colours = pd.DataFrame.from_dict({
'Category': ['green', 'yellow',' orange', 'red'],
'so2_min': [0, 0.864, 1.824, 4.464],
'co_min': [0, 13.5, 28.5, 37.5],
'o3_min': [0, 0.165, 0.213, 0.258],
'pm25_min': [0, 12.1, 35.5, 55.5],
'no2_min': [0, 1.296, 2.424, 8.664]
})
I would like to get the following Data Frame:
my_categorized_data = pd.DataFrame.from_dict({
'so2': ['so2_red', 'so2_red', 'so2_red ', 'so2_red', 'so2_orange'],
'co': [ 'co_orange', 'co_red', 'co_green ', 'co_red', 'co_yellow'],
'o3': [.....],
'pm25': [....],
'no2': [.....]
})
Currently I know following solution ( I need to apply it for each variable):
bins_so2 = [0, 0.864, 1.824, 4.464, np.inf] #this is the column so2_min from quality_standards_colours
names = ['so2_green', 'so2_yellow','so2_orange', 'so2_red']
my_data['so2'] = pd.cut(my_data['so2'], bins, labels=names)
The problem is that this solution is time consuming - I have to write these lines of code for each variable (and I have many). I would like to write loop to speed up this process.
CodePudding user response:
You can use merge_asof
, which will cut according to the key:
# rename so you have the same keys
quality_standards_colours.columns = [x.split('_min')[0] for x in quality_standards_colours.columns]
out = pd.merge_asof(my_data.stack().reset_index(name='value').sort_values('value'),
quality_standards_colours.melt('Category').sort_values('value'),
on='value',left_by='level_1', right_by='variable'
).pivot('level_0', 'level_1', 'Category')
Output:
level_1 co no2 o3 pm25 so2
level_0
0 green red red red red
1 green red red red red
2 green red red red red
3 yellow red red red red
4 green red red orange orange