I have a DataFrame with temperature and precipitation data:
I want to categorize the precipitation into the following types:
* 0: No precipitation
* 1: Snow
* 2: Mixed snow and rain
* 3: Rain
* 4: Drizzle
* 5: Freezing rain
* 6: Freezing drizzle
I tried the following function:
def func(x):
    if smhi['Temperature'] < -8 and smhi['Precipitation'] > 1 : smhi['PreciCateg'] = '1'
    elif smhi['Temperature'] < -2 and smhi['Precipitation'] > 1 : smhi['Temperature'] = '2'
    elif smhi['Temperature'] < 30 and smhi['Precipitation'] >= 1 : smhi['PreciCateg'] = '3'
    elif smhi['Temperature'] < 20 and smhi['Precipitation'] < 1 : smhi['Temperature'] = '4'
    elif smhi['Temperature'] < 5 and smhi['Precipitation'] > 0.5 : smhi['PreciCateg'] = '5'
    elif smhi['Temperature'] < 5 and smhi['Precipitation'] > 0.2 : smhi['Temperature'] = '6'
    else: smhi['PreciCateg'] = '0'

smhi['PreciCateg'] = smhi.apply(func)
I get:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think I messed up the logic for categorisation!?
CodePudding user response:
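The error is raised because smhi['Temperature'] < -8 compares the whole column and returns a Series of booleans, which `if` and `and` cannot reduce to a single True or False. Build the conditions element-wise with & and use numpy.select, which assigns, for each row, the choice belonging to the first condition that matches: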
import numpy as np

conditions = [smhi["Temperature"].lt(-8) & smhi["Precipitation"].gt(1),
              smhi["Temperature"].lt(-2) & smhi["Precipitation"].gt(1),
              smhi["Temperature"].lt(30) & smhi["Precipitation"].ge(1),
              smhi["Temperature"].lt(20) & smhi["Precipitation"].lt(1),
              smhi["Temperature"].lt(5) & smhi["Precipitation"].gt(0.5),
              smhi["Temperature"].lt(5) & smhi["Precipitation"].gt(0.2)]

smhi["PreciCateg"] = np.select(conditions, [1, 2, 3, 4, 5, 6], 0)
>>> smhi
   Temperature  Precipitation  Wind Speed         timestamp  PreciCateg
0        -1.33           0.17        2.61  2017-1-1 0:00:00           4
1        -1.93           0.07        2.06  2017-1-1 1:00:00           4
2        -2.39           0.02        1.98  2017-1-1 2:00:00           4
3        -2.57           0.01        2.24  2017-1-1 3:00:00           4
4        -3.23           0.00        2.18  2017-1-1 4:00:00           4
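
If you would rather keep a row-wise function, the original attempt can probably be repaired by reading from the row that apply passes in, returning the category instead of assigning into the frame, and passing axis=1. The following is only a sketch under assumptions: the helper name categorize is hypothetical, and the sample frame reuses the five rows shown above plus two made-up rows so that more than one category appears.

```python
import pandas as pd

# First five rows are copied from the output above; the last two rows
# are invented for illustration only.
smhi = pd.DataFrame({
    "Temperature":   [-1.33, -1.93, -2.39, -2.57, -3.23, -10.0, 12.0],
    "Precipitation": [0.17, 0.07, 0.02, 0.01, 0.00, 2.0, 3.5],
})

def categorize(row):
    """Return the category for a single row; branches are tested in order."""
    t, p = row["Temperature"], row["Precipitation"]
    if t < -8 and p > 1:
        return 1
    elif t < -2 and p > 1:
        return 2
    elif t < 30 and p >= 1:
        return 3
    elif t < 20 and p < 1:
        return 4
    elif t < 5 and p > 0.5:
        return 5
    elif t < 5 and p > 0.2:
        return 6
    return 0

# axis=1 hands each row (rather than each column) to the function.
smhi["PreciCateg"] = smhi.apply(categorize, axis=1)
print(smhi["PreciCateg"].tolist())
```

This is slower than numpy.select on large frames, since the function runs once per row. Note also that both approaches take the first matching branch, so with the thresholds as written a dry hour (Precipitation 0.00) below 20 degrees lands in category 4, and categories 5 and 6 can never be reached because conditions 3 and 4 already cover their ranges. Putting the narrower conditions first, and testing for zero precipitation explicitly, would likely be needed; the right thresholds are yours to choose.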