Given the following DataFrame in pandas:
avg_time_1 | avg_time_2 | avg_time_3 |
---|---|---|
1200 | 34 | 1 |
90 | 45 | 3600 |
0 | 4 | 1 |
0 | 4 | 50 |
80 | 4 | 60 |
82 | 40 | 65 |
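For reproducibility, the table above can be built as follows. This is a minimal sketch; the variable name `df` is an assumption chosen to match the answer below.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    "avg_time_1": [1200, 90, 0, 0, 80, 82],
    "avg_time_2": [34, 45, 4, 4, 4, 40],
    "avg_time_3": [1, 3600, 1, 50, 60, 65],
})
print(df.shape)
```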
I want to get a new DataFrame from the previous one that assigns one of the following codes to each row, depending on the values in the three avg_time columns:
- CODE-1: All values are less than 5.
- CODE-2: Some value is between 5 and 100.
- CODE-3: All values are between 5 and 100.
- CODE-4: Some value is higher than 1000.
Applying the function should produce the following DataFrame:
avg_time_1 | avg_time_2 | avg_time_3 | codes |
---|---|---|---|
1200 | 34 | 1 | 4 |
90 | 45 | 3600 | 4 |
0 | 4 | 1 | 1 |
0 | 4 | 50 | 2 |
80 | 4 | 60 | 2 |
82 | 40 | 65 | 3 |
Thank you for your response in advance.
CodePudding user response:
You can try np.select. Note that the conditions are checked in order and the first match wins, so put the higher-priority conditions first.
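import numpy as np

# True where a value lies in [5, 100]; computed once and reused below.
in_range = df.apply(lambda col: col.between(5, 100))
df['codes'] = np.select(
    [df.lt(5).all(axis=1), df.gt(1000).any(axis=1),
     in_range.all(axis=1),   # CODE-3 is checked before CODE-2
     in_range.any(axis=1)],
    [1, 4, 3, 2],
    default=0  # used when no condition matches
)
print(df)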
   avg_time_1  avg_time_2  avg_time_3  codes
0        1200          34           1      4
1          90          45        3600      4
2           0           4           1      1
3           0           4          50      2
4          80           4          60      2
5          82          40          65      3
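For reference, here is a self-contained sketch of the same idea that runs end to end. The helper name `assign_codes` and the list `value_cols` are hypothetical names introduced only for illustration. It passes just the three value columns to the helper, so re-running it after `codes` already exists does not feed the `codes` column into the comparisons. Rows that match none of the rules (for example, a value of 500 with nothing between 5 and 100) fall through to the default of 0, which is an assumption since the question does not cover that case.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "avg_time_1": [1200, 90, 0, 0, 80, 82],
    "avg_time_2": [34, 45, 4, 4, 4, 40],
    "avg_time_3": [1, 3600, 1, 50, 60, 65],
})


def assign_codes(frame: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: return one code per row, 0 if no rule matches."""
    # Element-wise mask: True where a value lies in [5, 100], both ends inclusive.
    in_range = (frame >= 5) & (frame <= 100)
    conditions = [
        (frame > 1000).any(axis=1),  # CODE-4: some value above 1000
        (frame < 5).all(axis=1),     # CODE-1: every value below 5
        in_range.all(axis=1),        # CODE-3: must come before CODE-2
        in_range.any(axis=1),        # CODE-2: some value in [5, 100]
    ]
    # np.select picks the choice for the first condition that is True in each row.
    return np.select(conditions, [4, 1, 3, 2], default=0)


# Assumed list of the columns that take part in the rules.
value_cols = ["avg_time_1", "avg_time_2", "avg_time_3"]
df["codes"] = assign_codes(df[value_cols])
print(df["codes"].tolist())
```

If CODE-2 were listed before CODE-3, every row that satisfies "all values between 5 and 100" would also satisfy "some value between 5 and 100" and would be labelled 2, which is why the stricter condition goes first.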