I have a problem with my dataset.
Let's presume my dataset looks like this:
timestamp | zone
2022-06-01 05:00:06 | yellow
2022-06-01 05:01:07 | yellow
2022-06-01 05:02:10 | yellow
2022-06-01 05:03:05 | yellow
2022-06-01 05:07:04 | yellow
2022-06-01 05:10:05 | orange
2022-06-01 05:11:05 | orange
2022-06-01 05:12:05 | orange
2022-06-01 05:16:04 | orange
2022-06-01 05:17:04 | orange
The timestamp column is the index.
The yellow and orange values represent a calculated zone.
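For anyone who wants to reproduce this, here is a minimal sketch that builds the sample above. It assumes the index is a plain (timezone-naive) DatetimeIndex named timestamp and that zone is a string column; the real data may be set up differently.

```python
import pandas as pd

# Sample data from the table above; the timestamps become the index.
timestamps = pd.to_datetime([
    "2022-06-01 05:00:06", "2022-06-01 05:01:07", "2022-06-01 05:02:10",
    "2022-06-01 05:03:05", "2022-06-01 05:07:04", "2022-06-01 05:10:05",
    "2022-06-01 05:11:05", "2022-06-01 05:12:05", "2022-06-01 05:16:04",
    "2022-06-01 05:17:04",
])
df = pd.DataFrame({"zone": ["yellow"] * 5 + ["orange"] * 5}, index=timestamps)
df.index.name = "timestamp"
print(df)
```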
Condition: A zone change can only happen if the previous zone has been set for at least X minutes (let's presume it's 15 minutes for this example).
Expected result:
timestamp | zone
2022-06-01 05:00:06 | yellow
2022-06-01 05:01:07 | yellow
2022-06-01 05:02:10 | yellow
2022-06-01 05:03:05 | yellow
2022-06-01 05:07:04 | yellow
2022-06-01 05:10:05 | yellow
2022-06-01 05:11:05 | yellow
2022-06-01 05:12:05 | yellow
2022-06-01 05:16:04 | yellow
2022-06-01 05:17:04 | orange
The yellow zone was set at 05:00:06, so it must stay set for at least 15 minutes, regardless of what the zone calculation returns in the meantime. This means that the yellow zone must be kept until 05:16:04. From then on, the zone can be set to orange.
I believe there are two ways to do this. One is to check the elapsed time while the zone is being calculated, the other is to change the zone after it has been calculated.
My priority is performance, as I plan to use this data in a dashboard. The zone is calculated with np.select.
Just imagine there are values that are compared to thresholds:
conditions = [
    (df.value <= df.threshold_red),  # red
    (df.value > df.threshold_red) & (df.value <= df.threshold_orange),  # orange
    (df.value > df.threshold_orange) & (df.value <= df.threshold_green),  # yellow
    (df.value > df.threshold_green),  # green
]
zones = ["red", "orange", "yellow", "green"]
df["zone"] = np.select(conditions, zones)
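To make the snippet above runnable on its own, here is a small sketch with made-up numbers. The values 5, 15, 25, 35 and the thresholds 10, 20, 30 are purely hypothetical and only illustrate the four branches. The explicit default="unknown" is also my addition: np.select otherwise falls back to the integer 0, and mixing that with string choices may be rejected by newer NumPy versions.

```python
import numpy as np
import pandas as pd

# Hypothetical values and thresholds, chosen only to hit every branch once.
df = pd.DataFrame({
    "value": [5, 15, 25, 35],
    "threshold_red": 10,
    "threshold_orange": 20,
    "threshold_green": 30,
})

conditions = [
    df.value <= df.threshold_red,                                        # red
    (df.value > df.threshold_red) & (df.value <= df.threshold_orange),   # orange
    (df.value > df.threshold_orange) & (df.value <= df.threshold_green), # yellow
    df.value > df.threshold_green,                                       # green
]
zones = ["red", "orange", "yellow", "green"]

# A string default keeps the result dtype consistent with the string choices.
df["zone"] = np.select(conditions, zones, default="unknown")
print(df["zone"].tolist())
```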
How can I do this? I have tried .apply() with a lambda, but I was not able to get the expected result...
Thanks for your help in advance :)
Ocamond
CodePudding user response:
You can do this with a loop:
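import pandas as pd

result = []
ts = pd.Timestamp(0)  # time of the last accepted zone change (epoch, so the first row qualifies)
assigned_zone = None
for timestamp, zone in df["zone"].items():
    # Accept a new zone only once the current one has been held for more than 15 minutes
    if timestamp - ts > pd.Timedelta(minutes=15) and assigned_zone != zone:
        ts = timestamp
        assigned_zone = zone
    result.append(assigned_zone)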
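The same loop can be wrapped in a function and run on the sample data from the question. This is only a sketch: hold_zone and min_hold are names I made up for illustration, and it assumes the index is a sorted, timezone-naive DatetimeIndex.

```python
import pandas as pd

def hold_zone(zones: pd.Series, min_hold: pd.Timedelta) -> pd.Series:
    """Keep each assigned zone until it has been held for more than min_hold."""
    result = []
    held_since = None  # timestamp at which the current zone was assigned
    assigned = None    # zone currently in effect
    for ts, zone in zones.items():
        # The first row is always accepted; later changes need the hold time to have elapsed.
        if assigned is None or (zone != assigned and ts - held_since > min_hold):
            held_since = ts
            assigned = zone
        result.append(assigned)
    return pd.Series(result, index=zones.index, name=zones.name)

timestamps = pd.to_datetime([
    "2022-06-01 05:00:06", "2022-06-01 05:01:07", "2022-06-01 05:02:10",
    "2022-06-01 05:03:05", "2022-06-01 05:07:04", "2022-06-01 05:10:05",
    "2022-06-01 05:11:05", "2022-06-01 05:12:05", "2022-06-01 05:16:04",
    "2022-06-01 05:17:04",
])
zones = pd.Series(["yellow"] * 5 + ["orange"] * 5, index=timestamps, name="zone")

held = hold_zone(zones, pd.Timedelta(minutes=15))
print(held)
```

On this sample the sketch switches to orange at 05:16:04, because that row is already more than 15 minutes after 05:00:06. The expected table in the question keeps that row yellow, so the comparison may need adjusting depending on which rule is intended. The result can be written back with df["zone"] = held.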