The following problem:
I have a measurement series with temperatures (see temp.tbl
).
temp.tbl <- data.frame(temp = c(13, 14, 13, 11, 10, 12,
16, 18, 16, 13, 10, 11,
12, 14, 12, 10, 8, 7, 5)
)
Each time the temperature in the temp
column falls below a certain threshold (here: 12), I want to check how many subsequent temperature values remain within a temperature range (here: 9 - 15):
- The first value in the
temp
column that falls below the threshold (12) and all subsequent values (within the range 9 - 15) will receive the value 1 in a new column (grp
). - The value that leaves the range for the first time above or below it also gets the value 1 in the
grp
column. In other words, these values are "combined" and assigned to group 1. - If the temperature falls below the threshold value (12) at a later time for the second time (after having left the range before), this value and the temperatures within the range get the value 2 in the
grp
column. These values are assigned to group 2. - Each subsequent fall below the threshold and the subsequent "range values" have been "grouped" according to their occurrence (group 3, group 4, etc.).
- All other rows receive the value 0 in the
grp
column.
The solution would look something like this (see temp_solution.tbl
):
temp_solution.tbl <- data.frame(temp = c(13, 14, 13, 11, 10, 12, 16, 18, 16, 13, 10, 11, 12, 14, 12, 10, 8, 7, 5),
id = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0),
onOff = c("off", "off", "off", "on", "on", "on", "on", "off", "off", "off", "on",
"on", "on", "on", "on", "on", "on", "off", "off"),
startEnd = c("off", "off", "off", "start", "on", "on", "end", "off", "off", "off", "start",
"on", "on", "on", "on", "on", "end", "off", "off")
)
temp_solution.tbl
temp id onOff startEnd
1 13 0 off off
2 14 0 off off
3 13 0 off off
4 11 1 on start # temp below threshold
5 10 1 on on # temp within limits
6 12 1 on on # temp within limits
7 16 1 on end # first temp outside limits
8 18 0 off off
9 16 0 off off
10 13 0 off off
11 10 2 on start # temp below threshold
12 11 2 on on # temp within limits
13 12 2 on on
14 14 2 on on
15 12 2 on on
16 10 2 on on
17 8 2 on end # first temp outside limits
18 7 0 off off
19 5 0 off off
- The values in column
temp
of rows 1 - 3 are all above the threshold (12) and get the value 0 in columngrp
. - The value in column temp in row 4 falls below the threshold for the first time: a 1 is inserted in column
grp
. - The temperature values in rows 5 - 6 are within the range and are assigned to group 1. The value in row 7 leaves the range for the first time (it exceeds the range) and also receives a 1 in the
grp
column. - Rows 8 - 10 are above the threshold value and do not fall below it:
grp
= 0. - Line 11 falls below the threshold value for the second time:
grp
= 2. - Lines 12 - 16 are within the range after falling below the threshold value:
grp
= 2. - Line 17 leaves the range for the first time (downwards) and also receives
grp
= 2. - Lines 18 - 19:
grp
= 0
The columns onOff
and startEnd
are only for illustration.
The starting point can be determined e.g. with: temp < 12 & lag(temp > 12)
.
The range again e.g. with: between(temp, 9, 15)
.
temp.tbl %>%
mutate(
start = temp < 12 & lag(temp > 12),
range = between(temp, 9, 15)
)
But if a start point was "triggered" I have problems to connect this start point with the following range condition to get the values to a coherent "measurement series" (switch is ON
).
In other words, if the value in the start
column changes from FALSE
to TRUE
, the switch is set to ON
. As long as the subsequent values in the range
column are TRUE
, the switch remains ON
; if the value in range
changes to FALSE
, the switch flips to OF
. All subsequent ON
s form a group and are numbered in ascending order.
CodePudding user response:
Here I use data.table
functions, but it should be straightforward to translate it to base
or dplyr
.
Check if the current value is below threshold (temp < 12
) and (&
) the preceeding value is above threshold (shift(temp) > 12
). Create a grouping variable ('id') by using cumsum
on the logical result.
Check if values are between
the lower
and upper
limits ('rng').
Within each group (by = id
):
To include also the first value that is outside the limits, check if current 'rng' is FALSE
(!rng
) and the preceeding value is TRUE
(shift(rng)
).
To handles cases where values have been outside the limits and then return back within limits (e.g. row 10), use cummin
on the logical 'rng'. Thus, once a value has become FALSE
, it stays FALSE
. Coerce result to logical.
Check if either (|
) of the two tests is TRUE
and multiply with 'id'. The FALSE
rows will become zero.
library(data.table)
d = as.data.table(tmp.tbl)
d[ , id := cumsum(temp < 12 & shift(temp) > 12)]
d[ , rng := between(temp, lower = 9, upper = 15)]
d[ , id := id * ((!rng & shift(rng)) | as.logical(cummin(rng))), by = id]
temp id rng
<num> <int> <lgcl>
1: 13 0 TRUE
2: 14 0 TRUE
3: 13 0 TRUE
4: 11 1 TRUE
5: 10 1 TRUE
6: 12 1 TRUE
7: 16 1 FALSE
8: 18 0 FALSE
9: 16 0 FALSE
10: 13 0 TRUE
11: 10 2 TRUE
12: 11 2 TRUE
13: 12 2 TRUE
14: 14 2 TRUE
15: 12 2 TRUE
16: 10 2 TRUE
17: 8 2 FALSE
18: 7 0 FALSE
19: 5 0 FALSE