I have the following dataframe called df (dput
below):
group date indicator value
1 A 2022-11-01 01:00:00 FALSE 2
2 A 2022-11-01 03:00:00 FALSE 1
3 A 2022-11-01 04:00:00 FALSE 2
4 A 2022-11-01 05:00:00 FALSE 1
5 A 2022-11-01 06:00:00 TRUE 1
6 A 2022-11-01 07:00:00 FALSE 1
7 A 2022-11-01 10:00:00 FALSE 2
8 A 2022-11-01 12:00:00 FALSE 1
9 B 2022-11-01 01:00:00 FALSE 1
10 B 2022-11-01 02:00:00 FALSE 2
11 B 2022-11-01 03:00:00 FALSE 1
12 B 2022-11-01 06:00:00 TRUE 1
13 B 2022-11-01 07:00:00 FALSE 1
14 B 2022-11-01 08:00:00 FALSE 1
15 B 2022-11-01 11:00:00 FALSE 2
16 B 2022-11-01 13:00:00 FALSE 2
I would like to find the first rows that have a value change after with respect to the rows with indicator == TRUE
per group. This means that it should find row 7 for group A and row 15 for group B because they are both the first rows that have a value change after and with respect to the conditioned rows. Here is the desired output called df_desired:
group date indicator value
1 A 2022-11-01 06:00:00 TRUE 1
2 A 2022-11-01 10:00:00 FALSE 2
3 B 2022-11-01 06:00:00 TRUE 1
4 B 2022-11-01 11:00:00 FALSE 2
So I was wondering if anyone knows how to find the desired rows with conditioned rows like in df_desired?
Here dput
of df and df_desired:
df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B"), date = structure(c(1667260800,
1667268000, 1667271600, 1667275200, 1667278800, 1667282400, 1667293200,
1667300400, 1667260800, 1667264400, 1667268000, 1667278800, 1667282400,
1667286000, 1667296800, 1667304000), class = c("POSIXct", "POSIXt"
), tzone = ""), indicator = c(FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
FALSE, FALSE), value = c(2, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1,
1, 1, 2, 2)), row.names = c(NA, -16L), class = "data.frame")
df_desired <- structure(list(group = c("A", "A", "B", "B"), date = c("2022-11-01 06:00:00",
"2022-11-01 10:00:00", "2022-11-01 06:00:00", "2022-11-01 11:00:00"
), indicator = c(TRUE, FALSE, TRUE, FALSE), value = c(1, 2, 1,
2)), class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
Here is one way to do it,
library(dplyr)
df %>%
group_by(group) %>%
mutate(val_diff = value - lag(value),
new = row_number()[indicator],
new1 = (val_diff == 1) & (row_number() > new)) %>%
filter(indicator|new1) %>%
select(-c(val_diff, new, new1))
# A tibble: 4 × 4
# Groups: group [2]
group date indicator value
<chr> <dttm> <lgl> <dbl>
1 A 2022-11-01 08:00:00 TRUE 1
2 A 2022-11-01 12:00:00 FALSE 2
3 B 2022-11-01 08:00:00 TRUE 1
4 B 2022-11-01 13:00:00 FALSE 2
DATA USED from OP dput
group date indicator value
1 A 2022-11-01 03:00:00 FALSE 2
2 A 2022-11-01 05:00:00 FALSE 1
3 A 2022-11-01 06:00:00 FALSE 2
4 A 2022-11-01 07:00:00 FALSE 1
5 A 2022-11-01 08:00:00 TRUE 1
6 A 2022-11-01 09:00:00 FALSE 1
7 A 2022-11-01 12:00:00 FALSE 2
8 A 2022-11-01 14:00:00 FALSE 1
9 B 2022-11-01 03:00:00 FALSE 1
10 B 2022-11-01 04:00:00 FALSE 2
11 B 2022-11-01 05:00:00 FALSE 1
12 B 2022-11-01 08:00:00 TRUE 1
13 B 2022-11-01 09:00:00 FALSE 1
14 B 2022-11-01 10:00:00 FALSE 1
15 B 2022-11-01 13:00:00 FALSE 2
16 B 2022-11-01 15:00:00 FALSE 2
CodePudding user response:
Using by
.
by(df, df$group, \(x) {
u <- x[cumsum(x$indicator) > 0, ]
u[c(1L, which.max(u$value != u$value[1L])), ]
}) |> do.call(what=rbind)
# group date indicator value
# A.5 A 2022-11-01 06:00:00 TRUE 1
# A.7 A 2022-11-01 10:00:00 FALSE 2
# B.12 B 2022-11-01 06:00:00 TRUE 1
# B.15 B 2022-11-01 11:00:00 FALSE 2