Home > Back-end >  Return first value change with respect to conditioned rows per group
Return first value change with respect to conditioned rows per group

Time:11-23

I have the following dataframe called df (dput below):

   group                date indicator value
1      A 2022-11-01 01:00:00     FALSE     2
2      A 2022-11-01 03:00:00     FALSE     1
3      A 2022-11-01 04:00:00     FALSE     2
4      A 2022-11-01 05:00:00     FALSE     1
5      A 2022-11-01 06:00:00      TRUE     1
6      A 2022-11-01 07:00:00     FALSE     1
7      A 2022-11-01 10:00:00     FALSE     2
8      A 2022-11-01 12:00:00     FALSE     1
9      B 2022-11-01 01:00:00     FALSE     1
10     B 2022-11-01 02:00:00     FALSE     2
11     B 2022-11-01 03:00:00     FALSE     1
12     B 2022-11-01 06:00:00      TRUE     1
13     B 2022-11-01 07:00:00     FALSE     1
14     B 2022-11-01 08:00:00     FALSE     1
15     B 2022-11-01 11:00:00     FALSE     2
16     B 2022-11-01 13:00:00     FALSE     2

I would like to find the first rows that have a value change after with respect to the rows with indicator == TRUE per group. This means that it should find row 7 for group A and row 15 for group B because they are both the first rows that have a value change after and with respect to the conditioned rows. Here is the desired output called df_desired:

  group                date indicator value
1     A 2022-11-01 06:00:00      TRUE     1
2     A 2022-11-01 10:00:00     FALSE     2
3     B 2022-11-01 06:00:00      TRUE     1
4     B 2022-11-01 11:00:00     FALSE     2

So I was wondering if anyone knows how to find the desired rows with conditioned rows like in df_desired?


Here dput of df and df_desired:

df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B", "B", "B"), date = structure(c(1667260800, 
1667268000, 1667271600, 1667275200, 1667278800, 1667282400, 1667293200, 
1667300400, 1667260800, 1667264400, 1667268000, 1667278800, 1667282400, 
1667286000, 1667296800, 1667304000), class = c("POSIXct", "POSIXt"
), tzone = ""), indicator = c(FALSE, FALSE, FALSE, FALSE, TRUE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
FALSE, FALSE), value = c(2, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 
1, 1, 2, 2)), row.names = c(NA, -16L), class = "data.frame")

df_desired <- structure(list(group = c("A", "A", "B", "B"), date = c("2022-11-01 06:00:00", 
"2022-11-01 10:00:00", "2022-11-01 06:00:00", "2022-11-01 11:00:00"
), indicator = c(TRUE, FALSE, TRUE, FALSE), value = c(1, 2, 1, 
2)), class = "data.frame", row.names = c(NA, -4L))

CodePudding user response:

Here is one way to do it,

library(dplyr)

df %>% 
 group_by(group) %>% 
 mutate(val_diff = value - lag(value), 
        new = row_number()[indicator], 
        new1 = (val_diff == 1) & (row_number() > new)) %>% 
 filter(indicator|new1) %>% 
 select(-c(val_diff, new, new1))

# A tibble: 4 × 4
# Groups:   group [2]
  group date                indicator value
  <chr> <dttm>              <lgl>     <dbl>
1 A     2022-11-01 08:00:00 TRUE          1
2 A     2022-11-01 12:00:00 FALSE         2
3 B     2022-11-01 08:00:00 TRUE          1
4 B     2022-11-01 13:00:00 FALSE         2

DATA USED from OP dput

  group                date indicator value
1      A 2022-11-01 03:00:00     FALSE     2
2      A 2022-11-01 05:00:00     FALSE     1
3      A 2022-11-01 06:00:00     FALSE     2
4      A 2022-11-01 07:00:00     FALSE     1
5      A 2022-11-01 08:00:00      TRUE     1
6      A 2022-11-01 09:00:00     FALSE     1
7      A 2022-11-01 12:00:00     FALSE     2
8      A 2022-11-01 14:00:00     FALSE     1
9      B 2022-11-01 03:00:00     FALSE     1
10     B 2022-11-01 04:00:00     FALSE     2
11     B 2022-11-01 05:00:00     FALSE     1
12     B 2022-11-01 08:00:00      TRUE     1
13     B 2022-11-01 09:00:00     FALSE     1
14     B 2022-11-01 10:00:00     FALSE     1
15     B 2022-11-01 13:00:00     FALSE     2
16     B 2022-11-01 15:00:00     FALSE     2

CodePudding user response:

Using by.

by(df, df$group, \(x) {
  u <- x[cumsum(x$indicator) > 0, ]
  u[c(1L, which.max(u$value != u$value[1L])), ]
}) |> do.call(what=rbind)
#      group                date indicator value
# A.5      A 2022-11-01 06:00:00      TRUE     1
# A.7      A 2022-11-01 10:00:00     FALSE     2
# B.12     B 2022-11-01 06:00:00      TRUE     1
# B.15     B 2022-11-01 11:00:00     FALSE     2
  • Related