Create binary variable based on length of group in another column

I need to create a binary variable called flow.type based on the length of a variable called 'cluster'. If the length of a cluster is 1, then flow.type should be "0"; if it is greater than 1, flow.type should be "1". I have put an example of my data in the image, but please let me know if there is a way to attach data to my question and I will do that ASAP. I've tried the code below, but it doesn't work for some reason. Is there something I am doing wrong? Thanks in advance.

# to determine flow type from the clustered groups, use binary version of if/else statement
# flow.type 1 = 'event'
# flow.type 0 = 'non-event'
y <- y %>% 
  group_by(cluster) %>% 
  mutate(flow.type = case_when(length(cluster)>1 ~ "1",
                                    TRUE ~ "0")) %>% ungroup()

Here is a sample of the data, produced with dput():

structure(list(Station = c("1051017", "1051017", "1051017", "1051017", 
"1051017", "1051017", "1051017", "1051017", "1051017", "1051017", 
"1051017", "1051017", "1051017", "1051017", "1051017", "1051021", 
"1051021", "1051021", "1051021", "1051021"), Site.Name = c("Laura River at Carroll's Crossing", 
"Laura River at Carroll's Crossing", "Laura River at Carroll's Crossing", 
"Laura River at Carroll's Crossing", "Laura River at Carroll's Crossing", 
"Laura River at Carroll's Crossing", "Laura River at Carroll's Crossing", 
"Laura River at Carroll's Crossing", "Laura River at Carroll's Crossing", 
"Laura River at Carroll's Crossing", "Laura River at Carroll's Crossing", 
"Laura River at Carroll's Crossing", "Laura River at Carroll's Crossing", 
"Laura River at Carroll's Crossing", "Laura River at Carroll's Crossing", 
"Laura River at Broken Dam Station", "Laura River at Broken Dam Station", 
"Laura River at Broken Dam Station", "Laura River at Broken Dam Station", 
"Laura River at Broken Dam Station"), Date.Time = c("20/10/2017 7:45", 
"24/10/2017 10:57", "27/12/2019 9:15", "16/01/2020 9:32", "15/04/2020 9:45", 
"12/05/2020 14:30", "17/06/2020 15:55", "11/09/2020 9:16", "12/01/2021 19:44", 
"13/01/2021 12:00", "27/01/2021 15:59", "27/01/2021 16:29", "27/01/2021 17:00", 
"19/02/2021 9:30", "17/01/2022 10:17", "27/12/2019 8:10", "31/12/2019 8:30", 
"21/01/2020 14:25", "21/01/2020 14:47", "14/05/2020 15:15"), 
    Date = structure(c(17459, 17463, 18257, 18277, 18367, 18394, 
    18430, 18516, 18639, 18640, 18654, 18654, 18654, 18677, 19009, 
    18257, 18261, 18282, 18282, 18396), class = "Date"), Datedigit = c(17459745, 
    174631057, 18257915, 18277932, 18367945, 183941430, 184301555, 
    18516916, 186391944, 186401200, 186541559, 186541629, 186541700, 
    18677930, 190091017, 18257810, 18261830, 182821425, 182821447, 
    183961515), Sampling.Year = structure(c(3L, 3L, 5L, 5L, 5L, 
    5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 5L, 5L, 5L, 5L, 5L
    ), .Label = c("2015-2016", "2016-2017", "2017-2018", "2018-2019", 
    "2019-2020", "2020-2021", "2021-2022", "2022-2023"), class = "factor"), 
    Season = c("Dry", "Dry", "Wet", "Wet", "Wet", "Dry", "Dry", 
    "Dry", "Wet", "Wet", "Wet", "Wet", "Wet", "Wet", "Wet", "Wet", 
    "Wet", "Wet", "Wet", "Dry"), cluster = c(56, 57, 58, 59, 
    60, 61, 62, 63, 64, 64, 65, 65, 65, 66, 67, 66, 67, 68, 68, 
    69)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

CodePudding user response：

It sounds like your flow.type should be 1 if the cluster value is repeated immediately before or after the given row.

One way to capture this is to use dplyr::lag and dplyr::lead to compare each row with the preceding and following values. By default, lag returns NA for the first row and lead returns NA for the last row, so we can specify a default (here Inf) that never matches a real cluster value. That gives a clean 0 at the first and last rows unless an adjacent row matches.

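library(dplyr)

# flow.type is 1 when the previous or next row has the same cluster value, otherwise 0
y %>%
  mutate(flow.type = 1 * (lag(cluster, default = Inf) == cluster | 
                            lead(cluster, default = Inf) == cluster))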
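
To see how the adjacency rule differs from counting rows per cluster, here is a minimal sketch using only the cluster column from the question's dput() sample. The column names flow.adjacent and flow.grouped are made up for this illustration, and n() > 1 is assumed to stand in for the question's length(cluster) > 1 inside group_by(cluster).

```r
library(dplyr)

# Only the cluster column from the question's sample data is reproduced here.
y <- data.frame(cluster = c(56, 57, 58, 59, 60, 61, 62, 63, 64, 64,
                            65, 65, 65, 66, 67, 66, 67, 68, 68, 69))

result <- y %>%
  # Adjacency rule: 1 when the previous or next row has the same cluster value.
  mutate(flow.adjacent = 1 * (lag(cluster, default = Inf) == cluster |
                                lead(cluster, default = Inf) == cluster)) %>%
  # Group-size rule: 1 when the cluster value occurs more than once anywhere.
  group_by(cluster) %>%
  mutate(flow.grouped = as.numeric(n() > 1)) %>%
  ungroup()

# Rows where the two rules disagree.
filter(result, flow.adjacent != flow.grouped)
```

In this sample the two rules disagree only for clusters 66 and 67. Each of those values occurs twice, once at each station, but never on adjacent rows, so the group-size rule flags them and the adjacency rule does not. Which one is right depends on whether a cluster number is meant to be unique across stations.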