I am trying to construct a boolean for the following question: for a given group, did cond change values more than X times? Here is some sample data:
df <- rbind(
data.frame(
cond = c("cond1", "cond2", "cond1", "cond2", "cond3"),
day = 1:5,
group = "group1"
),
data.frame(
cond = c("cond1", "cond1", "cond1", "cond1", "cond2"),
day = 1:5,
group = "group2"
)
)
df
#> cond day group
#> 1 cond1 1 group1
#> 2 cond2 2 group1
#> 3 cond1 3 group1
#> 4 cond2 4 group1
#> 5 cond3 5 group1
#> 6 cond1 1 group2
#> 7 cond1 2 group2
#> 8 cond1 3 group2
#> 9 cond1 4 group2
#> 10 cond2 5 group2
The ordering is relevant here, hence the day variable. I am just trying to figure out how to detect when cond changes a lot within a group and then return those rows. Ideally this will fit into a group_by then filter idiom, but really I'm not sure how to construct the boolean.
CodePudding user response:
You may create a "change" variable using ave, then subset on some value of it.
transform(df, change=ave(as.numeric(gsub('\\D', '', cond)), group, FUN=\(x) sum(diff(x) != 0))) |>
subset(change < 4)
# cond day group change
# 6 cond1 1 group2 1
# 7 cond1 2 group2 1
# 8 cond1 3 group2 1
# 9 cond1 4 group2 1
# 10 cond2 5 group2 1
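The gsub step above relies on the cond labels containing digits that tell them apart. If the labels were arbitrary strings, one possible variant is to compare each value with its predecessor directly. This is only a minimal base R sketch, not the answerer's code: n_changes and x_threshold are illustrative names, and it assumes that sorting by group and day gives the intended order.

```r
df <- rbind(
  data.frame(cond = c("cond1", "cond2", "cond1", "cond2", "cond3"),
             day = 1:5, group = "group1"),
  data.frame(cond = c("cond1", "cond1", "cond1", "cond1", "cond2"),
             day = 1:5, group = "group2")
)

# Make sure rows are in day order within each group
df <- df[order(df$group, df$day), ]

# Illustrative helper: number of positions where a value differs
# from the one just before it
n_changes <- function(x) sum(x[-1] != x[-length(x)])

# One count per group
changes_by_group <- tapply(df$cond, df$group, n_changes)
# group1: 4, group2: 1

# Hypothetical threshold: keep groups with more than x_threshold changes
x_threshold <- 3
keep <- names(changes_by_group)[changes_by_group > x_threshold]
result <- df[df$group %in% keep, ]
```

Here result holds the five group1 rows, since only group1 changes more than three times.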
CodePudding user response:
I am not sure if this is what you meant, but if you want to know how many times cond changes within each group, we could do:
library(dplyr)
df %>%
group_by(group) %>%
mutate(change = max(cumsum(cond != lag(cond, default = first(cond)))))
cond day group change
<chr> <int> <chr> <int>
1 cond1 1 group1 4
2 cond2 2 group1 4
3 cond1 3 group1 4
4 cond2 4 group1 4
5 cond3 5 group1 4
6 cond1 1 group2 1
7 cond1 2 group2 1
8 cond1 3 group2 1
9 cond1 4 group2 1
10 cond2 5 group2 1
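If the goal is one boolean per group rather than a count repeated on every row, the same comparison could be collapsed with summarise. The following is a hedged sketch under assumptions: x is a hypothetical threshold, and n_changes and changes_a_lot are illustrative column names that do not appear in the question.

```r
library(dplyr)

df <- rbind(
  data.frame(cond = c("cond1", "cond2", "cond1", "cond2", "cond3"),
             day = 1:5, group = "group1"),
  data.frame(cond = c("cond1", "cond1", "cond1", "cond1", "cond2"),
             day = 1:5, group = "group2")
)

x <- 3  # hypothetical threshold for "more than X times"

flags <- df %>%
  arrange(group, day) %>%   # make the ordering explicit
  group_by(group) %>%
  summarise(
    # the first comparison in each group is NA and is dropped by na.rm
    n_changes = sum(cond != lag(cond), na.rm = TRUE),
    changes_a_lot = n_changes > x,
    .groups = "drop"
  )
# group1: n_changes 4, changes_a_lot TRUE
# group2: n_changes 1, changes_a_lot FALSE
```

The flags table can then be joined back to df, or the same expression can be used directly inside filter.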
CodePudding user response:
We may build the condition by comparing each value with the previous one after grouping (this keeps groups with at least n changes):
library(dplyr)
n <- 3
df %>%
group_by(group) %>%
filter(sum(cond != lag(cond), na.rm = TRUE) >= n) %>%
ungroup()
Output:
# A tibble: 5 × 3
cond day group
<chr> <int> <chr>
1 cond1 1 group1
2 cond2 2 group1
3 cond1 3 group1
4 cond2 4 group1
5 cond3 5 group1
Alternatively, count the runs of identical values with rle. A group with k changes has k + 1 runs, so this keeps the same groups as the filter above:
df %>%
group_by(group) %>%
filter(length(rle(cond)$lengths) > n) %>%
ungroup()
Or, using data.table:
library(data.table)
setDT(df)[, if(uniqueN(rleid(cond)) > n) .SD,.(group)]
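The rle and rleid variants count runs rather than changes, which is easy to mix up when choosing the threshold. As a small illustration in base R (count_runs and count_changes are made-up helper names, and the two vectors are the cond columns from the sample data):

```r
cond_a <- c("cond1", "cond2", "cond1", "cond2", "cond3")  # group1
cond_b <- c("cond1", "cond1", "cond1", "cond1", "cond2")  # group2

# Number of runs of identical adjacent values
count_runs <- function(x) length(rle(x)$lengths)

# Number of positions where a value differs from its predecessor
count_changes <- function(x) sum(x[-1] != x[-length(x)])

count_runs(cond_a)     # 5
count_changes(cond_a)  # 4
count_runs(cond_b)     # 2
count_changes(cond_b)  # 1
```

So "runs > n" is the same test as "changes >= n". For a strict "more than X changes" reading of the question, the run-based condition would need to be "runs > X + 1".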