In this dataframe:
df <- data.frame(
comp = c("pre",rep("story",4), rep("x",2), rep("story",3)),
hbr = c(101:110)
)
Let's say I need to compute the mean of hbr, subset to the first stretch (run) where comp == "story". How could I do that more efficiently than the approach below? It seems bulky and long-winded, and it requires that I manually specify the grp I want to compute the mean for:
library(dplyr)
library(data.table)
df %>%
mutate(grp = rleid(comp)) %>%
summarise(M = mean(hbr[grp==2]))
#>       M
#> 1 103.5
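For reference, the same target (the first run of "story") can be located without hard-coding grp == 2 by using base R's rle(). This is a minimal sketch that assumes comp is a character column with no NA values. The names runs, starts, ends, k and m are only illustrative:

```r
df <- data.frame(
  comp = c("pre", rep("story", 4), rep("x", 2), rep("story", 3)),
  hbr = 101:110
)

# rle() collapses consecutive equal values into runs (values + lengths)
runs <- rle(as.character(df$comp))
ends <- cumsum(runs$lengths)       # last row of each run
starts <- ends - runs$lengths + 1  # first row of each run

# position of the first run whose value is "story" (NA if there is none)
k <- which(runs$values == "story")[1]

# mean of hbr over that run, or NA if no "story" run exists
m <- if (is.na(k)) NA_real_ else mean(df$hbr[starts[k]:ends[k]])
m
# 103.5
```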
Answer:
In base R, you can select the desired rows using cumsum and diff, choose which run of "story" you need (here the first, so 1), and then compute the mean on those rows. With this option you don't need to look up the group number manually, and you don't need any additional packages.
idx <- which(df$comp == "story")
first <- idx[cumsum(c(1, diff(idx) != 1)) == 1]
#[1] 2 3 4 5
mean(df$hbr[first])
#[1] 103.5
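The cumsum/diff trick generalizes to any run, not just the first. Below is a sketch of a small helper. Note that mean_nth_run is a hypothetical name, not part of base R. It returns NA when there are no matching rows or when the requested run does not exist:

```r
df <- data.frame(
  comp = c("pre", rep("story", 4), rep("x", 2), rep("story", 3)),
  hbr = 101:110
)

# Hypothetical helper: mean of `values` over the n-th contiguous run
# of rows where `key == target`, using the cumsum(diff()) idea above.
mean_nth_run <- function(values, key, target, n = 1) {
  idx <- which(key == target)               # rows matching the target
  if (length(idx) == 0) return(NA_real_)
  run_id <- cumsum(c(1, diff(idx) != 1))    # new id whenever rows stop being consecutive
  if (n > max(run_id)) return(NA_real_)
  mean(values[idx[run_id == n]])
}

mean_nth_run(df$hbr, df$comp, "story")        # 103.5
mean_nth_run(df$hbr, df$comp, "story", n = 2) # 109
```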
Answer:
I'm not sure if this is any better, but at least you only need to specify that you want the first run of 'story':
df %>%
mutate(grp = ifelse(comp == 'story', rleid(comp), NA)) %>%
filter(grp == min(grp, na.rm = TRUE)) %>%
summarise(M = mean(hbr))
#> M
#> 1 103.5
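If you want to drop the data.table dependency from this answer, rleid() can be approximated in base R by taking cumsum() over the positions where the value changes. This sketch assumes comp contains no NA values. As with the min(grp, na.rm = TRUE) version, having no "story" rows would make min() operate on an empty vector (Inf, with a warning) and the result would be NaN:

```r
df <- data.frame(
  comp = c("pre", rep("story", 4), rep("x", 2), rep("story", 3)),
  hbr = 101:110
)

# Base-R stand-in for rleid(): start a new id whenever comp differs
# from the previous row
comp <- as.character(df$comp)
grp <- cumsum(c(TRUE, comp[-1] != comp[-length(comp)]))

# Keep only the ids of "story" runs and use the smallest (the first run)
story_grp <- grp[comp == "story"]
M <- mean(df$hbr[grp == min(story_grp)])
M
# 103.5
```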