I want to select all rows of my data frame between the values "openwebsite" and "closewebsite" in the variable "activity", including those two rows themselves. Do I need to use the select or the filter function?
Thank you very much!
Dataframe:
Person | activity | duration |
---|---|---|
1 | write | 9 |
1 | openwebsite | 8 |
1 | paint | 9 |
1 | write | 2 |
1 | write | 4 |
1 | closewebsite | 9 |
1 | write | 4 |
Desired output:
Person | activity | duration |
---|---|---|
1 | openwebsite | 8 |
1 | paint | 9 |
1 | write | 2 |
1 | write | 4 |
1 | closewebsite | 9 |
CodePudding user response:
# Row numbers of the start and end markers
start_row <- which(df$activity == "openwebsite")
end_row <- which(df$activity == "closewebsite")
# Keep every row from start to end, inclusive
df[start_row:end_row, ]
Person activity duration
2 1 openwebsite 8
3 1 paint 9
4 1 write 2
5 1 write 4
6 1 closewebsite 9
You can also get the start and end row numbers with grep, e.g.
grep("openwebsite", df$activity)
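Keep in mind that grep() matches by regular expression, so it would also hit values that merely contain "openwebsite". Below is a minimal sketch using which() for exact matches, with the question's data rebuilt as `df`. The stopifnot() guard is an added assumption (exactly one open/close pair, in that order) and not part of the answer above:

```r
# Example data from the question
df <- data.frame(
  Person   = 1,
  activity = c("write", "openwebsite", "paint", "write",
               "write", "closewebsite", "write"),
  duration = c(9, 8, 9, 2, 4, 9, 4)
)

# which() returns the positions of exact matches only
start_row <- which(df$activity == "openwebsite")
end_row   <- which(df$activity == "closewebsite")

# Assumption: exactly one open/close pair, with open before close
stopifnot(length(start_row) == 1, length(end_row) == 1, start_row <= end_row)

# Rows from the start marker to the end marker, inclusive
result <- df[start_row:end_row, ]
result
```

If a marker is missing or occurs more than once, the guard stops with an error instead of silently returning the wrong rows.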
CodePudding user response:
You may try
library(dplyr)
df %>%
  filter(1 == cumsum((activity == "openwebsite") -
                       lag(activity == "closewebsite", default = 0)))
Person activity duration
1 1 openwebsite 8
2 1 paint 9
3 1 write 2
4 1 write 4
5 1 closewebsite 9
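To see why the cumsum() expression works, it may help to look at the intermediate vectors. This is a base R sketch on the question's activity column; `closed_before` is a hand-rolled stand-in for lag(), and the variable names are only illustrative:

```r
activity <- c("write", "openwebsite", "paint", "write",
              "write", "closewebsite", "write")

# TRUE on the row where the window opens
opened <- activity == "openwebsite"

# TRUE on the row *after* the closing marker (shift by one, pad with FALSE)
closed_before <- c(FALSE, head(activity == "closewebsite", -1))

# +1 at the opening row, -1 one row after the closing row
inside <- cumsum(opened - closed_before)
inside
# 0 1 1 1 1 1 0

# Rows where the running total equals 1 lie inside the window
activity[inside == 1]
```

Shifting the closing flag by one row is what keeps the "closewebsite" row itself in the result.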
or
df %>%
  filter(1 <= cumsum(activity == "openwebsite"),
         # default = 0 keeps the first row when it is "openwebsite"
         lag(cumsum(activity == "closewebsite"), default = 0) < 1)
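If the data holds several persons, the same cumsum() idea can probably be applied per person with group_by(). This is a sketch assuming each person has at most one openwebsite/closewebsite pair. The data frame `df2` and its second person are made up for illustration, and lag() is given a default so that a group starting with "openwebsite" does not lose its first row to NA:

```r
library(dplyr)

# Hypothetical data with two persons
df2 <- data.frame(
  Person   = c(1, 1, 1, 1, 2, 2, 2),
  activity = c("write", "openwebsite", "closewebsite", "write",
               "openwebsite", "paint", "closewebsite"),
  duration = c(9, 8, 9, 4, 3, 5, 1)
)

result2 <- df2 %>%
  group_by(Person) %>%
  filter(
    # at or after the opening row
    cumsum(activity == "openwebsite") >= 1,
    # no closing row seen before the current row
    lag(cumsum(activity == "closewebsite"), default = 0L) < 1
  ) %>%
  ungroup()

result2
```

Because the filter runs within each group, the running counts restart for every person.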