Here's my toy dataframe:
df <- data.frame(
  date = c(rep(1, 4), rep(2, 36)),
  action = c("ID=1", "foo", "bah", "error",
             "ID=2", "foo", "bah", "success",
             "ID=3", "foo", "bah", "error",
             "ID=4", "foo", "bah", "success",
             "ID=5", "foo", "bah", "success",
             "ID=6", "foo", "bah", "error",
             "ID=7", "foo", "bah", "error",
             "ID=8", "foo", "bah", "success",
             "ID=9", "foo", "bah", "success",
             "ID=10", "foo", "bah", "success"
  )
)
I would like to process df so that whenever an entry in the action column is equal to "error", the preceding row containing "ID=" is returned together with the associated entry in the date column. The expected result would be:
date action
1 ID=1
2 ID=3
2 ID=6
2 ID=7
I tried something along the lines of:
library(dplyr)
library(stringr)
df %>%
  filter(str_detect(action, "error")) %>%
  slice(-4)
but it's not quite there!
CodePudding user response:
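With two filter() calls:
library(dplyr)
df %>%
  # keep only the "ID=" rows and the "error" rows
  filter(action == "error" | grepl("ID", action)) %>%
  # keep the rows that are immediately followed by "error"
  filter(lead(action) == "error")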
# date action
# 1 1 ID=1
# 2 2 ID=3
# 3 2 ID=6
# 4 2 ID=7
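To see why the two steps work, here is a minimal base R sketch of the same idea on a shortened version of the toy data. The names df_small, step1 and next_action are made up for illustration. It assumes that lead() pads the end with NA and that filter() drops rows where the condition is NA, which is what the explicit !is.na() check mimics.

```r
# Shortened toy data (first two blocks of the question's data frame)
df_small <- data.frame(
  date = c(1, 1, 1, 1, 2, 2, 2, 2),
  action = c("ID=1", "foo", "bah", "error",
             "ID=2", "foo", "bah", "success"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the "ID=" rows and the "error" rows,
# so every "error" now sits directly below its ID row
step1 <- df_small[df_small$action == "error" | grepl("ID", df_small$action), ]

# Step 2: shift action up by one (like lead()) and keep the rows
# whose next row is "error"
next_action <- c(step1$action[-1], NA)
result <- step1[!is.na(next_action) & next_action == "error", ]
result
```

After step 1 only "ID=1", "error" and "ID=2" remain, so step 2 keeps just the "ID=1" row.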
CodePudding user response:
If the original data frame always keeps this regular four-line repeating pattern, then here is a one-liner using just base R:
df[which(df$action == "error") - 3, ]
date action
1 1 ID=1
9 2 ID=3
21 2 ID=6
25 2 ID=7
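If the blocks are not guaranteed to be exactly four rows long, a variant that does not rely on the fixed offset may be safer. The following is only a sketch, assuming every block starts with a row whose action begins with "ID=". The data frame df2 and the names is_id, block and bad are hypothetical and not part of the question.

```r
# Hypothetical data with blocks of different lengths
df2 <- data.frame(
  date = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  action = c("ID=1", "foo", "error",
             "ID=2", "foo", "bah", "bah", "success",
             "ID=3", "error"),
  stringsAsFactors = FALSE
)

# Mark the rows that open a block
is_id <- grepl("^ID=", df2$action)

# Running block number: increases by one at every "ID=" row
block <- cumsum(is_id)

# Block numbers that contain at least one "error"
bad <- unique(block[df2$action == "error"])

# Keep the opening "ID=" row of each such block
result <- df2[is_id & block %in% bad, ]
result
```

Here the result contains rows 1 and 9, that is "ID=1" and "ID=3", even though the blocks have three, five and two rows.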
CodePudding user response:
We may use lead() with n = 3, which again assumes that "error" always sits three rows below its "ID=" row:
library(dplyr)
df %>%
  filter(lead(action, n = 3) == "error")
date action
1 1 ID=1
2 2 ID=3
3 2 ID=6
4 2 ID=7