Slicing subsets of rows within a dataframe

Time:10-28

Here's my toy dataframe:

df <- data.frame(
  date = (1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
  action =c("ID=1", "foo","bah", "error",
            "ID=2", "foo","bah", "success",
            "ID=3", "foo","bah", "error",
            "ID=4", "foo","bah", "success",
            "ID=5", "foo","bah", "success",
            "ID=6", "foo","bah", "error",
            "ID=7", "foo","bah", "error",
            "ID=8", "foo","bah", "success",
            "ID=9", "foo","bah", "success",
            "ID=10", "foo","bah", "success"
            )
)

I would like to process df so that whenever an entry in the action column is equal to "error", the preceding row containing "ID=" is returned together with the associated entry in the date column. The expected result would be:

date action
1    ID=1
2    ID=3
2    ID=6
2    ID=7

I tried using something along the lines of:

library(dplyr)
library(stringr)
df %>% 
  filter(str_detect(action,"error")) %>%
  slice(-4)

but it's not quite there!

CodePudding user response:

With two filters:

library(dplyr)
df %>% 
  filter(action == "error" | grepl("ID", action)) %>% 
  filter(lead(action) == "error")

#   date action
# 1    1   ID=1
# 2    2   ID=3
# 3    2   ID=6
# 4    2   ID=7

CodePudding user response:

If the original data frame keeps this standard 4-line repeating order, then here is a single line using just base R:

df[which(df$action=="error")-3, ]

   date action
1     1   ID=1
9     2   ID=3
21    2   ID=6
25    2   ID=7
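
If the blocks are not always exactly 4 rows long, the fixed offset of 3 no longer works. Below is a minimal base R sketch that instead looks up the most recent "ID=" row before each "error". The helper names (is_id, id_rows, last_id, err_rows, idx) are only for illustration, and the compact rebuild of df assumes the date column is four 1s followed by thirty-six 2s, as in the question.

```r
# Toy data from the question, rebuilt compactly.
# Assumption: date is four 1s followed by thirty-six 2s (40 rows in total).
outcomes <- c("error", "success", "error", "success", "success",
              "error", "error", "success", "success", "success")
df <- data.frame(
  date   = c(rep(1, 4), rep(2, 36)),
  # rbind() interleaves the vectors: ID=1, foo, bah, error, ID=2, ...
  action = as.vector(rbind(paste0("ID=", 1:10), "foo", "bah", outcomes))
)

# Flag the rows that start a new block.
is_id   <- grepl("^ID=", df$action)
id_rows <- which(is_id)

# For every row, the index of the latest "ID=" row at or before it
# (NA for rows that come before the first "ID=" row).
last_id <- c(NA, id_rows)[cumsum(is_id) + 1]

# Map each "error" row back to its "ID=" row and drop NAs and duplicates.
err_rows <- which(df$action == "error")
idx      <- last_id[err_rows]
idx      <- unique(idx[!is.na(idx)])

result <- df[idx, ]
result
```

On the toy data this should pick the same rows (1, 9, 21 and 25) as the one-liner above, but it does not depend on the distance between "ID=" and "error".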

CodePudding user response:

We may use

library(dplyr)
df %>% 
  filter(lead(action, n = 3) == "error")

#   date action
# 1    1   ID=1
# 2    2   ID=3
# 3    2   ID=6
# 4    2   ID=7