I am looking for a way to get all rows of an (ordered) R data frame that contain a specific text phrase, together with the row that immediately follows each of them.
E.g.:
  A
1 TEXT
2 ABC
3 CCC
4 TEXT
5 AAA
6 GGG
In this example, I want all rows where A == "TEXT" plus the row right after each of them, no matter what text it contains (so in this case "ABC" once and "AAA" once). Is there a way to do this in R?
Thanks!
CodePudding user response:
With dplyr, you could check the lag() value inside filter():
library(dplyr)
df <- data.frame(id = 1:6, A = c("TEXT", "ABC", "CCC", "TEXT", "AAA", "GGG"))
# dplyr:
df %>% filter(A == "TEXT" | lag(A) == "TEXT")
#> id A
#> 1 1 TEXT
#> 2 2 ABC
#> 3 4 TEXT
#> 4 5 AAA
With base R you could add 1 to the vector of TEXT locations:
text_idx <- grep("TEXT", df$A, fixed = TRUE)
# "TEXT" locations:
text_idx
#> [1] 1 4
# "TEXT" rows followed by the rows right after them, unsorted:
c(text_idx, text_idx + 1)
#> [1] 1 4 2 5
df[sort(c(text_idx, text_idx + 1)),]
#> id A
#> 1 1 TEXT
#> 2 2 ABC
#> 4 4 TEXT
#> 5 5 AAA
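The base R approach above can return a duplicated row when two "TEXT" rows are adjacent, and an all-NA row when "TEXT" sits in the last row (the index then runs past the end of the data frame). A minimal sketch that guards against both cases is shown here; the helper name rows_with_next and its arguments are made up for illustration, and it assumes an exact match with == rather than the substring match that grep() performs:

```r
# Hypothetical helper (not part of base R or dplyr): returns every row whose
# `column` equals `phrase`, plus the row directly after each such row.
rows_with_next <- function(data, column, phrase) {
  # Positions of exact matches; which() silently drops NA comparisons
  hit <- which(data[[column]] == phrase)
  # Add the following positions, then remove duplicates and restore row order
  idx <- sort(unique(c(hit, hit + 1)))
  # Discard an index that would point past the last row
  idx <- idx[idx <= nrow(data)]
  data[idx, , drop = FALSE]
}

# Edge cases: two adjacent "TEXT" rows and a "TEXT" in the final row
edge_df <- data.frame(id = 1:4, A = c("TEXT", "TEXT", "ABC", "TEXT"))
edge_result <- rows_with_next(edge_df, "A", "TEXT")

# The data from the question
question_df <- data.frame(id = 1:6, A = c("TEXT", "ABC", "CCC", "TEXT", "AAA", "GGG"))
question_result <- rows_with_next(question_df, "A", "TEXT")
```

The dplyr version does not need this guard, since filter() works on a logical condition per row and never indexes past the end of the data.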
Created on 2023-02-01 with reprex v2.0.2