I have a noob question here and I would greatly appreciate your help with it. I have a dataset which conceptually looks like this:
> df <- data.frame(col1 = c(letters[1:4],"a"),col2 = 1:5,col3 = letters[10:14], col4=6:10,col5=11:15,col6=letters[14:10],col7=16:20)
> df
col1 col2 col3 col4 col5 col6 col7
1 a 1 j 6 11 n **16**
2 b 2 k 7 12 m ** 17**
3 c 3 l 8 13 ** l 18**
4 d 4 m **9 14 k 19**
5 a 5 n **10 15 j 20**
I now need to extract all row values from the columns following the column where "n", "m", and "13" are encountered. So the output string would contain 16, 17, l, 18, 9, 14, k, 19, 10, 15, j, and 20. Is there any smarter way for that than going through a loop?
Thank you in advance for your help!
CodePudding user response:
v = c(13, "m", "n")
unname(unlist(apply(df, 1, \(x) x[(which(x %in% v) 1):length(x)])))
#[1] "16" "17" "l" "18" " 9" "14" "k" "19" "10" "15" "j" "20"
If not all rows have one of v
, you can add an if
condition:
unname(unlist(apply(df, 1, \(x) if(any(x %in% v)) x[(which(x %in% v) 1):length(x)])))
CodePudding user response:
Here is one way
library(matrixStats)
t(df)[t(Reduce(`|`, lapply(c("m", "n", 13), \(x)
rowCumsums(rowCumsums(df == x)) > 1)))]
-output
[1] "16" "17" "l" "18" " 9" "14" "k" "19" "10" "15" "j" "20"
or with tidyverse
library(dplyr)
library(stringr)
library(tidyr)
df %>%
select(last_col():1) %>%
mutate(across(everything(), ~ case_when(!.x %in% c("n", "m", 13) ~
.x))) %>%
unite(coln, everything(), sep = ",") %>%
mutate(coln = str_remove(coln, ",NA.*")) %>%
separate_rows(coln) %>%
pull(coln)
[1] "16" "17" "18" "l" "19" "k" "14" "9" "20" "j" "15" "10"