Home > Blockchain >  Getting values from all columns after the column containing a specific string into a list in R
Getting values from all columns after the column containing a specific string into a list in R

Time:01-14

I have a noob question here and I would greatly appreciate your help with it. I have a dataset which conceptually looks like this:

> df <- data.frame(col1 = c(letters[1:4],"a"),col2 = 1:5,col3 = letters[10:14], col4=6:10,col5=11:15,col6=letters[14:10],col7=16:20)
> df
  col1 col2 col3 col4 col5 col6 col7
1    a    1    j    6   11    n   **16**
2    b    2    k    7   12    m  ** 17**
3    c    3    l    8   13  **  l   18**
4    d    4    m    **9   14    k   19**
5    a    5    n   **10   15    j   20**

I now need to extract all row values from the columns following the column where "n", "m", and "13" are encountered. So the output string would contain 16, 17, l, 18, 9, 14, k, 19, 10, 15, j, and 20. Is there any smarter way for that than going through a loop?

Thank you in advance for your help!

CodePudding user response:

v = c(13, "m", "n")
unname(unlist(apply(df, 1, \(x) x[(which(x %in% v)   1):length(x)])))
#[1] "16" "17" "l"  "18" " 9" "14" "k"  "19" "10" "15" "j"  "20"

If not all rows have one of v, you can add an if condition:

unname(unlist(apply(df, 1, \(x) if(any(x %in% v)) x[(which(x %in% v)   1):length(x)])))

CodePudding user response:

Here is one way

library(matrixStats)
t(df)[t(Reduce(`|`, lapply(c("m", "n", 13), \(x) 
     rowCumsums(rowCumsums(df == x)) > 1)))]

-output

[1] "16" "17" "l"  "18" " 9" "14" "k"  "19" "10" "15" "j"  "20"

or with tidyverse

library(dplyr)
library(stringr)
library(tidyr)
df %>% 
  select(last_col():1) %>%
  mutate(across(everything(), ~ case_when(!.x %in% c("n", "m", 13) ~ 
     .x))) %>% 
  unite(coln, everything(), sep = ",") %>%
  mutate(coln = str_remove(coln, ",NA.*")) %>% 
  separate_rows(coln) %>% 
  pull(coln)
[1] "16" "17" "18" "l"  "19" "k"  "14" "9"  "20" "j"  "15" "10"
  • Related