R efficient way to translate string pattern in all tibble cells with col and row number

Time:07-19

Let's say I have a data frame like this:

first <- tibble(
  a = c("XXX","YYY","BBB","CCC","AAA","EEE"),
  b = c("RRR","AAA","GGG","BBB","LLL","BBB"),
  c = c("AAA","ZZZ","NNN","MMM","AAA","QQQ"),
  d = c("VVV","YYY","AAA","CCC","AAA","EEE")
)

I need to replace every cell equal to AAA with a string built from its row and column number,

and do the same, with a different prefix, for every cell equal to BBB.

With a plain data frame I would tackle this in the most inefficient way possible, by looping through every single cell:

for(rows in 1:nrow(first)){
  for(cols in 1:ncol(first)){  
    if (first[rows,cols]=="AAA"){ first[rows,cols]<-paste0("R",rows,"_",cols)   }
    if (first[rows,cols]=="BBB"){ first[rows,cols]<-paste0("F",rows,"_",cols)   }    
  }  
}

It works, but of course it is very time-consuming. I am looking for a solution with one of the map functions, but I am failing to retrieve the column name or column number.

First try: mutate_if. Here I can't get the column or row numbers:

second <- first %>%
  rowwise() %>%
  mutate_if(
    is.character,
    str_replace_all, 
    pattern="AAA", 
    replacement=paste0("R","how to get col/row?")
  )

Then I tried map and map_if, with the same issue.

Does anybody have suggestions on how to proceed?

Maybe a lambda-like function? I'm not sure.

Follow-up: I ended up with an XY problem, so I am taking a step back to explain better why I need this.

With one of the solutions below I was able to implement:

second <- first %>%
  mutate(across(everything(), 
      ~ case_when(
        .=="AAA" ~ as.character(checkboxInput(paste0("R",match(cur_column(), names(cur_data())),"_",row_number()),label="",value=TRUE)),
        TRUE ~ .
      )
    )
  )
second  

That works, but row_number() returns the full vector of row numbers instead of just the current row number.

rowwise() fixes this, but makes it slower than with base SAS. Is there any feasible solution in the tidyverse world?

CodePudding user response:

  a     b     c     d    
  <chr> <chr> <chr> <chr>
1 XXX   RRR   AAA   VVV  
2 YYY   AAA   ZZZ   YYY  
3 BBB   GGG   NNN   AAA  
4 CCC   BBB   MMM   CCC  
5 AAA   LLL   AAA   AAA  
6 EEE   BBB   QQQ   EEE  

library(data.table)
setDT(first)
str <- "AAA"
first[first == str] <- apply(which(first == str, arr.ind = TRUE), 1, paste, collapse = ",")

   a   b   c   d
1: XXX RRR 1,3 VVV
2: YYY 2,2 ZZZ YYY
3: BBB GGG NNN 3,4
4: CCC BBB MMM CCC
5: 5,1 LLL 5,3 5,4
6: EEE BBB QQQ EEE

CodePudding user response:

With dplyr, you can do:

first %>%
 mutate(across(everything(), 
               ~ if_else(. == "AAA",
                         paste(row_number(), match(cur_column(), names(cur_data())), sep = ","),
                         .)))

  a     b     c     d    
  <chr> <chr> <chr> <chr>
1 XXX   RRR   1,3   VVV  
2 YYY   2,2   ZZZ   YYY  
3 BBB   GGG   NNN   3,4  
4 CCC   BBB   MMM   CCC  
5 5,1   LLL   5,3   5,4  
6 EEE   BBB   QQQ   EEE
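
The question also asks for a different replacement for BBB cells. A minimal sketch of how the same across() idea might cover both patterns, assuming the "R"/"F" prefixes and the row-then-column order from the original loop; seq_along(.x) and names(first) are swapped in here for row_number() and cur_data(), and the column position is looked up in the original tibble:

```r
library(dplyr)

first <- tibble(
  a = c("XXX", "YYY", "BBB", "CCC", "AAA", "EEE"),
  b = c("RRR", "AAA", "GGG", "BBB", "LLL", "BBB"),
  c = c("AAA", "ZZZ", "NNN", "MMM", "AAA", "QQQ"),
  d = c("VVV", "YYY", "AAA", "CCC", "AAA", "EEE")
)

second <- first %>%
  mutate(across(
    everything(),
    ~ {
      # Position of the column currently being processed
      col_idx <- match(cur_column(), names(first))
      # One row index per element of the column
      row_idx <- seq_along(.x)
      case_when(
        .x == "AAA" ~ paste0("R", row_idx, "_", col_idx),
        .x == "BBB" ~ paste0("F", row_idx, "_", col_idx),
        TRUE ~ .x
      )
    }
  ))

second
```

Because paste0() is vectorized, each column is handled in one call and rowwise() is not needed.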

CodePudding user response:

We may use base R as

i1 <- which(first == "AAA", arr.ind = TRUE)
dat <- as.data.frame(first)
dat[i1] <- do.call(paste, c(as.data.frame(i1), sep =",") )

Output:

> dat
    a   b   c   d
1 XXX RRR 1,3 VVV
2 YYY 2,2 ZZZ YYY
3 BBB GGG NNN 3,4
4 CCC BBB MMM CCC
5 5,1 LLL 5,3 5,4
6 EEE BBB QQQ EEE
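
For the follow-up, where each replacement comes from a function that builds one value at a time, the same which(..., arr.ind = TRUE) lookup could feed mapply() so the function runs only on the matching cells. This is a sketch under assumptions: make_label() is a hypothetical stand-in for the non-vectorized builder, and the "R"/"F" prefixes are taken from the loop in the question.

```r
first <- data.frame(
  a = c("XXX", "YYY", "BBB", "CCC", "AAA", "EEE"),
  b = c("RRR", "AAA", "GGG", "BBB", "LLL", "BBB"),
  c = c("AAA", "ZZZ", "NNN", "MMM", "AAA", "QQQ"),
  d = c("VVV", "YYY", "AAA", "CCC", "AAA", "EEE"),
  stringsAsFactors = FALSE
)

# Hypothetical builder that is called once per matching cell
make_label <- function(prefix, row, col) paste0(prefix, row, "_", col)

# Pattern to prefix mapping (assumed from the loop in the question)
prefixes <- c(AAA = "R", BBB = "F")

mat <- as.matrix(first)   # compare against the untouched original values
result <- first

for (pattern in names(prefixes)) {
  # Two-column matrix of (row, col) positions of the matching cells
  idx <- which(mat == pattern, arr.ind = TRUE)
  if (nrow(idx) > 0) {
    result[idx] <- mapply(
      make_label,
      row = idx[, "row"],
      col = idx[, "col"],
      MoreArgs = list(prefix = prefixes[[pattern]]),
      USE.NAMES = FALSE
    )
  }
}

result
```

Only the matching cells are visited, so the cost scales with the number of matches rather than with the full size of the table.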