How to enable regmatches to work in dplyr's mutate

Time:11-11

I have the following function, which replaces each run of ? in a seed pattern with characters taken from the replacement string bb_seq.

library(tidyverse)
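replace_bb_with_str <- function(seed_pattern = NULL, bb_seq = NULL) {
  sp <- seed_pattern
  # Locate every run of one or more "?" in each element of sp
  gr <- gregexpr("\\?+", sp)
  # Cumulative run lengths give the end position of each slice of bb_seq
  csml <- lapply(gr, function(m) cumsum(attr(m, "match.length")))
  # Replace each run of "?" with the corresponding slice of bb_seq
  regmatches(sp, gr) <- lapply(csml, function(e) substring(bb_seq, c(1, e[1]), e))
  sp
}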

It works well on a single input:

plist <- c(
  "??????????DRHRTRHLAK??????????",
  "????????????????????TRCYHIDPHH",
  "FKDHKHIDVK????????????????????TRCYHIDPHH",
  "FKDHKHIDVK????????????????????"
)

replace_bb_with_str(seed_pattern = plist[1], bb_seq =  "ndqeegillkkkkfpssyvv")
# [1] "ndqeegillkDRHRTRHLAKkkkkfpssyvv"
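
To make the mechanics easier to follow, here is a minimal sketch of the intermediate values for the first pattern. It assumes the regular expression is meant to be "\\?+" (one or more question marks), which is what the output above implies. The names starts, run_len, ends and pieces are introduced only for illustration.

```r
sp     <- "??????????DRHRTRHLAK??????????"
bb_seq <- "ndqeegillkkkkfpssyvv"

# Find each run of "?" (assumed pattern: one or more question marks)
gr <- gregexpr("\\?+", sp)
starts  <- as.integer(gr[[1]])             # 1 21
run_len <- attr(gr[[1]], "match.length")   # 10 10

# Cumulative run lengths mark where each slice of bb_seq ends
ends <- cumsum(run_len)                    # 10 20

# Slices of bb_seq that replace the two runs.
# Note: the second slice starts at ends[1], so position 10 of bb_seq
# appears in both slices.
pieces <- substring(bb_seq, c(1, ends[1]), ends)
pieces                                     # "ndqeegillk" "kkkkfpssyvv"
```

This only inspects the values; the replacement itself is still done by the regmatches(sp, gr) <- ... assignment in the function.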

But when I run it inside dplyr::mutate():

expand.grid(seed_pattern = plist, bb_seq =  "ndqeegillkkkkfpssyvv") %>%
  rowwise() %>%
  mutate(nseq = replace_bb_with_str(seed_pattern = seed_pattern, bb_seq = bb_seq)) 

I get this error:

Error in `mutate()`:
! Problem while computing `nseq = replace_bb_with_str(seed_pattern =
  seed_pattern, bb_seq = bb_seq)`.
ℹ The error occurred in row 1.
Caused by error in `nchar()`:
! 'nchar()' requires a character vector

How can I resolve this issue?

Answer:

expand.grid() converts character vectors to factors by default (stringsAsFactors = TRUE), and your function fails on factors because nchar() requires a character vector. tidyr::expand_grid() preserves input types, so your function works fine:

library(tidyr)

expand_grid(seed_pattern = plist, bb_seq =  "ndqeegillkkkkfpssyvv") %>% 
  rowwise() %>%
  mutate(nseq = replace_bb_with_str(seed_pattern = seed_pattern, bb_seq = bb_seq)) 
# A tibble: 4 × 3
# Rowwise: 
  seed_pattern                             bb_seq               nseq            
  <chr>                                    <chr>                <chr>           
1 ??????????DRHRTRHLAK??????????           ndqeegillkkkkfpssyvv ndqeegillkDRHRT…
2 ????????????????????TRCYHIDPHH           ndqeegillkkkkfpssyvv ndqeegillkkkkfp…
3 FKDHKHIDVK????????????????????TRCYHIDPHH ndqeegillkkkkfpssyvv FKDHKHIDVKndqee…
4 FKDHKHIDVK????????????????????           ndqeegillkkkkfpssyvv FKDHKHIDVKndqee…

Note that, at least with your example data, there’s no need for expand_grid(); data.frame() or tibble() would do. The same goes for rowwise(): you’d get the same output without it.
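
If you would rather stay with base R, the following sketch shows the factor coercion and two possible workarounds. It uses only base functions. The names g_factor, g_char and err are made up for this illustration, and the example uses only the first two patterns.

```r
plist  <- c("??????????DRHRTRHLAK??????????",
            "????????????????????TRCYHIDPHH")
bb_seq <- "ndqeegillkkkkfpssyvv"

# Default behaviour: strings are turned into factors
g_factor <- expand.grid(seed_pattern = plist, bb_seq = bb_seq)
class(g_factor$seed_pattern)   # "factor"

# nchar() rejects factors; capture the error message instead of stopping
err <- tryCatch(nchar(g_factor$seed_pattern[1]),
                error = function(e) conditionMessage(e))

# Workaround 1: keep strings as character vectors
g_char <- expand.grid(seed_pattern = plist, bb_seq = bb_seq,
                      stringsAsFactors = FALSE)
class(g_char$seed_pattern)     # "character"

# Workaround 2: convert a factor column back to character before use
as.character(g_factor$seed_pattern[1])
```

Either workaround gives replace_bb_with_str() character input, which is also what expand_grid() does.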
