Replace words that partially match over a list

Time:06-07

I have a list of data frames holding variable names from the mammalsleep dataset, and I want to replace some of those names with versions that carry additional characters (for example, a transformation wrapped around the name).

For example:

pr_replace <- paste(c('log(brain)','I(body^2)'), collapse="|")
extract_replace <- paste(c('brain','body'), collapse="|")

The goal is to replace each match of extract_replace with the corresponding entry of pr_replace.

I have tried two ways of doing this:

  lapply(per, function(dat)
    sapply(dat, function(x)
      str_replace(x, extract_replace, pr_replace)) %>% data.frame())

Instead of the individual replacements, this inserts the entire collapsed string wherever a match is found:

                X9
1             exposure               danger log(brain)|I(body^2)
2               danger log(brain)|I(body^2) log(brain)|I(body^2)
3 log(brain)|I(body^2) log(brain)|I(body^2)             nondream
4 log(brain)|I(body^2)             nondream                dream
5             nondream                dream                sleep
6                dream                sleep            gestation
7                sleep            gestation            predation
8            gestation            predation             exposure
9            predation             exposure               danger

I have also tried:

pr_r<-c('log(brain)','I(body^2)')
  mapply(function(x, y)
    lapply(x, function(dat)
      sapply(dat, function(z)
        str_replace(z, extract_replace, y)) %>% data.frame()), per, pr_r, SIMPLIFY = FALSE)
  

However, this does not produce the results I am after.

Expected behavior: wherever brain is found it should become log(brain), and wherever body is found it should become I(body^2).

Expected output:

[[1]]
         X1        X2        X3        X4        X5        X6       X7       X8       X9
1     log(brain)  nondream     dream     sleep gestation predation exposure   danger     I(body^2)
2  nondream     dream     sleep gestation predation  exposure   danger     I(body^2)    log(brain)
3     dream     sleep gestation predation  exposure    danger     body    log(brain) nondream
4     sleep gestation predation  exposure    danger      I(body^2)    log(brain) nondream    dream
5 gestation predation  exposure    danger      I(body^2)     brain nondream    dream    sleep

[[2]]
         X1        X2        X3        X4        X5        X6       X7       X8        X9
1     log(brain)  nondream     dream     sleep gestation predation exposure   danger      I(body^2)
2  nondream     dream     sleep gestation predation  exposure   danger     I(body^2)     log(brain)
3     dream     sleep gestation predation  exposure    danger     body    log(brain)  nondream
4     sleep gestation predation  exposure    danger      I(body^2)    brain nondream     dream
5 gestation predation  exposure    danger      I(body^2)     log(brain) nondream    dream     sleep
6 predation  exposure    danger      I(body^2)     brain  nondream    dream    sleep gestation

UPDATE: I also want to apply this to a vector, for example the column names of the datasets. Say I want log(X1) to change to X1. The following attempt fails:

pr_replace <- c('log(X1)', 'log(X8)')
extract_replace <- c('X1', 'X8')
lapply(per, names) %>% map(., ~ .x %>% str_replace_all(.x, setNames(extract_replace, pr_replace)))

Reproducible data (updated):

per <- list(structure(list(`log(X1)` = c("brain", "nondream", "dream", 
"sleep", "gestation"), X2 = c("nondream", "dream", "sleep", "gestation", 
"predation"), X3 = c("dream", "sleep", "gestation", "predation", 
"exposure"), X4 = c("sleep", "gestation", "predation", "exposure", 
"danger"), X5 = c("gestation", "predation", "exposure", "danger", 
"body"), X6 = c("predation", "exposure", "danger", "body", "brain"
), X7 = c("exposure", "danger", "body", "brain", "nondream"), 
    `log(X8)` = c("danger", "body", "brain", "nondream", "dream"
    ), X9 = c("body", "brain", "nondream", "dream", "sleep")), row.names = c(NA, 
5L), class = "data.frame"), structure(list(`log(X1)` = c("brain", 
"nondream", "dream", "sleep", "gestation", "predation"), X2 = c("nondream", 
"dream", "sleep", "gestation", "predation", "exposure"), X3 = c("dream", 
"sleep", "gestation", "predation", "exposure", "danger"), X4 = c("sleep", 
"gestation", "predation", "exposure", "danger", "body"), X5 = c("gestation", 
"predation", "exposure", "danger", "body", "brain"), X6 = c("predation", 
"exposure", "danger", "body", "brain", "nondream"), X7 = c("exposure", 
"danger", "body", "brain", "nondream", "dream"), `log(X8)` = c("danger", 
"body", "brain", "nondream", "dream", "sleep"), X9 = c("body", 
"brain", "nondream", "dream", "sleep", "gestation")), row.names = c(NA, 
6L), class = "data.frame"))

Answer:

Pasting the replacement values together with | does not work: | means alternation only in the pattern, whereas the replacement string is inserted literally. Instead, create a named vector whose names are the substrings to match in the original data and whose values are the corresponding replacements:

pr_replace <- c('log(brain)','I(body^2)')
extract_replace <- c('brain','body')
named_vec <- setNames(pr_replace, extract_replace)
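
To make the difference concrete, here is a minimal sketch on a three-element toy vector (the vector x and the result names collapsed and mapped are made up for illustration, not part of the original data). It contrasts the collapsed replacement string from the question with the named-vector form:

```r
library(stringr)

# Toy input, assumed for illustration only
x <- c("brain", "body", "nondream")

# Collapsed replacement: "|" is special only in the pattern, so the whole
# replacement string is inserted verbatim wherever either word matches.
collapsed <- str_replace(x, "brain|body", "log(brain)|I(body^2)")
collapsed
# [1] "log(brain)|I(body^2)" "log(brain)|I(body^2)" "nondream"

# Named vector: each name is a pattern, each value is its own replacement.
named_vec <- c(brain = "log(brain)", body = "I(body^2)")
mapped <- str_replace_all(x, named_vec)
mapped
# [1] "log(brain)" "I(body^2)"  "nondream"
```

Note that the named-vector form is a feature of str_replace_all, which is why the solution uses it rather than str_replace.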

Now loop over the list with map and, within each data frame, loop across the columns with across, applying str_replace_all with the named vector:

library(purrr)
library(stringr)
library(dplyr)
per <- map(per, ~ .x %>%
   mutate(across(everything(), ~ str_replace_all(.x, 
        named_vec))))

Output:

per
[[1]]
          X1        X2        X3        X4        X5         X6         X7         X8         X9
1 log(brain)  nondream     dream     sleep gestation  predation   exposure     danger  I(body^2)
2   nondream     dream     sleep gestation predation   exposure     danger  I(body^2) log(brain)
3      dream     sleep gestation predation  exposure     danger  I(body^2) log(brain)   nondream
4      sleep gestation predation  exposure    danger  I(body^2) log(brain)   nondream      dream
5  gestation predation  exposure    danger I(body^2) log(brain)   nondream      dream      sleep

[[2]]
          X1        X2        X3        X4         X5         X6         X7         X8         X9
1 log(brain)  nondream     dream     sleep  gestation  predation   exposure     danger  I(body^2)
2   nondream     dream     sleep gestation  predation   exposure     danger  I(body^2) log(brain)
3      dream     sleep gestation predation   exposure     danger  I(body^2) log(brain)   nondream
4      sleep gestation predation  exposure     danger  I(body^2) log(brain)   nondream      dream
5  gestation predation  exposure    danger  I(body^2) log(brain)   nondream      dream      sleep
6  predation  exposure    danger I(body^2) log(brain)   nondream      dream      sleep  gestation
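
One caveat that may matter on other data: the names of named_vec are regular expressions and match anywhere inside a cell, so a longer value that merely contains brain would also be rewritten. The sketch below uses a made-up value, "brainstem", that does not occur in the data above; anchoring the patterns with ^ and $ is one possible way to restrict the replacement to whole-cell matches, if that is the behavior you want.

```r
library(stringr)

# "brainstem" is a hypothetical value, used only to show the substring match
x <- c("brain", "brainstem", "body")

# Unanchored patterns also rewrite the "brain" inside "brainstem"
loose <- str_replace_all(x, c(brain = "log(brain)", body = "I(body^2)"))
loose
# [1] "log(brain)"     "log(brain)stem" "I(body^2)"

# Anchored patterns only replace cells that are exactly "brain" or "body"
strict <- str_replace_all(x, c("^brain$" = "log(brain)", "^body$" = "I(body^2)"))
strict
# [1] "log(brain)" "brainstem"  "I(body^2)"
```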

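For the updated case with column names, the roles are swapped: the names of the vector are now the strings to find and the values are their replacements. Wrap the named vector in fixed(), because the strings to find contain regex metacharacters (the parentheses) that must be matched literally: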

map(per, ~ str_replace_all(names(.x), 
       fixed(setNames(extract_replace, pr_replace))))
[[1]]
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9"

[[2]]
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9"
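
The call above only returns the cleaned names. If the goal is to rename the columns in place, one possible approach is to assign the result back with setNames inside map. The sketch below runs on a small stand-in list (per_demo and name_map are assumed names for illustration, not the original per) and also shows why the pattern fails without fixed().

```r
library(stringr)
library(purrr)

# Small stand-in for `per`: two data frames with log(...) column names
per_demo <- list(
  data.frame(`log(X1)` = 1:2, X2 = 3:4, `log(X8)` = 5:6, check.names = FALSE),
  data.frame(`log(X1)` = 1:3, X2 = 4:6, `log(X8)` = 7:9, check.names = FALSE)
)

# Names are the strings to find, values are what they become
name_map <- c("log(X1)" = "X1", "log(X8)" = "X8")

# As a regex, "log(X1)" treats the parentheses as a group and looks for
# the text "logX1", so nothing matches and the names come back unchanged.
as_regex <- str_replace_all(names(per_demo[[1]]), name_map)
as_regex
# [1] "log(X1)" "X2"      "log(X8)"

# With fixed(), the parentheses are matched literally; setNames() writes
# the cleaned names back onto each data frame.
per_renamed <- map(
  per_demo,
  ~ setNames(.x, str_replace_all(names(.x), fixed(name_map)))
)
map(per_renamed, names)
# [[1]]
# [1] "X1" "X2" "X8"
#
# [[2]]
# [1] "X1" "X2" "X8"
```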