Trying to create a new column using paste0 and lapply at the same time, but I'm getting an error

Hey, I have a list of data frames that all have the same variables. I want to run a loop or lapply over all the data frames in the list. Basically, I want to concatenate col1 and col2 with a space in between: from col1 only the value inside the parentheses, while col2 can be brought over as it is.

Col1                                      Col2
It looks like (1) is here                 1234
(2) is here                               5678
Lets do (3)                               9012
Lets preferably work (4) in the equation  3456

I would like it to look like this for all the values:

Col1                                      Col2  Col3
It looks like (1) is here                 1234  1 1234
(2) is here                               5678  2 5678
Lets do (3)                               9012  3 9012
Lets preferably work (4) in the equation  3456  4 3456

I tried this, but it did not work:

lapply(seq_along(dflist), function(x) {
  x$FuzzyMatch <- paste0(str_extract(x$col1, "(?<=\\().+?(?=\\))"), " ", x$col2)
  return(x)
})

It says: Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : object 'x' not found

However, when I operate on a single data frame, it works.

Answer:

In the OP's code, lapply loops over the sequence of list indices, so the lambda argument x will be 1, 2, 3, etc. instead of the data.frame/tibble inside the list. In that case, we would need to extract the list element with x1 <- dflist[[x]] and then make the changes on x1. Instead, we can loop directly over the list and create the column in each element.
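For completeness, the index-based route described above could be sketched roughly as follows. This is only an illustration in base R: the two-row `dflist` is a cut-down stand-in for the real data, and `inside` and `result` are names made up for this example.

```r
# Cut-down sample data with the same columns as in the question (assumed).
df <- data.frame(
  Col1 = c("It looks like (1) is here", "(2) is here"),
  Col2 = c(1234L, 5678L)
)
dflist <- list(df, df)

# Loop over the indices, but pull the data frame out of the list first.
result <- lapply(seq_along(dflist), function(i) {
  x1 <- dflist[[i]]
  # Keep only the digits inside the parentheses, then append Col2.
  inside <- sub(".*\\((\\d+)\\).*", "\\1", x1$Col1)
  x1$Col3 <- paste(inside, x1$Col2)
  x1
})

result[[1]]$Col3
```

Looping directly over the list, as in the code that follows, avoids the extra indexing step.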

library(dplyr)
library(stringr)
library(purrr)
# the `group` argument of str_extract() needs stringr >= 1.5.0
map(dflist, ~ .x %>%
          mutate(Col3 = str_c(str_extract(Col1, "\\((\\d+)\\)",
            group = 1), " ",  Col2)))

-output

[[1]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456

[[2]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456

Or in base R

lapply(dflist, transform, Col3 = paste(sub(".*\\((\\d+)\\).*",
       "\\1", Col1), Col2))

-output

[[1]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456

[[2]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456
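One thing to watch with the base R version: sub() returns the input unchanged when the pattern does not match, so a row without a parenthesised number would get its whole Col1 text pasted in front of Col2, whereas str_extract() returns NA in that situation. The sample data always has a match, so the outputs agree. A possible guard, sketched on two hypothetical rows (the second row and the `has_match` helper are made up for illustration):

```r
# Hypothetical rows: the second one has no "(digits)" part.
Col1 <- c("Lets do (3)", "no parentheses here")
Col2 <- c(9012L, 3456L)

pattern <- ".*\\((\\d+)\\).*"

# Without a guard, the unmatched row keeps its full text.
unguarded <- paste(sub(pattern, "\\1", Col1), Col2)

# With a guard, unmatched rows become NA instead.
has_match <- grepl("\\(\\d+\\)", Col1)
Col3 <- ifelse(has_match, paste(sub(pattern, "\\1", Col1), Col2), NA_character_)
```

Whether NA or some other placeholder is the right fallback depends on how Col3 is used afterwards.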

data

dflist <- list(structure(list(Col1 = c("It looks like (1) is here", "(2) is here", 
"Lets do (3)", "Lets preferably work (4) in the equation"), Col2 = c(1234L, 
5678L, 9012L, 3456L)), class = "data.frame", row.names = c(NA, 
-4L)), structure(list(Col1 = c("It looks like (1) is here", "(2) is here", 
"Lets do (3)", "Lets preferably work (4) in the equation"), Col2 = c(1234L, 
5678L, 9012L, 3456L)), class = "data.frame", row.names = c(NA, 
-4L)))