Hey, I have a list of data frames that all have the same variables. I want to run a loop or lapply over every data frame in the list. Basically, I want to concatenate col1 and col2 with a space in between: from col1 only the value inside the parentheses, while col2 can be brought over as it is.
Col1 | Col2
It looks like (1) is here | 1234
(2) is here | 5678
Lets do (3) | 9012
Lets preferably work (4) in the equation | 3456
I would like it to look like this for all the values:
Col1 | Col2 | Col3
It looks like (1) is here | 1234 | 1 1234
(2) is here | 5678 | 2 5678
Lets do (3) | 9012 | 3 9012
Lets preferably work (4) in the equation | 3456 | 4 3456
I tried doing this and it did not work
lapply(seq_along(dflist), function(x)
{
  x$FuzzyMatch <-
    paste0(str_extract(x$col1, "(?<=\\().+?(?=\\))"), " ", x$col2); return(x)
}
)
It says: Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : object 'x' not found
However, when I operate on a single data frame, it works.
Answer:
In the OP's code, the loop runs over the sequence of the list (seq_along(dflist)), so the lambda argument x will be 1, 2, 3, etc. instead of the data.frame/tibble inside the list. In that case, we need to extract the list element with x1 <- dflist[[x]] and then use x1 to make the changes. Instead, we can loop directly over the list and create the column in each element:
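library(dplyr)
library(stringr)
library(purrr)
# Capture the digits inside the parentheses (group 1; needs stringr >= 1.5.0) and join them with Col2
map(dflist, ~ .x %>%
    mutate(Col3 = str_c(str_extract(Col1, "\\((\\d+)\\)",
        group = 1), " ", Col2)))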
-output
[[1]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456

[[2]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456
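If looping over the positions is preferred (as in the OP's attempt), the fix described above, extracting the element with dflist[[x]] before using it, could look roughly like the sketch below. It uses only base R; the two small data frames are a shortened stand-in for the real list, and the names i, x1, digits and result are just illustrative.

```r
# Shortened stand-in for the list (same column names as in the question's data).
dflist <- list(
  data.frame(Col1 = c("It looks like (1) is here", "(2) is here"),
             Col2 = c(1234L, 5678L)),
  data.frame(Col1 = c("Lets do (3)", "Lets preferably work (4) in the equation"),
             Col2 = c(9012L, 3456L))
)

# Loop over the positions, but pull out the data frame before touching its columns.
result <- lapply(seq_along(dflist), function(i) {
  x1 <- dflist[[i]]                                   # i is 1, 2, ...; x1 is the data frame
  digits <- sub(".*\\((\\d+)\\).*", "\\1", x1$Col1)   # digits inside the parentheses
  x1$Col3 <- paste(digits, x1$Col2)                   # paste() joins with a space by default
  x1                                                  # return the modified data frame
})

result[[1]]$Col3
```

Note that lapply returns a new list, so the result has to be assigned (here to result, or back to dflist) for the new column to be kept.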
Or in base R
# sub() keeps only the captured digits; paste() joins them with Col2 using a space
lapply(dflist, transform, Col3 = paste(sub(".*\\((\\d+)\\).*",
    "\\1", Col1), Col2))
-output
[[1]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456

[[2]]
                                      Col1 Col2   Col3
1                It looks like (1) is here 1234 1 1234
2                              (2) is here 5678 2 5678
3                              Lets do (3) 9012 3 9012
4 Lets preferably work (4) in the equation 3456 4 3456
data
dflist <- list(structure(list(Col1 = c("It looks like (1) is here", "(2) is here",
"Lets do (3)", "Lets preferably work (4) in the equation"), Col2 = c(1234L,
5678L, 9012L, 3456L)), class = "data.frame", row.names = c(NA,
-4L)), structure(list(Col1 = c("It looks like (1) is here", "(2) is here",
"Lets do (3)", "Lets preferably work (4) in the equation"), Col2 = c(1234L,
5678L, 9012L, 3456L)), class = "data.frame", row.names = c(NA,
-4L)))