I have a data frame with a column of values like these:
test1=data.frame(c("ABC 01; 02; 03", "test2 01; 02; 03"))
I would like to repeat the leading text after each semicolon, like this:
test1=data.frame(c("ABC 01; ABC 02; ABC 03", "test2 01; test2 02; test2 03"))
Can someone show me how to do this? Thank you!
CodePudding user response:
Using only base R:
test1$y <- mapply(
\(org, key) gsub("; ([0-9]+)", key, org),
org = test1$x, key = sprintf("; %s \\1", sub(" .+", "", test1$x))
)
x y
1 ABC 01; 02; 03 ABC 01; ABC 02; ABC 03
2 test2 01; 02; 03 test2 01; test2 02; test2 03
Data
test1 <- data.frame(x = c("ABC 01; 02; 03", "test2 01; 02; 03"))
CodePudding user response:
Here's a two-step tidyverse solution:
library(tidyverse)
test1 %>%
mutate(
# create temporary variable containing text string:
temp = str_replace(var, "(\\w+).*", " \\1"),
# add text string each time there is ";" to the left:
var= str_replace_all(var, "(?<=;)", temp)) %>%
# remove `temp`:
select(-temp)
var
1 ABC 01; ABC 02; ABC 03
2 test2 01; test2 02; test2 03
How this works:
- using str_replace, we define the string-initial alphanumeric substring (\\w+) as a capture group (by placing it in parentheses) and refer to it, and it alone, in the replacement clause using a backreference (\\1), where we also add one whitespace (before the backreference)
- next, using str_replace_all, we add the text string in temp to the strings in var on the condition that there is a literal ; immediately to the left (this type of conditional matching is called positive look-behind; its syntax is (?<= ...))
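The two steps above can be traced on a single string. This is only a sketch in base R rather than stringr, and it differs from the answer in one detail: the look-behind here matches the space that follows the semicolon instead of the empty position right after it. The variable names x, temp and out are illustrative.

```r
# Illustrative walk-through of the two steps on one string (base R only).
x <- "ABC 01; 02; 03"

# Step 1: keep only the leading run of word characters (capture group 1)
# and put one space in front of it.
temp <- sub("(\\w+).*", " \\1", x)
temp
# [1] " ABC"

# Step 2: replace every space that has ";" immediately to its left
# (positive look-behind, needs perl = TRUE) with the prefix plus a space.
out <- gsub("(?<=;) ", paste0(temp, " "), x, perl = TRUE)
out
# [1] "ABC 01; ABC 02; ABC 03"
```

Because gsub does not vectorise over the replacement, applying this to a whole column would need something like mapply, as in the base R answer above.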
Data:
test1=data.frame(var = c("ABC 01; 02; 03", "test2 01; 02; 03"))
CodePudding user response:
Another regex option could be to parse it all in capture groups:
fun <- \(x) gsub("(\\w+) (\\d+); (\\d+); (\\d+)", "\\1 \\2; \\1 \\3; \\1 \\4", x)
Then with either dplyr or base:
library(dplyr)
test1 |>
mutate(result = fun(string))
test1$result <- sapply(test1$string, fun)
Output:
string result
1 ABC 01; 02; 03 ABC 01; ABC 02; ABC 03
2 test2 01; 02; 03 test2 01; test2 02; test2 03
Data:
test1 <- data.frame(string = c("ABC 01; 02; 03", "test2 01; 02; 03"))
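One thing worth keeping in mind with this approach is that the pattern spells out exactly three numeric groups. The sketch below, which redefines the function as fun3 (a hypothetical name) using function(x) so it also runs on R versions before 4.1, shows what happens when a string has fewer items: nothing matches, so the input comes back unchanged.

```r
# Same capture-group pattern as above, written with function(x).
fun3 <- function(x) {
  gsub("(\\w+) (\\d+); (\\d+); (\\d+)", "\\1 \\2; \\1 \\3; \\1 \\4", x)
}

# Three items: every number gets the prefix.
three <- fun3("ABC 01; 02; 03")
three
# [1] "ABC 01; ABC 02; ABC 03"

# Two items: the pattern does not match, so the string is returned as-is.
two <- fun3("ABC 01; 02")
two
# [1] "ABC 01; 02"
```

If the number of items varies from row to row, one of the other answers is probably a better fit.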
CodePudding user response:
Using strsplit and paste. Split on space then paste 1st item to all items excluding 1st item:
test1$new <- sapply(strsplit(test1$x, " ", fixed = TRUE),
function(i) paste(paste(i[ 1 ], i[ -1 ]), collapse = " "))
test1
# x new
# 1 ABC 01; 02; 03 ABC 01; ABC 02; ABC 03
# 2 test2 01; 02; 03 test2 01; test2 02; test2 03
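To make the split-and-paste idea easier to follow, here is the same logic unrolled for a single string, with each intermediate value shown. The names pieces, prefixed and out are illustrative only.

```r
x <- "test2 01; 02; 03"

# Split on single spaces; the semicolons stay attached to the numbers.
pieces <- strsplit(x, " ", fixed = TRUE)[[1]]
pieces
# [1] "test2" "01;"   "02;"   "03"

# Paste the first piece in front of every remaining piece.
prefixed <- paste(pieces[1], pieces[-1])
prefixed
# [1] "test2 01;" "test2 02;" "test2 03"

# Glue the prefixed pieces back together with spaces.
out <- paste(prefixed, collapse = " ")
out
# [1] "test2 01; test2 02; test2 03"
```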
CodePudding user response:
Here is an option using stringr functions.
library(dplyr)
library(stringr)
test1 = data.frame(col = c("ABC 01; 02; 03", "test2 01; 02; 03"))
result <- test1 %>%
mutate(common = str_extract(col, '\\w+'),
parts = str_split(str_remove(col, common), ';\\s+'),
new_string = purrr::map2_chr(common, parts,
str_c, sep = " ", collapse = ";"))
result
# col common parts new_string
#1 ABC 01; 02; 03 ABC 01, 02, 03 ABC 01;ABC 02;ABC 03
#2 test2 01; 02; 03 test2 01, 02, 03 test2 01;test2 02;test2 03
result$new_string
#[1] "ABC 01;ABC 02;ABC 03" "test2 01;test2 02;test2 03"
You may drop the columns that you don't need from result.