Add text after semicolon in R

I have a data frame with a column of values like these:

test1=data.frame(c("ABC 01; 02; 03", "test2 01; 02; 03"))

I would like to repeat the leading text after each semicolon, like this:

test1=data.frame(c("ABC 01; ABC 02; ABC 03", "test2 01; test2 02; test2 03"))

Can someone show me how to do this? Thank you!

CodePudding user response：

Using only base R:

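test1$y <- mapply(
  # replace each "; <digits>" with "; <prefix> <digits>"
  \(org, key) gsub("; ([0-9]+)", key, org),
  # the prefix is everything before the first space
  org = test1$x, key = sprintf("; %s \\1", sub(" .+", "", test1$x))
)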

                 x                            y
1   ABC 01; 02; 03       ABC 01; ABC 02; ABC 03
2 test2 01; 02; 03 test2 01; test2 02; test2 03

Data

test1 <- data.frame(x = c("ABC 01; 02; 03", "test2 01; 02; 03"))

CodePudding user response：

Here's a two-step tidyverse solution:

library(tidyverse)    
test1 %>%
  mutate(
    # create temporary variable containing text string:
    temp = str_replace(var, "(\\w+).*", " \\1"),
    # add text string each time there is ";" to the left:
    var = str_replace_all(var, "(?<=;)", temp)) %>%
  # remove `temp`:
  select(-temp)
                           var
1       ABC 01; ABC 02; ABC 03
2 test2 01; test2 02; test2 03

How this works:

1. Using str_replace, we define the string-initial alphanumeric substring (\\w+) as a capture group (by placing it in parentheses) and refer to it, and it alone, in the replacement using the backreference \\1, preceded by one whitespace character.
2. Next, using str_replace_all, we insert the text string in temp into the strings in var wherever there is a literal ; immediately to the left (this type of conditional matching is called positive look-behind; its syntax is (?<=...)).
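
For comparison, the same two steps can be sketched in base R. This is not the stringr code from the answer, just an illustration that assumes the prefix is the leading word of each string; perl = TRUE is needed because look-behind is PCRE syntax.

```r
# Base R sketch of the two steps (illustrative, not the answer's stringr code).
var <- c("ABC 01; 02; 03", "test2 01; 02; 03")

# Step 1: keep only the leading word, with one space in front of it.
temp <- sub("(\\w+).*", " \\1", var)

# Step 2: insert that string at every position that has a ";" directly
# to its left. mapply pairs each string with its own prefix.
out <- mapply(function(s, p) gsub("(?<=;)", p, s, perl = TRUE),
              var, temp, USE.NAMES = FALSE)
out
```

mapply is used because gsub takes a single replacement string, whereas str_replace_all accepts one replacement per input string.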

Data:

test1=data.frame(var = c("ABC 01; 02; 03", "test2 01; 02; 03"))

CodePudding user response：

Another regex option could be to parse it all in capture groups:

fun <- \(x) gsub("(\\w+) (\\d+); (\\d+); (\\d+)", "\\1 \\2; \\1 \\3; \\1 \\4", x)

Then with either dplyr or base:

# dplyr
library(dplyr)

test1 |>
  mutate(result = fun(string))

# base R (gsub is vectorised, so sapply is not required)
test1$result <- fun(test1$string)

Output:

            string                       result
1   ABC 01; 02; 03       ABC 01; ABC 02; ABC 03
2 test2 01; 02; 03 test2 01; test2 02; test2 03

Data:

test1 <- data.frame(string = c("ABC 01; 02; 03", "test2 01; 02; 03"))
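
Note that a pattern with three numeric capture groups only handles rows with exactly three numbers. If the count can vary, something along these lines might work. Here add_prefix is a hypothetical helper, not part of the answer, and it assumes the prefix is everything before the first space and that the fields are separated by "; ".

```r
# Hypothetical helper: repeat the leading word before every field,
# however many fields there are.
add_prefix <- function(x) {
  prefix <- sub(" .*", "", x)  # text before the first space
  mapply(
    # fixed = TRUE treats "; " as literal text, not as a regex
    function(s, p) gsub("; ", paste0("; ", p, " "), s, fixed = TRUE),
    x, prefix, USE.NAMES = FALSE
  )
}

res <- add_prefix(c("ABC 01; 02", "test2 01; 02; 03; 04"))
res
# [1] "ABC 01; ABC 02"                       "test2 01; test2 02; test2 03; test2 04"
```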

CodePudding user response：

Using strsplit and paste. Split on space then paste 1st item to all items excluding 1st item:

test1$new <- sapply(strsplit(test1$x, " ", fixed = TRUE),
                    function(i) paste(paste(i[ 1 ], i[ -1 ]), collapse = " "))
test1
#                  x                          new
# 1   ABC 01; 02; 03       ABC 01; ABC 02; ABC 03
# 2 test2 01; 02; 03 test2 01; test2 02; test2 03
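
To see what happens inside the sapply call, here are the same steps on a single string. This is only a walk-through, and it assumes the prefix itself contains no spaces.

```r
s <- "ABC 01; 02; 03"

# Split on single spaces; the semicolons stay attached to the numbers.
parts <- strsplit(s, " ", fixed = TRUE)[[1]]
parts
# [1] "ABC" "01;" "02;" "03"

# Paste the first item in front of every remaining item (paste recycles it).
pairs <- paste(parts[1], parts[-1])
pairs
# [1] "ABC 01;" "ABC 02;" "ABC 03"

# Join the pieces back into one string.
joined <- paste(pairs, collapse = " ")
joined
# [1] "ABC 01; ABC 02; ABC 03"
```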

CodePudding user response：

Here is an option using stringr functions.

library(dplyr)
library(stringr)

test1 = data.frame(col = c("ABC 01; 02; 03", "test2 01; 02; 03"))

result <- test1 %>%
  mutate(common = str_extract(col, '\\w+'), 
         parts = str_split(str_remove(col, common), ';\\s+'),
         new_string = purrr::map2_chr(common, parts, 
                         str_c, sep = " ", collapse = ";"))

result

#               col common       parts                  new_string
#1   ABC 01; 02; 03    ABC  01, 02, 03       ABC  01;ABC 02;ABC 03
#2 test2 01; 02; 03  test2  01, 02, 03 test2  01;test2 02;test2 03

result$new_string

#[1] "ABC  01;ABC 02;ABC 03"       "test2  01;test2 02;test2 03"

You may drop the columns that you don't need from result.
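
For example, a base R sketch of dropping the helper columns and tidying the spacing so it matches the format asked for in the question. The data frame below is a hand-built stand-in for result, with values copied from the output above and the list column parts left out for brevity. With dplyr, select(-common, -parts) would drop the columns as well.

```r
# Stand-in for `result` (an assumption for illustration; `parts` omitted).
result <- data.frame(
  col = c("ABC 01; 02; 03", "test2 01; 02; 03"),
  common = c("ABC", "test2"),
  new_string = c("ABC  01;ABC 02;ABC 03", "test2  01;test2 02;test2 03")
)

# Keep only the columns of interest.
final <- result[, c("col", "new_string")]

# Optional tidy-up: collapse repeated spaces, then add a space after each ";".
final$new_string <- gsub(" +", " ", final$new_string)
final$new_string <- gsub(";", "; ", final$new_string, fixed = TRUE)

final$new_string
# [1] "ABC 01; ABC 02; ABC 03"       "test2 01; test2 02; test2 03"
```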