New column based on other columns

I have two columns (V1 and V2) containing character data. I want to create a third column with the "sum" of these characters, that is, the unique values from both columns combined. A cell can hold several values separated by commas (","), as in V2.

I want to go from this:

Example data:

df <- data.frame(V1 = c('A','A','A','A','B','B','','C'),
                 V2 = c('A, B','A','B','','A, C','A, B','A',''))


  V1   V2
1  A    A, B
2  A    A
3  A    B
4  A     
5  B    A, C
6  B    A, B
7       A
8  C

To this:

   V3
1   AB
2   A
3   AB
4   A
5   ABC
6   AB
7   A
8   C

Answer:

We can split column 'V2' on the commas, take the union with 'V1' for each row, then sort and paste the values together:

# `\(x, y)` is R's shorthand lambda syntax (requires R >= 4.1.0)
data.frame(V3 = mapply(\(x, y) paste(sort(union(x, y)), collapse = ""),
                       strsplit(df$V2, ",\\s*"), df$V1))

Output:

   V3
1  AB
2   A
3  AB
4   A
5 ABC
6  AB
7   A
8   C
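To see what happens inside `mapply()`, here is the same computation traced by hand for rows 5 and 7 of the example data. This is only a sketch; the variable names `parts`, `combined`, `row5` and `row7` are illustrative and not part of the answer.

```r
# Row 5: V1 = "B", V2 = "A, C"
parts <- strsplit("A, C", ",\\s*")[[1]]   # c("A", "C")
combined <- union(parts, "B")             # c("A", "C", "B"), first-seen order
row5 <- paste(sort(combined), collapse = "")
row5                                      # "ABC"

# Row 7: V1 = "", V2 = "A"; the empty string sorts first and
# contributes nothing once the values are collapsed with ""
row7 <- paste(sort(union("A", "")), collapse = "")
row7                                      # "A"
```

Sorting before pasting is what turns the first-seen order `A C B` into `ABC`.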

Answer:

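This approach first pastes V1 and V2 together and removes the comma separators with gsub. It then uses strsplit to split the string into single characters, keeps only the unique ones (sorted) and collapses them together. Splitting on "" works here because every value is a single character.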

df$V3 <- sapply(strsplit(gsub(",\\s", "", paste0(df$V1, df$V2)), ""),
                function(x) paste0(sort(unique(x)), collapse = ""))
df["V3"]

   V3
1  AB
2   A
3  AB
4   A
5 ABC
6  AB
7   A
8   C
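The intermediate strings for row 5 make the steps easier to follow. The second half uses hypothetical multi-letter labels ("foo", "bar", not from the question) to show why splitting on "" assumes single-character values. This is a sketch for illustration only.

```r
# Trace the steps for row 5 of the example (V1 = "B", V2 = "A, C")
s <- paste0("B", "A, C")           # "BA, C"
s <- gsub(",\\s", "", s)           # "BAC"
chars <- strsplit(s, "")[[1]]      # c("B", "A", "C")
res <- paste0(sort(unique(chars)), collapse = "")
res                                # "ABC"

# Hypothetical multi-letter labels: splitting on "" breaks them into letters
m <- gsub(",\\s", "", paste0("foo", "foo, bar"))   # "foofoobar"
multi <- paste0(sort(unique(strsplit(m, "")[[1]])), collapse = "")
multi                              # "abfor"
```

For labels longer than one character, splitting on the comma separator (as in the first answer) keeps each label intact.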

Answer:

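With a regex that deletes every character occurring again later in the string, plus all commas and spaces. Because the last occurrence of each character is kept, the result is not sorted: row 5 gives "BAC" instead of "ABC".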

gsub("(.)(?=.*\\1)|,| ", "", paste(df$V1, df$V2), perl = TRUE)

# [1] "AB"  "A"   "AB"  "A"   "BAC" "AB"  "A"   "C"
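If the sorted order from the question matters, one option (a sketch, not part of the original answer) is to sort the characters of each regex result afterwards. Like the gsub approach, this assumes single-character values; the data frame is rebuilt so the block runs on its own.

```r
df <- data.frame(V1 = c('A','A','A','A','B','B','','C'),
                 V2 = c('A, B','A','B','','A, C','A, B','A',''))

# Drop repeated characters (keeping the last one), commas and spaces
x <- gsub("(.)(?=.*\\1)|,| ", "", paste(df$V1, df$V2), perl = TRUE)

# Split each result into characters, sort them, and glue them back together
v3 <- vapply(strsplit(x, ""), function(ch) paste(sort(ch), collapse = ""),
             character(1))
v3
# [1] "AB"  "A"   "AB"  "A"   "ABC" "AB"  "A"   "C"
```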

Answer:

Here is a tidyverse way using dplyr, purrr and stringr. You could probably condense this into fewer lines, but this version is readable:

1. Split the text in both columns on the comma.
2. Combine the two columns row by row, drop duplicates, and sort.
3. Paste the values back together.

library(dplyr)
library(purrr)
library(stringr)

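df %>% 
  # 1. split every column on ", " (V1 and V2 become list-columns)
  modify(str_split, ",\\s") %>% 
  # 2. per row: compose() applies c, then unique, then sort
  mutate(V3 = map2(V1, V2, compose(sort, unique, c))) %>%
  # 3. paste each row's values back into one string
  mutate(V3 = map_chr(V3, paste, collapse = ""))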
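`purrr::compose()` applies its functions from right to left by default, so `compose(sort, unique, c)` behaves like `function(...) sort(unique(c(...)))`. The sketch below writes that equivalent in base R (the name `combine` is hypothetical) and applies it to row 5 after splitting, so it runs without any packages.

```r
# Base-R equivalent of compose(sort, unique, c): c() runs first
combine <- function(...) sort(unique(c(...)))

# Row 5 after splitting: V1 = "B", V2 = c("A", "C")
out <- combine("B", c("A", "C"))
out                          # "A" "B" "C"
v3 <- paste(out, collapse = "")
v3                           # "ABC"
```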