I have two columns (V1 and V2) with character information. I want to create a third column with the "sum" of these characters, that is, the unique values found between the commas (",") in the character vectors of both columns (V1 and V2).
I want to go from this:
Example data:
data.frame(V1 = c('A','A','A','A','B','B','','C'),
V2 = c('A, B','A','B','','A, C','A, B','A',''))
  V1   V2
1  A A, B
2  A    A
3  A    B
4  A
5  B A, C
6  B A, B
7       A
8  C
To this:
V3
1 AB
2 A
3 AB
4 A
5 ABC
6 AB
7 A
8 C
CodePudding user response:
We can split column 'V2' on the commas, take the union with 'V1' row by row, and paste the result together
data.frame(V3 = mapply(\(x, y) paste(sort(union(x, y)),
collapse = ""), strsplit(df1$V2, ",\\s*"), df1$V1))
-output
V3
1 AB
2 A
3 AB
4 A
5 ABC
6 AB
7 A
8 C
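As a rough sketch of the same split, union, and paste idea, the steps can be wrapped in a small helper. The name `combine_unique` is hypothetical (it is not from the answer above), and the sketch assumes the values never contain commas themselves. Empty strings are dropped explicitly, so rows where one column is blank need no special handling.

```r
# Example data from the question
df <- data.frame(V1 = c('A','A','A','A','B','B','','C'),
                 V2 = c('A, B','A','B','','A, C','A, B','A',''))

# Hypothetical helper: split both inputs on commas, drop empty pieces,
# and paste the sorted unique values together.
combine_unique <- function(x, y) {
  pieces <- unlist(strsplit(c(x, y), ",\\s*"))
  pieces <- pieces[nzchar(pieces)]
  paste(sort(unique(pieces)), collapse = "")
}

# mapply() names its result after the first argument, so strip the names
df$V3 <- unname(mapply(combine_unique, df$V1, df$V2))
df$V3
```

Because the helper splits on the comma rather than on single characters, it should also cope with values longer than one character.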
CodePudding user response:
This approach first pastes V1 and V2 together, then uses strsplit to split the string into single characters, keeps only the unique characters, and collapses them back together. Note that it assumes every value is a single character.
df$V3 <- sapply(strsplit(gsub(",\\s", "", paste0(df$V1, df$V2)), ""),
function(x) paste0(sort(unique(x)), collapse = ""))
V3
1 AB
2 A
3 AB
4 A
5 ABC
6 AB
7 A
8 C
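To make the intermediate steps easier to follow, here is a minimal walk-through of row 5 of the example data (V1 = "B", V2 = "A, C"). The variable names are only for illustration.

```r
# Row 5 of the example data: V1 = "B", V2 = "A, C"
pasted  <- paste0("B", "A, C")          # "BA, C"
cleaned <- gsub(",\\s", "", pasted)     # "BAC"
chars   <- strsplit(cleaned, "")[[1]]   # "B" "A" "C"
result  <- paste0(sort(unique(chars)), collapse = "")
result
```

Since the string is split into single characters, a value such as "AB" would be treated as two separate values, "A" and "B".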
CodePudding user response:
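With regex (this drops every character that occurs again later in the string, plus commas and spaces, so the result is not sorted):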
gsub("(.)(?=.*\\1)|,| ", "", paste(df$V1, df$V2), perl = TRUE)
# [1] "AB" "A" "AB" "A" "BAC" "AB" "A" "C"
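The regex result above is not sorted (row 5 comes out as "BAC" rather than "ABC"). If the sorted form from the question is needed, one possible follow-up, sketched here under the assumption that every value is a single character, is to split each result into characters and sort them. The names `res` and `sorted_res` are illustrative only.

```r
# Example data from the question
df <- data.frame(V1 = c('A','A','A','A','B','B','','C'),
                 V2 = c('A, B','A','B','','A, C','A, B','A',''))

# Regex step from the answer above
res <- gsub("(.)(?=.*\\1)|,| ", "", paste(df$V1, df$V2), perl = TRUE)

# Sort the characters within each string
sorted_res <- vapply(strsplit(res, ""),
                     function(ch) paste(sort(ch), collapse = ""),
                     character(1))
sorted_res
```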
CodePudding user response:
Here is a tidyverse way using purrr and dplyr. You can probably condense this into fewer lines, but this is readable enough.
- Split the text on the comma.
- Combine the two columns, keep the unique values, and sort them.
- Paste them back together.
library(dplyr)
library(purrr)
library(stringr)
df %>%
  modify(str_split, ",\\s") %>%
  mutate(V3 = map2(V1, V2, compose(sort, unique, c))) %>%
  mutate(V3 = map_chr(V3, paste, collapse = ""))