Sample dput() below:
structure(list(group = c(34676739L, 45938970L, 22731473L, 40083768L,
22527333L, 51629537L, 26299463L, 27420157L, 24898717L, 43569190L,
34573189L, 44503577L, 25471327L, 44630117L, 19048782L, 39710425L,
33535680L, 54358561L, 27363448L, 39386432L, 44150096L, 24614702L,
36219027L, 39609036L, 10803983L, 54770896L, 27574728L, 40912817L,
24679610L, 40261463L), partners = c("US-GB", "US-JP", "US-JP",
"US-GB", "GB-US", "US-GB", "GB-US", "US-GB", "US-GB", "US-JP",
"US-GB", "US-GB", "US-GB", "GB-US", "JP-US", "US-JP", "JP-US",
"JP-US", "US-GB", "US-GB", "US-JP", "US-GB", "GB-US", "GB-US",
"US-GB", "US-GB", "US-JP", "JP-US", "US-GB", "US-GB")), row.names = c(NA,
-30L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001fc21f23b00>)
What I want to do: I want to create a new variable, say partners_final, that normalizes the entries in partners. You'll see that partners has entries US-GB, GB-US, US-JP, JP-US. These simply represent relationships between business partners, and so technically, US-JP == JP-US and US-GB == GB-US.
However, these entries are (obviously) not equivalent in R, which makes it tough when doing empirics. So what I want to do is make a new variable partners_final that gives a uniform business partner-pair regardless of the order of the two partners.
NOTE that in my actual data set, there are many, many partners. I need to do something that applies to the entire data set, e.g., partners_final must reflect that AB-CD === CD-AB for all pairs AB, CD. Is there any way I can do this in R (preferably avoiding pivot as there are some country pairs that don't show up in the data that might later need to be accounted for)?
CodePudding user response:
You need some rule for ordering each pair, I assume. Here I will assume alphabetical order within each pair, so GB-US and US-GB both become GB-US. You can split each entry on the hyphen, sort the two codes, and re-join them.
library(stringr)
d <- structure(list(
group = c(
34676739L, 45938970L, 22731473L, 40083768L,
22527333L, 51629537L, 26299463L, 27420157L, 24898717L, 43569190L,
34573189L, 44503577L, 25471327L, 44630117L, 19048782L, 39710425L,
33535680L, 54358561L, 27363448L, 39386432L, 44150096L, 24614702L,
36219027L, 39609036L, 10803983L, 54770896L, 27574728L, 40912817L,
24679610L, 40261463L
),
partners = c(
"US-GB", "US-JP", "US-JP",
"US-GB", "GB-US", "US-GB", "GB-US", "US-GB", "US-GB", "US-JP",
"US-GB", "US-GB", "US-GB", "GB-US", "JP-US", "US-JP", "JP-US",
"JP-US", "US-GB", "US-GB", "US-JP", "US-GB", "GB-US", "GB-US",
"US-GB", "US-GB", "US-JP", "JP-US", "US-GB", "US-GB"
)
),
row.names = c(NA, -30L), class = c("data.table", "data.frame")
)
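# Split each pair on "-", sort the two codes, and paste them back together.
# vapply() returns a plain character vector; lapply() would create a list column.
d$cleaned <- vapply(str_split(d$partners, "-"),
  function(x) paste(sort(x), collapse = "-"), character(1))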
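If the data set is large, the same split, sort and re-join idea can be sketched in base R without sorting row by row. This is only a sketch under the assumption that every entry holds exactly two codes separated by a single hyphen; normalize_pair is a hypothetical helper name, not something from the question or from a package. It relies on pmin() and pmax() accepting character vectors, so the alphabetically smaller code always ends up first.

```r
# Hypothetical helper: put the alphabetically smaller code first in each pair.
# Assumes every entry looks like "AB-CD" (exactly two codes, one separator).
normalize_pair <- function(x, sep = "-") {
  parts  <- strsplit(x, sep, fixed = TRUE)
  first  <- vapply(parts, `[`, character(1), 1L)
  second <- vapply(parts, `[`, character(1), 2L)
  # pmin()/pmax() compare the two character vectors element by element
  paste(pmin(first, second), pmax(first, second), sep = sep)
}

partners <- c("US-GB", "GB-US", "US-JP", "JP-US")
normalize_pair(partners)
# "GB-US" "GB-US" "JP-US" "JP-US"
```

With the data above this would be used as d$partners_final <- normalize_pair(d$partners). Because it only rewrites the strings, pairs that never appear in the data are unaffected and can be handled later with the same function.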