Using stringdist_join with differing column names

Time:04-17

I have example data as follows:

library(fuzzyjoin)
a <- data.frame(x = c("season", "season", "season", "package", "package"), y = c("1","2", "3", "1","6"))


b <- data.frame(x = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

So the following runs fine:

d <- stringdist_left_join(a,b, by = "x", max_dist = 2)

But joining on columns with different names fails (note that the join is now between a and c):

e <- stringdist_left_join(a,c, by = c("x", "z"), max_dist = 2)

I would like to tell stringdist_left_join to join on two differently named columns, as in the last line of code (e), but it does not seem to accept this.

Is there any solution to this (other than copying the column and giving it another name)?

CodePudding user response:

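Pass by as a named vector, using = to pair the column from the first data frame (the name) with the column from the second (the value). An unnamed vector such as c("x", "z") is instead read as two columns that must each exist in both data frames, which is why it fails here. You can use the following code: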

e <- stringdist_left_join(a,c, by = c("x" = "z"), max_dist = 2)

Output:

         x y       z w
1   season 1  season 1
2   season 1   seson 2
3   season 1   seson 3
4   season 2  season 1
5   season 2   seson 2
6   season 2   seson 3
7   season 3  season 1
8   season 3   seson 2
9   season 3   seson 3
10 package 1 package 2
11 package 1 pakkage 6
12 package 6 package 2
13 package 6 pakkage 6
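
Because every row of a is matched to every row of c within max_dist, it can help to see how close each match is. The following is a minimal sketch, not part of the original answer, that repeats the join with the distance_col argument of the stringdist joins; the column name "dist" is an arbitrary choice made for illustration.

```r
library(fuzzyjoin)

# Same example data as in the question
a <- data.frame(x = c("season", "season", "season", "package", "package"),
                y = c("1", "2", "3", "1", "6"))
c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"),
                w = c("1", "2", "3", "2", "6"))

# Named vector in `by`: name = column in `a`, value = column in `c`.
# distance_col adds a column holding the string distance of each match;
# "dist" is a name chosen here for illustration.
e <- stringdist_left_join(a, c, by = c("x" = "z"), max_dist = 2,
                          distance_col = "dist")

print(e)
```

Exact matches should show a distance of 0 and near matches a small positive value, so the extra column can be used afterwards to keep only the closest match per row if that is what you need.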