fuzzy joining a column with a list

Time:04-14

The data is as follows:

library(fuzzyjoin)
nr <- c(1,2)
col2 <- c("b","a")

dat <- cbind.data.frame(
  nr, col2
)

thelist <- list(
aa=c(1,2,3),
bb=c(1,2,3)
)

I would like to do the following:

stringdist_left_join(dat, thelist, by = "col2", method = "lcs", max_dist = 1)

But this (unsurprisingly) gives an error:

Error in `group_by_prepare()`:
! Must group by variables found in `.data`.
* Column `col` is not found.
Run `rlang::last_error()` to see where the error occurred.

What would be the best way to do this?

Desired output:

nr col2 thelist list_col
1  b    bb      c(1,2,3)
2  a    aa      c(1,2,3)

Answer:

This is a bit of a hack. Not sure if there is a more elegant solution.

Convert the list into a tibble with one row per list element: the element names go into a column named "col2" (so it matches the join key in dat), and the vectors themselves go into a list column named "value". Then use stringdist_left_join() to merge the data. In the resulting out data frame, you can drop or rename the columns you don't need.

library(fuzzyjoin)
library(tibble)  # for tibble()

dat <- data.frame(
  nr = c(1,2), col2 = c("b","a")
)

thelist <- list(
  aa=c(1,2,3),
  bb=c(1,2,3,4)
)

# create data.frame with list info 
a <- tibble(col2 = names(thelist), value = thelist)
a
# A tibble: 2 x 2
  col2  value       
  <chr> <named list>
1 aa    <dbl [3]>   
2 bb    <dbl [4]>   

# merge data
out <- stringdist_left_join(dat, a, by = "col2", method = "lcs", max_dist = 1)
out
  nr col2.x col2.y      value
1  1      b     bb 1, 2, 3, 4
2  2      a     aa    1, 2, 3
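To get closer to the layout asked for in the question, the columns of out could be selected and renamed with dplyr, which fuzzyjoin already depends on. This is a minimal sketch that reuses the data from above. The output column names thelist and list_col are simply taken from the desired output, and res is a hypothetical name for the tidied result.

```r
library(fuzzyjoin)
library(tibble)
library(dplyr)

dat <- data.frame(nr = c(1, 2), col2 = c("b", "a"))
thelist <- list(aa = c(1, 2, 3), bb = c(1, 2, 3, 4))

# One row per list element: names in "col2", the vectors in a list column
a <- tibble(col2 = names(thelist), value = thelist)

# LCS distance "b" -> "bb" and "a" -> "aa" is 1, so each row finds one match
out <- stringdist_left_join(dat, a, by = "col2", method = "lcs", max_dist = 1)

# Keep only the needed columns, renamed to mirror the desired output
res <- out %>%
  select(nr, col2 = col2.x, thelist = col2.y, list_col = value)
res
```

Because list_col stays a list column, each cell still holds the full vector from thelist, so vectors of different lengths (like bb above) are kept intact.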