I have two vectors of strings:
a <- c('Alpha', 'Beta', 'Gamma', 'Delta')
b <- c('Epsilon', 'Zeta', 'Eta', 'Theta')
and I would like to compute the Levenshtein distance or edit distance for each pair of strings.
If I use
stringdist(a, b, method="lv")
the output is a vector with the Levenshtein distance between each string in vector a and the corresponding string in vector b (i.e., Alpha vs Epsilon, Beta vs Zeta, etc.).
What I need instead is a pairwise comparison between each string in one vector and ALL the strings in the other vector (i.e., Alpha vs Epsilon, Alpha vs Zeta, Alpha vs Eta, Alpha vs Theta, Beta vs Epsilon, etc.).
Thanks
CodePudding user response:
There is a straightforward way to do this using stringdistmatrix and some reshaping:
library(stringdist)
library(tidyverse)
a <- c('Alpha', 'Beta', 'Gamma', 'Delta')
b <- c('Epsilon', 'Zeta', 'Eta', 'Theta')
# useNames = "strings" labels the rows and columns with the input strings
stringdistmatrix(a, b, method = "lv", useNames = "strings") %>%
  as_tibble(rownames = "a") %>%
  pivot_longer(-1, names_to = "b", values_to = "dist")
#> # A tibble: 16 x 3
#>    a     b        dist
#>    <chr> <chr>   <dbl>
#>  1 Alpha Epsilon     7
#>  2 Alpha Zeta        4
#>  3 Alpha Eta         4
#>  4 Alpha Theta       4
#>  5 Beta  Epsilon     7
#>  6 Beta  Zeta        1
#>  7 Beta  Eta         2
#>  8 Beta  Theta       2
#>  9 Gamma Epsilon     7
#> 10 Gamma Zeta        4
#> 11 Gamma Eta         4
#> 12 Gamma Theta       4
#> 13 Delta Epsilon     6
#> 14 Delta Zeta        2
#> 15 Delta Eta         3
#> 16 Delta Theta       3
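To see what the reshaping step starts from, it may help to look at the intermediate matrix on its own. This is only a minimal sketch, assuming the stringdist package is installed; the variable name m is just for illustration.

```r
library(stringdist)

a <- c('Alpha', 'Beta', 'Gamma', 'Delta')
b <- c('Epsilon', 'Zeta', 'Eta', 'Theta')

# Rows correspond to a, columns to b; useNames = "strings" labels both.
m <- stringdistmatrix(a, b, method = "lv", useNames = "strings")

dim(m)             # one cell per (a, b) pair
m["Beta", "Zeta"]  # look up a single pair by name

# The element-wise result of stringdist(a, b) is the diagonal of m.
all(diag(m) == stringdist(a, b, method = "lv"))
```

If the long format is not needed, the matrix can be used directly, indexing by the strings themselves.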
CodePudding user response:
A base R option using adist and expand.grid:
> cbind(expand.grid(a = a, b = b), lv = c(adist(a, b)))
       a       b lv
1  Alpha Epsilon  7
2   Beta Epsilon  7
3  Gamma Epsilon  7
4  Delta Epsilon  6
5  Alpha    Zeta  4
6   Beta    Zeta  1
7  Gamma    Zeta  4
8  Delta    Zeta  2
9  Alpha     Eta  4
10  Beta     Eta  2
11 Gamma     Eta  4
12 Delta     Eta  3
13 Alpha   Theta  4
14  Beta   Theta  2
15 Gamma   Theta  4
16 Delta   Theta  3
or
> cbind(rev(expand.grid(b = b, a = a)), lv = c(t(adist(a, b))))
       a       b lv
1  Alpha Epsilon  7
2  Alpha    Zeta  4
3  Alpha     Eta  4
4  Alpha   Theta  4
5   Beta Epsilon  7
6   Beta    Zeta  1
7   Beta     Eta  2
8   Beta   Theta  2
9  Gamma Epsilon  7
10 Gamma    Zeta  4
11 Gamma     Eta  4
12 Gamma   Theta  4
13 Delta Epsilon  6
14 Delta    Zeta  2
15 Delta     Eta  3
16 Delta   Theta  3
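If the strings should stay character rather than become factors, the same idea could be wrapped in a small helper. This is only a sketch: pairwise_lv is a hypothetical name rather than a base R function, and stringsAsFactors = FALSE is an addition that the code above does not use.

```r
# Hypothetical helper: all pairwise Levenshtein distances in long format.
pairwise_lv <- function(a, b) {
  # expand.grid varies its first argument fastest, which matches the
  # column-major order in which c() flattens the adist() matrix.
  out <- expand.grid(a = a, b = b, stringsAsFactors = FALSE)
  out$lv <- c(adist(a, b))
  out
}

a <- c('Alpha', 'Beta', 'Gamma', 'Delta')
b <- c('Epsilon', 'Zeta', 'Eta', 'Theta')

res <- pairwise_lv(a, b)
nrow(res)                                 # length(a) * length(b) rows
res[res$a == "Beta" & res$b == "Zeta", ]  # pick out a single pair
```

Note that adist is case-sensitive unless ignore.case = TRUE is passed.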