I work for a project in Java 8 and I want to compute the editing distance for 2 strings - in a iterative way (so without recursion); the method will be executed many times, so I need to improve it with a given limit (threshold), meaning that if the editing distance from 2 strings is greater than a given number x, the algorithm will be stopped.
Now, where from should I find a method like this? Is there a java library (or provided from maven) which contain such a method? e.g. in StringUtils?
For current implementation I use classic editing distance algorithm.
static int min(int x, int y, int z)
{
if (x <= y && x <= z)
return x;
if (y <= x && y <= z)
return y;
else
return z;
}
static int editDistDP(String str1, String str2, int m,
int n)
{
// Create a table to store results of subproblems
int dp[][] = new int[m 1][n 1];
// Fill d[][] in bottom up manner
for (int i = 0; i <= m; i ) {
for (int j = 0; j <= n; j ) {
// If first string is empty, only option is
// to insert all characters of second string
if (i == 0)
dp[i][j] = j; // Min. operations = j
// If second string is empty, only option is
// to remove all characters of second string
else if (j == 0)
dp[i][j] = i; // Min. operations = i
// If last characters are same, ignore last
// char and recur for remaining string
else if (str1.charAt(i - 1)
== str2.charAt(j - 1))
dp[i][j] = dp[i - 1][j - 1];
// If the last character is different,
// consider all possibilities and find the
// minimum
else
dp[i][j] = 1
min(dp[i][j - 1], // Insert
dp[i - 1][j], // Remove
dp[i - 1]
[j - 1]); // Replace
}
}
return dp[m][n];
}
Also, I find this which use Levenstein distance with a given limit, but I wouldn't want to change to Levenstein if it ins't necessary
CodePudding user response:
I think this by Apache Commons is exactly what you are looking for which has a constructor with threshold for maximum distance.
Pseudocode:
LevenshteinDistance ld = new LevenshteinDistance(10);
Integer editDistanceWithThreshold = ld.apply('String1', 'String2')