I was reading 'Algorithms Unlocked' by Thomas H Corman and came across an approach to finding the Longest Common Subsequence which ran in O(n * m). This approach can be seen in this article too: https://www.geeksforgeeks.org/printing-longest-common-subsequence/
This seems to be the most popular and quickest approach but seemed a bit convoluted to me so I tried to write a different approach, also using dynamic programming.
My approach is recursive, in the worst case has to branch twice every call but uses memoization to save the result of previous inputs. I understand that std::map can take lg(n) time for 'find' but I assume this code can be easily altered to use a hash map instead
using namespace std;
string lcs_helper(int i, int j, const string& x, const string& y, map<pair<int, int>, string>& memoizedSolutions)
{
if (memoizedSolutions.count({ i, j })) return memoizedSolutions[{i, j}];
if (i == 0 || j == 0) return "";
if (x[i] == y[j])
{
string lcs = lcs_helper(i - 1, j - 1, x, y, memoizedSolutions);
lcs.push_back(x[i]);
memoizedSolutions[{i, j}] = lcs;
return lcs;
}
string lcs1 = lcs_helper(i - 1, j, x, y, memoizedSolutions);
string lcs2 = lcs_helper(i, j - 1, x, y, memoizedSolutions);
if (lcs1.size() >= lcs2.size()) // corrected bug
{
memoizedSolutions[{i, j}] = lcs1;
return lcs1;
}
memoizedSolutions[{i, j}] = lcs2;
return lcs2;
}
string lcs(string x, string y)
{
map<pair<int, int>, string> memoizedSolutions;
return lcs_helper(x.size() - 1, y.size() - 1, x, y, memoizedSolutions);
}
I am struggling to generalise the time complexity of my approach and wanted to know if and why it is slower than the usual approach.
I tried to understand the time complxity through creating a tree with an example but am still struggling to generalise it. https://imgur.com/a/h8J6ldH (This was the tree example I created, red= return "" and the other colours represent inputs that can be memoized, I have ommited the trees for memoized results)
CodePudding user response:
Even if you use a hashmap for memoization, and correct the bug, the time complexity is O(m*n*l)
, where l
is the length of the LCS; this is also upper-bounded by O(m*n*min(m, n))
.
First, the bug. lcs_helper
returns a string:
string lcs1 = lcs_helper(i - 1, j, x, y, memoizedSolutions);
string lcs2 = lcs_helper(i, j - 1, x, y, memoizedSolutions);
if (lcs1 >= lcs2)
This if
statement should compare the lengths of the strings; it actually compares them lexicographically.
To find the time-complexity of a dynamic-programming algorithm, a handy formula is (number of subproblems) * (time to solve one subproblem)
.
Here, you have the same number of subproblems as the regular DP solution: m*n
, or one for each value of i
and j
.
The time to solve a subproblem, in your case, is not O(1)
. Since your function returns a string, and must cache strings for memoization, it will do up to length(LCS)
work per subproblem to build the string and copy the memory.
This may be why the normal DP algorithm can look convoluted: if you store the actual LCS strings, the run-time will become cubic in the worst-case. By only memoizing on the lengths of the LCS, and having lcs_helper
return a length, you'll need to backtrack at the end to construct the string, but maintain an O(m*n)
runtime.