I have an std::vector<std::string>paths
where each entry is a path and I want to remove all the paths that are sub-directories of another one.
If for example I have root/dir1/
, root/dir1/sub_dir/
and root/dir2/
, the result should be root/dir1/
, root/dir2/
.
The way I've implemented it is by using std::sort
std::unique
with a predicate that checks if string2
starts with string1
.
std::vector<std::string> paths = getPaths();
std::sort(paths.begin(), paths.end());
const auto to_erase = std::unique(paths.begin(), paths.end(), [](const std::string & s1, const std::string & s2) {
return (s2.starts_with(s1) || s1.starts_with(s2));
});
paths.erase(to_erase, paths.end());
But since the predicate should be symetrycal I wonder if in some implementation std::unique
iterate from end to start, and in that case the result will be wrong.
CodePudding user response:
Your predicate is symmetric.
Let p
be your predicate (the lambda), and a
and b
some strings, different from each other, but such that p(a, b)
is true. Then either a.starts_with(b)
or b.starts_with(a)
.
If a.starts_with(b)
, then p(b, a)
is true because s2.starts_with(s1)
is true in the lambda. Similarly, if b.starts_with(a)
, then p(b, a)
is true because s1.starts_with(s2)
is true in the lambda.
So, if p(a, b)
, then p(b, a)
(and vice versa), which is the definition of a symmetric predicate.
It's not transitive though (p("a/b", "a")
and p("a", "a/c")
but not p("a/b", "a/c")
), but I can't see a way this could pose a problem in practice. It could definitely lead to different results if the input isn't sorted, but yours is.
So your implementation is probably fine.