My program is supposed to create two files at user-specified paths. I need to know whether the paths lead to the same location, so that I can exit with an error before I start changing the filesystem.
Because the paths come from the user, they are expected to be non-canonical and weird.
For example, they could be ./dir1/subdir/file and dir2/subdir/../subdir/file, where dir2 is a symlink to dir1 and subdir doesn't exist yet. The expected result is still true: they are equivalent.
std::filesystem::equivalent works only on files that already exist.
Is there a similar function without this limitation?
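To illustrate the limitation, here is a minimal sketch using the non-throwing overload of std::filesystem::equivalent. The helper name equivalentIfBothExist is made up for this example. When neither path exists, the call reports an error through the std::error_code instead of giving a usable answer.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the standard library):
// returns true only when fs::equivalent succeeds and says "same file".
// If the paths do not exist yet, fs::equivalent sets ec and we return false,
// so this cannot be used to compare files that are still to be created.
bool equivalentIfBothExist(const fs::path& a, const fs::path& b)
{
    std::error_code ec;
    const bool same = fs::equivalent(a, b, ec);
    return !ec && same;
}
```

Even two textually identical paths are reported as "not equivalent" here as long as the file is missing, which is why this function alone cannot answer the question.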
CodePudding user response:
I would use std::filesystem::absolute and then std::filesystem::weakly_canonical on the result.
#include <filesystem>

namespace fs = std::filesystem;

auto fullpath1 = fs::weakly_canonical(fs::absolute(path1));
auto fullpath2 = fs::weakly_canonical(fs::absolute(path2));

if (fullpath1 == fullpath2) {
    // both paths lead to the same location: report the error here
}
Note: For std::filesystem::absolute, implementations are encouraged not to treat a non-existing path as an error, but they may still do so. It works in the most current releases of g++, clang and MSVC though.
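As a rough illustration, the two calls can be wrapped into a small helper. This is only a sketch: the names resolveFromCwd and sameTarget are made up for this example, and it assumes that relative paths are meant to be resolved against the current working directory.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: make the path absolute first, so that
// weakly_canonical never starts from a non-existent relative component,
// then resolve ".", ".." and symlinks in the part that exists.
fs::path resolveFromCwd(const fs::path& p)
{
    return fs::weakly_canonical(fs::absolute(p));
}

// Hypothetical helper: true if both paths resolve to the same location.
bool sameTarget(const fs::path& a, const fs::path& b)
{
    return resolveFromCwd(a) == resolveFromCwd(b);
}
```

With dir2 being a symlink to dir1 in the current directory, sameTarget("./dir1/subdir/file", "dir2/subdir/../subdir/file") should report that the two paths match even though subdir does not exist yet.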
CodePudding user response:
This is a surprisingly difficult problem to solve, and no single standard library function will do it.
There are several cases that you need to worry about:
- Relative paths with an initial ./
- Bare relative paths without an initial ./
- Symlinks in the "non-existing" part of a path
- Case-sensitivity of different filesystems
- Almost certainly more that I didn't think of
std::filesystem::weakly_canonical will get you part of the way there, but it won't quite get there by itself. For instance, it doesn't address cases when a bare relative path doesn't exist (i.e. foo won't canonicalize to the same thing as ./foo) and it doesn't even try to address case-sensitivity.
Here's a canonicalize function that takes all of that into account. It still has some shortcomings, mainly around non-ASCII characters (e.g. the case-normalization doesn't work for 'É'), but it should work in most cases:
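#include <algorithm>
#include <filesystem>
#include <iterator>
#include <locale>
#include <utility>

namespace fs = std::filesystem;
// Split a path into its longest existing prefix and the non-existing rest
std::pair<fs::path, fs::path> splitExistingNonExistingParts(const fs::path& path)
{
    fs::path existingPart = path;
    // Guard against an empty prefix, whose parent_path() is empty too (infinite loop)
    while (!existingPart.empty() && !fs::exists(existingPart)) {
        existingPart = existingPart.parent_path();
    }
    return {existingPart, fs::relative(path, existingPart)};
}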
bool isCaseSensitive(const fs::path& path)
{
    // NOTE: This function assumes the path exists.
    //       fs::equivalent will throw if that isn't the case
    const fs::path::string_type& native = path.native();
    fs::path::string_type upper = native;
    upper.back() = std::toupper(native.back(), std::locale());
    fs::path::string_type lower = native;
    lower.back() = std::tolower(native.back(), std::locale());
    bool exists = fs::exists(upper);
    if (exists != fs::exists(lower)) {
        // If one exists and the other doesn't, then they
        // must reference different files and therefore be
        // case-sensitive
        return true;
    }
    // If the two paths don't reference the same file, then
    // the filesystem must be case-sensitive
    return !fs::equivalent(upper, lower);
}
fs::path toLower(const fs::path& path)
{
    const fs::path::string_type& native = path.native();
    fs::path::string_type lower;
    lower.reserve(native.length());
    std::transform(
        native.begin(),
        native.end(),
        std::back_inserter(lower),
        [](auto c) { return std::tolower(c, std::locale()); }
    );
    return lower;
}
fs::path normalizeCase(const fs::path& path)
{
    // Normalize the case of a path to lower-case if it is on a
    // non-case-sensitive filesystem
    fs::path ret;
    for (const fs::path& component : path) {
        if (!isCaseSensitive(ret / component)) {
            ret /= toLower(component);
        } else {
            ret /= component;
        }
    }
    return ret;
}
fs::path canonicalize(fs::path path)
{
    if (path.empty()) {
        return path;
    }

    // Initial pass to deal with .., ., and symlinks in the existing part
    path = fs::weakly_canonical(path);

    // Figure out if this is absolute or relative by assuming that there
    // is a base path component that will always exist (i.e. / on POSIX or
    // the drive letter on Windows)
    auto [existing, nonExisting] = splitExistingNonExistingParts(path);
    if (!existing.empty()) {
        existing = fs::canonical(fs::absolute(existing));
    } else {
        existing = fs::current_path();
    }

    // Normalize the case of the existing part of the path
    existing = normalizeCase(existing);

    // Need to deal with case-sensitivity of the part of the path
    // that doesn't exist. Assume that part will have the same
    // case-sensitivity as the last component of the existing path
    if (!isCaseSensitive(existing)) {
        path = existing / toLower(nonExisting);
    } else {
        path = existing / nonExisting;
    }

    // Call weakly_canonical again to deal with any existing symlinks that were
    // hidden by .. components after non-existing path components
    fs::path temp;
    while ((temp = fs::weakly_canonical(path)) != path) {
        path = temp;
    }
    return path;
}
CodePudding user response:
I compiled this answer from Ted Lyngmo's answer and Miles Budnek's comments.
What you need to do is normalize your paths to remove all ., .., symlinks and similar things that get in the way.
std::filesystem::weakly_canonical can do most of that, although you may need to call it multiple times in case it tripped on some non-existent directory that obscured an existing one. (In your example, dir2/subdir/../../dir2 would do it.)
You call the function until the result stops changing.
Before canonicalizing the path, you also need to make sure that it is absolute.
std::filesystem::weakly_canonical normally converts a path to an absolute path, but only if the first part of the original path exists. Otherwise it may not work correctly.
#include <filesystem>

std::filesystem::path normalizePath(const std::filesystem::path &originalPath)
{
    using namespace std::filesystem;

    // Make the path absolute first, relative to the current directory
    path currentPath;
    if (originalPath.is_absolute())
        currentPath = originalPath;
    else
        currentPath = std::filesystem::current_path() / originalPath;

    // Repeat weakly_canonical until the result no longer changes
    while (true)
    {
        path newPath = weakly_canonical(currentPath);
        if (newPath != currentPath)
            currentPath = newPath;
        else
            break;
    }
    return currentPath;
}
When this is done, you can just compare paths using the operator ==.
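Putting it together, the comparison might look like the following self-contained sketch. The names resolveFully and sameLocation are made up for this example, and it uses std::filesystem::absolute as one way to anchor relative paths at the current directory. It does not attempt to handle case-insensitive filesystems.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: make the path absolute, then apply
// weakly_canonical repeatedly until the result stops changing.
fs::path resolveFully(const fs::path& p)
{
    fs::path current = fs::absolute(p);
    for (fs::path next = fs::weakly_canonical(current); next != current;
         next = fs::weakly_canonical(current)) {
        current = next;
    }
    return current;
}

// Hypothetical helper: compare the fully resolved paths with operator ==.
bool sameLocation(const fs::path& a, const fs::path& b)
{
    return resolveFully(a) == resolveFully(b);
}
```

For dir2/subdir/../../dir2, the first pass only collapses the .. components and still leaves the dir2 symlink in place. The second pass resolves it to dir1, which is why the loop is needed.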