Version of std::filesystem::equivalent for non-existing files

Time:06-20

My program is supposed to create two files at user-specified paths. I need to know whether the paths lead to the same location, so that I can fail with an error before I start changing the filesystem.

Because the paths come from the user, they are expected to be non-canonical and weird. For example, they could be ./dir1/subdir/file and dir2/subdir/../subdir/file, where dir2 is a symlink to dir1 and subdir doesn't exist yet. The expected result is still true: they are equivalent.

std::filesystem::equivalent works only on files that already exist. Is there a similar function without this limitation?
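
To illustrate the limitation, here is a minimal sketch that wraps the non-throwing overload of std::filesystem::equivalent. The helper name tryEquivalent is hypothetical and not part of the standard library. When neither path exists, the call reports an error instead of answering the question, which the sketch surfaces as an empty optional.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the result of fs::equivalent, or
// std::nullopt if the call reported an error (for example because
// the files do not exist).
std::optional<bool> tryEquivalent(const fs::path& a, const fs::path& b)
{
    std::error_code ec;
    const bool same = fs::equivalent(a, b, ec);
    if (ec) {
        return std::nullopt;
    }
    return same;
}
```

An empty result means the question could not be answered, which is exactly the situation for files that have not been created yet.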

CodePudding user response:

I would use std::filesystem::absolute and then std::filesystem::weakly_canonical on the result.

namespace fs = std::filesystem;

auto fullpath1 = fs::weakly_canonical(fs::absolute(path1));
auto fullpath2 = fs::weakly_canonical(fs::absolute(path2));

if(fullpath1 == fullpath2) {
    // both paths refer to the same location
}


Note: For std::filesystem::absolute, implementations are encouraged not to treat a non-existing path as an error, but they are still allowed to. It works in the most recent releases of g++, clang++ and MSVC, though.
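
As a self-contained sketch of this approach, the snippet can be wrapped in a function. The name refersToSameLocation is hypothetical. It assumes that a single weakly_canonical pass is enough, so it does not cover symlinks hidden behind non-existing components or case-insensitive filesystems.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: make both paths absolute, normalize them with
// weakly_canonical, and compare the results lexically.
bool refersToSameLocation(const fs::path& path1, const fs::path& path2)
{
    const fs::path full1 = fs::weakly_canonical(fs::absolute(path1));
    const fs::path full2 = fs::weakly_canonical(fs::absolute(path2));
    return full1 == full2;
}
```

Both functions used here have throwing overloads only in this sketch, so a caller would need to handle std::filesystem::filesystem_error.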

CodePudding user response:

This is a surprisingly difficult problem to solve, and no single standard library function will do it.

There are several cases that you need to worry about:

  • Relative paths with an initial ./
  • Bare relative paths without an initial ./
  • Symlinks in the "non-existing" part of a path
  • Case-sensitivity of different filesystems
  • Almost certainly more that I didn't think of

std::filesystem::weakly_canonical will get you part of the way there, but not all the way by itself. For instance, it doesn't address cases where a bare relative path doesn't exist (e.g. foo won't canonicalize to the same thing as ./foo), and it doesn't even try to address case-sensitivity.

Here's a canonicalize function that takes all of that into account. It still has some shortcomings, mainly around non-ASCII characters (e.g. the case-normalization doesn't work for 'É'), but it should work in most cases:

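#include <algorithm>
#include <filesystem>
#include <iterator>
#include <locale>
#include <utility>

namespace fs = std::filesystem;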

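std::pair<fs::path, fs::path> splitExistingNonExistingParts(const fs::path& path)
{
    // Walk up until an existing ancestor is found.  Stop when the path
    // becomes empty (a relative path with no existing prefix), because
    // parent_path() of an empty path is empty and the loop would never end.
    fs::path existingPart = path;
    while (!existingPart.empty() && !fs::exists(existingPart)) {
        existingPart = existingPart.parent_path();
    }
    if (existingPart.empty()) {
        return {existingPart, path};
    }
    return {existingPart, fs::relative(path, existingPart)};
}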

bool isCaseSensitive(const fs::path& path)
{
    // NOTE: This function assumes the path exists.
    //       fs::equivalent will throw if that isn't the case

    const fs::path::string_type& native = path.native();
    fs::path::string_type upper = native;
    upper.back() = std::toupper(native.back(), std::locale());
    fs::path::string_type lower = native;
    lower.back() = std::tolower(native.back(), std::locale());

    bool exists = fs::exists(upper);
    if (exists != fs::exists(lower)) {
        // If one exists and the other doesn't, then they
        // must reference different files and therefore be
        // case-sensitive
        return true;
    }

    // If the two paths don't reference the same file, then
    // the filesystem must be case-sensitive
    return !fs::equivalent(upper, lower);
}

fs::path toLower(const fs::path& path)
{
    const fs::path::string_type& native = path.native();
    fs::path::string_type lower;
    lower.reserve(native.length());
    std::transform(
        native.begin(),
        native.end(),
        std::back_inserter(lower),
        [](auto c) { return std::tolower(c, std::locale()); }
    );
    return lower;
}

fs::path normalizeCase(const fs::path& path)
{
    // Normalize the case of a path to lower-case if it is on a
    // non-case-sensitive filesystem

    fs::path ret;
    for (const fs::path& component : path) {
        if (!isCaseSensitive(ret / component)) {
            ret /= toLower(component);
        } else {
            ret /= component;
        }
    }
    return ret;
}

fs::path canonicalize(fs::path path)
{
    if (path.empty()) {
        return path;
    }

    // Initial pass to deal with .., ., and symlinks in the existing part
    path = fs::weakly_canonical(path);

    // Figure out if this is absolute or relative by assuming that there
    // is a base path component that will always exist (i.e. / on POSIX or
    // the drive letter on Windows)
    auto [existing, nonExisting] = splitExistingNonExistingParts(path);
    if (!existing.empty()) {
        existing = fs::canonical(fs::absolute(existing));
    } else {
        existing = fs::current_path();
    }

    // Normalize the case of the existing part of the path
    existing = normalizeCase(existing);

    // Need to deal with case-sensitivity of the part of the path
    // that doesn't exist.  Assume that part will have the same
    // case-sensitivity as the last component of the existing path
    if (!isCaseSensitive(existing)) {
        path = existing / toLower(nonExisting);
    } else {
        path = existing / nonExisting;
    }

    // Call weakly_canonical again to deal with any existing symlinks that were
    // hidden by .. components after non-existing path components
    fs::path temp;
    while ((temp = fs::weakly_canonical(path)) != path) {
        path = temp;
    }
    return path;
}

CodePudding user response:

I compiled this answer from Ted Lyngmo's answer and Miles Budnek's comments.

What you need to do is normalize your paths to remove all ., .., symlinks and similar things that get in the way.

std::filesystem::weakly_canonical can do most of that, although you may need to call it multiple times in case it stopped at a non-existent directory that hid an existing one further along the path. (In your example, dir2/subdir/../../dir2 would do it.) Call the function repeatedly until the result stops changing.

Before canonicalizing the path, you also need to make sure that it is absolute. std::filesystem::weakly_canonical normally converts a path to an absolute path, but only if the first component of the original path exists. Otherwise it may not work correctly.

std::filesystem::path normalizePath(const std::filesystem::path &originalPath)
{
    using namespace std::filesystem;
    path currentPath;
    if (originalPath.is_absolute())
        currentPath = originalPath;
    else
        currentPath = std::filesystem::current_path() / originalPath;
    while(true)
    {
        path newPath = weakly_canonical(currentPath);
        if (newPath != currentPath)
            currentPath = newPath;
        else
            break;
    }
    return currentPath;
}

When this is done, you can compare the paths with operator==.
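
As a usage sketch, the comparison can be packaged into a single check. The names normalizePathSketch and pathsCollide are hypothetical. The first repeats the approach of normalizePath above so that the block is self-contained, using std::filesystem::absolute in place of prepending current_path() by hand, on the assumption that the two behave the same for relative paths.

```cpp
#include <filesystem>
#include <utility>

namespace fs = std::filesystem;

// Same idea as normalizePath above: make the path absolute, then apply
// weakly_canonical until the result stops changing.
fs::path normalizePathSketch(const fs::path& original)
{
    fs::path current = fs::absolute(original);
    for (;;) {
        fs::path next = fs::weakly_canonical(current);
        if (next == current) {
            return current;
        }
        current = std::move(next);
    }
}

// Hypothetical helper: true if both paths normalize to the same location.
bool pathsCollide(const fs::path& a, const fs::path& b)
{
    return normalizePathSketch(a) == normalizePathSketch(b);
}
```

With dir2 being a symlink to dir1, as in the question, pathsCollide is expected to report that dir1/subdir/file and dir2/subdir/../subdir/file collide even though subdir does not exist yet. Case-insensitive filesystems are not handled by this sketch.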
