Comparing two directories and then removing mismatches (Python)

Time:03-12

Hello Stack Overflow community! Please help, I have been racking my brain over how to implement this.

There are, for example, folders: 'D:\left' and 'C:\right'.

They contain the contents: files, directories with files, subdirectories, subdirectories with files. Most of the content is the same, but there may be 'extra' content in 'C:\right' (not matching the content of 'D:\left').

How can I find what is in 'C:\right' but not in 'D:\left', and then delete that extra content from 'C:\right' so that the folders 'D:\left' and 'C:\right' become identical? (In our case we do not look at size, time, etc., only at the names of the contents.)

I tried this to find the excess:

import os
# Raw strings keep '\r' in 'C:\right' from being read as a carriage return
difs = list(set(os.listdir(r'C:\right')) - set(os.listdir(r'D:\left')))

But this is not enough, because it only compares the top level and does not descend into subdirectories.

I also tried this:

from dirsync import sync
# Raw strings avoid backslash escapes in the Windows paths
sync(r'D:\left', r'C:\right', 'diff')

But there I am only interested in a small part of the output, and it is not clear to me how to turn that output into deletions.

Deleting everything in 'C:\right' and copying 'D:\left' into 'C:\right' from scratch is not an option.

I'm pretty sure the solution revolves around:

os.walk

But I just can't put it together :(

Many thanks in advance for any help and I apologize for the stupidity.

I'm attaching screenshots for clarity

Input: [screenshots: Entrance, Entrance2]

Desired result after running the program: [screenshots: Result, Result2]

Answer:

You can use Path.rglob:

from pathlib import Path

pl = Path('path/to/left')   # reference directory
pr = Path('path/to/right')  # directory that may contain extra content

difference = (set(map(lambda p: p.relative_to(pr), pr.rglob('*'))) -
              set(map(lambda p: p.relative_to(pl), pl.rglob('*'))))

Here is an example:

right
  file1
  file5
  dir1
    file2
    file6
  dir2
    file3
    file7
    subdir1
      file4
      file8
    subdir2
      file9
    subdir3

left
  file1
  dir1
    file2
  dir2
    file3
    subdir1
      file4

>>> difference
{PosixPath('dir1/file6'),
 PosixPath('file5'),
 PosixPath('dir2/subdir3'),
 PosixPath('dir2/subdir2'),
 PosixPath('dir2/subdir1/file8'),
 PosixPath('dir2/subdir2/file9'),
 PosixPath('dir2/file7')}

Now you just need to delete all files and directories in difference.
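
The deletion step could be sketched roughly as follows. This is only one possible approach, not part of the answer above: the helper name remove_extras is hypothetical, and it assumes that comparing purely by relative names is enough, as the question states. Because the difference contains both an extra directory and everything inside it, the sketch handles shorter paths first and skips entries that have already disappeared together with their parent.

```python
import shutil
from pathlib import Path


def remove_extras(left, right):
    """Delete everything under `right` whose relative path does not exist under `left`.

    Hypothetical helper: compares by name only (no size, time, or content checks).
    Returns the list of relative paths that were actually removed.
    """
    left_names = {p.relative_to(left) for p in left.rglob('*')}
    right_names = {p.relative_to(right) for p in right.rglob('*')}

    removed = []
    # Shallowest paths first: an extra directory is removed as a whole
    # before any of its children are visited.
    for rel in sorted(right_names - left_names, key=lambda p: len(p.parts)):
        target = right / rel
        if not target.exists() and not target.is_symlink():
            continue  # already deleted together with a parent directory
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)  # directory with all of its contents
        else:
            target.unlink()        # regular file or symlink
        removed.append(rel)
    return removed
```

For the folders in the question, the call would look like remove_extras(Path(r'D:\left'), Path(r'C:\right')). Since the deletion is permanent, it may be worth printing the difference and checking it before running this on real data.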
