How do I check multiple folders and delete any files with unique file names?

Time:09-24

I'm capturing images of widgets off of multiple cameras on an inspection system. If the inspection is unsuccessful, the image doesn't get saved. The images are named with the widget's serial number.

So my folder structure might look like

  • Camera1
    • 1.tif
    • 2.tif
    • 4.tif
  • Camera2
    • 2.tif
    • 3.tif
    • 4.tif
  • Camera3
    • 1.tif
    • 2.tif
    • 3.tif
    • 4.tif

I want to delete any image that doesn't have a match in all three folders. I don't mind running the solution twice, once between Camera1 and Camera2, and then again between Camera2 and Camera3.

I'm hoping to only be left with the following folder structure.

  • Camera1
    • 2.tif
    • 4.tif
  • Camera2
    • 2.tif
    • 4.tif
  • Camera3
    • 2.tif
    • 4.tif

There are ~12,000 files in each folder, and probably 2%-3% of them are erroneous and need to be removed before the analysis can continue.

I don't mind prepackaged solutions that require payment, Python, the command line, etc.

Thanks much!

CodePudding user response:

As suggested in the comments, next time you ask something on SO, have a shot at it yourself first, and ask about any problems - you learn more that way.

Here's a start. As suggested, the code below builds three sets from the contents of the folders, determines the intersection of those three sets, and then subtracts that intersection from each original set. The result tells you exactly which files you need to remove from each folder:

from pathlib import Path


def find_unmatched(dirs):
    # list the (file) contents of the folders
    contents = {}
    for d in dirs:
        contents[d] = set(str(n.name) for n in Path(d).glob('*') if n.is_file())

    # decide what the folders have in common
    all_files = list(contents.values())
    common = all_files[0]
    for d_contents in all_files[1:]:
        common = common.intersection(d_contents)

    # create a dictionary that tells you what to remove
    return {d: files - common for d, files in contents.items()}


to_remove = find_unmatched(['photos/Camera1', 'photos/Camera2', 'photos/Camera3'])
print(to_remove)

Result (given the folders in your example sit in a folder called photos):

{'photos/Camera1': {'1.tif'}, 'photos/Camera2': {'3.tif'}, 'photos/Camera3': {'1.tif', '3.tif'}}

Actually removing the files is some code you can probably figure out yourself.
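
For reference, a minimal sketch of that removal step, assuming the input has the same shape as the dictionary returned by find_unmatched above (folder path mapped to a set of file names). The name remove_unmatched and its dry_run flag are illustrative and not part of the original answer. With dry_run=True nothing is deleted and the function only reports which paths would be removed:

```python
from pathlib import Path


def remove_unmatched(to_remove, dry_run=True):
    """Delete, or with dry_run=True only list, the files named in to_remove.

    to_remove maps a folder path to a set of file names (assumed shape).
    Returns the list of affected paths, in a stable sorted order per folder.
    """
    affected = []
    for folder, names in to_remove.items():
        for name in sorted(names):
            path = Path(folder) / name
            if not dry_run:
                path.unlink()  # raises FileNotFoundError if already gone
            affected.append(path)
    return affected


if __name__ == "__main__":
    # Hypothetical input, copied from the example result above.
    plan = {'photos/Camera1': {'1.tif'}, 'photos/Camera2': {'3.tif'}}
    for p in remove_unmatched(plan, dry_run=True):
        print("would delete", p)
```

With ~12,000 files per folder it is probably worth checking the dry-run listing first, and only then calling the function again with dry_run=False.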

CodePudding user response:

As said before, you should make your own effort to solve the problem and only ask for help when you get stuck. However, I have some spare time now, so I wrote a complete Batch solution:

@echo off
setlocal EnableDelayedExpansion

rem Process files in Camera1 folder and populate "F" array elements = 1
cd Camera1
for %%a in (*.tif) do set "F[%%~Na]=1"

rem Process files in Camera2 and *accumulate* files to "F" array
cd ..\Camera2
for %%a in (*.tif) do set /A "F[%%~Na]+=1"

rem Process files in Camera3 and accumulate files to "F" array
rem if counter == 3 then file is OK: remove "F" element
rem else: delete file
rem       if counter == 1: remove "F" element

cd ..\Camera3
for %%a in (*.tif) do (
   set /A "F[%%~Na]+=1"
   if !F[%%~Na]! equ 3 (
      set "F[%%~Na]="
   ) else (
      del %%a
      if !F[%%~Na]! equ 1 set "F[%%~Na]="
   )
)

rem Remove files of "F" array in both Camera1 and Camera2 folders, ignoring error messages
cd ..
(for /F "tokens=2 delims=[]" %%a in ('set F[') do (
   del Camera1\%%a.tif
   del Camera2\%%a.tif
)) 2>nul

Please report the result...
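
For comparison, the same counting idea can be sketched in Python: count how many folders each file name appears in, and flag any name that appears in fewer folders than were scanned. The name unmatched_by_count and the default '*.tif' pattern are assumptions for illustration. The sketch only reports names and deletes nothing:

```python
from collections import Counter
from pathlib import Path


def unmatched_by_count(dirs, pattern="*.tif"):
    """Return {folder: set of file names missing from at least one folder}."""
    # File names present in each folder
    listing = {
        d: {p.name for p in Path(d).glob(pattern) if p.is_file()}
        for d in dirs
    }
    # Number of folders in which each name occurs
    counts = Counter(name for names in listing.values() for name in names)
    # A name is unmatched when it is not present in every scanned folder
    total = len(listing)
    return {
        d: {n for n in names if counts[n] < total}
        for d, names in listing.items()
    }


if __name__ == "__main__":
    # Folder names taken from the question; run from their parent folder.
    print(unmatched_by_count(["Camera1", "Camera2", "Camera3"]))
```

Unlike the two-pass approach, this handles any number of folders in a single call.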
