How to get files in directory A but not B and vice versa using bash comm?

I'm trying to use comm to list the files that are in folder A but not in folder B, and vice versa:

comm -3 <(find /Users/rob/A -type f -exec basename {} ';' | sort) <(find "/Users/rob/B" -type f -exec basename {} ';' | sort)

I'm using basename {} ';' to exclude the directory path, but this is the output I get:

    IMG_5591.JPG
IMG_5591.jpeg
    IMG_5592.JPG
IMG_5592.jpeg
    IMG_5593.JPG
IMG_5593.jpeg
    IMG_5594.JPG
IMG_5594.jpeg

The names coming from one of the directories seem to contain a leading tab, so all entries are considered different. What am I doing wrong?

CodePudding user response：

The leading tabs are not generated by the find|basename code; they are generated by comm itself.

comm prints up to 3 columns of output, depending on which of the -1, -2 and -3 flags are given: lines found only in the first input, lines found only in the second input, and lines found in both. With all three columns enabled, entries in the 2nd column are prefixed with one tab and entries in the 3rd column with two tabs.

In this case the OP's code suppresses column 3 (-3, the files common to both sources), so comm prints 2 columns, with entries in the 2nd column prefixed by a tab.

One easy fix:

comm -3 --output-delimiter="" <(find...|sort...) <(find...|sort...)

If for some reason your comm does not support the --output-delimiter flag:

comm -3 <(find...|sort...) <(find...|sort...) | tr -d '\t'

This assumes the file names do not contain embedded tabs; if they might, replace the tr with your favorite code to strip only leading white space, e.g.:

comm -3 <(find...|sort...) <(find...|sort...) | sed 's/^[[:space:]]*//'
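
Another option is to avoid the column delimiter entirely by running comm twice, once per side: -23 keeps only column 1 (lines unique to the first input) and -13 keeps only column 2 (lines unique to the second input), and a lone column is printed without any leading tab. A minimal sketch, assuming bash (for the process substitution); the helper names list_names, only_in_first and only_in_second are made up for illustration:

```bash
#!/bin/bash

# Hypothetical helper: sorted base names of all regular files under "$1".
# LC_ALL=C is set so that sort and comm agree on the ordering.
list_names() {
    find "$1" -type f -exec basename {} ';' | LC_ALL=C sort
}

# Names found only under the first directory (comm column 1).
only_in_first() {
    LC_ALL=C comm -23 <(list_names "$1") <(list_names "$2")
}

# Names found only under the second directory (comm column 2).
only_in_second() {
    LC_ALL=C comm -13 <(list_names "$1") <(list_names "$2")
}
```

For the directories in the question this would be called as only_in_first /Users/rob/A /Users/rob/B and only_in_second /Users/rob/A /Users/rob/B.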

Demo ...

$ cat file1
a.txt
b.txt

$ cat file2
b.txt
c.txt

$ comm file1 file2
a.txt
                b.txt
        c.txt

# 2x tabs (\t) before 'b.txt' (3rd column), 1x tab (\t) before 'c.txt' (2nd column):

$ comm file1 file2 | od -c
0000000   a   .   t   x   t  \n  \t  \t   b   .   t   x   t  \n  \t   c
0000020   .   t   x   t  \n

# OP's scenario:

$ comm -3 file1 file2
a.txt
        c.txt

# 1x tab (\t) before 'c.txt' (2nd column):

$ comm -3 file1 file2 | od -c
0000000   a   .   t   x   t  \n  \t   c   .   t   x   t  \n

Removing the leading tabs:

$ comm --output-delimiter="" -3 file1 file2
a.txt
c.txt

$ comm -3 file1 file2 | tr -d '\t'
a.txt
c.txt

$ comm -3 file1 file2 | sed 's/^[[:space:]]*//'
a.txt
c.txt
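
If you would rather keep track of which side each name came from than throw the tab away, the tab itself can serve as the marker. A rough sketch, assuming the input is comm -3 output and that no file name itself starts with a tab; the function name label_sides and the label texts are hypothetical:

```bash
#!/bin/bash

# Hypothetical helper: reads `comm -3` output on stdin and labels each line.
# A leading tab means the line came from the 2nd column (second input only);
# no leading tab means it came from the 1st column (first input only).
label_sides() {
    awk '{
        # sub() returns 1 when a leading tab was found and stripped
        if (sub(/^\t/, "")) { print "only in B: " $0 }
        else                { print "only in A: " $0 }
    }'
}
```

With the demo files, comm -3 file1 file2 | label_sides prints "only in A: a.txt" followed by "only in B: c.txt".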

CodePudding user response：

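If basename causes issues, you can use find's -printf action instead (a GNU find extension). Note that %P prints each path relative to the starting directory, so files in subdirectories keep their subdirectory prefix; use %f if you want only the file name: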

#!/bin/bash
    
find_basename(){
    find "$1" -type f -printf "%P\n" | sort
}

comm -3 <(find_basename /Users/rob/A) <(find_basename /Users/rob/B)
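
find's -printf is a GNU extension and is not available in the BSD find shipped with macOS, which the /Users/rob paths suggest may be in use here. A possible portable sketch strips the directory part with sed instead; the function name find_basename_portable is hypothetical, and file names are assumed not to contain newlines:

```bash
#!/bin/bash

# Hypothetical portable variant: sorted base names of all regular files
# under "$1", without relying on GNU find's -printf.
find_basename_portable() {
    # sed deletes everything up to and including the last slash of each path.
    find "$1" -type f | sed 's|.*/||' | sort
}
```

It can be dropped into the same command: comm -3 <(find_basename_portable /Users/rob/A) <(find_basename_portable /Users/rob/B)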