I'm trying to use comm to get the files in folder A that are not in B, and vice versa:
comm -3 <(find /Users/rob/A -type f -exec basename {} ';' | sort) <(find "/Users/rob/B" -type f -exec basename {} ';' | sort)
I'm using basename {} ';' to exclude the directory path, but this is the output I get:
IMG_5591.JPG
IMG_5591.jpeg
IMG_5592.JPG
IMG_5592.jpeg
IMG_5593.JPG
IMG_5593.jpeg
IMG_5594.JPG
IMG_5594.jpeg
There's a tab in the name of the first directory, therefore all entries are considered different. What am I doing wrong?
Answer:
The leading tabs are not being generated by the find|basename code; they are being generated by comm.
comm generates 1 to 3 columns of output depending on the input flags, and each printed column is preceded by one tab per earlier printed column: by default the 2nd column of output has a leading tab and the 3rd column has 2 leading tabs.
In this case the OP's code suppresses column 3 (-3, the files common to both sources), so comm generates 2 columns of output, with the 2nd column having a leading tab.
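Because a column only gets a leading tab when an earlier column is also printed, another option is to ask comm for one column at a time, so there is never a delimiter to strip. A minimal sketch, assuming bash (for process substitution); list_names, only_in_first and only_in_second are hypothetical helper names, not part of the original code:

```shell
#!/bin/bash
# Sketch: report "only in A" and "only in B" separately, so comm prints a
# single column each time and never emits a column delimiter.
# The helper names below are hypothetical.

list_names() {
  # Bare file names of all regular files under "$1", sorted as comm requires.
  find "$1" -type f -exec basename {} ';' | sort
}

only_in_first() {
  # -2 -3: drop lines unique to the second input and lines common to both.
  comm -23 <(list_names "$1") <(list_names "$2")
}

only_in_second() {
  # -1 -3: with column 1 suppressed, column 2 is printed without a leading tab.
  comm -13 <(list_names "$1") <(list_names "$2")
}
```

Usage would be, e.g., only_in_first /Users/rob/A /Users/rob/B followed by only_in_second /Users/rob/A /Users/rob/B.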
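One easy fix (GNU comm only):
comm -3 --output-delimiter="" <(find...|sort...) <(find...|sort...)
Be aware that GNU comm treats an empty --output-delimiter as a request to use the ASCII NUL character between columns, so the tab is replaced by an invisible NUL byte rather than removed; the tr and sed variants below actually delete it.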
If your comm does not support the --output-delimiter flag (it is a GNU extension, not part of POSIX comm):
comm -3 <(find...|sort...) <(find...|sort...) | tr -d '\t'
This assumes the file names do not contain embedded tabs; otherwise, replace the tr with your favorite code to strip leading white space, e.g.:
comm -3 <(find...|sort...) <(find...|sort...) | sed 's/^[[:space:]]*//'
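That sed also strips leading spaces or tabs that genuinely belong to a file name. If you want to remove only the single delimiter tab that comm -3 puts in front of column 2, a plain-bash sketch could read the output line by line (strip_one_tab is a hypothetical name). A column-1 name that itself begins with a tab would still lose that tab; that ambiguity is inherent in comm's output format.

```shell
#!/bin/bash
# Sketch: remove at most one leading tab per line (the delimiter comm -3
# adds before column 2) and leave every other character untouched.
# strip_one_tab is a hypothetical helper name.

strip_one_tab() {
  local tab=$'\t' line
  # IFS= keeps leading whitespace; -r keeps backslashes literal.
  while IFS= read -r line; do
    # ${line#"$tab"} removes one literal tab from the start, if present.
    printf '%s\n' "${line#"$tab"}"
  done
}
```

Usage: comm -3 file1 file2 | strip_one_tab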
Demo ...
$ cat file1
a.txt
b.txt
$ cat file2
b.txt
c.txt
$ comm file1 file2
a.txt
b.txt
c.txt
# 2x tabs (\t) before 'b.txt' (3rd column), 1x tab (\t) before 'c.txt' (2nd column):
$ comm file1 file2 | od -c
0000000 a . t x t \n \t \t b . t x t \n \t c
0000020 . t x t \n
# OP's scenario:
$ comm -3 file1 file2
a.txt
c.txt
# 1x tab (\t) before 'c.txt' (2nd column):
$ comm -3 file1 file2 | od -c
0000000 a . t x t \n \t c . t x t \n
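In this -3 output the leading tab is the only thing that tells the two columns apart, so instead of discarding it you could use it to label each line. A hedged sketch, assuming comm -3 output where a line starting with one tab belongs to column 2; label_sides and its label strings are made up for illustration:

```shell
#!/bin/bash
# Sketch: turn comm -3 output into labelled lines, using comm's leading tab
# to decide which input each name came from.
# label_sides and the label text are hypothetical.

label_sides() {
  local tab=$'\t' line
  while IFS= read -r line; do
    case $line in
      # A line starting with a tab is column 2: only in the second input.
      "$tab"*) printf 'only in second: %s\n' "${line#"$tab"}" ;;
      # Anything else is column 1: only in the first input.
      *)       printf 'only in first: %s\n' "$line" ;;
    esac
  done
}
```

Usage: comm -3 file1 file2 | label_sides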
Removing the leading tabs:
$ comm --output-delimiter="" -3 file1 file2
a.txt
c.txt
$ comm -3 file1 file2 | tr -d '\t'
a.txt
c.txt
$ comm -3 file1 file2 | sed 's/^[[:space:]]*//'
a.txt
c.txt
Answer:
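If basename causes issues, you can use find's -printf instead (a GNU find extension; the BSD find shipped with macOS does not have it). Note that %P prints each path relative to the starting directory, which equals the bare file name only for files directly inside it; %f prints just the file name: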
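#!/bin/bash
# Print the sorted paths of all regular files under "$1", relative to "$1".
find_basename(){
  find "$1" -type f -printf "%P\n" | sort
}
# Column 1: only in A; column 2 (still prefixed with a tab by comm): only in B.
comm -3 <(find_basename /Users/rob/A) <(find_basename /Users/rob/B)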
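Since the OP's paths (/Users/...) suggest macOS, where find has no -printf, here is a portable sketch of the same %P behaviour, assuming only POSIX find, sed and sort (relative_paths is a hypothetical name). Like the other variants, it assumes file names contain no newlines.

```shell
#!/bin/bash
# Sketch: portable stand-in for  find "$1" -type f -printf "%P\n" | sort
# for find implementations without -printf. relative_paths is hypothetical.

relative_paths() {
  # Running find from inside the directory makes every path start with "./";
  # sed strips that prefix. The subshell keeps the cd from leaking out.
  ( cd "$1" && find . -type f ) | sed 's|^\./||' | sort
}
```

Usage: comm -3 <(relative_paths /Users/rob/A) <(relative_paths /Users/rob/B)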