I would like to find all files that are not hard links and are not under a hard-linked directory. I found this awesome SO post, but the command below does not handle files under a hard-linked directory:
find /1 -type f -links 1 -print
For example:
/1/2/3/test.txt
/1/A/3/test.txt
If 2 is a hard link to A, we expect to find only one test.txt file.
One more example, from Android:
$ adb shell ls -li /data/data/com.android.nfc |grep files
4243 drwxrwx--x 2 nfc nfc 3488 2022-06-13 11:08 files
$ adb shell ls -li /data/user/0/com.android.nfc |grep files
4243 drwxrwx--x 2 nfc nfc 3488 2022-06-13 11:08 files
$ adb shell ls -li /data/data/com.android.nfc/files/service_state.xml
5877 -rw------- 1 nfc nfc 100 2022-06-13 11:08 /data/data/com.android.nfc/files/service_state.xml
$ adb shell ls -li /data/user/0/com.android.nfc/files/service_state.xml
5877 -rw------- 1 nfc nfc 100 2022-06-13 11:08 /data/user/0/com.android.nfc/files/service_state.xml
CodePudding user response:
Systems that support unrestricted hard links to directories are rare, but a similar situation can be created using bind mounts. (See "What is a bind mount?")
Try this Shellcheck-clean code to list files under the current directory that do not have multiple paths (caused by bind mounts or links to directories):
#! /bin/bash -p
shopt -s lastpipe
declare -A devino_of_file
declare -A count_of_devino
find . -type f -printf '%D.%i-%p\0' \
    | while IFS= read -r -d '' devino_path; do
        devino=${devino_path%%-*}
        path=${devino_path#*-}
        devino_of_file[$path]=$devino
        count_of_devino[$devino]=$(( ${count_of_devino[$devino]-0} + 1 ))
    done
for path in "${!devino_of_file[@]}"; do
    devino=${devino_of_file[$path]}
    (( ${count_of_devino[$devino]} == 1 )) && printf '%s\n' "$path"
done
- shopt -s lastpipe ensures that variables set in the while loop in the pipeline persist after the pipeline completes. It requires Bash 4.2 (released in 2011) or later.
- The code uses "devino" values. The devino value for a path consists of the device number and inode number for the path, separated by a . character. A devino string should uniquely identify a file on a system, independent of any path to it.
- The devino_of_file associative array maps paths to the corresponding devino values.
- The count_of_devino associative array maps devino strings to counts of the number of paths found to them.
- See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for an explanation of while IFS= read -r -d '' ... .
- When all files in the directory tree have been processed, all paths whose devino value has a count of 1 (meaning that no other path has been found to the same file) are printed.
- The code that populates the associative arrays can handle arbitrary paths (including ones that contain spaces or newlines) but the output will be useless if any of the paths contain newlines (because of the '%s\n' format string).
- Alternative paths caused by symlinks are automatically avoided because find doesn't follow symlinks by default. The code should still work if the -follow option to find is used though. (It's easier to test with symlinks than with directory hardlinks or bind mounts.)
Note that Bash code runs very slowly. It is interpreted in a very laborious way. The code above is likely to be too slow if the directory tree being processed has large numbers of files. For example, it processes files at a rate of around 10 thousand per second on my test VM.
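As noted above, the multiple-path situation is easier to reproduce with symlinks than with bind mounts. The following is a minimal sketch of such a test setup, assuming GNU find (for -printf) and a writable temporary directory. The function names make_test_tree and list_devinos are hypothetical, and the layout mirrors the /1/2/3 and /1/A/3 example from the question.

```bash
#!/bin/bash

# Build a throwaway tree in which one file is reachable by two paths.
# Prints the root of the new tree on stdout.
make_test_tree() {
    local root
    root=$(mktemp -d) || return 1
    mkdir -p "$root/2/3" || return 1
    echo data > "$root/2/3/test.txt"
    ln -s 2 "$root/A"    # A stands in for the second name of directory 2
    printf '%s\n' "$root"
}

# Print the "device.inode" value of every regular file under $1,
# following symlinks (-L) so that both paths to test.txt are visited.
list_devinos() {
    find -L "$1" -type f -printf '%D.%i\n'
}
```

Running list_devinos on the tree returned by make_test_tree should print the same devino value twice, once for each path to test.txt. Remove the tree with rm -rf when done.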
CodePudding user response:
Forgive the humor in the comment, but "I don't think you understand your question."
What I mean by that is that when you create a file, it's a link.
$: date > file1
$: ls -l file1 # note the 2nd field - the "number of hard links"
-rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
You think of file1 as the file, but it's ...complicated, lol.
The date command above creates output. The redirection tells "the system" that you want that data in "a file", so it allocates space on the disk, writes the data to that space, and creates an inode that defines the "file".
A "hard link" is basically just a link to that data. It's the same "file" with another name if you make another link. Editing either edits both (all, if you make several), because they are the same file.
$: date >file1
$: ln file1 file2
$: diff file?
$: cat file1
Mon Jun 13 17:30:22 GMT 2022
$: date >file2
$: diff file?
$: cat file1
Mon Jun 13 17:31:06 GMT 2022
Now, a symlink is another file of another kind with a different inode, containing the name of the file it "links" to symbolically, but a hard link is the file. ls -i will show you the inode index number, in fact.
$: date >file1
$: ln file1 file2
$: diff file?
$: cat file2
Mon Jun 13 17:34:41 GMT 2022
$: ls -li file? # note the 1st and 3rd fields
24415801 -rw-r--r--. 2 paul 518 29 Jun 13 17:34 file1
24415801 -rw-r--r--. 2 paul 518 29 Jun 13 17:34 file2
$: rm file2
$: ls -li file? # note the 1st and 3rd fields
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
Let's make a different file with that name and compare again.
$: date >file2
$: cat file? # not linked now
Mon Jun 13 17:34:41 GMT 2022
Mon Jun 13 17:41:23 GMT 2022
$: diff file? # now they differ
1c1
< Mon Jun 13 17:34:41 GMT 2022
---
> Mon Jun 13 17:41:23 GMT 2022
$: ls -li file? # and have different inodes, one link each
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
24419687 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:41 file2
If I had copied the original data, the diff would have been empty, but it would still be a different inode, so a different file, and I could have edited them independently.
And a symlink -
$: ln -s file1 file3
$: diff file1 file3
$: ls -li file?
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
24419687 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:41 file2
24419696 lrwxrwxrwx. 1 P2759474 518 5 Jun 13 17:44 file3 -> file1
Opening a symlink will usually open the file it targets, but it might depend on what tool you are using, so be aware of the differences.
You cannot create a hard link to a file on a separate filesystem, because a hard link is a directory entry that points at an inode, and inode numbers only mean something within one filesystem. You can use a symlink.
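The distinctions above can also be checked from a script rather than by reading ls -li output. This is a minimal sketch, not part of the original answer: relation is a hypothetical helper name, and it relies on the -L (is a symlink) and -ef (same device and inode) tests provided by Bash's test builtin.

```bash
#!/bin/bash

# Hypothetical helper: describe how two paths relate to each other.
relation() {
    if [ -L "$1" ] || [ -L "$2" ]; then
        echo "symlink"            # at least one name is a symbolic link
    elif [ "$1" -ef "$2" ]; then
        echo "same file"          # same device and inode, e.g. hard links
    else
        echo "different files"    # separate inodes, even if contents match
    fi
}
```

With file1 hard-linked to file2, relation file1 file2 should report "same file". With file2 recreated as a separate file, it should report "different files", even when the contents are identical. With file3 created by ln -s, relation file1 file3 should report "symlink". The symlink check comes first because -ef follows symlinks and would otherwise treat a symlink and its target as the same file.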
What you might be looking for is
for f in *; do [[ -f "$f" ]] && echo "$f"; done
or something like that.
Hope that helps.
CodePudding user response:
From comments on the previous edit of this answer, it seems that the duplication is being caused because some files appear in two different places in the filesystem due to bind mounts.
That being the case, the original code you used produces technically correct output. However, it lists some of the relevant files more than once (because they have multiple names):
find /1 -type f -links 1 -print
A mounted filesystem is uniquely identified by its device number. A file is uniquely identified within that filesystem by its inode number. So a file can be uniquely identified on a particular host by the (device#,inode#) tuple. (GNU) find can provide these tuples along with filenames, as @pjh's answer shows:
find /1 -type f -links 1 -printf '%D.%i %p\0'
A simple (GNU) awk script can filter the output so that only one path is listed for each unique (device#,inode#):
find /1 -type f -links 1 -printf '%D.%i %p\0' |
gawk -v RS='\0' '!id[$1]++ && sub(/^[0-9.]+ /,"")'
This uses the common awk idiom !x[y]++ which evaluates to true only the first time the element y is looked up in the array x (it is created with value 0 the first time y is seen and the value is incremented thereafter; !0 is true).
The (device#,inode#) prefix is deleted by sub().
awk implicitly prints processed records if the "pattern" evaluates to true, i.e. when a (device#,inode#) tuple is first seen and the prefix is successfully stripped. The (GNU) find output is delimited by nulls rather than newlines, so the (GNU) awk script sets the input record separator RS to null also.
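Where gawk is not available, the same first-seen filter can be sketched in Bash itself. This is an assumption-laden alternative rather than the method above: it needs Bash 4 or later (associative arrays) and GNU find, and unique_files is a hypothetical name. Its arguments are passed to find ahead of the tests, so a leading -L can be given to follow symlinks. Like the awk version, it prints one path per line, so paths that contain newlines will be ambiguous in the output.

```bash
#!/bin/bash

# Print one path for each distinct (device#,inode#) among the regular
# files with a link count of 1 found by find. Arguments go to find as-is.
unique_files() {
    local -A seen=()
    local record devino path
    while IFS= read -r -d '' record; do
        devino=${record%% *}     # text before the first space: "device.inode"
        path=${record#* }        # everything after the first space
        if [[ -z ${seen[$devino]-} ]]; then
            seen[$devino]=1      # remember this file, print its first path
            printf '%s\n' "$path"
        fi
    done < <(find "$@" -type f -links 1 -printf '%D.%i %p\0')
}
```

For example, unique_files /1 would cover the tree from the question. Which of the alternative paths is printed depends on the order in which find visits them.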