Problem
I am trying to archive and compress some directories (and their contents) on a GNU/Linux machine and have the original directories (and their contents) removed afterwards.
Minimum reproducible example
Here is some code to recreate the situation on a GNU/Linux machine:
cd /tmp
mkdir find_and_tar
cd find_and_tar
mkdir files
mkdir texts
touch files/file1 files/file2 files/file3
touch texts/text1
Running tree should now give the following:
.
├── files
│   ├── file1
│   ├── file2
│   └── file3
└── texts
    └── text1
What I've tried so far
Now, my command to achieve the stated goal thus far is:
find . -mindepth 1 -type d -exec tar --remove-files -cJf {}.tar.xz {} \;
It does what it is supposed to do; tree now gives:
.
├── files.tar.xz
└── texts.tar.xz
BUT the command throws the following warnings:
find: ‘./texts’: No such file or directory
find: ‘./files’: No such file or directory
If I remove the --remove-files option, the warnings disappear, but obviously the original directories stay around.
Question(s)
- Why do these find warnings appear?
- How do I avoid them?
Version info
$ tar --version
tar (GNU tar) 1.30
$ find --version
find (GNU findutils) 4.6.0.225-235f
CodePudding user response:
Your problem is that find is still traversing the directory tree while tar is running.
When you only need to process directories at the top level, your -maxdepth 1 will work. Two alternatives:
Use find's -depth option to process subdirectories first
This might be useful when you need to find directories at different levels:
find . -mindepth 1 -depth -type d -exec tar --remove-files -cJf {}.tar.xz {} \;
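To see what -depth changes, here is a minimal sketch that only prints the traversal order instead of running tar. The scratch directory and the nested subfiles directory are assumptions added for illustration; they are not part of the original example tree.

```shell
# Scratch copy of the example tree, plus a nested directory (hypothetical name).
demo=$(mktemp -d)
mkdir -p "$demo/files/subfiles" "$demo/texts"
cd "$demo" || exit 1

# Default order: a directory is listed before its subdirectories.
pre_order=$(find . -mindepth 1 -type d | grep files)

# With -depth: subdirectories are listed before their parent.
post_order=$(find . -mindepth 1 -depth -type d | grep files)

printf 'default:\n%s\n-depth:\n%s\n' "$pre_order" "$post_order"
```

Because the -exec action runs in this order, find has already finished reading a directory by the time tar removes it. One consequence worth keeping in mind: with nested directories, an inner directory is archived and removed first, so the outer archive should end up containing the inner .tar.xz file rather than the inner directory itself.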
Avoid find
for d in */; do
tar --remove-files -cJf "${d%/}".tar.xz "${d%/}"
done
CodePudding user response:
Analysis
Adding the -D all option to the find call to get more insight into what's going on gave this (among other output):
Optimized command line:
( -mindepth 1 [est success rate 1] [real success rate 0/0=_] -a [est success rate 0.0922] [real success rate 0/0=_] [need type] -type d [est success rate 0.0922] [real success rate 0/0=_] ) -a [est success rate 0.0922] [real success rate 0/0=_] -exec tar [est success rate 1] [real success rate 0/0=_]
consider_visiting (early): ‘.’: fts_info=FTS_D , fts_level= 0, prev_depth=-2147483648 fts_path=‘.’, fts_accpath=‘.’
consider_visiting (late): ‘.’: fts_info=FTS_D , isdir=1 ignore=1 have_stat=1 have_type=1
consider_visiting (early): ‘./texts’: fts_info=FTS_D , fts_level= 1, prev_depth=0 fts_path=‘./texts’, fts_accpath=‘texts’
consider_visiting (late): ‘./texts’: fts_info=FTS_D , isdir=1 ignore=0 have_stat=1 have_type=1
consider_visiting (early): ‘./texts’: fts_info=FTS_DNR, fts_level= 1, prev_depth=1 fts_path=‘./texts’, fts_accpath=‘texts’
find: ‘./texts’: No such file or directory
consider_visiting (early): ‘./files’: fts_info=FTS_D , fts_level= 1, prev_depth=1 fts_path=‘./files’, fts_accpath=‘files’
consider_visiting (late): ‘./files’: fts_info=FTS_D , isdir=1 ignore=0 have_stat=1 have_type=1
consider_visiting (early): ‘./files’: fts_info=FTS_DNR, fts_level= 1, prev_depth=1 fts_path=‘./files’, fts_accpath=‘files’
find: ‘./files’: No such file or directory
consider_visiting (early): ‘.’: fts_info=FTS_DP, fts_level= 0, prev_depth=1 fts_path=‘.’, fts_accpath=‘.’
consider_visiting (late): ‘.’: fts_info=FTS_DP, isdir=1 ignore=1 have_stat=1 have_type=1
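Please note that find considers each directory, e.g. ./texts, a second time. The second visit differs from the first in the prev_depth value and in fts_info, which is now FTS_DNR (a directory that cannot be read) instead of FTS_D. I figured that find tries to recurse into the whole directory tree even after tar has already removed the top-level directory.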
One can see the impact of this recursion by slightly adapting the scenario:
- adding a subfiles dir to files
- and running the find call without --remove-files and -maxdepth
This will lead to the subfiles dir being archived separately within the unarchived files dir.
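The adapted scenario can be sketched roughly as follows. The scratch directory created with mktemp is an assumption made so the sketch does not touch the original /tmp/find_and_tar tree; the find call is the one from the question with --remove-files dropped.

```shell
# Rebuild the example tree in a scratch directory, with the extra subfiles dir.
demo=$(mktemp -d)
mkdir -p "$demo/files/subfiles" "$demo/texts"
touch "$demo/files/file1" "$demo/texts/text1"
cd "$demo" || exit 1

# No --remove-files and no -maxdepth: find keeps descending after each tar run.
find . -mindepth 1 -type d -exec tar -cJf {}.tar.xz {} \;

# List the archives that were created.
archives=$(find . -name '*.tar.xz')
printf '%s\n' "$archives"
```

The listing should include an archive inside files next to the two top-level archives, which shows that find descended into files after tar had already processed it.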
Solution
Since I know that I want to archive only the top-level directories, adding the -maxdepth 1 option to my find call solved the problem for me.
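For completeness, a minimal sketch of the resulting call, run against a fresh copy of the example tree. The scratch directory and the warnings variable are assumptions added for illustration; the find and tar options are the ones already used above.

```shell
# Rebuild the example tree in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/files" "$demo/texts"
touch "$demo/files/file1" "$demo/files/file2" "$demo/files/file3" "$demo/texts/text1"
cd "$demo" || exit 1

# -maxdepth 1 stops find from descending into the directories tar just removed.
# Anything written to stderr is captured so it can be inspected afterwards.
warnings=$(find . -mindepth 1 -maxdepth 1 -type d \
    -exec tar --remove-files -cJf {}.tar.xz {} \; 2>&1 >/dev/null)

ls
printf 'warnings: [%s]\n' "$warnings"
```

With the descent cut off at depth 1, find has no reason to open ./files or ./texts after tar has deleted them, so the captured warnings should be empty.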