I have a set of directories, and I want to run sed and grep only on files under those directories. I put all the paths in a text file, each wrapped in double quotes because the paths contain variables.
For example, filelist.txt contains:
"$ROOT_PATH/test_dir1/sub_path/"
"$ROOT_PATH/test_dir2/sub_path/"
"$ROOT_PATH/test_dir3/sub_path_2/"
Now I want to recursively find all files under these directories whose names match my pattern, change XYZ to ABC in their contents, and print the result:
cat filelist.txt | xargs ls | grep "file_name_with_pattern" | xargs cat | sed 's/XYZ/ABC/g'
It fails right at the beginning:
cat filelist.txt | xargs ls
ls: cannot access $ROOT_PATH/test_dir1/sub_path/: No such file or directory
ls: cannot access $ROOT_PATH/test_dir2/sub_path/: No such file or directory
ls: cannot access $ROOT_PATH/test_dir3/sub_path_2/: No such file or directory
CodePudding user response:
cat filelist.txt \
| envsubst \
| tr -d '"' \
| xargs -L 1 -I '@' find '@' \
-type f \
-regex '.*file_name_with_pattern/.*' \
-exec sed 's/XYZ/ABC/g' {} +
(cat and envsubst are kept as separate steps for readability of the answer.)
Explanation:
ROOT_PATH is an environment variable. $ROOT_PATH is shell syntax, which none of xargs, ls, grep, cat, or sed understands.
Only the shell expands it, and only in text that the shell itself parses, such as a command line or a script. In your case the text is read from a file into a pipeline, so the shell is never involved in processing it.
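The difference can be seen with a small experiment. This is only a sketch: the scratch file demo_list.txt and the value /tmp/demo are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
ROOT_PATH=/tmp/demo          # hypothetical value, for illustration only
export ROOT_PATH

# Single quotes keep the literal text $ROOT_PATH in the file.
printf '%s\n' '"$ROOT_PATH/test_dir1/sub_path/"' > "$workdir/demo_list.txt"

# Typed on the command line: the shell expands the variable before echo runs.
typed=$(echo "$ROOT_PATH/test_dir1/sub_path/")

# Read from a file: the same text is just data. xargs strips the quotes
# but leaves $ROOT_PATH untouched.
piped=$(xargs echo < "$workdir/demo_list.txt")

echo "typed: $typed"
echo "piped: $piped"

rm -rf "$workdir"
```

The second line printed still contains the literal $ROOT_PATH, which is the same string ls complained about in the question.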
envsubst is a program that reads standard input, substitutes environment variables written in shell syntax, and writes the result to standard output.
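As a minimal sketch of that substitution step on a single line: the value ./rp is just an example, and the sed branch is a fallback I am assuming for systems where envsubst (part of GNU gettext) is not installed. It handles only this one known variable.

```shell
ROOT_PATH=./rp               # example value
export ROOT_PATH             # envsubst only sees exported variables

line='"$ROOT_PATH/test_dir1/sub_path/"'

if command -v envsubst >/dev/null 2>&1; then
    # Replaces $ROOT_PATH with its value; the double quotes are left alone.
    expanded=$(printf '%s\n' "$line" | envsubst)
else
    # Fallback for one known variable: a plain sed substitution.
    expanded=$(printf '%s\n' "$line" | sed "s|\\\$ROOT_PATH|$ROOT_PATH|g")
fi

echo "$expanded"
```

If the file might contain other dollar signs that should stay as they are, envsubst also accepts a list of variables to substitute, for example envsubst '$ROOT_PATH'.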
Likewise, wrapping a path in double quotes ("path" instead of path) is also shell syntax, and envsubst doesn't process that, so tr -d '"' removes the double quotes.
Now, you mentioned that you want a recursive search, not just one level down the directory tree as ls would do. find <directory> -type f finds all regular files in the subtree under <directory>.
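To illustrate the difference between ls and find, here is a small sketch on a throwaway tree. The directory name deeper is invented for the demonstration.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub_path/deeper"
echo WXYZ1 > "$workdir/sub_path/1"
echo WXYZ2 > "$workdir/sub_path/deeper/2"

# ls lists only the immediate children of the directory it is given.
top_level=$(ls "$workdir/sub_path")

# find descends the whole subtree; -type f keeps regular files only.
# Paths are printed relative to the starting point because of the cd.
all_files=$(cd "$workdir" && find sub_path -type f | sort)

echo "ls sees:"
echo "$top_level"
echo "find sees:"
echo "$all_files"

rm -rf "$workdir"
```

ls reports the entries 1 and deeper, while find reports both files, including the one nested a level further down.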
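xargs -L 1 -I '@' runs find once for each of these paths, substituting the path for @. (-I on its own already consumes one input line per invocation, so -L 1 is redundant; GNU xargs prints a warning that the two options are mutually exclusive.)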
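A quick way to see what -I does, sketched with printf standing in for find so that nothing is actually searched. The two input lines are invented.

```shell
# Two input lines, one of which contains a space.
# With -I, each whole line replaces '@' and arrives as a single argument.
calls=$(printf '%s\n' 'dir one' 'dir_two' | xargs -I '@' printf '[%s]\n' '@')

echo "$calls"
```

Each input line produces its own bracketed line of output, which shows that the command ran once per line and that the embedded space did not split the argument.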
-regex keeps only those files whose whole path matches the regex.
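Because -regex is tested against the entire path rather than just the file name, the pattern needs the surrounding .* parts. The sketch below assumes a find that supports -regex (GNU and BSD find do; it is not in POSIX) and uses an invented two-directory layout.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub_path" "$workdir/sub_path_2"
: > "$workdir/sub_path/1"
: > "$workdir/sub_path_2/3"

# -regex must match the entire path, hence the leading and trailing '.*'.
whole=$(cd "$workdir" && find . -type f -regex '.*path/.*')

# Without the wildcards nothing matches: no path is exactly 'path/'.
bare=$(cd "$workdir" && find . -type f -regex 'path/')

# -name, by contrast, is tested against the base name only.
by_name=$(cd "$workdir" && find . -type f -name '3')

echo "regex with wildcards: $whole"
echo "regex without:        $bare"
echo "by name:              $by_name"

rm -rf "$workdir"
```

Only ./sub_path/1 matches the first pattern, since sub_path_2/ does not contain the text path/. If the goal is to filter on the file name alone, -name with a glob is often the simpler choice.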
-exec ... {} + runs the command with {} replaced by the matching file names. -exec must be terminated by either + or \; and the + form passes many file names at once, so the resulting command is sed 's/XYZ/ABC/g' file1 file2 ...
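The two ways of terminating -exec can be compared with a small sketch that uses echo in place of sed and three invented empty files.

```shell
workdir=$(mktemp -d)
: > "$workdir/a"
: > "$workdir/b"
: > "$workdir/c"

# '{} +' appends as many file names as fit onto one command line,
# so echo runs once and prints one line.
batched=$(cd "$workdir" && find . -type f -exec echo run: {} + | wc -l | tr -d ' ')

# '{} \;' starts the command once per file, so echo prints three lines.
per_file=$(cd "$workdir" && find . -type f -exec echo run: {} \; | wc -l | tr -d ' ')

echo "invocations with +:  $batched"
echo "invocations with \\;: $per_file"

rm -rf "$workdir"
```

For sed without -i the printed result is the same either way, since sed simply processes its files in sequence. The + form just starts fewer processes.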
Example:
$ tree
.
├── [ 100] rp/
│   ├── [ 60] test_dir1/
│   │   └── [ 60] sub_path/
│   │       └── [ 6] 1
│   ├── [ 60] test_dir2/
│   │   └── [ 60] sub_path/
│   │       └── [ 6] 2
│   └── [ 60] test_dir3/
│       └── [ 60] sub_path_2/
│           └── [ 6] 3
└── [ 101] filelist.txt

7 directories, 4 files
$ find rp -type f | xargs tail
==> rp/test_dir3/sub_path_2/3 <==
WXYZ3
==> rp/test_dir2/sub_path/2 <==
WXYZ2
==> rp/test_dir1/sub_path/1 <==
WXYZ1
$ cat filelist.txt
"$ROOT_PATH/test_dir1/sub_path/"
"$ROOT_PATH/test_dir2/sub_path/"
"$ROOT_PATH/test_dir3/sub_path_2/"
$ export ROOT_PATH=./rp
$ cat filelist.txt \
| envsubst \
| tr -d '"' \
| xargs -L 1 -I '@' find '@' \
-type f \
-regex '.*path/.*' \
-exec sed 's/XYZ/ABC/g' {} +
WABC1
WABC2
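For anyone who wants to try this without touching real data, here is a self-contained sketch that rebuilds the example tree in a temporary directory. It swaps xargs for a while read loop, which is my own variation rather than part of the pipeline above. It assumes envsubst is installed and that find supports -regex.

```shell
# Rebuild the example layout in a scratch directory.
workdir=$(mktemp -d)
for d in test_dir1/sub_path test_dir2/sub_path test_dir3/sub_path_2; do
    mkdir -p "$workdir/rp/$d"
done
echo WXYZ1 > "$workdir/rp/test_dir1/sub_path/1"
echo WXYZ2 > "$workdir/rp/test_dir2/sub_path/2"
echo WXYZ3 > "$workdir/rp/test_dir3/sub_path_2/3"

# Quoted heredoc delimiter: $ROOT_PATH is written to the file literally.
cat > "$workdir/filelist.txt" <<'EOF'
"$ROOT_PATH/test_dir1/sub_path/"
"$ROOT_PATH/test_dir2/sub_path/"
"$ROOT_PATH/test_dir3/sub_path_2/"
EOF

ROOT_PATH=./rp
export ROOT_PATH

result=$(
    cd "$workdir" || exit 1
    # Expand variables, drop the quotes, then run find once per line.
    envsubst < filelist.txt | tr -d '"' |
    while IFS= read -r dir; do
        find "$dir" -type f -regex '.*path/.*' -exec sed 's/XYZ/ABC/g' {} +
    done
)

echo "$result"
rm -rf "$workdir"
```

As in the example above, the file under sub_path_2/ is skipped because its path does not contain the text path/. Reading with IFS= read -r keeps each line intact, including any spaces or backslashes in a directory name.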