I am reading a list of text files from a folder named test like this:
file_list="$(ls ~/Desktop/test |
while read path; do basename "$path"; done)"
This will produce a list of these files:
test_1.txt
test_2.txt
I want to change a particular string in each name, replacing test with this, so that the list would then contain:
this_1.txt
this_2.txt
I would like to do this directly in file_list. I don't want to rename the actual files in the folder on the computer.
Is looping through one by one the most efficient way to do this?
CodePudding user response:
Is looping through one by one the most efficient way to [perform substitutions on the filenames]?
No, nor is it the most efficient way to extract the base names. Nor, for that matter, is it wise to parse the output of ls, though this is a relatively benign case. If you want to massage a list of filenames, then passing the whole list through one sed or awk process is a better approach. For example:
file_list="$(
find ~/Desktop/test -mindepth 1 -maxdepth 1 -not -name '.*' |
sed 's,^.*/,,; s,^test,this,'
)"
That find command outputs paths to the non-dotfiles in the specified directory, one per line, much as ls would do. sed then attempts two substitutions on each one: the first removes everything up to and including the last / character, à la basename, and the second substitutes this for test where the latter appears at the beginning of what's left of the line.
Note also that this approach, like your original one, will have issues with filenames containing newlines, because the newline is what separates the entries. It has no inherent issue with file names containing other whitespace, but you will have trouble interpreting the results correctly if any of the file names contain whitespace and you later split the list on whitespace in general rather than on newlines only.
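As a rough sketch of the pipeline above, the following runs it against a throwaway directory rather than ~/Desktop/test. The directory and file names are invented for illustration, the sort step is added only to make the order predictable, and the use of mapfile (Bash 4 or later) to split the result into an array is my own assumption about how the list might be consumed. No file on disk is renamed.

```shell
#!/bin/bash
# Hypothetical sandbox standing in for ~/Desktop/test.
dir="$(mktemp -d)"
touch "$dir/test_1.txt" "$dir/test_2.txt" "$dir/.hidden"

# One find and one sed process handle the whole list; nothing on disk changes.
# sort is added only for a predictable order (find itself does not sort).
file_list="$(
  find "$dir" -mindepth 1 -maxdepth 1 -not -name '.*' |
  sed 's,^.*/,,; s,^test,this,' |
  sort
)"
printf '%s\n' "$file_list"

# Optional: split the newline-separated string into an array (Bash 4+).
mapfile -t names <<< "$file_list"
echo "count: ${#names[@]}"

rm -rf "$dir"
```

Splitting on newlines only, as mapfile -t does here, keeps names with spaces intact, but names containing newlines would still be broken apart.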
CodePudding user response:
Solved here: https://unix.stackexchange.com/questions/36795/find-sed-search-and-replace
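You can do it in one line, using find with -exec and multiple sed commands separated by ;:
find . -exec sed -i '' 's/\([^/.]*\)\..*/\1/g;s?users/uname?gs://uname?g' {} \;
The first sed command, s/\([^/.]*\)\..*/\1/g, removes everything from the first . onward.
The second sed command, s?users/uname?gs://uname?g, does the substitution.
Be aware that sed -i edits file contents in place (the empty '' argument is the BSD/macOS form of -i), so this command operates on the files themselves rather than on a list of names.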
Parsing ls output is bad practice.
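To relate the idea of chaining sed commands with ; back to the question, here is a minimal sketch that applies two substitutions to a list of names on standard input, so no file is read or modified. The sample names and the variable name renamed are assumptions made for the example.

```shell
#!/bin/bash
# Sample names (hypothetical) are fed to sed on stdin; no files are touched.
renamed="$(
  printf '%s\n' test_1.txt test_2.txt |
  sed 's/\..*//; s/^test/this/'   # 1) drop from the first dot on, 2) test -> this
)"
printf '%s\n' "$renamed"
```

Dropping the first expression keeps the .txt extensions, which is closer to what the question asks for.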
CodePudding user response:
You don't need either loops or external commands (like basename, find, and sed). Try this Shellcheck-clean code:
#! /bin/bash -p
shopt -s nullglob
files=( ~/Desktop/test/* )
bases=( "${files[@]##*/}" )
this_list="${bases[*]//test/this}"
declare -p this_list
- shopt -s nullglob makes globs expand to nothing when no files match a pattern. Without it, a glob that matches nothing is left in place as the literal pattern, which amounts to garbage.
- files=( ~/Desktop/test/* ) populates an array called files with the paths to all the files (and directories) in the ~/Desktop/test directory (( ~/Desktop/test/test_1.txt ... )). Note that files whose names begin with a dot (.) are excluded. They can be included by running shopt -s dotglob earlier in the program.
- bases=( "${files[@]##*/}" ) populates the bases array with the basenames of the files in the files array (( test_1.txt ... )). See Parameter expansion [Bash Hackers Wiki] for information about what the ## is doing.
  - If you wanted to remove the .txt extensions, as suggested in one of the comments, you could add an extra stage to the process: stems=( "${bases[@]%.txt}" ). It's not possible to do multiple string operations (e.g. ## and %) at once in Bash.
- this_list="${bases[*]//test/this}" populates the this_list string with all the entries in bases, with all occurrences of test in each of them replaced by this ("this_1.txt ..."). Again, see Parameter expansion [Bash Hackers Wiki] for details of how this works. The entries in the list are separated by spaces. The entries in the list in the question were separated by newlines. You can do that for this_list by setting IFS=$'\n' before doing the this_list=... assignment. See Modify IFS in bash while building and array, What is the exact meaning of IFS=$'\n'?, and Is it a sane approach to "back up" the $IFS variable?. The first character in the value of IFS is used to separate array elements when converting an array to a string with "${arrayname[*]}".
- declare -p this_list shows the contents of this_list in an unambiguous way.
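The newline-separated variant and the optional .txt-stripping stage described above might look roughly like this. The temporary directory, the sample file names, and the join_lines helper are all hypothetical; the helper exists only to confine the IFS change to a function instead of changing it for the whole script.

```shell
#!/bin/bash
shopt -s nullglob

# Hypothetical sandbox standing in for ~/Desktop/test.
dir="$(mktemp -d)"
touch "$dir/test_1.txt" "$dir/test_2.txt"

files=( "$dir"/* )              # full paths, in glob (sorted) order
bases=( "${files[@]##*/}" )     # strip the directory part
stems=( "${bases[@]%.txt}" )    # optional extra stage: drop the .txt suffix

# join_lines is a hypothetical helper that joins its arguments with newlines.
# 'local IFS' keeps the IFS change from leaking into the rest of the script.
join_lines() {
  local IFS=$'\n'
  printf '%s\n' "$*"
}

this_list="$(join_lines "${bases[@]//test/this}")"
stem_list="$(join_lines "${stems[@]//test/this}")"
declare -p this_list stem_list

rm -rf "$dir"
```

Because the directory part is stripped before the substitution, a directory path that happens to contain test is not affected.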
A few general points:
- Never use ls in programs. It's for interactive use only. You might get away with using it in programs sometimes, but it will eventually bite you hard. See Why you shouldn't parse the output of ls(1) and Why not parse 'ls' (and what to do instead)?.
- Avoid putting lists of files in strings. Use arrays instead. File paths can contain any character that a string can hold (neither can contain the NUL character). As a result, there is no character, or combination of characters, that can be safely used to separate arbitrary file paths in a string. The problem can be overcome by quoting the file paths in various ways, but that introduces more problems.
- The "most efficient" way to do this depends on the number of files that need to be processed (among other things). The code in this answer runs in 0.2s against a directory containing 10 thousand files under Cygwin (which is generally much slower than Linux) on a low-end machine. That would be good enough for me. Bash is generally slow though, and the sorting done as part of glob expansion can be very slow when there are huge numbers of files. If you've got hundreds of thousands of files the pure Bash code might become unusable. A combination of find and sed should be able to handle much larger numbers of files, but Bash might struggle to handle the resulting huge strings (or arrays) anyway.
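To illustrate the advice about arrays, here is a minimal sketch that keeps every name as its own array element, so names containing spaces or even newlines survive intact. It assumes Bash 4.4 or later (for mapfile -d '') and a find that supports -print0. The sandbox directory and file names are invented, and the substitution replaces only a leading test, which is a choice made for this example.

```shell
#!/bin/bash
# Hypothetical sandbox; one name contains a space and one contains a newline.
dir="$(mktemp -d)"
touch "$dir/test one.txt" "$dir/test"$'\n'"two.txt"

# Read NUL-delimited paths from find into an array (mapfile -d needs Bash 4.4+).
mapfile -d '' -t paths < <(find "$dir" -mindepth 1 -maxdepth 1 -print0)

names=( "${paths[@]##*/}" )            # basenames, one array element per file
renamed=( "${names[@]/#test/this}" )   # replace a leading "test" only

printf 'entries: %d\n' "${#renamed[@]}"
printf '<%q>\n' "${renamed[@]}"        # %q shows each entry unambiguously

rm -rf "$dir"
```

The order of the entries follows find, which does not sort its output.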