Replace parts of string in filename bash

Time:03-05

I have a folder, test, containing text files, and I read their names into a list like this:

file_list="$(ls ~/Desktop/test | 
while read path; do basename "$path"; done)"

This will produce a list of these files:

test_1.txt test_2.txt

I want to change a particular string in each name, specifically test to this, so that the list would then contain:

this_1.txt this_2.txt

I would like to do this directly in file_list; I don't want to rename the actual files in the folder on the computer.

Is looping through one by one the most efficient way to do this?

CodePudding user response:

Is looping through one by one the most efficient way to [perform substitutions on the filenames]?

No, nor is it the most efficient way to extract the base names. Nor, for that matter, is it wise to parse the output of ls, though this is a relatively benign case. If you want to massage a list of filenames, then passing the whole list through one sed or awk process is a better approach. For example:

file_list="$(
  find ~/Desktop/test -mindepth 1 -maxdepth 1 -not -name '.*' | 
    sed 's,^.*/,,; s,^test,this,'
)"

That find command outputs paths to the non-dotfiles in the specified directory, one per line, much as ls would do. sed then attempts two substitutions on each one: the first removes everything up to and including the last / character, à la basename, and the second substitutes this for test where the latter appears at the beginning of what's left of the line.

Note also that this approach, like your original one, will have issues with filenames containing newlines. It doesn't have an inherent issue with file names containing other whitespace, but you will have trouble interpreting the results correctly if any of the file names contain whitespace.
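One way around the whitespace and newline caveat is to keep the names NUL-delimited from end to end and read them into an array instead of a string. The following is a minimal sketch, assuming GNU find, GNU sed (for -z), GNU sort (for -z) and Bash 4.4 or later (for mapfile -d ''). The temporary directory and its file names are made up for the demonstration; the question uses ~/Desktop/test.

```shell
#!/bin/bash
# Hypothetical demo directory standing in for ~/Desktop/test.
dir="$(mktemp -d)"
touch "$dir/test_1.txt" "$dir/test_2.txt" "$dir/test 3.txt"

# -print0, sed -z and sort -z keep every name NUL-terminated, so spaces
# and even newlines inside a name cannot be mistaken for separators.
# mapfile -d '' then stores one name per array element.
mapfile -d '' -t file_list < <(
  find "$dir" -mindepth 1 -maxdepth 1 -not -name '.*' -print0 |
    sed -z 's,^.*/,,; s,^test,this,' |
    LC_ALL=C sort -z
)

# Print one name per line, for display only.
printf '%s\n' "${file_list[@]}"

rm -r "$dir"
```

Each element of file_list holds exactly one name, so "${file_list[@]}" can be iterated without any guessing about where one name ends and the next begins.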

CodePudding user response:

Solved here: https://unix.stackexchange.com/questions/36795/find-sed-search-and-replace

You can do it in one line, using find with -exec and multiple sed commands separated by ;:

find . -exec sed -i '' 's/\([^/.]*\)\..*/\1/g;s?users/uname?gs://uname?g' {} \;

The first sed command, s/\([^/.]*\)\..*/\1/g, removes everything from the first . that follows a name component onward.

The second sed command, s?users/uname?gs://uname?g, replaces users/uname with gs://uname.
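To see what the two substitutions do without touching any files, the same sed script can be run on a single line of text. This is only an illustrative sketch: the input path users/uname/report.txt is a made-up example, not something from the original answer.

```shell
#!/bin/sh
# Hypothetical input line; sed reads it from the pipe, so no file is modified.
result="$(printf '%s\n' 'users/uname/report.txt' |
  sed 's/\([^/.]*\)\..*/\1/g;s?users/uname?gs://uname?g')"

# The first command drops ".txt"; the second rewrites the leading path.
printf '%s\n' "$result"
```

Note that, unlike this dry run, the find command above edits file contents in place, and that -i '' is the BSD/macOS spelling of the in-place option (GNU sed takes -i with no separate argument).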

Parsing ls output is bad practice.

CodePudding user response:

You don't need either loops or external commands (like basename, find, and sed). Try this Shellcheck-clean code:

#! /bin/bash -p

shopt -s nullglob

files=( ~/Desktop/test/* )
bases=( "${files[@]##*/}" )
this_list="${bases[*]//test/this}"

declare -p this_list
  • shopt -s nullglob makes globs expand to nothing when no files match a pattern. Without it globs expand to (what amounts to) garbage when nothing matches.
  • files=( ~/Desktop/test/* ) populates an array called files with the paths to all the files (and directories) in the ~/Desktop/test directory ( (~/Desktop/test/test_1.txt ...) ). Note that files whose names begin with a dot (.) are excluded. They can be included by running shopt -s dotglob earlier in the program.
  • bases=( "${files[@]##*/}" ) populates the bases array with the basenames of the files in the files array ( ( test_1.txt ... ) ). See Parameter expansion [Bash Hackers Wiki] for information about what the ## is doing.
  • If you wanted to remove the .txt extensions, as suggested in one of the comments, you could add an extra stage to the process: stems=( "${bases[@]%.txt}" ). It's not possible to combine multiple string operations (e.g. ## and %) in a single parameter expansion in Bash.
  • this_list="${bases[*]//test/this}" populates the this_list string with all the entries in bases with all occurrences of test in each of them replaced by this ( "this_1.txt ..." ). Again, see Parameter expansion [Bash Hackers Wiki] for details of how this works. The entries in the list are separated by spaces. The entries in the list in the question were separated by newlines. You can do that for this_list by setting IFS=$'\n' before doing the this_list=... assignment. See Modify IFS in bash while building and array, What is the exact meaning of IFS=$'\n'?, and Is it a sane approach to "back up" the $IFS variable?. The first character in the value of IFS is used to separate array elements when converting an array to a string with "${arrayname[*]}".
  • declare -p this_list shows the contents of this_list in an unambiguous way.
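The newline-separated variant and the optional .txt-stripping stage described above can be sketched as follows. This is one possible arrangement, not the only one: it sets IFS inside a command substitution so that the change is confined to a subshell and no backup of IFS is needed. The temporary directory is a hypothetical stand-in for ~/Desktop/test.

```shell
#!/bin/bash
shopt -s nullglob

# Hypothetical demo directory standing in for ~/Desktop/test.
dir="$(mktemp -d)"
touch "$dir/test_1.txt" "$dir/test_2.txt"

files=( "$dir"/* )
bases=( "${files[@]##*/}" )    # strip the directory part of each path
stems=( "${bases[@]%.txt}" )   # optional extra stage: strip the .txt suffix

# IFS is changed only inside the $(...) subshell; "${bases[*]...}" joins
# the substituted elements with the first character of IFS (a newline).
this_list="$(IFS=$'\n'; printf '%s' "${bases[*]//test/this}")"

declare -p this_list stems

rm -r "$dir"
```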

A few general points:

  • Never use ls in programs. It's for interactive use only. You might get away with using it in programs sometimes, but it will eventually bite you hard. See Why you shouldn't parse the output of ls(1) and Why not parse 'ls' (and what to do instead)?.
  • Avoid putting lists of files in strings. Use arrays instead. File paths can contain any character that a string can hold (neither can contain the NUL character). As a result, there is no character, or combination of characters, that can safely be used to separate arbitrary file paths in a string. The problem can be overcome by quoting the file paths in various ways, but that introduces more problems.
  • The "most efficient" way to do this depends on the number of files that need to be processed (among other things). The code in this answer runs in 0.2s against a directory containing 10 thousand files under Cygwin (which is generally much slower than Linux) on a low-end machine. That would be good enough for me. Bash is generally slow though, and the sorting done as part of glob expansion can be very slow when there are huge numbers of files. If you've got hundreds of thousands of files the pure Bash code might become unusable. A combination of find and sed should be able to handle much larger numbers of files, but Bash might struggle to handle the resulting huge strings (or arrays) anyway.
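As an illustration of the advice to prefer arrays over strings, the substitution can be applied with [@] instead of [*], which keeps one array element per file name. This is a minimal sketch using a made-up directory and file names (one of them containing a space); the array name renamed is also just an assumption for the example.

```shell
#!/bin/bash
shopt -s nullglob

# Hypothetical demo directory standing in for ~/Desktop/test.
dir="$(mktemp -d)"
touch "$dir/test_1.txt" "$dir/test two.txt"

files=( "$dir"/* )
bases=( "${files[@]##*/}" )
renamed=( "${bases[@]//test/this}" )   # still one element per file name

# Quoted "${renamed[@]}" yields each name intact, embedded space included.
for name in "${renamed[@]}"; do
  printf '<%s>\n' "$name"
done

rm -r "$dir"
```

Because the names never pass through a delimiter-separated string, the name with the space arrives in the loop as a single item.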