How to find specific text in a text file, and append it to the filename?

I have a collection of plain text files named yymmdd_nnnnnnnnnn.txt. I want to append another number sequence to each filename, so that each file is named yymmdd_nnnnnnnnnn_iiiiiiiii.txt instead, where iiiiiiiii is taken from the one line in each file that ends with the text "GST: 123456789⏎" (or similar). While I am sure that there will be only one such matching line in each file, I don't know exactly which line it will be on.

I need an elegant one-liner solution that I can run over the collection of files in a folder, from a bash script file, to rename each file in the collection by appending to its filename the specific GST number found within that file.

Before even getting to the renaming stage, I have encountered a problem with this. Here is what I tried, which didn't work...

# awk '/\d $/' | grep -E 'GST: ' 150101_2224567890.txt

The grep command alone works perfectly to find the relevant line within the file, but the awk doesn't return just the final digits group. It fails with the error "warning: regexp escape sequence \d is not a known regexp operator". I had assumed that this regex should return any number of digits which are at the end of the line. The text file in question contains a line which ends with "GST: 112060340⏎". Can someone please show me how to make this work, and maybe also to help with the appropriate coding to move the collection of files to the new filenames? Thanks.
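On the original attempt: awk's regex dialect (POSIX ERE) does not define \d, which is why gawk prints that warning; [0-9] or [[:digit:]] is the portable spelling. The stages are also in the wrong order, because awk placed first waits on standard input while grep reads the file on its own. And a bare awk pattern prints the whole matching line, not just the matched digits. A minimal sketch of a corrected pipeline, assuming a sample file that is created here purely for illustration:

```shell
# Hypothetical sample file standing in for one of the real text files.
tmpdir=$(mktemp -d)
printf 'Invoice 42\nGST: 112060340\nTotal: 10.00\n' > "$tmpdir/150101_2224567890.txt"

# grep selects the line first; awk then extracts only the trailing digit run.
# match() sets RSTART and RLENGTH, so substr() returns just the matched text.
gst=$(grep -E 'GST: ' "$tmpdir/150101_2224567890.txt" |
      awk 'match($0, /[0-9]+$/) { print substr($0, RSTART, RLENGTH) }')
echo "$gst"

rm -rf "$tmpdir"
```

With the sample line above this prints 112060340. If the real files use Windows line endings, the trailing carriage return would need to be stripped before the `$` anchor can match.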

Thanks to a comment from @Renaud, I now have the following code working to obtain just the GST registration number from within a text file, which puts me a step closer towards a workable solution.

awk '/GST: / {printf $NF}' 150101_2224567890.txt

I still need to loop this over the collection instead of just specifying one filename. I also need to be able to use the output from @Renaud's contribution, to rename the files. I'm getting closer to a working solution, thanks!

CodePudding user response：

This awk should work for you:

awk '$1=="GST:" {fn=FILENAME; sub(/\.txt$/, "", fn); print "mv", FILENAME, fn "_" $2 ".txt"; nextfile}' *_*.txt | sh

To make it more readable:

awk '$1 == "GST:" {
   fn = FILENAME
   sub(/\.txt$/, "", fn)
   print "mv", FILENAME, fn "_" $2 ".txt"
   nextfile
}' *_*.txt | sh

Remove | sh from the end to print the generated mv commands for review instead of running them.

CodePudding user response：

You may try

for f in *_*.txt; do echo mv "$f" "${f%.txt}_$(sed '/.*GST: /!d; s///; q' "$f").txt"; done

Drop the echo if you're satisfied with the output.
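To see how that sed script behaves, here is a small dry run against a scratch directory. The sample file and its contents are assumptions made up for this sketch, not the asker's real data.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Header\nGST: 123456789\nFooter\n' > "$tmpdir/150101_2224567890.txt"

# The sed script, step by step:
#   /.*GST: /!d   delete every line that does not contain "GST: "
#   s///          an empty pattern reuses the last regex, so this strips
#                 everything up to and including "GST: "
#   q             print what is left and quit after the first match
cmds=$(cd "$tmpdir" && for f in *_*.txt; do
    echo mv "$f" "${f%.txt}_$(sed '/.*GST: /!d; s///; q' "$f").txt"
done)
echo "$cmds"

rm -rf "$tmpdir"
```

Because of the echo, nothing is renamed; the loop only prints the mv command it would run, here `mv 150101_2224567890.txt 150101_2224567890_123456789.txt`.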

CodePudding user response：

As you are sure there is only one matching line, you can try:

$ n=$(awk '/GST:/ {print $NF}' 150101_2224567890.txt)
$ mv 150101_2224567890.txt "150101_2224567890_$n.txt"

Or, for all .txt files:

for f in *.txt; do
  n=$(awk '/GST:/ {print $NF}' "$f")
  if [[ -z "$n" ]]; then
    printf '%s: GST not found\n' "$f"
    continue
  fi
  mv "$f" "${f%.txt}_$n.txt"
done
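A slightly more defensive variant of the same loop, as a sketch only: it stops reading at the first matching line and refuses to rename unless the extracted value is purely digits. The two sample files and the scratch directory are assumptions invented for the demonstration.

```shell
# Hypothetical sample files: one with a GST line, one without.
tmpdir=$(mktemp -d)
printf 'Header\nGST: 123456789\n' > "$tmpdir/150101_2224567890.txt"
printf 'no number here\n' > "$tmpdir/150102_2224567891.txt"

(
  cd "$tmpdir" || exit 1
  for f in *.txt; do
    # Last field of the first line containing "GST:", then stop reading.
    n=$(awk '/GST:/ { print $NF; exit }' "$f")
    case $n in
      ''|*[!0-9]*)
        # Empty, or contains something other than digits: leave the file alone.
        printf '%s: GST not found\n' "$f"
        continue ;;
    esac
    mv -- "$f" "${f%.txt}_$n.txt"
  done
)
ls "$tmpdir"
```

The first file is renamed with its number appended, while the second is reported and left untouched. Note that running a `*.txt` loop a second time would match the already renamed files and append the number again, so it is worth running it only once per folder or narrowing the glob.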

CodePudding user response：

Another one-line solution to consider, although perhaps not so elegant.

for original_filename in *_*.txt; do \
new_filename=${original_filename%'.txt'}_$(
    grep -E 'GST: ' "$original_filename" | \
    sed -E 's/.*GST//g; s/[^0-9]//g' 
    )'.txt' && \
mv "$original_filename" "$new_filename"; \
done

Output:

150101_2224567890_123456789.txt