Get the last occurrence of regex pattern match using Bash

Time:09-27

I am trying to get the width and height from the names of files in a shell script and create matching directories. The width and height contained in the filename could be 3 or 4 digits long.

I don't care what is in between the two sets of digits, but they typically follow these examples:

111x111.html
222x2222.html
3333x333.html
444-x-444.html
555-x-5555.html
6666-x-666.html

My current script gets the width easily, but not the height. (I left in some of my grep/sed attempts, which return either the full filename or nothing at all.)

for htmlfile in *.html; do
  width=$(echo "$htmlfile" | sed -E "s/([0-9]*).*/\\1/")


  #height=$(echo "$htmlfile" | sed -E "s/\(.*\)([0-9]*)\.html/\\1/")
  # returned blank
  #height=$(echo "$htmlfile" | grep -oP '^[0-9]*\K\[0-9]*(?=\.html)')
  # returned full filename
  #height=$(grep -E "[0-9]{3,4}" "$htmlfile" | tail -n 1)
  # returned blank


  echo $width
  echo $height
  # ^^^ I need these to use in the script for later, not just print to screen
done

CodePudding user response:

Would you please try the following:

#!/bin/bash

for htmlfile in *.html; do
    if [[ $htmlfile =~ ([0-9]+)[^0-9]+([0-9]+)\.html ]]; then
        echo "width = ${BASH_REMATCH[1]} height = ${BASH_REMATCH[2]}"
    fi
done

Result with provided example files:

width = 111 height = 111
width = 222 height = 2222
width = 3333 height = 333
width = 444 height = 444
width = 555 height = 5555
width = 6666 height = 666
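
If you need the values in variables for later use rather than printed, a minimal sketch along the same lines might look like the following. The helper name parse_dims and the WIDTHxHEIGHT directory naming are assumptions for illustration, not something the question specifies, and the pattern is anchored with ^ and $, which is slightly stricter than the test above.

```bash
#!/bin/bash

# parse_dims is a hypothetical helper name (an assumption, not from the question).
# On a match it sets the global variables width and height and returns 0;
# otherwise it returns 1 and leaves them untouched.
parse_dims() {
    local re='^([0-9]+)[^0-9]+([0-9]+)\.html$'
    if [[ $1 =~ $re ]]; then
        width=${BASH_REMATCH[1]}
        height=${BASH_REMATCH[2]}
        return 0
    fi
    return 1
}

for htmlfile in *.html; do
    parse_dims "$htmlfile" || continue    # skip names that do not match
    # Assumed naming scheme: one directory per WIDTHxHEIGHT pair.
    mkdir -p -- "${width}x${height}"
done
```

Keeping the regex in a variable avoids quoting surprises inside [[ ... =~ ... ]], and the variables width and height remain set after each successful call, so the rest of the loop body can use them.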

CodePudding user response:

Using sed

height=$(sed -E 's/[^x]*[^0-9]*([0-9]*).*/\1/' <<< "$htmlfile")
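
The same idea can probably be extended to pull out both numbers with a single sed call. This is only a sketch: dims_with_sed is a hypothetical helper name, and it assumes the filenames look like the examples in the question (digits, some non-digits, digits, then .html).

```bash
#!/bin/bash

# dims_with_sed is a hypothetical helper (an assumption for illustration).
# It prints "WIDTH HEIGHT" for a matching name and nothing otherwise,
# because -n suppresses default output and the p flag prints only on a match.
dims_with_sed() {
    sed -nE 's/^([0-9]+)[^0-9]+([0-9]+)\.html$/\1 \2/p' <<< "$1"
}

# Split the two fields into shell variables for later use.
read -r width height < <(dims_with_sed "222x2222.html")
echo "width=$width height=$height"
# prints: width=222 height=2222
```

Since a non-matching name produces no output, read leaves both variables empty, which is easy to test for before creating any directories.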

CodePudding user response:

Using any awk (another mandatory POSIX tool just like grep and sed):

$ awk 'BEGIN{for (i=1; i<ARGC; i++) { n=split(ARGV[i],f,/[^0-9]+/); print f[1], f[n-1] } exit}' *.html
111 111
222 2222
3333 333
444 444
555 5555
6666 666

If you need the dimensions and file names to use in a shell loop then it'd be:

$ while read -r width height fname; do
    echo "width=$width, height=$height, fname=$fname"
done < <(
    awk 'BEGIN {
        for (i=1; i<ARGC; i++) {
            n = split(ARGV[i],f,/[^0-9]+/)
            print f[1], f[n-1], ARGV[i]
        }
        exit
    }' *.html
)
width=111, height=111, fname=111x111.html
width=222, height=2222, fname=222x2222.html
width=3333, height=333, fname=3333x333.html
width=444, height=444, fname=444-x-444.html
width=555, height=5555, fname=555-x-5555.html
width=6666, height=666, fname=6666-x-666.html

If that's not all you need then edit your question to clarify your requirements.
