Extract multiple values from fixed pattern string

Time:12-28

I want to parse three pieces of information from the last line of wget's output. For example:

2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]

I was able to get the date/time, since it has a fixed length, but I am unable to extract the estimated transfer speed (13.7) and the file size (1077022).

STR="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
echo "date/time is ${STR::19}"
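Staying with parameter expansion, the other two values can also be cut out without any regex, because they sit between fixed delimiters: the speed follows the first ( and the size sits between the last [ and the closing ]. A minimal sketch, assuming every line has exactly the shape shown above (the variable names are chosen for illustration):

```bash
#!/usr/bin/env bash
# Assumes the line always looks like:
#   <date> <time> (<speed> <unit>) - ‘<file>’ saved [<size>]
str="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

datetime=${str:0:19}      # fixed-width prefix, as in the question
speed=${str#*"("}         # remove everything up to and including the first "("
speed=${speed%%" "*}      # keep only what precedes the first space
size=${str##*"["}         # remove everything up to and including the last "["
size=${size%"]"}          # remove the trailing "]"

echo "date/time is $datetime"
echo "transfer speed is $speed"
echo "file size is $size"
```

The quotes around ( and [ inside the patterns make them literal, so the expansions do not depend on extglob being on or off.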

I imagine the remaining substring extractions will need to be done with the help of regular expressions, but I am unable to figure it out. Is there a viable path using only *nix utils like awk, sed, etc.?

I tried awk:

echo "(13.7 Mb/s)" | awk '$0 ~ /(.* Mb\/s)/ {print $1}'

But I get "(13.7" instead of just the number.
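The /regex/ in front of the action only chooses which lines the action runs on; it does not extract anything, and $1 is still the first blank-separated field of the line, i.e. "(13.7". Two small sketches of how the attempt could be adjusted, assuming the input is always just "(number unit)":

```bash
# Option 1: keep the line filter, but strip the leading "(" from $1 before printing.
speed=$(echo "(13.7 Mb/s)" | awk '$0 ~ / Mb\/s\)/ { sub(/^\(/, "", $1); print $1 }')
echo "$speed"

# Option 2: treat "(" as a field separator as well as space,
# so the line splits into "", "13.7", "Mb/s)" and the number is field 2.
speed2=$(echo "(13.7 Mb/s)" | awk -F '[( ]' '{ print $2 }')
echo "$speed2"
```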

Answer:

You could do this with bash's regular expression matching, using ( ) in the RE to capture the relevant parts and ${BASH_REMATCH[n]} to get them:

str="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

pattern='([-0-9]+ [:0-9]+) \(([^)]+)\) .*\[([0-9]+)\]'
if [[ "$str" =~ $pattern ]]; then
    echo "date/time is ${BASH_REMATCH[1]}"
    echo "transfer speed is ${BASH_REMATCH[2]}"
    echo "file size is ${BASH_REMATCH[3]}"
else
    echo "The string is not in the expected format"
fi
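Group 2 above captures the unit along with the number (13.7 Mb/s). If only the number is wanted, a variant of the same pattern could capture number and unit separately. This is a sketch assuming the speed is always a plain decimal number followed by one space and a unit; the extra group and the ^/$ anchors are additions, not part of the answer above:

```bash
#!/usr/bin/env bash
str="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

# Groups: 1 = date/time, 2 = speed number, 3 = speed unit, 4 = size in bytes
pattern='^([-0-9]+ [:0-9]+) \(([0-9.]+) ([^)]+)\) .*\[([0-9]+)\]$'
if [[ $str =~ $pattern ]]; then
    datetime=${BASH_REMATCH[1]}
    speed=${BASH_REMATCH[2]}
    unit=${BASH_REMATCH[3]}
    size=${BASH_REMATCH[4]}
    printf 'date/time: %s\nspeed: %s (%s)\nsize: %s\n' "$datetime" "$speed" "$unit" "$size"
else
    echo "The string is not in the expected format" >&2
fi
```

Keeping the pattern in a variable, as in the answer, avoids having to escape the parentheses and brackets inside [[ ]].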

BTW, I recommend using lower- or mixed-case variable names to avoid conflicts with the many all-caps names with special functions, and running your scripts through shellcheck.net to find common mistakes.

Answer:

This awk should work for you:

s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
awk -F '[][()[:blank:]]+' '{
  printf "DateTime: %s %s, Speed: %s, Size: %s\n", $1, $2, $3, $(NF-1)
}' <<< "$s"

DateTime: 2022-12-26 19:14:44, Speed: 13.7, Size: 1077022

Breakdown:

  • -F '[][()[:blank:]]+' sets one or more of [, ], (, ) or blank characters (space or tab) as the input field separator
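To see why the size is taken from $(NF-1) rather than $NF, one can print every field awk produces with this separator. Because the closing ] at the end of the line is itself a separator, awk creates an empty last field after it, so the size ends up one field before the end. A short sketch to inspect this:

```bash
s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

# Same field separator as above; print each field in quotes, then the field count.
awk -F '[][()[:blank:]]+' '{
  for (i = 1; i <= NF; i++) printf "$%d = \"%s\"\n", i, $i
  printf "NF = %d\n", NF
}' <<< "$s"
```

For this sample line it reports NF = 9, with $9 empty and $8 holding 1077022.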

Answer:

With your shown samples, please try the following awk code. It was written and tested in GNU awk, since the three-argument form of match() is a gawk extension. Here is the Online Demo for the regex used.

s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

awk '
match($0,/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) \(([^)]*).*\[([0-9]+)/,arr){
  print "DateTime: "arr[1] ", Speed: " arr[2] ", Size: "arr[3]
}
' <<< "$s"
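For awks other than gawk (for example mawk or BSD awk), a possible sketch uses the standard two-argument match(), which sets RSTART and RLENGTH, together with substr(). It also avoids {n} interval expressions, which some older awks do not support. It assumes the same line layout as the sample:

```bash
s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

result=$(awk '{
  datetime = speed = size = ""              # reset for every input line
  if (match($0, /^[0-9-]+ [0-9:]+/))        # date and time at line start
    datetime = substr($0, RSTART, RLENGTH)
  if (match($0, /\([0-9.]+ /))              # "(" + number + space
    speed = substr($0, RSTART + 1, RLENGTH - 2)
  if (match($0, /\[[0-9]+\]$/))             # "[" + digits + "]" at line end
    size = substr($0, RSTART + 1, RLENGTH - 2)
  print "DateTime: " datetime ", Speed: " speed ", Size: " size
}' <<< "$s")
echo "$result"
```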

Answer:

Using Perl and named capture groups:

s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

perl -nE '
    say join "\n",
    map { "$_: $+{$_}" }
    keys %+
    if m/
        (?<dateTime>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+\(
        (?<speed>\d+\.\d+).*\[
        (?<size>\d+)
    /x
' <<< "$s"

Output

speed: 13.7
dateTime: 2022-12-26 19:14:44
size: 1077022

Explanations

See regex101 description

Answer:

Using sed with 3 capture groups \1, \2 and \3 in the replacement:

sed -E 's/([^()]+) \(([^()]*)\).*\[([^][]*)]/date\/time: \1\nspeed: \2\nfile size: \3/' file

The pattern matches:

  • ([^()]+) Capture group 1: match one or more characters other than ( and )
  • \(([^()]*)\) Between (...), capture any characters except ( and ) in group 2
  • .* Match everything up to the final [...]
  • \[([^][]*)] Between [...], capture any characters except [ and ] in group 3

See the capture group values here on regex101

Output

date/time: 2022-12-26 19:14:44
speed: 13.7 Mb/s
file size: 1077022
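If the goal is to use the values in the script rather than print them, a sketch is to have sed emit them on one line with a delimiter and read them into shell variables. It assumes | never appears in the values, and the variable names are for illustration; using | instead of \n in the replacement also avoids relying on GNU sed, since other seds may not turn \n in the replacement into a newline:

```bash
s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"

# Same three capture groups as above, joined with "|" (assumed absent from the values).
IFS='|' read -r datetime speed size < <(
  sed -E 's/^([^()]+) \(([^()]*)\).*\[([^][]*)]$/\1|\2|\3/' <<< "$s"
)
printf 'date/time: %s\nspeed: %s\nfile size: %s\n' "$datetime" "$speed" "$size"
```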