I want to parse 3 pieces of information from the last bit of wget command output. For example:
2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]
I was able to get the date/time since that is fixed length. I am unable to extract the estimated transfer speed (13.7) and file size (1077022) values.
STR="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
echo "date/time is ${STR::19}"
I imagine the remaining substring extractions will need to be done with the help of regular expressions, but I am unable to figure it out. Is there a viable path using only *nix utils like awk, sed, etc.?
I tried awk:
echo "(13.7 Mb/s)" | awk '$0 ~ /(.* Mb\/s)/ {print $1}'
But I am getting (13.7 instead of just the number.
CodePudding user response:
You could do this with bash's regular expression matching, using ( ) in the RE to capture the relevant parts and ${BASH_REMATCH[n]} to get them:
str="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
pattern='([-0-9]+ [:0-9]+) \(([^)]+)\) .*\[([0-9]+)\]'
if [[ "$str" =~ $pattern ]]; then
    echo "date/time is ${BASH_REMATCH[1]}"
    echo "transfer speed is ${BASH_REMATCH[2]}"
    echo "file size is ${BASH_REMATCH[3]}"
else
    echo "The string is not in the expected format"
fi
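The question asked for the speed as a bare number, and a pattern like the one above captures the number together with its unit. As a minimal sketch, assuming the line always has the shape shown in the question, the speed group can be split into a numeric part and a unit part. The function name parse_wget_summary is hypothetical (it is not part of wget or bash), and the sketch requires bash because it relies on [[ =~ ]] and BASH_REMATCH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints date/time, numeric speed, speed unit and size,
# one per line. Returns 1 if the line does not have the expected shape.
parse_wget_summary() {
    local line=$1
    # Groups: 1 = date/time, 2 = speed number, 3 = speed unit, 4 = size
    local re='^([-0-9]+ [:0-9]+) \(([0-9.]+) ([^)]+)\) .*\[([0-9]+)\]$'
    [[ $line =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
                  "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}

str="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
if fields=$(parse_wget_summary "$str"); then
    printf '%s\n' "$fields"
else
    echo "The string is not in the expected format" >&2
fi
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ is what makes bash treat it as a regular expression rather than a literal string.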
BTW, I recommend using lower- or mixed-case variable names to avoid conflicts with the many all-caps names with special functions, and running your scripts through shellcheck.net to find common mistakes.
CodePudding user response:
This awk should work for you:
s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
awk -F '[][()[:blank:]]+' '{
    printf "DateTime: %s %s, Speed: %s, Size: %s\n", $1, $2, $3, $(NF-1)
}' <<< "$s"
DateTime: 2022-12-26 19:14:44, Speed: 13.7, Size: 1077022
Breakdown:
-F '[][()[:blank:]]+' sets one or more of [, ], (, ) or whitespace as the input field separator. Because the line ends with ], the last field is empty, so the size is in $(NF-1).
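To see how that separator carves up the line, it can help to print every field with its index. The following is only an illustrative sketch: show_fields is a hypothetical helper name, and it assumes an awk that understands POSIX character classes such as [:blank:].

```shell
# Hypothetical helper: print each awk field of a line as "index: <value>".
show_fields() {
    printf '%s\n' "$1" |
        awk -F '[][()[:blank:]]+' '{
            # Angle brackets make empty fields visible.
            for (i = 1; i <= NF; i++) printf "%d: <%s>\n", i, $i
        }'
}

s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
show_fields "$s"
```

With the sample line, field 3 holds the speed, the next-to-last field holds the size, and the final field is empty because the closing ] is itself a separator.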
CodePudding user response:
With your shown samples, please try the following awk code. It was written and tested in GNU awk; the three-argument form of match() used here is a gawk extension.
s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
awk '
    match($0,/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) \(([^)]*).*\[([0-9]+)/,arr){
        print "DateTime: "arr[1] ", Speed: " arr[2] ", Size: "arr[3]
    }
}
' <<< "$s"
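Where GNU awk is not available, the same capture-group idea can be expressed with sed, which the question also mentions. This is a rough sketch rather than a drop-in replacement: it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do), it assumes the line has exactly the shape shown in the question, and parse_with_sed is a hypothetical helper name. Unlike the gawk version, it captures only the numeric part of the speed.

```shell
# Hypothetical helper: reformat a wget summary line using sed capture groups.
# Prints nothing if the line does not match (-n plus the p flag).
parse_with_sed() {
    printf '%s\n' "$1" |
        sed -nE 's/^([-0-9]+ [:0-9]+) \(([0-9.]+) [^)]*\).*\[([0-9]+)\]$/DateTime: \1, Speed: \2, Size: \3/p'
}

s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
parse_with_sed "$s"
```

The unit is matched with [^)]* rather than spelled out, which keeps the / in Mb/s from clashing with the s/// delimiters.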
CodePudding user response:
Using Perl and named capture groups:
s="2022-12-26 19:14:44 (13.7 Mb/s) - ‘somelibrary.min.js’ saved [1077022]"
perl -nE '
    say join "\n",
        map { "$_: $+{$_}" }
            keys %+
                if m/
                    (?<dateTime>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+\(
                    (?<speed>\d+\.\d+).*\[
                    (?<size>\d+)
                /x
' <<< "$s"
Output (the order of the keys is not guaranteed):
speed: 13.7
dateTime: 2022-12-26 19:14:44
size: 1077022