awk - Call external command and populate output before the first column

I have a file that contains some information about daily storage utilization. There are two columns - DD.MM date and usage in KB for every day.

I'm using awk to show, in GB, the difference between each line and the previous one as storage usage increases.

Example file:

20.09 10485760
21.09 20971520
22.09 26214400
23.09 27262976

My awk command:

awk 'NR > 1 {a=($2-prev)/1024^2" GB"} {prev=$2} {print $1,$2,a}' file

This outputs:

20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB

I would also like to add the weekday name before the first column. The date format in the file is always DD.MM, so, to make GNU date accept it as valid input and return the weekday name, I composed this pipeline:

echo '20.09.2022' | awk -v FS=. -v OFS=- '{print $3,$2,$1}' | date -f - +%a

It works, but I want to call it from the first awk for every processed line, passing the first-column date with ".2022" appended as the argument, and put the output of this external pipeline (the weekday name) before the date in the first column.

Example output:

Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

I looked at the system() function in awk, but I couldn't make it work with my pipeline and my first awk command.

CodePudding user response：

1st solution: Use getline within awk to capture the output of date. Please try the following solution.

awk '
NR>1{
  a=($2-prev)/1024^2" GB"
}
{
  split($1,arr,".")
  value="2022-"arr[2]"-"arr[1]
  dateVal="date -d \"" value "\" +%a"
  newVal = ( (dateVal | getline line) > 0 ? line : "N/A" )
  close(dateVal)
  print newVal,$0,a
  prev=$2
}
'   Input_file
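
As a variation on the getline idea above, here is a minimal sketch that remembers each result so date is spawned only once per distinct date. The function name weekday_cached is hypothetical, the year 2022 is taken from the question, and GNU date (for -d) is assumed. Because the command string is built from input data, this is only suitable for trusted input.

```shell
# weekday_cached: hypothetical helper, assumes GNU date and DD.MM in field 1.
weekday_cached() {
  awk -v year=2022 '
  {
    split($1, dm, ".")                        # dm[1] = day, dm[2] = month
    cmd = "date -d \"" year "-" dm[2] "-" dm[1] "\" +%a"
    if (!(cmd in seen)) {                     # run date once per distinct date
      seen[cmd] = ((cmd | getline out) > 0) ? out : "N/A"
      close(cmd)                              # release the pipe before the next line
    }
    print seen[cmd], $0
  }' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf '20.09 10485760\n21.09 20971520\n' | weekday_cached
```

With an English locale this should prefix the two sample lines with Tue and Wed.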

2nd solution: With your shown samples, please try the following awk code. system() runs its command in a separate shell (awk --> system --> shell --> command), and that command writes straight to standard output, so its output can't be merged with awk's own output inside the same program. Instead, one awk gets the weekday for every line (based on the 1st field of your Input_file), and its output is passed as the first input to a second awk, which does the actual space calculation and merges the two. We could also do it with a while loop, but IMHO doing it with awk could be faster.

awk '
FNR==NR{
  arr[FNR]=$0
  next
}
NR>1{
  a=($2-prev)/1024^2" GB"
}
{
  print arr[FNR],$1,$2,a
  prev=$2
}
' <(awk '{split($1,arr,".");system("d=\"2022-" arr[2]"-"arr[1]"\";date -d \"$d\" +%a")}' Input_file) Input_file

Output with shown samples will be as follows:

Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
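
To illustrate the difference between the two solutions, here is a small sketch (the function name show_capture is hypothetical) that runs the same kind of command through system() and through getline. system() hands back only the exit status while the command prints on its own; getline puts the command's output into an awk variable.

```shell
# show_capture: hypothetical demo contrasting system() with "cmd | getline".
show_capture() {
  awk 'BEGIN {
    status = system("echo from-system")       # the command prints directly to stdout
    print "system() returned: " status        # awk only sees the exit status

    cmd = "echo from-getline"
    if ((cmd | getline captured) > 0)         # the output lands in an awk variable
      print "getline captured: " captured
    close(cmd)
  }'
}

show_capture
```

This is why the 2nd solution needs a separate awk process to collect the weekday names, while the 1st can print everything from one program.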

CodePudding user response：

Since you have GNU date you should also have GNU awk which has builtin time functions that'll be orders of magnitude faster than awk spawning a subshell to call date for each input line:

$ cat tst.sh
#!/usr/bin/env bash

awk '
    BEGIN {
        year = strftime("%Y")
    }
    NR > 1 {
        diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
    }
    {
        split($1,dayMth,/[.]/)
        secs = mktime(year " " dayMth[2] " " dayMth[1] " 12 0 0")
        day = strftime("%a",secs)
        print day, $0, diff
        prev = $2
    }
' "${@:--}"

$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
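
Note that the script above takes the year from the system clock, so the weekdays only match the sample output when it is run in 2022. Here is a minimal sketch of the same approach with the year passed in explicitly. The wrapper name add_weekday is hypothetical, and an awk that provides mktime() and strftime(), such as GNU awk, is assumed.

```shell
# add_weekday YEAR [FILE...]: hypothetical wrapper around the mktime()/strftime() approach.
add_weekday() (
  year=$1
  shift
  awk -v year="$year" '
    NR > 1 {
      diff = (($2 - prev) / (1024 ^ 2)) " GB"  # KB difference expressed in GB
    }
    {
      split($1, dayMth, /[.]/)                 # dayMth[1] = day, dayMth[2] = month
      # noon avoids landing on the wrong day around DST changes
      secs = mktime(year " " dayMth[2] " " dayMth[1] " 12 0 0")
      print strftime("%a", secs), $0, diff
      prev = $2
    }
  ' "$@"
)

# Usage: the year comes first, then optional files (standard input otherwise).
printf '20.09 10485760\n21.09 20971520\n' | add_weekday 2022
```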

If for some reason you don't have GNU awk and can't get it then this 2-pass approach would work fairly efficiently using GNU date and any awk:

$ cat tst.sh
#!/usr/bin/env bash

awk -v year="$(date +'%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$@" |
date -f- +'%a' |
awk '
    NR == FNR {
        days[NR] = $1
        next
    }
    FNR > 1 {
        diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
    }
    {
        print days[FNR], $0, diff
        prev = $2
    }
' - "$@"

$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

The downside of that 2nd script is that it can't read input from a stream, only from files, since it has to read the input twice. If that's an issue and your input isn't too massive to fit a copy on disk, then you could always use a temp file, e.g.:

$ cat tst.sh
#!/usr/bin/env bash

tmp=$(mktemp)                   &&
trap 'rm -f "$tmp"; exit' 0     &&
cat "${@:--}" > "$tmp"          || exit 1

awk -v year="$(date +'%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$tmp" |
date -f- +'%a' |
awk '
    NR == FNR {
        days[NR] = $1
        next
    }
    FNR > 1 {
        diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
    }
    {
        print days[FNR], $0, diff
        prev = $2
    }
' - "$tmp"

$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

CodePudding user response：

date can process multiple newline-separated dates, so I propose the following solution. Let the content of file.txt be

20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB

then

awk 'BEGIN{FS="[[:space:].]";OFS="-"}{print "2022",$2,$1}' file.txt | date -f - +%a | paste -d ' ' - file.txt

gives output

Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

Explanation: I use GNU AWK to extract and prepare the date for consumption by date, so 20.09 becomes 2022-09-20 and so on. date then computes the abbreviated weekday name, and paste places the columns side by side, separated by a space character. The 1st column is -, meaning standard input; the 2nd column is the unchanged file.txt.

(tested in GNU Awk 5.0.1 and paste (GNU coreutils) 8.30)
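
The paste step can be tried on its own. Here is a minimal sketch, with a hypothetical function name join_columns, that feeds weekday names on standard input and joins them line by line with a file:

```shell
# join_columns FILE: hypothetical helper; column 1 comes from stdin, column 2 from FILE.
join_columns() {
  # "-" tells paste to read that column from standard input;
  # -d " " joins the columns with a single space instead of a tab
  paste -d ' ' - "$1"
}

# Usage (assuming file.txt holds the lines shown above):
#   printf 'Tue\nWed\nThu\nFri\n' | join_columns file.txt
```

paste pairs lines purely by position, so this relies on date emitting exactly one weekday per input line, in the same order.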