I have a file that contains some information about daily storage utilization. There are two columns: a DD.MM date and the usage in KB for every day.
I'm using awk to show, for every line from the second one onward, the difference from the previous line in GB, i.e. how much the storage usage increased each day.
Example file:
20.09 10485760
21.09 20971520
22.09 26214400
23.09 27262976
My awk command:
awk 'NR > 1 {a=($2-prev)/1024^2" GB"} {prev=$2} {print $1,$2,a}' file
This outputs:
20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB
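(For example, between 20.09 and 21.09 the usage grows by 20971520 - 10485760 = 10485760 KB, and 10485760 / 1024^2 = 10485760 / 1048576 = 10 GB, which is the value shown in the third column.)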
I would also like to add the weekday name before the first column. The date format in the file is always DD.MM, so, to make GNU date accept it as valid input and return the weekday name, I composed this pipeline:
echo '20.09.2022' | awk -v FS=. -v OFS=- '{print $3,$2,$1}' | date -f - +%a
It works, but I want to call it from the first awk for every processed line, passing the first-column date (with ".2022" appended so that date accepts it) as the argument, and put the output of this external pipeline (the weekday name) before the date in the first column.
Example output:
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
I looked at the system() function in awk, but I couldn't make it work with my pipeline and my first awk command.
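For reference, this is roughly the kind of system() call I experimented with (reconstructed here, so treat the exact command as an assumption):
# reconstructed attempt, not my exact command: the weekday ends up on its own line,
# because system() only returns an exit status and its output bypasses awk
awk '{split($1,d,"."); system("date -d 2022-" d[2] "-" d[1] " +%a"); print $1,$2}' file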
CodePudding user response:
1st solution: Using getline within awk, please try the following solution.
awk '
NR>1{
  a=($2-prev)/1024^2" GB"
}
{
  split($1,arr,".")
  value="2022-"arr[2]"-"arr[1]
  dateVal="date -d \"" value "\" +%a"
  newVal = ( (dateVal | getline line) > 0 ? line : "N/A" )
  close(dateVal)
  print newVal,$0,a
  prev=$2
}
' Input_file
2nd solution: With your shown samples, please try the following awk code. What the system command does in awk is run the given command in a separate shell, so you are essentially calling awk --> system --> shell --> commands. Because system prints its output through the shell, that output can't be merged with awk's own output (a short stand-alone demonstration of this follows the example output below). So instead, get all the weekday values with one awk (based on the 1st field of your Input_file) and pass them as input to another awk, which does the actual space calculations and merges the two streams. We could also do it with a while loop, but IMHO doing it with awk should be faster.
awk '
FNR==NR{
  arr[FNR]=$0
  next
}
FNR>1{
  a=($2-prev)/1024^2" GB"
}
{
  print arr[FNR],$1,$2,a
  prev=$2
}
' <(awk '{split($1,arr,".");system("d=\"2022-" arr[2]"-"arr[1]"\";date -d \"$d\" +%a")}' Input_file) Input_file
Output with shown samples will be as follows:
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
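As a quick stand-alone demonstration of the point above (not part of the solution itself): system() only returns the command's exit status, while date's output goes straight to stdout, bypassing awk, e.g.
echo '20.09 10485760' | awk '{status=system("date -d 2022-09-20 +%a"); print "exit status:", status}'
prints Tue (coming directly from date) followed by exit status: 0 (printed by awk), so the weekday can never end up in an awk variable this way.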
CodePudding user response:
Since you have GNU date you should also have GNU awk, which has builtin time functions that'll be orders of magnitude faster than awk spawning a subshell to call date for each input line:
$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
    year = strftime("%Y")
}
NR > 1 {
    diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
}
{
    split($1,dayMth,/[.]/)
    secs = mktime(year " " dayMth[2] " " dayMth[1] " 12 0 0")
    day = strftime("%a",secs)
    print day, $0, diff
    prev = $2
}
' "${@:--}"
$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
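If you want to sanity-check the builtin time functions used above on their own, here's a minimal one-liner (assuming GNU awk is invoked as gawk on your system):
$ gawk 'BEGIN{ print strftime("%a", mktime("2022 09 20 12 0 0")) }'
Tue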
If for some reason you don't have GNU awk and can't get it then this 2-pass approach would work fairly efficiently using GNU date and any awk:
$ cat tst.sh
#!/usr/bin/env bash
awk -v year="$(date '+%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$@" |
date -f- '+%a' |
awk '
NR == FNR {
    days[NR] = $1
    next
}
FNR > 1 {
    diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
}
{
    print days[FNR], $0, diff
    prev = $2
}
' - "$@"
$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
The downside to that 2nd script is that it can't read input from a stream, only from a file, since it has to read the input twice. If that's an issue and your input isn't too massive to fit a copy on disk, then you could always use a temp file, e.g.:
$ cat tst.sh
#!/usr/bin/env bash
tmp=$(mktemp) &&
trap 'rm -f "$tmp"; exit' 0 &&
cat "${@:--}" > "$tmp" || exit 1
awk -v year="$(date '+%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$tmp" |
date -f- '+%a' |
awk '
NR == FNR {
    days[NR] = $1
    next
}
FNR > 1 {
    diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
}
{
    print days[FNR], $0, diff
    prev = $2
}
' - "$tmp"
$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
CodePudding user response:
date can process multiple newline-separated dates, therefore I propose the following solution. Let file.txt content be
20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB
then
awk 'BEGIN{FS="[[:space:].]";OFS="-"}{print "2022",$2,$1}' file.txt | date -f - +%a | paste -d ' ' - file.txt
gives output
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
Explanation: I use GNU AWK to extract the date parts and prepare them for consumption by date, so 20.09 becomes 2022-09-20 and so on. Then date is used to compute the abbreviated name of the day of the week, and paste is used to put the columns side by side, separated by a space character: the 1st column is -, meaning standard input, and the 2nd column is the unchanged file.txt.
(tested in GNU Awk 5.0.1 and paste (GNU coreutils) 8.30)
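For reference, the multi-date behaviour of GNU date -f - that this relies on can be checked on its own, e.g.
printf '2022-09-20\n2022-09-21\n' | date -f - +%a
prints Tue and Wed on separate lines.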