Home > Net >  Calculating percentage change of a column in two consecutive rows using AWK
Calculating percentage change of a column in two consecutive rows using AWK

Time:09-24

Need to the calculate % change between consecutive values in second column of a data file using awk.

The data file looks like -

-bash-4.2$ cat coun11.csv
16-SEP-21,4218596
17-SEP-21,3747178

Since the number of rows will remain fixed, i.e., 2, I don't want to use while with read to loop over the file, instead I want to use a single line statement awk, if possible.

The more i read on it in relation with awk, it seemed possible but didn't get the desired result.

Answer should 11.2% (can be negative) i.e., rounded off by two decimal places for which we can use OFMT i believe.

% diff should always be calculated with ((oldvalue-newvalue)/oldvalue)*100, wherein oldvalue refers to first row and newvalue to second row, which is evident from dates in first column.

One of the used commands awk 'BEGIN { NF==1?a=$2:b=$2}{c=(((a-b)/a)*100)} END {OFMT="%f";print c}' coun11.csv, which didn't work

CodePudding user response:

OP's current code:

awk 'BEGIN { NF==1?a=$2:b=$2}{c=(((a-b)/a)*100)} END {OFMT="%f";print c}' coun11.csv

A few problems:

  • BEGIN {...} processing is performed before the file is opened/read so ...
  • NF=0 within the BEGIN {...} block so ...
  • a and b are undefined coming out of the BEGIN {...} block (NOTE: in awk undefined variables have a default value of 0)
  • NF refers to the number of fields in the current line but OP is looking to process specific lines 1 & 2 so ...
  • I'm guessing OP wants to use the FNR (aka line number of current record) instead
  • for each line read from the file awk tries to apply c=(((a-b)/a)*100), but since a==b==0 OP should be getting an error (eg, ... fatal: division by zero attempted)
  • the calculation should only be performed once and can either be performed as part of processing the 2nd line (FNR==2) or simply moved to the END {...} block
  • OFMT="%f" is not going to generate the desired output format of 11.2%; while OP has mentioned 2 decimal places the expected output only shows 1 decimal place; we'll look at formats to generate both; after that OP is free to experiment with OFMT as needed

One idea for some edits (and assuming OPs calculation is correct):

awk -F',' '
FNR==1 { a=$2 }
FNR==2 { b=$2 }
END    { c=(((a-b)/a)*100)
         printf "%.1f%\n", c             # 1 decimal place
       }
' coun11.csv

# or as a single line

awk -F',' 'FNR==1{a=$2}FNR==2{b=$2} END{c=(((a-b)/a)*100); printf "%.1f%\n", c}' coun11.csv

# using ternary for the test/assignment:

awk -F',' '
       { (FNR==1) ? (a=$2) : (b=$2) }   # if FNR > 2 b will be overwritten with latest field #2
END    { c=(((a-b)/a)*100)
         printf "%.1f%\n", c 
}
' coun11.csv

# or as a single line

awk -F',' '{(FNR==1)?(a=$2):(b=$2)} END{c=(((a-b)/a)*100); printf "%.1f%\n", c}' coun11.csv

All of these generate:

11.2%

For 2 decimal places we want printf "%.2f%\n",c which generates:

11.17%

CodePudding user response:

I think your calculation is a bit off. I would do:

printf "foo %d\n" 4218596 3747178 4218596 |
awk 'NR>1 {printf "%.1f\n", ($2/d) * 100 - 100} {d=$2}'
  • Related