awk field-separator problem with last value-CodePudding

I have a file "EURUSD.csv" file that contains:

timestamp,open,high,low,close
2022-08-04,1.01578,1.01979,1.01530,1.01892
2022-08-03,1.01636,1.02105,1.01210,1.01691
2022-08-02,1.02578,1.02939,1.01619,1.01625

I have a bash script:

#!/bin/bash

cat EURUSD.csv  | tail -n  2| while read line
 do
 DATA=`echo $line | awk -F, '{ print $1 }' `
 OPEN=`echo $line | awk -F, '{ print $2 }' `
 HIGH=`echo $line | awk -F, '{ print $3 }' `
 LOW=`echo $line | awk -F, '{ print $4 }' `
 CLOSE=`echo $line | awk -F, '{ print $5 }'  `

 echo "insert into forex ($DATA, $OPEN, $HIGH, $LOW, $CLOSE)"
 done

I have a problem with $5 CLOSE value, when used, the output is not like I aspect:

)nsert into forex (2022-08-04, 1.01578, 1.01979, 1.01530, 1.01892
)nsert into forex (2022-08-03, 1.01636, 1.02105, 1.01210, 1.01691
)nsert into forex (2022-08-02, 1.02578, 1.02939, 1.01619, 1.01625

If I try the script without CLOSE values, the output is OK:

insert into forex (2022-08-04, 1.01578, 1.01979, 1.01530)
insert into forex (2022-08-03, 1.01636, 1.02105, 1.01210)
insert into forex (2022-08-02, 1.02578, 1.02939, 1.01619)

What is the problem with $5 CLOSE value?

CodePudding user response：

With your shown samples please try following awk code.

awk '
BEGIN{
  FS=","
  OFS=", "
}
FNR==1{ next }
{
  sub(/\r/,"")
  $1=$1
  print "insert into forex (" $0 ")"
}
'  Input_file

Explanation: Adding detailed explanation for above code.

awk '              ##Starting awk program from here.
BEGIN{             ##Starting BEGIN section of this program from here.
  FS=","           ##Setting FS to comma here.
  OFS=", "         ##Setting OFS to comma space here.
}
FNR==1{ next }     ##If its first line then simply go to next line.
{
  sub(/\r/,"")     ##Using sub to substitute \r characters at last of line.
  $1=$1            ##Re-assigning 1st field to itself here.
  print "insert into forex (" $0 ")" ##Printing insert into forex ( followed by current line ) here.
}
'  Input_file      ##Mentioning Input_file name here.

CodePudding user response：

Don't use shell loop to read your CSV file. Better to use this awk that also removed DOS line break before creating SQL statements using CSV data ignoring header row:

awk '
BEGIN {FS=OFS=","}
NR > 1 {
   sub(/\r$/, "")
   printf "insert into forex (%s, %s, %s, %s, %s)\n", $1, $2, $3, $4, $5
}' file.csv

insert into forex (2022-08-04, 1.01578, 1.01979, 1.01530, 1.01892)
insert into forex (2022-08-03, 1.01636, 1.02105, 1.01210, 1.01691)
insert into forex (2022-08-02, 1.02578, 1.02939, 1.01619, 1.01625)

CodePudding user response：

Simple concatenation is all that you need, e.g.

awk 'FNR==1{next} {$0 = "insert into forex (" $0 ")"}1' EURUSD.csv

Output

insert into forex (2022-08-04,1.01578,1.01979,1.01530,1.01892)
insert into forex (2022-08-03,1.01636,1.02105,1.01210,1.01691)
insert into forex (2022-08-02,1.02578,1.02939,1.01619,1.01625)

If you need the additional space, just add a gsub() call, e.g.

awk 'FNR==1{next} {$0 = "insert into forex (" $0 ")"; gsub(/,/,", ")}1' EURUSD.csv

Output

insert into forex (2022-08-04, 1.01578, 1.01979, 1.01530, 1.01892)
insert into forex (2022-08-03, 1.01636, 1.02105, 1.01210, 1.01691)
insert into forex (2022-08-02, 1.02578, 1.02939, 1.01619, 1.01625)

CodePudding user response：

one single call to sub() per row is all u need by leveraging as many built-in variables as possible:

mawk 'sub(". ","insert into forex(&)",$!(NF*=!_<NR))' RS='\r?\n' FS=',' OFS=', '

insert into forex(2022-08-04, 1.01578, 1.01979, 1.01530, 1.01892)
insert into forex(2022-08-03, 1.01636, 1.02105, 1.01210, 1.01691)
insert into forex(2022-08-02, 1.02578, 1.02939, 1.01619, 1.01625)