The following function uses awk to convert a CSV line to multiple lines. I can then assign the output to an array to access the fields.
function csv_to_lines() {
echo $@ | awk '
BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")";}        # a field is either non-comma chars or a quoted string
{for(i=1; i<=NF; i++) {printf("%s\n", $i)}}'  # print each field on its own line
}
line='A,B,"C,D",E'
arr=($(csv_to_lines $line))
printf '%s,' "${arr[@]}"
However, this doesn't work for empty fields. For example:
line='A,,,,,"C,D",E'
arr=($(csv_to_lines $line))
printf '%s,' "${arr[@]}"
Outputs
A,"C,D",E,
But I expected
A,,,,,"C,D",E,
Evidently, all empty lines are ignored when assigning to the array. How do I create an array that keeps the empty lines?
CodePudding user response:
Current code:
$ line='A,,,,,"C,D",E'
$ csv_to_lines $line
A
"C,D"
E
Looking at the actual characters generated, we see:
$ csv_to_lines $line | od -c
0000000 A \n \n \n \n \n " C , D " \n E \n
0000016
As is, the arr=(...) assignment is going to split this data on whitespace and store the printable characters in the array, effectively doing the same as:
$ arr=(A
"C,D"
E)
$ typeset -p arr
declare -a arr=([0]="A" [1]="C,D" [2]="E")
$ printf '%s,' "${arr[@]}"
A,"C,D",E,
A couple of ideas for storing the 'blank lines' in the array:
Use mapfile to read each line into the array, e.g.:
$ mapfile -t arr < <(csv_to_lines $line)
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
Or have awk use something other than \n as a delimiter, then define a custom IFS to parse the function results into the array, e.g.:
$ function csv_to_lines() { echo $@ | awk '
BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")";}
{for(i=1; i<=NF; i++) {printf("%s|", $i)}}'; }
$ csv_to_lines $line
A|||||"C,D"|E|
$ IFS='|' arr=($(csv_to_lines $line))
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
Both of these lead to:
$ printf '%s,' "${arr[@]}"
A,,,,,"C,D",E,