Reading csv string into bash array


The following function uses awk to convert a csv line into multiple lines, one field per line. I can then assign the output to an array to access the individual fields.

function csv_to_lines() {
  echo "$@" | awk '
  BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")"}
  {for (i=1; i<=NF; i++) {printf("%s\n", $i)}}'
}

line='A,B,"C,D",E'
arr=($(csv_to_lines $line))

printf '%s,' "${arr[@]}"
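
With this input every field is non-empty, so the printf prints what I expect:

A,B,"C,D",E,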

However, this doesn't work for empty fields. For example:

line='A,,,,,"C,D",E'
arr=($(csv_to_lines $line))

printf '%s,' "${arr[@]}"

Outputs

A,"C,D",E,

But I expected

A,,,,,"C,D",E,

Evidently, all empty lines are ignored when assigning to the array. How do I create an array that keeps the empty lines?

CodePudding user response:

Current code:

$ line='A,,,,,"C,D",E'
$ csv_to_lines $line
A




"C,D"
E

Looking at the actual characters generated we see:

$ csv_to_lines $line | od -c
0000000   A  \n  \n  \n  \n  \n   "   C   ,   D   "  \n   E  \n
0000016

As is, the arr=(...) assignment word-splits this data on whitespace (which includes the newlines) and keeps only the non-empty words, effectively doing the same as:

$ arr=(A '"C,D"' E)
$ typeset -p arr
declare -a arr=([0]="A" [1]="\"C,D\"" [2]="E")

$ printf '%s,' "${arr[@]}"
A,"C,D",E,

A couple of ideas for keeping the 'blank lines' in the array:

Use mapfile to read each line into the array, eg:

$ mapfile -t arr < <(csv_to_lines $line)
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
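
A side note on the call itself: quoting the argument ("$line") keeps the shell from word-splitting or glob-expanding it before the function sees it; for this input the result is the same:

$ mapfile -t arr < <(csv_to_lines "$line")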

Or have awk use something other than \n as a delimiter, then set a custom IFS so the shell splits the function's output into the array, eg:

$ function csv_to_lines() { echo "$@" | awk '
BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")"}
{for (i=1; i<=NF; i++) {printf("%s|", $i)}}'; }

$ csv_to_lines $line
A|||||"C,D"|E|

$ IFS='|' arr=($(csv_to_lines $line))
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
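
One caveat with the IFS approach: because that command line consists only of assignments, bash performs them left to right in the current shell, so IFS='|' is already in effect when the command substitution is split, but it also stays in effect afterwards. If that's a problem, save and restore it:

$ oldIFS=$IFS
$ IFS='|' arr=($(csv_to_lines $line))
$ IFS=$oldIFS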

Both of these lead to:

$ printf '%s,' "${arr[@]}"
A,,,,,"C,D",E,