read csv output into an array and process the variable in a loop using bash


Assuming I have an output/file:

1,a,info
2,b,inf
3,c,in

I want to run a while loop with read:

while read r ; do 
   echo "$r";
   # extract line to $arr as array separated by ',' 
   # call some program (e.g. md5sum, echo ...) on one item of arr
done <<HEREDOC
1,a,info
2,b,inf
3,c,in   
HEREDOC

I would like to use readarray and while, but compelling alternatives are welcome too.

There is a specific way to make readarray (mapfile) behave correctly with process substitution, but I keep forgetting it. This is intended as a Q&A, so an explanation would be nice.

CodePudding user response:

Since compelling alternatives are welcome too and assuming you're just trying to populate arr one line at a time:

$ cat tst.sh
#!/usr/bin/env bash

while IFS=',' read -r -a arr ; do
    # extract line to $arr as array separated by ','
    # echo the first item of arr
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

$ ./tst.sh
1
2
3

Or if you also need each whole input line in a separate variable r:

$ cat tst.sh
#!/usr/bin/env bash

while IFS= read -r r ; do
    # extract line to $arr as array separated by ','
    # echo the first item of arr
    IFS=',' read -r -a arr <<< "$r"
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

$ ./tst.sh
1
2
3
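
To actually call a program on one of the fields, as sketched in the question, just replace the echo; a minimal sketch with md5sum (the example command named in the question):

while IFS=',' read -r -a arr; do
    # hash the third field; printf '%s' avoids feeding md5sum an extra newline
    printf '%s' "${arr[2]}" | md5sum
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC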

But bear in mind "Why is using a shell loop to process text considered bad practice?" anyway.
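
For comparison, a minimal sketch of the non-loop route with awk: it prints the first field of every line (1, 2, 3) without any shell loop. Whether it fits depends on what you need to run per item.

awk -F',' '{ print $1 }' <<'HEREDOC'
1,a,info
2,b,inf
3,c,in
HEREDOC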

CodePudding user response:

If the loadable builtin csv is available/acceptable, something like:

help csv
csv: csv [-a ARRAY] string
    Read comma-separated fields from a string.
    
    Parse STRING, a line of comma-separated values, into individual fields,
    and store them into the indexed array ARRAYNAME starting at index 0.
    If ARRAYNAME is not supplied, "CSV" is the default array name.

The script:

#!/usr/bin/env bash

enable csv || exit

while IFS= read -r line && csv -a arr "$line"; do
  printf '%s\n' "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

  • See help enable

With bash 5.2 there is a default search path for the loadables, set in config-top.h, which should be configurable at compile time:

BASH_LOADABLES_PATH
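
For older bash versions, or if the loadables live somewhere non-standard, a minimal sketch that loads the builtin explicitly (the /usr/lib/bash path is only an assumption here; it varies by distribution):

#!/usr/bin/env bash

# Assumed location of the loadable builtins -- adjust for your system.
BASH_LOADABLES_PATH=${BASH_LOADABLES_PATH:-/usr/lib/bash}

# -f loads the csv builtin from a shared object found via BASH_LOADABLES_PATH.
enable -f csv csv || exit

csv -a fields '1,a,info'
declare -p fields    # declare -a fields=([0]="1" [1]="a" [2]="info")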

CodePudding user response:

The solution is readarray -t -d, arr < <(printf "%s," "$r")

The special part is < <(...), because of how readarray takes its input.
There is no clearly stated reason to be found for why it needs a redirection arrow first and then process substitution, neither on the tldp process-substitution page nor on SS64.
My final understanding is that <(...) opens a (named) pipe and expands to its filename, and readarray reads from it until it is closed. By putting that in the place of a file behind <, bash treats it as ordinary file input and feeds it into readarray's stdin, while readarray itself keeps running in the current shell (so the array survives).
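
You can see the filename that <(...) expands to (the exact /dev/fd number varies between runs and systems):

echo <(printf 'x')    # prints something like /dev/fd/63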


Example:

while read r ; do 
   echo "$r";
   readarray -t -d, arr < <(printf "%s," "$r");
   echo "${arr[0]}";
done <<HEREDOC
1,a,info
2,b,inf
3,c,in   
HEREDOC

Anyway, this is just a reminder for myself, because I keep forgetting, and readarray is the only place where I actually need this.

The question was also mostly answered here, here (on why the pipe isn't working) and somewhat here, but those answers are difficult to find and the reasoning is hard to follow.

For example, the shopt -s lastpipe solution is not clear at first, but it turns out that in bash the elements of a pipeline normally do not execute in the main shell, so state changes (like assigning arr) have no effect on the rest of the program. This option makes the last element of a pipeline execute in the main shell instead (except in an interactive shell, where job control is active).
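
A minimal self-contained check of that behaviour (a sketch; run it as a script, where job control is off, so lastpipe can take effect):

#!/usr/bin/env bash

# Without lastpipe: readarray runs in a subshell, so arr does not survive the pipeline.
printf '1,a,info,' | readarray -t -d, arr
declare -p arr    # error: arr not found

# With lastpipe the same pipeline assigns arr in the current shell.
shopt -s lastpipe
printf '1,a,info,' | readarray -t -d, arr
declare -p arr    # declare -a arr=([0]="1" [1]="a" [2]="info")

With lastpipe enabled, the loop from the question then looks like this: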

shopt -s lastpipe;
while read r ; do 
    echo "$r";       
    printf "%s," "$r"  | readarray -t -d, arr;
    echo "${arr[0]}"; 
done <<HEREDOC
1,a,info
2,b,inf
3,c,in   
HEREDOC

One alternative to lastpipe would be to do all the work in the subshell:

while read r ; do
    echo "$r";
    printf "%s," "$r" | {
        readarray -t -d, arr ;
        echo "${arr[0]}";
    }
done <<HEREDOC
1,a,info
2,b,inf
3,c,in   
HEREDOC

CodePudding user response:

readarray (mapfile) and read -a disambiguation

First, readarray == mapfile:

help readarray
readarray: readarray [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
    Read lines from a file into an array variable.
    
    A synonym for `mapfile'.

Then

help mapfile
mapfile: mapfile [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
    Read lines from the standard input into an indexed array variable.
    
    Read lines from the standard input into the indexed array variable ARRAY, or
    from file descriptor FD if the -u option is supplied.  The variable MAPFILE
    is the default ARRAY.
    
    Options:
      -d delim    Use DELIM to terminate lines, instead of newline
      -n count    Copy at most COUNT lines.  If COUNT is 0, all lines are copied
      -O origin   Begin assigning to ARRAY at index ORIGIN.  The default index is 0
      -s count    Discard the first COUNT lines read
      -t  Remove a trailing DELIM from each line read (default newline)
      -u fd       Read lines from file descriptor FD instead of the standard input
      -C callback Evaluate CALLBACK each time QUANTUM lines are read
      -c quantum  Specify the number of lines read between each call to
                          CALLBACK
...

While read -a:

help read
read: read [-ers] [-a array] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
    Read a line from the standard input and split it into fields.
    
    Reads a single line from the standard input, or from file descriptor FD
    if the -u option is supplied.  The line is split into fields as with word
    splitting, and the first word is assigned to the first NAME, the second
    word to the second NAME, and so on, with any leftover words assigned to
    the last NAME.  Only the characters found in $IFS are recognized as word
    delimiters.
...
    Options:
      -a array    assign the words read to sequential indices of the array
                  variable ARRAY, starting at zero
...

Note:

Only the characters found in $IFS are recognized as word delimiters. Useful with the -a flag!

Creating an array from a split string

To create an array by splitting a string, you could either:

IFS=, read -ra myArray <<<'A,1,spaced string,42'
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")

Or use mapfile, but as this command is intended to work on whole files, the syntax is somewhat counter-intuitive:

mapfile -td, myArray < <(printf %s 'A,1,spaced string,42')
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")

Or, if you want to avoid the fork from < <(printf ...), you have to strip the trailing newline yourself (the here-string appends one, and -t only removes the ',' delimiter):

mapfile -td, myArray <<<'A,1,spaced string,42'
myArray[-1]=${myArray[-1]%$'\n'}
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")

This will be a little quicker, but not more readable...
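
A rough way to compare the two variants on your own machine (just a sketch; absolute numbers will vary):

time for i in {1..1000}; do
    mapfile -td, a < <(printf %s 'A,1,spaced string,42')
done

time for i in {1..1000}; do
    mapfile -td, a <<<'A,1,spaced string,42'
    a[-1]=${a[-1]%$'\n'}
done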

For your sample:

mapfile -t rows <<HEREDOC
1,a,info
2,b,inf
3,c,in   
HEREDOC
for row in ${rows[@]};do
    IFS=, read -a cols <<<"$row"
    declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")
for row in ${rows[@]};do
    IFS=, read -a cols <<<"$row"
    printf ' %s | %s\n' "${cols[0]}" "${cols[2]}"
done
 1 | info
 2 | inf
 3 | in

Or even, if you really want to use readarray:

for row in ${rows[@]};do
    readarray -td, cols <<<"$row"
    cols[-1]=${cols[-1]%$'\n'}
    declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")

Playing with the callback option:

(Added some spaces on last line)

testfunc() { 
    # mapfile's -C callback receives the index of the next element and the line just read
    local IFS array cnt line
    read cnt line <<< "$@"
    IFS=,
    read -a array <<< "$line"
    printf ' [%3d]: %3s | %3s :: %s\n' $cnt "${array[@]}"
}
mapfile -t -C testfunc -c 1  <<HEREDOC
1,a,info
2,b,inf
3,c d,in fo   
HEREDOC
 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo

Same, with -u flag:

Open the file descriptor:

exec {mydoc}<<HEREDOC
1,a,info
2,b,inf
3,c d,in fo   
HEREDOC

Then

mapfile -u $mydoc -C testfunc -c 1
 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo

And finally close the file descriptor:

exec {mydoc}<&-

About the bash csv module:

For further information about enable -f /path/to/csv csv, the relevant RFCs and the limitations, have a look at my previous post about How to parse a CSV file in Bash?
